[ 
https://issues.apache.org/jira/browse/SPARK-13766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-13766.
---------------------------------
       Resolution: Fixed
         Assignee: Hyukjin Kwon
    Fix Version/s: 2.0.0

> Inconsistent file extensions and omitted file extensions written by CSV, TEXT 
> and JSON data sources
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-13766
>                 URL: https://issues.apache.org/jira/browse/SPARK-13766
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Minor
>             Fix For: 2.0.0
>
>
> Currently, the output (part-files) from the CSV, TEXT and JSON data sources 
> does not have file extensions such as .csv, .txt and .json (except for 
> compression extensions such as .gz, .deflate and .bz2).
> In addition, it looks like Parquet part-files have extensions such as 
> .gz.parquet or .snappy.parquet according to the compression codec, whereas 
> ORC part-files have no codec extension and are just .orc.
> So, in short, the extensions are currently set as follows:
> {code}
> TEXT, CSV and JSON - [.COMPRESSION_CODEC_NAME]
> Parquet - [.COMPRESSION_CODEC_NAME].parquet
> ORC - .orc
> {code}
> It would be great if we had consistent naming for them.
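
The naming rules in the description can be written out as a small Python sketch. The first function only restates the behaviour the issue reports. The second is a hypothetical uniform rule, added for illustration and not necessarily the convention the fix adopted. All function names and the format keys are assumptions made for this example, not Spark APIs.

```python
from typing import Optional

# Format extensions named in the issue description.
FORMAT_EXT = {
    "text": ".txt",
    "csv": ".csv",
    "json": ".json",
    "parquet": ".parquet",
    "orc": ".orc",
}


def described_extension(fmt: str, codec_ext: Optional[str] = None) -> str:
    """Part-file extension as the issue describes it at the time of the report."""
    codec = codec_ext or ""
    if fmt in ("text", "csv", "json"):
        # Only the compression extension is written, if there is one.
        return codec
    if fmt == "parquet":
        # Codec extension first, then the format extension.
        return codec + ".parquet"
    if fmt == "orc":
        # The codec is not reflected in the name.
        return ".orc"
    raise ValueError("unknown format: " + fmt)


def uniform_extension(fmt: str, codec_ext: Optional[str] = None) -> str:
    """Hypothetical uniform rule: [.codec] followed by the format extension.

    This is an assumption for illustration, not the behaviour of any
    Spark release.
    """
    return (codec_ext or "") + FORMAT_EXT[fmt]


if __name__ == "__main__":
    for fmt in FORMAT_EXT:
        print(fmt, repr(described_extension(fmt, ".gz")),
              repr(uniform_extension(fmt, ".gz")))
```

Under the described behaviour an uncompressed JSON part-file gets no extension at all, which is the omission the issue title refers to.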



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
