Re: Inconsistent file extensions and omitting file extensions written by CSV, TEXT and JSON data sources.

Hyukjin Kwon Wed, 09 Mar 2016 02:35:05 -0800

This discussion is moving to the Jira. Please refer to the Jira if you are
interested in this.

On 9 Mar 2016 6:31 p.m., "Sean Owen" <[email protected]> wrote:


> From your JIRA, it seems like you're referring to the "part-*" files.
> These files are effectively an internal representation, and I would
> not expect them to have such an extension. For example, you're not
> really guaranteed that the way the data breaks up leaves each file a
> valid JSON doc.
>
> On Wed, Mar 9, 2016 at 5:49 AM, Hyukjin Kwon <[email protected]> wrote:
> > Hi all,
> >
> > Currently, the output from CSV, TEXT and JSON data sources does not have
> > file extensions such as .csv, .txt and .json (except for compression
> > extensions such as .gz, .deflate and .bz2).
> >
> > In addition, it looks like Parquet output has extensions such as
> > .gz.parquet or .snappy.parquet depending on the compression codec, whereas
> > ORC output has no such codec extension and is just .orc.
> >
> > I searched for JIRAs related to this but could not find any. I did not
> > open a JIRA directly because I feel this may already have been raised.
> >
> > Could I open a JIRA for these inconsistent file extensions?
> >
> > I would be grateful for any feedback.
> >
> > Thanks!
>
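
To illustrate the point quoted above about a "part-*" file not necessarily
being a valid JSON doc, here is a minimal sketch in plain Python. It assumes
the part file holds one JSON object per line with no enclosing array; the file
contents and the helper names are hypothetical and are not taken from Spark.

```python
import json

# Hypothetical contents of a single part file: one JSON object per line,
# with no enclosing array (an assumption made for this illustration).
part_file_text = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'


def is_single_json_document(text):
    """Return True if the whole text parses as exactly one JSON value."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def parse_json_lines(text):
    """Parse each non-empty line as its own JSON value."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# The file as a whole is rejected by a JSON parser...
whole_file_ok = is_single_json_document(part_file_text)
# ...but every individual line is a valid JSON object.
records = parse_json_lines(part_file_text)

print(whole_file_ok)
print(records)
```

Under that assumption, a tool that picks a parser from a ".json" extension and
expects a single document would reject such a file, even though each line
parses on its own.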
