[jira] [Comment Edited] (SPARK-14726) Support for sampling when inferring schema in CSV data source

Hyukjin Kwon (JIRA) Sun, 26 Mar 2017 07:09:58 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-14726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247378#comment-15247378
 ]


Hyukjin Kwon edited comment on SPARK-14726 at 3/26/17 2:09 PM:
---------------------------------------------------------------

This is currently not supported. I will work on this if it is decided to be 
supported. [~rxin]


was (Author: hyukjin.kwon):
This is currently not supported. I can work on this, but I am a bit hesitant 
because I believe the CSV data source was ported mainly for the "small data 
world". That said, I believe there are a lot of users dealing with large CSV 
files. I will work on this if it is decided to be supported. [~rxin]

> Support for sampling when inferring schema in CSV data source
> -------------------------------------------------------------
>
>                 Key: SPARK-14726
>                 URL: https://issues.apache.org/jira/browse/SPARK-14726
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Bomi Kim
>
> Currently, I am using the CSV data source and trying to get used to Spark 2.0 
> because it has a built-in CSV data source.
> I realized that the CSV data source infers the schema from all the data. The 
> JSON data source supports a sampling ratio option.
> It would be great if the CSV data source had this option too (or is this 
> supported already?).
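
To make the request concrete, here is a minimal sketch of what sampling during schema inference means. It is plain Python written for illustration, not Spark's implementation: the names `infer_type`, `merge_types`, `infer_schema`, and the `sampling_ratio` and `seed` parameters are all hypothetical, and the three-type lattice (int, double, string) is a simplifying assumption.

```python
import csv
import io
import random


def infer_type(value):
    """Return the narrowest type name that can hold one CSV field."""
    for caster, name in ((int, "int"), (float, "double")):
        try:
            caster(value)
            return name
        except ValueError:
            pass
    return "string"


# Widening order (an assumption for this sketch): int < double < string.
_ORDER = {"int": 0, "double": 1, "string": 2}


def merge_types(a, b):
    """Keep the wider of two inferred types."""
    return a if _ORDER[a] >= _ORDER[b] else b


def infer_schema(csv_text, sampling_ratio=1.0, seed=0):
    """Infer column types from roughly `sampling_ratio` of the data rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rng = random.Random(seed)
    schema = {}
    for row in reader:
        # Skip rows that fall outside the sample; a ratio of 1.0 reads every row.
        if sampling_ratio < 1.0 and rng.random() >= sampling_ratio:
            continue
        for name, value in zip(header, row):
            t = infer_type(value)
            schema[name] = merge_types(schema[name], t) if name in schema else t
    # Columns never seen in the sample fall back to string.
    return {name: schema.get(name, "string") for name in header}


if __name__ == "__main__":
    data = "id,price,name\n1,2.5,a\n2,3,b\n"
    print(infer_schema(data))                      # full scan
    print(infer_schema(data, sampling_ratio=0.5))  # about half of the rows
```

The trade-off is that a sample can miss the one row that would have widened a column, so a sampled schema may be narrower than the one a full scan would produce.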



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

