[jira] [Comment Edited] (SPARK-17878) Support for multiple null values when reading CSV data

Hyukjin Kwon (JIRA) Tue, 11 Oct 2016 17:50:59 -0700

    [ https://issues.apache.org/jira/browse/SPARK-17878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15567124#comment-15567124 ]


Hyukjin Kwon edited comment on SPARK-17878 at 10/12/16 12:50 AM:
-----------------------------------------------------------------

Oh, I didn't mean I am against this. I was just wondering whether it is
possible to handle this in a more general way. If that is not easy for now,
I'd support this idea, since we should deal with this problem. (Actually, one
of the votes is from me :))


was (Author: hyukjin.kwon):
Oh, I didn't mean I am against this. I am just wondering if it is just possible 
to deal with this in general. If it is not easy for now, I support this idea.

> Support for multiple null values when reading CSV data
> ------------------------------------------------------
>
>                 Key: SPARK-17878
>                 URL: https://issues.apache.org/jira/browse/SPARK-17878
>             Project: Spark
>          Issue Type: Story
>          Components: SQL
>    Affects Versions: 2.0.1
>            Reporter: Hossein Falaki
>
> There are CSV files out there with multiple values that are supposed to be
> interpreted as null. As a result, multiple Spark users have asked for this
> feature to be built into the CSV data source. It can be easily implemented
> in a backwards-compatible way:
> - Currently, the CSV data source supports an option named {{nullValue}}.
> - We can add logic in {{CSVOptions}} to understand option names that match
> {{nullValue[\d]}}. This way a user can specify a query with one or more
> null values.
> {code}
> val df = spark.read.format("CSV")
>   .option("nullValue1", "-")
>   .option("nullValue2", "*")....
> {code}
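
To make the proposal above concrete, here is a minimal sketch of how option names matching the suggested pattern might be collected into a set of null markers. This is not Spark's actual {{CSVOptions}} code: the object name NullValueOptions and the methods nullValues and parseField are hypothetical, and the sketch assumes the plain {{nullValue}} key should keep working alongside the numbered ones. Spark's case-insensitive handling of option keys is left out.

```scala
// Hypothetical helper for illustration only; not part of Spark.
object NullValueOptions {
  // Assumption: accept the existing "nullValue" key as well as keys with a
  // single trailing digit ("nullValue1", "nullValue2", ...), as proposed.
  private val NullValueKey = """nullValue\d?""".r

  /** Collects the values of all options whose names match the pattern. */
  def nullValues(options: Map[String, String]): Set[String] =
    options.collect {
      case (key, value) if NullValueKey.pattern.matcher(key).matches() => value
    }.toSet

  /** Returns None when the raw field equals one of the null markers. */
  def parseField(raw: String, markers: Set[String]): Option[String] =
    if (markers.contains(raw)) None else Some(raw)
}

object NullValueOptionsDemo extends App {
  val opts = Map("nullValue1" -> "-", "nullValue2" -> "*", "header" -> "true")
  val markers = NullValueOptions.nullValues(opts)

  // Fields equal to "-" or "*" are treated as null; others pass through.
  println(Seq("-", "42", "*").map(NullValueOptions.parseField(_, markers)))
}
```

Matching on the key name keeps an existing single {{nullValue}} setting valid, which is the backwards-compatibility property the description asks for.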



