[ https://issues.apache.org/jira/browse/SPARK-17916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649039#comment-15649039 ]
Eric Liang commented on SPARK-17916:
------------------------------------

We're hitting this as a regression from 2.0 as well. Ideally, the empty string should not be treated specially in any scenario. The only thing that should convert a value to null is the nullValue option.

> CSV data source treats empty string as null no matter what nullValue option is
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-17916
>                 URL: https://issues.apache.org/jira/browse/SPARK-17916
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.1
>            Reporter: Hossein Falaki
>
> When a user configures {{nullValue}} in the CSV data source, all empty string
> values are converted to null in addition to the configured value.
> {code}
> data:
> col1,col2
> 1,"-"
> 2,""
> {code}
> {code}
> spark.read.format("csv").option("nullValue", "-")
> {code}
> We will find a null in both rows.
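
To make the difference between the reported and the requested behavior concrete, here is a minimal sketch in plain Python. It is only a model of the null-conversion rule applied to the two sample rows from the issue, not Spark's actual CSV parsing code; the function names `requested_cell` and `reported_cell` are hypothetical and exist only for illustration.

```python
from typing import Optional

# Hypothetical helpers modelling the null-conversion rule; not Spark code.

def requested_cell(raw: str, null_value: str) -> Optional[str]:
    """Requested behavior: only the configured nullValue becomes null."""
    return None if raw == null_value else raw

def reported_cell(raw: str, null_value: str) -> Optional[str]:
    """Reported behavior: the empty string becomes null as well."""
    return None if raw == null_value or raw == "" else raw

# The two data rows from the issue, with nullValue set to "-".
rows = [("1", "-"), ("2", "")]
null_value = "-"

requested = [tuple(requested_cell(c, null_value) for c in row) for row in rows]
reported = [tuple(reported_cell(c, null_value) for c in row) for row in rows]

print(requested)  # [('1', None), ('2', '')]
print(reported)   # [('1', None), ('2', None)]
```

Under the requested rule, the second row keeps its empty string; under the reported rule, col2 is null in both rows, which matches the observation in the issue.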