[ https://issues.apache.org/jira/browse/SPARK-18906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15755604#comment-15755604 ]
Apache Spark commented on SPARK-18906:
--------------------------------------

User 'kubatyszko' has created a pull request for this issue:
https://github.com/apache/spark/pull/16319

> CSV parser should return null for empty (or with "") numeric columns.
> ---------------------------------------------------------------------
>
>                 Key: SPARK-18906
>                 URL: https://issues.apache.org/jira/browse/SPARK-18906
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.1
>            Reporter: Kuba Tyszko
>            Priority: Minor
>
> Spark allows the user to set a nullValue, a value that is translated to
> null when it is read; for example, the string "NA".
> Data sources that use such a nullValue but also have other columns that
> may contain empty values may not be parsed correctly.
> The change resolves that by assuming that when a column is inferred as
> numeric, its field will be set to null when parsing fails, for example
> upon seeing an empty value or an empty string.
> Example:
> ----------------
> |char|int1|int2|
> ----------------
> |a|1|2|
> ----------------
> |a||0|
> ----------------
> |NA|""|""|
> ----------------
> This example illustrates that column "char" may contain a null value
> written as "NA", that column int1 has a "true null" (empty) value in the
> second row, and that both int1 and int2 have an empty string ("") as
> their value in the third row.
> In such a situation parsing will fail.
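The behaviour proposed in the issue can be sketched in plain Python. This is only an illustration under stated assumptions, not Spark's CSV parser and not the code from the pull request: the names parse_int_field, parse_rows, NULL_VALUE and SAMPLE are hypothetical, the "|" delimiter mirrors the example table, and the standard csv module stands in for Spark's reader.

```python
import csv
import io
from typing import List, Optional, Tuple

# Assumed null marker, matching the "NA" example in the issue.
NULL_VALUE = "NA"


def parse_int_field(raw: str, null_value: str = NULL_VALUE) -> Optional[int]:
    """Return an int, or None when the field is the null marker or unparsable."""
    if raw == null_value:
        return None
    try:
        return int(raw)
    except ValueError:
        # An empty field and a quoted empty string ("") both end up here.
        return None


def parse_rows(text: str) -> List[Tuple[Optional[str], Optional[int], Optional[int]]]:
    """Parse the three-column sample layout (char, int1, int2)."""
    reader = csv.reader(io.StringIO(text), delimiter="|")
    next(reader)  # skip the header row
    rows = []
    for char, int1, int2 in reader:
        rows.append((
            None if char == NULL_VALUE else char,
            parse_int_field(int1),
            parse_int_field(int2),
        ))
    return rows


# The three data rows from the issue's example table.
SAMPLE = 'char|int1|int2\na|1|2\na||0\nNA|""|""\n'

if __name__ == "__main__":
    for row in parse_rows(SAMPLE):
        print(row)
```

In this sketch the empty field in the second row and the quoted empty strings in the third row all become None instead of raising an error, which is the outcome the issue asks for in numeric columns.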