[ https://issues.apache.org/jira/browse/SPARK-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maxim Gekk resolved SPARK-15125.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 2.4.0

The issue has been fixed by https://github.com/apache/spark/commit/7a2d4895c75d4c232c377876b61c05a083eab3c8

> CSV data source recognizes empty quoted strings in the input as null.
> ----------------------------------------------------------------------
>
>                 Key: SPARK-15125
>                 URL: https://issues.apache.org/jira/browse/SPARK-15125
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Suresh Thalamati
>            Priority: Major
>             Fix For: 2.4.0
>
>
> The CSV data source does not differentiate between empty quoted strings and
> empty fields; both are read as null. In some scenarios users want to
> differentiate between these values, especially in the context of SQL, where
> NULL and the empty string have different meanings. If the input data is a
> dump from a traditional relational data source, users will see different
> results for their SQL queries.
> {code}
> Repro:
> Test Data: (test.csv)
> year,make,model,comment,price
> 2017,Tesla,Mode 3,looks nice.,35000.99
> 2016,Chevy,Bolt,"",29000.00
> 2015,Porsche,"",,
> scala> val df = sqlContext.read.format("csv").option("header", "true").option("inferSchema", "true").option("nullValue", null).load("/tmp/test.csv")
> df: org.apache.spark.sql.DataFrame = [year: int, make: string ... 3 more fields]
> scala> df.show
> +----+-------+------+-----------+--------+
> |year|   make| model|    comment|   price|
> +----+-------+------+-----------+--------+
> |2017|  Tesla|Mode 3|looks nice.|35000.99|
> |2016|  Chevy|  Bolt|       null| 29000.0|
> |2015|Porsche|  null|       null|    null|
> +----+-------+------+-----------+--------+
> Expected:
> +----+-------+------+-----------+--------+
> |year|   make| model|    comment|   price|
> +----+-------+------+-----------+--------+
> |2017|  Tesla|Mode 3|looks nice.|35000.99|
> |2016|  Chevy|  Bolt|           | 29000.0|
> |2015|Porsche|      |       null|    null|
> +----+-------+------+-----------+--------+
> {code}
> I am testing a fix for this issue and will try to submit a PR soon.
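
The distinction the report asks for can be illustrated without Spark. The following is a minimal sketch in plain Scala, not the code from the fix commit: EmptyVsNull and parseLine are hypothetical names introduced here for illustration. It assumes comma separators and double-quote quoting with "" as the escape for a literal quote. An unquoted empty field maps to None (SQL NULL), while a quoted empty field maps to Some("").

```scala
// Hypothetical helper, not part of Spark: splits one CSV line and maps
// unquoted empty fields to None (NULL) and quoted empty fields to Some("").
object EmptyVsNull {
  def parseLine(line: String): Vector[Option[String]] = {
    val fields = Vector.newBuilder[Option[String]]
    val buf = new StringBuilder
    var inQuotes = false   // currently inside a quoted section
    var wasQuoted = false  // the current field contained a quoted section
    var i = 0

    // Emit the current field and reset the per-field state.
    def flush(): Unit = {
      val s = buf.toString
      fields += (if (s.isEmpty && !wasQuoted) None else Some(s))
      buf.clear()
      wasQuoted = false
    }

    while (i < line.length) {
      val c = line.charAt(i)
      if (inQuotes) {
        if (c == '"') {
          // A doubled quote inside quotes is a literal quote character.
          if (i + 1 < line.length && line.charAt(i + 1) == '"') {
            buf.append('"')
            i += 1
          } else {
            inQuotes = false
          }
        } else {
          buf.append(c)
        }
      } else {
        c match {
          case '"' =>
            inQuotes = true
            wasQuoted = true
          case ',' => flush()
          case _   => buf.append(c)
        }
      }
      i += 1
    }
    flush() // last field on the line
    fields.result()
  }
}

object EmptyVsNullDemo extends App {
  // The two rows from test.csv that trigger the reported behavior.
  println(EmptyVsNull.parseLine("2016,Chevy,Bolt,\"\",29000.00"))
  println(EmptyVsNull.parseLine("2015,Porsche,\"\",,"))
}
```

With this mapping, the comment field of the 2016 row and the model field of the 2015 row stay empty strings, and only the two trailing unquoted fields of the 2015 row become NULL, which matches the "Expected" table in the report.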