[ 
https://issues.apache.org/jira/browse/SPARK-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303690#comment-15303690
 ] 

Takeshi Yamamuro commented on SPARK-15585:
------------------------------------------

We cannot pass `null` as `quote` to the univocity parser because the argument 
type is `char`, so I think `CSVOptions#getChar` cannot return `null`.
On the other hand, spark-csv uses commons CSV, which does accept `null` for `quote` 
(see: 
https://github.com/databricks/spark-csv/blob/master/src/main/scala/com/databricks/spark/csv/CsvRelation.scala#L82).
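
A rough illustration of the type-level difference (not Spark code; it just calls the two libraries directly and assumes the univocity-parsers and commons-csv APIs as used around Spark 2.0 / spark-csv):

{code:scala}
import com.univocity.parsers.csv.CsvParserSettings
import org.apache.commons.csv.CSVFormat

// univocity: the quote is a primitive char, so there is no way to pass null.
val settings = new CsvParserSettings()
settings.getFormat.setQuote('"')
// settings.getFormat.setQuote(null)   // does not compile: type mismatch with char

// commons CSV (used by spark-csv): the quote is a boxed Character,
// and passing null means "quoting disabled".
val format = CSVFormat.DEFAULT.withQuote(null: Character)
{code}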

It seems we can get the same behaviour as spark-csv if we set '\u0000' as the 
quote when `null` is passed:
https://github.com/maropu/spark/compare/master...SPARK-15585
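
A minimal sketch of that idea (not the actual diff in the branch above; `parameters` stands in for the case-insensitive option map held by `CSVOptions`):

{code:scala}
// Map a missing/empty/null `quote` option to '\u0000' so that univocity
// effectively disables quoting, matching spark-csv's behaviour for quote = null.
def getChar(parameters: Map[String, String], paramName: String, default: Char): Char = {
  parameters.get(paramName) match {
    case None                  => default
    case Some(null) | Some("") => '\u0000'   // treat null/empty as "no quote character"
    case Some(value) if value.length == 1 => value.charAt(0)
    case Some(value) =>
      throw new RuntimeException(s"$paramName cannot be more than one character")
  }
}
{code}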

Also, do we need to fix readwriter.py as part of this issue (a default quote 
is explicitly set there)?
AFAIK there is no way for PySpark to pass `null` into `CSVOptions#getChar`.
https://github.com/apache/spark/blob/master/python/pyspark/sql/readwriter.py#L375




> Don't use null in data source options to indicate default value
> ---------------------------------------------------------------
>
>                 Key: SPARK-15585
>                 URL: https://issues.apache.org/jira/browse/SPARK-15585
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Reynold Xin
>            Priority: Critical
>
> See email: 
> http://apache-spark-developers-list.1001551.n3.nabble.com/changed-behavior-for-csv-datasource-and-quoting-in-spark-2-0-0-SNAPSHOT-td17704.html
> We'd need to change DataFrameReader/DataFrameWriter in Python's 
> csv/json/parquet/... functions to put the actual default option values as 
> function parameters, rather than setting them to None. We can then make 
> CSVOptions.getChar (and JSONOptions, etc.) actually return null if the 
> value is null, rather than setting it to the default value.


