[ https://issues.apache.org/jira/browse/SPARK-13266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Davies Liu resolved SPARK-13266.
--------------------------------
    Resolution: Fixed
 Fix Version/s: 2.0.0

Issue resolved by pull request 12494
[https://github.com/apache/spark/pull/12494]

> Python DataFrameReader converts None to "None" instead of null
> --------------------------------------------------------------
>
>                 Key: SPARK-13266
>                 URL: https://issues.apache.org/jira/browse/SPARK-13266
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 1.6.0
>         Environment: Linux standalone but probably applies to all
>            Reporter: mathieu longtin
>              Labels: easyfix, patch
>             Fix For: 2.0.0
>
>
> If you do something like this:
> {code:none}
> tsv_loader = sqlContext.read.format('com.databricks.spark.csv')
> tsv_loader.options(quote=None, escape=None)
> {code}
> The loader sees the string "None" as the _quote_ and _escape_ options. The
> loader should get a _null_.
> An easy fix is to correct the _to_str_ function near the top of
> *python/pyspark/sql/readwriter.py*. Here's the patch:
> {code:none}
> diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py
> index a3d7eca..ba18d13 100644
> --- a/python/pyspark/sql/readwriter.py
> +++ b/python/pyspark/sql/readwriter.py
> @@ -33,10 +33,12 @@ __all__ = ["DataFrameReader", "DataFrameWriter"]
>  def to_str(value):
>      """
> -    A wrapper over str(), but convert bool values to lower case string
> +    A wrapper over str(), but convert bool values to lower case string, and keep None
>      """
>      if isinstance(value, bool):
>          return str(value).lower()
> +    elif value is None:
> +        return value
>      else:
>          return str(value)
> {code}
> This has been tested and works great.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
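The reported bug comes down to one missing branch in _to_str_. The following is a minimal standalone sketch (plain Python, no PySpark needed) that puts the pre-fix behaviour and the patched behaviour from the report side by side and runs both over the options from the report. The names to_str_old and to_str_fixed, and the extra header option, are hypothetical and exist only for this illustration.

```python
def to_str_old(value):
    """Pre-fix behaviour: every non-bool value, including None, goes through str()."""
    if isinstance(value, bool):
        return str(value).lower()
    return str(value)


def to_str_fixed(value):
    """Patched behaviour from the report: bools are lower-cased, None is kept as None."""
    if isinstance(value, bool):
        return str(value).lower()
    elif value is None:
        return value
    else:
        return str(value)


# Options as in the report (quote=None, escape=None), plus a hypothetical bool option
options = {"quote": None, "escape": None, "header": True}

old = {k: to_str_old(v) for k, v in options.items()}
new = {k: to_str_fixed(v) for k, v in options.items()}

print(old)  # {'quote': 'None', 'escape': 'None', 'header': 'true'}
print(new)  # {'quote': None, 'escape': None, 'header': 'true'}
```

With the patched function, None is no longer turned into the four-character string "None". Assuming the value is then handed to the JVM-side reader through Py4J unchanged, it arrives there as Java null, which is what the report asks for.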