Lavan Vivekanandasarma created SPARK-56854:
----------------------------------------------

             Summary: Filter None values in PySpark DataFrame[Stream]Reader/Writer .option(s) for parity with Spark Connect
                 Key: SPARK-56854
                 URL: https://issues.apache.org/jira/browse/SPARK-56854
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 5.0.0
            Reporter: Lavan Vivekanandasarma


  Classic PySpark's DataFrame[Stream]Reader/Writer.option(key, None) and
  .options(**{k: None}) forward None to the JVM as Java null. This diverges
  from the Spark Connect Python client (which has filtered None since
  SPARK-49263) and from OptionUtils._set_opts at
  python/pyspark/sql/readwriter.py:41-53, which already filters None.

  Example: spark.read.options(nullValue=None).schema("a STRING, b STRING").csv(path)
  For a row '"",val', Classic returns [Row(a='', b='val')] while Connect
  returns [Row(a=None, b='val')].
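
  A minimal reproduction sketch for the example above. The helper names
  (write_sample_csv, read_with_none_option) and the file name are
  hypothetical and only for illustration; `spark` is assumed to be an
  existing SparkSession, either Classic or Connect.

```python
import os
import tempfile


def write_sample_csv(directory):
    """Write a one-row CSV whose first field is a quoted empty string."""
    path = os.path.join(directory, "sample.csv")
    with open(path, "w", newline="") as f:
        f.write('"",val\n')
    return path


def read_with_none_option(spark, path):
    """Read the file with nullValue=None, as in the report.

    `spark` is an existing SparkSession supplied by the caller.
    """
    return (
        spark.read.options(nullValue=None)
        .schema("a STRING, b STRING")
        .csv(path)
        .collect()
    )
```

  Calling read_with_none_option once against a Classic session and once
  against a Connect session, on the same file, should surface the
  difference described in the report.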

  Proposal: filter None from the public option and options methods on
  DataFrameReader, DataFrameWriter, DataFrameWriterV2, DataStreamReader,
  and DataStreamWriter, so Classic matches Connect and _set_opts. After
  the change, option(k, None) is a no-op and options(**{k: None}) drops
  None entries before forwarding.
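
  The proposed behavior can be sketched in plain Python as follows. This
  is not the actual patch: SketchReader and _to_str are hypothetical
  stand-ins, and the `forwarded` dict only models what would be passed on
  to the JVM reader.

```python
from typing import Any, Dict, Optional


def _to_str(value: Any) -> str:
    """Hypothetical helper: render an option value as a string (bools lowercased)."""
    if isinstance(value, bool):
        return str(value).lower()
    return str(value)


class SketchReader:
    """Hypothetical stand-in for DataFrameReader; not the real PySpark class."""

    def __init__(self) -> None:
        # Models the options that would be forwarded to the JVM.
        self.forwarded: Dict[str, str] = {}

    def option(self, key: str, value: Optional[Any]) -> "SketchReader":
        # Proposed behavior: a None value makes the call a no-op.
        if value is None:
            return self
        self.forwarded[key] = _to_str(value)
        return self

    def options(self, **options: Optional[Any]) -> "SketchReader":
        # Proposed behavior: None entries are dropped before forwarding.
        for key, value in options.items():
            self.option(key, value)
        return self


reader = SketchReader().option("nullValue", None).options(header=True, sep=None)
print(reader.forwarded)
```

  In this sketch only `header` is forwarded; `nullValue` and `sep` are
  left unset, so the data source defaults would apply, which is what the
  report says the Connect client already does.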



