This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 9f07e4a747b [SPARK-43389][SQL] Added a null check for lineSep option
9f07e4a747b is described below

commit 9f07e4a747b0e2a62b954db3c9be425c924da47a
Author: Gurpreet Singh <gdhu...@gmail.com>
AuthorDate: Thu Jul 13 18:17:45 2023 -0500

    [SPARK-43389][SQL] Added a null check for lineSep option
    
    ### What changes were proposed in this pull request?
    
    - Added a `require` null check for the `lineSep` option in `CSVOptions` and `TextOptions`, so an explicit null value fails fast with a clear error message.
    
    ### Why are the changes needed?
    
    - `spark.read.csv` throws a `NullPointerException` when `lineSep` is set to `None`
    - More details about the issue: https://issues.apache.org/jira/browse/SPARK-43389
    
    ### Does this PR introduce _any_ user-facing change?
    
    ~~Users should now be able to explicitly set `lineSep` as `None` without getting an exception~~
    After some discussion, it was decided to add a `require` check for `null` instead of letting it through.
    
    ### How was this patch tested?
    
    Tested the changes with a Python script that explicitly sets `lineSep` to `None`:
    ```python
    from pyspark.sql import SparkSession
    
    # Create a SparkSession
    spark = SparkSession.builder.appName("HelloWorld").getOrCreate()
    
    # Read CSV into a DataFrame
    df = spark.read.csv("/tmp/hello.csv", header=True, inferSchema=True, lineSep=None)
    
    # Also tested the following case when options are passed before invoking .csv
    # df = spark.read.option("lineSep", None).csv("/Users/gdhuper/Documents/tmp/hello.csv", header=True, inferSchema=True)
    
    # Show the DataFrame
    df.show()
    
    # Stop the SparkSession
    spark.stop()
    ```
    
    Closes #41904 from gdhuper/gdhuper/SPARK-43389.
    
    Authored-by: Gurpreet Singh <gdhu...@gmail.com>
    Signed-off-by: Sean Owen <sro...@gmail.com>
---
 .../src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala    | 1 +
 .../org/apache/spark/sql/execution/datasources/text/TextOptions.scala    | 1 +
 2 files changed, 2 insertions(+)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala
index 2b6b60fdf76..f4ad1f2f2e5 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala
@@ -254,6 +254,7 @@ class CSVOptions(
    * A string between two consecutive JSON records.
    */
   val lineSeparator: Option[String] = parameters.get(LINE_SEP).map { sep =>
+    require(sep != null, "'lineSep' cannot be a null value.")
     require(sep.nonEmpty, "'lineSep' cannot be an empty string.")
     // Intentionally allow it up to 2 for Window's CRLF although multiple
     // characters have an issue with quotes. This is intentionally undocumented.
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala
index f26f05cbe1c..468d58974ed 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala
@@ -45,6 +45,7 @@ class TextOptions(@transient private val parameters: CaseInsensitiveMap[String])
   val encoding: Option[String] = parameters.get(ENCODING)
 
   val lineSeparator: Option[String] = parameters.get(LINE_SEP).map { lineSep =>
+    require(lineSep != null, s"'$LINE_SEP' cannot be a null value.")
     require(lineSep.nonEmpty, s"'$LINE_SEP' cannot be an empty string.")
 
     lineSep
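
Both hunks apply the same pattern: validate the option value inside the `Option.map`, so an explicitly supplied null (or empty) separator fails fast with a clear message instead of surfacing later as a `NullPointerException`. A minimal plain-Python sketch of those semantics, for illustration only (the `parse_line_sep` helper is hypothetical and not part of Spark):

```python
def parse_line_sep(parameters):
    """Sketch of Scala's `parameters.get(LINE_SEP).map { require(...); ... }`:
    absent option -> None, present-but-invalid -> error raised immediately."""
    if "lineSep" not in parameters:
        return None  # option not set at all, like Option.empty in Scala
    sep = parameters["lineSep"]
    # Guard added by this commit: reject an explicit null before any use.
    if sep is None:
        raise ValueError("'lineSep' cannot be a null value.")
    # Pre-existing guard: reject an empty string.
    if sep == "":
        raise ValueError("'lineSep' cannot be an empty string.")
    return sep
```

With this shape, an explicitly supplied null hits the first guard up front rather than failing later inside the parser.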


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
