This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 9f07e4a747b [SPARK-43389][SQL] Added a null check for lineSep option
9f07e4a747b is described below

commit 9f07e4a747b0e2a62b954db3c9be425c924da47a
Author: Gurpreet Singh <gdhu...@gmail.com>
AuthorDate: Thu Jul 13 18:17:45 2023 -0500

    [SPARK-43389][SQL] Added a null check for lineSep option

    ### What changes were proposed in this pull request?

    ### Why are the changes needed?
    - `spark.read.csv` throws `NullPointerException` when `lineSep` is set to `None`
    - More details about the issue here: https://issues.apache.org/jira/browse/SPARK-43389

    ### Does this PR introduce _any_ user-facing change?
    ~~Users should now be able to explicitly set `lineSep` to `None` without getting an exception~~
    After some discussion, it was decided to add a `require` check for `null` instead of letting it through.

    ### How was this patch tested?
    Tested the changes with a Python script that explicitly sets `lineSep` to `None`:
    ```python
    from pyspark.sql import SparkSession

    # Create a SparkSession
    spark = SparkSession.builder.appName("HelloWorld").getOrCreate()

    # Read CSV into a DataFrame
    df = spark.read.csv("/tmp/hello.csv", header=True, inferSchema=True, lineSep=None)

    # Also tested the following case, where options are passed before invoking .csv
    # df = spark.read.option("lineSep", None).csv("/Users/gdhuper/Documents/tmp/hello.csv", header=True, inferSchema=True)

    # Show the DataFrame
    df.show()

    # Stop the SparkSession
    spark.stop()
    ```

    Closes #41904 from gdhuper/gdhuper/SPARK-43389.
Authored-by: Gurpreet Singh <gdhu...@gmail.com>
Signed-off-by: Sean Owen <sro...@gmail.com>
---
 .../src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala   | 1 +
 .../org/apache/spark/sql/execution/datasources/text/TextOptions.scala   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala
index 2b6b60fdf76..f4ad1f2f2e5 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala
@@ -254,6 +254,7 @@ class CSVOptions(
    * A string between two consecutive JSON records.
    */
   val lineSeparator: Option[String] = parameters.get(LINE_SEP).map { sep =>
+    require(sep != null, "'lineSep' cannot be a null value.")
     require(sep.nonEmpty, "'lineSep' cannot be an empty string.")
     // Intentionally allow it up to 2 for Window's CRLF although multiple
     // characters have an issue with quotes. This is intentionally undocumented.
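The guard added in the diff above can be mimicked in plain Python, without a Spark installation: validate the option eagerly when it is present, rejecting both a null (`None`) value and an empty string, and fall back to the default only when the option was never set. This is a minimal sketch of the validation pattern, not Spark's actual implementation; the `parse_line_sep` helper name is hypothetical.

```python
def parse_line_sep(parameters):
    """Mimic the Scala `require` guards on the lineSep option.

    Distinguishes 'option not set' (use default separators) from an
    explicit null or empty value (both rejected with a clear error).
    """
    if "lineSep" not in parameters:
        return None  # option not set: caller falls back to default separators
    sep = parameters["lineSep"]
    if sep is None:
        raise ValueError("'lineSep' cannot be a null value.")
    if sep == "":
        raise ValueError("'lineSep' cannot be an empty string.")
    return sep
```

With this check, an explicit `None` fails fast with a descriptive message instead of surfacing later as a `NullPointerException` deep inside the reader, which mirrors the intent of the `require` calls in `CSVOptions` and `TextOptions`.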
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala
index f26f05cbe1c..468d58974ed 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala
@@ -45,6 +45,7 @@ class TextOptions(@transient private val parameters: CaseInsensitiveMap[String])
   val encoding: Option[String] = parameters.get(ENCODING)

   val lineSeparator: Option[String] = parameters.get(LINE_SEP).map { lineSep =>
+    require(lineSep != null, s"'$LINE_SEP' cannot be a null value.")
     require(lineSep.nonEmpty, s"'$LINE_SEP' cannot be an empty string.")

     lineSep

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org