[ https://issues.apache.org/jira/browse/SPARK-37654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brandon Dahler updated SPARK-37654:
-----------------------------------
    Description: 
h2. Description

A NullPointerException occurs in _org.apache.spark.sql.Row.getSeq(int)_ if the row contains a _null_ value at the requested index.
{code:java}
java.lang.NullPointerException
	at org.apache.spark.sql.Row.getSeq(Row.scala:319)
	at org.apache.spark.sql.Row.getSeq$(Row.scala:319)
	at org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166)
	at org.apache.spark.sql.Row.getList(Row.scala:327)
	at org.apache.spark.sql.Row.getList$(Row.scala:326)
	at org.apache.spark.sql.catalyst.expressions.GenericRow.getList(rows.scala:166)
	...
{code}

Prior to 3.1.1, the code would not throw an exception and instead would return a null _Seq_ instance.

h2. Reproduction
# Start a new spark-shell instance
# Execute the following script:
{code:scala}
import org.apache.spark.sql.Row

Row(Seq("value")).getSeq(0)
Row(Seq()).getSeq(0)
Row(null).getSeq(0)
{code}

h3. Expected Output

res2 outputs a _null_ value.
{code:java}
scala> import org.apache.spark.sql.Row
import org.apache.spark.sql.Row

scala>

scala> Row(Seq("value")).getSeq(0)
res0: Seq[Nothing] = List(value)

scala> Row(Seq()).getSeq(0)
res1: Seq[Nothing] = List()

scala> Row(null).getSeq(0)
res2: Seq[Nothing] = null
{code}

h3. Actual Output

res2 throws a NullPointerException.
{code:java}
scala> import org.apache.spark.sql.Row
import org.apache.spark.sql.Row

scala>

scala> Row(Seq("value")).getSeq(0)
res0: Seq[Nothing] = List(value)

scala> Row(Seq()).getSeq(0)
res1: Seq[Nothing] = List()

scala> Row(null).getSeq(0)
java.lang.NullPointerException
  at org.apache.spark.sql.Row.getSeq(Row.scala:319)
  at org.apache.spark.sql.Row.getSeq$(Row.scala:319)
  at org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166)
  ... 47 elided
{code}

h3. Environments Tested

Tested against the following releases using the provided reproduction steps:
# spark-3.0.3-bin-hadoop2.7 - Succeeded
{code:java}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.3
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)
{code}
# spark-3.1.2-bin-hadoop3.2 - Failed
{code:java}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.2
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)
{code}
# spark-3.2.0-bin-hadoop3.2 - Failed
{code:java}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.2.0
      /_/

Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)
{code}

h2. Regression Source

The regression appears to have been introduced in [25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb|https://github.com/apache/spark/commit/25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb#diff-722324a11a0e4635a59a9debc962da2c1678d86702a9a106fd0d51188f83853bR317], which addressed [SPARK-32526|https://issues.apache.org/jira/browse/SPARK-32526]

h2. Work Around

This regression can be worked around by using _Row.isNullAt(int)_ and handling the null scenario in user code, prior to calling _Row.getSeq(int)_ or _Row.getList(int)_.

> Regression - NullPointerException in Row.getSeq when field null
> ---------------------------------------------------------------
>
>                 Key: SPARK-37654
>                 URL: https://issues.apache.org/jira/browse/SPARK-37654
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.1, 3.1.2, 3.2.0
>            Reporter: Brandon Dahler
>            Priority: Major
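
The workaround described in this issue, checking _Row.isNullAt(int)_ before calling _Row.getSeq(int)_ or _Row.getList(int)_, could be wrapped in small helpers along the following lines. This is only a sketch: the _NullSafeRow_ object and its method names are hypothetical and not part of Spark. It relies only on the existing _Row.isNullAt_, _Row.getSeq_ and _Row.getList_ methods.
{code:scala}
import org.apache.spark.sql.Row

// Hypothetical helpers (not part of Spark) that guard getSeq/getList with isNullAt.
object NullSafeRow {
  // Returns None when the field at index i is null, otherwise the wrapped Seq.
  def getSeqOption[T](row: Row, i: Int): Option[Seq[T]] =
    if (row.isNullAt(i)) None else Some(row.getSeq[T](i))

  // Mirrors the pre-3.1.1 behavior described in this issue: a null field yields a null Seq.
  def getSeqOrNull[T](row: Row, i: Int): Seq[T] =
    if (row.isNullAt(i)) null else row.getSeq[T](i)

  // Same guard for getList, which calls getSeq internally per the stack trace above.
  def getListOrNull[T](row: Row, i: Int): java.util.List[T] =
    if (row.isNullAt(i)) null else row.getList[T](i)
}

// Usage with the rows from the reproduction script:
val present = NullSafeRow.getSeqOption[String](Row(Seq("value")), 0)
val missing = NullSafeRow.getSeqOption[String](Row(null), 0)
{code}
The _Option_ variant makes the null case explicit for Scala callers, while the _OrNull_ variants may be easier to drop into code that depended on the earlier behavior.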