[jira] [Updated] (SPARK-37654) Regression - NullPointerException in Row.getSeq when field null

Brandon Dahler (Jira) Wed, 15 Dec 2021 10:00:09 -0800


     [ 
https://issues.apache.org/jira/browse/SPARK-37654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Brandon Dahler updated SPARK-37654:
-----------------------------------
    Description: 
h2. Description

A NullPointerException occurs in _org.apache.spark.sql.Row.getSeq(int)_ if the 
row contains a _null_ value at the requested index.
{code:java}
java.lang.NullPointerException
        at org.apache.spark.sql.Row.getSeq(Row.scala:319)
        at org.apache.spark.sql.Row.getSeq$(Row.scala:319)
        at org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166)
        at org.apache.spark.sql.Row.getList(Row.scala:327)
        at org.apache.spark.sql.Row.getList$(Row.scala:326)
        at org.apache.spark.sql.catalyst.expressions.GenericRow.getList(rows.scala:166)
        ...
{code}
 

Prior to 3.1.1, the call did not throw an exception; it returned a null _Seq_ 
instance instead.
h2. Reproduction
 # Start a new spark-shell instance
 # Execute the following script:
{code:scala}
import org.apache.spark.sql.Row

Row(Seq("value")).getSeq(0)
Row(Seq()).getSeq(0)
Row(null).getSeq(0)
{code}

h3. Expected Output

res2 outputs a _null_ value.
{code:java}
scala> import org.apache.spark.sql.Row
import org.apache.spark.sql.Row

scala>

scala> Row(Seq("value")).getSeq(0)
res0: Seq[Nothing] = List(value)

scala> Row(Seq()).getSeq(0)
res1: Seq[Nothing] = List()

scala> Row(null).getSeq(0)
res2: Seq[Nothing] = null
{code}
h3. Actual Output

The third statement throws a NullPointerException instead of returning _null_ as res2.
{code:java}
scala> import org.apache.spark.sql.Row
import org.apache.spark.sql.Row

scala>

scala> Row(Seq("value")).getSeq(0)
res0: Seq[Nothing] = List(value)

scala> Row(Seq()).getSeq(0)
res1: Seq[Nothing] = List()

scala> Row(null).getSeq(0)
java.lang.NullPointerException
  at org.apache.spark.sql.Row.getSeq(Row.scala:319)
  at org.apache.spark.sql.Row.getSeq$(Row.scala:319)
  at org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166)
  ... 47 elided
{code}

h3. Environments Tested
Tested against the following releases using the provided reproduction steps:
 # spark-3.0.3-bin-hadoop2.7 - Succeeded
{code:java}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.3
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)
{code}
 # spark-3.1.2-bin-hadoop3.2 - Failed
{code:java}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.2
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)
{code}
 # spark-3.2.0-bin-hadoop3.2 - Failed
{code:java}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.2.0
      /_/

Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)
{code}


h2. Regression Source
The regression appears to have been introduced in 
[25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb|https://github.com/apache/spark/commit/25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb#diff-722324a11a0e4635a59a9debc962da2c1678d86702a9a106fd0d51188f83853bR317],
 which addressed [SPARK-32526|https://issues.apache.org/jira/browse/SPARK-32526].

h2. Workaround
This regression can be worked around by using _Row.isNullAt(int)_ and handling 
the null scenario in user code, prior to calling _Row.getSeq(int)_ or 
_Row.getList(int)_.
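
The null check described above can be sketched as follows. This is only an illustration, not code from Spark: _getSeqOption_ and _getSeqOrNull_ are hypothetical helper names, and the snippet assumes it is run in spark-shell, where Spark is already on the classpath.

```scala
import org.apache.spark.sql.Row

// Hypothetical helper (not part of Spark): None when the field is null,
// otherwise the result of Row.getSeq wrapped in Some.
def getSeqOption[T](row: Row, i: Int): Option[Seq[T]] =
  if (row.isNullAt(i)) None else Some(row.getSeq[T](i))

// Hypothetical helper that mimics the pre-3.1.1 behavior reported above:
// a null field yields a null Seq instead of a NullPointerException.
def getSeqOrNull[T](row: Row, i: Int): Seq[T] =
  if (row.isNullAt(i)) null else row.getSeq[T](i)

// Usage with the rows from the reproduction script.
val present = getSeqOption[String](Row(Seq("value")), 0)
val missing = getSeqOption[String](Row(null), 0)
val legacy  = getSeqOrNull[String](Row(null), 0)
```

The same guard applies before _Row.getList(int)_, since the stack trace above shows getList calling getSeq.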


> Regression - NullPointerException in Row.getSeq when field null
> ---------------------------------------------------------------
>
>                 Key: SPARK-37654
>                 URL: https://issues.apache.org/jira/browse/SPARK-37654
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.1, 3.1.2, 3.2.0
>            Reporter: Brandon Dahler
>            Priority: Major


