GitHub user pierre-borckmans opened a pull request:

    https://github.com/apache/spark/pull/10429

    [SPARK 12477][SQL] Tungsten projection fails for null values in array fields

    Accessing null elements in an array field fails when tungsten is enabled.
    It works in Spark 1.3.1, and in Spark > 1.5 with Tungsten disabled.
    
    This PR solves this by checking if the accessed element in the array field 
is null, in the generated code.
    
    Example:
    ```
    // Array of String
    case class AS( as: Seq[String] )
    val dfAS = sc.parallelize( Seq( AS ( Seq("a",null,"b") ) ) ).toDF
    dfAS.registerTempTable("T_AS")
    for (i <- 0 to 2) { println(i + " = " + sqlContext.sql(s"select as[$i] from 
T_AS").collect.mkString(","))}
    ```
    
    With Tungsten disabled:
    ```
    0 = [a]
    1 = [null]
    2 = [b]
    ```
    
    With Tungsten enabled:
    ```
    15/12/22 09:32:50 ERROR Executor: Exception in task 7.0 in stage 1.0 (TID 
15)
    java.lang.NullPointerException
        at 
org.apache.spark.sql.catalyst.expressions.UnsafeRowWriters$UTF8StringWriter.getSize(UnsafeRowWriters.java:90)
        at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
 Source)
        at 
org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:90)
        at 
org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:88)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    ```


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pierre-borckmans/spark 
SPARK-12477_Tungsten-Projection-Null-Element-In-Array

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10429.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10429
    
----
commit b6a79e7fe73b5a1cabbc39a50fa4e47dd4f2a079
Author: pierre-borckmans <pierre.borckm...@realimpactanalytics.com>
Date:   2015-12-22T08:43:55Z

    CHECK if element in array field is null

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to