[ https://issues.apache.org/jira/browse/SPARK-12477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067942#comment-15067942 ]
Apache Spark commented on SPARK-12477:
--------------------------------------

User 'pierre-borckmans' has created a pull request for this issue:
https://github.com/apache/spark/pull/10429

> [SQL] Tungsten projection fails for null values in array fields
> ---------------------------------------------------------------
>
>                 Key: SPARK-12477
>                 URL: https://issues.apache.org/jira/browse/SPARK-12477
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.2, 1.6.0
>            Reporter: Pierre Borckmans
>
> Accessing null elements in an array field fails when Tungsten is enabled.
> The following code works in Spark 1.3.1, and in Spark > 1.5 with Tungsten disabled:
> {code}
> // Array of String
> case class AS( as: Seq[String] )
> val dfAS = sc.parallelize( Seq( AS ( Seq("a",null,"b") ) ) ).toDF
> dfAS.registerTempTable("T_AS")
> for (i <- 0 to 10) { println(i + " = " + sqlContext.sql(s"select as[$i] from T_AS").collect.mkString(","))}
>
> // Array of Int
> case class AI( ai: Seq[Option[Int]] )
> val dfAI = sc.parallelize( Seq( AI ( Seq(Some(1),None,Some(2) ) ) ) ).toDF
> dfAI.registerTempTable("T_AI")
> for (i <- 0 to 10) { println(i + " = " + sqlContext.sql(s"select ai[$i] from T_AI").collect.mkString(","))}
>
> // Array of struct[Int,String]
> case class B(x: Option[Int], y: String)
> case class A( b: Seq[B] )
> val df1 = sc.parallelize( Seq( A ( Seq( B(Some(1),"a"),B(Some(2),"b"), B(None, "c"), B(Some(4),null), B(None,null), null ) ) ) ).toDF
> df1.registerTempTable("T1")
> val df2 = sc.parallelize( Seq( A ( Seq( B(Some(1),"a"),B(Some(2),"b"), B(None, "c"), B(Some(4),null), B(None,null), null ) ), A(null) ) ).toDF
> df2.registerTempTable("T2")
> for (i <- 0 to 10) { println(i + " = " + sqlContext.sql(s"select b[$i].x, b[$i].y from T1").collect.mkString(","))}
> for (i <- 0 to 10) { println(i + " = " + sqlContext.sql(s"select b[$i].x, b[$i].y from T2").collect.mkString(","))}
>
> // Struct[Int,String]
> case class C(b: B)
> val df3 = sc.parallelize( Seq( C ( B(Some(1),"test") ), C(null) ) ).toDF
> df3.registerTempTable("T3")
> sqlContext.sql("select b.x, b.y from T3").collect
> {code}
> With Tungsten enabled, it fails with a NullPointerException.
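For reference, the results the repro loops should produce can be modelled without Spark. This is a minimal plain-Scala sketch, assuming ordinary SQL semantics for array subscripts: a null array, an index outside the array (the loops run up to 10 on arrays of at most six elements), or a null element all yield null rather than an exception, and a field of a null struct is null as well. The names `NullSafeArrayAccess`, `elementAt` and `fieldX` are hypothetical and not part of Spark's API; `Option` stands in for SQL null.

```scala
object NullSafeArrayAccess {
  // Same shape as the reporter's struct case class; Option[Int] models a nullable int.
  case class B(x: Option[Int], y: String)

  // Hypothetical helper modelling `arr[i]` in a SQL projection: a null array,
  // a negative or out-of-range index, or a null element all give None (SQL null).
  def elementAt[T](arr: Seq[T], i: Int): Option[T] =
    if (arr == null || i < 0 || i >= arr.length) None
    else Option(arr(i)) // Option(null) is None, so null elements map to None

  // Hypothetical helper modelling `b[i].x`: a missing or null struct propagates null.
  def fieldX(bs: Seq[B], i: Int): Option[Int] =
    elementAt(bs, i).flatMap(_.x)

  def main(args: Array[String]): Unit = {
    val as = Seq("a", null, "b")
    for (i <- 0 to 4) println(s"$i = ${elementAt(as, i).orNull}")

    val bs = Seq(B(Some(1), "a"), B(None, "c"), null)
    for (i <- 0 to 3) println(s"$i = ${fieldX(bs, i).getOrElse("null")}")
  }
}
```

For the string array, indices 0 to 4 print `a`, `null`, `b`, `null`, `null`; under that assumption, the Tungsten projection should return null in the same positions instead of throwing.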