[ https://issues.apache.org/jira/browse/SPARK-20299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969996#comment-15969996 ]
Jacek Laskowski commented on SPARK-20299:
-----------------------------------------

It does work for 2.1. It does not for 2.2.0-SNAPSHOT.

Steps to reproduce:

1. Download the nightly build from http://people.apache.org/~pwendell/spark-nightly/spark-master-bin/latest/ (used {{spark-2.2.0-SNAPSHOT-bin-hadoop2.7.tgz}} from 2017-04-15 08:16)

{code}
➜  spark-2.2.0-SNAPSHOT-bin-hadoop2.7 ./bin/spark-submit --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0-SNAPSHOT
      /_/

Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_121
Branch HEAD
Compiled by user jenkins on 2017-04-15T08:05:06Z
Revision fb036c4413c2cd4d90880d080f418ec468d6c0fc
Url https://github.com/apache/spark.git
Type --help for more information.
{code}

2. Execute the following and you'll *surely* see the exception:

{code}
scala> Seq(("1", null.asInstanceOf[Int]), ("2", 1)).toDS
java.lang.RuntimeException: Error while encoding: java.lang.NullPointerException
staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, assertnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1, true) AS _1#0
assertnotnull(assertnotnull(input[0, scala.Tuple2, true]))._2 AS _2#1
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:290)
  at org.apache.spark.sql.SparkSession$$anonfun$2.apply(SparkSession.scala:454)
  at org.apache.spark.sql.SparkSession$$anonfun$2.apply(SparkSession.scala:454)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.immutable.List.map(List.scala:285)
  at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:454)
  at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:377)
  at org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:246)
  ... 48 elided
Caused by: java.lang.NullPointerException
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply_1$(Unknown Source)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:287)
  ... 58 more
{code}

> NullPointerException when null and string are in a tuple while encoding Dataset
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-20299
>                 URL: https://issues.apache.org/jira/browse/SPARK-20299
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Jacek Laskowski
>            Priority: Minor
>
> When creating a Dataset from a tuple with {{null}} and a string, NPE is
> reported. When either is removed, it works fine.
> {code}
> scala> Seq((1, null.asInstanceOf[Int]), (2, 1)).toDS
> res43: org.apache.spark.sql.Dataset[(Int, Int)] = [_1: int, _2: int]
> scala> Seq(("1", null.asInstanceOf[Int]), ("2", 1)).toDS
> java.lang.RuntimeException: Error while encoding: java.lang.NullPointerException
> staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, assertnotnull(assertnotnull(input[0, scala.Tuple2, true], top level Product input object), - root class: "scala.Tuple2")._1, true) AS _1#474
> assertnotnull(assertnotnull(input[0, scala.Tuple2, true], top level Product input object), - root class: "scala.Tuple2")._2 AS _2#475
>   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:290)
>   at org.apache.spark.sql.SparkSession$$anonfun$2.apply(SparkSession.scala:454)
>   at org.apache.spark.sql.SparkSession$$anonfun$2.apply(SparkSession.scala:454)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:454)
>   at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:377)
>   at org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:246)
>   ... 48 elided
> Caused by: java.lang.NullPointerException
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply_1$(Unknown Source)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
>   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:287)
>   ... 58 more
> {code}