[ 
https://issues.apache.org/jira/browse/SPARK-26770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-26770:
------------------------------
      Priority: Minor  (was: Major)
    Issue Type: Improvement  (was: Bug)

I'm not sure; it's a user-code error, and the user does get an informative exception 
about the cause. The place it gets checked is about the right place.
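For anyone who lands here: the null-safe constructor is `Option.apply`, which returns `None` for a null argument, whereas `Some.apply` wraps its argument unconditionally. A minimal illustration (plain Scala, no Spark needed):

{code}
// Some.apply wraps unconditionally, so this smuggles a null into an
// Option the Dataset encoder treats as non-nullable once unwrapped:
val bad: Option[String] = Some(null)    // Some(null)

// Option.apply performs the null check; this is the safe spelling:
val good: Option[String] = Option(null) // None
{code}

So the snippet in the report below behaves as expected with `productName = Option(null)` (or simply `None`).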

> Misleading/unhelpful error message when wrapping a null in an Option
> --------------------------------------------------------------------
>
>                 Key: SPARK-26770
>                 URL: https://issues.apache.org/jira/browse/SPARK-26770
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.2
>            Reporter: sam
>            Priority: Minor
>
> This
> {code}
> // Using options to indicate nullable fields
> case class Product(productID: Option[Int],
>                    productName: Option[String])
>
> val productExtract: Dataset[Product] =
>   spark.createDataset(Seq(
>     Product(
>       productID = Some(6050286),
>       // user mistake here: should be `None`, not `Some(null)`
>       productName = Some(null)
>     )))
>
> productExtract.count()
> {code}
> will give an error like the one below.  This error is thrown from quite deep 
> down, but there should be some handling logic further up to check for nulls 
> and give a more informative error message.  E.g. it could tell the user 
> which field is null, or it could detect the `Some(null)` mistake and suggest 
> using `None` instead.
> Whatever the exception, it shouldn't be an NPE: since this is clearly a user 
> error, it should be some kind of user-error exception.
> {code}
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 9 in stage 1.0 failed 4 times, most recent failure: Lost task 9.3 in 
> stage 1.0 (TID 276, 10.139.64.8, executor 1): java.lang.NullPointerException
>       at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:194)
>       at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.serializefromobject_doConsume_0$(Unknown Source)
>       at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.mapelements_doConsume_0$(Unknown Source)
>       at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
>       at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>       at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:620)
>       at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>       at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>       at org.apache.spark.scheduler.Task.run(Task.scala:112)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:384)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> {code}
> I've seen quite a few other reports of this error, but I don't think they're 
> for the same reason:
> https://docs.databricks.com/spark/latest/data-sources/tips/redshift-npe.html
> https://groups.google.com/a/lists.datastax.com/forum/#!topic/spark-connector-user/Dt6ilC9Dn54
> https://issues.apache.org/jira/browse/SPARK-17195
> https://issues.apache.org/jira/browse/SPARK-18859
> https://github.com/datastax/spark-cassandra-connector/issues/1062
> https://stackoverflow.com/questions/39875711/spark-sql-2-0-nullpointerexception-with-a-valid-postgresql-query



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
