[jira] [Updated] (SPARK-26770) Misleading/unhelpful error message when wrapping a null in an Option

2019-03-01 Thread Sean Owen (JIRA)


 [ https://issues.apache.org/jira/browse/SPARK-26770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-26770:
--
  Priority: Minor  (was: Major)
Issue Type: Improvement  (was: Bug)

I'm not sure; it's a user code error and it does get an informative exception 
about the cause. The place it gets checked is about the right place.
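
For context, the distinction the reporter tripped over is plain Scala semantics, independent of Spark: `Option.apply` converts `null` to `None`, while `Some` wraps whatever it is given, including `null`. A minimal sketch (no Spark required):

```scala
object OptionNullDemo extends App {
  // Option(...) maps null to None; Some(...) wraps the value verbatim.
  val viaOption: Option[String] = Option(null)
  val viaSome: Option[String]   = Some(null)

  assert(viaOption.isEmpty)                        // None
  assert(viaSome.isDefined && viaSome.get == null) // a non-empty Option holding null
}
```

Presumably this is why the failure surfaces in the generated `UnsafeRowWriter` code: Spark sees a defined `Option` and tries to serialize its contents, which turn out to be null.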

> Misleading/unhelpful error message when wrapping a null in an Option
> 
>
> Key: SPARK-26770
> URL: https://issues.apache.org/jira/browse/SPARK-26770
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.2
>Reporter: sam
>Priority: Minor
>
> This code
> {code}
> // Using options to indicate nullable fields
> case class Product(productID: Option[Int],
>productName: Option[String])
> val productExtract: Dataset[Product] =
> spark.createDataset(Seq(
>   Product(
> productID = Some(6050286),
> // user mistake here, should be `None` not `Some(null)`
> productName = Some(null)
>   )))
> productExtract.count()
> {code}
> will give an error like the one below.  This error is thrown from quite deep 
> down, but there should be some handling logic further up that checks for nulls 
> and gives a more informative error message.  For example, it could tell the user 
> which field is null, or it could detect the `Some(null)` mistake and suggest 
> using `None` instead.
> Whatever the exception is, it shouldn't be an NPE: since this is clearly a user 
> error, it should be some kind of user-error exception.
> {code}
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in stage 1.0 failed 4 times, most recent failure: Lost task 9.3 in stage 1.0 (TID 276, 10.139.64.8, executor 1): java.lang.NullPointerException
>   at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:194)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.serializefromobject_doConsume_0$(Unknown Source)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.mapelements_doConsume_0$(Unknown Source)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
>   at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:620)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>   at org.apache.spark.scheduler.Task.run(Task.scala:112)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:384)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
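
A minimal user-side fix, sketched here under the assumption that the nullable value arrives from some upstream source (the `rawName` variable below is hypothetical), is to construct the field with `Option(...)` so that a `null` collapses to `None` before Spark ever sees it:

```scala
// Same case class as in the report above.
case class Product(productID: Option[Int], productName: Option[String])

object ProductFix extends App {
  // Hypothetical upstream value that may be null.
  val rawName: String = null

  val p = Product(
    productID   = Some(6050286),
    // Option(...) turns null into None, so the field is serialized as a
    // proper SQL null instead of tripping an NPE in generated code.
    productName = Option(rawName)
  )

  assert(p.productName.isEmpty)
}
```

With `productName = None`, the `count()` in the reproduction runs without hitting the `UnsafeRowWriter` NPE.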
> I've seen quite a few other reports of this error, though I don't think they 
> all share the same root cause:
> https://docs.databricks.com/spark/latest/data-sources/tips/redshift-npe.html
> https://groups.google.com/a/lists.datastax.com/forum/#!topic/spark-connector-user/Dt6ilC9Dn54
> https://issues.apache.org/jira/browse/SPARK-17195
> https://issues.apache.org/jira/browse/SPARK-18859
> https://github.com/datastax/spark-cassandra-connector/issues/1062
> https://stackoverflow.com/questions/39875711/spark-sql-2-0-nullpointerexception-with-a-valid-postgresql-query



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26770) Misleading/unhelpful error message when wrapping a null in an Option

2019-02-12 Thread Marcelo Vanzin (JIRA)


 [ https://issues.apache.org/jira/browse/SPARK-26770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo Vanzin updated SPARK-26770:
---
Component/s: (was: Spark Core)
 SQL

> Misleading/unhelpful error message when wrapping a null in an Option
> 
>
> Key: SPARK-26770
> URL: https://issues.apache.org/jira/browse/SPARK-26770
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
>Reporter: sam
>Priority: Major