[jira] [Commented] (SPARK-23835) When Dataset.as converts column from nullable to non-nullable type, null Doubles are converted silently to -1

2018-04-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425705#comment-16425705
 ] 

Apache Spark commented on SPARK-23835:
--

User 'mgaido91' has created a pull request for this issue:
https://github.com/apache/spark/pull/20976

> When Dataset.as converts column from nullable to non-nullable type, null 
> Doubles are converted silently to -1
> -
>
> Key: SPARK-23835
> URL: https://issues.apache.org/jira/browse/SPARK-23835
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Joseph K. Bradley
> Priority: Major
>
> I constructed a DataFrame with a nullable java.lang.Double column (and an 
> extra Double column).  I then converted it to a Dataset using ```as[(Double, 
> Double)]```.  When the Dataset is shown, it has a null.  When it is collected 
> and printed, the null is silently converted to a -1.
> Code snippet to reproduce this:
> {code}
> val localSpark = spark
> import localSpark.implicits._
> val df = Seq[(java.lang.Double, Double)](
>   (1.0, 2.0),
>   (3.0, 4.0),
>   (Double.NaN, 5.0),
>   (null, 6.0)
> ).toDF("a", "b")
> df.show()  // OUTPUT 1: has null
> df.printSchema()
> val data = df.as[(Double, Double)]
> data.show()  // OUTPUT 2: has null
> data.collect().foreach(println)  // OUTPUT 3: has -1
> {code}
> OUTPUT 1 and 2:
> {code}
> +----+---+
> |   a|  b|
> +----+---+
> | 1.0|2.0|
> | 3.0|4.0|
> | NaN|5.0|
> |null|6.0|
> +----+---+
> {code}
> OUTPUT 3:
> {code}
> (1.0,2.0)
> (3.0,4.0)
> (NaN,5.0)
> (-1.0,6.0)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23835) When Dataset.as converts column from nullable to non-nullable type, null Doubles are converted silently to -1

2018-04-02 Thread Michael Armbrust (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423046#comment-16423046
 ] 

Michael Armbrust commented on SPARK-23835:
--

I believe the correct semantics are to throw a {{NullPointerException}} if the 
user tries to deserialize a null value into a type that doesn't support nulls. 
See [this test case for an 
example|https://github.com/apache/spark/blob/b2edc30db1dcc6102687d20c158a2700965fdf51/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1428-L1443]. 
We must just be missing the assertion in this case.
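
Until that assertion exists, the fail-fast behavior described above can be approximated on the user side. The following is only a minimal sketch, not the change made in the pull request: {{StrictAs}} and {{asDoublePairOrThrow}} are hypothetical names that are not part of Spark, the local {{SparkSession}} is for illustration only, and the guard costs an extra pass over the data.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper (not a Spark API): refuse to convert to primitive
// Doubles when any column of the input actually contains a null.
object StrictAs {
  def asDoublePairOrThrow(df: DataFrame): Dataset[(Double, Double)] = {
    val spark = df.sparkSession
    import spark.implicits._

    // True for a row if at least one of its columns is null.
    val anyNull = df.columns.map(c => col(c).isNull).reduce(_ || _)

    // head(1) fetches at most one offending row.
    if (df.filter(anyNull).head(1).nonEmpty) {
      throw new NullPointerException(
        "Null value found in a column that is being converted to a primitive type")
    }
    df.as[(Double, Double)]
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-23835-strict-as")
      .getOrCreate()
    import spark.implicits._

    val df = Seq[(java.lang.Double, Double)](
      (1.0, 2.0),
      (null, 6.0)
    ).toDF("a", "b")

    try {
      asDoublePairOrThrow(df).collect().foreach(println)
    } catch {
      case e: NullPointerException => println(s"Rejected: ${e.getMessage}")
    } finally {
      spark.stop()
    }
  }
}
```

With the two-row input in {{main}}, the guard rejects the conversion instead of letting the null reach {{collect()}}.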







[jira] [Commented] (SPARK-23835) When Dataset.as converts column from nullable to non-nullable type, null Doubles are converted silently to -1

2018-04-02 Thread Marco Gaido (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422082#comment-16422082
 ] 

Marco Gaido commented on SPARK-23835:
-

Actually, this is not the first time we have seen this. Previously, we said 
that it was a user error: if the data is a nullable Double, you should convert 
it using {{.as[Option[Double]]}}.

Anyway, enforcing this would mean rejecting the conversion of a nullable value 
to Double/Int/etc. by throwing an exception during analysis, but this can break 
existing users' applications (where nulls may never actually be present). 
Alternatively, we can assert that there is no null when we convert to a 
primitive type, which I think is better than the previous option.
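
For reference, the {{Option}}-based conversion mentioned above can be sketched against the reporter's data as follows. This is an illustrative sketch only: the object name {{NullableAsOption}}, the method {{collectWithOptions}}, and the local {{SparkSession}} are assumptions made for the example, and the input is shortened to two rows.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative sketch: map the nullable column to Option[Double] so that a
// null stays visible as None instead of being turned into a number.
object NullableAsOption {
  def collectWithOptions(spark: SparkSession): Array[(Option[Double], Double)] = {
    import spark.implicits._

    val df = Seq[(java.lang.Double, Double)](
      (1.0, 2.0),
      (null, 6.0)
    ).toDF("a", "b")

    // Column "a" is nullable, so it is read as Option[Double];
    // column "b" comes from a primitive Double and stays a plain Double.
    df.as[(Option[Double], Double)].collect()
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-23835-option")
      .getOrCreate()
    try {
      collectWithOptions(spark).foreach(println)
    } finally {
      spark.stop()
    }
  }
}
```

The null in column {{a}} then comes back as {{None}} rather than {{-1.0}}, at the cost of having to unwrap the {{Option}} wherever the value is used.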







[jira] [Commented] (SPARK-23835) When Dataset.as converts column from nullable to non-nullable type, null Doubles are converted silently to -1

2018-03-31 Thread Liang-Chi Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421346#comment-16421346
 ] 

Liang-Chi Hsieh commented on SPARK-23835:
-

What would the better behavior be?







[jira] [Commented] (SPARK-23835) When Dataset.as converts column from nullable to non-nullable type, null Doubles are converted silently to -1

2018-03-30 Thread Michael Armbrust (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420809#comment-16420809
 ] 

Michael Armbrust commented on SPARK-23835:
--

/cc [~cloud_fan]



