[jira] [Updated] (SPARK-30688) Spark SQL Unix Timestamp produces incorrect result with unix_timestamp UDF
[ https://issues.apache.org/jira/browse/SPARK-30688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-30688:
---------------------------------
    Affects Version/s: 3.0.0
                       2.4.4

> Spark SQL Unix Timestamp produces incorrect result with unix_timestamp UDF
> --------------------------------------------------------------------------
>
>                 Key: SPARK-30688
>                 URL: https://issues.apache.org/jira/browse/SPARK-30688
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.2, 2.4.4, 3.0.0
>            Reporter: Rajkumar Singh
>            Priority: Major
>
> {code:java}
> scala> spark.sql("select unix_timestamp('20201', 'ww')").show();
> +-------------------------+
> |unix_timestamp(20201, ww)|
> +-------------------------+
> |                     null|
> +-------------------------+
>
> scala> spark.sql("select unix_timestamp('20202', 'ww')").show();
> +-------------------------+
> |unix_timestamp(20202, ww)|
> +-------------------------+
> |               1578182400|
> +-------------------------+
> {code}
>
> This seems to happen only for leap years. Digging deeper, it appears that Spark uses java.text.SimpleDateFormat and tries to parse the expression in [org.apache.spark.sql.catalyst.expressions.UnixTime#eval|https://github.com/hortonworks/spark2/blob/49ec35bbb40ec6220282d932c9411773228725be/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala#L652]:
> {code:java}
> formatter.parse(
>   t.asInstanceOf[UTF8String].toString).getTime / 1000L{code}
> When SimpleDateFormat cannot parse the date, it throws an "Unparseable date" exception, but Spark handles it silently and returns NULL.
>
> *Spark-3.0:* I ran some tests where Spark no longer uses the legacy java.text.SimpleDateFormat but the java.time date/time API; the date/time API expects a valid date in a valid format (see org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter#parse).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
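The silent-NULL behavior described in the report can be sketched as a small Java reduction. This is illustrative only, not Spark's actual code: the helper name `unixTimestamp` and its structure are assumptions, but the catch-and-return-NULL pattern mirrors what `UnixTime#eval` does around `formatter.parse(...)`.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class SilentNullSketch {
    // Hypothetical reduction of UnixTime#eval's behavior: parse with
    // SimpleDateFormat, divide millis by 1000, and swallow any
    // "Unparseable date" ParseException by returning null.
    static Long unixTimestamp(String value, String pattern) {
        SimpleDateFormat formatter = new SimpleDateFormat(pattern);
        try {
            return formatter.parse(value).getTime() / 1000L;
        } catch (ParseException e) {
            // SimpleDateFormat threw; Spark handles this silently -> NULL.
            return null;
        }
    }

    public static void main(String[] args) {
        // Clearly unparseable input: exception is swallowed, result is null.
        System.out.println(unixTimestamp("not-a-date", "ww")); // -> null
        // Lenient SimpleDateFormat accepts the five-digit week value and
        // rolls it into some date, so a (surprising) number comes back.
        System.out.println(unixTimestamp("20202", "ww"));
    }
}
```

The point of the sketch is that the caller cannot distinguish "value was NULL" from "pattern failed to parse", which is why the bug surfaces as a silent NULL rather than an error.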
[jira] [Updated] (SPARK-30688) Spark SQL Unix Timestamp produces incorrect result with unix_timestamp UDF
[ https://issues.apache.org/jira/browse/SPARK-30688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-30688:
---------------------------------
    Affects Version/s: (was: 3.0.0)
[jira] [Updated] (SPARK-30688) Spark SQL Unix Timestamp produces incorrect result with unix_timestamp UDF
[ https://issues.apache.org/jira/browse/SPARK-30688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-30688:
---------------------------------
    Affects Version/s: 3.0.0
[jira] [Updated] (SPARK-30688) Spark SQL Unix Timestamp produces incorrect result with unix_timestamp UDF
[ https://issues.apache.org/jira/browse/SPARK-30688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajkumar Singh updated SPARK-30688:
-----------------------------------
    Description: updated to append the following note to the original report:

*Spark-3.0:* I ran some tests where Spark no longer uses the legacy java.text.SimpleDateFormat but the java.time date/time API; the date/time API expects a valid date in a valid format (see org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter#parse).
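The stricter java.time behavior mentioned in the Spark-3.0 note can be illustrated directly with `DateTimeFormatter`. This is a standalone sketch of the API's parsing rules, not Spark's `Iso8601TimestampFormatter` code; the helper `parses` is an assumed name for illustration.

```java
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class StrictParseSketch {
    // Illustrative check: does the java.time API accept this text
    // for this pattern? (Not Spark code.)
    static boolean parses(String text, String pattern) {
        try {
            DateTimeFormatter.ofPattern(pattern).parse(text);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // For java.time, "ww" is a fixed two-digit week field, so the
        // five-digit input leaves unparsed text and is rejected outright,
        // instead of being leniently rolled the way SimpleDateFormat does.
        System.out.println(parses("20202", "ww"));
    }
}
```

This contrast is the crux of the Spark-3.0 note: the legacy formatter leniently accepts malformed week strings (or silently maps failures to NULL), while the java.time API rejects input that does not fully match the pattern.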