[ https://issues.apache.org/jira/browse/SPARK-28515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Költringer updated SPARK-28515:
---------------------------------------
    Description:
I am not sure whether this is a bug, but the behavior was very unexpected, so I'd like some clarification.

When parsing datetime strings, if the date-time in question falls into the range of a "summer time switch" (e.g. in most of Europe, on 2015-03-29 at 2am the clock was moved forward to 3am), the {{to_timestamp}} method returns {{NULL}}.

Minimal example (using Python):

{{>>> df = spark.createDataFrame([('201503290159',), ('201503290200',)], ['date_str'])}}
{{>>> df.withColumn('timestamp', F.to_timestamp('date_str', 'yyyyMMddhhmm')).show()}}
{{+------------+-------------------+}}
{{|    date_str|          timestamp|}}
{{+------------+-------------------+}}
{{|201503290159|2015-03-29 01:59:00|}}
{{|201503290200|               null|}}
{{+------------+-------------------+}}

A solution (or workaround) is to set the time zone for Spark to UTC:

{{spark.conf.set("spark.sql.session.timeZone", "UTC")}}

(see e.g. [https://stackoverflow.com/q/52594762])

Plain Java does not do this; for example, the following works as expected:

{{SimpleDateFormat dateFormat = new SimpleDateFormat("yyyyMMddhhmm");}}
{{Date parsedDate = dateFormat.parse("201503290201");}}
{{Timestamp timestamp = new java.sql.Timestamp(parsedDate.getTime());}}

So, is this really the intended behaviour? Is there documentation about this?

THX.

> to_timestamp returns null for summer time switch dates
> ------------------------------------------------------
>
>                 Key: SPARK-28515
>                 URL: https://issues.apache.org/jira/browse/SPARK-28515
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.3
>         Environment: Spark 2.4.3 on Linux 64bit, openjdk-8-jre-headless
>            Reporter: Andreas Költringer
>            Priority: Major
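To make the "summer time switch" gap from the description concrete, here is a minimal sketch in plain Python (no Spark involved). It only shows that the wall-clock time 02:00 on 2015-03-29 does not occur in a Central European zone. It does not show how Spark's parser behaves internally. The zone name {{Europe/Vienna}} and the helper {{exists_in_zone}} are assumptions made for illustration; the report does not state which zone the session used. Note that the sketch parses with the 24-hour directive {{%H}}, whereas the pattern in the report uses {{hh}}.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library, Python 3.9+


def exists_in_zone(naive: datetime, zone_name: str) -> bool:
    """Return True if the naive wall-clock time occurs in the given zone.

    Hypothetical helper for illustration only.
    """
    zone = ZoneInfo(zone_name)
    local = naive.replace(tzinfo=zone)
    # Round-trip through UTC: a wall-clock time that falls inside a DST gap
    # comes back shifted, so it no longer matches the original input.
    round_trip = local.astimezone(timezone.utc).astimezone(zone)
    return round_trip.replace(tzinfo=None) == naive


if __name__ == "__main__":
    # Assumed zone; any zone that follows the Central European rules would do.
    assumed_zone = "Europe/Vienna"
    for date_str in ("201503290159", "201503290200"):
        wall_clock = datetime.strptime(date_str, "%Y%m%d%H%M")
        for zone_name in (assumed_zone, "UTC"):
            print(date_str, zone_name, exists_in_zone(wall_clock, zone_name))
```

Under these assumptions, {{201503290159}} exists in both zones, while {{201503290200}} exists in UTC but not in the assumed European zone. That matches the observation that switching the session time zone to UTC avoids the {{NULL}}.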