Leandro Rosa created SPARK-27450:
------------------------------------

Summary: Timestamp cast fails when ISO8601 string omits zero minutes or seconds
Key: SPARK-27450
URL: https://issues.apache.org/jira/browse/SPARK-27450
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.3.0
Environment: Spark 2.3.x
Reporter: Leandro Rosa

ISO8601 allows zero minutes, seconds and milliseconds to be omitted.

{quote}
|hh:mm:ss.sss|_or_|hhmmss.sss|
|hh:mm:ss|_or_|hhmmss|
|hh:mm|_or_|hhmm|
| |hh|
{quote}

{quote}Either the seconds, or the minutes and seconds, may be omitted from the basic or extended time formats for greater brevity but decreased accuracy: [hh]:[mm], [hh][mm] and [hh] are the resulting reduced accuracy time formats
{quote}

Source: [Wikipedia ISO8601|https://en.wikipedia.org/wiki/ISO_8601]

Popular libraries, such as [ZonedDateTime|https://docs.oracle.com/javase/8/docs/api/java/time/ZonedDateTime.html], respect that.

However, the Timestamp cast fails silently: when the string omits the seconds and carries a time zone designator, the cast returns null instead of raising an error.

{code:java}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

// Seconds omitted, no time zone offset: the cast succeeds.
val df1 = Seq(("2017-08-01T02:33")).toDF("eventTimeString") // NON-ISO8601 (missing TZ offset) [OK]
val new_df1 = df1
  .withColumn("eventTimeTS", col("eventTimeString").cast(TimestampType))
new_df1.show(false)

+----------------+-------------------+
|eventTimeString |eventTimeTS        |
+----------------+-------------------+
|2017-08-01T02:33|2017-08-01 02:33:00|
+----------------+-------------------+
{code}

{code:java}
// Seconds omitted, UTC designator "Z": the cast returns null.
val df2 = Seq(("2017-08-01T02:33Z")).toDF("eventTimeString") // ISO8601 [FAIL]
val new_df2 = df2
  .withColumn("eventTimeTS", col("eventTimeString").cast(TimestampType))
new_df2.show(false)

+-----------------+-----------+
|eventTimeString  |eventTimeTS|
+-----------------+-----------+
|2017-08-01T02:33Z|null       |
+-----------------+-----------+
{code}

{code:java}
// Seconds omitted, explicit offset "-03:00": the cast returns null.
val df3 = Seq(("2017-08-01T02:33-03:00")).toDF("eventTimeString") // ISO8601 [FAIL]
val new_df3 = df3
  .withColumn("eventTimeTS", col("eventTimeString").cast(TimestampType))
new_df3.show(false)

+----------------------+-----------+
|eventTimeString       |eventTimeTS|
+----------------------+-----------+
|2017-08-01T02:33-03:00|null       |
+----------------------+-----------+
{code}
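
For comparison, here is a minimal sketch of how `java.time` handles the two strings that the cast turns into null. The class name `ReducedPrecisionIso8601` and the helper `toInstant` are hypothetical names made up for this illustration; only `OffsetDateTime.parse` and `Instant` come from the JDK. It covers only the case where seconds are omitted. It does not say anything about how Spark parses timestamps internally, nor about hour-only forms.

```java
import java.time.Instant;
import java.time.OffsetDateTime;

class ReducedPrecisionIso8601 {

    // Hypothetical helper: parse an extended-format ISO 8601 string that
    // carries an offset or "Z". The default parser of OffsetDateTime
    // (ISO_OFFSET_DATE_TIME) treats the seconds field as optional.
    static Instant toInstant(String text) {
        return OffsetDateTime.parse(text).toInstant();
    }

    public static void main(String[] args) {
        // Seconds omitted, UTC designator.
        System.out.println(toInstant("2017-08-01T02:33Z"));      // 2017-08-01T02:33:00Z
        // Seconds omitted, explicit offset; the instant is normalized to UTC.
        System.out.println(toInstant("2017-08-01T02:33-03:00")); // 2017-08-01T05:33:00Z
    }
}
```

Both inputs parse to a well-defined instant here, which is the behaviour one might expect from the Timestamp cast as well.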