Github user michal-databricks commented on the issue:

    https://github.com/apache/spark/pull/21169

@gatorsmile I was trying to understand the underlying issue with `to`/`from_utc_timestamp`; it took more time than I expected. Here is the outcome.

These functions are not standard and exist in Spark, Hive, Impala, and DB2 products originating from Netezza. I did not find direct evidence, but it seems likely that Impala is the originator of the functions. The problem is that Impala implements TIMESTAMP differently than the other systems on the list (Impala's way is the standard SQL way: a zone-less wall-clock value). In Spark and Hive, a TIMESTAMP always represents a specific UTC-bound moment in time, so `from`/`to_utc_timestamp` have to resort to tricks to achieve Impala-like behavior. This specific issue (SPARK-23715) is a result of that mismatch; similar problems existed in Hive (HIVE-12706).

As for the actual change proposed, I don't mind it, but I also don't think it will be very helpful. The previous behavior was in some sense correct (given what the functions actually do, rather than what the user documentation says). Returning null does not indicate how to solve the problem (though at least it is documented in the release notes). Also, if I understand correctly, this solution only works when the input to the function is of string type; the following expressions will still return the 'incorrect' result:

`from_utc_timestamp('1970-01-01 00:00:00+00:00' + interval '5' minute, 'GMT')`
`from_utc_timestamp(cast('1970-01-01 00:00:00+00:00' as timestamp), 'GMT')`
`from_utc_timestamp(timestamp_type_column, 'GMT')`
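To make the "trick" concrete: here is a minimal Python sketch (not Spark's actual implementation) of what `from_utc_timestamp` effectively computes on an instant-based TIMESTAMP. It reinterprets the stored instant as a UTC wall-clock value and re-renders it in the target zone, assuming the zone argument is an IANA zone name:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

def from_utc_timestamp(ts_utc: datetime, tz: str) -> datetime:
    """Sketch of Spark's from_utc_timestamp semantics: treat the naive
    input as a wall-clock time in UTC and return the naive wall-clock
    time in the target zone. Simplified illustration, not Spark code."""
    aware = ts_utc.replace(tzinfo=timezone.utc)
    # Convert to the target zone, then drop the zone again, yielding a
    # "wall clock" value -- the trick described above.
    return aware.astimezone(ZoneInfo(tz)).replace(tzinfo=None)

ts = datetime(1970, 1, 1, 0, 0, 0)
print(from_utc_timestamp(ts, "GMT"))                # 1970-01-01 00:00:00
print(from_utc_timestamp(ts, "America/New_York"))   # 1969-12-31 19:00:00
```

Because the shift happens on the instant itself, the result depends only on the input's type-level interpretation, which is why string inputs can be special-cased while timestamp-typed inputs cannot.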