Github user michal-databricks commented on the issue:

    https://github.com/apache/spark/pull/21169
  
    @gatorsmile I was trying to understand the underlying issue with 
to/from_utc_timestamp. It took more time than I expected; here is the outcome.
    These functions are not standard and seem to exist in Spark, Hive, Impala, 
and Db2 products originating from Netezza. I did not find direct evidence, but 
it seems likely that Impala is the originator of these functions. The problem 
is that Impala implements TIMESTAMP differently than the other systems on the 
list (Impala's way is the standard SQL way). In Spark and Hive, a TIMESTAMP 
always represents a specific UTC-bound moment in time, so from/to_utc_timestamp 
have to resort to tricks to achieve Impala-like behavior. This specific issue 
(SPARK-23715) is a result of that. Similar problems existed in Hive (HIVE-12706).
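    To make the "trick" concrete, here is a minimal Python sketch of what 
from_utc_timestamp effectively computes given Spark/Hive's UTC-instant 
TIMESTAMP representation. This is an illustration of the semantics, not 
Spark's actual implementation; the function name is reused only for clarity.

    ```python
    from datetime import datetime
    from zoneinfo import ZoneInfo  # Python 3.9+

    # Spark/Hive store a TIMESTAMP as a single UTC-bound instant. To mimic
    # Impala's zone-less "wall clock" semantics, from_utc_timestamp shifts
    # the instant by the target zone's UTC offset, producing a *different*
    # instant whose rendered wall clock matches the target zone.
    def from_utc_timestamp(instant_utc: datetime, tz_name: str) -> datetime:
        offset = ZoneInfo(tz_name).utcoffset(instant_utc)
        return instant_utc + offset

    base = datetime(1970, 1, 1, 0, 0, 0)
    print(from_utc_timestamp(base, "GMT"))                  # 1970-01-01 00:00:00
    print(from_utc_timestamp(base, "America/Los_Angeles"))  # 1969-12-31 16:00:00
    ```

    Because the result is itself just another instant, information about the 
original moment in time is lost, which is why these functions are fragile in 
the Spark/Hive model.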
    As for the actual change proposed, I don't mind it, but I also don't think 
it will be very helpful. The previous behavior was, in a sense, correct (given 
what the functions actually do, as opposed to what the user documentation 
says). Returning null does not tell the user how to solve the problem (at 
least I see it is documented in the release notes).
    Also, if I understand correctly, this solution only works for the specific 
case where the input to the function is of string type; the following 
expressions will still return the 'incorrect' result:
    `from_utc_timestamp('1970-01-01 00:00:00+00:00' + interval '5' minute, 
'GMT')`
    `from_utc_timestamp(cast('1970-01-01 00:00:00+00:00' as timestamp), 'GMT')`
    `from_utc_timestamp(timestamp_type_column, 'GMT')`


