[ https://issues.apache.org/jira/browse/HIVE-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan Harris updated HIVE-6040: ------------------------------ Description: unix_timestamp() is the root of string to timestamp conversion operations. Therefore, when local cluster timezone is NOT set to UTC the results produced by to_utc_timestamp() may be confusing to the user. The query below was run on a cluster with the local timezone set to 'America/Denver' For reference, 1386000000 = 2013-12-02 16:00:00 GMT SELECT from_unixtime(1386000000) as test01, -- test01 = 2013-12-02 09:00:00 -- local timezone applies to from_unixtime() unix_timestamp(from_unixtime(1386000000)) as test02, -- test02 = 1386000000 -- local timezone applies to both unix_timestamp and from_unixtime() unix_timestamp('2013-12-02 16:00:00') as test03, -- test03 = 1386025200 -- local timezone applies to from_unixtime() from_utc_timestamp(1386000000, 'UTC') as test04, -- test04 = 1970-01-16 18:00:00 -- This demonstrates the bug from HIVE-3454 cast(from_utc_timestamp(cast(1386000000 as DOUBLE),'UTC') as BIGINT) as test05, -- test05 = 1386000000 -- one work-around to HIVE-3454 cast(from_utc_timestamp(1386000000 * 1.0,'UTC') as BIGINT) as test06, -- test06 = 1386000000 -- a second work-around to HIVE-3454 cast(from_utc_timestamp(cast(cast(1386000000 as DOUBLE) as TIMESTAMP),'UTC') as BIGINT) as test07, --test07 = 1386000000 --direct cast conversion of DOUBLE to TIMESTAMP works cast(cast('2013-12-02 16:00:00' as TIMESTAMP) as BIGINT) as test08, --test08 = 1386025200 -- same result as test03, cast conversion uses local timezone setting cast(from_utc_timestamp(cast('2013-12-02 16:00:00' as TIMESTAMP),'UTC') as BIGINT) as test09, --test09 = 1386025200 --same as test08 demonstrating consistency even when using from_utc_timestamp() cast(from_utc_timestamp(cast('2013-12-02 16:00:00' as TIMESTAMP),'America/Denver') as BIGINT) as test10, --test10 = 1386000000 --not exactly intuitive, but this is the correct result cast(from_utc_timestamp(unix_timestamp('2013-12-02 16:00:00'),'America/Denver') as BIGINT) as test11, --test11= 1360825 --similar to test10, but producing HIVE-3454 bug cast(from_utc_timestamp(unix_timestamp('2013-12-02 16:00:00') * 1.0,'America/Denver') as BIGINT) as test12, --test12=1386000000 --like test10/test11 with HIVE-3454 work-around. cast(to_utc_timestamp('2013-12-02 16:00:00','UTC') as BIGINT) as test13, --test13=1386025200 --intuitively this seems to be the correct approach --the result is wrong. cast(cast('2013-12-02 16:00:00' as TIMESTAMP) as BIGINT) as test14, --test14=1386025200 --this result could be confusing to the user cast( from_utc_timestamp( cast( '2013-12-02 16:00:00' as TIMESTAMP ), '\${system:user.timezone}' ) as BIGINT ) as test15, --test15=1386000000 --like test10, using hiveconf system variable substitution cast( from_utc_timestamp( unix_timestamp( '2013-12-02 16:00:00') * 1.0, '\${system:user.timezone}' ) as BIGINT) as test16, --test16=1386000000 --like test12, using hiveconf system variable substitution cast( to_utc_timestamp( from_utc_timestamp( unix_timestamp( '2013-12-02 16:00:00' ) * 1.0, '\${system:user.timezone}' ), 'UTC') as BIGINT) as test17 --test17=1386000000 --universal work-around, produces results expected by test13 FROM dummy To work with datetime string conversions using to_utc_timestamp in a way that produces consistent results, regardless of cluster timezone configuration, the following work-around is required: to_utc_timestamp(from_utc_timestamp(unix_timestamp('2013-12-02 16:00:00')*1.0,'\${system:user.timezone}'), '<logsource_timezone>') One solution could be to add a utc_timestamp() udf to hive and for to_utc_timestamp() to use that for implicit datetime string conversions. was: unix_timestamp() is the root of string to timestamp conversion operations. Therefore, when local cluster timezone is NOT set to UTC the results produced by to_utc_timestamp() may be confusing to the user. The query below was run on a cluster with the local timezone set to 'America/Denver' For reference, 1386000000 = 2013-12-02 16:00:00 GMT SELECT from_unixtime(1386000000) as test01, -- test01 = 2013-12-02 09:00:00 -- local timezone applies to from_unixtime() unix_timestamp(from_unixtime(1386000000)) as test02, -- test02 = 1386000000 -- local timezone applies to both unix_timestamp and from_unixtime() unix_timestamp('2013-12-02 16:00:00') as test03, -- test03 = 1386025200 -- local timezone applies to from_unixtime() from_utc_timestamp(1386000000, 'UTC') as test04, -- test04 = 1970-01-16 18:00:00 -- This demonstrates the bug from HIVE-3454 cast(from_utc_timestamp(cast(1386000000 as DOUBLE),'UTC') as BIGINT) as test05, -- test05 = 1386000000 -- one work-around to HIVE-3454 cast(from_utc_timestamp(1386000000 * 1.0,'UTC') as BIGINT) as test06, -- test06 = 1386000000 -- a second work-around to HIVE-3454 cast(from_utc_timestamp(cast(cast(1386000000 as DOUBLE) as TIMESTAMP),'UTC') as BIGINT) as test07, --test07 = 1386000000 --direct cast conversion of DOUBLE to TIMESTAMP works cast(cast('2013-12-02 16:00:00' as TIMESTAMP) as BIGINT) as test08, --test08 = 1386025200 -- same result as test03, cast conversion uses local timezone setting cast(from_utc_timestamp(cast('2013-12-02 16:00:00' as TIMESTAMP),'UTC') as BIGINT) as test09, --test09 = 1386025200 --same as test08 demonstrating consistency even when using from_utc_timestamp() cast(from_utc_timestamp(cast('2013-12-02 16:00:00' as TIMESTAMP),'America/Denver') as BIGINT) as test10, --test10 = 1386000000 --not exactly intuitive, but this is the correct result cast(from_utc_timestamp(unix_timestamp('2013-12-02 16:00:00'),'America/Denver') as BIGINT) as test11, --test11= 1360825 --similar to test10, but producing HIVE-3454 bug cast(from_utc_timestamp(unix_timestamp('2013-12-02 16:00:00') * 1.0,'America/Denver') as BIGINT) as test12, --test12=1386000000 --like test10/test11 with HIVE-3454 work-around. cast(to_utc_timestamp('2013-12-02 16:00:00','UTC') as BIGINT) as test13, --test13=1386025200 --intuitively this seems to be the correct approach --the result is wrong. cast(cast('2013-12-02 16:00:00' as TIMESTAMP) as BIGINT) as test14, --test14=1386025200 --this result could be confusing to the user cast(from_utc_timestamp(cast('2013-12-02 16:00:00' as TIMESTAMP),'\${system:user.timezone}') as BIGINT) as test15, --test15=1386000000 --like test10, using hiveconf system variable substitution cast(from_utc_timestamp(unix_timestamp('2013-12-02 16:00:00')*1.0,'\${system:user.timezone}') as BIGINT) as test16, --test16=1386000000 --like test12, using hiveconf system variable substitution cast(to_utc_timestamp(from_utc_timestamp(unix_timestamp('2013-12-02 16:00:00')*1.0,'\${system:user.timezone}'), 'UTC') as BIGINT) as test17 --test17=1386000000 --universal work-around, produces results expected by test13 FROM dummy To work with datetime string conversions using to_utc_timestamp in a way that produces consistent results, regardless of cluster timezone configuration, the following work-around is required: to_utc_timestamp(from_utc_timestamp(unix_timestamp('2013-12-02 16:00:00')*1.0,'\${system:user.timezone}'), '<logsource_timezone>') One solution could be to add a utc_timestamp() udf to hive and for to_utc_timestamp() to use that for implicit datetime string conversions. > to_utc_timestamp() not intuitive when cluster timezone is not set to UTC > ------------------------------------------------------------------------ > > Key: HIVE-6040 > URL: https://issues.apache.org/jira/browse/HIVE-6040 > Project: Hive > Issue Type: Bug > Components: Types, UDF > Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, > 0.12.1 > Reporter: Ryan Harris > Priority: Minor > > unix_timestamp() is the root of string to timestamp conversion operations. > Therefore, when local cluster timezone is NOT set to UTC the results produced > by to_utc_timestamp() may be confusing to the user. > The query below was run on a cluster with the local timezone set to > 'America/Denver' > For reference, 1386000000 = 2013-12-02 16:00:00 GMT > SELECT > from_unixtime(1386000000) as test01, > -- test01 = 2013-12-02 09:00:00 > -- local timezone applies to from_unixtime() > unix_timestamp(from_unixtime(1386000000)) as test02, > -- test02 = 1386000000 > -- local timezone applies to both unix_timestamp and from_unixtime() > unix_timestamp('2013-12-02 16:00:00') as test03, > -- test03 = 1386025200 > -- local timezone applies to from_unixtime() > from_utc_timestamp(1386000000, 'UTC') as test04, > -- test04 = 1970-01-16 18:00:00 > -- This demonstrates the bug from HIVE-3454 > cast(from_utc_timestamp(cast(1386000000 as DOUBLE),'UTC') as BIGINT) as > test05, > -- test05 = 1386000000 > -- one work-around to HIVE-3454 > cast(from_utc_timestamp(1386000000 * 1.0,'UTC') as BIGINT) as test06, > -- test06 = 1386000000 > -- a second work-around to HIVE-3454 > cast(from_utc_timestamp(cast(cast(1386000000 as DOUBLE) as TIMESTAMP),'UTC') > as BIGINT) as test07, > --test07 = 1386000000 > --direct cast conversion of DOUBLE to TIMESTAMP works > cast(cast('2013-12-02 16:00:00' as TIMESTAMP) as BIGINT) as test08, > --test08 = 1386025200 > -- same result as test03, cast conversion uses local timezone setting > cast(from_utc_timestamp(cast('2013-12-02 16:00:00' as TIMESTAMP),'UTC') as > BIGINT) as test09, > --test09 = 1386025200 > --same as test08 demonstrating consistency even when using > from_utc_timestamp() > cast(from_utc_timestamp(cast('2013-12-02 16:00:00' as > TIMESTAMP),'America/Denver') as BIGINT) as test10, > --test10 = 1386000000 > --not exactly intuitive, but this is the correct result > cast(from_utc_timestamp(unix_timestamp('2013-12-02 > 16:00:00'),'America/Denver') as BIGINT) as test11, > --test11= 1360825 > --similar to test10, but producing HIVE-3454 bug > cast(from_utc_timestamp(unix_timestamp('2013-12-02 16:00:00') * > 1.0,'America/Denver') as BIGINT) as test12, > --test12=1386000000 > --like test10/test11 with HIVE-3454 work-around. > cast(to_utc_timestamp('2013-12-02 16:00:00','UTC') as BIGINT) as test13, > --test13=1386025200 > --intuitively this seems to be the correct approach > --the result is wrong. > cast(cast('2013-12-02 16:00:00' as TIMESTAMP) as BIGINT) as test14, > --test14=1386025200 > --this result could be confusing to the user > cast( from_utc_timestamp( cast( '2013-12-02 16:00:00' as TIMESTAMP ), > '\${system:user.timezone}' ) as BIGINT ) as test15, > --test15=1386000000 > --like test10, using hiveconf system variable substitution > cast( from_utc_timestamp( unix_timestamp( '2013-12-02 16:00:00') * 1.0, > '\${system:user.timezone}' ) as BIGINT) as test16, > --test16=1386000000 > --like test12, using hiveconf system variable substitution > cast( to_utc_timestamp( from_utc_timestamp( unix_timestamp( '2013-12-02 > 16:00:00' ) * 1.0, '\${system:user.timezone}' ), 'UTC') as BIGINT) as test17 > --test17=1386000000 > --universal work-around, produces results expected by test13 > FROM dummy > To work with datetime string conversions using to_utc_timestamp in a way that > produces consistent results, regardless of cluster timezone configuration, > the following work-around is required: > to_utc_timestamp(from_utc_timestamp(unix_timestamp('2013-12-02 > 16:00:00')*1.0,'\${system:user.timezone}'), '<logsource_timezone>') > One solution could be to add a utc_timestamp() udf to hive and for > to_utc_timestamp() to use that for implicit datetime string conversions. -- This message was sent by Atlassian JIRA (v6.1.4#6159)