[ https://issues.apache.org/jira/browse/IMPALA-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964196#comment-16964196 ]
Tim Armstrong commented on IMPALA-3933:
---------------------------------------

Dates don't have timezones or any kind of conversions, so yes - everything is a lot simpler.

> Time zone definitions of Hive/Spark and Impala differ for historical dates
> --------------------------------------------------------------------------
>
>                 Key: IMPALA-3933
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3933
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Backend
>    Affects Versions: impala 2.3
>            Reporter: Adriano Simone
>            Priority: Minor
>
> TIMESTAMP skew with convert_legacy_hive_parquet_utc_timestamps=true
> Enabling --convert_legacy_hive_parquet_utc_timestamps=true seems to cause
> data skew (improper conversion) on read for dates earlier than 1900
> (not sure about the exact date).
> The following example was run on a server in the CEST timezone, so the
> time difference is GMT+1 for dates before 1900 (I'm not sure, I haven't
> checked the exact starting date of DST computation), and GMT+2 when summer
> daylight saving time was applied.
> create table itst (col1 int, myts timestamp) stored as parquet;
> From impala:
> {code:java}
> insert into itst values (1,'2016-04-15 12:34:45');
> insert into itst values (2,'1949-04-15 12:34:45');
> insert into itst values (3,'1753-04-15 12:34:45');
> insert into itst values (4,'1752-04-15 12:34:45');
> {code}
> from hive
> {code:java}
> insert into itst values (5,'2016-04-15 12:34:45');
> insert into itst values (6,'1949-04-15 12:34:45');
> insert into itst values (7,'1753-04-15 12:34:45');
> insert into itst values (8,'1752-04-15 12:34:45');
> {code}
> From impala
> {code:java}
> select * from itst order by col1;
> {code}
> Result:
> {code:java}
> Query: select * from itst
> +------+---------------------+
> | col1 | myts                |
> +------+---------------------+
> | 1    | 2016-04-15 12:34:45 |
> | 2    | 1949-04-15 12:34:45 |
> | 3    | 1753-04-15 12:34:45 |
> | 4    | 1752-04-15 12:34:45 |
> | 5    | 2016-04-15 10:34:45 |
> | 6    | 1949-04-15 10:34:45 |
> | 7    | 1753-04-15 11:34:45 |
> | 8    | 1752-04-15 11:34:45 |
> +------+---------------------+
> {code}
> The timestamps are looking good, the DST differences can be seen (hive
> inserted it in local time, but impala shows it in UTC)
> From impala after setting the command line argument
> "--convert_legacy_hive_parquet_utc_timestamps=true"
> {code:java}
> select * from itst order by col1;
> {code}
> The result in this case:
> {code:java}
> Query: select * from itst order by col1
> +------+---------------------+
> | col1 | myts                |
> +------+---------------------+
> | 1    | 2016-04-15 12:34:45 |
> | 2    | 1949-04-15 12:34:45 |
> | 3    | 1753-04-15 12:34:45 |
> | 4    | 1752-04-15 12:34:45 |
> | 5    | 2016-04-15 12:34:45 |
> | 6    | 1949-04-15 12:34:45 |
> | 7    | 1753-04-15 12:51:05 |
> | 8    | 1752-04-15 12:51:05 |
> +------+---------------------+
> {code}
> It seems that instead of 11:34:45 it is showing 12:51:05.
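
The size of the skew can be checked with plain arithmetic on the values reported in the two result tables. The following is a minimal Python sketch, not part of the original report: the helper name `offsets` is made up for illustration, and treating the differences as "write offset" and "read offset" is an assumption about what Hive and Impala did, inferred only from the numbers in the tables.

```python
from datetime import datetime, timedelta

def offsets(written_local, stored, converted):
    """Return (write_offset, read_offset, skew) as timedeltas.

    written_local: value inserted from Hive
    stored:        value Impala shows without the flag (assumed to be UTC)
    converted:     value Impala shows with the flag enabled
    """
    write_offset = written_local - stored  # assumed local -> UTC shift on write
    read_offset = converted - stored       # assumed UTC -> local shift on read
    return write_offset, read_offset, read_offset - write_offset

# Values copied from the result tables in the report.
# Row 5 (2016) and row 7 (1753) were both inserted from Hive as 12:34:45.
row5 = offsets(datetime(2016, 4, 15, 12, 34, 45),
               datetime(2016, 4, 15, 10, 34, 45),
               datetime(2016, 4, 15, 12, 34, 45))
row7 = offsets(datetime(1753, 4, 15, 12, 34, 45),
               datetime(1753, 4, 15, 11, 34, 45),
               datetime(1753, 4, 15, 12, 51, 5))

for name, (w, r, s) in (("row 5", row5), ("row 7", row7)):
    print(f"{name}: write offset {w}, read offset {r}, skew {s}")
# row 5: write offset 2:00:00, read offset 2:00:00, skew 0:00:00
# row 7: write offset 1:00:00, read offset 1:16:20, skew 0:16:20
```

For the 2016 row the two offsets agree, so the round trip is exact. For the 1753 row the value appears to have been written with a whole-hour offset and read back with an offset of 1:16:20, which leaves a 16 minute 20 second skew. That is consistent with the issue title (the two systems using different time zone definitions for historical dates), but the report itself does not confirm which definitions were applied.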