[ 
https://issues.apache.org/jira/browse/IMPALA-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966686#comment-16966686
 ] 

Attila Jeges edited comment on IMPALA-3933 at 11/4/19 2:12 PM:
---------------------------------------------------------------

[~mylogi...@gmail.com] [~grahn]
There are different timestamp/date related interop issues mixed up in the 
comments:

1. Impala 3.1+ & Hive 2.1 and Impala 3.1+ & Hive 3.1
Parquet TIMESTAMPs written by Hive are read back incorrectly by Impala and 
vice-versa.
This affects all TIMESTAMPs if the Hive client node's current timezone is not 
UTC.
DATE values are not affected by this issue.
This issue can be fixed by using the 
"--convert_legacy_hive_parquet_utc_timestamps=true" startup flag.

2. Impala 3.1+ & Hive 2.1
Historical Parquet/Avro TIMESTAMPs/DATEs written by Hive 2.1 are read back 
incorrectly by Impala and vice-versa.
This affects pre-1582-10-15 TIMESTAMPs and DATEs only and happens because 
Impala uses proleptic Gregorian calendar for pre-1582-10-15 dates while Hive 
2.1 uses Julian calendar.
There is no workaround for this issue, but Hive 3.1 switched to proleptic 
Gregorian calendar too, so with Hive 3.1+ this shouldn't be an issue anymore.

3. Impala 3.1+ & Hive 2.1 and Impala 3.1+ & Hive 3.1
This is the issue that is described in this JIRA.
This happens if Impala and Hive use different timezone databases. By default 
Impala uses the timezone database that comes with the OS while Hive uses the 
Java timezone database.
DATEs are not affected by this issue, only TIMESTAMPs.
To work around this issue Impala can be pointed to a custom timezone database. 
The process is documented here: 
https://impala.apache.org/docs/build/html/topics/impala_custom_timezones.html



was (Author: attilaj):
[~mylogi...@gmail.com] [~grahn]
There are different timestamp/date related interop issues mixed up in the 
comments:

1. Impala 3.1+ & Hive 2.1 and Impala 3.1+ & Hive 3.1
Parquet TIMESTAMPs written by Hive are read back incorrectly by Impala and 
vice-versa.
This affects all TIMESTAMPs if the Hive client node's current timezone is not 
UTC.
DATE values are not affected by this issue.
This issue can be fixed by using the 
"--convert_legacy_hive_parquet_utc_timestamps=true" startup flag.

2. Impala 3.1+ & Hive 2.1
Historical Parquet/Avro TIMESTAMPs/DATEs written by Hive 2.1 are read back 
incorrectly by Impala and vice-versa.
This affects pre-1582-10-15 TIMESTAMPs and DATEs only and happens because 
Impala uses proleptic Gregorian calendar for pre-1582-10-15 dates while Hive 
2.1 uses Julian calendar.
There is no workaround for this issue, but Hive 3.1 switched to proleptic 
Gregorian calendar too, so with Hive 3.1+ this shouldn't be an issue anymore.

3. Impala 3.1+ & Hive 2.1 and Impala 3.1+ & Hive 3.1
This is the issue that is described in this JIRA.
This happens if Impala and Hive use different timezone databases. By default 
Impala uses the timezone database that comes with the OS while Hive uses the 
Java timezone database.
To work around this issue Impala can be pointed to a custom timezone database. 
The process is documented here: 
https://impala.apache.org/docs/build/html/topics/impala_custom_timezones.html


> Time zone definitions of Hive/Spark and Impala differ for historical dates
> --------------------------------------------------------------------------
>
>                 Key: IMPALA-3933
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3933
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Backend
>    Affects Versions: impala 2.3
>            Reporter: Adriano Simone
>            Priority: Minor
>
> How the TIMESTAMP skew with convert_legacy_hive_parquet_utc_timestamps=true
> Enabling --convert_legacy_hive_parquet_utc_timestamps=true seems to cause 
> data skew (improper converting) upon the reading for dates earlier than 1900 
> (not sure about the exact date).
> The following example was run on a server which is in CEST timezone, thus the 
> time difference is GMT+1 for dates before 1900 (I'm not sure, I haven't 
> checked the exact starting date of DST computation), and GMT+2 when summer 
> daylight saving time was applied.
> create table itst (col1 int, myts timestamp) stored as parquet;
> From impala:
> {code:java}
> insert into itst values (1,'2016-04-15 12:34:45');
> insert into itst values (2,'1949-04-15 12:34:45');
> insert into itst values (3,'1753-04-15 12:34:45');
> insert into itst values (4,'1752-04-15 12:34:45');
> {code}
> from hive
> {code:java}
> insert into itst values (5,'2016-04-15 12:34:45');
> insert into itst values (6,'1949-04-15 12:34:45');
> insert into itst values (7,'1753-04-15 12:34:45');
> insert into itst values (8,'1752-04-15 12:34:45');
> {code}
> From impala
> {code:java}
> select * from itst order by col1;
> {code}
> Result:
> {code:java}
> Query: select * from itst
> +------+---------------------+
> | col1 | myts                |
> +------+---------------------+
> | 1    | 2016-04-15 12:34:45 |
> | 2    | 1949-04-15 12:34:45 |
> | 3    | 1753-04-15 12:34:45 |
> | 4    | 1752-04-15 12:34:45 |
> | 5    | 2016-04-15 10:34:45 |
> | 6    | 1949-04-15 10:34:45 |
> | 7    | 1753-04-15 11:34:45 |
> | 8    | 1752-04-15 11:34:45 |
> +------+---------------------+
> {code}
> The timestamps are looking good, the DST differences can be seen (hive 
> inserted it in local time, but impala shows it in UTC)
> From impala after setting the command line argument 
> "--convert_legacy_hive_parquet_utc_timestamps=true"
> {code:java}
> select * from itst order by col1;
> {code}
> The result in this case:
> {code:java}
> Query: select * from itst order by col1
> +------+---------------------+
> | col1 | myts                |
> +------+---------------------+
> | 1    | 2016-04-15 12:34:45 |
> | 2    | 1949-04-15 12:34:45 |
> | 3    | 1753-04-15 12:34:45 |
> | 4    | 1752-04-15 12:34:45 |
> | 5    | 2016-04-15 12:34:45 |
> | 6    | 1949-04-15 12:34:45 |
> | 7    | 1753-04-15 12:51:05 |
> | 8    | 1752-04-15 12:51:05 |
> +------+---------------------+
> {code}
> It seems that instead of 11:34:45 it is showing 12:51:05.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to