[ https://issues.apache.org/jira/browse/HIVE-21002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltan Ivanfi updated HIVE-21002:
---------------------------------
    Description: 
Hive 3.1 reads back Avro and Parquet timestamps written by Hive 2.x 
incorrectly. To demonstrate the problem, create a dataset using Hive 2.x 
in the America/Los_Angeles time zone:
{code:sql}
hive> create table ts_‹format› (ts timestamp) stored as ‹format›;
hive> insert into ts_‹format› values ('2018-01-01 00:00:00.000');
{code}
Querying this table by issuing
{code:sql}
hive> select * from ts_‹format›;
{code}
from different time zones using different versions of Hive and different 
storage formats gives the following results:
|‹format›|Writer time zone|Reader time zone|Hive 2.x|Hive 3.1|
|Avro and Parquet|America/Los_Angeles|America/Los_Angeles|2018-01-01 *00*:00:00.0|2018-01-01 *08*:00:00.0|
|Avro and Parquet|America/Los_Angeles|Europe/Paris|2018-01-01 *09*:00:00.0|2018-01-01 *08*:00:00.0|
|Textfile and ORC|America/Los_Angeles|America/Los_Angeles|2018-01-01 00:00:00.0|2018-01-01 00:00:00.0|
|Textfile and ORC|America/Los_Angeles|Europe/Paris|2018-01-01 00:00:00.0|2018-01-01 00:00:00.0|

*Hive 3.1 clearly gives different results than Hive 2.x for timestamps stored 
in Avro and Parquet formats.* Apache ORC behaviour has not changed because ORC 
was modified to adjust timestamps to retain backwards compatibility. Textfile 
behaviour has not changed either, because Textfile processing parses and 
formats timestamps as text instead of serializing and deserializing them, so 
Textfile timestamps inherently had LocalDateTime semantics even in Hive 2.x.
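
The Avro and Parquet rows of the table can be reproduced with a small model. The sketch below is only an illustration consistent with the table, not Hive's actual code: it assumes that the Hive 2.x writer stored the instant normalized to UTC, that the Hive 2.x reader converted it to the reader's time zone, and that the Hive 3.1 reader displays the stored value without any conversion. The function names are hypothetical, and fixed UTC offsets (valid for January 2018 only) stand in for the real time zone rules.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets that hold for January 2018 (no DST handling).
LOS_ANGELES = timezone(timedelta(hours=-8), "America/Los_Angeles")
PARIS = timezone(timedelta(hours=1), "Europe/Paris")

FMT = "%Y-%m-%d %H:%M:%S"


def write_hive2(wall_clock: str, writer_tz: timezone) -> datetime:
    """Hypothetical Hive 2.x writer: read the literal as local time in the
    writer's zone and store the corresponding UTC instant."""
    local = datetime.strptime(wall_clock, FMT).replace(tzinfo=writer_tz)
    return local.astimezone(timezone.utc)


def read_hive2(stored_utc: datetime, reader_tz: timezone) -> str:
    """Hypothetical Hive 2.x reader: convert the stored instant to the
    reader's zone before displaying it."""
    return stored_utc.astimezone(reader_tz).strftime(FMT)


def read_hive31(stored_utc: datetime, reader_tz: timezone) -> str:
    """Hypothetical Hive 3.1 reader: display the stored value as is;
    the reader's zone is ignored."""
    return stored_utc.strftime(FMT)


stored = write_hive2("2018-01-01 00:00:00", LOS_ANGELES)
for name, tz in (("America/Los_Angeles", LOS_ANGELES), ("Europe/Paris", PARIS)):
    print(name, "| Hive 2.x:", read_hive2(stored, tz),
          "| Hive 3.1:", read_hive31(stored, tz))
```

Under these assumptions the Hive 2.x column shows 00:00:00 and 09:00:00 for the two reader zones, while the Hive 3.1 column shows 08:00:00 in both, matching the table.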

  was:
Hive 3.1 reads back Avro and Parquet timestamps written by Hive 2.x 
incorrectly. To demonstrate the problem, create a dataset using Hive 2.x 
in the America/Los_Angeles time zone:
{code:sql}
hive> create table ts_‹format› (ts timestamp) stored as ‹format›;
hive> insert into ts_‹format› values ('2018-01-01 00:00:00.000');
{code}
Querying this table by issuing
{code:sql}
hive> select * from ts_‹format›;
{code}
from different time zones using different versions of Hive and different 
storage formats gives the following results:
|‹format›|Time zone|Hive 2.x|Hive 3.1|
|Avro and Parquet|America/Los_Angeles|2018-01-01 *00*:00:00.0|2018-01-01 *08*:00:00.0|
|Avro and Parquet|Europe/Paris|2018-01-01 *09*:00:00.0|2018-01-01 *08*:00:00.0|
|Textfile and ORC|America/Los_Angeles|2018-01-01 00:00:00.0|2018-01-01 00:00:00.0|
|Textfile and ORC|Europe/Paris|2018-01-01 00:00:00.0|2018-01-01 00:00:00.0|

*Hive 3.1 clearly gives different results than Hive 2.x for timestamps stored 
in Avro and Parquet formats.* Apache ORC behaviour has not changed because ORC 
was modified to adjust timestamps to retain backwards compatibility. Textfile 
behaviour has not changed either, because Textfile processing parses and 
formats timestamps as text instead of serializing and deserializing them, so 
Textfile timestamps inherently had LocalDateTime semantics even in Hive 2.x.


> Backwards incompatible change: Hive 3.1 reads back Avro and Parquet 
> timestamps written by Hive 2.x incorrectly
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21002
>                 URL: https://issues.apache.org/jira/browse/HIVE-21002
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.1.0, 3.1.1
>            Reporter: Zoltan Ivanfi
>            Priority: Major
>



