Stamatis Zampetakis created HIVE-25219:
------------------------------------------
Summary: Backward incompatible timestamp serialization in Avro for certain timezones
Key: HIVE-25219
URL: https://issues.apache.org/jira/browse/HIVE-25219
Project: Hive
Issue Type: Bug
Components: Serializers/Deserializers
Affects Versions: 3.1.0
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis
Fix For: 4.0.0
HIVE-12192 and HIVE-20007 changed the way timestamp computations are performed
and, to some extent, how timestamps are serialized and deserialized in files
(Parquet, Avro).
In versions that include HIVE-12192 or HIVE-20007, the serialization of
timestamps in Avro files is not backward compatible. In other words, writing
timestamps with a version of Hive that includes HIVE-12192/HIVE-20007 and
reading them with a version that does not include these changes may lead to
different results, depending on the default timezone of the system.
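One way to see why the outcome can depend on the default timezone: a Hive
{{timestamp}} carries no zone, so, assuming it is stored as epoch-based
milliseconds (the Avro {{timestamp-millis}} logical type), writing and reading
it both require choosing a UTC offset. The Java sketch below (Java because Hive
is written in it; the class and method names are hypothetical, not Hive code)
uses the standard {{java.time}} API to look up the offset that the tz database
assigns in US/Pacific to the three dates used in the scenario that follows.

{code:java}
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class PacificOffsets {
    // Zone from the reproduction; "US/Pacific" is a tz database alias of America/Los_Angeles.
    static final ZoneId ZONE = ZoneId.of("US/Pacific");

    // Returns the UTC offset the tz rules assign to a wall-clock time in ZONE.
    // In a gap or overlap this returns the offset in effect before the transition;
    // none of the scenario's dates falls in one.
    static ZoneOffset offsetAt(LocalDateTime wallClock) {
        return ZONE.getRules().getOffset(wallClock);
    }

    public static void main(String[] args) {
        // The three birth dates inserted in the scenario.
        String[] births = {"1880-01-01T00:00:00", "1884-01-01T00:00:00", "1990-01-01T00:00:00"};
        for (String birth : births) {
            LocalDateTime wallClock = LocalDateTime.parse(birth);
            System.out.println(wallClock + " -> " + offsetAt(wallClock));
        }
    }
}
{code}

On a JDK whose bundled tz data includes the Local Mean Time entry for Los
Angeles, this should report -07:52:58 for 1880 and -08:00 for 1884 and 1990:
before standard time was adopted in November 1883, the zone's offset is not a
whole number of hours.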
Consider the following scenario where the default system timezone is set to
US/Pacific.
At apache/master commit eedcd82bc2d61861a27205f925ba0ffab9b6bca8
{code:sql}
CREATE EXTERNAL TABLE employee(eid INT, birth TIMESTAMP) STORED AS AVRO
LOCATION '/tmp/hiveexttbl/employee';
INSERT INTO employee VALUES (1, '1880-01-01 00:00:00');
INSERT INTO employee VALUES (2, '1884-01-01 00:00:00');
INSERT INTO employee VALUES (3, '1990-01-01 00:00:00');
SELECT * FROM employee;
{code}
|1|1880-01-01 00:00:00|
|2|1884-01-01 00:00:00|
|3|1990-01-01 00:00:00|
At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356
{code:sql}
CREATE EXTERNAL TABLE employee(eid INT, birth TIMESTAMP) STORED AS AVRO
LOCATION '/tmp/hiveexttbl/employee';
SELECT * FROM employee;
{code}
|1|1879-12-31 23:52:58|
|2|1884-01-01 00:00:00|
|3|1990-01-01 00:00:00|
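The timestamp for {{eid=1}} read in branch-2.3 (1879-12-31 23:52:58) differs by
7 minutes 2 seconds from the value written in master (1880-01-01 00:00:00),
while the rows for {{eid=2}} and {{eid=3}} are read back unchanged.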
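The values above can be reproduced with a simple model in which the writer
resolves the offset with the full US/Pacific rules (Local Mean Time,
-07:52:58, before November 1883) while the reader always applies the standard
-08:00 offset. This is a hypothetical model chosen because it matches the
observed output; it is not necessarily the exact code path of either Hive
version, and the names in the sketch are illustrative.

{code:java}
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class AvroTimestampMismatch {
    // Writer side of the hypothetical model: full tz rules, including LMT before Nov 1883.
    static final ZoneId WRITER_ZONE = ZoneId.of("US/Pacific");
    // Reader side of the hypothetical model: always the standard PST offset.
    static final ZoneOffset READER_OFFSET = ZoneOffset.ofHours(-8);

    // Wall-clock timestamp -> epoch milliseconds (what a timestamp-millis field holds).
    static long write(LocalDateTime wallClock) {
        return wallClock.atZone(WRITER_ZONE).toInstant().toEpochMilli();
    }

    // Epoch milliseconds -> wall-clock timestamp using the fixed reader offset.
    static LocalDateTime read(long epochMillis) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), READER_OFFSET);
    }

    public static void main(String[] args) {
        String[] births = {"1880-01-01T00:00:00", "1884-01-01T00:00:00", "1990-01-01T00:00:00"};
        for (String birth : births) {
            LocalDateTime written = LocalDateTime.parse(birth);
            LocalDateTime readBack = read(write(written));
            System.out.println(written + " -> " + readBack);
        }
    }
}
{code}

Under this model 1880-01-01 00:00:00 comes back as 1879-12-31 23:52:58, while
1884-01-01 and 1990-01-01 round-trip unchanged because both sides agree on
-08:00 for those dates.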
--
This message was sent by Atlassian Jira
(v8.3.4#803005)