[ 
https://issues.apache.org/jira/browse/HIVE-20980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742233#comment-16742233
 ] 

Zoltan Ivanfi commented on HIVE-20980:
--------------------------------------

[~jcamachorodriguez] The addition of session-local time zones was orthogonal to 
the semantics change and it seemed to make sense to restore the timezone-aware 
semantics based on the session-local time zone rather than the server time 
zone. That being said, I do not have a strong preference towards either one, so 
if you prefer one over the other, we are fine with your choice.

There is an isAdjustedToUTC parameter in parquet-format indeed, which will be 
made available in the upcoming parquet-mr 1.11.0 release. It is also one of the 
reasons why I would prefer the TIMESTAMP and TIMESTAMP WITHOUT TIME ZONE types 
to behave differently for Parquet. The isAdjustedToUTC parameter annotates int64 
timestamps, whereas until now we have written int96 timestamps. Writing int64 
timestamps is a breaking change in itself, so it should only be done at the 
user's explicit request. However, a configuration switch would not suffice for 
this purpose: if even a single table had to keep writing backwards-compatible 
int96 timestamps, the switch would have to stay off, which would prevent every 
other table from using the new int64 timestamps as well.

At the same time, introducing new semantics for timestamps breaks the existing 
rule that an int96 written by Impala has LocalDateTime semantics, while an int96 
written by Hive or Spark has Instant semantics. To prevent further confusion, 
the new semantics should never be written into int96 timestamps, only int64 
ones, because the latter allow the semantics to be recorded in the 
isAdjustedToUTC type parameter.
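
To make the difference between the two semantics concrete, here is a minimal 
Java sketch. It is not Hive or parquet-mr code: the class TimestampSemantics 
and the method readInt64 are hypothetical names, and the stored value is 
assumed to be an int64 count of milliseconds since the epoch. It only shows how 
a reader could use an isAdjustedToUTC-style flag to decide whether the stored 
value is an Instant (shifted into the reader's time zone) or a LocalDateTime 
(shown as written, regardless of time zone).

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical illustration only; not part of Hive or parquet-mr.
public class TimestampSemantics {

    // Interprets a stored int64 value (assumed: milliseconds since the epoch).
    // isAdjustedToUtc == true  -> Instant semantics: the value is UTC-normalized
    //                             and is rendered in the reader's time zone.
    // isAdjustedToUtc == false -> LocalDateTime semantics: the value is a
    //                             wall-clock time and the reader zone is ignored.
    static LocalDateTime readInt64(long storedMillis, boolean isAdjustedToUtc,
                                   ZoneId readerZone) {
        Instant instant = Instant.ofEpochMilli(storedMillis);
        ZoneId zone = isAdjustedToUtc ? readerZone : ZoneOffset.UTC;
        return LocalDateTime.ofInstant(instant, zone);
    }

    public static void main(String[] args) {
        // The same stored value: 2019-01-14 12:00 expressed as UTC millis.
        long stored = LocalDateTime.of(2019, 1, 14, 12, 0)
                .toInstant(ZoneOffset.UTC).toEpochMilli();
        ZoneId budapest = ZoneId.of("Europe/Budapest");
        ZoneId losAngeles = ZoneId.of("America/Los_Angeles");

        // Instant semantics: readers in different zones see different wall clocks.
        System.out.println(readInt64(stored, true, budapest));
        System.out.println(readInt64(stored, true, losAngeles));

        // LocalDateTime semantics: every reader sees the same wall clock.
        System.out.println(readInt64(stored, false, budapest));
        System.out.println(readInt64(stored, false, losAngeles));
    }
}
```

An int96 value carries no such flag, so a reader has to guess the semantics 
from the writer, which is the ambiguity described above.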

Having the old TIMESTAMP type behave in the legacy way and writing only int64 
timestamps for the new TIMESTAMP WITH LOCAL TIME ZONE type resolves both of 
these problems neatly. (Please see 
[this appendix|https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit#heading=h.gonr2yqv3e77]
of the proposal for details.) It is true that TIMESTAMP will again behave 
differently across file formats, but that inconsistency has historically been 
a part of Hive, and fixing it would be a breaking change.

> Reinstate Parquet timestamp conversion between HS2 time zone and UTC
> --------------------------------------------------------------------
>
>                 Key: HIVE-20980
>                 URL: https://issues.apache.org/jira/browse/HIVE-20980
>             Project: Hive
>          Issue Type: Sub-task
>          Components: File Formats
>            Reporter: Karen Coppage
>            Assignee: Karen Coppage
>            Priority: Major
>         Attachments: HIVE-20980.1.patch, HIVE-20980.2.patch, 
> HIVE-20980.2.patch
>
>
> With HIVE-20007, Parquet timestamps became timezone-agnostic. This means that 
> timestamps written after the change are read exactly as they were written; 
> but timestamps stored before this change are effectively converted from the 
> writing HS2 server time zone to GMT time zone. This patch reinstates the 
> original behavior: timestamps are converted to UTC before write and from UTC 
> before read.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
