[ 
https://issues.apache.org/jira/browse/HIVE-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15971840#comment-15971840
 ] 

Ashutosh Chauhan commented on HIVE-16418:
-----------------------------------------

We need to think about the storage type for Timestamp at different stages of 
query processing:

* On-disk format: whether to store the TZ or not. The primary concern is 
fidelity to the original data; the secondary concern is storage efficiency.
* In-memory format: the representation on which computations are performed. 
As I see it, our current choice of Timestamp here is inappropriate. The issue 
is that java.sql.Timestamp (which implicitly assumes the local time zone) 
doesn't correspond to either SQL Timestamp (which is essentially zoneless) or 
Timestamp with Time Zone (which has a zone, but java.sql.Timestamp doesn't 
allow you to set one). As I suggested, the in-memory representation (i.e. the 
one on which all computations are performed) should either directly use 
java.time's LocalDateTime and ZonedDateTime or model its behavior on them.
* Serialization format: used to transfer timestamps between different 
vertices. Here the primary concern is performance, which comes if the TZ is 
stored separately.
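
To illustrate the in-memory concern, here is a minimal sketch, assuming the 
java.time classes LocalDateTime and ZonedDateTime are the intended models. The 
class name TimestampModels and its helper methods are hypothetical and not part 
of Hive. It shows that the same text parsed into a java.sql.Timestamp yields a 
different instant depending on the JVM default zone, while LocalDateTime stays 
zoneless and ZonedDateTime carries an explicit zone.

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.TimeZone;

// Hypothetical illustration class; not part of Hive.
class TimestampModels {

    // Zoneless SQL TIMESTAMP: a wall-clock reading with no instant attached.
    static LocalDateTime zoneless(String isoText) {
        return LocalDateTime.parse(isoText);
    }

    // TIMESTAMP WITH TIME ZONE: the wall-clock reading pinned to an explicit zone.
    static ZonedDateTime zoned(LocalDateTime wallClock, String zoneId) {
        return wallClock.atZone(ZoneId.of(zoneId));
    }

    // java.sql.Timestamp: the text is interpreted in the JVM default zone,
    // so the resulting epoch millis depend on a hidden global setting.
    static long epochMillisUnderDefaultZone(String text, String defaultZone) {
        TimeZone saved = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone(defaultZone));
            return Timestamp.valueOf(text).getTime();
        } finally {
            TimeZone.setDefault(saved); // restore the original default zone
        }
    }

    public static void main(String[] args) {
        String text = "2017-04-17 12:00:00";
        long utc = epochMillisUnderDefaultZone(text, "UTC");
        long shanghai = epochMillisUnderDefaultZone(text, "Asia/Shanghai");
        // Same text, different instants: noon in Shanghai is 8 hours before noon UTC.
        System.out.println((utc - shanghai) / 3_600_000L + " hours apart");

        LocalDateTime wallClock = zoneless("2017-04-17T12:00:00");
        ZonedDateTime pinned = zoned(wallClock, "Asia/Shanghai");
        System.out.println(wallClock);          // no zone information at all
        System.out.println(pinned.toInstant()); // instant fixed by the explicit zone
    }
}
```

With java.sql.Timestamp the zone is neither absent nor settable on the value; 
it leaks in from the environment, which is why the two java.time types map more 
directly onto the two SQL types.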

In light of the above, I am OK with your proposal of using choice #2, but I 
think you still need to think about the in-memory format, because, apart from 
to_utc_timestamp and related UDFs, implementing the new type (Timestamp with 
Time Zone) with java.sql.Timestamp will be error-prone.

> Allow HiveKey to skip some bytes for comparison
> -----------------------------------------------
>
>                 Key: HIVE-16418
>                 URL: https://issues.apache.org/jira/browse/HIVE-16418
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-16418.1.patch
>
>
> The feature is required when we have to serialize some fields and prevent 
> them from being used in comparison, e.g. HIVE-14412.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
