[ 
https://issues.apache.org/jira/browse/HIVE-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967464#comment-15967464
 ] 

Rui Li commented on HIVE-16418:
-------------------------------

[~gopalv] - thanks for the review.
My plan is to only allow GMT timezone format, which means '2005-04-03 10:01:00 
Asia/Shanghai' will be converted to '2005-04-03 10:01:00 GMT+08:00' internally. 
Per Jason's 
[comment|https://issues.apache.org/jira/browse/HIVE-14412?focusedCommentId=15527345&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15527345],
 the timezone part shouldn't be used for comparison. Therefore, '2005-04-03 
10:01:00 GMT+08:00' == '2005-04-03 02:01:00 GMT'. And if you run a 
count(distinct) on these two timestamps, the result should be 1.
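
To make the intended semantics concrete, here is a minimal sketch using plain java.time rather than any Hive class. The class name TimestampTzSketch and its helpers are hypothetical and only illustrate "compare by instant, ignore the zone"; this is not the code in the patch.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration class, not part of Hive.
class TimestampTzSketch {

    // Builds a zoned timestamp from an ISO local date-time and a zone id.
    static ZonedDateTime of(String localIso, String zoneId) {
        return ZonedDateTime.of(LocalDateTime.parse(localIso), ZoneId.of(zoneId));
    }

    // Counts distinct values by instant only, so the zone part is ignored.
    static int countDistinctInstants(ZonedDateTime... values) {
        Set<Instant> seen = new HashSet<>();
        for (ZonedDateTime v : values) {
            seen.add(v.toInstant());
        }
        return seen.size();
    }

    public static void main(String[] args) {
        ZonedDateTime shanghai = of("2005-04-03T10:01:00", "Asia/Shanghai");
        ZonedDateTime gmtPlus8 = of("2005-04-03T10:01:00", "GMT+08:00");
        ZonedDateTime gmt = of("2005-04-03T02:01:00", "GMT");

        // The region-based zone and the fixed GMT offset denote the same instant.
        System.out.println(shanghai.isEqual(gmtPlus8));
        // Both timestamps collapse to a single distinct instant.
        System.out.println(countDistinctInstants(gmtPlus8, gmt));
    }
}
```

Note that ZonedDateTime.equals() also compares the zone, which is why the sketch goes through toInstant() and isEqual() instead.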

I agree this may cause some confusion in queries with distinct/group by like you 
mentioned. [~jdere], [~xuefuz] could you please share how this should be 
handled according to the SQL standard?

This patch could have been included in HIVE-14412, but I'd like to get some 
early feedback and suggestions. The basic idea is to store all the 
non-comparable bytes at the beginning of HiveKey. A boolean is added to HiveKey 
to indicate whether such bytes exist, and these bytes will be skipped 
accordingly in comparison. In the serialized format, the boolean will be encoded 
using the MSB of the length part. Does this make sense?
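
As a rough sketch of the idea (not the actual HiveKey code in the patch), the following assumes a 4-byte length header whose MSB carries the flag. How the number of skipped bytes is recorded is not described above, so the one-byte count prefix used here is purely my assumption for illustration, as are the class and method names.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration class, not Hive's HiveKey.
class SkippableKeySketch {
    static final int HEADER_BYTES = 4;
    static final int FLAG_MASK = 0x80000000; // MSB of the length header

    // Header: MSB = "has non-comparable bytes", low 31 bits = payload length.
    static byte[] serialize(byte[] payload, boolean hasSkippedBytes) {
        int header = payload.length | (hasSkippedBytes ? FLAG_MASK : 0);
        return ByteBuffer.allocate(HEADER_BYTES + payload.length)
                .putInt(header)
                .put(payload)
                .array();
    }

    static boolean hasSkippedBytes(byte[] serialized) {
        return (ByteBuffer.wrap(serialized).getInt() & FLAG_MASK) != 0;
    }

    static int payloadLength(byte[] serialized) {
        return ByteBuffer.wrap(serialized).getInt() & ~FLAG_MASK;
    }

    // Offset of the first comparable byte in the serialized form.
    static int comparableStart(byte[] serialized) {
        if (!hasSkippedBytes(serialized)) {
            return HEADER_BYTES;
        }
        // Assumption: the first payload byte is the count of skipped bytes.
        int skipCount = serialized[HEADER_BYTES] & 0xFF;
        return HEADER_BYTES + 1 + skipCount;
    }

    // Unsigned lexicographic comparison over the comparable bytes only.
    static int compare(byte[] a, byte[] b) {
        int aStart = comparableStart(a);
        int bStart = comparableStart(b);
        int aLen = HEADER_BYTES + payloadLength(a) - aStart;
        int bLen = HEADER_BYTES + payloadLength(b) - bStart;
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a[aStart + i] & 0xFF;
            int y = b[bStart + i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        return aLen - bLen;
    }

    public static void main(String[] args) {
        // Payload layout: [skip count = 2][2 non-comparable bytes][comparable bytes].
        byte[] k1 = serialize(new byte[] {2, 8, 0, 1, 2, 3}, true);
        byte[] k2 = serialize(new byte[] {2, 0, 0, 1, 2, 3}, true);
        // The keys differ only in the skipped prefix, so they compare as equal.
        System.out.println(compare(k1, k2) == 0);
    }
}
```

Keeping the flag in the MSB means keys without non-comparable bytes serialize exactly as before, at the cost of halving the maximum representable length.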

> Allow HiveKey to skip some bytes for comparison
> -----------------------------------------------
>
>                 Key: HIVE-16418
>                 URL: https://issues.apache.org/jira/browse/HIVE-16418
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-16418.1.patch
>
>
> The feature is required when we have to serialize some fields and prevent 
> them from being used in comparison, e.g. HIVE-14412.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
