[ 
https://issues.apache.org/jira/browse/ORC-135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838699#comment-15838699
 ] 

Owen O'Malley commented on ORC-135:
-----------------------------------

I'd propose extending the file format like:

{code:title=orc_proto.proto}
message TimestampStatistics {
  // min,max values saved as milliseconds since epoch
  optional sint64 minimum = 1;
  optional sint64 maximum = 2;
  optional sint64 minimumUtc = 3;
  optional sint64 maximumUtc = 4;
}
{code}

where minimumUtc and maximumUtc are defined as <time in utc> - <epoch in utc> 
in milliseconds. New writers would stop setting minimum and maximum and set 
only minimumUtc and maximumUtc. Old readers will not see the new fields, and 
new readers will ignore the old ones.
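
To make the mismatch concrete, here is a rough sketch in Python (not ORC 
code) using the value from the issue description. The fixed UTC-4 offset 
standing in for US/Eastern on that date, and the helpers millis_since_epoch 
and may_contain, are assumptions made up for illustration. Under one reading 
of the definition above, minimumUtc would be the value computed against UTC, 
which does not depend on the writer's timezone.

```python
from datetime import datetime, timedelta, timezone

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Assumption: US/Eastern was on daylight time (UTC-4) on 2007-08-01.
EASTERN_DST = timezone(timedelta(hours=-4))


def millis_since_epoch(wall_clock: datetime, tz: timezone) -> int:
    """Interpret a naive wall-clock value in tz; return ms since the UTC epoch."""
    return (wall_clock.replace(tzinfo=tz) - EPOCH_UTC) // timedelta(milliseconds=1)


def may_contain(minimum: int, maximum: int, value: int) -> bool:
    """Hypothetical min/max check of the kind a PPD evaluator performs."""
    return minimum <= value <= maximum


value = datetime(2007, 8, 1, 0, 0, 0)

# Statistics as written in the writer's zone vs. the predicate literal as
# interpreted in the reader's zone (UTC).
writer_millis = millis_since_epoch(value, EASTERN_DST)
reader_millis = millis_since_epoch(value, timezone.utc)

# Comparing the two directly skips the data even though every row matches.
skipped = not may_contain(writer_millis, writer_millis, reader_millis)

# If both sides use the wall-clock value taken as UTC, the check succeeds.
utc_stat = millis_since_epoch(value, timezone.utc)
kept = may_contain(utc_stat, utc_stat, reader_millis)
```

The two interpretations differ by exactly the writer's UTC offset (four hours 
here), which is enough to push the predicate value outside [min, max].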

> PPD for timestamp is wrong when reader and writer timezones are different
> -------------------------------------------------------------------------
>
>                 Key: ORC-135
>                 URL: https://issues.apache.org/jira/browse/ORC-135
>             Project: Orc
>          Issue Type: Bug
>    Affects Versions: 1.0.0, 1.1.0, 1.2.0, 1.3.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>            Priority: Critical
>
> When reader and writer timezones are different, PPD evaluation does not 
> offset the timezone when reading the min and max values. This can result in 
> wrong PPD evaluation and hence incorrect results.
> Example:
> Table written in US/Eastern timezone. All values in this table are 
> "2007-08-01 00:00:00.0".
> {code:title=PPD disabled}
> hive> set hive.optimize.index.filter=false;
> hive> select ORDER_DATE from ORDER_FACT_small where ORDER_DATE='2007-08-01 
> 00:00:00.0' limit 1;
> 2007-08-01 00:00:00.0
> OK
> {code}
> {code:title=PPD enabled}
> set hive.optimize.index.filter=true;
> select ORDER_DATE from ORDER_FACT_small where ORDER_DATE='2007-08-01 
> 00:00:00.0' limit 1;
> OK
> {code}
> No rows are returned when PPD is enabled (reader timezone is UTC)



