[ https://issues.apache.org/jira/browse/IMPALA-7568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Csaba Ringhofer resolved IMPALA-7568.
-------------------------------------
       Resolution: Implemented
    Fix Version/s: Impala 3.2.0

> Implement timezone aware parquet stat filtering for timestamp columns
> ---------------------------------------------------------------------
>
>                 Key: IMPALA-7568
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7568
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Csaba Ringhofer
>            Assignee: Csaba Ringhofer
>            Priority: Major
>              Labels: parquet, timestamp
>             Fix For: Impala 3.2.0
>
>
> Parquet timestamp columns can contain UTC normalized data, which means that
> the data is stored in UTC but is expected to be shown in local time (to
> be consistent with Hive). This is done by converting these timestamps from UTC
> to local time during scanning.
> This conversion has to be considered during min/max stat filtering, otherwise
> some row groups can be incorrectly skipped. For this reason IMPALA-7559
> disables stat filtering on UTC normalized timestamp columns.
> This ticket deals with creating a correct implementation so that stat
> filtering can be re-enabled for these columns.
> DST and historical rule changes add some complexity to this. The UTC->local
> mapping can be non-monotonic, and the local->UTC mapping can be ambiguous.
> Because the mapping is non-monotonic, tMin <= t <= tMax holding in UTC does
> not imply that the same holds in local time.
> The solution I see is to convert the min/max of the predicate from local to UTC
> and resolve ambiguity by choosing the earlier time in case of min, and the
> later time in case of max. These UTC values can be compared with stats safely.
> Note that the timezone rules can be different in Hive and Impala (especially
> historical ones), so we cannot ensure that Impala gives exactly the same
> results as Hive. The goal is to ensure that Impala returns the same rows with
> and without stat filtering.
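The approach proposed in the description (convert the predicate bounds from local time to UTC, taking the earlier candidate for the min and the later candidate for the max) can be illustrated with a small Python sketch. This is not Impala's backend code. The function names `utc_candidates`, `predicate_range_to_utc` and `can_skip_row_group` are hypothetical, and the sketch assumes Python 3.9+ with the standard `zoneinfo` module and IANA time zone data available on the system. It relies on the `fold` attribute to enumerate both possible UTC offsets for a wall-clock time that falls in a DST overlap or gap.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def utc_candidates(local_naive, tz):
    """Return the sorted naive UTC times a local wall-clock time may map to.

    fold=0 and fold=1 select the offsets on either side of a transition,
    so an ambiguous or nonexistent local time yields two candidates and
    an ordinary local time yields one.
    """
    offsets = {
        local_naive.replace(tzinfo=tz, fold=fold).utcoffset() for fold in (0, 1)
    }
    return sorted(local_naive - offset for offset in offsets)


def predicate_range_to_utc(local_min, local_max, tz):
    """Convert a local-time predicate range to a conservative UTC range.

    The earlier candidate is used for the lower bound and the later
    candidate for the upper bound, so the UTC range is never too narrow.
    """
    return utc_candidates(local_min, tz)[0], utc_candidates(local_max, tz)[-1]


def can_skip_row_group(stat_min_utc, stat_max_utc, pred_min_utc, pred_max_utc):
    """A row group can be skipped only if its UTC stats miss the UTC range."""
    return stat_max_utc < pred_min_utc or stat_min_utc > pred_max_utc


if __name__ == "__main__":
    tz = ZoneInfo("America/New_York")
    # 01:30 on 2018-11-04 occurs twice in New York (DST ends that night).
    t = datetime(2018, 11, 4, 1, 30)
    lo, hi = predicate_range_to_utc(t, t, tz)
    print(lo, hi)  # 2018-11-04 05:30:00 2018-11-04 06:30:00
```

Because the bounds are only ever widened, a row group may be scanned unnecessarily around a transition, but it is not skipped when it could contain matching rows. The sketch considers only the two offsets that `fold` exposes, so it does not claim to cover every historical rule change.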