Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/11431 )

Change subject: IMPALA-7559: Disable stat filtering for UTC-normalized 
timestamp columns
......................................................................

IMPALA-7559: Disable stat filtering for UTC-normalized timestamp columns

If convert_legacy_hive_parquet_utc_timestamps=true and the Parquet
file was written by parquet-mr (also used by Hive), then timestamps
are converted from UTC to local time during scanning. Stat filtering
did not handle this case correctly and compared UTC min/max values
from stats with local min/max values from predicates. This could
lead to row groups being skipped incorrectly.
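
The mismatch described above can be illustrated with a minimal sketch.
This is not Impala's scanner code: the function names, the fixed -8 hour
offset, and the sample timestamp are all assumptions made for
illustration only.

```python
from datetime import datetime, timedelta

# Hypothetical fixed offset standing in for the local time zone.
# Real zones also have DST and historical rule changes.
LOCAL_OFFSET = timedelta(hours=-8)


def utc_to_local(ts_utc):
    """Mimic the scan-time conversion of a stored UTC value to local time."""
    return ts_utc + LOCAL_OFFSET


def can_skip_row_group(stat_min, stat_max, predicate_value):
    """Naive min/max filtering for a predicate `col = predicate_value`."""
    return predicate_value < stat_min or predicate_value > stat_max


def can_skip_with_fix(stat_min, stat_max, predicate_value, utc_normalized):
    """Same check, but stats are ignored when values are UTC-normalized."""
    if utc_normalized:
        return False  # never skip: stats and predicate use different zones
    return can_skip_row_group(stat_min, stat_max, predicate_value)


# Column chunk where every value is the same, so min == max in the stats.
stored_utc = datetime(2018, 9, 12, 6, 0, 0)
stat_min = stat_max = stored_utc

# The scanner returns the local value, and the predicate matches it exactly.
predicate = utc_to_local(stored_utc)

# Comparing the local predicate with UTC stats wrongly skips the row group.
wrongly_skipped = can_skip_row_group(stat_min, stat_max, predicate)
skipped_after_fix = can_skip_with_fix(stat_min, stat_max, predicate, True)
```

In this sketch wrongly_skipped is True even though the row group holds
a matching row, while skipped_after_fix is False.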

Note that parquet-mr only writes stats if min and max are equal,
because it cannot order timestamps correctly, so the only case
affected here is when every value is the same in the column chunk.

It would be possible to implement stat filtering correctly, but
this is non-trivial because of DST and historical timezone rule
changes.
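
One reason a correct implementation is harder than converting the two
stat endpoints can be sketched as follows. The conversion function is a
hand-written stand-in, assuming a single DST end at 2018-11-04 09:00 UTC
with offsets of -7 and -8 hours (modeled on US Pacific time). It is not a
real time zone library and not Impala code.

```python
from datetime import datetime, timedelta

# Assumed instant (in UTC) at which local clocks fall back by one hour.
DST_END_UTC = datetime(2018, 11, 4, 9, 0, 0)


def utc_to_local(ts_utc):
    """Hypothetical conversion: UTC-7 before the DST end, UTC-8 after it."""
    if ts_utc < DST_END_UTC:
        return ts_utc + timedelta(hours=-7)
    return ts_utc + timedelta(hours=-8)


# Two UTC values on either side of the fall-back, ordered min < max.
utc_min = datetime(2018, 11, 4, 8, 30)
utc_max = datetime(2018, 11, 4, 9, 15)

local_of_min = utc_to_local(utc_min)  # 01:30 local
local_of_max = utc_to_local(utc_max)  # 01:15 local

# Conversion is not monotonic here, so the converted endpoints are
# not valid bounds for the converted values.
order_preserved = local_of_min <= local_of_max
```

Here order_preserved is False: the converted minimum is later than the
converted maximum, so the endpoints cannot be used directly as local
min/max values.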

Testing:
- added a Hive-generated Parquet file and a custom cluster test
  that reproduce this issue

Change-Id: Id4c02230993f2390c03d513f08bae2e9d3d538fa
Reviewed-on: http://gerrit.cloudera.org:8080/11431
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/parquet-column-readers.cc
M testdata/data/README
A testdata/data/hive_single_value_timestamp.parq
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
6 files changed, 74 insertions(+), 12 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/11431
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id4c02230993f2390c03d513f08bae2e9d3d538fa
Gerrit-Change-Number: 11431
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
