Hello Qifan Chen, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17678

to look at the new patch set (#5).

Change subject: IMPALA-7087, IMPALA-8131: Read decimals from Parquet files with 
different precision/scale
......................................................................

IMPALA-7087, IMPALA-8131: Read decimals from Parquet files with different 
precision/scale

IMPALA-7087 is about reading Parquet decimal columns with lower
precision/scale than table metadata.
IMPALA-8131 is about reading Parquet decimal columns with higher
scale than table metadata.

Both are resolved by this patch. It reuses some parts from an
earlier change request from Sahil Takiar:
https://gerrit.cloudera.org/#/c/12163/

A new utility class, ParquetDataConverter, has been introduced. It
performs the data conversion and also helps decide whether conversion
is needed at all.
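
As a rough illustration of the "is conversion needed" decision, here is a
minimal Python sketch. It assumes the simplest possible rule: conversion is
needed whenever the decimal type in the file differs from the type in the
table metadata. The names DecimalType and needs_conversion are hypothetical
and are not taken from the patch; the real check in ParquetDataConverter may
consider more than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecimalType:
    """Hypothetical stand-in for a DECIMAL(precision, scale) type."""
    precision: int
    scale: int


def needs_conversion(file_type: DecimalType, table_type: DecimalType) -> bool:
    # Assumption of this sketch: any mismatch in precision or scale between
    # the Parquet file and the table metadata requires a conversion step.
    return file_type != table_type
```

With this rule, a file written as DECIMAL(9, 2) and read through a table
declared as DECIMAL(12, 4) would take the conversion path, while matching
types would be read as before.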

NULL values are returned in case of overflows. This behavior is
consistent with Hive.
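
The conversion and the NULL-on-overflow rule can be sketched roughly as
follows. This is a minimal Python illustration on unscaled integer values,
not the code from the patch. The function name convert_decimal is
hypothetical, None stands in for SQL NULL, and truncation toward zero when
the scale is reduced is an assumption of this sketch that the commit message
does not confirm.

```python
from typing import Optional


def convert_decimal(unscaled: int, file_scale: int,
                    table_precision: int, table_scale: int) -> Optional[int]:
    """Rescale an unscaled decimal from the file's scale to the table's.

    Returns the new unscaled value, or None (standing in for SQL NULL)
    when the result does not fit into table_precision digits.
    """
    if table_scale >= file_scale:
        # Scale widening: append zeros to the fractional part.
        result = unscaled * 10 ** (table_scale - file_scale)
    else:
        # Scale narrowing: drop fractional digits. Truncation toward zero
        # is an assumption of this sketch.
        divisor = 10 ** (file_scale - table_scale)
        result = abs(unscaled) // divisor
        if unscaled < 0:
            result = -result
    # Overflow check: the value must fit into table_precision digits.
    if abs(result) >= 10 ** table_precision:
        return None
    return result
```

For example, the value 123.45 stored with scale 2 (unscaled 12345) becomes
unscaled 1234500 when read as DECIMAL(10, 4), but yields None when read as
DECIMAL(5, 4), because 123.4500 needs seven digits.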

The Parquet column stats reader is also updated to convert decimal
values. The stats reader is used to evaluate min/max conjuncts. This
is safe because the conjuncts are evaluated again later on the
converted values anyway.

Status of the different filtering mechanisms:
 * dictionary filtering: disabled for columns that need conversion
 * runtime bloom filters: work on the converted values
 * runtime min/max filters: work on the converted values

It's OK for the filters to work on the converted values because we'd
use those values later anyway.

This patch also enables schema evolution of decimal columns of Iceberg
tables.

Testing:
 * added e2e tests

Change-Id: Icefa7e545ca9f7df1741a2d1225375ecf54434da
---
M be/src/exec/parquet/parquet-column-readers.cc
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.h
A be/src/exec/parquet/parquet-data-converter.h
M be/src/exec/parquet/parquet-metadata-utils.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M testdata/data/README
A testdata/data/binary_decimal_precision_and_scale_widening.parquet
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
A testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test
A testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-widening.test
M tests/query_test/test_scanners.py
16 files changed, 717 insertions(+), 155 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/17678/5
--
To view, visit http://gerrit.cloudera.org:8080/17678
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icefa7e545ca9f7df1741a2d1225375ecf54434da
Gerrit-Change-Number: 17678
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
