Hello Tim Armstrong, Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11000

to look at the new patch set (#4).

Change subject: IMPALA-5542: Impala cannot scan Parquet decimal stored as int64_t/int32_t
......................................................................

IMPALA-5542: Impala cannot scan Parquet decimal stored as int64_t/int32_t

The Decimal type in Parquet is a logical type. That means
the Parquet file stores values of some physical/primitive
type, and a DECIMAL annotation tells readers to interpret
those values as decimals.

The allowed physical types for decimals are INT32, INT64,
FIXED (FIXED_LEN_BYTE_ARRAY), and BINARY (BYTE_ARRAY).
Before this commit, Impala could only read decimals stored
as FIXED or BINARY.

Spark writes decimals as INT32 or INT64 when their
precision allows it:
  (1 <= precision <= 9)   ==> INT32
  (10 <= precision <= 18) ==> INT64
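
The mapping above can be sketched as a small function. This is only an
illustration, not code from Impala or Spark: the enum and function names
are hypothetical, and the fallback to a fixed-length byte array for
precisions above 18 is an assumption of the sketch. The thresholds follow
from the integer ranges: the largest 9-digit value (999,999,999) fits in
a signed 32-bit integer, and the largest 18-digit value fits in a signed
64-bit integer.

```cpp
// Hypothetical names for illustration; not Impala's or Spark's actual code.
enum class PhysicalType { INT32, INT64, FIXED_LEN_BYTE_ARRAY };

// Picks the narrowest physical type that can hold every unscaled value
// of a decimal with the given precision (assumed to be >= 1).
PhysicalType PhysicalTypeForPrecision(int precision) {
  if (precision <= 9) return PhysicalType::INT32;   // 10^9 - 1 < 2^31 - 1
  if (precision <= 18) return PhysicalType::INT64;  // 10^18 - 1 < 2^63 - 1
  // Assumption of this sketch: wider decimals use a byte-array encoding.
  return PhysicalType::FIXED_LEN_BYTE_ARRAY;
}
```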

This change updates our column readers to accept INT32
and INT64 as valid physical types for decimals.
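
For readers unfamiliar with the encoding: an INT32 or INT64 decimal
column stores the unscaled integer value, and the scale from the column
metadata says how many of its digits fall after the decimal point. The
helper below is a hypothetical sketch of that interpretation (it is not
the reader code in this patch); it renders a stored integer as decimal
text.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, not Impala code: renders an unscaled integer
// (as stored in an INT32/INT64 decimal column) using the given scale.
std::string FormatUnscaledDecimal(int64_t unscaled, int scale) {
  const bool negative = unscaled < 0;
  // Take the magnitude in unsigned arithmetic so INT64_MIN cannot overflow.
  const uint64_t magnitude = negative ? 0 - static_cast<uint64_t>(unscaled)
                                      : static_cast<uint64_t>(unscaled);
  std::string digits = std::to_string(magnitude);
  if (scale > 0) {
    const std::size_t frac = static_cast<std::size_t>(scale);
    // Left-pad with zeros so at least one digit precedes the point.
    if (digits.size() <= frac) {
      digits = std::string(frac - digits.size() + 1, '0') + digits;
    }
    digits.insert(digits.size() - frac, ".");
  }
  return (negative ? "-" : "") + digits;
}
```

For example, a stored value of 12345 in a column with scale 2 represents
123.45, and a stored value of -5 with scale 3 represents -0.005.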

Testing:
* extended parquet-plain-test.cc
* added Parquet files generated by Spark 2.3.1
  and updated test_scanners.py

Change-Id: Ib8c41bfc7c1664bdba5099d3893dc8dbe4304794
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-common.h
M be/src/exec/parquet-metadata-utils.cc
M be/src/exec/parquet-plain-test.cc
M testdata/data/README
A testdata/data/decimal_stored_as_int32.parquet
A testdata/data/decimal_stored_as_int64.parquet
M testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-formats.test
M tests/query_test/test_scanners.py
9 files changed, 109 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/11000/4
--
To view, visit http://gerrit.cloudera.org:8080/11000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib8c41bfc7c1664bdba5099d3893dc8dbe4304794
Gerrit-Change-Number: 11000
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
