Hello Gabor Kaszab, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11984

to look at the new patch set (#5).

Change subject: IMPALA-7853: Add support to read int64 NANO timestamps from Parquet
......................................................................

IMPALA-7853: Add support to read int64 NANO timestamps from Parquet

PARQUET-1387 added an int64 timestamp type with nanosecond precision
that stores timestamps as nanoseconds since the Unix epoch.
As 64 bits are not enough to represent the whole 1400..9999 range
of Impala timestamps, this new type works with a limited range:
1677-09-21 00:12:43.145224192 .. 2262-04-11 23:47:16.854775807 UTC
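
The two limits above follow directly from the int64 range. A minimal
Python sketch of the arithmetic (the helper name nanos_to_utc_string is
illustrative and not part of this patch):

```python
from datetime import datetime, timedelta

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
EPOCH = datetime(1970, 1, 1)  # naive datetime, interpreted as UTC


def nanos_to_utc_string(nanos: int) -> str:
    """Format nanoseconds since the Unix epoch with full nanosecond precision."""
    # Floor division keeps the sub-second part non-negative for pre-epoch values.
    seconds, nanos_of_second = divmod(nanos, 1_000_000_000)
    dt = EPOCH + timedelta(seconds=seconds)
    return f"{dt.isoformat(sep=' ')}.{nanos_of_second:09d}"


print(nanos_to_utc_string(INT64_MIN))  # 1677-09-21 00:12:43.145224192
print(nanos_to_utc_string(INT64_MAX))  # 2262-04-11 23:47:16.854775807
```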

The benefit of the reduced range is that no validation is necessary
during scanning, as every possible 64 bit value represents a valid
timestamp in Impala. This gives it the potential to be
the fastest way to store timestamps in Impala + Parquet.
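
A rough illustration of why the reduced range makes validation
unnecessary, assuming the 1400..9999 range mentioned above means
1400-01-01 00:00:00 through 9999-12-31 23:59:59.999999. The names are
hypothetical and this is not the scanner code:

```python
from datetime import datetime, timedelta

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
EPOCH = datetime(1970, 1, 1)
ONE_MICRO = timedelta(microseconds=1)

# Assumed Impala bounds, expressed as microseconds since the Unix epoch.
MIN_MICROS = (datetime(1400, 1, 1) - EPOCH) // ONE_MICRO
MAX_MICROS = (datetime(9999, 12, 31, 23, 59, 59, 999999) - EPOCH) // ONE_MICRO


def micros_in_impala_range(micros: int) -> bool:
    """Return True if a microsecond offset lies inside the assumed bounds."""
    return MIN_MICROS <= micros <= MAX_MICROS


# An int64 MICRO value can fall outside the range, so it must be checked.
print(micros_in_impala_range(INT64_MAX))  # False

# Both extremes of int64 NANO, reduced to microseconds, stay inside the range,
# so every value in between does as well.
print(micros_in_impala_range(INT64_MIN // 1000))  # True
print(micros_in_impala_range(INT64_MAX // 1000))  # True
```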

Another way NANO differs from MICRO and MILLI is that NANO can
only be described with the new logical types in Parquet; it has no
converted type equivalent. This made implementing CREATE TABLE
LIKE PARQUET less trivial than it was for MICRO/MILLI: the type
conversion logic in ParquetHelper.java had to be rewritten to
use LogicalTypeAnnotation instead of ConvertedType.
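
To illustrate the gap: the Parquet format defines converted types
TIMESTAMP_MILLIS and TIMESTAMP_MICROS, but none for nanoseconds. The
sketch below is only a hypothetical lookup table showing why a
ConvertedType-based mapping cannot cover NANO; it is not the logic in
ParquetHelper.java.

```python
from typing import Optional

# Converted types that exist for int64 timestamps in the Parquet format.
# There is deliberately no nanosecond entry.
CONVERTED_TYPE_BY_UNIT = {
    "MILLIS": "TIMESTAMP_MILLIS",
    "MICROS": "TIMESTAMP_MICROS",
}


def converted_type_for(unit: str) -> Optional[str]:
    """Return the converted type for a timestamp unit, or None if there is none."""
    return CONVERTED_TYPE_BY_UNIT.get(unit)


print(converted_type_for("MICROS"))  # TIMESTAMP_MICROS
print(converted_type_for("NANOS"))   # None
```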

The changes on the Java side also made bumping CDH_BUILD_NUMBER
necessary.

Testing:
- added a new test file with int64 nano timestamps
- ran core tests

Change-Id: I932396d8646f43c0b9ca4a6359f164c4d8349d8f
---
M be/src/exec/parquet/parquet-common.cc
M be/src/exec/parquet/parquet-common.h
M be/src/exec/parquet/parquet-metadata-utils.cc
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M bin/impala-config.sh
M common/thrift/parquet.thrift
M fe/src/main/java/org/apache/impala/analysis/ParquetHelper.java
M testdata/data/README
A testdata/data/int64_timestamps_nano.parquet
M testdata/workloads/functional-query/queries/QueryTest/parquet-int64-timestamps.test
M tests/query_test/test_scanners.py
13 files changed, 178 insertions(+), 69 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/11984/5
--
To view, visit http://gerrit.cloudera.org:8080/11984
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I932396d8646f43c0b9ca4a6359f164c4d8349d8f
Gerrit-Change-Number: 11984
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
