Taras Bobrovytsky has uploaded a new patch set (#6).

Change subject: IMPALA-4363: Add Parquet timestamp validation
......................................................................

IMPALA-4363: Add Parquet timestamp validation

Before this patch, we would simply read the INT96 Parquet timestamp
representation and assume that it's valid. However, not all bit
permutations represent a valid timestamp. One of the boost functions
raised an exception (that we did't catch) when passed an invalid
boost date object, which resulted in a crash. This patch fixes
problem by validating that the timestamp falls into 1400..9999 date
range as we are scanning Parquet.

Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/runtime/timestamp-value.h
M common/thrift/generate_error_codes.py
M testdata/bad_parquet_data/README
A testdata/data/out_of_range_timestamp.parquet
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-abort-on-error.test
A 
testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test
M tests/query_test/test_scanners.py
9 files changed, 133 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/4968/6
-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com>

Reply via email to