Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14832 )


Change subject: IMPALA-8184: Add timestamp validation to Orc scanner
......................................................................

IMPALA-8184: Add timestamp validation to Orc scanner

Hive can write timestamps that are outside Impala's valid
range (Impala: 1400-9999, Hive: 0001-9999). This change adds
validation logic to the Orc scanner that replaces out-of-range
timestamps with NULLs and adds a warning to the query.
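A minimal C++ sketch of this kind of check, not the code in
orc-column-readers.cc. It assumes each decoded value arrives as
seconds since the Unix epoch plus a nanosecond part in [0, 1e9),
and it uses hypothetical names (TimestampBatch, ValidateTimestamps)
and a made-up warning text. The valid range is taken as
1400-01-01 00:00:00 up to, but not including, 10000-01-01 00:00:00.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Days from 1970-01-01 to a proleptic Gregorian date
// (Howard Hinnant's days_from_civil algorithm).
constexpr int64_t DaysFromCivil(int64_t y, unsigned m, unsigned d) {
  y -= m <= 2;
  const int64_t era = (y >= 0 ? y : y - 399) / 400;
  const unsigned yoe = static_cast<unsigned>(y - era * 400);
  const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
  const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
  return era * 146097 + static_cast<int64_t>(doe) - 719468;
}

constexpr int64_t kSecondsPerDay = 86400;
// Valid range: [1400-01-01 00:00:00, 10000-01-01 00:00:00).
constexpr int64_t kMinValidSeconds = DaysFromCivil(1400, 1, 1) * kSecondsPerDay;
constexpr int64_t kEndValidSeconds = DaysFromCivil(10000, 1, 1) * kSecondsPerDay;

// Hypothetical stand-in for a decoded batch of timestamps:
// seconds since the Unix epoch plus a nanosecond part in [0, 1e9).
struct TimestampBatch {
  std::vector<int64_t> seconds;
  std::vector<int64_t> nanos;
  std::vector<bool> is_null;
};

// Replaces out-of-range values with NULL. Returns a warning that names
// the column id if any value was replaced, std::nullopt otherwise.
std::optional<std::string> ValidateTimestamps(TimestampBatch& batch,
                                              int column_id) {
  assert(batch.seconds.size() == batch.nanos.size());
  assert(batch.seconds.size() == batch.is_null.size());
  int64_t replaced = 0;
  for (size_t i = 0; i < batch.seconds.size(); ++i) {
    if (batch.is_null[i]) continue;
    // nanos never reaches a full second, so the seconds alone decide.
    const int64_t s = batch.seconds[i];
    if (s < kMinValidSeconds || s >= kEndValidSeconds) {
      batch.is_null[i] = true;
      ++replaced;
    }
  }
  if (replaced == 0) return std::nullopt;
  return "Column id " + std::to_string(column_id) + ": " +
         std::to_string(replaced) +
         " out-of-range timestamp(s) replaced with NULL";
}
```

Returning the warning instead of logging it keeps the check free of
side effects; a real scanner would attach it to the query status.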

The logic is very similar to the existing validation in
Parquet. Some differences:
- "time of day" is not checked separately, as it doesn't make
  sense with Orc's encoding
- the warning contains only the column id instead of the
  column name
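One way to read the first point: Parquet's INT96 timestamps store a
day number and a nanoseconds-of-day field independently, so a corrupt
time of day is possible there, while a seconds-plus-nanoseconds value
has no separate time-of-day field. Splitting such a value with floor
division always yields a time of day in [0, 86400), so only the date
range needs checking. This sketch assumes the seconds-since-epoch
representation; SplitSeconds and DayAndTime are hypothetical names.

```cpp
#include <cstdint>

constexpr int64_t kSecondsPerDay = 86400;

struct DayAndTime {
  int64_t days_since_epoch;  // negative before 1970-01-01
  int64_t seconds_of_day;    // always in [0, 86400)
};

// Floor division, so timestamps before 1970 still get a
// non-negative time of day instead of a negative remainder.
constexpr DayAndTime SplitSeconds(int64_t seconds) {
  int64_t days = seconds / kSecondsPerDay;
  int64_t rem = seconds % kSecondsPerDay;
  if (rem < 0) {
    rem += kSecondsPerDay;
    --days;
  }
  return {days, rem};
}

// Compile-time spot check: one second before the epoch is
// 23:59:59 on 1969-12-31.
static_assert(SplitSeconds(-1).days_since_epoch == -1, "");
static_assert(SplitSeconds(-1).seconds_of_day == 86399, "");
```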

Testing:
- added a simple EE test that scans an existing Orc file

Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
---
M be/src/exec/orc-column-readers.cc
M common/thrift/generate_error_codes.py
M testdata/data/README
A testdata/data/out_of_range_timestamp.orc
A testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test
M tests/query_test/test_scanners.py
6 files changed, 42 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/14832/1
--
To view, visit http://gerrit.cloudera.org:8080/14832

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
