[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to Orc scanner

2019-12-04 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/14832 )


Change subject: IMPALA-8184: Add timestamp validation to Orc scanner
..

IMPALA-8184: Add timestamp validation to Orc scanner

Hive can write timestamps that are outside Impala's valid
range (Impala: 1400- Hive: 0001-). This change adds
validation logic to Orc reading that replaces out-of-range
timestamps with NULLs and adds a warning to the query.

The logic is very similar to the existing validation in
Parquet. Some differences:
- "time of day" is not checked separately as it doesn't make
  sense with Orc's encoding
- instead of column name only column id is added to the warning
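
For illustration only, the NULL-replacement behaviour described above could be
sketched as follows. This is not the code from orc-column-readers.cc: the
function name, the warning wording, the one-warning-per-call policy, and the
use of whole seconds since the Unix epoch as input are all assumptions made for
the example. The limits are Impala's documented TIMESTAMP range, 1400-01-01
through 9999-12-31.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
ONE_SECOND = timedelta(seconds=1)

# Impala's documented TIMESTAMP range, expressed as seconds since the epoch.
MIN_SECONDS = (datetime(1400, 1, 1) - EPOCH) // ONE_SECOND
MAX_SECONDS = (datetime(9999, 12, 31, 23, 59, 59) - EPOCH) // ONE_SECOND


def validate_timestamps(seconds, column_id, warnings):
    """Hypothetical helper: replace out-of-range values with None (NULL).

    Values that are already None pass through unchanged. At most one warning
    is appended per call (an assumption of this sketch), and it identifies
    the column by id rather than by name.
    """
    validated = []
    warned = False
    for value in seconds:
        if value is None or MIN_SECONDS <= value <= MAX_SECONDS:
            validated.append(value)
            continue
        validated.append(None)
        if not warned:
            # Illustrative wording, not Impala's actual error message.
            warnings.append(
                f"ORC column id {column_id}: "
                "out-of-range timestamp replaced with NULL")
            warned = True
    return validated
```

Called with a list that mixes an in-range value and a year-0001 value, the
helper keeps the first, turns the second into None, and leaves one message in
the caller's warnings list.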

Testing:
- added a simple EE test that scans an existing Orc file

Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
---
M be/src/exec/orc-column-readers.cc
M common/thrift/generate_error_codes.py
M testdata/data/README
A testdata/data/out_of_range_timestamp.orc
A testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test
M tests/query_test/test_scanners.py
6 files changed, 42 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/14832/1
--
To view, visit http://gerrit.cloudera.org:8080/14832
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 


[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to Orc scanner

2019-12-04 Thread Csaba Ringhofer (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14832

to look at the new patch set (#2).

Change subject: IMPALA-8184: Add timestamp validation to Orc scanner
..

IMPALA-8184: Add timestamp validation to Orc scanner

Hive can write timestamps that are outside Impala's valid
range (Impala: 1400- Hive: 0001-). This change adds
validation logic to Orc reading that replaces out-of-range
timestamps with NULLs and adds a warning to the query.

The logic is very similar to the existing validation in
Parquet. Some differences:
- "time of day" is not checked separately as it doesn't make
  sense with Orc's encoding
- instead of column name only column id is added to the warning

Testing:
- added a simple EE test that scans an existing Orc file

Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
---
M be/src/exec/orc-column-readers.cc
M common/thrift/generate_error_codes.py
M testdata/data/README
A testdata/data/out_of_range_timestamp.orc
A testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test
M tests/query_test/test_scanners.py
6 files changed, 43 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/14832/2
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to Orc scanner

2019-12-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14832 )

Change subject: IMPALA-8184: Add timestamp validation to Orc scanner
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5200/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 04 Dec 2019 19:50:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to Orc scanner

2019-12-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14832 )

Change subject: IMPALA-8184: Add timestamp validation to Orc scanner
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5199/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 04 Dec 2019 19:58:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to Orc scanner

2019-12-05 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14832 )

Change subject: IMPALA-8184: Add timestamp validation to Orc scanner
..


Patch Set 2: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14832/2/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/14832/2/tests/query_test/test_scanners.py@1327
PS2, Line 1327:
nit: unnecessary ws


http://gerrit.cloudera.org:8080/#/c/14832/2/tests/query_test/test_scanners.py@1328
PS2, Line 1328: """
nit: you could place it into the previous line



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 05 Dec 2019 10:53:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to Orc scanner

2019-12-05 Thread Norbert Luksa (Code Review)
Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14832 )

Change subject: IMPALA-8184: Add timestamp validation to Orc scanner
..


Patch Set 2: Code-Review+1

(4 comments)

Found some nits, otherwise lgtm.

http://gerrit.cloudera.org:8080/#/c/14832/2/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

http://gerrit.cloudera.org:8080/#/c/14832/2/common/thrift/generate_error_codes.py@443
PS2, Line 443: Orc
nit: ORC


http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/data/README@456
PS2, Line 456: Orc
nit: ORC


http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test
File testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test:

http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test@6
PS2, Line 6: Orc
nit: ORC


http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test@16
PS2, Line 16: Orc
nit: same here



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 05 Dec 2019 10:54:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to ORC scanner

2019-12-05 Thread Csaba Ringhofer (Code Review)
Hello Norbert Luksa, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14832

to look at the new patch set (#3).

Change subject: IMPALA-8184: Add timestamp validation to ORC scanner
..

IMPALA-8184: Add timestamp validation to ORC scanner

Hive can write timestamps that are outside Impala's valid
range (Impala: 1400- Hive: 0001-). This change adds
validation logic to ORC reading that replaces out-of-range
timestamps with NULLs and adds a warning to the query.

The logic is very similar to the existing validation in
Parquet. Some differences:
- "time of day" is not checked separately as it doesn't make
  sense with ORC's encoding
- instead of column name only column id is added to the warning

Testing:
- added a simple EE test that scans an existing ORC file

Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
---
M be/src/exec/orc-column-readers.cc
M common/thrift/generate_error_codes.py
M testdata/data/README
A testdata/data/out_of_range_timestamp.orc
A testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test
M tests/query_test/test_scanners.py
6 files changed, 42 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/14832/3
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to ORC scanner

2019-12-05 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14832 )

Change subject: IMPALA-8184: Add timestamp validation to ORC scanner
..


Patch Set 3: Code-Review+1

(6 comments)

Carry +1

http://gerrit.cloudera.org:8080/#/c/14832/2/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

http://gerrit.cloudera.org:8080/#/c/14832/2/common/thrift/generate_error_codes.py@443
PS2, Line 443: ORC
> nit: ORC
Done


http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/data/README@456
PS2, Line 456: ORC
> nit: ORC
Done


http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test
File testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test:

http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test@6
PS2, Line 6: ORC
> nit: ORC
Done


http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test@16
PS2, Line 16: ORC
> nit: same here
Done


http://gerrit.cloudera.org:8080/#/c/14832/2/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/14832/2/tests/query_test/test_scanners.py@1327
PS2, Line 1327: T
> nit: unnecessary ws
Done


http://gerrit.cloudera.org:8080/#/c/14832/2/tests/query_test/test_scanners.py@1328
PS2, Line 1328: tes
> nit: you could place it into the previous line
Done



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 05 Dec 2019 13:11:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to ORC scanner

2019-12-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14832 )

Change subject: IMPALA-8184: Add timestamp validation to ORC scanner
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5209/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 05 Dec 2019 13:40:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to ORC scanner

2019-12-05 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14832 )

Change subject: IMPALA-8184: Add timestamp validation to ORC scanner
..


Patch Set 3: Code-Review+2


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 05 Dec 2019 18:16:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to ORC scanner

2019-12-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14832 )

Change subject: IMPALA-8184: Add timestamp validation to ORC scanner
..


Patch Set 4: Code-Review+2


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 06 Dec 2019 09:51:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to ORC scanner

2019-12-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14832 )

Change subject: IMPALA-8184: Add timestamp validation to ORC scanner
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5319/ 
DRY_RUN=false


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 06 Dec 2019 09:51:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to ORC scanner

2019-12-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14832 )

Change subject: IMPALA-8184: Add timestamp validation to ORC scanner
..


Patch Set 4: Verified+1


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 06 Dec 2019 14:16:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to ORC scanner

2019-12-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/14832 )

Change subject: IMPALA-8184: Add timestamp validation to ORC scanner
..

IMPALA-8184: Add timestamp validation to ORC scanner

Hive can write timestamps that are outside Impala's valid
range (Impala: 1400- Hive: 0001-). This change adds
validation logic to ORC reading that replaces out-of-range
timestamps with NULLs and adds a warning to the query.

The logic is very similar to the existing validation in
Parquet. Some differences:
- "time of day" is not checked separately as it doesn't make
  sense with ORC's encoding
- instead of column name only column id is added to the warning

Testing:
- added a simple EE test that scans an existing ORC file

Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Reviewed-on: http://gerrit.cloudera.org:8080/14832
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/exec/orc-column-readers.cc
M common/thrift/generate_error_codes.py
M testdata/data/README
A testdata/data/out_of_range_timestamp.orc
A testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test
M tests/query_test/test_scanners.py
6 files changed, 42 insertions(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy