[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to Orc scanner
Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14832 )

Change subject: IMPALA-8184: Add timestamp validation to Orc scanner

IMPALA-8184: Add timestamp validation to Orc scanner

Hive can write timestamps that are outside Impala's valid range (Impala: from year 1400, Hive: from year 0001). This change adds validation logic to Orc reading that replaces out-of-range timestamps with NULLs and adds a warning to the query.

The logic is very similar to the existing validation in Parquet. Some differences:
- "time of day" is not checked separately, as it doesn't make sense with Orc's encoding
- only the column id (not the column name) is added to the warning

Testing:
- added a simple EE test that scans an existing Orc file

Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
---
M be/src/exec/orc-column-readers.cc
M common/thrift/generate_error_codes.py
M testdata/data/README
A testdata/data/out_of_range_timestamp.orc
A testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test
M tests/query_test/test_scanners.py
6 files changed, 42 insertions(+), 0 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/14832/1
--
To view, visit http://gerrit.cloudera.org:8080/14832
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer
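The validation described in the commit message can be sketched as follows. This is an illustrative Python sketch, not the C++ code in orc-column-readers.cc. It assumes Impala's documented TIMESTAMP range of 1400-01-01 to 9999-12-31. It also assumes each ORC value has already been decoded into seconds since the Unix epoch plus a nanosecond part in [0, 10^9). The names epoch_seconds, is_in_impala_range and validate_column are hypothetical.

```python
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
_ONE_SECOND = timedelta(seconds=1)


def epoch_seconds(year, month, day, hour=0, minute=0, second=0):
    """Seconds since 1970-01-01 00:00:00 UTC (proleptic Gregorian calendar)."""
    dt = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    return (dt - _EPOCH) // _ONE_SECOND


# Assumed bounds: Impala's documented TIMESTAMP range, 1400-01-01 .. 9999-12-31.
MIN_VALID_SECONDS = epoch_seconds(1400, 1, 1)
MAX_VALID_SECONDS = epoch_seconds(9999, 12, 31, 23, 59, 59)


def is_in_impala_range(seconds, nanos):
    """Checks a (seconds, nanos) pair, with nanos assumed to be in [0, 10**9).

    The nanosecond part never reaches a full second, so only the seconds
    part can cross either boundary; it is the only part compared.
    """
    return MIN_VALID_SECONDS <= seconds <= MAX_VALID_SECONDS


def validate_column(values):
    """Replaces out-of-range timestamps with None (NULL).

    Returns the new values and the number of replaced ones, so the caller
    can decide how to report them in a query warning.
    """
    result, invalid = [], 0
    for seconds, nanos in values:
        if is_in_impala_range(seconds, nanos):
            result.append((seconds, nanos))
        else:
            result.append(None)
            invalid += 1
    return result, invalid
```

For example, validate_column([(0, 0), (epoch_seconds(1, 1, 1), 0)]) returns ([(0, 0), None], 1). The Hive-only date 0001-01-01 becomes NULL, while 1970-01-01 is kept.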
[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to Orc scanner
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14832 to look at the new patch set (#2).

Change subject: IMPALA-8184: Add timestamp validation to Orc scanner

IMPALA-8184: Add timestamp validation to Orc scanner

Hive can write timestamps that are outside Impala's valid range (Impala: from year 1400, Hive: from year 0001). This change adds validation logic to Orc reading that replaces out-of-range timestamps with NULLs and adds a warning to the query.

The logic is very similar to the existing validation in Parquet. Some differences:
- "time of day" is not checked separately, as it doesn't make sense with Orc's encoding
- only the column id (not the column name) is added to the warning

Testing:
- added a simple EE test that scans an existing Orc file

Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
---
M be/src/exec/orc-column-readers.cc
M common/thrift/generate_error_codes.py
M testdata/data/README
A testdata/data/out_of_range_timestamp.orc
A testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test
M tests/query_test/test_scanners.py
6 files changed, 43 insertions(+), 1 deletion(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/14832/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to Orc scanner
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14832 )

Change subject: IMPALA-8184: Add timestamp validation to Orc scanner

Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5200/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Wed, 04 Dec 2019 19:50:16 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to Orc scanner
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14832 )

Change subject: IMPALA-8184: Add timestamp validation to Orc scanner

Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5199/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Wed, 04 Dec 2019 19:58:19 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to Orc scanner
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14832 )

Change subject: IMPALA-8184: Add timestamp validation to Orc scanner

Patch Set 2: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14832/2/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/14832/2/tests/query_test/test_scanners.py@1327
PS2, Line 1327:
nit: unnecessary ws

http://gerrit.cloudera.org:8080/#/c/14832/2/tests/query_test/test_scanners.py@1328
PS2, Line 1328: """
nit: you could place it into the previous line

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Thu, 05 Dec 2019 10:53:51 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to Orc scanner
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14832 )

Change subject: IMPALA-8184: Add timestamp validation to Orc scanner

Patch Set 2: Code-Review+1

(4 comments)

Found some nits, otherwise lgtm.

http://gerrit.cloudera.org:8080/#/c/14832/2/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

http://gerrit.cloudera.org:8080/#/c/14832/2/common/thrift/generate_error_codes.py@443
PS2, Line 443: Orc
nit: ORC

http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/data/README@456
PS2, Line 456: Orc
nit: ORC

http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test
File testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test:

http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test@6
PS2, Line 6: Orc
nit: ORC

http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test@16
PS2, Line 16: Orc
nit: same here

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Norbert Luksa
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Thu, 05 Dec 2019 10:54:48 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to ORC scanner
Hello Norbert Luksa, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14832 to look at the new patch set (#3).

Change subject: IMPALA-8184: Add timestamp validation to ORC scanner

IMPALA-8184: Add timestamp validation to ORC scanner

Hive can write timestamps that are outside Impala's valid range (Impala: from year 1400, Hive: from year 0001). This change adds validation logic to ORC reading that replaces out-of-range timestamps with NULLs and adds a warning to the query.

The logic is very similar to the existing validation in Parquet. Some differences:
- "time of day" is not checked separately, as it doesn't make sense with ORC's encoding
- only the column id (not the column name) is added to the warning

Testing:
- added a simple EE test that scans an existing ORC file

Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
---
M be/src/exec/orc-column-readers.cc
M common/thrift/generate_error_codes.py
M testdata/data/README
A testdata/data/out_of_range_timestamp.orc
A testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test
M tests/query_test/test_scanners.py
6 files changed, 42 insertions(+), 1 deletion(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/14832/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Norbert Luksa
Gerrit-Reviewer: Zoltan Borok-Nagy
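The remark that "time of day" is not checked separately can be illustrated by contrast with Parquet's INT96 timestamps. An INT96 value stores nanoseconds within the day (8 bytes, little-endian) followed by a Julian day number (4 bytes). A corrupt nanos-of-day value can be out of range on its own, so, as the commit message implies, the Parquet path needs a separate check for it. A value decoded into seconds plus a sub-second nanosecond part has no such independent field. The minimal Python sketch below shows this; decode_int96 is a hypothetical helper, not Impala's Parquet reader.

```python
import struct

NANOS_PER_SECOND = 10**9
NANOS_PER_DAY = 86_400 * NANOS_PER_SECOND
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588  # Julian day number of 1970-01-01


def decode_int96(raw):
    """Decodes a 12-byte Parquet INT96 timestamp into (seconds, nanos).

    Layout: little-endian int64 nanoseconds within the day, followed by a
    little-endian int32 Julian day number. Returns None if the time of day is
    outside [0, 24h); this is the separate check a Parquet reader needs.
    """
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    if not 0 <= nanos_of_day < NANOS_PER_DAY:
        return None
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    # divmod floors, so nanos always lands in [0, 10**9): the same
    # (seconds, nanos) shape as a decoded ORC value, with no time-of-day field.
    return divmod(days * NANOS_PER_DAY + nanos_of_day, NANOS_PER_SECOND)
```

Without the range check, a value 25 hours into day D would silently decode as 01:00 on day D+1. A (seconds, nanos) pair has no equivalent ambiguity, so only the overall date range remains to be validated.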
[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to ORC scanner
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/14832 )

Change subject: IMPALA-8184: Add timestamp validation to ORC scanner

Patch Set 3: Code-Review+1

(6 comments)

Carry +1

http://gerrit.cloudera.org:8080/#/c/14832/2/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

http://gerrit.cloudera.org:8080/#/c/14832/2/common/thrift/generate_error_codes.py@443
PS2, Line 443: ORC
> nit: ORC
Done

http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/data/README@456
PS2, Line 456: ORC
> nit: ORC
Done

http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test
File testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test:

http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test@6
PS2, Line 6: ORC
> nit: ORC
Done

http://gerrit.cloudera.org:8080/#/c/14832/2/testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test@16
PS2, Line 16: ORC
> nit: same here
Done

http://gerrit.cloudera.org:8080/#/c/14832/2/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/14832/2/tests/query_test/test_scanners.py@1327
PS2, Line 1327: T
> nit: unnecessary ws
Done

http://gerrit.cloudera.org:8080/#/c/14832/2/tests/query_test/test_scanners.py@1328
PS2, Line 1328: tes
> nit: you could place it into the previous line
Done

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Norbert Luksa
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Thu, 05 Dec 2019 13:11:59 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to ORC scanner
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14832 )

Change subject: IMPALA-8184: Add timestamp validation to ORC scanner

Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5209/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Norbert Luksa
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Thu, 05 Dec 2019 13:40:51 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to ORC scanner
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14832 )

Change subject: IMPALA-8184: Add timestamp validation to ORC scanner

Patch Set 3: Code-Review+2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Norbert Luksa
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Thu, 05 Dec 2019 18:16:55 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to ORC scanner
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14832 )

Change subject: IMPALA-8184: Add timestamp validation to ORC scanner

Patch Set 4: Code-Review+2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Norbert Luksa
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Fri, 06 Dec 2019 09:51:06 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to ORC scanner
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14832 )

Change subject: IMPALA-8184: Add timestamp validation to ORC scanner

Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5319/ DRY_RUN=false

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Norbert Luksa
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Fri, 06 Dec 2019 09:51:07 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to ORC scanner
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14832 )

Change subject: IMPALA-8184: Add timestamp validation to ORC scanner

Patch Set 4: Verified+1

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Norbert Luksa
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Fri, 06 Dec 2019 14:16:29 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8184: Add timestamp validation to ORC scanner
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14832 )

Change subject: IMPALA-8184: Add timestamp validation to ORC scanner

IMPALA-8184: Add timestamp validation to ORC scanner

Hive can write timestamps that are outside Impala's valid range (Impala: from year 1400, Hive: from year 0001). This change adds validation logic to ORC reading that replaces out-of-range timestamps with NULLs and adds a warning to the query.

The logic is very similar to the existing validation in Parquet. Some differences:
- "time of day" is not checked separately, as it doesn't make sense with ORC's encoding
- only the column id (not the column name) is added to the warning

Testing:
- added a simple EE test that scans an existing ORC file

Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Reviewed-on: http://gerrit.cloudera.org:8080/14832
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
M be/src/exec/orc-column-readers.cc
M common/thrift/generate_error_codes.py
M testdata/data/README
A testdata/data/out_of_range_timestamp.orc
A testdata/workloads/functional-query/queries/DataErrorsTest/orc-out-of-range-timestamp.test
M tests/query_test/test_scanners.py
6 files changed, 42 insertions(+), 1 deletion(-)

Approvals:
Impala Public Jenkins: Looks good to me, approved; Verified

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I8ee2ba83a54f93d37e8832e064f2c8418b503490
Gerrit-Change-Number: 14832
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Norbert Luksa
Gerrit-Reviewer: Zoltan Borok-Nagy