[Impala-ASF-CR] IMPALA-6503: Support reading complex types from ORC format files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12168 ) Change subject: IMPALA-6503: Support reading complex types from ORC format files .. Patch Set 15: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2153/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790 Gerrit-Change-Number: 12168 Gerrit-PatchSet: 15 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 18 Feb 2019 14:48:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6503: Support reading complex types from ORC format files
Hello Zoltan Borok-Nagy, Tim Armstrong, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12168 to look at the new patch set (#15). Change subject: IMPALA-6503: Support reading complex types from ORC format files .. IMPALA-6503: Support reading complex types from ORC format files We've supported reading primitive types from ORC files (IMPALA-5717). In this patch we add support for complex types (struct/array/map). In IMPALA-5717, we leverage the ORC lib to parse ORC binaries (data in io buffer read from DiskIoMgr). The ORC lib can materialize ORC column binaries into its representation (orc::ColumnVectorBatch). Then we transform values in orc::ColumnVectorBatch into impala::Tuples in hdfs-orc-scanner. We don't need to do anything about decoding/decompression since they are handled by the ORC lib. Fortunately, the ORC lib already supports complex types, we can still leverage it to support complex types. What we need to add in IMPALA-6503 are two things: 1. Specify which nested columns we need in the form required by the ORC lib (Get list of ORC type ids from tuple descriptors) 2. Transform outputs of ORC lib (nested orc::ColumnVectorBatch) into Impala's representation (Slots/Tuples/RowBatches) To format the materialization, we implement several ORC column readers in hdfs-orc-scanner. Each kind of reader treats a column type and transforms outputs of the ORC lib into tuple/slot values. Tests: * Enable existing tests for complex types (test_nested_types.py, test_tpch_nested_queries.py) for ORC. Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790 --- M be/src/exec/CMakeLists.txt M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/hdfs-orc-scanner.h A be/src/exec/orc-column-readers.cc A be/src/exec/orc-column-readers.h A be/src/exec/orc-metadata-utils.cc A be/src/exec/orc-metadata-utils.h M be/src/runtime/descriptors.cc M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java A testdata/ComplexTypesTbl/README A testdata/ComplexTypesTbl/nonnullable.orc A testdata/ComplexTypesTbl/nullable.orc M testdata/bin/create-load-data.sh M testdata/bin/load_nested.py M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/complex-types-file-formats.test M testdata/workloads/functional-query/queries/QueryTest/max-nesting-depth.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch-limit.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch-mem-limit-single-node.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch-mem-limit.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch.test M testdata/workloads/tpch_nested/tpch_nested_core.csv M testdata/workloads/tpch_nested/tpch_nested_dimensions.csv M testdata/workloads/tpch_nested/tpch_nested_exhaustive.csv M testdata/workloads/tpch_nested/tpch_nested_pairwise.csv M tests/query_test/test_nested_types.py M tests/query_test/test_tpch_nested_queries.py 29 files changed, 1,864 insertions(+), 461 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/12168/15 -- To view, visit http://gerrit.cloudera.org:8080/12168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790 Gerrit-Change-Number: 12168 Gerrit-PatchSet: 15 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-4018 Part1: Add FORMAT clause in CAST()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12267 ) Change subject: IMPALA-4018 Part1: Add FORMAT clause in CAST() .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2152/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia514aaa9e8f5487d396587d5ed24c7348a492697 Gerrit-Change-Number: 12267 Gerrit-PatchSet: 7 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Mon, 18 Feb 2019 10:29:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4018 Part1: Add FORMAT clause in CAST()
Gabor Kaszab has abandoned this change. ( http://gerrit.cloudera.org:8080/12267 ) Change subject: IMPALA-4018 Part1: Add FORMAT clause in CAST() .. Abandoned Will deliver a feature in one patch instead of splitting it to 2 parts. -- To view, visit http://gerrit.cloudera.org:8080/12267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: Ia514aaa9e8f5487d396587d5ed24c7348a492697 Gerrit-Change-Number: 12267 Gerrit-PatchSet: 7 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-4018 Part1: Add FORMAT clause in CAST()
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/12267 ) Change subject: IMPALA-4018 Part1: Add FORMAT clause in CAST() .. Patch Set 7: There was an ongoing discussion on the Jira during this code review and the outcome was that I won't deliver the functionality in 2 separate parts. As a result I'll implement the SQL:2016 related patterns for CAST(FORMAT) and upload them to review in a single patch with this one. I think that this review was successful as there are only few comments mostly related to testing the old pattern. Let me abandon this one not to distract anyone and get back with the patch that contains the SQL pattern support as well. Thanks for the review for all of you! -- To view, visit http://gerrit.cloudera.org:8080/12267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia514aaa9e8f5487d396587d5ed24c7348a492697 Gerrit-Change-Number: 12267 Gerrit-PatchSet: 7 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Mon, 18 Feb 2019 09:57:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4018 Part1: Add FORMAT clause in CAST()
Hello Greg Rahn, Paul Rogers, Philip Zeyliger, Attila Jeges, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12267 to look at the new patch set (#7). Change subject: IMPALA-4018 Part1: Add FORMAT clause in CAST() .. IMPALA-4018 Part1: Add FORMAT clause in CAST() This enhancement introduces FORMAT clause for CAST() operator. The FORMAT clause is applicable for casts between string types and timestamp types. The same format pattern is accepted as what e.g. the to_timestamp() function accepts. Usage example: SELECT CAST('01-02-2019' AS TIMESTAMP FORMAT 'MM-dd-'); Change-Id: Ia514aaa9e8f5487d396587d5ed24c7348a492697 --- M be/src/exprs/CMakeLists.txt A be/src/exprs/cast-expr.cc A be/src/exprs/cast-expr.h M be/src/exprs/cast-functions-ir.cc M be/src/exprs/scalar-expr-evaluator.h M be/src/exprs/scalar-expr.cc M be/src/exprs/scalar-expr.h M common/thrift/Exprs.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CastExpr.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java A testdata/workloads/functional-query/queries/QueryTest/cast_format.test A tests/query_test/test_cast_with_format.py 14 files changed, 392 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/12267/7 -- To view, visit http://gerrit.cloudera.org:8080/12267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia514aaa9e8f5487d396587d5ed24c7348a492697 Gerrit-Change-Number: 12267 Gerrit-PatchSet: 7 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-4018 Part1: Add FORMAT clause in CAST()
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/12267 ) Change subject: IMPALA-4018 Part1: Add FORMAT clause in CAST() .. Patch Set 7: (8 comments) http://gerrit.cloudera.org:8080/#/c/12267/5/be/src/exprs/cast-expr.h File be/src/exprs/cast-expr.h: http://gerrit.cloudera.org:8080/#/c/12267/5/be/src/exprs/cast-expr.h@43 PS5, Line 43: // This is empty in case of FunctionContext::THREAD_LOCAL. > I see, I forgot about the error path. Done http://gerrit.cloudera.org:8080/#/c/12267/6/fe/src/main/java/org/apache/impala/analysis/BoolLiteral.java File fe/src/main/java/org/apache/impala/analysis/BoolLiteral.java: http://gerrit.cloudera.org:8080/#/c/12267/6/fe/src/main/java/org/apache/impala/analysis/BoolLiteral.java@93 PS6, Line 93: return new CastExpr(targetType, this); > Rather than changing calls everywhere, would recommend leaving the two-arg That sounds reasonable. Thanks for spotting. Done. http://gerrit.cloudera.org:8080/#/c/12267/6/fe/src/main/java/org/apache/impala/analysis/CastExpr.java File fe/src/main/java/org/apache/impala/analysis/CastExpr.java: http://gerrit.cloudera.org:8080/#/c/12267/6/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@92 PS6, Line 92: public CastExpr(TypeDef targetTypeDef, Expr e) { > I suspect only one of these is called from the parser. That one can take th The implicit cast constructor used internally could also receive a format in case the cast target type is char[]. In that case the input is converted to string as an intermediate step and then to char[] as a final step. Check line #271 in this file. http://gerrit.cloudera.org:8080/#/c/12267/6/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/12267/6/fe/src/main/java/org/apache/impala/analysis/Expr.java@1362 PS6, Line 1362: return new CastExpr(targetType, this); > See prior comment. Done http://gerrit.cloudera.org:8080/#/c/12267/6/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java File fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java: http://gerrit.cloudera.org:8080/#/c/12267/6/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java@478 PS6, Line 478: return new CastExpr(targetType, this); > Again. Done http://gerrit.cloudera.org:8080/#/c/12267/6/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java File fe/src/main/java/org/apache/impala/analysis/StringLiteral.java: http://gerrit.cloudera.org:8080/#/c/12267/6/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@157 PS6, Line 157: return new CastExpr(targetType, this); > Again. Done http://gerrit.cloudera.org:8080/#/c/12267/6/testdata/workloads/functional-query/queries/QueryTest/cast_format.test File testdata/workloads/functional-query/queries/QueryTest/cast_format.test: http://gerrit.cloudera.org:8080/#/c/12267/6/testdata/workloads/functional-query/queries/QueryTest/cast_format.test@4 PS6, Line 4: > The tests here are great. Only suggestion is that we have to ensure we test That would be nice, however please note the following: - This is just a pattern handling re-use of existing functionality that already has coverage. - As discussed on the Jira, with CAST(FORMAT) I won't follow the existing pattern matching. Instead I'll implement the ISO SQL:2016 support so testing the current matching here doesn't make sense as I'll rewrite it anyway. Also please note, that this patch won't make it to the repo alone, I have to extend it to use the new datetime patterns and then send them as 1 patch together. http://gerrit.cloudera.org:8080/#/c/12267/5/tests/query_test/test_cast_with_format.py File tests/query_test/test_cast_with_format.py: http://gerrit.cloudera.org:8080/#/c/12267/5/tests/query_test/test_cast_with_format.py@37 PS5, Line 37: def test_cast_format_without_row(self, vector): > I saw this pattern in several tests: Aaaaww, indeed. Thx! -- To view, visit http://gerrit.cloudera.org:8080/12267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia514aaa9e8f5487d396587d5ed24c7348a492697 Gerrit-Change-Number: 12267 Gerrit-PatchSet: 7 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Mon, 18 Feb 2019 09:53:45 + Gerrit-HasComments: Yes