[Impala-ASF-CR] IMPALA-6503: Support reading complex types from ORC format files

2019-02-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12168 )

Change subject: IMPALA-6503: Support reading complex types from ORC format files
..


Patch Set 15:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/2153/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790
Gerrit-Change-Number: 12168
Gerrit-PatchSet: 15
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 18 Feb 2019 14:48:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6503: Support reading complex types from ORC format files

2019-02-18 Thread Quanlong Huang (Code Review)
Hello Zoltan Borok-Nagy, Tim Armstrong, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12168

to look at the new patch set (#15).

Change subject: IMPALA-6503: Support reading complex types from ORC format files
..

IMPALA-6503: Support reading complex types from ORC format files

We've supported reading primitive types from ORC files (IMPALA-5717).
In this patch we add support for complex types (struct/array/map).

In IMPALA-5717, we leverage the ORC lib to parse ORC binaries (data in
io buffer read from DiskIoMgr). The ORC lib can materialize ORC column
binaries into its representation (orc::ColumnVectorBatch). Then we
transform values in orc::ColumnVectorBatch into impala::Tuples in
hdfs-orc-scanner. We don't need to do anything about decoding/decompression
since they are handled by the ORC lib. Fortunately, the ORC lib already
supports complex types, we can still leverage it to support complex types.

What we need to add in IMPALA-6503 are two things:
1. Specify which nested columns we need in the form required by the ORC
  lib (Get list of ORC type ids from tuple descriptors)
2. Transform outputs of ORC lib (nested orc::ColumnVectorBatch) into
  Impala's representation (Slots/Tuples/RowBatches)

To format the materialization, we implement several ORC column readers
in hdfs-orc-scanner. Each kind of reader treats a column type and
transforms outputs of the ORC lib into tuple/slot values.

Tests:
* Enable existing tests for complex types (test_nested_types.py,
test_tpch_nested_queries.py) for ORC.

Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790
---
M be/src/exec/CMakeLists.txt
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
A be/src/exec/orc-column-readers.cc
A be/src/exec/orc-column-readers.h
A be/src/exec/orc-metadata-utils.cc
A be/src/exec/orc-metadata-utils.h
M be/src/runtime/descriptors.cc
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
A testdata/ComplexTypesTbl/README
A testdata/ComplexTypesTbl/nonnullable.orc
A testdata/ComplexTypesTbl/nullable.orc
M testdata/bin/create-load-data.sh
M testdata/bin/load_nested.py
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-planner/queries/PlannerTest/complex-types-file-formats.test
M testdata/workloads/functional-query/queries/QueryTest/max-nesting-depth.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch-limit.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch-mem-limit-single-node.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch-mem-limit.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch.test
M testdata/workloads/tpch_nested/tpch_nested_core.csv
M testdata/workloads/tpch_nested/tpch_nested_dimensions.csv
M testdata/workloads/tpch_nested/tpch_nested_exhaustive.csv
M testdata/workloads/tpch_nested/tpch_nested_pairwise.csv
M tests/query_test/test_nested_types.py
M tests/query_test/test_tpch_nested_queries.py
29 files changed, 1,864 insertions(+), 461 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/12168/15
--
To view, visit http://gerrit.cloudera.org:8080/12168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790
Gerrit-Change-Number: 12168
Gerrit-PatchSet: 15
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-4018 Part1: Add FORMAT clause in CAST()

2019-02-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12267 )

Change subject: IMPALA-4018 Part1: Add FORMAT clause in CAST()
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/2152/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia514aaa9e8f5487d396587d5ed24c7348a492697
Gerrit-Change-Number: 12267
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Mon, 18 Feb 2019 10:29:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4018 Part1: Add FORMAT clause in CAST()

2019-02-18 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has abandoned this change. ( http://gerrit.cloudera.org:8080/12267 
)

Change subject: IMPALA-4018 Part1: Add FORMAT clause in CAST()
..


Abandoned

Will deliver a feature in one patch instead of splitting it to 2 parts.
--
To view, visit http://gerrit.cloudera.org:8080/12267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: Ia514aaa9e8f5487d396587d5ed24c7348a492697
Gerrit-Change-Number: 12267
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-4018 Part1: Add FORMAT clause in CAST()

2019-02-18 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12267 )

Change subject: IMPALA-4018 Part1: Add FORMAT clause in CAST()
..


Patch Set 7:

There was an ongoing discussion on the Jira during this code review and the 
outcome was that I won't deliver the functionality in 2 separate parts. As a 
result I'll implement the SQL:2016 related patterns for CAST(FORMAT) and upload 
them to review in a single patch with this one.

I think that this review was successful as there are only few comments mostly 
related to testing the old pattern. Let me abandon this one not to distract 
anyone and get back with the patch that contains the SQL pattern support as 
well.

Thanks for the review for all of you!


--
To view, visit http://gerrit.cloudera.org:8080/12267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia514aaa9e8f5487d396587d5ed24c7348a492697
Gerrit-Change-Number: 12267
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Mon, 18 Feb 2019 09:57:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4018 Part1: Add FORMAT clause in CAST()

2019-02-18 Thread Gabor Kaszab (Code Review)
Hello Greg Rahn, Paul Rogers, Philip Zeyliger, Attila Jeges, Csaba Ringhofer, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12267

to look at the new patch set (#7).

Change subject: IMPALA-4018 Part1: Add FORMAT clause in CAST()
..

IMPALA-4018 Part1: Add FORMAT clause in CAST()

This enhancement introduces FORMAT clause for CAST()
operator. The FORMAT clause is applicable for casts between
string types and timestamp types. The same format pattern is
accepted as what e.g. the to_timestamp() function accepts.

Usage example:
SELECT CAST('01-02-2019' AS TIMESTAMP FORMAT 'MM-dd-');

Change-Id: Ia514aaa9e8f5487d396587d5ed24c7348a492697
---
M be/src/exprs/CMakeLists.txt
A be/src/exprs/cast-expr.cc
A be/src/exprs/cast-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M common/thrift/Exprs.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/workloads/functional-query/queries/QueryTest/cast_format.test
A tests/query_test/test_cast_with_format.py
14 files changed, 392 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/12267/7
--
To view, visit http://gerrit.cloudera.org:8080/12267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia514aaa9e8f5487d396587d5ed24c7348a492697
Gerrit-Change-Number: 12267
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-4018 Part1: Add FORMAT clause in CAST()

2019-02-18 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12267 )

Change subject: IMPALA-4018 Part1: Add FORMAT clause in CAST()
..


Patch Set 7:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/12267/5/be/src/exprs/cast-expr.h
File be/src/exprs/cast-expr.h:

http://gerrit.cloudera.org:8080/#/c/12267/5/be/src/exprs/cast-expr.h@43
PS5, Line 43:   // This is empty in case of FunctionContext::THREAD_LOCAL.
> I see, I forgot about the error path.
Done


http://gerrit.cloudera.org:8080/#/c/12267/6/fe/src/main/java/org/apache/impala/analysis/BoolLiteral.java
File fe/src/main/java/org/apache/impala/analysis/BoolLiteral.java:

http://gerrit.cloudera.org:8080/#/c/12267/6/fe/src/main/java/org/apache/impala/analysis/BoolLiteral.java@93
PS6, Line 93:   return new CastExpr(targetType, this);
> Rather than changing calls everywhere, would recommend leaving the two-arg
That sounds reasonable. Thanks for spotting. Done.


http://gerrit.cloudera.org:8080/#/c/12267/6/fe/src/main/java/org/apache/impala/analysis/CastExpr.java
File fe/src/main/java/org/apache/impala/analysis/CastExpr.java:

http://gerrit.cloudera.org:8080/#/c/12267/6/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@92
PS6, Line 92:   public CastExpr(TypeDef targetTypeDef, Expr e) {
> I suspect only one of these is called from the parser. That one can take th
The implicit cast constructor used internally could also receive a format in 
case the cast target type is char[]. In that case the input is converted to 
string as an intermediate step and then to char[] as a final step. Check line 
#271 in this file.


http://gerrit.cloudera.org:8080/#/c/12267/6/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/12267/6/fe/src/main/java/org/apache/impala/analysis/Expr.java@1362
PS6, Line 1362: return new CastExpr(targetType, this);
> See prior comment.
Done


http://gerrit.cloudera.org:8080/#/c/12267/6/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java
File fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java:

http://gerrit.cloudera.org:8080/#/c/12267/6/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java@478
PS6, Line 478:   return new CastExpr(targetType, this);
> Again.
Done


http://gerrit.cloudera.org:8080/#/c/12267/6/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
File fe/src/main/java/org/apache/impala/analysis/StringLiteral.java:

http://gerrit.cloudera.org:8080/#/c/12267/6/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@157
PS6, Line 157:   return new CastExpr(targetType, this);
> Again.
Done


http://gerrit.cloudera.org:8080/#/c/12267/6/testdata/workloads/functional-query/queries/QueryTest/cast_format.test
File testdata/workloads/functional-query/queries/QueryTest/cast_format.test:

http://gerrit.cloudera.org:8080/#/c/12267/6/testdata/workloads/functional-query/queries/QueryTest/cast_format.test@4
PS6, Line 4: 
> The tests here are great. Only suggestion is that we have to ensure we test
That would be nice, however please note the following:
- This is just a pattern handling re-use of existing functionality that already 
has coverage.
- As discussed on the Jira, with CAST(FORMAT) I won't follow the existing 
pattern matching. Instead I'll implement the ISO SQL:2016 support so testing 
the current matching here doesn't make sense as I'll rewrite it anyway.

Also please note, that this patch won't make it to the repo alone, I have to 
extend it to use the new datetime patterns and then send them as 1 patch 
together.


http://gerrit.cloudera.org:8080/#/c/12267/5/tests/query_test/test_cast_with_format.py
File tests/query_test/test_cast_with_format.py:

http://gerrit.cloudera.org:8080/#/c/12267/5/tests/query_test/test_cast_with_format.py@37
PS5, Line 37:   def test_cast_format_without_row(self, vector):
> I saw this pattern in several tests:
Aaaaww, indeed. Thx!



--
To view, visit http://gerrit.cloudera.org:8080/12267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia514aaa9e8f5487d396587d5ed24c7348a492697
Gerrit-Change-Number: 12267
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Mon, 18 Feb 2019 09:53:45 +
Gerrit-HasComments: Yes