Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17983 )

Change subject: IMPALA-10920: Zipping unnest for arrays
......................................................................


Patch Set 5:

(43 comments)

http://gerrit.cloudera.org:8080/#/c/17983/2/be/src/exec/unnest-node.h
File be/src/exec/unnest-node.h:

http://gerrit.cloudera.org:8080/#/c/17983/2/be/src/exec/unnest-node.h@40
PS2, Line 40: SlotRefs
> If they are always SlotRefs, can't the type of the vector be std::vector<Sl
This vector is passed to ScalarExpr::Create() that populates it, but that 
Create() function receives a vector of ScalarExpr pointers so have to keep them 
as they are now.


http://gerrit.cloudera.org:8080/#/c/17983/2/be/src/exec/unnest-node.h@42
PS2, Line 42: 'UnnestNode::coll_values_', but instead manually retrieve the 
slot values to support
            :   /// projection (see cla
> I think it should be mentioned that 'coll_values_' and the class comment ar
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/be/src/exec/unnest-node.h@46
PS2, Line 46:  UnnestPlanNode.
> 'coll_expr_evals_' has been removed.
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/be/src/exec/unnest-node.h@46
PS2, Line 46: t in
            :   /// InitCollExpr().
> It seems no longer to be true that they are set in Prepare() and set to NUL
Right, these are set to null in a different class not here.
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/be/src/exec/unnest-node.h@50
PS2, Line 50: Set in InitCollE
> It seems no longer to be the case.
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/be/src/exec/unnest-node.h@109
PS2, Line 109: .
> Nit: belong.
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/be/src/exec/unnest-node.h@116
PS2, Line 116: doesn
> Nit: doesn't.
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/be/src/exec/unnest-node.h@126
PS2, Line 126: These slots are always set to NULL in
             :   /// Open() as a simple projection.
> Does not seem to be true anymore.
In fact it's partially true. These still are projected to NULL but not set in 
Prepare.
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/be/src/exec/unnest-node.h@131
PS2, Line 131:
> Does not seem to be true.
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/be/src/exec/unnest-node.h@144
PS2, Line 144: length
> typo
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/be/src/exec/unnest-node.cc
File be/src/exec/unnest-node.cc:

http://gerrit.cloudera.org:8080/#/c/17983/2/be/src/exec/unnest-node.cc@60
PS2, Line 60: DCHECK(coll_expr->IsSlotRef(
> This DCHECK is superfluous because slot_ref==null iff coll_expr==nullptr, a
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/be/src/exec/unnest-node.cc@90
PS2, Line 90:     num_collections_counter_(nullptr) {
> nit: usually the constant is the second argument.
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/be/src/exec/unnest-node.cc@143
PS2, Line 143:       if (num_tuples > 0) {
> nit: here and at line 146: using std::max would be easier to read IMO
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/be/src/exec/unnest-node.cc@174
PS2, Line 174:
> Nit: should be collections.
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/be/src/exec/unnest-node.cc@218
PS2, Line 218:       (*coll_slot_descs_)[coll_idx]->children_tuple_de
> Can't we use nullptr everywhere? My understanding is that nullptr as tuple
As I remember I had to create the tuple in the row batch and set it to null 
explicitly otherwise there is a point somewhere later where Impala crashes.


http://gerrit.cloudera.org:8080/#/c/17983/2/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/17983/2/fe/src/main/cup/sql-parser.cup@315
PS2, Line 315: KW_UNNEST
> We should add to the comment that we have added a new keyword, making this
Mentioned it in the commit message.


http://gerrit.cloudera.org:8080/#/c/17983/2/fe/src/main/cup/sql-parser.cup@3030
PS2, Line 3030: Providing multiple UNNEST() in the FROM clause is not supported.
> It this something that we want to support later?
It depends on what semantics you want to achieve with multiple unnests. In case 
you want to do zipping unnest between the arrays in the same unnest but do 
joining unnest with the ones on different unnest then this might be feasible, 
but a bit hard to understand.
And anyway zipping and joining unnest is not allowed together atm.
I'm not really convinced we want to support this later as I see no added value.


http://gerrit.cloudera.org:8080/#/c/17983/2/fe/src/main/cup/sql-parser.cup@3091
PS2, Line 3091:
> What if the user provides more aliases than paths?
Indeed, thx :) Added some tests as well to cover this.
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/17983/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@574
PS2, Line 574: tableRefsFromUnnestExpr
> Nit: Should end in an underscore (_).
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@706
PS2, Line 706:     return aliasMap_.containsKey(uniqueAlias);
             :   }
> Nit: 'return aliasMap_.containsKey(uniqueAlias)' would be simpler.
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@966
PS2, Line 966: Expr;
> Isn't this a bug at this point if the type is not a SlotRef? Shouldn't we a
I found a use case where collExpr is not a SlotRef. I used to have a 
Precondition here but that broke. Frankly, I don't remember what the issue was 
exactly, but for instance this query would break the Precondition:

select arr1.item
from unnest(functional_parquet.complextypes_arrays.arr1)
where arr1.item < 2;


http://gerrit.cloudera.org:8080/#/c/17983/2/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/17983/2/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@389
PS2, Line 389: typed columns because "
> Does this make it necessary to re-analyse 'fromClause_'?
Yes, a re-analysis is required. In PS3 I did a change that forces re-analysis 
in this case to fix a known bug.


http://gerrit.cloudera.org:8080/#/c/17983/2/fe/src/main/java/org/apache/impala/analysis/UnnestExpr.java
File fe/src/main/java/org/apache/impala/analysis/UnnestExpr.java:

http://gerrit.cloudera.org:8080/#/c/17983/2/fe/src/main/java/org/apache/impala/analysis/UnnestExpr.java@31
PS2, Line 31:  represent
> Could we have a class comment saying that this is for zipping unnests in th
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/fe/src/main/java/org/apache/impala/analysis/UnnestExpr.java@36
PS2, Line 36:
> Don't we have to call removeItemFromPath() here? What if 'path' already con
The path has to point to an array and not to it's items. I path contains an 
"item" already that means that the underlying data structure is an 
array<array<something>> so it's fin to add another "item" at the end so that 
the path will point to the item of the inner array.

Added a test to cover an error when "unnest(arr1.item)" and arr1.item not an 
array.


http://gerrit.cloudera.org:8080/#/c/17983/2/fe/src/main/java/org/apache/impala/analysis/UnnestExpr.java@42
PS2, Line 42:
> Can we actually unnest non-top-level arrays? Even if not, I think this may
Agreed. Meanwhile I removed the support for arr.item where it is not an array 
itself. I think it happened in PS3.
About nested arrays: It is not supported atm to unnest a nested array as 
arr.item. I'm not sure we want to do that, or should we open a jira to 
implement later.


http://gerrit.cloudera.org:8080/#/c/17983/2/fe/src/main/java/org/apache/impala/analysis/UnnestExpr.java@68
PS2, Line 68:     // 'rawPath_' points to an array and we need a SlotRef to 
refer to the item of the
> Have we ensured that 'rawPath_' doesn't already contain "item"? See comment
In resolveAndVerifyRawPath() we check that the given path points to an array. 
This is enough to check in my opinion.


http://gerrit.cloudera.org:8080/#/c/17983/2/fe/src/main/java/org/apache/impala/analysis/UnnestExpr.java@142
PS2, Line 142:     Preconditions.checkNotNull(rawPath_);
> Why don't we just remove the last element from the list? Do we want to keep
It's fine to remove the item. I'm not sure why I've done it this way.


http://gerrit.cloudera.org:8080/#/c/17983/2/fe/src/main/java/org/apache/impala/analysis/UnnestExpr.java@146
PS2, Line 146:   }
             :
             :   @Override
             :   p
> Can't we just inherit this method without writing it out?
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/fe/src/main/java/org/apache/impala/analysis/UnnestExpr.java@154
PS2, Line 154:  }
             : }
             :
             :
> Can't we just inherit this method without writing it out?
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

http://gerrit.cloudera.org:8080/#/c/17983/2/fe/src/main/java/org/apache/impala/planner/PlanNode.java@164
PS2, Line 164:     for (CollectionTableRef collRef : tblRefs) {
             :       tupleIds_.add(collRef.getDesc().getId());
             :       tblRefIds_.add(collRef.getDesc().getId());
             :     }
> Couldn't we to this in the UnnestNode's constructor instead and avoid addin
The reason I do it here is that I have to pass a List of CollectionTableRef to 
UnnestNode, but then I won't be able to extract any tuple IDs from that list 
before calling the PlanNode's constructor as the "super(...)" call has to be 
the first instruction in a constructor.
I could pass all these IDs to the UnnestNode constructor in a separate 
parameter but that seemed redundant for me as they are already present in the 
CollectionTableRefs.


http://gerrit.cloudera.org:8080/#/c/17983/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/17983/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1047
PS2, Line 1047: the
> nit: extra "the"
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1060
PS2, Line 1060:     refs.removeAll(reducedCollectionRefs);
> Dumb question: is it sure that 'refs' doesn't contain duplicates? Can it be
You can't provide the same collection multiple times in the unnest so here you 
won't end up having duplicates.

select arr1.item from functional_parquet.arr_tbl t, UNNEST(t.arr1, t.arr1);
AnalysisException: Duplicate table alias: 'arr1'

When giving in select list, you can give the same array multiple times but it 
doesn't cause any trouble because only a single CollectionTableRef is created 
for them in the background:
select unnest(arr1), unnest(arr1) from functional_parquet.arr_tbl;
+------+------+
| item | item |
+------+------+
| 1    | 1    |
| 2    | 2    |
+------+------+


http://gerrit.cloudera.org:8080/#/c/17983/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@2175
PS2, Line 2175: ning of the gi
> Could you document what 'collectionRefs' is used for?
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/fe/src/main/java/org/apache/impala/planner/UnnestNode.java
File fe/src/main/java/org/apache/impala/planner/UnnestNode.java:

http://gerrit.cloudera.org:8080/#/c/17983/2/fe/src/main/java/org/apache/impala/planner/UnnestNode.java@74
PS2, Line 74:     super.init(analyzer);
            :     conjuncts_ = orderConjunctsByCost(conjuncts_);
            :
> This doesn't seem true anymore, right?
I guess it isn't :) Comment removed.


http://gerrit.cloudera.org:8080/#/c/17983/2/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/17983/2/testdata/data/README@662
PS2, Line 662: bles
> nit: double "to"
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/testdata/data/README@662
PS2, Line 662: The purpose of introducing these tables
> Nit: "The purpose OF introducing THESE tableS IS to ..."
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/testdata/workloads/functional-query/queries/QueryTest/zipping-unnest-in-from-clause.test
File 
testdata/workloads/functional-query/queries/QueryTest/zipping-unnest-in-from-clause.test:

http://gerrit.cloudera.org:8080/#/c/17983/2/testdata/workloads/functional-query/queries/QueryTest/zipping-unnest-in-from-clause.test@91
PS2, Line 91: the first one has some null
> The string array doesn't have NULL items in this test. Is it ok?
The comment is misleading. Only arr1 has nulls. Fixed the comment.


http://gerrit.cloudera.org:8080/#/c/17983/2/testdata/workloads/functional-query/queries/QueryTest/zipping-unnest-in-from-clause.test@183
PS2, Line 183: ====
> Would it add value if we also tested the combination (NULL, empty)?
>From the UnnestNode's perspective having a NULL or empty array as an input is 
>the same. I think it is fine not to cover that.


http://gerrit.cloudera.org:8080/#/c/17983/2/testdata/workloads/functional-query/queries/QueryTest/zipping-unnest-in-from-clause.test@222
PS2, Line 222: doesn't wo
> Nit: doesn't work
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/testdata/workloads/functional-query/queries/QueryTest/zipping-unnest-in-from-clause.test@311
PS2, Line 311: :
> Nit: a colon (:) would be better in the error message.
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/testdata/workloads/functional-query/queries/QueryTest/zipping-unnest-in-select-list.test
File 
testdata/workloads/functional-query/queries/QueryTest/zipping-unnest-in-select-list.test:

http://gerrit.cloudera.org:8080/#/c/17983/2/testdata/workloads/functional-query/queries/QueryTest/zipping-unnest-in-select-list.test@55
PS2, Line 55:
> Nit: q
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/testdata/workloads/functional-query/queries/QueryTest/zipping-unnest-in-select-list.test@76
PS2, Line 76: omp
> Nit: delete 'an'.
Done


http://gerrit.cloudera.org:8080/#/c/17983/2/testdata/workloads/functional-query/queries/QueryTest/zipping-unnest-in-select-list.test@138
PS2, Line 138: ---- QUERY
             : # Struct type is given for the unnest operator.
> The difference from the previous test is only the array in the FROM clause.
I wanted to differentiate between these two use cases:
1: unnesting the same array both in the select list and in the from clause
2: have an unnest both in the select list and from clause but not for the same 
arrays.



--
To view, visit http://gerrit.cloudera.org:8080/17983
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic58ff6579ecff03962e7a8698edfbe0684ce6cf7
Gerrit-Change-Number: 17983
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Comment-Date: Tue, 09 Nov 2021 14:48:12 +0000
Gerrit-HasComments: Yes

Reply via email to