Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17847 )

Change subject: IMPALA-10838: Error when struct returned from WITH()
......................................................................


Patch Set 9:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/17847/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17847/9//COMMIT_MSG@29
PS9, Line 29: in expression substitutions (in
            :     org.apache.impala.analysis.Expr.substituteImpl), if there is 
no
            :     substitution mapping for a field of a struct but there is a 
mapping
            :     for an enclosing struct, find the mapping for that enclosing 
struct
            :     and from its subexpressions use the one corresponding to the
            :     original expression for substitution
> If I'm not mistaken with this approach every time you unsuccessfully try to
I think both solutions have upsides and downsides. If we have big structs and 
we select the majority of the subfields than adding all subfields in the first 
place would be better; if we don't select most of the subfields then we add 
them unnecessarily and pollute the substitute map. If we have small structs I 
think there isn't a big difference.

But if you feel it would add value I can experiment with it.


http://gerrit.cloudera.org:8080/#/c/17847/9//COMMIT_MSG@44
PS9, Line 44: Tests
> Would it make sense to somehow assert on the 'tuple-size' in the scan node?
Done, added a planner test.


http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/Expr.java@1148
PS9, Line 1148: SlotRef overrides this method
              :     // so we need no check here.
> This part in my opinion is only relevant when the reader knew that here was
You're right that it may be confusing. Reworded the comment to something 
(hopefully) less confusing.

Actually it is this method, 'substituteImpl()' that SlotRef overrides, not 
resetAnalysisState(). In the overridden method 'resetAnalysisState()' is not 
called at all.


http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/Expr.java@1154
PS9, Line 1154:   protected final void 
callSubstituteImplOnChildren(ExprSubstitutionMap smap,
> This doesn't add much just extracts the for loop into a different function.
It is also called in SlotRef::substituteImpl() (SlotRef overrides 
substituteImpl()).


http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java
File fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java:

http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@106
PS9, Line 106: structField
> Do we know that 'structField' is in fact a field of a struct? Or do we perf
We don't know here. Modified the function comment to include that in case it is 
not actually a struct field, null is returned.

Also modified 'findEnclosingStruct()' which now checks whether  'structField' 
is indeed within a struct, see there.


http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@106
PS9, Line 106:     SlotRef enclosingStruct = findEnclosingStruct(structField, 
remainingPath);
> If you read just this 4 lines of code you would have no idea what 'remainin
Done


http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@118
PS9, Line 118:     // TODO: Can we somehow know we are in a struct? The 
following doesn't seem to work:
Instead of checking getDesc().getParent().getParentSlotDesc() I added a check 
that looks at structFieldPath.getMatchedTypes: this list should be longer than 
one if there is a parent.


http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@120
PS9, Line 120:     // if (getDesc().getParent().getParentSlotDesc() == null) 
return null;
> This stinks for me. In case structField is indeed a field of a struct then
This is actually possible, for example in the query
```
with sub as (                                                                   
                                                                                
                     
    select id, outer_struct from 
functional_orc_def.complextypes_nested_structs)                                 
                                                                                
                                        
select sub.id, sub.outer_struct.inner_struct2 from sub;
```

when the SlotRef corresponding to 'sub.outer_struct.inner_struct2' its 
SlotDescriptor is not in the itemTuple of 'outer_struct' but in the tuple 
descriptor of the inline view 'sub'. It is the substitution that is being 
performed here that will put 'sub.outer_struct.inner_struct2'. into the 
itemTuple of outer_struct.


http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@124
PS9, Line 124:       final Expr e = getLhs().get(i);
> This 'e' is expected to be a struct, right? I'd do that check here and do a
I don't really like continue, if the code grows in the future it may make it 
difficult to understand. Checking whether 'expr' is a struct is done in 
'checkConditionsAndRemovePrefix'.

Anyway, I refactored this part a bit, gave more describing names and added 
comments. Let's see what you think of the new version, we can still change it 
if you are not satisfied.


http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@130
PS9, Line 130:         // TODO: Is it always true?
> In general, could you please put some effort to eliminate the new TODOs you
Some actually are. I think this TODO can be removed, though.


http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@141
PS9, Line 141: checkConditionsAndRemovePrefix
> Is there a way to be more specific with the name. "checkConditions" could m
Changed function and argument names.


http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@148
PS9, Line 148:     if (refPath == null) return null;
> if 'expr'/'ref' is analyzed is there a case when it doesn't have a resolved
I've had a look at it and there shouldn't be such a case. Replaced this with a 
precondition check.


http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@153
PS9, Line 153: SlotRef root
> could you emphasise more that 'root' is a struct?
Done


http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@154
PS9, Line 154: root
> Shouldn't you check if 'root' has already been analysed before calling getR
Done


http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/Path.java
File fe/src/main/java/org/apache/impala/analysis/Path.java:

http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/Path.java@469
PS9, Line 469: path
             :    * after removing the prefix
> The return type is a list of strings so I guess this function returns a 'ra
Done


http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java:

http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java@182
PS9, Line 182:   public void getEnclosingSlotAndTupleDescs(List<SlotDescriptor> 
slotDescs,
> Is this just a helper function used within SlotDescriptor? I haven't found
This was inspired by SlotRef::getIdsHelper 
(https://github.com/apache/impala/blob/3f51a6a761bcb93675088afc155ccc3d2a380401/fe/src/main/java/org/apache/impala/analysis/SlotRef.java#L381),
 which is public, but you're right that we don't use it anywhere else so I 
agree that it could be private.


http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
File fe/src/main/java/org/apache/impala/analysis/SlotRef.java:

http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@285
PS9, Line 285:   public boolean hasDesc() {
> nit: should fit into a single line
Done


http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@409
PS9, Line 409:       // TODO: Is it possible that we don't YET have 
itemTupleDesc?
> Could you dig deeper a bit to eliminate this TODO?
It definitely happens so we can't assert it here. I've looked a bit into it and 
I think it happens with the expressions that have not been substituted.


http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@410
PS9, Line 410:       // Preconditions.checkState(itemTupleDesc != null);
> nit: commented code
Done


http://gerrit.cloudera.org:8080/#/c/17847/9/testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test
File 
testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test:

http://gerrit.cloudera.org:8080/#/c/17847/9/testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test@260
PS9, Line 260: ====
> I wonder if providing an inline view in a different way would also trigger
No, the following query runs fine and returns the correct result:
```
SELECT id, outer_struct.str FROM (SELECT id,outer_struct FROM 
functional_orc_def.complextypes_nested_structs) v
WHERE outer_struct.inner_struct2.i > 2;
```

Does it mean that this AnalysisException message is incorrect, that the problem 
is not complex types in subqueries but missing support for IN? Or is a inline 
view not really a subquery?



--
To view, visit http://gerrit.cloudera.org:8080/17847
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iadb9233677355b85d424cc3f22b00b5a3bf61c57
Gerrit-Change-Number: 17847
Gerrit-PatchSet: 9
Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Comment-Date: Thu, 20 Jan 2022 18:48:55 +0000
Gerrit-HasComments: Yes

Reply via email to