Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/17847 )
Change subject: IMPALA-10838: Error when struct returned from WITH() ...................................................................... Patch Set 9: (20 comments) http://gerrit.cloudera.org:8080/#/c/17847/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17847/9//COMMIT_MSG@29 PS9, Line 29: in expression substitutions (in : org.apache.impala.analysis.Expr.substituteImpl), if there is no : substitution mapping for a field of a struct but there is a mapping : for an enclosing struct, find the mapping for that enclosing struct : and from its subexpressions use the one corresponding to the : original expression for substitution > If I'm not mistaken with this approach every time you unsuccessfully try to I think both solutions have upsides and downsides. If we have big structs and we select the majority of the subfields than adding all subfields in the first place would be better; if we don't select most of the subfields then we add them unnecessarily and pollute the substitute map. If we have small structs I think there isn't a big difference. But if you feel it would add value I can experiment with it. http://gerrit.cloudera.org:8080/#/c/17847/9//COMMIT_MSG@44 PS9, Line 44: Tests > Would it make sense to somehow assert on the 'tuple-size' in the scan node? Done, added a planner test. http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/Expr.java@1148 PS9, Line 1148: SlotRef overrides this method : // so we need no check here. > This part in my opinion is only relevant when the reader knew that here was You're right that it may be confusing. Reworded the comment to something (hopefully) less confusing. Actually it is this method, 'substituteImpl()' that SlotRef overrides, not resetAnalysisState(). In the overridden method 'resetAnalysisState()' is not called at all. http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/Expr.java@1154 PS9, Line 1154: protected final void callSubstituteImplOnChildren(ExprSubstitutionMap smap, > This doesn't add much just extracts the for loop into a different function. It is also called in SlotRef::substituteImpl() (SlotRef overrides substituteImpl()). http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java File fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java: http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@106 PS9, Line 106: structField > Do we know that 'structField' is in fact a field of a struct? Or do we perf We don't know here. Modified the function comment to include that in case it is not actually a struct field, null is returned. Also modified 'findEnclosingStruct()' which now checks whether 'structField' is indeed within a struct, see there. http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@106 PS9, Line 106: SlotRef enclosingStruct = findEnclosingStruct(structField, remainingPath); > If you read just this 4 lines of code you would have no idea what 'remainin Done http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@118 PS9, Line 118: // TODO: Can we somehow know we are in a struct? The following doesn't seem to work: Instead of checking getDesc().getParent().getParentSlotDesc() I added a check that looks at structFieldPath.getMatchedTypes: this list should be longer than one if there is a parent. http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@120 PS9, Line 120: // if (getDesc().getParent().getParentSlotDesc() == null) return null; > This stinks for me. In case structField is indeed a field of a struct then This is actually possible, for example in the query ``` with sub as ( select id, outer_struct from functional_orc_def.complextypes_nested_structs) select sub.id, sub.outer_struct.inner_struct2 from sub; ``` when the SlotRef corresponding to 'sub.outer_struct.inner_struct2' its SlotDescriptor is not in the itemTuple of 'outer_struct' but in the tuple descriptor of the inline view 'sub'. It is the substitution that is being performed here that will put 'sub.outer_struct.inner_struct2'. into the itemTuple of outer_struct. http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@124 PS9, Line 124: final Expr e = getLhs().get(i); > This 'e' is expected to be a struct, right? I'd do that check here and do a I don't really like continue, if the code grows in the future it may make it difficult to understand. Checking whether 'expr' is a struct is done in 'checkConditionsAndRemovePrefix'. Anyway, I refactored this part a bit, gave more describing names and added comments. Let's see what you think of the new version, we can still change it if you are not satisfied. http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@130 PS9, Line 130: // TODO: Is it always true? > In general, could you please put some effort to eliminate the new TODOs you Some actually are. I think this TODO can be removed, though. http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@141 PS9, Line 141: checkConditionsAndRemovePrefix > Is there a way to be more specific with the name. "checkConditions" could m Changed function and argument names. http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@148 PS9, Line 148: if (refPath == null) return null; > if 'expr'/'ref' is analyzed is there a case when it doesn't have a resolved I've had a look at it and there shouldn't be such a case. Replaced this with a precondition check. http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@153 PS9, Line 153: SlotRef root > could you emphasise more that 'root' is a struct? Done http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@154 PS9, Line 154: root > Shouldn't you check if 'root' has already been analysed before calling getR Done http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/Path.java File fe/src/main/java/org/apache/impala/analysis/Path.java: http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/Path.java@469 PS9, Line 469: path : * after removing the prefix > The return type is a list of strings so I guess this function returns a 'ra Done http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java: http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java@182 PS9, Line 182: public void getEnclosingSlotAndTupleDescs(List<SlotDescriptor> slotDescs, > Is this just a helper function used within SlotDescriptor? I haven't found This was inspired by SlotRef::getIdsHelper (https://github.com/apache/impala/blob/3f51a6a761bcb93675088afc155ccc3d2a380401/fe/src/main/java/org/apache/impala/analysis/SlotRef.java#L381), which is public, but you're right that we don't use it anywhere else so I agree that it could be private. http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/SlotRef.java File fe/src/main/java/org/apache/impala/analysis/SlotRef.java: http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@285 PS9, Line 285: public boolean hasDesc() { > nit: should fit into a single line Done http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@409 PS9, Line 409: // TODO: Is it possible that we don't YET have itemTupleDesc? > Could you dig deeper a bit to eliminate this TODO? It definitely happens so we can't assert it here. I've looked a bit into it and I think it happens with the expressions that have not been substituted. http://gerrit.cloudera.org:8080/#/c/17847/9/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@410 PS9, Line 410: // Preconditions.checkState(itemTupleDesc != null); > nit: commented code Done http://gerrit.cloudera.org:8080/#/c/17847/9/testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test File testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test: http://gerrit.cloudera.org:8080/#/c/17847/9/testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test@260 PS9, Line 260: ==== > I wonder if providing an inline view in a different way would also trigger No, the following query runs fine and returns the correct result: ``` SELECT id, outer_struct.str FROM (SELECT id,outer_struct FROM functional_orc_def.complextypes_nested_structs) v WHERE outer_struct.inner_struct2.i > 2; ``` Does it mean that this AnalysisException message is incorrect, that the problem is not complex types in subqueries but missing support for IN? Or is a inline view not really a subquery? -- To view, visit http://gerrit.cloudera.org:8080/17847 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iadb9233677355b85d424cc3f22b00b5a3bf61c57 Gerrit-Change-Number: 17847 Gerrit-PatchSet: 9 Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Comment-Date: Thu, 20 Jan 2022 18:48:55 +0000 Gerrit-HasComments: Yes