Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/17847 )
Change subject: IMPALA-10838: Error when struct returned from WITH() ...................................................................... Patch Set 18: (20 comments) Nice patch! :) http://gerrit.cloudera.org:8080/#/c/17847/17/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/17847/17/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1415 PS17, Line 1415: List<Type> ancestorTypes = matchedTypes.subList(0, matchedTypes.size() - 1); No need to create a sublist here, just make a for loop that skips the last item. This way you can get rid of both the local Lists introduced here. http://gerrit.cloudera.org:8080/#/c/17847/17/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1438 PS17, Line 1438: * However, if the order is reversed, we are in a more difficult situation. If we try to Can't you get rid og this hassle if you changed the processing order of the expressions? E.g. always process structs first. http://gerrit.cloudera.org:8080/#/c/17847/17/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1462 PS17, Line 1462: int highestAncestorDistance) throws AnalysisException { Could you please describe the purpose of the function parameters in the comment above? http://gerrit.cloudera.org:8080/#/c/17847/17/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1525 PS17, Line 1525: structTuple.setParentSlotDesc(desc); nit: this could go to the top of this function so that a user could more easily see that we accept structs as parameter here. http://gerrit.cloudera.org:8080/#/c/17847/17/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java: http://gerrit.cloudera.org:8080/#/c/17847/17/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java@154 PS17, Line 154: // If we are inside a struct, we need to materialise all enclosing struct nit: I think this comment would be better at the callsite. http://gerrit.cloudera.org:8080/#/c/17847/17/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java File fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java: http://gerrit.cloudera.org:8080/#/c/17847/17/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@22 PS17, Line 22: import java.util.stream.Collectors; This is not needed anymore, right? http://gerrit.cloudera.org:8080/#/c/17847/17/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java File fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java: http://gerrit.cloudera.org:8080/#/c/17847/17/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java@336 PS17, Line 336: // createAuxPredicatesForStructMembers(analyzer, (SlotRef) colExpr); Honestly, I don't know much about aux predicates so won't be able to answer this. http://gerrit.cloudera.org:8080/#/c/17847/17/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/17847/17/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@286 PS17, Line 286: preregisterStructSlotRefPathsWithAnalyzer(); I have asked this question in another comment but let me duplicate it: Instead of pre-registering the struct SlotRefs here, wouldn't re-ordering the select list items help? E.g. Putting everything that involves a struct to the beginning of the select list. http://gerrit.cloudera.org:8080/#/c/17847/17/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@327 PS17, Line 327: // Preregister select list paths that point to structs. This will be used to unify nit: This should be a function comment in above the function signature. http://gerrit.cloudera.org:8080/#/c/17847/17/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@351 PS17, Line 351: List Could this be a HashSet to avoid duplicates? http://gerrit.cloudera.org:8080/#/c/17847/17/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@354 PS17, Line 354: List<Path> paths = slotRefs.stream().map(this::slotRefToResolvedPath) I observed you really don't like for loops and ifs :D http://gerrit.cloudera.org:8080/#/c/17847/17/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@361 PS17, Line 361: private Path slotRefToResolvedPath(SlotRef slotRef) { I think this should be a part of SlotRef instead of SelectStmt. The whole content of this function reminds me the beginning of SlotRef.analyzeImpl() anyway. http://gerrit.cloudera.org:8080/#/c/17847/17/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@361 PS17, Line 361: slotRefToResolvedPath The naming should be something different to emphasise that this is not just a get method but also resolves the rawPath. http://gerrit.cloudera.org:8080/#/c/17847/17/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@363 PS17, Line 363: Path resolvedPath = analyzer_.resolvePathWithMasking(slotRef.getRawPath(), If you do this resolvePathWithMasking() call here would the very same be still called when analyzing SlotRefs in the select list in analyzeSelectClause()? http://gerrit.cloudera.org:8080/#/c/17847/17/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@370 PS17, Line 370: } catch (AnalysisException e) { Shouldn't we just let the AnalysisException be re-thrown from this function? Is there a case when we just want to swallow the exception and deliberately continue the execution? http://gerrit.cloudera.org:8080/#/c/17847/17/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java: http://gerrit.cloudera.org:8080/#/c/17847/17/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java@204 PS17, Line 204: final TupleDescriptor structTupleDesc = parentStructSlotDesc != null ? No need to introduce 'structTupleDesc' as its only purpose is to be assigned to 'tupleDesc'. You could directly put these to 'tupleDesc'. http://gerrit.cloudera.org:8080/#/c/17847/17/fe/src/main/java/org/apache/impala/analysis/SlotRef.java File fe/src/main/java/org/apache/impala/analysis/SlotRef.java: http://gerrit.cloudera.org:8080/#/c/17847/17/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@266 PS17, Line 266: public List<String> getRawPath() { nit: fits in one line http://gerrit.cloudera.org:8080/#/c/17847/17/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: http://gerrit.cloudera.org:8080/#/c/17847/17/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@779 PS17, Line 779: public void testOnlyNeededStructFieldsMaterialized() throws ImpalaException { With this testing approach we wouldn't verify the exact size of a row just that the size of a row in one query matches the size of a row in another query. But we know the size of the structs under test here so we could be more specific with the asserts, right? e.g. Assume we query an int struct field in a regular select and using a WITH, and with this patch both would take let's say 1234567bytes for an integer instead of 4. This test would still succeed, right? http://gerrit.cloudera.org:8080/#/c/17847/17/testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test File testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test: http://gerrit.cloudera.org:8080/#/c/17847/17/testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test@283 PS17, Line 283: ---- QUERY The below 2 tests would be also good candidates to be checked for the row size as well in PlannerTest. http://gerrit.cloudera.org:8080/#/c/17847/17/testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test@334 PS17, Line 334: 'NULL','NULL' Could you please also add some tests where we select a struct and some of its fields (or do a WHERE on the fields) where the struct is inside a collection? -- To view, visit http://gerrit.cloudera.org:8080/17847 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iadb9233677355b85d424cc3f22b00b5a3bf61c57 Gerrit-Change-Number: 17847 Gerrit-PatchSet: 18 Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Comment-Date: Thu, 10 Mar 2022 15:24:15 +0000 Gerrit-HasComments: Yes