Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17847 )
Change subject: IMPALA-10838: Error when struct returned from WITH() ...................................................................... Patch Set 15: (5 comments) http://gerrit.cloudera.org:8080/#/c/17847/15/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java File fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java: http://gerrit.cloudera.org:8080/#/c/17847/15/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java@294 PS15, Line 294: don;t typo http://gerrit.cloudera.org:8080/#/c/17847/15/fe/src/main/java/org/apache/impala/analysis/Path.java File fe/src/main/java/org/apache/impala/analysis/Path.java: http://gerrit.cloudera.org:8080/#/c/17847/15/fe/src/main/java/org/apache/impala/analysis/Path.java@475 PS15, Line 475: public List<String> getRawPathWithoutPrefix(Path prefix) { Are we still using this somewhere? http://gerrit.cloudera.org:8080/#/c/17847/15/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java: http://gerrit.cloudera.org:8080/#/c/17847/15/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java@194 PS15, Line 194: parentStructSlotDesc.getParent() : null; nit: +2 indentation http://gerrit.cloudera.org:8080/#/c/17847/15/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java@218 PS15, Line 218: * TODO: {type}.getSlotSize() is used in many other places, for example in : * HdfsScanNode.java:getStatsNumRows and ColumnStats.java. Is it a problem? In those : * places SlotDescriptors are not available, either only types or 'Column'. Good question. I don't know how stats handling deals with structs - do we completely ignore it, or calculate stats for members individually? I think that the size of a type should be irrelevant in both cases. If we would calculate stats for the structs themselves, then I think that only the number of nulls would be interesting. In HdfsScanNode.java:getStatsNumRow we only check the size for scalar types: https://github.com/apache/impala/blob/d59ec73990d89bfa4d4fa3d8fe598d53eb2918b7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java#L1522 This code only runs in case there is no row count stat for a table or partition, so we have to estimate the number of rows based on the size of the files + ALL columns within. So if we would want to include structs in this estimate, we should simply include all members as if it was materialized. This is a very rough estimate anyway, so there is no clear good answer. I think that stat handling of structs could totally deserve a separate jira :) http://gerrit.cloudera.org:8080/#/c/17847/15/testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test File testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test: http://gerrit.cloudera.org:8080/#/c/17847/15/testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test@121 PS15, Line 121: ---- QUERY Can you add tests where different kind of view-s are used, e.g. a WITH clause which is used within an inline view? I have added tests like this in https://gerrit.cloudera.org/#/c/17811/31/testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test e.g. # Unnesting array returned by view wrapped in inline view + WITH clause. with v2 as (select id, int_array from complextypes_arrays_only_view) select v.id, a.item from (select id, int_array from v2) v, v.int_array a; -- To view, visit http://gerrit.cloudera.org:8080/17847 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iadb9233677355b85d424cc3f22b00b5a3bf61c57 Gerrit-Change-Number: 17847 Gerrit-PatchSet: 15 Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Comment-Date: Tue, 08 Feb 2022 10:17:25 +0000 Gerrit-HasComments: Yes