Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17847 )

Change subject: IMPALA-10838: Error when struct returned from WITH()
......................................................................


Patch Set 15:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17847/15/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
File fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java:

http://gerrit.cloudera.org:8080/#/c/17847/15/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java@294
PS15, Line 294: don;t
typo


http://gerrit.cloudera.org:8080/#/c/17847/15/fe/src/main/java/org/apache/impala/analysis/Path.java
File fe/src/main/java/org/apache/impala/analysis/Path.java:

http://gerrit.cloudera.org:8080/#/c/17847/15/fe/src/main/java/org/apache/impala/analysis/Path.java@475
PS15, Line 475:   public List<String> getRawPathWithoutPrefix(Path prefix) {
Are we still using this somewhere?


http://gerrit.cloudera.org:8080/#/c/17847/15/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java:

http://gerrit.cloudera.org:8080/#/c/17847/15/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java@194
PS15, Line 194:         parentStructSlotDesc.getParent() : null;
nit: +2 indentation


http://gerrit.cloudera.org:8080/#/c/17847/15/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java@218
PS15, Line 218:    * TODO: {type}.getSlotSize() is used in many other places, 
for example in
              :    * HdfsScanNode.java:getStatsNumRows and ColumnStats.java. Is 
it a problem? In those
              :    * places SlotDescriptors are not available, either only 
types or 'Column'.
Good question.
I don't know how stats handling deals with structs - do we completely ignore 
it, or calculate stats for members individually? I think that the size of a 
type should be irrelevant in both cases. If we would calculate stats for the 
structs themselves, then I think that only the number of nulls would be 
interesting.

In HdfsScanNode.java:getStatsNumRow we only check the size for scalar types:
https://github.com/apache/impala/blob/d59ec73990d89bfa4d4fa3d8fe598d53eb2918b7/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java#L1522

This code only runs in case there is no row count stat for a table or 
partition, so we have to estimate the number of rows based on the size of the 
files + ALL columns within. So if we would want to include structs in this 
estimate, we should simply include all members as if it was materialized. This 
is a very rough estimate anyway, so there is no clear good answer.

I think that stat handling of structs could totally deserve a separate jira :)


http://gerrit.cloudera.org:8080/#/c/17847/15/testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test
File 
testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test:

http://gerrit.cloudera.org:8080/#/c/17847/15/testdata/workloads/functional-query/queries/QueryTest/nested-struct-in-select-list.test@121
PS15, Line 121: ---- QUERY
Can you add tests where different kind of view-s are used, e.g. a WITH clause 
which is used within an inline view?

I have added tests like this in 
https://gerrit.cloudera.org/#/c/17811/31/testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test

e.g.
# Unnesting array returned by view wrapped in inline view + WITH clause.
with v2 as (select id, int_array from complextypes_arrays_only_view)
select v.id, a.item from (select id, int_array from v2) v, v.int_array a;



--
To view, visit http://gerrit.cloudera.org:8080/17847
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iadb9233677355b85d424cc3f22b00b5a3bf61c57
Gerrit-Change-Number: 17847
Gerrit-PatchSet: 15
Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 08 Feb 2022 10:17:25 +0000
Gerrit-HasComments: Yes

Reply via email to