Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
......................................................................


Patch Set 5:

(9 comments)

Thanks for the comments!

Currently Hive allows you to create a table with a column named 
INPUT__FILE__NAME, so I didn't forbid that. In Hive such user columns shadow 
the virtual column, so I followed that behavior.

I opened IMPALA-11322 to track proper estimations for virtual columns. 
Currently we overestimate both memory and cardinality. I'm not sure if we need 
stats for the lengths, as we only allocate memory for INPUT__FILE__NAME per 
file per scanner, i.e. even very long file names are not a problem.

http://gerrit.cloudera.org:8080/#/c/18514/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18514/4//COMMIT_MSG@28
PS4, Line 28: Special care is needed for virtual columns when column masking/row
> nit: typo 'masking'
Done


http://gerrit.cloudera.org:8080/#/c/18514/4//COMMIT_MSG@29
PS4, Line 29: filtering is applicable on them. They are added as "hidden" select
> nit: typo 'filtering'
Done


http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
File fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java:

http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java@164
PS4, Line 164:       // Virtual columns are hidden in the masking view, which 
means they don't
             :       // participate in star expansion.
             :       // E.g. during masking the following query is rewritten 
(where vc is a virtual col):
             :       // SELECT vc, * FROM t; ===>
             :       //     SELECT vc, * FROM (SELECT MASK(vc) as vc, c1, c2, 
... FROM t) v;
             :       // In which case the '*' in the outer "SELECT vc, *" 
shouldn't contain 'v.vc'
             :       // because in that case it would be doubled:
             :       // SELECT vc, vc, c1, c2, ... FROM (...);
             :       // Hence virtual columns are hidden select list items. 
They are also hidden
             :       // when they are not masked, but other columns are.
> nit: most of this could be part of the VirtualColumn class comment.
I added that virtual columns are not included in star expansions. But I didn't 
want to explain table masking views there, to keep the comment simple.


http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java
File fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java:

http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java@24
PS4, Line 24:
> nit: Could we add some notes here, things I think would be useful:
Thanks, I added most of it.

I didn't add "it does not contain any table specific values", because we might 
need to store table stats in the future for proper cardinality estimations.


http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java@35
PS4, Line 35:
> Maybe we could use NONE here? Similar to SlotDescriptor. I am a bit afraid
We are returning a VirtualColumn here, not a TVirtualColumn.

Having a NONE virtual column instance would be weird a bit, as such virtual 
columns shouldn't exist. It could also potentially mask some bugs.


http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java:

http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@188
PS4, Line 188:
> nit: empty new line
Done


http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java:

http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java@366
PS4, Line 366:             "select input__file__name, * from 
functional_parquet.complextypestbl c, " +
> line too long (98 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java@373
PS4, Line 373:     AnalysisError(
> line too long (93 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java@380
PS4, Line 380:         "Could not resolve column/field reference: 
'c.int_array.input__file__name'");
> line too long (93 > 90)
Done



--
To view, visit http://gerrit.cloudera.org:8080/18514
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gfurnst...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Thu, 26 May 2022 17:31:47 +0000
Gerrit-HasComments: Yes

Reply via email to