Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 )
Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name ...................................................................... Patch Set 5: (9 comments) Thanks for the comments! Currently Hive allows you to create a table with a column named INPUT__FILE__NAME, so I didn't forbid that. In Hive such user columns shadow the virtual column, so I followed that behavior. I opened IMPALA-11322 to track proper estimations for virtual columns. Currently we overestimate both memory and cardinality. I'm not sure if we need stats for the lengths, as we only allocate memory for INPUT__FILE__NAME per file per scanner, i.e. even very long file names are not a problem. http://gerrit.cloudera.org:8080/#/c/18514/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18514/4//COMMIT_MSG@28 PS4, Line 28: Special care is needed for virtual columns when column masking/row > nit: typo 'masking' Done http://gerrit.cloudera.org:8080/#/c/18514/4//COMMIT_MSG@29 PS4, Line 29: filtering is applicable on them. They are added as "hidden" select > nit: typo 'filtering' Done http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java File fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java: http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java@164 PS4, Line 164: // Virtual columns are hidden in the masking view, which means they don't : // participate in star expansion. : // E.g. during masking the following query is rewritten (where vc is a virtual col): : // SELECT vc, * FROM t; ===> : // SELECT vc, * FROM (SELECT MASK(vc) as vc, c1, c2, ... FROM t) v; : // In which case the '*' in the outer "SELECT vc, *" shouldn't contain 'v.vc' : // because in that case it would be doubled: : // SELECT vc, vc, c1, c2, ... FROM (...); : // Hence virtual columns are hidden select list items. They are also hidden : // when they are not masked, but other columns are. > nit: most of this could be part of the VirtualColumn class comment. I added that virtual columns are not included in star expansions. But I didn't want to explain table masking views there, to keep the comment simple. http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java File fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java: http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java@24 PS4, Line 24: > nit: Could we add some notes here, things I think would be useful: Thanks, I added most of it. I didn't add "it does not contain any table specific values", because we might need to store table stats in the future for proper cardinality estimations. http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java@35 PS4, Line 35: > Maybe we could use NONE here? Similar to SlotDescriptor. I am a bit afraid We are returning a VirtualColumn here, not a TVirtualColumn. Having a NONE virtual column instance would be weird a bit, as such virtual columns shouldn't exist. It could also potentially mask some bugs. http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java File fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java: http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@188 PS4, Line 188: > nit: empty new line Done http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java: http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java@366 PS4, Line 366: "select input__file__name, * from functional_parquet.complextypestbl c, " + > line too long (98 > 90) Done http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java@373 PS4, Line 373: AnalysisError( > line too long (93 > 90) Done http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java@380 PS4, Line 380: "Could not resolve column/field reference: 'c.int_array.input__file__name'"); > line too long (93 > 90) Done -- To view, visit http://gerrit.cloudera.org:8080/18514 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61 Gerrit-Change-Number: 18514 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Gergely Fürnstáhl <gfurnst...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Tamas Mate <tma...@apache.org> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Thu, 26 May 2022 17:31:47 +0000 Gerrit-HasComments: Yes