[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18514 ) Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name ..

IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

Hive has a virtual column, INPUT__FILE__NAME, which returns the name of the data file that stores the row. It can be used in several ways; see the two Jira tickets above for examples. This virtual column is also needed to support position-based delete files in Iceberg V2 tables.

This patch also lays the foundation for further table-level virtual columns. Virtual columns are stored at the table level, in a list separate from the table schema. During path resolution in Path.resolve() we also try to resolve virtual columns. Slot descriptors also record whether they refer to a virtual column.

Currently we only add the INPUT__FILE__NAME virtual column. Its value can be set in the template tuple of the scanners. All kinds of operations are possible on this virtual column: users can invoke functions on it, filter rows by it, group by it, etc.

Special care is needed for virtual columns when column masking/row filtering applies to them. They are added as "hidden" select list items to the table masking views, which means they are not included when * is expanded. They still need to be included in * expansion, though, when they come from user-written views.
Testing:
 * analyzer tests
 * added e2e tests

Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Reviewed-on: http://gerrit.cloudera.org:8080/18514
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
M be/src/exec/file-metadata-utils.cc
M be/src/exec/file-metadata-utils.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M common/thrift/Descriptors.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/SelectListItem.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/FeTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/StructField.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-complextypes.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-in-table.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test
M tests/authorization/test_ranger.py
M tests/query_test/test_scanners.py
35 files changed, 882 insertions(+), 64 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/18514
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy
Gerrit-Reviewer: Gergely Fürnstáhl
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Quanlong Huang
Gerrit-Reviewer: Tamas Mate
Gerrit-Reviewer: Zoltan Borok-Nagy
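A minimal sketch of the resolution order described in the commit message, under loud assumptions: `VirtualColumnSketch`, `Table`, `Column`, and `resolve()` are hypothetical stand-ins (Java 16+), not Impala's `FeTable`, `Path.resolve()`, or `VirtualColumn` APIs. It keeps virtual columns in a list separate from the schema and consults the schema first, which reproduces the Hive behavior mentioned later in the review thread, where a user column named INPUT__FILE__NAME shadows the virtual column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical, simplified model; these are not Impala's real catalog/analysis classes.
public class VirtualColumnSketch {
  // isVirtual stands in for the flag that slot descriptors carry in the patch.
  record Column(String name, boolean isVirtual) {}

  static final class Table {
    final List<Column> schema = new ArrayList<>();
    // Virtual columns live in their own list, separate from the table schema.
    final List<Column> virtualColumns = List.of(new Column("input__file__name", true));

    Table(List<String> columnNames) {
      for (String name : columnNames) {
        schema.add(new Column(name.toLowerCase(Locale.ROOT), false));
      }
    }

    // Schema columns are checked first, so a user column with the same name
    // shadows the virtual column.
    Optional<Column> resolve(String name) {
      String key = name.toLowerCase(Locale.ROOT);
      for (Column c : schema) {
        if (c.name().equals(key)) return Optional.of(c);
      }
      for (Column c : virtualColumns) {
        if (c.name().equals(key)) return Optional.of(c);
      }
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    Table plain = new Table(List.of("id", "bool_col"));
    System.out.println(plain.resolve("INPUT__FILE__NAME").map(Column::isVirtual)); // Optional[true]
    Table shadowing = new Table(List.of("id", "INPUT__FILE__NAME"));
    System.out.println(shadowing.resolve("input__file__name").map(Column::isVirtual)); // Optional[false]
    System.out.println(plain.resolve("no_such_col").isPresent()); // false
  }
}
```

Checking the schema before the virtual-column list is the simplest way to get Hive-compatible shadowing without forbidding such column names.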
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 ) Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. Patch Set 9: Verified+1 -- Gerrit-MessageType: comment Gerrit-PatchSet: 9 Gerrit-Comment-Date: Wed, 08 Jun 2022 13:02:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 ) Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10727/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- Gerrit-MessageType: comment Gerrit-PatchSet: 8 Gerrit-Comment-Date: Wed, 08 Jun 2022 08:48:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 ) Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. Patch Set 8: Code-Review+2 (1 comment) Carry +2. Thanks everyone for the review! Yeah, currently the estimates are a bit off. We overestimate the NDV and underestimate the average row size. http://gerrit.cloudera.org:8080/#/c/18514/7/be/src/exec/file-metadata-utils.cc File be/src/exec/file-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/18514/7/be/src/exec/file-metadata-utils.cc@72 PS7, Line 72: slot->ptr = filename_copy; : slot->len = le > nit: need spaces around "=" Done -- Gerrit-MessageType: comment Gerrit-PatchSet: 8 Gerrit-Comment-Date: Wed, 08 Jun 2022 08:31:53 + Gerrit-HasComments: Yes
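The `slot->ptr = filename_copy; slot->len = len` lines under review fill the INPUT__FILE__NAME slot of a scanner's template tuple, which the commit message names as the place where the value is set. A rough Java analogy, assuming the template row is copied into every output row of a file: the file name is materialized once per file and each row only references it, which matches the reply further down this thread saying memory is allocated per file per scanner. `TemplateTupleSketch`, `scanFile()`, and the example path are hypothetical; the real scanners work on C++ tuples, not `Object[]` rows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a per-file "template row"; not Impala's C++ scanner code.
public class TemplateTupleSketch {
  // Returns one output row per input row; the last slot holds INPUT__FILE__NAME.
  static List<Object[]> scanFile(String path, List<Object[]> dataRows, int numDataCols) {
    Object[] template = new Object[numDataCols + 1];
    // One copy of the file name per file, analogous to filename_copy in the review.
    template[numDataCols] = new String(path);
    List<Object[]> out = new ArrayList<>();
    for (Object[] data : dataRows) {
      Object[] row = template.clone(); // shallow copy: every row shares the same String
      System.arraycopy(data, 0, row, 0, numDataCols);
      out.add(row);
    }
    return out;
  }

  public static void main(String[] args) {
    List<Object[]> rows = scanFile("hdfs://nn/warehouse/t/data_0.parq",
        List.of(new Object[] {1, "a"}, new Object[] {2, "b"}), 2);
    System.out.println(rows.get(0)[2]);                   // hdfs://nn/warehouse/t/data_0.parq
    System.out.println(rows.get(0)[2] == rows.get(1)[2]); // true: one shared copy per file
  }
}
```

Because the per-row cost is only a reference (a pointer/length pair in the C++ code), the file-name length mostly matters for operators that later copy the value, which is the estimation concern raised by Tamas Mate in this thread.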
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Hello Quanlong Huang, Tamas Mate, Gergely Fürnstáhl, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18514 to look at the new patch set (#8). Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. 35 files changed, 882 insertions(+), 64 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/18514/8 -- Gerrit-MessageType: newpatchset Gerrit-PatchSet: 8
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 ) Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. Patch Set 9: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8201/ DRY_RUN=false -- Gerrit-MessageType: comment Gerrit-PatchSet: 9 Gerrit-Comment-Date: Wed, 08 Jun 2022 08:32:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 ) Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. Patch Set 9: Code-Review+2 -- Gerrit-MessageType: comment Gerrit-PatchSet: 9 Gerrit-Comment-Date: Wed, 08 Jun 2022 08:32:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 ) Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. Patch Set 7: Code-Review+2 (2 comments) Thanks for adding the column masking tests! Carrying Tamas's +1. For the estimation issue, I'm +1 on dealing with it in follow-up JIRAs. http://gerrit.cloudera.org:8080/#/c/18514/7/be/src/exec/file-metadata-utils.cc File be/src/exec/file-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/18514/7/be/src/exec/file-metadata-utils.cc@72 PS7, Line 72: slot->ptr=filename_copy; : slot->len=len; nit: need spaces around "=" http://gerrit.cloudera.org:8080/#/c/18514/3/testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test File testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test: http://gerrit.cloudera.org:8080/#/c/18514/3/testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test@4 PS3, Line 4: select input__file__name, * from alltypestiny; > Thanks for noticing that. I have to admit it wasn't trivial to fix for me, Cool! Thanks for fixing it! -- Gerrit-MessageType: comment Gerrit-PatchSet: 7 Gerrit-Comment-Date: Wed, 08 Jun 2022 02:27:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 ) Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. Patch Set 7: Code-Review+1 Thanks for the update and for opening the Jira, Zoltan! The change LGTM! I just wanted to discuss one more thing regarding estimations: I think the nodes above the scan nodes could underestimate the memory requirements for virtual columns. E.g., when joining two tables that have 10,000 files each, where the scanner returns 1 row from each file: the INPUT__FILE__NAME column is currently estimated to be 12 B, but a path can be 100+ B, which means the join will have to deal with 2 MB instead of the estimated 0.24 MB. -- Gerrit-MessageType: comment Gerrit-PatchSet: 7 Gerrit-Comment-Date: Tue, 07 Jun 2022 14:08:26 + Gerrit-HasComments: No
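The figures in the comment above can be checked with a short calculation. The per-table file count is an assumption here: 10,000 files per table, with one row returned per file, is the count that reproduces both quoted totals (decimal megabytes, 12 B estimated versus roughly 100 B actual per file name).

```latex
\begin{aligned}
\text{rows} &= 2 \times 10{,}000 \times 1 = 20{,}000 \\
\text{estimated size} &= 20{,}000 \times 12\,\text{B} = 240{,}000\,\text{B} = 0.24\,\text{MB} \\
\text{actual size} &\approx 20{,}000 \times 100\,\text{B} = 2{,}000{,}000\,\text{B} = 2\,\text{MB}
\end{aligned}
```

The underestimate is therefore roughly 100/12, or about 8x, for the INPUT__FILE__NAME column alone.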
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 ) Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. Patch Set 7: Verified+1 -- Gerrit-MessageType: comment Gerrit-PatchSet: 7 Gerrit-Comment-Date: Thu, 02 Jun 2022 16:25:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 ) Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10683/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- Gerrit-MessageType: comment Gerrit-PatchSet: 7 Gerrit-Comment-Date: Thu, 02 Jun 2022 12:15:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 ) Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8178/ DRY_RUN=true -- Gerrit-MessageType: comment Gerrit-PatchSet: 7 Gerrit-Comment-Date: Thu, 02 Jun 2022 11:57:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Hello Quanlong Huang, Tamas Mate, Gergely Fürnstáhl, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18514 to look at the new patch set (#7). Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. 35 files changed, 882 insertions(+), 64 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/18514/7 -- Gerrit-MessageType: newpatchset Gerrit-PatchSet: 7
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 ) Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. Patch Set 6: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8170/ -- Gerrit-MessageType: comment Gerrit-PatchSet: 6 Gerrit-Comment-Date: Wed, 01 Jun 2022 17:19:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 ) Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10670/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- Gerrit-MessageType: comment Gerrit-PatchSet: 6 Gerrit-Comment-Date: Wed, 01 Jun 2022 13:09:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Hello Quanlong Huang, Tamas Mate, Gergely Fürnstáhl, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18514 to look at the new patch set (#6). Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. 35 files changed, 886 insertions(+), 68 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/18514/6 -- Gerrit-MessageType: newpatchset Gerrit-PatchSet: 6
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 ) Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8170/ DRY_RUN=true -- Gerrit-MessageType: comment Gerrit-PatchSet: 6 Gerrit-Comment-Date: Wed, 01 Jun 2022 12:50:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 ) Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10647/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- Gerrit-MessageType: comment Gerrit-PatchSet: 5 Gerrit-Comment-Date: Thu, 26 May 2022 17:37:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 ) Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. Patch Set 5: (9 comments) Thanks for the comments! Currently Hive allows you to create a table with a column named INPUT__FILE__NAME, so I didn't forbid that. In Hive such user columns shadow the virtual column, so I followed that behavior. I opened IMPALA-11322 to track proper estimations for virtual columns. Currently we overestimate both memory and cardinality. I'm not sure if we need stats for the lengths, as we only allocate memory for INPUT__FILE__NAME per file per scanner, i.e. even very long file names are not a problem. http://gerrit.cloudera.org:8080/#/c/18514/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18514/4//COMMIT_MSG@28 PS4, Line 28: Special care is needed for virtual columns when column masking/row > nit: typo 'masking' Done http://gerrit.cloudera.org:8080/#/c/18514/4//COMMIT_MSG@29 PS4, Line 29: filtering is applicable on them. They are added as "hidden" select > nit: typo 'filtering' Done http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java File fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java: http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java@164 PS4, Line 164: // Virtual columns are hidden in the masking view, which means they don't : // participate in star expansion. : // E.g. during masking the following query is rewritten (where vc is a virtual col): : // SELECT vc, * FROM t; ===> : // SELECT vc, * FROM (SELECT MASK(vc) as vc, c1, c2, ... FROM t) v; : // In which case the '*' in the outer "SELECT vc, *" shouldn't contain 'v.vc' : // because in that case it would be doubled: : // SELECT vc, vc, c1, c2, ... FROM (...); : // Hence virtual columns are hidden select list items. 
They are also hidden : // when they are not masked, but other columns are. > nit: most of this could be part of the VirtualColumn class comment. I added that virtual columns are not included in star expansions, but I didn't want to explain table masking views there, to keep the comment simple. http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java File fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java: http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java@24 PS4, Line 24: > nit: Could we add some notes here, things I think would be useful: Thanks, I added most of it. I didn't add "it does not contain any table specific values", because we might need to store table stats in the future for proper cardinality estimations. http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java@35 PS4, Line 35: > Maybe we could use NONE here? Similar to SlotDescriptor. I am a bit afraid We are returning a VirtualColumn here, not a TVirtualColumn. Having a NONE virtual column instance would be a bit weird, as such a virtual column shouldn't exist. It could also potentially mask some bugs.
http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java File fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java: http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@188 PS4, Line 188: > nit: empty new line Done http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java: http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java@366 PS4, Line 366: "select input__file__name, * from functional_parquet.complextypestbl c, " + > line too long (98 > 90) Done http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java@373 PS4, Line 373: AnalysisError( > line too long (93 > 90) Done http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java@380 PS4, Line 380: "Could not resolve column/field reference: 'c.int_array.input__file__name'"); > line too long (93 > 90) Done -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61 Gerrit-Change-Number: 18514 Gerrit-PatchSet: 5 Gerrit-Owner:
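The reply above says that a user column named INPUT__FILE__NAME is allowed and shadows the virtual column, following Hive. The sketch below is a simplified model of that lookup order; it is not Impala's Path.resolve(). Schema columns and virtual columns are kept in separate lists, and the schema list is searched first. The class, field, and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified model (not Impala's Path.resolve()) of the lookup
// order described above: schema columns are checked before virtual columns,
// so a user column named INPUT__FILE__NAME shadows the virtual column.
final class VirtualColumnShadowing {
  static final class Column {
    final String name;
    final boolean isVirtual;

    Column(String name, boolean isVirtual) {
      this.name = name.toLowerCase(Locale.ROOT);  // names compared case-insensitively
      this.isVirtual = isVirtual;
    }
  }

  static final class Table {
    private final List<Column> schemaCols = new ArrayList<>();
    // Virtual columns are kept in a separate list from the table schema.
    private final List<Column> virtualCols = new ArrayList<>();

    Table(List<String> schemaColNames) {
      for (String n : schemaColNames) schemaCols.add(new Column(n, false));
      virtualCols.add(new Column("INPUT__FILE__NAME", true));
    }

    // Resolves a single column name; returns null if nothing matches.
    Column resolve(String name) {
      String key = name.toLowerCase(Locale.ROOT);
      for (Column c : schemaCols) {
        if (c.name.equals(key)) return c;  // user columns win
      }
      for (Column c : virtualCols) {
        if (c.name.equals(key)) return c;
      }
      return null;
    }
  }
}
```

Searching the schema list first is what makes shadowing fall out naturally. The alternative Tamas raised, reserving the name as a keyword, would reject such tables instead.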
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Hello Quanlong Huang, Tamas Mate, Gergely Fürnstáhl, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18514 to look at the new patch set (#5). Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name Hive has virtual column INPUT__FILE__NAME which returns the data file name that stores the actual row. It can be used in several ways, see the above two Jira tickets for examples. This virtual column is also needed to support position-based delete files in Iceberg V2 tables. This patch also adds the foundations to support further table-level virtual columns later. Virtual columns are stored at the table level in a separate list from the table schema. During path resolution in Path.resolve() we also try to resolve virtual columns. Slot descriptors also store the information whether they refer to a virtual column. Currently we only add the INPUT__FILE__NAME virtual column. The value of this column can be set in the template tuple of the scanners. All kinds of operations are possible on this virtual column, users can invoke additional functions on it, can filter rows, can group by, etc. Special care is needed for virtual columns when column masking/row filtering is applicable on them. They are added as "hidden" select list items to the table masking views which means they don't expand by * expressions. They still need to be included in * expressions though when they are coming from user-written views. 
Testing: * analyzer tests * added e2e tests Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61 --- M be/src/exec/file-metadata-utils.cc M be/src/exec/file-metadata-utils.h M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner.cc M be/src/exec/orc-column-readers.cc M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M common/thrift/CatalogObjects.thrift M common/thrift/Descriptors.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/SelectListItem.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/FeTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/StructField.java M fe/src/main/java/org/apache/impala/catalog/Table.java A fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-complextypes.test A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-in-table.test A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test M 
tests/authorization/test_ranger.py M tests/query_test/test_scanners.py 35 files changed, 885 insertions(+), 68 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/18514/5 -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61 Gerrit-Change-Number: 18514 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy
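The commit message states that the value of this column can be set in the template tuple of the scanners. Zoltan's later reply notes that memory for INPUT__FILE__NAME is only allocated per file per scanner. The sketch below models that idea in Java only for illustration; the real scanners are C++ in be/src/exec. The class name, the two-slot layout, and scanFile() are assumptions. Values that are constant for a file are written once into a template, and each row starts as a copy of it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the "template tuple" idea: slots whose value is the
// same for every row of a file (such as INPUT__FILE__NAME) are filled once per
// file, and each materialized row starts as a copy of that template.
final class TemplateTupleSketch {
  static final class Row {
    final Object[] slots;

    Row(Object[] slots) { this.slots = slots; }
  }

  // Assumed slot layout for this sketch: slot 0 = INPUT__FILE__NAME, slot 1 = a data column.
  static List<Row> scanFile(String filePath, List<Integer> fileValues) {
    Object[] template = new Object[2];
    template[0] = filePath;  // written once per file, not once per row
    List<Row> rows = new ArrayList<>();
    for (Integer v : fileValues) {
      Object[] slots = Arrays.copyOf(template, template.length);  // start from the template
      slots[1] = v;  // only per-row data is written here
      rows.add(new Row(slots));
    }
    return rows;
  }
}
```

In this model every row of a file refers to the same file-name string. That is why long file names add little memory overhead per row.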
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 ) Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. Patch Set 4: (6 comments) Hi Zoltan, pretty nifty change! I think we should define input__file__name as a keyword: currently, if I create a table with a column named input__file__name, the insert succeeds, and then 'select *' leaks the file name. In addition, this column is not considered during estimations, so the row size estimates will be off. Maybe in a future Jira we could track file name length statistics as well. http://gerrit.cloudera.org:8080/#/c/18514/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18514/4//COMMIT_MSG@28 PS4, Line 28: Special care is needed for virtual columns when column masing/row nit: typo 'masking' http://gerrit.cloudera.org:8080/#/c/18514/4//COMMIT_MSG@29 PS4, Line 29: fitering is applicable on them. They are added as "hidden" select nit: typo 'filtering' http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java File fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java: http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java@164 PS4, Line 164: // Virtual columns are hidden in the masking view, which means they don't : // participate in star expansion. : // E.g. during masking the following query is rewritten (where vc is a virtual col): : // SELECT vc, * FROM t; ===> : // SELECT vc, * FROM (SELECT MASK(vc) as vc, c1, c2, ... FROM t) v; : // In which case the '*' in the outer "SELECT vc, *" shouldn't contain 'v.vc' : // because in that case it would be doubled: : // SELECT vc, vc, c1, c2, ... FROM (...); : // Hence virtual columns are hidden select list items. They are also hidden : // when they are not masked, but other columns are. nit: most of this could be part of the VirtualColumn class comment.
http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java File fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java: http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java@24 PS4, Line 24: nit: Could we add some notes here, things I think would be useful: - what is a virtual column - singleton, it does not contain any table specific values - added to every Table object - skipped from * expansion, masking http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java@35 PS4, Line 35: null Maybe we could use NONE here? Similar to SlotDescriptor. I am a bit afraid of a possible NPE. http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java File fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java: http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@188 PS4, Line 188: nit: empty new line -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61 Gerrit-Change-Number: 18514 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 24 May 2022 12:28:06 + Gerrit-HasComments: Yes
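The InlineViewRef comment quoted in this review explains the problem. When "SELECT vc, * FROM t" is rewritten over the masking view "(SELECT MASK(vc) as vc, c1, c2, ... FROM t) v", the outer '*' must not pick up vc again. The commit message adds that virtual columns selected in user-written views still do expand. Below is a minimal model of hidden select list items. The names and the boolean flag are assumptions; Impala's real SelectListItem and star expansion in SelectStmt are more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of star expansion over a view whose select list may
// contain "hidden" items: they can still be referenced by name, but '*'
// skips them, so a masked virtual column is not emitted twice.
final class HiddenStarExpansion {
  static final class SelectListItem {
    final String alias;
    final boolean hidden;

    SelectListItem(String alias, boolean hidden) {
      this.alias = alias;
      this.hidden = hidden;
    }
  }

  // Expands '*' against the view's select list, skipping hidden items.
  static List<String> expandStar(List<SelectListItem> viewItems) {
    List<String> out = new ArrayList<>();
    for (SelectListItem item : viewItems) {
      if (!item.hidden) out.add(item.alias);
    }
    return out;
  }

  // Builds the outer select list for "SELECT <explicit...>, * FROM <view>".
  static List<String> outerSelectList(List<String> explicit, List<SelectListItem> viewItems) {
    List<String> out = new ArrayList<>(explicit);
    out.addAll(expandStar(viewItems));
    return out;
  }
}
```

For a masking view, vc would be marked hidden, so "SELECT vc, *" yields vc, c1, c2 rather than vc, vc, c1, c2. A user-written view that explicitly selects vc would mark it as not hidden, so '*' over that view includes it.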
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 ) Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. Patch Set 4: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8140/ -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61 Gerrit-Change-Number: 18514 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 23 May 2022 13:11:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 ) Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8140/ DRY_RUN=true -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61 Gerrit-Change-Number: 18514 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 23 May 2022 08:48:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 ) Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10607/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61 Gerrit-Change-Number: 18514 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 20 May 2022 16:04:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 ) Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. Patch Set 4: (4 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/18514/3/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/18514/3/be/src/exec/hdfs-scan-node-base.cc@432 PS3, Line 432: > nit: no indention? Done http://gerrit.cloudera.org:8080/#/c/18514/3/fe/src/main/java/org/apache/impala/analysis/Path.java File fe/src/main/java/org/apache/impala/analysis/Path.java: http://gerrit.cloudera.org:8080/#/c/18514/3/fe/src/main/java/org/apache/impala/analysis/Path.java@279 PS3, Line 279: if (rootTable_ == null) return false; : if (rootDesc_ != null) { : if (rootDesc_.getType() != rootTable_.getType().getItemType()) { : // 'rootDesc_' describes a collection tuple. Currently we only allow virtual : // columns at the table-level. : return false; : } : } : if (rawPath_.size() != 1) return false; > Can we add some test coverage in AnalyzerTest for these? Added a few tests to AnalyzerTest. http://gerrit.cloudera.org:8080/#/c/18514/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/18514/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@633 PS3, Line 633: addVirtualColumn(VirtualColu > Can we do this in Table#clearColumns()? 
The function name doesn't indicate Done http://gerrit.cloudera.org:8080/#/c/18514/3/testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test File testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test: http://gerrit.cloudera.org:8080/#/c/18514/3/testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test@4 PS3, Line 4: select input__file__name, * from alltypestiny; > I checked the Hive behaviors and found that it's allowed to use INPUT__FILE Thanks for noticing that. I have to admit it wasn't trivial for me to fix, but now I think I got it right. Please take a look. -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61 Gerrit-Change-Number: 18514 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 20 May 2022 15:45:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Hello Quanlong Huang, Tamas Mate, Gergely Fürnstáhl, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18514 to look at the new patch set (#4). Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name Hive has virtual column INPUT__FILE__NAME which returns the data file name that stores the actual row. It can be used in several ways, see the above two Jira tickets for examples. This virtual column is also needed to support position-based delete files in Iceberg V2 tables. This patch also adds the foundations to support further table-level virtual columns later. Virtual columns are stored at the table level in a separate list from the table schema. During path resolution in Path.resolve() we also try to resolve virtual columns. Slot descriptors also store the information whether they refer to a virtual column. Currently we only add the INPUT__FILE__NAME virtual column. The value of this column can be set in the template tuple of the scanners. All kinds of operations are possible on this virtual column, users can invoke additional functions on it, can filter rows, can group by, etc. Special care is needed for virtual columns when column masing/row fitering is applicable on them. They are added as "hidden" select list items to the table masking views which means they don't expand by * expressions. They still need to be included in * expressions though when they are coming from user-written views. 
Testing: * analyzer tests * added e2e tests Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61 --- M be/src/exec/file-metadata-utils.cc M be/src/exec/file-metadata-utils.h M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner.cc M be/src/exec/orc-column-readers.cc M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M common/thrift/CatalogObjects.thrift M common/thrift/Descriptors.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/SelectListItem.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/FeTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/StructField.java M fe/src/main/java/org/apache/impala/catalog/Table.java A fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-complextypes.test A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test M tests/authorization/test_ranger.py M tests/query_test/test_scanners.py 34 files changed, 834 insertions(+), 63 
deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/18514/4 -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61 Gerrit-Change-Number: 18514 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 ) Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java: http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java@366 PS4, Line 366: "select input__file__name, * from functional_parquet.complextypestbl c, c.int_array"); line too long (98 > 90) http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java@373 PS4, Line 373: "select id, nested_struct.input__file__name from functional_parquet.complextypestbl", line too long (93 > 90) http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java@380 PS4, Line 380: "select id, nested_struct.input__file__name from functional_parquet.complextypestbl", line too long (93 > 90) -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61 Gerrit-Change-Number: 18514 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 20 May 2022 15:45:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 ) Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/18514/3/testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test File testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test: http://gerrit.cloudera.org:8080/#/c/18514/3/testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test@4 PS3, Line 4: select input__file__name, * from alltypestiny; > Can we copy some tests here to testdata/workloads/functional-query/queries/ I checked the Hive behaviors and found that it's allowed to use INPUT__FILE__NAME when there are effective masking policies. I also realized that Impala allows SHOW FILES statements in such cases. So I think we should support INPUT__FILE__NAME with masking policies in Impala. -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61 Gerrit-Change-Number: 18514 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 16 May 2022 00:35:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 ) Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. Patch Set 3: (4 comments) The patch looks pretty good to me! I'm going to do another round of more detailed review. Left some comments first. http://gerrit.cloudera.org:8080/#/c/18514/3/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/18514/3/be/src/exec/hdfs-scan-node-base.cc@432 PS3, Line 432: nit: no indention? http://gerrit.cloudera.org:8080/#/c/18514/3/fe/src/main/java/org/apache/impala/analysis/Path.java File fe/src/main/java/org/apache/impala/analysis/Path.java: http://gerrit.cloudera.org:8080/#/c/18514/3/fe/src/main/java/org/apache/impala/analysis/Path.java@279 PS3, Line 279: if (rootTable_ == null) return false; : if (rootDesc_ != null) { : if (rootDesc_.getType() != rootTable_.getType().getItemType()) { : // 'rootDesc_' describes a collection tuple. Currently we only allow virtual : // columns at the table-level. : return false; : } : } : if (rawPath_.size() != 1) return false; Can we add some test coverage in AnalyzerTest for these? http://gerrit.cloudera.org:8080/#/c/18514/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/18514/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@633 PS3, Line 633: getVirtualColumns().clear(); Can we do this in Table#clearColumns()? The function name doesn't indicate the clear. 
http://gerrit.cloudera.org:8080/#/c/18514/3/testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test File testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test: http://gerrit.cloudera.org:8080/#/c/18514/3/testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test@4 PS3, Line 4: select input__file__name, * from alltypestiny; Can we copy some tests here to testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test? I saw an error like this:
I0512 17:16:11.198302 24544 jni-util.cc:286] ee40ba3ebe04e6c1:e912b582] java.lang.IllegalStateException
    at com.google.common.base.Preconditions.checkState(Preconditions.java:492)
    at org.apache.impala.analysis.StatementBase.castResultExprs(StatementBase.java:114)
    at org.apache.impala.analysis.AnalysisContext.reAnalyze(AnalysisContext.java:620)
    at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:542)
    at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:468)
    at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2018)
    at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1926)
    at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1750)
    at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:164)
I feel we don't need to support virtual columns when the table has Ranger column-masking/row-filtering policies for the current user. Such a user is not privileged, so they should not be aware of the internal details (e.g. file names) of this table. So we just need to fail the query gracefully.
-- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61 Gerrit-Change-Number: 18514 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 12 May 2022 11:44:22 + Gerrit-HasComments: Yes
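The Path.java guard quoted in this review rejects virtual-column resolution in three cases: there is no root table, the path is rooted at a collection tuple, or the raw path has more than one element. The sketch below restates those checks as a stand-alone function so the AnalyzerTest cases (for example 'c.int_array.input__file__name' and 'nested_struct.input__file__name') are easy to reason about. The boolean inputs are simplifications of rootTable_/rootDesc_. The function name is hypothetical. It also assumes the raw path is already relative to the root, with any table alias stripped.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-alone restatement of the guard conditions quoted from
// Path.java above; it is a sketch, not Impala's API.
final class VirtualColumnPathCheck {
  static boolean canResolveAsVirtualColumn(boolean hasRootTable,
      boolean rootIsCollectionTuple, List<String> rawPath, Set<String> virtualColumnNames) {
    // Assumed to mean the path is not rooted at a base table: nothing to resolve against.
    if (!hasRootTable) return false;
    // Rooted at a collection item tuple (e.g. c.int_array): virtual columns are
    // only allowed at the table level.
    if (rootIsCollectionTuple) return false;
    // A virtual column is a single path element, so a path such as
    // nested_struct.input__file__name (two elements) is rejected.
    if (rawPath.size() != 1) return false;
    return virtualColumnNames.contains(rawPath.get(0).toLowerCase(Locale.ROOT));
  }
}
```

Each early return corresponds to one of the AnalyzerTest scenarios added in response to this comment, which makes the test coverage easy to map back to the code.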
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 ) Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. Patch Set 3: The verify job failed on unrelated HBase tests. -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61 Gerrit-Change-Number: 18514 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 12 May 2022 09:56:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 ) Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name .. Patch Set 3: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8092/ -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61 Gerrit-Change-Number: 18514 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tamas Mate Gerrit-Comment-Date: Wed, 11 May 2022 17:42:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
..

Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10557/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy
Gerrit-Reviewer: Gergely Fürnstáhl
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Quanlong Huang
Gerrit-Reviewer: Tamas Mate
Gerrit-Comment-Date: Wed, 11 May 2022 13:30:11 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Hello Quanlong Huang, Tamas Mate, Gergely Fürnstáhl, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18514 to look at the new patch set (#3).

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
..

IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

Hive has the virtual column INPUT__FILE__NAME, which returns the name of the data file that stores the current row. It can be used in several ways; see the two Jira tickets above for examples. This virtual column is also needed to support position-based delete files in Iceberg V2 tables.

This patch also lays the foundation for adding more table-level virtual columns later. Virtual columns are stored at the table level, in a list separate from the table schema. During path resolution, Path.resolve() also tries to resolve virtual columns. Slot descriptors also record whether they refer to a virtual column.

Currently only the INPUT__FILE__NAME virtual column is added. Its value can be set in the template tuple of the scanners. All kinds of operations are possible on this virtual column: users can invoke additional functions on it, filter rows by it, group by it, etc.
Testing:
 * added e2e tests

Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
---
M be/src/exec/file-metadata-utils.cc
M be/src/exec/file-metadata-utils.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M common/thrift/Descriptors.thrift
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/FeTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-complextypes.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test
M tests/query_test/test_scanners.py
25 files changed, 570 insertions(+), 53 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/18514/3

--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy
Gerrit-Reviewer: Gergely Fürnstáhl
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Quanlong Huang
Gerrit-Reviewer: Tamas Mate
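The catalog-side design described in the commit message above (virtual columns kept in a table-level list apart from the schema, path resolution that also consults that list, and slot descriptors that remember whether they point at a virtual column) can be sketched as a small Java model. This is a hypothetical simplification: the names `Table`, `addVirtualColumn`, `resolve` and `SlotDescriptor.isVirtual()` here are illustrative assumptions, not Impala's actual catalog or analyzer APIs. The lookup order (schema first, then virtual columns) and the case-insensitive matching are choices of this sketch; the commit message does not state the precedence Path.resolve() uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified model; NOT Impala's real catalog/analysis classes.
public class VirtualColumnSketch {
  // A column is either part of the table schema or a table-level virtual column.
  record Column(String name, String type, boolean isVirtual) {}

  static final class Table {
    // Regular columns that make up the table schema.
    private final List<Column> schema = new ArrayList<>();
    // Virtual columns live in a separate list, so they never show up in the
    // schema (in this sketch, that keeps them out of a "SELECT *" expansion).
    private final List<Column> virtualColumns = new ArrayList<>();

    Table addColumn(String name, String type) {
      schema.add(new Column(name, type, false));
      return this;
    }

    Table addVirtualColumn(String name, String type) {
      virtualColumns.add(new Column(name, type, true));
      return this;
    }

    List<Column> schema() { return schema; }

    // Assumed resolution order: schema columns first, then virtual columns.
    // Names are compared case-insensitively.
    Optional<Column> resolve(String name) {
      for (Column c : schema) {
        if (c.name().equalsIgnoreCase(name)) return Optional.of(c);
      }
      for (Column c : virtualColumns) {
        if (c.name().equalsIgnoreCase(name)) return Optional.of(c);
      }
      return Optional.empty();
    }
  }

  // Hypothetical slot descriptor that records whether it refers to a virtual column.
  record SlotDescriptor(String path, Column column) {
    boolean isVirtual() { return column.isVirtual(); }
  }

  public static void main(String[] args) {
    Table t = new Table()
        .addColumn("id", "INT")
        .addColumn("name", "STRING")
        .addVirtualColumn("INPUT__FILE__NAME", "STRING");
    SlotDescriptor slot = new SlotDescriptor("input__file__name",
        t.resolve("input__file__name").orElseThrow());
    System.out.println(slot.isVirtual());   // prints "true"
    System.out.println(t.schema().size());  // prints "2": the virtual column is not in the schema
  }
}
```

Keeping the virtual columns out of the schema list means code that iterates the schema (for example, to expand `*`) is unaffected by default, while any code path that resolves names explicitly can still find them.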
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Hello Quanlong Huang, Tamas Mate, Gergely Fürnstáhl, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18514 to look at the new patch set (#2).

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
..

IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

Hive has the virtual column INPUT__FILE__NAME, which returns the name of the data file that stores the current row. It can be used in several ways; see the two Jira tickets above for examples. This virtual column is also needed to support position-based delete files in Iceberg V2 tables.

This patch also lays the foundation for adding more table-level virtual columns later. Virtual columns are stored at the table level, in a list separate from the table schema. During path resolution, Path.resolve() also tries to resolve virtual columns. Slot descriptors also record whether they refer to a virtual column.

Currently only the INPUT__FILE__NAME virtual column is added. Its value can be set in the template tuple of the scanners. All kinds of operations are possible on this virtual column: users can invoke additional functions on it, filter rows by it, group by it, etc.
Testing:
 * added e2e tests

Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
---
M be/src/exec/file-metadata-utils.cc
M be/src/exec/file-metadata-utils.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M common/thrift/Descriptors.thrift
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/FeTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-complextypes.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test
M tests/query_test/test_scanners.py
25 files changed, 558 insertions(+), 46 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/18514/2

--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy
Gerrit-Reviewer: Gergely Fürnstáhl
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Quanlong Huang
Gerrit-Reviewer: Tamas Mate
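The remark that the column's value "can be set in the template tuple of the scanners" works because the file name is the same for every row read from one file, so it can be written once per file rather than once per row. A hypothetical Java model of that idea follows: a template row gets its file-name slot filled once, each materialized row starts as a copy of the template, and a GROUP BY on the virtual column then amounts to counting rows per file name. Impala's real scanners are C++ and work on in-memory tuples; the method names (`makeTemplate`, `scanFile`, `countPerFile`), the slot layout and the file paths are assumptions made only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of per-file template rows; NOT Impala's real scanner code.
public class TemplateTupleSketch {
  static final int FILE_NAME_SLOT = 0;  // assumed slot of the virtual column
  static final int VALUE_SLOT = 1;      // assumed slot of a regular data column

  // Builds a template row once per file: the file-name slot is constant for
  // every row of that file, the remaining slots are filled per row.
  static Object[] makeTemplate(int numSlots, String filePath) {
    Object[] template = new Object[numSlots];
    template[FILE_NAME_SLOT] = filePath;
    return template;
  }

  // "Scans" one file: each output row is a copy of the template plus its data.
  static List<Object[]> scanFile(String filePath, List<?> values) {
    Object[] template = makeTemplate(2, filePath);
    List<Object[]> rows = new ArrayList<>();
    for (Object v : values) {
      Object[] row = Arrays.copyOf(template, template.length);
      row[VALUE_SLOT] = v;
      rows.add(row);
    }
    return rows;
  }

  // Rough equivalent of: SELECT input__file__name, count(*) FROM t GROUP BY 1
  static Map<String, Integer> countPerFile(List<Object[]> rows) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (Object[] row : rows) {
      counts.merge((String) row[FILE_NAME_SLOT], 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    List<Object[]> rows = new ArrayList<>();
    rows.addAll(scanFile("hdfs://nn/warehouse/t/f1.parq", List.of(1, 2, 3)));
    rows.addAll(scanFile("hdfs://nn/warehouse/t/f2.parq", List.of(4)));
    System.out.println(countPerFile(rows));
  }
}
```

Running `main` prints `{hdfs://nn/warehouse/t/f1.parq=3, hdfs://nn/warehouse/t/f2.parq=1}`. Because the constant slot is filled before any row is copied, the per-row cost of exposing the file name in this model is only the copy that happens anyway.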
[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name
..

Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8092/ DRY_RUN=true

--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy
Gerrit-Reviewer: Gergely Fürnstáhl
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Quanlong Huang
Gerrit-Reviewer: Tamas Mate
Gerrit-Comment-Date: Wed, 11 May 2022 13:11:26 +
Gerrit-HasComments: No