[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

2022-06-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..

IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

Hive has a virtual column, INPUT__FILE__NAME, which returns the name of
the data file that stores the row. It can be used in several ways; see
the two Jira tickets above for examples. This virtual column is also
needed to support position-based delete files in Iceberg V2 tables.

This patch also adds the foundations to support further table-level
virtual columns later. Virtual columns are stored at the table level
in a separate list from the table schema. During path resolution
in Path.resolve() we also try to resolve virtual columns. Slot
descriptors also record whether they refer to a virtual column.
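
The lookup described above can be sketched as follows. This is a minimal
Python model, not Impala's actual Java code: the `Table` class, its
`virtual_columns` list, and `resolve_column` are hypothetical names, and
checking the regular schema before the virtual columns is an assumption
made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    name: str
    is_virtual: bool = False


@dataclass
class Table:
    # Regular schema columns and virtual columns live in separate lists.
    columns: List[Column] = field(default_factory=list)
    virtual_columns: List[Column] = field(default_factory=list)


def resolve_column(table: Table, name: str) -> Optional[Column]:
    """Resolve a name case-insensitively; schema columns win (assumed order)."""
    wanted = name.lower()
    for col in table.columns:
        if col.name.lower() == wanted:
            return col
    for col in table.virtual_columns:
        if col.name.lower() == wanted:
            return col
    return None


# Hypothetical table with two regular columns and one virtual column.
table = Table(
    columns=[Column("id"), Column("bool_col")],
    virtual_columns=[Column("INPUT__FILE__NAME", is_virtual=True)],
)
resolved = resolve_column(table, "input__file__name")
```

Here `resolved.is_virtual` plays the role of the flag that a slot
descriptor would carry.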

Currently we only add the INPUT__FILE__NAME virtual column. The value
of this column can be set in the template tuple of the scanners.
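
One way to picture this: the file name is the same for every row read
from a given file, so a scanner can place it once in a per-file template
and reuse it for each materialized row. The Python sketch below only
models that idea; `make_template_tuple` and `materialize_rows` are
hypothetical helpers, the path is made up, and the real backend code is
C++.

```python
from typing import Dict, Iterable, List


def make_template_tuple(file_path: str) -> Dict[str, object]:
    """Build the per-file constants shared by every row of that file."""
    return {"input__file__name": file_path}


def materialize_rows(
    template: Dict[str, object], file_rows: Iterable[Dict[str, object]]
) -> List[Dict[str, object]]:
    """Start each output row from a copy of the template, then add file data."""
    out = []
    for data in file_rows:
        row = dict(template)  # copy so rows do not share mutable state
        row.update(data)
        out.append(row)
    return out


# Hypothetical path and rows, for illustration only.
template = make_template_tuple("/warehouse/t/part-0.parq")
rows = materialize_rows(template, [{"id": 1}, {"id": 2}])
```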

All kinds of operations are possible on this virtual column: users can
invoke additional functions on it, filter rows by it, group by it, etc.

Special care is needed for virtual columns when column masking/row
filtering is applicable to them. They are added as "hidden" select
list items to the table masking views, which means they are not
expanded by * expressions. They still need to be included in *
expressions, though, when they come from user-written views.
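
The star-expansion rule in the paragraph above can be modelled in a few
lines. This is a hedged Python sketch rather than the analyzer's actual
logic: `SelectItem`, its `hidden` flag, and `expand_star` are
hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SelectItem:
    name: str
    hidden: bool = False


def expand_star(items: List[SelectItem]) -> List[str]:
    """Return the names a `*` expands to; hidden items are skipped."""
    return [item.name for item in items if not item.hidden]


# A table masking view: the virtual column is present but hidden.
masking_view = [
    SelectItem("id"),
    SelectItem("string_col"),
    SelectItem("input__file__name", hidden=True),
]

# A user-written view that explicitly selects the virtual column.
user_view = [SelectItem("id"), SelectItem("input__file__name")]

masked_star = expand_star(masking_view)
user_star = expand_star(user_view)
```

In this model the hidden item can still be referenced by name, but only
the user-written view exposes it through `*`.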

Testing:
 * analyzer tests
 * added e2e tests

Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Reviewed-on: http://gerrit.cloudera.org:8080/18514
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/exec/file-metadata-utils.cc
M be/src/exec/file-metadata-utils.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M common/thrift/Descriptors.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/SelectListItem.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/FeTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/StructField.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-complextypes.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-in-table.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test
M tests/authorization/test_ranger.py
M tests/query_test/test_scanners.py
35 files changed, 882 insertions(+), 64 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/18514
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

2022-06-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 9: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 08 Jun 2022 13:02:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

2022-06-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10727/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 08 Jun 2022 08:48:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

2022-06-08 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 8: Code-Review+2

(1 comment)

Carry +2

Thanks everyone for the review!

Yeah, currently the estimates are a bit off. We overestimate the NDV and 
underestimate the average row size.

http://gerrit.cloudera.org:8080/#/c/18514/7/be/src/exec/file-metadata-utils.cc
File be/src/exec/file-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/18514/7/be/src/exec/file-metadata-utils.cc@72
PS7, Line 72: slot->ptr = filename_copy;
: slot->len = le
> nit: need spaces around "="
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 08 Jun 2022 08:31:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

2022-06-08 Thread Zoltan Borok-Nagy (Code Review)
Hello Quanlong Huang, Tamas Mate, Gergely Fürnstáhl, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18514

to look at the new patch set (#8).

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..

IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

Hive has a virtual column, INPUT__FILE__NAME, which returns the name of
the data file that stores the row. It can be used in several ways; see
the two Jira tickets above for examples. This virtual column is also
needed to support position-based delete files in Iceberg V2 tables.

This patch also adds the foundations to support further table-level
virtual columns later. Virtual columns are stored at the table level
in a separate list from the table schema. During path resolution
in Path.resolve() we also try to resolve virtual columns. Slot
descriptors also record whether they refer to a virtual column.

Currently we only add the INPUT__FILE__NAME virtual column. The value
of this column can be set in the template tuple of the scanners.

All kinds of operations are possible on this virtual column: users can
invoke additional functions on it, filter rows by it, group by it, etc.

Special care is needed for virtual columns when column masking/row
filtering is applicable to them. They are added as "hidden" select
list items to the table masking views, which means they are not
expanded by * expressions. They still need to be included in *
expressions, though, when they come from user-written views.

Testing:
 * analyzer tests
 * added e2e tests

Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
---
M be/src/exec/file-metadata-utils.cc
M be/src/exec/file-metadata-utils.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M common/thrift/Descriptors.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/SelectListItem.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/FeTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/StructField.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-complextypes.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-in-table.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test
M tests/authorization/test_ranger.py
M tests/query_test/test_scanners.py
35 files changed, 882 insertions(+), 64 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/18514/8

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

2022-06-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8201/ 
DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 08 Jun 2022 08:32:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

2022-06-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 9: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 08 Jun 2022 08:32:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

2022-06-07 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 7: Code-Review+2

(2 comments)

Thanks for adding the column masking tests! Carrying Tamas's +1.

For the estimation issue, I'm +1 on dealing with it in follow-up JIRAs.

http://gerrit.cloudera.org:8080/#/c/18514/7/be/src/exec/file-metadata-utils.cc
File be/src/exec/file-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/18514/7/be/src/exec/file-metadata-utils.cc@72
PS7, Line 72: slot->ptr=filename_copy;
: slot->len=len;
nit: need spaces around "="


http://gerrit.cloudera.org:8080/#/c/18514/3/testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test
File 
testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test:

http://gerrit.cloudera.org:8080/#/c/18514/3/testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test@4
PS3, Line 4: select input__file__name, * from alltypestiny;
> Thanks for noticing that. I have to admit it wasn't trivial to fix for me,
Cool! Thanks for fixing it!




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 08 Jun 2022 02:27:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

2022-06-07 Thread Tamas Mate (Code Review)
Tamas Mate has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 7: Code-Review+1

Thanks for the update and opening the Jira, Zoltan! The change LGTM!

Just wanted to discuss one more thing regarding estimations: I think the 
nodes above the scan nodes could underestimate the memory requirements for 
virtual columns.
Ie.: joining two tables, the tables have 1 files each and the scanner 
returns 1 row from each file. Currently the INPUT__FILE__NAME column is 
estimated to be 12B, but a path can be 100+B, this means that the join will 
have to deal with 2MB instead of the estimated 0.24MB.
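
The gap between the two figures can be reproduced with simple
arithmetic. The sketch below assumes 20,000 rows in total reach the
join, a row count that is not stated in the comment but is the one for
which both quoted figures work out, and it uses decimal megabytes.

```python
ESTIMATED_BYTES_PER_VALUE = 12   # estimate quoted in the comment
ACTUAL_BYTES_PER_VALUE = 100     # "a path can be 100+B"
TOTAL_ROWS = 20_000              # assumption: inferred from 0.24MB / 12B


def megabytes(num_bytes: int) -> float:
    """Convert bytes to decimal megabytes (1 MB = 1,000,000 B)."""
    return num_bytes / 1_000_000


estimated_mb = megabytes(TOTAL_ROWS * ESTIMATED_BYTES_PER_VALUE)
actual_mb = megabytes(TOTAL_ROWS * ACTUAL_BYTES_PER_VALUE)
ratio = ACTUAL_BYTES_PER_VALUE / ESTIMATED_BYTES_PER_VALUE
```

Under these assumptions the column takes roughly eight times the
estimated space, independent of the row count.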



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 07 Jun 2022 14:08:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

2022-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 7: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 02 Jun 2022 16:25:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

2022-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10683/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 02 Jun 2022 12:15:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

2022-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8178/ 
DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 02 Jun 2022 11:57:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

2022-06-02 Thread Zoltan Borok-Nagy (Code Review)
Hello Quanlong Huang, Tamas Mate, Gergely Fürnstáhl, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18514

to look at the new patch set (#7).

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..

IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

Hive has a virtual column, INPUT__FILE__NAME, which returns the name of
the data file that stores the row. It can be used in several ways; see
the two Jira tickets above for examples. This virtual column is also
needed to support position-based delete files in Iceberg V2 tables.

This patch also adds the foundations to support further table-level
virtual columns later. Virtual columns are stored at the table level
in a separate list from the table schema. During path resolution
in Path.resolve() we also try to resolve virtual columns. Slot
descriptors also record whether they refer to a virtual column.

Currently we only add the INPUT__FILE__NAME virtual column. The value
of this column can be set in the template tuple of the scanners.

All kinds of operations are possible on this virtual column: users can
invoke additional functions on it, filter rows by it, group by it, etc.

Special care is needed for virtual columns when column masking/row
filtering is applicable to them. They are added as "hidden" select
list items to the table masking views, which means they are not
expanded by * expressions. They still need to be included in *
expressions, though, when they come from user-written views.

Testing:
 * analyzer tests
 * added e2e tests

Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
---
M be/src/exec/file-metadata-utils.cc
M be/src/exec/file-metadata-utils.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M common/thrift/Descriptors.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/SelectListItem.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/FeTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/StructField.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-complextypes.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-in-table.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test
M tests/authorization/test_ranger.py
M tests/query_test/test_scanners.py
35 files changed, 882 insertions(+), 64 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/18514/7

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

2022-06-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 6: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8170/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 01 Jun 2022 17:19:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

2022-06-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10670/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 01 Jun 2022 13:09:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

2022-06-01 Thread Zoltan Borok-Nagy (Code Review)
Hello Quanlong Huang, Tamas Mate, Gergely Fürnstáhl, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18514

to look at the new patch set (#6).

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..

IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

Hive has a virtual column, INPUT__FILE__NAME, which returns the name of
the data file that stores the row. It can be used in several ways; see
the two Jira tickets above for examples. This virtual column is also
needed to support position-based delete files in Iceberg V2 tables.

This patch also adds the foundations to support further table-level
virtual columns later. Virtual columns are stored at the table level
in a separate list from the table schema. During path resolution
in Path.resolve() we also try to resolve virtual columns. Slot
descriptors also record whether they refer to a virtual column.

Currently we only add the INPUT__FILE__NAME virtual column. The value
of this column can be set in the template tuple of the scanners.

All kinds of operations are possible on this virtual column: users can
invoke additional functions on it, filter rows by it, group by it, etc.

Special care is needed for virtual columns when column masking/row
filtering is applicable to them. They are added as "hidden" select
list items to the table masking views, which means they are not
expanded by * expressions. They still need to be included in *
expressions, though, when they come from user-written views.

Testing:
 * analyzer tests
 * added e2e tests

Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
---
M be/src/exec/file-metadata-utils.cc
M be/src/exec/file-metadata-utils.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M common/thrift/Descriptors.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/SelectListItem.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/FeTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/StructField.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-complextypes.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-in-table.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test
M tests/authorization/test_ranger.py
M tests/query_test/test_scanners.py
35 files changed, 886 insertions(+), 68 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/18514/6

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT FILE NAME virtual column for file name

2022-06-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8170/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/18514
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 01 Jun 2022 12:50:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT FILE NAME virtual column for file name

2022-05-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10647/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18514
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 26 May 2022 17:37:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT FILE NAME virtual column for file name

2022-05-26 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 5:

(9 comments)

Thanks for the comments!

Currently Hive allows you to create a table with a column named 
INPUT__FILE__NAME, so I didn't forbid that. In Hive such user columns shadow 
the virtual column, so I followed that behavior.
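
The shadowing rule described above can be illustrated with a small lookup sketch. It is written in Python for brevity and is not Impala's Java resolution code: `resolve_column` and its arguments are hypothetical names, and the case-insensitive match is an assumption.

```python
def resolve_column(name, schema_columns, virtual_columns):
    """Resolve a column name, letting user-defined columns shadow virtual ones.

    Hypothetical helper: returns ("column", name) or ("virtual", name),
    or None when the name matches neither set.
    """
    key = name.lower()  # assumption: identifiers compare case-insensitively
    if key in schema_columns:
        return ("column", key)   # a user column wins, following Hive's behavior
    if key in virtual_columns:
        return ("virtual", key)
    return None


VIRTUAL = {"input__file__name"}
# Table without a user column of that name: the virtual column is found.
print(resolve_column("INPUT__FILE__NAME", {"id"}, VIRTUAL))
# Table that declares its own input__file__name column: the user column wins.
print(resolve_column("INPUT__FILE__NAME", {"id", "input__file__name"}, VIRTUAL))
```

Checking the schema first is what makes the user column take precedence; swapping the two checks would make the virtual column win instead.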

I opened IMPALA-11322 to track proper estimations for virtual columns. 
Currently we overestimate both memory and cardinality. I'm not sure if we need 
stats for the lengths, as we only allocate memory for INPUT__FILE__NAME per 
file per scanner, i.e. even very long file names are not a problem.
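
To illustrate why long file names are cheap here, the following Python sketch mimics the idea of a template tuple: the file name is stored once per file and each output row only refers to it. `scan_file` and the dict-based rows are hypothetical stand-ins, not the backend's C++ scanner code.

```python
def scan_file(file_name, rows):
    """Yield the rows of one data file with the virtual column filled in.

    The template is built once per file; copying it per row copies only a
    reference to the file-name string, not the string itself.
    """
    template = {"input__file__name": file_name}  # one allocation per file
    for row in rows:
        out = dict(template)  # shallow copy: the string object is shared
        out.update(row)       # materialized columns read from the file
        yield out


# Hypothetical path, for illustration only.
result = list(scan_file("/warehouse/t/part-0.parq", [{"id": 1}, {"id": 2}]))
```

Every row in `result` refers to the same string object, so the memory cost of the file name does not grow with the number of rows.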

http://gerrit.cloudera.org:8080/#/c/18514/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18514/4//COMMIT_MSG@28
PS4, Line 28: Special care is needed for virtual columns when column masking/row
> nit: typo 'masking'
Done


http://gerrit.cloudera.org:8080/#/c/18514/4//COMMIT_MSG@29
PS4, Line 29: filtering is applicable on them. They are added as "hidden" select
> nit: typo 'filtering'
Done


http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
File fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java:

http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java@164
PS4, Line 164:   // Virtual columns are hidden in the masking view, which means they don't
 :   // participate in star expansion.
 :   // E.g. during masking the following query is rewritten (where vc is a virtual col):
 :   // SELECT vc, * FROM t; ===>
 :   // SELECT vc, * FROM (SELECT MASK(vc) as vc, c1, c2, ... FROM t) v;
 :   // In which case the '*' in the outer "SELECT vc, *" shouldn't contain 'v.vc'
 :   // because in that case it would be doubled:
 :   // SELECT vc, vc, c1, c2, ... FROM (...);
 :   // Hence virtual columns are hidden select list items. They are also hidden
 :   // when they are not masked, but other columns are.
> nit: most of this could be part of the VirtualColumn class comment.
I added that virtual columns are not included in star expansions. But I didn't 
want to explain table masking views there, to keep the comment simple.
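
The effect of hidden select list items on star expansion, as described in the quoted comment, can be sketched as follows. This is a Python illustration under stated assumptions, not the InlineViewRef or SelectStmt code: `SelectItem`, `expand_star` and `resolve_select_list` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectItem:
    """One output column of a view; all names here are hypothetical."""
    name: str
    hidden: bool = False  # hidden items resolve by name but are skipped by '*'


def expand_star(view_items):
    """Return the column names that '*' expands to: every non-hidden item."""
    return [item.name for item in view_items if not item.hidden]


def resolve_select_list(select_list, view_items):
    """Expand an outer select list such as ['vc', '*'] against a view."""
    by_name = {item.name: item for item in view_items}
    out = []
    for entry in select_list:
        if entry == "*":
            out.extend(expand_star(view_items))
        elif entry in by_name:
            out.append(entry)  # explicit references can still see hidden items
        else:
            raise KeyError(f"Could not resolve column/field reference: '{entry}'")
    return out


# Masking view for: SELECT MASK(vc) AS vc, c1, c2 FROM t
masking_view = [SelectItem("vc", hidden=True), SelectItem("c1"), SelectItem("c2")]
# User-written view that selects vc explicitly: vc is an ordinary item there.
user_view = [SelectItem("vc"), SelectItem("c1")]
```

With `masking_view`, the outer `SELECT vc, *` resolves to `vc, c1, c2` without doubling `vc`, while `*` over `user_view` still includes `vc`, matching the commit message's note about user-written views.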


http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java
File fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java:

http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java@24
PS4, Line 24:
> nit: Could we add some notes here, things I think would be useful:
Thanks, I added most of it.

I didn't add "it does not contain any table specific values", because we might 
need to store table stats in the future for proper cardinality estimations.


http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java@35
PS4, Line 35:
> Maybe we could use NONE here? Similar to SlotDescriptor. I am a bit afraid
We are returning a VirtualColumn here, not a TVirtualColumn.

Having a NONE virtual column instance would be a bit weird, as such virtual 
columns shouldn't exist. It could also potentially mask some bugs.


http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java:

http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@188
PS4, Line 188:
> nit: empty new line
Done


http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java:

http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java@366
PS4, Line 366: "select input__file__name, * from functional_parquet.complextypestbl c, " +
> line too long (98 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java@373
PS4, Line 373: AnalysisError(
> line too long (93 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java@380
PS4, Line 380: "Could not resolve column/field reference: 'c.int_array.input__file__name'");
> line too long (93 > 90)
Done



--
To view, visit http://gerrit.cloudera.org:8080/18514
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 5
Gerrit-Owner: 

[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT FILE NAME virtual column for file name

2022-05-26 Thread Zoltan Borok-Nagy (Code Review)
Hello Quanlong Huang, Tamas Mate, Gergely Fürnstáhl, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18514

to look at the new patch set (#5).

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..

IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

Hive has virtual column INPUT__FILE__NAME which returns the data file
name that stores the actual row. It can be used in several ways, see the
above two Jira tickets for examples. This virtual column is also needed
to support position-based delete files in Iceberg V2 tables.

This patch also adds the foundations to support further table-level
virtual columns later. Virtual columns are stored at the table level
in a separate list from the table schema. During path resolution
in Path.resolve() we also try to resolve virtual columns. Slot
descriptors also store the information whether they refer to a virtual
column.

Currently we only add the INPUT__FILE__NAME virtual column. The value
of this column can be set in the template tuple of the scanners.

All kinds of operations are possible on this virtual column, users
can invoke additional functions on it, can filter rows, can group by,
etc.

Special care is needed for virtual columns when column masking/row
filtering is applicable on them. They are added as "hidden" select
list items to the table masking views which means they don't
expand by * expressions. They still need to be included in *
expressions though when they are coming from user-written views.

Testing:
 * analyzer tests
 * added e2e tests

Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
---
M be/src/exec/file-metadata-utils.cc
M be/src/exec/file-metadata-utils.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M common/thrift/Descriptors.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/SelectListItem.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/FeTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/StructField.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-complextypes.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-in-table.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test
M tests/authorization/test_ranger.py
M tests/query_test/test_scanners.py
35 files changed, 885 insertions(+), 68 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/18514/5
-- 
To view, visit http://gerrit.cloudera.org:8080/18514
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT FILE NAME virtual column for file name

2022-05-24 Thread Tamas Mate (Code Review)
Tamas Mate has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 4:

(6 comments)

Hi Zoltan, pretty nifty change!

I think we should define input__file__name as a keyword. Currently, if I create 
a table with a column named input__file__name, the insert succeeds and then 
'select *' leaks the file name.

In addition, this column will not be considered during estimations, so the row 
size estimations will be off. Maybe in a future Jira we could track the file 
length statistics as well.

http://gerrit.cloudera.org:8080/#/c/18514/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18514/4//COMMIT_MSG@28
PS4, Line 28: Special care is needed for virtual columns when column masing/row
nit: typo 'masking'


http://gerrit.cloudera.org:8080/#/c/18514/4//COMMIT_MSG@29
PS4, Line 29: fitering is applicable on them. They are added as "hidden" select
nit: typo 'filtering'


http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
File fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java:

http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java@164
PS4, Line 164:   // Virtual columns are hidden in the masking view, which means they don't
 :   // participate in star expansion.
 :   // E.g. during masking the following query is rewritten (where vc is a virtual col):
 :   // SELECT vc, * FROM t; ===>
 :   // SELECT vc, * FROM (SELECT MASK(vc) as vc, c1, c2, ... FROM t) v;
 :   // In which case the '*' in the outer "SELECT vc, *" shouldn't contain 'v.vc'
 :   // because in that case it would be doubled:
 :   // SELECT vc, vc, c1, c2, ... FROM (...);
 :   // Hence virtual columns are hidden select list items. They are also hidden
 :   // when they are not masked, but other columns are.
nit: most of this could be part of the VirtualColumn class comment.


http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java
File fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java:

http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java@24
PS4, Line 24:
nit: Could we add some notes here, things I think would be useful:
 - what is a virtual column
 - singleton, it does not contain any table specific values
 - added to every Table object
 - skipped from * expansion, masking


http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java@35
PS4, Line 35: null
Maybe we could use NONE here? Similar to SlotDescriptor. I am a bit afraid of a 
possible NPE.


http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java:

http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@188
PS4, Line 188:
nit: empty new line



--
To view, visit http://gerrit.cloudera.org:8080/18514
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 24 May 2022 12:28:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT FILE NAME virtual column for file name

2022-05-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 4: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8140/


--
To view, visit http://gerrit.cloudera.org:8080/18514
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 23 May 2022 13:11:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT FILE NAME virtual column for file name

2022-05-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8140/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/18514
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 23 May 2022 08:48:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT FILE NAME virtual column for file name

2022-05-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10607/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18514
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 20 May 2022 16:04:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT FILE NAME virtual column for file name

2022-05-20 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 4:

(4 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/18514/3/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/18514/3/be/src/exec/hdfs-scan-node-base.cc@432
PS3, Line 432:
> nit: no indentation?
Done


http://gerrit.cloudera.org:8080/#/c/18514/3/fe/src/main/java/org/apache/impala/analysis/Path.java
File fe/src/main/java/org/apache/impala/analysis/Path.java:

http://gerrit.cloudera.org:8080/#/c/18514/3/fe/src/main/java/org/apache/impala/analysis/Path.java@279
PS3, Line 279: if (rootTable_ == null) return false;
 : if (rootDesc_ != null) {
 :   if (rootDesc_.getType() != rootTable_.getType().getItemType()) {
 :     // 'rootDesc_' describes a collection tuple. Currently we only allow virtual
 :     // columns at the table-level.
 :     return false;
 :   }
 : }
 : if (rawPath_.size() != 1) return false;
> Can we add some test coverage in AnalyzerTest for these?
Added a few tests to AnalyzerTest.
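
The guards quoted from Path.java can be read as the following checklist. This is a simplified Python sketch, not the actual method: the two boolean parameters stand in for the `rootTable_` and `rootDesc_` checks, and the final name lookup is an assumption, since the quoted snippet ends before it.

```python
VIRTUAL_COLUMNS = {"input__file__name"}


def resolves_to_virtual_column(raw_path, has_root_table, root_is_collection):
    """Apply the three quoted guards, then look the name up.

    raw_path            list of path elements, e.g. ["input__file__name"]
    has_root_table      False when the path has no root table
    root_is_collection  True when the path is rooted at a collection tuple
    """
    if not has_root_table:
        return False
    if root_is_collection:
        # Virtual columns are only allowed at the table level.
        return False
    if len(raw_path) != 1:
        return False
    # Assumption: the final lookup is a case-insensitive name match.
    return raw_path[0].lower() in VIRTUAL_COLUMNS
```

Under this sketch a bare `input__file__name` resolves, while a nested reference such as `nested_struct.input__file__name` is rejected because its path has two elements.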


http://gerrit.cloudera.org:8080/#/c/18514/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/18514/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@633
PS3, Line 633: addVirtualColumn(VirtualColu
> Can we do this in Table#clearColumns()? The function name doesn't indicate
Done


http://gerrit.cloudera.org:8080/#/c/18514/3/testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test
File testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test:

http://gerrit.cloudera.org:8080/#/c/18514/3/testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test@4
PS3, Line 4: select input__file__name, * from alltypestiny;
> I checked the Hive behaviors and found that it's allowed to use INPUT__FILE
Thanks for noticing that. I have to admit it wasn't trivial to fix for me, but 
now I think I got it right. Please take a look.



--
To view, visit http://gerrit.cloudera.org:8080/18514
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 20 May 2022 15:45:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT FILE NAME virtual column for file name

2022-05-20 Thread Zoltan Borok-Nagy (Code Review)
Hello Quanlong Huang, Tamas Mate, Gergely Fürnstáhl, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18514

to look at the new patch set (#4).

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..

IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

Hive has virtual column INPUT__FILE__NAME which returns the data file
name that stores the actual row. It can be used in several ways, see the
above two Jira tickets for examples. This virtual column is also needed
to support position-based delete files in Iceberg V2 tables.

This patch also adds the foundations to support further table-level
virtual columns later. Virtual columns are stored at the table level
in a separate list from the table schema. During path resolution
in Path.resolve() we also try to resolve virtual columns. Slot
descriptors also store the information whether they refer to a virtual
column.

Currently we only add the INPUT__FILE__NAME virtual column. The value
of this column can be set in the template tuple of the scanners.

All kinds of operations are possible on this virtual column, users
can invoke additional functions on it, can filter rows, can group by,
etc.

Special care is needed for virtual columns when column masing/row
fitering is applicable on them. They are added as "hidden" select
list items to the table masking views which means they don't
expand by * expressions. They still need to be included in *
expressions though when they are coming from user-written views.

Testing:
 * analyzer tests
 * added e2e tests

Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
---
M be/src/exec/file-metadata-utils.cc
M be/src/exec/file-metadata-utils.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M common/thrift/Descriptors.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/SelectListItem.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/FeTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/StructField.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-complextypes.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test
M tests/authorization/test_ranger.py
M tests/query_test/test_scanners.py
34 files changed, 834 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/18514/4
--
To view, visit http://gerrit.cloudera.org:8080/18514
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT FILE NAME virtual column for file name

2022-05-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java:

http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java@366
PS4, Line 366: "select input__file__name, * from functional_parquet.complextypestbl c, c.int_array");
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java@373
PS4, Line 373: "select id, nested_struct.input__file__name from functional_parquet.complextypestbl",
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/18514/4/fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java@380
PS4, Line 380: "select id, nested_struct.input__file__name from functional_parquet.complextypestbl",
line too long (93 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/18514
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 20 May 2022 15:45:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT FILE NAME virtual column for file name

2022-05-15 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18514/3/testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test
File testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test:

http://gerrit.cloudera.org:8080/#/c/18514/3/testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test@4
PS3, Line 4: select input__file__name, * from alltypestiny;
> Can we copy some tests here to testdata/workloads/functional-query/queries/
I checked the Hive behaviors and found that it's allowed to use 
INPUT__FILE__NAME when there are effective masking policies. I also realized 
that Impala allows SHOW FILES statements in such cases. So I think we should 
support INPUT__FILE__NAME with masking policies in Impala.



--
To view, visit http://gerrit.cloudera.org:8080/18514
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 16 May 2022 00:35:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT FILE NAME virtual column for file name

2022-05-12 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 3:

(4 comments)

The patch looks pretty good to me! I'm going to do another round of more 
detailed review. Left some comments first.

http://gerrit.cloudera.org:8080/#/c/18514/3/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/18514/3/be/src/exec/hdfs-scan-node-base.cc@432
PS3, Line 432:
nit: no indentation?


http://gerrit.cloudera.org:8080/#/c/18514/3/fe/src/main/java/org/apache/impala/analysis/Path.java
File fe/src/main/java/org/apache/impala/analysis/Path.java:

http://gerrit.cloudera.org:8080/#/c/18514/3/fe/src/main/java/org/apache/impala/analysis/Path.java@279
PS3, Line 279: if (rootTable_ == null) return false;
 : if (rootDesc_ != null) {
 :   if (rootDesc_.getType() != rootTable_.getType().getItemType()) {
 :     // 'rootDesc_' describes a collection tuple. Currently we only allow virtual
 :     // columns at the table-level.
 :     return false;
 :   }
 : }
 : if (rawPath_.size() != 1) return false;
Can we add some test coverage in AnalyzerTest for these?


http://gerrit.cloudera.org:8080/#/c/18514/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/18514/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@633
PS3, Line 633: getVirtualColumns().clear();
Can we do this in Table#clearColumns()? The name of the enclosing function 
doesn't indicate that it clears anything.


http://gerrit.cloudera.org:8080/#/c/18514/3/testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test
File testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test:

http://gerrit.cloudera.org:8080/#/c/18514/3/testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test@4
PS3, Line 4: select input__file__name, * from alltypestiny;
Can we copy some tests here to 
testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test?
I saw an error like this:

I0512 17:16:11.198302 24544 jni-util.cc:286] ee40ba3ebe04e6c1:e912b582] java.lang.IllegalStateException
    at com.google.common.base.Preconditions.checkState(Preconditions.java:492)
    at org.apache.impala.analysis.StatementBase.castResultExprs(StatementBase.java:114)
    at org.apache.impala.analysis.AnalysisContext.reAnalyze(AnalysisContext.java:620)
    at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:542)
    at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:468)
    at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2018)
    at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1926)
    at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1750)
    at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:164)

I feel like we don't need to support virtual columns when the table has Ranger 
column-masking/row-filtering policies that apply to the current user. Such a 
user is not privileged, so they should not be aware of the internal details 
(e.g. file names) of this table. We just need to fail the query gracefully.
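
A minimal sketch of what failing gracefully could look like, assuming the analyzer knows whether a referenced column is virtual and whether a masking or row-filtering policy applies to the current user. All names here (`MaskedVirtualColumnCheck`, `AnalysisError`, `checkColumnAllowed`) are hypothetical and are not Impala's API; the point is only that the user gets a descriptive analysis error instead of a bare IllegalStateException.

```java
// Hypothetical sketch; none of these names exist in Impala.
final class MaskedVirtualColumnCheck {
  // Stand-in for a user-facing analysis error with a readable message.
  static final class AnalysisError extends RuntimeException {
    AnalysisError(String message) { super(message); }
  }

  // Rejects references to virtual columns on tables that have a column-masking
  // or row-filtering policy for the current user; otherwise does nothing.
  static void checkColumnAllowed(
      String columnName, boolean isVirtualColumn, boolean tableHasMaskingPolicy) {
    if (isVirtualColumn && tableHasMaskingPolicy) {
      throw new AnalysisError("Virtual column '" + columnName
          + "' is not supported on tables with column-masking or "
          + "row-filtering policies for the current user.");
    }
  }
}
```

A regular column on a masked table, or a virtual column on an unmasked table, passes the check unchanged.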



--
To view, visit http://gerrit.cloudera.org:8080/18514
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 12 May 2022 11:44:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT FILE NAME virtual column for file name

2022-05-12 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 3:

The verify job failed on unrelated HBase tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 12 May 2022 09:56:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT FILE NAME virtual column for file name

2022-05-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 3: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8092/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Comment-Date: Wed, 11 May 2022 17:42:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT FILE NAME virtual column for file name

2022-05-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10557/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Comment-Date: Wed, 11 May 2022 13:30:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT FILE NAME virtual column for file name

2022-05-11 Thread Zoltan Borok-Nagy (Code Review)
Hello Quanlong Huang, Tamas Mate, Gergely Fürnstáhl, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18514

to look at the new patch set (#3).

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..

IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

Hive has a virtual column, INPUT__FILE__NAME, which returns the name of the
data file that stores the row. It can be used in several ways; see the
two Jira tickets above for examples. This virtual column is also needed
to support position-based delete files in Iceberg V2 tables.

This patch also adds the foundations to support further table-level
virtual columns later. Virtual columns are stored at the table level
in a separate list from the table schema. During path resolution
in Path.resolve() we also try to resolve virtual columns. Slot
descriptors also store the information whether they refer to a virtual
column.

Currently we only add the INPUT__FILE__NAME virtual column. The value
of this column can be set in the template tuple of the scanners.

All kinds of operations are possible on this virtual column: users
can invoke functions on it, filter rows by it, group by it, etc.

Testing:
 * added e2e tests

Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
---
M be/src/exec/file-metadata-utils.cc
M be/src/exec/file-metadata-utils.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M common/thrift/Descriptors.thrift
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/FeTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-complextypes.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test
M tests/query_test/test_scanners.py
25 files changed, 570 insertions(+), 53 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/18514/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT FILE NAME virtual column for file name

2022-05-11 Thread Zoltan Borok-Nagy (Code Review)
Hello Quanlong Huang, Tamas Mate, Gergely Fürnstáhl, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18514

to look at the new patch set (#2).

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..

IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column for file name

Hive has a virtual column, INPUT__FILE__NAME, which returns the name of the
data file that stores the row. It can be used in several ways; see the
two Jira tickets above for examples. This virtual column is also needed
to support position-based delete files in Iceberg V2 tables.

This patch also adds the foundations to support further table-level
virtual columns later. Virtual columns are stored at the table level
in a separate list from the table schema. During path resolution
in Path.resolve() we also try to resolve virtual columns. Slot
descriptors also store the information whether they refer to a virtual
column.

Currently we only add the INPUT__FILE__NAME virtual column. The value
of this column can be set in the template tuple of the scanners.

All kinds of operations are possible on this virtual column: users
can invoke functions on it, filter rows by it, group by it, etc.

Testing:
 * added e2e tests

Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
---
M be/src/exec/file-metadata-utils.cc
M be/src/exec/file-metadata-utils.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M common/thrift/Descriptors.thrift
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/FeTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name-complextypes.test
A testdata/workloads/functional-query/queries/QueryTest/virtual-column-input-file-name.test
M tests/query_test/test_scanners.py
25 files changed, 558 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/18514/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 


[Impala-ASF-CR] IMPALA-801, IMPALA-8011: Add INPUT FILE NAME virtual column for file name

2022-05-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18514 )

Change subject: IMPALA-801, IMPALA-8011: Add INPUT__FILE__NAME virtual column 
for file name
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8092/ 
DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I498591f1db08a91a5c846df59086d2291df4ff61
Gerrit-Change-Number: 18514
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Comment-Date: Wed, 11 May 2022 13:11:26 +
Gerrit-HasComments: No