[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly

2024-04-23 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21301 )

Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't 
read properly
..

IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly

If the Iceberg table has Avro delete files (e.g. by setting
'write.delete.format.default'='avro') then Impala won't be able to read
the contents of the delete files properly. This is because the Avro
schema is not set properly for the virtual delete table.

Testing:
 * added e2e tests with position delete files of all kinds

Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6
Reviewed-on: http://gerrit.cloudera.org:8080/21301
Tested-by: Impala Public Jenkins 
Reviewed-by: Daniel Becker 
Reviewed-by: Gabor Kaszab 
---
M fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test
M tests/query_test/test_iceberg.py
3 files changed, 143 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Daniel Becker: Looks good to me, but someone else must approve
  Gabor Kaszab: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/21301
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6
Gerrit-Change-Number: 21301
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly

2024-04-23 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21301 )

Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't 
read properly
..


Patch Set 1: Code-Review+2


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6
Gerrit-Change-Number: 21301
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 23 Apr 2024 08:54:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly

2024-04-22 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21301 )

Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't 
read properly
..


Patch Set 1:

(2 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/21301/1/fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java:

http://gerrit.cloudera.org:8080/#/c/21301/1/fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java@87
PS1, Line 87:   if (desc.hdfsTable.isSetAvroSchema()) {
> I guess the issue is also true for AVRO equality delete files. Should we al
Yes, it would definitely be useful to have such tests. Probably in a separate 
CR, as adding such tables is cumbersome.


http://gerrit.cloudera.org:8080/#/c/21301/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test:

http://gerrit.cloudera.org:8080/#/c/21301/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test@92
PS1, Line 92: 
row_regex:'$NAMENODE/test-warehouse/$DATABASE.db/ice_mixed_formats_partitioned/data/j_trunc=2/.*-data-.*.orc','.*B','','.*'
> there should be 2 ORC data files in the j_trunc=2, right? One for (2,2) and
With VERIFY_IS_SUBSET we only check that each expected line is present in the 
result set, i.e. adding more lines with the same content wouldn't have an effect: 
https://github.com/apache/impala/blob/9b05a205fec397fa1e19ae467b1cc406ca43d948/tests/common/test_result_verifier.py#L258-L259
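A minimal sketch of the subset semantics described above. This is not the code in test_result_verifier.py; the function name `is_subset`, the file names, and the simplified single-column rows are assumptions made for illustration. Each expected pattern only has to match at least one actual row, so listing the same pattern twice checks nothing extra.

```python
import re


def is_subset(expected_patterns, actual_rows):
    """Return True if every expected pattern fully matches at least one actual row.

    Hypothetical helper: set-like semantics, so duplicate expected patterns
    are satisfied by the same actual row.
    """
    return all(
        any(re.fullmatch(pattern, row) for row in actual_rows)
        for pattern in expected_patterns
    )


# Hypothetical result rows: two ORC data files in the same partition.
actual = [
    "data/j_trunc=2/0001-data-a.orc",
    "data/j_trunc=2/0002-data-b.orc",
]

one_line = [r"data/j_trunc=2/.*-data-.*\.orc"]
two_lines = one_line * 2  # the same expected line listed twice

print(is_subset(one_line, actual))
print(is_subset(two_lines, actual))
print(is_subset([r"data/j_trunc=3/.*\.orc"], actual))
```

Under these assumptions the first two calls give the same answer, which is why a single row_regex line is enough to cover both ORC files in the partition; checking the exact number of files would need a different verifier mode.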



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6
Gerrit-Change-Number: 21301
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 22 Apr 2024 15:30:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly

2024-04-22 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21301 )

Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't 
read properly
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21301/1/fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java:

http://gerrit.cloudera.org:8080/#/c/21301/1/fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java@87
PS1, Line 87:   if (desc.hdfsTable.isSetAvroSchema()) {
I guess the issue also applies to Avro equality delete files. Should we also 
add test coverage for that? (Could be a separate patch.)


http://gerrit.cloudera.org:8080/#/c/21301/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test:

http://gerrit.cloudera.org:8080/#/c/21301/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test@92
PS1, Line 92: 
row_regex:'$NAMENODE/test-warehouse/$DATABASE.db/ice_mixed_formats_partitioned/data/j_trunc=2/.*-data-.*.orc','.*B','','.*'
There should be 2 ORC data files in j_trunc=2, right? One for (2,2) and one 
for (3,3). You only check for one such line.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6
Gerrit-Change-Number: 21301
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 22 Apr 2024 15:20:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly

2024-04-16 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21301 )

Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't 
read properly
..


Patch Set 1: Code-Review+1


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6
Gerrit-Change-Number: 21301
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 16 Apr 2024 09:48:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly

2024-04-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21301 )

Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't 
read properly
..


Patch Set 1: Verified+1


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6
Gerrit-Change-Number: 21301
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 15 Apr 2024 21:39:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly

2024-04-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21301 )

Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't 
read properly
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15895/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6
Gerrit-Change-Number: 21301
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 15 Apr 2024 16:58:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly

2024-04-15 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21301 )


Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't 
read properly
..

IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly

If the Iceberg table has Avro delete files (e.g. by setting
'write.delete.format.default'='avro') then Impala won't be able to read
the contents of the delete files properly. This is because the Avro
schema is not set properly for the virtual delete table.

Testing:
 * added e2e tests with position delete files of all kinds

Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6
---
M fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test
M tests/query_test/test_iceberg.py
3 files changed, 143 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/21301/1
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6
Gerrit-Change-Number: 21301
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly

2024-04-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21301 )

Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't 
read properly
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10541/ 
DRY_RUN=true


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6
Gerrit-Change-Number: 21301
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 15 Apr 2024 16:34:40 +
Gerrit-HasComments: No