[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly
Zoltan Borok-Nagy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21301 ) Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly .. IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly If the Iceberg table has Avro delete files (e.g. by setting 'write.delete.format.default'='avro') then Impala won't be able to read the contents of the delete files properly. It is because the avro schema is not set properly for the virtual delete table. Testing: * added e2e tests with position delete files of all kinds Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6 Reviewed-on: http://gerrit.cloudera.org:8080/21301 Tested-by: Impala Public Jenkins Reviewed-by: Daniel Becker Reviewed-by: Gabor Kaszab --- M fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java A testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test M tests/query_test/test_iceberg.py 3 files changed, 143 insertions(+), 0 deletions(-) Approvals: Impala Public Jenkins: Verified Daniel Becker: Looks good to me, but someone else must approve Gabor Kaszab: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/21301 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6 Gerrit-Change-Number: 21301 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21301 ) Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21301 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6 Gerrit-Change-Number: 21301 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 23 Apr 2024 08:54:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21301 ) Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly .. Patch Set 1: (2 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/21301/1/fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java File fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java: http://gerrit.cloudera.org:8080/#/c/21301/1/fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java@87 PS1, Line 87: if (desc.hdfsTable.isSetAvroSchema()) { > I guess the issue is also true for AVRO equality delete files. Should we al Yes, it would definitely be useful to have such tests. Probably in a separate CR, as adding such tables is cumbersome. http://gerrit.cloudera.org:8080/#/c/21301/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test: http://gerrit.cloudera.org:8080/#/c/21301/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test@92 PS1, Line 92: row_regex:'$NAMENODE/test-warehouse/$DATABASE.db/ice_mixed_formats_partitioned/data/j_trunc=2/.*-data-.*.orc','.*B','','.*' > there should be 2 ORC data files in the j_trunc=2, right? One for (2,2) and With VERIFY_IS_SUBSET we only check that each line is present in the result set. I.e. adding more lines with the same content wouldn't have an effect: https://github.com/apache/impala/blob/9b05a205fec397fa1e19ae467b1cc406ca43d948/tests/common/test_result_verifier.py#L258-L259 -- To view, visit http://gerrit.cloudera.org:8080/21301 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6 Gerrit-Change-Number: 21301 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 22 Apr 2024 15:30:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21301 ) Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/21301/1/fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java File fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java: http://gerrit.cloudera.org:8080/#/c/21301/1/fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java@87 PS1, Line 87: if (desc.hdfsTable.isSetAvroSchema()) { I guess the issue is also true for AVRO equality delete files. Should we also add test coverage for that? (could be separate patch) http://gerrit.cloudera.org:8080/#/c/21301/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test: http://gerrit.cloudera.org:8080/#/c/21301/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test@92 PS1, Line 92: row_regex:'$NAMENODE/test-warehouse/$DATABASE.db/ice_mixed_formats_partitioned/data/j_trunc=2/.*-data-.*.orc','.*B','','.*' there should be 2 ORC data files in the j_trunc=2, right? One for (2,2) and one for (3,3). You only check for 1 of such line. -- To view, visit http://gerrit.cloudera.org:8080/21301 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6 Gerrit-Change-Number: 21301 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 22 Apr 2024 15:20:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21301 ) Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/21301 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6 Gerrit-Change-Number: 21301 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 16 Apr 2024 09:48:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21301 ) Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/21301 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6 Gerrit-Change-Number: 21301 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 15 Apr 2024 21:39:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21301 ) Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15895/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21301 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6 Gerrit-Change-Number: 21301 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 15 Apr 2024 16:58:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21301 Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly .. IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly If the Iceberg table has Avro delete files (e.g. by setting 'write.delete.format.default'='avro') then Impala won't be able to read the contents of the delete files properly. It is because the avro schema is not set properly for the virtual delete table. Testing: * added e2e tests with position delete files of all kinds Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6 --- M fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java A testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-format-position-deletes.test M tests/query_test/test_iceberg.py 3 files changed, 143 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/21301/1 -- To view, visit http://gerrit.cloudera.org:8080/21301 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6 Gerrit-Change-Number: 21301 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21301 ) Change subject: IMPALA-13002: Iceberg V2 tables with Avro delete files aren't read properly .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10541/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/21301 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff13198991caf32c51cd9e0ace4454fd00216cf6 Gerrit-Change-Number: 21301 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 15 Apr 2024 16:34:40 + Gerrit-HasComments: No