[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/21435/9/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/21435/9/tests/query_test/test_scanners.py@528 PS9, Line 528: def test_iceberg_query_v2(self, vector): > I think it was needed for the second part for some reason. I reverted it fr Okay, so this was needed because of a new DCHECK in iceberg-delete-builder.cc. I'll need to re-add this to the squashed patch as well. -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 03 Jul 2024 12:13:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 9: (1 comment) > Patch Set 9: > > This is substantially rewritten with RoaringBitmaps. Do you plan to combine > them? I kept them separate because of the code review. But since now all reviewers are familiar with both patches I see no good reason to not squash them. http://gerrit.cloudera.org:8080/#/c/21435/9/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/21435/9/tests/query_test/test_scanners.py@528 PS9, Line 528: def test_iceberg_query_v2(self, vector): > I'm not sure I understand why this test change was needed but I most probab I think it was needed for the second part for some reason. I reverted it from the squashed patch for now. -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 01 Jul 2024 13:00:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Zoltan Borok-Nagy has abandoned this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Abandoned Abandoning in favor of https://gerrit.cloudera.org/c/21557/ -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 9: This is substantially rewritten with RoaringBitmaps. Do you plan to combine them? -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 28 Jun 2024 21:44:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/21435/9/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/21435/9/tests/query_test/test_scanners.py@528 PS9, Line 528: def test_iceberg_query_v2(self, vector): I'm not sure I understand why this test change was needed but I most probably miss something here. For me the test changes in this patch seem irrelevant to the purpose of the performance improvement of the patch. -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 25 Jun 2024 12:58:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16396/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 19 Jun 2024 11:43:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 9: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 19 Jun 2024 11:25:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 9: (2 comments) http://gerrit.cloudera.org:8080/#/c/21435/8/be/src/exec/iceberg-delete-builder.cc File be/src/exec/iceberg-delete-builder.cc: http://gerrit.cloudera.org:8080/#/c/21435/8/be/src/exec/iceberg-delete-builder.cc@319 PS8, Line 319: ) > Now that this function doesn't return void, I think it helps readability to Done http://gerrit.cloudera.org:8080/#/c/21435/8/testdata/workloads/functional-query/queries/QueryTest/iceberg-query-v2.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-query-v2.test: http://gerrit.cloudera.org:8080/#/c/21435/8/testdata/workloads/functional-query/queries/QueryTest/iceberg-query-v2.test@5 PS8, Line 5: UNION node which behaves : # differently > Does it mean that the result will be different? Or only that we can't asser The results can be different because how rand() is being used in the fragment instance. I haven't insvestigated all the details, but we get different results when both data files are scanned in the same fragment VS the data files are scheduled on different nodes. -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 19 Jun 2024 11:18:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21435 to look at the new patch set (#9). Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder When there are lots of delete records the IcebergDeleteBuilder can become a bottleneck. Since the left side of the JOIN is blocked on the build side any improvement we make here significantly improves Iceberg V2 table scanning. Improvements of this patch: * Use a vector of vectors to collect the position delete records. This way we can avoid large re-allocations and copyings. * Insert large ranges from the build batches into the collected delete records instead of doing it one-by-one. Measurements Local measurement with 824 Million position delete records: JOIN BUILD: ~32s -> ~14s (6s is the final sorting) 40-node cluster with 68.5 Billion position delete records: JOIN BUILD: 4m15s -> 1m45s (1m7s is the final sorting) Parallelization of the final sort will be added in a follow-up CR. Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf --- M be/src/exec/iceberg-delete-builder.cc M be/src/exec/iceberg-delete-builder.h A testdata/workloads/functional-query/queries/QueryTest/iceberg-query-v2.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test M tests/query_test/test_scanners.py 5 files changed, 164 insertions(+), 60 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21435/9 -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 8: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/21435/8/be/src/exec/iceberg-delete-builder.cc File be/src/exec/iceberg-delete-builder.cc: http://gerrit.cloudera.org:8080/#/c/21435/8/be/src/exec/iceberg-delete-builder.cc@319 PS8, Line 319: ) Now that this function doesn't return void, I think it helps readability to add annotation for the return value "-> Status". http://gerrit.cloudera.org:8080/#/c/21435/8/testdata/workloads/functional-query/queries/QueryTest/iceberg-query-v2.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-query-v2.test: http://gerrit.cloudera.org:8080/#/c/21435/8/testdata/workloads/functional-query/queries/QueryTest/iceberg-query-v2.test@5 PS8, Line 5: UNION node which behaves : # differently Does it mean that the result will be different? Or only that we can't assert specific values in the profile? -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 19 Jun 2024 09:32:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 8: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 19 Jun 2024 02:01:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 7: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10732/ -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 18 Jun 2024 21:26:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16389/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 18 Jun 2024 21:02:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21435 to look at the new patch set (#8). Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder When there are lots of delete records the IcebergDeleteBuilder can become a bottleneck. Since the left side of the JOIN is blocked on the build side any improvement we make here significantly improves Iceberg V2 table scanning. Improvements of this patch: * Use a vector of vectors to collect the position delete records. This way we can avoid large re-allocations and copyings. * Insert large ranges from the build batches into the collected delete records instead of doing it one-by-one. Measurements Local measurement with 824 Million position delete records: JOIN BUILD: ~32s -> ~14s (6s is the final sorting) 40-node cluster with 68.5 Billion position delete records: JOIN BUILD: 4m15s -> 1m45s (1m7s is the final sorting) Parallelization of the final sort will be added in a follow-up CR. Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf --- M be/src/exec/iceberg-delete-builder.cc M be/src/exec/iceberg-delete-builder.h A testdata/workloads/functional-query/queries/QueryTest/iceberg-query-v2.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test M tests/query_test/test_scanners.py 5 files changed, 164 insertions(+), 60 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21435/8 -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 8: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10736/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 18 Jun 2024 20:39:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/21435/7/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/21435/7/tests/query_test/test_scanners.py@526 PS7, Line 526: @ flake8: F811 redefinition of unused 'test_iceberg_query' from line 520 -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 18 Jun 2024 17:14:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16384/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 18 Jun 2024 16:49:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21435 to look at the new patch set (#7). Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder When there are lots of delete records the IcebergDeleteBuilder can become a bottleneck. Since the left side of the JOIN is blocked on the build side any improvement we make here significantly improves Iceberg V2 table scanning. Improvements of this patch: * Use a vector of vectors to collect the position delete records. This way we can avoid large re-allocations and copyings. * Insert large ranges from the build batches into the collected delete records instead of doing it one-by-one. Measurements Local measurement with 824 Million position delete records: JOIN BUILD: ~32s -> ~14s (6s is the final sorting) 40-node cluster with 68.5 Billion position delete records: JOIN BUILD: 4m15s -> 1m45s (1m7s is the final sorting) Parallelization of the final sort will be added in a follow-up CR. Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf --- M be/src/exec/iceberg-delete-builder.cc M be/src/exec/iceberg-delete-builder.h A testdata/workloads/functional-query/queries/QueryTest/iceberg-query-v2.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test M tests/query_test/test_scanners.py 5 files changed, 164 insertions(+), 60 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21435/7 -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10732/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 18 Jun 2024 16:17:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 6: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10729/ -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 18 Jun 2024 14:41:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10729/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 18 Jun 2024 09:29:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 6: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 17 Jun 2024 15:32:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16368/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 17 Jun 2024 15:12:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 6: (3 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/21435/5/be/src/exec/iceberg-delete-builder.cc File be/src/exec/iceberg-delete-builder.cc: http://gerrit.cloudera.org:8080/#/c/21435/5/be/src/exec/iceberg-delete-builder.cc@305 PS5, Line 305: vect > I think using the actual type (vector&) is more readable. Done http://gerrit.cloudera.org:8080/#/c/21435/5/be/src/exec/iceberg-delete-builder.cc@327 PS5, Line 327: join > "this IcebergDeleteBuilder"? It was intentional from me to write 'join', as the associated JOIN fragment processes the data files, and in the builder's context it should be clear which 'join' we are referring to. http://gerrit.cloudera.org:8080/#/c/21435/5/be/src/exec/iceberg-delete-builder.cc@328 PS5, Line 328: process > Nit: "processes". Done -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 17 Jun 2024 14:47:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21435 to look at the new patch set (#6). Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder When there are lots of delete records the IcebergDeleteBuilder can become a bottleneck. Since the left side of the JOIN is blocked on the build side any improvement we make here significantly improves Iceberg V2 table scanning. Improvements of this patch: * Use a vector of vectors to collect the position delete records. This way we can avoid large re-allocations and copyings. * Insert large ranges from the build batches into the collected delete records instead of doing it one-by-one. Measurements Local measurement with 824 Million position delete records: JOIN BUILD: ~32s -> ~14s (6s is the final sorting) 40-node cluster with 68.5 Billion position delete records: JOIN BUILD: 4m15s -> 1m45s (1m7s is the final sorting) Parallelization of the final sort will be added in a follow-up CR. Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf --- M be/src/exec/iceberg-delete-builder.cc M be/src/exec/iceberg-delete-builder.h 2 files changed, 113 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21435/6 -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 5: (4 comments) Thanks Zoltán! http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc File be/src/exec/iceberg-delete-builder.cc: http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc@242 PS3, Line 242: auto > The actual type is a pair. Yes, it's also a pity that structured bindings don't allow type annotation. http://gerrit.cloudera.org:8080/#/c/21435/5/be/src/exec/iceberg-delete-builder.cc File be/src/exec/iceberg-delete-builder.cc: http://gerrit.cloudera.org:8080/#/c/21435/5/be/src/exec/iceberg-delete-builder.cc@305 PS5, Line 305: auto I think using the actual type (vector&) is more readable. http://gerrit.cloudera.org:8080/#/c/21435/5/be/src/exec/iceberg-delete-builder.cc@327 PS5, Line 327: join "this IcebergDeleteBuilder"? http://gerrit.cloudera.org:8080/#/c/21435/5/be/src/exec/iceberg-delete-builder.cc@328 PS5, Line 328: process Nit: "processes". -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 17 Jun 2024 14:41:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16366/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 17 Jun 2024 14:11:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 5: (9 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.h File be/src/exec/iceberg-delete-builder.h: http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.h@70 PS3, Line 70: s > Nit: this should be "Processes". Done http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.h@121 PS3, Line 121: ctor of vectors means we can just allocate another vector to hold : // subsequent eleme > Could you clarify this? Is it about 'intermediate_delete_rows_' in addition Rephrased a bit, I hope it's clearer now. http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.h@149 PS3, Line 149: s > We usually take non-const (output) arguments by pointer. Done, also switched the parameters, as we prefer to have output parameters at the end. http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc File be/src/exec/iceberg-delete-builder.cc: http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc@184 PS3, Line 184: Inte > I think using the actual type (IntermediateDeleteRowVector&) is more readab Done http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc@242 PS3, Line 242: auto > I think using the actual type (IntermediateDeleteRowVector&) is more readab The actual type is a pair. I switched to structured binding as I think with proper variable names it is more readable. http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc@250 PS3, Line 250: e_ve > I think using the actual type (vector) is more readable. Also L252 Done http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc@296 PS3, Line 296: cons > I think using the actual type (vector) is more readable. Also L306 Done http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc@319 PS3, Line 319: > Can we take it by const ref? Done http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc@327 PS3, Line 327: > Can it happen that (*path) is not a key in 'intermediate_delete_rows_' but Done -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 17 Jun 2024 13:47:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21435 to look at the new patch set (#5). Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder When there are lots of delete records the IcebergDeleteBuilder can become a bottleneck. Since the left side of the JOIN is blocked on the build side any improvement we make here significantly improves Iceberg V2 table scanning. Improvements of this patch: * Use a vector of vectors to collect the position delete records. This way we can avoid large re-allocations and copyings. * Insert large ranges from the build batches into the collected delete records instead of doing it one-by-one. Measurements Local measurement with 824 Million position delete records: JOIN BUILD: ~32s -> ~14s (6s is the final sorting) 40-node cluster with 68.5 Billion position delete records: JOIN BUILD: 4m15s -> 1m45s (1m7s is the final sorting) Parallelization of the final sort will be added in a follow-up CR. Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf --- M be/src/exec/iceberg-delete-builder.cc M be/src/exec/iceberg-delete-builder.h 2 files changed, 113 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21435/5 -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 3: (9 comments) http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.h File be/src/exec/iceberg-delete-builder.h: http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.h@70 PS3, Line 70: d Nit: this should be "Processes". http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.h@121 PS3, Line 121: The newly added vectors have double capacity so we don't end up having : // too many vectors Could you clarify this? Is it about 'intermediate_delete_rows_' in addition to the final map, or the embedded vectors? http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.h@149 PS3, Line 149: & We usually take non-const (output) arguments by pointer. http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc File be/src/exec/iceberg-delete-builder.cc: http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc@184 PS3, Line 184: auto I think using the actual type (IntermediateDeleteRowVector&) is more readable. http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc@242 PS3, Line 242: auto I think using the actual type (IntermediateDeleteRowVector&) is more readable. http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc@250 PS3, Line 250: auto I think using the actual type (vector) is more readable. Also L252. http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc@296 PS3, Line 296: auto I think using the actual type (vector) is more readable. Also L306. http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc@319 PS3, Line 319: StringValue* Can we take it by const ref? http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc@327 PS3, Line 327: } Can it happen that (*path) is not a key in 'intermediate_delete_rows_' but path->Len() != 0? If not, we could add DCHECK(false) in a following ELSE branch, with an explaining comment. -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 14 Jun 2024 16:37:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 3: (6 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.h File be/src/exec/iceberg-delete-builder.h: http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.h@95 PS2, Line 95: // Collects the processed data files' file paths and fills 'intermediate_delete_rows_' > The function comment is not valid now. Would be nice to mention that this p Done http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.h@124 PS2, Line 124: // Keys are data file paths processed by the join operator. Values are the file > Can you add a comment what the keys and values are for in this map? Done http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.h@146 PS2, Line 146: : /// Inserts the conte > nit: seems a weird place to break the line Done http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.cc File be/src/exec/iceberg-delete-builder.cc: http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.cc@184 PS2, Line 184: auto& delete_vector = > To increase readability, do you think it'd make sense to introduce a variab Done http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.cc@259 PS2, Line 259: rrent delet > nit: I'd break the line before the first param, and then each param could f Done http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.cc@325 PS2, Line 325: state->LogError( > KrpcDataStreamSender already filters out NULL file_paths. Is there a need t We can see NULLs when NUM_NODES=1 -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 13 Jun 2024 16:31:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16333/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 13 Jun 2024 16:54:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Hello Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21435 to look at the new patch set (#3). Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder When there are lots of delete records the IcebergDeleteBuilder can become a bottleneck. Since the left side of the JOIN is blocked on the build side any improvement we make here significantly improves Iceberg V2 table scanning. Improvements of this patch: * Use a vector of vectors to collect the position delete records. This way we can avoid large re-allocations and copyings. * Insert large ranges from the build batches into the collected delete records instead of doing it one-by-one. Measurements Local measurement with 824 Million position delete records: JOIN BUILD: ~32s -> ~14s (6s is the final sorting) 40-node cluster with 68.5 Billion position delete records: JOIN BUILD: 4m15s -> 1m45s (1m7s is the final sorting) Parallelization of the final sort will be added in a follow-up CR. Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf --- M be/src/exec/iceberg-delete-builder.cc M be/src/exec/iceberg-delete-builder.h 2 files changed, 108 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21435/3 -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 2: (6 comments) http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.h File be/src/exec/iceberg-delete-builder.h: http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.h@95 PS2, Line 95: // Checks distribution mode and collects the processed data files' file path in case The function comment is not valid now. Would be nice to mention that this populates the keys of intermediate_delete_rows_ and adds empty vectors to the values. http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.h@124 PS2, Line 124: using IntermediateDeleteRowHashTable = Can you add a comment what the keys and values are for in this map? http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.h@146 PS2, Line 146: IntermediateDeleteRowVector& : intermediate_vector nit: seems a weird place to break the line http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.cc File be/src/exec/iceberg-delete-builder.cc: http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.cc@184 PS2, Line 184: retval.first->second. To increase readability, do you think it'd make sense to introduce a variable/reference for 'retval.first->second'? http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.cc@259 PS2, Line 259: std::unique nit: I'd break the line before the first param, and then each param could fit into a (separate) single line. http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.cc@325 PS2, Line 325: ErrorMsg(TErrorCode::GENERAL, "NULL found as file_path in delete file")); KrpcDataStreamSender already filters out NULL file_paths. Is there a need to add an error message here too? Can't we DCHECK for non-nullness? Or is this because of some data corruption scenario? -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 30 May 2024 14:58:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16211/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 23 May 2024 13:09:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21435 to look at the new patch set (#2). Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder When there are lots of delete records the IcebergDeleteBuilder can become a bottleneck. Since the left side of the JOIN is blocked on the build side any improvement we make here significantly improves Iceberg V2 table scanning. Improvements of this patch: * Use a vector of vectors to collect the position delete records. This way we can avoid large re-allocations and copyings. * Insert large ranges from the build batches into the collected delete records instead of doing it one-by-one. Measurements Local measurement with 824 Million position delete records: JOIN BUILD: ~32s -> ~14s (6s is the final sorting) 40-node cluster with 68.5 Billion position delete records: JOIN BUILD: 4m15s -> 1m45s (1m7s is the final sorting) Parallelization of the final sort will be added in a follow-up CR. Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf (cherry picked from commit d08315fe5c57ccb5b197cd196b62eeedf7d90ec3) --- M be/src/exec/iceberg-delete-builder.cc M be/src/exec/iceberg-delete-builder.h 2 files changed, 101 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21435/2 -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 1: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10655/ -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 21 May 2024 19:54:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Zoltan Borok-Nagy has removed a vote on this change. Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Removed Verified-1 by Impala Public Jenkins -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10655/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 21 May 2024 14:53:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 1: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10646/ -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 16 May 2024 21:28:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10646/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 16 May 2024 16:25:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21435 Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder When there are lots of delete records the IcebergDeleteBuilder can become a bottleneck. Since the left side of the JOIN is blocked on the build side any improvement we make here significantly improves Iceberg V2 table scanning. Improvements of this patch: * Use a vector of vectors to collect the position delete records. This way we can avoid large re-allocations and copyings. * Insert large ranges from the build batches into the collected delete records instead of doing it one-by-one. Measurements Local measurement with 824 Million position delete records: JOIN BUILD: ~32s -> ~14s (6s is the final sorting) 40-node cluster with 68.5 Billion position delete records: JOIN BUILD: 4m15s -> 1m45s (1m7s is the final sorting) Parallelization of the final sort will be added in a follow-up CR. Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf --- M be/src/exec/iceberg-delete-builder.cc M be/src/exec/iceberg-delete-builder.h 2 files changed, 101 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21435/1 -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy