Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21452 )
Change subject: IMPALA-13088: (part 2) Parallelize final sorts in IcebergDeleteBuilder ...................................................................... Patch Set 5: (7 comments) http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.h File be/src/exec/iceberg-delete-builder.h: http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.h@148 PS5, Line 148: Impl To unambiguously differentiate it from FinalizeBuildImplParallel(), this could be called FinalizeBuildImplSequential(). http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc File be/src/exec/iceberg-delete-builder.cc: http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc@266 PS5, Line 266: int probes_waiting; : { : unique_lock<mutex> l(separate_build_lock_); : probes_waiting = probes_waiting_on_builder_; : } Do you think it would be worth extracting this to a member function in JoinBuilder? Then 'probes_waiting_on_builder_' could also be private. http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc@275 PS5, Line 275: min(data_file_count, probes_waiting) * 2 / 3); Nit: we usually don't match the indentation of arguments but use 4 spaces for continuation indentation. http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc@279 PS5, Line 279: auto Using the actual type would be more readable. Applies also to L322 and L324. http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc@296 PS5, Line 296: DeleteRowVector final_delete_vec = GetFinalDeleteRowVector(intermediate_vec); : intermediate_vec.clear(); : unique_lock<mutex> l(separate_build_lock_); : deleted_rows_[*path] = std::move(final_delete_vec); This is the same as the loop body of FinalizeBuildImpl() with these exceptions: 1. there is locking here 2. this function uses BuildWorkItem instead of std::pair<const StringValue, DeleteRowVector>. These could be unified in the same function so there would be no need for this lambda: 1. a boolean template parameter would decide whether we need to lock and the function could use "constexpr if" 2. the ThreadPool could also use std::pair<const StringValue, DeleteRowVector> as its template parameter, so BuildWorkItem would no longer be needed. http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc@304 PS5, Line 304: auto Using the actual type would be more readable. http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc@319 PS5, Line 319: move Because 'intermediate_vec' is a const ref, this will not actually move the vector but copy it, see https://stackoverflow.com/questions/28595117/why-can-we-use-stdmove-on-a-const-object So I think 'intermediate_vec' should not be const and then 'intermediate_vec.clear()' could be brought here from L297 and L282. -- To view, visit http://gerrit.cloudera.org:8080/21452 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7ca946a452d061238255e9b0e2c81a51cac68807 Gerrit-Change-Number: 21452 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Mon, 17 Jun 2024 14:26:46 +0000 Gerrit-HasComments: Yes