Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21452 )
Change subject: IMPALA-13088: (part 2) Parallelize final sorts in IcebergDeleteBuilder ...................................................................... Patch Set 6: (6 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc File be/src/exec/iceberg-delete-builder.cc: http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc@266 PS5, Line 266: int probes_waiting = NumProbesWaiting(); : : // Let's be conservative and only use 2/3 of available theads, because the scanner : // threads are not as CPU heavy as the sort threads. : r > Do you think it would be worth extracting this to a member function in Join Yeah, I think it's nice to extract it in a function. I didn't make 'probes_waiting_on_builder_' private though, as currently it is in a nice /// BEGIN /// END section of members that I did not want to mess up. http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc@275 PS5, Line 275: for (auto& [path, intermediate_vec] : intermediate_delete_rows_) { > Nit: we usually don't match the indentation of arguments but use 4 spaces f For formulas I prefer readability over the 4 spaces rule. http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc@279 PS5, Line 279: > Using the actual type would be more readable. Applies also to L322 and L324 Actual type is a pair, I switched to structured binding. http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc@296 PS5, Line 296: TURN_IF_ERROR(pool.Init()); : for (auto& [path, intermediate_vec] : intermediate_delete_rows_) { : BuildWorkItem work_item; : work_item.path = &path; > This is the same as the loop body of FinalizeBuildImpl() with these excepti Thanks for the suggestions. About 1: I'd have to do something like: if (SHOULD_LOCK) separate_build_lock_.lock() deleted_rows_[*path] = std::move(final_delete_vec); if (SHOULD_LOCK) separate_build_lock_.unlock(); I don't really like such constructs as we lose RAII. To keep RAII, we could do: std::unique_lock<std::mutex> guard(separate_build_lock_, std::defer_lock_t{}); if (SHOULD_LOCK) { guard.lock(); } deleted_rows_[*path] = std::move(final_delete_vec); But it feels a bit too complex, also we are creating an unnecessary guard object when SHOULD_LOCK is false (though the compiler might be smart enough to eliminate). So for now I'd leave it in its current form. I'd also keep BuildWorkItem as it makes the code a bit more explicit / readable. http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc@304 PS5, Line 304: medi > Using the actual type would be more readable. Switched to structured binding. http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc@319 PS5, Line 319: > Because 'intermediate_vec' is a const ref, this will not actually move the Nice catch! -- To view, visit http://gerrit.cloudera.org:8080/21452 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7ca946a452d061238255e9b0e2c81a51cac68807 Gerrit-Change-Number: 21452 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Mon, 17 Jun 2024 16:39:20 +0000 Gerrit-HasComments: Yes