Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21452 )

Change subject: IMPALA-13088: (part 2) Parallelize final sorts in 
IcebergDeleteBuilder
......................................................................


Patch Set 6:

(6 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc@266
PS5, Line 266: int probes_waiting = NumProbesWaiting();
             :
             :   // Let's be conservative and only use 2/3 of available theads, 
because the scanner
             :   // threads are not as CPU heavy as the sort threads.
             :   r
> Do you think it would be worth extracting this to a member function in Join
Yeah, I think it's nice to extract it in a function. I didn't make 
'probes_waiting_on_builder_' private though, as currently it is in a nice /// 
BEGIN /// END section of members that I did not want to mess up.


http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc@275
PS5, Line 275:   for (auto& [path, intermediate_vec] : 
intermediate_delete_rows_) {
> Nit: we usually don't match the indentation of arguments but use 4 spaces f
For formulas I prefer readability over the 4 spaces rule.


http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc@279
PS5, Line 279:
> Using the actual type would be more readable. Applies also to L322 and L324
Actual type is a pair, I switched to structured binding.


http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc@296
PS5, Line 296: TURN_IF_ERROR(pool.Init());
             :   for (auto& [path, intermediate_vec] : 
intermediate_delete_rows_) {
             :     BuildWorkItem work_item;
             :     work_item.path = &path;
> This is the same as the loop body of FinalizeBuildImpl() with these excepti
Thanks for the suggestions.

About 1:
I'd have to do something like:

  if (SHOULD_LOCK) separate_build_lock_.lock()

  deleted_rows_[*path] = std::move(final_delete_vec);

  if (SHOULD_LOCK) separate_build_lock_.unlock();

I don't really like such constructs as we lose RAII.

To keep RAII, we could do:

 std::unique_lock<std::mutex> guard(separate_build_lock_, std::defer_lock_t{});
 if (SHOULD_LOCK) {
   guard.lock();
 }
 deleted_rows_[*path] = std::move(final_delete_vec);

But it feels a bit too complex, also we are creating an unnecessary guard 
object when SHOULD_LOCK is false (though the compiler might be smart enough to 
eliminate).

So for now I'd leave it in its current form. I'd also keep BuildWorkItem as it 
makes the code a bit more explicit / readable.


http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc@304
PS5, Line 304: medi
> Using the actual type would be more readable.
Switched to structured binding.


http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc@319
PS5, Line 319: 
> Because 'intermediate_vec' is a const ref, this will not actually move the
Nice catch!



--
To view, visit http://gerrit.cloudera.org:8080/21452
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ca946a452d061238255e9b0e2c81a51cac68807
Gerrit-Change-Number: 21452
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Mon, 17 Jun 2024 16:39:20 +0000
Gerrit-HasComments: Yes

Reply via email to