Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21452 )

Change subject: IMPALA-13088: (part 2) Parallelize final sorts in 
IcebergDeleteBuilder
......................................................................


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.h
File be/src/exec/iceberg-delete-builder.h:

http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.h@148
PS5, Line 148: Impl
To unambiguously differentiate it from FinalizeBuildImplParallel(), this could 
be called FinalizeBuildImplSequential().


http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc@266
PS5, Line 266: int probes_waiting;
             :   {
             :     unique_lock<mutex> l(separate_build_lock_);
             :     probes_waiting = probes_waiting_on_builder_;
             :   }
Do you think it would be worth extracting this to a member function in 
JoinBuilder? Then 'probes_waiting_on_builder_' could also be private.


http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc@275
PS5, Line 275:              min(data_file_count, probes_waiting) * 2 / 3);
Nit: we usually don't match the indentation of arguments but use 4 spaces for 
continuation indentation.


http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc@279
PS5, Line 279: auto
Using the actual type would be more readable. Applies also to L322 and L324.


http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc@296
PS5, Line 296: DeleteRowVector final_delete_vec = 
GetFinalDeleteRowVector(intermediate_vec);
             :     intermediate_vec.clear();
             :     unique_lock<mutex> l(separate_build_lock_);
             :     deleted_rows_[*path] = std::move(final_delete_vec);
This is the same as the loop body of FinalizeBuildImpl() with these exceptions:
1. there is locking here
2. this function uses BuildWorkItem instead of std::pair<const StringValue, 
DeleteRowVector>.

These could be unified in the same function so there would be no need for this 
lambda:
1. a boolean template parameter would decide  whether we need to lock and the 
function could use "constexpr if"
2. the ThreadPool could also use std::pair<const StringValue, DeleteRowVector> 
as its template parameter, so BuildWorkItem would no longer be needed.


http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc@304
PS5, Line 304: auto
Using the actual type would be more readable.


http://gerrit.cloudera.org:8080/#/c/21452/5/be/src/exec/iceberg-delete-builder.cc@319
PS5, Line 319: move
Because 'intermediate_vec' is a const ref, this will not actually move the 
vector but copy it, see
https://stackoverflow.com/questions/28595117/why-can-we-use-stdmove-on-a-const-object

So I think 'intermediate_vec' should not be const and then 
'intermediate_vec.clear()' could be brought here from L297 and L282.



--
To view, visit http://gerrit.cloudera.org:8080/21452
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ca946a452d061238255e9b0e2c81a51cac68807
Gerrit-Change-Number: 21452
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Mon, 17 Jun 2024 14:26:46 +0000
Gerrit-HasComments: Yes

Reply via email to