Gergely Fürnstáhl has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg 
Position Delete operator
......................................................................


Patch Set 20:

(31 comments)

http://gerrit.cloudera.org:8080/#/c/19850/20//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19850/20//COMMIT_MSG@9
PS20, Line 9: Implemented new exec node to optimize iceberg v2 read performance.
> Please elaborate on how this new operator works.
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.h
File be/src/exec/iceberg-delete-builder.h:

http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.h@50
PS20, Line 50: Since it is expected to only be
             : /// created and used by IcebergDeletePlanNode only, the 
DataSinkConfig::Init()
             : /// and DataSinkConfig::CreateSink() are not implemented for it.
> I see this comment is also present at PhjBuilderConfig, but I'm not sure if
Good catch, skimmed through git history, the comment is obsolate now for PHJ too


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.h@57
PS20, Line 57: a
> nit: an
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.h@95
PS20, Line 95: ///
> Please write a short summary about how the builder works.
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.h@100
PS20, Line 100:
> nit: unnecessary empty line.
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.h@158
PS20, Line 158: spilling partitions
> This builder doesn't partition its data and doesn't spill.
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.h@180
PS20, Line 180: const
> Could be 'static const', or 'static constexpr'
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.h@180
PS20, Line 180: _
> If it will be a static member then we'll not need the last '_'.
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.cc@21
PS20, Line 21: #include <iomanip>
> Unused?
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.cc@156
PS20, Line 156:         // if(delete_scan_node == nullptr){}
> Code line commented out
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.cc@175
PS20, Line 175:       auto tuple_id_ = 
delete_scan_node->tnode_->hdfs_scan_node.tuple_id;
              :       auto tuple_desc_ = 
runtime_state_->desc_tbl().GetTupleDescriptor(tuple_id_);
              :       DCHECK(tuple_desc_->table_desc() != NULL);
              :       auto hdfs_table_ =
              :           static_cast<const 
HdfsTableDescriptor*>(tuple_desc_->table_desc());
              :       HdfsPartitionDescriptor* partition_desc =
              :           hdfs_table_->GetPartition(split.partition_id());
> These should be the same for all the splits as there's a single HMS partiti
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.cc@184
PS20, Line 184: hdfs_table_->IsIcebergTable()
> This should be a DCHECK I think.
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.cc@200
PS20, Line 200:       DCHECK(retval.second == true);
> We wn't hit this in RELEASE build, so the next statement would probably cra
IIRC this is not even an issue, as we can read multiple splits of the same file.


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.cc@289
PS20, Line 289: inline bool IcebergDeleteBuilder::AppendRow(
              :     BufferedTupleStream* stream, TupleRow* row, Status* status) 
{
              :   if (LIKELY(stream->AddRow(row, status))) return true;
              :   return false;
              : }
> Unused?
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.cc@331
PS20, Line 331: auto it = deleted_rows_.find(*file_path);
> nit: for readability, can we move this out of the if stmt?
Done, maybe in a few years... :)


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h
File be/src/exec/iceberg-delete-node.h:

http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h@61
PS20, Line 61: s
> Isn't it only one hash table?
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h@68
PS20, Line 68:
> + 'that'?
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h@81
PS20, Line 81:
> Redundant empty line
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h@88
PS20, Line 88: virtual
> 'virtual' keywords are redundant here
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h@106
PS20, Line 106: ,
> nit: .?
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h@119
PS20, Line 119:
> Unnecessary empty line
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h@141
PS20, Line 141: the
> multiple 'the'
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h@212
PS20, Line 212: const
> Can be 'static const' or 'constexpr'. Also the last '_' is not needed in th
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc
File be/src/exec/iceberg-delete-node.cc:

http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc@25
PS20, Line 25: #include "codegen/llvm-codegen.h"
> Unused
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc@163
PS20, Line 163: }
              : v
> missing empty line
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc@166
PS20, Line 166: attached attached
> multiple 'attached'
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc@218
PS20, Line 218:   // Putting SCOPED_TIMER in the IR version of 
ProcessProbeBatch() causes weird exception
              :   // handling IR in the xcompiled function, so call it here 
instead.
> This operator doesn't do codegen.
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc@333
PS20, Line 333: namespace impala {
> This line could be moved to L46 where we have 'using namespace impala'
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc@347
PS20, Line 347:   current_probe_pos_ = *probe_pos;
> There could be a fast path when file_path == previous_file_path_
As we discussed. that would needs a full string comparison, which would be 
somewhat faster on the fast path (we spare a hashing and possible (but 
unlikely) collision resolutions), but a costly on the slow path. It still worth 
it if we assume there are bunch of deletes per file


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc@354
PS20, Line 354: current_file_path_ != previous_file_path_
> Why do we need both 'current_file_path_' when we also have the parameter 'f
If we use the objects from the unordered map, we can compare the pointers, 
file_path coming from the Tuple


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc@380
PS20, Line 380: current_deleted_pos_row_id_
> To me it seems like that the code has the assumption that the file position
Done



--
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 20
Gerrit-Owner: Gergely Fürnstáhl <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Gergely Fürnstáhl <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 28 Jun 2023 13:21:57 +0000
Gerrit-HasComments: Yes

Reply via email to