Gergely Fürnstáhl has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )
Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator ...................................................................... Patch Set 10: (34 comments) Thanks, went through the backend, build/probe logic refactor and frontend is still TODO http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/hash-table.h File be/src/exec/hash-table.h: http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/hash-table.h@442 PS10, Line 442: friend class IcebergDeleteBuilder; : friend class IcebergDeleteNode; > Are these lines needed? Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h File be/src/exec/iceberg-delete-builder.h: http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@18 PS10, Line 18: #ifndef IMPALA_EXEC_ICEBERG_DELETE_BUILDER_H : #define IMPALA_EXEC_ICEBERG_DELETE_BUILDER_H > nit: use #pragma once Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@57 PS10, Line 57: a > nit: an Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@58 PS10, Line 58: /// IcebergDeleteNode. > nit: fits previous line Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@63 PS10, Line 63: a > nit: an Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@110 PS10, Line 110: each probe thread working on the same set of : /// partitions > Obsolete comment, as this operator doesn't partition its inputs. Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@120 PS10, Line 120: long long > nit: we use int64_t instead of long long in the Impala code Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@120 PS10, Line 120: h; > nit: missing '_' at and of member name. Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@157 PS10, Line 157: hash_partitions_ > There is no 'hash_partitions_' Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@160 PS10, Line 160: that that > nit: multiple 'that' Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@164 PS10, Line 164: Also used by > nit: end of sentence is missing Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc File be/src/exec/iceberg-delete-builder.cc: http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@58 PS10, Line 58: using strings::Substitute > nit: common/names.h already has this using. Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@95 PS10, Line 95: eq_join_conjuncts > We could add DCHECK that the size of it is 2. Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@127 PS10, Line 127: Hash > IcebergPositionDelete? Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@150 PS10, Line 150: Hash > IcebergPositionDelete? Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@206 PS10, Line 206: boost > Can we use std::filesystem? Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h File be/src/exec/iceberg-delete-node.h: http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@18 PS10, Line 18: #ifndef IMPALA_EXEC_ICEBERG_DELETE_NODE_H : #define IMPALA_EXEC_ICEBERG_DELETE_NODE_H > nit: use #pragma once Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@37 PS10, Line 37: class BloomFilter; > unused Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@39 PS10, Line 39: class IcebergDeleteNode; > not needed here Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@78 PS10, Line 78: t > nit: to early line break here Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@79 PS10, Line 79: build partitions > This solution doesn't use build partitions AFAICT. Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@81 PS10, Line 81: phase > nit: 'build phase', or 'this phase' instead of 'the phase' Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@89 PS10, Line 89: /// : /// : /// > nit: too many empty lines Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@104 PS10, Line 104: virtual > nit: 'virtual' keyword is redundant here that is the more common style in the code base, I would keep it that way http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@186 PS10, Line 186: Get the next row batch from the probe (left) side (child(0)) > This operator always gets the next row bratch from the probe side, right? Yes http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@210 PS10, Line 210: /// Additional state for flushing build-side data. Needed for : /// separate build. : /// TODO: IMPALA-9411: this could be removed if we attached the buffers. : bool flushed_unattachable_build_buffers_ = false; > Not relevant for this operator? Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc File be/src/exec/iceberg-delete-node.cc: http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@52 PS10, Line 52: using strings::Substitute > Unnecessary because of common/names.h Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@142 PS10, Line 142: > nit: unnecessary empty line Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@345 PS10, Line 345: ( > nit: I see no closing parenthesis. Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@359 PS10, Line 359: ; > nit: unnecessary ; Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@392 PS10, Line 392: { : } > empty block Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.inline.h File be/src/exec/iceberg-delete-node.inline.h: http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.inline.h@18 PS10, Line 18: #ifndef IMPALA_EXEC_ICEBERG_DELETE_NODE_INLINE_H : #define IMPALA_EXEC_ICEBERG_DELETE_NODE_INLINE_H > nit: use #pragma once Done http://gerrit.cloudera.org:8080/#/c/19850/10/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/19850/10/common/thrift/ImpalaService.thrift@795 PS10, Line 795: of > nit: off Done http://gerrit.cloudera.org:8080/#/c/19850/10/common/thrift/Query.thrift File common/thrift/Query.thrift: http://gerrit.cloudera.org:8080/#/c/19850/10/common/thrift/Query.thrift@638 PS10, Line 638: 159: optional TJoinDistributionMode optimized_iceberg_v2_read_distribution_mode_override > This is good for perf testing for now, but we should limit the number of qu yes, that is the plan -- To view, visit http://gerrit.cloudera.org:8080/19850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa Gerrit-Change-Number: 19850 Gerrit-PatchSet: 10 Gerrit-Owner: Gergely Fürnstáhl <gfurnst...@cloudera.com> Gerrit-Reviewer: Gergely Fürnstáhl <gfurnst...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Tue, 23 May 2023 10:29:47 +0000 Gerrit-HasComments: Yes