Gergely Fürnstáhl has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )
Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator ...................................................................... Patch Set 10: (15 comments) http://gerrit.cloudera.org:8080/#/c/19850/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19850/10//COMMIT_MSG@13 PS10, Line 13: intergrate > nit: integrate Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc File be/src/exec/iceberg-delete-builder.cc: http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@305 PS10, Line 305: BigIntVal id = build_expr_evals_[0]->GetBigIntVal(build_row); : StringVal strval = build_expr_evals_[1]->GetStringVal(build_row); > This is on a critical code path, and we can write very specific code here. Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@307 PS10, Line 307: std::string filename = : std::string(reinterpret_cast<const char*>(strval.ptr), strval.len); > If we had StringVal as key in 'data_files_' and 'delete_hash' we wouldn't n Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@311 PS10, Line 311: delete_hash[std::move(filename)] > Instead of creating a new vector with 0 capacity at the first insertion, pr Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h File be/src/exec/iceberg-delete-node.h: http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@223 PS10, Line 223: bool hashed_ = false; : bool matched_ = false; > Please add comments for the members Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@225 PS10, Line 225: long long > nit: int64_t Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@225 PS10, Line 225: std::vector<long long>* current_delete_id_vec_; : std::vector<long long>::iterator current_delete_id_it_; : impala_udf::BigIntVal current_probe_id_; : uint8_t* current_filename_ = nullptr; : int current_file_size_; > Could be moved to a struct. Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@230 PS10, Line 230: size_t > nit: int32_t Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@231 PS10, Line 231: size_t > int32_t Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@234 PS10, Line 234: std::unordered_set<std::string> files_in_batch_; : std::unordered_set<uint8_t*> str_ptr_in_batch_; : std::unordered_map<std::string, std::vector<long long>>::iterator last_it_; > Could be moved to a struct. Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc File be/src/exec/iceberg-delete-node.cc: http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@402 PS10, Line 402: current_probe_id_ = probe_expr_evals_[0]->GetBigIntVal(current_probe_row_); : StringVal strval = probe_expr_evals_[1]->GetStringVal(current_probe_row_); > Instead of getting the values via virtual functions, we could just retrieve Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@408 PS10, Line 408: if (it == builder_->delete_hash.end()) { : matched_ = false; > Currently we are only scanning data files that have corresponding delete fi Done http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@416 PS10, Line 416: : : if (!hashed_) > nit: else Done http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java File fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java: http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@48 PS10, Line 48: * Hash join between left child (outer) and right child (inner). One child must be the : * plan generated for a table ref. Typically, that is the right child, but due to join : * inversion (for outer/semi/cross joins) it could also be the left child. > Copy-pasted comment, irrelevant to this operator. Done http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@83 PS10, Line 83: if (!t0.matchesType(t1)) { : // With decimal and char types, the child types do not have to match because : // the equality builtin handles it. However, they will not hash correctly so : // insert a cast. : boolean bothDecimal = t0.isDecimal() && t1.isDecimal(); : boolean bothString = t0.isStringType() && t1.isStringType(); : if (!bothDecimal && !bothString) { : throw new InternalException("Cannot compare " + t0.toSql() + " to " + t1.toSql() : + " in join predicate."); : } : Type compatibleType = : Type.getAssignmentCompatibleType(t0, t1, false, analyzer.isDecimalV2()); : if (compatibleType.isInvalid()) { : throw new InternalException( : String.format("Unable create a hash join with equi-join predicate %s " : + "because the operands cannot be cast without loss of precision. " : + "Operand types: %s = %s.", : eqPred.toSql(), t0.toSql(), t1.toSql())); : } : Preconditions.checkState( : compatibleType.isDecimal() || compatibleType.isStringType()); : try { : if (!t0.equals(compatibleType)) { : eqPred.setChild(0, eqPred.getChild(0).castTo(compatibleType)); : } : if (!t1.equals(compatibleType)) { : eqPred.setChild(1, eqPred.getChild(1).castTo(compatibleType)); : } : } catch (AnalysisException e) { : throw new InternalException("Should not happen", e); : } : } > Unneeded. I think we just need a couple of preconditions that we got the ex Done -- To view, visit http://gerrit.cloudera.org:8080/19850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa Gerrit-Change-Number: 19850 Gerrit-PatchSet: 10 Gerrit-Owner: Gergely Fürnstáhl <gfurnst...@cloudera.com> Gerrit-Reviewer: Anonymous Coward <lipeng...@apache.org> Gerrit-Reviewer: Gergely Fürnstáhl <gfurnst...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Thu, 15 Jun 2023 12:07:02 +0000 Gerrit-HasComments: Yes