Gergely Fürnstáhl has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom 
Iceberg Position Delete operator
......................................................................


Patch Set 10:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/19850/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19850/10//COMMIT_MSG@13
PS10, Line 13: intergrate
> nit: integrate
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@305
PS10, Line 305:     BigIntVal id = 
build_expr_evals_[0]->GetBigIntVal(build_row);
              :     StringVal strval = 
build_expr_evals_[1]->GetStringVal(build_row);
> This is on a critical code path, and we can write very specific code here.
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@307
PS10, Line 307:     std::string filename =
              :         std::string(reinterpret_cast<const char*>(strval.ptr), 
strval.len);
> If we had StringVal as key in 'data_files_' and 'delete_hash' we wouldn't n
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@311
PS10, Line 311: delete_hash[std::move(filename)]
> Instead of creating a new vector with 0 capacity at the first insertion, pr
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h
File be/src/exec/iceberg-delete-node.h:

http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@223
PS10, Line 223:   bool hashed_ = false;
              :   bool matched_ = false;
> Please add comments for the members
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@225
PS10, Line 225: long long
> nit: int64_t
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@225
PS10, Line 225:   std::vector<long long>* current_delete_id_vec_;
              :   std::vector<long long>::iterator current_delete_id_it_;
              :   impala_udf::BigIntVal current_probe_id_;
              :   uint8_t* current_filename_ = nullptr;
              :   int current_file_size_;
> Could be moved to a struct.
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@230
PS10, Line 230: size_t
> nit: int32_t
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@231
PS10, Line 231: size_t
> int32_t
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@234
PS10, Line 234:   std::unordered_set<std::string> files_in_batch_;
              :   std::unordered_set<uint8_t*> str_ptr_in_batch_;
              :   std::unordered_map<std::string, std::vector<long 
long>>::iterator last_it_;
> Could be moved to a struct.
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc
File be/src/exec/iceberg-delete-node.cc:

http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@402
PS10, Line 402:       current_probe_id_ = 
probe_expr_evals_[0]->GetBigIntVal(current_probe_row_);
              :       StringVal strval = 
probe_expr_evals_[1]->GetStringVal(current_probe_row_);
> Instead of getting the values via virtual functions, we could just retrieve
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@408
PS10, Line 408:       if (it == builder_->delete_hash.end()) {
              :         matched_ = false;
> Currently we are only scanning data files that have corresponding delete fi
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@416
PS10, Line 416:
              :
              :         if (!hashed_)
> nit: else
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java:

http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@48
PS10, Line 48:  * Hash join between left child (outer) and right child (inner). 
One child must be the
             :  * plan generated for a table ref. Typically, that is the right 
child, but due to join
             :  * inversion (for outer/semi/cross joins) it could also be the 
left child.
> Copy-pasted comment, irrelevant to this operator.
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@83
PS10, Line 83:       if (!t0.matchesType(t1)) {
             :         // With decimal and char types, the child types do not 
have to match because
             :         // the equality builtin handles it. However, they will 
not hash correctly so
             :         // insert a cast.
             :         boolean bothDecimal = t0.isDecimal() && t1.isDecimal();
             :         boolean bothString = t0.isStringType() && 
t1.isStringType();
             :         if (!bothDecimal && !bothString) {
             :           throw new InternalException("Cannot compare " + 
t0.toSql() + " to " + t1.toSql()
             :               + " in join predicate.");
             :         }
             :         Type compatibleType =
             :             Type.getAssignmentCompatibleType(t0, t1, false, 
analyzer.isDecimalV2());
             :         if (compatibleType.isInvalid()) {
             :           throw new InternalException(
             :               String.format("Unable create a hash join with 
equi-join predicate %s "
             :                       + "because the operands cannot be cast 
without loss of precision. "
             :                       + "Operand types: %s = %s.",
             :                   eqPred.toSql(), t0.toSql(), t1.toSql()));
             :         }
             :         Preconditions.checkState(
             :             compatibleType.isDecimal() || 
compatibleType.isStringType());
             :         try {
             :           if (!t0.equals(compatibleType)) {
             :             eqPred.setChild(0, 
eqPred.getChild(0).castTo(compatibleType));
             :           }
             :           if (!t1.equals(compatibleType)) {
             :             eqPred.setChild(1, 
eqPred.getChild(1).castTo(compatibleType));
             :           }
             :         } catch (AnalysisException e) {
             :           throw new InternalException("Should not happen", e);
             :         }
             :       }
> Unneeded. I think we just need a couple of preconditions that we got the ex
Done



--
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 10
Gerrit-Owner: Gergely Fürnstáhl <gfurnst...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <lipeng...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gfurnst...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Thu, 15 Jun 2023 12:07:02 +0000
Gerrit-HasComments: Yes

Reply via email to