Re: [PR] perf(spark): Parse bucket index hash-field config once instead of per… [hudi]

via GitHub Fri, 12 Jun 2026 22:39:11 -0700


voonhous commented on code in PR #18979:
URL: https://github.com/apache/hudi/pull/18979#discussion_r3407479318



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/SparkBucketIndexPartitioner.java:
##########
@@ -129,7 +132,7 @@ public int getPartition(Object key) {
     Option<HoodieRecordLocation> location = keyLocation._2;
     int bucketId = location.isPresent()
         ? BucketIdentifier.bucketIdFromFileId(location.get().getFileId())
-        : BucketIdentifier.getBucketId(keyLocation._1.getRecordKey(), indexKeyField, numBuckets);
+        : BucketIdentifier.getBucketId(keyLocation._1.getRecordKey(), indexKeyFieldList, numBuckets);

Review Comment:
   Good catch -- folded all three Spark sites into this PR: 
`SparkPartitionBucketIndexPartitioner` (the default partition-level 
simple-bucket partitioner), and the two consistent-hashing paths 
`ConsistentBucketIndexBulkInsertPartitionerWithRows` and 
`SingleSparkJobConsistentHashingExecutionStrategy`. Each now precomputes 
`KeyGenUtils.getIndexKeyFields(...)` once and calls the existing `List` 
overload.
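   A minimal Java sketch of this precompute-once pattern follows. It assumes a comma-separated hash-field config and complex record keys of the form `field:value,field:value`. `parseIndexKeyFields`, the two `bucketId` overloads, and `Partitioner` are hypothetical stand-ins for `KeyGenUtils.getIndexKeyFields(...)`, the `String` and `List` overloads of `BucketIdentifier.getBucketId`, and a Spark partitioner. The hash is a toy and is not Hudi's bucket hash. The sketch counts config parses for the per-record path and for a constructor-time precompute:

```java
import java.util.ArrayList;
import java.util.List;

public class BucketIdSketch {
  // Counts config parses so the two call patterns can be compared.
  static int parseCalls = 0;

  // Hypothetical stand-in for KeyGenUtils.getIndexKeyFields(...):
  // splits a comma-separated config into trimmed, non-empty field names.
  static List<String> parseIndexKeyFields(String config) {
    parseCalls++;
    List<String> fields = new ArrayList<>();
    for (String f : config.split(",")) {
      String t = f.trim();
      if (!t.isEmpty()) {
        fields.add(t);
      }
    }
    return fields;
  }

  // Hypothetical List overload: assumes "field:value,..." keys and hashes
  // only the values of the index fields (toy hash, not Hudi's).
  static int bucketId(String recordKey, List<String> indexKeyFields, int numBuckets) {
    List<String> values = new ArrayList<>();
    for (String part : recordKey.split(",")) {
      int colon = part.indexOf(':');
      if (colon > 0 && indexKeyFields.contains(part.substring(0, colon))) {
        values.add(part.substring(colon + 1));
      }
    }
    if (values.isEmpty()) {
      values.add(recordKey); // simple key: hash the whole key
    }
    return (values.hashCode() & Integer.MAX_VALUE) % numBuckets;
  }

  // String overload: convenient, but re-parses the config on every call.
  static int bucketId(String recordKey, String indexKeyFieldsConfig, int numBuckets) {
    return bucketId(recordKey, parseIndexKeyFields(indexKeyFieldsConfig), numBuckets);
  }

  // Partitioner-style holder: parse once in the constructor, then reuse
  // the parsed list for every record via the List overload.
  static final class Partitioner {
    private final List<String> indexKeyFields;
    private final int numBuckets;

    Partitioner(String indexKeyFieldsConfig, int numBuckets) {
      this.indexKeyFields = parseIndexKeyFields(indexKeyFieldsConfig);
      this.numBuckets = numBuckets;
    }

    int getPartition(String recordKey) {
      return bucketId(recordKey, indexKeyFields, numBuckets);
    }
  }

  public static void main(String[] args) {
    String config = "id,region";
    String[] keys = {"id:1,region:eu,ts:10", "id:2,region:us,ts:11", "id:3,region:eu,ts:12"};

    parseCalls = 0;
    for (String k : keys) {
      bucketId(k, config, 8); // per-record parse
    }
    System.out.println("per-record parses: " + parseCalls);

    parseCalls = 0;
    Partitioner p = new Partitioner(config, 8); // single parse
    for (String k : keys) {
      p.getPartition(k);
    }
    System.out.println("precomputed parses: " + parseCalls);
  }
}
```

   Running `main` prints 3 parses for the per-record path and 1 for the precomputed path. The precomputed count stays at 1 however many records the partitioner sees, and both paths assign the same bucket to a given key.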
   
   I also swept the remaining bucket-index call sites to confirm there are no 
others: everything else (`BucketIndexBulkInsertPartitioner`, 
`HoodieBucketIndex` / `HoodieSimpleBucketIndex` / 
`HoodieConsistentBucketIndex`, `SparkConsistentBucketDuplicateUpdateStrategy`, 
and the read-side `BucketIndexSupport`) already takes a `List`, so these three 
were the only Spark sites still re-parsing the config.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.