wombatu-kun commented on code in PR #18826:
URL: https://github.com/apache/hudi/pull/18826#discussion_r3411486587


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -824,28 +824,54 @@ private Pair<Integer, HoodieData<HoodieRecord>> 
initializeRecordIndexPartition(
         dataMetaClient,
         dataWriteConfig);
 
-    // Initialize the file groups
-    final int fileGroupCount = estimateFileGroupCount(records);
+    Pair<Integer, Integer> bounds = getRLIFileGroupCountBounds();

Review Comment:
   Verified: line 846 was the only persist() in this path, so after the change 
recordIndexRecords is never cached and recordIndexRecords.unpersist() at line 
774 is now dead. With validation enabled, validateRecordIndex (line 885, 
recordIndexRecords.count()) recomputes the full key-reading pipeline that the 
bulkCommit at line 768 already materialized once. The guard option above is the 
targeted fix: persist right after readRecordKeysFromFileSliceSnapshot only when 
isRecordIndexInitializationValidationEnabled() is set and keep the unpersist, so 
the common no-validation path keeps the no-persist win.
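
To make the cost argument concrete, here is a minimal, self-contained sketch of the guarded persist. It does not use Hudi classes: `LazyData`, `pipelineRuns`, and the `guardedPersist` flag are hypothetical stand-ins I made up for illustration, with `collect()` playing the role of bulkCommit and `count()` the role of validateRecordIndex. It only assumes that the real dataset is lazily evaluated and recomputed on each action unless persisted.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class GuardedPersistSketch {

  /** Hypothetical stand-in for a lazily evaluated dataset; not a Hudi class. */
  static final class LazyData<T> {
    private final Supplier<List<T>> pipeline;
    private boolean persistRequested;
    private List<T> cached;

    LazyData(Supplier<List<T>> pipeline) {
      this.pipeline = pipeline;
    }

    void persist() {
      persistRequested = true;
    }

    void unpersist() {
      persistRequested = false;
      cached = null;
    }

    // Every action re-runs the pipeline unless a persisted copy exists.
    private List<T> evaluate() {
      if (cached != null) {
        return cached;
      }
      List<T> result = pipeline.get();
      if (persistRequested) {
        cached = result;
      }
      return result;
    }

    List<T> collect() {
      return evaluate();
    }

    long count() {
      return evaluate().size();
    }
  }

  /** Returns how many times the key-reading pipeline ran for one initialization. */
  static int pipelineRuns(boolean validationEnabled, boolean guardedPersist) {
    AtomicInteger runs = new AtomicInteger();
    LazyData<String> recordIndexRecords = new LazyData<>(() -> {
      runs.incrementAndGet(); // stand-in for reading record keys from file slices
      return List.of("key-1", "key-2", "key-3");
    });

    // The guard: persist only when the validation pass will read the data again.
    boolean persisted = guardedPersist && validationEnabled;
    if (persisted) {
      recordIndexRecords.persist();
    }
    try {
      recordIndexRecords.collect(); // stand-in for bulkCommit, first materialization
      if (validationEnabled) {
        long validated = recordIndexRecords.count(); // stand-in for validateRecordIndex
        if (validated != 3) {
          throw new IllegalStateException("unexpected record count: " + validated);
        }
      }
    } finally {
      if (persisted) {
        recordIndexRecords.unpersist(); // paired with the guarded persist
      }
    }
    return runs.get();
  }

  public static void main(String[] args) {
    System.out.println("validation on, no persist:      " + pipelineRuns(true, false));
    System.out.println("validation on, guarded persist: " + pipelineRuns(true, true));
    System.out.println("validation off, guard in place: " + pipelineRuns(false, true));
  }
}
```

In this toy model, validation without persist evaluates the pipeline twice, the guarded persist brings it back to once, and the no-validation path never calls persist at all. Whether the real HoodieData behaves the same way depends on the engine context, so treat this as an illustration of the reasoning rather than a measurement.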



