hudi-agent commented on code in PR #19052:
URL: https://github.com/apache/hudi/pull/19052#discussion_r3464383216


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieArchivalConfig.java:
##########
@@ -88,6 +88,15 @@ public class HoodieArchivalConfig extends HoodieConfig {
       .withDocumentation("Archiving of instants is batched in best-effort manner, to pack more instants into a single"
           + " archive log. This config controls such archival batch size.");
 
+  public static final ConfigProperty<Integer> MIGRATION_COMMITS_ARCHIVAL_BATCH_SIZE = ConfigProperty
+      .key("hoodie.timeline.migration.commits.archival.batch")
+      .defaultValue(500)

Review Comment:
   🤖 Re the earlier thread: getActiveActionsIterator returns 
ActiveActionWithDetails, which eagerly retains each instant's serialized details, 
so buffering batchSize of these holds the metadata bytes in memory for the 
whole batch (not just instant refs). With a default of 500, wide tables with 
large per-commit metadata could see very high peak driver heap during the 
one-time upgrade. Would a more moderate default such as 100 be safer?
   
   <sub><i>⚠️ AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>
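
   To make the heap concern above concrete, here is a rough back-of-the-envelope sketch. It is not Hudi code: the class name `BatchHeapEstimate`, the method `peakBufferedBytes`, and the 2 MiB average per-instant details size are all hypothetical assumptions chosen for illustration, and the estimate ignores JVM object overhead and any copies made during serialization.

```java
// Hypothetical helper, not part of Hudi: estimates the bytes held when a
// whole batch of instants' serialized details is buffered at once.
public final class BatchHeapEstimate {
    private BatchHeapEstimate() {}

    /**
     * Rough lower bound on buffered bytes: batch size times the average
     * serialized details size per instant. Throws ArithmeticException on overflow.
     */
    public static long peakBufferedBytes(int batchSize, long avgDetailsBytesPerInstant) {
        if (batchSize < 0 || avgDetailsBytesPerInstant < 0) {
            throw new IllegalArgumentException("inputs must be non-negative");
        }
        return Math.multiplyExact((long) batchSize, avgDetailsBytesPerInstant);
    }

    public static void main(String[] args) {
        // Assumed average of 2 MiB of commit metadata per instant (illustrative only).
        long assumedAvgBytes = 2L * 1024 * 1024;
        for (int batchSize : new int[] {100, 500}) {
            long bytes = peakBufferedBytes(batchSize, assumedAvgBytes);
            System.out.printf("batch=%d -> ~%d MiB buffered%n", batchSize, bytes / (1024 * 1024));
        }
    }
}
```

   Under that assumed 2 MiB average, a batch of 500 buffers roughly 1000 MiB of details versus roughly 200 MiB for a batch of 100; the real figure depends on the table's actual per-commit metadata size.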


