bhavya-ganatra opened a new issue, #18686: URL: https://github.com/apache/hudi/issues/18686
### Task Description

**What needs to be done:**

This issue is based on a discussion in the Hudi Slack channel regarding performance degradation with a large active timeline: https://apache-hudi.slack.com/archives/C4D716NPQ/p1774948526496729

Propose and/or implement a solution to decouple savepoints from timeline archival (e.g., enable archival without losing restore capability). Additionally, update the documentation to clearly state the impact of `hoodie.archive.beyond.savepoint` on savepoint restore behaviour.

**Why this task is needed:**

We are running a streaming pipeline writing to multiple Hudi MOR tables with:

- Async compaction and cleaner
- Commit frequency: every 5 minutes
- Savepoints retained for 7 days (1 per 24 hours)

Savepoints are required for our backup/recovery strategy, and their retention cannot be reduced. However, savepoints block archival of commits in the timeline, leading to continuous timeline growth and noticeable performance degradation in both reads and writes.

Currently, the config `hoodie.archive.beyond.savepoint` allows archival beyond savepoints, but at the cost of losing savepoint restore capability (i.e., savepoints become non-recoverable); see https://github.com/apache/hudi/pull/6239

To resolve this, savepoints need to be decoupled from the timeline archival process, so that restore capability can be kept without significant performance degradation.

For reference, this request was already tracked in Jira: https://issues.apache.org/jira/browse/HUDI-4500. Since Hudi has moved to GitHub Issues, I am filing it here.

### Task Type

Performance optimization

### Related Issues

**Parent feature issue:** https://issues.apache.org/jira/browse/HUDI-4500

**Related issues:** https://issues.apache.org/jira/browse/HUDI-4501
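
For a sense of scale, the setup described in this issue (one commit every 5 minutes, oldest retained savepoint 7 days old) can be turned into a rough count of write commits that stay in the active timeline. This is only a back-of-the-envelope sketch: it assumes archival stops at the oldest savepoint, it counts write commits for a single table only (compaction and clean instants would add to it), and `pinned_commits` is a hypothetical helper name, not a Hudi API.

```python
from datetime import timedelta


def pinned_commits(commit_interval: timedelta, savepoint_retention: timedelta) -> int:
    """Rough number of write commits made since the oldest retained savepoint.

    Assumption: every commit newer than the oldest savepoint stays in the
    active timeline because archival cannot move past that savepoint.
    """
    # timedelta // timedelta yields an int (whole intervals that fit).
    return savepoint_retention // commit_interval


# Values taken from the issue: commits every 5 minutes, savepoints kept 7 days.
commits = pinned_commits(timedelta(minutes=5), timedelta(days=7))
print(commits)  # 2016
```

Under these assumptions, each table carries roughly two thousand un-archivable write commits at any time, before counting compaction and clean instants.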
