yihua opened a new pull request, #7580:
URL: https://github.com/apache/hudi/pull/7580

   ### Change Logs
   
   Before this PR, archival of the metadata table used the earliest 
instant, across all actions, in the active timeline of the data table.  During 
archival, CLEAN and ROLLBACK instants are archived separately from commits 
(see HoodieTimelineArchiver#getCleanInstantsToArchive).  As a result, a very 
old completed CLEAN or ROLLBACK instant in the data table can block archival 
of the metadata table timeline, making the metadata table's active timeline 
extremely long and slowing down timeline loading.
   
   This PR makes archival of the metadata table independent of completed 
rollback or clean instants in the data table.  The archival boundary is now the 
earlier of the earliest commit (COMMIT, DELTA_COMMIT, and REPLACE_COMMIT only) 
and the earliest inflight instant (all actions) in the data table's active 
timeline; metadata table instants older than that boundary can be archived.
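
   The boundary selection described above could be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the `MetadataArchivalBoundarySketch` class, its `Instant` type, and the `earliestInstantToRetain` method are hypothetical names, and instant times are assumed to be fixed-width strings that order lexicographically.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical, simplified model of a data table timeline; not the Hudi API.
class MetadataArchivalBoundarySketch {

    // Actions counted as "commits" for the purpose of the boundary.
    static final Set<String> COMMIT_ACTIONS =
        new HashSet<>(Arrays.asList("commit", "deltacommit", "replacecommit"));

    // Minimal stand-in for a timeline instant (assumption for illustration).
    static final class Instant {
        final String timestamp;
        final String action;
        final boolean completed;

        Instant(String timestamp, String action, boolean completed) {
            this.timestamp = timestamp;
            this.action = action;
            this.completed = completed;
        }
    }

    // Returns the earliest data table instant that should bound metadata
    // table archival: any inflight instant (all actions) or any commit.
    // Completed CLEAN, ROLLBACK, and SAVEPOINT instants are ignored.
    static Optional<String> earliestInstantToRetain(List<Instant> dataTableActiveTimeline) {
        return dataTableActiveTimeline.stream()
            .filter(i -> !i.completed || COMMIT_ACTIONS.contains(i.action))
            .map(i -> i.timestamp)
            .min(String::compareTo);
    }
}
```

   With a timeline holding a completed clean at `001` and a completed commit at `005`, the sketch picks `005` as the boundary, so the old clean no longer holds back archival.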
   
   Savepoints need no special handling here: completed savepoints do not 
affect archival of the metadata table.
   
   ### Impact
   
   Makes the active timeline of the metadata table shorter and improves the 
performance of loading the active timeline of the metadata table.
   
   ### Risk level
   
   low
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   

