yihua opened a new pull request, #7561:
URL: https://github.com/apache/hudi/pull/7561

   ### Change Logs
   
   Before this change, the metastore sync process always loads the Hudi 
archived timeline whenever the last sync time is given. In addition, the meta 
client does not cache the archived timeline when a start instant time is 
given, so the same archived timeline can be loaded repeatedly. When the 
archived timeline is large, e.g., hundreds of log files in the 
`.hoodie/archived` folder, reading it from storage causes performance issues 
and, on cloud storage, read timeouts due to request rate limiting.
   
   This PR improves timeline loading by
   (1) reading only the active timeline if the last sync time is the same as or 
after the start of the active timeline;
   (2) caching the archived timeline in the meta client, keyed by the start 
instant time, to avoid unnecessarily reloading the same archived timeline.
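   
   The two changes above can be sketched roughly as follows. This is a minimal, 
self-contained illustration and not the code in this PR: the class 
`TimelineLoadingSketch`, the method `needsArchivedTimeline`, and the 
`ArchivedTimelineCache` type are hypothetical names, and the sketch assumes 
instant times are fixed-format timestamp strings so that string comparison 
matches chronological order.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; these names are illustrative and are not Hudi classes.
final class TimelineLoadingSketch {

  private TimelineLoadingSketch() {}

  // (1) The archived timeline is needed only when the last sync time is
  // strictly before the start of the active timeline.
  // Assumption: both arguments use the same fixed-width timestamp format,
  // so lexicographic order equals chronological order.
  public static boolean needsArchivedTimeline(String lastSyncTime, String activeTimelineStart) {
    return lastSyncTime.compareTo(activeTimelineStart) < 0;
  }

  // (2) Cache of archived timelines keyed by start instant time, so the same
  // archived timeline is read from storage at most once per key.
  public static final class ArchivedTimelineCache<T> {
    private final Map<String, T> cache = new HashMap<>();
    private final Function<String, T> loader;
    private int loadCount = 0;

    // The loader stands in for the expensive read from storage.
    public ArchivedTimelineCache(Function<String, T> loader) {
      this.loader = loader;
    }

    // Returns the cached timeline for this start instant, loading it on first use.
    public T get(String startInstant) {
      return cache.computeIfAbsent(startInstant, key -> {
        loadCount++;
        return loader.apply(key);
      });
    }

    // Number of times the loader was actually invoked.
    public int loadCount() {
      return loadCount;
    }
  }
}
```

   In this sketch, a sync whose last sync time is at or after the start of the 
active timeline never touches the loader, and repeated calls to `get` with the 
same start instant invoke the loader only once. The sketch is not thread-safe; 
a shared cache would need a concurrent map or external synchronization.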
   
   ### Impact
   
   This PR improves the performance of metastore sync by avoiding unnecessary 
reads of the archived timeline.
   
   ### Risk level
   
   low
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
