hudi-bot opened a new issue, #14925:
URL: https://github.com/apache/hudi/issues/14925

   Make the timeline server work with multiple concurrent writers. 
   
   As of now, if an executor lags behind the timeline server (the timeline 
server refreshes its state on every call if the timeline has moved), we throw 
an exception and the executor falls back to the secondary view, which lists 
the file system. 
   
   Related ticket: https://issues.apache.org/jira/browse/HUDI-2761
   
   We want to revisit this code and see how we can make the timeline server 
work in multi-writer scenarios. 
   
   A few points to consider:
   
   1. Executors should try to call getLatestBaseFilesOnOrBefore() instead of 
getLatestBaseFiles(). Not all calls have to be changed: the ones doing 
conflict resolution might always have to get the latest snapshot. 
   
   2. Fix async services to use a separate write client in the deltastreamer 
flow.
   
   3. Revisit every call from the executors and set the "REFRESH" param only 
when it matters.
   
   4. Sharing embedded timeline server. 
   
   5. Check for any holes. When C100 and C101 start concurrently and C101 
finishes early, if C100 calls getLatestBaseFileOnOrBefore(), do we return base 
files from C101? 
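
   To make the question in point 5 concrete, here is a minimal toy model in 
Java. It is not Hudi's file system view: the class ToyFileSystemView, its 
methods, and the file group id "fg1" are hypothetical, and it assumes commit 
times are fixed-width strings that can be compared lexicographically. It only 
shows how an "on or before" bound on the caller's own commit time would hide 
a base file written by a later commit that finished first, while an unbounded 
"latest" query would expose it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; every name here is hypothetical and not part of Hudi.
final class ToyFileSystemView {

    // One base file written by a completed commit into a file group.
    static final class BaseFile {
        final String fileGroupId;
        final String commitTime;

        BaseFile(String fileGroupId, String commitTime) {
            this.fileGroupId = fileGroupId;
            this.commitTime = commitTime;
        }
    }

    private final List<BaseFile> completedBaseFiles = new ArrayList<>();

    // Register a base file once its commit has completed on the timeline.
    void addCompleted(String fileGroupId, String commitTime) {
        completedBaseFiles.add(new BaseFile(fileGroupId, commitTime));
    }

    // Latest base file per file group, with no bound on commit time.
    List<BaseFile> latestBaseFiles() {
        return latestPerGroup(null);
    }

    // Latest base file per file group with commitTime <= maxCommitTime.
    List<BaseFile> latestBaseFilesOnOrBefore(String maxCommitTime) {
        return latestPerGroup(maxCommitTime);
    }

    private List<BaseFile> latestPerGroup(String maxCommitTime) {
        Map<String, BaseFile> latest = new HashMap<>();
        for (BaseFile f : completedBaseFiles) {
            // Skip files written by commits newer than the caller's bound.
            if (maxCommitTime != null && f.commitTime.compareTo(maxCommitTime) > 0) {
                continue;
            }
            // Keep whichever file in the group has the greater commit time.
            latest.merge(f.fileGroupId, f,
                (a, b) -> a.commitTime.compareTo(b.commitTime) >= 0 ? a : b);
        }
        List<BaseFile> result = new ArrayList<>(latest.values());
        result.sort(Comparator.comparing((BaseFile f) -> f.fileGroupId));
        return result;
    }
}
```

   In this model, if "fg1" has completed base files at commit times "099" and 
"101" while C100 is still in flight, latestBaseFiles() returns the file from 
"101", whereas latestBaseFilesOnOrBefore("100") returns the file from "099". 
Whether the real timeline server behaves this way for every call path is 
exactly what point 5 asks us to verify.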
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-2860
   - Type: Improvement
   - Epic: https://issues.apache.org/jira/browse/HUDI-3248

