waitinfuture commented on PR #990:
URL: https://github.com/apache/incubator-celeborn/pull/990#issuecomment-1328193735

   In this PR, multiple commitFiles requests among hard-split partitions are
guaranteed not to overlap, which means no two requests commit the same
PartitionLocation; this is guarded by synchronizing on ShuffleCommittedInfo.
But there is no such guarantee between the commitFiles requests from
handleStageEnd and those from hard-split, so two requests can commit the same
PartitionLocation, which is error-prone.
   So we should wait for all hard-split commitFiles requests to finish before
triggering commitFiles in handleStageEnd.
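   A minimal Java sketch of that ordering, assuming a hypothetical per-shuffle
tracker on the LifecycleManager side (`ShuffleCommitTracker` and its methods
are illustrative names, not Celeborn's API; PartitionLocations are modeled as
plain string ids). Hard-split commits claim only locations nobody has claimed
yet, and the stage-end commit blocks until the in-flight hard-split count drops
to zero before claiming the rest:

```java
import java.util.*;

// Hypothetical per-shuffle commit bookkeeping on the LifecycleManager side.
// Names are illustrative only; this is not Celeborn's ShuffleCommittedInfo API.
class ShuffleCommitTracker {
    private final Object lock = new Object();
    // PartitionLocation ids (plain strings here) already claimed by some commit.
    private final Set<String> committed = new HashSet<>();
    private int inFlightHardSplitCommits = 0;
    private boolean stageEnded = false;

    /** Starts a hard-split commit; returns only locations no earlier commit has claimed. */
    public List<String> beginHardSplitCommit(Collection<String> locations) {
        synchronized (lock) {
            inFlightHardSplitCommits++;
            List<String> claimed = new ArrayList<>();
            if (stageEnded) {
                return claimed; // stage end already owns every remaining location
            }
            for (String loc : locations) {
                if (committed.add(loc)) {
                    claimed.add(loc);
                }
            }
            return claimed;
        }
    }

    /** Must be called once per beginHardSplitCommit, after the RPC completes or fails. */
    public void finishHardSplitCommit() {
        synchronized (lock) {
            inFlightHardSplitCommits--;
            lock.notifyAll();
        }
    }

    /** handleStageEnd: wait for in-flight hard-split commits, then claim what is left. */
    public List<String> beginStageEndCommit(Collection<String> allLocations) {
        synchronized (lock) {
            stageEnded = true;
            while (inFlightHardSplitCommits > 0) {
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted waiting for hard-split commits", e);
                }
            }
            List<String> claimed = new ArrayList<>();
            for (String loc : allLocations) {
                if (committed.add(loc)) {
                    claimed.add(loc);
                }
            }
            return claimed;
        }
    }
}
```

   Once stage end has started, a late hard-split request claims nothing, so
each PartitionLocation is committed by at most one request.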
   
   Another issue is that we need a better server-side policy for handling
multiple commitFiles requests for a single shuffleId, one that stays consistent
with retryCommitFiles. I think we can assign a unique epoch to each commitFiles
request, ensuring that no two epochs overlap, so that retryCommitFiles only
affects its own epoch.
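   One way the epoch idea could look on the worker side, as a Java sketch with
hypothetical names (`EpochCommitHandler` and
`commitFiles(shuffleKey, epoch, locations)` are illustrative; the real
CommitFiles message may carry different fields). Each epoch's result is
recorded, a retry with the same epoch returns that recorded result without
touching other epochs, and a location already owned by another epoch is
rejected as an overlap:

```java
import java.util.*;

// Hypothetical worker-side handler; not Celeborn's actual CommitFiles handling.
class EpochCommitHandler {
    // shuffleKey -> (epoch -> PartitionLocation ids committed by that epoch)
    private final Map<String, Map<Long, Set<String>>> committedByEpoch = new HashMap<>();

    public synchronized Set<String> commitFiles(String shuffleKey, long epoch, Collection<String> locations) {
        Map<Long, Set<String>> epochs = committedByEpoch.computeIfAbsent(shuffleKey, k -> new HashMap<>());

        // A retry of an epoch already handled returns that epoch's result and touches nothing else.
        Set<String> previous = epochs.get(epoch);
        if (previous != null) {
            return previous;
        }

        // Epochs must be disjoint: no location may already belong to another epoch.
        for (Map.Entry<Long, Set<String>> entry : epochs.entrySet()) {
            for (String loc : locations) {
                if (entry.getValue().contains(loc)) {
                    throw new IllegalStateException(
                        loc + " already committed by epoch " + entry.getKey());
                }
            }
        }

        Set<String> done = new LinkedHashSet<>();
        for (String loc : locations) {
            // Flushing and closing the file for loc would happen here (omitted).
            done.add(loc);
        }
        Set<String> result = Collections.unmodifiableSet(done);
        epochs.put(epoch, result);
        return result;
    }
}
```

   The epoch itself would be chosen by the caller, for example from a
per-shuffle counter in LifecycleManager incremented for every commitFiles
request (also an assumption, not something this PR defines).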


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.