leixm commented on PR #2975:
URL: https://github.com/apache/celeborn/pull/2975#issuecomment-2519277606

   There are three problems in total:
   1. In PartitionDataWriter#write, if an OOM occurs in 
flushBuffer.addComponent(true, data), data is not released.
   2. In PushDataHandler#writeLocalData, if an OOM occurs in 
fileWriter.write(body), fileWriter.decrementPendingWrites() is not called, 
which causes some fileWriters to wait for a period of time when close() is 
called.
   3. For a failed shuffle, commit is not called and the PartitionDataWriter 
is not closed, so its buffer is never returned (no returnBuffer). The 
PartitionDataWriter should be closed when the shuffle expires.
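
   The three cleanup paths above can be sketched roughly as follows. This is 
a minimal, self-contained illustration and not the actual Celeborn code: 
WriterCleanupSketch, RefCountedBuffer and SketchWriter are hypothetical 
stand-ins, the OOM is simulated with a flag, and the real code may place the 
decrement or the release elsewhere.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class WriterCleanupSketch {
    /** Hypothetical stand-in for a reference-counted buffer. */
    static final class RefCountedBuffer {
        private final AtomicInteger refCnt = new AtomicInteger(1);
        void release() { refCnt.decrementAndGet(); }
        int refCnt() { return refCnt.get(); }
    }

    /** Hypothetical stand-in for a partition writer; not the real class. */
    static final class SketchWriter {
        private final AtomicInteger pendingWrites = new AtomicInteger();
        private final List<RefCountedBuffer> held = new ArrayList<>();
        private final boolean failOnAppend; // simulates an OOM while appending

        SketchWriter(boolean failOnAppend) { this.failOnAppend = failOnAppend; }

        void incrementPendingWrites() { pendingWrites.incrementAndGet(); }
        void decrementPendingWrites() { pendingWrites.decrementAndGet(); }
        int pendingWrites() { return pendingWrites.get(); }

        // Problem 1: if appending fails, the writer never took ownership of
        // the data, so it is released here instead of being leaked.
        void write(RefCountedBuffer data) {
            boolean appended = false;
            try {
                if (failOnAppend) {
                    throw new OutOfMemoryError("simulated OOM");
                }
                held.add(data);
                appended = true;
            } finally {
                if (!appended) {
                    data.release();
                }
            }
        }

        // Problem 3: closing the writer returns every buffer it still holds.
        void close() {
            for (RefCountedBuffer b : held) {
                b.release();
            }
            held.clear();
        }
    }

    // Problem 2: the pending-write counter is decremented in a finally block,
    // so a failed write cannot leave close() waiting on a stale count.
    static boolean writeLocalData(SketchWriter writer, RefCountedBuffer body) {
        writer.incrementPendingWrites();
        try {
            writer.write(body);
            return true;
        } catch (Throwable t) { // includes the simulated OutOfMemoryError
            return false;
        } finally {
            writer.decrementPendingWrites();
        }
    }

    // Problem 3: when a shuffle expires, close all writers that were never
    // committed so that their buffers are returned.
    static void expireShuffle(List<SketchWriter> writers) {
        for (SketchWriter w : writers) {
            w.close();
        }
    }

    public static void main(String[] args) {
        SketchWriter failing = new SketchWriter(true);
        RefCountedBuffer data = new RefCountedBuffer();
        boolean ok = writeLocalData(failing, data);
        System.out.println("write ok: " + ok
            + ", refCnt: " + data.refCnt()
            + ", pending: " + failing.pendingWrites());
    }
}
```

   In this sketch the writer takes ownership of the data only after a 
successful append, which is why the release sits in the failure path of 
write and in close() rather than in the caller.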

