pltbkd opened a new pull request, #3641:
URL: https://github.com/apache/celeborn/pull/3641

   ### What changes were proposed in this pull request?
   
   NOTE: This is the same patch as #3637, targeting the main branch. Because 
some code has been refactored, the original patch cannot simply be cherry-picked.
   
   Handle the case where numSubpartitions is zero in 
MapPartitionDataReader.open(). When numSubpartitions is zero, treat the 
partition as a normal empty partition and notify consumers accordingly.
   
   ### Why are the changes needed?
   
   When the first PUSH_DATA_HAND_SHAKE request fails (e.g., due to a timeout), 
the client triggers a revive with reason HARD_SPLIT. The manager adds the failed 
partition to the partition locations, but numSubpartitions remains uninitialized 
(zero). Reading such a partition causes ArithmeticException: / by zero.
   Since this is caused by client-side behavior, we handle it on the worker side 
first for cross-version compatibility. The issue that the Flink shuffle client 
always revives with the fixed reason HARD_SPLIT can be addressed in later PRs.
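
   For illustration only, the failure and the guard can be sketched roughly as 
below. This is not the actual MapPartitionDataReader code: the class name, the 
method names, INDEX_ENTRY_SIZE, and the Runnable callback are all hypothetical 
stand-ins, and the exact expression that divides by numSubpartitions in the 
real reader is assumed here rather than copied.

```java
public class EmptyPartitionSketch {

    // Hypothetical size of one index entry in bytes; not taken from Celeborn.
    static final int INDEX_ENTRY_SIZE = 16;

    // Unguarded variant: with numSubpartitions == 0 the divisor is zero, and
    // integer division by zero throws ArithmeticException in Java.
    static long regionsUnguarded(long indexFileSize, int numSubpartitions) {
        return indexFileSize / ((long) numSubpartitions * INDEX_ENTRY_SIZE);
    }

    // Guarded variant: a zero numSubpartitions is treated as a normal empty
    // partition. The callback stands in for "notify consumers accordingly".
    static long regionsGuarded(
            long indexFileSize, int numSubpartitions, Runnable onEmptyPartition) {
        if (numSubpartitions == 0) {
            onEmptyPartition.run();
            return 0L; // nothing to read
        }
        return indexFileSize / ((long) numSubpartitions * INDEX_ENTRY_SIZE);
    }

    public static void main(String[] args) {
        try {
            regionsUnguarded(1024L, 0);
        } catch (ArithmeticException e) {
            System.out.println("unguarded read failed: " + e.getMessage());
        }
        long regions = regionsGuarded(
                1024L, 0, () -> System.out.println("guarded read: empty partition"));
        System.out.println("regions = " + regions);
    }
}
```

   Guarding on the worker side means an older client that still revives with 
HARD_SPLIT does not crash the read path.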
   
   ### Does this PR resolve a correctness bug?
   
   No.
   
   ### Does this PR introduce any user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Manually tested with a hacked version that throws an exception on the first 
handshake invocation. The test code is too hacky to be included in this PR. 
Advice is welcome on how to add a proper unit test for this scenario without 
introducing too much complexity.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
