pltbkd opened a new pull request, #3641: URL: https://github.com/apache/celeborn/pull/3641
### What changes were proposed in this pull request?

NOTE: This is the same patch as #3637, applied to the main branch. Because some code has been refactored, the original patch cannot simply be cherry-picked.

Handle the case where numSubpartitions is zero in MapPartitionDataReader.open(). When the partition is empty, treat it as a normal empty partition and notify consumers accordingly.

### Why are the changes needed?

When the first PUSH_DATA_HAND_SHAKE request fails (e.g., on timeout), the client triggers a revive with reason HARD_SPLIT. The manager adds the failed partition to the partition locations, but numSubpartitions remains uninitialized (zero). Reading such a partition causes ArithmeticException: / by zero. Since this is caused by client-side behavior, we handle it on the worker side first for cross-version compatibility. The issue that the Flink shuffle client always revives with the fixed reason HARD_SPLIT can be addressed in later PRs.

### Does this PR resolve a correctness bug?

No.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manually tested with a hacked version that throws an exception on the first handshake invocation. The test code is too hacky to be included in this PR. Advice is welcome on how to add a proper unit test for this scenario without introducing too much complexity.
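
For readers unfamiliar with the failure mode, the zero-subpartition guard described above can be illustrated with a minimal, self-contained sketch. This is not the actual Celeborn code: the class `EmptyPartitionSketch`, the methods `numRegions` and `open`, and the event strings are hypothetical stand-ins, and the assumption that the region count is derived by dividing by numSubpartitions is made only to reproduce the `/ by zero` condition.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and structure are NOT taken from Celeborn.
final class EmptyPartitionSketch {

    // Assumed for illustration: the region count is the number of index
    // entries divided by the subpartition count. Without the guard, an
    // uninitialized numSubpartitions (zero) throws
    // ArithmeticException: / by zero.
    static int numRegions(int indexEntries, int numSubpartitions) {
        if (numSubpartitions == 0) {
            // Treat the partition as a normal empty partition.
            return 0;
        }
        return indexEntries / numSubpartitions;
    }

    // Hypothetical open(): returns the events a consumer would observe.
    // An empty partition still ends with an end-of-partition notification,
    // so the consumer is not left waiting.
    static List<String> open(int indexEntries, int numSubpartitions) {
        List<String> events = new ArrayList<>();
        int regions = numRegions(indexEntries, numSubpartitions);
        for (int i = 0; i < regions; i++) {
            events.add("REGION_" + i);
        }
        events.add("END_OF_PARTITION");
        return events;
    }

    public static void main(String[] args) {
        // Partition whose handshake failed: numSubpartitions was never set.
        System.out.println(open(0, 0));
        // Regular partition with 8 index entries and 4 subpartitions.
        System.out.println(open(8, 4));
    }
}
```

The point of the sketch is that the empty case goes through the same notification path as a regular partition instead of raising an exception, which is what lets workers tolerate clients of other versions that still revive with HARD_SPLIT.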
