pltbkd opened a new pull request, #3637:
URL: https://github.com/apache/celeborn/pull/3637
…ls before any data written
What changes were proposed in this pull request?
Handle the case where numSubpartitions is zero in
MapPartitionDataReader.open(). When the partition is empty, treat it as a
normal empty partition and notify consumers accordingly.
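A minimal sketch of the guard described above, assuming a much-simplified reader. The class `EmptyPartitionReaderSketch`, its `Event` enum, and the `open(long indexEntries)` signature are all hypothetical and only illustrate the idea. They are not Celeborn's actual `MapPartitionDataReader` API. The point is that the zero check runs before any per-subpartition arithmetic, so an empty partition is reported to the consumer instead of failing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names and signatures are illustrative, not Celeborn's real API.
class EmptyPartitionReaderSketch {
    // Hypothetical events a reader might send to its consumer.
    enum Event { DATA_AVAILABLE, END_OF_PARTITION }

    private final int numSubpartitions;
    private final List<Event> sentToConsumer = new ArrayList<>();

    EmptyPartitionReaderSketch(int numSubpartitions) {
        this.numSubpartitions = numSubpartitions;
    }

    // Check for zero subpartitions before doing any arithmetic that divides by it.
    void open(long indexEntries) {
        if (numSubpartitions == 0) {
            // Treat it as a normal empty partition: nothing to read, tell the consumer.
            sentToConsumer.add(Event.END_OF_PARTITION);
            return;
        }
        // Only reached when numSubpartitions > 0, so this division cannot throw.
        long entriesPerSubpartition = indexEntries / numSubpartitions;
        sentToConsumer.add(entriesPerSubpartition > 0 ? Event.DATA_AVAILABLE : Event.END_OF_PARTITION);
    }

    List<Event> events() {
        return sentToConsumer;
    }

    public static void main(String[] args) {
        EmptyPartitionReaderSketch empty = new EmptyPartitionReaderSketch(0);
        empty.open(0);
        System.out.println(empty.events()); // [END_OF_PARTITION]

        EmptyPartitionReaderSketch normal = new EmptyPartitionReaderSketch(4);
        normal.open(8);
        System.out.println(normal.events()); // [DATA_AVAILABLE]
    }
}
```

Returning early keeps the empty case on the normal notification path rather than surfacing it as an error, which matches the intent of treating it as an ordinary empty partition.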
Why are the changes needed?
When the first PUSH_DATA_HAND_SHAKE request fails (e.g., due to a timeout),
the client triggers a revive with reason HARD_SPLIT. The manager adds the failed
partition to the partition locations, but its numSubpartitions remains uninitialized
(zero). Reading such a partition causes ArithmeticException: / by zero.
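The failure mode can be reproduced in isolation, assuming the uninitialized count behaves like a Java `int` field that was never assigned and therefore defaults to 0. The class `DivideByZeroDemo` and its `bytesPerSubpartition` method are hypothetical stand-ins for whatever per-subpartition computation the real reader performs.

```java
// Hypothetical illustration: an int field that is never assigned defaults to 0,
// and integer division by it throws ArithmeticException("/ by zero").
class DivideByZeroDemo {
    private int numSubpartitions; // never assigned, so it stays 0

    // Hypothetical per-subpartition computation that divides by the count.
    long bytesPerSubpartition(long totalBytes) {
        return totalBytes / numSubpartitions;
    }

    // Runs the computation and reports the exception instead of propagating it.
    static String tryCompute(DivideByZeroDemo demo, long totalBytes) {
        try {
            return String.valueOf(demo.bytesPerSubpartition(totalBytes));
        } catch (ArithmeticException e) {
            return "ArithmeticException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCompute(new DivideByZeroDemo(), 1024)); // ArithmeticException: / by zero
    }
}
```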
Since this is caused by client-side behavior, we handle it on the worker
side first for cross-version compatibility. The issue that the Flink shuffle client
revives with a fixed reason, HARD_SPLIT, can be addressed in later PRs.
Does this PR resolve a correctness bug?
No
Does this PR introduce any user-facing change?
No
How was this patch tested?
Manually tested with a hacked version that throws an exception on the first
handshake invocation, but the test code is too hacky to be included in this PR.
Advice is welcome on how to add a proper unit test for this scenario without
introducing too much complexity.