Thanks Xinyu for the proposal. Adding HARD_SPLIT support for
PushMergedData is valuable for production. We've encountered issues
with small-disk nodes getting overloaded in heterogeneous clusters.

I had a discussion with @rexxiong. The current implementation requires
introducing PUSH_MERGED_DATA_RESPONSE, which increases the complexity
of the modifications. We could consider reusing RpcResponse instead.

Thanks,
Fu Chen

王馨雨 <[email protected]> wrote on Fri, Sep 20, 2024 at 10:35:
>
> Hi all,
>
> I've written up a proposal for supporting HARD_SPLIT in Celeborn. You can find
> the proposal here
> <https://cwiki.apache.org/confluence/display/CELEBORN/CIP-12+Support+HARD_SPLIT+in+PushMergedData>.
> Please let me know if you have any comments or questions.
>
> Unlike PushData, Celeborn won’t actively trigger HARD_SPLIT in
> PushMergedData unless one or more partitions in the partition group of
> the PushMergedData request have already been split.
>
> This leads to several problems:
> - Cascading HARD_SPLIT in PushMergedData is wasteful, because most
>   partitions may not have reached the HARD_SPLIT threshold.
> - Worker pressure cannot be transferred if the partitions won’t be split.
> - ReserveSize won’t take effect.
>
> Supporting HARD_SPLIT in PushMergedData will solve the above problems,
> since only the partitions that need to be split will be split.
>
> Thanks,
> Xinyu Wang
>
