Sounds good, thanks ! I wanted to make sure we handle this case (since independently each of primary/replica can hard split) ... rest looks good to me !
Regards, Mridul On Mon, Nov 4, 2024 at 10:42 AM Erik fang <[email protected]> wrote: > Hi Mirdul, > > I suppose the correct way should be union of primary split and replica > split > > Given PushMergedData RPC of {partition1 -> data, partition2 -> data, > partition3 -> data}, and primary returns hard split for partition 1 while > replica returns hard split for partition3 > the union of primary/replica response as {partition1 - hard split, > partition3 - hard split} should be handled by shuffle client > > Regards, > Erik > > On Sat, Nov 2, 2024 at 3:18 PM Mridul Muralidharan <[email protected]> > wrote: > > > Hi, > > > > How would we handle the case of primary can be written to but replica > is > > triggering a split ? > > > > Thanks, > > Mridul > > > > > > On Wed, Oct 30, 2024 at 4:20 AM 王馨雨 <[email protected]> wrote: > > > > > > > > > > > > > > We have discussed the proposal offline with some members in the > community > > > and the final version is > > > > > > https://docs.google.com/document/d/1Jaix22vME0m1Q-JtTHF9WYsrsxBWBwzwmifcPxNQZHk/edit?tab=t.0#heading=h.vvog8f9qc8je > > > . > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > At 2024-09-23 09:26:18, "rexxiong" <[email protected]> wrote: > > > >Thanks Xinyu for the proposal! It's a great enhancement for the > current > > > >Split mechanism. The design LGTM overall. There are some additional > > > >considerations: > > > >1) We need to ensure client compatibility, so it's best to reuse > > > >RpcResponse, or add a new protocol to ensure the old behavior remains > > > >compatible. > > > >2) We could consider first implementing single-replica support for > Split > > > >and then extending it to dual replicas to reduce implementation > > > complexity. > > > > > > > >Thanks, > > > >Jiashu Xiong > > > > > > > >Keyong Zhou <[email protected]> 于2024年9月22日周日 11:15写道: > > > > > > > >> Thanks Xinyu for the proposal! Glad to see PushMergeData will > support > > > >> splitting finally :) > > > >> > > > >> The design LGTM overall, still I have some comments: > > > >> > > > >> 1. We can also support soft split as well > > > >> > > > >> 2. Typically there are more than one disks, so it's better to only > > > splits > > > >> partitions > > > >> belonging to full disks instead of splitting all partitions > > > >> > > > >> 3. We can group partitions by disks to increase efficiency when > > checking > > > >> whether disk is full for each partition > > > >> > > > >> 4. For FILE_ALREADY_CLOSED I think we shouldn't decrease > > > remainReviveTimes > > > >> either, it's not an exception or error > > > >> > > > >> Regards, > > > >> Keyong Zhou > > > >> > > > >> > > > >> > > > >> Fu Chen <[email protected]> 于2024年9月21日周六 18:03写道: > > > >> > > > >> > Thanks Xinyu for the proposal. Adding HARD_SPLIT support for > > > >> > PushMergeData is valuable for production. We've encountered issues > > > >> > with small disk nodes getting overloaded in heterogeneous > clusters. > > > >> > > > > >> > I had a discussion with @rexxiong, the current implementation > > requires > > > >> > introducing PUSH_MERGED_DATA_RESPONSE, which increases the > > complexity > > > >> > of modifications. We could consider reusing the RpcResponse > > > >> > > > > >> > Thanks, > > > >> > Fu Chen > > > >> > > > > >> > 王馨雨 <[email protected]> 于2024年9月20日周五 10:35写道: > > > >> > > > > > >> > > Hi all, > > > >> > > > > > >> > > I've written up a proposal for supporting HARD_SPLIT in > Celeborn. > > > You > > > >> > can find > > > >> > > the proposal here > > > >> > > < > > > >> > > > > >> > > > > > > https://cwiki.apache.org/confluence/display/CELEBORN/CIP-12+Support+HARD_SPLIT+in+PushMergedData > > > >> > >. > > > >> > > Please let me know if you have any comments or questions.Unlike > > > >> > PushData, Celeborn won’t actively trigger HARD_SPLIT in > > PushMergedData > > > >> > unless there are one or more partitions which have been split in > the > > > >> > partition group of PushMergedData. > > > >> > > This leads to several problems: > > > >> > > Cascading HARD_SPLIT in PushMergedData will be too wasted > because > > > most > > > >> > partitions may not reach the HARD_SPLIT threshold.Worker pressure > > > cannot > > > >> be > > > >> > transferred if the partitions won’t be split.ReverveSize won’t > take > > > >> > effect.Supporting HARD_SPLIT in PushMergedData will solve the > above > > > >> > problems which will only split the partitions that need to be > split. > > > >> > > > > > >> > > Thanks, > > > >> > > Xinyu Wang > > > >> > > > > > >> > > > > >> > > > > > >
