Hi,

  How would we handle the case of primary can be written to but replica is
triggering a split ?

Thanks,
Mridul


On Wed, Oct 30, 2024 at 4:20 AM 王馨雨 <[email protected]> wrote:

>
>
>
> We have discussed the proposal offline with some members in the community
> and the final version is
> https://docs.google.com/document/d/1Jaix22vME0m1Q-JtTHF9WYsrsxBWBwzwmifcPxNQZHk/edit?tab=t.0#heading=h.vvog8f9qc8je
> .
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> At 2024-09-23 09:26:18, "rexxiong" <[email protected]> wrote:
> >Thanks Xinyu for the proposal!  It's a great enhancement for the current
> >Split mechanism. The design LGTM overall. There are some additional
> >considerations:
> >1) We need to ensure client compatibility, so it's best to reuse
> >RpcResponse, or add a new protocol to ensure the old behavior remains
> >compatible.
> >2) We could consider first implementing single-replica support for Split
> >and then extending it to dual replicas to reduce implementation
> complexity.
> >
> >Thanks,
> >Jiashu Xiong
> >
> >Keyong Zhou <[email protected]> 于2024年9月22日周日 11:15写道:
> >
> >> Thanks Xinyu for the proposal! Glad to see PushMergeData will support
> >> splitting finally :)
> >>
> >> The design LGTM overall, still I have some comments:
> >>
> >> 1. We can also support soft split as well
> >>
> >> 2. Typically there are more than one disks, so it's better to only
> splits
> >> partitions
> >>     belonging to full disks instead of splitting all partitions
> >>
> >> 3. We can group partitions by disks to increase efficiency when checking
> >>     whether disk is full for each partition
> >>
> >> 4. For FILE_ALREADY_CLOSED I think we shouldn't decrease
> remainReviveTimes
> >>     either, it's not an exception or error
> >>
> >> Regards,
> >> Keyong Zhou
> >>
> >>
> >>
> >> Fu Chen <[email protected]> 于2024年9月21日周六 18:03写道:
> >>
> >> > Thanks Xinyu for the proposal. Adding HARD_SPLIT support for
> >> > PushMergeData is valuable for production. We've encountered issues
> >> > with small disk nodes getting overloaded in heterogeneous clusters.
> >> >
> >> > I had a discussion with @rexxiong, the current implementation requires
> >> > introducing PUSH_MERGED_DATA_RESPONSE, which increases the complexity
> >> > of modifications. We could consider reusing the RpcResponse
> >> >
> >> > Thanks,
> >> > Fu Chen
> >> >
> >> > 王馨雨 <[email protected]> 于2024年9月20日周五 10:35写道:
> >> > >
> >> > > Hi all,
> >> > >
> >> > > I've written up a proposal for supporting HARD_SPLIT in Celeborn.
> You
> >> > can find
> >> > > the proposal here
> >> > > <
> >> >
> >>
> https://cwiki.apache.org/confluence/display/CELEBORN/CIP-12+Support+HARD_SPLIT+in+PushMergedData
> >> > >.
> >> > > Please let me know if you have any comments or questions.Unlike
> >> > PushData, Celeborn won’t actively trigger HARD_SPLIT in PushMergedData
> >> > unless there are one or more partitions which have been split in the
> >> > partition group of PushMergedData.
> >> > > This leads to several problems:
> >> > > Cascading HARD_SPLIT in PushMergedData will be too wasted because
> most
> >> > partitions may not reach the HARD_SPLIT threshold.Worker pressure
> cannot
> >> be
> >> > transferred if the partitions won’t be split.ReverveSize won’t take
> >> > effect.Supporting HARD_SPLIT in PushMergedData will solve the above
> >> > problems which will only split the partitions that need to be split.
> >> > >
> >> > > Thanks,
> >> > > Xinyu Wang
> >> > >
> >> >
> >>
>

Reply via email to