Sounds good, thanks !
I wanted to make sure we handle this case (since independently each of
primary/replica can hard split) ... rest looks good to me !

Regards,
Mridul

On Mon, Nov 4, 2024 at 10:42 AM Erik fang <[email protected]> wrote:

> Hi Mirdul,
>
> I suppose the correct way should be union of primary split and replica
> split
>
> Given PushMergedData RPC of {partition1 -> data, partition2 -> data,
> partition3 -> data},  and primary returns hard split for partition 1 while
> replica returns hard split for partition3
> the union of primary/replica response as {partition1 - hard split,
> partition3 - hard split} should be handled by shuffle client
>
> Regards,
> Erik
>
> On Sat, Nov 2, 2024 at 3:18 PM Mridul Muralidharan <[email protected]>
> wrote:
>
> > Hi,
> >
> >   How would we handle the case of primary can be written to but replica
> is
> > triggering a split ?
> >
> > Thanks,
> > Mridul
> >
> >
> > On Wed, Oct 30, 2024 at 4:20 AM 王馨雨 <[email protected]> wrote:
> >
> > >
> > >
> > >
> > > We have discussed the proposal offline with some members in the
> community
> > > and the final version is
> > >
> >
> https://docs.google.com/document/d/1Jaix22vME0m1Q-JtTHF9WYsrsxBWBwzwmifcPxNQZHk/edit?tab=t.0#heading=h.vvog8f9qc8je
> > > .
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > At 2024-09-23 09:26:18, "rexxiong" <[email protected]> wrote:
> > > >Thanks Xinyu for the proposal!  It's a great enhancement for the
> current
> > > >Split mechanism. The design LGTM overall. There are some additional
> > > >considerations:
> > > >1) We need to ensure client compatibility, so it's best to reuse
> > > >RpcResponse, or add a new protocol to ensure the old behavior remains
> > > >compatible.
> > > >2) We could consider first implementing single-replica support for
> Split
> > > >and then extending it to dual replicas to reduce implementation
> > > complexity.
> > > >
> > > >Thanks,
> > > >Jiashu Xiong
> > > >
> > > >Keyong Zhou <[email protected]> 于2024年9月22日周日 11:15写道:
> > > >
> > > >> Thanks Xinyu for the proposal! Glad to see PushMergeData will
> support
> > > >> splitting finally :)
> > > >>
> > > >> The design LGTM overall, still I have some comments:
> > > >>
> > > >> 1. We can also support soft split as well
> > > >>
> > > >> 2. Typically there are more than one disks, so it's better to only
> > > splits
> > > >> partitions
> > > >>     belonging to full disks instead of splitting all partitions
> > > >>
> > > >> 3. We can group partitions by disks to increase efficiency when
> > checking
> > > >>     whether disk is full for each partition
> > > >>
> > > >> 4. For FILE_ALREADY_CLOSED I think we shouldn't decrease
> > > remainReviveTimes
> > > >>     either, it's not an exception or error
> > > >>
> > > >> Regards,
> > > >> Keyong Zhou
> > > >>
> > > >>
> > > >>
> > > >> Fu Chen <[email protected]> 于2024年9月21日周六 18:03写道:
> > > >>
> > > >> > Thanks Xinyu for the proposal. Adding HARD_SPLIT support for
> > > >> > PushMergeData is valuable for production. We've encountered issues
> > > >> > with small disk nodes getting overloaded in heterogeneous
> clusters.
> > > >> >
> > > >> > I had a discussion with @rexxiong, the current implementation
> > requires
> > > >> > introducing PUSH_MERGED_DATA_RESPONSE, which increases the
> > complexity
> > > >> > of modifications. We could consider reusing the RpcResponse
> > > >> >
> > > >> > Thanks,
> > > >> > Fu Chen
> > > >> >
> > > >> > 王馨雨 <[email protected]> 于2024年9月20日周五 10:35写道:
> > > >> > >
> > > >> > > Hi all,
> > > >> > >
> > > >> > > I've written up a proposal for supporting HARD_SPLIT in
> Celeborn.
> > > You
> > > >> > can find
> > > >> > > the proposal here
> > > >> > > <
> > > >> >
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/CELEBORN/CIP-12+Support+HARD_SPLIT+in+PushMergedData
> > > >> > >.
> > > >> > > Please let me know if you have any comments or questions.Unlike
> > > >> > PushData, Celeborn won’t actively trigger HARD_SPLIT in
> > PushMergedData
> > > >> > unless there are one or more partitions which have been split in
> the
> > > >> > partition group of PushMergedData.
> > > >> > > This leads to several problems:
> > > >> > > Cascading HARD_SPLIT in PushMergedData will be too wasted
> because
> > > most
> > > >> > partitions may not reach the HARD_SPLIT threshold.Worker pressure
> > > cannot
> > > >> be
> > > >> > transferred if the partitions won’t be split.ReverveSize won’t
> take
> > > >> > effect.Supporting HARD_SPLIT in PushMergedData will solve the
> above
> > > >> > problems which will only split the partitions that need to be
> split.
> > > >> > >
> > > >> > > Thanks,
> > > >> > > Xinyu Wang
> > > >> > >
> > > >> >
> > > >>
> > >
> >
>

Reply via email to