Hi Yuxin, thank you for the explanation. I have no further concerns.

Thanks,
Jiashu Xiong

On Tue, Jun 4, 2024 at 17:03, Yuxin Tan <[email protected]> wrote:

> Hi, Jiashu
>
> Thanks for joining the discussion.
>
> > My only concern is whether the change for the Celeborn worker
> > supports a graceful shutdown/decommission. Could you provide more
> > details on that?
>
> Graceful shutdown/decommission is achieved by changing the state of the
> worker and the master, and the proposal does not change that logic. It
> leverages the existing management framework as before, so I think the
> feature will continue to work as intended. We have also run some simple
> tests to validate the process, and the results show that it works as
> expected.
>
> Best,
> Yuxin
>
>
> On Mon, Jun 3, 2024 at 12:18, rexxiong <[email protected]> wrote:
>
> > Hi Yuxin,
> > Thanks a lot for the CIP, +1
> >
> > To me, the whole design and implementation look clear and have no
> > compatibility issues. My only concern is whether the change for the
> > Celeborn worker supports a graceful shutdown/decommission. Could you
> > provide more details on that?
> >
> > Thanks,
> > Jiashu Xiong
> >
> > On Wed, May 29, 2024 at 09:39, Xintong Song <[email protected]> wrote:
> >
> > > +1 for this proposal.
> > >
> > > Greetings to the Apache Celeborn community! Yuxin and I are from the
> > > Apache Flink community, and have been working on the shuffle-related
> > > components for years. We are both excited about making our first
> > > contribution to the Apache Celeborn community.
> > >
> > > Hybrid Shuffle is a new shuffle architecture that the Flink community has
> > > been working on for ~2 years. We are planning to make it the default (and
> > > eventually the only) batch shuffle in the Flink 2.0 release (end of this
> > > year). The architecture is flexible and extensible, so it can support
> > > all the capabilities of existing shuffle modes while providing new
> > > advantages in task scheduling, resource efficiency and usability. To
> > > achieve this, we abstract storages (memory, local disk, remote storage /
> > > service) into Tiers, and hide details such as assembling records into
> > > buffers, dynamic switching between Tiers, and memory management from the
> > > Tiers.
> > >
> > > We believe it is important that Flink and Celeborn can be integrated under
> > > the new architecture, in addition to the existing integration based on the
> > > shuffle-service interfaces.
> > >
> > > Looking forward to your feedback.
> > >
> > > Best,
> > >
> > > Xintong
> > >
> > >
> > >
> > > On Tue, May 28, 2024 at 8:52 PM Yuxin Tan <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I would like to start a discussion on CIP-6: Support Flink hybrid shuffle
> > > > integration with Apache Celeborn [1]. Celeborn provides a stable,
> > > > performant, and scalable remote shuffle service. Meanwhile, Flink hybrid
> > > > shuffle supports transitions between memory, disk, and remote storage to
> > > > improve performance and job stability. This integration proposal aims to
> > > > harness the benefits of both Celeborn and hybrid shuffle simultaneously.
> > > >
> > > > Note that this proposal has two parts.
> > > > 1. The Celeborn-side changes are in CIP-6[1].
> > > > 2. The Flink-side modifications are in FLIP-459[2].
> > > >
> > > > Looking forward to everyone's feedback and suggestions. Thank you!
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/CELEBORN/CIP-6+Support+Flink+hybrid+shuffle+integration+with+Apache+Celeborn
> > > > [2]
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-459%3A+Support+Flink+hybrid+shuffle+integration+with+Apache+Celeborn
> > > >
> > > > Best,
> > > > Yuxin
> > > >
> > >
> >
>
