Hi Yuxin,

Thank you for the explanation. I have no further concerns.

Thanks,
Jiashu Xiong

On Tue, Jun 4, 2024 at 17:03, Yuxin Tan <[email protected]> wrote:

> Hi, Jiashu
>
> Thanks for joining the discussion.
>
> > My only concern is whether the change for the
> > Celeborn worker supports a graceful shutdown/decommission. Could you
> > provide more details on that?
>
> Since the graceful shutdown/decommission is achieved by changing the
> state of the worker and the master, the proposal doesn't change the
> logic. It can leverage the existing management framework as before,
> so I think the feature will continue to operate as intended. We have also
> conducted some simple tests to validate the process, and the results
> show that it works as expected.
>
> Best,
> Yuxin
>
>
> On Mon, Jun 3, 2024 at 12:18, rexxiong <[email protected]> wrote:
>
> > Hi Yuxin,
> > Thanks a lot for the CIP, +1
> >
> > For me the whole design and implementation appear clear and have no
> > compatibility issues. My only concern is whether the change for the
> > Celeborn worker supports a graceful shutdown/decommission. Could you
> > provide more details on that?
> >
> > Thanks,
> > Jiashu Xiong
> >
> > On Wed, May 29, 2024 at 09:39, Xintong Song <[email protected]> wrote:
> >
> > > +1 for this proposal.
> > >
> > > Greetings to the Apache Celeborn community~! Yuxin and I are from the
> > > Apache Flink community, and have been working on the shuffle related
> > > components for years. We are both excited about making our first
> > > contribution to the Apache Celeborn community.
> > >
> > > Hybrid Shuffle is a new shuffle architecture that the Flink community has
> > > been working on for ~2 years. We are planning to make it the default (and
> > > eventually the only) batch shuffle in the Flink 2.0 release (end of this
> > > year). The architecture is flexible and extensible so that it can support
> > > all the capabilities of existing shuffle modes, while providing new
> > > advantages on task scheduling, resource efficiency and usability. To
> > > achieve this, we abstract storages (memory, local disk, remote storage /
> > > service) into Tiers, and hide details such as assembling records to
> > > buffers, dynamic switching between Tiers and memory management from the
> > > Tiers.
> > >
> > > We believe it is important that Flink and Celeborn can be integrated under
> > > the new architecture, in addition to the existing integration based on the
> > > shuffle-service interfaces.
> > >
> > > Looking forward to your feedback.
> > >
> > > Best,
> > >
> > > Xintong
> > >
> > >
> > > On Tue, May 28, 2024 at 8:52 PM Yuxin Tan <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I would like to start a discussion on CIP-6 Support Flink hybrid shuffle
> > > > integration with Apache Celeborn[1]. Celeborn provides a stable,
> > > > performant, scalable remote shuffle service. Concurrently, Flink hybrid
> > > > shuffle supports transitions between memory, disk, and remote storage to
> > > > improve performance and job stability. This integration proposal is to
> > > > harness the benefits from both Celeborn and hybrid shuffle simultaneously.
> > > >
> > > > Note that this proposal has two parts.
> > > > 1. The Celeborn-side changes are in CIP-6[1].
> > > > 2. The Flink-side modifications are in FLIP-459[2].
> > > >
> > > > Looking forward to everyone's feedback and suggestions. Thank you!
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/CELEBORN/CIP-6+Support+Flink+hybrid+shuffle+integration+with+Apache+Celeborn
> > > > [2]
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-459%3A+Support+Flink+hybrid+shuffle+integration+with+Apache+Celeborn
> > > >
> > > > Best,
> > > > Yuxin
