Hi Yuxin,

Thanks for answering the questions above. IMO, for the implementation on the
Celeborn side, more design details need to be provided in the CIP rather than
the FLIP so that community developers can review them. Meanwhile, although some
public configurations are defined in Flink, those used by the Celeborn Flink
client still need to be exposed in the CIP as well, so that reviewers do not
have to spend time looking at the FLIP. Anyway, I got the answers from your
detailed reply. Thanks.

+1 from me. Looking forward to this integration. I would like to consider using
this feature once it is ready.

Regards,
Nicholas Jiang

On 2024/06/11 08:09:06 Yuxin Tan wrote:
> Hi Nicholas,
> 
> Thanks for the valuable feedback.
> 
> > 1. Could you describe in detail what functions the relevant components
> > mentioned in Proposed Changes
> 
> These components are only the pluggable implementations of the Celeborn
> tier. The details and the mechanisms of switching between tiers are in the
> previous FLIP[1]. Celeborn is added to hybrid shuffle as a new tier and
> shares similarities with the existing tiers, such as the Memory tier and the
> Disk tier. In this tiered storage, agents serve as the entry points of
> interaction between the framework and the different tiers. For instance,
> CelebornProducerAgent acts as the entry point for producers to emit data
> into the tier. If you still have similar questions after referring to that
> FLIP, please feel free to let me know.
> 
> > 2. Can you briefly introduce how to guarantee compatibility with
> > Celeborn’s existing features such as partition splitting?
> 
> This integration work is a new way to make Celeborn work with Flink, so the
> compatibility of the old shuffle service mode is not affected. The new
> integration will also support the features of the old mode. For example,
> partition split will be supported by trying to open the stream from the next
> partition once the previous partition has been read completely. Since these
> features are all implementation details, I initially left them out of the
> CIP to keep it focused, simple, and easy to understand. Following your
> question, I have added some feature details to it.
> 
> > 3. Is there any public configuration of integration with Hybrid Shuffle
> > and Flink client?
> 
> Yes, there is a newly added Flink configuration, which is described in
> FLIP[2].
> 
> 
> > 4. How does the server side guarantee the accuracy and recoverability of
> > Segment information?
> 
> Similar to other writing information, the segment info is also added to
> FileInfo, and a lock protects it to guarantee accuracy. Recoverability is
> achieved by serialization and deserialization, the same as for the other
> fields.
> 
> > 5. Should Celeborn wait until FLIP-459 is released before releasing this
> > integration? Which Flink version will release FLIP-459?
> 
> Celeborn's integration should wait for FLIP-459 to be released. This is
> because the feature relies on both CIP-6 and FLIP-459 to function correctly.
> If all goes well, FLIP-459 could be part of Flink's next release, Flink 1.20.
> 
> 
> Hi, Keyong,
> 
> Thanks for the reminder and your interest in the Reduce Partition. After the
> Map Partition part is finished, we will move on to it as soon as possible.
> 
> 
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-301%3A+Hybrid+Shuffle+supports+Remote+Storage
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-459%3A+Support+Flink+hybrid+shuffle+integration+with+Apache+Celeborn
> 
> Best,
> Yuxin
> 
> 
> On Sat, Jun 8, 2024 at 13:00, Keyong Zhou <[email protected]> wrote:
> 
> > Hi Yuxin and Xintong,
> >
> > Really excited to see the Flink and Celeborn communities collaborate
> > more on the shuffle component! I believe this will inspire more on both
> > sides :)
> >
> > +1 for this proposal, looking forward to seeing this feature make progress.
> >
> > Also, I'm very interested in integrating Flink Hybrid Shuffle with
> > Celeborn's Reduce Partition in the future, as mentioned in the doc, which
> > I believe will bring more benefit to very large shuffle operators :)
> >
> > Regards,
> > Keyong Zhou
> >
> > On Thu, Jun 6, 2024 at 13:25, Nicholas Jiang <[email protected]> wrote:
> >
> > > Hi Yuxin,
> > >
> > > Thanks for driving this CIP about integration with Hybrid Shuffle. I have
> > > some comments on this CIP:
> > >
> > > 1. Could you describe in detail what functions the relevant components
> > > mentioned in Proposed Changes, including CelebornProducerAgent,
> > > CelebornConsumerAgent, CelebornMasterAgent, etc., support? In the design
> > > document, these components are only mentioned, without any details of
> > > the changes.
> > >
> > > 2. Can you briefly introduce how to guarantee compatibility with
> > > Celeborn’s existing features such as partition splitting? IMO, the
> > > compatibility introduction should be mentioned in Proposed Changes to
> > > help community developers understand.
> > >
> > > 3. There are no changes to public interfaces. Is there any public
> > > configuration of integration with Hybrid Shuffle and Flink client?
> > >
> > > 4. The server side must store Segment information for each subpartition.
> > > How does the server side guarantee the accuracy and recoverability of
> > > Segment information?
> > >
> > > 5. Should Celeborn wait until FLIP-459 is released before releasing this
> > > integration? Which Flink version will release FLIP-459?
> > >
> > > Regards,
> > > Nicholas Jiang
> > >
> > > On 2024/05/28 12:51:32 Yuxin Tan wrote:
> > > > Hi all,
> > > >
> > > > I would like to start a discussion on CIP-6 Support Flink hybrid
> > > > shuffle integration with Apache Celeborn[1]. Celeborn provides a
> > > > stable, performant, scalable remote shuffle service. Concurrently,
> > > > Flink hybrid shuffle supports transitions between memory, disk, and
> > > > remote storage to improve performance and job stability. This
> > > > integration proposal is to harness the benefits from both Celeborn
> > > > and hybrid shuffle simultaneously.
> > > >
> > > > Note that this proposal has two parts.
> > > > 1. The Celeborn-side changes are in CIP-6[1].
> > > > 2. The Flink-side modifications are in FLIP-459[2].
> > > >
> > > > Looking forward to everyone's feedback and suggestions. Thank you!
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/CELEBORN/CIP-6+Support+Flink+hybrid+shuffle+integration+with+Apache+Celeborn
> > > > [2]
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-459%3A+Support+Flink+hybrid+shuffle+integration+with+Apache+Celeborn
> > > >
> > > > Best,
> > > > Yuxin
> > > >
> > >
> >
> 
