Hi,

I would like to make a few minor suggestions regarding the naming of certain 
variables to enhance clarity:

1. I recommend changing all instances of partitionId to partitionIndex, as the 
dataBatches in pushMergedData are organized based on index rather than 
partitionId.

2. I suggest renaming failedPartitionIds to unsuccessfulPartitionIndexes, since 
SOFT_SPLIT is not a kind of write failure.

Regards,
Yanze

 On 2024/10/30 09:19:42 王馨雨 wrote: > > > > We have discussed the proposal 
offline with some members in the community and the final version is 
https://docs.google.com/document/d/1Jaix22vME0m1Q-JtTHF9WYsrsxBWBwzwmifcPxNQZHk/edit?tab=t.0#heading=h.vvog8f9qc8je.
 > > > > > > > > > > > > > > > At 2024-09-23 09:26:18, "rexxiong"  wrote: > 
>Thanks Xinyu for the proposal! It's a great enhancement for the current > 
>Split mechanism. The design LGTM overall. There are some additional > 
>considerations: > >1) We need to ensure client compatibility, so it's best to 
reuse > >RpcResponse, or add a new protocol to ensure the old behavior remains 
> >compatible. > >2) We could consider first implementing single-replica 
support for Split > >and then extending it to dual replicas to reduce 
implementation complexity. > > > >Thanks, > >Jiashu Xiong > > > >Keyong Zhou  
于2024年9月22日周日 11:15写道: > > > >> Thanks Xinyu for the proposal! Glad to see 
PushMergeData will support > >> splitting finally :) > >> > >> The design LGTM 
overall, still I have some comments: > >> > >> 1. We can also support soft 
split as well > >> > >> 2. Typically there are more than one disks, so it's 
better to only splits > >> partitions > >> belonging to full disks instead of 
splitting all partitions > >> > >> 3. We can group partitions by disks to 
increase efficiency when checking > >> whether disk is full for each partition 
> >> > >> 4. For FILE_ALREADY_CLOSED I think we shouldn't decrease 
remainReviveTimes > >> either, it's not an exception or error > >> > >> 
Regards, > >> Keyong Zhou > >> > >> > >> > >> Fu Chen  于2024年9月21日周六 18:03写道: > 
>> > >> > Thanks Xinyu for the proposal. Adding HARD_SPLIT support for > >> > 
PushMergeData is valuable for production. We've encountered issues > >> > with 
small disk nodes getting overloaded in heterogeneous clusters. > >> > > >> > I 
had a discussion with @rexxiong, the current implementation requires > >> > 
introducing PUSH_MERGED_DATA_RESPONSE, which increases the complexity > >> > of 
modifications. We could consider reusing the RpcResponse > >> > > >> > Thanks, 
> >> > Fu Chen > >> > > >> > 王馨雨  于2024年9月20日周五 10:35写道: > >> > > > >> > > Hi 
all, > >> > > > >> > > I've written up a proposal for supporting HARD_SPLIT in 
Celeborn. You > >> > can find > >> > > the proposal here > >> > > < > >> > > >> 
https://cwiki.apache.org/confluence/display/CELEBORN/CIP-12+Support+HARD_SPLIT+in+PushMergedData
 > >> > >. > >> > > Please let me know if you have any comments or 
questions.Unlike > >> > PushData, Celeborn won’t actively trigger HARD_SPLIT in 
PushMergedData > >> > unless there are one or more partitions which have been 
split in the > >> > partition group of PushMergedData. > >> > > This leads to 
several problems: > >> > > Cascading HARD_SPLIT in PushMergedData will be too 
wasted because most > >> > partitions may not reach the HARD_SPLIT 
threshold.Worker pressure cannot > >> be > >> > transferred if the partitions 
won’t be split.ReverveSize won’t take > >> > effect.Supporting HARD_SPLIT in 
PushMergedData will solve the above > >> > problems which will only split the 
partitions that need to be split. > >> > > > >> > > Thanks, > >> > > Xinyu Wang 
> >> > > > >> > > >> >

Reply via email to