Hi Aidar,
This feature sounds great and can solve disk problems in many scenarios.

I have some small questions about the details. If it's a spill in a
non-shuffle scenario, will celeborn cluster also be responsible for
the corresponding spill data?

Overall, sort the reduce partition files on the celeborn worker looks
good, looking forward to seeing the overall design and the CIP.

Thanks,
Binjie Yang

Aidar Bariev <ai...@stripe.com.invalid> 于2024年8月29日周四 05:39写道:
>
> Hey Celeborn community!
>
> We (Stripe) are discovering the possibility of reducing the amount of disk
> spills due to external sorting on the reducer side and trying to find a
> partial solution to https://issues.apache.org/jira/browse/CELEBORN-1435
> issue.
>
> Currently, the idea that we have is to sort the ReducePartition file on the
> Celeborn worker at the write time and use K-way merge to further sort the
> data on the reducers. It would still require spilling some data back to
> Celeborn in case the number of ReducePartition files is high and we need
> multiple passes of external merge/sort.
>
> This description might seem pretty generic, but I'm curious to hear
> opinions before starting to draft the CIP! It would be nice to see whether
> there have been previous thoughts and interest in reducing disk usage on
> executors other than the jira ticket above.

Reply via email to