Hey Celeborn community!

We (Stripe) are discovering the possibility of reducing the amount of disk
spills due to external sorting on the reducer side and trying to find a
partial solution to https://issues.apache.org/jira/browse/CELEBORN-1435
issue.

Currently, the idea that we have is to sort the ReducePartition file on the
Celeborn worker at the write time and use K-way merge to further sort the
data on the reducers. It would still require spilling some data back to
Celeborn in case the number of ReducePartition files is high and we need
multiple passes of external merge/sort.

This description might seem pretty generic, but I'm curious to hear
opinions before starting to draft the CIP! It would be nice to see whether
there have been previous thoughts and interest in reducing disk usage on
executors other than the jira ticket above.

Reply via email to