Hi Liam,

Thanks for the proposal! Celeborn has targeted both shuffle and spilled data (together referred to as intermediate data) from day one, as the "About" section on GitHub says: Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
It is really great to see the community raising this.

Regards,
Keyong Zhou

On Sun, Mar 15, 2026 at 17:02, Liam Hecht <[email protected]> wrote:

> Hi Celeborn Devs,
>
> I would like to start a discussion about a new idea for Celeborn: *Support
> Remote Block Spilling for Compute Engines*.
>
> Celeborn works very well for shuffle data, but compute engines like Spark
> still use local disks for execution spills (for example during large sorts
> or aggregations). This can be a problem on machines with limited local
> storage.
>
> The idea is to extend Celeborn so it can store these spill blocks remotely.
> When an executor runs out of memory, instead of writing to local disk, it
> would send the spilled data to Celeborn Workers.
>
> *Main points of the proposal:*
>
> - New RPC messages: PushSpillData and ReleaseSpill
> - Worker-side SpillFileManager to manage spilled blocks
> - Single-copy storage since the data is temporary
> - Reuse Celeborn’s existing network and storage components
>
> I think this could make Celeborn more useful for cloud native and diskless
> environments.
>
> Happy to share a more detailed design document if there is interest.
>
> Best regards,
> Liam Hecht
