GitHub user jakubfijolek added a comment to the discussion: Deploying Apache 
Celeborn for Large-Scale Spark SQL Workloads

I know this question was asked a long time ago, but since it pops up in Google 
results I will reply anyway.

Our setup: 3 separate Spark clusters in different regions. Each one has 50-600 
YARN NodeManagers (Spark on YARN) on EC2 ASGs, scaled dynamically, with shuffle 
sizes regularly exceeding 50 TB. On the Celeborn side: 3 static Celeborn nodes 
+ a single master (we had issues with multi-master) + 0-20 dynamic Celeborn 
nodes brought up by the autoscaler when large jobs are submitted. 
We have been using Celeborn since Sep 2025.

We had a single cluster set up across multiple geo zones that used Celeborn 
tags to route shuffle files to specific Celeborn workers - it worked without 
issues, although it's slightly counter-intuitive to set up initially. We have 
since split it into multiple per-region clusters.

My general experience: as long as the cluster is on the same network (same 
region or DC), it's better to run one big cluster and use Celeborn tags to 
isolate the load. It scales better and lets you fall back to the empty tag to 
use the whole cluster.
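For context, a minimal spark-submit sketch of what tag-based routing looks like on the client side. The shuffle-manager and master-endpoints confs are the standard Celeborn-on-Spark settings; the tags conf key name and the `etl` tag value are assumptions for illustration - check the Celeborn docs for your version:

```shell
# Route this job's shuffle to Celeborn workers tagged "etl".
# NOTE: spark.celeborn.tags.tagsExpr is an assumed key name (illustration
# only); an empty tags expression would fall back to the whole cluster.
spark-submit \
  --conf spark.shuffle.manager=org.apache.spark.shuffle.celeborn.SparkShuffleManager \
  --conf spark.celeborn.master.endpoints=celeborn-master:9097 \
  --conf spark.celeborn.tags.tagsExpr=etl \
  my-job.jar
```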

I have not had issues with blending multiple concurrent Spark jobs on the same 
Celeborn cluster. It handles mixed workloads very well, as long as they don't 
saturate disk space. CPU/memory is rarely an issue as long as shuffle 
partitions are sensibly sized (64-512 MB).
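The partition-sizing rule above boils down to simple arithmetic: divide the expected shuffle volume by a target partition size in the 64-512 MB band and set `spark.sql.shuffle.partitions` accordingly. A minimal sketch (the 256 MB default target is my choice, mid-range of that band):

```python
import math

def shuffle_partitions(total_shuffle_bytes: int,
                       target_partition_bytes: int = 256 * 1024**2) -> int:
    """Partition count so each shuffle partition lands near the target size.

    Assumes a roughly even key distribution; heavy skew needs separate
    handling (salting, AQE skew-join splitting, etc.).
    """
    return max(1, math.ceil(total_shuffle_bytes / target_partition_bytes))

# e.g. a 50 TiB shuffle at a 256 MB target:
print(shuffle_partitions(50 * 1024**4))  # 204800
```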

Was it worth it for us to switch from Spark's built-in shuffle to Celeborn? 
**YES** - the only bad thing is that we did not do it sooner. We have very 
aggressive autoscaling for NodeManagers, and long-running jobs regularly ran 
into issues where nodes either got deprovisioned before their shuffle data was 
consumed, or the cluster stayed scaled up and idle just to keep shuffle data 
available for a couple of writer threads. With Celeborn we were able to 
decouple this completely and save a lot of $. 
Performance-wise, on ~8 Celeborn workers we hit ~15-20 GB/s read/write shuffle 
throughput, which is very respectable and scales very well as more nodes are 
added.
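Those numbers work out to roughly 2-2.5 GB/s per worker. A back-of-envelope sketch of the scaling claim (assumes the linear scaling the post reports; real aggregate throughput also depends on disk and NIC limits):

```python
def projected_throughput_gbps(observed_gbps: float,
                              observed_workers: int,
                              target_workers: int) -> float:
    """Naive linear projection of aggregate shuffle throughput.

    Purely illustrative arithmetic -- assumes throughput scales linearly
    with worker count, as the post's experience suggests.
    """
    per_worker = observed_gbps / observed_workers
    return per_worker * target_workers

# ~20 GB/s on 8 workers -> ~2.5 GB/s each; doubling to 16 workers:
print(projected_throughput_gbps(20.0, 8, 16))  # 40.0
```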

IMHO it's a no-brainer for any bigger spark deployment with autoscaling. For 
static size or small clusters might not be worth the time and cost. 

GitHub link: 
https://github.com/apache/celeborn/discussions/3191#discussioncomment-16515390
