GitHub user jakubfijolek added a comment to the discussion: Deploying Apache Celeborn for Large-Scale Spark SQL Workloads
I know this question was asked a long time ago, but since it pops up in Google results I'll reply anyway.

Our setup: three separate Spark clusters in different regions. Each one runs 50-600 YARN NodeManagers (Spark on YARN) on an EC2 ASG, scaled dynamically, with shuffle sizes regularly exceeding 50 TB. Each cluster has 3 static Celeborn workers plus a single master (we had issues with multi-master), plus 0-20 dynamic Celeborn workers brought up by the autoscaler when large jobs are submitted. We have been using Celeborn since Sep 2025.

We used to have a single cluster spanning multiple geo zones that used Celeborn tags to route shuffle files to specific Celeborn workers. It worked without issues, although it's slightly counterintuitive to set up initially. It has since been split into multiple per-region clusters. My general experience: as long as the cluster sits on the same network (same region or DC), it's better to run one big cluster and use Celeborn tags to isolate the load; that scales better and lets you fall back to the empty tag to use the whole cluster.

I have not had issues blending multiple concurrent Spark jobs on the same Celeborn cluster. It handles mixed workloads very well as long as they don't saturate disk space. CPU/memory is rarely an issue as long as shuffle partitions are sensibly sized (64-512 MB).

Was it worth it for us to switch to Celeborn from Spark's built-in shuffle? **YES** - the only bad thing is that we did not do it sooner. We have very aggressive autoscaling for NodeManagers, and long-running jobs regularly ran into issues where nodes either got deprovisioned before their shuffle data was consumed, or the cluster stayed scaled up and idle just to keep shuffle data available for a couple of writer threads. Celeborn let us decouple this completely and save a lot of money.

Performance-wise, on ~8 Celeborn workers we hit ~15-20 GB/s read/write shuffle throughput, which is very respectable and scales well as more nodes are added. IMHO it's a no-brainer for any bigger Spark deployment with autoscaling.
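For anyone landing here from search: pointing Spark at Celeborn is mostly client-side configuration. A minimal sketch (the master hostname is a placeholder, and the tag-based routing described above has additional settings; check the Celeborn docs for your version):

```properties
# Route Spark shuffle through Celeborn instead of the built-in shuffle machinery
spark.shuffle.manager=org.apache.spark.shuffle.celeborn.SparkShuffleManager
# Placeholder host; 9097 is the default Celeborn master port
spark.celeborn.master.endpoints=celeborn-master-host:9097
# Celeborn replaces the external shuffle service, so keep it off
spark.shuffle.service.enabled=false
```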
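On the partition-sizing point above: landing in the 64-512 MB band is simple arithmetic; divide the expected shuffle size by a target partition size to pick `spark.sql.shuffle.partitions`. A minimal sketch (the helper name and the 256 MB midpoint target are my own illustration, not from the comment):

```python
def shuffle_partitions(total_shuffle_bytes: int,
                       target_partition_bytes: int = 256 * 1024**2) -> int:
    """Pick a partition count so each shuffle partition lands near the target size."""
    # Ceiling division so the average partition does not exceed the target.
    return max(1, -(-total_shuffle_bytes // target_partition_bytes))

# e.g. a 50 TiB shuffle at a 256 MiB target -> 204800 partitions
print(shuffle_partitions(50 * 1024**4))
```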
For static-size or small clusters it might not be worth the time and cost.

GitHub link: https://github.com/apache/celeborn/discussions/3191#discussioncomment-16515390
