The biggest scaling issue was supporting a large number of reduce tasks 
efficiently, which the JIRAs linked in that post address. In particular, our 
current default shuffle (the hash-based one) has each map task open a separate 
file output stream for every reduce task, which wastes a lot of memory, since 
each open stream carries its own write buffer.
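
To make the memory cost concrete, here's a back-of-the-envelope sketch (the 
numbers are illustrative, and the 32 KB figure is just the usual default 
shuffle write buffer, spark.shuffle.file.buffer.kb, which may differ in your 
build); you can paste it into a Scala REPL:

    // Rough buffer footprint per machine under the hash-based shuffle.
    // Assumes 32 concurrent map tasks (one per core), 30,000 reduce tasks,
    // and a 32 KB write buffer per open stream -- all illustrative values.
    val concurrentMaps = 32
    val reduceTasks    = 30000
    val bufferBytes    = 32L * 1024
    val totalGB = concurrentMaps * reduceTasks * bufferBytes / math.pow(1024, 3)
    println(f"~$totalGB%.1f GB of write buffers before any data is shuffled")
    // ~29 GB on a single node, which is why the sort-based shuffle (one
    // output file per map task, spark.shuffle.manager=sort as of 1.1)
    // matters at this scale.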

A second thing that helped efficiency tremendously was Reynold's new network 
module (https://issues.apache.org/jira/browse/SPARK-2468). Driving 32 cores, a 
10 Gbps Ethernet link and 8+ disks at full utilization simultaneously is not 
easy, as you quickly discover when you try to push other software to that scale.
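
If you want to experiment with the transfer layer yourself, the settings below 
are one way to do it, but treat them as an assumption-laden sketch: the 
property names are from later 1.x releases (the benchmark ran on a patched 
build) and the defaults vary by version.

    import org.apache.spark.SparkConf

    // Illustrative shuffle/network settings; verify the names against your
    // release's configuration docs before relying on them.
    val conf = new SparkConf()
      .setAppName("shuffle-heavy-job")
      // Netty-based block transfer from SPARK-2468 (later the default):
      .set("spark.shuffle.blockTransferService", "netty")
      // Cap per-reducer in-flight fetches (in MB) so thousands of
      // simultaneous fetches don't exhaust executor memory:
      .set("spark.reducer.maxMbInFlight", "48")

    println(conf.toDebugString)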

Finally, with 30,000 tasks, even shipping information about every map's output 
sizes to each reducer was a problem, so Reynold has a patch that avoids sending 
the full per-block size data when the number of tasks is large.
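
To see why this matters, a toy calculation (not Spark's actual data structures, 
just the bookkeeping the patch is trimming):

    // With M map tasks and R reduce tasks, tracking every block size means
    // the driver must serialize and ship O(M * R) entries to executors.
    val mapTasks    = 30000
    val reduceTasks = 30000
    val perBlockEntries = mapTasks.toLong * reduceTasks  // 900 million sizes
    // Keeping only a summary (e.g. an average block size) per map task
    // cuts that to O(M):
    val summarizedEntries = mapTasks.toLong              // 30 thousand sizes
    println(s"$perBlockEntries per-block entries vs $summarizedEntries summaries")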

Matei

On Oct 10, 2014, at 10:09 PM, Ilya Ganelin <ilgan...@gmail.com> wrote:

> Hi Matei - I read your post with great interest. Could you possibly comment 
> in more depth on some of the issues you guys saw when scaling up Spark and 
> how you resolved them? I am interested specifically in Spark-related 
> problems. I'm working on scaling Spark up to very large datasets and have 
> been running into a variety of issues. Thanks in advance!
> 
> On Oct 10, 2014 10:54 AM, "Matei Zaharia" <matei.zaha...@gmail.com> wrote:
> Hi folks,
> 
> I interrupt your regularly scheduled user / dev list to bring you some pretty 
> cool news for the project, which is that we've been able to use Spark to 
> break MapReduce's 100 TB and 1 PB sort records, sorting data 3x faster on 10x 
> fewer nodes. There's a detailed writeup at 
> http://databricks.com/blog/2014/10/10/spark-breaks-previous-large-scale-sort-record.html.
>  Summary: while Hadoop MapReduce held last year's 100 TB world record by 
> sorting 100 TB in 72 minutes on 2100 nodes, we sorted it in 23 minutes on 206 
> nodes; and we also scaled up to sort 1 PB in 234 minutes.
> 
> I want to thank Reynold Xin for leading this effort over the past few weeks, 
> along with Parviz Deyhim, Xiangrui Meng, Aaron Davidson and Ali Ghodsi. In 
> addition, we'd really like to thank Amazon's EC2 team for providing the 
> machines to make this possible. Finally, this result would of course not be 
> possible without the many many other contributions, testing and feature 
> requests from throughout the community.
> 
> For an engine to scale from these multi-hour petabyte batch jobs down to 
> 100-millisecond streaming and interactive queries is quite uncommon, and it's 
> thanks to all of you folks that we are able to make this happen.
> 
> Matei
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
