Thank you for the details! Would you mind speaking to which tools proved
most useful for identifying bottlenecks or bugs? Thanks again.
On Oct 13, 2014 5:36 PM, "Matei Zaharia" <matei.zaha...@gmail.com> wrote:

> The biggest scaling issue was supporting a large number of reduce tasks
> efficiently, which the JIRAs in that post handle. In particular, our
> current default shuffle (the hash-based one) has each map task open a
> separate file output stream for each reduce task, which wastes a lot of
> memory (since each stream has its own buffer).
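>
> (A rough, illustrative sketch of that cost; the task counts and buffer size
> below are assumptions made for the arithmetic, not the benchmark's actual
> settings:
>
>   // Hash-based shuffle: every concurrently running map task keeps one
>   // open, buffered output stream per reduce task.
>   val concurrentMaps = 32           // assumed: one map task per core
>   val reduceTasks = 30000           // figure mentioned later in this mail
>   val bufferBytes = 32 * 1024       // assumed per-stream write buffer
>   val perNode = concurrentMaps.toLong * reduceTasks * bufferBytes
>   println(s"~${perNode / (1024 * 1024)} MB of buffers on a single node")
>   // => roughly 30 GB of memory spent on open shuffle streams alone
>
> The sort-based shuffle referenced above avoids this by writing one sorted
> output file per map task instead.)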
>
> A second thing that helped efficiency tremendously was Reynold's new
> network module (https://issues.apache.org/jira/browse/SPARK-2468). Doing
> I/O on 32 cores, 10 Gbps Ethernet and 8+ disks efficiently is not easy, as
> can be seen when you try to scale up other software.
>
> Finally, with 30,000 tasks, even sending info about every map's output size
> to each reducer was a problem, so Reynold has a patch that avoids that if
> the number of tasks is large.
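>
> (For a sense of scale, a rough calculation; assuming roughly 30,000 map
> tasks, a similar number of reducers, and one Long per output size, purely
> for illustration:
>
>   val maps = 30000L
>   val reduces = 30000L
>   val bytesPerSizeEntry = 8L        // assumed: one Long per output size
>   val totalBytes = maps * reduces * bytesPerSizeEntry
>   println(s"${totalBytes / (1024L * 1024 * 1024)} GB of size metadata")
>   // => on the order of 6-7 GB of map-output sizes to track and ship
>
> so avoiding or compressing that bookkeeping matters once task counts reach
> the tens of thousands.)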
>
> Matei
>
> On Oct 10, 2014, at 10:09 PM, Ilya Ganelin <ilgan...@gmail.com> wrote:
>
> > Hi Matei - I read your post with great interest. Could you possibly
> comment in more depth on some of the issues you guys saw when scaling up
> Spark and how you resolved them? I am interested specifically in
> Spark-related problems. I'm working on scaling up Spark to very large
> datasets and have been running into a variety of issues. Thanks in advance!
> >
> > On Oct 10, 2014 10:54 AM, "Matei Zaharia" <matei.zaha...@gmail.com>
> wrote:
> > Hi folks,
> >
> > I interrupt your regularly scheduled user / dev list to bring you some
> pretty cool news for the project, which is that we've been able to use
> Spark to break MapReduce's 100 TB and 1 PB sort records, sorting data 3x
> faster on 10x fewer nodes. There's a detailed writeup at
> http://databricks.com/blog/2014/10/10/spark-breaks-previous-large-scale-sort-record.html.
> Summary: while Hadoop MapReduce held last year's 100 TB world record by
> sorting 100 TB in 72 minutes on 2100 nodes, we sorted it in 23 minutes on
> 206 nodes; and we also scaled up to sort 1 PB in 234 minutes.
> >
> > I want to thank Reynold Xin for leading this effort over the past few
> weeks, along with Parviz Deyhim, Xiangrui Meng, Aaron Davidson and Ali
> Ghodsi. In addition, we'd really like to thank Amazon's EC2 team for
> providing the machines to make this possible. Finally, this result would of
> course not have been possible without the many, many other contributions,
> testing, and feature requests from throughout the community.
> >
> > For an engine to scale from these multi-hour petabyte batch jobs down to
> 100-millisecond streaming and interactive queries is quite uncommon, and
> it's thanks to all of you folks that we are able to make this happen.
> >
> > Matei
> >
>
>
