Re: Breaking the previous large-scale sort record with Spark

Mridul Muralidharan Fri, 10 Oct 2014 08:20:13 -0700

Brilliant stuff! Congrats all :-)
This is indeed really heartening news!

Regards,
Mridul



On Fri, Oct 10, 2014 at 8:24 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> Hi folks,
>
> I interrupt your regularly scheduled user / dev list to bring you some pretty 
> cool news for the project, which is that we've been able to use Spark to 
> break MapReduce's 100 TB and 1 PB sort records, sorting data 3x faster on 10x 
> fewer nodes. There's a detailed writeup at 
> http://databricks.com/blog/2014/10/10/spark-breaks-previous-large-scale-sort-record.html.
>  Summary: while Hadoop MapReduce held last year's 100 TB world record by 
> sorting 100 TB in 72 minutes on 2100 nodes, we sorted it in 23 minutes on 206 
> nodes; and we also scaled up to sort 1 PB in 234 minutes.
>
> I want to thank Reynold Xin for leading this effort over the past few weeks, 
> along with Parviz Deyhim, Xiangrui Meng, Aaron Davidson and Ali Ghodsi. In 
> addition, we'd really like to thank Amazon's EC2 team for providing the 
> machines to make this possible. Finally, this result would of course not be 
> possible without the many, many other contributions, testing and feature 
> requests from throughout the community.
>
> For an engine to scale from these multi-hour petabyte batch jobs down to 
> 100-millisecond streaming and interactive queries is quite uncommon, and it's 
> thanks to all of you folks that we are able to make this happen.
>
> Matei
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
