Today I deployed Spark 1.6.2 on our production cluster. We have one huge job that runs daily using Spark SQL, and it is the biggest daily Spark job on our cluster. I was impressed by the speed improvement.
Here are the historical run times of this daily job, all with the same resource allocation (we are using Standalone mode):
1) 11 to 12 hours on Hive 0.12 using MR
2) 6 hours on Spark 1.3.1
3) 4.5 hours on Spark 1.5.2
4) 1.6 hours on Spark 1.6.2
Very hard to believe. Looking forward to the coming Spark 2.x release (can you really make it 10x faster? For this job, 2x would already blow my mind). Great job, Spark development team! Thank you for such a great product.
Yong