Today I deployed Spark 1.6.2 on our production cluster. We have one huge job that runs daily using Spark SQL, and it is the biggest daily Spark job on our cluster. I was impressed by the speed improvement.
Here are the historical run times of this daily job, all with the same resource allocation (we are using Standalone mode):
1) 11 to 12 hours on Hive 0.12 using MR
2) 6 hours on Spark 1.3.1
3) 4.5 hours on Spark 1.5.2
4) 1.6 hours on Spark 1.6.2
Very hard to believe. Looking forward to the coming Spark 2.x release (can you really make it 10x faster? For this job, 2x would already blow my mind). Great job, Spark development team! Thank you for such a great product.
Yong