Is there any tool with which I can prove to a customer that Spark is faster than Hive?

2015-08-12 Thread Ladle
Hi, I have built the machine learning features and model using Apache Spark. I have also built the same features using Hive and Java, and used Mahout to run the model. Now, how can I show the customer that Apache Spark is faster than Hive? Is there any tool that shows the time? Regards,

Re: Is there any tool with which I can prove to a customer that Spark is faster than Hive?

2015-08-12 Thread Nick Pentreath
Perhaps you could time the end-to-end runtime for each pipeline, and for each stage? Though I'd be fairly confident that Spark will outperform Hive/Mahout on MR, that's not the only consideration: having everything on a single platform, together with the Spark / DataFrame API, is a huge win just by itself.
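A minimal sketch of the per-stage timing suggested above, in plain Python. The `StageTimer` class, the pipeline name, and the stage names are hypothetical and only for illustration; the `time.sleep` calls are placeholders for the real Spark or Hive/Mahout steps. It measures wall-clock time only, so it is a rough comparison aid rather than a rigorous benchmark.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Collects wall-clock timings for the named stages of one pipeline (hypothetical helper)."""

    def __init__(self, pipeline):
        self.pipeline = pipeline
        self.timings = []  # list of (stage_name, seconds), in execution order

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raises.
            self.timings.append((name, time.perf_counter() - start))

    def total(self):
        # Sum of the recorded stage times, in seconds.
        return sum(seconds for _, seconds in self.timings)

    def report(self):
        lines = [f"{self.pipeline}: {self.total():.3f}s total"]
        lines += [f"  {name}: {seconds:.3f}s" for name, seconds in self.timings]
        return "\n".join(lines)


if __name__ == "__main__":
    timer = StageTimer("spark_pipeline")  # assumed pipeline name
    with timer.stage("feature_build"):
        # Placeholder for the real work. With Spark, the stage should end in
        # an action (for example a count or a write), because transformations
        # are lazy and would otherwise not be executed inside the timed block.
        time.sleep(0.01)
    with timer.stage("model_training"):
        time.sleep(0.01)  # placeholder
    print(timer.report())
```

Running the same harness around both pipelines, on the same data and the same cluster, and repeating each run a few times, gives per-stage and end-to-end numbers that can be shown side by side.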

Re: Is there any tool with which I can prove to a customer that Spark is faster than Hive?

2015-08-12 Thread Gourav Sengupta
You might also need to consider the maturity of Spark SQL vs. HiveQL. Besides that, please read the following (which will soon be available as part of the standard Amazon stack, in case it's not already): https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started. All that you