Hi,
I have built machine learning features and a model using Apache Spark.
I have also built the same features using Hive and Java, and used Mahout to run
the model.
How can I show the customer that Apache Spark is faster than Hive?
Is there a tool that shows the timings?
Regards,
Perhaps you could time the end-to-end runtime for each pipeline, and each stage?
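A minimal sketch of what that timing could look like, in plain Python. The names StageTimer, build_features, train_model and run_pipeline are hypothetical stand-ins, not part of Spark, Hive or Mahout; you would replace the two placeholder functions with whatever launches each real stage. One caveat on the Spark side: transformations are lazy, so each timed block should end in an action (a count or a write, for example), otherwise you only measure the time to build the plan.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Collects wall-clock durations for named pipeline stages."""

    def __init__(self):
        self.durations = {}  # stage name -> elapsed seconds

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the stage raises.
            self.durations[name] = time.perf_counter() - start

    def total(self):
        return sum(self.durations.values())

    def report(self, label):
        lines = [f"{label}: total {self.total():.3f}s"]
        for name, secs in self.durations.items():
            lines.append(f"  {name}: {secs:.3f}s")
        return "\n".join(lines)


# Hypothetical placeholders: swap in the real feature-building and
# model-training calls for the pipeline being measured.
def build_features():
    time.sleep(0.01)


def train_model():
    time.sleep(0.02)


def run_pipeline():
    timer = StageTimer()
    with timer.stage("features"):
        build_features()
    with timer.stage("training"):
        train_model()
    return timer


if __name__ == "__main__":
    print(run_pipeline().report("spark"))
```

Running the same wrapper around both pipelines, on the same data and cluster, and repeating each run a few times gives per-stage and end-to-end numbers you can put side by side. For the Spark runs, the Spark web UI also lists the duration of each job and stage.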
Though I'd be fairly confident that Spark will outperform Hive/Mahout on MR,
that's not the only consideration - having everything on a single platform and
the Spark / DataFrame API is a huge win just by itself.
You might also need to consider the maturity of Spark SQL vs HiveQL.
Besides that, please read the following (which will soon be available as
part of the standard Amazon stack, in case it's not already):
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started.
All that you