Re: Is there any tool that I can use to prove to a customer that Spark is faster than Hive?

2015-08-12 Thread Gourav Sengupta
You might also need to consider the maturity of SPARKSQL vs HIVEQL.

Besides that, please read the following (which will soon be available as
part of the standard Amazon stack, in case it's not already):
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started.


All you need to do is run the following (assuming the environment is
already set up in AWS), and HIVE will use SPARK to execute the
queries:
hive> set hive.execution.engine=spark;


HIVE can use MapReduce, Tez, and now SPARK to execute its queries.
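Because the engine is just a session setting, one way to get comparable numbers is to run the same HiveQL once per engine and time each run. Below is a minimal Python sketch, assuming the `hive` CLI is on the PATH and that Tez and Spark are actually configured as engines on the cluster; the helper names and the query are hypothetical, not part of Hive itself.

```python
import shlex
import subprocess
import time
from typing import Callable, Dict, List, Sequence

# Values accepted by hive.execution.engine; "tez" and "spark" only work if
# they are configured on the cluster (assumption: all three are available).
ENGINES: List[str] = ["mr", "tez", "spark"]


def build_hive_command(engine: str, query: str) -> List[str]:
    """Build a `hive -e` call that selects the execution engine, then runs the query."""
    script = f"set hive.execution.engine={engine}; {query}"
    return ["hive", "-e", script]


def time_command(cmd: Sequence[str]) -> float:
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the query fails, so a failed
    # run is never reported as a fast one.
    subprocess.run(list(cmd), check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def compare_engines(
    query: str,
    engines: Sequence[str] = ENGINES,
    runner: Callable[[List[str]], float] = time_command,
) -> Dict[str, float]:
    """Time the same query once per engine; returns {engine: seconds}."""
    results: Dict[str, float] = {}
    for engine in engines:
        cmd = build_hive_command(engine, query)
        print("running:", shlex.join(cmd))
        results[engine] = runner(cmd)
    return results
```

On a node with the `hive` CLI, `compare_engines("SELECT COUNT(*) FROM features")` (where `features` is a placeholder table) would print each command and return the per-engine timings. Each `hive -e` call starts a fresh session, so the numbers include CLI and engine startup overhead; running each engine a few times and comparing the medians gives a fairer picture.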

Please do read the details in the above link.


Regards,
Gourav Sengupta

On Wed, Aug 12, 2015 at 1:01 PM, Nick Pentreath 
wrote:

> Perhaps you could time the end-to-end runtime for each pipeline, and each
> stage?
>
> Though I'd be fairly confident that Spark will outperform Hive/Mahout on
> MR, that's not the only consideration - having everything on a single
> platform and the Spark / DataFrame API is a huge win just by itself.
>
> On Wed, Aug 12, 2015 at 1:45 PM, Ladle  wrote:
>
>> Hi,
>>
>> I have built the machine learning features and model using Apache
>> Spark.
>>
>> I built the same features using Hive and Java, and used Mahout to
>> run the model.
>>
>> Now, how can I show the customer that Apache Spark is faster than
>> Hive?
>>
>> Is there any tool that shows the time?
>>
>> Regards,
>> Ladle
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Is-there-any-tool-that-i-can-prove-to-customer-that-spark-is-faster-then-hive-tp24224.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>


2015-08-12 Thread Nick Pentreath
Perhaps you could time the end-to-end runtime for each pipeline, and each stage?
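That per-stage timing can be sketched in plain Python: wrap each stage in a timer and sum the stages for the end-to-end figure. This is a minimal sketch, not a Spark or Mahout API; `timed_stage`, `report`, and the stage names are hypothetical. The same harness can wrap the Spark driver code and the Hive/Mahout steps, so both pipelines are measured the same way. For a per-job and per-stage breakdown inside Spark itself, the Spark web UI (on the driver, port 4040 by default) already shows durations.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed_stage(name: str, timings: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of the enclosed block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the stage raises, so a failed stage still shows up.
        timings[name] = time.perf_counter() - start


def report(timings: Dict[str, float]) -> str:
    """Format per-stage timings plus the end-to-end total."""
    lines = [f"{name:<20}{secs:>10.2f} s" for name, secs in timings.items()]
    lines.append(f"{'total':<20}{sum(timings.values()):>10.2f} s")
    return "\n".join(lines)


timings: Dict[str, float] = {}

# Hypothetical stages; the sleeps stand in for the real work. With Spark,
# each block must end in an action (e.g. count() or a write), otherwise
# only the construction of the lazy query plan is timed.
with timed_stage("feature_extraction", timings):
    time.sleep(0.05)
with timed_stage("model_training", timings):
    time.sleep(0.05)

print(report(timings))
```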

Though I'd be fairly confident that Spark will outperform Hive/Mahout on MR,
that's not the only consideration - having everything on a single platform and
the Spark / DataFrame API is a huge win just by itself.

On Wed, Aug 12, 2015 at 1:45 PM, Ladle  wrote:

> Hi,
> I have built the machine learning features and model using Apache Spark.
> I built the same features using Hive and Java, and used Mahout to run
> the model.
> Now, how can I show the customer that Apache Spark is faster than Hive?
> Is there any tool that shows the time?
> Regards,
> Ladle