If you want a performance boost, you need to load the full table into memory 
using caching and then execute your query directly on the cached DataFrame. 
Otherwise you are using Spark only as a bridge to MySQL and are not leveraging 
its distributed in-memory engine.

Paolo

Sent from my Windows Phone
________________________________
From: Louis Hust<mailto:louis.h...@gmail.com>
Sent: 26/07/2015 10:28
To: Shixiong Zhu<mailto:zsxw...@gmail.com>
Cc: Jerrick Hoang<mailto:jerrickho...@gmail.com>; 
user@spark.apache.org<mailto:user@spark.apache.org>
Subject: Re: Spark is much slower than direct access MySQL

Thanks for your explanation.

2015-07-26 16:22 GMT+08:00 Shixiong Zhu 
<zsxw...@gmail.com<mailto:zsxw...@gmail.com>>:
Oh, I see. That's the total time of executing a query in Spark. Then the 
difference is reasonable, considering Spark has much more work to do, e.g., 
launching tasks in executors.


Best Regards,

Shixiong Zhu

2015-07-26 16:16 GMT+08:00 Louis Hust 
<louis.h...@gmail.com<mailto:louis.h...@gmail.com>>:
Look at the given URL:

Code can be found at:

https://github.com/louishust/sparkDemo/blob/master/src/main/java/DirectQueryTest.java

2015-07-26 16:14 GMT+08:00 Shixiong Zhu 
<zsxw...@gmail.com<mailto:zsxw...@gmail.com>>:
Could you clarify how you measure the Spark time cost? Is it the total time of 
running the query? If so, the gap is plausible, because Spark's fixed overhead 
dominates for small queries.
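
To make the measurement question concrete, here is a minimal sketch (not taken 
from the linked DirectQueryTest.java) of one way to separate the first-run 
cost, which includes one-time startup work, from the steady-state cost of 
later runs. The class name TimingSketch, its helper methods, and the 
placeholder workload are all hypothetical; in practice the Runnable would wrap 
the Spark query or the JDBC query being compared.

```java
import java.util.Arrays;

public class TimingSketch {
    /** Runs the task once and returns the elapsed wall-clock time in nanoseconds. */
    static long timeOnce(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    /** Times `runs` consecutive executions; element 0 includes any one-time startup cost. */
    static long[] timeRuns(Runnable task, int runs) {
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            elapsed[i] = timeOnce(task);
        }
        return elapsed;
    }

    /** Median of all runs except the first (upper median when that count is even). */
    static long steadyStateMedian(long[] elapsed) {
        if (elapsed.length < 2) {
            throw new IllegalArgumentException("need at least two runs");
        }
        long[] rest = Arrays.copyOfRange(elapsed, 1, elapsed.length);
        Arrays.sort(rest);
        return rest[rest.length / 2];
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption): stands in for the query under test.
        Runnable query = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                System.out.println("unreachable");
            }
        };

        long[] elapsed = timeRuns(query, 10);
        System.out.println("first run (ns):           " + elapsed[0]);
        System.out.println("steady-state median (ns): " + steadyStateMedian(elapsed));
    }
}
```

If the first run is far slower than the steady-state median, most of the 
measured time is fixed overhead rather than the query itself.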


Best Regards,

Shixiong Zhu

2015-07-26 15:56 GMT+08:00 Jerrick Hoang 
<jerrickho...@gmail.com<mailto:jerrickho...@gmail.com>>:
How big is the dataset? How complicated is the query?

On Sun, Jul 26, 2015 at 12:47 AM Louis Hust 
<louis.h...@gmail.com<mailto:louis.h...@gmail.com>> wrote:
Hi, all,

I am using a Spark DataFrame to fetch a small table from MySQL,
and I found it costs much more than accessing MySQL directly using JDBC.

The time cost for Spark is about 2033 ms, versus about 16 ms for direct access.

Code can be found at:

https://github.com/louishust/sparkDemo/blob/master/src/main/java/DirectQueryTest.java

So is my Spark configuration wrong? How can I optimise Spark to achieve 
performance similar to direct access?

Any ideas would be appreciated!




