Re: Spark is much slower than direct access MySQL

2015-07-26 Thread Shixiong Zhu
Oh, I see. That's the total time of executing a query in Spark. Then the difference is reasonable, considering Spark has much more work to do, e.g., launching tasks in executors. Best Regards, Shixiong Zhu 2015-07-26 16:16 GMT+08:00 Louis Hust louis.h...@gmail.com: Look at the given URL:

Re: Spark is much slower than direct access MySQL

2015-07-26 Thread Paolo Platter
From: Louis Hust louis.h...@gmail.com Sent: 26/07/2015 10:28 To: Shixiong Zhu zsxw...@gmail.com Cc: Jerrick Hoang jerrickho...@gmail.com; user@spark.apache.org Subject: Re: Spark is much slower than direct access MySQL

Re: Spark is much slower than direct access MySQL

2015-07-26 Thread Louis Hust
Subject: Re: Spark is much slower than direct access MySQL Thanks for your explanation. 2015-07-26 16:22 GMT+08:00 Shixiong Zhu zsxw...@gmail.com: Oh, I see. That's the total time of executing a query in Spark. Then the difference is reasonable, considering Spark has much more work to do, e.g

Re: Spark is much slower than direct access MySQL

2015-07-26 Thread Jerrick Hoang
How big is the dataset? How complicated is the query? On Sun, Jul 26, 2015 at 12:47 AM Louis Hust louis.h...@gmail.com wrote: Hi, all, I am using a Spark DataFrame to fetch a small table from MySQL, and I found it costs much more than accessing MySQL directly using JDBC. Time cost for Spark is about

Re: Spark is much slower than direct access MySQL

2015-07-26 Thread Louis Hust
Look at the given URL: Code can be found at: https://github.com/louishust/sparkDemo/blob/master/src/main/java/DirectQueryTest.java 2015-07-26 16:14 GMT+08:00 Shixiong Zhu zsxw...@gmail.com: Could you clarify how you measure the Spark time cost? Is it the total time of running the query? If

Re: Spark is much slower than direct access MySQL

2015-07-26 Thread Shixiong Zhu
Could you clarify how you measure the Spark time cost? Is it the total time of running the query? If so, the gap is plausible, because Spark's overhead dominates for small queries. Best Regards, Shixiong Zhu 2015-07-26 15:56 GMT+08:00 Jerrick Hoang jerrickho...@gmail.com: How big is the dataset?
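One way to check whether fixed overhead is what dominates is to time the same operation several times and compare the first (cold) run with the later ones. The following is only a rough sketch under stated assumptions: the class name `QueryTimingSketch`, its helper methods, and the placeholder workload are all hypothetical and are not taken from the linked DirectQueryTest.java. The placeholder would need to be replaced with the actual DataFrame read or JDBC query.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the code linked in this thread.
final class QueryTimingSketch {

    /** Runs the task `runs` times and returns each wall-clock duration in nanoseconds. */
    static long[] timeRuns(Runnable task, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be >= 1");
        }
        long[] durations = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            durations[i] = System.nanoTime() - start;
        }
        return durations;
    }

    /** Median of all runs after the first (cold) one; upper middle for an even count. */
    static long warmMedian(long[] durations) {
        if (durations.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] warm = durations.length > 1
                ? Arrays.copyOfRange(durations, 1, durations.length)
                : durations.clone();
        Arrays.sort(warm);
        return warm[warm.length / 2];
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption): swap in the Spark read or the JDBC query.
        Runnable task = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                throw new IllegalStateException("unreachable");
            }
        };
        long[] durations = timeRuns(task, 5);
        System.out.printf("first run:   %.3f ms%n", durations[0] / 1e6);
        System.out.printf("warm median: %.3f ms%n", warmMedian(durations) / 1e6);
    }
}
```

If the warm median stays far above the direct JDBC figure, the extra time is probably per-query work such as task scheduling rather than one-off startup cost. If only the first run is slow, initialization is the more likely cause.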

Re: Spark is much slower than direct access MySQL

2015-07-26 Thread Louis Hust
Thanks for your explanation. 2015-07-26 16:22 GMT+08:00 Shixiong Zhu zsxw...@gmail.com: Oh, I see. That's the total time of executing a query in Spark. Then the difference is reasonable, considering Spark has much more work to do, e.g., launching tasks in executors. Best Regards, Shixiong Zhu

Spark is much slower than direct access MySQL

2015-07-26 Thread Louis Hust
Hi, all, I am using a Spark DataFrame to fetch a small table from MySQL, and I found it costs much more than accessing MySQL directly using JDBC. The time cost for Spark is about 2033 ms, versus about 16 ms for direct access. Code can be found at: