I assume you’re using the DataFrame API within your application.

sql("SELECT ...").explain(true)
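For instance (a minimal sketch, assuming an existing `sqlContext` and the table names from the original query; this needs a running Spark 1.3 cluster):

```scala
// explain(true) prints all plan phases: parsed, analyzed, optimized, and physical.
val df = sqlContext.sql(
  "SELECT a.name FROM db a JOIN sample b ON a.name = b.name")
df.explain(true)
```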

From: Wang, Daoyuan
Sent: Tuesday, May 5, 2015 10:16 AM
To: luohui20...@sina.com; Cheng, Hao; Olivier Girardot; user
Subject: RE: Re: RE: Re: Re: sparksql running slow while joining 2 tables.

You can use:
EXPLAIN EXTENDED SELECT ...

From: luohui20...@sina.com
Sent: Tuesday, May 05, 2015 9:52 AM
To: Cheng, Hao; Olivier Girardot; user
Subject: Re: RE: Re: Re: sparksql running slow while joining 2 tables.


As far as I know, broadcast join is enabled automatically, controlled by 
spark.sql.autoBroadcastJoinThreshold.

Refer to 
http://spark.apache.org/docs/latest/sql-programming-guide.html#other-configuration-options
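For example, the threshold can be raised so that tables of the sizes mentioned in this thread qualify for broadcast (a sketch against the Spark 1.3 API; 104857600 bytes = 100 MB is an illustrative value, the default is 10 MB):

```scala
// Tables smaller than this threshold are broadcast to every executor
// for joins instead of being shuffled.
sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", "104857600")

// Or equivalently from SQL:
sqlContext.sql("SET spark.sql.autoBroadcastJoinThreshold=104857600")
```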



And how do I check my app's physical plan, and other things like the optimized 
plan, executable plan, etc.?
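For the record, besides explain(true), the individual plans can be inspected through the DataFrame's queryExecution (a sketch against the Spark 1.3 DataFrame API, assuming an existing `sqlContext` and query):

```scala
val df = sqlContext.sql("SELECT ...")  // your query here

// Each phase of query planning is available as a separate field:
val qe = df.queryExecution
println(qe.analyzed)       // analyzed logical plan
println(qe.optimizedPlan)  // optimized logical plan
println(qe.executedPlan)   // physical (executable) plan
```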



thanks

--------------------------------

Thanks & Best regards!
罗辉 San.Luo

----- Original Message -----
From: "Cheng, Hao" <hao.ch...@intel.com>
To: "Cheng, Hao" <hao.ch...@intel.com>, "luohui20...@sina.com" 
<luohui20...@sina.com>, Olivier Girardot <ssab...@gmail.com>, user 
<user@spark.apache.org>
Subject: RE: Re: sparksql running slow while joining 2 tables.
Date: May 5, 2015, 08:38

Or, have you ever tried a broadcast join?

From: Cheng, Hao [mailto:hao.ch...@intel.com]
Sent: Tuesday, May 5, 2015 8:33 AM
To: luohui20...@sina.com; Olivier Girardot; user
Subject: RE: Re: Re: sparksql running slow while joining 2 tables.

Can you print out the physical plan?

EXPLAIN SELECT xxx…
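Applied to the join in question, that might look like the following (a sketch; the table and column names are taken from the original query later in this thread, and EXPLAIN EXTENDED also prints the logical plans):

```scala
// EXPLAIN returns the query plan as rows instead of executing the query.
sqlContext.sql("""
  EXPLAIN EXTENDED
  SELECT a.name, a.startpoint, a.endpoint, a.piece
  FROM db a JOIN sample b ON (a.name = b.name)
  WHERE b.startpoint > a.startpoint + 25
""").collect().foreach(println)
```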

From: luohui20...@sina.com
Sent: Monday, May 4, 2015 9:08 PM
To: Olivier Girardot; user
Subject: Re: Re: sparksql running slow while joining 2 tables.


hi Olivier,

Spark 1.3.1, with Java 1.8.0_45.

I have attached 2 pictures.

It seems like a GC issue. I also tried different parameters like memory 
size of driver & executor, memory fraction, java opts...

But this issue still happens.

--------------------------------

Thanks & Best regards!
罗辉 San.Luo

----- Original Message -----
From: Olivier Girardot <ssab...@gmail.com>
To: luohui20...@sina.com, user <user@spark.apache.org>
Subject: Re: sparksql running slow while joining 2 tables.
Date: May 4, 2015, 20:46

Hi,
What is your Spark version?

Regards,

Olivier.

On Mon, May 4, 2015 at 11:03, <luohui20...@sina.com> wrote:

hi guys,

        When I run a SQL query like "select a.name, a.startpoint, a.endpoint, 
a.piece from db a join sample b on (a.name = b.name) where (b.startpoint > 
a.startpoint + 25);", I found Spark SQL running slowly, taking minutes, which 
may be caused by very long GC and shuffle times.



       Table db is created from a txt file of 56 MB, while table sample is 
26 MB; both are small.

       My Spark cluster is a standalone pseudo-distributed cluster with an 
8g executor and a 4g driver.

       Any advice? Thank you guys.



--------------------------------

Thanks&amp;Best regards!
罗辉 San.Luo

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
