What we have observed so far is Spark picks join order in the same order as tables in from clause is specified. Sometimes reordering benefits the join query. This can be an inbuilt optimization in Spark. But again its not going to be straight forward, where rather than table size, selectivity of Join is important. Probably some kind of heuristic might help.
Regards, Rishitesh Mishra, SnappyData . (http://www.snappydata.io/) https://in.linkedin.com/in/rishiteshmishra On Mon, Mar 21, 2016 at 11:18 PM, gtinside <gtins...@gmail.com> wrote: > More details : > > Execution plan for Original query > select distinct pge.portfolio_code > from table1 pge join table2 p > on p.perm_group = pge.anc_port_group > join table3 uge > on p.user_group=uge.anc_user_group > where uge.user_name = 'user' and p.perm_type = 'TEST' > > == Physical Plan == > TungstenAggregate(key=[portfolio_code#14119], functions=[], > output=[portfolio_code#14119]) > TungstenExchange hashpartitioning(portfolio_code#14119) > TungstenAggregate(key=[portfolio_code#14119], functions=[], > output=[portfolio_code#14119]) > TungstenProject [portfolio_code#14119] > BroadcastHashJoin [user_group#13665], [anc_user_group#13658], > BuildRight > TungstenProject [portfolio_code#14119,user_group#13665] > BroadcastHashJoin [anc_port_group#14117], [perm_group#13667], > BuildRight > ConvertToUnsafe > Scan > > ParquetRelation[snackfs://shared:9042/aladdin_data_beta/table1.parquet][portfolio_code#14119,anc_port_group#14117] > ConvertToUnsafe > Project [user_group#13665,perm_group#13667] > Filter (perm_type#13666 = TEST) > Scan > > ParquetRelation[snackfs://shared:9042/aladdin_data_beta/table2.parquet][user_group#13665,perm_group#13667,perm_type#13666] > ConvertToUnsafe > Project [anc_user_group#13658] > Filter (user_name#13659 = user) > Scan > > ParquetRelation[snackfs://shared:9042/aladdin_data_beta/table3.parquet][anc_user_group#13658,user_name#13659] > > > > Execution plan for optimized query > select distinct pge.portfolio_code > from table1 uge, table2 p, table3 pge > where uge.user_name = 'user' and p.perm_type = 'TEST' > and p.perm_group = pge.anc_port_group > and p.user_group=uge.anc_user_group > > == Physical Plan == > TungstenAggregate(key=[portfolio_code#14119], functions=[], > output=[portfolio_code#14119]) > TungstenExchange hashpartitioning(portfolio_code#14119) > TungstenAggregate(key=[portfolio_code#14119], functions=[], > output=[portfolio_code#14119]) > TungstenProject [portfolio_code#14119] > BroadcastHashJoin [perm_group#13667], [anc_port_group#14117], > BuildRight > TungstenProject [perm_group#13667] > BroadcastHashJoin [anc_user_group#13658], [user_group#13665], > BuildRight > ConvertToUnsafe > Project [anc_user_group#13658] > Filter (user_name#13659 = user) > Scan > > ParquetRelation[snackfs://shared:9042/aladdin_data_beta/table3.parquet][anc_user_group#13658,user_name#13659] > ConvertToUnsafe > Project [perm_group#13667,user_group#13665] > Filter (perm_type#13666 = TEST) > Scan > > ParquetRelation[snackfs://shared:9042/aladdin_data_beta/table2.parquet][perm_group#13667,user_group#13665,perm_type#13666] > ConvertToUnsafe > Scan > > ParquetRelation[snackfs://shared:9042/aladdin_data_beta/table1.parquet][portfolio_code#14119,anc_port_group#14117] > > > > > > > > > > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Optimization-tp26548p26553.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. > > --------------------------------------------------------------------- > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > >