[ 
https://issues.apache.org/jira/browse/SPARK-27714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865272#comment-16865272
 ] 

Xianyin Xin commented on SPARK-27714:
-------------------------------------

[~nkollar], sorry for the late reply. Yes, it's similar to the implementation 
in Postgres. However, it is not a replacement for, or an alternative to, the 
current join reorder logic (DP), but a supplement to it. DP is used when the 
number of joined tables is small (<12 now in Spark), while GA is used when the 
number of joined tables is large, because as the number of joined tables grows, 
DP spends a lot of time finding the best join plan. GA can accelerate the 
search for the best plan.
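
To make the DP/GA split concrete, here is a minimal Python sketch. It is not 
the code in the patch and not Spark's API: the names (plan_cost, dp_order, 
ga_order, reorder_joins, DP_THRESHOLD), the toy left-deep cost model, and the GA 
parameters are all assumptions chosen for illustration.

```python
import random

DP_THRESHOLD = 12  # assumed cutoff, mirroring the "<12" figure above


def plan_cost(order, rows, selectivity):
    """Toy cost of a left-deep join order: sum of intermediate result sizes."""
    size = rows[order[0]]
    cost = 0.0
    for k in range(1, len(order)):
        sel = 1.0
        for joined in order[:k]:
            # Pairs without a join predicate default to a cross product (1.0).
            sel *= selectivity.get(frozenset((joined, order[k])), 1.0)
        size = size * rows[order[k]] * sel
        cost += size
    return cost


def dp_order(tables, rows, selectivity):
    """Exact search: dynamic programming over subsets of tables (bitmasks)."""
    n = len(tables)
    best = {1 << i: (0.0, (tables[i],)) for i in range(n)}
    for mask in range(1, 1 << n):
        if mask in best:
            continue
        candidates = []
        for i in range(n):
            if mask & (1 << i):
                # Extend the best plan for the subset without table i.
                order = best[mask ^ (1 << i)][1] + (tables[i],)
                candidates.append((plan_cost(order, rows, selectivity), order))
        best[mask] = min(candidates)
    return best[(1 << n) - 1][1]


def order_crossover(a, b, rng):
    """Keep a slice of parent a in place, fill the rest in parent b's order."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    middle = list(a[i:j])
    rest = [t for t in b if t not in middle]
    return rest[:i] + middle + rest[i:]


def ga_order(tables, rows, selectivity, generations=200, pop_size=40, seed=0):
    """Heuristic search: a small genetic algorithm over join orders."""
    rng = random.Random(seed)

    def cost(order):
        return plan_cost(order, rows, selectivity)

    population = [tuple(rng.sample(tables, len(tables))) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        survivors = population[: pop_size // 2]  # keep the cheaper half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = order_crossover(a, b, rng)
            if rng.random() < 0.2:  # occasional swap mutation
                x, y = rng.sample(range(len(child)), 2)
                child[x], child[y] = child[y], child[x]
            children.append(tuple(child))
        population = survivors + children
    return min(population, key=cost)


def reorder_joins(tables, rows, selectivity, dp_threshold=DP_THRESHOLD):
    """Use exact DP for small joins and fall back to GA for large ones."""
    if len(tables) < dp_threshold:
        return dp_order(tables, rows, selectivity)
    return ga_order(tables, rows, selectivity)
```

The DP loop visits every subset of tables, so its work roughly doubles with 
each added table, while the GA evaluates a fixed number of candidate orders 
(generations x pop_size) regardless of table count. The GA result is not 
guaranteed to be optimal, which is why it only supplements DP.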

TPC-DS q64 is an example. In our experiment, the execution time of q64 on 
10 TB TPC-DS decreased from 1300+ s to 200+ s on an 18-node cluster.

> Support Join Reorder based on Genetic Algorithm when the # of joined tables > 
> 12
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-27714
>                 URL: https://issues.apache.org/jira/browse/SPARK-27714
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Xianyin Xin
>            Priority: Major
>
> Currently the join reorder logic is based on dynamic programming, which can 
> in theory find the optimal plan, but the search cost grows rapidly as the # 
> of joined tables grows. It would be better to introduce a genetic algorithm 
> (GA) to overcome this problem.


