[ https://issues.apache.org/jira/browse/SPARK-27714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865272#comment-16865272 ]
Xianyin Xin commented on SPARK-27714:
-------------------------------------

[~nkollar], sorry for the late reply. Yes, it is similar to the implementation in Postgres. However, it is not a replacement for, or an alternative to, the current join reorder logic (DP), but a supplement to it. DP is used when the number of joined tables is small (<12 now in Spark), while GA is used when the number of joined tables is large, because as the number of joined tables grows, DP spends a lot of time searching for the best join plan. GA can accelerate this "best plan search" process. TPC-DS q64 is an example: in our experiment, its execution time on 10TB TPC-DS dropped from 1300+s to 200+s on an 18-node cluster.

> Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-27714
>                 URL: https://issues.apache.org/jira/browse/SPARK-27714
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Xianyin Xin
>            Priority: Major
>
> Currently the join reorder logic is based on dynamic programming, which can
> in theory find the optimal plan, but the search cost grows rapidly as the
> # of joined tables grows. It would be better to introduce a genetic
> algorithm (GA) to overcome this problem.
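
For readers unfamiliar with the approach, here is a minimal sketch of a genetic algorithm searching over left-deep join orders. It is not the code proposed in SPARK-27714 and not Spark's or Postgres's implementation. The cost model (sum of estimated intermediate result sizes), the function names `plan_cost`, `order_crossover` and `ga_join_order`, and the parameters `pop_size`, `generations` and the 0.2 mutation rate are all assumptions chosen for illustration.

```python
import random


def plan_cost(order, rows, selectivity):
    """Toy cost of a left-deep join order: sum of intermediate result sizes.

    `rows` maps table -> row count; `selectivity` maps frozenset({t1, t2})
    to a join selectivity (pairs without an entry count as a cross join).
    """
    size = rows[order[0]]
    joined = [order[0]]
    cost = 0.0
    for table in order[1:]:
        sel = 1.0
        for other in joined:
            sel *= selectivity.get(frozenset((other, table)), 1.0)
        size = size * rows[table] * sel
        cost += size
        joined.append(table)
    return cost


def order_crossover(a, b, rng):
    """Keep a random slice of parent `a` in place, fill the rest in `b`'s order."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    middle = a[i:j]
    rest = [t for t in b if t not in middle]
    return rest[:i] + middle + rest[i:]


def ga_join_order(tables, rows, selectivity, pop_size=30, generations=50, seed=0):
    """Search join orders with a simple elitist GA (hypothetical parameters)."""
    rng = random.Random(seed)
    population = [rng.sample(tables, len(tables)) for _ in range(pop_size)]

    def cost(order):
        return plan_cost(order, rows, selectivity)

    for _ in range(generations):
        population.sort(key=cost)
        survivors = population[: pop_size // 2]  # elitism: best half survives
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            child = order_crossover(p1, p2, rng)
            if rng.random() < 0.2:  # mutation: swap two positions
                x, y = rng.sample(range(len(child)), 2)
                child[x], child[y] = child[y], child[x]
            children.append(child)
        population = survivors + children
    return min(population, key=cost)
```

Unlike DP, this evaluates only `pop_size * generations` candidate orders regardless of the number of tables, so the result is a good plan rather than a provably optimal one. That trade-off is why the comment describes GA as a supplement for large joins rather than a replacement for DP.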