[jira] [Commented] (SPARK-27714) Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12
[ https://issues.apache.org/jira/browse/SPARK-27714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696543#comment-17696543 ]

Sujith Chacko commented on SPARK-27714:
---------------------------------------

[~xinxianyin] Is any work in progress on this issue? I saw a PR, [https://github.com/apache/spark/pull/24983]; can we reopen it for review? Thanks.

> Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12
> --------------------------------------------------------------------------------
>
>              Key: SPARK-27714
>              URL: https://issues.apache.org/jira/browse/SPARK-27714
>          Project: Spark
>       Issue Type: Improvement
>       Components: SQL
> Affects Versions: 3.1.0
>         Reporter: Xianyin Xin
>         Assignee: Xianyin Xin
>         Priority: Major
>
> Now the join reorder logic is based on dynamic programming, which can find the
> most optimized plan theoretically, but the searching cost grows rapidly as
> the # of joined tables grows. It would be better to introduce a Genetic
> algorithm (GA) to overcome this problem.

--
This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27714) Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12
[ https://issues.apache.org/jira/browse/SPARK-27714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865272#comment-16865272 ]

Xianyin Xin commented on SPARK-27714:
-------------------------------------

[~nkollar], sorry for the late reply. Yes, it's similar to the implementation in Postgres. However, it is not a replacement for, or an alternative to, the current join reorder logic (DP), but a supplement to DP. DP is used when the number of joined tables is small (<12 now in Spark), while GA is used when the number of joined tables is large, because as the number of joined tables grows, DP spends a lot of time finding the best join plan. GA can accelerate the "best plan search" process. TPC-DS q64 is an example: our experiment shows the execution time decreased from 1300+ s to 200+ s for 10 TB TPC-DS q64 on an 18-node cluster.

> Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12
> --------------------------------------------------------------------------------
>
>              Key: SPARK-27714
>              URL: https://issues.apache.org/jira/browse/SPARK-27714
>          Project: Spark
>       Issue Type: Improvement
>       Components: SQL
> Affects Versions: 3.0.0
>         Reporter: Xianyin Xin
>         Priority: Major
>
> Now the join reorder logic is based on dynamic programming, which can find the
> most optimized plan theoretically, but the searching cost grows rapidly as
> the # of joined tables grows. It would be better to introduce a Genetic
> algorithm (GA) to overcome this problem.
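The DP/GA split described in the comment above can be illustrated with a toy model. This is only a minimal sketch under loud assumptions: it is not the code from the PR and not Spark's optimizer. The cost function (sum of intermediate result sizes of a left-deep plan with one uniform selectivity), the names `plan_cost`, `exhaustive_join_order`, `ga_join_order`, `choose_join_order`, and the toy threshold of 8 are all hypothetical. Brute-force enumeration stands in for DP, and the threshold is set below the 12 mentioned in the issue only so that the exhaustive branch stays fast.

```python
import random
from itertools import permutations

TOY_THRESHOLD = 8  # hypothetical; the issue discusses 12 for Spark


def plan_cost(order, rows, selectivity=0.001):
    """Toy cost of a left-deep plan: sum of intermediate result sizes."""
    size = rows[order[0]]
    total = 0.0
    for t in order[1:]:
        size = size * rows[t] * selectivity
        total += size
    return total


def exhaustive_join_order(rows, selectivity=0.001):
    """Stand-in for the exact (DP) search: try every left-deep order."""
    n = len(rows)
    best = min(permutations(range(n)),
               key=lambda o: plan_cost(o, rows, selectivity))
    return list(best)


def order_crossover(a, b, rng):
    """Order crossover (OX): keep a slice of `a`, fill the rest in `b`'s order."""
    n = len(a)
    i, j = sorted(rng.sample(range(n), 2))
    kept = a[i:j + 1]
    taken = set(kept)
    fill = iter(g for g in b if g not in taken)
    return [a[k] if i <= k <= j else next(fill) for k in range(n)]


def ga_join_order(rows, selectivity=0.001, pop_size=40, generations=60, seed=0):
    """Heuristic search: evolve a population of join orders (permutations)."""
    rng = random.Random(seed)
    n = len(rows)

    def cost(o):
        return plan_cost(o, rows, selectivity)

    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        next_gen = population[:2]  # elitism: carry the two best plans over
        while len(next_gen) < pop_size:
            # tournament selection of two parents
            p1 = min(rng.sample(population, 3), key=cost)
            p2 = min(rng.sample(population, 3), key=cost)
            child = order_crossover(p1, p2, rng)
            if rng.random() < 0.2:  # swap mutation
                x, y = rng.sample(range(n), 2)
                child[x], child[y] = child[y], child[x]
            next_gen.append(child)
        population = next_gen
    return min(population, key=cost)


def choose_join_order(rows, threshold=TOY_THRESHOLD):
    """Exact search for few tables, GA for many, as the comment describes."""
    if len(rows) <= threshold:
        return exhaustive_join_order(rows)
    return ga_join_order(rows)


if __name__ == "__main__":
    small = [500, 10, 2000, 50, 100]
    large = [1000, 20, 300, 5, 70, 900, 40, 15, 600, 250, 80, 3, 120, 450, 33]
    print("exact:", choose_join_order(small))
    print("GA   :", choose_join_order(large))
```

The GA gives up the optimality guarantee in exchange for a bounded search budget (population size times generations), which is the trade-off the comment describes for queries with many joined tables.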
[jira] [Commented] (SPARK-27714) Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12
[ https://issues.apache.org/jira/browse/SPARK-27714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845988#comment-16845988 ]

Nandor Kollar commented on SPARK-27714:
---------------------------------------

Is this something similar to what is implemented [in Postgres|https://www.postgresql.org/docs/current/geqo.html]? I'm curious about the use case where GA could outperform the current join reorder logic in Spark. I tried to find benchmark results for the Postgres implementation, but couldn't find any. Do you know of any success story about applying GA in this field?

> Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12
> --------------------------------------------------------------------------------
>
>              Key: SPARK-27714
>              URL: https://issues.apache.org/jira/browse/SPARK-27714
>          Project: Spark
>       Issue Type: Improvement
>       Components: SQL
> Affects Versions: 2.4.3
>         Reporter: Xianyin Xin
>         Priority: Major
>
> Now the join reorder logic is based on dynamic programming, which can find the
> most optimized plan theoretically, but the searching cost grows rapidly as
> the # of joined tables grows. It would be better to introduce a Genetic
> algorithm (GA) to overcome this problem.
[jira] [Commented] (SPARK-27714) Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12
[ https://issues.apache.org/jira/browse/SPARK-27714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840370#comment-16840370 ]

Xianyin Xin commented on SPARK-27714:
-------------------------------------

[~hyukjin.kwon] Thanks for the reminder.

[~hyukjin.kwon] [~viirya] Thank you for your comments. I'm working on a doc and will post it later.

> Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12
> --------------------------------------------------------------------------------
>
>              Key: SPARK-27714
>              URL: https://issues.apache.org/jira/browse/SPARK-27714
>          Project: Spark
>       Issue Type: Improvement
>       Components: SQL
> Affects Versions: 2.4.3
>         Reporter: Xianyin Xin
>         Priority: Major
>
> Now the join reorder logic is based on dynamic programming, which can find the
> most optimized plan theoretically, but the searching cost grows rapidly as
> the # of joined tables grows. It would be better to introduce a Genetic
> algorithm (GA) to overcome this problem.
[jira] [Commented] (SPARK-27714) Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12
[ https://issues.apache.org/jira/browse/SPARK-27714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840220#comment-16840220 ]

Liang-Chi Hsieh commented on SPARK-27714:
-----------------------------------------

With how many joined tables does the searching cost become unacceptable, so that introducing GA is necessary? If you have tried it, could you share the results here?

> Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12
> --------------------------------------------------------------------------------
>
>              Key: SPARK-27714
>              URL: https://issues.apache.org/jira/browse/SPARK-27714
>          Project: Spark
>       Issue Type: Improvement
>       Components: SQL
> Affects Versions: 2.4.3
>         Reporter: Xianyin Xin
>         Priority: Major
>
> Now the join reorder logic is based on dynamic programming, which can find the
> most optimized plan theoretically, but the searching cost grows rapidly as
> the # of joined tables grows. It would be better to introduce a Genetic
> algorithm (GA) to overcome this problem.
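As rough context for the question above, the size of the join-order search space can be counted directly. The sketch below uses only generic combinatorial formulas, not measurements of Spark's optimizer, and the function names are hypothetical: n! left-deep orders, (2(n-1))!/(n-1)! ordered bushy join trees over n tables, and 2^n - 1 non-empty table subsets, which is the number of table sets a subset-based DP could have to keep a best plan for when every pair of tables may be joined.

```python
from math import factorial


def left_deep_orders(n):
    """Number of left-deep join orders over n tables: n!."""
    return factorial(n)


def bushy_trees(n):
    """Number of ordered bushy join trees over n tables: (2(n-1))! / (n-1)!."""
    return factorial(2 * (n - 1)) // factorial(n - 1)


def dp_subsets(n):
    """Number of non-empty table subsets: 2^n - 1."""
    return 2 ** n - 1


if __name__ == "__main__":
    print(f"{'tables':>6} {'left-deep':>22} {'bushy':>30} {'subsets':>10}")
    for n in (4, 8, 12, 16, 20):
        print(f"{n:>6} {left_deep_orders(n):>22} "
              f"{bushy_trees(n):>30} {dp_subsets(n):>10}")
```

These counts only show how quickly the space grows with the number of tables; where the planning time actually becomes unacceptable depends on the optimizer and the query, which is what the question asks the reporter to share.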
[jira] [Commented] (SPARK-27714) Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12
[ https://issues.apache.org/jira/browse/SPARK-27714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840158#comment-16840158 ]

Hyukjin Kwon commented on SPARK-27714:
--------------------------------------

Can you elaborate on the idea in the JIRA? For instance, example input/output, expected input/output, reasons for "the most optimized plan theoretically", etc. It's difficult to follow what this JIRA targets.

> Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12
> --------------------------------------------------------------------------------
>
>              Key: SPARK-27714
>              URL: https://issues.apache.org/jira/browse/SPARK-27714
>          Project: Spark
>       Issue Type: Improvement
>       Components: SQL
> Affects Versions: 2.4.3
>         Reporter: Xianyin Xin
>         Priority: Major
>
> Now the join reorder logic is based on dynamic programming, which can find the
> most optimized plan theoretically, but the searching cost grows rapidly as
> the # of joined tables grows. It would be better to introduce a Genetic
> algorithm (GA) to overcome this problem.
[jira] [Commented] (SPARK-27714) Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12
[ https://issues.apache.org/jira/browse/SPARK-27714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840154#comment-16840154 ]

Hyukjin Kwon commented on SPARK-27714:
--------------------------------------

Please avoid setting a target version, which is reserved for committers.

> Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12
> --------------------------------------------------------------------------------
>
>              Key: SPARK-27714
>              URL: https://issues.apache.org/jira/browse/SPARK-27714
>          Project: Spark
>       Issue Type: Improvement
>       Components: SQL
> Affects Versions: 2.4.3
>         Reporter: Xianyin Xin
>         Priority: Major
>
> Now the join reorder logic is based on dynamic programming, which can find the
> most optimized plan theoretically, but the searching cost grows rapidly as
> the # of joined tables grows. It would be better to introduce a Genetic
> algorithm (GA) to overcome this problem.