[jira] [Commented] (SPARK-27714) Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12

2023-03-05 Thread Sujith Chacko (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696543#comment-17696543
 ] 

Sujith Chacko commented on SPARK-27714:
---

[~xinxianyin], is any work in progress on this issue? I saw PR
[https://github.com/apache/spark/pull/24983]. Can we reopen it for
review? Thanks.

> Support Join Reorder based on Genetic Algorithm when the # of joined tables > 
> 12
> 
>
> Key: SPARK-27714
> URL: https://issues.apache.org/jira/browse/SPARK-27714
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Xianyin Xin
>Assignee: Xianyin Xin
>Priority: Major
>
> Now the join reorder logic is based on dynamic programming, which can find
> the most optimized plan theoretically, but the search cost grows rapidly as
> the number of joined tables grows. It would be better to introduce a Genetic
> Algorithm (GA) to overcome this problem.
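To make the growth of the DP search cost concrete, here is a minimal Python sketch of an exhaustive dynamic-programming join-order search. It assumes a toy cost model (cost = sum of all intermediate result sizes, with independent pairwise join selectivities). The names `subset_rows` and `dp_join_order` are hypothetical, and this is not Spark's `JoinReorderDP`, whose cost model and pruning differ. The point is the loop count: with n tables it examines 3^n - 2^(n+1) + 1 (left, right) splits, so the work roughly triples with every extra table.

```python
from itertools import combinations
from math import prod


def subset_rows(mask, card, sel):
    """Toy estimate of the rows produced by joining every table in `mask`:
    product of base cardinalities times every pairwise selectivity."""
    tables = [i for i in range(len(card)) if mask >> i & 1]
    rows = prod(card[i] for i in tables)
    for i, j in combinations(tables, 2):
        rows *= sel[i][j]
    return rows


def dp_join_order(card, sel):
    """Exhaustive bushy DP over table subsets.

    Returns (best cost, number of (left, right) splits examined). The cost is
    the sum of all intermediate result sizes (a toy C_out-style model)."""
    n = len(card)
    full = (1 << n) - 1
    best = {1 << i: 0.0 for i in range(n)}  # scanning a base table costs 0 here
    splits = 0
    # Visit subsets by size so both halves of every split are already solved.
    for mask in sorted(range(1, full + 1), key=lambda m: bin(m).count("1")):
        if mask in best:
            continue
        rows = subset_rows(mask, card, sel)
        cost = float("inf")
        left = (mask - 1) & mask  # enumerate every non-empty proper subset
        while left:
            splits += 1
            cost = min(cost, best[left] + best[mask ^ left] + rows)
            left = (left - 1) & mask
        best[mask] = cost
    return best[full], splits


if __name__ == "__main__":
    for n in (4, 8, 12):
        card = [1000.0] * n
        sel = [[0.01] * n for _ in range(n)]
        _, splits = dp_join_order(card, sel)
        print(n, splits)  # prints "4 50", "8 6050", "12 523250"
```

Going from 12 to 18 tables multiplies the number of splits by roughly 3^6 = 729, which is the kind of blow-up the issue is referring to.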



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27714) Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12

2019-06-16 Thread Xianyin Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865272#comment-16865272
 ] 

Xianyin Xin commented on SPARK-27714:
-

[~nkollar], sorry for the late reply. Yes, it's similar to the implementation
in Postgres. However, it is not a replacement for, or an alternative to, the
current join reorder logic (DP), but a supplement to it. DP is used when the
number of joined tables is small (12 or fewer in Spark now), while GA is used
when the number of joined tables is large, because as the number of joined
tables grows, DP spends a lot of time finding the best join plan. GA can
accelerate the search for the "best plan".
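PostgreSQL's GEQO, which this proposal is compared to, encodes a join order as a permutation of the tables and evolves a population of such permutations. Below is a minimal Python sketch of that idea used as a supplement to DP, under stated assumptions: only left-deep orders are searched, a toy cost model is used (sum of intermediate result sizes, independent selectivities), and a simple order-preserving crossover plus swap mutation stand in for GEQO's edge-recombination crossover. `DP_THRESHOLD`, `choose_join_order` and the other names are hypothetical; this is neither the code from the linked PR nor a Spark API.

```python
import random
from itertools import combinations
from math import prod

DP_THRESHOLD = 12  # hypothetical cut-off mirroring "> 12" in the issue title


def left_deep_cost(order, card, sel):
    """Toy cost of a left-deep order: sum of intermediate result sizes."""
    rows, total = card[order[0]], 0.0
    for k in range(1, len(order)):
        t = order[k]
        rows *= card[t]
        for prev in order[:k]:
            rows *= sel[prev][t]  # assumes independent join predicates
        total += rows
    return total


def dp_left_deep(card, sel):
    """Exact search over left-deep orders using DP on subsets (O(2^n * n))."""
    n = len(card)
    best = {1 << i: (0.0, [i]) for i in range(n)}
    for mask in sorted(range(1, 1 << n), key=lambda m: bin(m).count("1")):
        if mask in best:
            continue
        members = [i for i in range(n) if mask >> i & 1]
        rows = prod(card[i] for i in members)
        for i, j in combinations(members, 2):
            rows *= sel[i][j]
        # Try every table as the last one joined into this subset.
        cost, order = min((best[mask ^ (1 << t)][0], best[mask ^ (1 << t)][1] + [t])
                          for t in members)
        best[mask] = (cost + rows, order)
    return best[(1 << n) - 1][1]


def crossover(p1, p2, rng):
    """Order-preserving crossover (an OX variant): copy a slice of p1, then
    fill the remaining slots with the missing tables in p2's order."""
    a, b = sorted(rng.sample(range(len(p1)), 2))
    kept = set(p1[a:b + 1])
    fill = (t for t in p2 if t not in kept)
    return [p1[i] if a <= i <= b else next(fill) for i in range(len(p1))]


def ga_join_order(card, sel, pop_size=40, generations=150, mutation_rate=0.2, seed=0):
    """Genetic search over left-deep join orders (permutation encoding)."""
    rng = random.Random(seed)
    n = len(card)
    cost = lambda order: left_deep_cost(order, card, sel)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        parents = population[: pop_size // 2]  # keep the fitter half (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            child = crossover(p1, p2, rng)
            if rng.random() < mutation_rate:  # swap mutation
                i, j = rng.sample(range(n), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        population = parents + children
    return min(population, key=cost)


def choose_join_order(card, sel):
    """DP for small joins; GA as a supplement beyond the threshold."""
    if len(card) <= DP_THRESHOLD:
        return dp_left_deep(card, sel)
    return ga_join_order(card, sel)
```

The GA's work per generation is polynomial in the number of tables, so its running time stays bounded as the join grows, at the price of no longer guaranteeing the optimal order. That trade-off is why it fits as a supplement to DP rather than a replacement.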

TPC-DS q64 is an example. In our experiment, the execution time of q64 on
10 TB TPC-DS dropped from 1300+ s to 200+ s on an 18-node cluster.

> Support Join Reorder based on Genetic Algorithm when the # of joined tables > 
> 12
> 
>
> Key: SPARK-27714
> URL: https://issues.apache.org/jira/browse/SPARK-27714
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Xianyin Xin
>Priority: Major
>
> Now the join reorder logic is based on dynamic programming, which can find
> the most optimized plan theoretically, but the search cost grows rapidly as
> the number of joined tables grows. It would be better to introduce a Genetic
> Algorithm (GA) to overcome this problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Commented] (SPARK-27714) Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12

2019-05-22 Thread Nandor Kollar (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845988#comment-16845988
 ] 

Nandor Kollar commented on SPARK-27714:
---

Is this similar to what is implemented [in 
Postgres|https://www.postgresql.org/docs/current/geqo.html]? I'm curious about 
the use cases where GA could outperform the current join reorder logic in 
Spark. I tried to find benchmark results for the Postgres implementation but 
couldn't find any. Do you know of any success stories of applying GA in this 
field?

> Support Join Reorder based on Genetic Algorithm when the # of joined tables > 
> 12
> 
>
> Key: SPARK-27714
> URL: https://issues.apache.org/jira/browse/SPARK-27714
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: Xianyin Xin
>Priority: Major
>
> Now the join reorder logic is based on dynamic programming, which can find
> the most optimized plan theoretically, but the search cost grows rapidly as
> the number of joined tables grows. It would be better to introduce a Genetic
> Algorithm (GA) to overcome this problem.






[jira] [Commented] (SPARK-27714) Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12

2019-05-15 Thread Xianyin Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840370#comment-16840370
 ] 

Xianyin Xin commented on SPARK-27714:
-

[~hyukjin.kwon] Thanks for the reminder.

[~hyukjin.kwon] [~viirya] Thank you for the comments. I'm working on a doc and
will post it later.




[jira] [Commented] (SPARK-27714) Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12

2019-05-15 Thread Liang-Chi Hsieh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840220#comment-16840220
 ] 

Liang-Chi Hsieh commented on SPARK-27714:
-

At how many joined tables does the search cost become unacceptable, so that
introducing GA is necessary? If you have tried it, could you share the results here?




[jira] [Commented] (SPARK-27714) Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12

2019-05-15 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840158#comment-16840158
 ] 

Hyukjin Kwon commented on SPARK-27714:
--

Can you elaborate on the idea in the JIRA? For instance: example input/output,
expected input/output, reasons for "the most optimized plan theoretically",
etc. It's difficult to follow what this JIRA targets.




[jira] [Commented] (SPARK-27714) Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12

2019-05-15 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840154#comment-16840154
 ] 

Hyukjin Kwon commented on SPARK-27714:
--

Please avoid setting the target version, which is reserved for committers.
