[jira] [Commented] (SPARK-29048) Query optimizer slow when using Column.isInCollection() with a large size collection

2020-06-05 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127059#comment-17127059
 ] 

Apache Spark commented on SPARK-29048:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/28388

> Query optimizer slow when using Column.isInCollection() with a large size 
> collection
> 
>
> Key: SPARK-29048
> URL: https://issues.apache.org/jira/browse/SPARK-29048
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4, 2.4.5, 2.4.6
>Reporter: Weichen Xu
>Priority: Major
>
> Query optimizer slow when using Column.isInCollection() with a large size 
> collection.
> The query optimizer takes a long time to do its thing and on the UI all I see 
> is "Running commands". This can take from 10s of minutes to 11 hours 
> depending on how many values there are.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29048) Query optimizer slow when using Column.isInCollection() with a large size collection

2020-04-30 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096939#comment-17096939
 ] 

Dongjoon Hyun commented on SPARK-29048:
---

BTW, this was merged and reverted for `3.0.0` only. This JIRA is irrelevant to 
2.4.6 release~ So, we can ignore for 2.4.6 release.

> Query optimizer slow when using Column.isInCollection() with a large size 
> collection
> 
>
> Key: SPARK-29048
> URL: https://issues.apache.org/jira/browse/SPARK-29048
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4, 2.4.5, 2.4.6
>Reporter: Weichen Xu
>Priority: Major
>
> Query optimizer slow when using Column.isInCollection() with a large size 
> collection.
> The query optimizer takes a long time to do its thing and on the UI all I see 
> is "Running commands". This can take from 10s of minutes to 11 hours 
> depending on how many values there are.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29048) Query optimizer slow when using Column.isInCollection() with a large size collection

2020-04-30 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096935#comment-17096935
 ] 

Dongjoon Hyun commented on SPARK-29048:
---

This caused a correctness issue, SPARK-31553, which is linked to this JIRA.

> Query optimizer slow when using Column.isInCollection() with a large size 
> collection
> 
>
> Key: SPARK-29048
> URL: https://issues.apache.org/jira/browse/SPARK-29048
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4, 2.4.5, 2.4.6
>Reporter: Weichen Xu
>Priority: Major
>
> Query optimizer slow when using Column.isInCollection() with a large size 
> collection.
> The query optimizer takes a long time to do its thing and on the UI all I see 
> is "Running commands". This can take from 10s of minutes to 11 hours 
> depending on how many values there are.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29048) Query optimizer slow when using Column.isInCollection() with a large size collection

2020-04-30 Thread Holden Karau (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096705#comment-17096705
 ] 

Holden Karau commented on SPARK-29048:
--

Do we have context for the reason of revert? Should I still be tracking this 
for 2.4.6?

> Query optimizer slow when using Column.isInCollection() with a large size 
> collection
> 
>
> Key: SPARK-29048
> URL: https://issues.apache.org/jira/browse/SPARK-29048
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4, 2.4.5, 2.4.6
>Reporter: Weichen Xu
>Priority: Major
>
> Query optimizer slow when using Column.isInCollection() with a large size 
> collection.
> The query optimizer takes a long time to do its thing and on the UI all I see 
> is "Running commands". This can take from 10s of minutes to 11 hours 
> depending on how many values there are.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29048) Query optimizer slow when using Column.isInCollection() with a large size collection

2020-04-28 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094627#comment-17094627
 ] 

Dongjoon Hyun commented on SPARK-29048:
---

This is reverted via 
https://github.com/apache/spark/commit/b7cabc80e6df523f0377b651fdbdc2a669c11550

> Query optimizer slow when using Column.isInCollection() with a large size 
> collection
> 
>
> Key: SPARK-29048
> URL: https://issues.apache.org/jira/browse/SPARK-29048
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Weichen Xu
>Priority: Major
>
> Query optimizer slow when using Column.isInCollection() with a large size 
> collection.
> The query optimizer takes a long time to do its thing and on the UI all I see 
> is "Running commands". This can take from 10s of minutes to 11 hours 
> depending on how many values there are.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org