[jira] [Commented] (SPARK-29048) Query optimizer slow when using Column.isInCollection() with a large size collection
[ https://issues.apache.org/jira/browse/SPARK-29048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127059#comment-17127059 ]

Apache Spark commented on SPARK-29048:
--------------------------------------

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/28388

> Query optimizer slow when using Column.isInCollection() with a large size
> collection
>
>                 Key: SPARK-29048
>                 URL: https://issues.apache.org/jira/browse/SPARK-29048
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.4, 2.4.5, 2.4.6
>            Reporter: Weichen Xu
>            Priority: Major
>
> Query optimizer slow when using Column.isInCollection() with a large size
> collection.
> The query optimizer takes a long time to do its thing and on the UI all I see
> is "Running commands". This can take from 10s of minutes to 11 hours
> depending on how many values there are.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29048) Query optimizer slow when using Column.isInCollection() with a large size collection
[ https://issues.apache.org/jira/browse/SPARK-29048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096939#comment-17096939 ]

Dongjoon Hyun commented on SPARK-29048:
---------------------------------------

BTW, this was merged and reverted for `3.0.0` only. This JIRA is irrelevant to 2.4.6 release~ So, we can ignore for 2.4.6 release.
[jira] [Commented] (SPARK-29048) Query optimizer slow when using Column.isInCollection() with a large size collection
[ https://issues.apache.org/jira/browse/SPARK-29048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096935#comment-17096935 ]

Dongjoon Hyun commented on SPARK-29048:
---------------------------------------

This caused a correctness issue, SPARK-31553, which is linked to this JIRA.
[jira] [Commented] (SPARK-29048) Query optimizer slow when using Column.isInCollection() with a large size collection
[ https://issues.apache.org/jira/browse/SPARK-29048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096705#comment-17096705 ]

Holden Karau commented on SPARK-29048:
--------------------------------------

Do we have context for the reason of revert? Should I still be tracking this for 2.4.6?
[jira] [Commented] (SPARK-29048) Query optimizer slow when using Column.isInCollection() with a large size collection
[ https://issues.apache.org/jira/browse/SPARK-29048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094627#comment-17094627 ]

Dongjoon Hyun commented on SPARK-29048:
---------------------------------------

This is reverted via https://github.com/apache/spark/commit/b7cabc80e6df523f0377b651fdbdc2a669c11550

> Query optimizer slow when using Column.isInCollection() with a large size
> collection
>
>                 Key: SPARK-29048
>                 URL: https://issues.apache.org/jira/browse/SPARK-29048
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.4
>            Reporter: Weichen Xu
>            Priority: Major
>
> Query optimizer slow when using Column.isInCollection() with a large size
> collection.
> The query optimizer takes a long time to do its thing and on the UI all I see
> is "Running commands". This can take from 10s of minutes to 11 hours
> depending on how many values there are.