[jira] [Comment Edited] (SPARK-19875) Map->filter on many columns gets stuck in constraint inference optimization code

[ https://issues.apache.org/jira/browse/SPARK-19875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238411#comment-17238411 ]

Asif edited comment on SPARK-19875 at 11/25/20, 6:02 PM:

[~maropu], [~sameerag] [~jay.pranavamurthi] I have opened a PR for SPARK-33152, which fixes the OOM and the unreasonable compile times in these queries: [pr-for-spark-33152|https://github.com/apache/spark/pull/30185]. I cannot get anybody to review the code. The logic is explained in the PR; if needed, we can go through the code together. This is going to be used by Workday in production.

> Map->filter on many columns gets stuck in constraint inference optimization code
>
> Key: SPARK-19875
> URL: https://issues.apache.org/jira/browse/SPARK-19875
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.0
> Reporter: Jay Pranavamurthi
> Priority: Major
> Labels: bulk-closed
> Attachments: TestFilter.scala, test10cols.csv, test50cols.csv
>
> The attached code (TestFilter.scala) works with a 10-column csv dataset, but gets stuck with a 50-column csv dataset. Both datasets are attached.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
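While the PR is under review, a possible mitigation sketch: Spark 2.2.0 added a flag to disable constraint propagation outright (SPARK-19846), which skips the inference step that this query gets stuck in. Note this is an assumption about the reader's Spark version — the flag does not exist in the 2.1.0 release this issue was reported against, and disabling it also loses any filters the optimizer would have inferred:

```scala
// Sketch of a workaround (Spark 2.2+ only, not available in 2.1.0):
// turn off constraint propagation so the optimizer skips the
// expensive constraint-inference step for this query.
spark.conf.set("spark.sql.constraintPropagation.enabled", "false")

// ...run the map + filter pipeline, then re-enable if desired:
spark.conf.set("spark.sql.constraintPropagation.enabled", "true")
```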
[ https://issues.apache.org/jira/browse/SPARK-19875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925332#comment-15925332 ]

Takeshi Yamamuro edited comment on SPARK-19875 at 3/15/17 12:32 AM:

If you understand the concrete cause of the bug you described, could you update the description in this JIRA so that we can fix it in the future?
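On the concrete cause: the optimizer's constraint-inference step rewrites each constraint for every equivalent form of the expressions it mentions. The snippet below is an illustration of that growth, not Spark's actual implementation: when every column has an aliased twin (the raw column vs its `upper(...)` projection), a predicate touching all n columns yields on the order of 2^n candidate rewrites, which is tolerable at 10 columns and hopeless at 50.

```scala
// Illustration only (not Spark's code): count the candidate predicates
// produced if a rewrite considers each column's equivalent forms
// independently. One predicate over n columns, each with 2 equivalent
// expressions (e.g. c1 vs upper(c1)), has 2^n rewritten variants.
object ConstraintBlowup {
  def variantCount(numColumns: Int, aliasesPerColumn: Int = 2): BigInt =
    BigInt(aliasesPerColumn).pow(numColumns)

  def main(args: Array[String]): Unit = {
    println(s"10 columns -> ${variantCount(10)} candidates")  // 1024: fine
    println(s"50 columns -> ${variantCount(50)} candidates")  // ~1.1e15: appears hung
  }
}
```

This matches the reported symptom: the same pipeline finishes on the 10-column csv but "gets stuck" (really: is still enumerating constraints) on the 50-column csv.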
[ https://issues.apache.org/jira/browse/SPARK-19875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902668#comment-15902668 ]

Sean Owen edited comment on SPARK-19875 at 3/9/17 8:32 AM:

It's easier to inline the code in a comment:

{code:scala}
package test.spark

import org.apache.spark.SparkConf
import org.apache.spark.sql.functions._
import org.apache.spark.sql.SparkSession

object TestFilter extends App {
  val conf = new SparkConf().setMaster("local[1]").setAppName("tester")
  val session = SparkSession.builder().config(conf).getOrCreate()
  val sc = session.sparkContext
  val sqlContext = session.sqlContext

  val df = sqlContext.read.format("csv").load("test50cols.csv")

  // some map operation on all columns
  val cols = df.columns.map { col => upper(df.col(col)) }
  val df2 = df.select(cols: _*)

  // filter header
  val filter = (0 until df.columns.length)
    .foldLeft(lit(false))((e, index) =>
      e.or(df2.col(df2.columns(index)) =!= s"COLUMN${index + 1}"))
  val df3 = df2.filter(filter)

  // some filter operation
  val df4 = df3.filter(df3.col(df3.columns(0)).isNotNull)
  df4.show(100) // stuck here with a 50-column dataset
}
{code}

What do you mean it gets stuck -- do you have a thread dump?
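To answer the thread-dump question without external tooling, one option is a sketch like the following: dump all JVM thread stacks from a watchdog thread while `df4.show(100)` appears hung, and look for the driver thread spinning inside the Catalyst optimizer. (Running `jstack <pid>` from a shell gives the same information; the helper name here is illustrative, not part of any Spark API.)

```scala
// Sketch: capture an in-process thread dump to see where the driver
// is spending its time while the query seems stuck.
object ThreadDumper {
  def dump(): String = {
    val sb = new StringBuilder
    // getAllStackTraces returns a java.util.Map[Thread, Array[StackTraceElement]]
    Thread.getAllStackTraces.forEach { (thread, frames) =>
      sb.append("\"" + thread.getName + "\" " + thread.getState + "\n")
      frames.foreach(f => sb.append("    at " + f + "\n"))
      sb.append("\n")
    }
    sb.toString
  }

  def main(args: Array[String]): Unit = print(dump())
}
```

If the dump repeatedly shows frames in constraint-related optimizer code, that confirms the query is busy in optimization rather than deadlocked.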