[jira] [Comment Edited] (SPARK-19875) Map->filter on many columns gets stuck in constraint inference optimization code

2020-11-25 Thread Asif (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-19875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238411#comment-17238411
 ] 

Asif edited comment on SPARK-19875 at 11/25/20, 6:02 PM:
-

[~maropu], [~sameerag], [~jay.pranavamurthi] I have opened a PR for 
SPARK-33152 that fixes the OOM and the unreasonably long compile times in 
queries.

The PR is [pr-for-spark-33152|https://github.com/apache/spark/pull/30185]

I have not been able to find anybody to review the code.

The logic is explained in the PR.

If needed, we can go through the code together. This is going to be used by 
Workday in production.


was (Author: ashahid7):
[~maropu], [~sameerag]  [~jay.pranavamurthi] I have generated a PR for 
SPARK-3152 which fixes the OOM or unreasonable compile time in queries.

The PR is [pr-for-spark-33152|https://github.com/apache/spark/pull/30185]

I cannot get any body for code review.

The explanation of the logic used is in the PR.

If needed we can go through the code together. This is going to be used by 
workday in production.

> Map->filter on many columns gets stuck in constraint inference optimization 
> code
>
> Key: SPARK-19875
> URL: https://issues.apache.org/jira/browse/SPARK-19875
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.0
> Reporter: Jay Pranavamurthi
> Priority: Major
> Labels: bulk-closed
> Attachments: TestFilter.scala, test10cols.csv, test50cols.csv
>
>
> The attached code (TestFilter.scala) works with a 10-column csv dataset, but 
> gets stuck with a 50-column csv dataset. Both datasets are attached.
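
The attached CSV files are not reproduced in this thread. As a stand-in, a file of the same shape could be generated with a sketch like the one below. It assumes the header cells are named COLUMN1 through COLUMNn, which is what the header filter in TestFilter.scala compares against. The object name MakeTestCsv, the data values, and the row count are hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}

// Hypothetical helper, not one of the attachments on this issue.
object MakeTestCsv {

  /** One header row (COLUMN1..COLUMNn, an assumption) followed by numRows data rows. */
  def lines(numCols: Int, numRows: Int): Seq[String] = {
    val header = (1 to numCols).map(i => s"COLUMN$i").mkString(",")
    val rows = (1 to numRows).map { r =>
      // Placeholder cell values; the real attachments may contain anything.
      (1 to numCols).map(c => s"r${r}c$c").mkString(",")
    }
    header +: rows
  }

  /** Writes the generated lines to `path` as UTF-8 and returns the path. */
  def write(path: Path, numCols: Int, numRows: Int): Path =
    Files.write(path, lines(numCols, numRows).mkString("\n").getBytes(StandardCharsets.UTF_8))
}
```

For example, MakeTestCsv.write(Paths.get("test50cols.csv"), 50, 100) would produce a 50-column file to load in place of the attachment; changing the column count makes it possible to look for the point where the run time starts to grow.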



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-19875) Map->filter on many columns gets stuck in constraint inference optimization code

2020-11-24 Thread Asif (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-19875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238411#comment-17238411
 ] 

Asif edited comment on SPARK-19875 at 11/24/20, 11:43 PM:
--

[~maropu], [~sameerag], [~jay.pranavamurthi] I have opened a PR for 
SPARK-33152 that fixes the OOM and the unreasonably long compile times in 
queries.

The PR is [pr-for-spark-33152|https://github.com/apache/spark/pull/30185]

I have not been able to find anybody to review the code.

The logic is explained in the PR.

If needed, we can go through the code together. This is going to be used by 
Workday in production.


was (Author: ashahid7):
[~maropu], [~sameerag]  [~jay.pranavamurthi] I have generated a PR for 
SPARK-3152 which fixes the OOM or unreasonable compile time in queries.

The PR is [pr-for-spark-33152|https://github.com/apache/spark/pull/30185]

I cannot get any body for code review.

The explanation of the logic used is in the PR




[jira] [Comment Edited] (SPARK-19875) Map->filter on many columns gets stuck in constraint inference optimization code

2020-11-24 Thread Asif (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-19875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238411#comment-17238411
 ] 

Asif edited comment on SPARK-19875 at 11/24/20, 11:42 PM:
--

[~maropu], [~sameerag], [~jay.pranavamurthi] I have opened a PR for 
SPARK-33152 that fixes the OOM and the unreasonably long compile times in 
queries.

The PR is [pr-for-spark-33152|https://github.com/apache/spark/pull/30185]

I have not been able to find anybody to review the code.

The logic is explained in the PR.


was (Author: ashahid7):
[~maropu] I have generated a PR for SPARK-3152 which fixes the OOM or 
unreasonable compile time in queries.

The PR is [pr-for-spark-33152|https://github.com/apache/spark/pull/30185]

I cannot get any body for code review.

The explanation of the logic used is in the PR




[jira] [Comment Edited] (SPARK-19875) Map->filter on many columns gets stuck in constraint inference optimization code

2017-03-14 Thread Takeshi Yamamuro (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925332#comment-15925332
 ] 

Takeshi Yamamuro edited comment on SPARK-19875 at 3/15/17 12:32 AM:


If you know the concrete cause of the bug you described, could you update 
the description of this JIRA so that we can fix it in the future?


was (Author: maropu):
Hi, Sameer. If you understand a concrete reason about the bug you described, 
could you update the description in this JIRA so that we could fix in future. 
Thanks.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)




[jira] [Comment Edited] (SPARK-19875) Map->filter on many columns gets stuck in constraint inference optimization code

2017-03-09 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902668#comment-15902668
 ] 

Sean Owen edited comment on SPARK-19875 at 3/9/17 8:32 AM:
---

It's easier to inline the code in a comment:

{code}
package test.spark

import org.apache.spark.SparkConf
import org.apache.spark.sql.functions._
import org.apache.spark.sql.SparkSession

object TestFilter extends App {

  val conf = new SparkConf().setMaster("local[1]").setAppName("tester")

  val session = SparkSession.builder().config(conf).getOrCreate()
  val sc = session.sparkContext
  val sqlContext = session.sqlContext

  val df = sqlContext.read.format("csv").load("test50cols.csv")

  // some map operation on all columns
  val cols = df.columns.map { col => upper(df.col(col))  }
  val df2 = df.select(cols: _*)

  // filter header
  val filter = (0 until df.columns.length)
    .foldLeft(lit(false))((e, index) =>
      e.or(df2.col(df2.columns(index)) =!= s"COLUMN${index+1}"))
  val df3 = df2.filter(filter)

  // some filter operation
  val df4 = df3.filter(df3.col(df3.columns(0)).isNotNull)

  df4.show(100)  // stuck here with a 50 column dataset

}
{code}

What do you mean it gets stuck -- do you have a thread dump?
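
One way to answer the thread-dump question from inside the application is sketched below. It relies only on the JDK call Thread.getAllStackTraces; the object name ThreadDump and the helper dumpAfter are hypothetical and not part of Spark. Running the JDK's jstack tool against the driver's process id gives similar information from outside the process.

```scala
import scala.collection.JavaConverters._

// Hypothetical helper for illustration; not part of Spark or of TestFilter.scala.
object ThreadDump {

  /** Renders the current stack of every live JVM thread as plain text. */
  def render(): String = {
    val sb = new StringBuilder
    for ((thread, frames) <- Thread.getAllStackTraces.asScala) {
      sb.append("\"" + thread.getName + "\" state=" + thread.getState + "\n")
      frames.foreach(frame => sb.append("    at " + frame + "\n"))
      sb.append("\n")
    }
    sb.toString
  }

  /** Starts a daemon thread that prints one dump to stderr after delayMs milliseconds. */
  def dumpAfter(delayMs: Long): Thread = {
    val watchdog = new Thread(new Runnable {
      def run(): Unit =
        try {
          Thread.sleep(delayMs)
          System.err.println(render())
        } catch {
          case _: InterruptedException => () // cancelled before the delay elapsed
        }
    }, "thread-dump-watchdog")
    watchdog.setDaemon(true) // must not keep the JVM alive on its own
    watchdog.start()
    watchdog
  }
}
```

Calling ThreadDump.dumpAfter(60000) just before df4.show(100) would print the stacks of all threads after one minute. If the job is still stuck at that point, the dump should show where the main thread is spending its time.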


