[jira] [Commented] (SPARK-21807) The getAliasedConstraints function in LogicalPlan will take a long time when number of expressions is greater than 100

2017-08-31 Thread Andrew Ash (JIRA)

[https://issues.apache.org/jira/browse/SPARK-21807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149600#comment-16149600]

Andrew Ash commented on SPARK-21807:


For reference, here's a stack trace I'm seeing on a cluster without this change, which I think this PR will improve:

{noformat}
"spark-task-4" #714 prio=5 os_prio=0 tid=0x7fa368031000 nid=0x4d91 runnable [0x7fa24e592000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.spark.sql.catalyst.expressions.AttributeReference.equals(namedExpressions.scala:220)
        at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
        at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
        at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
        at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
        at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
        at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
        at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
        at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
        at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
        at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
        at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
        at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
        at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
        at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
        at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
        at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
        at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
        at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
        at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
        at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
        at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
        at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
        at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
        at org.apache.spark.sql.catalyst.expressions.EqualNullSafe.equals(predicates.scala:505)
        at scala.collection.mutable.FlatHashTable$class.addEntry(FlatHashTable.scala:151)
        at scala.collection.mutable.HashSet.addEntry(HashSet.scala:40)
        at scala.collection.mutable.FlatHashTable$class.growTable(FlatHashTable.scala:225)
        at scala.collection.mutable.FlatHashTable$class.addEntry(FlatHashTable.scala:159)
        at scala.collection.mutable.HashSet.addEntry(HashSet.scala:40)
        at scala.collection.mutable.FlatHashTable$class.addElem(FlatHashTable.scala:139)
        at scala.collection.mutable.HashSet.addElem(HashSet.scala:40)
        at scala.collection.mutable.HashSet.$plus$eq(HashSet.scala:59)
        at scala.collection.mutable.HashSet.$plus$eq(HashSet.scala:40)
        at scala.collection.generic.Growable$$anonfun$$plus$plus$eq$1.apply(Growable.scala:59)
        at scala.collection.generic.Growable$$anonfun$$plus$plus$eq$1.apply(Growable.scala:59)
        at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
        at scala.collection.mutable.AbstractSet.$plus$plus$eq(Set.scala:46)
        at scala.collection.mutable.HashSet.clone(HashSet.scala:83)
        at scala.collection.mutable.HashSet.clone(HashSet.scala:40)
        at org.apache.spark.sql.catalyst.expressions.ExpressionSet.$plus(ExpressionSet.scala:65)
        at org.apache.spark.sql.catalyst.expressions.ExpressionSet.$plus(ExpressionSet.scala:50)
        at scala.collection.SetLike$$anonfun$$plus$plus$1.apply(SetLike.scala:141)
        at scala.collection.SetLike$$anonfun$$plus$plus$1.apply(SetLike.scala:141)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:316)
        at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:972)
        at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:972)
        at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:972)
        at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:972)
        at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
        at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104)
        at scala.collection.TraversableOnce$class.$div$colon(TraversableOnce.scala:151)
{noformat}

[jira] [Commented] (SPARK-21807) The getAliasedConstraints function in LogicalPlan will take a long time when number of expressions is greater than 100

2017-08-22 Thread eaton (JIRA)

[https://issues.apache.org/jira/browse/SPARK-21807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137920#comment-16137920]

eaton commented on SPARK-21807:
---

Yes, I got it, thanks. But maybe we can improve this by reducing the cloning
time. Would you like to take a look at
[https://github.com/apache/spark/pull/19022]?
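A minimal sketch of the general idea of reducing clone time, assuming plain `scala.collection.mutable.HashSet` elements instead of Spark's `ExpressionSet`. The names `CloneStrategies`, `addOneByOne` and `addBatched` are hypothetical, and this is not necessarily what the linked pull request does. It contrasts cloning the whole set before every single insertion (the pattern visible in the stack trace, where a fold calls `ExpressionSet.+` and each `+` clones the backing set) with cloning once and inserting everything into that copy:

```scala
import scala.collection.mutable

object CloneStrategies {
  // Clone-per-element, like a fold over `+`: every step copies the whole
  // accumulated set before adding one element. Returns the resulting set and
  // the total number of elements copied by all the clones.
  def addOneByOne[A](start: mutable.HashSet[A], elems: Seq[A]): (mutable.HashSet[A], Long) = {
    var copied = 0L
    val result = elems.foldLeft(start) { (acc, e) =>
      copied += acc.size // the clone below copies every current element
      val next = acc.clone()
      next += e
      next
    }
    (result, copied)
  }

  // Clone once, then add every element to the private copy.
  def addBatched[A](start: mutable.HashSet[A], elems: Seq[A]): (mutable.HashSet[A], Long) = {
    val next = start.clone()
    next ++= elems
    (next, start.size.toLong)
  }

  def main(args: Array[String]): Unit = {
    val elems = 1 to 5000
    val (a, copiesA) = addOneByOne(mutable.HashSet.empty[Int], elems)
    val (b, copiesB) = addBatched(mutable.HashSet.empty[Int], elems)
    println(s"one-by-one: size=${a.size}, elements copied=$copiesA")
    println(s"batched:    size=${b.size}, elements copied=$copiesB")
  }
}
```

Both strategies produce the same set, but the one-by-one version copies 0 + 1 + ... + (k - 1) = k * (k - 1) / 2 elements for k insertions into an initially empty set, while the batched version copies the starting set only once.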

> The getAliasedConstraints function in LogicalPlan will take a long time when
> number of expressions is greater than 100
> 
>
> Key: SPARK-21807
> URL: https://issues.apache.org/jira/browse/SPARK-21807
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: eaton
>
> The getAliasedConstraints function in LogicalPlan.scala clones the
> expression set every time an element is added, which takes a long time.
> Before this change, the cost of getAliasedConstraints was:
> 100 expressions:  41 seconds
> 150 expressions:  466 seconds
> The test looks like this:
> test("getAliasedConstraints") {
>   val expressionNum = 150
>   val aggExpression = (1 to expressionNum).map(i => Alias(Count(Literal(1)), s"cnt$i")())
>   val aggPlan = Aggregate(Nil, aggExpression, LocalRelation())
>   val beginTime = System.currentTimeMillis()
>   val expressions = aggPlan.validConstraints
>   println(s"validConstraints cost: ${System.currentTimeMillis() - beginTime}ms")
>   // The number of aliased constraints is n * (n - 1) / 2 + n
>   assert(expressions.size === expressionNum * (expressionNum - 1) / 2 + expressionNum)
> }
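The test's assertion expects n * (n - 1) / 2 + n constraints for n aliased expressions. A small helper that evaluates that formula for the two timed sizes (the object name `AliasedConstraintCount` is hypothetical and not part of Spark) shows how quickly the constraint set grows, which helps explain why cloning it on every insertion becomes expensive:

```scala
object AliasedConstraintCount {
  // Expected constraint count asserted by the test: n * (n - 1) / 2 + n.
  // Computed in Long to avoid Int overflow for large n.
  def expected(n: Int): Long = n.toLong * (n - 1) / 2 + n

  def main(args: Array[String]): Unit =
    for (n <- Seq(100, 150))
      println(s"$n expressions -> ${expected(n)} constraints")
}
```

For n = 100 this gives 5050 constraints and for n = 150 it gives 11325, so the set more than doubles while the measured time grows more than elevenfold (41 s to 466 s).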



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21807) The getAliasedConstraints function in LogicalPlan will take a long time when number of expressions is greater than 100

2017-08-22 Thread Liang-Chi Hsieh (JIRA)

[https://issues.apache.org/jira/browse/SPARK-21807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137819#comment-16137819]

Liang-Chi Hsieh commented on SPARK-21807:
-

This is a known issue. Currently we provide a SQL conf 
{{spark.sql.constraintPropagation.enabled}} that can disable constraint 
propagation.
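For reference, a minimal Scala sketch of turning that flag off, assuming a Spark 2.2+ `SparkSession` on the classpath (the object name `DisableConstraintPropagation` is hypothetical). This is a configuration fragment rather than a fix: with propagation disabled the optimizer no longer infers constraints, so optimizations that depend on them (for example, filters inferred from constraints) may no longer apply.

```scala
import org.apache.spark.sql.SparkSession

object DisableConstraintPropagation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("constraint-propagation-off")
      .master("local[*]")
      // Set the flag when the session is built...
      .config("spark.sql.constraintPropagation.enabled", "false")
      .getOrCreate()

    // ...or change it on an existing session before the query is planned.
    spark.conf.set("spark.sql.constraintPropagation.enabled", "false")
    println(spark.conf.get("spark.sql.constraintPropagation.enabled"))

    spark.stop()
  }
}
```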
