[jira] [Commented] (SPARK-21807) The getAliasedConstraints function in LogicalPlan will take a long time when number of expressions is greater than 100
[ https://issues.apache.org/jira/browse/SPARK-21807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149600#comment-16149600 ]

Andrew Ash commented on SPARK-21807:

For reference, here's a stacktrace I'm seeing on a cluster before this change that I think this PR will improve:

{noformat}
"spark-task-4" #714 prio=5 os_prio=0 tid=0x7fa368031000 nid=0x4d91 runnable [0x7fa24e592000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.spark.sql.catalyst.expressions.AttributeReference.equals(namedExpressions.scala:220)
	at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
	at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
	at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
	at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
	at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
	at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
	at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
	at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
	at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
	at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
	at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
	at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
	at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
	at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
	at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
	at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
	at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
	at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
	at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
	at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
	at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
	at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
	at org.apache.spark.sql.catalyst.expressions.Add.equals(arithmetic.scala:149)
	at org.apache.spark.sql.catalyst.expressions.EqualNullSafe.equals(predicates.scala:505)
	at scala.collection.mutable.FlatHashTable$class.addEntry(FlatHashTable.scala:151)
	at scala.collection.mutable.HashSet.addEntry(HashSet.scala:40)
	at scala.collection.mutable.FlatHashTable$class.growTable(FlatHashTable.scala:225)
	at scala.collection.mutable.FlatHashTable$class.addEntry(FlatHashTable.scala:159)
	at scala.collection.mutable.HashSet.addEntry(HashSet.scala:40)
	at scala.collection.mutable.FlatHashTable$class.addElem(FlatHashTable.scala:139)
	at scala.collection.mutable.HashSet.addElem(HashSet.scala:40)
	at scala.collection.mutable.HashSet.$plus$eq(HashSet.scala:59)
	at scala.collection.mutable.HashSet.$plus$eq(HashSet.scala:40)
	at scala.collection.generic.Growable$$anonfun$$plus$plus$eq$1.apply(Growable.scala:59)
	at scala.collection.generic.Growable$$anonfun$$plus$plus$eq$1.apply(Growable.scala:59)
	at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
	at scala.collection.mutable.AbstractSet.$plus$plus$eq(Set.scala:46)
	at scala.collection.mutable.HashSet.clone(HashSet.scala:83)
	at scala.collection.mutable.HashSet.clone(HashSet.scala:40)
	at org.apache.spark.sql.catalyst.expressions.ExpressionSet.$plus(ExpressionSet.scala:65)
	at org.apache.spark.sql.catalyst.expressions.ExpressionSet.$plus(ExpressionSet.scala:50)
	at scala.collection.SetLike$$anonfun$$plus$plus$1.apply(SetLike.scala:141)
	at scala.collection.SetLike$$anonfun$$plus$plus$1.apply(SetLike.scala:141)
	at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
	at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
	at scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:316)
	at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:972)
	at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:972)
	at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:972)
	at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:972)
	at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
	at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104)
	at scala.collection.TraversableOnce$class.$div$colon(TraversableOn
[jira] [Commented] (SPARK-21807) The getAliasedConstraints function in LogicalPlan will take a long time when number of expressions is greater than 100
[ https://issues.apache.org/jira/browse/SPARK-21807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137920#comment-16137920 ]

eaton commented on SPARK-21807:
-------------------------------

Yes, I've got it, thanks. But maybe we can improve it by reducing the clone time. Would you like to have a look at [https://github.com/apache/spark/pull/19022]?

> The getAliasedConstraints function in LogicalPlan will take a long time when number of expressions is greater than 100
>
>                 Key: SPARK-21807
>                 URL: https://issues.apache.org/jira/browse/SPARK-21807
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: eaton
>
> The getAliasedConstraints function in LogicalPlan.scala clones the
> expression set each time an element is added, which takes a long time.
> Before the change, the cost of getAliasedConstraints is:
> 100 expressions: 41 seconds
> 150 expressions: 466 seconds
> The test is like this:
> test("getAliasedConstraints") {
>   val expressionNum = 150
>   val aggExpression = (1 to expressionNum).map(i => Alias(Count(Literal(1)), s"cnt$i")())
>   val aggPlan = Aggregate(Nil, aggExpression, LocalRelation())
>   val beginTime = System.currentTimeMillis()
>   val expressions = aggPlan.validConstraints
>   println(s"validConstraints cost: ${System.currentTimeMillis() - beginTime}ms")
>   // The size of Aliased expression is n * (n - 1) / 2 + n
>   assert(expressions.size === expressionNum * (expressionNum - 1) / 2 + expressionNum)
> }

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
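
The clone-per-add cost described in the issue text can be illustrated without Spark. The following is only a rough sketch: `CloningSet`, `CloneCostSketch`, and their methods are hypothetical names invented here, not Spark classes, and the sketch assumes (based on the `ExpressionSet.$plus` and `HashSet.clone` frames in the stack trace earlier in this thread) that each `+` copies the whole backing set. It is not the change proposed in the pull request.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an immutable set wrapper whose `+`
// copies the entire backing set before adding a single element.
final class CloningSet[A](private val underlying: mutable.HashSet[A]) {
  def +(elem: A): CloningSet[A] = {
    val copy = underlying.clone() // re-inserts every existing element: O(current size)
    copy += elem
    new CloningSet(copy)
  }
  def size: Int = underlying.size
}

object CloneCostSketch {
  // Adds elements one at a time through `+`; the total number of
  // re-insertions grows quadratically with n.
  def buildByCloning(n: Int): CloningSet[Int] =
    (1 to n).foldLeft(new CloningSet(mutable.HashSet.empty[Int]))(_ + _)

  // Accumulates into one mutable set and wraps it once at the end,
  // so each element is inserted a single time.
  def buildInBulk(n: Int): CloningSet[Int] = {
    val acc = mutable.HashSet.empty[Int]
    (1 to n).foreach(acc += _)
    new CloningSet(acc)
  }

  // Number of constraints the quoted test expects for n aliased expressions.
  def expectedConstraints(n: Int): Int = n * (n - 1) / 2 + n
}
```

With the formula from the quoted test, 100 expressions give 5050 constraints and 150 give 11325. If every insertion re-inserts all earlier elements, the work scales with the square of those counts, and each re-insertion may also hit the deep `equals` chains visible in the stack trace.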
[jira] [Commented] (SPARK-21807) The getAliasedConstraints function in LogicalPlan will take a long time when number of expressions is greater than 100
[ https://issues.apache.org/jira/browse/SPARK-21807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137819#comment-16137819 ]

Liang-Chi Hsieh commented on SPARK-21807:
-----------------------------------------

This is a known issue. Currently we provide a SQL conf {{spark.sql.constraintPropagation.enabled}} that can disable constraint propagation.
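
For readers who want to try the workaround from the comment above, here is a minimal sketch of turning off the {{spark.sql.constraintPropagation.enabled}} conf from Scala. The application name and the local master are placeholders chosen for illustration, and the snippet assumes a Spark 2.2-era `SparkSession` with the Spark SQL dependency on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object DisableConstraintPropagation {
  def main(args: Array[String]): Unit = {
    // Placeholder app name and master; adjust for a real deployment.
    val spark = SparkSession.builder()
      .appName("constraint-propagation-off")
      .master("local[*]")
      // Conf key taken from the comment above; "false" disables constraint propagation.
      .config("spark.sql.constraintPropagation.enabled", "false")
      .getOrCreate()

    // The same key can also be changed on an existing session at runtime.
    spark.conf.set("spark.sql.constraintPropagation.enabled", "false")

    spark.stop()
  }
}
```

Disabling the conf trades away optimizations that rely on inferred constraints, so it is probably best treated as a workaround for plans with many aliased expressions rather than as a default.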