[ https://issues.apache.org/jira/browse/SPARK-25276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ajith S updated SPARK-25276:
----------------------------
    Description: 
OutOfMemoryError: GC overhead limit exceeded when using alias

When running the SQL attached in test.txt, we get the following exception:

{code}
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.Class.copyConstructors(Class.java:3130)
    at java.lang.Class.getConstructors(Class.java:1651)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$makeCopy$1.apply(TreeNode.scala:387)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$makeCopy$1.apply(TreeNode.scala:385)
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
    at org.apache.spark.sql.catalyst.trees.TreeNode.makeCopy(TreeNode.scala:385)
    at org.apache.spark.sql.catalyst.trees.TreeNode.withNewChildren(TreeNode.scala:244)
    at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized$lzycompute(Expression.scala:190)
    at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized(Expression.scala:188)
    at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$1.apply(Expression.scala:189)
    at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$1.apply(Expression.scala:189)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.immutable.List.map(List.scala:285)
    at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized$lzycompute(Expression.scala:189)
    at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized(Expression.scala:188)
    at org.apache.spark.sql.catalyst.expressions.ExpressionSet.add(ExpressionSet.scala:63)
    at org.apache.spark.sql.catalyst.expressions.ExpressionSet$$anonfun$$plus$plus$1.apply(ExpressionSet.scala:79)
    at org.apache.spark.sql.catalyst.expressions.ExpressionSet$$anonfun$$plus$plus$1.apply(ExpressionSet.scala:79)
    at scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:316)
    at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:972)
    at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:972)
    at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:972)
    at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:972)
    at org.apache.spark.sql.catalyst.expressions.ExpressionSet.$plus$plus(ExpressionSet.scala:79)
    at org.apache.spark.sql.catalyst.expressions.ExpressionSet.$plus$plus(ExpressionSet.scala:55)
    at org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$getAliasedConstraints$1.apply(LogicalPlan.scala:254)
    at org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$getAliasedConstraints$1.apply(LogicalPlan.scala:249)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.sql.catalyst.plans.logical.UnaryNode.getAliasedConstraints(LogicalPlan.scala:249)
{code}

This looks to be caused by redundant constraints. Attaching a test to reproduce the issue.
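The SQL from test.txt is not reproduced in this mail. Purely as a hypothetical illustration (the table, column names, and shape below are made up, not the attached reproducer), a plan that keeps re-aliasing columns on top of a filtered relation exercises the same {{UnaryNode.getAliasedConstraints}} path:

{code:scala}
import org.apache.spark.sql.SparkSession

// Hypothetical sketch only -- the actual reproducer is the attached test.txt.
object AliasConstraintBlowup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SPARK-25276-sketch")
      .getOrCreate()
    import spark.implicits._

    // The filter gives the plan an initial constraint to propagate.
    var df = Seq((1, 2, 3, 4, 5)).toDF("c0", "c1", "c2", "c3", "c4").where($"c0" > 0)

    // Each pass re-aliases every column. getAliasedConstraints rewrites the
    // existing constraints once per alias and, because of the ++= (addAll),
    // also retains the pre-alias copies, so the constraint set keeps growing.
    for (i <- 1 to 20) {
      df = df.select(df.columns.map(c => df(c).as(s"${c}_$i")): _*)
    }

    // Analysis/optimization is where the constraint set blows up.
    df.explain(true)
    spark.stop()
  }
}
{code}

As an aside, setting {{spark.sql.constraintPropagation.enabled=false}} is an existing escape hatch that avoids this code path entirely, at the cost of losing constraint-based optimizations.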
The test fails with the following message:

{color:#ff0000}== FAIL: Constraints do not match ==={color}
{color:#ff0000}Found: isnotnull(z#5),(z#5 > 10),(x#3 > 10),(z#5 <=> x#3),(b#1 <=> y#4),isnotnull(x#3){color}
{color:#ff0000}Expected: (x#3 > 10),isnotnull(x#3),(b#1 <=> y#4),(z#5 <=> x#3){color}
{color:#ff0000}== Result =={color}
{color:#ff0000}Missing: N/A{color}
{color:#ff0000}Found but not expected: isnotnull(z#5),(z#5 > 10){color}

Since z already has an EqualNullSafe comparison with x, carrying isnotnull(z#5) and (z#5 > 10) as separate constraints is redundant. If a query has a lot of aliases, this redundancy can add significant overhead.

So I suggest that at [https://github.com/apache/spark/blob/v2.3.2-rc5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala#L254], instead of accumulating with {{++=}} (addAll), we should simply assign with {{=}}; see the sketch after the issue summary below.

> OutOfMemoryError: GC overhead limit exceeded when using alias
> -------------------------------------------------------------
>
>                 Key: SPARK-25276
>                 URL: https://issues.apache.org/jira/browse/SPARK-25276
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0, 2.3.1
>            Reporter: Ajith S
>            Priority: Major
>         Attachments: test.patch
>
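A minimal sketch of that suggestion, assuming the method body matches the linked v2.3.2-rc5 source of {{UnaryNode.getAliasedConstraints}}; this is an excerpt shown in the context of that class (its imports and {{child}} come from LogicalPlan.scala), not a standalone or tested patch:

{code:scala}
// Context (assumed): UnaryNode.getAliasedConstraints in
// sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
protected def getAliasedConstraints(projectList: Seq[NamedExpression]): Set[Expression] = {
  var allConstraints = child.constraints.asInstanceOf[Set[Expression]]
  projectList.foreach {
    case a @ Alias(e, _) =>
      // Before (line 254): allConstraints ++= allConstraints.map(_ transform { ... })
      // which keeps every pre-alias constraint alongside its aliased copy, so the
      // set grows with each alias. The suggestion is to assign instead of append:
      allConstraints = allConstraints.map(_ transform {
        case expr: Expression if expr.semanticEquals(e) => a.toAttribute
      })
      allConstraints += EqualNullSafe(e, a.toAttribute)
    case _ => // Don't change.
  }
  allConstraints -- child.constraints
}
{code}

The intent is that constraints reachable through the EqualNullSafe link, for example isnotnull(z#5) from isnotnull(x#3) plus (z#5 <=> x#3), need not be materialized as separate entries, so the set stops growing combinatorially with the number of aliases.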