[ https://issues.apache.org/jira/browse/SPARK-25276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ajith S updated SPARK-25276:
----------------------------
    Description: 
OutOfMemoryError: GC overhead limit exceeded when using alias

When running the SQL attached in test.txt, we get the following exception:

{code}
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.Class.copyConstructors(Class.java:3130)
    at java.lang.Class.getConstructors(Class.java:1651)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$makeCopy$1.apply(TreeNode.scala:387)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$makeCopy$1.apply(TreeNode.scala:385)
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
    at org.apache.spark.sql.catalyst.trees.TreeNode.makeCopy(TreeNode.scala:385)
    at org.apache.spark.sql.catalyst.trees.TreeNode.withNewChildren(TreeNode.scala:244)
    at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized$lzycompute(Expression.scala:190)
    at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized(Expression.scala:188)
    at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$1.apply(Expression.scala:189)
    at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$1.apply(Expression.scala:189)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.immutable.List.map(List.scala:285)
    at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized$lzycompute(Expression.scala:189)
    at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized(Expression.scala:188)
    at org.apache.spark.sql.catalyst.expressions.ExpressionSet.add(ExpressionSet.scala:63)
    at org.apache.spark.sql.catalyst.expressions.ExpressionSet$$anonfun$$plus$plus$1.apply(ExpressionSet.scala:79)
    at org.apache.spark.sql.catalyst.expressions.ExpressionSet$$anonfun$$plus$plus$1.apply(ExpressionSet.scala:79)
    at scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:316)
    at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:972)
    at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:972)
    at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:972)
    at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:972)
    at org.apache.spark.sql.catalyst.expressions.ExpressionSet.$plus$plus(ExpressionSet.scala:79)
    at org.apache.spark.sql.catalyst.expressions.ExpressionSet.$plus$plus(ExpressionSet.scala:55)
    at org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$getAliasedConstraints$1.apply(LogicalPlan.scala:254)
    at org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$getAliasedConstraints$1.apply(LogicalPlan.scala:249)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.sql.catalyst.plans.logical.UnaryNode.getAliasedConstraints(LogicalPlan.scala:249)
{code}
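
From the trace, the time is going into canonicalizing expressions in {{ExpressionSet}} while {{UnaryNode.getAliasedConstraints}} rebuilds the constraint set for every alias. As a rough, hypothetical illustration of the kind of query shape that exercises this path (the actual reproducer is the attached test.txt; a SparkSession named {{spark}} is assumed):

{code:scala}
// Hypothetical shape only -- the real reproducer is the attached test.txt.
// Each selectExpr re-aliases a column, so getAliasedConstraints derives
// aliased copies of the constraints already inferred for the child plan.
val df = spark.range(100).toDF("a")
  .where("a > 10")          // seeds constraints: isnotnull(a), (a > 10)
  .selectExpr("a AS x")
  .selectExpr("x AS y")
  .selectExpr("y AS z")
  .selectExpr("z AS w")

// Forces analysis/optimization, during which constraint propagation runs.
df.queryExecution.optimizedPlan
{code}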

This looks like it is due to redundant constraints. Attaching a test to reproduce the issue. The test fails with the following message:

{color:#ff0000}== FAIL: Constraints do not match ==={color}
{color:#ff0000}Found: isnotnull(z#5),(z#5 > 10),(x#3 > 10),(z#5 <=> x#3),(b#1 <=> y#4),isnotnull(x#3){color}
{color:#ff0000}Expected: (x#3 > 10),isnotnull(x#3),(b#1 <=> y#4),(z#5 <=> x#3){color}
{color:#ff0000}== Result =={color}
{color:#ff0000}Missing: N/A{color}
{color:#ff0000}Found but not expected: isnotnull(z#5),(z#5 > 10){color}

Here, I think that since z has an EqualNullSafe comparison with x, having isnotnull(z#5) and (z#5 > 10) is redundant. If a query has a lot of aliases, this can cause significant overhead.
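
The overhead compounds because each alias re-adds a transformed copy of every constraint already in the set, and every added expression gets canonicalized ({{ExpressionSet.add}} in the trace above). A toy simulation of that accumulation pattern, with plain strings standing in for Catalyst expressions:

{code:scala}
// Toy model of the accumulation in getAliasedConstraints: for each alias,
// `++=` keeps the old constraints AND a rewritten copy of every one of them,
// mirroring `allConstraints ++= allConstraints.map(...)` in LogicalPlan.scala.
var constraints = Set("isnotnull(x)", "(x > 10)")
for (alias <- Seq("y", "z", "w")) {
  constraints ++= constraints.map(_.replace("x", alias))
  constraints += s"($alias <=> x)"
  println(s"after aliasing x as $alias: ${constraints.size} constraints")
}
// Prints 5, 9, 14 -- the set keeps growing even though, as above, most of
// the extra entries (isnotnull(z#5), (z#5 > 10), ...) are redundant.
{code}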

So I suggest that at
[https://github.com/apache/spark/blob/v2.3.2-rc5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala#L254],
instead of accumulating with {{++=}} (addAll), we should just assign with {{=}}.
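
For reference, here is the method at the linked line, lightly abridged from v2.3.2-rc5, with the suggestion sketched as a comment (an illustration of the idea, not a reviewed patch):

{code:scala}
// UnaryNode.getAliasedConstraints, abridged from
// sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
protected def getAliasedConstraints(projectList: Seq[NamedExpression]): Set[Expression] = {
  var allConstraints = child.constraints.asInstanceOf[Set[Expression]]
  projectList.foreach {
    case a @ Alias(e, _) =>
      // Line 254: keeps the old constraints plus a transformed copy per alias.
      allConstraints ++= allConstraints.map(_ transform {
        case expr: Expression if expr.semanticEquals(e) => a.toAttribute
      })
      // Suggestion: use `allConstraints = allConstraints.map(...)` instead, so
      // each alias rewrites references in place rather than also retaining the
      // pre-alias copies that make the set blow up.
      allConstraints += EqualNullSafe(e, a.toAttribute)
    case _ => // Don't change.
  }
  allConstraints -- child.constraints
}
{code}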


> OutOfMemoryError: GC overhead limit exceeded when using alias
> -------------------------------------------------------------
>
>                 Key: SPARK-25276
>                 URL: https://issues.apache.org/jira/browse/SPARK-25276
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0, 2.3.1
>            Reporter: Ajith S
>            Priority: Major
>         Attachments: test.patch
>

