[ https://issues.apache.org/jira/browse/SPARK-34989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317132#comment-17317132 ]

Apache Spark commented on SPARK-34989:
--------------------------------------

User 'dbaliafroozeh' has created a pull request for this issue:
https://github.com/apache/spark/pull/32030

> Improve the performance of mapChildren and withNewChildren methods
> ------------------------------------------------------------------
>
>                 Key: SPARK-34989
>                 URL: https://issues.apache.org/jira/browse/SPARK-34989
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Ali Afroozeh
>            Priority: Major
>
> One of the main performance bottlenecks in query compilation is the pair of 
> overly generic tree transformation methods {{mapChildren}} and 
> {{withNewChildren}} (defined in {{TreeNode}}). These methods iterate over a 
> node's children generically and rely on reflection to create new instances. 
> We have observed that, especially for queries with large query plans, a 
> significant share of CPU cycles is wasted in these methods. In this PR we 
> make these methods more efficient by delegating the iteration and 
> instantiation to the concrete node types. Benchmarks show a significant 
> improvement in total query compilation time for queries with large query 
> plans (30-80%), and about 20% on average.
> h4. Problem detail
> The {{mapChildren}} method in {{TreeNode}} is overly generic and costly. To 
> be more specific, this method:
>  * iterates over all the fields of a node using Scala's product iterator. 
> While the iteration itself is not reflection-based (the Scala compiler 
> generates the {{Product}} code), it creates many anonymous functions and 
> visits many nested structures (recursive calls). These anonymous functions 
> (presumably compiled to Java anonymous inner classes) also rank high in 
> object-allocation profiles, so they put unnecessary pressure on the GC.
>  * performs many comparisons: for each element returned by the product 
> iterator, it checks whether the element is a child (i.e., contained in the 
> list of children) and, if so, transforms it. We could avoid this by 
> iterating only over the children, but the current implementation must 
> gather all the fields (while transforming only the children) so that it can 
> instantiate the object via reflection.
>  * creates objects using reflection, by delegating to the {{makeCopy}} 
> method, which is several orders of magnitude slower than invoking the 
> constructor directly.
> h4. Solution
> The proposed solution in this PR is rather straightforward: we rewrite the 
> {{mapChildren}} method using the {{children}} and {{withNewChildren}} 
> methods. The default {{withNewChildren}} method suffers from the same 
> problems as {{mapChildren}} and we need to make it more efficient by 
> specializing it in concrete classes. Similar to how each concrete query plan 
> node already defines its children, it should also define how they can be 
> constructed given a new list of children. In most cases the implementation 
> is a one-liner, thanks to the {{copy}} method of Scala case classes. Note 
> that we cannot abstract over {{copy}}: the compiler generates it for a case 
> class only if no type higher in the hierarchy already defines it. For most 
> concrete nodes, the implementation of {{withNewChildren}} looks like this:
>  
> {code:scala}
> override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan =
>   copy(children = newChildren)
> {code}
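> A self-contained sketch of this pattern, using a hypothetical minimal 
> {{Plan}} hierarchy (the names {{UnionLike}} and {{LeafLike}} are 
> illustrative, not Spark's classes):
> {code:scala}
> // Hypothetical minimal plan hierarchy showing the one-liner pattern.
> trait Plan {
>   def children: Seq[Plan]
>   def withNewChildren(newChildren: Seq[Plan]): Plan
> }
> 
> // A node that stores its children directly as a field: copy is a one-liner.
> case class UnionLike(children: Seq[Plan]) extends Plan {
>   override def withNewChildren(newChildren: Seq[Plan]): Plan =
>     copy(children = newChildren)
> }
> 
> // A leaf has no children, so withNewChildren is the identity.
> case class LeafLike(id: Int) extends Plan {
>   def children: Seq[Plan] = Nil
>   override def withNewChildren(newChildren: Seq[Plan]): Plan = this
> }
> {code}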
> The current {{withNewChildren}} method has two properties that we should 
> preserve:
>  * It returns the same instance if the provided children are the same as its 
> children, i.e., it preserves referential equality.
>  * It copies tags and maintains the origin links when a new copy is created.
> These properties are hard to enforce in the concrete node type 
> implementation. Therefore, we propose a template method 
> {{withNewChildrenInternal}} that should be rewritten by the concrete classes 
> and let the {{withNewChildren}} method take care of referential equality and 
> copying:
> {code:scala}
> override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = {
>   if (childrenTheSame(children, newChildren)) {
>     this
>   } else {
>     CurrentOrigin.withOrigin(origin) {
>       val res = withNewChildrenInternal(newChildren)
>       res.copyTagsFrom(this)
>       res
>     }
>   }
> }
> {code}
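> The template-method idea can be sketched outside Spark as follows 
> (hypothetical {{BaseNode}}, {{PairNode}}, and {{LeafNode}} types; tag and 
> origin handling are omitted for brevity):
> {code:scala}
> // Self-contained sketch of the template-method pattern: the final wrapper
> // enforces referential equality, concrete classes only supply construction.
> abstract class BaseNode {
>   def children: Seq[BaseNode]
>   protected def withNewChildrenInternal(newChildren: Seq[BaseNode]): BaseNode
> 
>   // Final wrapper: returns the same instance when nothing changed.
>   final def withNewChildren(newChildren: Seq[BaseNode]): BaseNode =
>     if (newChildren.length == children.length &&
>         children.zip(newChildren).forall { case (a, b) => a eq b }) this
>     else withNewChildrenInternal(newChildren)
> }
> 
> case class PairNode(l: BaseNode, r: BaseNode) extends BaseNode {
>   def children: Seq[BaseNode] = Seq(l, r)
>   protected def withNewChildrenInternal(c: Seq[BaseNode]): BaseNode =
>     copy(l = c(0), r = c(1))
> }
> 
> case class LeafNode(id: Int) extends BaseNode {
>   def children: Seq[BaseNode] = Nil
>   protected def withNewChildrenInternal(c: Seq[BaseNode]): BaseNode = this
> }
> {code}
> Passing back the existing children returns the very same instance, while any 
> changed child triggers a single {{copy}} call in the concrete class.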
> With the refactoring done in a previous PR 
> ([#31932|https://github.com/apache/spark/pull/31932]) most tree node types 
> fall into one of the categories {{Leaf}}, {{Unary}}, {{Binary}}, or 
> {{Ternary}}. These traits have a more efficient implementation for 
> {{mapChildren}} and define a more specialized version of 
> {{withNewChildrenInternal}} that avoids creating unnecessary lists. For 
> example, the {{mapChildren}} method in {{UnaryLike}} is defined as follows:
> {code:scala}
> override final def mapChildren(f: T => T): T = {
>   val newChild = f(child)
>   if (newChild fastEquals child) {
>     this.asInstanceOf[T]
>   } else {
>     CurrentOrigin.withOrigin(origin) {
>       val res = withNewChildInternal(newChild)
>       res.copyTagsFrom(this.asInstanceOf[T])
>       res
>     }
>   }
> }
> {code}
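> The unary specialization can be sketched in plain Scala as follows 
> (hypothetical {{ExprSketch}} hierarchy; {{fastEquals}}, tags, and origin 
> handling are replaced by a simple reference check):
> {code:scala}
> // Sketch of the unary specialization: the single child is mapped directly,
> // with no intermediate Seq of children allocated.
> trait ExprSketch
> 
> trait UnaryLikeSketch extends ExprSketch {
>   def child: ExprSketch
>   def withNewChildInternal(newChild: ExprSketch): ExprSketch
> 
>   // Map the one child; reuse this node if the child is unchanged.
>   final def mapChild(f: ExprSketch => ExprSketch): ExprSketch = {
>     val newChild = f(child)
>     if (newChild eq child) this else withNewChildInternal(newChild)
>   }
> }
> 
> case class Neg(child: ExprSketch) extends UnaryLikeSketch {
>   def withNewChildInternal(c: ExprSketch): ExprSketch = copy(child = c)
> }
> 
> case class Lit(v: Int) extends ExprSketch
> {code}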
> h4. Results
> With this PR, we have observed significant performance improvements in query 
> compilation time, more specifically in the analysis and optimization phases. 
> The table below shows the TPC-DS queries that had more than 25% speedup in 
> compilation times. The biggest speedups are observed in queries with large 
> query plans.
> ||Query||Speedup||
> |q4|29%|
> |q9|81%|
> |q14a|31%|
> |q14b|28%|
> |q22|33%|
> |q33|29%|
> |q34|25%|
> |q39|27%|
> |q41|27%|
> |q44|26%|
> |q47|28%|
> |q48|76%|
> |q49|46%|
> |q56|26%|
> |q58|43%|
> |q59|46%|
> |q60|50%|
> |q65|59%|
> |q66|46%|
> |q67|52%|
> |q69|31%|
> |q70|30%|
> |q96|26%|
> |q98|32%|
> h4. Binary incompatibility
> Making {{withNewChildren}} abstract in {{TreeNode}} can potentially break 
> binary compatibility for code compiled against older versions of Spark. 
> This is a problem, for example, when users write custom expressions. 
> Nevertheless, making {{withNewChildren}} abstract is the right choice: it 
> forces every expression newly added to Catalyst to implement it efficiently 
> and prevents future regressions.
>  Please note that we have not removed the old implementation entirely; 
> instead, we renamed it to {{legacyWithNewChildren}}. This method eases the 
> transition and will be removed in the future. Some expressions, such as 
> {{UpdateFields}}, define their children in a complex way, and writing 
> {{withNewChildren}} for them requires refactoring the expression. For now, 
> these expressions use the old, slow method; a future PR will address them.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
