[ https://issues.apache.org/jira/browse/SPARK-34989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317132#comment-17317132 ]
Apache Spark commented on SPARK-34989: -------------------------------------- User 'dbaliafroozeh' has created a pull request for this issue: https://github.com/apache/spark/pull/32030 > Improve the performance of mapChildren and withNewChildren methods > ------------------------------------------------------------------ > > Key: SPARK-34989 > URL: https://issues.apache.org/jira/browse/SPARK-34989 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.2.0 > Reporter: Ali Afroozeh > Priority: Major > > One of the main performance bottlenecks in query compilation is > overly-generic tree transformation methods, namely {{mapChildren}} and > {{withNewChildren}} (defined in {{TreeNode}}). These methods have an > overly-generic implementation to iterate over the children and rely on > reflection to create new instances. We have observed that, especially for > queries with large query plans, a significant amount of CPU cycles are wasted > in these methods. In this PR we make these methods more efficient, by > delegating the iteration and instantiation to concrete node types. The > benchmarks show that we can expect significant performance improvement in > total query compilation time in queries with large query plans (from 30-80%) > and about 20% on average. > h4. Problem detail > The {{mapChildren}} method in {{TreeNode}} is overly generic and costly. To > be more specific, this method: > * iterates over all the fields of a node using Scala’s product iterator. > While the iteration is not reflection-based, thanks to the Scala compiler > generating code for {{Product}}, we create many anonymous functions and visit > many nested structures (recursive calls). > The anonymous functions (presumably compiled to Java anonymous inner > classes) also show up quite high on the list in the object allocation > profiles, so we are putting unnecessary pressure on GC here. > * does a lot of comparisons. Basically for each element returned from the > product iterator, we check if it is a child (contained in the list of > children) and then transform it. We can avoid that by just iterating over > children, but in the current implementation, we need to gather all the fields > (only transform the children) so that we can instantiate the object using the > reflection. > * creates objects using reflection, by delegating to the {{makeCopy}} > method, which is several orders of magnitude slower than using the > constructor. > h4. Solution > The proposed solution in this PR is rather straightforward: we rewrite the > {{mapChildren}} method using the {{children}} and {{withNewChildren}} > methods. The default {{withNewChildren}} method suffers from the same > problems as {{mapChildren}} and we need to make it more efficient by > specializing it in concrete classes. Similar to how each concrete query plan > node already defines its children, it should also define how they can be > constructed given a new list of children. Actually, the implementation is > quite simple in most cases and is a one-liner thanks to the copy method > present in Scala case classes. Note that we cannot abstract over the copy > method, it’s generated by the compiler for case classes if no other type > higher in the hierarchy defines it. For most concrete nodes, the > implementation of {{withNewChildren}} looks like this: > > {{override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = > copy(children = newChildren)}} > The current {{withNewChildren}} method has two properties that we should > preserve: > * It returns the same instance if the provided children are the same as its > children, i.e., it preserves referential equality. > * It copies tags and maintains the origin links when a new copy is created. > These properties are hard to enforce in the concrete node type > implementation. Therefore, we propose a template method > {{withNewChildrenInternal}} that should be rewritten by the concrete classes > and let the {{withNewChildren}} method take care of referential equality and > copying: > {{override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = > {}} > {{ if (childrenTheSame(children, newChildren)) {}} > {{ this}} > {{ } else {}} > {{ CurrentOrigin.withOrigin(origin) {}} > {{ val res = withNewChildrenInternal(newChildren)}} > {{ res.copyTagsFrom(this)}} > {{ res}} > {{ } }} > {{ } }} > {{ } }} > With the refactoring done in a previous PR > ([#31932|https://github.com/apache/spark/pull/31932]) most tree node types > fall in one of the categories of {{Leaf}}, {{Unary}}, {{Binary}} or > {{Ternary}}. These traits have a more efficient implementation for > {{mapChildren}} and define a more specialized version of > {{withNewChildrenInternal}} that avoids creating unnecessary lists. For > example, the {{mapChildren}} method in {{UnaryLike}} is defined as follows: > {{override final def mapChildren(f: T => T): T = {}} > {{ val newChild = f(child)}} > {{ if (newChild fastEquals child) {}} > {{ this.asInstanceOf[T]}} > {{ } else {}} > {{ CurrentOrigin.withOrigin(origin) {}} > {{ val res = withNewChildInternal(newChild)}} > {{ res.copyTagsFrom(this.asInstanceOf[T])}} > {{ res}} > {{ }}} > {{ }}} > {{ }}} > h4. Results > With this PR, we have observed significant performance improvements in query > compilation time, more specifically in the analysis and optimization phases. > The table below shows the TPC-DS queries that had more than 25% speedup in > compilation times. Biggest speedups are observed in queries with large query > plans. > ||Query||Speedup|| > |q4|29%| > |q9|81%| > |q14a|31%| > |q14b|28%| > |q22|33%| > |q33|29%| > |q34|25%| > |q39|27%| > |q41|27%| > |q44|26%| > |q47|28%| > |q48|76%| > |q49|46%| > |q56|26%| > |q58|43%| > |q59|46%| > |q60|50%| > |q65|59%| > |q66|46%| > |q67|52%| > |q69|31%| > |q70|30%| > |q96|26%| > |q98|32%| > h4. Binary incompatibility > Making the {{withNewChildren}} abstract in {{TreeNode}} can potentially break > the binary compatibility of the code compiled against older versions of > Spark. This is a problem, for example, when users write custom expressions. > Making \{{withNewChildren}}abstract is the right choice, since it forces all > newly added expressions to Catalyst implement it in an efficient manner and > will prevent future regression. > Please note that we have not completely removed the old implementation and > renamed it to {{legacyWithNewChildren}}. This method will be removed in the > future and for now helps the transition. There are expressions such as > {{UpdateFields}} that have a complex way of defining children. Writing > {{withNewChildren}} for them requires refactoring the expression. For now, > these expressions use the old, slow method. In a future PR we address these > expressions. > -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org