[ 
https://issues.apache.org/jira/browse/SPARK-34989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ali Afroozeh updated SPARK-34989:
---------------------------------
    Description: 
One of the main performance bottlenecks in query compilation is overly-generic 
tree transformation methods, namely {{mapChildren}} and {{withNewChildren}} 
(defined in {{TreeNode}}). These methods have an overly-generic implementation 
to iterate over the children and rely on reflection to create new instances. We 
have observed that, especially for queries with large query plans, a 
significant amount of CPU cycles are wasted in these methods. In this PR we 
make these methods more efficient, by delegating the iteration and 
instantiation to concrete node types. The benchmarks show that we can expect 
significant performance improvement in total query compilation time in queries 
with large query plans (from 30-80%) and about 20% on average.
h4. Problem detail

The {{mapChildren}} method in {{TreeNode}} is overly generic and costly. To be 
more specific, this method:
 * iterates over all the fields of a node using Scala’s product iterator. While 
the iteration is not reflection-based, thanks to the Scala compiler generating 
code for {{Product}}, we create many anonymous functions and visit many nested 
structures (recursive calls).
 The anonymous functions (presumably compiled to Java anonymous inner classes) 
also show up quite high on the list in the object allocation profiles, so we 
are putting unnecessary pressure on GC here.
 * does a lot of comparisons. Basically for each element returned from the 
product iterator, we check if it is a child (contained in the list of children) 
and then transform it. We can avoid that by just iterating over children, but 
in the current implementation, we need to gather all the fields (only transform 
the children) so that we can instantiate the object using the reflection.
 * creates objects using reflection, by delegating to the {{makeCopy}} method, 
which is several orders of magnitude slower than using the constructor.

h4. Solution

The proposed solution in this PR is rather straightforward: we rewrite the 
{{mapChildren}} method using the {{children}} and {{withNewChildren}} methods. 
The default {{withNewChildren}} method suffers from the same problems as 
{{mapChildren}} and we need to make it more efficient by specializing it in 
concrete classes. Similar to how each concrete query plan node already defines 
its children, it should also define how they can be constructed given a new 
list of children. Actually, the implementation is quite simple in most cases 
and is a one-liner thanks to the copy method present in Scala case classes. 
Note that we cannot abstract over the copy method, it’s generated by the 
compiler for case classes if no other type higher in the hierarchy defines it. 
For most concrete nodes, the implementation of {{withNewChildren}} looks like 
this:

 

{{override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = 
copy(children = newChildren)}}

The current {{withNewChildren}} method has two properties that we should 
preserve:
 * It returns the same instance if the provided children are the same as its 
children, i.e., it preserves referential equality.
 * It copies tags and maintains the origin links when a new copy is created.

These properties are hard to enforce in the concrete node type implementation. 
Therefore, we propose a template method {{withNewChildrenInternal}} that should 
be rewritten by the concrete classes and let the {{withNewChildren}} method 
take care of referential equality and copying:

{{override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = {}}
 {{  if (childrenTheSame(children, newChildren)) {}}
 {{    this}}
 {{  } else {}}
 {{    CurrentOrigin.withOrigin(origin) {}}
 {{      val res = withNewChildrenInternal(newChildren)}}
 {{      res.copyTagsFrom(this)}}
{{      res}}
      }
   }
}

With the refactoring done in a previous PR 
([#31932|https://github.com/apache/spark/pull/31932]) most tree node types fall 
in one of the categories of {{Leaf}}, {{Unary}}, {{Binary}} or {{Ternary}}. 
These traits have a more efficient implementation for {{mapChildren}} and 
define a more specialized version of {{withNewChildrenInternal}} that avoids 
creating unnecessary lists. For example, the {{mapChildren}} method in 
{{UnaryLike}} is defined as follows:

{{override final def mapChildren(f: T => T): T = {}}
 {{  val newChild = f(child)}}
 {{  if (newChild fastEquals child) {}}
 {{    this.asInstanceOf[T]}}
 {{  } else {}}
 {{    CurrentOrigin.withOrigin(origin) {}}
 {{      val res = withNewChildInternal(newChild)}}
 {{      res.copyTagsFrom(this.asInstanceOf[T])}}
 {{      res}}
 \{{    }}}
 \{{  }}}
 \{{ }}}
h4. Results

With this PR, we have observed significant performance improvements in query 
compilation time, more specifically in the analysis and optimization phases. 
The table below shows the TPC-DS queries that had more than 25% speedup in 
compilation times. Biggest speedups are observed in queries with large query 
plans.
||Query||Speedup||
|q4|29%|
|q9|81%|
|q14a|31%|
|q14b|28%|
|q22|33%|
|q33|29%|
|q34|25%|
|q39|27%|
|q41|27%|
|q44|26%|
|q47|28%|
|q48|76%|
|q49|46%|
|q56|26%|
|q58|43%|
|q59|46%|
|q60|50%|
|q65|59%|
|q66|46%|
|q67|52%|
|q69|31%|
|q70|30%|
|q96|26%|
|q98|32%|
h4. Binary incompatibility

Making the {{withNewChildren}} abstract in {{TreeNode}} can potentially break 
the binary compatibility of the code compiled against older versions of Spark. 
This is a problem, for example, when users write custom expressions. Making 
\{{withNewChildren}}abstract is the right choice, since it forces all newly 
added expressions to Catalyst implement it in an efficient manner and will 
prevent future regression.
 Please note that we have not completely removed the old implementation and 
renamed it to {{legacyWithNewChildren}}. This method will be removed in the 
future and for now helps the transition. There are expressions such as 
{{UpdateFields}} that have a complex way of defining children. Writing 
{{withNewChildren}} for them requires refactoring the expression. For now, 
these expressions use the old, slow method. In a future PR we address these 
expressions.

 

  was:
One of the main performance bottlenecks in query compilation is overly-generic 
tree transformation methods, namely {{mapChildren}} and {{withNewChildren}} 
(defined in {{TreeNode}}). These methods have an overly-generic implementation 
to iterate over the children and rely on reflection to create new instances. We 
have observed that, especially for queries with large query plans, a 
significant amount of CPU cycles are wasted in these methods. In this PR we 
make these methods more efficient, by delegating the iteration and 
instantiation to concrete node types. The benchmarks show that we can expect 
significant performance improvement in total query compilation time in queries 
with large query plans (from 30-80%) and about 20% on average.
h4. Problem detail

The {{mapChildren}} method in {{TreeNode}} is overly generic and costly. To be 
more specific, this method:
 * iterates over all the fields of a node using Scala’s product iterator. While 
the iteration is not reflection-based, thanks to the Scala compiler generating 
code for {{Product}}, we create many anonymous functions and visit many nested 
structures (recursive calls).
 The anonymous functions (presumably compiled to Java anonymous inner classes) 
also show up quite high on the list in the object allocation profiles, so we 
are putting unnecessary pressure on GC here.
 * does a lot of comparisons. Basically for each element returned from the 
product iterator, we check if it is a child (contained in the list of children) 
and then transform it. We can avoid that by just iterating over children, but 
in the current implementation, we need to gather all the fields (only transform 
the children) so that we can instantiate the object using the reflection.
 * creates objects using reflection, by delegating to the {{makeCopy}} method, 
which is several orders of magnitude slower than using the constructor.

h4. Solution

The proposed solution in this PR is rather straightforward: we rewrite the 
{{mapChildren}} method using the {{children}} and {{withNewChildren}} methods. 
The default {{withNewChildren}} method suffers from the same problems as 
{{mapChildren}} and we need to make it more efficient by specializing it in 
concrete classes. Similar to how each concrete query plan node already defines 
its children, it should also define how they can be constructed given a new 
list of children. Actually, the implementation is quite simple in most cases 
and is a one-liner thanks to the copy method present in Scala case classes. 
Note that we cannot abstract over the copy method, it’s generated by the 
compiler for case classes if no other type higher in the hierarchy defines it. 
For most concrete nodes, the implementation of {{withNewChildren}} looks like 
this:

 

{{override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = 
copy(children = newChildren)}}

The current {{withNewChildren}} method has two properties that we should 
preserve:
 * It returns the same instance if the provided children are the same as its 
children, i.e., it preserves referential equality.
 * It copies tags and maintains the origin links when a new copy is created.

These properties are hard to enforce in the concrete node type implementation. 
Therefore, we propose a template method {{withNewChildrenInternal}} that should 
be rewritten by the concrete classes and let the {{withNewChildren}} method 
take care of referential equality and copying:

{{override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = {}}
 {{  if (childrenTheSame(children, newChildren)) {}}
 {{    this}}
 {{  } else {}}
 {{    CurrentOrigin.withOrigin(origin) {}}
 {{      val res = withNewChildrenInternal(newChildren)}}
 {{      res.copyTagsFrom(this)}}
{{      res}}

      }

   }

}

With the refactoring done in a previous PR 
([#31932|https://github.com/apache/spark/pull/31932]) most tree node types fall 
in one of the categories of {{Leaf}}, {{Unary}}, {{Binary}} or {{Ternary}}. 
These traits have a more efficient implementation for {{mapChildren}} and 
define a more specialized version of {{withNewChildrenInternal}} that avoids 
creating unnecessary lists. For example, the {{mapChildren}} method in 
{{UnaryLike}} is defined as follows:

{{override final def mapChildren(f: T => T): T = {}}
 {{  val newChild = f(child)}}
 {{  if (newChild fastEquals child) {}}
 {{    this.asInstanceOf[T]}}
 {{  } else {}}
 {{    CurrentOrigin.withOrigin(origin) {}}
 {{      val res = withNewChildInternal(newChild)}}
 {{      res.copyTagsFrom(this.asInstanceOf[T])}}
 {{      res}}
 \{{    }}}
 \{{  }}}
 \{{ }}}
h4. Results

With this PR, we have observed significant performance improvements in query 
compilation time, more specifically in the analysis and optimization phases. 
The table below shows the TPC-DS queries that had more than 25% speedup in 
compilation times. Biggest speedups are observed in queries with large query 
plans.
||Query||Speedup||
|q4|29%|
|q9|81%|
|q14a|31%|
|q14b|28%|
|q22|33%|
|q33|29%|
|q34|25%|
|q39|27%|
|q41|27%|
|q44|26%|
|q47|28%|
|q48|76%|
|q49|46%|
|q56|26%|
|q58|43%|
|q59|46%|
|q60|50%|
|q65|59%|
|q66|46%|
|q67|52%|
|q69|31%|
|q70|30%|
|q96|26%|
|q98|32%|
h4. Binary incompatibility

Making the {{withNewChildren}} abstract in {{TreeNode}} can potentially break 
the binary compatibility of the code compiled against older versions of Spark. 
This is a problem, for example, when users write custom expressions. Making 
\{{withNewChildren}}abstract is the right choice, since it forces all newly 
added expressions to Catalyst implement it in an efficient manner and will 
prevent future regression.
 Please note that we have not completely removed the old implementation and 
renamed it to {{legacyWithNewChildren}}. This method will be removed in the 
future and for now helps the transition. There are expressions such as 
{{UpdateFields}} that have a complex way of defining children. Writing 
{{withNewChildren}} for them requires refactoring the expression. For now, 
these expressions use the old, slow method. In a future PR we address these 
expressions.

 


> Improve the performance of mapChildren and withNewChildren methods
> ------------------------------------------------------------------
>
>                 Key: SPARK-34989
>                 URL: https://issues.apache.org/jira/browse/SPARK-34989
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Ali Afroozeh
>            Priority: Major
>
> One of the main performance bottlenecks in query compilation is 
> overly-generic tree transformation methods, namely {{mapChildren}} and 
> {{withNewChildren}} (defined in {{TreeNode}}). These methods have an 
> overly-generic implementation to iterate over the children and rely on 
> reflection to create new instances. We have observed that, especially for 
> queries with large query plans, a significant amount of CPU cycles are wasted 
> in these methods. In this PR we make these methods more efficient, by 
> delegating the iteration and instantiation to concrete node types. The 
> benchmarks show that we can expect significant performance improvement in 
> total query compilation time in queries with large query plans (from 30-80%) 
> and about 20% on average.
> h4. Problem detail
> The {{mapChildren}} method in {{TreeNode}} is overly generic and costly. To 
> be more specific, this method:
>  * iterates over all the fields of a node using Scala’s product iterator. 
> While the iteration is not reflection-based, thanks to the Scala compiler 
> generating code for {{Product}}, we create many anonymous functions and visit 
> many nested structures (recursive calls).
>  The anonymous functions (presumably compiled to Java anonymous inner 
> classes) also show up quite high on the list in the object allocation 
> profiles, so we are putting unnecessary pressure on GC here.
>  * does a lot of comparisons. Basically for each element returned from the 
> product iterator, we check if it is a child (contained in the list of 
> children) and then transform it. We can avoid that by just iterating over 
> children, but in the current implementation, we need to gather all the fields 
> (only transform the children) so that we can instantiate the object using the 
> reflection.
>  * creates objects using reflection, by delegating to the {{makeCopy}} 
> method, which is several orders of magnitude slower than using the 
> constructor.
> h4. Solution
> The proposed solution in this PR is rather straightforward: we rewrite the 
> {{mapChildren}} method using the {{children}} and {{withNewChildren}} 
> methods. The default {{withNewChildren}} method suffers from the same 
> problems as {{mapChildren}} and we need to make it more efficient by 
> specializing it in concrete classes. Similar to how each concrete query plan 
> node already defines its children, it should also define how they can be 
> constructed given a new list of children. Actually, the implementation is 
> quite simple in most cases and is a one-liner thanks to the copy method 
> present in Scala case classes. Note that we cannot abstract over the copy 
> method, it’s generated by the compiler for case classes if no other type 
> higher in the hierarchy defines it. For most concrete nodes, the 
> implementation of {{withNewChildren}} looks like this:
>  
> {{override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = 
> copy(children = newChildren)}}
> The current {{withNewChildren}} method has two properties that we should 
> preserve:
>  * It returns the same instance if the provided children are the same as its 
> children, i.e., it preserves referential equality.
>  * It copies tags and maintains the origin links when a new copy is created.
> These properties are hard to enforce in the concrete node type 
> implementation. Therefore, we propose a template method 
> {{withNewChildrenInternal}} that should be rewritten by the concrete classes 
> and let the {{withNewChildren}} method take care of referential equality and 
> copying:
> {{override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = 
> {}}
>  {{  if (childrenTheSame(children, newChildren)) {}}
>  {{    this}}
>  {{  } else {}}
>  {{    CurrentOrigin.withOrigin(origin) {}}
>  {{      val res = withNewChildrenInternal(newChildren)}}
>  {{      res.copyTagsFrom(this)}}
> {{      res}}
>       }
>    }
> }
> With the refactoring done in a previous PR 
> ([#31932|https://github.com/apache/spark/pull/31932]) most tree node types 
> fall in one of the categories of {{Leaf}}, {{Unary}}, {{Binary}} or 
> {{Ternary}}. These traits have a more efficient implementation for 
> {{mapChildren}} and define a more specialized version of 
> {{withNewChildrenInternal}} that avoids creating unnecessary lists. For 
> example, the {{mapChildren}} method in {{UnaryLike}} is defined as follows:
> {{override final def mapChildren(f: T => T): T = {}}
>  {{  val newChild = f(child)}}
>  {{  if (newChild fastEquals child) {}}
>  {{    this.asInstanceOf[T]}}
>  {{  } else {}}
>  {{    CurrentOrigin.withOrigin(origin) {}}
>  {{      val res = withNewChildInternal(newChild)}}
>  {{      res.copyTagsFrom(this.asInstanceOf[T])}}
>  {{      res}}
>  \{{    }}}
>  \{{  }}}
>  \{{ }}}
> h4. Results
> With this PR, we have observed significant performance improvements in query 
> compilation time, more specifically in the analysis and optimization phases. 
> The table below shows the TPC-DS queries that had more than 25% speedup in 
> compilation times. Biggest speedups are observed in queries with large query 
> plans.
> ||Query||Speedup||
> |q4|29%|
> |q9|81%|
> |q14a|31%|
> |q14b|28%|
> |q22|33%|
> |q33|29%|
> |q34|25%|
> |q39|27%|
> |q41|27%|
> |q44|26%|
> |q47|28%|
> |q48|76%|
> |q49|46%|
> |q56|26%|
> |q58|43%|
> |q59|46%|
> |q60|50%|
> |q65|59%|
> |q66|46%|
> |q67|52%|
> |q69|31%|
> |q70|30%|
> |q96|26%|
> |q98|32%|
> h4. Binary incompatibility
> Making the {{withNewChildren}} abstract in {{TreeNode}} can potentially break 
> the binary compatibility of the code compiled against older versions of 
> Spark. This is a problem, for example, when users write custom expressions. 
> Making \{{withNewChildren}}abstract is the right choice, since it forces all 
> newly added expressions to Catalyst implement it in an efficient manner and 
> will prevent future regression.
>  Please note that we have not completely removed the old implementation and 
> renamed it to {{legacyWithNewChildren}}. This method will be removed in the 
> future and for now helps the transition. There are expressions such as 
> {{UpdateFields}} that have a complex way of defining children. Writing 
> {{withNewChildren}} for them requires refactoring the expression. For now, 
> these expressions use the old, slow method. In a future PR we address these 
> expressions.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to