[jira] [Commented] (SPARK-34989) Improve the performance of mapChildren and withNewChildren methods

2021-04-12 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-34989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319700#comment-17319700
 ] 

Herman van Hövell commented on SPARK-34989:
---

We are seeing some pretty serious query compilation overhead (5-10s) for 
non-trivial (massive) queries. This change tends to cut this in half. It also 
works well for smaller queries. On top of this we are working on a couple of 
other changes that should reduce things even more.

It is a pretty big change to the TreeNode code base. We opted to make it a 
breaking change, so Spark dev's won't get lazy and use the old code path. 
However I do understand that this is a bit annoying for folks who have 
implemented their own TreeNode. In that case they have an escape hatch in the 
form of the TreeNode.legacyWithNewChildren function. Let me know if this 
addresses your concerns.

> Improve the performance of mapChildren and withNewChildren methods
> --
>
> Key: SPARK-34989
> URL: https://issues.apache.org/jira/browse/SPARK-34989
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Ali Afroozeh
>Assignee: Ali Afroozeh
>Priority: Major
> Fix For: 3.2.0
>
>
> One of the main performance bottlenecks in query compilation is 
> overly-generic tree transformation methods, namely {{mapChildren}} and 
> {{withNewChildren}} (defined in {{TreeNode}}). These methods have an 
> overly-generic implementation to iterate over the children and rely on 
> reflection to create new instances. We have observed that, especially for 
> queries with large query plans, a significant amount of CPU cycles are wasted 
> in these methods. In this PR we make these methods more efficient, by 
> delegating the iteration and instantiation to concrete node types. The 
> benchmarks show that we can expect significant performance improvement in 
> total query compilation time in queries with large query plans (from 30-80%) 
> and about 20% on average.
> h4. Problem detail
> The {{mapChildren}} method in {{TreeNode}} is overly generic and costly. To 
> be more specific, this method:
>  * iterates over all the fields of a node using Scala’s product iterator. 
> While the iteration is not reflection-based, thanks to the Scala compiler 
> generating code for {{Product}}, we create many anonymous functions and visit 
> many nested structures (recursive calls).
>  The anonymous functions (presumably compiled to Java anonymous inner 
> classes) also show up quite high on the list in the object allocation 
> profiles, so we are putting unnecessary pressure on GC here.
>  * does a lot of comparisons. Basically for each element returned from the 
> product iterator, we check if it is a child (contained in the list of 
> children) and then transform it. We can avoid that by just iterating over 
> children, but in the current implementation, we need to gather all the fields 
> (only transform the children) so that we can instantiate the object using the 
> reflection.
>  * creates objects using reflection, by delegating to the {{makeCopy}} 
> method, which is several orders of magnitude slower than using the 
> constructor.
> h4. Solution
> The proposed solution in this PR is rather straightforward: we rewrite the 
> {{mapChildren}} method using the {{children}} and {{withNewChildren}} 
> methods. The default {{withNewChildren}} method suffers from the same 
> problems as {{mapChildren}} and we need to make it more efficient by 
> specializing it in concrete classes. Similar to how each concrete query plan 
> node already defines its children, it should also define how they can be 
> constructed given a new list of children. Actually, the implementation is 
> quite simple in most cases and is a one-liner thanks to the copy method 
> present in Scala case classes. Note that we cannot abstract over the copy 
> method, it’s generated by the compiler for case classes if no other type 
> higher in the hierarchy defines it. For most concrete nodes, the 
> implementation of {{withNewChildren}} looks like this:
>  
> {{override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = 
> copy(children = newChildren)}}
> The current {{withNewChildren}} method has two properties that we should 
> preserve:
>  * It returns the same instance if the provided children are the same as its 
> children, i.e., it preserves referential equality.
>  * It copies tags and maintains the origin links when a new copy is created.
> These properties are hard to enforce in the concrete node type 
> implementation. Therefore, we propose a template method 
> {{withNewChildrenInternal}} that should be rewritten by the concrete classes 
> and let the {{withNewChildren}} method take care of referential 

[jira] [Commented] (SPARK-34989) Improve the performance of mapChildren and withNewChildren methods

2021-04-12 Thread Thomas Graves (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319679#comment-17319679
 ] 

Thomas Graves commented on SPARK-34989:
---

I just saw this go, all the performance numbers here are in % improvement.  
What kind of raw times are you seeing for query compilation?

Sounds like a nice improvement, I haven't had any issues with query compilation 
times so I'm curious what the numbers are and I assume people are seeing issues 
with this ? Its a pretty major changes to the base TreeNode class so anyone 
extending those is now broken.

> Improve the performance of mapChildren and withNewChildren methods
> --
>
> Key: SPARK-34989
> URL: https://issues.apache.org/jira/browse/SPARK-34989
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Ali Afroozeh
>Assignee: Ali Afroozeh
>Priority: Major
> Fix For: 3.2.0
>
>
> One of the main performance bottlenecks in query compilation is 
> overly-generic tree transformation methods, namely {{mapChildren}} and 
> {{withNewChildren}} (defined in {{TreeNode}}). These methods have an 
> overly-generic implementation to iterate over the children and rely on 
> reflection to create new instances. We have observed that, especially for 
> queries with large query plans, a significant amount of CPU cycles are wasted 
> in these methods. In this PR we make these methods more efficient, by 
> delegating the iteration and instantiation to concrete node types. The 
> benchmarks show that we can expect significant performance improvement in 
> total query compilation time in queries with large query plans (from 30-80%) 
> and about 20% on average.
> h4. Problem detail
> The {{mapChildren}} method in {{TreeNode}} is overly generic and costly. To 
> be more specific, this method:
>  * iterates over all the fields of a node using Scala’s product iterator. 
> While the iteration is not reflection-based, thanks to the Scala compiler 
> generating code for {{Product}}, we create many anonymous functions and visit 
> many nested structures (recursive calls).
>  The anonymous functions (presumably compiled to Java anonymous inner 
> classes) also show up quite high on the list in the object allocation 
> profiles, so we are putting unnecessary pressure on GC here.
>  * does a lot of comparisons. Basically for each element returned from the 
> product iterator, we check if it is a child (contained in the list of 
> children) and then transform it. We can avoid that by just iterating over 
> children, but in the current implementation, we need to gather all the fields 
> (only transform the children) so that we can instantiate the object using the 
> reflection.
>  * creates objects using reflection, by delegating to the {{makeCopy}} 
> method, which is several orders of magnitude slower than using the 
> constructor.
> h4. Solution
> The proposed solution in this PR is rather straightforward: we rewrite the 
> {{mapChildren}} method using the {{children}} and {{withNewChildren}} 
> methods. The default {{withNewChildren}} method suffers from the same 
> problems as {{mapChildren}} and we need to make it more efficient by 
> specializing it in concrete classes. Similar to how each concrete query plan 
> node already defines its children, it should also define how they can be 
> constructed given a new list of children. Actually, the implementation is 
> quite simple in most cases and is a one-liner thanks to the copy method 
> present in Scala case classes. Note that we cannot abstract over the copy 
> method, it’s generated by the compiler for case classes if no other type 
> higher in the hierarchy defines it. For most concrete nodes, the 
> implementation of {{withNewChildren}} looks like this:
>  
> {{override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = 
> copy(children = newChildren)}}
> The current {{withNewChildren}} method has two properties that we should 
> preserve:
>  * It returns the same instance if the provided children are the same as its 
> children, i.e., it preserves referential equality.
>  * It copies tags and maintains the origin links when a new copy is created.
> These properties are hard to enforce in the concrete node type 
> implementation. Therefore, we propose a template method 
> {{withNewChildrenInternal}} that should be rewritten by the concrete classes 
> and let the {{withNewChildren}} method take care of referential equality and 
> copying:
> {{override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = 
> {}}
>  {{  if (childrenFastEquals(children, newChildren)) {}}
>  {{    this}}
>  {{  } else {}}
>  {{    CurrentOrigin.withOrigin(origin) {}}
>  {{      val res = withNewChildr

[jira] [Commented] (SPARK-34989) Improve the performance of mapChildren and withNewChildren methods

2021-04-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317132#comment-17317132
 ] 

Apache Spark commented on SPARK-34989:
--

User 'dbaliafroozeh' has created a pull request for this issue:
https://github.com/apache/spark/pull/32030

> Improve the performance of mapChildren and withNewChildren methods
> --
>
> Key: SPARK-34989
> URL: https://issues.apache.org/jira/browse/SPARK-34989
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Ali Afroozeh
>Priority: Major
>
> One of the main performance bottlenecks in query compilation is 
> overly-generic tree transformation methods, namely {{mapChildren}} and 
> {{withNewChildren}} (defined in {{TreeNode}}). These methods have an 
> overly-generic implementation to iterate over the children and rely on 
> reflection to create new instances. We have observed that, especially for 
> queries with large query plans, a significant amount of CPU cycles are wasted 
> in these methods. In this PR we make these methods more efficient, by 
> delegating the iteration and instantiation to concrete node types. The 
> benchmarks show that we can expect significant performance improvement in 
> total query compilation time in queries with large query plans (from 30-80%) 
> and about 20% on average.
> h4. Problem detail
> The {{mapChildren}} method in {{TreeNode}} is overly generic and costly. To 
> be more specific, this method:
>  * iterates over all the fields of a node using Scala’s product iterator. 
> While the iteration is not reflection-based, thanks to the Scala compiler 
> generating code for {{Product}}, we create many anonymous functions and visit 
> many nested structures (recursive calls).
>  The anonymous functions (presumably compiled to Java anonymous inner 
> classes) also show up quite high on the list in the object allocation 
> profiles, so we are putting unnecessary pressure on GC here.
>  * does a lot of comparisons. Basically for each element returned from the 
> product iterator, we check if it is a child (contained in the list of 
> children) and then transform it. We can avoid that by just iterating over 
> children, but in the current implementation, we need to gather all the fields 
> (only transform the children) so that we can instantiate the object using the 
> reflection.
>  * creates objects using reflection, by delegating to the {{makeCopy}} 
> method, which is several orders of magnitude slower than using the 
> constructor.
> h4. Solution
> The proposed solution in this PR is rather straightforward: we rewrite the 
> {{mapChildren}} method using the {{children}} and {{withNewChildren}} 
> methods. The default {{withNewChildren}} method suffers from the same 
> problems as {{mapChildren}} and we need to make it more efficient by 
> specializing it in concrete classes. Similar to how each concrete query plan 
> node already defines its children, it should also define how they can be 
> constructed given a new list of children. Actually, the implementation is 
> quite simple in most cases and is a one-liner thanks to the copy method 
> present in Scala case classes. Note that we cannot abstract over the copy 
> method, it’s generated by the compiler for case classes if no other type 
> higher in the hierarchy defines it. For most concrete nodes, the 
> implementation of {{withNewChildren}} looks like this:
>  
> {{override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = 
> copy(children = newChildren)}}
> The current {{withNewChildren}} method has two properties that we should 
> preserve:
>  * It returns the same instance if the provided children are the same as its 
> children, i.e., it preserves referential equality.
>  * It copies tags and maintains the origin links when a new copy is created.
> These properties are hard to enforce in the concrete node type 
> implementation. Therefore, we propose a template method 
> {{withNewChildrenInternal}} that should be rewritten by the concrete classes 
> and let the {{withNewChildren}} method take care of referential equality and 
> copying:
> {{override def withNewChildren(newChildren: Seq[LogicalPlan]): LogicalPlan = 
> {}}
> {{  if (childrenTheSame(children, newChildren)) {}}
> {{    this}}
> {{  } else {}}
> {{    CurrentOrigin.withOrigin(origin) {}}
> {{      val res = withNewChildrenInternal(newChildren)}}
> {{      res.copyTagsFrom(this)}}
> {{      res}}
> {{    } }}
> {{  } }}
> {{ } }}
> With the refactoring done in a previous PR 
> ([#31932|https://github.com/apache/spark/pull/31932]) most tree node types 
> fall in one of the categories of {{Leaf}}, {{Unary}}, {{Binary}} or 
> {{Ternary}}. These traits have a more efficient implementation for 
> {