[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/6673


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-17 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/6673#issuecomment-112928014
  
Thanks!  Merging to master.





[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6673#issuecomment-111419792
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6673#issuecomment-111419788
  
  [Test build #34762 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34762/console) for PR 6673 at commit [`38cd425`](https://github.com/apache/spark/commit/38cd42549ea39188216d43998230efa474bf546b).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6673#issuecomment-111391748
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6673#issuecomment-111389970
  
  [Test build #34762 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34762/consoleFull) for PR 6673 at commit [`38cd425`](https://github.com/apache/spark/commit/38cd42549ea39188216d43998230efa474bf546b).





[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6673#issuecomment-111389827
  
Merged build started.





[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6673#issuecomment-111389804
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-12 Thread MickDavies
Github user MickDavies commented on a diff in the pull request:

https://github.com/apache/spark/pull/6673#discussion_r32294499
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ---
@@ -62,6 +62,8 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] {
   /** Returns a Seq of the children of this node */
   def children: Seq[BaseType]
 
+  lazy val childrenSet:Set[TreeNode[_]] = children.toSet
--- End diff --

Good idea.

Thanks

Mick

> On 12 Jun 2015, at 05:34, Wenchen Fan wrote:
>
> In sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala:
>
> > @@ -62,6 +62,8 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] {
> >    /** Returns a Seq of the children of this node */
> >    def children: Seq[BaseType]
> >
> > +  lazy val childrenSet:Set[TreeNode[_]] = children.toSet
>
> How about naming it containsChild so that we can use it like containsChild(arg)?



[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6673#issuecomment-111388861
  
Merged build started.





[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6673#issuecomment-111388848
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/6673#discussion_r32289725
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ---
@@ -62,6 +62,8 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] {
   /** Returns a Seq of the children of this node */
   def children: Seq[BaseType]
 
+  lazy val childrenSet:Set[TreeNode[_]] = children.toSet
--- End diff --

How about naming it `containsChild` so that we can use it like `containsChild(arg)`?
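
The naming suggestion works because a Scala `Set[A]` is also a function `A => Boolean`, so applying the set to an argument tests membership. A minimal sketch, using strings as hypothetical stand-ins for tree nodes (the object name and sample values are assumptions for illustration, not code from the patch):

```scala
object ContainsChildNaming {
  // Hypothetical children; strings stand in for tree nodes here.
  val children: Seq[String] = Seq("a", "b", "c")

  // A Set[A] extends A => Boolean, so a Set named like a predicate reads naturally.
  val containsChild: Set[String] = children.toSet

  def main(args: Array[String]): Unit = {
    println(containsChild("b")) // true
    println(containsChild("z")) // false
    // The set can also be passed wherever a predicate is expected.
    println(Seq("a", "z", "c").filter(containsChild)) // List(a, c)
  }
}
```

With this name, a call site reads as a question about the node rather than as an access to a collection.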





[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6673#issuecomment-111319398
  
  [Test build #34737 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34737/console) for PR 6673 at commit [`e6be8be`](https://github.com/apache/spark/commit/e6be8beb72936bb457343e6c9bd0dfddeede040f).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6673#issuecomment-111319399
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6673#issuecomment-111319280
  
  [Test build #34737 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34737/consoleFull) for PR 6673 at commit [`e6be8be`](https://github.com/apache/spark/commit/e6be8beb72936bb457343e6c9bd0dfddeede040f).





[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6673#issuecomment-111319205
  
Merged build started.





[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6673#issuecomment-111319198
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-11 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/6673#issuecomment-111319056
  
I think it's reasonable to assume `children` will not change, as `TreeNode`s 
are generally expected to be immutable.  I'd add this requirement to the 
method's Scaladoc though.
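
To illustrate why the requirement is worth documenting, here is a minimal sketch of what happens when a cached set is derived from children that do change. `MutableNode` is a hypothetical class written for this example; Catalyst nodes are not expected to behave this way, and the Scaladoc wording is only a suggestion:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical node with mutable children, written only to show the hazard.
class MutableNode {
  val buffer: ArrayBuffer[String] = ArrayBuffer("a", "b")

  def children: Seq[String] = buffer.toList

  /**
   * Set view of `children` for fast membership tests.
   * Requires that `children` never changes after this node is constructed.
   */
  lazy val containsChild: Set[String] = children.toSet
}

object StaleCacheSketch {
  def main(args: Array[String]): Unit = {
    val node = new MutableNode
    println(node.containsChild("a"))     // true; the set is built on this first access
    node.buffer += "c"                   // violates the documented requirement
    println(node.children.contains("c")) // true
    println(node.containsChild("c"))     // false: the cached set is stale
  }
}
```

Because the `lazy val` is evaluated once, it stays correct only as long as the immutability assumption holds.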





[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/6673#discussion_r32284055
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ---
@@ -62,6 +62,8 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] {
   /** Returns a Seq of the children of this node */
   def children: Seq[BaseType]
 
+  lazy val childrenSet:Set[TreeNode[_]] = children.toSet
--- End diff --

Space after `:`.





[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-11 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/6673#issuecomment-111318721
  
ok to test





[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6673#issuecomment-109747013
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-07 Thread MickDavies
Github user MickDavies commented on the pull request:

https://github.com/apache/spark/pull/6673#issuecomment-109746939
  
Is it safe to derive this lazy val from children, given that there is no 
guarantee that children produces an unchanging sequence? It looks like the 
intention is that children will not change.





[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-07 Thread MickDavies
GitHub user MickDavies reopened a pull request:

https://github.com/apache/spark/pull/6673

[SPARK-8077][SQL] Optimization for TreeNodes with large numbers of children

For example, large IN clauses.

Large IN clauses are parsed very slowly. For example, the SQL below (10K items 
in IN) takes 45-50s to parse.

s"""SELECT * FROM Person WHERE ForeName IN ('${(1 to 10000).map("n" + _).mkString("','")}')"""

This is principally due to TreeNode, which repeatedly calls contains on 
children, where children in this case is a List that is 10K long. In effect, 
parsing large IN clauses is O(N squared).
Using a lazily initialised Set built from children for these contains checks 
reduces parse time to around 2.5s.
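
The cost difference described above can be sketched outside Spark. This is a simplified illustration under stated assumptions: `Node` is a hypothetical stand-in for a Catalyst tree node, and the loop only mimics the repeated membership checks; it is not the actual TreeNode code or the patch itself.

```scala
// Hypothetical stand-in for a Catalyst node; not the real TreeNode class.
final case class Node(name: String, children: Seq[Node] = Nil) {
  // Built once on first access; assumes `children` never changes afterwards.
  lazy val containsChild: Set[Node] = children.toSet
}

object InClauseSketch {
  def main(args: Array[String]): Unit = {
    val n = 10000
    // One literal node per IN-list item, as in the 10K-item example.
    val literals = (1 to n).map(i => Node("n" + i)).toList
    val in = Node("In", literals)

    // List.contains scans linearly, so n lookups cost O(n^2) comparisons in total.
    val viaList = literals.count(c => in.children.contains(c))

    // Set lookups hash the element, so n lookups cost roughly O(n) in total.
    val viaSet = literals.count(c => in.containsChild(c))

    println(s"list hits = $viaList, set hits = $viaSet")
  }
}
```

Both counts are the same; only the amount of work per lookup differs, which is where the reported drop in parse time would come from.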

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MickDavies/spark SPARK-8077

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6673.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6673


commit e6be8beb72936bb457343e6c9bd0dfddeede040f
Author: Michael Davies 
Date:   2015-06-05T18:02:15Z

SPARK-8077: Optimization for  TreeNodes with large numbers of children

For example large IN clauses







[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-05 Thread MickDavies
Github user MickDavies commented on the pull request:

https://github.com/apache/spark/pull/6673#issuecomment-109383608
  
I need to run some more tests





[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-05 Thread MickDavies
Github user MickDavies closed the pull request at:

https://github.com/apache/spark/pull/6673





[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6673#issuecomment-109382818
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

2015-06-05 Thread MickDavies
GitHub user MickDavies opened a pull request:

https://github.com/apache/spark/pull/6673
