[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10977


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-29 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10977#issuecomment-176675120
  
I've merged this.






[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10977#issuecomment-176663786
  
**[Test build #2473 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2473/consoleFull)**
 for PR 10977 at commit 
[`ffa8e6b`](https://github.com/apache/spark/commit/ffa8e6b55df95cea1faf0363bd6eb4090cbe5313).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10977#issuecomment-176642840
  
**[Test build #50359 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50359/consoleFull)**
 for PR 10977 at commit 
[`ffa8e6b`](https://github.com/apache/spark/commit/ffa8e6b55df95cea1faf0363bd6eb4090cbe5313).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10977#issuecomment-176643027
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50359/
Test FAILed.





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10977#issuecomment-176643022
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10977#issuecomment-176638543
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50358/
Test FAILed.





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10977#issuecomment-176638542
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10977#issuecomment-176636537
  
**[Test build #50359 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50359/consoleFull)**
 for PR 10977 at commit 
[`ffa8e6b`](https://github.com/apache/spark/commit/ffa8e6b55df95cea1faf0363bd6eb4090cbe5313).





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10977#issuecomment-176636705
  
**[Test build #2473 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2473/consoleFull)**
 for PR 10977 at commit 
[`ffa8e6b`](https://github.com/apache/spark/commit/ffa8e6b55df95cea1faf0363bd6eb4090cbe5313).





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-29 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10977#issuecomment-176635918
  
LGTM





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-29 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/10977#issuecomment-176635007
  
@rxin Added.





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10977#issuecomment-176620545
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50351/
Test FAILed.





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10977#issuecomment-176620544
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10977#issuecomment-176620367
  
**[Test build #50351 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50351/consoleFull)**
 for PR 10977 at commit 
[`951e2cd`](https://github.com/apache/spark/commit/951e2cd8de2c6b2f5b5bd1f5dbdb6b6fad6bc4a4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-28 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10977#issuecomment-176619981
  
The issue here is that we want test cases that are targeted for specific 
problems, and the Hive ones are not (they are just a giant blackbox we took to 
bootstrap coverage).







[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-28 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10977#issuecomment-176619523
  
Sure, it's a good idea to use that golden file infrastructure. Given we 
don't have that yet, can you just add a test case?







[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-28 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/10977#issuecomment-176618967
  
The way we manage HiveCompatibilitySuite is actually better than our unit 
tests (SQL queries and golden results in text format). Even if we don't want to 
be compatible with Hive, it's still good to have those tests (just don't call them 
HiveCompatibilitySuite) and to manage them in a similar way.
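
The golden-file style mentioned above (a query plus its expected result stored as text) can be sketched roughly as follows. This is only an illustration, not Spark's test infrastructure: the class `GoldenFileCheck`, the method `matchesGolden`, and the file name are hypothetical, and the query result is a hard-coded string standing in for real query output.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Spark: compares a query's textual
// output against a stored "golden" answer file.
final class GoldenFileCheck {

    // Returns true when `actual` equals the golden file's content.
    // If the golden file does not exist yet, it is recorded from
    // `actual` and the check passes (first-run bootstrap).
    static boolean matchesGolden(Path goldenFile, String actual) {
        try {
            if (!Files.exists(goldenFile)) {
                Files.write(goldenFile, actual.getBytes(StandardCharsets.UTF_8));
                return true;
            }
            String expected =
                new String(Files.readAllBytes(goldenFile), StandardCharsets.UTF_8);
            return expected.equals(actual);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path golden = Paths.get(System.getProperty("java.io.tmpdir"),
            "select_literal-" + System.nanoTime() + ".out");
        // Stand-in for the text produced by running a query such as
        // "SELECT 1, 'a'"; a real suite would execute the query here.
        String output = "1\ta\n";
        System.out.println(matchesGolden(golden, output)); // records the golden file
        System.out.println(matchesGolden(golden, output)); // compares against it
    }
}
```

Keeping the expected results in text files means a new case needs only a query and its answer file, which is the property the comment above argues for.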





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-28 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10977#issuecomment-176609355
  
Thanks - can you add a test case that would catch this? In the long run, we 
don't want to rely on HiveCompatibilitySuite.







[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10977#issuecomment-176601896
  
**[Test build #50351 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50351/consoleFull)**
 for PR 10977 at commit 
[`951e2cd`](https://github.com/apache/spark/commit/951e2cd8de2c6b2f5b5bd1f5dbdb6b6fad6bc4a4).





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-28 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/10977#issuecomment-176601291
  
When there are no aggregate functions, it did not generate the output using 
resultExpression, which has only literals (I was misled by the comment in 
AggregateIterator). 





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-28 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10977#issuecomment-176600324
  
What's the bug?






[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-28 Thread davies
GitHub user davies opened a pull request:

https://github.com/apache/spark/pull/10977

[SPARK-13031] [SQL] cleanup codegen and improve test coverage

1. enable whole stage codegen during tests even when only one operator supports it.
2. split doProduce() into two APIs: upstream() and doProduce()
3. generate prefix for fresh names of each operator
4. pass UnsafeRow to parent directly (avoid getters and creating the UnsafeRow again)
5. fix bugs and tests.

This PR reopens #10944 and fixes the bug.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davies/spark gen_refactor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10977.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10977


commit b4db00675bc3c51ddf8735cace522a5d771cf7e2
Author: Davies Liu 
Date:   2016-01-27T07:43:40Z

cleanup whole stage codegen

commit 70a7c7edd1988c7dd69bccc8e563c9943775bd2c
Author: Davies Liu 
Date:   2016-01-27T23:22:33Z

improve stddev and variance

commit 951e2cd8de2c6b2f5b5bd1f5dbdb6b6fad6bc4a4
Author: Davies Liu 
Date:   2016-01-29T06:24:05Z

fix aggregation without functions







[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-28 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/10944#discussion_r51193081
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegen.scala ---
@@ -42,10 +44,16 @@ trait CodegenSupport extends SparkPlan {
   private var parent: CodegenSupport = null
 
   /**
-   * Returns an input RDD of InternalRow and Java source code to process them.
+   * Returns the RDD of InternalRow which generates the input rows.
    */
-  def produce(ctx: CodegenContext, parent: CodegenSupport): (RDD[InternalRow], String) = {
+  def upstream(): RDD[InternalRow]
+
+  /**
+   * Returns Java source code to process the rows from upstream.
+   */
+  def produce(ctx: CodegenContext, parent: CodegenSupport): String = {
     this.parent = parent
+    ctx.freshNamePrefix = nodeName
--- End diff --

Good question. Right now we don't have a unique id for SparkPlan.





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10944





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-28 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/10944#issuecomment-176437234
  
Thanks, merging this into master to unblock others; comments will be 
addressed in a follow-up. 





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-28 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/10944#discussion_r51193347
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala ---
@@ -172,38 +184,42 @@ case class Range(
   s"$number > $partitionEnd"
 }
 
-val rdd = sqlContext.sparkContext.parallelize(0 until numSlices, numSlices)
-  .map(i => InternalRow(i))
+ctx.addNewFunction("initRange",
+  s"""
+| private void initRange(int idx) {
+|   $BigInt index = $BigInt.valueOf(idx);
+|   $BigInt numSlice = $BigInt.valueOf(${numSlices}L);
+|   $BigInt numElement = $BigInt.valueOf(${numElements.toLong}L);
+|   $BigInt step = $BigInt.valueOf(${step}L);
+|   $BigInt start = $BigInt.valueOf(${start}L);
+|
+|   $BigInt st = index.multiply(numElement).divide(numSlice).multiply(step).add(start);
+|   if (st.compareTo($BigInt.valueOf(Long.MAX_VALUE)) > 0) {
+| $number = Long.MAX_VALUE;
+|   } else if (st.compareTo($BigInt.valueOf(Long.MIN_VALUE)) < 0) {
+| $number = Long.MIN_VALUE;
+|   } else {
+| $number = st.longValue();
+|   }
+|
+|   $BigInt end = index.add($BigInt.ONE).multiply(numElement).divide(numSlice)
+| .multiply(step).add(start);
+|   if (end.compareTo($BigInt.valueOf(Long.MAX_VALUE)) > 0) {
+| $partitionEnd = Long.MAX_VALUE;
+|   } else if (end.compareTo($BigInt.valueOf(Long.MIN_VALUE)) < 0) {
+| $partitionEnd = Long.MIN_VALUE;
+|   } else {
+| $partitionEnd = end.longValue();
+|   }
+| }
+   """.stripMargin)
 
-val code = s"""
+s"""
   | // initialize Range
   | if (!$initTerm) {
   |   $initTerm = true;
   |   if (input.hasNext()) {
-  | $BigInt index = $BigInt.valueOf(((InternalRow) input.next()).getInt(0));
-  | $BigInt numSlice = $BigInt.valueOf(${numSlices}L);
-  | $BigInt numElement = $BigInt.valueOf(${numElements.toLong}L);
-  | $BigInt step = $BigInt.valueOf(${step}L);
-  | $BigInt start = $BigInt.valueOf(${start}L);
-  |
-  | $BigInt st = index.multiply(numElement).divide(numSlice).multiply(step).add(start);
-  | if (st.compareTo($BigInt.valueOf(Long.MAX_VALUE)) > 0) {
-  |   $number = Long.MAX_VALUE;
-  | } else if (st.compareTo($BigInt.valueOf(Long.MIN_VALUE)) < 0) {
-  |   $number = Long.MIN_VALUE;
-  | } else {
-  |   $number = st.longValue();
-  | }
-  |
-  | $BigInt end = index.add($BigInt.ONE).multiply(numElement).divide(numSlice)
-  |   .multiply(step).add(start);
-  | if (end.compareTo($BigInt.valueOf(Long.MAX_VALUE)) > 0) {
-  |   $partitionEnd = Long.MAX_VALUE;
-  | } else if (end.compareTo($BigInt.valueOf(Long.MIN_VALUE)) < 0) {
-  |   $partitionEnd = Long.MIN_VALUE;
-  | } else {
-  |   $partitionEnd = end.longValue();
-  | }
+  | initRange(((InternalRow) input.next()).getInt(0));
--- End diff --

This is the easy way to make Range work; otherwise you have to find the partition 
id.
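
For readers following the diff, the arithmetic inside the generated `initRange` function can be tried on its own. The sketch below mirrors the BigInteger computation and the clamping to the `long` range shown in the diff. The class `RangeBounds` and its method names are hypothetical and are not part of Spark; the parameters simply take the place of the values that the Scala code interpolates into the generated source.

```java
import java.math.BigInteger;

// Hypothetical standalone version of the partition-bound arithmetic
// from the generated initRange(int idx) function.
final class RangeBounds {
    private static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);
    private static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);

    // Clamp an arbitrary-precision value into the long range.
    static long clamp(BigInteger v) {
        if (v.compareTo(MAX) > 0) return Long.MAX_VALUE;
        if (v.compareTo(MIN) < 0) return Long.MIN_VALUE;
        return v.longValue();
    }

    // index * numElements / numSlices * step + start, evaluated left to
    // right in BigInteger so intermediate products cannot overflow.
    static long bound(long index, long numSlices, long numElements,
                      long step, long start) {
        BigInteger b = BigInteger.valueOf(index)
            .multiply(BigInteger.valueOf(numElements))
            .divide(BigInteger.valueOf(numSlices))
            .multiply(BigInteger.valueOf(step))
            .add(BigInteger.valueOf(start));
        return clamp(b);
    }

    // First value of partition idx (the diff's $number).
    static long partitionStart(int idx, long numSlices, long numElements,
                               long step, long start) {
        return bound(idx, numSlices, numElements, step, start);
    }

    // Bound of partition idx, computed from index idx + 1 (the diff's $partitionEnd).
    static long partitionEnd(int idx, long numSlices, long numElements,
                             long step, long start) {
        return bound(idx + 1L, numSlices, numElements, step, start);
    }

    public static void main(String[] args) {
        // 10 elements in 3 slices, step 1, start 0:
        // prints [0, 3), [3, 6), [6, 10)
        for (int idx = 0; idx < 3; idx++) {
            System.out.println("[" + partitionStart(idx, 3, 10, 1, 0)
                + ", " + partitionEnd(idx, 3, 10, 1, 0) + ")");
        }
    }
}
```

The only per-partition input is `idx`, which is why the generated code reads it from the single-row input RDD rather than looking up the partition id.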





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-28 Thread nongli
Github user nongli commented on the pull request:

https://github.com/apache/spark/pull/10944#issuecomment-176333057
  
The generated code has a ton of extra newlines. If these are easy to remove, 
it would help debuggability.

LGTM, feel free to address the comments in follow ups.





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-28 Thread nongli
Github user nongli commented on a diff in the pull request:

https://github.com/apache/spark/pull/10944#discussion_r51166689
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegen.scala ---
@@ -42,10 +44,16 @@ trait CodegenSupport extends SparkPlan {
   private var parent: CodegenSupport = null
 
   /**
-   * Returns an input RDD of InternalRow and Java source code to process them.
+   * Returns the RDD of InternalRow which generates the input rows.
    */
-  def produce(ctx: CodegenContext, parent: CodegenSupport): (RDD[InternalRow], String) = {
+  def upstream(): RDD[InternalRow]
+
+  /**
+   * Returns Java source code to process the rows from upstream.
+   */
+  def produce(ctx: CodegenContext, parent: CodegenSupport): String = {
     this.parent = parent
+    ctx.freshNamePrefix = nodeName
--- End diff --

Do we have a notion of node id? This is not going to help when we have many 
joins in one pipeline. 
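
To illustrate the concern, here is a rough sketch of a name generator that prepends a per-operator prefix to a shared counter. The class `FreshNames` is hypothetical and only approximates the idea behind `ctx.freshNamePrefix`; it is not the actual `CodegenContext` implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of prefix-based fresh names; not Spark's CodegenContext.
final class FreshNames {
    private String prefix = "";
    private int nextId = 0;

    // Set before generating code for an operator, e.g. to its node name.
    void setPrefix(String prefix) {
        this.prefix = prefix;
    }

    // The shared counter keeps names unique; the prefix only adds a hint
    // about which kind of operator asked for the name.
    String freshName(String name) {
        String base = prefix.isEmpty() ? name : prefix + "_" + name;
        return base + nextId++;
    }

    public static void main(String[] args) {
        FreshNames ctx = new FreshNames();
        List<String> names = new ArrayList<>();
        // Two different joins in one pipeline share the same node name.
        ctx.setPrefix("join");
        names.add(ctx.freshName("matched"));
        ctx.setPrefix("join");
        names.add(ctx.freshName("matched"));
        // prints [join_matched0, join_matched1]
        System.out.println(names);
    }
}
```

The two names are distinct, so the generated code stays correct, but nothing in `join_matched0` versus `join_matched1` says which join each belongs to. A per-node id in the prefix would be needed for that.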





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-28 Thread nongli
Github user nongli commented on a diff in the pull request:

https://github.com/apache/spark/pull/10944#discussion_r51166567
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala ---
@@ -172,38 +184,42 @@ case class Range(
   s"$number > $partitionEnd"
 }
 
-val rdd = sqlContext.sparkContext.parallelize(0 until numSlices, numSlices)
-  .map(i => InternalRow(i))
+ctx.addNewFunction("initRange",
+  s"""
+| private void initRange(int idx) {
+|   $BigInt index = $BigInt.valueOf(idx);
+|   $BigInt numSlice = $BigInt.valueOf(${numSlices}L);
+|   $BigInt numElement = $BigInt.valueOf(${numElements.toLong}L);
+|   $BigInt step = $BigInt.valueOf(${step}L);
+|   $BigInt start = $BigInt.valueOf(${start}L);
+|
+|   $BigInt st = index.multiply(numElement).divide(numSlice).multiply(step).add(start);
+|   if (st.compareTo($BigInt.valueOf(Long.MAX_VALUE)) > 0) {
+| $number = Long.MAX_VALUE;
+|   } else if (st.compareTo($BigInt.valueOf(Long.MIN_VALUE)) < 0) {
+| $number = Long.MIN_VALUE;
+|   } else {
+| $number = st.longValue();
+|   }
+|
+|   $BigInt end = index.add($BigInt.ONE).multiply(numElement).divide(numSlice)
+| .multiply(step).add(start);
+|   if (end.compareTo($BigInt.valueOf(Long.MAX_VALUE)) > 0) {
+| $partitionEnd = Long.MAX_VALUE;
+|   } else if (end.compareTo($BigInt.valueOf(Long.MIN_VALUE)) < 0) {
+| $partitionEnd = Long.MIN_VALUE;
+|   } else {
+| $partitionEnd = end.longValue();
+|   }
+| }
+   """.stripMargin)
 
-val code = s"""
+s"""
   | // initialize Range
   | if (!$initTerm) {
   |   $initTerm = true;
   |   if (input.hasNext()) {
-  | $BigInt index = $BigInt.valueOf(((InternalRow) input.next()).getInt(0));
-  | $BigInt numSlice = $BigInt.valueOf(${numSlices}L);
-  | $BigInt numElement = $BigInt.valueOf(${numElements.toLong}L);
-  | $BigInt step = $BigInt.valueOf(${step}L);
-  | $BigInt start = $BigInt.valueOf(${start}L);
-  |
-  | $BigInt st = index.multiply(numElement).divide(numSlice).multiply(step).add(start);
-  | if (st.compareTo($BigInt.valueOf(Long.MAX_VALUE)) > 0) {
-  |   $number = Long.MAX_VALUE;
-  | } else if (st.compareTo($BigInt.valueOf(Long.MIN_VALUE)) < 0) {
-  |   $number = Long.MIN_VALUE;
-  | } else {
-  |   $number = st.longValue();
-  | }
-  |
-  | $BigInt end = index.add($BigInt.ONE).multiply(numElement).divide(numSlice)
-  |   .multiply(step).add(start);
-  | if (end.compareTo($BigInt.valueOf(Long.MAX_VALUE)) > 0) {
-  |   $partitionEnd = Long.MAX_VALUE;
-  | } else if (end.compareTo($BigInt.valueOf(Long.MIN_VALUE)) < 0) {
-  |   $partitionEnd = Long.MIN_VALUE;
-  | } else {
-  |   $partitionEnd = end.longValue();
-  | }
+  | initRange(((InternalRow) input.next()).getInt(0));
--- End diff --

Why does this need an input? The range should know it's a leaf and not need 
this, no?





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-28 Thread nongli
Github user nongli commented on a diff in the pull request:

https://github.com/apache/spark/pull/10944#discussion_r51166249
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegen.scala ---
@@ -162,37 +206,48 @@ case class InputAdapter(child: SparkPlan) extends LeafNode with CodegenSupport {
 case class WholeStageCodegen(plan: CodegenSupport, children: Seq[SparkPlan])
   extends SparkPlan with CodegenSupport {
 
+  override def supportCodegen: Boolean = false
+
   override def output: Seq[Attribute] = plan.output
+  override def outputPartitioning: Partitioning = plan.outputPartitioning
+  override def outputOrdering: Seq[SortOrder] = plan.outputOrdering
+
+  override def doPrepare(): Unit = {
+    plan.prepare()
+  }
 
   override def doExecute(): RDD[InternalRow] = {
     val ctx = new CodegenContext
-    val (rdd, code) = plan.produce(ctx, this)
+    val code = plan.produce(ctx, this)
     val references = ctx.references.toArray
     val source = s"""
       public Object generate(Object[] references) {
--- End diff --

Can you add a comment explaining what `references` means? `references` is a very generic name.





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-27 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/10944#issuecomment-176013043
  
@nongli Does this one look good to you? This one blocks others.





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-27 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/10944#issuecomment-175895315
  
Here is the generated code for `sqlContext.range(values).filter("(id & 1) = 1").count()`

```
public Object generate(Object[] references) {
  return new GeneratedIterator(references);
}

class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {

  private Object[] references;
  private boolean TungstenAggregate_initAgg0;
  private boolean TungstenAggregate_bufIsNull1;
  private long TungstenAggregate_bufValue2;
  private boolean Range_initRange6;
  private long Range_partitionEnd7;
  private long Range_number8;
  private boolean Range_overflow9;
  private UnsafeRow TungstenAggregate_result29;
  private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder TungstenAggregate_holder30;
  private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter TungstenAggregate_rowWriter31;

  private void initRange(int idx) {
    java.math.BigInteger index = java.math.BigInteger.valueOf(idx);
    java.math.BigInteger numSlice = java.math.BigInteger.valueOf(1L);
    java.math.BigInteger numElement = java.math.BigInteger.valueOf(209715200L);
    java.math.BigInteger step = java.math.BigInteger.valueOf(1L);
    java.math.BigInteger start = java.math.BigInteger.valueOf(0L);

    java.math.BigInteger st = index.multiply(numElement).divide(numSlice).multiply(step).add(start);
    if (st.compareTo(java.math.BigInteger.valueOf(Long.MAX_VALUE)) > 0) {
      Range_number8 = Long.MAX_VALUE;
    } else if (st.compareTo(java.math.BigInteger.valueOf(Long.MIN_VALUE)) < 0) {
      Range_number8 = Long.MIN_VALUE;
    } else {
      Range_number8 = st.longValue();
    }

    java.math.BigInteger end = index.add(java.math.BigInteger.ONE).multiply(numElement).divide(numSlice)
      .multiply(step).add(start);
    if (end.compareTo(java.math.BigInteger.valueOf(Long.MAX_VALUE)) > 0) {
      Range_partitionEnd7 = Long.MAX_VALUE;
    } else if (end.compareTo(java.math.BigInteger.valueOf(Long.MIN_VALUE)) < 0) {
      Range_partitionEnd7 = Long.MIN_VALUE;
    } else {
      Range_partitionEnd7 = end.longValue();
    }
  }

  private void TungstenAggregate_doAgg5() {
    // initialize aggregation buffer
    /* 0 */

    TungstenAggregate_bufIsNull1 = false;
    TungstenAggregate_bufValue2 = 0L;

    // initialize Range
    if (!Range_initRange6) {
      Range_initRange6 = true;
      if (input.hasNext()) {
        initRange(((InternalRow) input.next()).getInt(0));
      } else {
        return;
      }
    }

    while (!Range_overflow9 && Range_number8 < Range_partitionEnd7) {
      long Range_value10 = Range_number8;
      Range_number8 += 1L;
      if (Range_number8 < Range_value10 ^ 1L < 0) {
        Range_overflow9 = true;
      }

      /* ((input[0, bigint] & 1) = 1) */
      /* (input[0, bigint] & 1) */
      /* input[0, bigint] */

      /* 1 */

      long Filter_value14 = -1L;
      Filter_value14 = Range_value10 & 1L;
      /* 1 */

      boolean Filter_value12 = false;
      Filter_value12 = Filter_value14 == 1L;
      if (!false && Filter_value12) {

        // do aggregate and update aggregation buffer

        /* (input[0, bigint] + 1) */
        /* input[0, bigint] */

        /* 1 */

        long TungstenAggregate_value22 = -1L;
        TungstenAggregate_value22 = TungstenAggregate_bufValue2 + 1L;
        TungstenAggregate_bufIsNull1 = false;
        TungstenAggregate_bufValue2 = TungstenAggregate_value22;
```

[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-27 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10944#issuecomment-175892206
  
Can you paste some generated code? (Actually, I think that's useful for most 
of the codegen PRs.)





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-27 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/10944#issuecomment-175885463
  
cc @nongli @rxin 





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10944#issuecomment-175512911
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10944#issuecomment-175512913
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50183/
Test PASSed.





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10944#issuecomment-175512703
  
**[Test build #50183 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50183/consoleFull)** for PR 10944 at commit [`b4db006`](https://github.com/apache/spark/commit/b4db00675bc3c51ddf8735cace522a5d771cf7e2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10944#issuecomment-175478481
  
**[Test build #50183 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50183/consoleFull)** for PR 10944 at commit [`b4db006`](https://github.com/apache/spark/commit/b4db00675bc3c51ddf8735cace522a5d771cf7e2).





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-27 Thread davies
GitHub user davies opened a pull request:

https://github.com/apache/spark/pull/10944

[SPARK-13031] [SQL] cleanup codegen and improve test coverage

1. Enable whole-stage codegen during tests even if only one operator 
supports it.
2. Split doProduce() into two APIs: upstream() and doProduce().
3. Generate a prefix for the fresh names of each operator.
4. Pass UnsafeRow to the parent directly (avoiding getters and re-creating 
the UnsafeRow).
5. Fix bugs and tests.
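
As a rough illustration of item 3, the generated code for this PR contains names such as `Range_number8` and `TungstenAggregate_bufValue2`: an operator prefix, a base name, and a counter that appears to be shared across operators. The sketch below shows one way such names could be produced. It is an assumption made for illustration, and `FreshNameSketch`, `setOperator`, and `freshName` are hypothetical names, not Spark's `CodegenContext` API.

```java
// Hypothetical sketch of per-operator name prefixing; not Spark's implementation.
final class FreshNameSketch {
    // One counter shared by all operators keeps every generated name unique.
    private int nextId = 0;
    // Prefix of the operator currently generating code, e.g. "Range_".
    private String prefix = "";

    // Switch to a new operator; later names carry its prefix.
    void setOperator(String operatorName) {
        prefix = operatorName + "_";
    }

    // Build a unique identifier of the form <operator>_<base><id>.
    String freshName(String base) {
        return prefix + base + nextId++;
    }

    public static void main(String[] args) {
        FreshNameSketch names = new FreshNameSketch();
        names.setOperator("Range");
        System.out.println(names.freshName("number"));
        names.setOperator("Filter");
        System.out.println(names.freshName("value"));
    }
}
```

With a prefix on every name, a variable in the generated source can be traced back to the operator that emitted it.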

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davies/spark gen_refactor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10944.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10944


commit b4db00675bc3c51ddf8735cace522a5d771cf7e2
Author: Davies Liu 
Date:   2016-01-27T07:43:40Z

cleanup whole stage codegen



