[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11636


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-21 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-199497880
  
Merged into master, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-199491169
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53706/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-199491164
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-199490844
  
**[Test build #53706 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53706/consoleFull)**
 for PR 11636 at commit 
[`6b07e69`](https://github.com/apache/spark/commit/6b07e69f7b5cfb11edd324f0a6ddde8e1eb85f33).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-21 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-199451226
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-21 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56886878
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,9 +222,16 @@ private[sql] case class DataSourceScan(
 // TODO: The abstractions between this class and SqlNewHadoopRDD makes 
it difficult to know
 // here which path to use. Fix this.
 
-ctx.INPUT_ROW = row
+// These assignments avoids accesses to instance variables in a 
while-loop
+// Since they are loop invariant variables (LIVs), host them outside 
the while-loop
+val colLocalVars = output.zipWithIndex.map(x => ctx.freshName("col" + 
x._2))
+val columnLIVAssigns = (colLocalVars zip colVars).map { case 
(localName, name) =>
--- End diff --

I see, LGTM.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-199446045
  
**[Test build #53706 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53706/consoleFull)**
 for PR 11636 at commit 
[`6b07e69`](https://github.com/apache/spark/commit/6b07e69f7b5cfb11edd324f0a6ddde8e1eb85f33).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-21 Thread kiszk
Github user kiszk commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-199444953
  
Here are benchmark results in the latest code.

Without this PR
```
model name  : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
Partitioned Table:  Best/Avg Time(ms)Rate(M/s)   Per 
Row(ns)   Relative

---
Read data column  484 /  529 32.5  
30.7   1.0X
Read partition column 323 /  368 48.7  
20.5   1.5X
Read both columns 617 /  654 25.5  
39.2   0.8X

```

With this PR
```
model name  : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
Partitioned Table:  Best/Avg Time(ms)Rate(M/s)   Per 
Row(ns)   Relative

---
Read data column  412 /  444 38.2  
26.2   1.0X
Read partition column 290 /  327 54.2  
18.4   1.4X
Read both columns 516 /  613 30.5  
32.8   0.8X
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-21 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56883452
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,19 +241,29 @@ private[sql] case class DataSourceScan(
 // TODO: The abstractions between this class and SqlNewHadoopRDD makes 
it difficult to know
 // here which path to use. Fix this.
 
-ctx.INPUT_ROW = row
+// These assignments avoids accesses to instance variables in a 
while-loop
+// Since they are loop invariant variables (LIVs), host them outside 
the while-loop
+val colLocalVars = output.zipWithIndex.map(x => ctx.freshName("col" + 
x._2))
+val columnLIVAssigns = (colLocalVars zip colVars).map { case 
(localName, name) =>
+  s"$columnVectorClz $localName = $name;" }
+
 ctx.currentVars = null
-val columns1 = exprs.map(_.gen(ctx))
+val columns1 = (output zip colLocalVars).map { case (attr, colVar) =>
+  genCodeColumnVector(ctx, colVar, rowidx, attr.dataType, 
attr.nullable) }
 val scanBatches = ctx.freshName("processBatches")
 ctx.addNewFunction(scanBatches,
   s"""
   | private void $scanBatches() throws java.io.IOException {
   |  while (true) {
   | int numRows = $batch.numRows();
-  | if ($idx == 0) $numOutputRows.add(numRows);
+  | if ($idx == 0) {
+  |   ${columnAssigns.mkString("", "\n", "\n")}
+  |   $numOutputRows.add(numRows);
+  | }
   |
+  | ${columnLIVAssigns.mkString("", "\n", "\n")}
--- End diff --

For now, I will not use this. In the future, I may revisit this scala 
replacement regarding all of possible variables in another PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-21 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56866480
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,19 +241,29 @@ private[sql] case class DataSourceScan(
 // TODO: The abstractions between this class and SqlNewHadoopRDD makes 
it difficult to know
 // here which path to use. Fix this.
 
-ctx.INPUT_ROW = row
+// These assignments avoids accesses to instance variables in a 
while-loop
+// Since they are loop invariant variables (LIVs), host them outside 
the while-loop
+val colLocalVars = output.zipWithIndex.map(x => ctx.freshName("col" + 
x._2))
+val columnLIVAssigns = (colLocalVars zip colVars).map { case 
(localName, name) =>
+  s"$columnVectorClz $localName = $name;" }
+
 ctx.currentVars = null
-val columns1 = exprs.map(_.gen(ctx))
+val columns1 = (output zip colLocalVars).map { case (attr, colVar) =>
+  genCodeColumnVector(ctx, colVar, rowidx, attr.dataType, 
attr.nullable) }
 val scanBatches = ctx.freshName("processBatches")
 ctx.addNewFunction(scanBatches,
   s"""
   | private void $scanBatches() throws java.io.IOException {
   |  while (true) {
   | int numRows = $batch.numRows();
-  | if ($idx == 0) $numOutputRows.add(numRows);
+  | if ($idx == 0) {
+  |   ${columnAssigns.mkString("", "\n", "\n")}
+  |   $numOutputRows.add(numRows);
+  | }
   |
+  | ${columnLIVAssigns.mkString("", "\n", "\n")}
--- End diff --

Do we still need this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-199356553
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53678/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-199356521
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-199355887
  
**[Test build #53678 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53678/consoleFull)**
 for PR 11636 at commit 
[`9ec61ce`](https://github.com/apache/spark/commit/9ec61ceb0eb5b55b558e1cf4475735d4cd9d4008).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-21 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56840405
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,19 +241,29 @@ private[sql] case class DataSourceScan(
 // TODO: The abstractions between this class and SqlNewHadoopRDD makes 
it difficult to know
 // here which path to use. Fix this.
 
-ctx.INPUT_ROW = row
+// These assignments avoids accesses to instance variables in a 
while-loop
+// Since they are loop invariant variables (LIVs), host them outside 
the while-loop
+val colLocalVars = output.zipWithIndex.map(x => ctx.freshName("col" + 
x._2))
+val columnLIVAssigns = (colLocalVars zip colVars).map { case 
(localName, name) =>
+  s"$columnVectorClz $localName = $name;" }
+
 ctx.currentVars = null
-val columns1 = exprs.map(_.gen(ctx))
+val columns1 = (output zip colLocalVars).map { case (attr, colVar) =>
+  genCodeColumnVector(ctx, colVar, rowidx, attr.dataType, 
attr.nullable) }
 val scanBatches = ctx.freshName("processBatches")
 ctx.addNewFunction(scanBatches,
   s"""
   | private void $scanBatches() throws java.io.IOException {
   |  while (true) {
   | int numRows = $batch.numRows();
-  | if ($idx == 0) $numOutputRows.add(numRows);
+  | if ($idx == 0) {
+  |   ${columnAssigns.mkString("", "\n", "\n")}
+  |   $numOutputRows.add(numRows);
+  | }
   |
+  | ${columnLIVAssigns.mkString("", "\n", "\n")}
   | while (!shouldStop() && $idx < numRows) {
-  |   InternalRow $row = $batch.getRow($idx++);
+  |   int $rowidx = $idx++;
--- End diff --

@viirya , we need this. Here is an example of the generated code. In the 
code, line 93 corresponds to what you pointed out.
Lines 95 and 97 refers to ```scan_rowIdx``` that keeps pre-increment value 
of ```scan_batchIndex```. If we use ```batchIndex``` that has been already 
incremented, the argument for ```getUTF8String()``` and ```getInt()``` is 
incorrect.

```java
/* 092 */   while (!shouldStop() && scan_batchIdx < numRows) {
/* 093 */ int scan_rowIdx = scan_batchIdx++;
/* 094 */ /* columnVector[scan_col0, scan_rowIdx, string] */
/* 095 */ UTF8String scan_value = 
scan_col0.getUTF8String(scan_rowIdx);
/* 096 */ /* columnVector[scan_col1, scan_rowIdx, int] */
/* 097 */ int scan_value1 = scan_col1.getInt(scan_rowIdx);
/* 098 */ scan_holder.reset();
/* 099 */ 
/* 100 */ scan_rowWriter.write(0, scan_value);
/* 101 */ 
/* 102 */ scan_rowWriter.write(1, scan_value1);
/* 103 */ scan_result.setTotalSize(scan_holder.totalSize());
/* 104 */ append(scan_result);
/* 105 */   }
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-21 Thread nongli
Github user nongli commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-199318644
  
Have you rerun the benchmark with these changes?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-199313491
  
**[Test build #53678 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53678/consoleFull)**
 for PR 11636 at commit 
[`9ec61ce`](https://github.com/apache/spark/commit/9ec61ceb0eb5b55b558e1cf4475735d4cd9d4008).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-199251039
  
**[Test build #53674 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53674/consoleFull)**
 for PR 11636 at commit 
[`fb693d2`](https://github.com/apache/spark/commit/fb693d2168ca397de400b3082f5f640d5e3bae90).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-199251045
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53674/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-199251041
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-199249367
  
**[Test build #53674 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53674/consoleFull)**
 for PR 11636 at commit 
[`fb693d2`](https://github.com/apache/spark/commit/fb693d2168ca397de400b3082f5f640d5e3bae90).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56782696
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,19 +241,29 @@ private[sql] case class DataSourceScan(
 // TODO: The abstractions between this class and SqlNewHadoopRDD makes 
it difficult to know
 // here which path to use. Fix this.
 
-ctx.INPUT_ROW = row
+// These assignments avoids accesses to instance variables in a 
while-loop
+// Since they are loop invariant variables (LIVs), host them outside 
the while-loop
+val colLocalVars = output.zipWithIndex.map(x => ctx.freshName("col" + 
x._2))
+val columnLIVAssigns = (colLocalVars zip colVars).map { case 
(localName, name) =>
+  s"$columnVectorClz $localName = $name;" }
+
 ctx.currentVars = null
-val columns1 = exprs.map(_.gen(ctx))
+val columns1 = (output zip colLocalVars).map { case (attr, colVar) =>
+  genCodeColumnVector(ctx, colVar, rowidx, attr.dataType, 
attr.nullable) }
 val scanBatches = ctx.freshName("processBatches")
 ctx.addNewFunction(scanBatches,
   s"""
   | private void $scanBatches() throws java.io.IOException {
   |  while (true) {
   | int numRows = $batch.numRows();
-  | if ($idx == 0) $numOutputRows.add(numRows);
+  | if ($idx == 0) {
+  |   ${columnAssigns.mkString("", "\n", "\n")}
+  |   $numOutputRows.add(numRows);
+  | }
   |
+  | ${columnLIVAssigns.mkString("", "\n", "\n")}
   | while (!shouldStop() && $idx < numRows) {
-  |   InternalRow $row = $batch.getRow($idx++);
+  |   int $rowidx = $idx++;
--- End diff --

nit: rowidx seems as the same as idx. Do we need it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198082685
  
**[Test build #53468 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53468/consoleFull)**
 for PR 11636 at commit 
[`5efadf3`](https://github.com/apache/spark/commit/5efadf3f159b40a04621832facbccb99cb4b2c5c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198084216
  
**[Test build #53468 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53468/consoleFull)**
 for PR 11636 at commit 
[`5efadf3`](https://github.com/apache/spark/commit/5efadf3f159b40a04621832facbccb99cb4b2c5c).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class InputReference(ordinal: Int, dataType: DataType, nullable: 
Boolean, isColumn: Boolean)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198598006
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56575717
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -199,7 +210,8 @@ class CodegenContext {
   case StringType => s"$input.getUTF8String($ordinal)"
   case BinaryType => s"$input.getBinary($ordinal)"
   case CalendarIntervalType => s"$input.getInterval($ordinal)"
-  case t: StructType => s"$input.getStruct($ordinal, ${t.size})"
+  case t: StructType => if (!isColumnarType(input)) { 
s"$input.getStruct($ordinal, ${t.size})" }
--- End diff --

Sure, I provided the same API ```ColumnVector.getStruct(int, int)```.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56567171
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -199,7 +210,8 @@ class CodegenContext {
   case StringType => s"$input.getUTF8String($ordinal)"
   case BinaryType => s"$input.getBinary($ordinal)"
   case CalendarIntervalType => s"$input.getInterval($ordinal)"
-  case t: StructType => s"$input.getStruct($ordinal, ${t.size})"
+  case t: StructType => if (!isColumnarType(input)) { 
s"$input.getStruct($ordinal, ${t.size})" }
--- End diff --

While this ```getStruct()``` for ```ColumnVector``` may not be called at 
rutnime now, a code generator always produces two version for ```InternalRow``` 
and ```ColumnVector```.
Thus, this code is necessary to avoid compilation error for now. In the 
future, this code is necessary to correctly handle nested types.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-197460528
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198079212
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53466/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198085861
  
**[Test build #53470 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53470/consoleFull)**
 for PR 11636 at commit 
[`8c9d054`](https://github.com/apache/spark/commit/8c9d054c30c43b256d1f0d054dc82f8dc6e1e9af).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56453365
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala
 ---
@@ -60,17 +60,28 @@ case class BoundReference(ordinal: Int, dataType: 
DataType, nullable: Boolean)
 
   override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
 val javaType = ctx.javaType(dataType)
-val value = ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
+val value = if (!ctx.isColumnarType(ctx.INPUT_ROW)) {
+  ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
+} else {
+  ctx.getValue(ctx.INPUT_ROW, dataType, ctx.INPUT_COLORDINAL)
--- End diff --

Is it good to introduce new ```InputReference```, which is similar to 
```BoundReference```, only for the code to access ```ColumnBatch```?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198198837
  
**[Test build #53500 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53500/consoleFull)**
 for PR 11636 at commit 
[`e08472b`](https://github.com/apache/spark/commit/e08472b2673b49f8807e2949431f35bec83b511e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-197410160
  
**[Test build #53324 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53324/consoleFull)**
 for PR 11636 at commit 
[`c522a68`](https://github.com/apache/spark/commit/c522a6841d06649ff4d84f72344407b4e2028b8a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198109663
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56567647
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -199,7 +210,8 @@ class CodegenContext {
   case StringType => s"$input.getUTF8String($ordinal)"
   case BinaryType => s"$input.getBinary($ordinal)"
   case CalendarIntervalType => s"$input.getInterval($ordinal)"
-  case t: StructType => s"$input.getStruct($ordinal, ${t.size})"
+  case t: StructType => if (!isColumnarType(input)) { 
s"$input.getStruct($ordinal, ${t.size})" }
--- End diff --

Why not make them have the same APIs?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198597887
  
**[Test build #53584 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53584/consoleFull)**
 for PR 11636 at commit 
[`cdd3078`](https://github.com/apache/spark/commit/cdd3078c4a252ab8701cd1bfb08e911f3878db65).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198585189
  
**[Test build #53584 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53584/consoleFull)**
 for PR 11636 at commit 
[`cdd3078`](https://github.com/apache/spark/commit/cdd3078c4a252ab8701cd1bfb08e911f3878db65).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198372505
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53537/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198079203
  
**[Test build #53466 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53466/consoleFull)**
 for PR 11636 at commit 
[`5544c96`](https://github.com/apache/spark/commit/5544c96de899f15813c03f3533ee55c0dede3c64).
 * This patch **fails R style tests**.
 * This patch **does not merge cleanly**.
 * This patch adds the following public classes _(experimental)_:
  * `case class InputReference(ordinal: Int, dataType: DataType, nullable: 
Boolean, isColumn: Boolean)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198084229
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198372017
  
**[Test build #53537 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53537/consoleFull)**
 for PR 11636 at commit 
[`a7ac8fb`](https://github.com/apache/spark/commit/a7ac8fbd1e2662df49293fff62d596b10690b0c0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198079210
  
Build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198598009
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53584/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198218039
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53500/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-197460112
  
**[Test build #53324 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53324/consoleFull)**
 for PR 11636 at commit 
[`c522a68`](https://github.com/apache/spark/commit/c522a6841d06649ff4d84f72344407b4e2028b8a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56691811
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -199,15 +199,20 @@ private[sql] case class DataSourceScan(
   // never requires UnsafeRow as input.
   override protected def doProduce(ctx: CodegenContext): String = {
 val columnarBatchClz = 
"org.apache.spark.sql.execution.vectorized.ColumnarBatch"
+val columnVectorClz = 
"org.apache.spark.sql.execution.vectorized.ColumnVector"
 val input = ctx.freshName("input")
 val idx = ctx.freshName("batchIdx")
+val rowidx = ctx.freshName("rowIdx")
 val batch = ctx.freshName("batch")
 // PhysicalRDD always just has one input
 ctx.addMutableState("scala.collection.Iterator", input, s"$input = 
inputs[0];")
 ctx.addMutableState(columnarBatchClz, batch, s"$batch = null;")
 ctx.addMutableState("int", idx, s"$idx = 0;")
 
-val exprs = output.zipWithIndex.map(x => new BoundReference(x._2, 
x._1.dataType, true))
+val exprCols = output.zipWithIndex.map(
--- End diff --

Move this to where it's used.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56540019
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -199,7 +210,8 @@ class CodegenContext {
   case StringType => s"$input.getUTF8String($ordinal)"
   case BinaryType => s"$input.getBinary($ordinal)"
   case CalendarIntervalType => s"$input.getInterval($ordinal)"
-  case t: StructType => s"$input.getStruct($ordinal, ${t.size})"
+  case t: StructType => if (!isColumnarType(input)) { 
s"$input.getStruct($ordinal, ${t.size})" }
--- End diff --

Right now, the parquet reader does not support nested types (Array, Map, 
Struct), it's fine to not have this special case in this PR.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56692635
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala
 ---
@@ -104,3 +104,39 @@ object BindReferences extends Logging {
 }.asInstanceOf[A] // Kind of a hack, but safe.  TODO: Tighten return 
type when possible.
   }
 }
+
+case class InputReference(
--- End diff --

Add doc string here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56692854
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala
 ---
@@ -104,3 +104,39 @@ object BindReferences extends Logging {
 }.asInstanceOf[A] // Kind of a hack, but safe.  TODO: Tighten return 
type when possible.
   }
 }
+
+case class InputReference(
+  columnOrdinal: Int, dataType: DataType, nullable: Boolean, rowOrdinal: 
String = "")
+  extends LeafExpression {
+
+  private val isColumnarStorage: Boolean = rowOrdinal != ""
--- End diff --

This one should only be used with columnar batch, we could simplify this a 
lot.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56691971
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,9 +222,14 @@ private[sql] case class DataSourceScan(
 // TODO: The abstractions between this class and SqlNewHadoopRDD makes 
it difficult to know
 // here which path to use. Fix this.
 
-ctx.INPUT_ROW = row
 ctx.currentVars = null
-val columns1 = exprs.map(_.gen(ctx))
+val colVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
+val colDeclarations = colVars.zipWithIndex.map(x => {
--- End diff --

it's better to written as this:
colVars.zipWithIndex.map { case (name, i) =>
   
}


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198109532
  
**[Test build #53470 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53470/consoleFull)**
 for PR 11636 at commit 
[`8c9d054`](https://github.com/apache/spark/commit/8c9d054c30c43b256d1f0d054dc82f8dc6e1e9af).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56692918
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala
 ---
@@ -104,3 +104,39 @@ object BindReferences extends Logging {
 }.asInstanceOf[A] // Kind of a hack, but safe.  TODO: Tighten return 
type when possible.
   }
 }
+
+case class InputReference(
+  columnOrdinal: Int, dataType: DataType, nullable: Boolean, rowOrdinal: 
String = "")
+  extends LeafExpression {
+
+  private val isColumnarStorage: Boolean = rowOrdinal != ""
+
+  private val ordinalString = if (isColumnarStorage) { rowOrdinal } else { 
columnOrdinal.toString }
+
+  override def toString: String = s"input[$columnOrdinal, 
${dataType.simpleString}]"
+
+  override def eval(input: InternalRow): Any = {
+null
+  }
+
+  override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
+val javaType = ctx.javaType(dataType)
+val value = ctx.getValue(ctx.INPUT_ROW, dataType, ordinalString)
+if (ctx.currentVars != null && ctx.currentVars(columnOrdinal) != null) 
{
--- End diff --

We do need this, I think.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56364323
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -158,9 +158,13 @@ class CodegenContext {
   /** The variable name of the input row in generated code. */
   final var INPUT_ROW = "i"
 
+  /** The variable name of the input col in generated code. */
+  var INPUT_COLORDINAL = "idx"
+
   /**
 * The map from a variable name to it's next ID.
 */
+  private val freshNameJavaTypes = new mutable.HashMap[String, String]
--- End diff --

Move these into DataSourceScan


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56404151
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala
 ---
@@ -60,17 +60,28 @@ case class BoundReference(ordinal: Int, dataType: 
DataType, nullable: Boolean)
 
   override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
 val javaType = ctx.javaType(dataType)
-val value = ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
+val value = if (!ctx.isColumnarType(ctx.INPUT_ROW)) {
+  ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
+} else {
+  ctx.getValue(ctx.INPUT_ROW, dataType, ctx.INPUT_COLORDINAL)
--- End diff --

One possible idea is to prepare ```getValue()``` function that is can be 
overridden as follows.
In my opinion, the following code is not easy to read. What do you think?

In ```BoundExpression```
```scala
def getValue(...): String = { ctx.getValue(ctx.INPUT_ROW, dataType, 
ordinal.toString) }

override genCode(...): String = {
  ...
  val value = getValue(...)
 ...
}
```

In ```DataSourceScan```, we pass ```BoundExpression``` that has own 
```getValue()``` as follows:

```scala
...
val exprs = output.zipWithIndex.map(x => new BoundReference(x._2, 
x._1.dataType, true) {
  def getValue(...): String = {
val value = if (!ctx.isColumnarType(ctx.INPUT_ROW)) {
  ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
} else {
  ctx.getValue(ctx.INPUT_ROW, dataType, ctx.INPUT_COLORDINAL)
}
  }
})
...
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198217769
  
**[Test build #53500 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53500/consoleFull)**
 for PR 11636 at commit 
[`e08472b`](https://github.com/apache/spark/commit/e08472b2673b49f8807e2949431f35bec83b511e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class InputReference(`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198671636
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198671637
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53611/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198671563
  
**[Test build #53611 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53611/consoleFull)**
 for PR 11636 at commit 
[`cdf4333`](https://github.com/apache/spark/commit/cdf433320a9d552a2cb776413bce5d1ba4e9a4b4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread kiszk
Github user kiszk commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-197378611
  
@nongli , is it possible to change the name of API from 
```ColumnVector.getIsNull()``` to ```ColumnVector.isNullAt()```. If it is 
possible, we can remove a change of this PR in at [lines 
74-84](https://github.com/apache/spark/pull/11636/files#diff-7897c332d274bad365886139e23ff9caR74)
 in ```BoundsAttribute.scala```.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198109665
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53470/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56724708
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,9 +222,14 @@ private[sql] case class DataSourceScan(
 // TODO: The abstractions between this class and SqlNewHadoopRDD makes 
it difficult to know
 // here which path to use. Fix this.
 
-ctx.INPUT_ROW = row
 ctx.currentVars = null
-val columns1 = exprs.map(_.gen(ctx))
+val colVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
+val colDeclarations = colVars.zipWithIndex.map(x => {
+  (columnVectorClz + " " + x._1 + " = " + batch + ".column(" + x._2 + 
");\n").trim} )
--- End diff --

@davies , this comment makes sense.
Since the class 
[```ColumnarBatch```](https://github.com/apache/spark/blob/f96997ba244a14c26e85a2475415a762d0c0d0a8/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java#L47)
 is declared as final, I expect a method call ```batch.column()``` is inlined 
as ```batch.columns[]``` by a Java JIT compiler.
Or, is it better to explicitly access ```batch.columns[]``` by making 
```ColumnarBatch.columns[]``` public?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198077981
  
**[Test build #53466 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53466/consoleFull)**
 for PR 11636 at commit 
[`5544c96`](https://github.com/apache/spark/commit/5544c96de899f15813c03f3533ee55c0dede3c64).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56736654
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -199,15 +199,20 @@ private[sql] case class DataSourceScan(
   // never requires UnsafeRow as input.
   override protected def doProduce(ctx: CodegenContext): String = {
 val columnarBatchClz = 
"org.apache.spark.sql.execution.vectorized.ColumnarBatch"
+val columnVectorClz = 
"org.apache.spark.sql.execution.vectorized.ColumnVector"
 val input = ctx.freshName("input")
 val idx = ctx.freshName("batchIdx")
+val rowidx = ctx.freshName("rowIdx")
 val batch = ctx.freshName("batch")
 // PhysicalRDD always just has one input
 ctx.addMutableState("scala.collection.Iterator", input, s"$input = 
inputs[0];")
 ctx.addMutableState(columnarBatchClz, batch, s"$batch = null;")
 ctx.addMutableState("int", idx, s"$idx = 0;")
 
-val exprs = output.zipWithIndex.map(x => new BoundReference(x._2, 
x._1.dataType, true))
+val exprCols = output.zipWithIndex.map(
--- End diff --

Moved where it is used


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56691784
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -199,15 +199,20 @@ private[sql] case class DataSourceScan(
   // never requires UnsafeRow as input.
   override protected def doProduce(ctx: CodegenContext): String = {
 val columnarBatchClz = 
"org.apache.spark.sql.execution.vectorized.ColumnarBatch"
+val columnVectorClz = 
"org.apache.spark.sql.execution.vectorized.ColumnVector"
 val input = ctx.freshName("input")
 val idx = ctx.freshName("batchIdx")
+val rowidx = ctx.freshName("rowIdx")
 val batch = ctx.freshName("batch")
 // PhysicalRDD always just has one input
 ctx.addMutableState("scala.collection.Iterator", input, s"$input = 
inputs[0];")
 ctx.addMutableState(columnarBatchClz, batch, s"$batch = null;")
 ctx.addMutableState("int", idx, s"$idx = 0;")
 
-val exprs = output.zipWithIndex.map(x => new BoundReference(x._2, 
x._1.dataType, true))
+val exprCols = output.zipWithIndex.map(
+  x => new InputReference(x._2, x._1.dataType, x._1.nullable, rowidx))
+val exprRows = output.zipWithIndex.map(
+  x => new InputReference(x._2, x._1.dataType, x._1.nullable))
--- End diff --

Should we use BoundReference here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56408960
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala
 ---
@@ -60,17 +60,28 @@ case class BoundReference(ordinal: Int, dataType: 
DataType, nullable: Boolean)
 
   override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
 val javaType = ctx.javaType(dataType)
-val value = ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
+val value = if (!ctx.isColumnarType(ctx.INPUT_ROW)) {
+  ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
+} else {
+  ctx.getValue(ctx.INPUT_ROW, dataType, ctx.INPUT_COLORDINAL)
--- End diff --

In DataSourceScan, we do NOT need BoundReference to generate the code to 
access ColumnBatch, we can generate the code directly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56482715
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -158,9 +158,13 @@ class CodegenContext {
   /** The variable name of the input row in generated code. */
   final var INPUT_ROW = "i"
 
+  /** The variable name of the input col in generated code. */
+  var INPUT_COLORDINAL = "idx"
+
   /**
 * The map from a variable name to it's next ID.
 */
+  private val freshNameJavaTypes = new mutable.HashMap[String, String]
--- End diff --

It is also used in ```CodeGenerator.getValue()```. This is called from 
other classes rather than ```DataSourceScan```. Thus, can we keep this in 
```CodeGenerator.scala```?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198218038
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56364150
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala
 ---
@@ -60,17 +60,28 @@ case class BoundReference(ordinal: Int, dataType: 
DataType, nullable: Boolean)
 
   override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
 val javaType = ctx.javaType(dataType)
-val value = ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
+val value = if (!ctx.isColumnarType(ctx.INPUT_ROW)) {
+  ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
+} else {
+  ctx.getValue(ctx.INPUT_ROW, dataType, ctx.INPUT_COLORDINAL)
--- End diff --

Can we move these into DataSourceScan? (Because it's only used there) 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198327451
  
**[Test build #53537 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53537/consoleFull)**
 for PR 11636 at commit 
[`a7ac8fb`](https://github.com/apache/spark/commit/a7ac8fbd1e2662df49293fff62d596b10690b0c0).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56725827
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,9 +222,14 @@ private[sql] case class DataSourceScan(
 // TODO: The abstractions between this class and SqlNewHadoopRDD makes 
it difficult to know
 // here which path to use. Fix this.
 
-ctx.INPUT_ROW = row
 ctx.currentVars = null
-val columns1 = exprs.map(_.gen(ctx))
+val colVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
+val colDeclarations = colVars.zipWithIndex.map(x => {
+  (columnVectorClz + " " + x._1 + " = " + batch + ".column(" + x._2 + 
");\n").trim} )
--- End diff --

I'm not sure what's the impact on performance, since you are optimizing the 
these for performance, it's better to figure it out.

If it's trivial, we may do not need this variable, just use 
`batch.column(ordinal).getInt(idx)` in InputReference. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56692563
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,9 +222,14 @@ private[sql] case class DataSourceScan(
 // TODO: The abstractions between this class and SqlNewHadoopRDD makes 
it difficult to know
 // here which path to use. Fix this.
 
-ctx.INPUT_ROW = row
 ctx.currentVars = null
-val columns1 = exprs.map(_.gen(ctx))
+val colVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
+val colDeclarations = colVars.zipWithIndex.map(x => {
+  (columnVectorClz + " " + x._1 + " = " + batch + ".column(" + x._2 + 
");\n").trim} )
--- End diff --

Can we make the columns as class member, so we don't need to call 
`batch.column()` for every row?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56736677
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,9 +222,14 @@ private[sql] case class DataSourceScan(
 // TODO: The abstractions between this class and SqlNewHadoopRDD makes 
it difficult to know
 // here which path to use. Fix this.
 
-ctx.INPUT_ROW = row
 ctx.currentVars = null
-val columns1 = exprs.map(_.gen(ctx))
+val colVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
+val colDeclarations = colVars.zipWithIndex.map(x => {
--- End diff --

Rewrote, thank you for your suggestion


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56736640
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -199,15 +199,20 @@ private[sql] case class DataSourceScan(
   // never requires UnsafeRow as input.
   override protected def doProduce(ctx: CodegenContext): String = {
 val columnarBatchClz = 
"org.apache.spark.sql.execution.vectorized.ColumnarBatch"
+val columnVectorClz = 
"org.apache.spark.sql.execution.vectorized.ColumnVector"
 val input = ctx.freshName("input")
 val idx = ctx.freshName("batchIdx")
+val rowidx = ctx.freshName("rowIdx")
 val batch = ctx.freshName("batch")
 // PhysicalRDD always just has one input
 ctx.addMutableState("scala.collection.Iterator", input, s"$input = 
inputs[0];")
 ctx.addMutableState(columnarBatchClz, batch, s"$batch = null;")
 ctx.addMutableState("int", idx, s"$idx = 0;")
 
-val exprs = output.zipWithIndex.map(x => new BoundReference(x._2, 
x._1.dataType, true))
+val exprCols = output.zipWithIndex.map(
+  x => new InputReference(x._2, x._1.dataType, x._1.nullable, rowidx))
+val exprRows = output.zipWithIndex.map(
+  x => new InputReference(x._2, x._1.dataType, x._1.nullable))
--- End diff --

Used BoundReference for exprRows


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56692279
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,9 +222,14 @@ private[sql] case class DataSourceScan(
 // TODO: The abstractions between this class and SqlNewHadoopRDD makes 
it difficult to know
 // here which path to use. Fix this.
 
-ctx.INPUT_ROW = row
 ctx.currentVars = null
-val columns1 = exprs.map(_.gen(ctx))
+val colVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
+val colDeclarations = colVars.zipWithIndex.map(x => {
+  (columnVectorClz + " " + x._1 + " = " + batch + ".column(" + x._2 + 
");\n").trim} )
--- End diff --

Please follow the style how we generate code. 
```
 s"xxs $x = x;"
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198652882
  
**[Test build #53611 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53611/consoleFull)**
 for PR 11636 at commit 
[`cdf4333`](https://github.com/apache/spark/commit/cdf433320a9d552a2cb776413bce5d1ba4e9a4b4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread kiszk
Github user kiszk commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198652819
  
@davies I am glad to work on what the community wants.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56745689
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,9 +222,16 @@ private[sql] case class DataSourceScan(
 // TODO: The abstractions between this class and SqlNewHadoopRDD makes 
it difficult to know
 // here which path to use. Fix this.
 
-ctx.INPUT_ROW = row
+// These assignments avoids accesses to instance variables in a 
while-loop
+// Since they are loop invariant variables (LIVs), host them outside 
the while-loop
+val colLocalVars = output.zipWithIndex.map(x => ctx.freshName("col" + 
x._2))
--- End diff --

Used ```output.indices()```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56745695
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -228,8 +240,9 @@ private[sql] case class DataSourceScan(
   | int numRows = $batch.numRows();
   | if ($idx == 0) $numOutputRows.add(numRows);
--- End diff --

Good catch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-197460537
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53324/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56335579
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -186,6 +190,13 @@ class CodegenContext {
   freshNameIds += fullName -> 1
   fullName
 }
+freshNameJavaTypes.put(freshname, javaType)
+freshname
+  }
+
+  def isColumnarType(name: String): Boolean = {
--- End diff --

The bottom line is that we need to maintain this for each variable. For 
example, ```scanBatches()```, we need to access ```InternalRow``` for a column 
with ```struct``` type (e.g. ```ColumnVector.getStruct()``` is called. In my 
first prototype, I set ```true``` into ```ctx.columnarInput```. However, I 
cannot support such a complex case.

I will replace lookup this by name with lookup this by setting a flag in 
`freshName()` to avoid brittleness.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198084238
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53468/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-18 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56455795
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala
 ---
@@ -60,17 +60,28 @@ case class BoundReference(ordinal: Int, dataType: 
DataType, nullable: Boolean)
 
   override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
 val javaType = ctx.javaType(dataType)
-val value = ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
+val value = if (!ctx.isColumnarType(ctx.INPUT_ROW)) {
+  ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
+} else {
+  ctx.getValue(ctx.INPUT_ROW, dataType, ctx.INPUT_COLORDINAL)
--- End diff --

Since it's only used in one place, I'd like to narrow down the changes, 
it's easier to maintain.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-18 Thread nongli
Github user nongli commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-197404951
  
@kiszk Feel free to change that API


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-18 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56744525
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala
 ---
@@ -104,3 +104,32 @@ object BindReferences extends Logging {
 }.asInstanceOf[A] // Kind of a hack, but safe.  TODO: Tighten return 
type when possible.
   }
 }
+
+/**
+ * A column vector reference points to a specific column for ColumnVector.
+ * columnVar is a variable that keeps ColumnVector, and ordinal is row 
index in ColumnVector
+ */
+case class ColumnVectorReference(
+  columnVar: String, ordinal: String, dataType: DataType, nullable: 
Boolean)
+  extends LeafExpression {
+
+  override def toString: String = s"columnVector[$columnVar, $ordinal, 
${dataType.simpleString}]"
+
+  override def eval(input: InternalRow): Any = null
+
+  override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
+val javaType = ctx.javaType(dataType)
+val value = ctx.getValue(columnVar, dataType, ordinal)
+if (nullable) {
+  s"""
+boolean ${ev.isNull} = ${columnVar}.isNullAt($ordinal);
+$javaType ${ev.value} = ${ev.isNull} ? 
${ctx.defaultValue(dataType)} : ($value);
--- End diff --

I think copy these lines into DataSourceScan should be enough.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-18 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56744501
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala
 ---
@@ -104,3 +104,32 @@ object BindReferences extends Logging {
 }.asInstanceOf[A] // Kind of a hack, but safe.  TODO: Tighten return 
type when possible.
   }
 }
+
+/**
+ * A column vector reference points to a specific column for ColumnVector.
+ * columnVar is a variable that keeps ColumnVector, and ordinal is row 
index in ColumnVector
+ */
+case class ColumnVectorReference(
+  columnVar: String, ordinal: String, dataType: DataType, nullable: 
Boolean)
+  extends LeafExpression {
+
+  override def toString: String = s"columnVector[$columnVar, $ordinal, 
${dataType.simpleString}]"
+
+  override def eval(input: InternalRow): Any = null
+
+  override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
+val javaType = ctx.javaType(dataType)
+val value = ctx.getValue(columnVar, dataType, ordinal)
+if (nullable) {
+  s"""
+boolean ${ev.isNull} = ${columnVar}.isNullAt($ordinal);
+$javaType ${ev.value} = ${ev.isNull} ? 
${ctx.defaultValue(dataType)} : ($value);
--- End diff --

Let me confirm you suggestion. Is your suggestion to move the definition of 
```ColumnVectorReference``` to ```ExistingRDD.scala```?  Or, is this to move 
```genCode``` into ```DataSourceScan``` by dropping ```ColumnVectorReference``` 
and copy ```Expression.gen()``` into ```DataSourceScan```?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198372501
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-18 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56744426
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,9 +222,16 @@ private[sql] case class DataSourceScan(
 // TODO: The abstractions between this class and SqlNewHadoopRDD makes 
it difficult to know
 // here which path to use. Fix this.
 
-ctx.INPUT_ROW = row
+// These assignments avoids accesses to instance variables in a 
while-loop
+// Since they are loop invariant variables (LIVs), host them outside 
the while-loop
+val colLocalVars = output.zipWithIndex.map(x => ctx.freshName("col" + 
x._2))
+val columnLIVAssigns = (colLocalVars zip colVars).map { case 
(localName, name) =>
--- End diff --

An access to an instance variable in ```colVars```, which requires 
```getField``` bytecode, would involve memory access. We expect that an access 
to a local variable in ```colLocalVars```, which requires ```aload``` bytecode, 
would use register access by a Java JIT compiler. 
This is why we introduced ```colLocalVars```.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-18 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56736684
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala
 ---
@@ -104,3 +104,39 @@ object BindReferences extends Logging {
 }.asInstanceOf[A] // Kind of a hack, but safe.  TODO: Tighten return 
type when possible.
   }
 }
+
+case class InputReference(
--- End diff --

added a comment


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-18 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/11636#issuecomment-198632150
  
@kiszk I think this is pretty close to what we want, thanks for working on 
it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-18 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56743990
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -228,8 +240,9 @@ private[sql] case class DataSourceScan(
   | int numRows = $batch.numRows();
   | if ($idx == 0) $numOutputRows.add(numRows);
--- End diff --

I think we should put `columnAssigns` here, when idx is 0.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-18 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56743967
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala
 ---
@@ -104,3 +104,32 @@ object BindReferences extends Logging {
 }.asInstanceOf[A] // Kind of a hack, but safe.  TODO: Tighten return 
type when possible.
   }
 }
+
+/**
+ * A column vector reference points to a specific column for ColumnVector.
+ * columnVar is a variable that keeps ColumnVector, and ordinal is row 
index in ColumnVector
+ */
+case class ColumnVectorReference(
+  columnVar: String, ordinal: String, dataType: DataType, nullable: 
Boolean)
+  extends LeafExpression {
+
+  override def toString: String = s"columnVector[$columnVar, $ordinal, 
${dataType.simpleString}]"
+
+  override def eval(input: InternalRow): Any = null
+
+  override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
+val javaType = ctx.javaType(dataType)
+val value = ctx.getValue(columnVar, dataType, ordinal)
+if (nullable) {
+  s"""
+boolean ${ev.isNull} = ${columnVar}.isNullAt($ordinal);
+$javaType ${ev.value} = ${ev.isNull} ? 
${ctx.defaultValue(dataType)} : ($value);
--- End diff --

Only these lines are useful, why not put these into DataSourceScan?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-18 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56743960
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,9 +222,16 @@ private[sql] case class DataSourceScan(
 // TODO: The abstractions between this class and SqlNewHadoopRDD makes 
it difficult to know
 // here which path to use. Fix this.
 
-ctx.INPUT_ROW = row
+// These assignments avoids accesses to instance variables in a 
while-loop
+// Since they are loop invariant variables (LIVs), host them outside 
the while-loop
+val colLocalVars = output.zipWithIndex.map(x => ctx.freshName("col" + 
x._2))
+val columnLIVAssigns = (colLocalVars zip colVars).map { case 
(localName, name) =>
--- End diff --

Since we have `colVars`, do we still need `colLocalVars`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-18 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56743931
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,9 +222,16 @@ private[sql] case class DataSourceScan(
 // TODO: The abstractions between this class and SqlNewHadoopRDD makes 
it difficult to know
 // here which path to use. Fix this.
 
-ctx.INPUT_ROW = row
+// These assignments avoids accesses to instance variables in a 
while-loop
+// Since they are loop invariant variables (LIVs), host them outside 
the while-loop
+val colLocalVars = output.zipWithIndex.map(x => ctx.freshName("col" + 
x._2))
--- End diff --

output.slice


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-18 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56730574
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,9 +222,14 @@ private[sql] case class DataSourceScan(
 // TODO: The abstractions between this class and SqlNewHadoopRDD makes 
it difficult to know
 // here which path to use. Fix this.
 
-ctx.INPUT_ROW = row
 ctx.currentVars = null
-val columns1 = exprs.map(_.gen(ctx))
+val colVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
+val colDeclarations = colVars.zipWithIndex.map(x => {
+  (columnVectorClz + " " + x._1 + " = " + batch + ".column(" + x._2 + 
");\n").trim} )
--- End diff --

Thank you for your clarification. You mean `col0` as a class member for 
`GeneratedIterator`. This allow us to update `col0` only when ```batch``` is 
updated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-18 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56728827
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,9 +222,14 @@ private[sql] case class DataSourceScan(
 // TODO: The abstractions between this class and SqlNewHadoopRDD makes 
it difficult to know
 // here which path to use. Fix this.
 
-ctx.INPUT_ROW = row
 ctx.currentVars = null
-val columns1 = exprs.map(_.gen(ctx))
+val colVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
+val colDeclarations = colVars.zipWithIndex.map(x => {
+  (columnVectorClz + " " + x._1 + " = " + batch + ".column(" + x._2 + 
");\n").trim} )
--- End diff --

I see. I misunderstood your comment. You wanted to avoid declaration of 
variables `col0`, `col1`, ...
@nongli suggested "Move this to before the while loop (the contract is that 
the batch object is reused)". I will put a comment for this loop invariant 
hoisting for performance optimization.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-18 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56729652
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,9 +222,14 @@ private[sql] case class DataSourceScan(
 // TODO: The abstractions between this class and SqlNewHadoopRDD makes 
it difficult to know
 // here which path to use. Fix this.
 
-ctx.INPUT_ROW = row
 ctx.currentVars = null
-val columns1 = exprs.map(_.gen(ctx))
+val colVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
+val colDeclarations = colVars.zipWithIndex.map(x => {
+  (columnVectorClz + " " + x._1 + " = " + batch + ".column(" + x._2 + 
");\n").trim} )
--- End diff --

To make this clear: I'd suggest that having `col0` as class member, only 
update it for each batch once.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-18 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56736738
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala
 ---
@@ -104,3 +104,39 @@ object BindReferences extends Logging {
 }.asInstanceOf[A] // Kind of a hack, but safe.  TODO: Tighten return 
type when possible.
   }
 }
+
+case class InputReference(
+  columnOrdinal: Int, dataType: DataType, nullable: Boolean, rowOrdinal: 
String = "")
+  extends LeafExpression {
+
+  private val isColumnarStorage: Boolean = rowOrdinal != ""
--- End diff --

Simplified this class for ```ColumnVector```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-15 Thread nongli
Github user nongli commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56189923
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -186,6 +190,13 @@ class CodegenContext {
   freshNameIds += fullName -> 1
   fullName
 }
+freshNameJavaTypes.put(freshname, javaType)
+freshname
+  }
+
+  def isColumnarType(name: String): Boolean = {
--- End diff --

Is this a more explicit way to maintain this? For example, in PhysicalRDD 
when scanBatches() is generated, we set is columnarInput to true. Looking this 
up by name is brittle.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-15 Thread nongli
Github user nongli commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56189439
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -170,7 +179,8 @@ private[sql] case class PhysicalRDD(
   | if ($idx == 0) $numOutputRows.add(numRows);
   |
   | while (!shouldStop() && $idx < numRows) {
-  |   InternalRow $row = $batch.getRow($idx++);
+  |   int $rowidx = $idx++;
+  |   ${colDeclarations.mkString("", "\n", "\n")}
--- End diff --

Move this to before the while loop (the contract is that the batch object 
is reused)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...

2016-03-15 Thread nongli
Github user nongli commented on a diff in the pull request:

https://github.com/apache/spark/pull/11636#discussion_r56189147
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -158,9 +158,13 @@ class CodegenContext {
   /** The variable name of the input row in generated code. */
   final var INPUT_ROW = "i"
 
+  /** The variable name of the input col in generated code. */
+  var INPUT_COLORDINAL = "idx"
--- End diff --

INPUT_COL_ORDINAL


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   >