[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11636 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-199497880 Merged into master, thanks!
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-199491169 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53706/ Test PASSed.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-199491164 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-199490844

**[Test build #53706 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53706/consoleFull)** for PR 11636 at commit [`6b07e69`](https://github.com/apache/spark/commit/6b07e69f7b5cfb11edd324f0a6ddde8e1eb85f33).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-199451226 LGTM
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56886878

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
```diff
@@ -217,9 +222,16 @@ private[sql] case class DataSourceScan(
     // TODO: The abstractions between this class and SqlNewHadoopRDD makes it difficult to know
     // here which path to use. Fix this.
-    ctx.INPUT_ROW = row
+    // These assignments avoids accesses to instance variables in a while-loop
+    // Since they are loop invariant variables (LIVs), host them outside the while-loop
+    val colLocalVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
+    val columnLIVAssigns = (colLocalVars zip colVars).map { case (localName, name) =>
```
--- End diff --

I see, LGTM.
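The optimization discussed in this diff — hoisting loop-invariant reads of instance fields into local variables before the hot loop — can be sketched in plain Java. This is an illustrative example with hypothetical names, not the actual Spark-generated code:

```java
// Sketch of loop-invariant hoisting (hypothetical names, not Spark's generated code).
public class HoistExample {
    private int[] col0 = {1, 2, 3, 4};
    private int[] col1 = {10, 20, 30, 40};

    // Before: every iteration re-loads the instance fields `this.col0` / `this.col1`.
    long sumThroughFields() {
        long sum = 0;
        for (int i = 0; i < col0.length; i++) {
            sum += col0[i] + col1[i];
        }
        return sum;
    }

    // After: the field loads are hoisted into locals once, outside the loop,
    // so the loop body touches only local variables (easier for the JIT to optimize).
    long sumThroughLocals() {
        int[] localCol0 = col0;   // loop invariant: hoisted
        int[] localCol1 = col1;   // loop invariant: hoisted
        long sum = 0;
        for (int i = 0; i < localCol0.length; i++) {
            sum += localCol0[i] + localCol1[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        HoistExample e = new HoistExample();
        System.out.println(e.sumThroughFields() == e.sumThroughLocals()); // prints true
    }
}
```

Both variants compute the same result; the transformation only changes where the field loads happen, which is exactly what the `columnLIVAssigns` strings emit into the generated scan loop.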
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-199446045 **[Test build #53706 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53706/consoleFull)** for PR 11636 at commit [`6b07e69`](https://github.com/apache/spark/commit/6b07e69f7b5cfb11edd324f0a6ddde8e1eb85f33).
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user kiszk commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-199444953

Here are benchmark results in the latest code.

Without this PR
```
model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
Partitioned Table:          Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------
Read data column                  484 /  529         32.5          30.7       1.0X
Read partition column             323 /  368         48.7          20.5       1.5X
Read both columns                 617 /  654         25.5          39.2       0.8X
```

With this PR
```
model name : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
Partitioned Table:          Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------
Read data column                  412 /  444         38.2          26.2       1.0X
Read partition column             290 /  327         54.2          18.4       1.4X
Read both columns                 516 /  613         30.5          32.8       0.8X
```
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56883452

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
```diff
@@ -217,19 +241,29 @@ private[sql] case class DataSourceScan(
     // TODO: The abstractions between this class and SqlNewHadoopRDD makes it difficult to know
     // here which path to use. Fix this.
-    ctx.INPUT_ROW = row
+    // These assignments avoids accesses to instance variables in a while-loop
+    // Since they are loop invariant variables (LIVs), host them outside the while-loop
+    val colLocalVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
+    val columnLIVAssigns = (colLocalVars zip colVars).map { case (localName, name) =>
+      s"$columnVectorClz $localName = $name;" }
+
     ctx.currentVars = null
-    val columns1 = exprs.map(_.gen(ctx))
+    val columns1 = (output zip colLocalVars).map { case (attr, colVar) =>
+      genCodeColumnVector(ctx, colVar, rowidx, attr.dataType, attr.nullable) }
     val scanBatches = ctx.freshName("processBatches")
     ctx.addNewFunction(scanBatches,
       s"""
       | private void $scanBatches() throws java.io.IOException {
       |  while (true) {
       |     int numRows = $batch.numRows();
-      |     if ($idx == 0) $numOutputRows.add(numRows);
+      |     if ($idx == 0) {
+      |       ${columnAssigns.mkString("", "\n", "\n")}
+      |       $numOutputRows.add(numRows);
+      |     }
       |
+      |     ${columnLIVAssigns.mkString("", "\n", "\n")}
```
--- End diff --

For now, I will not use this. In the future, I may revisit this scalar replacement for all possible variables in another PR.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56866480

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
```diff
@@ -217,19 +241,29 @@ private[sql] case class DataSourceScan(
     // TODO: The abstractions between this class and SqlNewHadoopRDD makes it difficult to know
     // here which path to use. Fix this.
-    ctx.INPUT_ROW = row
+    // These assignments avoids accesses to instance variables in a while-loop
+    // Since they are loop invariant variables (LIVs), host them outside the while-loop
+    val colLocalVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
+    val columnLIVAssigns = (colLocalVars zip colVars).map { case (localName, name) =>
+      s"$columnVectorClz $localName = $name;" }
+
     ctx.currentVars = null
-    val columns1 = exprs.map(_.gen(ctx))
+    val columns1 = (output zip colLocalVars).map { case (attr, colVar) =>
+      genCodeColumnVector(ctx, colVar, rowidx, attr.dataType, attr.nullable) }
     val scanBatches = ctx.freshName("processBatches")
     ctx.addNewFunction(scanBatches,
       s"""
       | private void $scanBatches() throws java.io.IOException {
       |  while (true) {
       |     int numRows = $batch.numRows();
-      |     if ($idx == 0) $numOutputRows.add(numRows);
+      |     if ($idx == 0) {
+      |       ${columnAssigns.mkString("", "\n", "\n")}
+      |       $numOutputRows.add(numRows);
+      |     }
       |
+      |     ${columnLIVAssigns.mkString("", "\n", "\n")}
```
--- End diff --

Do we still need this?
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-199356553 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53678/ Test PASSed.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-199356521 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-199355887

**[Test build #53678 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53678/consoleFull)** for PR 11636 at commit [`9ec61ce`](https://github.com/apache/spark/commit/9ec61ceb0eb5b55b558e1cf4475735d4cd9d4008).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56840405

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
```diff
@@ -217,19 +241,29 @@ private[sql] case class DataSourceScan(
     // TODO: The abstractions between this class and SqlNewHadoopRDD makes it difficult to know
     // here which path to use. Fix this.
-    ctx.INPUT_ROW = row
+    // These assignments avoids accesses to instance variables in a while-loop
+    // Since they are loop invariant variables (LIVs), host them outside the while-loop
+    val colLocalVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
+    val columnLIVAssigns = (colLocalVars zip colVars).map { case (localName, name) =>
+      s"$columnVectorClz $localName = $name;" }
+
     ctx.currentVars = null
-    val columns1 = exprs.map(_.gen(ctx))
+    val columns1 = (output zip colLocalVars).map { case (attr, colVar) =>
+      genCodeColumnVector(ctx, colVar, rowidx, attr.dataType, attr.nullable) }
     val scanBatches = ctx.freshName("processBatches")
     ctx.addNewFunction(scanBatches,
       s"""
       | private void $scanBatches() throws java.io.IOException {
       |  while (true) {
       |     int numRows = $batch.numRows();
-      |     if ($idx == 0) $numOutputRows.add(numRows);
+      |     if ($idx == 0) {
+      |       ${columnAssigns.mkString("", "\n", "\n")}
+      |       $numOutputRows.add(numRows);
+      |     }
       |
+      |     ${columnLIVAssigns.mkString("", "\n", "\n")}
       |     while (!shouldStop() && $idx < numRows) {
-      |       InternalRow $row = $batch.getRow($idx++);
+      |       int $rowidx = $idx++;
```
--- End diff --

@viirya, we need this. Here is an example of the generated code. In the code, line 93 corresponds to what you pointed out. Lines 95 and 97 refer to ```scan_rowIdx```, which keeps the pre-increment value of ```scan_batchIdx```. If we used ```scan_batchIdx```, which has already been incremented, the argument for ```getUTF8String()``` and ```getInt()``` would be incorrect.

```java
/* 092 */     while (!shouldStop() && scan_batchIdx < numRows) {
/* 093 */       int scan_rowIdx = scan_batchIdx++;
/* 094 */       /* columnVector[scan_col0, scan_rowIdx, string] */
/* 095 */       UTF8String scan_value = scan_col0.getUTF8String(scan_rowIdx);
/* 096 */       /* columnVector[scan_col1, scan_rowIdx, int] */
/* 097 */       int scan_value1 = scan_col1.getInt(scan_rowIdx);
/* 098 */       scan_holder.reset();
/* 099 */
/* 100 */       scan_rowWriter.write(0, scan_value);
/* 101 */
/* 102 */       scan_rowWriter.write(1, scan_value1);
/* 103 */       scan_result.setTotalSize(scan_holder.totalSize());
/* 104 */       append(scan_result);
/* 105 */     }
```
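The failure mode kiszk describes — reusing the already-incremented cursor for the second column — can be reproduced in miniature. A hypothetical sketch (not Spark code), with two "columns" read through one shared cursor the way the generated scan loop does:

```java
// Hypothetical sketch (not Spark code): why the cursor's pre-increment value
// must be captured once per row when several columns share one cursor.
public class PostIncrementExample {
    static String concatRows(String[] names, int[] ages) {
        StringBuilder out = new StringBuilder();
        int idx = 0;
        while (idx < names.length) {
            // Capture the pre-increment value once, then use it for *both* columns.
            int rowIdx = idx++;
            // Using `idx` directly for the second column would read the *next* row
            // (or run past the end), because `idx` has already advanced.
            out.append(names[rowIdx]).append(ages[rowIdx]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(concatRows(new String[]{"a", "b", "c"},
                                      new int[]{1, 2, 3})); // prints a1b2c3
    }
}
```

With `out.append(names[rowIdx]).append(ages[idx])` instead, the second column would be read from the wrong row on every iteration and throw `ArrayIndexOutOfBoundsException` on the last one.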
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user nongli commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-199318644 Have you rerun the benchmark with these changes?
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-199313491 **[Test build #53678 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53678/consoleFull)** for PR 11636 at commit [`9ec61ce`](https://github.com/apache/spark/commit/9ec61ceb0eb5b55b558e1cf4475735d4cd9d4008).
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-199251039

**[Test build #53674 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53674/consoleFull)** for PR 11636 at commit [`fb693d2`](https://github.com/apache/spark/commit/fb693d2168ca397de400b3082f5f640d5e3bae90).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-199251045 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53674/ Test FAILed.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-199251041 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-199249367 **[Test build #53674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53674/consoleFull)** for PR 11636 at commit [`fb693d2`](https://github.com/apache/spark/commit/fb693d2168ca397de400b3082f5f640d5e3bae90).
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56782696

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
```diff
@@ -217,19 +241,29 @@ private[sql] case class DataSourceScan(
     // TODO: The abstractions between this class and SqlNewHadoopRDD makes it difficult to know
     // here which path to use. Fix this.
-    ctx.INPUT_ROW = row
+    // These assignments avoids accesses to instance variables in a while-loop
+    // Since they are loop invariant variables (LIVs), host them outside the while-loop
+    val colLocalVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
+    val columnLIVAssigns = (colLocalVars zip colVars).map { case (localName, name) =>
+      s"$columnVectorClz $localName = $name;" }
+
     ctx.currentVars = null
-    val columns1 = exprs.map(_.gen(ctx))
+    val columns1 = (output zip colLocalVars).map { case (attr, colVar) =>
+      genCodeColumnVector(ctx, colVar, rowidx, attr.dataType, attr.nullable) }
     val scanBatches = ctx.freshName("processBatches")
     ctx.addNewFunction(scanBatches,
       s"""
       | private void $scanBatches() throws java.io.IOException {
       |  while (true) {
       |     int numRows = $batch.numRows();
-      |     if ($idx == 0) $numOutputRows.add(numRows);
+      |     if ($idx == 0) {
+      |       ${columnAssigns.mkString("", "\n", "\n")}
+      |       $numOutputRows.add(numRows);
+      |     }
       |
+      |     ${columnLIVAssigns.mkString("", "\n", "\n")}
       |     while (!shouldStop() && $idx < numRows) {
-      |       InternalRow $row = $batch.getRow($idx++);
+      |       int $rowidx = $idx++;
```
--- End diff --

nit: ```rowidx``` seems to be the same as ```idx```. Do we need it?
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198082685 **[Test build #53468 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53468/consoleFull)** for PR 11636 at commit [`5efadf3`](https://github.com/apache/spark/commit/5efadf3f159b40a04621832facbccb99cb4b2c5c).
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198084216

**[Test build #53468 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53468/consoleFull)** for PR 11636 at commit [`5efadf3`](https://github.com/apache/spark/commit/5efadf3f159b40a04621832facbccb99cb4b2c5c).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class InputReference(ordinal: Int, dataType: DataType, nullable: Boolean, isColumn: Boolean)`
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198598006 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56575717

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
```diff
@@ -199,7 +210,8 @@ class CodegenContext {
       case StringType => s"$input.getUTF8String($ordinal)"
       case BinaryType => s"$input.getBinary($ordinal)"
       case CalendarIntervalType => s"$input.getInterval($ordinal)"
-      case t: StructType => s"$input.getStruct($ordinal, ${t.size})"
+      case t: StructType => if (!isColumnarType(input)) { s"$input.getStruct($ordinal, ${t.size})" }
```
--- End diff --

Sure, I provided the same API ```ColumnVector.getStruct(int, int)```.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56567171

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
```diff
@@ -199,7 +210,8 @@ class CodegenContext {
       case StringType => s"$input.getUTF8String($ordinal)"
       case BinaryType => s"$input.getBinary($ordinal)"
       case CalendarIntervalType => s"$input.getInterval($ordinal)"
-      case t: StructType => s"$input.getStruct($ordinal, ${t.size})"
+      case t: StructType => if (!isColumnarType(input)) { s"$input.getStruct($ordinal, ${t.size})" }
```
--- End diff --

While this ```getStruct()``` for ```ColumnVector``` may not be called at runtime now, the code generator always produces two versions, for ```InternalRow``` and for ```ColumnVector```. Thus, this code is necessary to avoid a compilation error for now. In the future, it will also be necessary to correctly handle nested types.
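kiszk's point is that the generated source contains call sites for both the row path and the columnar path, so the column-oriented class must expose a method with the matching signature even when that path is never taken at runtime; otherwise the generated Java fails to compile. A minimal sketch with hypothetical stand-in classes (not the real Spark API):

```java
// Hypothetical stand-ins for InternalRow / ColumnVector (not the real Spark classes).
class RowLike {
    String getStruct(int ordinal, int numFields) { return "row-struct"; }
}

class ColumnLike {
    // Must exist with the same shape as RowLike.getStruct so that code generated
    // for either path compiles, even if the columnar path is never exercised yet.
    String getStruct(int ordinal, int numFields) { return "col-struct"; }
}

public class BothPathsCompile {
    // Generated code carries both call sites; only one runs for a given plan,
    // but javac/Janino must be able to resolve both.
    static String readRow(RowLike r)    { return r.getStruct(0, 2); }
    static String readCol(ColumnLike c) { return c.getStruct(0, 2); }

    public static void main(String[] args) {
        System.out.println(readRow(new RowLike()));    // prints row-struct
        System.out.println(readCol(new ColumnLike())); // prints col-struct
    }
}
```

Removing `ColumnLike.getStruct` would break compilation of `readCol` even if only `readRow` ever executed, which mirrors why the `ColumnVector` branch above is needed.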
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-197460528 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198079212 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53466/ Test FAILed.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198085861 **[Test build #53470 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53470/consoleFull)** for PR 11636 at commit [`8c9d054`](https://github.com/apache/spark/commit/8c9d054c30c43b256d1f0d054dc82f8dc6e1e9af).
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56453365

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala

```diff
@@ -60,17 +60,28 @@ case class BoundReference(ordinal: Int, dataType: DataType, nullable: Boolean)
   override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
     val javaType = ctx.javaType(dataType)
-    val value = ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
+    val value = if (!ctx.isColumnarType(ctx.INPUT_ROW)) {
+      ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
+    } else {
+      ctx.getValue(ctx.INPUT_ROW, dataType, ctx.INPUT_COLORDINAL)
```

Comment: Would it be good to introduce a new `InputReference`, similar to `BoundReference`, solely for the code that accesses `ColumnarBatch`?
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198198837 **[Test build #53500 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53500/consoleFull)** for PR 11636 at commit [`e08472b`](https://github.com/apache/spark/commit/e08472b2673b49f8807e2949431f35bec83b511e).
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-197410160 **[Test build #53324 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53324/consoleFull)** for PR 11636 at commit [`c522a68`](https://github.com/apache/spark/commit/c522a6841d06649ff4d84f72344407b4e2028b8a).
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198109663 Merged build finished. Test FAILed.
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56567647

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

```diff
@@ -199,7 +210,8 @@ class CodegenContext {
       case StringType => s"$input.getUTF8String($ordinal)"
       case BinaryType => s"$input.getBinary($ordinal)"
       case CalendarIntervalType => s"$input.getInterval($ordinal)"
-      case t: StructType => s"$input.getStruct($ordinal, ${t.size})"
+      case t: StructType => if (!isColumnarType(input)) { s"$input.getStruct($ordinal, ${t.size})" }
```

Comment: Why not make them have the same APIs?
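The review above concerns `CodegenContext.getValue`, which maps a Spark SQL data type to the accessor call emitted into generated Java code. The following is a condensed, self-contained sketch of that dispatch pattern (toy `DType` hierarchy and `GetValueDemo` are illustrative stand-ins, not the real Spark classes):

```scala
// Toy model of the data-type hierarchy; Spark's real one lives in
// org.apache.spark.sql.types and is much larger.
sealed trait DType
case object IntT extends DType
case object StringT extends DType
case class StructT(size: Int) extends DType

object GetValueDemo {
  // Mirror of the getValue dispatch in the diff above: given the name of the
  // input row/batch variable, a data type, and an ordinal expression, return
  // the Java accessor call as a code string.
  def getValue(input: String, dt: DType, ordinal: String): String = dt match {
    case IntT       => s"$input.getInt($ordinal)"
    case StringT    => s"$input.getUTF8String($ordinal)"
    case StructT(n) => s"$input.getStruct($ordinal, $n)"
  }

  def main(args: Array[String]): Unit = {
    // Struct accessors also need the field count, hence the special case
    // discussed in the review for columnar inputs.
    assert(getValue("i", StructT(3), "0") == "i.getStruct(0, 3)")
    println(getValue("batch", IntT, "rowIdx"))
  }
}
```

The dispatch returns code *strings*, not values, which is why the reviewers care that a row input and a columnar input expose the same accessor API: identical APIs let one dispatch serve both without per-type branching.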
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198597887 **[Test build #53584 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53584/consoleFull)** for PR 11636 at commit [`cdd3078`](https://github.com/apache/spark/commit/cdd3078c4a252ab8701cd1bfb08e911f3878db65).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198585189 **[Test build #53584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53584/consoleFull)** for PR 11636 at commit [`cdd3078`](https://github.com/apache/spark/commit/cdd3078c4a252ab8701cd1bfb08e911f3878db65).
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198372505 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53537/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198079203 **[Test build #53466 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53466/consoleFull)** for PR 11636 at commit [`5544c96`](https://github.com/apache/spark/commit/5544c96de899f15813c03f3533ee55c0dede3c64).
* This patch **fails R style tests**.
* This patch **does not merge cleanly**.
* This patch adds the following public classes _(experimental)_:
  * `case class InputReference(ordinal: Int, dataType: DataType, nullable: Boolean, isColumn: Boolean)`
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198084229 Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198372017 **[Test build #53537 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53537/consoleFull)** for PR 11636 at commit [`a7ac8fb`](https://github.com/apache/spark/commit/a7ac8fbd1e2662df49293fff62d596b10690b0c0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198079210 Build finished. Test FAILed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198598009 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53584/
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198218039 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53500/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-197460112 **[Test build #53324 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53324/consoleFull)** for PR 11636 at commit [`c522a68`](https://github.com/apache/spark/commit/c522a6841d06649ff4d84f72344407b4e2028b8a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56691811

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala

```diff
@@ -199,15 +199,20 @@ private[sql] case class DataSourceScan(
   // never requires UnsafeRow as input.
   override protected def doProduce(ctx: CodegenContext): String = {
     val columnarBatchClz = "org.apache.spark.sql.execution.vectorized.ColumnarBatch"
+    val columnVectorClz = "org.apache.spark.sql.execution.vectorized.ColumnVector"
     val input = ctx.freshName("input")
     val idx = ctx.freshName("batchIdx")
+    val rowidx = ctx.freshName("rowIdx")
     val batch = ctx.freshName("batch")
     // PhysicalRDD always just has one input
     ctx.addMutableState("scala.collection.Iterator", input, s"$input = inputs[0];")
     ctx.addMutableState(columnarBatchClz, batch, s"$batch = null;")
     ctx.addMutableState("int", idx, s"$idx = 0;")
-    val exprs = output.zipWithIndex.map(x => new BoundReference(x._2, x._1.dataType, true))
+    val exprCols = output.zipWithIndex.map(
```

Comment: Move this to where it's used.
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56540019

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

```diff
@@ -199,7 +210,8 @@ class CodegenContext {
       case StringType => s"$input.getUTF8String($ordinal)"
       case BinaryType => s"$input.getBinary($ordinal)"
       case CalendarIntervalType => s"$input.getInterval($ordinal)"
-      case t: StructType => s"$input.getStruct($ordinal, ${t.size})"
+      case t: StructType => if (!isColumnarType(input)) { s"$input.getStruct($ordinal, ${t.size})" }
```

Comment: Right now the parquet reader does not support nested types (Array, Map, Struct), so it's fine to not have this special case in this PR.
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56692635

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala

```diff
@@ -104,3 +104,39 @@ object BindReferences extends Logging {
   }.asInstanceOf[A] // Kind of a hack, but safe. TODO: Tighten return type when possible.
 }
+
+case class InputReference(
```

Comment: Add a doc string here.
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56692854

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala

```diff
+case class InputReference(
+    columnOrdinal: Int, dataType: DataType, nullable: Boolean, rowOrdinal: String = "")
+  extends LeafExpression {
+
+  private val isColumnarStorage: Boolean = rowOrdinal != ""
```

Comment: This one should only be used with a columnar batch, so we could simplify this a lot.
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56691971

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala

```diff
@@ -217,9 +222,14 @@ private[sql] case class DataSourceScan(
     // TODO: The abstractions between this class and SqlNewHadoopRDD makes it difficult to know
     // here which path to use. Fix this.
-    ctx.INPUT_ROW = row
     ctx.currentVars = null
-    val columns1 = exprs.map(_.gen(ctx))
+    val colVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
+    val colDeclarations = colVars.zipWithIndex.map(x => {
```

Comment: It's better written as:

```scala
colVars.zipWithIndex.map { case (name, i) =>
  ...
}
```
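The suggestion above swaps tuple accessors (`x._1`, `x._2`) for a pattern-matching function literal. A minimal sketch of the two equivalent styles (the `ColumnVector`/`batch` names in the emitted strings are just illustrative placeholders):

```scala
object ZipWithIndexStyles {
  def main(args: Array[String]): Unit = {
    val colVars = Seq("col0", "col1", "col2")

    // Tuple-accessor style: works, but x._1 / x._2 say nothing about meaning.
    val decls1 = colVars.zipWithIndex.map(x =>
      s"ColumnVector ${x._1} = batch.column(${x._2});")

    // Pattern-match style from the review: the tuple is destructured into
    // named parts, so each position is self-documenting.
    val decls2 = colVars.zipWithIndex.map { case (name, i) =>
      s"ColumnVector $name = batch.column($i);"
    }

    // Both styles produce identical output.
    assert(decls1 == decls2)
    println(decls2.mkString("\n"))
  }
}
```

The `{ case (name, i) => ... }` form is idiomatic Scala for mapping over a collection of tuples; it costs nothing at runtime and reads far better in code-generation logic that builds strings out of several positional values.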
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198109532 **[Test build #53470 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53470/consoleFull)** for PR 11636 at commit [`8c9d054`](https://github.com/apache/spark/commit/8c9d054c30c43b256d1f0d054dc82f8dc6e1e9af).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56692918

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala

```diff
+case class InputReference(
+    columnOrdinal: Int, dataType: DataType, nullable: Boolean, rowOrdinal: String = "")
+  extends LeafExpression {
+
+  private val isColumnarStorage: Boolean = rowOrdinal != ""
+
+  private val ordinalString = if (isColumnarStorage) { rowOrdinal } else { columnOrdinal.toString }
+
+  override def toString: String = s"input[$columnOrdinal, ${dataType.simpleString}]"
+
+  override def eval(input: InternalRow): Any = {
+    null
+  }
+
+  override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
+    val javaType = ctx.javaType(dataType)
+    val value = ctx.getValue(ctx.INPUT_ROW, dataType, ordinalString)
+    if (ctx.currentVars != null && ctx.currentVars(columnOrdinal) != null) {
```

Comment: We do need this, I think.
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56364323

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

```diff
@@ -158,9 +158,13 @@ class CodegenContext {
   /** The variable name of the input row in generated code. */
   final var INPUT_ROW = "i"

+  /** The variable name of the input col in generated code. */
+  var INPUT_COLORDINAL = "idx"
+
   /**
    * The map from a variable name to it's next ID.
    */
+  private val freshNameJavaTypes = new mutable.HashMap[String, String]
```

Comment: Move these into `DataSourceScan`.
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56404151

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala

```diff
@@ -60,17 +60,28 @@ case class BoundReference(ordinal: Int, dataType: DataType, nullable: Boolean)
   override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
     val javaType = ctx.javaType(dataType)
-    val value = ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
+    val value = if (!ctx.isColumnarType(ctx.INPUT_ROW)) {
+      ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
+    } else {
+      ctx.getValue(ctx.INPUT_ROW, dataType, ctx.INPUT_COLORDINAL)
```

Comment: One possible idea is to provide a `getValue()` function that can be overridden, as follows. In my opinion, the code above is not easy to read. What do you think?

In `BoundExpression`:

```scala
def getValue(...): String = {
  ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
}

override genCode(...): String = {
  ...
  val value = getValue(...)
  ...
}
```

In `DataSourceScan`, we pass a `BoundExpression` that has its own `getValue()`, as follows:

```scala
...
val exprs = output.zipWithIndex.map(x => new BoundReference(x._2, x._1.dataType, true) {
  def getValue(...): String = {
    val value = if (!ctx.isColumnarType(ctx.INPUT_ROW)) {
      ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
    } else {
      ctx.getValue(ctx.INPUT_ROW, dataType, ctx.INPUT_COLORDINAL)
    }
  }
})
...
```
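The proposal above is a template-method pattern: the base class keeps the default row-access code, and the call site overrides one hook for the columnar case via an anonymous subclass. A minimal, self-contained sketch of the mechanism (toy `Ref` class, not the real `BoundReference`; the emitted strings are illustrative):

```scala
// Base class: genCode is fixed, getValue is the overridable hook that decides
// how the value is fetched in the generated Java code.
abstract class Ref(val ordinal: Int) {
  def getValue(input: String): String = s"$input.getInt($ordinal)"
  def genCode(input: String): String = s"int v = ${getValue(input)};"
}

object OverrideDemo {
  def main(args: Array[String]): Unit = {
    // Default behavior: index the input row by the bound ordinal.
    val rowRef = new Ref(2) {}

    // Columnar variant, overridden at the call site: read from a per-column
    // variable and index by the current row position instead.
    val colRef = new Ref(2) {
      override def getValue(input: String): String = "col2.getInt(rowIdx)"
    }

    println(rowRef.genCode("i"))  // prints "int v = i.getInt(2);"
    println(colRef.genCode("i"))  // prints "int v = col2.getInt(rowIdx);"
  }
}
```

The appeal is that the shared `genCode` skeleton stays in one place and only the access expression varies; the PR ultimately went a different way (a separate `InputReference` expression), which trades this anonymous-subclass indirection for an explicit class.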
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198217769 **[Test build #53500 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53500/consoleFull)** for PR 11636 at commit [`e08472b`](https://github.com/apache/spark/commit/e08472b2673b49f8807e2949431f35bec83b511e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class InputReference(`
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198671636 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198671637 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53611/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198671563 **[Test build #53611 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53611/consoleFull)** for PR 11636 at commit [`cdf4333`](https://github.com/apache/spark/commit/cdf433320a9d552a2cb776413bce5d1ba4e9a4b4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
Github user kiszk commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-197378611 @nongli, is it possible to rename the API from `ColumnVector.getIsNull()` to `ColumnVector.isNullAt()`? If so, we could remove this PR's change at [lines 74-84](https://github.com/apache/spark/pull/11636/files#diff-7897c332d274bad365886139e23ff9caR74) in `BoundAttribute.scala`.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198109665 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53470/
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56724708

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala

```diff
@@ -217,9 +222,14 @@ private[sql] case class DataSourceScan(
     // TODO: The abstractions between this class and SqlNewHadoopRDD makes it difficult to know
     // here which path to use. Fix this.
-    ctx.INPUT_ROW = row
     ctx.currentVars = null
-    val columns1 = exprs.map(_.gen(ctx))
+    val colVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
+    val colDeclarations = colVars.zipWithIndex.map(x => {
+      (columnVectorClz + " " + x._1 + " = " + batch + ".column(" + x._2 + ");\n").trim} )
```

Comment: @davies, this comment makes sense. Since the class [`ColumnarBatch`](https://github.com/apache/spark/blob/f96997ba244a14c26e85a2475415a762d0c0d0a8/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java#L47) is declared final, I expect a call to `batch.column()` to be inlined into a direct `batch.columns[]` access by the Java JIT compiler. Or would it be better to access `batch.columns[]` explicitly by making `ColumnarBatch.columns[]` public?
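The inlining question above weighs a trivial accessor on a final class against exposing the backing array directly. A toy stand-in (not Spark's `ColumnarBatch`) showing the accessor shape under discussion; the claim that the JVM JIT inlines such monomorphic final accessors is standard HotSpot behavior, so the accessor keeps encapsulation at no runtime cost:

```scala
// Minimal model of a batch holding per-column storage behind an accessor.
// The class and method are effectively final, so a JIT can devirtualize and
// inline column(i) down to a plain array load.
final class MiniBatch(columns: Array[String]) {
  def column(ordinal: Int): String = columns(ordinal) // trivially inlinable accessor
}

object MiniBatchDemo {
  def main(args: Array[String]): Unit = {
    val batch = new MiniBatch(Array("col0", "col1"))
    // Generated code would emit calls like batch.column(0), batch.column(1), ...
    assert(batch.column(1) == "col1")
    println(batch.column(0)) // prints "col0"
  }
}
```

Keeping the field private and going through `column(i)` also leaves room to add bounds checks or lazy materialization later without touching every generated-code call site, which is the usual argument against making the array public.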
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198077981
**[Test build #53466 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53466/consoleFull)** for PR 11636 at commit [`5544c96`](https://github.com/apache/spark/commit/5544c96de899f15813c03f3533ee55c0dede3c64).
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56736654
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -199,15 +199,20 @@ private[sql] case class DataSourceScan(
   // never requires UnsafeRow as input.
   override protected def doProduce(ctx: CodegenContext): String = {
     val columnarBatchClz = "org.apache.spark.sql.execution.vectorized.ColumnarBatch"
+    val columnVectorClz = "org.apache.spark.sql.execution.vectorized.ColumnVector"
     val input = ctx.freshName("input")
     val idx = ctx.freshName("batchIdx")
+    val rowidx = ctx.freshName("rowIdx")
     val batch = ctx.freshName("batch")
     // PhysicalRDD always just has one input
     ctx.addMutableState("scala.collection.Iterator", input, s"$input = inputs[0];")
     ctx.addMutableState(columnarBatchClz, batch, s"$batch = null;")
     ctx.addMutableState("int", idx, s"$idx = 0;")
-    val exprs = output.zipWithIndex.map(x => new BoundReference(x._2, x._1.dataType, true))
+    val exprCols = output.zipWithIndex.map(
--- End diff --
Moved it to where it is used.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56691784
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -199,15 +199,20 @@ private[sql] case class DataSourceScan(
   // never requires UnsafeRow as input.
   override protected def doProduce(ctx: CodegenContext): String = {
     val columnarBatchClz = "org.apache.spark.sql.execution.vectorized.ColumnarBatch"
+    val columnVectorClz = "org.apache.spark.sql.execution.vectorized.ColumnVector"
     val input = ctx.freshName("input")
     val idx = ctx.freshName("batchIdx")
+    val rowidx = ctx.freshName("rowIdx")
     val batch = ctx.freshName("batch")
     // PhysicalRDD always just has one input
     ctx.addMutableState("scala.collection.Iterator", input, s"$input = inputs[0];")
     ctx.addMutableState(columnarBatchClz, batch, s"$batch = null;")
     ctx.addMutableState("int", idx, s"$idx = 0;")
-    val exprs = output.zipWithIndex.map(x => new BoundReference(x._2, x._1.dataType, true))
+    val exprCols = output.zipWithIndex.map(
+      x => new InputReference(x._2, x._1.dataType, x._1.nullable, rowidx))
+    val exprRows = output.zipWithIndex.map(
+      x => new InputReference(x._2, x._1.dataType, x._1.nullable))
--- End diff --
Should we use BoundReference here?
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56408960
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala ---
@@ -60,17 +60,28 @@ case class BoundReference(ordinal: Int, dataType: DataType, nullable: Boolean)
   override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
     val javaType = ctx.javaType(dataType)
-    val value = ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
+    val value = if (!ctx.isColumnarType(ctx.INPUT_ROW)) {
+      ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
+    } else {
+      ctx.getValue(ctx.INPUT_ROW, dataType, ctx.INPUT_COLORDINAL)
--- End diff --
In DataSourceScan, we do NOT need BoundReference to generate the code that accesses ColumnarBatch; we can generate the code directly.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56482715
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -158,9 +158,13 @@ class CodegenContext {
   /** The variable name of the input row in generated code. */
   final var INPUT_ROW = "i"
+
+  /** The variable name of the input col in generated code. */
+  var INPUT_COLORDINAL = "idx"
+
   /**
    * The map from a variable name to it's next ID.
    */
+  private val freshNameJavaTypes = new mutable.HashMap[String, String]
--- End diff --
It is also used in ```CodeGenerator.getValue()```, which is called from classes other than ```DataSourceScan```. Can we therefore keep this in ```CodeGenerator.scala```?
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198218038
Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56364150
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala ---
@@ -60,17 +60,28 @@ case class BoundReference(ordinal: Int, dataType: DataType, nullable: Boolean)
   override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
     val javaType = ctx.javaType(dataType)
-    val value = ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
+    val value = if (!ctx.isColumnarType(ctx.INPUT_ROW)) {
+      ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
+    } else {
+      ctx.getValue(ctx.INPUT_ROW, dataType, ctx.INPUT_COLORDINAL)
--- End diff --
Can we move these into DataSourceScan? (They are only used there.)
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198327451
**[Test build #53537 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53537/consoleFull)** for PR 11636 at commit [`a7ac8fb`](https://github.com/apache/spark/commit/a7ac8fbd1e2662df49293fff62d596b10690b0c0).
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56725827
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,9 +222,14 @@ private[sql] case class DataSourceScan(
     // TODO: The abstractions between this class and SqlNewHadoopRDD makes it difficult to know
     // here which path to use. Fix this.
-    ctx.INPUT_ROW = row
     ctx.currentVars = null
-    val columns1 = exprs.map(_.gen(ctx))
+    val colVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
+    val colDeclarations = colVars.zipWithIndex.map(x => {
+      (columnVectorClz + " " + x._1 + " = " + batch + ".column(" + x._2 + ");\n").trim} )
--- End diff --
I'm not sure what the impact on performance is; since you are optimizing these for performance, it would be better to figure that out. If it's trivial, we may not need this variable at all and can just use `batch.column(ordinal).getInt(idx)` in InputReference.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56692563
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,9 +222,14 @@ private[sql] case class DataSourceScan(
     // TODO: The abstractions between this class and SqlNewHadoopRDD makes it difficult to know
     // here which path to use. Fix this.
-    ctx.INPUT_ROW = row
     ctx.currentVars = null
-    val columns1 = exprs.map(_.gen(ctx))
+    val colVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
+    val colDeclarations = colVars.zipWithIndex.map(x => {
+      (columnVectorClz + " " + x._1 + " = " + batch + ".column(" + x._2 + ");\n").trim} )
--- End diff --
Can we make the columns class members, so that we don't need to call `batch.column()` for every row?
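The two code shapes under discussion can be written out by hand as a small self-contained sketch (class and method names are hypothetical stand-ins, not Spark's generated code): one variant re-fetches the column vector on every row, the other hoists it once per batch.

```java
// Toy column vector and batch (hypothetical names).
final class ColVec2 {
    private final int[] data;
    ColVec2(int[] data) { this.data = data; }
    int getInt(int rowId) { return data[rowId]; }
}

final class Batch2 {
    private final ColVec2[] cols;
    Batch2(ColVec2[] cols) { this.cols = cols; }
    ColVec2 column(int ordinal) { return cols[ordinal]; }
}

public class LoopShapes {
    // Per-row lookup: batch.column(0) is re-evaluated on every iteration.
    public static long sumPerRow(Batch2 batch, int numRows) {
        long sum = 0;
        for (int rowIdx = 0; rowIdx < numRows; rowIdx++) {
            sum += batch.column(0).getInt(rowIdx);
        }
        return sum;
    }

    // Hoisted lookup: the column vector becomes a loop-invariant local.
    public static long sumHoisted(Batch2 batch, int numRows) {
        ColVec2 col0 = batch.column(0); // fetched once per batch
        long sum = 0;
        for (int rowIdx = 0; rowIdx < numRows; rowIdx++) {
            sum += col0.getInt(rowIdx);
        }
        return sum;
    }
}
```

Both variants produce the same result; the review is about whether the hoisted form measurably helps once the JIT has done its own loop-invariant code motion.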
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56736677
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,9 +222,14 @@ private[sql] case class DataSourceScan(
     // TODO: The abstractions between this class and SqlNewHadoopRDD makes it difficult to know
     // here which path to use. Fix this.
-    ctx.INPUT_ROW = row
     ctx.currentVars = null
-    val columns1 = exprs.map(_.gen(ctx))
+    val colVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
+    val colDeclarations = colVars.zipWithIndex.map(x => {
--- End diff --
Rewrote it; thank you for the suggestion.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56736640
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -199,15 +199,20 @@ private[sql] case class DataSourceScan(
   // never requires UnsafeRow as input.
   override protected def doProduce(ctx: CodegenContext): String = {
     val columnarBatchClz = "org.apache.spark.sql.execution.vectorized.ColumnarBatch"
+    val columnVectorClz = "org.apache.spark.sql.execution.vectorized.ColumnVector"
     val input = ctx.freshName("input")
     val idx = ctx.freshName("batchIdx")
+    val rowidx = ctx.freshName("rowIdx")
    val batch = ctx.freshName("batch")
     // PhysicalRDD always just has one input
     ctx.addMutableState("scala.collection.Iterator", input, s"$input = inputs[0];")
     ctx.addMutableState(columnarBatchClz, batch, s"$batch = null;")
     ctx.addMutableState("int", idx, s"$idx = 0;")
-    val exprs = output.zipWithIndex.map(x => new BoundReference(x._2, x._1.dataType, true))
+    val exprCols = output.zipWithIndex.map(
+      x => new InputReference(x._2, x._1.dataType, x._1.nullable, rowidx))
+    val exprRows = output.zipWithIndex.map(
+      x => new InputReference(x._2, x._1.dataType, x._1.nullable))
--- End diff --
Used BoundReference for exprRows.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56692279
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,9 +222,14 @@ private[sql] case class DataSourceScan(
     // TODO: The abstractions between this class and SqlNewHadoopRDD makes it difficult to know
     // here which path to use. Fix this.
-    ctx.INPUT_ROW = row
     ctx.currentVars = null
-    val columns1 = exprs.map(_.gen(ctx))
+    val colVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
+    val colDeclarations = colVars.zipWithIndex.map(x => {
+      (columnVectorClz + " " + x._1 + " = " + batch + ".column(" + x._2 + ");\n").trim} )
--- End diff --
Please follow the style we use elsewhere for generated code:
```
s"xxs $x = x;"
```
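The style point above is about building each declaration as one template string rather than concatenating fragments. A minimal sketch of producing the same per-column declaration string (the class, variable, and batch names are hypothetical inputs, not taken from Spark):

```java
public class DeclGen {
    // Build one "ColumnVector colN = batch.column(N);" declaration line,
    // mirroring what the Scala s"..." interpolation produces in one template.
    public static String declaration(String columnVectorClz, String batchVar,
                                     String colVar, int ordinal) {
        return String.format("%s %s = %s.column(%d);",
                             columnVectorClz, colVar, batchVar, ordinal);
    }
}
```

For example, `declaration("ColumnVector", "batch", "col0", 0)` yields `ColumnVector col0 = batch.column(0);`, the same text the reviewed concatenation builds piece by piece.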
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198652882
**[Test build #53611 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53611/consoleFull)** for PR 11636 at commit [`cdf4333`](https://github.com/apache/spark/commit/cdf433320a9d552a2cb776413bce5d1ba4e9a4b4).
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user kiszk commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198652819
@davies I am glad to work on what the community wants.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56745689
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,9 +222,16 @@ private[sql] case class DataSourceScan(
     // TODO: The abstractions between this class and SqlNewHadoopRDD makes it difficult to know
     // here which path to use. Fix this.
-    ctx.INPUT_ROW = row
+    // These assignments avoid accesses to instance variables in the while-loop.
+    // Since they are loop-invariant variables (LIVs), hoist them outside the while-loop.
+    val colLocalVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
--- End diff --
Used ```output.indices```.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56745695
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -228,8 +240,9 @@ private[sql] case class DataSourceScan(
       | int numRows = $batch.numRows();
       | if ($idx == 0) $numOutputRows.add(numRows);
--- End diff --
Good catch.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-197460537
Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53324/
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56335579
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -186,6 +190,13 @@ class CodegenContext {
       freshNameIds += fullName -> 1
       fullName
     }
+    freshNameJavaTypes.put(freshname, javaType)
+    freshname
+  }
+
+  def isColumnarType(name: String): Boolean = {
--- End diff --
The bottom line is that we need to maintain this for each variable. For example, in ```scanBatches()``` we need to access an ```InternalRow``` for a column of ```struct``` type (e.g. ```ColumnVector.getStruct()``` is called). In my first prototype I set ```true``` in ```ctx.columnarInput```, but that could not support such a complex case. I will replace the lookup by name with a flag set in ```freshName()``` to avoid brittleness.
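The "flag set in freshName()" idea above can be sketched as a small stand-alone table (all names hypothetical, not Spark's `CodegenContext`): each generated name gets a per-prefix counter, and whether the variable is columnar is recorded once at creation time instead of being looked up by name later.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal sketch of a fresh-name generator that tags columnar variables
// at creation time, as proposed in the review comment.
public class NameTable {
    private final Map<String, Integer> ids = new HashMap<>();
    private final Set<String> columnarNames = new HashSet<>();

    // Returns prefix0, prefix1, ... and records the columnar flag once.
    public String freshName(String prefix, boolean columnar) {
        int id = ids.getOrDefault(prefix, 0);
        ids.put(prefix, id + 1);
        String name = prefix + id;
        if (columnar) {
            columnarNames.add(name); // flag stored at freshName() time
        }
        return name;
    }

    public boolean isColumnar(String name) {
        return columnarNames.contains(name);
    }
}
```

This avoids the brittleness of inferring a variable's kind from its name string after the fact: the answer is fixed at the single point where the name is minted.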
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198084238
Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53468/
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56455795
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala ---
@@ -60,17 +60,28 @@ case class BoundReference(ordinal: Int, dataType: DataType, nullable: Boolean)
   override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
     val javaType = ctx.javaType(dataType)
-    val value = ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
+    val value = if (!ctx.isColumnarType(ctx.INPUT_ROW)) {
+      ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
+    } else {
+      ctx.getValue(ctx.INPUT_ROW, dataType, ctx.INPUT_COLORDINAL)
--- End diff --
Since it's only used in one place, I'd like to narrow down the changes; that's easier to maintain.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user nongli commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-197404951
@kiszk Feel free to change that API.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56744525
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala ---
@@ -104,3 +104,32 @@ object BindReferences extends Logging {
   }.asInstanceOf[A] // Kind of a hack, but safe. TODO: Tighten return type when possible.
 }
+
+/**
+ * A column vector reference points to a specific column for ColumnVector.
+ * columnVar is a variable that keeps ColumnVector, and ordinal is row index in ColumnVector
+ */
+case class ColumnVectorReference(
+    columnVar: String, ordinal: String, dataType: DataType, nullable: Boolean)
+  extends LeafExpression {
+
+  override def toString: String = s"columnVector[$columnVar, $ordinal, ${dataType.simpleString}]"
+
+  override def eval(input: InternalRow): Any = null
+
+  override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
+    val javaType = ctx.javaType(dataType)
+    val value = ctx.getValue(columnVar, dataType, ordinal)
+    if (nullable) {
+      s"""
+        boolean ${ev.isNull} = ${columnVar}.isNullAt($ordinal);
+        $javaType ${ev.value} = ${ev.isNull} ? ${ctx.defaultValue(dataType)} : ($value);
--- End diff --
I think copying these lines into DataSourceScan should be enough.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56744501
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala ---
@@ -104,3 +104,32 @@ object BindReferences extends Logging {
   }.asInstanceOf[A] // Kind of a hack, but safe. TODO: Tighten return type when possible.
 }
+
+/**
+ * A column vector reference points to a specific column for ColumnVector.
+ * columnVar is a variable that keeps ColumnVector, and ordinal is row index in ColumnVector
+ */
+case class ColumnVectorReference(
+    columnVar: String, ordinal: String, dataType: DataType, nullable: Boolean)
+  extends LeafExpression {
+
+  override def toString: String = s"columnVector[$columnVar, $ordinal, ${dataType.simpleString}]"
+
+  override def eval(input: InternalRow): Any = null
+
+  override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
+    val javaType = ctx.javaType(dataType)
+    val value = ctx.getValue(columnVar, dataType, ordinal)
+    if (nullable) {
+      s"""
+        boolean ${ev.isNull} = ${columnVar}.isNullAt($ordinal);
+        $javaType ${ev.value} = ${ev.isNull} ? ${ctx.defaultValue(dataType)} : ($value);
--- End diff --
Let me confirm your suggestion. Is it to move the definition of ```ColumnVectorReference``` to ```ExistingRDD.scala```? Or is it to move ```genCode``` into ```DataSourceScan```, dropping ```ColumnVectorReference``` and copying ```Expression.gen()``` into ```DataSourceScan```?
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198372501
Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56744426
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,9 +222,16 @@ private[sql] case class DataSourceScan(
     // TODO: The abstractions between this class and SqlNewHadoopRDD makes it difficult to know
     // here which path to use. Fix this.
-    ctx.INPUT_ROW = row
+    // These assignments avoid accesses to instance variables in the while-loop.
+    // Since they are loop-invariant variables (LIVs), hoist them outside the while-loop.
+    val colLocalVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
+    val columnLIVAssigns = (colLocalVars zip colVars).map { case (localName, name) =>
--- End diff --
An access to an instance variable in ```colVars``` requires a ```getfield``` bytecode and would involve a memory access. An access to a local variable in ```colLocalVars``` requires an ```aload``` bytecode, which we expect the Java JIT compiler to turn into a register access. This is why we introduced ```colLocalVars```.
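The getfield-vs-aload argument above can be written out as a hand-made toy (class and field names hypothetical): reading an instance field inside the loop emits a `getfield` per iteration, while copying it to a local first emits one `getfield` followed by cheap `aload` accesses that the JIT can keep in a register.

```java
public class LivHoist {
    // Instance state standing in for the generated class's column fields.
    private int[] col = {1, 2, 3, 4};

    // Reads this.col inside the loop: a getfield on every iteration
    // (before any JIT optimization).
    public long sumViaField() {
        long sum = 0;
        for (int i = 0; i < col.length; i++) {
            sum += this.col[i];
        }
        return sum;
    }

    // Copies the field to a loop-invariant local first: one getfield,
    // then aload + array access inside the loop.
    public long sumViaLocal() {
        int[] local = this.col;
        long sum = 0;
        for (int i = 0; i < local.length; i++) {
            sum += local[i];
        }
        return sum;
    }
}
```

The results are identical; the hoisted form simply guarantees at the bytecode level what the JIT may or may not do on its own.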
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56736684
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala ---
@@ -104,3 +104,39 @@ object BindReferences extends Logging {
   }.asInstanceOf[A] // Kind of a hack, but safe. TODO: Tighten return type when possible.
 }
+
+case class InputReference(
--- End diff --
Added a comment.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198632150

@kiszk I think this is pretty close to what we want, thanks for working on it.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56743990

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -228,8 +240,9 @@ private[sql] case class DataSourceScan(
 | int numRows = $batch.numRows();
 | if ($idx == 0) $numOutputRows.add(numRows);
--- End diff --

I think we should put `columnAssigns` here, when `idx` is 0.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56743967

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala ---
@@ -104,3 +104,32 @@ object BindReferences extends Logging {
   }.asInstanceOf[A] // Kind of a hack, but safe. TODO: Tighten return type when possible.
  }
 }
+
+/**
+ * A column vector reference points to a specific column for ColumnVector.
+ * columnVar is a variable that keeps ColumnVector, and ordinal is row index in ColumnVector
+ */
+case class ColumnVectorReference(
+    columnVar: String, ordinal: String, dataType: DataType, nullable: Boolean)
+  extends LeafExpression {
+
+  override def toString: String = s"columnVector[$columnVar, $ordinal, ${dataType.simpleString}]"
+
+  override def eval(input: InternalRow): Any = null
+
+  override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
+    val javaType = ctx.javaType(dataType)
+    val value = ctx.getValue(columnVar, dataType, ordinal)
+    if (nullable) {
+      s"""
+        boolean ${ev.isNull} = ${columnVar}.isNullAt($ordinal);
+        $javaType ${ev.value} = ${ev.isNull} ? ${ctx.defaultValue(dataType)} : ($value);
--- End diff --

Only these lines are useful, why not put these into DataSourceScan?
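For reference, the nullable branch of `genCode` above emits Java shaped like the following sketch. The `IntColumnVector` stand-in and the `-1` default are illustrative assumptions, not Spark's actual `ColumnVector` API or `ctx.defaultValue` output.

```java
// Hypothetical expansion of the generated null-check read; not Spark's real classes.
public class NullableColumnReadDemo {
    // Minimal stand-in for Spark's ColumnVector API.
    static final class IntColumnVector {
        private final int[] values;
        private final boolean[] nulls;
        IntColumnVector(int[] values, boolean[] nulls) {
            this.values = values;
            this.nulls = nulls;
        }
        boolean isNullAt(int rowId) { return nulls[rowId]; }
        int getInt(int rowId) { return values[rowId]; }
    }

    static int readOrDefault(IntColumnVector col0, int rowIdx) {
        // Mirrors the generated pattern:
        //   boolean ${ev.isNull} = ${columnVar}.isNullAt($ordinal);
        //   $javaType ${ev.value} = ${ev.isNull} ? ${ctx.defaultValue(dataType)} : ($value);
        boolean isNull = col0.isNullAt(rowIdx);
        int value = isNull ? -1 : (col0.getInt(rowIdx));  // -1 stands in for the type's default
        return value;
    }
}
```

The null flag and the value are materialized as two plain locals, which is what lets downstream generated expressions consume them without going back through the row abstraction.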
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56743960

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,9 +222,16 @@ private[sql] case class DataSourceScan(
     // TODO: The abstractions between this class and SqlNewHadoopRDD makes it difficult to know
     // here which path to use. Fix this.
-    ctx.INPUT_ROW = row
+    // These assignments avoids accesses to instance variables in a while-loop
+    // Since they are loop invariant variables (LIVs), host them outside the while-loop
+    val colLocalVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
+    val columnLIVAssigns = (colLocalVars zip colVars).map { case (localName, name) =>
--- End diff --

Since we have `colVars`, do we still need `colLocalVars`?
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56743931

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,9 +222,16 @@ private[sql] case class DataSourceScan(
     // TODO: The abstractions between this class and SqlNewHadoopRDD makes it difficult to know
     // here which path to use. Fix this.
-    ctx.INPUT_ROW = row
+    // These assignments avoids accesses to instance variables in a while-loop
+    // Since they are loop invariant variables (LIVs), host them outside the while-loop
+    val colLocalVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
--- End diff --

output.slice
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56730574

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,9 +222,14 @@ private[sql] case class DataSourceScan(
     // TODO: The abstractions between this class and SqlNewHadoopRDD makes it difficult to know
     // here which path to use. Fix this.
-    ctx.INPUT_ROW = row
     ctx.currentVars = null
-    val columns1 = exprs.map(_.gen(ctx))
+    val colVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
+    val colDeclarations = colVars.zipWithIndex.map(x => {
+      (columnVectorClz + " " + x._1 + " = " + batch + ".column(" + x._2 + ");\n").trim} )
--- End diff --

Thank you for your clarification. You mean making `col0` a class member of `GeneratedIterator`. This allows us to update `col0` only when ```batch``` is updated.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56728827

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,9 +222,14 @@ private[sql] case class DataSourceScan(
     // TODO: The abstractions between this class and SqlNewHadoopRDD makes it difficult to know
     // here which path to use. Fix this.
-    ctx.INPUT_ROW = row
     ctx.currentVars = null
-    val columns1 = exprs.map(_.gen(ctx))
+    val colVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
+    val colDeclarations = colVars.zipWithIndex.map(x => {
+      (columnVectorClz + " " + x._1 + " = " + batch + ".column(" + x._2 + ");\n").trim} )
--- End diff --

I see. I misunderstood your comment. You wanted to avoid re-declaring the variables `col0`, `col1`, ... inside the loop. @nongli suggested "Move this to before the while loop (the contract is that the batch object is reused)". I will add a comment explaining this loop-invariant hoisting as a performance optimization.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56729652

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -217,9 +222,14 @@ private[sql] case class DataSourceScan(
     // TODO: The abstractions between this class and SqlNewHadoopRDD makes it difficult to know
     // here which path to use. Fix this.
-    ctx.INPUT_ROW = row
     ctx.currentVars = null
-    val columns1 = exprs.map(_.gen(ctx))
+    val colVars = output.zipWithIndex.map(x => ctx.freshName("col" + x._2))
+    val colDeclarations = colVars.zipWithIndex.map(x => {
+      (columnVectorClz + " " + x._1 + " = " + batch + ".column(" + x._2 + ");\n").trim} )
--- End diff --

To make this clear: I'd suggest making `col0` a class member and updating it only once per batch.
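The structure being suggested, sketched in plain Java: the column variable lives as a class member and is reassigned once per batch, so the per-row loop never re-derives it. The `Batch`/`ColumnVector` classes below are simplified single-column stand-ins for illustration, not Spark's actual classes.

```java
// Hypothetical sketch of per-batch column hoisting; not Spark's generated iterator.
public class PerBatchColumnDemo {
    static final class ColumnVector {
        final int[] data;
        ColumnVector(int[] data) { this.data = data; }
        int getInt(int rowId) { return data[rowId]; }
    }

    static final class Batch {
        private final ColumnVector col;
        Batch(int[] data) { this.col = new ColumnVector(data); }
        ColumnVector column(int ordinal) { return col; }  // ordinal ignored: single-column sketch
        int numRows() { return col.data.length; }
    }

    private ColumnVector col0;  // class member, like the suggested `col0` on GeneratedIterator

    long consume(Batch[] batches) {
        long sum = 0;
        for (Batch batch : batches) {
            col0 = batch.column(0);      // updated only when the batch changes
            int numRows = batch.numRows();
            int idx = 0;
            while (idx < numRows) {      // the row loop reads col0 without calling column()
                sum += col0.getInt(idx++);
            }
        }
        return sum;
    }
}
```

Because each batch object is reused, refreshing `col0` at batch boundaries is sufficient, and the hot row loop contains no per-row `column()` lookups.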
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56736738

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala ---
@@ -104,3 +104,39 @@ object BindReferences extends Logging {
   }.asInstanceOf[A] // Kind of a hack, but safe. TODO: Tighten return type when possible.
  }
 }
+
+case class InputReference(
+    columnOrdinal: Int, dataType: DataType, nullable: Boolean, rowOrdinal: String = "")
+  extends LeafExpression {
+
+  private val isColumnarStorage: Boolean = rowOrdinal != ""
--- End diff --

Simplified this class for ```ColumnVector```.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56189923

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -186,6 +190,13 @@ class CodegenContext {
     freshNameIds += fullName -> 1
     fullName
   }
+    freshNameJavaTypes.put(freshname, javaType)
+    freshname
+  }
+
+  def isColumnarType(name: String): Boolean = {
--- End diff --

Is there a more explicit way to maintain this? For example, in PhysicalRDD when scanBatches() is generated, we could set `isColumnarInput` to true. Looking this up by name is brittle.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56189439

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -170,7 +179,8 @@ private[sql] case class PhysicalRDD(
 | if ($idx == 0) $numOutputRows.add(numRows);
 |
 | while (!shouldStop() && $idx < numRows) {
-| InternalRow $row = $batch.getRow($idx++);
+| int $rowidx = $idx++;
+| ${colDeclarations.mkString("", "\n", "\n")}
--- End diff --

Move this to before the while loop (the contract is that the batch object is reused).
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/11636#discussion_r56189147

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -158,9 +158,13 @@ class CodegenContext {
   /** The variable name of the input row in generated code. */
   final var INPUT_ROW = "i"
+
+  /** The variable name of the input col in generated code. */
+  var INPUT_COLORDINAL = "idx"
--- End diff --

INPUT_COL_ORDINAL