[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10977 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10977#issuecomment-176675120 I've merged this.
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10977#issuecomment-176663786 **[Test build #2473 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2473/consoleFull)** for PR 10977 at commit [`ffa8e6b`](https://github.com/apache/spark/commit/ffa8e6b55df95cea1faf0363bd6eb4090cbe5313). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10977#issuecomment-176642840 **[Test build #50359 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50359/consoleFull)** for PR 10977 at commit [`ffa8e6b`](https://github.com/apache/spark/commit/ffa8e6b55df95cea1faf0363bd6eb4090cbe5313). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10977#issuecomment-176643027 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50359/ Test FAILed.
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10977#issuecomment-176643022 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10977#issuecomment-176638543 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50358/ Test FAILed.
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10977#issuecomment-176638542 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10977#issuecomment-176636537 **[Test build #50359 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50359/consoleFull)** for PR 10977 at commit [`ffa8e6b`](https://github.com/apache/spark/commit/ffa8e6b55df95cea1faf0363bd6eb4090cbe5313).
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10977#issuecomment-176636705 **[Test build #2473 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2473/consoleFull)** for PR 10977 at commit [`ffa8e6b`](https://github.com/apache/spark/commit/ffa8e6b55df95cea1faf0363bd6eb4090cbe5313).
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10977#issuecomment-176635918 LGTM
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/10977#issuecomment-176635007 @rxin Added.
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10977#issuecomment-176620545 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50351/ Test FAILed.
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10977#issuecomment-176620544 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10977#issuecomment-176620367 **[Test build #50351 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50351/consoleFull)** for PR 10977 at commit [`951e2cd`](https://github.com/apache/spark/commit/951e2cd8de2c6b2f5b5bd1f5dbdb6b6fad6bc4a4). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10977#issuecomment-176619981 The issue here is that we want test cases that are targeted for specific problems, and the Hive ones are not (they are just a giant blackbox we took to bootstrap coverage).
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10977#issuecomment-176619523 Sure it's a good idea to use that golden file infrastructure. Given we don't have that yet, can you just add a test case?
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/10977#issuecomment-176618967 The way we manage HiveCompatibilitySuite (SQL queries and golden results in text format) is actually better than our unit tests. Even if we don't want to be compatible with Hive, it's still good to have those tests (just don't call them HiveCompatibilitySuite) and to manage them in a similar way.
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10977#issuecomment-176609355 Thanks - can you add a test case that would catch this? In the long run, we don't want to rely on HiveCompatibilitySuite.
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10977#issuecomment-176601896 **[Test build #50351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50351/consoleFull)** for PR 10977 at commit [`951e2cd`](https://github.com/apache/spark/commit/951e2cd8de2c6b2f5b5bd1f5dbdb6b6fad6bc4a4).
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/10977#issuecomment-176601291 When there are no aggregate functions, the output was not generated using resultExpression, which has only literals (I was misled by the comment in AggregateIterator).
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10977#issuecomment-176600324 What's the bug?
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/10977

[SPARK-13031] [SQL] cleanup codegen and improve test coverage

1. Enable whole stage codegen during tests even if only one operator supports it.
2. Split doProduce() into two APIs: upstream() and doProduce().
3. Generate a prefix for the fresh names of each operator.
4. Pass UnsafeRow to the parent directly (avoid getters and creating the UnsafeRow again).
5. Fix bugs and tests.

This PR re-opens #10944 and fixes the bug.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark gen_refactor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10977.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10977

commit b4db00675bc3c51ddf8735cace522a5d771cf7e2
Author: Davies Liu
Date: 2016-01-27T07:43:40Z

    cleanup whole stage codegen

commit 70a7c7edd1988c7dd69bccc8e563c9943775bd2c
Author: Davies Liu
Date: 2016-01-27T23:22:33Z

    improve stddev and variance

commit 951e2cd8de2c6b2f5b5bd1f5dbdb6b6fad6bc4a4
Author: Davies Liu
Date: 2016-01-29T06:24:05Z

    fix aggregation without functions
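To make item 2 of the description concrete, here is a minimal Java sketch of the idea of separating "where the rows come from" from "the source text that processes them". Every type and method name in it (`ProduceSplitSketch`, `Node`, `Scan`, `Filter`, `Root`, `consume`) is hypothetical, and a plain `List<Long>` stands in for `RDD[InternalRow]`; it is a toy model under those assumptions, not Spark's implementation.

```java
import java.util.List;

// Toy model only: all names here are hypothetical stand-ins, not Spark classes.
final class ProduceSplitSketch {

    interface Node {
        List<Long> upstream();          // stand-in for upstream(): RDD[InternalRow]
        String produce(Node parent);    // returns generated source text only
        String consume(String rowVar);  // code this node emits for each row of its child
    }

    // Leaf: owns the input rows and drives the loop.
    static final class Scan implements Node {
        private final List<Long> rows;
        Scan(List<Long> rows) { this.rows = rows; }

        @Override public List<Long> upstream() { return rows; }

        @Override public String produce(Node parent) {
            return "while (input.hasNext()) { long row = input.next(); "
                + parent.consume("row") + " }";
        }

        @Override public String consume(String rowVar) {
            throw new UnsupportedOperationException("a leaf has no child to consume from");
        }
    }

    // Unary operator: forwards upstream() and produce() to its child.
    static final class Filter implements Node {
        private final Node child;
        private final String predicate;  // format string, e.g. "(%s & 1) == 1"
        private Node parent;

        Filter(Node child, String predicate) {
            this.child = child;
            this.predicate = predicate;
        }

        @Override public List<Long> upstream() { return child.upstream(); }

        @Override public String produce(Node parent) {
            this.parent = parent;
            return child.produce(this);
        }

        @Override public String consume(String rowVar) {
            return "if (" + String.format(predicate, rowVar) + ") { "
                + parent.consume(rowVar) + " }";
        }
    }

    // Root of the pipeline: emits the code that hands a row to the caller.
    static final class Root implements Node {
        private final Node child;
        Root(Node child) { this.child = child; }

        @Override public List<Long> upstream() { return child.upstream(); }
        @Override public String produce(Node parent) { return child.produce(this); }
        @Override public String consume(String rowVar) { return "append(" + rowVar + ");"; }
    }

    public static void main(String[] args) {
        Node plan = new Root(new Filter(new Scan(List.of(1L, 2L, 3L)), "(%s & 1) == 1"));
        System.out.println(plan.produce(null));  // the source text
        System.out.println(plan.upstream());     // the input, obtained separately
    }
}
```

In this sketch the caller asks the plan for its input and for its source text in two separate calls, so `produce` no longer has to thread an input handle back up through every operator.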
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10944#discussion_r51193081

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegen.scala ---
@@ -42,10 +44,16 @@ trait CodegenSupport extends SparkPlan {
   private var parent: CodegenSupport = null

   /**
-    * Returns an input RDD of InternalRow and Java source code to process them.
+    * Returns the RDD of InternalRow which generates the input rows.
     */
-  def produce(ctx: CodegenContext, parent: CodegenSupport): (RDD[InternalRow], String) = {
+  def upstream(): RDD[InternalRow]
+
+  /**
+    * Returns Java source code to process the rows from upstream.
+    */
+  def produce(ctx: CodegenContext, parent: CodegenSupport): String = {
     this.parent = parent
+    ctx.freshNamePrefix = nodeName
--- End diff --

Good question. Right now we don't have a unique id for SparkPlan.
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10944
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/10944#issuecomment-176437234 Thanks, merging this into master to unblock others, comments will be addressed by follow up.
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10944#discussion_r51193347

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala ---
@@ -172,38 +184,42 @@ case class Range(
       s"$number > $partitionEnd"
     }

-    val rdd = sqlContext.sparkContext.parallelize(0 until numSlices, numSlices)
-      .map(i => InternalRow(i))
+    ctx.addNewFunction("initRange",
+      s"""
+        | private void initRange(int idx) {
+        |   $BigInt index = $BigInt.valueOf(idx);
+        |   $BigInt numSlice = $BigInt.valueOf(${numSlices}L);
+        |   $BigInt numElement = $BigInt.valueOf(${numElements.toLong}L);
+        |   $BigInt step = $BigInt.valueOf(${step}L);
+        |   $BigInt start = $BigInt.valueOf(${start}L);
+        |
+        |   $BigInt st = index.multiply(numElement).divide(numSlice).multiply(step).add(start);
+        |   if (st.compareTo($BigInt.valueOf(Long.MAX_VALUE)) > 0) {
+        |     $number = Long.MAX_VALUE;
+        |   } else if (st.compareTo($BigInt.valueOf(Long.MIN_VALUE)) < 0) {
+        |     $number = Long.MIN_VALUE;
+        |   } else {
+        |     $number = st.longValue();
+        |   }
+        |
+        |   $BigInt end = index.add($BigInt.ONE).multiply(numElement).divide(numSlice)
+        |     .multiply(step).add(start);
+        |   if (end.compareTo($BigInt.valueOf(Long.MAX_VALUE)) > 0) {
+        |     $partitionEnd = Long.MAX_VALUE;
+        |   } else if (end.compareTo($BigInt.valueOf(Long.MIN_VALUE)) < 0) {
+        |     $partitionEnd = Long.MIN_VALUE;
+        |   } else {
+        |     $partitionEnd = end.longValue();
+        |   }
+        | }
+      """.stripMargin)

-    val code = s"""
+    s"""
       | // initialize Range
       | if (!$initTerm) {
       |   $initTerm = true;
       |   if (input.hasNext()) {
-      |     $BigInt index = $BigInt.valueOf(((InternalRow) input.next()).getInt(0));
-      |     $BigInt numSlice = $BigInt.valueOf(${numSlices}L);
-      |     $BigInt numElement = $BigInt.valueOf(${numElements.toLong}L);
-      |     $BigInt step = $BigInt.valueOf(${step}L);
-      |     $BigInt start = $BigInt.valueOf(${start}L);
-      |
-      |     $BigInt st = index.multiply(numElement).divide(numSlice).multiply(step).add(start);
-      |     if (st.compareTo($BigInt.valueOf(Long.MAX_VALUE)) > 0) {
-      |       $number = Long.MAX_VALUE;
-      |     } else if (st.compareTo($BigInt.valueOf(Long.MIN_VALUE)) < 0) {
-      |       $number = Long.MIN_VALUE;
-      |     } else {
-      |       $number = st.longValue();
-      |     }
-      |
-      |     $BigInt end = index.add($BigInt.ONE).multiply(numElement).divide(numSlice)
-      |       .multiply(step).add(start);
-      |     if (end.compareTo($BigInt.valueOf(Long.MAX_VALUE)) > 0) {
-      |       $partitionEnd = Long.MAX_VALUE;
-      |     } else if (end.compareTo($BigInt.valueOf(Long.MIN_VALUE)) < 0) {
-      |       $partitionEnd = Long.MIN_VALUE;
-      |     } else {
-      |       $partitionEnd = end.longValue();
-      |     }
+      |     initRange(((InternalRow) input.next()).getInt(0));
--- End diff --

This is the easy way to make Range work, or you have to find the partition id.
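The arithmetic that the generated `initRange(int idx)` performs in the diff above can be tried out in plain Java. The sketch below ports that computation under the assumption that `idx` is the partition index; the class and method names (`RangeBoundsSketch`, `clamp`, `bounds`) are hypothetical and exist only for illustration.

```java
import java.math.BigInteger;

// Hypothetical helper that mirrors the arithmetic of the generated initRange.
final class RangeBoundsSketch {

    // Saturate a BigInteger into the long range instead of letting it wrap around.
    static long clamp(BigInteger v) {
        if (v.compareTo(BigInteger.valueOf(Long.MAX_VALUE)) > 0) {
            return Long.MAX_VALUE;
        } else if (v.compareTo(BigInteger.valueOf(Long.MIN_VALUE)) < 0) {
            return Long.MIN_VALUE;
        }
        return v.longValue();
    }

    // Returns {first value, end bound} for partition idx of the range.
    static long[] bounds(int idx, long numSlices, long numElements, long step, long start) {
        BigInteger index = BigInteger.valueOf(idx);
        BigInteger numSlice = BigInteger.valueOf(numSlices);
        BigInteger numElement = BigInteger.valueOf(numElements);
        BigInteger stepBig = BigInteger.valueOf(step);
        BigInteger startBig = BigInteger.valueOf(start);

        BigInteger st = index.multiply(numElement).divide(numSlice)
            .multiply(stepBig).add(startBig);
        BigInteger end = index.add(BigInteger.ONE).multiply(numElement).divide(numSlice)
            .multiply(stepBig).add(startBig);
        return new long[] { clamp(st), clamp(end) };
    }

    public static void main(String[] args) {
        long[] single = bounds(0, 1L, 209715200L, 1L, 0L);
        System.out.println(single[0] + " .. " + single[1]);  // 0 .. 209715200

        long[] second = bounds(1, 4L, 10L, 1L, 0L);
        System.out.println(second[0] + " .. " + second[1]);  // 2 .. 5
    }
}
```

Doing the multiplication in `BigInteger` and clamping afterwards means a very large `numElements` or `step` saturates at `Long.MAX_VALUE` or `Long.MIN_VALUE` rather than silently overflowing.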
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user nongli commented on the pull request: https://github.com/apache/spark/pull/10944#issuecomment-176333057 The generated code has a ton of extra new lines. If this is easy to remove, it will help the debuggability of this. LGTM, feel free to address the comments in follow ups.
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/10944#discussion_r51166689

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegen.scala ---
@@ -42,10 +44,16 @@ trait CodegenSupport extends SparkPlan {
   private var parent: CodegenSupport = null

   /**
-    * Returns an input RDD of InternalRow and Java source code to process them.
+    * Returns the RDD of InternalRow which generates the input rows.
     */
-  def produce(ctx: CodegenContext, parent: CodegenSupport): (RDD[InternalRow], String) = {
+  def upstream(): RDD[InternalRow]
+
+  /**
+    * Returns Java source code to process the rows from upstream.
+    */
+  def produce(ctx: CodegenContext, parent: CodegenSupport): String = {
     this.parent = parent
+    ctx.freshNamePrefix = nodeName
--- End diff --

Do we have a notion of node id? This is not going to help when we have many joins in one pipeline.
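The concern about many joins in one pipeline can be illustrated with a small Java sketch. It assumes that a fresh name is formed as prefix, underscore, base name, and a context-wide counter, which is what identifiers such as `Range_number8` in the generated code quoted in this thread suggest. The class `FreshNameSketch` and its methods are hypothetical, not Spark's `CodegenContext`, and the operator name `BroadcastHashJoin` is only an example value.

```java
// Hypothetical naming context: prefix + "_" + name + running counter.
final class FreshNameSketch {
    private int curId = 0;       // shared by every operator in the pipeline
    private String prefix = "";  // set from the operator's node name

    void setPrefix(String nodeName) {
        this.prefix = nodeName;
    }

    String freshName(String name) {
        String unique = name + curId++;  // the counter keeps names distinct
        return prefix.isEmpty() ? unique : prefix + "_" + unique;
    }

    public static void main(String[] args) {
        FreshNameSketch ctx = new FreshNameSketch();

        ctx.setPrefix("Range");
        System.out.println(ctx.freshName("number"));   // Range_number0

        // Two different join operators that share the same node name.
        ctx.setPrefix("BroadcastHashJoin");
        System.out.println(ctx.freshName("matched"));  // BroadcastHashJoin_matched1
        ctx.setPrefix("BroadcastHashJoin");
        System.out.println(ctx.freshName("matched"));  // BroadcastHashJoin_matched2
    }
}
```

Under these assumptions the counter still keeps the variables distinct, but the prefix alone does not say which of the two joins a variable belongs to; a per-node id in the prefix would be needed for that.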
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/10944#discussion_r51166567

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala ---
@@ -172,38 +184,42 @@ case class Range(
       s"$number > $partitionEnd"
     }

-    val rdd = sqlContext.sparkContext.parallelize(0 until numSlices, numSlices)
-      .map(i => InternalRow(i))
+    ctx.addNewFunction("initRange",
+      s"""
+        | private void initRange(int idx) {
+        |   $BigInt index = $BigInt.valueOf(idx);
+        |   $BigInt numSlice = $BigInt.valueOf(${numSlices}L);
+        |   $BigInt numElement = $BigInt.valueOf(${numElements.toLong}L);
+        |   $BigInt step = $BigInt.valueOf(${step}L);
+        |   $BigInt start = $BigInt.valueOf(${start}L);
+        |
+        |   $BigInt st = index.multiply(numElement).divide(numSlice).multiply(step).add(start);
+        |   if (st.compareTo($BigInt.valueOf(Long.MAX_VALUE)) > 0) {
+        |     $number = Long.MAX_VALUE;
+        |   } else if (st.compareTo($BigInt.valueOf(Long.MIN_VALUE)) < 0) {
+        |     $number = Long.MIN_VALUE;
+        |   } else {
+        |     $number = st.longValue();
+        |   }
+        |
+        |   $BigInt end = index.add($BigInt.ONE).multiply(numElement).divide(numSlice)
+        |     .multiply(step).add(start);
+        |   if (end.compareTo($BigInt.valueOf(Long.MAX_VALUE)) > 0) {
+        |     $partitionEnd = Long.MAX_VALUE;
+        |   } else if (end.compareTo($BigInt.valueOf(Long.MIN_VALUE)) < 0) {
+        |     $partitionEnd = Long.MIN_VALUE;
+        |   } else {
+        |     $partitionEnd = end.longValue();
+        |   }
+        | }
+      """.stripMargin)

-    val code = s"""
+    s"""
       | // initialize Range
       | if (!$initTerm) {
       |   $initTerm = true;
       |   if (input.hasNext()) {
-      |     $BigInt index = $BigInt.valueOf(((InternalRow) input.next()).getInt(0));
-      |     $BigInt numSlice = $BigInt.valueOf(${numSlices}L);
-      |     $BigInt numElement = $BigInt.valueOf(${numElements.toLong}L);
-      |     $BigInt step = $BigInt.valueOf(${step}L);
-      |     $BigInt start = $BigInt.valueOf(${start}L);
-      |
-      |     $BigInt st = index.multiply(numElement).divide(numSlice).multiply(step).add(start);
-      |     if (st.compareTo($BigInt.valueOf(Long.MAX_VALUE)) > 0) {
-      |       $number = Long.MAX_VALUE;
-      |     } else if (st.compareTo($BigInt.valueOf(Long.MIN_VALUE)) < 0) {
-      |       $number = Long.MIN_VALUE;
-      |     } else {
-      |       $number = st.longValue();
-      |     }
-      |
-      |     $BigInt end = index.add($BigInt.ONE).multiply(numElement).divide(numSlice)
-      |       .multiply(step).add(start);
-      |     if (end.compareTo($BigInt.valueOf(Long.MAX_VALUE)) > 0) {
-      |       $partitionEnd = Long.MAX_VALUE;
-      |     } else if (end.compareTo($BigInt.valueOf(Long.MIN_VALUE)) < 0) {
-      |       $partitionEnd = Long.MIN_VALUE;
-      |     } else {
-      |       $partitionEnd = end.longValue();
-      |     }
+      |     initRange(((InternalRow) input.next()).getInt(0));
--- End diff --

Why does this need an input? The range should know it's a leaf and not need this, no?
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/10944#discussion_r51166249

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegen.scala ---
@@ -162,37 +206,48 @@ case class InputAdapter(child: SparkPlan) extends LeafNode with CodegenSupport {
 case class WholeStageCodegen(plan: CodegenSupport, children: Seq[SparkPlan])
   extends SparkPlan with CodegenSupport {
+  override def supportCodegen: Boolean = false
+  override def output: Seq[Attribute] = plan.output
+  override def outputPartitioning: Partitioning = plan.outputPartitioning
+  override def outputOrdering: Seq[SortOrder] = plan.outputOrdering
+
+  override def doPrepare(): Unit = {
+    plan.prepare()
+  }
   override def doExecute(): RDD[InternalRow] = {
     val ctx = new CodegenContext
-    val (rdd, code) = plan.produce(ctx, this)
+    val code = plan.produce(ctx, this)
     val references = ctx.references.toArray
     val source = s"""
       public Object generate(Object[] references) {
--- End diff --

Can you comment on what references means? "references" is a very generic name.
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/10944#issuecomment-176013043 @nongli Does this one look good to you? It is blocking others.
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/10944#issuecomment-175895315

Here is the generated code for `sqlContext.range(values).filter("(id & 1) = 1").count()`

```
public Object generate(Object[] references) {
  return new GeneratedIterator(references);
}

class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {

  private Object[] references;
  private boolean TungstenAggregate_initAgg0;
  private boolean TungstenAggregate_bufIsNull1;
  private long TungstenAggregate_bufValue2;
  private boolean Range_initRange6;
  private long Range_partitionEnd7;
  private long Range_number8;
  private boolean Range_overflow9;
  private UnsafeRow TungstenAggregate_result29;
  private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder TungstenAggregate_holder30;
  private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter TungstenAggregate_rowWriter31;

  private void initRange(int idx) {
    java.math.BigInteger index = java.math.BigInteger.valueOf(idx);
    java.math.BigInteger numSlice = java.math.BigInteger.valueOf(1L);
    java.math.BigInteger numElement = java.math.BigInteger.valueOf(209715200L);
    java.math.BigInteger step = java.math.BigInteger.valueOf(1L);
    java.math.BigInteger start = java.math.BigInteger.valueOf(0L);

    java.math.BigInteger st = index.multiply(numElement).divide(numSlice).multiply(step).add(start);
    if (st.compareTo(java.math.BigInteger.valueOf(Long.MAX_VALUE)) > 0) {
      Range_number8 = Long.MAX_VALUE;
    } else if (st.compareTo(java.math.BigInteger.valueOf(Long.MIN_VALUE)) < 0) {
      Range_number8 = Long.MIN_VALUE;
    } else {
      Range_number8 = st.longValue();
    }

    java.math.BigInteger end = index.add(java.math.BigInteger.ONE).multiply(numElement).divide(numSlice)
      .multiply(step).add(start);
    if (end.compareTo(java.math.BigInteger.valueOf(Long.MAX_VALUE)) > 0) {
      Range_partitionEnd7 = Long.MAX_VALUE;
    } else if (end.compareTo(java.math.BigInteger.valueOf(Long.MIN_VALUE)) < 0) {
      Range_partitionEnd7 = Long.MIN_VALUE;
    } else {
      Range_partitionEnd7 = end.longValue();
    }
  }

  private void TungstenAggregate_doAgg5() {
    // initialize aggregation buffer
    /* 0 */

    TungstenAggregate_bufIsNull1 = false;
    TungstenAggregate_bufValue2 = 0L;

    // initialize Range
    if (!Range_initRange6) {
      Range_initRange6 = true;
      if (input.hasNext()) {
        initRange(((InternalRow) input.next()).getInt(0));
      } else {
        return;
      }
    }

    while (!Range_overflow9 && Range_number8 < Range_partitionEnd7) {
      long Range_value10 = Range_number8;
      Range_number8 += 1L;
      if (Range_number8 < Range_value10 ^ 1L < 0) {
        Range_overflow9 = true;
      }

      /* ((input[0, bigint] & 1) = 1) */
      /* (input[0, bigint] & 1) */
      /* input[0, bigint] */

      /* 1 */

      long Filter_value14 = -1L;
      Filter_value14 = Range_value10 & 1L;
      /* 1 */

      boolean Filter_value12 = false;
      Filter_value12 = Filter_value14 == 1L;
      if (!false && Filter_value12) {

        // do aggregate and update aggregation buffer

        /* (input[0, bigint] + 1) */
        /* input[0, bigint] */

        /* 1 */

        long TungstenAggregate_value22 = -1L;
        TungstenAggregate_value22 = TungstenAggregate_bufValue2 + 1L;
        TungstenAggregate_bufIsNull1 = false;
        TungstenAggregate_bufValue2 = TungstenAggregate_value22;
```
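The loop in the generated code above flags overflow with `Range_number8 < Range_value10 ^ 1L < 0`. In Java, relational operators bind tighter than `^`, so this is a boolean XOR of "the new value is smaller than the old one" and "the step is negative". The sketch below isolates that check. `RangeStep` and `overflowed` are hypothetical names used only for illustration.

```java
// Hypothetical helper isolating the wrap-around check from the generated loop.
final class RangeStep {
    // Returns true when value + step wrapped around the long range.
    static boolean overflowed(long value, long step) {
        long number = value + step; // wraps silently on overflow
        // With a positive step the sum should grow; with a negative step it
        // should shrink. XOR flags the case where it moved the wrong way.
        return (number < value) ^ (step < 0);
    }

    public static void main(String[] args) {
        System.out.println(overflowed(Long.MAX_VALUE, 1L));  // wraps to Long.MIN_VALUE
        System.out.println(overflowed(5L, 1L));              // ordinary increment
    }
}
```

In the generated code the step is the literal `1L`, so `1L < 0` is constant false and the check reduces to `Range_number8 < Range_value10`.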
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10944#issuecomment-175892206 Can you paste some generated code? (Actually I think that's useful for most of the code gen prs).
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/10944#issuecomment-175885463 cc @nongli @rxin
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10944#issuecomment-175512911 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10944#issuecomment-175512913 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50183/
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10944#issuecomment-175512703

**[Test build #50183 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50183/consoleFull)** for PR 10944 at commit [`b4db006`](https://github.com/apache/spark/commit/b4db00675bc3c51ddf8735cace522a5d771cf7e2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10944#issuecomment-175478481 **[Test build #50183 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50183/consoleFull)** for PR 10944 at commit [`b4db006`](https://github.com/apache/spark/commit/b4db00675bc3c51ddf8735cace522a5d771cf7e2).
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/10944

[SPARK-13031] [SQL] cleanup codegen and improve test coverage

1. Enable whole-stage codegen during tests even when only one operator supports it.
2. Split doProduce() into two APIs: upstream() and doProduce().
3. Generate a prefix for the fresh names of each operator.
4. Pass UnsafeRow to the parent directly (avoiding the getters and creating the UnsafeRow again).
5. Fix bugs and tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark gen_refactor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10944.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10944

commit b4db00675bc3c51ddf8735cace522a5d771cf7e2
Author: Davies Liu
Date: 2016-01-27T07:43:40Z

    cleanup whole stage codegen
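The third item in the description, a per-operator prefix for fresh names, shows up in the pasted generated code as identifiers such as `TungstenAggregate_bufValue2` and `Range_number8`. A minimal sketch of such a naming scheme follows, assuming one counter shared across operators and a prefix switched per operator. `FreshNames`, `setPrefix`, and `freshName` here are hypothetical and do not reproduce Spark's implementation.

```java
// Hypothetical name generator: operator prefix + base name + shared counter.
final class FreshNames {
    private int nextId = 0;
    private String prefix = "";

    // Switch the prefix when code generation moves to another operator.
    void setPrefix(String operatorName) {
        prefix = operatorName + "_";
    }

    // Return a name that is unique within this generator.
    String freshName(String base) {
        String name = prefix + base + nextId;
        nextId++;
        return name;
    }

    public static void main(String[] args) {
        FreshNames names = new FreshNames();
        names.setPrefix("TungstenAggregate");
        System.out.println(names.freshName("initAgg"));
        names.setPrefix("Range");
        System.out.println(names.freshName("number"));
    }
}
```

The shared counter keeps names unique even if two operators pick the same base name, while the prefix makes it possible to tell at a glance which operator emitted a given variable.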