[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Initial window function...
Github user mallman commented on the pull request: https://github.com/apache/spark/pull/3703#issuecomment-83691288 I'm confused. Why was this PR abruptly closed? Was there another active PR for window functions? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Initial window function...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3703#issuecomment-83722409 I think that might have been a mistake and #2953 was supposed to be closed. Since we are unable to close PRs, there is (long story) a process that eventually closes PRs that have a comment like mind closing this pr. That's why it got auto-closed then. That said I don't otherwise know whether this was going to proceed anyway. I don't see other PRs for this JIRA. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Initial window function...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/3703#issuecomment-69768767 Should we close this issue in favor of #3703 ? @marmbrus This is #3703. Did you mean another PR? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Initial window function...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3703#issuecomment-67418117 Should we close this issue in favor of #3703 ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Initial window function...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3703 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Initial window function...
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/3703 [SPARK-1442][SQL][WIP] Initial window function implementation (refactored from #2953) This WIP PR is refactored from PR #2953. Please refer to the original PR description for features implemented and not implemented in this PR. The original PR was a huge one, commenting on each issue could be very time consuming. After offline discussions with @guowei2, I decided to work on a refactoring branch to fix most minor issues first and then start discussion based on this refactored version. Major issues left in this PR are: 1. Window spec is added to aggregation functions with a `var`, which breaks query plan immutability. 2. When used with window specs, common aggregation functions like `COUNT`, `SUM`, `AVG` etc are not translated into Hive aggregation functions rather than Spark SQL builtin implementations. 3. Execution code (`execution.WindowFunction`) can be further simplified. You can merge this pull request into a Git repository by running: $ git pull https://github.com/liancheng/spark window-refact Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3703.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3703 commit 9897413564ff27f0a311cc2cef6322422f3807ab Author: guowei2 guow...@asiainfo.com Date: 2014-10-24T09:55:47Z window function commit 7d7a703d5e7bf37e00d074cb8c04e2150f8fbeb4 Author: guowei2 guow...@asiainfo.com Date: 2014-10-27T05:29:35Z window function commit 1999e07a23c18808738e4e3b14b64c1db108eda2 Author: guowei2 guow...@asiainfo.com Date: 2014-10-27T06:03:17Z window function commit 76bfd4b8b1137426b0dbcde5d56cefb0c98cfab5 Author: guowei2 guow...@asiainfo.com Date: 2014-10-27T14:16:22Z window function commit 88c5789d9f6989d0fedcbdd129de097152e2d8eb Author: guowei2 guow...@asiainfo.com Date: 2014-10-28T04:00:57Z window function commit 828199a48c619d03b4ec524dbdfe9c043baa5e14 Author: guowei2 guow...@asiainfo.com Date: 2014-10-29T07:49:01Z fix problems after rebase commit 03bd77d5533f76484d7589e0296283b58f2d0688 Author: guowei2 guow...@asiainfo.com Date: 2014-10-30T10:12:42Z change test suite and golden files commit d06baeba2dc859f860c8fd43c292275837b3e0e6 Author: guowei2 guow...@asiainfo.com Date: 2014-11-05T03:01:33Z add constant objectinspector support for udafs, such as last_value(col, false) commit 173016c08770fd2aa6ee15c3f194c2282bd46e68 Author: guowei2 guow...@asiainfo.com Date: 2014-11-26T06:58:47Z fix window function to support multi-different window partitions commit ab21933e64b3ee7afdcbb622bec935a34fe0785c Author: guowei2 guow...@asiainfo.com Date: 2014-11-27T08:40:26Z fix DecimalType bug after rebase commit 66ef7a6d449f6ec1e644d2e73118d8be1cb56cde Author: guowei2 guow...@asiainfo.com Date: 2014-11-28T09:51:16Z fix bug about attribute reference commit dc87d8d08c33644e61f6355ed07baf720b0e9ef9 Author: Cheng Lian l...@databricks.com Date: 2014-12-04T06:34:32Z WIP: refactoring window functions support commit 2da61753590fe00ecf46219f387d70d48c6dd32a Author: Cheng Lian l...@databricks.com Date: 2014-12-15T06:42:57Z Removed trailing spaces from query string in HiveWindowFunctionSuite commit 922a8b9bfe0278577378c3cd9fc13cb9998b6e0f Author: Cheng Lian l...@databricks.com Date: 2014-12-15T17:18:42Z Fixes COUNT with window spec --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Initial window function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3703#issuecomment-67035848 [Test build #24462 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24462/consoleFull) for PR 3703 at commit [`922a8b9`](https://github.com/apache/spark/commit/922a8b9bfe0278577378c3cd9fc13cb9998b6e0f). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Initial window function...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3703#issuecomment-67035927 Comments from the [review on Reviewable.io](https://reviewable.io:443/reviews/apache/spark/3703) --- Note that instead of whitelisting window function test cases in `HiveCompatibilitySuite`, a new `HiveWindowFunctionSuite` was added. This is because the current Spark SQL HiveQl parser doesn't handle comments, and window function test input files come with Hive contains comment lines. --- !-- Sent from Reviewable.io -- --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Initial window function...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3703#issuecomment-67041844 Comments from the [review on Reviewable.io](https://reviewable.io:443/reviews/apache/spark/3703) --- sup**[sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala, line 30 \[r1\]](https://reviewable.io:443/reviews/apache/spark/3703#-JdE5SkiLKM30za1keNC)** ([raw file](https://github.com/apache/spark/blob/922a8b9bfe0278577378c3cd9fc13cb9998b6e0f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala#L30)):/sup This can be problematic. Ideally every aggregation function that can be used with window should have a `windowSpec: Option[WindowSpec]` field which defaults to `None`, and a `withWindowSpec` method that returns a new instance of the aggregation function object itself with a window spec. --- sup**[sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala, line 874 \[r1\]](https://reviewable.io:443/reviews/apache/spark/3703#-JdE660e137A3Dk7Uqp2)** ([raw file](https://github.com/apache/spark/blob/922a8b9bfe0278577378c3cd9fc13cb9998b6e0f/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala#L874)):/sup The thread-local `windowDefs` map is used to store window definitions (`w1`, `w2` and `w3`) in queries like this: ```sql SELECT p_mfgr, p_name, p_size, SUM(p_size) OVER w1 AS s1, SUM(p_size) OVER w2 AS s2, SUM(p_size) OVER (w3 ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS s3 FROM part WINDOW w1 AS (DISTRIBUTE BY p_mfgr SORT BY p_size RANGE BETWEEN 2 PRECEDING AND 2 FOLLOWING), w2 AS w3, w3 AS (DISTRIBUTE BY p_mfgr SORT BY p_size RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) ``` This map is cleaned and refilled in `collectWindowDefs` below, so it doesn't grow indefinitely. --- sup**[sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala, line 1060 \[r1\]](https://reviewable.io:443/reviews/apache/spark/3703#-JdEA7B6QXd6N0aDAEMh)** ([raw file](https://github.com/apache/spark/blob/922a8b9bfe0278577378c3cd9fc13cb9998b6e0f/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala#L1060)):/sup All builtin aggregation functions need a similar `case` clause to handle their windowed version. Otherwise they all fallback to Hive UDAF implementations. `COUNT` is picked here because its Hive version `GenericUDAFCount` implements `GenericUDAFResolver2` rather than `AbstractGenericUDAFResolver`, and is not handled by `HiveFunctionRegistry.lookupFunction`. --- !-- Sent from Reviewable.io -- --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Initial window function...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3703#issuecomment-67047485 [Test build #24462 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24462/consoleFull) for PR 3703 at commit [`922a8b9`](https://github.com/apache/spark/commit/922a8b9bfe0278577378c3cd9fc13cb9998b6e0f). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class WindowSpec(windowPartition: WindowPartition, windowFrame: Option[WindowFrame])` * `case class WindowPartition(partitionBy: Seq[Expression], sortBy: Seq[SortOrder])` * `case class WindowFrame(frameType: FrameType, preceding: Int, following: Int)` * `abstract class AggregateExpression extends Expression with Serializable ` * `case class WindowFunction(` * `case class WindowFunction(` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Initial window function...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3703#issuecomment-67047490 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24462/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org