[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Initial window function...

2015-03-19 Thread mallman
Github user mallman commented on the pull request:

https://github.com/apache/spark/pull/3703#issuecomment-83691288
  
I'm confused. Why was this PR abruptly closed? Was there another active PR 
for window functions?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Initial window function...

2015-03-19 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/3703#issuecomment-83722409
  
I think that might have been a mistake and #2953 was supposed to be closed. 
Since we are unable to close PRs, there is (long story) a process that 
eventually closes PRs that have a comment like mind closing this pr. That's 
why it got auto-closed then.

That said I don't otherwise know whether this was going to proceed anyway. 
I don't see other PRs for this JIRA.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Initial window function...

2015-01-13 Thread nchammas
Github user nchammas commented on the pull request:

https://github.com/apache/spark/pull/3703#issuecomment-69768767
  
 Should we close this issue in favor of #3703 ?

@marmbrus This is #3703. Did you mean another PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Initial window function...

2014-12-17 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/3703#issuecomment-67418117
  
Should we close this issue in favor of #3703 ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Initial window function...

2014-12-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3703


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Initial window function...

2014-12-15 Thread liancheng
GitHub user liancheng opened a pull request:

https://github.com/apache/spark/pull/3703

[SPARK-1442][SQL][WIP] Initial window function implementation (refactored 
from #2953)

This WIP PR is refactored from PR #2953. Please refer to the original PR 
description for features implemented and not implemented in this PR.

The original PR was a huge one, commenting on each issue could be very time 
consuming. After offline discussions with @guowei2, I decided to work on a 
refactoring branch to fix most minor issues first and then start discussion 
based on this refactored version.

Major issues left in this PR are:

1. Window spec is added to aggregation functions with a `var`, which breaks 
query plan immutability.
2. When used with window specs, common aggregation functions like `COUNT`, 
`SUM`, `AVG` etc are not translated into Hive aggregation functions rather than 
Spark SQL builtin implementations.
3. Execution code (`execution.WindowFunction`) can be further simplified.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liancheng/spark window-refact

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3703.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3703


commit 9897413564ff27f0a311cc2cef6322422f3807ab
Author: guowei2 guow...@asiainfo.com
Date:   2014-10-24T09:55:47Z

window function

commit 7d7a703d5e7bf37e00d074cb8c04e2150f8fbeb4
Author: guowei2 guow...@asiainfo.com
Date:   2014-10-27T05:29:35Z

window function

commit 1999e07a23c18808738e4e3b14b64c1db108eda2
Author: guowei2 guow...@asiainfo.com
Date:   2014-10-27T06:03:17Z

window function

commit 76bfd4b8b1137426b0dbcde5d56cefb0c98cfab5
Author: guowei2 guow...@asiainfo.com
Date:   2014-10-27T14:16:22Z

window function

commit 88c5789d9f6989d0fedcbdd129de097152e2d8eb
Author: guowei2 guow...@asiainfo.com
Date:   2014-10-28T04:00:57Z

window function

commit 828199a48c619d03b4ec524dbdfe9c043baa5e14
Author: guowei2 guow...@asiainfo.com
Date:   2014-10-29T07:49:01Z

fix problems after rebase

commit 03bd77d5533f76484d7589e0296283b58f2d0688
Author: guowei2 guow...@asiainfo.com
Date:   2014-10-30T10:12:42Z

change test suite and golden files

commit d06baeba2dc859f860c8fd43c292275837b3e0e6
Author: guowei2 guow...@asiainfo.com
Date:   2014-11-05T03:01:33Z

add constant objectinspector support for udafs, such as last_value(col, 
false)

commit 173016c08770fd2aa6ee15c3f194c2282bd46e68
Author: guowei2 guow...@asiainfo.com
Date:   2014-11-26T06:58:47Z

fix window function to support multi-different window partitions

commit ab21933e64b3ee7afdcbb622bec935a34fe0785c
Author: guowei2 guow...@asiainfo.com
Date:   2014-11-27T08:40:26Z

fix DecimalType bug after rebase

commit 66ef7a6d449f6ec1e644d2e73118d8be1cb56cde
Author: guowei2 guow...@asiainfo.com
Date:   2014-11-28T09:51:16Z

fix bug about attribute reference

commit dc87d8d08c33644e61f6355ed07baf720b0e9ef9
Author: Cheng Lian l...@databricks.com
Date:   2014-12-04T06:34:32Z

WIP: refactoring window functions support

commit 2da61753590fe00ecf46219f387d70d48c6dd32a
Author: Cheng Lian l...@databricks.com
Date:   2014-12-15T06:42:57Z

Removed trailing spaces from query string in HiveWindowFunctionSuite

commit 922a8b9bfe0278577378c3cd9fc13cb9998b6e0f
Author: Cheng Lian l...@databricks.com
Date:   2014-12-15T17:18:42Z

Fixes COUNT with window spec




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Initial window function...

2014-12-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3703#issuecomment-67035848
  
  [Test build #24462 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24462/consoleFull)
 for   PR 3703 at commit 
[`922a8b9`](https://github.com/apache/spark/commit/922a8b9bfe0278577378c3cd9fc13cb9998b6e0f).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Initial window function...

2014-12-15 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/3703#issuecomment-67035927
  
Comments from the [review on 
Reviewable.io](https://reviewable.io:443/reviews/apache/spark/3703)



---

Note that instead of whitelisting window function test cases in 
`HiveCompatibilitySuite`, a new `HiveWindowFunctionSuite` was added. This is 
because the current Spark SQL HiveQl parser doesn't handle comments, and window 
function test input files come with Hive contains comment lines.

---



!-- Sent from Reviewable.io --



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Initial window function...

2014-12-15 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/3703#issuecomment-67041844
  
Comments from the [review on 
Reviewable.io](https://reviewable.io:443/reviews/apache/spark/3703)







---


sup**[sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala,
 line 30 
\[r1\]](https://reviewable.io:443/reviews/apache/spark/3703#-JdE5SkiLKM30za1keNC)**
 ([raw 
file](https://github.com/apache/spark/blob/922a8b9bfe0278577378c3cd9fc13cb9998b6e0f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala#L30)):/sup
This can be problematic. Ideally every aggregation function that can be 
used with window should have a `windowSpec: Option[WindowSpec]` field which 
defaults to `None`, and a `withWindowSpec` method that returns a new instance 
of the aggregation function object itself with a window spec.

---

sup**[sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala, 
line 874 
\[r1\]](https://reviewable.io:443/reviews/apache/spark/3703#-JdE660e137A3Dk7Uqp2)**
 ([raw 
file](https://github.com/apache/spark/blob/922a8b9bfe0278577378c3cd9fc13cb9998b6e0f/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala#L874)):/sup
The thread-local `windowDefs` map is used to store window definitions 
(`w1`, `w2` and `w3`) in queries like this:

```sql
SELECT
p_mfgr, p_name, p_size,
SUM(p_size) OVER w1 AS s1,
SUM(p_size) OVER w2 AS s2,
SUM(p_size) OVER (w3 ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)  AS s3
FROM
part
WINDOW
w1 AS (DISTRIBUTE BY p_mfgr SORT BY p_size RANGE BETWEEN 2 PRECEDING 
AND 2 FOLLOWING),
w2 AS w3,
w3 AS (DISTRIBUTE BY p_mfgr SORT BY p_size RANGE BETWEEN UNBOUNDED 
PRECEDING AND CURRENT ROW)
```

This map is cleaned and refilled in `collectWindowDefs` below, so it 
doesn't grow indefinitely.

---

sup**[sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala, 
line 1060 
\[r1\]](https://reviewable.io:443/reviews/apache/spark/3703#-JdEA7B6QXd6N0aDAEMh)**
 ([raw 
file](https://github.com/apache/spark/blob/922a8b9bfe0278577378c3cd9fc13cb9998b6e0f/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala#L1060)):/sup
All builtin aggregation functions need a similar `case` clause to handle 
their windowed version. Otherwise they all fallback to Hive UDAF 
implementations.

`COUNT` is picked here because its Hive version `GenericUDAFCount` 
implements `GenericUDAFResolver2` rather than `AbstractGenericUDAFResolver`, 
and is not handled by `HiveFunctionRegistry.lookupFunction`.

---


!-- Sent from Reviewable.io --



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Initial window function...

2014-12-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3703#issuecomment-67047485
  
  [Test build #24462 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24462/consoleFull)
 for   PR 3703 at commit 
[`922a8b9`](https://github.com/apache/spark/commit/922a8b9bfe0278577378c3cd9fc13cb9998b6e0f).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class WindowSpec(windowPartition: WindowPartition, windowFrame: 
Option[WindowFrame])`
  * `case class WindowPartition(partitionBy: Seq[Expression], sortBy: 
Seq[SortOrder])`
  * `case class WindowFrame(frameType: FrameType, preceding: Int, 
following: Int)`
  * `abstract class AggregateExpression extends Expression with 
Serializable `
  * `case class WindowFunction(`
  * `case class WindowFunction(`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Initial window function...

2014-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3703#issuecomment-67047490
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24462/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org