[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

2015-04-29 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/5604#issuecomment-97518465 Hi, I have been experimenting with Window functions in Spark SQL as well. It has been partially based on this. You can find my work [here](https://github.com

[GitHub] spark pull request: [SPARK-7322] [SQL] [WIP] Support Window Functi...

2015-05-12 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/6104#issuecomment-101519756 Hi, In the JIRA the following example is given: ``` df.select( df.store, df.date, df.sales, avg(df.sales).over.partitionBy

[GitHub] spark pull request: [SPARK-7712] [SQL] Move Window Functions from ...

2015-05-20 Thread hvanhovell
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/6278 [SPARK-7712] [SQL] Move Window Functions from Hive UDAFS to Spark Native backend [WIP] This PR aims to improve the current window functionality in Spark SQL in the following ways: - Moving

[GitHub] spark pull request: [SPARK-7712] [SQL] Move Window Functions from ...

2015-06-10 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/6278#issuecomment-110769725 Thanks for starting the test process. I'll take a look at the merge issues today. --- If your project is set up for it, you can reply to this email and have

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-06-27 Thread hvanhovell
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/7057 [SPARK-8638] [SQL] Window Function Performance Improvements ## Description Performance improvements for Spark Window functions. This PR will also serve as the basis for moving away from Hive

[GitHub] spark pull request: [SPARK-7712] [SQL] Move Window Functions from ...

2015-06-12 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/6278#issuecomment-111584747 A small status update: - The PR currently passes all tests. - The code has been rebased, but this is a moving target due to the frequent commits to the SQL

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-06-30 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7057#discussion_r33641164 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala --- @@ -37,443 +59,622 @@ case class Window( child: SparkPlan

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-06-30 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7057#discussion_r33641261 --- Diff: sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveWindowFunctionQuerySuite.scala --- @@ -749,7 +749,7 @@ abstract

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-06-30 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7057#discussion_r33641591 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveDataFrameWindowSuite.scala --- @@ -189,7 +189,7 @@ class HiveDataFrameWindowSuite extends

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-06-30 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7057#discussion_r33641018 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala --- @@ -19,17 +19,39 @@ package org.apache.spark.sql.execution

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-06-30 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7057#discussion_r33642234 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala --- @@ -37,443 +59,622 @@ case class Window( child: SparkPlan

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-06-30 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7057#discussion_r33642464 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala --- @@ -19,17 +19,39 @@ package org.apache.spark.sql.execution

[GitHub] spark pull request: [SPARK-9740] [SPARK-9592] [SQL] Change the def...

2015-08-12 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/8113#discussion_r36873052 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -130,24 +147,67 @@ case class First

[GitHub] spark pull request: [SPARK-9740] [SPARK-9592] [SQL] Change the def...

2015-08-12 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/8113#discussion_r36873218 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -130,24 +147,67 @@ case class First

[GitHub] spark pull request: [SPARK-9740] [SQL] Change the default behavior...

2015-08-11 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/8113#discussion_r36818586 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -130,23 +138,36 @@ case class First

[GitHub] spark pull request: [SPARK-9740] [SPARK-9592] [SQL] Change the def...

2015-08-12 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/8113#issuecomment-130343660 One more small thing. We should probably also add the ```ignoreNulls``` option to the ```first``` and ```last``` dataframe functions.

[GitHub] spark pull request: [SPARK-10100] [SQL] Perfomance improvements to...

2015-08-18 Thread hvanhovell
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/8298 [SPARK-10100] [SQL] Perfomance improvements to new MIN/MAX aggregate functions. The new MIN/MAX suffer from a performance regression. This PR aims to fix this by simplifying the evaluation

[GitHub] spark pull request: [SPARK-9741][SQL] Approximate Count Distinct u...

2015-08-21 Thread hvanhovell
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/8362 [SPARK-9741][SQL] Approximate Count Distinct using the new UDAF interface. This PR implements a HyperLogLog based Approximate Count Distinct function using the new UDAF interface

[GitHub] spark pull request: [SPARK-9741][SQL] Approximate Count Distinct u...

2015-08-21 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/8362#issuecomment-133566608 Thanks. I was aiming for compatibility with the existing approxCountDistinct, but we can also implement HLL++. HLL++ introduces three (orthogonal

[GitHub] spark pull request: [SPARK-8640] [SQL] Enable Processing of Multip...

2015-07-30 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7515#discussion_r35946650 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -818,7 +818,8 @@ class Analyzer

[GitHub] spark pull request: [SPARK-9482] [SQL] Fix thread-safey issue of u...

2015-08-05 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7940#discussion_r36343368 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoin.scala --- @@ -47,7 +47,7 @@ case class

[GitHub] spark pull request: [SPARK-7712] [SQL] Move Window Functions from ...

2015-07-30 Thread hvanhovell
Github user hvanhovell closed the pull request at: https://github.com/apache/spark/pull/6278

[GitHub] spark pull request: [SPARK-7712] [SQL] Move Window Functions from ...

2015-07-30 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/6278#issuecomment-126397148 This PR is redundant now. See SPARK-8638 SPARK-8640 and SPARK-8641 for the proposed implementation.

[GitHub] spark pull request: [SPARK-9729] [SPARK-9363] [WIP] [SQL] Use sort...

2015-08-06 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7904#discussion_r36487283 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoin.scala --- @@ -56,117 +52,247 @@ case class SortMergeJoin

[GitHub] spark pull request: [SPARK-9357][SQL] Remove JoinedRow/Introduce J...

2015-08-04 Thread hvanhovell
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/7942 [SPARK-9357][SQL] Remove JoinedRow/Introduce JoinedProjection [WIP] ```JoinedRow```'s are used to join two rows together, and are used in a lot of the most performance-critical sections of Spark

[GitHub] spark pull request: [SPARK-9980][BUILD] Fix SBT publishLocal error...

2015-08-14 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/8209#issuecomment-131206616 I have replaced ```<p/>``` tags by ```<p>``` in all java files I could find them in. I haven't touched the ```BytesToByteMap```, ```Bin``` and ```PagedTable``` classes

[GitHub] spark pull request: [SPARK-9980][BUILD] Fix SBT publishLocal error...

2015-08-14 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/8209#issuecomment-131210873 I'd rather leave the scala source alone for now.

[GitHub] spark pull request: [SPARK-9980][BUILD] Fix SBT publishLocal error...

2015-08-14 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/8209#issuecomment-131198780 I'll change those as well. It is strange that these didn't create problems; I guess it has something to do with the position of the tag in the line.

[GitHub] spark pull request: [SPARK-9980][BUILD] Fix SBT publishLocal error...

2015-08-14 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/8209#discussion_r37103311 --- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java --- @@ -211,10 +211,10 @@ public SparkLauncher addSparkArg(String arg

[GitHub] spark pull request: [SPARK-9980][BUILD] Fix SBT publishLocal error...

2015-08-14 Thread hvanhovell
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/8209 [SPARK-9980][BUILD] Fix SBT publishLocal error due to invalid characters in doc Tiny modification to a few comments to make ```sbt publishLocal``` work again. You can merge this pull request into a Git

[GitHub] spark pull request: [SPARK-10001] [CORE] Allow Ctrl-C in spark-she...

2015-08-15 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/8216#discussion_r37136449 --- Diff: core/src/main/scala/org/apache/spark/scheduler/JobWaiter.scala --- @@ -17,6 +17,10 @@ package org.apache.spark.scheduler

[GitHub] spark pull request: [SPARK-10001] [CORE] Allow Ctrl-C in spark-she...

2015-08-15 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/8216#discussion_r37137085 --- Diff: core/src/main/scala/org/apache/spark/scheduler/JobWaiter.scala --- @@ -17,6 +17,10 @@ package org.apache.spark.scheduler

[GitHub] spark pull request: [SPARK-9740] [SQL] Change the default behavior...

2015-08-11 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/8113#discussion_r36818762 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -130,23 +138,36 @@ case class First

[GitHub] spark pull request: [SPARK-9740] [SQL] Change the default behavior...

2015-08-11 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/8113#discussion_r36818719 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -130,23 +138,36 @@ case class First

[GitHub] spark pull request: [SPARK-9740] [SQL] Change the default behavior...

2015-08-11 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/8113#issuecomment-130128926 Besides the ```valueSet``` update/merge expressions LGTM.

[GitHub] spark pull request: [SPARK-9740] [SQL] Change the default behavior...

2015-08-11 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/8113#issuecomment-130128263 LGTM. One final question, shouldn't we introduce a ```skipNulls``` parameter? Or do you want to address this in a follow-up PR?

[GitHub] spark pull request: [SPARK-9740] [SPARK-9592] [SQL] Change the def...

2015-08-12 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/8113#discussion_r36878009 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -130,24 +147,67 @@ case class First

[GitHub] spark pull request: [SPARK-9740] [SPARK-9592] [SQL] Change the def...

2015-08-12 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/8113#discussion_r36877100 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -130,24 +147,67 @@ case class First

[GitHub] spark pull request: [SPARK-8641][SPARK-8712][SQL] Native Spark Win...

2015-07-27 Thread hvanhovell
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/7715 [SPARK-8641][SPARK-8712][SQL] Native Spark Window Functions [WIP] This replaces the Hive UDAFs with native Spark SQL UDAFs using the new UDAF interface. See the JIRA ticket for more information

[GitHub] spark pull request: [SPARK-8641][SPARK-8712][SQL] Native Spark Win...

2015-07-27 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/7715#issuecomment-125431976 cc @yhuai

[GitHub] spark pull request: [SPARK-9066][SQL] Improve cartesian performanc...

2015-07-24 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7417#discussion_r35433399 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -213,10 +213,51 @@ private[sql] abstract class

[GitHub] spark pull request: [SPARK-9066][SQL] Improve cartesian performanc...

2015-07-24 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/7417#issuecomment-124562458 @Sephiroth-Lin The performance improvement sounds really good. It seems like a good thing to put in Spark.

[GitHub] spark pull request: [SPARK-9066][SQL] Improve cartesian performanc...

2015-07-24 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7417#discussion_r35433038 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -213,10 +213,51 @@ private[sql] abstract class

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-07-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7057#discussion_r34844383 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala --- @@ -38,443 +84,661 @@ case class Window( child: SparkPlan

[GitHub] spark pull request: [SPARK-8682][SQL][WIP] Range Join

2015-07-17 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7379#discussion_r34891100 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastRangeJoin.scala --- @@ -0,0 +1,411 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-9066][SQL] Improve cartesian performanc...

2015-07-24 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7417#discussion_r35422390 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastCartesianProduct.scala --- @@ -0,0 +1,80 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-8682][SQL][WIP] Range Join

2015-07-14 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/7379#issuecomment-121250819 Current test errors are a bit weird. They shouldn't have been caused by this change, because the functionality is disabled by default. Rebased to most recent

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-07-14 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7057#discussion_r34602077 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala --- @@ -37,443 +59,622 @@ case class Window( child: SparkPlan

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-07-14 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7057#discussion_r34603151 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala --- @@ -37,443 +59,622 @@ case class Window( child: SparkPlan

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-07-14 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7057#discussion_r34616177 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala --- @@ -38,443 +68,645 @@ case class Window( child: SparkPlan

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-07-14 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7057#discussion_r34617761 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala --- @@ -38,443 +68,645 @@ case class Window( child: SparkPlan

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-07-14 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7057#discussion_r34618909 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala --- @@ -38,443 +68,645 @@ case class Window( child: SparkPlan

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-07-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7057#discussion_r34857571 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/WindowSuite.scala --- @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-07-16 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/7057#issuecomment-122158569 @yhuai the benchmarking results are attached. It might be interesting to see how the operator performs on different datasets.

[GitHub] spark pull request: [SPARK-8682][SQL][WIP] Range Join

2015-07-16 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/7379#issuecomment-122177553 The = case is quite easy to implement. This implementation is currently targeted at range joining a rather small (broadcastable) to an arbitrarily large

[GitHub] spark pull request: [SPARK-8682][SQL][WIP] Range Join

2015-07-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7379#discussion_r34862439 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastRangeJoin.scala --- @@ -0,0 +1,411 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-07-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7057#discussion_r34844502 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala --- @@ -38,443 +68,645 @@ case class Window( child: SparkPlan

[GitHub] spark pull request: [SPARK-8682][SQL][WIP] Range Join

2015-07-17 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/7379#issuecomment-12257 No problem. ### Supporting N-Ary Predicates. In order to make the range join work we need the predicates to define a single interval for each side

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-07-19 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/7513#issuecomment-122707719 @yhuai Don't think jenkins picked up your OK.

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-07-19 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/7513#issuecomment-122674419 cc @yhuai

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-07-19 Thread hvanhovell
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/7513 [SPARK-8638] [SQL] Window Function Performance Improvements - Cleanup This PR contains a few clean-ups that are a part of SPARK-8638: a few style issues got fixed, and a few tests were moved

[GitHub] spark pull request: [SPARK-8640] [SQL] Enable Processing of Multip...

2015-07-20 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/7515#issuecomment-122912351 cc @yhuai

[GitHub] spark pull request: [SPARK-8682][SQL][WIP] Range Join

2015-07-13 Thread hvanhovell
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/7379 [SPARK-8682][SQL][WIP] Range Join *...copied from JIRA (SPARK-8682):* Currently Spark SQL uses a Broadcast Nested Loop join (or a filtered Cartesian Join) when it has to execute

[GitHub] spark pull request: [SPARK-9066][SQL] Improve cartesian performanc...

2015-07-21 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/7417#issuecomment-123296087 Do you have any benchmarking results for this? Would be great to see how much this improves the current situation.

[GitHub] spark pull request: [SPARK-9066][SQL] Improve cartesian performanc...

2015-07-21 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7417#discussion_r35098517 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastCartesianProduct.scala --- @@ -0,0 +1,80 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-9066][SQL] Improve cartesian performanc...

2015-07-15 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7417#discussion_r34681979 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/CartesianProduct.scala --- @@ -34,7 +34,15 @@ case class CartesianProduct(left

[GitHub] spark pull request: [SPARK-9730][SQL] Add Full Outer Join support ...

2015-08-25 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/8383#discussion_r37883244 --- Diff: unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSetMethods.java --- @@ -68,6 +68,19 @@ public static boolean isSet(Object baseObject

[GitHub] spark pull request: [SPARK-10100] [SQL] Perfomance improvements to...

2015-08-25 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/8298#issuecomment-134620114 @adrian-wang the improvement is absolutely tiny, about 2-3% if you do a lot of ```min```'s or ```max```'es. This PR was a response to misdiagnosed

[GitHub] spark pull request: [SPARK-9241] [SQL] [WIP] Supporting multiple D...

2015-10-26 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9280#discussion_r43067869 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/distinctFallback.scala --- @@ -0,0 +1,173

[GitHub] spark pull request: [SPARK-9241] [SQL] [WIP] Supporting multiple D...

2015-10-27 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9280#discussion_r43116722 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/distinctFallback.scala --- @@ -0,0 +1,173

[GitHub] spark pull request: [SPARK-9298][SQL] Add pearson correlation aggr...

2015-10-29 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/8587#discussion_r43439837 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -524,6 +525,133 @@ case class Sum(child

[GitHub] spark pull request: [SPARK-9298][SQL] Add pearson correlation aggr...

2015-10-27 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/8587#discussion_r43162271 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -524,6 +525,133 @@ case class Sum(child

[GitHub] spark pull request: [SPARK-11388][Build]Fix self closing tags.

2015-10-28 Thread hvanhovell
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/9339 [SPARK-11388][Build]Fix self closing tags. Java 8 javadoc does not like self closing tags: ``, ``, ... This PR fixes those.

[GitHub] spark pull request: [SPARK-11388][Build]Fix self closing tags.

2015-10-29 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9339#issuecomment-152135261 jenkins retest this please

[GitHub] spark pull request: [SPARK-11388][Build]Fix self closing tags.

2015-10-29 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9339#issuecomment-152135237 Hmmm, test fails with the following beautiful error: [error] (core/test:test) sbt.TestsFailedException: Tests unsuccessful [error] Total time

[GitHub] spark pull request: [SPARK-9241] [SQL] [WIP] Supporting multiple D...

2015-10-26 Thread hvanhovell
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/9280 [SPARK-9241] [SQL] [WIP] Supporting multiple DISTINCT columns This PR adds support for multiple distinct columns to the new aggregation code path. The implementation uses

[GitHub] spark pull request: [SPARK-11594][SQL][REPL] Cannot create UDAF in...

2015-11-09 Thread hvanhovell
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/9568 [SPARK-11594][SQL][REPL] Cannot create UDAF in REPL This PR enables users to create a UDAF in the REPL without getting a ```java.lang.InternalError```.

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44269743 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala --- @@ -141,40 +141,46 @@ class WindowSpec private[sql

[GitHub] spark pull request: [SPARK-11594][SQL][REPL] Cannot create UDAF in...

2015-11-09 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9568#discussion_r44273796 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/udaf.scala --- @@ -129,6 +129,13 @@ abstract class UserDefinedAggregateFunction extends

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155054384 Made a quick pass. I think the PR is in good shape. I do have a few additional questions/remarks: * Maybe should add some documentation to ```DeclarativeAggregate

[GitHub] spark pull request: [SPARK-11594][SQL][REPL] Cannot create UDAF in...

2015-11-11 Thread hvanhovell
Github user hvanhovell closed the pull request at: https://github.com/apache/spark/pull/9568

[GitHub] spark pull request: [SPARK-11594][SQL][REPL] Cannot create UDAF in...

2015-11-11 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9568#issuecomment-155710292 Move to scala 2.10.5 fixed this. Closing PR.

[GitHub] spark pull request: [SPARK-11594][SQL][REPL] Cannot create UDAF in...

2015-11-10 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9568#discussion_r44469882 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/udaf.scala --- @@ -452,7 +452,7 @@ private[sql] case class ScalaUDAF

[GitHub] spark pull request: [WIP] [SPARK-11636] [SQL] Support as for class...

2015-11-10 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9602#issuecomment-155576674 Does a REPL defined class blow up with a ```java.lang.InternalError```? If it does, then we have the same problem: https://github.com/apache/spark/pull/9568

[GitHub] spark pull request: [SPARK-9241][SQL] Supporting multiple DISTINCT...

2015-11-10 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9566#issuecomment-155578114 I'll rebase. The potential bug is caused by the fact that the attributes for the distinct columns and the expressions for the distinct columns can possibly

[GitHub] spark pull request: [SPARK-9241][SQL] Supporting multiple DISTINCT...

2015-11-11 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9566#discussion_r44514711 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala --- @@ -545,19 +576,21 @@ abstract class

[GitHub] spark pull request: [SPARK-9241][SQL] Supporting multiple DISTINCT...

2015-11-10 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9566#discussion_r44472817 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DistinctAggregationRewriter.scala --- @@ -151,11 +151,12 @@ case class

[GitHub] spark pull request: [SPARK-11594][SQL][REPL] Cannot create UDAF in...

2015-11-10 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9568#issuecomment-155589310 Sounds like a good idea. I'll add this in the morning.

[GitHub] spark pull request: [SPARK-11594][SQL][REPL] Cannot create UDAF in...

2015-11-10 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9568#issuecomment-155582285 This is actually a Scala problem: https://issues.scala-lang.org/browse/SI-9051

[GitHub] spark pull request: [SPARK-11451][SQL] Support single distinct cou...

2015-11-08 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9409#issuecomment-154825955 retest this please

[GitHub] spark pull request: [SPARK-11451][SQL] Support single distinct cou...

2015-11-08 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9409#issuecomment-154826382 Jenkins does not like me...

[GitHub] spark pull request: [SPARK-11451][SQL] Support single distinct cou...

2015-11-08 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9409#issuecomment-154831183 @yhuai can you get Jenkins to test this? The bug exposed by this patch affected the regular aggregation path, as soon as we used more than one regular

[GitHub] spark pull request: [SPARK-11451][SQL] Support single distinct cou...

2015-11-08 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9409#issuecomment-154825193 test this please

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-08 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-154984531 retest this please

[GitHub] spark pull request: [SPARK-11553] [SQL] Primitive Row accessors sh...

2015-11-14 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9642#issuecomment-156693107 ```UnsafeRow``` and ```SpecificRow``` have similar problems. Shouldn't we fix those as well? For example: import org.apache.spark.sql.types.IntegerType

[GitHub] spark pull request: [SPARK-11553] [SQL] Primitive Row accessors sh...

2015-11-14 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9642#issuecomment-156697230 > You mean discard change and only update documentation? Yes this is one option - I even think that it is not so bad. I would only update the documentat

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155225000 PySpark results are correct, just not in the correct form :(...

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155224746 LGTM, pending a successful test build.

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44343490 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -146,148 +146,105 @@ private[sql] abstract class

[GitHub] spark pull request: [SPARK-9241][SQL] Supporting multiple DISTINCT...

2015-11-09 Thread hvanhovell
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/9566 [SPARK-9241][SQL] Supporting multiple DISTINCT columns - follow-up (3) This PR is a 2nd follow-up for [SPARK-9241](https://issues.apache.org/jira/browse/SPARK-9241). It contains the following
