[GitHub] spark issue #13512: [SPARK-15769][SQL] Add Encoder for input type to Aggrega...

2016-09-21 Thread koertkuipers
Github user koertkuipers commented on the issue: https://github.com/apache/spark/pull/13512 @cloud-fan i thought about this a little more, and my suggested changes to the Aggregator api do not allow one to use a different encoder when applying a typed operation on Dataset. so i do

[GitHub] spark issue #13512: [SPARK-15769][SQL] Add Encoder for input type to Aggrega...

2016-06-06 Thread koertkuipers
Github user koertkuipers commented on the issue: https://github.com/apache/spark/pull/13512 for example with this branch you can do:
```
val df3 = Seq(("a", "x", 1), ("a", "y", 3), ("b", "x", 3)).toDF("i", "j", "k")
df3.groupBy("i").agg(
  ComplexResultAgg.apply("i",
```

[GitHub] spark issue #13512: [SPARK-15769][SQL] Add Encoder for input type to Aggrega...

2016-06-06 Thread koertkuipers
Github user koertkuipers commented on the issue: https://github.com/apache/spark/pull/13512 well that was sort of what i was trying to achieve. the unit tests i added were for using Aggregator for untyped grouping (`groupBy`). and i think for it to be useful within that

[GitHub] spark issue #13512: [SPARK-15769][SQL] Add Encoder for input type to Aggrega...

2016-06-06 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13512 a possible approach may be to let untyped grouping (`groupBy`) support typed aggregation (`Aggregator`), i.e. something like `df.groupBy("i").keyAs[Int].agg(ComplexResultAgg.toColumn)`

[GitHub] spark issue #13512: [SPARK-15769][SQL] Add Encoder for input type to Aggrega...

2016-06-06 Thread koertkuipers
Github user koertkuipers commented on the issue: https://github.com/apache/spark/pull/13512 If Aggregator is designed for typed Dataset only then that is a bit of a shame, because it's an elegant and generic api that should be useful for DataFrame too. this causes fragmentation

[GitHub] spark issue #13512: [SPARK-15769][SQL] Add Encoder for input type to Aggrega...

2016-06-06 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13512 `Aggregator` API is designed for typed `Dataset` only, not for untyped `DataFrame`. It can work if users use `Row` as the input type of `Aggregator`, but it's not a recommended usage. On
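For readers unfamiliar with the typed usage referred to here, the following is a minimal sketch of an `Aggregator` applied to a typed `Dataset` through `groupByKey`, assuming Spark 2.x or later on the classpath. `CountAndSum` and `TypedAggExample` are hypothetical names chosen for illustration; this is not the `ComplexResultAgg` from the PR's unit tests, whose definition does not appear in this thread.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical aggregator (illustration only): for (String, Int) inputs,
// tracks the number of rows seen and the sum of the Int field.
object CountAndSum extends Aggregator[(String, Int), (Long, Long), (Long, Long)] {
  // Empty buffer: zero rows, zero sum.
  def zero: (Long, Long) = (0L, 0L)

  // Fold one input record into the buffer.
  def reduce(b: (Long, Long), in: (String, Int)): (Long, Long) =
    (b._1 + 1L, b._2 + in._2)

  // Combine two partial buffers, e.g. from different partitions.
  def merge(b1: (Long, Long), b2: (Long, Long)): (Long, Long) =
    (b1._1 + b2._1, b1._2 + b2._2)

  // The final result is the buffer itself.
  def finish(r: (Long, Long)): (Long, Long) = r

  def bufferEncoder: Encoder[(Long, Long)] = Encoders.product[(Long, Long)]
  def outputEncoder: Encoder[(Long, Long)] = Encoders.product[(Long, Long)]
}

object TypedAggExample {
  // Groups a typed Dataset by its String field and applies the aggregator.
  def run(spark: SparkSession): Map[String, (Long, Long)] = {
    import spark.implicits._
    val ds = Seq("a" -> 1, "a" -> 3, "b" -> 3).toDS()
    ds.groupByKey(_._1).agg(CountAndSum.toColumn).collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("typed-aggregator-sketch")
      .getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```

The input type `(String, Int)` is fixed in the `Aggregator`'s type parameters, which is why it pairs naturally with `groupByKey` on a `Dataset[(String, Int)]`; the untyped `groupBy` path debated in this thread has no such static input type to match against.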

[GitHub] spark issue #13512: [SPARK-15769][SQL] Add Encoder for input type to Aggrega...

2016-06-05 Thread koertkuipers
Github user koertkuipers commented on the issue: https://github.com/apache/spark/pull/13512 @cloud-fan i am running into some trouble updating my branch to the latest master. i get errors in tests due to Analyzer.validateTopLevelTupleFields. the issue seems to be that in

[GitHub] spark issue #13512: [SPARK-15769][SQL] Add Encoder for input type to Aggrega...

2016-06-05 Thread koertkuipers
Github user koertkuipers commented on the issue: https://github.com/apache/spark/pull/13512 @cloud-fan from the (added) unit tests:
```
val df2 = Seq("a" -> 1, "a" -> 3, "b" -> 3).toDF("i", "j")
checkAnswer(df2.groupBy("i").agg(ComplexResultAgg.toColumn), Row("a",
```

[GitHub] spark issue #13512: [SPARK-15769][SQL] Add Encoder for input type to Aggrega...

2016-06-05 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13512 Can you give some examples to show how this PR makes the aggregator API more friendly and easier to use?

[GitHub] spark issue #13512: [SPARK-15769][SQL] Add Encoder for input type to Aggrega...

2016-06-04 Thread koertkuipers
Github user koertkuipers commented on the issue: https://github.com/apache/spark/pull/13512 **[Test build #5 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/5/consoleFull)** for PR 13512 at commit

[GitHub] spark issue #13512: [SPARK-15769][SQL] Add Encoder for input type to Aggrega...

2016-06-04 Thread koertkuipers
Github user koertkuipers commented on the issue: https://github.com/apache/spark/pull/13512 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/5/

[GitHub] spark issue #13512: [SPARK-15769][SQL] Add Encoder for input type to Aggrega...

2016-06-04 Thread koertkuipers
Github user koertkuipers commented on the issue: https://github.com/apache/spark/pull/13512 Build finished. Test FAILed.

[GitHub] spark issue #13512: [SPARK-15769][SQL] Add Encoder for input type to Aggrega...

2016-06-04 Thread koertkuipers
Github user koertkuipers commented on the issue: https://github.com/apache/spark/pull/13512 **[Test build #5 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/5/consoleFull)** for PR 13512 at commit