Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
@cloud-fan i thought about this a little more, and my suggested changes to the Aggregator api do not allow one to use a different encoder when applying a typed operation on a Dataset. so i do
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
for example with this branch you can do:
```
val df3 = Seq(("a", "x", 1), ("a", "y", 3), ("b", "x", 3)).toDF("i", "j", "k")
df3.groupBy("i").agg(
  ComplexResultAgg.apply("i",
```
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
well that was sort of what i was trying to achieve. the unit tests i added were for using Aggregator for untyped grouping (`groupBy`), and i think for it to be useful within that
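To make the discussion concrete, a typed aggregator of the kind under discussion could be sketched as follows. This is a minimal hypothetical example (the PR's actual `ComplexResultAgg` may differ); it only assumes the standard `Aggregator` contract of `zero`/`reduce`/`merge`/`finish` plus the two encoders:

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical buffer/result type for illustration.
case class MinMax(min: Int, max: Int)

// A minimal typed aggregator over Int inputs.
object MinMaxAgg extends Aggregator[Int, MinMax, MinMax] {
  def zero: MinMax = MinMax(Int.MaxValue, Int.MinValue)
  def reduce(b: MinMax, a: Int): MinMax =
    MinMax(math.min(b.min, a), math.max(b.max, a))
  def merge(b1: MinMax, b2: MinMax): MinMax =
    MinMax(math.min(b1.min, b2.min), math.max(b1.max, b2.max))
  def finish(b: MinMax): MinMax = b
  def bufferEncoder: Encoder[MinMax] = Encoders.product[MinMax]
  def outputEncoder: Encoder[MinMax] = Encoders.product[MinMax]
}
```

In stock Spark of this era such an aggregator plugs into the typed API (`Dataset.groupByKey(...).agg(MinMaxAgg.toColumn)`); making it usable from the untyped `groupBy` is exactly what this branch explores.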
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/13512
a possible approach may be to enable untyped grouping (`groupBy`) to support typed aggregation (`Aggregator`), i.e. something like
`df.groupBy("i").keyAs[Int].agg(ComplexResultAgg.toColumn)`
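Note that `keyAs` here is a proposed API, not one that exists in Spark at this point. The closest existing route stays fully typed, assuming `ComplexResultAgg` accepts the pair type of the rows (a sketch, not the PR's code):

```scala
// Hypothetical: go through the typed API instead of groupBy.
// Assumes ComplexResultAgg is an Aggregator over (String, Int) pairs.
val ds = df.as[(String, Int)]
ds.groupByKey(_._1).agg(ComplexResultAgg.toColumn)
```

The proposal would let the same aggregator run after an untyped `groupBy` by re-attaching a key encoder via `keyAs[Int]`.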
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
If Aggregator is designed for typed Dataset only then that is a bit of a shame, because it's an elegant and generic api that should be useful for DataFrame too. this causes fragmentation
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/13512
`Aggregator` API is designed for typed `Dataset` only, not for untyped
`DataFrame`. It can work if users use `Row` as the input type of `Aggregator`,
but it's not a recommended usage.
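The `Row`-as-input workaround mentioned above could be sketched like this (hypothetical names; the column `"j"` and the sum semantics are assumptions for illustration). It works because `Row` is just another input type for an `Aggregator`, but it forgoes compile-time safety on field access, which is why it is discouraged:

```scala
import org.apache.spark.sql.{Encoder, Encoders, Row}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical aggregator that takes untyped Row input.
// Field access by name is checked only at runtime.
object RowSumAgg extends Aggregator[Row, Long, Long] {
  def zero: Long = 0L
  def reduce(b: Long, r: Row): Long = b + r.getAs[Int]("j")
  def merge(b1: Long, b2: Long): Long = b1 + b2
  def finish(b: Long): Long = b
  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}
```

A typo in `"j"` or a wrong field type only surfaces as a runtime failure, whereas a typed `Aggregator[(String, Int), ...]` would catch it at compile time.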
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
@cloud-fan i am running into some trouble updating my branch to the latest master. i get errors in tests due to `Analyzer.validateTopLevelTupleFields`. the issue seems to be that in
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
@cloud-fan from the (added) unit tests:
```
val df2 = Seq("a" -> 1, "a" -> 3, "b" -> 3).toDF("i", "j")
checkAnswer(df2.groupBy("i").agg(ComplexResultAgg.toColumn),
  Row("a",
```
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/13512
Can you give some examples to show how this PR makes the aggregator API more friendly and easier to use?
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
**[Test build #5 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/5/consoleFull)** for PR 13512 at commit
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/5/
Test FAILed.
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
Build finished. Test FAILed.
Github user koertkuipers commented on the issue:
https://github.com/apache/spark/pull/13512
**[Test build #5 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/5/consoleFull)** for PR 13512 at commit