Great, by adding a little implicit wrapper I can use Algebird's
MonoidAggregator, which gives me the equivalent of GroupedDataset.mapValues
(by using Aggregator.composePrepare).
I am a little surprised you require a monoid and not just a semigroup, but
that is probably the right choice given that the input may be empty.
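For context, composePrepare pre-maps every input before aggregation, which is what recovers the mapValues-like behavior mentioned above. Below is a minimal self-contained sketch that mimics (but is not) Algebird's API: the names MonoidAggregator and composePrepare follow Algebird, while the simplified signatures and the sum example are purely illustrative.

```scala
// Minimal sketch of an Algebird-style monoid aggregator (illustrative, not
// the library's actual definition).
case class MonoidAggregator[A, B](prepare: A => B, zero: B, plus: (B, B) => B) {
  // Run the aggregation over a plain collection.
  def apply(as: Seq[A]): B = as.map(prepare).foldLeft(zero)(plus)

  // composePrepare pre-maps every input through f before aggregating,
  // which is what recovers GroupedDataset.mapValues-like behavior.
  def composePrepare[A2](f: A2 => A): MonoidAggregator[A2, B] =
    MonoidAggregator(f.andThen(prepare), zero, plus)
}

// Sum of Longs: zero is the monoid identity, so empty input is well-defined.
val sum = MonoidAggregator[Long, Long](identity, 0L, _ + _)

// Aggregate only the values of (K, V) pairs, as mapValues would expose them.
val sumValues = sum.composePrepare[(String, Long)](_._2)
```

In Algebird itself, something like `Aggregator.fromMonoid[Long].composePrepare(_._2)` plays this role; note the monoid's zero is exactly what makes the empty-input case safe.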
Hi Michael
From: Michael Armbrust <mich...@databricks.com>
Date: Saturday, February 13, 2016 at 9:31 PM
To: Koert Kuipers <ko...@tresata.com>
Cc: "user @spark" <user@spark.apache.org>
Subject: Re: GroupedDataset needs a mapValues
I have a Dataset[(K, V)].
I would like to group by K and then reduce the values using a function (V, V) => V.
How do I do this?
I would expect something like:
val ds: Dataset[(K, V)]
ds.groupBy(_._1).mapValues(_._2).reduce(f)
or better:
ds.grouped.reduce(f) // grouped only works on Dataset[(_, _)]
Instead of grouping with a lambda function, you can do it with a column
expression to avoid materializing an unnecessary tuple:
df.groupBy($"_1")
Regarding the mapValues, you can do something similar using an Aggregator
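As an illustration of the Aggregator approach suggested here, a hedged sketch follows, written against the Spark 1.6-era org.apache.spark.sql.expressions.Aggregator (the version current in this thread). The Long value type and addition are hypothetical choices for the example; Spark 2.x additionally requires bufferEncoder and outputEncoder definitions.

```scala
import org.apache.spark.sql.expressions.Aggregator

// Sketch: reduce the V of a Dataset[(String, Long)] via a typed Aggregator,
// specialized to Long values combined by addition (a hypothetical choice).
val reduceValues = new Aggregator[(String, Long), Long, Long] {
  def zero: Long = 0L                                      // identity, used for empty input
  def reduce(b: Long, a: (String, Long)): Long = b + a._2  // fold one (K, V) pair into the buffer
  def merge(b1: Long, b2: Long): Long = b1 + b2            // combine partial results across partitions
  def finish(b: Long): Long = b                            // produce the final result
}

// Usage would then look roughly like:
//   ds.groupBy(_._1).agg(reduceValues.toColumn)
```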
Thanks, I will look into Aggregator as well.
On Sun, Feb 14, 2016 at 12:31 AM, Michael Armbrust wrote:
> Instead of grouping with a lambda function, you can do it with a column
> expression to avoid materializing an unnecessary tuple:
>
> df.groupBy($"_1")
>
> Regarding the mapValues, you can do something similar using an Aggregator