Re: GroupedDataset needs a mapValues

2016-02-14 Thread Koert Kuipers
great, by adding a little implicit wrapper i can use algebird's MonoidAggregator, which gives me the equivalent of GroupedDataset.mapValues (by using Aggregator.composePrepare). i am a little surprised you require a monoid and not just a semigroup, but that is probably the right choice given possibly empty groups.
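
A sketch of the kind of wrapper being described here, adapting algebird's MonoidAggregator to Spark's Aggregator. This assumes the Spark 2.x form of Aggregator (with encoder members; the 1.6 API of this thread differed slightly), and the names AlgebirdAggregator, sumV, and ds are made up for illustration:

import com.twitter.algebird.{MonoidAggregator, Aggregator => AlgAgg}
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// hypothetical adapter: exposes an algebird MonoidAggregator as a Spark Aggregator
class AlgebirdAggregator[A, B: Encoder, C: Encoder](agg: MonoidAggregator[A, B, C])
    extends Aggregator[A, B, C] {
  def zero: B = agg.monoid.zero                                  // the monoid zero covers empty groups
  def reduce(b: B, a: A): B = agg.monoid.plus(b, agg.prepare(a)) // prepare each input, then combine
  def merge(b1: B, b2: B): B = agg.monoid.plus(b1, b2)
  def finish(b: B): C = agg.present(b)
  def bufferEncoder: Encoder[B] = implicitly[Encoder[B]]
  def outputEncoder: Encoder[C] = implicitly[Encoder[C]]
}

// composePrepare plays the role of mapValues: project out the V of each (K, V)
val sumV: MonoidAggregator[(String, Int), Int, Int] =
  AlgAgg.fromMonoid[Int].composePrepare[(String, Int)](_._2)

// ds is assumed to be a Dataset[(String, Int)] with spark.implicits._ in scope
val perKey = ds.groupByKey(_._1).agg(new AlgebirdAggregator(sumV).toColumn)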

Re: GroupedDataset needs a mapValues

2016-02-14 Thread Andy Davidson
Hi Michael

From: Michael Armbrust <mich...@databricks.com>
Date: Saturday, February 13, 2016 at 9:31 PM
To: Koert Kuipers <ko...@tresata.com>
Cc: "user @spark" <user@spark.apache.org>
Subject: Re: GroupedDataset needs a mapValues

> Instead of grouping with …

GroupedDataset needs a mapValues

2016-02-13 Thread Koert Kuipers
i have a Dataset[(K, V)]. i would like to group by K and then reduce V using a function (V, V) => V. how do i do this? i would expect something like:

val ds = Dataset[(K, V)]
ds.groupBy(_._1).mapValues(_._2).reduce(f)

or better:

ds.grouped.reduce(f) # grouped only works on Dataset[(_, _)] and i…
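
For reference, a sketch of the workaround available without mapValues: reduce the whole (K, V) pairs and then drop the duplicated key. This is written against the later Spark 2.x names (groupByKey / reduceGroups; in the 1.6 API of this thread the equivalents were groupBy and reduce), and the toy data and f are stand-ins:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val ds = Seq(("a", 1), ("a", 2), ("b", 3)).toDS() // stand-in Dataset[(String, Int)]
val f: (Int, Int) => Int = _ + _                  // the (V, V) => V reduce function

// reduce the full pairs, keeping the key, then strip the duplicated key
val reduced = ds.groupByKey(_._1)
  .reduceGroups((a, b) => (a._1, f(a._2, b._2)))
  .map { case (_, kv) => kv }                     // back to Dataset[(String, Int)]

Later Spark releases did add mapValues to the typed grouping, at which point ds.groupByKey(_._1).mapValues(_._2).reduceGroups(f) works essentially as the question hoped.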

Re: GroupedDataset needs a mapValues

2016-02-13 Thread Michael Armbrust
Instead of grouping with a lambda function, you can do it with a column expression to avoid materializing an unnecessary tuple:

df.groupBy($"_1")

Regarding the mapValues, you can do something similar using an Aggregator.
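
A minimal sketch of the Aggregator route, assuming the Spark 2.x form of Aggregator (with encoder members) and Int values; sumValues and ds are illustrative names, not API from the thread:

import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// a reduce-style Aggregator: projects the value side of each (String, Int)
// pair inside the aggregation, so no intermediate tuple is materialized
val sumValues = new Aggregator[(String, Int), Int, Int] {
  def zero: Int = 0
  def reduce(b: Int, a: (String, Int)): Int = b + a._2
  def merge(b1: Int, b2: Int): Int = b1 + b2
  def finish(b: Int): Int = b
  def bufferEncoder: Encoder[Int] = Encoders.scalaInt
  def outputEncoder: Encoder[Int] = Encoders.scalaInt
}

val perKey = ds.groupByKey(_._1).agg(sumValues.toColumn) // Dataset[(String, Int)]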

Re: GroupedDataset needs a mapValues

2016-02-13 Thread Koert Kuipers
thanks, i will look into Aggregator as well

On Sun, Feb 14, 2016 at 12:31 AM, Michael Armbrust wrote:
> Instead of grouping with a lambda function, you can do it with a column
> expression to avoid materializing an unnecessary tuple:
>
> df.groupBy($"_1")
>
> Regarding …