Hi Stephen,

If you use aggregate functions or reduceGroup on KeyValueGroupedDataset it
behaves as reduceByKey on RDD.

Only if you use flatMapGroups and mapGroups  it behaves as groupByKey on
RDD and if you read the API documentation it warns of using the API.

Hope this helps.

Thanks
Ankur

On Fri, Apr 7, 2017 at 7:26 AM, Stephen Fletcher <stephen.fletc...@gmail.com
> wrote:

> Are there plans to add reduceByKey to dataframes, Since switching over to
> spark 2 I find myself increasing dissatisfied with the idea of converting
> dataframes to RDD to do procedural programming on grouped data(both from a
> ease of programming stance and performance stance). So I've been using
> Dataframe's experimental groupByKey and flatMapGroups which perform
> extremely well, I'm guessing because of the encoders, but the amount of
> data being transfers is a little excessive. Is there any plans to port
> reduceByKey ( and additionally a reduceByKeyleft and right)?
>
  • reducebykey Stephen Fletcher
    • Re: reducebykey Ankur Srivastava

Reply via email to