groupByKey() is a wide dependency and will cause a full shuffle. Using this
transformation is not advised unless your keys are balanced
(well-distributed) and you need a full shuffle.

Otherwise, what you want is aggregateByKey() or reduceByKey() (depending on
the output). These transformations are backed by combineByKey(), which can
perform map-side aggregation.
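
To see why map-side aggregation matters, here is a rough sketch in plain
Python (not Spark code) that models the two strategies on a per-key sum. The
function names and the sample data are made up for illustration, and the
"shuffle" is just a list of the records that would be sent across the
network. It only counts records; it does not model partitioners or memory.

```python
from collections import defaultdict


def shuffle_group_by_key(partitions):
    """Toy model of groupByKey: every (key, value) record is shuffled."""
    # No map-side combine: all records leave their partition unchanged.
    shuffled = [record for part in partitions for record in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    # The aggregation (a sum here) happens only after the shuffle.
    totals = {key: sum(values) for key, values in grouped.items()}
    return totals, len(shuffled)


def shuffle_reduce_by_key(partitions):
    """Toy model of reduceByKey: combine inside each partition first."""
    shuffled = []
    for part in partitions:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value  # map-side aggregation
        # At most one record per key leaves each partition.
        shuffled.extend(local.items())
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value  # reduce-side merge of partial sums
    return dict(totals), len(shuffled)


# Hypothetical input: two partitions of (key, value) pairs.
partitions = [
    [("a", 1), ("a", 2), ("b", 3)],
    [("a", 4), ("b", 5), ("b", 6)],
]

grouped_totals, grouped_records = shuffle_group_by_key(partitions)
reduced_totals, reduced_records = shuffle_reduce_by_key(partitions)
print(grouped_totals, grouped_records)
print(reduced_totals, reduced_records)
```

Both strategies give the same per-key sums, but the second one shuffles one
record per key per partition instead of every record. The gap grows with the
number of values per key, which is also why skewed keys hurt groupByKey().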

-------
Regards,
Andy

On Mon, Jan 16, 2017 at 2:21 PM, Patrick <titlibat...@gmail.com> wrote:

> Hi,
>
> Does groupByKey have intelligence associated with it, such that if all the
> keys reside in the same partition, it does not do the shuffle?
>
> Or should the user write mapPartitions (with Scala groupBy code)?
>
> Which would be more efficient and what are the memory considerations?
>
> Thanks
