groupByKey() is a wide dependency and will cause a full shuffle. It's
advised against using this transformation unless you keys are balanced
(well-distributed) and you need a full shuffle.
Otherwise, what you want is aggregateByKey() or reduceByKey() (depending on
the output). These actions are
Hi,
Does groupByKey has intelligence associated with it, such that if all the
keys resides in the same partition, it should not do the shuffle?
Or user should write mapPartitions( scala groupBy code).
Which would be more efficient and what are the memory considerations?
Thanks