groupByKey() is a wide dependency and will cause a full shuffle. It's advised against using this transformation unless you keys are balanced (well-distributed) and you need a full shuffle.
Otherwise, what you want is aggregateByKey() or reduceByKey() (depending on the output). These actions are backed by comineByKey(), which can perform map-side aggregation. ------- Regards, Andy On Mon, Jan 16, 2017 at 2:21 PM, Patrick <titlibat...@gmail.com> wrote: > Hi, > > Does groupByKey has intelligence associated with it, such that if all the > keys resides in the same partition, it should not do the shuffle? > > Or user should write mapPartitions( scala groupBy code). > > Which would be more efficient and what are the memory considerations? > > > Thanks > > > >