In a nutshell: because groupBy moves all of your data across the cluster, whereas other operations (e.g. reduce) summarize it in one form or another first and move only the summary.
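
To make that concrete, here is a rough sketch in plain Python (not the Spark API) that simulates how many records would have to be moved in each case. The two-partition toy dataset and the function names are made up for illustration, and real Spark shuffles involve more than this (serialization, spilling, partitioners), so treat it as a mental model only.

```python
from collections import defaultdict

def shuffle_group_by_key(partitions):
    """Simulate a groupByKey-style sum: every (key, value) record is moved."""
    # No map-side combine: each record leaves its partition as-is.
    shuffled = [pair for part in partitions for pair in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    sums = {key: sum(values) for key, values in grouped.items()}
    return sums, len(shuffled)

def shuffle_reduce_by_key(partitions):
    """Simulate a reduceByKey-style sum: combine per partition, then move."""
    shuffled = []
    for part in partitions:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value  # summarize before anything is moved
        shuffled.extend(local.items())  # one record per key per partition
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)

# Hypothetical data: two partitions of (word, 1) pairs.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("b", 1), ("a", 1), ("c", 1)],
]

group_sums, group_moved = shuffle_group_by_key(partitions)
reduce_sums, reduce_moved = shuffle_reduce_by_key(partitions)

print(group_sums == reduce_sums)  # same answer either way
print(group_moved, reduce_moved)  # records moved: 8 vs 5
```

Both approaches give the same sums, but the second one moves fewer records, and the gap grows with the number of repeated keys per partition. In PySpark the corresponding choice would be roughly `rdd.reduceByKey(lambda a, b: a + b)` instead of `rdd.groupByKey().mapValues(sum)`.
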
For the longer answer: http://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html

FG

On Wed, Feb 18, 2015 at 10:33 AM, shahab <shahab.mok...@gmail.com> wrote:
> Hi,
>
> Based on what I could see in the Spark UI, I noticed that "groupBy"
> transformation is quite slow (taking a lot of time) compared to other
> operations.
>
> Is there any reason that groupBy is slow?
>
> shahab