In a nutshell: because groupBy shuffles all of your data, whereas other
operations (e.g. reduce) summarize it in one form or another before moving
it, so far less has to be moved.
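
To make that concrete, here is a rough sketch in plain Python (not Spark
itself) that counts how many records would have to cross the shuffle in each
case. The function names and the sample partitions are made up for
illustration, and the model assumes a simple per-key sum with a combine step
on each partition before the shuffle; real Spark jobs involve serialization,
spilling and partitioning details that this ignores.

```python
from collections import defaultdict


def shuffle_group_by_key(partitions):
    """groupByKey-style: every (key, value) record is moved, then summed."""
    # Hypothetical model: the "shuffle" is just the list of records moved.
    shuffled = [record for part in partitions for record in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    totals = {key: sum(values) for key, values in grouped.items()}
    return totals, len(shuffled)


def shuffle_reduce_by_key(partitions):
    """reduceByKey-style: sum per key inside each partition, then move."""
    shuffled = []
    for part in partitions:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value  # combine before anything is moved
        shuffled.extend(local.items())
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value  # merge the partial sums after the shuffle
    return dict(totals), len(shuffled)


# Two illustrative partitions of (key, 1) pairs, as in a word count.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("b", 1), ("a", 1), ("c", 1)],
]

grouped_totals, grouped_moved = shuffle_group_by_key(partitions)
reduced_totals, reduced_moved = shuffle_reduce_by_key(partitions)

print(grouped_totals == reduced_totals)  # same answer either way
print(grouped_moved, reduced_moved)      # records moved in each case
```

Both approaches give the same totals, but the grouping version moves every
record while the reducing version moves at most one record per key per
partition. The gap grows with the number of repeated keys in each partition.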




For the longer answer:

http://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html



—
FG

On Wed, Feb 18, 2015 at 10:33 AM, shahab <shahab.mok...@gmail.com> wrote:

> Hi,
> Based on what I could see in the Spark UI, I noticed that the "groupBy"
> transformation is quite slow (taking a lot of time) compared to other
> operations.
> Is there any reason that groupBy is slow?
> shahab
