On Wednesday 19 July 2017 06:20 PM, qihuagao wrote:
A Java pair RDD has aggregateByKey, which can avoid a full shuffle and so has
impressive performance. The aggregateByKey function requires 3 parameters:
# An initial 'zero' value used to start each key's aggregation
# A function (seqOp) that merges a value into the aggregate within a single partition
# A function (combOp) that merges the aggregates from different partitions
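Since running the real aggregateByKey needs a Spark cluster, here is a minimal plain-Scala sketch of its semantics (the object and method names are illustrative, not Spark APIs): seqOp folds values within each "partition", and combOp merges the per-partition aggregates. The map-side combining in the first step is what lets Spark avoid shuffling every record.

```scala
// Hypothetical sketch of aggregateByKey's semantics on plain Scala
// collections; each inner Seq stands in for one RDD partition.
object AggregateByKeySketch {
  def aggregateByKey[K, V, U](partitions: Seq[Seq[(K, V)]], zero: U)(
      seqOp: (U, V) => U, combOp: (U, U) => U): Map[K, U] = {
    // Step 1: fold values per key inside each partition (map-side combine).
    val perPartition: Seq[Map[K, U]] = partitions.map { part =>
      part.groupBy(_._1).map { case (k, kvs) =>
        k -> kvs.map(_._2).foldLeft(zero)(seqOp)
      }
    }
    // Step 2: merge the small per-partition aggregates with combOp.
    perPartition.foldLeft(Map.empty[K, U]) { (acc, m) =>
      m.foldLeft(acc) { case (a, (k, u)) =>
        a.updated(k, a.get(k).map(combOp(_, u)).getOrElse(u))
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // Two "partitions" of (word, 1) pairs; count occurrences per word.
    val parts = Seq(Seq(("a", 1), ("b", 1)), Seq(("a", 1)))
    val counts = aggregateByKey(parts, 0)(_ + _, _ + _)
    println(counts)
  }
}
```

Only one aggregate per key per partition crosses the merge boundary in step 2, which is the shuffle-reduction the first message describes.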
Sent from the Apache Spark User List mailing list archive at Nabble.com.
dev <d...@spark.apache.org>
Subject: Re: aggregateByKey on PairRDD
Hi,
shouldn't groupByKey be avoided
(https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html)?
Thank you,
Daniel
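The best practice Daniel links can be illustrated with a plain-Scala sketch (hypothetical names, no Spark): a groupByKey-style aggregation ships every record across the shuffle boundary, while a reduceByKey-style aggregation pre-combines within each partition so far fewer records move.

```scala
// Illustrative contrast between the two strategies; these are plain
// collection operations, not Spark APIs.
object ReduceVsGroupSketch {
  type Pair = (String, Int)

  // groupByKey-style: every value crosses the "shuffle", then is summed.
  def viaGroup(partitions: Seq[Seq[Pair]]): Map[String, Int] = {
    val shuffled = partitions.flatten // all records move
    shuffled.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).sum }
  }

  // reduceByKey-style: each partition pre-aggregates, so at most one
  // record per key per partition crosses the boundary.
  def viaReduce(partitions: Seq[Seq[Pair]]): (Map[String, Int], Int) = {
    val combined = partitions.map(_.groupBy(_._1).map {
      case (k, kvs) => k -> kvs.map(_._2).sum // map-side combine
    })
    val shuffledRecords = combined.map(_.size).sum // records that move
    val merged = combined.flatten.groupBy(_._1).map {
      case (k, kvs) => k -> kvs.map(_._2).sum
    }
    (merged, shuffledRecords)
  }

  def main(args: Array[String]): Unit = {
    val parts = Seq(Seq(("x", 1), ("x", 1), ("y", 1)), Seq(("x", 1)))
    println(viaGroup(parts))  // 4 records cross the boundary
    println(viaReduce(parts)) // only 3 records cross, same totals
  }
}
```

Both paths produce the same per-key totals; only the amount of data moved differs, which is why the linked page prefers reduceByKey (and aggregateByKey) over groupByKey.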
On Wed, Mar 30, 2016 at 9:01 AM, Akhil Das wrote:
Isn't it what tempRDD.groupByKey does?
Thanks
Best Regards
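What tempRDD.groupByKey would yield on the (brand, (product, key)) records from the original post can be sketched with plain Scala collections; groupBy on a Seq stands in here for groupByKey on an RDD, and the helper name is illustrative only.

```scala
// Sketch of tempRDD.groupByKey using plain collections (not Spark).
object GroupByBrandSketch {
  // Collect all (product, key) values under each brand.
  def groupByBrand(data: Seq[(String, (String, String))]): Map[String, Seq[(String, String)]] =
    data.groupBy(_._1).map { case (brand, kvs) => brand -> kvs.map(_._2) }

  def main(args: Array[String]): Unit = {
    // Sample records from the thread: (brand, (product, key))
    val tempData = Seq(
      ("amazon", ("book1", "tech")),
      ("eBay",   ("book1", "tech")),
      ("barns",  ("book",  "tech")),
      ("amazon", ("book2", "tech")))

    groupByBrand(tempData).foreach { case (brand, products) =>
      println(s"$brand -> $products")
    }
  }
}
```

For example, "amazon" ends up with both of its (product, key) pairs under one key, which is exactly the grouped shape Suniti asks about.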
On Wed, Mar 30, 2016 at 7:36 AM, Suniti Singh wrote:
Hi All,
I have an RDD with data in the following form:
tempRDD: RDD[(String, (String, String))]
(brand , (product, key))
("amazon",("book1","tech"))
("eBay",("book1","tech"))
("barns",("book","tech"))
("amazon",("book2","tech"))
I would like to group the data by Brand and would