Re: about aggregateByKey of pairrdd.

2017-07-19 Thread Sumedh Wale
On Wednesday 19 July 2017 06:20 PM, qihuagao wrote: Java pair RDD has aggregateByKey, which can avoid a full shuffle and so has impressive performance. The aggregateByKey function requires 3 parameters: # An initial ‘zero’ value that will
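The three arguments are a zero value, a `seqOp` that folds one value into a per-partition accumulator, and a `combOp` that merges accumulators coming from different partitions. In the Scala API the shape is `aggregateByKey[U](zeroValue: U)(seqOp: (U, V) => U, combOp: (U, U) => U)`, and `JavaPairRDD` exposes `aggregateByKey(zeroValue, seqFunc, combFunc)`. The speed-up comes from the first step: values are combined inside each partition before the shuffle, so roughly one record per key per partition crosses the network instead of every record. Because the zero value is applied once per key per partition, it should be neutral for `combOp` (for example `0` for a sum). The sketch below is a hypothetical plain-Scala model of those semantics with no Spark dependency; `AggregateByKeyModel` and the nested-`Seq` stand-in for partitions are illustrative assumptions, not Spark's implementation.

```scala
object AggregateByKeyModel {
  // Hypothetical model of Spark's aggregateByKey on local data (not Spark code).
  // Each inner Seq stands in for one RDD partition.
  def aggregateByKey[K, V, U](partitions: Seq[Seq[(K, V)]], zeroValue: U)(
      seqOp: (U, V) => U,
      combOp: (U, U) => U): Map[K, U] = {
    // Step 1 (map side): inside each partition, fold values per key,
    // starting every key from zeroValue.
    val perPartition: Seq[Map[K, U]] = partitions.map { part =>
      part.foldLeft(Map.empty[K, U]) { case (acc, (k, v)) =>
        acc.updated(k, seqOp(acc.getOrElse(k, zeroValue), v))
      }
    }
    // Step 2 (after the shuffle): merge the per-partition accumulators with combOp.
    perPartition.foldLeft(Map.empty[K, U]) { (merged, partial) =>
      partial.foldLeft(merged) { case (acc, (k, u)) =>
        acc.updated(k, acc.get(k).map(combOp(_, u)).getOrElse(u))
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(
      Seq("a" -> 1, "b" -> 2, "a" -> 3),
      Seq("a" -> 4, "b" -> 5)
    )
    // Per-key average via a (sum, count) accumulator; (0, 0) is the zero value.
    val sumCount = aggregateByKey(partitions, (0, 0))(
      (acc, v) => (acc._1 + v, acc._2 + 1),
      (x, y) => (x._1 + y._1, x._2 + y._2))
    val averages = sumCount.map { case (k, (s, c)) => k -> s.toDouble / c }
    println(averages)
  }
}
```

With the sample partitions, key `a` accumulates `(4, 2)` in the first partition and `(4, 1)` in the second, which `combOp` merges into `(8, 3)`.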

about aggregateByKey of pairrdd.

2017-07-19 Thread qihuagao
-user-list.1001560.n3.nabble.com/about-aggregateByKey-of-pairrdd-tp28878.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: aggregateByKey on PairRDD

2016-03-30 Thread write2sivakumar@gmail
Hi, shouldn't groupByKey be avoided (https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html)? Thank you, Daniel. On Wed, Mar 30, 2016 at 9:01 AM, Akhi

Re: aggregateByKey on PairRDD

2016-03-30 Thread Daniel Haviv
Hi, shouldn't groupByKey be avoided (https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html)? Thank you, Daniel. On Wed, Mar 30, 2016 at 9:01 AM, Akhil Das wrote: > Isn't it what
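The linked best-practice note comes down to how many records cross the shuffle. As a rough illustration, the hypothetical plain-Scala model below (not Spark code; `GroupVsReduceModel`, `viaGroup` and `viaReduce` are made-up names) sums values per key two ways and counts the records each plan would ship: a groupByKey-style plan sends every record and sums afterwards, while a reduceByKey-style plan sums inside each partition first and ships only the partial sums. aggregateByKey gets the same map-side benefit as reduceByKey.

```scala
object GroupVsReduceModel {
  // Hypothetical model: each inner Seq stands in for one RDD partition.

  // groupByKey-style plan: every record is shuffled, then summed per key.
  def viaGroup[K](parts: Seq[Seq[(K, Int)]]): (Map[K, Int], Int) = {
    val shuffled = parts.flatten
    val sums = shuffled.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).sum }
    (sums, shuffled.size)
  }

  // reduceByKey-style plan: sum per key inside each partition first,
  // then shuffle only the partial sums (one record per key per partition).
  def viaReduce[K](parts: Seq[Seq[(K, Int)]]): (Map[K, Int], Int) = {
    val partials = parts.map(p => p.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).sum })
    val shuffled = partials.flatMap(_.toSeq)
    val sums = shuffled.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).sum }
    (sums, shuffled.size)
  }

  def main(args: Array[String]): Unit = {
    val parts = Seq(Seq("a" -> 1, "a" -> 1, "b" -> 1), Seq("a" -> 1, "b" -> 1, "b" -> 1))
    val (g, gShuffled) = viaGroup(parts)
    val (r, rShuffled) = viaReduce(parts)
    println(s"groupByKey-style: $g, records shuffled = $gShuffled")
    println(s"reduceByKey-style: $r, records shuffled = $rShuffled")
  }
}
```

On these two partitions both plans produce the same sums, but the first ships 6 records and the second 4; the gap grows with the number of repeated keys per partition.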

Re: aggregateByKey on PairRDD

2016-03-30 Thread Akhil Das
Isn't it what tempRDD.groupByKey does? Thanks, Best Regards. On Wed, Mar 30, 2016 at 7:36 AM, Suniti Singh wrote: > Hi All, > I have an RDD with data in the following form: > tempRDD: RDD[(String, (String, String))] > (brand, (product, key))

aggregateByKey on PairRDD

2016-03-29 Thread Suniti Singh
Hi All, I have an RDD with data in the following form:
tempRDD: RDD[(String, (String, String))]
(brand, (product, key))
("amazon",("book1","tech"))
("eBay",("book1","tech"))
("barns",("book","tech"))
("amazon",("book2","tech"))
I would like to group the data by Brand and would
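Assuming the goal is to collect each brand's (product, key) pairs, the expected result on the four sample records can be checked locally. The sketch below is plain Scala over an in-memory `Seq`, not Spark code, and `GroupByBrand` is a hypothetical name. On the RDD itself the corresponding calls would be `tempRDD.groupByKey()` or `tempRDD.aggregateByKey(List.empty[(String, String)])((acc, v) => v :: acc, _ ::: _)`; since every value is kept either way, map-side combining saves little here, so the groupByKey advice in the thread matters mainly when the grouped values are then reduced.

```scala
object GroupByBrand {
  // Sample records copied from the question: (brand, (product, key)).
  val records: Seq[(String, (String, String))] = Seq(
    ("amazon", ("book1", "tech")),
    ("eBay", ("book1", "tech")),
    ("barns", ("book", "tech")),
    ("amazon", ("book2", "tech"))
  )

  // Local equivalent of grouping by brand: collect every (product, key) per brand.
  // groupBy keeps the original relative order of values within each brand.
  def groupByBrand(rs: Seq[(String, (String, String))]): Map[String, Seq[(String, String)]] =
    rs.groupBy(_._1).map { case (brand, kvs) => brand -> kvs.map(_._2) }

  def main(args: Array[String]): Unit =
    groupByBrand(records).foreach { case (brand, products) =>
      println(s"$brand -> ${products.mkString(", ")}")
    }
}
```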