Re: aggregateByKey - external combine function

2016-04-29 Thread Nirav Patel
Any thoughts? I can explain the problem in more detail, but basically the shuffle data doesn't seem to fit in reducer memory (32GB), and I am looking for ways to process it using disk plus memory. Thanks

On Thu, Apr 28, 2016 at 10:07 AM, Nirav Patel wrote:
> Hi,
>
> I tried to convert a

aggregateByKey - external combine function

2016-04-28 Thread Nirav Patel
Hi, I tried to convert a groupByKey operation to aggregateByKey in the hope of avoiding memory and high GC issues when dealing with 200GB of data. I needed to create a Collection of the resulting key-value pairs, which represents all combinations for a given key. My merge function definition is as follows: private
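
For readers following the thread, here is a minimal plain-Java sketch of the aggregateByKey idea described above: a zero value, a per-partition fold (seqOp), and a cross-partition merge (combOp). It is not the Spark API and not the poster's merge function, which is cut off in the archive. The class name `AggregateByKeySketch`, the helpers `kv`, `sampleData`, `collectAll` and `sumOnly`, and the sample data are all hypothetical. The sketch contrasts an accumulator that collects every value, which ends up holding as much per key as groupByKey would, with one that stays small.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Supplier;

// Plain-Java model of aggregateByKey semantics. This is NOT the Spark API;
// all names here are hypothetical and exist only for illustration.
final class AggregateByKeySketch {

    // Folds each partition with seqOp, then merges per-partition results with combOp.
    static <K, V, U> Map<K, U> aggregateByKey(
            List<List<Map.Entry<K, V>>> partitions,
            Supplier<U> zero,
            BiFunction<U, V, U> seqOp,
            BiFunction<U, U, U> combOp) {
        Map<K, U> merged = new HashMap<>();
        for (List<Map.Entry<K, V>> partition : partitions) {
            // "Map-side" step: one accumulator per key within this partition.
            Map<K, U> local = new HashMap<>();
            for (Map.Entry<K, V> e : partition) {
                U acc = local.containsKey(e.getKey()) ? local.get(e.getKey()) : zero.get();
                local.put(e.getKey(), seqOp.apply(acc, e.getValue()));
            }
            // "Reduce-side" step: combine accumulators that share a key.
            for (Map.Entry<K, U> e : local.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), combOp);
            }
        }
        return merged;
    }

    static Map.Entry<String, Integer> kv(String k, int v) {
        return new AbstractMap.SimpleEntry<>(k, v);
    }

    // Two made-up partitions of (key, value) pairs.
    static List<List<Map.Entry<String, Integer>>> sampleData() {
        return Arrays.asList(
                Arrays.asList(kv("a", 1), kv("b", 2), kv("a", 3)),
                Arrays.asList(kv("a", 4), kv("b", 5)));
    }

    // Accumulator keeps every value: per-key size grows just as with groupByKey.
    static Map<String, List<Integer>> collectAll(List<List<Map.Entry<String, Integer>>> parts) {
        return AggregateByKeySketch.<String, Integer, List<Integer>>aggregateByKey(
                parts,
                () -> new ArrayList<Integer>(),
                (acc, v) -> { acc.add(v); return acc; },
                (a, b) -> { a.addAll(b); return a; });
    }

    // Accumulator stays constant-size: this is where the pattern saves memory.
    static Map<String, Integer> sumOnly(List<List<Map.Entry<String, Integer>>> parts) {
        return AggregateByKeySketch.<String, Integer, Integer>aggregateByKey(
                parts, () -> 0, (acc, v) -> acc + v, (a, b) -> a + b);
    }
}
```

If the accumulator has to hold all combinations for a key, as in `collectAll`, the per-key result is no smaller than the grouped values, so switching from groupByKey to this pattern would not by itself be expected to relieve reducer memory.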