Any thoughts?
I can explain the problem in more detail, but basically the shuffle data
doesn't seem to fit in reducer memory (32 GB), and I am looking for ways to
process it using disk plus memory.
Thanks
On Thu, Apr 28, 2016 at 10:07 AM, Nirav Patel wrote:
Hi,
I tried to convert a groupByKey operation to aggregateByKey in the hope of
avoiding memory and high GC issues when dealing with 200 GB of data.
I needed to create a Collection of the resulting key-value pairs that
represents all combinations for a given key.
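For readers following along, here is a rough sketch of the shape of such a conversion. It is not the actual job: it simulates the aggregateByKey contract locally in plain Python, without Spark. The names seq_op, comb_op and aggregate_by_key, and the sample data, are hypothetical. It also assumes that "all combinations" means pairwise combinations of the values under each key, which may not match the real requirement.

```python
from itertools import combinations


def seq_op(acc, value):
    """Fold one value into a partition-local accumulator (a list here)."""
    acc.append(value)
    return acc


def comb_op(left, right):
    """Merge two accumulators that were built on different partitions."""
    left.extend(right)
    return left


def aggregate_by_key(partitions, zero_factory, seq_op, comb_op):
    """Local stand-in for the aggregateByKey contract; this is not Spark."""
    partials = []
    for part in partitions:
        local = {}
        for key, value in part:
            if key not in local:
                # Each key gets a fresh zero value per partition.
                local[key] = zero_factory()
            local[key] = seq_op(local[key], value)
        partials.append(local)

    # "Shuffle" step: merge the per-partition accumulators key by key.
    merged = {}
    for local in partials:
        for key, acc in local.items():
            merged[key] = comb_op(merged[key], acc) if key in merged else acc
    return merged


# Hypothetical sample data: two partitions of (key, value) records.
partitions = [[("a", 1), ("b", 10), ("a", 2)], [("a", 3), ("b", 20)]]
grouped = aggregate_by_key(partitions, list, seq_op, comb_op)

# Assumed meaning of "all combinations": every pair of values under a key.
pairs = {k: list(combinations(sorted(v), 2)) for k, v in grouped.items()}
```

In this sketch the accumulator is a list, so seq_op runs once per record within a partition and comb_op runs once per pair of partial lists being merged.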
My merge function definition is as follows:
private