Re: OOM writing out sorted RDD

2014-08-10 Thread Bharath Ravi Kumar
Update: as expected, switching to kryo merely delays the inevitable. Does anyone have experience controlling memory consumption while processing (e.g. writing out) imbalanced partitions? On 09-Aug-2014 10:41 am, Bharath Ravi Kumar reachb...@gmail.com wrote: Our prototype application reads a 20GB

OOM writing out sorted RDD

2014-08-08 Thread Bharath Ravi Kumar
Our prototype application reads a 20GB dataset from HDFS (nearly 180 partitions), groups it by key, sorts by rank and write out to HDFS in that order. The job runs against two nodes (16G, 24 cores per node available to the job). I noticed that the execution plan results in two sortByKey stages,