Do we have any update on this thread? Has anyone encountered and solved
similar problems before?
Any pointers will be greatly appreciated!
Best,
XianXing
On Mon, Jun 15, 2015 at 11:48 PM, Jia Yu jia...@asu.edu wrote:
Hi Peng,
I got exactly the same error! My shuffle data is also very large. Have you
figured out a method to solve that?
Are you using YARN?
If yes, increase the YARN memory overhead option. YARN is probably killing
your executors.
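For reference, in Spark 1.x (current at the time of this thread) the option is set per executor and specified in megabytes. A sketch of what the submit command might look like; the jar name and memory sizes here are placeholders, not values from this thread:

```shell
# Give each executor extra off-heap headroom so YARN does not kill
# the container when shuffle buffers push it past --executor-memory.
# (spark.yarn.executor.memoryOverhead is in MB in Spark 1.x.)
spark-submit \
  --master yarn-cluster \
  --executor-memory 8g \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  my_job.jar
```

If this is the failure mode, the YARN NodeManager logs will typically show containers being killed for running beyond physical memory limits.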
On Fri, Jun 26, 2015 at 8:43 PM, XianXing Zhang xianxing.zh...@gmail.com wrote:
Do we have any update on this thread? Has anyone encountered and solved
similar problems before?
Any pointers will be greatly appreciated!
Yes, we deployed Spark on top of YARN.
What you suggested is very helpful: I increased the YARN memory overhead
option and it helped in most cases. (Sometimes it still has some failures
when the amount of data to be shuffled is large, but I guess if I continue
increasing the YARN memory overhead
Hi Peng,
I got exactly the same error! My shuffle data is also very large. Have you
figured out a method to solve that?
Thanks,
Jia
On Fri, Apr 24, 2015 at 7:59 AM, Peng Cheng pc...@uow.edu.au wrote:
I'm deploying a Spark data processing job on an EC2 cluster. The job is small
for the cluster (16 cores with 120G RAM in total), and the largest RDD has
only 76k+ rows, but it is heavily skewed in the middle (thus requiring
repartitioning), and each row has around 100k of data after serialization. The job
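On the repartitioning point: none of this comes from the thread itself, but the usual workaround for a heavily skewed RDD is to "salt" the hot keys before repartitioning, so that a single hot key does not send most of the rows to one partition. A minimal sketch of the idea in plain Python (no Spark dependency; the key names, partition count, and crc32-based partitioner are illustrative stand-ins for Spark's HashPartitioner):

```python
import random
import zlib
from collections import Counter

random.seed(0)  # deterministic salts for the example

NUM_PARTITIONS = 8

def salted_partition(key, num_partitions):
    # Append a random salt to the key so a single hot key spreads
    # across up to num_partitions partitions. In Spark this would be
    # done by mapping (k, v) to ((k, salt), v) before partitionBy().
    salt = random.randrange(num_partitions)
    # zlib.crc32 is a deterministic stand-in for a hash partitioner.
    return zlib.crc32(f"{key}:{salt}".encode()) % num_partitions

# Heavily skewed input: one key accounts for 90% of the rows.
rows = [("hot", i) for i in range(900)] + [(f"cold{i}", i) for i in range(100)]

sizes = Counter(salted_partition(k, NUM_PARTITIONS) for k, _ in rows)

# Without salting, every "hot" row would land in the same partition;
# with salting, the 900 rows spread over several partitions.
print(sorted(sizes.values(), reverse=True))
```

After the skewed stage you strip the salt and combine the partial results per original key; the cost is one extra, much smaller shuffle.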