Re: an OOM while persist as DISK_ONLY

2016-03-03 Thread Eugen Cepoi
We are in the process of upgrading to Spark 1.6 from 1.4 and have had a hard time getting some of our more memory- and join-intensive jobs to work (RDD caching plus a lot of shuffling). Most of the time they were getting killed by YARN. Increasing the overhead was of course an option, but the increase to make
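A minimal sketch of the kind of overhead increase mentioned above, assuming a YARN deployment and the Spark 1.x configuration keys; the app name and the numbers are illustrative, not the poster's actual settings:

    // Sketch only: raising the YARN memory overhead alongside the executor heap.
    // spark.yarn.executor.memoryOverhead takes a value in MB on top of the heap.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("join-heavy-job")                        // placeholder name
      .set("spark.executor.memory", "24g")                 // illustrative heap size
      .set("spark.yarn.executor.memoryOverhead", "4096")   // illustrative overhead, in MB

    val sc = new SparkContext(conf)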

Re: an OOM while persist as DISK_ONLY

2016-03-03 Thread Ted Yu
bq. that solved some problems

Is there any problem that was not solved by the tweak? Thanks

On Thu, Mar 3, 2016 at 4:11 PM, Eugen Cepoi wrote:
> You can limit the amount of memory Spark will use for shuffle even in 1.6.
> You can do that by tweaking the
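The quoted excerpt is cut off, so the exact tweak is not shown. One way to cap shuffle memory in Spark 1.6 is to fall back to the legacy memory manager; a sketch under that assumption, with illustrative fractions:

    // Sketch of limiting shuffle memory in Spark 1.6 via the legacy memory manager.
    // Whether this is the exact tweak the reply goes on to describe is an assumption,
    // since the excerpt is truncated.
    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.memory.useLegacyMode", "true")       // revert to pre-1.6 memory accounting
      .set("spark.shuffle.memoryFraction", "0.2")      // cap on shuffle memory (illustrative)
      .set("spark.storage.memoryFraction", "0.4")      // cap on cached blocks (illustrative)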

Re: an OOM while persist as DISK_ONLY

2016-03-03 Thread Andy Dang
Spark's shuffling algorithm is very aggressive about storing everything in RAM, and the behavior is worse in 1.6 with the UnifiedMemoryManager. At least in previous versions you could limit the shuffle memory, but Spark 1.6 will use as much memory as it can get. What I see is that Spark seems to
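For reference, the knobs the unified memory manager exposes in 1.6; execution (shuffle) and storage share one pool, so there is no hard shuffle-only cap. The values below are illustrative, not a recommendation from the thread:

    // Sketch of tuning the Spark 1.6 unified memory manager.
    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.memory.fraction", "0.6")         // share of the heap used for execution + storage
      .set("spark.memory.storageFraction", "0.5")  // portion of that pool protected for storage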

an OOM while persist as DISK_ONLY

2016-02-22 Thread Alex Dzhagriev
Hello all, I'm using Spark 1.6 and trying to cache a dataset that is 1.5 TB. I have only ~800 GB of RAM in total, so I am choosing the DISK_ONLY storage level. Unfortunately, I'm exceeding the memory overhead limit: Container killed by YARN for exceeding memory limits. 27.0 GB of 27 GB
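A sketch of the setup being described: persisting the dataset to disk only and then materializing it. The input path and app name are placeholders, not details from the message:

    // Sketch of caching an RDD with the DISK_ONLY storage level, as in the question above.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    val conf = new SparkConf().setAppName("disk-only-cache")   // placeholder name
    val sc = new SparkContext(conf)

    val data = sc.textFile("hdfs:///path/to/large/dataset")    // placeholder path
    data.persist(StorageLevel.DISK_ONLY)                       // spill blocks to disk, not RAM
    data.count()                                               // force the cache to materialize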