Re: Executor lost with too many temp files

2015-02-26 Thread Marius Soutier
Yeah, did that already (65k). We also disabled swapping and reduced the amount of memory allocated to Spark (available memory minus 4 GB). This seems to have resolved the situation. Thanks! On 26.02.2015, at 05:43, Raghavendra Pandey raghavendra.pan...@gmail.com wrote: Can you try increasing the ulimit…
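
For reference, a minimal sketch of those three changes on a standalone worker, assuming Spark runs as user spark and each node has 64 GB of RAM (both values are assumptions, not from the thread):

    # /etc/security/limits.conf -- raise the open-file limit to ~65k for the spark user
    spark  soft  nofile  65536
    spark  hard  nofile  65536

    # disable swapping now (remove the swap entry from /etc/fstab to persist)
    sudo swapoff -a

    # conf/spark-env.sh -- leave ~4 GB of headroom for the OS and other daemons
    export SPARK_WORKER_MEMORY=60g

Note the nofile limit must be in effect for the environment that launches the Spark daemons; a value set only in an interactive shell will not reach the workers.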

Re: Executor lost with too many temp files

2015-02-25 Thread Raghavendra Pandey
Can you try increasing the ulimit -n on your machine? On Mon, Feb 23, 2015 at 10:55 PM, Marius Soutier mps@gmail.com wrote: Hi Sameer, I’m still using Spark 1.1.1, I think the default is hash shuffle. No external shuffle service. We are processing gzipped JSON files, the partitions are…
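
To check and raise the per-process file-descriptor limit for the current shell (the hard limit caps how far a non-root user can raise the soft limit):

    ulimit -n        # current soft limit on open file descriptors
    ulimit -Hn       # hard limit
    ulimit -n 65536  # raise the soft limit for this shell and its children

For a permanent change that also covers the Spark daemons, set nofile limits in /etc/security/limits.conf instead.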

Re: Executor lost with too many temp files

2015-02-23 Thread Sameer Farooqui
Hi Marius, Are you using the sort or hash shuffle? Also, do you have the external shuffle service enabled (so that the Worker JVM or NodeManager can still serve the map spill files after an Executor crashes)? How many partitions are in your RDDs before and after the problematic shuffle?
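
For context, these are the knobs being asked about, in spark-defaults.conf form (a sketch: spark.shuffle.manager accepts hash or sort, with sort becoming the default in Spark 1.2, and the external shuffle service only exists from Spark 1.2 onward):

    # conf/spark-defaults.conf
    spark.shuffle.manager           sort   # 1.1.x defaults to hash
    spark.shuffle.service.enabled   true   # Spark 1.2+: keeps serving map output after an executor dies
    spark.shuffle.consolidateFiles  true   # hash shuffle only: one file pool per core instead of per map task

On YARN, enabling the external service additionally requires running the shuffle service inside each NodeManager.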

Re: Executor lost with too many temp files

2015-02-23 Thread Marius Soutier
Hi Sameer, I’m still using Spark 1.1.1, I think the default is hash shuffle. No external shuffle service. We are processing gzipped JSON files; the number of partitions equals the number of input files. In my current data set we have ~850 files that amount to 60 GB (so ~600 GB uncompressed). We have 5…
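
These numbers explain the file explosion. Gzip is not splittable, so each of the ~850 files becomes exactly one map partition, and hash shuffle writes one file per map task per reduce partition. Taking R = 200 reduce partitions as an illustrative value (not stated in the thread):

    map tasks (one per gzip file)         ≈ 850
    reduce partitions R (assumed)         = 200
    shuffle files written per job         = 850 × 200 = 170,000
    fds held open per node while mapping  = (concurrent tasks) × R

So files pile up on disk across runs even when the per-process fd limit is never exceeded during a single job; spark.shuffle.consolidateFiles or the sort shuffle (see above) cuts the file count down dramatically.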

Executor lost with too many temp files

2015-02-23 Thread Marius Soutier
Hi guys, I keep running into a strange problem where my jobs start to fail with the dreaded “Resubmitted (resubmitted due to lost executor)” because of having too many temp files from previous runs. Both /var/run and /spill have enough disk space left, but after a given number of jobs have…
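
If the accumulating files are shuffle/spill directories left behind by earlier applications, the standalone Worker can purge them itself. A sketch, assuming standalone mode and that /spill is the configured scratch directory (these property names exist in Spark 1.0+; cleanup only removes data of stopped applications):

    # conf/spark-env.sh on each worker
    export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
      -Dspark.worker.cleanup.interval=1800 \
      -Dspark.worker.cleanup.appDataTtl=604800"

    # conf/spark-defaults.conf (application side)
    spark.local.dir   /spill   # where shuffle and spill files are written

The interval is in seconds between sweeps, and appDataTtl (604800 s = 7 days) controls how old an application directory must be before it is deleted.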