Yeah did that already (65k). We also disabled swapping and reduced the amount
of memory allocated to Spark (available - 4). This seems to have resolved the
situation.
Thanks!
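A rough sketch of what that fix looks like in practice, assuming Spark standalone mode; the exact commands and the 28g value are only illustrative, and only the ~65k limit and the "available minus 4" rule come from this thread:

    import org.apache.spark.{SparkConf, SparkContext}

    // OS-level steps reported above, applied on every worker node:
    //   ulimit -n 65535   // raise the open-file limit to ~65k
    //   swapoff -a        // disable swapping
    //
    // Spark side: give the executors "available memory minus 4 GB".
    // The 28g below is only an example for a 32 GB machine.
    val conf = new SparkConf()
      .setAppName("gzipped-json-job")
      .set("spark.executor.memory", "28g")
    val sc = new SparkContext(conf)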
On 26.02.2015, at 05:43, Raghavendra Pandey raghavendra.pan...@gmail.com wrote:
Can you try increasing the ulimit -n on your machine.
On Mon, Feb 23, 2015 at 10:55 PM, Marius Soutier mps@gmail.com wrote:
Hi Sameer,
I’m still using Spark 1.1.1, I think the default is hash shuffle. No
external shuffle service.
We are processing gzipped JSON files; the partitions are the number of input
files. In my current data set we have ~850 files that amount to 60 GB (so ~600
GB uncompressed). We have 5
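Gzip is not a splittable codec, so each .gz file is read as a single partition, which is why the partition count equals the number of input files here. A minimal sketch of spreading that data out after the read; the path and the target of 1000 partitions are made-up example values, and an existing SparkContext `sc` is assumed:

    // Each .gz file yields exactly one partition, so ~850 input files
    // means ~850 partitions regardless of how large they decompress to.
    val raw = sc.textFile("hdfs:///data/events/*.json.gz")   // hypothetical path

    // Redistribute the records before the shuffle-heavy stages; 1000 is
    // only an illustrative target, not a recommendation.
    val spread = raw.repartition(1000)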
Hi Marius,
Are you using the sort or hash shuffle?
Also, do you have the external shuffle service enabled (so that the Worker
JVM or NodeManager can still serve the map spill files after an Executor
crashes)?
How many partitions are in your RDDs before and after the problematic
shuffle?
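For readers following along, here is roughly how the three things asked about show up in configuration and code. The values are illustrative only, and "spark.shuffle.service.enabled" is the property name used by later Spark releases, not by 1.1.x:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("shuffle-check")
      .set("spark.shuffle.manager", "sort")          // Spark 1.1.x defaults to "hash"
      .set("spark.shuffle.service.enabled", "true")  // external shuffle service
    val sc = new SparkContext(conf)

    // Partition counts before and after a shuffle (distinct() forces one);
    // the input path is hypothetical.
    val before = sc.textFile("hdfs:///data/events/*.json.gz")
    val after  = before.distinct()
    println(s"before: ${before.partitions.length}, after: ${after.partitions.length}")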
Hi guys,
I keep running into a strange problem where my jobs start to fail with the
dreaded “Resubmitted (resubmitted due to lost executor)” because of having too
many temp files from previous runs.
Both /var/run and /spill have enough disk space left, but after a given number
of jobs have
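For anyone hitting the same symptom, the settings usually involved when old shuffle/temp files pile up are sketched below. This assumes the standalone deploy mode; the interval/TTL values and the /spill/spark-tmp path are only examples:

    import org.apache.spark.{SparkConf, SparkContext}

    // Worker-side cleanup of finished applications' work/ directories is off
    // by default in standalone mode; it is switched on through the worker JVM
    // options, e.g. in spark-env.sh (values are examples, not recommendations):
    //   SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
    //                      -Dspark.worker.cleanup.interval=1800 \
    //                      -Dspark.worker.cleanup.appDataTtl=86400"
    //
    // Application side: point shuffle/spill files at a disk with enough space
    // (unless the cluster manager overrides it). "/spill/spark-tmp" is a
    // hypothetical path standing in for the /spill mount mentioned above.
    val conf = new SparkConf()
      .setAppName("cleanup-example")
      .set("spark.local.dir", "/spill/spark-tmp")
    val sc = new SparkContext(conf)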