Can you try increasing the open-file limit (ulimit -n) on your machine?

On Mon, Feb 23, 2015 at 10:55 PM, Marius Soutier <mps....@gmail.com> wrote:

> Hi Sameer,
>
> I’m still using Spark 1.1.1; I believe the default there is the hash-based
> shuffle. No external shuffle service.
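>
> For reference, here is roughly how those settings would be pinned down
> explicitly on our side (a sketch from memory, so treat the exact keys and
> the app name as assumptions on my part):
>
>     import org.apache.spark.{SparkConf, SparkContext}
>
>     // Sketch: force the hash-based shuffle explicitly rather than relying on the default.
>     val conf = new SparkConf()
>       .setAppName("denormalize-json")          // hypothetical app name
>       .set("spark.shuffle.manager", "hash")    // "sort" would switch to the sort-based shuffle
>       // The external shuffle service flag only exists in newer releases (1.2+),
>       // so on 1.1.1 it is not an option for us:
>       // .set("spark.shuffle.service.enabled", "true")
>     val sc = new SparkContext(conf)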
>
> We are processing gzipped JSON files, so the number of partitions equals
> the number of input files (gzip isn’t splittable). In my current data set
> we have ~850 files that amount to 60 GB (so ~600 GB uncompressed). We have
> 5 workers with 8 cores and 48 GB RAM each. We extract five different groups
> of data from this to filter, clean and denormalize (i.e. join) it for easier
> downstream processing.
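>
> In case it helps, the job looks roughly like this (heavily simplified; the
> path, the record schema and the parser are made up for illustration):
>
>     // Hypothetical record schema and parser, standing in for our real ones.
>     case class Record(kind: String, userId: String, payload: String)
>     def parseJson(line: String): Seq[Record] = ???   // our real JSON parsing lives here
>
>     // Gzipped input: one (non-splittable) partition per file, ~850 in the current data set.
>     val raw    = sc.textFile("hdfs:///data/events/*.json.gz")   // hypothetical path
>     val parsed = raw.flatMap(parseJson)
>
>     // Two of the five extracted groups, filtered, cleaned and keyed...
>     val users  = parsed.filter(_.kind == "user").keyBy(_.userId)
>     val events = parsed.filter(_.kind == "event").keyBy(_.userId)
>
>     // ...then denormalized via a join for easier downstream processing.
>     val denormalized = events.join(users)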
>
> By the way, this code does not seem to complete at all unless I call
> coalesce() with a low partition count; 5 or 10 work great (roughly as in
> the sketch below). Anything above that makes a crash very likely, even on
> smaller datasets (~300 files). But I’m not sure whether this is related to
> the above issue.
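>
> Concretely, this is the step I mean (numbers as in our runs, output path
> made up):
>
>     // Completes reliably with a low partition count:
>     denormalized.coalesce(5).saveAsTextFile("hdfs:///out/denormalized")   // 5 or 10 both work
>
>     // Anything much higher is very likely to crash with lost executors, e.g.:
>     // denormalized.coalesce(100).saveAsTextFile("hdfs:///out/denormalized")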
>
>
> On 23.02.2015, at 18:15, Sameer Farooqui <same...@databricks.com> wrote:
>
> Hi Marius,
>
> Are you using the sort or hash shuffle?
>
> Also, do you have the external shuffle service enabled (so that the Worker
> JVM or NodeManager can still serve the map output files after an Executor
> crashes)?
>
> How many partitions are in your RDDs before and after the problematic
> shuffle operation?
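>
> You can check both quickly from spark-shell, something along these lines
> (beforeShuffle/afterShuffle are placeholders for your own RDDs):
>
>     // Which shuffle implementation the job is using ("hash" is the default in 1.1.x):
>     println(sc.getConf.get("spark.shuffle.manager", "hash"))
>
>     // Partition counts before and after the problematic shuffle step:
>     println(beforeShuffle.partitions.length)   // e.g. the parsed/filtered RDD
>     println(afterShuffle.partitions.length)    // e.g. the result of the join or coalesce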
>
>
>
> On Monday, February 23, 2015, Marius Soutier <mps....@gmail.com> wrote:
>
>> Hi guys,
>>
>> I keep running into a strange problem where my jobs start to fail with
>> the dreaded “Resubmitted (resubmitted due to lost executor)” status,
>> apparently because too many temp files from previous runs have accumulated.
>>
>> Both /var/run and /spill have enough disk space left, but after a certain
>> number of jobs have run, subsequent jobs struggle to complete. There are a
>> lot of failures without any exception message, only the above-mentioned
>> lost executor. As soon as I clear out /var/run/spark/work/ and the spill
>> disk, everything goes back to normal.
>>
>> Thanks for any hints,
>> - Marius
>>
>>
>>
>
