Hi Irina,

I would question the use of multiple threads in your application. Since
Spark is going to use all the cores of your cluster to process each
DataFrame, the concurrent jobs will compete for resources: not only for CPU
cores, but also for memory.
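
To put rough numbers on that, here is a back-of-the-envelope check using the
settings from your message. It assumes they take effect as intended (3
executors with 15 cores, 12g of heap and 9g of overhead each), and the
variable names are only for illustration.

```python
# Back-of-the-envelope resource check. Assumes the quoted settings take
# effect as intended: 3 executors, each with 15 cores, 12g heap, 9g overhead.
EXECUTORS = 3
CORES_PER_EXECUTOR = 15
EXECUTOR_HEAP_GB = 12
EXECUTOR_OVERHEAD_GB = 9
DRIVER_THREADS = 10  # jobs submitted concurrently by the application

# Every running task occupies one executor core, whichever job it belongs to.
total_task_slots = EXECUTORS * CORES_PER_EXECUTOR
# Memory YARN must grant per executor container (heap + overhead).
container_gb = EXECUTOR_HEAP_GB + EXECUTOR_OVERHEAD_GB
# Executor heap shared by the tasks of all concurrent jobs.
total_heap_gb = EXECUTORS * EXECUTOR_HEAP_GB

print(f"{DRIVER_THREADS} concurrent jobs share {total_task_slots} task slots "
      f"and {total_heap_gb} GB of executor heap; "
      f"each executor container needs {container_gb} GB")
```

With spark.scheduler.mode set to FIFO, Spark gives the first submitted job
priority on all available slots while its stages have tasks to launch, so
jobs from the other threads mostly wait for slots freed by earlier ones.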

Spark is designed to run your jobs in sequence, with each job running in a
distributed manner (multiple threads on multiple instances). I would suggest
following this principle.
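
A minimal PySpark sketch of that sequential approach, under a few
assumptions: each file can be read with spark.read.text (your real parsing
step isn't shown, so that call is a placeholder), appending every batch to
one Parquet directory is acceptable, and grouping files into batches (the
hypothetical batch_paths helper, default batch size of 500) is an optional
extra that reduces the number of separate writes compared with one per file.

```python
from typing import Iterator, List, Sequence


def batch_paths(paths: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size paths (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(paths), batch_size):
        yield list(paths[start:start + batch_size])


def convert_sequentially(spark, paths: Sequence[str], output_dir: str,
                         batch_size: int = 500) -> int:
    """Convert text files to Parquet one batch at a time (hypothetical helper).

    `spark` is an existing SparkSession. Everything runs from the single
    calling thread: each batch's write finishes before the next batch is
    read, so the whole cluster works on one batch at a time.
    Returns the number of batches written.
    """
    written = 0
    for batch in batch_paths(paths, batch_size):
        # Placeholder for the real parsing: one row per line, column "value".
        df = spark.read.text(batch)
        # Append each batch to the same Parquet dataset.
        df.write.mode("append").parquet(output_dir)
        written += 1
    return written
```

It would be called once from the driver's main thread, for instance
convert_sequentially(spark, paths, output_dir) with spark obtained from
SparkSession.builder.getOrCreate().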

Feel free to share the code if you can. It always helps us give better
advice.

Alexis

On Thu, Nov 17, 2016 at 8:51 PM, Irina Truong <ir...@parsely.com> wrote:

> We have an application that reads text files, converts them to dataframes,
> and saves them in Parquet format. The application runs fine when processing
> a few files, but we have several thousand produced every day. When running
> the job for all files, we have spark-submit killed on OOM:
>
> #
> # java.lang.OutOfMemoryError: Java heap space
> # -XX:OnOutOfMemoryError="kill -9 %p"
> #   Executing /bin/sh -c "kill -9 27226"...
>
> The job is written in Python. We’re running it in Amazon EMR 5.0 (Spark
> 2.0.0) with spark-submit. We’re using a cluster with a master c3.2xlarge
> instance (8 cores and 15g of RAM) and 3 core c3.4xlarge instances (16 cores
> and 30g of RAM each). Spark config settings are as follows:
>
> ('spark.serializer', 'org.apache.spark.serializer.KryoSerializer'),
> ('spark.executor.instances', '3'),
> ('spark.yarn.executor.memoryOverhead', '9g'),
> ('spark.executor.cores', '15'),
> ('spark.executor.memory', '12g'),
> ('spark.scheduler.mode', 'FIFO'),
> ('spark.cleaner.ttl', '1800'),
>
> The job processes each file in a thread, and we have 10 threads running
> concurrently. The process will OOM after about 4 hours, at which point
> Spark has processed over 20,000 jobs.
>
> It seems like the driver is running out of memory, but each individual job
> is quite small. Are there any known memory leaks for long-running Spark
> applications on YARN?
>



-- 

*Alexis Seigneurin*
*Managing Consultant*
(202) 459-1591 - LinkedIn
<http://www.linkedin.com/in/alexisseigneurin>

<http://ipponusa.com/>