> On 15 May 2016, at 22:50, harish singh <harish.sing...@gmail.com> wrote:
> 
> Our DAG (hourly) has 10 tasks (all of them BashOperators issuing curl
> commands).
> We run airflow on docker.
> 
> When we do a backfill for, say, the last 10 days, we see that airflow
> consistently hits the memory limit (4 GB) and the container dies (OOM
> killed).
> 
> We increased the memory to 8 GB, but memory utilization is still around
> 90%.
> 
> When I do ps -ef, I see a lot of backfill processes, all running the same
> command. I used the pids to look into each process (process environment,
> etc.), and all these processes are exactly the same. Why so many processes?
> 
> 
> Also, my real worry is how much memory is enough. How is memory
> management done (object pools, etc.)?

Airflow runs as many tasks in parallel as the parallelism setting in the 
config allows, defaulting to 32. If you are backfilling a couple of days it 
will easily reach this limit. 
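
For reference, the relevant knobs live in airflow.cfg under [core]; the 
values below are only a sketch, so double-check the option names and 
defaults against your version's default config:

    [core]
    # total task instances allowed to run across the whole installation
    parallelism = 32
    # task instances allowed to run concurrently within a single DAG
    dag_concurrency = 16
    # concurrent DAG runs per DAG, which is what a backfill fans out into
    max_active_runs_per_dag = 16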

For every task Airflow runs a copy of itself, and each copy uses its own 
pool of database connections, which can add up to something quite significant. 
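
If those connection pools turn out to be a big part of the footprint, you 
can also cap the pool size per process in airflow.cfg; I'm assuming the 
sql_alchemy_pool_size option here, so verify it exists in your version:

    [core]
    # maximum SQLAlchemy connections each Airflow process keeps open
    sql_alchemy_pool_size = 5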

So if you are running your database and Airflow in the same container you can 
indeed quite quickly reach 8 GB+, depending also on your database caching and 
parallelism settings. 

Bolke
