Celery executor.
12 bare-metal boxes, each with 40 CPU(s) @ 2494.015 MHz and 378 GB RAM
Worker .cfg:
[core]
sql_alchemy_pool_size = 5
sql_alchemy_pool_recycle = 900
sql_alchemy_reconnect_timeout = 300
parallelism = 1200
dag_concurrency = 800
non_pooled_task_slot_count = 1200
max_active_runs_per_dag = 10
dagbag_import_timeout = 30
[celery]
worker_concurrency = 100
Scheduler .cfg:
[core]
sql_alchemy_pool_size = 30
sql_alchemy_pool_recycle = 300
sql_alchemy_reconnect_timeout = 300
parallelism = 1200
dag_concurrency = 800
non_pooled_task_slot_count = 1200
max_active_runs_per_dag = 10
[scheduler]
job_heartbeat_sec = 5
scheduler_heartbeat_sec = 5
run_duration = 1800
min_file_process_interval = 10
min_file_parsing_loop_time = 1
dag_dir_list_interval = 300
print_stats_interval = 30
scheduler_zombie_task_threshold = 300
max_tis_per_query = 1024
max_threads = 29
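
For reference, these effective values can be read back through Airflow's
configuration API; a minimal sketch (assuming Airflow 1.10.x and that the
airflow.cfg above is the one in AIRFLOW_HOME):

    # print the settings discussed here, as Airflow itself resolves them
    from airflow.configuration import conf

    for section, key in [
        ("core", "sql_alchemy_pool_size"),
        ("core", "parallelism"),
        ("core", "dag_concurrency"),
        ("celery", "worker_concurrency"),
        ("scheduler", "max_threads"),
    ]:
        print(section, key, conf.get(section, key))
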
From the workers I see 350+ connections at start time, then it drops to ~200,
and then to 1-10 once tasks complete.
From the scheduler it stays very low, 1-10:
MySQL connections:
331 worker1
215 worker2
349 worker53
335 worker54
347 worker55
336 worker56
336 worker57
354 worker58
339 worker59
328 worker60
333 worker61
337 worker62
2 scheduler
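
One way to gather per-host counts like the above from the MySQL side (sketch;
PyMySQL and the credentials/host names are placeholders):

    # count open connections per client host on the metadata DB
    import pymysql

    conn = pymysql.connect(host="metadata-db", user="airflow",
                           password="***", database="information_schema")
    with conn.cursor() as cur:
        cur.execute(
            "SELECT SUBSTRING_INDEX(host, ':', 1) AS client, COUNT(*) AS conns "
            "FROM PROCESSLIST GROUP BY client ORDER BY conns DESC"
        )
        for client, conns in cur.fetchall():
            print(client, conns)
    conn.close()
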
- Eugene
On 8/20/19, 8:51 AM, "Maxime Beauchemin" <[email protected]> wrote:
Delay between tasks could be due to not having enough worker slots. What
type of executor are you using, and how is it configured?
Max
On Tue, Aug 20, 2019 at 7:50 AM Bacal, Eugene <[email protected]>
wrote:
> Absolutely possible, Daniel,
>
> We are looking in all directions. Has anyone noticed performance
> improvements with PostgreSQL vs MySQL?
>
> -Eugene
>
>
> On 8/15/19, 2:03 PM, "Daniel Standish" <[email protected]> wrote:
>
> It's not just the webserver and scheduler that will parse your DAG file.
> During the execution of a DAG run, the DAG file will be re-parsed at the
> start of every task instance. If you have 1000 tasks running in a short
> period of time, that's 1000 queries. It's possible these queries are piling
> up in a queue on your database. DAG read time has to be very fast for this
> reason.
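>
> A quick way to see this is to drop a log line at module scope of the DAG
> file; it fires once per parse, so during a burst of task starts you can
> count the parses (and whatever queries run alongside them) in the logs.
> Rough sketch:
>
>     # top of the DAG file, module scope -- executed on every parse
>     import logging
>     from datetime import datetime
>
>     logging.getLogger(__name__).info("DAG file parsed at %s", datetime.utcnow())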
>
>
>
> On Thu, Aug 15, 2019 at 1:45 PM Bacal, Eugene
> <[email protected]>
> wrote:
>
> >
> > Thank you for your reply, Max
> >
> > Dynamic DAGs query the database for tables and generate DAGs and tasks
> > based on the output, roughly like the sketch below (the connection id,
> > DAG id and operators are illustrative, not the real ones):
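> >
> >     # sketch of the dynamic-DAG pattern (names are illustrative)
> >     from datetime import datetime, timedelta
> >     from airflow import DAG
> >     from airflow.hooks.mysql_hook import MySqlHook
> >     from airflow.operators.dummy_operator import DummyOperator
> >
> >     # module-scope query: this runs on every parse of the file
> >     tables = MySqlHook(mysql_conn_id="source_db").get_records("SHOW TABLES")
> >
> >     dag = DAG("dynamic_example", start_date=datetime(2019, 1, 1),
> >               schedule_interval=timedelta(minutes=30))
> >     for (table,) in tables:
> >         DummyOperator(task_id="process_%s" % table, dag=dag)
> >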
> > As plain Python, the files do not take much time to execute:
> >
> > Dynamic - 500 tasks:
> > time python PPAD_OIS_MASTER_IDI.py
> > [2019-08-15 12:57:48,522] {settings.py:174} INFO - setting.configure_orm(): Using pool settings. pool_size=30, pool_recycle=300
> > real 0m1.830s
> > user 0m1.622s
> > sys 0m0.188s
> >
> >
> > Static - 100 tasks:
> > time python PPAD_OPS_CANARY_CONNECTIONS_TEST_8.py
> > [2019-08-15 12:59:24,959] {settings.py:174} INFO - setting.configure_orm(): Using pool settings. pool_size=30, pool_recycle=300
> > real 0m1.009s
> > user 0m0.898s
> > sys 0m0.108s
> >
> >
> > We have 44 DAGs with 1003 dynamic tasks. Parsing during quiet time:
> > DagBag parsing time: 3.9385959999999995
> >
> > Parsing at execution time, when the scheduler submits the DAGs:
> > DagBag parsing time: 99.820316
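> >
> > (For reference, the quiet-time number can be reproduced with a standalone
> > DagBag load outside of Airflow's own reporting; sketch, Airflow 1.10.x
> > assumed, path is a placeholder:)
> >
> >     # time a full DagBag load of the dags folder
> >     import time
> >     from airflow.models import DagBag
> >
> >     start = time.time()
> >     bag = DagBag("/path/to/dags")
> >     print("DAGs: %d, parse time: %.2fs" % (len(bag.dags), time.time() - start))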
> >
> > Delay between task runs inside a single DAG grows from 30 sec to 10 min,
> > then it drops back even though tasks are still running.
> >
> > Eugene
> >
> >
> >
> >
> >
> > On 8/15/19, 11:52 AM, "Maxime Beauchemin" <
> [email protected]>
> > wrote:
> >
> > What is your dynamic DAG doing? How long does it take to execute it just
> > as a python script (`time python mydag.py`)?
> >
> > As an Airflow admin, you may want to lower the DAG parsing timeout
> > configuration key to force people not to do crazy things in DAG module
> > scope. At some point at Airbnb we had someone running a Hive query in DAG
> > scope; clearly that needs to be prevented.
> >
> > Loading DAGs by calling a database can bring all sorts of surprises that
> > can drive everyone crazy. As mentioned in a recent post, repo-contained,
> > deterministic "less dynamic" DAGs are great, because they are
> > self-contained and allow you to use source-control properly (revert a bad
> > change for instance). That may mean having a process or script that
> > compiles external things that are dynamic into things like yaml files
> > checked into the code repo. Things as simple as parsing duration become
> > more predictable (network latency and database load are not part of that
> > equation), but more importantly, all changes become tracked in the code
> > repo.
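> >
> > A minimal sketch of that compile step (all names are placeholders): a
> > script run out of band queries the database and writes a yaml file that
> > gets committed, so the DAG file itself never touches the network:
> >
> >     # compile_tables.py -- run out of band (cron/CI), output committed to the repo
> >     import yaml
> >     from airflow.hooks.mysql_hook import MySqlHook  # or any plain DB client
> >
> >     rows = MySqlHook(mysql_conn_id="source_db").get_records("SHOW TABLES")
> >     with open("dags/config/tables.yaml", "w") as f:
> >         yaml.safe_dump({"tables": [r[0] for r in rows]}, f)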
> >
> > yaml parsing in python can be pretty slow too, and there are solutions /
> > alternatives there. Hocon is great. Also C-accelerated yaml is possible:
> > https://stackoverflow.com/questions/27743711/can-i-speedup-yaml
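> >
> > (i.e. something like this, assuming libyaml is available, with a fallback
> > to the pure-python loader otherwise:)
> >
> >     import yaml
> >
> >     with open("dags/config/tables.yaml") as f:
> >         loader = getattr(yaml, "CSafeLoader", yaml.SafeLoader)
> >         tables = yaml.load(f, Loader=loader)["tables"]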
> >
> > Max
> >
> > On Wed, Aug 14, 2019 at 9:56 PM Bacal, Eugene
> > <[email protected]>
> > wrote:
> >
> > > Hello Airflow team,
> > >
> > > Please advise if you can. In our environment, we have noticed that
> > > dynamic tasks place quite a lot of stress on the scheduler and webserver
> > > and increase MySQL DB connections.
> > > We run about 1000 dynamic tasks every 30 min, and parsing time increases
> > > from 5 to 65 sec, with runtime going from 2 sec to 350+ sec. This
> > > happens at execution time, then it drops back to normal while tasks are
> > > still executing. The webserver hangs for a few minutes.
> > >
> > > Airflow 1.10.1.
> > > MySQL DB
> > >
> > > Example:
> > >
> > > Dynamic Tasks:
> > > Number of DAGs: 44
> > > Total task number: 950
> > > DagBag parsing time: 65.879642000000001
> > >
> > > Static Tasks:
> > > Number of DAGs: 73
> > > Total task number: 1351
> > > DagBag parsing time: 1.731088
> > >
> > > Is this something you are aware of? Any advice on dynamic task
> > > optimization/best practices?
> > >
> > > Thank you in advance,
> > > Eugene
> > >
> > >
> > >
> >
> >
> >
>
>
>