Thank you for your reply, Max.

The dynamic DAGs query the database for tables and generate DAGs and tasks based on the output. Run as plain Python, they do not take long to execute:

Dynamic - 500 tasks:

time python PPAD_OIS_MASTER_IDI.py
[2019-08-15 12:57:48,522] {settings.py:174} INFO - setting.configure_orm(): Using pool settings. pool_size=30, pool_recycle=300

real    0m1.830s
user    0m1.622s
sys     0m0.188s

Static - 100 tasks:

time python PPAD_OPS_CANARY_CONNECTIONS_TEST_8.py
[2019-08-15 12:59:24,959] {settings.py:174} INFO - setting.configure_orm(): Using pool settings. pool_size=30, pool_recycle=300

real    0m1.009s
user    0m0.898s
sys     0m0.108s

We have 44 DAGs with 1003 dynamic tasks.

Parsing in quiet time:
DagBag parsing time: 3.9385959999999995

Parsing at execution time, when the scheduler submits the DAGs:
DagBag parsing time: 99.820316

The delay between task runs inside a single DAG grows from 30 sec to 10 min, then drops back even though tasks are still running.

Eugene

On 8/15/19, 11:52 AM, "Maxime Beauchemin" <maximebeauche...@gmail.com> wrote:

What is your dynamic DAG doing? How long does it take to execute it just as a python script (`time python mydag.py`)?

As an Airflow admin, people may want to lower the DAG parsing timeout configuration key to force people to not do crazy things in DAG module scope. At some point at Airbnb we had someone running a Hive query in DAG scope, clearly that needs to be prevented.

Loading DAGs by calling a database can bring all sorts of surprises that can drive everyone crazy. As mentioned in a recent post, repo-contained, deterministic "less dynamic" DAGs are great, because they are self-contained and allow you to use source-control properly (revert a bad change for instance). That may mean having a process or script that compiles external things that are dynamic into things like yaml files checked into the code repo. Things as simple as parsing duration become more predictable (network latency and database load are not part of that equation), but more importantly, all changes become tracked in the code repo.

yaml parsing in python can be pretty slow too, and there are solutions / alternatives there. Hocon is great.
Also C-accelerated yaml is possible:
https://stackoverflow.com/questions/27743711/can-i-speedup-yaml

Max

On Wed, Aug 14, 2019 at 9:56 PM Bacal, Eugene <eba...@paypal.com.invalid> wrote:

> Hello Airflow team,
>
> Please advise if you can. In our environment, we have noticed that dynamic
> tasks place quite a lot of stress on scheduler, webserver and increase MySQL DB
> connections.
> We run about 1000 Dynamic Tasks every 30 min and parsing time
> increases from 5 to 65 sec with Runtime from 2sec to 350+ . This happens at
> execution time then it drops to normal while still executing tasks.
> Webserver hangs for a few minutes.
>
> Airflow 1.10.1.
> MySQL DB
>
> Example:
>
> Dynamic Tasks:
> Number of DAGs: 44
> Total task number: 950
> DagBag parsing time: 65.879642000000001
>
> Static Tasks:
> Number of DAGs: 73
> Total task number: 1351
> DagBag parsing time: 1.731088
>
> Is this something you are aware of? Any advice on Dynamic tasks
> optimization/best practices?
>
> Thank you in advance,
> Eugene
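
For reference, the suggestion earlier in this thread (compile the dynamic inputs into a file checked into the repo, then build DAGs from that file only) could look roughly like the sketch below. It is only an illustration under assumptions: the function names, the `tables.json` file name, the `load_` task prefix and the table names are all hypothetical, and JSON from the standard library stands in for yaml so the example runs without extra packages. The Airflow side (creating one operator per task id) is left out because it depends on the installation.

```python
import json
import tempfile
from pathlib import Path


def compile_table_spec(tables, spec_path):
    """Offline step (hypothetical): a separate job queries the database
    and writes the table list to a file that is committed to the repo."""
    # Sort and de-duplicate so the file is deterministic and diffs stay small.
    spec = {"tables": sorted(set(tables))}
    Path(spec_path).write_text(json.dumps(spec, indent=2, sort_keys=True) + "\n")
    return spec


def load_task_ids(spec_path, prefix="load_"):
    """Parse-time step: derive task ids from the checked-in file only.
    No database or network call happens here."""
    spec = json.loads(Path(spec_path).read_text())
    return [prefix + name for name in spec["tables"]]


def demo():
    # Made-up table names standing in for the result of the database query.
    with tempfile.TemporaryDirectory() as tmp:
        spec_path = Path(tmp) / "tables.json"
        compile_table_spec(["orders", "customers", "orders"], spec_path)
        return load_task_ids(spec_path)
```

In a DAG file, only the equivalent of `load_task_ids` would run at module scope, so parsing time would no longer depend on database load or network latency, and every change to the table list would show up as a commit.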