Absolutely possible, Daniel. We are looking in all directions. Has anyone noticed performance improvements with PostgreSQL vs. MySQL?
-Eugene

On 8/15/19, 2:03 PM, "Daniel Standish" <dpstand...@gmail.com> wrote:

It's not just the webserver and scheduler that will parse your DAG file.
During the execution of a DAG run, the DAG file is re-parsed at the start
of every task instance. If you have 1000 tasks running in a short period of
time, that's 1000 queries. It's possible these queries are piling up in a
queue on your database. DAG read time has to be very fast for this reason.

On Thu, Aug 15, 2019 at 1:45 PM Bacal, Eugene <eba...@paypal.com.invalid> wrote:

> Thank you for your reply, Max
>
> Dynamic DAGs query the database for tables and generate DAGs and tasks
> based on the output.
> Run as plain Python scripts, they do not take long to execute:
>
> Dynamic - 500 tasks:
> time python PPAD_OIS_MASTER_IDI.py
> [2019-08-15 12:57:48,522] {settings.py:174} INFO -
> setting.configure_orm(): Using pool settings. pool_size=30, pool_recycle=300
> real 0m1.830s
> user 0m1.622s
> sys 0m0.188s
>
> Static - 100 tasks:
> time python PPAD_OPS_CANARY_CONNECTIONS_TEST_8.py
> [2019-08-15 12:59:24,959] {settings.py:174} INFO -
> setting.configure_orm(): Using pool settings. pool_size=30, pool_recycle=300
> real 0m1.009s
> user 0m0.898s
> sys 0m0.108s
>
> We have 44 DAGs with 1003 dynamic tasks. Parsing in quiet time:
> DagBag parsing time: 3.9385959999999995
>
> Parsing at execution time, when the scheduler submits the DAGs:
> DagBag parsing time: 99.820316
>
> The delay between task runs inside a single DAG grows from 30 sec to 10 min,
> then it drops back even though tasks are running.
>
> Eugene
>
> On 8/15/19, 11:52 AM, "Maxime Beauchemin" <maximebeauche...@gmail.com> wrote:
>
>     What is your dynamic DAG doing? How long does it take to execute it just
>     as a Python script (`time python mydag.py`)?
>
>     As an Airflow admin, people may want to lower the DAG parsing timeout
>     configuration key to force people to not do crazy things in DAG module
>     scope.
>     At some point at Airbnb we had someone running a Hive query in DAG
>     scope; clearly that needs to be prevented.
>
>     Loading DAGs by calling a database can bring all sorts of surprises that
>     can drive everyone crazy. As mentioned in a recent post, repo-contained,
>     deterministic "less dynamic" DAGs are great, because they are
>     self-contained and allow you to use source control properly (revert a bad
>     change, for instance). That may mean having a process or script that
>     compiles external things that are dynamic into things like YAML files
>     checked into the code repo. Things as simple as parsing duration become
>     more predictable (network latency and database load are not part of that
>     equation), but more importantly, all changes become tracked in the code
>     repo.
>
>     YAML parsing in Python can be pretty slow too, and there are solutions /
>     alternatives there. Hocon is great. Also, C-accelerated YAML is possible:
>     https://stackoverflow.com/questions/27743711/can-i-speedup-yaml
>
>     Max
>
>     On Wed, Aug 14, 2019 at 9:56 PM Bacal, Eugene <eba...@paypal.com.invalid> wrote:
>
>     > Hello Airflow team,
>     >
>     > Please advise if you can. In our environment, we have noticed that
>     > dynamic tasks place quite a lot of stress on the scheduler and webserver
>     > and increase MySQL DB connections.
>     > We run about 1000 dynamic tasks every 30 min, and parsing time
>     > increases from 5 to 65 sec, with runtime going from 2 sec to 350+. This
>     > happens at execution time, then it drops to normal while still executing
>     > tasks. The webserver hangs for a few minutes.
>     >
>     > Airflow 1.10.1.
>     > MySQL DB
>     >
>     > Example:
>     >
>     > Dynamic Tasks:
>     > Number of DAGs: 44
>     > Total task number: 950
>     > DagBag parsing time: 65.879642000000001
>     >
>     > Static Tasks:
>     > Number of DAGs: 73
>     > Total task number: 1351
>     > DagBag parsing time: 1.731088
>     >
>     > Is this something you are aware of? Any advice on dynamic task
>     > optimization/best practices?
>     >
>     > Thank you in advance,
>     > Eugene
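
A minimal sketch of the approach Max describes above: compile the dynamic inputs into a file checked into the repo, then have the DAG module read only that file. Everything named here is hypothetical and does not come from the thread: `table_snapshot.json`, `compile_snapshot`, `load_task_specs`, and the `load_<table>` task ids. JSON from the standard library is swapped in for YAML to keep the example dependency-free, and the Airflow operator wiring is left out.

```python
import json
from pathlib import Path

# Hypothetical location of the snapshot file checked into the DAG repo.
SNAPSHOT = Path("table_snapshot.json")


def compile_snapshot(tables, path=SNAPSHOT):
    """Run outside the scheduler (cron job, CI step), not in DAG scope.

    `tables` stands in for the result of the database query that the
    dynamic DAG would otherwise run every time the file is parsed.
    Sorting keeps the file deterministic so diffs in the repo stay small.
    """
    path.write_text(json.dumps({"tables": sorted(tables)}, indent=2))


def load_task_specs(path=SNAPSHOT):
    """Run in DAG module scope: reads a local file, no network or database."""
    tables = json.loads(path.read_text())["tables"]
    # One task spec per table; a real DAG file would turn each spec
    # into an operator instance here.
    return [{"task_id": f"load_{name}", "table": name} for name in tables]
```

With this split, the parse that happens at the start of every task instance touches only the local file, so its duration should no longer depend on database load, and each change to the task list shows up as a commit that can be reverted.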