Delay between tasks could be due to not having enough worker slots. What type of executor are you using, how is it configured?
Max

On Tue, Aug 20, 2019 at 7:50 AM Bacal, Eugene <eba...@paypal.com.invalid> wrote:

> Absolutely possible, Daniel,
>
> We are looking in all directions. Has anyone noticed performance
> improvements with PostgreSQL vs MySQL?
>
> -Eugene
>
> On 8/15/19, 2:03 PM, "Daniel Standish" <dpstand...@gmail.com> wrote:
>
> > It's not just the webserver and scheduler that will parse your DAG file.
> > During the execution of a DAG run, the DAG file will be re-parsed at the
> > start of every task instance. If you have 1000 tasks running in a short
> > period of time, that's 1000 queries. It's possible these queries are
> > piling up in a queue on your database. DAG read time has to be very
> > fast for this reason.
> >
> > On Thu, Aug 15, 2019 at 1:45 PM Bacal, Eugene <eba...@paypal.com.invalid> wrote:
> >
> > > Thank you for your reply, Max
> > >
> > > Dynamic DAGs query the database for tables and generate DAGs and
> > > tasks based on the output.
> > > Run as a plain Python script, they do not take long to execute:
> > >
> > > Dynamic - 500 tasks:
> > > time python PPAD_OIS_MASTER_IDI.py
> > > [2019-08-15 12:57:48,522] {settings.py:174} INFO - setting.configure_orm(): Using pool settings. pool_size=30, pool_recycle=300
> > > real 0m1.830s
> > > user 0m1.622s
> > > sys 0m0.188s
> > >
> > > Static - 100 tasks:
> > > time python PPAD_OPS_CANARY_CONNECTIONS_TEST_8.py
> > > [2019-08-15 12:59:24,959] {settings.py:174} INFO - setting.configure_orm(): Using pool settings. pool_size=30, pool_recycle=300
> > > real 0m1.009s
> > > user 0m0.898s
> > > sys 0m0.108s
> > >
> > > We have 44 DAGs with 1003 dynamic tasks. Parsing during quiet time:
> > > DagBag parsing time: 3.9385959999999995
> > >
> > > Parsing at execution time, when the scheduler submits the DAGs:
> > > DagBag parsing time: 99.820316
> > >
> > > The delay between task runs inside a single DAG grows from 30 sec to
> > > 10 min, then drops back even though tasks are still running.
> > >
> > > Eugene
> > >
> > > On 8/15/19, 11:52 AM, "Maxime Beauchemin" <maximebeauche...@gmail.com> wrote:
> > >
> > > > What is your dynamic DAG doing? How long does it take to execute it
> > > > just as a python script (`time python mydag.py`)?
> > > >
> > > > As an Airflow admin, people may want to lower the DAG parsing timeout
> > > > configuration key to force people to not do crazy things in DAG module
> > > > scope. At some point at Airbnb we had someone running a Hive query in
> > > > DAG scope, clearly that needs to be prevented.
> > > >
> > > > Loading DAGs by calling a database can bring all sorts of surprises
> > > > that can drive everyone crazy. As mentioned in a recent post,
> > > > repo-contained, deterministic "less dynamic" DAGs are great, because
> > > > they are self-contained and allow you to use source-control properly
> > > > (revert a bad change for instance). That may mean having a process or
> > > > script that compiles external things that are dynamic into things like
> > > > yaml files checked into the code repo. Things as simple as parsing
> > > > duration become more predictable (network latency and database load
> > > > are not part of that equation), but more importantly, all changes
> > > > become tracked in the code repo.
> > > >
> > > > yaml parsing in python can be pretty slow too, and there are
> > > > solutions / alternatives there. Hocon is great. Also C-accelerated
> > > > yaml is possible:
> > > >
> > > > https://stackoverflow.com/questions/27743711/can-i-speedup-yaml
> > > >
> > > > Max
> > > >
> > > > On Wed, Aug 14, 2019 at 9:56 PM Bacal, Eugene <eba...@paypal.com.invalid> wrote:
> > > >
> > > > > Hello Airflow team,
> > > > >
> > > > > Please advise if you can. In our environment, we have noticed that
> > > > > dynamic tasks place quite a bit of stress on the scheduler and
> > > > > webserver and increase MySQL DB connections.
> > > > > We run about 1000 dynamic tasks every 30 min, and parsing time
> > > > > increases from 5 to 65 sec, with runtime going from 2 sec to 350+.
> > > > > This happens at execution time, then it drops to normal while still
> > > > > executing tasks. The webserver hangs for a few minutes.
> > > > >
> > > > > Airflow 1.10.1.
> > > > > MySQL DB
> > > > >
> > > > > Example:
> > > > >
> > > > > Dynamic Tasks:
> > > > > Number of DAGs: 44
> > > > > Total task number: 950
> > > > > DagBag parsing time: 65.879642000000001
> > > > >
> > > > > Static Tasks:
> > > > > Number of DAGs: 73
> > > > > Total task number: 1351
> > > > > DagBag parsing time: 1.731088
> > > > >
> > > > > Is this something you are aware of? Any advice on dynamic task
> > > > > optimization/best practices?
> > > > >
> > > > > Thank you in advance,
> > > > > Eugene
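
For anyone reading this thread later, here is a rough sketch of the "compile the dynamic inputs into a file checked into the repo" idea from Max's earlier reply. It is only an illustration under assumptions, not what anyone in the thread actually runs: the function names (`compile_snapshot`, `load_task_ids`), the `tables.json` file name, and the `load_` task-id prefix are all hypothetical, and it uses JSON from the standard library instead of YAML purely to avoid a third-party dependency. The point is that the expensive database query happens once, offline, and the DAG file only reads a local file at parse time.

```python
import json
import tempfile
from pathlib import Path


def compile_snapshot(table_names, snapshot_path):
    """Offline step (hypothetical): write the dynamic inputs to a file
    that is then committed to the code repo. In practice table_names
    would come from the database query the DAG file used to run."""
    payload = {"tables": sorted(set(table_names))}
    text = json.dumps(payload, indent=2, sort_keys=True) + "\n"
    Path(snapshot_path).write_text(text)
    return payload


def load_task_ids(snapshot_path):
    """Parse-time step (hypothetical): read only the local snapshot.
    No database call is made here, so repeated parsing stays cheap."""
    payload = json.loads(Path(snapshot_path).read_text())
    return [f"load_{name}" for name in payload["tables"]]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / "tables.json"
        # Stand-in for the result of the real table query.
        compile_snapshot(["orders", "users", "orders"], snapshot)
        # A DAG file would loop over these ids and create one task each.
        print(load_task_ids(snapshot))
```

In a real DAG file, the list returned by `load_task_ids` would drive the loop that creates the operators; that part is left out here so the sketch runs without Airflow installed. Sorting and de-duplicating the names keeps the snapshot deterministic, so diffs in source control show only real changes.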