Eugene Why don't you create a stack overflow post and give us the link?
That is probably a better way to help you through this. We will need to see what exactly you are doing in your dag files and potentially also hooks / operators. Thanks On Thu, Aug 29, 2019 at 10:41 AM Bacal, Eugene <eba...@paypal.com.invalid> wrote: > Can someone advise if this is expecting behavior, please? > > - DB connections are not being re-used > - Connections stay open while active only 5: > mysql> show global status like 'Thread%'; > +-------------------------+---------+ > | Variable_name | Value | > +-------------------------+---------+ > | Threadpool_idle_threads | 0 | > | Threadpool_threads | 0 | > | Threads_cached | 775 | > | Threads_connected | 5323 | > | Threads_created | 4846609 | > | Threads_running | 5 | > +-------------------------+---------+ > - Workers create hungreds of conections that’s remains open until > DB cleares them (900 sec) > > > > Eugene Bacal > > Workload Automation Engineer > Scheduling Platform Experience > Cell: 4802028764 > Slack: @ebacal > DL: dl-pp-cdp-...@paypal.com > > > Scottsdale, Arizona > > > On 8/21/19, 2:36 PM, "Bacal, Eugene" <eba...@paypal.com.INVALID> wrote: > > Hi Max, > > We have ran few testing today from DB side and noticed that: > > - DB connections are not being re-used > - Connections stay open while active only 5: > mysql> show global status like 'Thread%'; > +-------------------------+---------+ > | Variable_name | Value | > +-------------------------+---------+ > | Threadpool_idle_threads | 0 | > | Threadpool_threads | 0 | > | Threads_cached | 775 | > | Threads_connected | 5323 | > | Threads_created | 4846609 | > | Threads_running | 5 | > +-------------------------+---------+ > - Workers create hungreds of conections that’s remains open until > DB cleares them (900 sec) > > Seems that similar cases were reported some time back(2016), > > https://nam03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FAIRFLOW-680&data=01%7C01%7Cebacal%40paypal.com%7Cde99ce488b3e479020ca08d7267f9b30%7Cfb00791460204374977e21bac5f3f4c8%7C1&sdata=YXWNRhht76O1djebOfm0oEwAWanKQNLaW0KWLWrrxms%3D&reserved=0 > > > On 8/20/19, 5:19 PM, "Bacal, Eugene" <eba...@paypal.com.INVALID> > wrote: > > Celery executor. > 12 BareMetals: CPU(s):40, MHz 2494.015, RAM 378G - each box > > Worker .cfg: > [core] > sql_alchemy_pool_size = 5 > sql_alchemy_pool_recycle = 900 > sql_alchemy_reconnect_timeout = 300 > parallelism = 1200 > dag_concurrency = 800 > non_pooled_task_slot_count = 1200 > max_active_runs_per_dag = 10 > dagbag_import_timeout = 30 > [celery] > worker_concurrency = 100 > > Scheduler .cfg: > [core] > sql_alchemy_pool_size = 30 > sql_alchemy_pool_recycle = 300 > sql_alchemy_reconnect_timeout = 300 > parallelism = 1200 > dag_concurrency = 800 > non_pooled_task_slot_count = 1200 > max_active_runs_per_dag = 10 > [scheduler] > job_heartbeat_sec = 5 > scheduler_heartbeat_sec = 5 > run_duration = 1800 > min_file_process_interval = 10 > min_file_parsing_loop_time = 1 > dag_dir_list_interval = 300 > print_stats_interval = 30 > scheduler_zombie_task_threshold = 300 > max_tis_per_query = 1024 > max_threads = 29 > > > From workers I see 350 + connections at the start time, then it > drops to 200 and then to 1-10 once tasks complete > From Scheduler very low 1-10: > MySQL connections: > 331 worker1 > 215 worker2 > 349 worker53 > 335 worker54 > 347 worker55 > 336 worker56 > 336 worker57 > 354 worker58 > 339 worker59 > 328 worker60 > 333 worker61 > 337 worker62 > 2 scheduler > > > > - Eugene > > > On 8/20/19, 8:51 AM, "Maxime Beauchemin" < > maximebeauche...@gmail.com> wrote: > > Delay between tasks could be due to not having enough worker > slots. What > type of executor are you using, how is it configured? > > Max > > On Tue, Aug 20, 2019 at 7:50 AM Bacal, Eugene > <eba...@paypal.com.invalid> > wrote: > > > Absolutely possible, Daniel, > > > > We are looking in all directions. Has anyone noticed > performance > > improvements with PostgreSQL vs MySQL ? > > > > -Eugene > > > > > > On 8/15/19, 2:03 PM, "Daniel Standish" <dpstand...@gmail.com> > wrote: > > > > It's not just webserver and scheduler that will parse > your dag file. > > During the execution of a dag run, dag file will be > re-parsed at the > > start > > of every task instance. If you have 1000 tasks running > in short > > period of > > time, that's 1000 queries. It's possible these queries > are piling up > > in a > > queue on your database. Dag read time has to be very > fast for this > > reason. > > > > > > > > On Thu, Aug 15, 2019 at 1:45 PM Bacal, Eugene > > <eba...@paypal.com.invalid> > > wrote: > > > > > > > > Thank you for your reply, Max > > > > > > Dynamic DAGs query the database for tables and > generates DAGs and > > tasks > > > based on the output. > > > For Python does not take much to execute: > > > > > > Dynamic - 500 tasks: > > > time python PPAD_OIS_MASTER_IDI.py > > > [2019-08-15 12:57:48,522] {settings.py:174} INFO - > > > setting.configure_orm(): Using pool settings. > pool_size=30, > > pool_recycle=300 > > > real 0m1.830s > > > user 0m1.622s > > > sys 0m0.188s > > > > > > > > > Static - 100 tasks: > > > time python PPAD_OPS_CANARY_CONNECTIONS_TEST_8.py > > > [2019-08-15 12:59:24,959] {settings.py:174} INFO - > > > setting.configure_orm(): Using pool settings. > pool_size=30, > > pool_recycle=300 > > > real 0m1.009s > > > user 0m0.898s > > > sys 0m0.108s > > > > > > > > > We have 44 DAGs with 1003 Dynamic tasks. Parsing in > quite time: > > > DagBag parsing time: 3.9385959999999995 > > > > > > Parsing in time of execution, when scheduler submits > the DAGs: > > > DagBag parsing time: 99.820316 > > > > > > Delay between the task run inside a single DAG grow > from 30 sec to > > 10 min, > > > then it drops back even thou tasks are runnign. > > > > > > Eugene > > > > > > > > > > > > > > > > > > On 8/15/19, 11:52 AM, "Maxime Beauchemin" < > > maximebeauche...@gmail.com> > > > wrote: > > > > > > What is your dynamic DAG doing? How long does it > take to execute > > it > > > just as > > > a python script (`time python mydag.py`)? > > > > > > As an Airflow admin, people may want to lower the > DAG parsing > > timeout > > > configuration key to force people to not do crazy > thing in DAG > > module > > > scope. At some point at Airbnb we had someone > running a Hive > > query in > > > DAG > > > scope, clearly that needs to be prevented. > > > > > > Loading DAGs by calling a database can bring all > sorts of > > surprises > > > that > > > can drive everyone crazy. As mentioned in a recent > post, > > > repo-contained, > > > deterministic "less dynamic" DAGs are great, > because they are > > > self-contained and allow you to use source-control > properly > > (revert a > > > bad > > > change for instance). That may mean having a > process or script > > that > > > compiles external things that are dynamic into > things like yaml > > files > > > checked into the code repo. Things as simple as > parsing duration > > become > > > more predictable (network latency and database > load are not part > > of > > > that > > > equation), but more importantly, all changes > become tracked in > > the code > > > repo. > > > > > > yaml parsing in python can be pretty slow too, and > there are > > solutions > > > / > > > alternatives there. Hocon is great. Also > C-accelerated yaml is > > > possible: > > > > > > > > > https://nam03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstackoverflow.com%2Fquestions%2F27743711%2Fcan-i-speedup-yaml&data=01%7C01%7Cebacal%40paypal.com%7Cde99ce488b3e479020ca08d7267f9b30%7Cfb00791460204374977e21bac5f3f4c8%7C1&sdata=HnqMESbF%2F6DiLbt0fCP%2ByJmF27P4Yovq%2Bx57oeLz0C4%3D&reserved=0 > > > > > > Max > > > > > > On Wed, Aug 14, 2019 at 9:56 PM Bacal, Eugene > > > <eba...@paypal.com.invalid> > > > wrote: > > > > > > > Hello Airflow team, > > > > > > > > Please advise if you can. In our environment, we > have noticed > > that > > > dynamic > > > > tasks place quite of stress on scheduler, > webserver and > > increase > > > MySQL DB > > > > connections. > > > > We are run about 1000 Dynamic Tasks every 30 min > and parsing > > time > > > > increases from 5 to 65 sec with Runtime from > 2sec to 350+ . > > This > > > happens at > > > > execution time then it drops to normal while > still executing > > tasks. > > > > Webserver hangs for few minutes. > > > > > > > > Airflow 1.10.1. > > > > MySQL DB > > > > > > > > Example: > > > > > > > > Dynamic Tasks: > > > > Number of DAGs: 44 > > > > Total task number: 950 > > > > DagBag parsing time: 65.879642000000001 > > > > > > > > Static Tasks: > > > > Number of DAGs: 73 > > > > Total task number: 1351 > > > > DagBag parsing time: 1.731088 > > > > > > > > Is this something you aware of? Any advises on > Dynamic tasks > > > > optimization/best practices? > > > > > > > > Thank you in advance, > > > > Eugene > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >