Celery executor. 12 bare-metal boxes; each box: CPU(s): 40, 2494.015 MHz, RAM: 378G.
Worker .cfg:

[core]
sql_alchemy_pool_size = 5
sql_alchemy_pool_recycle = 900
sql_alchemy_reconnect_timeout = 300
parallelism = 1200
dag_concurrency = 800
non_pooled_task_slot_count = 1200
max_active_runs_per_dag = 10
dagbag_import_timeout = 30

[celery]
worker_concurrency = 100

Scheduler .cfg:

[core]
sql_alchemy_pool_size = 30
sql_alchemy_pool_recycle = 300
sql_alchemy_reconnect_timeout = 300
parallelism = 1200
dag_concurrency = 800
non_pooled_task_slot_count = 1200
max_active_runs_per_dag = 10

[scheduler]
job_heartbeat_sec = 5
scheduler_heartbeat_sec = 5
run_duration = 1800
min_file_process_interval = 10
min_file_parsing_loop_time = 1
dag_dir_list_interval = 300
print_stats_interval = 30
scheduler_zombie_task_threshold = 300
max_tis_per_query = 1024
max_threads = 29

From the workers I see 350+ connections at start time; then it drops to ~200, and to 1-10 once tasks complete. From the scheduler, very low: 1-10.

MySQL connections:
331  worker1
215  worker2
349  worker53
335  worker54
347  worker55
336  worker56
336  worker57
354  worker58
339  worker59
328  worker60
333  worker61
337  worker62
  2  scheduler

- Eugene

On 8/20/19, 8:51 AM, "Maxime Beauchemin" <maximebeauche...@gmail.com> wrote:

Delay between tasks could be due to not having enough worker slots. What type of executor are you using, and how is it configured?

Max

On Tue, Aug 20, 2019 at 7:50 AM Bacal, Eugene <eba...@paypal.com.invalid> wrote:

> Absolutely possible, Daniel,
>
> We are looking in all directions. Has anyone noticed performance
> improvements with PostgreSQL vs MySQL?
>
> -Eugene
>
> On 8/15/19, 2:03 PM, "Daniel Standish" <dpstand...@gmail.com> wrote:
>
> It's not just the webserver and scheduler that will parse your DAG file.
> During the execution of a DAG run, the DAG file is re-parsed at the start
> of every task instance. If you have 1000 tasks running in a short period
> of time, that's 1000 queries. It's possible these queries are piling up
> in a queue on your database.
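[Editor's note] One way to blunt the per-task-instance re-parse cost Daniel describes is to cache the database lookup that the dynamic DAG file performs, so repeated parses within a short window do not each hit MySQL. A minimal sketch, with assumed names throughout (`CACHE_PATH`, `CACHE_TTL`, `get_table_list`, and the `fetch_from_db` callable all stand in for whatever the real DAG file does):

```python
import json
import os
import time

CACHE_PATH = "/tmp/table_list_cache.json"  # hypothetical cache location
CACHE_TTL = 600  # seconds; query MySQL at most once per 10 minutes

def get_table_list(fetch_from_db):
    """Return the cached table list, refreshing it only when stale.

    `fetch_from_db` is a zero-argument callable that runs the real
    MySQL query; everything else here is plain stdlib."""
    try:
        age = time.time() - os.path.getmtime(CACHE_PATH)
        if age < CACHE_TTL:
            with open(CACHE_PATH) as f:
                return json.load(f)
    except OSError:
        pass  # no cache yet, or unreadable: fall through and query
    tables = fetch_from_db()
    with open(CACHE_PATH, "w") as f:
        json.dump(tables, f)
    return tables
```

With this shape, 1000 task-instance parses inside the TTL window cost one database query instead of 1000; the trade-off is that new tables appear in the DAG up to `CACHE_TTL` seconds late.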
> DAG read time has to be very fast for this reason.
>
> On Thu, Aug 15, 2019 at 1:45 PM Bacal, Eugene <eba...@paypal.com.invalid> wrote:
>
> > Thank you for your reply, Max
> >
> > The dynamic DAGs query the database for tables and generate DAGs and
> > tasks based on the output.
> > As plain Python they do not take much time to execute:
> >
> > Dynamic - 500 tasks:
> > time python PPAD_OIS_MASTER_IDI.py
> > [2019-08-15 12:57:48,522] {settings.py:174} INFO -
> > setting.configure_orm(): Using pool settings. pool_size=30, pool_recycle=300
> > real 0m1.830s
> > user 0m1.622s
> > sys  0m0.188s
> >
> > Static - 100 tasks:
> > time python PPAD_OPS_CANARY_CONNECTIONS_TEST_8.py
> > [2019-08-15 12:59:24,959] {settings.py:174} INFO -
> > setting.configure_orm(): Using pool settings. pool_size=30, pool_recycle=300
> > real 0m1.009s
> > user 0m0.898s
> > sys  0m0.108s
> >
> > We have 44 DAGs with 1003 dynamic tasks. Parsing in quiet time:
> > DagBag parsing time: 3.9385959999999995
> >
> > Parsing at execution time, when the scheduler submits the DAGs:
> > DagBag parsing time: 99.820316
> >
> > The delay between task runs inside a single DAG grows from 30 sec to
> > 10 min, then drops back even though tasks are still running.
> >
> > Eugene
> >
> > On 8/15/19, 11:52 AM, "Maxime Beauchemin" <maximebeauche...@gmail.com> wrote:
> >
> > What is your dynamic DAG doing? How long does it take to execute it
> > just as a python script (`time python mydag.py`)?
> >
> > As an Airflow admin, you may want to lower the DAG parsing timeout
> > configuration key to force people not to do crazy things in DAG module
> > scope. At some point at Airbnb we had someone running a Hive query in
> > DAG scope; clearly that needs to be prevented.
> >
> > Loading DAGs by calling a database can bring all sorts of surprises
> > that can drive everyone crazy.
> > As mentioned in a recent post, repo-contained, deterministic, "less
> > dynamic" DAGs are great, because they are self-contained and allow you
> > to use source control properly (revert a bad change, for instance).
> > That may mean having a process or script that compiles external,
> > dynamic things into artifacts like yaml files checked into the code
> > repo. Things as simple as parsing duration become more predictable
> > (network latency and database load are no longer part of the equation),
> > but more importantly, all changes become tracked in the code repo.
> >
> > yaml parsing in python can be pretty slow too, and there are
> > solutions/alternatives there. Hocon is great. Also C-accelerated yaml
> > is possible:
> > https://stackoverflow.com/questions/27743711/can-i-speedup-yaml
> >
> > Max
> >
> > On Wed, Aug 14, 2019 at 9:56 PM Bacal, Eugene <eba...@paypal.com.invalid> wrote:
> >
> > > Hello Airflow team,
> > >
> > > Please advise if you can. In our environment, we have noticed that
> > > dynamic tasks place quite a bit of stress on the scheduler and
> > > webserver and increase MySQL DB connections.
> > > We run about 1000 dynamic tasks every 30 min, and parsing time
> > > increases from 5 to 65 sec, with runtime from 2 sec to 350+. This
> > > happens at execution time; then it drops back to normal while tasks
> > > are still executing. The webserver hangs for a few minutes.
> > >
> > > Airflow 1.10.1
> > > MySQL DB
> > >
> > > Example:
> > >
> > > Dynamic tasks:
> > > Number of DAGs: 44
> > > Total task number: 950
> > > DagBag parsing time: 65.879642000000001
> > >
> > > Static tasks:
> > > Number of DAGs: 73
> > > Total task number: 1351
> > > DagBag parsing time: 1.731088
> > >
> > > Is this something you are aware of? Any advice on dynamic-task
> > > optimization/best practices?
> > >
> > > Thank you in advance,
> > > Eugene
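[Editor's note] Max's "compile the dynamic parts into checked-in files" suggestion can be sketched roughly as below. All names are hypothetical, and stdlib json is used to keep the sketch dependency-free; the same two-step split works with yaml:

```python
import json

def compile_config(tables, path):
    """Offline "compile" step (run by cron or CI, not by the scheduler):
    materialize the dynamic bits -- here a table list normally fetched
    from the database -- into a file checked into the DAG repo."""
    with open(path, "w") as f:
        json.dump({"tables": tables}, f, indent=2)

def load_config(path):
    """Parse-time step inside the DAG file: just read the checked-in
    file. No network call, no database, so `time python mydag.py` stays
    fast and deterministic, and a bad change can simply be reverted."""
    with open(path) as f:
        return json.load(f)["tables"]
```

If yaml is preferred, PyYAML's `yaml.load(f, Loader=yaml.CSafeLoader)` uses the libyaml C parser when it is compiled in, which is considerably faster than the pure-Python loader (this is the C-accelerated option the stackoverflow link above discusses).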