Hi Eugene - We run an Airflow deployment that executes ~6,000 hourly tasks across 32 dynamic DAGs. Happy to jump on a call sometime and talk about our setup, which now includes a step that generates one file per dynamic DAG, so that each DAG gets its own scheduler loop. Hit me up at r...@astronomer.io if interested. We want to write a guide about it, but it's been hard to find time for that lately.
-Ry

On Thu, Sep 12, 2019 at 1:21 PM, Eugene Bacal < ebacal@paypal.com.invalid > wrote:

> Wanted to ask for some advice...
>
> Because I run 1,300 tasks in parallel every 30 min, this scenario creates a
> lot of DB connections, which freezes the UI. I would like to segregate the
> result_backend from the Airflow DB.
>
> Current setup:
> broker_url = rabbitmq
> result_backend = mysqldb
> sql_alchemy_conn = mysqldb
>
> New setup:
> broker_url = redis
> result_backend = redis
> sql_alchemy_conn = mysqldb
>
> I have tested it and see a 2x reduction connection-wise. But is there
> something else I may be missing, since Airflow does not recommend using
> redis as the result_backend?
>
> Please advise,
> Eugene
>
> On 8/29/19, 4:35 PM, "Bacal, Eugene" < ebacal@paypal.com.INVALID > wrote:
>
> Thank you, Daniel! Sorry for spamming.
>
> https://stackoverflow.com/questions/57718658/airflow-celery-workers-too-many-mysql-connections
>
> On 8/29/19, 1:14 PM, "Daniel Standish" < dpstandish@gmail.com > wrote:
>
> Eugene
>
> Why don't you create a Stack Overflow post and give us the link?
>
> That is probably a better way to help you through this.
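For reference, Eugene's proposed split above would look roughly like this as an airflow.cfg fragment (hostnames, ports, and credentials are placeholders; broker_url and result_backend live in the [celery] section, while the metadata connection stays in [core]):

```ini
[core]
# Metadata DB stays on MySQL (scheduler/webserver state)
sql_alchemy_conn = mysql://airflow:***@mysql-host:3306/airflow

[celery]
broker_url = redis://redis-host:6379/0
# Celery task results move off MySQL onto redis
result_backend = redis://redis-host:6379/1
```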
> We will need to see what exactly you are doing in your DAG files and
> potentially also hooks / operators.
>
> Thanks
>
> On Thu, Aug 29, 2019 at 10:41 AM Bacal, Eugene < ebacal@paypal.com.invalid > wrote:
>
>> Can someone advise whether this is expected behavior, please?
>>
>> - DB connections are not being re-used
>> - Connections stay open, while only 5 are active:
>> mysql> show global status like 'Thread%';
>> +-------------------------+---------+
>> | Variable_name           | Value   |
>> +-------------------------+---------+
>> | Threadpool_idle_threads | 0       |
>> | Threadpool_threads      | 0       |
>> | Threads_cached          | 775     |
>> | Threads_connected       | 5323    |
>> | Threads_created         | 4846609 |
>> | Threads_running         | 5       |
>> +-------------------------+---------+
>> - Workers create hundreds of connections that remain open until the DB
>> clears them (900 sec)
>>
>> Eugene Bacal
>>
>> Workload Automation Engineer
>> Scheduling Platform Experience
>> Cell: 4802028764
>> Slack: @ebacal
>> DL: dl-pp-cdp-spx@paypal.com
>>
>> Scottsdale, Arizona
>>
>> On 8/21/19, 2:36 PM, "Bacal, Eugene" < ebacal@paypal.com.INVALID > wrote:
>>
>> Hi Max,
>>
>> We ran a few tests today from the DB side and noticed that:
>>
>> - DB connections are not being re-used
>> - Connections stay open, while only 5 are active:
>> mysql> show global status like 'Thread%';
>> +-------------------------+---------+
>> | Variable_name           | Value   |
>> +-------------------------+---------+
>> | Threadpool_idle_threads | 0       |
>> | Threadpool_threads      | 0       |
>> | Threads_cached          | 775     |
>> | Threads_connected       | 5323    |
>> | Threads_created         | 4846609 |
>> | Threads_running         | 5       |
>> +-------------------------+---------+
>> - Workers create hundreds of connections that remain open until the DB
>> clears them (900 sec)
>>
>> It seems similar cases were reported some time back (2016):
>>
>> https://issues.apache.org/jira/browse/AIRFLOW-680
>>
>> On 8/20/19, 5:19 PM, "Bacal, Eugene" < ebacal@paypal.com.INVALID > wrote:
>>
>> Celery executor.
>> 12 bare-metal boxes: CPU(s): 40 @ 2494.015 MHz, 378G RAM each
>>
>> Worker .cfg:
>> [core]
>> sql_alchemy_pool_size = 5
>> sql_alchemy_pool_recycle = 900
>> sql_alchemy_reconnect_timeout = 300
>> parallelism = 1200
>> dag_concurrency = 800
>> non_pooled_task_slot_count = 1200
>> max_active_runs_per_dag = 10
>> dagbag_import_timeout = 30
>> [celery]
>> worker_concurrency = 100
>>
>> Scheduler .cfg:
>> [core]
>> sql_alchemy_pool_size = 30
>> sql_alchemy_pool_recycle = 300
>> sql_alchemy_reconnect_timeout = 300
>> parallelism = 1200
>> dag_concurrency = 800
>> non_pooled_task_slot_count = 1200
>> max_active_runs_per_dag = 10
>> [scheduler]
>> job_heartbeat_sec = 5
>> scheduler_heartbeat_sec = 5
>> run_duration = 1800
>> min_file_process_interval = 10
>> min_file_parsing_loop_time = 1
>> dag_dir_list_interval = 300
>> print_stats_interval = 30
>> scheduler_zombie_task_threshold = 300
>> max_tis_per_query = 1024
>> max_threads = 29
>>
>> From the workers I see 350+ connections at start time; it then drops to
>> 200, and then to 1-10 once tasks complete.
>> From the scheduler, very low: 1-10.
>> MySQL connections:
>> 331 worker1
>> 215 worker2
>> 349 worker53
>> 335 worker54
>> 347 worker55
>> 336 worker56
>> 336 worker57
>> 354 worker58
>> 339 worker59
>> 328 worker60
>> 333 worker61
>> 337 worker62
>> 2 scheduler
>>
>> - Eugene
>>
>> On 8/20/19, 8:51 AM, "Maxime Beauchemin" < maximebeauchemin@gmail.com > wrote:
>>
>> Delay between tasks could be due to not having enough worker slots. What
>> type of executor are you using, and how is it configured?
>>
>> Max
>>
>> On Tue, Aug 20, 2019 at 7:50 AM Bacal, Eugene
>> < ebacal@paypal.com.invalid > wrote:
>>
>>> Absolutely possible, Daniel,
>>>
>>> We are looking in all directions. Has anyone noticed performance
>>> improvements with PostgreSQL vs MySQL?
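A back-of-the-envelope check on those connection counts, under the assumption that each concurrent task process opens its own SQLAlchemy engine and can therefore hold up to sql_alchemy_pool_size connections until pool_recycle lets the DB close idle ones. This is a rough ceiling, not an exact model of SQLAlchemy pooling:

```python
# Rough worst-case MySQL connection estimate for the worker fleet above.
# Assumption (hedged): every running task instance is a separate process
# with its own engine, so each can hold up to sql_alchemy_pool_size
# connections that linger until pool_recycle (900 s) kicks in.
worker_boxes = 12
worker_concurrency = 100      # concurrent task processes per box
pool_size_per_process = 5     # [core] sql_alchemy_pool_size on the workers

per_box_ceiling = worker_concurrency * pool_size_per_process
fleet_ceiling = worker_boxes * per_box_ceiling

print(per_box_ceiling)   # 500 possible per box
print(fleet_ceiling)     # 6000 across the fleet
```

The observed ~350 connections per box at start time sits comfortably under that 500-per-box ceiling, which is consistent with connections being opened per task process rather than re-used across tasks.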
>>>
>>> -Eugene
>>>
>>> On 8/15/19, 2:03 PM, "Daniel Standish" < dpstandish@gmail.com > wrote:
>>>
>>> It's not just the webserver and scheduler that will parse your DAG file.
>>> During the execution of a dag run, the DAG file will be re-parsed at the
>>> start of every task instance. If you have 1000 tasks running in a short
>>> period of time, that's 1000 queries. It's possible these queries are
>>> piling up in a queue on your database. DAG read time has to be very fast
>>> for this reason.
>>>
>>> On Thu, Aug 15, 2019 at 1:45 PM Bacal, Eugene
>>> < ebacal@paypal.com.invalid > wrote:
>>>
>>>> Thank you for your reply, Max
>>>>
>>>> Dynamic DAGs query the database for tables and generate DAGs and tasks
>>>> based on the output.
>>>> As plain Python they do not take much time to execute:
>>>>
>>>> Dynamic - 500 tasks:
>>>> time python PPAD_OIS_MASTER_IDI.py
>>>> [2019-08-15 12:57:48,522] {settings.py:174} INFO - setting.configure_orm(): Using pool settings. pool_size=30, pool_recycle=300
>>>>
>>>> real 0m1.830s
>>>> user 0m1.622s
>>>> sys 0m0.188s
>>>>
>>>> Static - 100 tasks:
>>>> time python PPAD_OPS_CANARY_CONNECTIONS_TEST_8.py
>>>> [2019-08-15 12:59:24,959] {settings.py:174} INFO - setting.configure_orm(): Using pool settings. pool_size=30, pool_recycle=300
>>>>
>>>> real 0m1.009s
>>>> user 0m0.898s
>>>> sys 0m0.108s
>>>>
>>>> We have 44 DAGs with 1003 dynamic tasks. Parsing in quiet time:
>>>>
>>>> DagBag parsing time: 3.938596
>>>>
>>>> Parsing at execution time, when the scheduler submits the DAGs:
>>>>
>>>> DagBag parsing time: 99.820316
>>>>
>>>> The delay between task runs inside a single DAG grows from 30 sec to
>>>> 10 min, then it drops back even though tasks are running.
>>>>
>>>> Eugene
>>>>
>>>> On 8/15/19, 11:52 AM, "Maxime Beauchemin" < maximebeauchemin@gmail.com > wrote:
>>>>
>>>> What is your dynamic DAG doing? How long does it take to execute it
>>>> just as a python script (`time python mydag.py`)?
>>>>
>>>> As an Airflow admin, people may want to lower the DAG parsing timeout
>>>> configuration key to force people to not do crazy things in DAG module
>>>> scope. At some point at Airbnb we had someone running a Hive query in
>>>> DAG scope; clearly that needs to be prevented.
>>>>
>>>> Loading DAGs by calling a database can bring all sorts of surprises
>>>> that can drive everyone crazy.
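To make Daniel's earlier point concrete (hypothetical code, not Eugene's actual DAG files): anything at module scope runs on every parse, including once per task instance at execution time, so one common mitigation is to cache the expensive lookup in a local file:

```python
# Anti-pattern: a DB query at DAG module scope runs on EVERY parse.
# Mitigation sketch: serve the task list from a local cache file and
# refresh the expensive query only when the cache is stale.
import json
import os
import time

CACHE = "/tmp/table_list.json"
CACHE_TTL = 600  # seconds between refreshes of the expensive query

def fetch_tables_from_db():
    # Stand-in for the real metadata query that hits the database.
    return ["orders", "users", "events"]

def get_tables():
    """Per-parse cost becomes a file read instead of a DB round trip."""
    if os.path.exists(CACHE) and time.time() - os.path.getmtime(CACHE) < CACHE_TTL:
        with open(CACHE) as f:
            return json.load(f)
    tables = fetch_tables_from_db()
    with open(CACHE, "w") as f:
        json.dump(tables, f)
    return tables
```

With 1000 task instances each triggering a re-parse, this turns 1000 DB queries into at most a handful per TTL window.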
>>>> As mentioned in a recent post, repo-contained, deterministic "less
>>>> dynamic" DAGs are great, because they are self-contained and allow you
>>>> to use source control properly (revert a bad change, for instance).
>>>> That may mean having a process or script that compiles external things
>>>> that are dynamic into things like yaml files checked into the code
>>>> repo. Things as simple as parsing duration become more predictable
>>>> (network latency and database load are not part of that equation),
>>>> but more importantly, all changes become tracked in the code repo.
>>>>
>>>> yaml parsing in python can be pretty slow too, and there are solutions
>>>> / alternatives there. Hocon is great. Also C-accelerated yaml is
>>>> possible:
>>>> https://stackoverflow.com/questions/27743711/can-i-speedup-yaml
>>>>
>>>> Max
>>>>
>>>> On Wed, Aug 14, 2019 at 9:56 PM Bacal, Eugene
>>>> < ebacal@paypal.com.invalid > wrote:
>>>>
>>>>> Hello Airflow team,
>>>>>
>>>>> Please advise if you can. In our environment, we have noticed that
>>>>> dynamic tasks place quite a bit of stress on the scheduler and
>>>>> webserver and increase MySQL DB connections.
>>>>> We run about 1000 dynamic tasks every 30 min, and parsing time
>>>>> increases from 5 to 65 sec, with runtime going from 2 sec to 350+.
>>>>> This happens at execution time, then it drops to normal while still
>>>>> executing tasks.
>>>>> The webserver hangs for a few minutes.
>>>>>
>>>>> Airflow 1.10.1.
>>>>> MySQL DB
>>>>>
>>>>> Example:
>>>>>
>>>>> Dynamic tasks:
>>>>> Number of DAGs: 44
>>>>> Total task number: 950
>>>>> DagBag parsing time: 65.879642
>>>>>
>>>>> Static tasks:
>>>>> Number of DAGs: 73
>>>>> Total task number: 1351
>>>>> DagBag parsing time: 1.731088
>>>>>
>>>>> Is this something you are aware of? Any advice on dynamic-task
>>>>> optimization / best practices?
>>>>>
>>>>> Thank you in advance,
>>>>> Eugene