Thank you, Daniel! Sorry for the spam.

https://stackoverflow.com/questions/57718658/airflow-celery-workers-too-many-mysql-connections



On 8/29/19, 1:14 PM, "Daniel Standish" <dpstand...@gmail.com> wrote:

    Eugene
    
    Why don't you create a stack overflow post and give us the link?
    
    That is probably a better way to help you through this.
    
    We will need to see what exactly you are doing in your dag files and
    potentially also hooks / operators.
    
    Thanks
    
    
    On Thu, Aug 29, 2019 at 10:41 AM Bacal, Eugene <eba...@paypal.com.invalid>
    wrote:
    
    > Can someone advise if this is expected behavior, please?
    >
    > - DB connections are not being re-used
    >         - Connections stay open while active only 5:
    >                 mysql>  show global status like 'Thread%';
    >                 +-------------------------+---------+
    >                 | Variable_name           | Value   |
    >                 +-------------------------+---------+
    >                 | Threadpool_idle_threads | 0       |
    >                 | Threadpool_threads      | 0       |
    >                 | Threads_cached          | 775     |
    >                 | Threads_connected       | 5323    |
    >                 | Threads_created         | 4846609 |
    >                 | Threads_running         | 5       |
    >                 +-------------------------+---------+
    >         - Workers create hundreds of connections that remain open until
    > the DB clears them (900 sec)
    >
    >
    >
    > Eugene Bacal
    >
    > Workload Automation Engineer
    > Scheduling Platform Experience
    > Cell: 4802028764
    > Slack: @ebacal
    > DL: dl-pp-cdp-...@paypal.com
    >
    >
    > Scottsdale, Arizona
    >
    >
    > On 8/21/19, 2:36 PM, "Bacal, Eugene" <eba...@paypal.com.INVALID> wrote:
    >
    >     Hi Max,
    >
    >     We ran a few tests today from the DB side and noticed that:
    >
    >         - DB connections are not being re-used
    >         - Connections stay open while active only 5:
    >                 mysql>  show global status like 'Thread%';
    >                 +-------------------------+---------+
    >                 | Variable_name           | Value   |
    >                 +-------------------------+---------+
    >                 | Threadpool_idle_threads | 0       |
    >                 | Threadpool_threads      | 0       |
    >                 | Threads_cached          | 775     |
    >                 | Threads_connected       | 5323    |
    >                 | Threads_created         | 4846609 |
    >                 | Threads_running         | 5       |
    >                 +-------------------------+---------+
    >         - Workers create hundreds of connections that remain open until
    > the DB clears them (900 sec)
    >
    >     It seems similar cases were reported some time back (2016):
    >
    > https://issues.apache.org/jira/browse/AIRFLOW-680
    >
    >
    >     On 8/20/19, 5:19 PM, "Bacal, Eugene" <eba...@paypal.com.INVALID>
    > wrote:
    >
    >         Celery executor.
    >         12 BareMetals: CPU(s):40, MHz 2494.015, RAM 378G - each box
    >
    >         Worker .cfg:
    >         [core]
    >         sql_alchemy_pool_size = 5
    >         sql_alchemy_pool_recycle = 900
    >         sql_alchemy_reconnect_timeout = 300
    >         parallelism = 1200
    >         dag_concurrency = 800
    >         non_pooled_task_slot_count = 1200
    >         max_active_runs_per_dag = 10
    >         dagbag_import_timeout = 30
    >         [celery]
    >         worker_concurrency = 100
    >
    >         Scheduler .cfg:
    >         [core]
    >         sql_alchemy_pool_size = 30
    >         sql_alchemy_pool_recycle = 300
    >         sql_alchemy_reconnect_timeout = 300
    >         parallelism = 1200
    >         dag_concurrency = 800
    >         non_pooled_task_slot_count = 1200
    >         max_active_runs_per_dag = 10
    >         [scheduler]
    >         job_heartbeat_sec = 5
    >         scheduler_heartbeat_sec = 5
    >         run_duration = 1800
    >         min_file_process_interval = 10
    >         min_file_parsing_loop_time = 1
    >         dag_dir_list_interval = 300
    >         print_stats_interval = 30
    >         scheduler_zombie_task_threshold = 300
    >         max_tis_per_query = 1024
    >         max_threads = 29
    >
    >
    >         From the workers I see 350+ connections at start time, then it
    > drops to 200 and then to 1-10 once tasks complete.
    >         From the scheduler, very few (1-10):
    >         MySQL connections:
    >             331 worker1
    >             215 worker2
    >             349 worker53
    >             335 worker54
    >             347 worker55
    >             336 worker56
    >             336 worker57
    >             354 worker58
    >             339 worker59
    >             328 worker60
    >             333 worker61
    >             337 worker62
    >              2 scheduler
    >
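[Editor's note: a back-of-the-envelope estimate (my sketch, not from the thread; it assumes Celery's prefork model gives each of the ~100 task processes on a box its own SQLAlchemy pool) shows why each box can sit in the low hundreds of idle connections until `sql_alchemy_pool_recycle` (900 s) expires them:]

```python
# Upper-bound estimate of MySQL connections, assuming each prefork task
# process holds its own SQLAlchemy pool (an assumption, not a measurement
# from this thread) and keeps connections open until pool_recycle.
worker_concurrency = 100   # [celery] worker_concurrency per box
pool_size = 5              # [core] sql_alchemy_pool_size per process
num_boxes = 12             # bare-metal workers reported in the thread

per_box = worker_concurrency * pool_size   # up to 500 connections per box
cluster = per_box * num_boxes              # up to 6000 cluster-wide

print(per_box, cluster)  # → 500 6000
```

The observed ~330 connections per box and ~5,300 `Threads_connected` sit comfortably inside these bounds, which points at per-process pools rather than a connection leak.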
    >
    >
    >         - Eugene
    >
    >
    >         On 8/20/19, 8:51 AM, "Maxime Beauchemin" <
    > maximebeauche...@gmail.com> wrote:
    >
    >             Delay between tasks could be due to not having enough worker
    > slots. What
    >             type of executor are you using, how is it configured?
    >
    >             Max
    >
    >             On Tue, Aug 20, 2019 at 7:50 AM Bacal, Eugene
    > <eba...@paypal.com.invalid>
    >             wrote:
    >
    >             > Absolutely possible, Daniel,
    >             >
    >             > We are looking in all directions. Has anyone noticed
    > performance
    >             > improvements with PostgreSQL vs MySQL ?
    >             >
    >             > -Eugene
    >             >
    >             >
    >             > On 8/15/19, 2:03 PM, "Daniel Standish" 
<dpstand...@gmail.com>
    > wrote:
    >             >
    >             >     It's not just webserver and scheduler that will parse
    > your dag file.
    >             >     During the execution of a dag run, dag file will be
    > re-parsed at the
    >             > start
    >             >     of every task instance.  If you have 1000 tasks running
    > in short
    >             > period of
    >             >     time, that's 1000 queries.  It's possible these queries
    > are piling up
    >             > in a
    >             >     queue on your database.  Dag read time has to be very
    > fast for this
    >             > reason.
    >             >
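[Editor's note: one way to blunt the re-parse cost Daniel describes is to cache the expensive lookup that runs in DAG-module scope. A hedged sketch, with hypothetical names (`load_table_list`, the cache path) not taken from the thread: snapshot the query result to a local file with a TTL, so most parses skip the database round trip entirely.]

```python
import json
import os
import time

CACHE_PATH = "/tmp/dag_tables_cache.json"  # hypothetical location
CACHE_TTL = 600  # seconds between real database hits

def load_table_list(fetch_from_db, cache_path=CACHE_PATH, ttl=CACHE_TTL):
    """Return the table list, hitting the database only when the cache is stale.

    fetch_from_db is whatever callable performs the expensive query; while
    the cache file is fresh, DAG parsing never touches the network.
    """
    if os.path.exists(cache_path):
        if time.time() - os.path.getmtime(cache_path) < ttl:
            with open(cache_path) as f:
                return json.load(f)
    tables = fetch_from_db()           # the expensive query, run rarely
    with open(cache_path, "w") as f:
        json.dump(tables, f)
    return tables
```

With 1000 task instances each re-parsing the DAG file, this turns ~1000 queries per run into at most a handful.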
    >             >
    >             >
    >             >     On Thu, Aug 15, 2019 at 1:45 PM Bacal, Eugene
    >             > <eba...@paypal.com.invalid>
    >             >     wrote:
    >             >
    >             >     >
    >             >     > Thank you for your reply, Max
    >             >     >
    >             >     > Dynamic DAGs query the database for tables and
    > generate DAGs and
    >             > tasks
    >             >     > based on the output.
    >             >     > As plain Python, the DAG files do not take long to execute:
    >             >     >
    >             >     > Dynamic - 500 tasks:
    >             >     > time python PPAD_OIS_MASTER_IDI.py
    >             >     > [2019-08-15 12:57:48,522] {settings.py:174} INFO -
    >             >     > setting.configure_orm(): Using pool settings.
    > pool_size=30,
    >             > pool_recycle=300
    >             >     > real    0m1.830s
    >             >     > user    0m1.622s
    >             >     > sys     0m0.188s
    >             >     >
    >             >     >
    >             >     > Static - 100 tasks:
    >             >     > time python PPAD_OPS_CANARY_CONNECTIONS_TEST_8.py
    >             >     > [2019-08-15 12:59:24,959] {settings.py:174} INFO -
    >             >     > setting.configure_orm(): Using pool settings.
    > pool_size=30,
    >             > pool_recycle=300
    >             >     > real    0m1.009s
    >             >     > user    0m0.898s
    >             >     > sys     0m0.108s
    >             >     >
    >             >     >
    >             >     > We have 44 DAGs with 1003 dynamic tasks. Parsing at a
    > quiet time:
    >             >     > DagBag parsing time: 3.9385959999999995
    >             >     >
    >             >     > Parsing at execution time, when the scheduler submits
    > the DAGs:
    >             >     > DagBag parsing time: 99.820316
    >             >     >
    >             >     > The delay between task runs inside a single DAG grows
    > from 30 sec to
    >             > 10 min,
    >             >     > then it drops back even though tasks are still running.
    >             >     >
    >             >     > Eugene
    >             >     >
    >             >     >
    >             >     >
    >             >     >
    >             >     >
    >             >     > On 8/15/19, 11:52 AM, "Maxime Beauchemin" <
    >             > maximebeauche...@gmail.com>
    >             >     > wrote:
    >             >     >
    >             >     >     What is your dynamic DAG doing? How long does it
    > take to execute
    >             > it
    >             >     > just as
    >             >     >     a python script (`time python mydag.py`)?
    >             >     >
    >             >     >     As an Airflow admin, people may want to lower the
    > DAG parsing
    >             > timeout
    >             >     >     configuration key to force people to not do crazy
    > things in DAG
    >             > module
    >             >     >     scope. At some point at Airbnb we had someone
    > running a Hive
    >             > query in
    >             >     > DAG
    >             >     >     scope, clearly that needs to be prevented.
    >             >     >
    >             >     >     Loading DAGs by calling a database can bring all
    > sorts of
    >             > surprises
    >             >     > that
    >             >     >     can drive everyone crazy. As mentioned in a recent
    > post,
    >             >     > repo-contained,
    >             >     >     deterministic "less dynamic" DAGs are great,
    > because they are
    >             >     >     self-contained and allow you to use source-control
    > properly
    >             > (revert a
    >             >     > bad
    >             >     >     change for instance). That may mean having a
    > process or script
    >             > that
    >             >     >     compiles external things that are dynamic into
    > things like yaml
    >             > files
    >             >     >     checked into the code repo. Things as simple as
    > parsing duration
    >             > become
    >             >     >     more predictable (network latency and database
    > load are not part
    >             > of
    >             >     > that
    >             >     >     equation), but more importantly, all changes
    > become tracked in
    >             > the code
    >             >     >     repo.
    >             >     >
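[Editor's note: Max's "compile the dynamic parts into checked-in yaml" idea could look roughly like this. A sketch only; the file paths, function names, and the shape of the query result are assumptions, not code from the thread. A separate job snapshots the dynamic source of truth into a repo-tracked YAML file, and the DAG file only reads that file.]

```python
import yaml  # PyYAML

# Step 1 (run by a separate scheduled job or CI, NOT at DAG parse time):
# snapshot the dynamic source into a file committed to the repo.
def compile_tasks_to_yaml(fetch_tables, out_path="dags/config/tables.yml"):
    tables = fetch_tables()  # e.g. the database query; name is an assumption
    with open(out_path, "w") as f:
        yaml.safe_dump({"tables": tables}, f)

# Step 2 (in the DAG file): parsing becomes a local file read, so DagBag
# import time no longer depends on database latency or load, and every
# change to the task list is visible (and revertible) in source control.
def load_tasks(path="dags/config/tables.yml"):
    with open(path) as f:
        return yaml.safe_load(f)["tables"]
```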
    >             >     >     yaml parsing in python can be pretty slow too, and
    > there are
    >             > solutions
    >             >     > /
    >             >     >     alternatives there. Hocon is great. Also
    > C-accelerated yaml is
    >             >     > possible:
    >             >     >
    >             >     >
    >             >
    >             >     > https://stackoverflow.com/questions/27743711/can-i-speedup-yaml
    >             >     >
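[Editor's note: the C-accelerated yaml Max mentions is PyYAML's LibYAML-backed loader. Since LibYAML support is not always compiled in, the standard pattern is to fall back to the pure-Python loader:]

```python
import yaml

# Prefer the LibYAML-backed C loader when PyYAML was built with it;
# fall back to the pure-Python SafeLoader otherwise.
try:
    from yaml import CSafeLoader as SafeLoader
except ImportError:
    from yaml import SafeLoader

def fast_load(text):
    """Parse YAML text with the fastest safe loader available."""
    return yaml.load(text, Loader=SafeLoader)
```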
    >             >     >     Max
    >             >     >
    >             >     >     On Wed, Aug 14, 2019 at 9:56 PM Bacal, Eugene
    >             >     > <eba...@paypal.com.invalid>
    >             >     >     wrote:
    >             >     >
    >             >     >     > Hello Airflow team,
    >             >     >     >
    >             >     >     > Please advise if you can. In our environment, we
    > have noticed
    >             > that
    >             >     > dynamic
    >             >     >     > tasks place quite a bit of stress on the scheduler
    >             >     >     > and webserver and increase MySQL DB connections.
    >             >     >     > We run about 1000 dynamic tasks every 30 min, and
    >             >     >     > parsing time increases from 5 to 65 sec, with runtime
    >             >     >     > from 2 sec to 350+ sec. This happens at execution
    >             >     >     > time; then it drops back to normal while tasks are
    >             >     >     > still executing. The webserver hangs for a few minutes.
    >             >     >     >
    >             >     >     > Airflow 1.10.1.
    >             >     >     > MySQL DB
    >             >     >     >
    >             >     >     > Example:
    >             >     >     >
    >             >     >     > Dynamic Tasks:
    >             >     >     > Number of DAGs: 44
    >             >     >     > Total task number: 950
    >             >     >     > DagBag parsing time: 65.879642000000001
    >             >     >     >
    >             >     >     > Static Tasks:
    >             >     >     > Number of DAGs: 73
    >             >     >     > Total task number: 1351
    >             >     >     > DagBag parsing time: 1.731088
    >             >     >     >
    >             >     >     > Is this something you are aware of? Any advice on
    >             >     >     > dynamic-task optimization/best practices?
    >             >     >     >
    >             >     >     > Thank you in advance,
    >             >     >     > Eugene
    >             >     >     >
    >             >     >     >
    >             >     >     >
    >             >     >
    >             >     >
    >             >     >
    >             >
    >             >
    >             >
    >
    >
    >
    >
    >
    >
    >
    
