Wanted to ask for advice...

I run 1300 tasks every 30 min in parallel, and this creates a lot of
DB connections, which freezes the UI.
I would like to segregate the result_backend from the Airflow metadata DB.

Current setup:
broker_url = rabbitmq
result_backend = mysqldb
sql_alchemy_conn = mysqldb

New setup:
broker_url = redis
result_backend = redis
sql_alchemy_conn = mysqldb
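
Concretely, the change in airflow.cfg would look roughly like this (host
names, DB numbers and credentials below are placeholders, not our real values):

[core]
sql_alchemy_conn = mysql://airflow:***@mysql-host:3306/airflow

[celery]
broker_url = redis://redis-host:6379/0
result_backend = redis://redis-host:6379/1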


I have tested it and see a 2x reduction in connections. But is there something
else I may be missing, since Airflow does not recommend using Redis as the
result_backend?
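
One way to compare connection counts per client host before and after the
change (a generic query against information_schema, nothing Airflow-specific):

mysql> SELECT substring_index(host, ':', 1) AS client, COUNT(*) AS conns
       FROM information_schema.processlist
       GROUP BY client
       ORDER BY conns DESC;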

Please advise,
Eugene 

 

On 8/29/19, 4:35 PM, "Bacal, Eugene" <eba...@paypal.com.INVALID> wrote:

    Thank you, Daniel! Sorry for spamming. 
    
    
    https://stackoverflow.com/questions/57718658/airflow-celery-workers-too-many-mysql-connections
    
    
    
    On 8/29/19, 1:14 PM, "Daniel Standish" <dpstand...@gmail.com> wrote:
    
        Eugene
        
        Why don't you create a stack overflow post and give us the link?
        
        That is probably a better way to help you through this.
        
        We will need to see what exactly you are doing in your dag files and
        potentially also hooks / operators.
        
        Thanks
        
        
        On Thu, Aug 29, 2019 at 10:41 AM Bacal, Eugene <eba...@paypal.com.invalid> wrote:
        
        > Can someone advise if this is expected behavior, please?
        >
        > - DB connections are not being re-used
        >         - Connections stay open while active only 5:
        >                 mysql>  show global status like 'Thread%';
        >                 +-------------------------+---------+
        >                 | Variable_name           | Value   |
        >                 +-------------------------+---------+
        >                 | Threadpool_idle_threads | 0       |
        >                 | Threadpool_threads      | 0       |
        >                 | Threads_cached          | 775     |
        >                 | Threads_connected       | 5323    |
        >                 | Threads_created         | 4846609 |
        >                 | Threads_running         | 5       |
        >                 +-------------------------+---------+
        >         - Workers create hundreds of connections that remain open until
        > the DB clears them (900 sec)
        >
        >
        >
        > Eugene Bacal
        >
        > Workload Automation Engineer
        > Scheduling Platform Experience
        > Cell: 4802028764
        > Slack: @ebacal
        > DL: dl-pp-cdp-...@paypal.com
        >
        >
        > Scottsdale, Arizona
        >
        >
        >     On 8/21/19, 2:36 PM, "Bacal, Eugene" <eba...@paypal.com.INVALID> wrote:
        >
        >     Hi Max,
        >
        >     We ran a few tests today on the DB side and noticed that:
        >
        >         - DB connections are not being re-used
        >         - Connections stay open while active only 5:
        >                 mysql>  show global status like 'Thread%';
        >                 +-------------------------+---------+
        >                 | Variable_name           | Value   |
        >                 +-------------------------+---------+
        >                 | Threadpool_idle_threads | 0       |
        >                 | Threadpool_threads      | 0       |
        >                 | Threads_cached          | 775     |
        >                 | Threads_connected       | 5323    |
        >                 | Threads_created         | 4846609 |
        >                 | Threads_running         | 5       |
        >                 +-------------------------+---------+
        >         - Workers create hundreds of connections that remain open until
        > the DB clears them (900 sec)
        >
        >     Seems that similar cases were reported some time back (2016):
        >
        >     https://issues.apache.org/jira/browse/AIRFLOW-680
        >
        >
        >     On 8/20/19, 5:19 PM, "Bacal, Eugene" <eba...@paypal.com.INVALID> wrote:
        >
        >         Celery executor.
        >         12 bare-metal hosts, each with 40 CPUs (2494 MHz) and 378G RAM
        >
        >         Worker .cfg:
        >         [core]
        >         sql_alchemy_pool_size = 5
        >         sql_alchemy_pool_recycle = 900
        >         sql_alchemy_reconnect_timeout = 300
        >         parallelism = 1200
        >         dag_concurrency = 800
        >         non_pooled_task_slot_count = 1200
        >         max_active_runs_per_dag = 10
        >         dagbag_import_timeout = 30
        >         [celery]
        >         worker_concurrency = 100
        >
        >         Scheduler .cfg:
        >         [core]
        >         sql_alchemy_pool_size = 30
        >         sql_alchemy_pool_recycle = 300
        >         sql_alchemy_reconnect_timeout = 300
        >         parallelism = 1200
        >         dag_concurrency = 800
        >         non_pooled_task_slot_count = 1200
        >         max_active_runs_per_dag = 10
        >         [scheduler]
        >         job_heartbeat_sec = 5
        >         scheduler_heartbeat_sec = 5
        >         run_duration = 1800
        >         min_file_process_interval = 10
        >         min_file_parsing_loop_time = 1
        >         dag_dir_list_interval = 300
        >         print_stats_interval = 30
        >         scheduler_zombie_task_threshold = 300
        >         max_tis_per_query = 1024
        >         max_threads = 29
        >
        >
        >         From the workers I see 350+ connections at start time, then it
        >         drops to 200 and then to 1-10 once tasks complete.
        >         From the scheduler it is very low, 1-10:
        >         MySQL connections:
        >             331 worker1
        >             215 worker2
        >             349 worker53
        >             335 worker54
        >             347 worker55
        >             336 worker56
        >             336 worker57
        >             354 worker58
        >             339 worker59
        >             328 worker60
        >             333 worker61
        >             337 worker62
        >              2 scheduler
        >
        >
        >
        >         - Eugene
        >
        >
        >         On 8/20/19, 8:51 AM, "Maxime Beauchemin" <maximebeauche...@gmail.com> wrote:
        >
        >             Delay between tasks could be due to not having enough worker slots.
        >             What type of executor are you using, how is it configured?
        >
        >             Max
        >
        >             On Tue, Aug 20, 2019 at 7:50 AM Bacal, Eugene <eba...@paypal.com.invalid> wrote:
        >
        >             > Absolutely possible, Daniel,
        >             >
        >             > We are looking in all directions. Has anyone noticed performance
        >             > improvements with PostgreSQL vs MySQL?
        >             >
        >             > -Eugene
        >             >
        >             >
        >             > On 8/15/19, 2:03 PM, "Daniel Standish" <dpstand...@gmail.com> wrote:
        >             >
        >             >     It's not just the webserver and scheduler that will parse your dag file.
        >             >     During the execution of a dag run, the dag file will be re-parsed at the
        >             >     start of every task instance.  If you have 1000 tasks running in a short
        >             >     period of time, that's 1000 queries.  It's possible these queries are
        >             >     piling up in a queue on your database.  Dag read time has to be very
        >             >     fast for this reason.
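        >             >
        >             >     To illustrate, one common mitigation is to cache the query result locally
        >             >     so that most parses skip the database entirely. A minimal sketch (the cache
        >             >     path, TTL and query helper below are made-up placeholders):
        >             >
        >             >         # dag module scope -- hypothetical caching sketch
        >             >         import json, os, time
        >             >
        >             >         CACHE = "/tmp/table_list.json"   # placeholder cache location
        >             >         TTL = 600                        # re-query at most every 10 minutes
        >             >
        >             >         def query_source_db():
        >             >             # placeholder for the real metadata query (e.g. via a MySQL hook)
        >             >             return ["table_a", "table_b"]
        >             >
        >             >         def get_tables():
        >             >             if os.path.exists(CACHE) and time.time() - os.path.getmtime(CACHE) < TTL:
        >             >                 with open(CACHE) as f:
        >             >                     return json.load(f)
        >             >             tables = query_source_db()
        >             >             with open(CACHE, "w") as f:
        >             >                 json.dump(tables, f)
        >             >             return tables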
        >             >
        >             >
        >             >
        >             >     On Thu, Aug 15, 2019 at 1:45 PM Bacal, Eugene <eba...@paypal.com.invalid> wrote:
        >             >
        >             >     >
        >             >     > Thank you for your reply, Max
        >             >     >
        >             >     > Dynamic DAGs query the database for tables and generate DAGs and
        >             >     > tasks based on the output.
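        >             >     >
        >             >     > The pattern is roughly like this (a simplified sketch; the connection id,
        >             >     > query, dag id and operator are illustrative placeholders, not the real DAG):
        >             >     >
        >             >     >     from datetime import datetime
        >             >     >     from airflow import DAG
        >             >     >     from airflow.hooks.mysql_hook import MySqlHook
        >             >     >     from airflow.operators.dummy_operator import DummyOperator
        >             >     >
        >             >     >     # module-scope DB call: this runs on every parse of the file
        >             >     >     tables = MySqlHook(mysql_conn_id="source_db").get_records("SHOW TABLES")
        >             >     >
        >             >     >     dag = DAG("example_dynamic_dag",
        >             >     >               start_date=datetime(2019, 1, 1),
        >             >     >               schedule_interval="*/30 * * * *")
        >             >     >
        >             >     >     for (table,) in tables:
        >             >     >         DummyOperator(task_id="process_%s" % table, dag=dag)
        >             >     >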
        >             >     > As plain Python, the file does not take much time to execute:
        >             >     >
        >             >     > Dynamic - 500 tasks:
        >             >     > time python PPAD_OIS_MASTER_IDI.py
        >             >     > [2019-08-15 12:57:48,522] {settings.py:174} INFO - setting.configure_orm(): Using pool settings. pool_size=30, pool_recycle=300
        >             >     > real    0m1.830s
        >             >     > user    0m1.622s
        >             >     > sys     0m0.188s
        >             >     >
        >             >     >
        >             >     > Static - 100 tasks:
        >             >     > time python PPAD_OPS_CANARY_CONNECTIONS_TEST_8.py
        >             >     > [2019-08-15 12:59:24,959] {settings.py:174} INFO - setting.configure_orm(): Using pool settings. pool_size=30, pool_recycle=300
        >             >     > real    0m1.009s
        >             >     > user    0m0.898s
        >             >     > sys     0m0.108s
        >             >     >
        >             >     >
        >             >     > We have 44 DAGs with 1003 dynamic tasks. Parsing at a quiet time:
        >             >     > DagBag parsing time: 3.9385959999999995
        >             >     >
        >             >     > Parsing at execution time, when the scheduler submits the DAGs:
        >             >     > DagBag parsing time: 99.820316
        >             >     >
        >             >     > The delay between task runs inside a single DAG grows from 30 sec to
        >             >     > 10 min, then it drops back even though tasks are still running.
        >             >     >
        >             >     > Eugene
        >             >     >
        >             >     >
        >             >     >
        >             >     >
        >             >     >
        >             >     > On 8/15/19, 11:52 AM, "Maxime Beauchemin" <maximebeauche...@gmail.com> wrote:
        >             >     >
        >             >     >     What is your dynamic DAG doing? How long does it take to execute it
        >             >     >     just as a python script (`time python mydag.py`)?
        >             >     >
        >             >     >     As an Airflow admin, you may want to lower the DAG parsing timeout
        >             >     >     configuration key to force people not to do crazy things in DAG module
        >             >     >     scope. At some point at Airbnb we had someone running a Hive query in
        >             >     >     DAG scope; clearly that needs to be prevented.
        >             >     >
        >             >     >     Loading DAGs by calling a database can bring all sorts of surprises
        >             >     >     that can drive everyone crazy. As mentioned in a recent post,
        >             >     >     repo-contained, deterministic "less dynamic" DAGs are great, because
        >             >     >     they are self-contained and allow you to use source control properly
        >             >     >     (revert a bad change, for instance). That may mean having a process or
        >             >     >     script that compiles external things that are dynamic into things like
        >             >     >     yaml files checked into the code repo. Things as simple as parsing
        >             >     >     duration become more predictable (network latency and database load
        >             >     >     are not part of that equation), but more importantly, all changes
        >             >     >     become tracked in the code repo.
        >             >     >
        >             >     >     yaml parsing in python can be pretty slow too, and there are
        >             >     >     solutions / alternatives there. Hocon is great. Also C-accelerated
        >             >     >     yaml is possible:
        >             >     >
        >             >     >     https://stackoverflow.com/questions/27743711/can-i-speedup-yaml
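        >             >     >
        >             >     >     For example, roughly (a sketch only; the file paths and the helper
        >             >     >     that reads the source database are hypothetical):
        >             >     >
        >             >     >         # compile_tables.py -- run out of band / in CI, output committed to the repo
        >             >     >         import yaml
        >             >     >
        >             >     >         def compile_config():
        >             >     >             tables = fetch_tables_from_source_db()   # hypothetical helper
        >             >     >             with open("dags/config/tables.yaml", "w") as f:
        >             >     >                 yaml.safe_dump(tables, f)
        >             >     >
        >             >     >         # in the DAG module: no network, no DB, just a local file read,
        >             >     >         # using the libyaml-backed loader when it is available
        >             >     >         try:
        >             >     >             from yaml import CSafeLoader as Loader
        >             >     >         except ImportError:
        >             >     >             from yaml import SafeLoader as Loader
        >             >     >
        >             >     >         with open("dags/config/tables.yaml") as f:
        >             >     >             tables = yaml.load(f, Loader=Loader)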
        >             >     >
        >             >     >     Max
        >             >     >
        >             >     >     On Wed, Aug 14, 2019 at 9:56 PM Bacal, Eugene <eba...@paypal.com.invalid> wrote:
        >             >     >
        >             >     >     > Hello Airflow team,
        >             >     >     >
        >             >     >     > Please advise if you can. In our environment, we have noticed that
        >             >     >     > dynamic tasks place quite a bit of stress on the scheduler and
        >             >     >     > webserver and increase MySQL DB connections.
        >             >     >     > We run about 1000 dynamic tasks every 30 min, and parsing time
        >             >     >     > increases from 5 to 65 sec, with runtime going from 2 sec to 350+ sec.
        >             >     >     > This happens at execution time, then it drops back to normal while
        >             >     >     > tasks are still executing. The webserver hangs for a few minutes.
        >             >     >     >
        >             >     >     > Airflow 1.10.1.
        >             >     >     > MySQL DB
        >             >     >     >
        >             >     >     > Example:
        >             >     >     >
        >             >     >     > Dynamic Tasks:
        >             >     >     > Number of DAGs: 44
        >             >     >     > Total task number: 950
        >             >     >     > DagBag parsing time: 65.879642000000001
        >             >     >     >
        >             >     >     > Static Tasks:
        >             >     >     > Number of DAGs: 73
        >             >     >     > Total task number: 1351
        >             >     >     > DagBag parsing time: 1.731088
        >             >     >     >
        >             >     >     > Is this something you are aware of? Any advice on dynamic task
        >             >     >     > optimization/best practices?
        >             >     >     >
        >             >     >     > Thank you in advance,
        >             >     >     > Eugene
        >             >     >     >
        >             >     >     >
        >             >     >     >
        >             >     >
        >             >     >
        >             >     >
        >             >
        >             >
        >             >
        >
        >
        >
        >
        >
        >
        >
        
    
    
