Celery executor.
12 bare-metal boxes, each: 40 CPUs @ 2494 MHz, 378 GB RAM

Worker .cfg:
[core]
sql_alchemy_pool_size = 5
sql_alchemy_pool_recycle = 900
sql_alchemy_reconnect_timeout = 300
parallelism = 1200
dag_concurrency = 800
non_pooled_task_slot_count = 1200
max_active_runs_per_dag = 10
dagbag_import_timeout = 30
[celery]
worker_concurrency = 100

Scheduler .cfg:
[core]
sql_alchemy_pool_size = 30
sql_alchemy_pool_recycle = 300
sql_alchemy_reconnect_timeout = 300
parallelism = 1200
dag_concurrency = 800
non_pooled_task_slot_count = 1200
max_active_runs_per_dag = 10
[scheduler]
job_heartbeat_sec = 5
scheduler_heartbeat_sec = 5
run_duration = 1800
min_file_process_interval = 10
min_file_parsing_loop_time = 1
dag_dir_list_interval = 300
print_stats_interval = 30
scheduler_zombie_task_threshold = 300
max_tis_per_query = 1024
max_threads = 29
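For what it's worth, the slot math in these configs lines up; a quick sanity check in Python (numbers taken from the configs above):

```python
# Sanity check: total Celery worker slots vs. the core parallelism cap.
# If parallelism is lower than workers * worker_concurrency, the extra
# slots can never be used; if it is higher, the scheduler may queue more
# tasks than the workers can pick up.
workers = 12                 # bare-metal boxes running `airflow worker`
worker_concurrency = 100     # [celery] worker_concurrency
parallelism = 1200           # [core] parallelism

total_slots = workers * worker_concurrency
print(f"total worker slots: {total_slots}")   # 1200
print(f"core parallelism:   {parallelism}")   # 1200
print("slots and parallelism match" if total_slots == parallelism
      else "mismatch: one side is the bottleneck")
```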


From the workers I see 350+ connections at start time; this drops to ~200
and then to 1-10 once tasks complete.
From the scheduler it stays very low (1-10):
MySQL connections:
    331 worker1
    215 worker2
    349 worker53
    335 worker54
    347 worker55
    336 worker56
    336 worker57
    354 worker58
    339 worker59
    328 worker60
    333 worker61
    337 worker62
     2 scheduler
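(For context: a per-host tally like the one above can be pulled from MySQL's `information_schema.processlist`. A minimal sketch, with made-up hostnames; `cursor.execute(...)` is how you would fetch the real rows:)

```python
from collections import Counter

def connections_per_host(processlist_rows):
    """Tally MySQL connections by client host, given rows from
    `SELECT host FROM information_schema.processlist` (host is
    typically 'hostname:port', so the port is stripped)."""
    return Counter(h.rsplit(":", 1)[0] for (h,) in processlist_rows)

# Example with made-up rows; in practice fetch them with e.g.
#   cursor.execute("SELECT host FROM information_schema.processlist")
rows = [("worker1:52110",), ("worker1:52112",), ("scheduler:40010",)]
print(connections_per_host(rows))  # Counter({'worker1': 2, 'scheduler': 1})
```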



- Eugene
 

On 8/20/19, 8:51 AM, "Maxime Beauchemin" <maximebeauche...@gmail.com> wrote:

    Delay between tasks could be due to not having enough worker slots. What
    type of executor are you using, how is it configured?
    
    Max
    
    On Tue, Aug 20, 2019 at 7:50 AM Bacal, Eugene <eba...@paypal.com.invalid>
    wrote:
    
    > Absolutely possible, Daniel,
    >
    > We are looking in all directions. Has anyone noticed performance
    > improvements with PostgreSQL vs MySQL ?
    >
    > -Eugene
    >
    >
    > On 8/15/19, 2:03 PM, "Daniel Standish" <dpstand...@gmail.com> wrote:
    >
    >     It's not just the webserver and scheduler that will parse your dag
    >     file. During the execution of a dag run, the dag file will be
    >     re-parsed at the start of every task instance. If you have 1000 tasks
    >     running in a short period of time, that's 1000 queries. It's possible
    >     these queries are piling up in a queue on your database. Dag read
    >     time has to be very fast for this reason.
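[A common mitigation for the re-parsing cost described above (a sketch, not from the thread): cache the database lookup behind a short-lived local file, so 1000 near-simultaneous parses hit the DB at most once per TTL. `fetch_table_names` and the cache path are stand-ins for the real query and location.]

```python
import json
import os
import time

CACHE_PATH = "/tmp/dag_tables_cache.json"  # illustrative path
CACHE_TTL = 300  # seconds; re-query the DB at most once per 5 minutes

def fetch_table_names():
    # Stand-in for the real (slow) database query driving DAG generation.
    return ["table_a", "table_b"]

def cached_table_names():
    """Return the table list, serving from a local file cache when fresh."""
    try:
        if time.time() - os.path.getmtime(CACHE_PATH) < CACHE_TTL:
            with open(CACHE_PATH) as f:
                return json.load(f)
    except OSError:
        pass  # no cache file yet
    names = fetch_table_names()
    with open(CACHE_PATH, "w") as f:
        json.dump(names, f)
    return names

# DAG module scope calls cached_table_names() instead of hitting the DB
# on every parse:
tables = cached_table_names()
```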
    >
    >
    >
    >     On Thu, Aug 15, 2019 at 1:45 PM Bacal, Eugene
    > <eba...@paypal.com.invalid>
    >     wrote:
    >
    >     >
    >     > Thank you for your reply, Max
    >     >
    >     > Dynamic DAGs query the database for tables and generate DAGs and
    >     > tasks based on the output.
    >     > As plain Python they do not take much time to execute:
    >     >
    >     > Dynamic - 500 tasks:
    >     > time python PPAD_OIS_MASTER_IDI.py
    >     > [2019-08-15 12:57:48,522] {settings.py:174} INFO -
    >     > setting.configure_orm(): Using pool settings. pool_size=30,
    > pool_recycle=300
    >     > real    0m1.830s
    >     > user    0m1.622s
    >     > sys     0m0.188s
    >     >
    >     >
    >     > Static - 100 tasks:
    >     > time python PPAD_OPS_CANARY_CONNECTIONS_TEST_8.py
    >     > [2019-08-15 12:59:24,959] {settings.py:174} INFO -
    >     > setting.configure_orm(): Using pool settings. pool_size=30,
    > pool_recycle=300
    >     > real    0m1.009s
    >     > user    0m0.898s
    >     > sys     0m0.108s
    >     >
    >     >
    >     > We have 44 DAGs with 1003 dynamic tasks. Parsing at a quiet time:
    >     > DagBag parsing time: 3.9385959999999995
    >     >
    >     > Parsing at execution time, when the scheduler submits the DAGs:
    >     > DagBag parsing time: 99.820316
    >     >
    >     > Delays between task runs inside a single DAG grow from 30 sec to
    >     > 10 min, then drop back down even though tasks are still running.
    >     >
    >     > Eugene
    >     >
    >     >
    >     >
    >     >
    >     >
    >     > On 8/15/19, 11:52 AM, "Maxime Beauchemin" <
    > maximebeauche...@gmail.com>
    >     > wrote:
    >     >
    >     >     What is your dynamic DAG doing? How long does it take to execute
    > it
    >     > just as
    >     >     a python script (`time python mydag.py`)?
    >     >
    >     >     As an Airflow admin, you may want to lower the DAG parsing
    >     >     timeout configuration key to force people not to do crazy
    >     >     things in DAG module scope. At some point at Airbnb we had
    >     >     someone running a Hive query in DAG scope; clearly that needs
    >     >     to be prevented.
    >     >
    >     >     Loading DAGs by calling a database can bring all sorts of
    > surprises
    >     > that
    >     >     can drive everyone crazy. As mentioned in a recent post,
    >     > repo-contained,
    >     >     deterministic "less dynamic" DAGs are great, because they are
    >     >     self-contained and allow you to use source-control properly
    > (revert a
    >     > bad
    >     >     change for instance). That may mean having a process or script
    > that
    >     >     compiles external things that are dynamic into things like yaml
    > files
    >     >     checked into the code repo. Things as simple as parsing duration
    > become
    >     >     more predictable (network latency and database load are not part
    > of
    >     > that
    >     >     equation), but more importantly, all changes become tracked in
    > the code
    >     >     repo.
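[The "compile external things into checked-in files" step described above can be sketched as follows: a script run outside Airflow (cron, CI, etc.) snapshots the dynamic inputs into a static file committed to the repo, and the DAG file only reads that file. Using json here to keep the sketch stdlib-only; yaml or hocon as suggested work the same way. `fetch_table_names` stands in for the real database query.]

```python
import json

def fetch_table_names():
    # Placeholder for the real database query driving DAG generation.
    return ["table_a", "table_b", "table_c"]

def compile_task_spec(path="task_spec.json"):
    """Compile step: snapshot dynamic inputs into a static, committable file.
    Sorting makes the output deterministic, so diffs stay meaningful."""
    spec = {"tables": sorted(fetch_table_names())}
    with open(path, "w") as f:
        json.dump(spec, f, indent=2)
    return spec

def load_task_spec(path="task_spec.json"):
    """DAG-file side: just load the committed spec -- fast, no network."""
    with open(path) as f:
        return json.load(f)

spec = compile_task_spec()
assert load_task_spec() == spec
```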
    >     >
    >     >     yaml parsing in python can be pretty slow too, and there are
    > solutions
    >     > /
    >     >     alternatives there. Hocon is great. Also C-accelerated yaml is
    >     > possible:
    >     >
    >     >
    >     >     https://stackoverflow.com/questions/27743711/can-i-speedup-yaml
    >     >
    >     >     Max
    >     >
    >     >     On Wed, Aug 14, 2019 at 9:56 PM Bacal, Eugene
    >     > <eba...@paypal.com.invalid>
    >     >     wrote:
    >     >
    >     >     > Hello Airflow team,
    >     >     >
    >     >     > Please advise if you can. In our environment, we have noticed
    >     >     > that dynamic tasks place quite a bit of stress on the
    >     >     > scheduler and webserver and increase MySQL DB connections.
    >     >     > We run about 1000 dynamic tasks every 30 min; parsing time
    >     >     > increases from 5 to 65 sec, with runtime going from 2 sec to
    >     >     > 350+ sec. This happens at execution time, then it drops back
    >     >     > to normal while tasks are still executing. The webserver
    >     >     > hangs for a few minutes.
    >     >     >
    >     >     > Airflow 1.10.1.
    >     >     > MySQL DB
    >     >     >
    >     >     > Example:
    >     >     >
    >     >     > Dynamic Tasks:
    >     >     > Number of DAGs: 44
    >     >     > Total task number: 950
    >     >     > DagBag parsing time: 65.879642000000001
    >     >     >
    >     >     > Static Tasks:
    >     >     > Number of DAGs: 73
    >     >     > Total task number: 1351
    >     >     > DagBag parsing time: 1.731088
    >     >     >
    >     >     > Is this something you are aware of? Any advice on dynamic
    >     >     > task optimization/best practices?
    >     >     >
    >     >     > Thank you in advance,
    >     >     > Eugene
    >     >     >
    >     >     >
    >     >     >
    >     >
    >     >
    >     >
    >
    >
    >
    
