This is in the works but will indeed require some time to stabilize. I have a test branch for it.
Sent from my iPhone

> On 13 Jun 2016, at 19:39, Ben Tallman <[email protected]> wrote:
>
> In the past, I have written/seen systems where the pattern is that a task
> runner/worker is in charge of scheduling the next tasks that need to run
> on completion of a task, and the Scheduler only handles "issues" and
> initial kickoffs...
>
> This opens another can of worms, but I think I've seen discussion of that
> idea on here, and I thought I'd put my vote in for that pattern.
>
> On Mon, Jun 13, 2016 at 10:35 AM harish singh <[email protected]>
> wrote:
>
>> So I changed the scheduler heartbeat to 60 sec:
>>
>> [scheduler]
>> job_heartbeat_sec = 5
>> scheduler_heartbeat_sec = 60
>>
>> As expected, we did see spiky CPU utilization.
>>
>> But I see a different problem (not so much a problem, but I am putting
>> it here so that it may help someone who needs to do something similar).
>>
>> Suppose I have a DAG with tasks as below: A --> B --> C
>>
>> If A starts and completes its execution in 10 secs, the next task B
>> cannot start until the next scheduler heartbeat. That means I may have
>> to wait roughly 60 seconds in the worst case.
>>
>> So one option is to have a heartbeat of around 30 seconds (or the
>> duration of your least time-consuming task). This would just be a rough
>> optimization to make sure we make progress soon enough after the end of
>> a task.
>>
>> Thanks,
>> Harish
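[For concreteness, a minimal sketch of the hourly A --> B --> C chain being
described, in 1.7-era Airflow; the DAG id, task names, and curl targets are
made up for illustration:]

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    dag = DAG(
        dag_id='hourly_curl_chain',          # illustrative name
        start_date=datetime(2016, 6, 1),
        schedule_interval='@hourly',
    )

    a = BashOperator(task_id='A', bash_command='curl -s http://example/a', dag=dag)
    b = BashOperator(task_id='B', bash_command='curl -s http://example/b', dag=dag)
    c = BashOperator(task_id='C', bash_command='curl -s http://example/c', dag=dag)

    # B becomes eligible only after A succeeds, and C only after B; each
    # hop waits for the scheduler's next heartbeat before it can be queued,
    # which is where the per-task delay described above comes from.
    a.set_downstream(b)
    b.set_downstream(c)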
>>
>> On Mon, Jun 13, 2016 at 9:50 AM, harish singh <[email protected]>
>> wrote:
>>
>>> Yup, it is the scheduler process using the CPU.
>>>
>>> Below is the usage with the default out-of-the-box heartbeat settings.
>>> I will paste some more numbers here as I play with airflow.cfg.
>>> This is the output of `top`:
>>>
>>>   PID USER    PR  NI    VIRT    RES   SHR S  %CPU %MEM     TIME+ COMMAND
>>>    65 airflow 20   0  445488 112648 13308 R  61.3  0.7 834:21.23 airflow
>>>     1 airflow 20   0   17968   1912  1668 S   0.0  0.0   0:00.04 entrypoint.sh
>>>    64 airflow 20   0  407972  74876 13328 S   0.0  0.5   0:01.12 airflow
>>>   154 airflow 20   0  404404  62528  4344 S   0.0  0.4   0:00.11 airflow
>>>   155 airflow 20   0  404404  62532  4344 S   0.0  0.4   0:00.11 airflow
>>>   156 airflow 20   0  404404  62532  4344 S   0.0  0.4   0:00.12 airflow
>>>   157 airflow 20   0  404404  62532  4344 S   0.0  0.4   0:00.11 airflow
>>>   158 airflow 20   0  404404  62532  4344 S   0.0  0.4   0:00.11 airflow
>>>   159 airflow 20   0  404404  62532  4344 S   0.0  0.4   0:00.11 airflow
>>>   160 airflow 20   0  404404  62488  4300 S   0.0  0.4   0:00.12 airflow
>>>   161 airflow 20   0  404404  62468  4280 S   0.0  0.4   0:00.11 airflow
>>>   163 airflow 20   0   50356  16680  5956 S   0.0  0.1   0:08.86 gunicorn: maste
>>>   168 airflow 20   0  457204 119940 13080 S   0.0  0.8   0:18.86 gunicorn: worke
>>>   170 airflow 20   0  463168 126028 13080 S   0.0  0.8   0:13.81 gunicorn: worke
>>>   171 airflow 20   0  464936 127672 13080 S   0.0  0.8   0:08.53 gunicorn: worke
>>>   174 airflow 20   0  467460 130240 13080 S   0.0  0.8   0:08.52 gunicorn: worke
>>>  7627 airflow 20   0   18208   3192  2672 S   0.0  0.0   0:00.04 bash
>>>  8091 airflow 20   0   30312   7556  4784 S   0.0  0.0   0:00.00 python
>>> 28808 airflow 20   0   18208   3352  2836 S   0.0  0.0   0:00.08 bash
>>> 28822 airflow 20   0   19844   2340  2020 R   0.0  0.0   0:00.01 top
>>>
>>> *PID 65 is the scheduler.*
>>>
>>> airflow@80be4e775e55:~$ ps -ef
>>> UID        PID  PPID  C STIME TTY      TIME     CMD
>>> airflow      1     0  0 Jun12 ?        00:00:00 /bin/bash ./entrypoint.sh
>>> airflow     64     1  0 Jun12 ?        00:00:01 /usr/bin/python /usr/local/bin/airflow webserver -p 8080
>>> airflow     65     1 62 Jun12 ?        13:54:22 /usr/bin/python /usr/local/bin/airflow scheduler
>>> airflow    154    65  0 Jun12 ?        00:00:00 /usr/bin/python /usr/local/bin/airflow scheduler
>>> airflow    155    65  0 Jun12 ?        00:00:00 /usr/bin/python /usr/local/bin/airflow scheduler
>>> airflow    156    65  0 Jun12 ?        00:00:00 /usr/bin/python /usr/local/bin/airflow scheduler
>>> airflow    157    65  0 Jun12 ?        00:00:00 /usr/bin/python /usr/local/bin/airflow scheduler
>>> airflow    158    65  0 Jun12 ?        00:00:00 /usr/bin/python /usr/local/bin/airflow scheduler
>>> airflow    159    65  0 Jun12 ?        00:00:00 /usr/bin/python /usr/local/bin/airflow scheduler
>>> airflow    160    65  0 Jun12 ?        00:00:00 /usr/bin/python /usr/local/bin/airflow scheduler
>>> airflow    161    65  0 Jun12 ?        00:00:00 /usr/bin/python /usr/local/bin/airflow scheduler
>>> airflow    163    64  0 Jun12 ?        00:00:08 gunicorn: master [airflow.www.app:cached_app()]
>>> airflow    168   163  0 Jun12 ?        00:00:18 gunicorn: worker [airflow.www.app:cached_app()]
>>> airflow    170   163  0 Jun12 ?        00:00:13 gunicorn: worker [airflow.www.app:cached_app()]
>>> airflow    171   163  0 Jun12 ?        00:00:08 gunicorn: worker [airflow.www.app:cached_app()]
>>> airflow    174   163  0 Jun12 ?        00:00:08 gunicorn: worker [airflow.www.app:cached_app()]
>>> airflow   7627     0  0 Jun12 ?        00:00:00 bash
>>> airflow   8091  7627  0 Jun12 ?        00:00:00 python
>>> airflow  28808     0  0 16:44 ?        00:00:00 bash
>>> airflow  28823 28808  0 16:44 ?        00:00:00 ps -ef
>>>
>>> On Mon, Jun 13, 2016 at 8:42 AM, Maxime Beauchemin <
>>> [email protected]> wrote:
>>>
>>>> Can you confirm that it's the scheduler process using that CPU?
>>>>
>>>> The SCHEDULER_HEARTBEAT_SEC configuration defines a minimum duration
>>>> for scheduling cycles, in which the scheduler evaluates all active
>>>> DagRuns and attempts to kick off task instances whose dependencies
>>>> are met. Once the cycle is done, the scheduler should sleep until the
>>>> next heartbeat, so CPU usage should look spiky.
>>>>
>>>> Max
>>>>
>>>> On Mon, Jun 13, 2016 at 8:26 AM, harish singh <[email protected]>
>>>> wrote:
>>>>
>>>>> Yup, I tried changing the scheduler heartbeat to 60 seconds.
>>>>> Apart from not getting any updates for 60 seconds, what are the side
>>>>> effects of changing the two heartbeats? It shouldn't impact
>>>>> performance, should it?
>>>>>
>>>>> Also, I understand this CPU usage if there are 100s of DAGs. But
>>>>> with just one active DAG, doesn't 70% seem high? Especially in my
>>>>> case, where there are only 10 tasks in the DAG, making only curls
>>>>> (BashOperators).
>>>>>
>>>>> Also, as a side note, in a different environment where we have 10
>>>>> DAGs active, the CPU usage stays in the same 70-80% range.
>>>>>
>>>>> On Mon, Jun 13, 2016, 8:14 AM Maxime Beauchemin <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> The scheduler constantly attempts to schedule tasks, interacting
>>>>>> with the database and reloading DAG definitions. In most large-ish
>>>>>> environments, burning up to a CPU to run the scheduler doesn't seem
>>>>>> outrageous to me.
>>>>>>
>>>>>> If you want to reduce the CPU load related to the scheduler, check
>>>>>> out SCHEDULER_HEARTBEAT_SEC and MAX_THREADS in the scheduler
>>>>>> section of `airflow.cfg`.
>>>>>>
>>>>>> Max
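[In pseudocode, the heartbeat cycle described above looks roughly like the
following sketch; the helpers are caller-supplied placeholders, and this is
not Airflow's actual implementation:]

    import time

    def scheduler_loop(get_active_dag_runs, dependencies_met, queue_task,
                       heartbeat_sec=60):
        # A sketch of the cycle described above, not Airflow's real code.
        while True:
            started = time.time()
            for dag_run in get_active_dag_runs():
                for ti in dag_run.task_instances:
                    if dependencies_met(ti):
                        queue_task(ti)
            # Sleep out whatever remains of the heartbeat interval: CPU
            # should spike during the cycle and idle afterwards. A
            # sustained ~70% load suggests the cycle itself (database
            # round-trips, re-parsing DAG files) fills the whole interval.
            time.sleep(max(0.0, heartbeat_sec - (time.time() - started)))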
>>>>>>
>>>>>> On Sun, Jun 12, 2016 at 1:24 PM, harish singh <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi guys,
>>>>>>>
>>>>>>> We have been running airflow (for about 3 months now) inside a
>>>>>>> docker container on AWS.
>>>>>>>
>>>>>>> I just ran `docker stats` to check what's going on. The CPU
>>>>>>> consumption is huge. We have around 15 DAGs; only one DAG is
>>>>>>> turned ON, and the remaining are OFF. The DAG runs on an HOURLY
>>>>>>> schedule.
>>>>>>>
>>>>>>> Right now, airflow is consuming almost one complete core.
>>>>>>> It seems there is some unnecessary spinning? This doesn't look
>>>>>>> like the right behavior. Is there a bug already filed for this?
>>>>>>> Or is there something incorrect in the way I am using the airflow
>>>>>>> configuration?
>>>>>>>
>>>>>>> CONTAINER  CPU %   MEM USAGE / LIMIT    MEM %   NET I/O              BLOCK I/O
>>>>>>> CCC        68.17%  619.7 MB / 2.147 GB  28.85%  1.408 GB / 939.4 MB  7.856 MB / 0 B
>>>>>>> XXX        64.36%  619.4 MB / 2.147 GB  28.84%  1.211 GB / 807.6 MB  7.856 MB / 0 B
>>>>>>>
>>>>>>> Airflow version 1.7.0
>>>>>>>
>>>>>>> Airflow config:
>>>>>>>
>>>>>>> sql_alchemy_pool_size = 5
>>>>>>> sql_alchemy_pool_recycle = 3600
>>>>>>> parallelism = 8
>>>>>>> dag_concurrency = 8
>>>>>>> max_active_runs_per_dag = 8
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Harish
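[Putting the knobs from this thread together, a tuned `[scheduler]` section
might look like the sketch below; the values are illustrative, not
recommendations:]

    [scheduler]
    job_heartbeat_sec = 5
    # Longer heartbeat -> lower steady CPU, but a downstream task can wait
    # up to this long after its upstream finishes (the A --> B --> C case).
    scheduler_heartbeat_sec = 30
    # max_threads is the other knob Max mentions for reducing scheduler CPU.
    max_threads = 2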
