teastburn opened a new issue #11365: URL: https://github.com/apache/airflow/issues/11365
Thanks in advance for your help and work on Airflow. ❤️

**Apache Airflow version**: 1.10.12

**Kubernetes version (if you are using kubernetes)** (use `kubectl version`): N/A -- we use the Celery executor

**Environment**:

- **Cloud provider or hardware configuration**: AWS ECS
- **OS** (e.g. from /etc/os-release): Ubuntu 18.04
- **Kernel** (e.g. `uname -a`): 4.15.0
- **DB**: Postgres (AWS RDS)
- **Scheduler settings**:
  - max_threads = 10
  - job_heartbeat_sec = 5
  - scheduler_heartbeat_sec = 30
  - run_duration = 600
  - num_runs = -1
  - processor_poll_interval = 1
  - min_file_process_interval = 30
  - dag_dir_list_interval = 300
  - scheduler_health_check_threshold = 300
  - scheduler_zombie_task_threshold = 300
  - max_tis_per_query = 64
- **Install tools**: conda, pip, ???
- **Others**:

**What happened**:

After upgrading from 1.8.2 to 1.10.12 we experience ~1-5 scheduler out-of-memory (OOM) events per day. The CPU bottoms out and the scheduler stops scheduling new work. A container restart brings up a new scheduler, which runs until the next OOM.

**What you expected to happen**:

The scheduler should use a normal amount of CPU and RAM, stay within `max_threads`, and keep scheduling new work.
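For anyone trying to reproduce our setup: the scheduler settings listed above live in the `[scheduler]` section of `airflow.cfg`. A sketch of that section with our values (option names as of the 1.10.x line; please verify against your version's defaults):

```ini
[scheduler]
max_threads = 10
job_heartbeat_sec = 5
scheduler_heartbeat_sec = 30
run_duration = 600
num_runs = -1
processor_poll_interval = 1
min_file_process_interval = 30
dag_dir_list_interval = 300
scheduler_health_check_threshold = 300
scheduler_zombie_task_threshold = 300
max_tis_per_query = 64
```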
**How to reproduce it**:

Let the scheduler run for a day; sorry, I don't have much more data than that. We run ~250 DAGs across ~25 files and ~5000 tasks per hour.

**Similar issues**:

https://github.com/apache/airflow/issues/7935 -- we also experience this issue, and the two seem related.

**Anything else we need to know**:

After the upgrade we raised our RAM from ~2GB to ~6GB and we still hit this issue. There is no reason our scheduler should need ~6GB of RAM.

![Screen Shot 2020-10-08 at 2 30 26 PM](https://user-images.githubusercontent.com/134710/95515737-f2d68980-0972-11eb-907d-9a5e82492bba.png)

Above is an example of the scheduler CPU and RAM during an OOM event. Recovery was done manually.

OOM logs for the Python process (not always identical to this):

```
2020-10-06T05:34:18.092Z,OSError: [Errno 12] Cannot allocate memory
2020-10-06T05:34:18.092Z," self.pid = os.fork()"
2020-10-06T05:34:18.092Z," File ""/conda/env/lib/python2.7/multiprocessing/forking.py"", line 121, in __init__"
2020-10-06T05:34:18.092Z," self._popen = Popen(self)"
2020-10-06T05:34:18.092Z," File ""/conda/env/lib/python2.7/multiprocessing/process.py"", line 130, in start"
2020-10-06T05:34:18.092Z," self._process.start()"
2020-10-06T05:34:18.092Z," File ""/airflow/airflow/jobs/scheduler_job.py"", line 203, in start"
2020-10-06T05:34:18.092Z," processor.start()"
2020-10-06T05:34:18.092Z," File ""/airflow/airflow/utils/dag_processing.py"", line 1250, in start_new_processes"
2020-10-06T05:34:18.092Z," self.start_new_processes()"
2020-10-06T05:34:18.092Z," File ""/airflow/airflow/utils/dag_processing.py"", line 886, in start"
2020-10-06T05:34:18.091Z," processor_manager.start()"
2020-10-06T05:34:18.091Z," File ""/airflow/airflow/utils/dag_processing.py"", line 634, in _run_processor_manager"
2020-10-06T05:34:18.091Z," self._target(*self._args, **self._kwargs)"
2020-10-06T05:34:18.091Z," File ""/conda/env/lib/python2.7/multiprocessing/process.py"", line 114, in run"
2020-10-06T05:34:18.091Z," self.run()"
2020-10-06T05:34:18.091Z," File ""/conda/env/lib/python2.7/multiprocessing/process.py"", line 267, in _bootstrap"
2020-10-06T05:34:18.091Z,Traceback (most recent call last):
2020-10-06T05:34:18.091Z,Process Process-1:
```

(The log viewer shows entries newest-first, so the traceback reads bottom-up.)

<details><summary>OOM logs from host OS (there were 18 separate oom-killer events; this is one)</summary>

```
[201596.826370] Killed process 73325 (/conda/env/) total-vm:2848568kB, anon-rss:166544kB, file-rss:7928kB, shmem-rss:4kB
[201596.826340] Memory cgroup out of memory: Kill process 86085 (/conda/env/) score 33 or sacrifice child
[201596.826316] [73474] 0 73474 660268 38726 1187840 0 0 airflow schedul
[201596.826314] [73472] 0 73472 660268 39102 1196032 0 0 airflow schedul
[201596.826311] [73445] 0 73445 660268 39135 1196032 0 0 airflow schedul
[201596.826309] [73436] 0 73436 660268 39184 1196032 0 0 airflow schedul
[201596.826308] [73435] 0 73435 660268 39135 1196032 0 0 airflow schedul
[201596.826307] [73434] 0 73434 660268 39134 1196032 0 0 airflow schedul
[201596.826305] [73410] 0 73410 660268 39752 1204224 0 0 airflow schedul
[201596.826304] [73405] 0 73405 712142 42404 1224704 0 0 /conda/env/
[201596.826302] [73404] 0 73404 712142 42404 1224704 0 0 /conda/env/
[201596.826300] [73403] 0 73403 712142 42404 1224704 0 0 /conda/env/
[201596.826299] [73402] 0 73402 712142 42404 1224704 0 0 /conda/env/
[201596.826297] [73401] 0 73401 712142 42404 1224704 0 0 /conda/env/
[201596.826295] [73400] 0 73400 712142 42404 1224704 0 0 /conda/env/
[201596.826293] [73399] 0 73399 712142 42404 1224704 0 0 /conda/env/
[201596.826291] [73398] 0 73398 712142 42404 1224704 0 0 /conda/env/
[201596.826290] [73397] 0 73397 712142 42404 1224704 0 0 /conda/env/
[201596.826287] [73396] 0 73396 712142 42404 1224704 0 0 /conda/env/
[201596.826286] [73395] 0 73395 712142 42404 1224704 0 0 /conda/env/
[201596.826284] [73394] 0 73394 712142 42404 1224704 0 0 /conda/env/
[201596.826282] [73393] 0 73393 712142 42404 1224704 0 0 /conda/env/
[201596.826280] [73392] 0 73392 712142 42404 1224704 0 0 /conda/env/
[201596.826279] [73391] 0 73391 712142 42410 1224704 0 0 /conda/env/
[201596.826277] [73390] 0 73390 712142 42404 1224704 0 0 /conda/env/
[201596.826275] [73389] 0 73389 712142 42404 1224704 0 0 /conda/env/
[201596.826274] [73388] 0 73388 712142 42404 1224704 0 0 /conda/env/
[201596.826272] [73387] 0 73387 712142 42404 1224704 0 0 /conda/env/
[201596.826270] [73386] 0 73386 712142 42404 1224704 0 0 /conda/env/
[201596.826269] [73385] 0 73385 712142 42404 1224704 0 0 /conda/env/
[201596.826266] [73384] 0 73384 712142 42404 1224704 0 0 /conda/env/
[201596.826264] [73383] 0 73383 712142 42404 1224704 0 0 /conda/env/
[201596.826263] [73382] 0 73382 712142 42404 1224704 0 0 /conda/env/
[201596.826261] [73381] 0 73381 712142 42404 1224704 0 0 /conda/env/
[201596.826259] [73380] 0 73380 712142 42404 1224704 0 0 /conda/env/
[201596.826258] [73379] 0 73379 712142 42404 1224704 0 0 /conda/env/
[201596.826256] [73378] 0 73378 712142 42404 1224704 0 0 /conda/env/
[201596.826254] [73377] 0 73377 712142 42404 1224704 0 0 /conda/env/
[201596.826252] [73376] 0 73376 712142 42404 1224704 0 0 /conda/env/
[201596.826250] [73375] 0 73375 712142 42404 1224704 0 0 /conda/env/
[201596.826248] [73374] 0 73374 712142 42404 1224704 0 0 /conda/env/
[201596.826247] [73373] 0 73373 712142 42404 1224704 0 0 /conda/env/
[201596.826245] [73372] 0 73372 712142 42592 1228800 0 0 /conda/env/
[201596.826244] [73371] 0 73371 712142 43187 1236992 0 0 /conda/env/
[201596.826242] [73370] 0 73370 712142 43171 1236992 0 0 /conda/env/
[201596.826240] [73369] 0 73369 712142 43187 1236992 0 0 /conda/env/
[201596.826239] [73368] 0 73368 712142 43013 1236992 0 0 /conda/env/
[201596.826237] [73367] 0 73367 712142 43013 1236992 0 0 /conda/env/
[201596.826236] [73366] 0 73366 712142 43029 1236992 0 0 /conda/env/
[201596.826235] [73365] 0 73365 712142 43029 1236992 0 0 /conda/env/
[201596.826233] [73364] 0 73364 712142 43579 1236992 0 0 /conda/env/
[201596.826232] [73363] 0 73363 712142 43588 1236992 0 0 /conda/env/
[201596.826230] [73362] 0 73362 712142 43579 1236992 0 0 /conda/env/
[201596.826229] [73361] 0 73361 712142 43588 1236992 0 0 /conda/env/
[201596.826227] [73360] 0 73360 712142 43588 1236992 0 0 /conda/env/
[201596.826225] [73359] 0 73359 712142 43619 1236992 0 0 /conda/env/
[201596.826224] [73358] 0 73358 712142 43619 1236992 0 0 /conda/env/
[201596.826222] [73357] 0 73357 712142 43619 1236992 0 0 /conda/env/
[201596.826221] [73356] 0 73356 712142 43619 1236992 0 0 /conda/env/
[201596.826219] [73355] 0 73355 712142 43619 1236992 0 0 /conda/env/
[201596.826217] [73354] 0 73354 712142 43619 1236992 0 0 /conda/env/
[201596.826215] [73353] 0 73353 712142 43619 1236992 0 0 /conda/env/
[201596.826214] [73352] 0 73352 712142 43614 1236992 0 0 /conda/env/
[201596.826212] [73351] 0 73351 712142 43618 1236992 0 0 /conda/env/
[201596.826210] [73350] 0 73350 712142 43612 1236992 0 0 /conda/env/
[201596.826208] [73349] 0 73349 712142 43619 1236992 0 0 /conda/env/
[201596.826207] [73348] 0 73348 712142 43561 1236992 0 0 /conda/env/
[201596.826205] [73347] 0 73347 712142 43551 1236992 0 0 /conda/env/
[201596.826203] [73346] 0 73346 712142 43619 1236992 0 0 /conda/env/
[201596.826201] [73345] 0 73345 712142 43619 1236992 0 0 /conda/env/
[201596.826199] [73344] 0 73344 712142 43619 1236992 0 0 /conda/env/
[201596.826198] [73343] 0 73343 712142 43619 1236992 0 0 /conda/env/
[201596.826196] [73342] 0 73342 712142 43619 1236992 0 0 /conda/env/
[201596.826195] [73341] 0 73341 712142 43619 1236992 0 0 /conda/env/
[201596.826193] [73340] 0 73340 712142 43619 1236992 0 0 /conda/env/
[201596.826192] [73339] 0 73339 712142 43619 1236992 0 0 /conda/env/
[201596.826190] [73338] 0 73338 712142 43619 1236992 0 0 /conda/env/
[201596.826189] [73337] 0 73337 712142 43619 1236992 0 0 /conda/env/
[201596.826187] [73336] 0 73336 712142 43619 1236992 0 0 /conda/env/
[201596.826185] [73335] 0 73335 712142 43619 1236992 0 0 /conda/env/
[201596.826184] [73334] 0 73334 712142 43619 1236992 0 0 /conda/env/
[201596.826182] [73333] 0 73333 712142 43619 1236992 0 0 /conda/env/
[201596.826180] [73332] 0 73332 712142 43619 1236992 0 0 /conda/env/
[201596.826178] [73331] 0 73331 712142 43619 1236992 0 0 /conda/env/
[201596.826176] [73330] 0 73330 712142 43619 1236992 0 0 /conda/env/
[201596.826175] [73329] 0 73329 712142 43619 1236992 0 0 /conda/env/
[201596.826173] [73328] 0 73328 712142 43583 1236992 0 0 /conda/env/
[201596.826172] [73327] 0 73327 712142 43574 1236992 0 0 /conda/env/
[201596.826170] [73326] 0 73326 712142 43614 1236992 0 0 /conda/env/
[201596.826168] [73325] 0 73325 712142 43619 1236992 0 0 /conda/env/
[201596.826165] [73321] 0 73321 712142 43609 1236992 0 0 /conda/env/
[201596.826164] [73320] 0 73320 712142 43614 1236992 0 0 /conda/env/
[201596.826161] [73317] 0 73317 712142 43533 1236992 0 0 /conda/env/
[201596.826159] [73316] 0 73316 712142 43535 1236992 0 0 /conda/env/
[201596.826158] [73315] 0 73315 712142 43617 1236992 0 0 /conda/env/
[201596.826156] [73314] 0 73314 712142 43617 1236992 0 0 /conda/env/
[201596.826147] [73254] 0 73254 665259 44877 1245184 0 0 airflow schedul
[201596.826146] [73253] 0 73253 663081 42595 1224704 0 0 airflow schedul
[201596.826144] [73247] 0 73247 660815 38868 1187840 0 0 airflow schedul
[201596.826142] [73241] 0 73241 665744 45272 1245184 0 0 airflow schedul
[201596.826090] [72514] 0 72514 1136 200 57344 0 0 sleep
[201596.826083] [71891] 0 71891 1136 190 57344 0 0 sleep
[201596.825820] [86649] 0 86649 660268 38751 1191936 0 0 airflow schedul
[201596.825818] [86085] 0 86085 712142 51436 1343488 0 0 /conda/env/
[201596.825727] [74651] 0 74651 4630 805 81920 0 0 run_airflow.sh
[201596.825725] [74634] 0 74634 1160 431 53248 0 0 update-dags-che
[201596.825724] [74543] 0 74543 4630 847 86016 0 0 start
[201596.825060] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[201596.825037] Memory cgroup stats for /ecs/99ee477ab6cd4c6988a5ad1476591f6c/3528c8b404fe2558ba6e47bc84fd515f5a81ea14ed4f6c0a8b7cf668aeeecf45: cache:64KB rss:5785836KB rss_huge:0KB shmem:32KB mapped_file:12KB dirty:0KB writeback:0KB inactive_anon:16KB active_anon:5785296KB inactive_file:28KB active_file:0KB unevictable:0KB
[201596.825037] kmem: usage 358100kB, limit 9007199254740988kB, failcnt 0
[201596.825036] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[201596.825036] memory: usage 6144000kB, limit 6144000kB, failcnt 19653
[201596.825031] Task in /ecs/99ee477ab6cd4c6988a5ad1476591f6c/3528c8b404fe2558ba6e47bc84fd515f5a81ea14ed4f6c0a8b7cf668aeeecf45 killed as a result of limit of /ecs/99ee477ab6cd4c6988a5ad1476591f6c/3528c8b404fe2558ba6e47bc84fd515f5a81ea14ed4f6c0a8b7cf668aeeecf45
[201596.825030] R13: 00000000047d1970 R14: 00007f0fe8ac2090 R15: 00000000047d1970
[201596.825030] R10: 00000000012de010 R11: 0000000000000004 R12: 000000000000ff00
[201596.825029] RBP: 00007f0fe8c0d5d0 R08: 0000000000000000 R09: 0000000000000000
[201596.825029] RDX: 00000000047d1994 RSI: 0000000000000000 RDI: 00000000047d5000
[201596.825028] RAX: 0000000000000000 RBX: 000000000000ff00 RCX: 000000000000c894
[201596.825027] RSP: 002b:00007ffd2ff66bb8 EFLAGS: 00010206
[201596.825026] RIP: 0033:0x7f0fe7ac419d
[201596.825024] async_page_fault+0x45/0x50
[201596.825022] do_async_page_fault+0x51/0x80
[201596.825019] ? async_page_fault+0x2f/0x50
[201596.825015] do_page_fault+0x2e/0xe0
[201596.825014] __do_page_fault+0x4a5/0x4d0
[201596.825013] mm_fault_error+0x90/0x180
[201596.825007] pagefault_out_of_memory+0x36/0x7b
[201596.825006] ? mem_cgroup_css_online+0x40/0x40
[201596.825004] mem_cgroup_oom_synchronize+0x2e8/0x320
[201596.825002] mem_cgroup_out_of_memory+0x4b/0x80
[201596.824998] out_of_memory+0x2d1/0x4f0
[201596.824996] oom_kill_process+0x220/0x440
[201596.824994] dump_header+0x71/0x285
[201596.824988] dump_stack+0x63/0x8b
[201596.824980] Call Trace:
[201596.824980] Hardware name: Amazon EC2 c5d.24xlarge/, BIOS 1.0 10/16/2017
[201596.824979] CPU: 21 PID: 73436 Comm: airflow schedul Not tainted 4.15.0-1039-aws #41-Ubuntu
[201596.824974] airflow schedul cpuset=3528c8b404fe2558ba6e47bc84fd515f5a81ea14ed4f6c0a8b7cf668aeeecf45 mems_allowed=0-1
[201596.824973] airflow schedul invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
[201596.735306] oom_reaper: reaped process 73324 (/conda/env/), now anon-rss:0kB, file-rss:0kB, shmem-rss:8kB
```

</details>

Similar to issue https://github.com/apache/airflow/issues/7935, we see many weird processes spawned that look like duplicates of the main scheduler process (they are not DAG-processing child processes):

<details><summary>Normal dag processing (one dag being processed)</summary>
<p>

```
$ ps faux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 18516 3264 ? Ss 02:23 0:00 /bin/bash ./start scheduler -r 600
root 35 0.0 0.0 18520 3284 ? S 02:23 0:00 /bin/bash /app/run_airflow.sh scheduler -r 600
root 18481 5.7 0.0 2210036 176712 ? S 04:34 0:10 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 18541 0.5 0.0 2011744 139364 ? S 04:34 0:00 \_ airflow scheduler -- DagFileProcessorManager
root 20173 0.0 0.0 2011744 136904 ? R 04:37 0:00 \_ airflow scheduler - DagFileProcessor /dags/db_stream_2datalake/stream_2datalake.py
```

</p>
</details>

<details><summary>Weird dag processing (excess of duplicate threads taking up RAM?).
This can be seen by just running `ps faux` a bunch of times during normal/non-OOM times.</summary>
<p>

```
$ ps faux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 18516 3264 ? Ss 02:23 0:01 /bin/bash ./start scheduler -r 600
root 35 0.0 0.0 18520 3284 ? S 02:23 0:00 /bin/bash /app/run_airflow.sh scheduler -r 600
root 29041 4.3 0.0 2211608 177860 ? Sl 04:54 0:25 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 29101 0.5 0.0 2011728 139212 ? S 04:54 0:03 \_ airflow scheduler -- DagFileProcessorManager
root 36058 69.3 0.0 2016512 148300 ? S 05:04 0:02 | \_ airflow scheduler - DagFileProcessor /dags/pubsub_hourly/pubsub_hourly.py
root 36123 0.0 0.0 2211624 142008 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36124 0.0 0.0 2211624 142004 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36125 0.0 0.0 2211624 142008 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36126 0.0 0.0 2211624 142012 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36127 0.0 0.0 2211624 142012 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36128 0.0 0.0 2211624 142012 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36129 0.0 0.0 2211624 142012 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36130 0.0 0.0 2211624 142012 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36131 0.0 0.0 2211624 142012 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36132 0.0 0.0 2211624 142016 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36133 0.0 0.0 2211624 142028 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36134 0.0 0.0 2211624 142016 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36135 0.0 0.0 2211624 142016 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36136 0.0 0.0 2211624 142028 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36137 0.0 0.0 2211624 142040 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36138 0.0 0.0 2211624 142040 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36139 0.0 0.0 2211624 142040 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36140 0.0 0.0 2211624 142028 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36141 0.0 0.0 2211624 142016 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36142 0.0 0.0 2211624 142016 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36143 0.0 0.0 2211624 142028 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36144 0.0 0.0 2211624 142016 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36145 0.0 0.0 2211624 142016 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36146 0.0 0.0 2211624 142016 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36147 0.0 0.0 2211624 142016 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36148 0.0 0.0 2211624 142016 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36149 0.0 0.0 2211624 142016 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36150 0.0 0.0 2211624 142016 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36151 0.0 0.0 2211624 142016 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36152 0.0 0.0 2211624 142016 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36153 0.0 0.0 2211624 142016 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36154 0.0 0.0 2211624 142016 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36155 0.0 0.0 2211624 142016 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36156 0.0 0.0 2211624 142016 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36157 0.0 0.0 2211624 142016 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36158 0.0 0.0 2211624 142016 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36159 0.0 0.0 2211624 142016 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36160 0.0 0.0 2211624 142016 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36161 0.0 0.0 2211624 142016 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36162 0.0 0.0 2211624 137964 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36163 0.0 0.0 2211624 137964 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36164 0.0 0.0 2211624 137964 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36165 0.0 0.0 2211624 137964 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36166 0.0 0.0 2211624 137964 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36167 0.0 0.0 2211624 137964 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36168 0.0 0.0 2211624 137964 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 36169 0.0 0.0 2211624 137964 ? S 05:04 0:00 \_ /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
```

</p>
</details>

<details><summary>Number of Airflow processes running at a time, every .5 seconds (max_threads is 10)</summary>
<p>

```
$ while true; do pgrep -f 'airflow scheduler' | wc -l; sleep .5; done
39
4
4
4
39
39
39
39
39
5
5
5
5
5
5
5
3
3
3
38
3
3
2
2
2
2
2
37
2
2
2
2
2
2
2
7
2
8
3
8
2
4
3
3
3
3
2
2
2
2
2
2
2
2
4
3
3
3
9
3
3
3
13
3
3
3
17
2
2
2
2
2
2
2
24
2
2
4
```

</p>
</details>

I will try to get a py-spy dump of a few processes after the next OOM event. Any help would be much appreciated! Our on-call engineers are having sleepless nights.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
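To quantify the duplicate scheduler processes shown above, here is a minimal, illustrative Python sketch (the helper name and the abridged `ps` sample are mine, not part of Airflow) that counts rows of `ps faux` output whose command column matches a pattern:

```python
import re

def count_scheduler_procs(ps_output, pattern=r"airflow scheduler"):
    """Count processes in `ps faux` output whose COMMAND matches `pattern`.

    Skips the header row and matches only against the command column
    (everything after the first 10 whitespace-separated fields), so a
    shell wrapper like `./start scheduler` is not counted.
    """
    count = 0
    for line in ps_output.splitlines():
        fields = line.split(None, 10)
        if len(fields) < 11 or fields[0] == "USER":
            continue  # header row or malformed/truncated line
        if re.search(pattern, fields[10]):
            count += 1
    return count

# Abridged sample rows based on the `ps faux` output above.
sample = """\
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 18516 3264 ? Ss 02:23 0:00 /bin/bash ./start scheduler -r 600
root 29041 4.3 0.0 2211608 177860 ? Sl 04:54 0:25 /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
root 29101 0.5 0.0 2011728 139212 ? S 04:54 0:03 airflow scheduler -- DagFileProcessorManager
root 36123 0.0 0.0 2211624 142008 ? S 05:04 0:00 /conda/env/bin/python /conda/env/bin/airflow scheduler -r 600
"""
print(count_scheduler_procs(sample))  # 3
```

Snapshotting this count over time (e.g. once per second) would show the same spikes well past `max_threads = 10` that the `pgrep` loop captures.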