This is an automated email from the ASF dual-hosted git repository.

yhu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
     new 7431dea675b [#37930][Runners] Reap child processes in ProcessManager to prevent zombie accumulation (#37932)
7431dea675b is described below

commit 7431dea675b5c3da2ae27be207954cc9abbe3eb2
Author: Andres Tiko <[email protected]>
AuthorDate: Wed Mar 25 21:23:39 2026 +0200

    [#37930][Runners] Reap child processes in ProcessManager to prevent zombie accumulation (#37932)
    
    * [#37930][Runners] Reap child processes in ProcessManager to prevent zombie accumulation
    
    ProcessManager.stopProcess() calls destroy()/destroyForcibly() to terminate
    child processes, but never calls Process.waitFor() to collect the exit status.
    On POSIX systems, this means the terminated child process entry remains in the
    kernel process table as a zombie (state Z/defunct) until the parent process
    itself exits.
    
    In long-running environments like Flink TaskManagers using
    --environment_type=PROCESS, the expansion service processes are repeatedly
    spawned and stopped but never reaped. Over time this leads to significant
    zombie accumulation (176+ observed on production hosts).
    
    The fix adds process.waitFor() calls after process termination in:
    - stopProcess(): after the destroy/destroyForcibly sequence
    - killAllProcesses(): after destroyForcibly in the shutdown hook path
    
    Fixes #37930
    
    * Update CHANGES.md with ProcessManager zombie fix
---
 CHANGES.md                                           |  1 +
 .../fnexecution/environment/ProcessManager.java      | 20 ++++++++++++++++++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/CHANGES.md b/CHANGES.md
index 064b1485449..99b05ab9aaa 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -84,6 +84,7 @@
 ## Bugfixes
 
 * Fixed X (Java/Python) ([#X](https://github.com/apache/beam/issues/X)).
+* Fixed ProcessManager not reaping child processes, causing zombie process accumulation on long-running Flink deployments (Java) ([#37930](https://github.com/apache/beam/issues/37930)).
 
 ## Security Fixes
 
diff --git a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ProcessManager.java b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ProcessManager.java
index 3570fef00df..86299762783 100644
--- a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ProcessManager.java
+++ b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ProcessManager.java
@@ -203,6 +203,14 @@ public class ProcessManager {
         }
       }
     }
+    // Reap the child process to prevent zombie accumulation. destroy()/destroyForcibly() send
+    // signals but do not call waitpid(), so the terminated process remains in the kernel process
+    // table as a zombie until waitFor() collects its exit status.
+    try {
+      process.waitFor();
+    } catch (InterruptedException e) {
+      Thread.currentThread().interrupt();
+    }
   }
 
   /** Returns true if the process exists within maxWaitTimeMillis. */
@@ -249,8 +257,16 @@ public class ProcessManager {
     processes.forEach((id, process) -> process.destroy());
   }
 
-  /** Kill all remaining processes forcibly, i.e. upon JVM shutdown */
+  /** Kill all remaining processes forcibly and reap them, i.e. upon JVM shutdown. */
   private void killAllProcesses() {
-    processes.forEach((id, process) -> process.destroyForcibly());
+    processes.forEach(
+        (id, process) -> {
+          process.destroyForcibly();
+          try {
+            process.waitFor();
+          } catch (InterruptedException e) {
+            Thread.currentThread().interrupt();
+          }
+        });
   }
 }
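
For readers who want to try the terminate-then-wait pattern outside of Beam, the following is a minimal standalone sketch, not the ProcessManager code itself. The class name ReapExample, the method stopAndReap, the grace period parameter, and the use of the POSIX `sleep` command as a stand-in child process are all assumptions made for illustration. Only the Process calls (destroy, destroyForcibly, waitFor) correspond to what the diff above uses.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

/** Hypothetical standalone example; not part of the Beam code base. */
public class ReapExample {

  /**
   * Asks the child to terminate, escalates to a forcible kill if it has not
   * exited within graceMillis, then waits for it so its exit status is
   * collected. Returns the exit code, or -1 if the calling thread was
   * interrupted while waiting.
   */
  static int stopAndReap(Process process, long graceMillis) {
    process.destroy();
    try {
      // Give the child a bounded amount of time to exit on its own.
      if (!process.waitFor(graceMillis, TimeUnit.MILLISECONDS)) {
        process.destroyForcibly();
      }
      // Block until the child has exited and its exit status is available.
      return process.waitFor();
    } catch (InterruptedException e) {
      // Restore the interrupt flag, as the patch above does.
      Thread.currentThread().interrupt();
      return -1;
    }
  }

  public static void main(String[] args) throws IOException {
    // Assumption: a POSIX host where the `sleep` command is on the PATH.
    Process child = new ProcessBuilder("sleep", "60").start();
    int exitCode = stopAndReap(child, 2000);
    System.out.println("alive=" + child.isAlive() + " exitCode=" + exitCode);
  }
}
```

Unlike the patch, this sketch returns the exit code so that a caller can log it. Whether ProcessManager should do the same is a separate design question that the commit does not address.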
