m1a2st opened a new pull request, #20899:
URL: https://github.com/apache/kafka/pull/20899
## Problem
While running the Kafka e2e tests, several tests failed with
`TimeoutError('Kafka node failed to stop in 60 seconds')`. The e2e tests check
the PID to confirm that the Kafka server has shut down. On investigation, the
Kafka process turned out to be a zombie inside the container, so its PID was
still listed after shutdown:
```bash
ducker@ducker05:/$ jcmd
285 kafka.Kafka /mnt/kafka/kafka.properties
18207 jdk.jcmd/sun.tools.jcmd.JCmd
ducker@ducker05:/$ cat /proc/285/status | grep -i state
State: Z (zombie)
```
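The manual check above can be generalized into a small helper that scans every process entry for state `Z`. This is only a sketch, not part of the PR: the function name `list_zombies` and its optional proc-root argument are hypothetical, and it assumes a Linux-style `/proc` layout with a `State:` line in each `status` file.

```bash
# List PIDs whose /proc/<pid>/status reports state Z (zombie).
# The optional first argument overrides the proc root (default: /proc);
# it exists only so the helper can be pointed at a fake tree for testing.
list_zombies() {
    proc_root="${1:-/proc}"
    for status in "$proc_root"/[0-9]*/status; do
        # Processes can exit while we iterate, so skip unreadable entries
        [ -r "$status" ] || continue
        state=$(awk '/^State:/ { print $2; exit }' "$status" 2>/dev/null) || continue
        if [ "$state" = "Z" ]; then
            pid=${status%/status}   # strip the trailing /status
            echo "${pid##*/}"       # keep only the PID directory name
        fi
    done
}

# Print the PID of every zombie currently visible in this container
list_zombies
```

Run inside an affected container, this would be expected to print the PID of the stuck Kafka process (285 in the session above).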
## Root Cause
This issue was introduced by [this
change](https://github.com/apache/kafka/pull/17554/files#r1845737954). With
the exec-form `CMD ["sudo", "service", "ssh", "start", "-D"]`, PID 1 becomes
the `sudo service ssh start -D` command, which does not handle `SIGCHLD` and
therefore never reaps zombie processes:
```bash
ducker@ducker05:/$ cat /proc/1/cmdline | tr '\0' ' '
sudo service ssh start -D
```
However, with the old shell-form syntax
`CMD sudo service ssh start && tail -f /dev/null`, PID 1 is `/bin/sh`, which
reaps zombie processes:
```bash
ducker@ducker05:/$ cat /proc/1/cmdline | tr '\0' ' '
/bin/sh -c sudo service ssh start && tail -f /dev/null
```
## Solution
Use `tini` as PID 1 so that orphaned child processes are reaped and no zombie
processes remain in the container.
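One way to confirm the fix in a rebuilt container is to inspect PID 1 with the same `/proc/1/cmdline` approach used above. The sketch below is an assumption-laden illustration: the helper names `pid1_cmdline` and `pid1_is_tini` are hypothetical, and it assumes the init binary's file name is exactly `tini` (the install path may differ per image).

```bash
# Print the command line of PID 1 with NUL separators turned into spaces.
# The optional first argument overrides the proc root (default: /proc).
pid1_cmdline() {
    { tr '\0' ' ' < "${1:-/proc}/1/cmdline"; } 2>/dev/null
}

# Succeed only if the first word of PID 1's command line is a binary named tini
pid1_is_tini() {
    first=$(pid1_cmdline "${1:-/proc}" | cut -d' ' -f1)
    [ "${first##*/}" = "tini" ]
}

if pid1_is_tini; then
    echo "PID 1 is tini: $(pid1_cmdline)"
else
    echo "PID 1 is not tini: $(pid1_cmdline)"
fi
```

Checking the binary name rather than the full path keeps the test independent of where the image installs `tini`.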