[ https://issues.apache.org/jira/browse/MESOS-5893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400697#comment-15400697 ]
Jie Yu commented on MESOS-5893:
-------------------------------

Can you try not using the `namespaces/pid` isolator and see if the problem still exists? I suspect this is because our executor does not reap child processes (as init does).

> mesos-executor terminated forked children turn to zombies
> ---------------------------------------------------------
>
> Key: MESOS-5893
> URL: https://issues.apache.org/jira/browse/MESOS-5893
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Affects Versions: 1.1.0
> Environment: mesos compiled from git master ( 1.1.0 )
> {{../configure --enable-ssl --enable-libevent --prefix=/usr --enable-optimize
> --enable-silent-rules --enable-xfs-disk-isolator}}
> isolators :
> {{namespaces/pid,cgroups/cpu,cgroups/mem,filesystem/linux,docker/runtime,network/cni,docker/volume}}
> Reporter: Stéphane Cottin
> Labels: containerizer
>
> mesos containerizer does not properly handle children death.
> discovered using marathon-lb, each topology update fork another haproxy, the
> old haproxy process should properly die after its last client connection is
> terminated, but turn into a zombie.
> {noformat}
> 7716 ? Ssl 0:00 | \_ mesos-executor
> --launcher_dir=/usr/libexec/mesos --sandbox_directory=/mnt/mesos/sandbox
> --user=root --working_directory=/marathon-lb
> --rootfs=/mnt/mesos/provisioner/containers/3b381d5c-7490-4dcd-ab4b-81051226075a/backends/overlay/rootfses/a4beacac-2d7e-445b-80c8-a9b4e480c491
> 7813 ? Ss 0:00 | | \_ sh -c /marathon-lb/run sse
> --marathon https://marathon:8443 --auth-credentials user:pass --group
> 'external' --ssl-certs /certs --max-serv-port-ip-per-task 20050
> 7823 ? S 0:00 | | | \_ /bin/bash /marathon-lb/run sse
> --marathon https://marathon:8443 --auth-credentials user:pass --group
> external --ssl-certs /certs --max-serv-port-ip-per-task 20050
> 7827 ? S 0:00 | | | \_ /usr/bin/runsv
> /marathon-lb/service/haproxy
> 7829 ? S 0:00 | | | | \_ /bin/bash ./run
> 8879 ? S 0:00 | | | | \_ sleep 0.5
> 7828 ? Sl 0:00 | | | \_ python3
> /marathon-lb/marathon_lb.py --syslog-socket /dev/null --haproxy-config
> /marathon-lb/haproxy.cfg --ssl-certs /certs --command sv reload
> /marathon-lb/service/haproxy --sse --marathon https://marathon:8443
> --auth-credentials user:pass --group external --max-serv-port-ip-per-task
> 20050
> 7906 ? Zs 0:00 | | \_ [haproxy] <defunct>
> 8628 ? Zs 0:00 | | \_ [haproxy] <defunct>
> 8722 ? Ss 0:00 | | \_ haproxy -p /tmp/haproxy.pid -f
> /marathon-lb/haproxy.cfg -D -sf 144 52
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)