On Tue, Oct 20, 2015 at 10:34 AM, Dmitry Vyukov <dvyu...@google.com> wrote:
> On Mon, Oct 19, 2015 at 10:17 PM, Dmitry Vyukov <dvyu...@google.com> wrote:
>> On Mon, Oct 19, 2015 at 9:49 PM, Oleg Nesterov <o...@redhat.com> wrote:
>>> On 10/19, Dmitry Vyukov wrote:
>>>>
>>>> The following program hangs in some interesting state and is not
>>>> killable (started by a normal user, not root):
>>>
>>> Thanks.
>>>
>>>> #include <pthread.h>
>>>> #include <unistd.h>
>>>> #include <sys/ptrace.h>
>>>> #include <stdio.h>
>>>> #include <signal.h>
>>>>
>>>> void *thr(void *arg) {
>>>>   ptrace(PTRACE_TRACEME, 0, 0, 0);
>>>>   sleep(3);
>>>>   kill(getpid(), SIGCHLD);
>>>>   return 0;
>>>> }
>>>>
>>>> int main() {
>>>>   if (fork() == 0) {
>>>>     sleep(1);
>>>>     pthread_t th;
>>>>     pthread_create(&th, 0, thr, 0);
>>>>     sleep(1);
>>>>   }
>>>>   return 0;
>>>> }
>>>>
>>>>
>>>> The child process attaches as tracee to init process
>>>
>>> Yes, although in a racy manner, the parent can exit after
>>> PTRACE_TRACEME in this case the kernel will untrace the task
>>> before reparenting. Not that this matters.
>>>
>>>> and then hangs in
>>>> a state that I don't understand. When I did a similar thing but
>>>> attached it to a normal parent process (shell), I still was able to
>>>> get rid of it by killing parent (shell).
>>>
>>> See above.
>>>
>>> So I bet the problem is that your /sbin/init doesn't use __WALL,
>>> so wait() doesn't reap the traced zombie sub-thread, and thus it
>>> can't release the non-empty thread group.
>>>
>>> Could you please verify? Just do "strace -p1" and send SIGCHLD to
>>> init.
>>>
>>> perhaps eligible_child() should assume WALL if ptrace && ZOMBIE...
>>
>>
>> I am using Ubuntu.
>> Here strace output from init:
>>
>> waitid(P_ALL, 0, {}, WNOHANG|WEXITED|WSTOPPED|WCONTINUED, NULL) = 0
>>
>> So what should be fixed here? Kernel of distro init?
>
> waitpid(__WALL) indeed joins these processes.
> But __WALL can't be used with waitid and Ubuntu init uses waitid...

I am thinking about how to work around this issue.
The following program joins both child processes:

#include <pthread.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <stdio.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

void *thr(void *arg) {
  /* The sub-thread becomes a tracee of the parent process. */
  ptrace(PTRACE_TRACEME, 0, 0, 0);
  return 0;
}

int main() {
  int pid = fork();
  if (pid == 0) {
    pthread_t th;
    pthread_create(&th, 0, thr, 0);
    sleep(1);
    return 0;
  }
  siginfo_t info = {};
  int status = 0;
  /* Wait for any child, twice, with __WALL. */
  int res = waitpid(-1, &status, __WALL);
  printf("pid=%d res=%d errno=%d\n", pid, res, errno);
  res = waitpid(-1, &status, __WALL);
  printf("pid=%d res=%d errno=%d\n", pid, res, errno);
  return 0;
}

However, I need to wait for a particular child, and if I change the
first waitpid to:

  int res = waitpid(pid, &status, __WALL);

then it does not terminate. So how can I wait for such a child process?
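
One possible workaround, given only as a sketch: since waitpid(-1, ..., __WALL) does join both tasks in the program above, a caller that needs one particular child could keep reaping with pid -1 until that child's pid comes back. The helper name wait_for_child_wall and the demo function are hypothetical and not part of this thread. The demo deliberately uses a plain child without ptrace so that it is safe to run anywhere. The approach assumes the caller can afford to reap every other waitable child along the way and discard its status.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: call waitpid(-1, ..., __WALL) repeatedly until
 * `pid` itself is returned. Returns pid on success, or -1 with errno set
 * (for example ECHILD when nothing is left to wait for).
 * Caveat: the status of any other task reaped on the way is discarded.
 */
pid_t wait_for_child_wall(pid_t pid, int *status)
{
  for (;;) {
    int st = 0;
    pid_t res = waitpid(-1, &st, __WALL);
    if (res == -1) {
      if (errno == EINTR)
        continue;          /* interrupted by a signal, retry */
      return -1;           /* no children left, or another error */
    }
    if (res == pid) {
      if (status)
        *status = st;
      return res;
    }
    /* Some other waitable task, e.g. a traced sub-thread; keep going. */
  }
}

/*
 * Hypothetical demo: fork a plain child (no ptrace involved) that exits
 * with code 7, wait for it with the helper, and return its exit code,
 * or -1 on failure.
 */
int demo_wait_for_child(void)
{
  pid_t pid = fork();
  if (pid == -1)
    return -1;
  if (pid == 0)
    _exit(7);

  int status = 0;
  if (wait_for_child_wall(pid, &status) != pid)
    return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In the program above, this would replace the two explicit waitpid(-1, ...) calls with a single wait_for_child_wall(pid, &status). It does not answer whether waitpid(pid, ..., __WALL) itself should return in this case.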