Vovo Yang <[email protected]> writes:

> On Fri, May 12, 2017 at 7:19 AM, Eric W. Biederman
> <[email protected]> wrote:
>> Guenter Roeck <[email protected]> writes:
>>
>>> What I know so far is
>>> - We see this condition on a regular basis in the field. Regular is
>>>   relative, of course - let's say maybe 1 in a million Chromebooks
>>>   per day reports a crash because of it. That is not that many,
>>>   but it adds up.
>>> - We are able to reproduce the problem with a performance benchmark
>>>   which opens 100 chrome tabs. While that is a lot, it should not
>>>   result in a kernel hang/crash.
>>> - Vovo provided the test code last night. I don't know if this is
>>>   exactly what is observed in the benchmark, or how it relates to the
>>>   benchmark in the first place, but it is the first time we are actually
>>>   able to reliably create a condition where the problem is seen.
>>
>> Thank you. It will be interesting to hear what is happening in the
>> chrome performance benchmark that triggers this.
>>
> What's happening in the benchmark:
> 1. A chrome renderer process was created with CLONE_NEWPID
> 2. The process crashed
> 3. Chrome breakpad service calls ptrace(PTRACE_ATTACH, ..) to attach to every
>    thread of the crashed process to dump info
> 4. When breakpad detaches the crashed process, the crashed process is stuck in
>    zap_pid_ns_processes()

Very interesting, thank you.

So the question is specifically which interaction is causing this.  In
the test case provided it was a sibling task in the pid namespace dying
and not being reaped, which may be what is happening with breakpad.  So
far I have yet to see a kernel bug, but I won't rule one out.

Eric
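
To make "dying and not being reaped" concrete, here is a minimal userspace
sketch.  It is not the test case from this thread and it deliberately uses
neither CLONE_NEWPID nor ptrace, so it cannot trigger the hang; it only shows
a child that has exited but still exists as a zombie until its parent waits
for it.  The helper name observe_unreaped_child() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if the exited child was still present as an unreaped zombie
 * after we peeked at it, 0 if it was not, and -1 on an unexpected error.
 */
int observe_unreaped_child(void)
{
	siginfo_t info;
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(0);	/* child: die immediately */

	/*
	 * WNOWAIT waits until the child has exited but leaves it in a
	 * waitable (zombie) state: it is dead, but not yet reaped.
	 */
	memset(&info, 0, sizeof(info));
	if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
		return -1;
	assert(info.si_pid == pid);

	/* The zombie is still there, so a real wait can now reap it. */
	if (waitpid(pid, &status, 0) != pid)
		return 0;

	/* Once reaped, there is nothing left to wait for. */
	if (waitpid(pid, &status, 0) != -1)
		return -1;

	return 1;
}
```

Calling observe_unreaped_child() from a process that leaves SIGCHLD at its
default disposition should return 1.  In the scenario discussed above the
difference is that nothing performs the final reap of the sibling task.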

