On Fri, Jul 3, 2020 at 5:54 AM Hugo Cornelis
<hugo.corne...@essensium.com> wrote:
>
> Thanks for your answer.
>
> On Mon, Jun 29, 2020 at 9:10 PM Ian Lance Taylor <i...@golang.org> wrote:
>>
>> Thanks for the background.
>>
>> Earlier I suggested looking at the output of "strace -f" for the
>> programs that fail.  Does that show anything of interest?
>
>
> What follows is the analysis of one strace (strace -fv -s 100) attached to 
> the docker daemon.
>
> The strace log file shows the creation of a chain of processes: dockerd forks 
> containerd forks containerd-shim forks runc forks a command that runs inside 
> the container (the command is '/usr/bin/find .').  This is also expected.
>
> When the I/O of the process /usr/bin/find in the docker container is blocked, 
> strace shows that the Golang schedulers are still active:
>
> (line 298015)
> [pid  2266] _newselect(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=10000} 
> <unfinished ...>
> [pid  2264] sched_yield( <unfinished ...>
> [pid  2270] swapcontext(0x2190a580, 0 <unfinished ...>
> [pid  2264] <... sched_yield resumed>)  = 0
> [pid  2270] <... swapcontext resumed>)  = 0
> [pid  2264] sched_yield( <unfinished ...>
> [pid  2270] swapcontext(0, 0x214dda80 <unfinished ...>
> [pid  2264] <... sched_yield resumed>)  = 0
> [pid  2270] <... swapcontext resumed>)  = 558750336
> [pid  2264] sched_yield( <unfinished ...>
> [pid  2270] swapcontext(0, 0x2190a580 <unfinished ...>
> [pid  2264] <... sched_yield resumed>)  = 0
> [pid  2270] <... swapcontext resumed>)  = 563127680
> [pid  2264] sched_yield( <unfinished ...>
> [pid  2270] swapcontext(0x2190a580, 0 <unfinished ...>
> [pid  2264] <... sched_yield resumed>)  = 0
> [pid  2270] <... swapcontext resumed>)  = 0
> ...
> (line 298134)
> [pid  2266] <... _newselect resumed>)   = 0 (Timeout)
> [pid  2264] <... sched_yield resumed>)  = 0
> [pid  2270] <... swapcontext resumed>)  = 563127680
> [pid  2266] epoll_wait(4,  <unfinished ...>
> [pid  2264] sched_yield( <unfinished ...>
> [pid  2270] swapcontext(0x2190a580, 0 <unfinished ...>
> [pid  2266] <... epoll_wait resumed>[], 128, 0) = 0
> [pid  2264] <... sched_yield resumed>)  = 0
> [pid  2270] <... swapcontext resumed>)  = 0
> [pid  2266] _newselect(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=10000} 
> <unfinished ...>
>
> TIDs 2264, 2266 and 2270 belong to the process runc.  The strace log has 
> similar straces for the other processes (dockerd, containerd, 
> containerd-shim), so I assume also their goroutine schedulers were active.  I 
> am actually wondering how to relate the arguments listed in the strace file 
> to the Golang or C code.
>
> Just before the container gets blocked it runs the command 'find .' that 
> should produce output to the terminal (but there is no output at first, that 
> is the problem).  The data is visible in the strace log through the 'write()' 
> system call:
>
> [pid  2204] execve("/usr/bin/find", ["/usr/bin/find", "."], 
> ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", 
> "HOSTNAME=caae2fb3bccf", "TERM=xterm", "HOME=/root"] <unfinished ...>
> ...
> [pid  2204] lstat64(".",  <unfinished ...>
> ...
> [pid  2204] write(1, ".\n", 2 <unfinished ...>
> ...
> [pid  2204] lstat64("./var",  <unfinished ...>
> ...
> [pid  2204] write(1, "./var\n", 6 <unfinished ...>
> ...
>
> The 'find' process writes 626 lines to stdout (18778 characters, this seems 
> to be reproducible).  The last lines are:
>
> ...
> [pid  2204] lstat64("./sys/kernel/slab/rpc_buffers",  <unfinished ...>
> ...
> [pid  2204] write(1, "./sys/kernel/slab/rpc_buffers\n", 30 <unfinished ...>
>
> All the write() system calls except the last one are successfully completed.  
> The last one remains blocked.  At that time PID 2204 hangs for a long time
>
> runc / dockerd / containerd / containerd-shim have continuous activity as in 
> the first strace that I showed above.
>
> When I terminate the docker process with a signal SIGINT, the characters 
> written by PID 2204 are suddenly flushed to the terminal.
>
> Are there anything specific things to look for in the strace files (eg. 
> specific epoll() calls).  Is there a way to map the arguments and return 
> values of swapcontext() to goroutines, or would this be a useless thing to 
> try to do?
>
> This is the analysis of one trial.  Some of the other trials did not start 
> the full chain of processes, it looks like the behaviour of the bug is also 
> timing dependent.


That looks like the process is writing to a pipe that nothing is reading from.

Ian

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAOyqgcWn-em6UzwUifemOttSToZNyNJ2VoqMZsH%2B-%3DhdUVoC%3DQ%40mail.gmail.com.

Reply via email to