Hello Hannes, thank you for your previous reply.
Am 03.02.2016 um 02:43 schrieb Hannes Frederic Sowa: > On 02.02.2016 17:25, Philipp Hahn wrote: >> we recently updated our kernel to 4.1.16 + patch for "unix: properly >> account for FDs passed over unix sockets" and have since then >> self-detected stalls triggered by the Samba daemon: ... >> We have not yet been able to reproduce the hang, but going back to our >> previous kernel 4.1.12 makes the problem go away. > > Can you remove the patch "unix: properly account for FDs passed over > unix sockets" and see if the problem still happens? I will try. The problem is that I can't trigger the bug reliably. It always happens to "smbd", but I don't know the triggering condition. > I couldn't quickly see any problems with your added patch. I currently > suspect a tight loop because of a SOCK_DEAD flag set but the socket not > removed from unix_socket_table or the vfs. Hmmm... All occurrences of the bug show _raw_spin_lock() as the sleeping function, never anything other thus far. I have seen the bug both on amd64 and i386. If it would spin in > 1097 restart: > 1098 »···»···other = unix_find_other(net, sunaddr, alen, sock->type, hash, > &err); > 1099 »···»···if (!other) > 1100 »···»···»···goto out; > 1101 > 1102 »···»···unix_state_double_lock(sk, other); > 1103 > 1104 »···»···/* Apparently VFS overslept socket death. Retry. */ > 1105 »···»···if (sock_flag(other, SOCK_DEAD)) { > 1106 »···»···»···unix_state_double_unlock(sk, other); > 1107 »···»···»···sock_put(other); > 1108 »···»···»···goto restart; > 1109 »···»···} I would expect some other function to show up in the trace, but it's always the call chain "unix_dgram_connect -> unix_state_double_lock -> _raw_spin_lock". The only case I see is that unix_find_other() returns the same "other" again and again. Q: How will that dead "other" be garbage collected finally? The kernel is > # CONFIG_PREEMPT_NONE is not set > CONFIG_PREEMPT_VOLUNTARY=y > # CONFIG_PREEMPT is not set During my hunt I found the following difference between 4.4 and 4.1.16 in net/unix/af_unix.c: > commit 1586a5877db9eee313379738d6581bc7c6ffb5e3 > Author: Eric Dumazet <eduma...@google.com> > Date: Fri Oct 23 10:59:16 2015 -0700 > > af_unix: do not report POLLOUT on listeners > > poll(POLLOUT) on a listener should not report fd is ready for > a write(). > > This would break some applications using poll() and pfd.events = -1, > as they would not block in poll() > > Signed-off-by: Eric Dumazet <eduma...@google.com> > Reported-by: Alan Burlison <alan.burli...@oracle.com> > Tested-by: Eric Dumazet <eduma...@google.com> > Signed-off-by: David S. Miller <da...@davemloft.net> ... > -static inline int unix_writable(struct sock *sk) > +static int unix_writable(const struct sock *sk) > { > - return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf; > + return sk->sk_state != TCP_LISTEN && > + (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf; > } Q: That *TCP*_LISTEN in net/*unix*/ looks strange to my eyes, but I'm not a kernel guru (yet). There is one hunk in 5c77e26862ce604edea05b3442ed765e9756fe0f, which uses the result of that function, which might explain the difference, why it shows with 4.1.16 but not with 4.4? > @@ -2245,14 +2388,16 @@ static unsigned int unix_dgram_poll(struct file > *file, struct socket *sock, > writable = unix_writable(sk); > - other = unix_peer_get(sk); > - if (other) { > - if (unix_peer(other) != sk) { > - sock_poll_wait(file, &unix_sk(other)->peer_wait, > wait); > - if (unix_recvq_full(other)) > - writable = 0; > - } > - sock_put(other); > + if (writable) { > + unix_state_lock(sk); > + > + other = unix_peer(sk); > + if (other && unix_peer(other) != sk && > + unix_recvq_full(other) && > + unix_dgram_peer_wake_me(sk, other)) > + writable = 0; > + > + unix_state_unlock(sk); > } > if (writable) I will for now build a new kernel with > $ git log --oneline v4.1.12..v4.1.17 -- net/unix > dc6b0ec unix: properly account for FDs passed over unix sockets > cc01a0a af_unix: Revert 'lock_interruptible' in stream receive code > 5c77e26 unix: avoid use-after-free in ep_remove_wait_queue reverted to see if it still happens. The "middle" patch seems harmless, as it only changes a code path for STREAMS, while the bug triggers with DGRAMS only. > The stack trace is rather unreliable, maybe something completely > different happend. Do you happend to see better reports? So far they look all the same. Anything more I can do to prepare for collection better information next time I get that bug? Thanks again. Philipp