Mathias Krause <mini...@googlemail.com> writes:
> this is an attempt to resurrect the thread initially started here:
>
>   http://thread.gmane.org/gmane.linux.network/353003
>
> As that patch fixed the issue for the mentioned reproducer, it did not
> fix the bug for the production code Olivier is using. :(
>
> Changing the reproducer only slightly allows me to trigger the following
> list debug splat (CONFIG_DEBUG_LIST=y) reliable within seconds -- even
> with the above linked patch applied:

The patch was

--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2233,10 +2233,14 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
 	writable = unix_writable(sk);
 	other = unix_peer_get(sk);
 	if (other) {
-		if (unix_peer(other) != sk) {
+		unix_state_lock(other);
+		if (!sock_flag(other, SOCK_DEAD) && unix_peer(other) != sk) {
+			unix_state_unlock(other);
 			sock_poll_wait(file, &unix_sk(other)->peer_wait, wait);
 			if (unix_recvq_full(other))
 				writable = 0;
+		} else {
+			unix_state_unlock(other);
 		}
 		sock_put(other);
 	}

That's obviously not going to help you when 'racing with
unix_release_sock' as the socket might be released immediately after
the unix_state_unlock, i.e., before sock_poll_wait is called. Provided I
understand this correctly, the problem is that the socket reference
count may have become 1 by the time sock_put is called but the
sock_poll_wait has created a new reference to it which isn't accounted
for.

A simple way to fix that could be to do something like

	unix_state_lock(other);
	if (!sock_flag(other, SOCK_DEAD))
		sock_poll_wait(...)
	unix_state_unlock(other);

This would imply that unix_release_sock either marked the socket as
dead before the sock_poll_wait was executed or that the
wake_up_interruptible call in there will run after ->peer_wait was used
(and it will thus 'unpollwait' it again).
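
To illustrate the ordering argument outside the kernel, here is a small userspace model. It is only a sketch under assumptions: every name in it (model_peer, model_poll_register, model_release) is made up for this example, the pthread mutex stands in for unix_state_lock, the 'dead' flag for SOCK_DEAD, and the wake-up is assumed to drop every entry queued on the wait queue. It shows that when the dead check and the registration happen under the same lock, no registration can be left behind once the release side has finished.

```c
/* Userspace model only; none of these names are kernel APIs. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct model_peer {
	pthread_mutex_t lock;	/* stands in for unix_state_lock(other) */
	bool dead;		/* stands in for SOCK_DEAD */
	int waiters;		/* entries queued on ->peer_wait */
};

#define MODEL_PEER_INIT { PTHREAD_MUTEX_INITIALIZER, false, 0 }

/* Poller side: queue on the wait queue only while the peer is alive. */
bool model_poll_register(struct model_peer *p)
{
	bool registered = false;

	pthread_mutex_lock(&p->lock);
	if (!p->dead) {
		p->waiters++;
		registered = true;
	}
	pthread_mutex_unlock(&p->lock);
	return registered;
}

/* Release side: mark the peer dead under the lock, then wake waiters. */
void model_release(struct model_peer *p)
{
	pthread_mutex_lock(&p->lock);
	p->dead = true;
	pthread_mutex_unlock(&p->lock);

	/* Stand-in for the wake-up; assumed to drop every queued entry. */
	pthread_mutex_lock(&p->lock);
	assert(p->dead);
	p->waiters = 0;
	pthread_mutex_unlock(&p->lock);
}
```

Whichever side takes the lock first, the outcome is the same: a poller that registers before the release is cleaned up by the wake-up, and a poller that arrives afterwards sees the dead flag and never registers.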