Eric Dumazet <eric.duma...@gmail.com> writes: > On Wed, 2015-11-25 at 18:24 +0000, Rainer Weikusat wrote: >> Eric Dumazet <eric.duma...@gmail.com> writes: >> > On Wed, 2015-11-25 at 17:30 +0000, Rainer Weikusat wrote: >> > >> >> In case this is wrong, it obviously implies that sk_sleep(sk) must not >> >> be used anywhere as it accesses the same struck sock, hence, when that >> >> can "suddenly" disappear despite locks are used in the way indicated >> >> above, there is now safe way to invoke that, either, as it just does a >> >> rcu_dereference_raw based on the assumption that the caller knows that >> >> the i-node (and the corresponding wait queue) still exist. >> >> >> > >> > Oh well. >> > >> > sk_sleep() is not used if the return is NULL
[...] >> finish_wait(sk_sleep(sk), &wait); >> unix_state_unlock(sk); >> return timeo; >> } >> >> Neither prepare_to_wait nor finish_wait check if the pointer is >> null. [...] > You are looking at the wrong side. > > Of course, the thread 'owning' a socket has a reference on it, so it > knows sk->sk_socket and sk->sk_ww is not NULL. I could continue argueing about this but IMHO, it just leads away from the actual issue. Taking a couple of steps back, therefore, ------------ diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 4e95bdf..5c87ea6 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1754,8 +1754,8 @@ restart_locked: skb_queue_tail(&other->sk_receive_queue, skb); if (max_level > unix_sk(other)->recursion_level) unix_sk(other)->recursion_level = max_level; - unix_state_unlock(other); other->sk_data_ready(other); + unix_state_unlock(other); sock_put(other); scm_destroy(&scm); return len; @@ -1860,8 +1860,8 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg, skb_queue_tail(&other->sk_receive_queue, skb); if (max_level > unix_sk(other)->recursion_level) unix_sk(other)->recursion_level = max_level; - unix_state_unlock(other); other->sk_data_ready(other); + unix_state_unlock(other); sent += size; } ------------- I'm convinced this will work for the given problem (I don't claim that it's technically superior to the larger change in any aspect except that it's simpler and localized) because 1) The use-after-free occurred while accessing the struck sock allocated together with the socket inode. 2) This happened because a close was racing with a write. 3) The socket inode, struct sock and struct socket_wq are freed by sock_destroy_inode. 4) sock_destroy_inode can only be called as consequence of the iput in sock_release. 5) sock_release invokes the per-protocol/ family release function before doing the iput. 6) unix_sock_release has to acquire the unix_state_lock on the socket referred to as other in the code above before it can do anything, in particular, before it calls sock_orphan which resets the struct sock and wq pointers and also sets the SOCK_DEAD flag. 7) the unix_stream_sendmsg code acquires the unix_state_lock on other and then checks the SOCK_DEAD flag. The code above only runs if it was not set, hence, the iput in sock_release can't have happened yet because a concurrent unix_sock_release must still be blocked on the unix_state_lock of other. If there's an error in this reasoning, I'd very much like to know where and what it is, not the least because the unix_dgram_peer_wake_relay function I wrote also relies on it being correct (wrt to using the result of sk_sleep outside of rcu_read_lock/ rcu_read_unlock). -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/