Re: [PATCH 2/2] af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock'
From: Linus Torvalds Date: Fri, 2 Sep 2016 11:13:09 -0700 (PDT) > > From: Linus Torvalds > Date: Thu, 1 Sep 2016 14:43:53 -0700 > Subject: [PATCH 2/2] af_unix: split 'u->readlock' into two: 'iolock' and > 'bindlock' > > Right now we use the 'readlock' both for protecting some of the af_unix > IO path and for making the bind be single-threaded. > > The two are independent, but using the same lock makes for a nasty > deadlock due to ordering with regards to filesystem locking. The bind > locking would want to nest outside the VSF pathname locking, but the IO > locking wants to nest inside some of those same locks. > > We tried to fix this earlier with commit c845acb324aa ("af_unix: Fix > splice-bind deadlock") which moved the readlock inside the vfs locks, > but that caused problems with overlayfs that will then call back into > filesystem routines that take the lock in the wrong order anyway. > > Splitting the locks means that we can go back to having the bind lock be > the outermost lock, and we don't have any deadlocks with lock ordering. > > Acked-by: Rainer Weikusat > Acked-by: Al Viro > Signed-off-by: Linus Torvalds Applied.
Re: [PATCH 2/2] af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock'
On 02.09.2016 20:13, Linus Torvalds wrote: > > From: Linus Torvalds > Date: Thu, 1 Sep 2016 14:43:53 -0700 > Subject: [PATCH 2/2] af_unix: split 'u->readlock' into two: 'iolock' and > 'bindlock' > > Right now we use the 'readlock' both for protecting some of the af_unix > IO path and for making the bind be single-threaded. > > The two are independent, but using the same lock makes for a nasty > deadlock due to ordering with regards to filesystem locking. The bind > locking would want to nest outside the VSF pathname locking, but the IO > locking wants to nest inside some of those same locks. > > We tried to fix this earlier with commit c845acb324aa ("af_unix: Fix > splice-bind deadlock") which moved the readlock inside the vfs locks, > but that caused problems with overlayfs that will then call back into > filesystem routines that take the lock in the wrong order anyway. > > Splitting the locks means that we can go back to having the bind lock be > the outermost lock, and we don't have any deadlocks with lock ordering. > > Acked-by: Rainer Weikusat > Acked-by: Al Viro > Signed-off-by: Linus Torvalds > --- > > This patch is really trivial, and I've tried to be careful and look at the > locking, but somebody who really knows the AF_UNIX code should definitely > take a second look. > > Note that I did the revert (that re-introduces the original splice > deadlock) first, because that made the whole series much easier to > explain. Doing it in the other order made the revert nastier because this > patch obviously touches the same code that the revert in 1/2 does. > > So this way the series ends up being "go back to the original code with > the original deadlock, and then fix that original deadlock by splitting > the bind lock". Acked-by: Hannes Frederic Sowa
Re: [PATCH 2/2] af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock'
On 02.09.2016 21:15, David Miller wrote: > From: Linus Torvalds > Date: Fri, 2 Sep 2016 11:17:18 -0700 > >> Oh, this was missing a >> >> Reported-and-tested-by: CAI Qian >> >> who found the new deadlock. >> >> There's now *another* lockdep deadlock report by him, but that one has >> nothing to do with networking. >> >> (And neither of these deadlocks will actually deadlock the machine in >> practice, but you can trigger the lockdep reports with some odd splice >> patterns and overlayfs use) > > I read over this and can't find any problems. > > The main thing I was concerned about was an I/O path that really > expects the socket's hash not to change for whatever reason, but even > all of the unix_find_other() calls are done outside of the mutex > already. Furthermore we don't allow rehashing, if the socket is "published" once, it can only be destroyed but can't change identity during its lifetime. IIRC this was another bug we had a while ago.
Re: [PATCH 2/2] af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock'
On Fri, Sep 2, 2016 at 12:15 PM, David Miller wrote: > > The main thing I was concerned about was an I/O path that really > expects the socket's hash not to change for whatever reason, but even > all of the unix_find_other() calls are done outside of the mutex > already. That's my read of it too. I did actually go through the effort on trying to follow all the locking in there yesterday, so while I would want an AF_UNIX person to double-check me, I think it's fine. The real locking seem to be the u->lock spinlock (taken by unix_state_lock() and friends). The bindlock is only about serializing the binders, which do things that can sleep (notably the mknod of the unix socket node in the filesystem). The one thing I looked at and wasn't sure about was unix_stream_connect(), which creates a *new* unix domain socket and binds it to the old one without holding the bindlock. Note that it never did hold the lock, so that's not a new thing, but it worries me a bit. I *think* it's ok, because I think this all happens before that new socket is actually reachable, so it cannot race against somebody else doing a bind. And my patch doesn't actually change anything in this area, so I'm not making it worse. But it was the one thing I reacted to when going through the locking there. (Well, not the "one thing". The other thing I reacted to was the overall reaction that none of this is documented, and the locking was clearly not very clear). Linus
Re: [PATCH 2/2] af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock'
From: Linus Torvalds Date: Fri, 2 Sep 2016 11:17:18 -0700 > Oh, this was missing a > > Reported-and-tested-by: CAI Qian > > who found the new deadlock. > > There's now *another* lockdep deadlock report by him, but that one has > nothing to do with networking. > > (And neither of these deadlocks will actually deadlock the machine in > practice, but you can trigger the lockdep reports with some odd splice > patterns and overlayfs use) I read over this and can't find any problems. The main thing I was concerned about was an I/O path that really expects the socket's hash not to change for whatever reason, but even all of the unix_find_other() calls are done outside of the mutex already.
Re: [PATCH 2/2] af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock'
On Fri, Sep 2, 2016 at 11:13 AM, Linus Torvalds wrote: > > From: Linus Torvalds > Date: Thu, 1 Sep 2016 14:43:53 -0700 > Subject: [PATCH 2/2] af_unix: split 'u->readlock' into two: 'iolock' and > 'bindlock' > > Right now we use the 'readlock' both for protecting some of the af_unix > IO path and for making the bind be single-threaded. > > The two are independent, but using the same lock makes for a nasty > deadlock due to ordering with regards to filesystem locking. The bind > locking would want to nest outside the VSF pathname locking, but the IO > locking wants to nest inside some of those same locks. > > We tried to fix this earlier with commit c845acb324aa ("af_unix: Fix > splice-bind deadlock") which moved the readlock inside the vfs locks, > but that caused problems with overlayfs that will then call back into > filesystem routines that take the lock in the wrong order anyway. > > Splitting the locks means that we can go back to having the bind lock be > the outermost lock, and we don't have any deadlocks with lock ordering. > > Acked-by: Rainer Weikusat > Acked-by: Al Viro > Signed-off-by: Linus Torvalds Oh, this was missing a Reported-and-tested-by: CAI Qian who found the new deadlock. There's now *another* lockdep deadlock report by him, but that one has nothing to do with networking. (And neither of these deadlocks will actually deadlock the machine in practice, but you can trigger the lockdep reports with some odd splice patterns and overlayfs use) Linus
[PATCH 2/2] af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock'
From: Linus Torvalds Date: Thu, 1 Sep 2016 14:43:53 -0700 Subject: [PATCH 2/2] af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock' Right now we use the 'readlock' both for protecting some of the af_unix IO path and for making the bind be single-threaded. The two are independent, but using the same lock makes for a nasty deadlock due to ordering with regards to filesystem locking. The bind locking would want to nest outside the VSF pathname locking, but the IO locking wants to nest inside some of those same locks. We tried to fix this earlier with commit c845acb324aa ("af_unix: Fix splice-bind deadlock") which moved the readlock inside the vfs locks, but that caused problems with overlayfs that will then call back into filesystem routines that take the lock in the wrong order anyway. Splitting the locks means that we can go back to having the bind lock be the outermost lock, and we don't have any deadlocks with lock ordering. Acked-by: Rainer Weikusat Acked-by: Al Viro Signed-off-by: Linus Torvalds --- This patch is really trivial, and I've tried to be careful and look at the locking, but somebody who really knows the AF_UNIX code should definitely take a second look. Note that I did the revert (that re-introduces the original splice deadlock) first, because that made the whole series much easier to explain. Doing it in the other order made the revert nastier because this patch obviously touches the same code that the revert in 1/2 does. So this way the series ends up being "go back to the original code with the original deadlock, and then fix that original deadlock by splitting the bind lock". include/net/af_unix.h | 2 +- net/unix/af_unix.c| 45 +++-- 2 files changed, 24 insertions(+), 23 deletions(-) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 9b4c418bebd8..fd60eccb59a6 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -52,7 +52,7 @@ struct unix_sock { struct sock sk; struct unix_address *addr; struct path path; - struct mutexreadlock; + struct mutexiolock, bindlock; struct sock *peer; struct list_headlink; atomic_long_t inflight; diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 433ae1bbef97..8309687a56b0 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -661,11 +661,11 @@ static int unix_set_peek_off(struct sock *sk, int val) { struct unix_sock *u = unix_sk(sk); - if (mutex_lock_interruptible(&u->readlock)) + if (mutex_lock_interruptible(&u->iolock)) return -EINTR; sk->sk_peek_off = val; - mutex_unlock(&u->readlock); + mutex_unlock(&u->iolock); return 0; } @@ -779,7 +779,8 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern) spin_lock_init(&u->lock); atomic_long_set(&u->inflight, 0); INIT_LIST_HEAD(&u->link); - mutex_init(&u->readlock); /* single task reading lock */ + mutex_init(&u->iolock); /* single task reading lock */ + mutex_init(&u->bindlock); /* single task binding lock */ init_waitqueue_head(&u->peer_wait); init_waitqueue_func_entry(&u->peer_wake, unix_dgram_peer_wake_relay); unix_insert_socket(unix_sockets_unbound(sk), sk); @@ -848,7 +849,7 @@ static int unix_autobind(struct socket *sock) int err; unsigned int retries = 0; - err = mutex_lock_interruptible(&u->readlock); + err = mutex_lock_interruptible(&u->bindlock); if (err) return err; @@ -895,7 +896,7 @@ retry: spin_unlock(&unix_table_lock); err = 0; -out: mutex_unlock(&u->readlock); +out: mutex_unlock(&u->bindlock); return err; } @@ -1009,7 +1010,7 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) goto out; addr_len = err; - err = mutex_lock_interruptible(&u->readlock); + err = mutex_lock_interruptible(&u->bindlock); if (err) goto out; @@ -1063,7 +1064,7 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) out_unlock: spin_unlock(&unix_table_lock); out_up: - mutex_unlock(&u->readlock); + mutex_unlock(&u->bindlock); out: return err; } @@ -1955,17 +1956,17 @@ static ssize_t unix_stream_sendpage(struct socket *socket, struct page *page, if (false) { alloc_skb: unix_state_unlock(other); - mutex_unlock(&unix_sk(other)->readlock); + mutex_unlock(&unix_sk(other)->iolock