Re: Recursive NET_LOCK()
Mike Belopuhov wrote:
> The big problem with the non-recursive netlock is that to fix
> recursions, in many cases we have to release the netlock in one
> of the callers up the chain, which makes the interim code run
> without any protection, and we don't have a way of assessing
> the situation short of code inspection, which is fallible.
>
> Sprinkling NET_ASSERT_LOCKED() everywhere is possible; however,
> we still end up with NET_LOCK/UNLOCK chains in code that was
> previously running uninterrupted.
>
> Does this outweigh the desire not to use recursive rwlocks
> anywhere else?

I can see the danger of breaking atomic sections into two parts.
However, having these long critical sections increases the chance of
introducing lock ordering bugs. Code A calls code B calls code C. A and
C need the netlock, but what if code B requires some other lock?
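[A hypothetical sketch of that hazard. All function and lock names are
invented: one path takes the netlock first and some other lock second,
while another path takes them in the opposite order.]

/* Invented names; "other_lock" stands for any unrelated lock. */
struct rwlock other_lock = RWLOCK_INITIALIZER("otherlk");

void
code_c(void)
{
	NET_ASSERT_LOCKED();	/* relies on A still holding the netlock */
	/* ... */
}

void
code_b(void)
{
	rw_enter_write(&other_lock);	/* order: netlock, then other_lock */
	code_c();			/* runs with both locks held */
	rw_exit_write(&other_lock);
}

void
code_a(void)
{
	NET_LOCK();
	code_b();
	NET_UNLOCK();
}

/*
 * If any other code path takes the locks in the opposite order,
 * the two paths can deadlock against each other:
 */
void
other_path(void)
{
	rw_enter_write(&other_lock);	/* order: other_lock, then netlock */
	NET_LOCK();
	/* ... */
	NET_UNLOCK();
	rw_exit_write(&other_lock);
}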
Re: Recursive NET_LOCK()
On 05/01/17(Thu) 16:09, Mike Belopuhov wrote:
> [...]
> The big problem with the non-recursive netlock is that to fix
> recursions, in many cases we have to release the netlock in one
> of the callers up the chain, which makes the interim code run
> without any protection, and we don't have a way of assessing
> the situation short of code inspection, which is fallible.

No, that's not to fix the recursions. That's to know where they are.

Nobody has come up with a solution for the recursions so far. I doubt
there's a common solution to all the code paths. That's something I'd
like to think about in a second step.

The current problem is still unknown; we just know that using chrome
hangs the machine with either flavor of the NET_LOCK(). ajacoutot@
reported additional information: he cannot reproduce the hang when he
uses my scheduler diff with a single process queue.

> Sprinkling NET_ASSERT_LOCKED() everywhere is possible; however,
> we still end up with NET_LOCK/UNLOCK chains in code that was
> previously running uninterrupted.

It is possible, but I don't think it helps, because not everything
needs to be protected by the NET_LOCK().

> Does this outweigh the desire not to use recursive rwlocks
> anywhere else?

It is, in my opinion, too soon to answer this question. Let's wait
until recursions are the only problems left.
Re: Recursive NET_LOCK()
On 3 January 2017 at 20:08, Ted Unangst wrote:
> Martin Pieuchot wrote:
>> It seems that most of the problems exposed by the introduction of the
>> NET_LOCK() are related to the non-recursive nature of the rwlock. Some
>> known issues involve pflow(4), cloning interfaces and NFS.
>>
>> Diff below makes use of a recursive rwlock instead. I just finished a
>> build on NFS with it, so I'd like to know if it works for your use case.
>
> I don't want to interfere with progress, so if you think this is best,
> carry on.
>
> But I'll note that using recursive locks leads to sloppy locking
> discipline, which is exactly how we find ourselves in this mess today.
> Need the lock? Not sure? Grab it anyway.
>
> The rrwlock was added for the particular use case of vfs locking, which
> is already a recursive cluster fuck, but wasn't supposed to be used for
> new code. Of course, the network stack also qualifies as legacy code, so
> I can't object much to using it there either.
>
> I imagine you've already thought of this. :)

The big problem with the non-recursive netlock is that to fix
recursions, in many cases we have to release the netlock in one of the
callers up the chain, which makes the interim code run without any
protection, and we don't have a way of assessing the situation short of
code inspection, which is fallible.

Sprinkling NET_ASSERT_LOCKED() everywhere is possible; however, we
still end up with NET_LOCK/UNLOCK chains in code that was previously
running uninterrupted.

Does this outweigh the desire not to use recursive rwlocks anywhere
else?
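[A hypothetical sketch of the pattern described above, with all names
invented: the netlock is released in the caller purely so a leaf can
take it, leaving the code in between unprotected.]

void
leaf(void)
{
	NET_LOCK();	/* would recurse if the caller still held it */
	/* ... */
	NET_UNLOCK();
}

void
helper(void)
{
	/* ...code that used to run under the netlock... */
	leaf();
}

void
caller(void)
{
	NET_LOCK();
	/* ... */
	NET_UNLOCK();	/* released only to avoid recursion in leaf() */

	helper();	/* unprotected window: state can change here */

	NET_LOCK();	/* re-acquire and hope the world didn't move */
	/* ... */
	NET_UNLOCK();
}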
Re: Recursive NET_LOCK()
On Tue, Jan 03, 2017 at 08:21:58PM +0100, Martin Pieuchot wrote:
> I am also looking for more feedback and/or input, so I appreciate your
> email on the matter.

Together with a coworker, Zaur Molotnikov, we are using static code
analysis to find places where the netlock is needed. We try to create
possible reverse stack traces:

|- if_linkstate
|- if_down
|- pppoe_ioctl
|- if_clone_destroy
|- if_downall
|- if_setrdomain
|- if_clone_create
|- ifioctl
|- soo_ioctl
|- nfs_boot_init
|- ifioctl
|- soo_ioctl
|- nfs_boot_init
|- sppp_lcp_down
|- sppp_keepalive
|- pppdealloc
|- pppclose
|- trunk_port_destroy
|- trunk_clone_destroy
|- trunk_port_ifdetach
|- trunk_ioctl
|- if_up
|- pppoe_ioctl
|- ifioctl
|- soo_ioctl
|- nfs_boot_init
|- sppp_ioctl
|- pppoe_ioctl
|- sppp_lcp_tlu
|- loioctl
|- mpeioctl
|- gif_ioctl
|- if_linkstate_task

We tell the system that if_linkstate() needs the lock; the leaves then
have to provide it somehow. As our implementation is in an early stage,
the analysis stops at function pointers. At the end it prints the
traces where it cannot find where the netlock is taken:

Analyzing Locks
Lock not found: [Node: if_linkstate, Node: if_down, Node: pppoe_ioctl]
Lock not found: [Node: if_linkstate, Node: if_down, Node: sppp_lcp_down]
Lock not found: [Node: if_linkstate, Node: if_down, Node: sppp_keepalive]
Lock not found: [Node: if_linkstate, Node: if_down, Node: pppdealloc, Node: pppclose]
Lock not found: [Node: if_linkstate, Node: if_down, Node: trunk_port_destroy, Node: trunk_clone_destroy]
Lock not found: [Node: if_linkstate, Node: if_down, Node: trunk_port_destroy, Node: trunk_port_ifdetach]
Lock not found: [Node: if_linkstate, Node: if_down, Node: trunk_port_destroy, Node: trunk_ioctl]
Lock not found: [Node: if_linkstate, Node: if_up, Node: pppoe_ioctl]
Lock not found: [Node: if_linkstate, Node: if_up, Node: sppp_ioctl, Node: pppoe_ioctl]
Lock not found: [Node: if_linkstate, Node: if_up, Node: sppp_lcp_tlu]
Lock not found: [Node: if_linkstate, Node: if_up, Node: loioctl]
Lock not found: [Node: if_linkstate, Node: if_up, Node: mpeioctl]
Lock not found: [Node: if_linkstate, Node: if_up, Node: gif_ioctl]
Done

This is work in progress; I am not sure whether it will work in the
end. Right now we are concentrating on missing locks, but it could also
find recursive locks.

bluhm
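[A toy C sketch of the traversal described above, with a hand-fed call
graph; this is not the actual analysis tool, which works on the real
source tree. Only ifioctl() is marked here as taking the lock.]

#include <stdio.h>

#define MAXCALLERS	8
#define MAXDEPTH	16

struct node {
	const char	*name;
	int		 takes_lock;		/* acquires the netlock */
	struct node	*callers[MAXCALLERS];	/* NULL-terminated */
};

static void
report(struct node **path, int depth)
{
	int i;

	printf("Lock not found: [");
	for (i = 0; i < depth; i++)
		printf("%sNode: %s", i ? ", " : "", path[i]->name);
	printf("]\n");
}

static void
walk(struct node *fn, struct node **path, int depth)
{
	int i;

	path[depth++] = fn;
	if (fn->takes_lock)
		return;			/* this path provides the lock */
	if (fn->callers[0] == NULL) {	/* root caller, still no lock */
		report(path, depth);
		return;
	}
	for (i = 0; i < MAXCALLERS && fn->callers[i] != NULL; i++)
		walk(fn->callers[i], path, depth);
}

int
main(void)
{
	struct node *path[MAXDEPTH];
	struct node ifioctl = { "ifioctl", 1, { NULL } };
	struct node pppoe_ioctl = { "pppoe_ioctl", 0, { NULL } };
	struct node if_down = { "if_down", 0,
	    { &pppoe_ioctl, &ifioctl, NULL } };
	struct node if_linkstate = { "if_linkstate", 0, { &if_down, NULL } };

	/* prints: Lock not found: [Node: if_linkstate, Node: if_down,
	 * Node: pppoe_ioctl]; the path through ifioctl stays silent. */
	walk(&if_linkstate, path, 0);
	return (0);
}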
Re: Recursive NET_LOCK()
> Date: Tue, 3 Jan 2017 17:13:27 +0100
> From: Martin Pieuchot
>
> It seems that most of the problems exposed by the introduction of the
> NET_LOCK() are related to the non-recursive nature of the rwlock. Some
> known issues involve pflow(4), cloning interfaces and NFS.
>
> Diff below makes use of a recursive rwlock instead. I just finished a
> build on NFS with it, so I'd like to know if it works for your use case.
>
> Please test and report back.

This does not fix the chrome hang that I was seeing.
Re: Recursive NET_LOCK()
On 03/01/17(Tue) 14:08, Ted Unangst wrote:
> Martin Pieuchot wrote:
> > It seems that most of the problems exposed by the introduction of the
> > NET_LOCK() are related to the non-recursive nature of the rwlock. Some
> > known issues involve pflow(4), cloning interfaces and NFS.
> >
> > Diff below makes use of a recursive rwlock instead. I just finished a
> > build on NFS with it, so I'd like to know if it works for your use case.
>
> I don't want to interfere with progress, so if you think this is best,
> carry on.
>
> But I'll note that using recursive locks leads to sloppy locking
> discipline, which is exactly how we find ourselves in this mess today.
> Need the lock? Not sure? Grab it anyway.
>
> The rrwlock was added for the particular use case of vfs locking, which
> is already a recursive cluster fuck, but wasn't supposed to be used for
> new code. Of course, the network stack also qualifies as legacy code, so
> I can't object much to using it there either.
>
> I imagine you've already thought of this. :)

I did :)

To be honest, I'm not sure if we should take this direction. For the
moment I'm trying to learn as much as possible about the problems we're
facing. If it appears to be just sloppy recursive code, then we can
decide what to do with it and when.

I am also looking for more feedback and/or input, so I appreciate your
email on the matter.
Re: Recursive NET_LOCK()
Martin Pieuchot wrote:
> It seems that most of the problems exposed by the introduction of the
> NET_LOCK() are related to the non-recursive nature of the rwlock. Some
> known issues involve pflow(4), cloning interfaces and NFS.
>
> Diff below makes use of a recursive rwlock instead. I just finished a
> build on NFS with it, so I'd like to know if it works for your use case.

I don't want to interfere with progress, so if you think this is best,
carry on.

But I'll note that using recursive locks leads to sloppy locking
discipline, which is exactly how we find ourselves in this mess today.
Need the lock? Not sure? Grab it anyway.

The rrwlock was added for the particular use case of vfs locking, which
is already a recursive cluster fuck, but wasn't supposed to be used for
new code. Of course, the network stack also qualifies as legacy code, so
I can't object much to using it there either.

I imagine you've already thought of this. :)
Re: Recursive NET_LOCK()
On Tue, Jan 03, 2017 at 05:13:27PM +0100, Martin Pieuchot wrote:
> It seems that most of the problems exposed by the introduction of the
> NET_LOCK() are related to the non-recursive nature of the rwlock. Some
> known issues involve pflow(4), cloning interfaces and NFS.
>
> Diff below makes use of a recursive rwlock instead. I just finished a
> build on NFS with it, so I'd like to know if it works for your use case.
>
> Please test and report back.

I started to use it on my gateway (which uses pflow and which panicked
previously). For now there is no problem, whereas previously the panic
occurred soon after starting. I will keep it running.

Thanks.
-- 
Sebastien Marie
Recursive NET_LOCK()
It seems that most of the problems exposed by the introduction of the
NET_LOCK() are related to the non-recursive nature of the rwlock. Some
known issues involve pflow(4), cloning interfaces and NFS.

Diff below makes use of a recursive rwlock instead. I just finished a
build on NFS with it, so I'd like to know if it works for your use case.

Please test and report back.

Index: kern/kern_rwlock.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_rwlock.c,v
retrieving revision 1.27
diff -u -p -r1.27 kern_rwlock.c
--- kern/kern_rwlock.c	14 Mar 2015 07:33:42 -0000	1.27
+++ kern/kern_rwlock.c	3 Jan 2017 14:23:56 -0000
@@ -346,3 +346,23 @@ rrw_status(struct rrwlock *rrwl)
 {
 	return (rw_status(&rrwl->rrwl_lock));
 }
+
+int
+rrw_release_all(struct rrwlock *rrwl)
+{
+	int rv;
+
+	rv = rrwl->rrwl_wcnt;
+	KASSERT(rv >= 1);
+	rrwl->rrwl_wcnt = 1;
+	rrw_exit(rrwl);
+
+	return (rv);
+}
+
+void
+rrw_acquire_count(struct rrwlock *rrwl, int count, int flags)
+{
+	while (count--)
+		rrw_enter(rrwl, flags);
+}
Index: kern/kern_synch.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.135
diff -u -p -r1.135 kern_synch.c
--- kern/kern_synch.c	13 Sep 2016 08:32:44 -0000	1.135
+++ kern/kern_synch.c	3 Jan 2017 14:07:23 -0000
@@ -231,27 +231,28 @@ msleep(const volatile void *ident, struc
  * entered the sleep queue we drop the it. After sleeping we re-lock.
  */
 int
-rwsleep(const volatile void *ident, struct rwlock *wl, int priority,
+rrwsleep(const volatile void *ident, struct rrwlock *rrwl, int priority,
     const char *wmesg, int timo)
 {
 	struct sleep_state sls;
 	int error, error1;
+	int count;
 
 	KASSERT((priority & ~(PRIMASK | PCATCH | PNORELOCK)) == 0);
-	rw_assert_wrlock(wl);
+	KASSERT(rrw_status(rrwl) == RW_WRITE);
 
 	sleep_setup(&sls, ident, priority, wmesg);
 	sleep_setup_timeout(&sls, timo);
 	sleep_setup_signal(&sls, priority);
 
-	rw_exit_write(wl);
+	count = rrw_release_all(rrwl);
 
 	sleep_finish(&sls, 1);
 	error1 = sleep_finish_timeout(&sls);
 	error = sleep_finish_signal(&sls);
 
 	if ((priority & PNORELOCK) == 0)
-		rw_enter_write(wl);
+		rrw_acquire_count(rrwl, count, RW_WRITE);
 
 	/* Signal errors are higher priority than timeouts. */
 	if (error == 0 && error1 != 0)
Index: kern/uipc_socket.c
===================================================================
RCS file: /cvs/src/sys/kern/uipc_socket.c,v
retrieving revision 1.171
diff -u -p -r1.171 uipc_socket.c
--- kern/uipc_socket.c	29 Dec 2016 12:12:43 -0000	1.171
+++ kern/uipc_socket.c	3 Jan 2017 14:04:42 -0000
@@ -256,7 +256,7 @@ soclose(struct socket *so)
 		    (so->so_state & SS_NBIO))
 			goto drop;
 		while (so->so_state & SS_ISCONNECTED) {
-			error = tsleep(&so->so_timeo,
+			error = rrwsleep(&so->so_timeo, &netlock,
 			    PSOCK | PCATCH, "netcls",
 			    so->so_linger * hz);
 			if (error)
@@ -1039,7 +1039,7 @@ sorflush(struct socket *so)
 	struct sockbuf asb;
 
 	sb->sb_flags |= SB_NOINTR;
-	(void) sblock(sb, M_WAITOK, NULL);
+	(void) sblock(sb, M_WAITOK, &netlock);
 	socantrcvmore(so);
 	sbunlock(sb);
 	asb = *sb;
Index: kern/uipc_socket2.c
===================================================================
RCS file: /cvs/src/sys/kern/uipc_socket2.c,v
retrieving revision 1.70
diff -u -p -r1.70 uipc_socket2.c
--- kern/uipc_socket2.c	29 Dec 2016 12:12:43 -0000	1.70
+++ kern/uipc_socket2.c	3 Jan 2017 14:06:54 -0000
@@ -53,7 +53,7 @@ u_long	sb_max = SB_MAX;	/* patchable */
 extern struct pool mclpools[];
 extern struct pool mbpool;
 
-int	sbsleep(struct sockbuf *, struct rwlock *);
+int	sbsleep(struct sockbuf *, struct rrwlock *);
 
 /*
  * Procedures to manipulate state flags of socket
@@ -276,18 +276,18 @@ sbwait(struct sockbuf *sb)
 	NET_ASSERT_LOCKED();
 
 	sb->sb_flagsintr |= SB_WAIT;
-	return (tsleep(&sb->sb_cc,
+	return (rrwsleep(&sb->sb_cc, &netlock,
 	    (sb->sb_flags & SB_NOINTR) ? PSOCK : PSOCK | PCATCH, "netio",
 	    sb->sb_timeo));
 }
 
 int
-sbsleep(struct sockbuf *sb, struct rwlock *lock)
+sbsleep(struct sockbuf *sb, struct rrwlock *rrwl)
 {
 	int error, prio = (sb->sb_flags & SB_NOINTR) ? PSOCK : PSOCK | PCATCH;
 
-	if (lock != NULL)
-		error = rwsleep(&sb->sb_flags, lock, prio, "netlck", 0);
+	if (rrwl != NULL)
+		error = rrwsleep(&sb->sb_flags, rrwl, prio, "netlck", 0);
 	else
 		error = tsleep(&sb->sb_flags, prio, "netlck", 0);
 
@@ -295,7 +295,7 @@ sbsleep(struct sockbuf *sb, struct
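[For reference, the diff does not show the NET_LOCK() macros
themselves. A minimal sketch of how they could sit on top of the
recursive rwlock, assuming the existing global netlock simply changes
type from struct rwlock to struct rrwlock; this block is illustrative,
not part of the posted diff.]

/* Illustrative only -- not part of the posted diff. */
extern struct rrwlock netlock;

#define NET_LOCK()	rrw_enter(&netlock, RW_WRITE)
#define NET_UNLOCK()	rrw_exit(&netlock)
#define NET_ASSERT_LOCKED()						\
do {									\
	if (rrw_status(&netlock) != RW_WRITE)				\
		splassert_fail(RW_WRITE, rrw_status(&netlock),		\
		    __func__);						\
} while (0)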