Re: Recursive NET_LOCK()

2017-01-05 Thread Ted Unangst
Mike Belopuhov wrote:
> The big problem with the non-recursive netlock is that to fix
> recursions, in many cases we have to release the netlock in one of
> the callers up the chain.  That makes the interim code run without
> any protection, and we have no way of assessing the situation short
> of code inspection, which is fallible.
> 
> Sprinkling NET_ASSERT_LOCKED() everywhere is possible; however, we
> still end up with NET_LOCK/UNLOCK chains in code that was previously
> running uninterrupted.
> 
> Does this outweigh the desire not to use recursive rwlocks
> anywhere else?

I can see the danger of breaking atomic sections into two parts. However,
having these long critical sections increases the chance of introducing lock
ordering bugs. Suppose code A calls code B, which calls code C. A and C
need the netlock, but what if code B requires some other lock?
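
To make that concrete, here's a hypothetical sketch. None of these
functions exist, and "other_lock" stands in for whatever lock code B
needs; only netlock is real:

#include <sys/rwlock.h>

extern struct rwlock netlock;
struct rwlock other_lock = RWLOCK_INITIALIZER("otherlk");

void code_c(void);			/* needs the netlock */

void
code_b(void)
{
	rw_enter_write(&other_lock);	/* nested inside the netlock */
	/* ... */
	rw_exit_write(&other_lock);
}

void
code_a(void)
{
	rw_enter_write(&netlock);	/* order: netlock -> other_lock */
	code_b();
	code_c();			/* netlock still held */
	rw_exit_write(&netlock);
}

/*
 * A thread that takes other_lock first and then enters the stack
 * acquires the locks in the order other_lock -> netlock; together
 * with code_a() above, that's a classic deadlock.
 */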




Re: Recursive NET_LOCK()

2017-01-05 Thread Martin Pieuchot
On 05/01/17(Thu) 16:09, Mike Belopuhov wrote:
> [...] 
> The big problem with the non-recursive netlock is that to fix
> recursions, in many cases we have to release the netlock in one of
> the callers up the chain.  That makes the interim code run without
> any protection, and we have no way of assessing the situation short
> of code inspection, which is fallible.

No, that's not to fix the recursions.  That's to know where they are.

Nobody came up with a solution for the recursions so far.  I doubt
there's a common solution to all the code paths.  That's something
I'd like to think about in a second step.

The current problem is still unknown; we just know that using chrome
hangs the machine with either flavor of the NET_LOCK().  ajacoutot@
reported additional information: he cannot reproduce the hang when
he uses my scheduler diff with a single process queue.

> Sprinkling NET_ASSERT_LOCKED() everywhere is possible; however, we
> still end up with NET_LOCK/UNLOCK chains in code that was previously
> running uninterrupted.

It is possible, but I don't think it helps, because not everything
needs to be protected by the NET_LOCK().

> Does this outweigh the desire not to use recursive rwlocks
> anywhere else?

It's, in my opinion, too soon to answer this question.  Let's wait
until recursions are the only problem left.



Re: Recursive NET_LOCK()

2017-01-05 Thread Mike Belopuhov
On 3 January 2017 at 20:08, Ted Unangst  wrote:
> Martin Pieuchot wrote:
>> It seems that most of the problems exposed by the introduction of the
>> NET_LOCK() are related to the non-recursive nature of the rwlock.  Some
>> known issues involve pflow(4), cloning interfaces and NFS.
>>
>> Diff below makes use of a recursive-rwlock instead.  I just finished a
>> build on NFS with it, so I'd like to know if it works for your use case.
>
> I don't want to interfere with progress, so if you think this is best, carry
> on.
>
> But I'll note that using recursive locks leads to sloppy locking discipline,
> which is exactly how we find ourselves in this mess today. Need the lock? Not
> sure? Grab it anyway.
>
> The rrwlock was added for the particular use case of vfs locking, which is
> already a recursive cluster fuck, but wasn't supposed to be used for new code.
> Of course, the network stack also qualifies as legacy code, so can't object
> much to using it there either.
>
> I imagine you've already thought of this. :)
>

The big problem with the non-recursive netlock is that to fix
recursions, in many cases we have to release the netlock in one of
the callers up the chain.  That makes the interim code run without
any protection, and we have no way of assessing the situation short
of code inspection, which is fallible.

Sprinkling NET_ASSERT_LOCKED() everywhere is possible; however, we
still end up with NET_LOCK/UNLOCK chains in code that was previously
running uninterrupted.
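
To make the failure mode concrete, here is a hypothetical sketch (not
code from the tree) of what such a fix looks like once the callee has
grown its own lock acquisition:

#include <sys/systm.h>
#include <sys/rwlock.h>

extern struct rwlock netlock;

void callee(void);			/* now takes the netlock itself */

void
caller(void)
{
	rw_enter_write(&netlock);
	/* ... work that needs the lock ... */
	rw_exit_write(&netlock);	/* dropped to avoid recursing */
	callee();			/* takes and releases the netlock */
	rw_enter_write(&netlock);	/* unprotected window ends here */
	NET_ASSERT_LOCKED();
	/* ... code that previously ran uninterrupted ... */
	rw_exit_write(&netlock);
}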

Does this outweigh the desire not to use recursive rwlocks
anywhere else?



Re: Recursive NET_LOCK()

2017-01-04 Thread Alexander Bluhm
On Tue, Jan 03, 2017 at 08:21:58PM +0100, Martin Pieuchot wrote:
> I am also looking for more feedback and/or input, so I appreciate
> your email on the matter.

Together with a coworker, Zaur Molotnikov, we are using static code
analysis to find places where the netlock is needed.  We try to
create possible reverse stack traces:

 |- if_linkstate
|- if_down
   |- pppoe_ioctl
   |- if_clone_destroy
   |- if_downall
   |- if_setrdomain
  |- if_clone_create
  |- ifioctl
 |- soo_ioctl
 |- nfs_boot_init
   |- ifioctl
  |- soo_ioctl
  |- nfs_boot_init
   |- sppp_lcp_down
   |- sppp_keepalive
   |- pppdealloc
  |- pppclose
   |- trunk_port_destroy
  |- trunk_clone_destroy
  |- trunk_port_ifdetach
  |- trunk_ioctl
|- if_up
   |- pppoe_ioctl
   |- ifioctl
  |- soo_ioctl
  |- nfs_boot_init
   |- sppp_ioctl
  |- pppoe_ioctl
   |- sppp_lcp_tlu
   |- loioctl
   |- mpeioctl
   |- gif_ioctl
|- if_linkstate_task

We tell the system that if_linkstate() needs the lock; the leaves
then have to provide it somehow.  As our implementation is at an
early stage, the analysis stops at function pointers.
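
As a minimal sketch of the idea (this is not our actual tool, and the
hand-written graph below is an illustrative subset of the call graph
above, not a claim about the real code), one can walk the reverse call
graph from a function known to need the lock and report every trace
that reaches a root without passing a function that takes it:

#include <stdio.h>

#define MAXDEPTH	16

struct node {
	const char	 *name;
	int		  takes_lock;	/* function acquires the netlock */
	struct node	**callers;	/* NULL-terminated; NULL if root */
};

static void
walk(const struct node *n, const char **path, int depth)
{
	path[depth] = n->name;
	if (n->takes_lock)
		return;			/* this trace is covered */
	if (n->callers == NULL) {
		printf("Lock not found:  [");
		for (int i = 0; i <= depth; i++)
			printf("%sNode: %s", i ? ", " : "", path[i]);
		printf("]\n");
		return;
	}
	for (struct node *const *c = n->callers; *c != NULL; c++)
		if (depth + 1 < MAXDEPTH)
			walk(*c, path, depth + 1);
}

int
main(void)
{
	/* illustrative: pretend ifioctl provides the lock */
	static struct node pppoe_ioctl = { "pppoe_ioctl", 0, NULL };
	static struct node ifioctl = { "ifioctl", 1, NULL };
	static struct node *if_down_callers[] =
	    { &pppoe_ioctl, &ifioctl, NULL };
	static struct node if_down = { "if_down", 0, if_down_callers };
	static struct node *if_linkstate_callers[] = { &if_down, NULL };
	static struct node if_linkstate =
	    { "if_linkstate", 0, if_linkstate_callers };
	const char *path[MAXDEPTH];

	walk(&if_linkstate, path, 0);
	return (0);
}

This prints "Lock not found:  [Node: if_linkstate, Node: if_down,
Node: pppoe_ioctl]" and stays quiet about the ifioctl trace, which is
marked as covered.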

At the end it prints traces where it cannot find where the netlock
is taken.

Analyzing Locks
Lock not found:  [Node: if_linkstate, Node: if_down, Node: pppoe_ioctl]
Lock not found:  [Node: if_linkstate, Node: if_down, Node: sppp_lcp_down]
Lock not found:  [Node: if_linkstate, Node: if_down, Node: sppp_keepalive]
Lock not found:  [Node: if_linkstate, Node: if_down, Node: pppdealloc, Node: pppclose]
Lock not found:  [Node: if_linkstate, Node: if_down, Node: trunk_port_destroy, Node: trunk_clone_destroy]
Lock not found:  [Node: if_linkstate, Node: if_down, Node: trunk_port_destroy, Node: trunk_port_ifdetach]
Lock not found:  [Node: if_linkstate, Node: if_down, Node: trunk_port_destroy, Node: trunk_ioctl]
Lock not found:  [Node: if_linkstate, Node: if_up, Node: pppoe_ioctl]
Lock not found:  [Node: if_linkstate, Node: if_up, Node: sppp_ioctl, Node: pppoe_ioctl]
Lock not found:  [Node: if_linkstate, Node: if_up, Node: sppp_lcp_tlu]
Lock not found:  [Node: if_linkstate, Node: if_up, Node: loioctl]
Lock not found:  [Node: if_linkstate, Node: if_up, Node: mpeioctl]
Lock not found:  [Node: if_linkstate, Node: if_up, Node: gif_ioctl]
Done

This is work in progress; I am not sure whether it will work in the
end.  For now we are concentrating on missing locks, but it could
also find recursive locks.

bluhm



Re: Recursive NET_LOCK()

2017-01-03 Thread Mark Kettenis
> Date: Tue, 3 Jan 2017 17:13:27 +0100
> From: Martin Pieuchot 
> 
> It seems that most of the problems exposed by the introduction of the
> NET_LOCK() are related to the non-recursive nature of the rwlock.  Some
> known issues involve pflow(4), cloning interfaces and NFS.
> 
> Diff below makes use of a recursive-rwlock instead.  I just finished a
> build on NFS with it, so I'd like to know if it works for your use case.
> 
> Please test and report back.

This does not fix the chrome hang that I was seeing.

> Index: kern/kern_rwlock.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_rwlock.c,v
> retrieving revision 1.27
> diff -u -p -r1.27 kern_rwlock.c
> --- kern/kern_rwlock.c	14 Mar 2015 07:33:42 -0000	1.27
> +++ kern/kern_rwlock.c	3 Jan 2017 14:23:56 -0000
> @@ -346,3 +346,23 @@ rrw_status(struct rrwlock *rrwl)
>  {
>   return (rw_status(&rrwl->rrwl_lock));
>  }
> +
> +int
> +rrw_release_all(struct rrwlock *rrwl)
> +{
> + int rv;
> +
> + rv = rrwl->rrwl_wcnt;
> + KASSERT(rv >= 1);
> + rrwl->rrwl_wcnt = 1;
> + rrw_exit(rrwl);
> +
> + return (rv);
> +}
> +
> +void
> +rrw_acquire_count(struct rrwlock *rrwl, int count, int flags)
> +{
> + while (count--)
> + rrw_enter(rrwl, flags);
> +}
> Index: kern/kern_synch.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_synch.c,v
> retrieving revision 1.135
> diff -u -p -r1.135 kern_synch.c
> --- kern/kern_synch.c	13 Sep 2016 08:32:44 -0000	1.135
> +++ kern/kern_synch.c	3 Jan 2017 14:07:23 -0000
> @@ -231,27 +231,28 @@ msleep(const volatile void *ident, struc
>   * entered the sleep queue we drop the lock. After sleeping we re-lock.
>   */
>  int
> -rwsleep(const volatile void *ident, struct rwlock *wl, int priority,
> +rrwsleep(const volatile void *ident, struct rrwlock *rrwl, int priority,
>  const char *wmesg, int timo)
>  {
>   struct sleep_state sls;
>   int error, error1;
> + int count;
>  
>   KASSERT((priority & ~(PRIMASK | PCATCH | PNORELOCK)) == 0);
> - rw_assert_wrlock(wl);
> + KASSERT(rrw_status(rrwl) == RW_WRITE);
>  
>   sleep_setup(&sls, ident, priority, wmesg);
>   sleep_setup_timeout(&sls, timo);
>   sleep_setup_signal(&sls, priority);
>  
> - rw_exit_write(wl);
> + count = rrw_release_all(rrwl);
>  
>   sleep_finish(&sls, 1);
>   error1 = sleep_finish_timeout(&sls);
>   error = sleep_finish_signal(&sls);
>  
>   if ((priority & PNORELOCK) == 0)
> - rw_enter_write(wl);
> + rrw_acquire_count(rrwl, count, RW_WRITE);
>  
>   /* Signal errors are higher priority than timeouts. */
>   if (error == 0 && error1 != 0)
> Index: kern/uipc_socket.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/uipc_socket.c,v
> retrieving revision 1.171
> diff -u -p -r1.171 uipc_socket.c
> --- kern/uipc_socket.c	29 Dec 2016 12:12:43 -0000	1.171
> +++ kern/uipc_socket.c	3 Jan 2017 14:04:42 -0000
> @@ -256,7 +256,7 @@ soclose(struct socket *so)
>   (so->so_state & SS_NBIO))
>   goto drop;
>   while (so->so_state & SS_ISCONNECTED) {
> - error = tsleep(&so->so_timeo,
> + error = rrwsleep(&so->so_timeo, &netlock,
>   PSOCK | PCATCH, "netcls",
>   so->so_linger * hz);
>   if (error)
> @@ -1039,7 +1039,7 @@ sorflush(struct socket *so)
>   struct sockbuf asb;
>  
>   sb->sb_flags |= SB_NOINTR;
> - (void) sblock(sb, M_WAITOK, NULL);
> + (void) sblock(sb, M_WAITOK, &netlock);
>   socantrcvmore(so);
>   sbunlock(sb);
>   asb = *sb;
> Index: kern/uipc_socket2.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/uipc_socket2.c,v
> retrieving revision 1.70
> diff -u -p -r1.70 uipc_socket2.c
> --- kern/uipc_socket2.c	29 Dec 2016 12:12:43 -0000	1.70
> +++ kern/uipc_socket2.c	3 Jan 2017 14:06:54 -0000
> @@ -53,7 +53,7 @@ u_long	sb_max = SB_MAX;	/* patchable */
>  extern struct pool mclpools[];
>  extern struct pool mbpool;
>  
> -int sbsleep(struct sockbuf *, struct rwlock *);
> +int sbsleep(struct sockbuf *, struct rrwlock *);
>  
>  /*
>   * Procedures to manipulate state flags of socket
> @@ -276,18 +276,18 @@ sbwait(struct sockbuf *sb)
>   NET_ASSERT_LOCKED();
>  
>   sb->sb_flagsintr |= SB_WAIT;
> - return (tsleep(&sb->sb_cc,
> + return (rrwsleep(&sb->sb_cc, &netlock,
>   (sb->sb_flags & SB_NOINTR) ? PSOCK : PSOCK | PCATCH, "netio",
>   sb->sb_timeo));
>  }
>  
>  int
> -sbsleep(struct sockbuf *sb, struct rwlock *lock)
> +sbsleep(struct sockbuf *sb, struct rrwlock *rrwl)
>  {
>   int error, prio = (sb->sb_flags & SB_NOINTR) ? PSOCK : 

Re: Recursive NET_LOCK()

2017-01-03 Thread Martin Pieuchot
On 03/01/17(Tue) 14:08, Ted Unangst wrote:
> Martin Pieuchot wrote:
> > It seems that most of the problems exposed by the introduction of the
> > NET_LOCK() are related to the non-recursive nature of the rwlock.  Some
> > known issues involve pflow(4), cloning interfaces and NFS.
> > 
> > Diff below makes use of a recursive-rwlock instead.  I just finished a
> > build on NFS with it, so I'd like to know if it works for your use case.
> 
> I don't want to interfere with progress, so if you think this is best, carry
> on.
> 
> But I'll note that using recursive locks leads to sloppy locking discipline,
> which is exactly how we find ourselves in this mess today. Need the lock? Not
> sure? Grab it anyway.
> 
> The rrwlock was added for the particular use case of vfs locking, which is
> already a recursive cluster fuck, but wasn't supposed to be used for new code.
> Of course, the network stack also qualifies as legacy code, so can't object
> much to using it there either.
> 
> I imagine you've already thought of this. :)

I did :) 

To be honest I'm not sure if we should take this direction.  For the
moment I'm trying to learn as much as possible about the problems we're
facing.  If it appears to be just sloppy recursive code then we can
decide what to do with it and when.

I am also looking for more feedback and/or input, so I appreciate
your email on the matter.



Re: Recursive NET_LOCK()

2017-01-03 Thread Ted Unangst
Martin Pieuchot wrote:
> It seems that most of the problems exposed by the introduction of the
> NET_LOCK() are related to the non-recursive nature of the rwlock.  Some
> known issues involve pflow(4), cloning interfaces and NFS.
> 
> Diff below makes use of a recursive-rwlock instead.  I just finished a
> build on NFS with it, so I'd like to know if it works for your use case.

I don't want to interfere with progress, so if you think this is best, carry
on.

But I'll note that using recursive locks leads to sloppy locking discipline,
which is exactly how we find ourselves in this mess today. Need the lock? Not
sure? Grab it anyway.

The rrwlock was added for the particular use case of vfs locking, which is
already a recursive cluster fuck, but wasn't supposed to be used for new code.
Of course, the network stack also qualifies as legacy code, so can't object
much to using it there either.

I imagine you've already thought of this. :)



Re: Recursive NET_LOCK()

2017-01-03 Thread Sebastien Marie
On Tue, Jan 03, 2017 at 05:13:27PM +0100, Martin Pieuchot wrote:
> It seems that most of the problems exposed by the introduction of the
> NET_LOCK() are related to the non-recursive nature of the rwlock.  Some
> known issues involve pflow(4), cloning interfaces and NFS.
> 
> Diff below makes use of a recursive-rwlock instead.  I just finished a
> build on NFS with it, so I'd like to know if it works for your use case.
> 
> Please test and report back.

I started to use it on my gateway (which uses pflow and which
panicked previously).

So far there is no problem, whereas previously a panic occurred
soon after starting.

I will keep it running.

Thanks.
-- 
Sebastien Marie



Recursive NET_LOCK()

2017-01-03 Thread Martin Pieuchot
It seems that most of the problems exposed by the introduction of the
NET_LOCK() are related to the non-recursive nature of the rwlock.  Some
known issues involve pflow(4), cloning interfaces and NFS.

Diff below makes use of a recursive-rwlock instead.  I just finished a
build on NFS with it, so I'd like to know if it works for your use case.

Please test and report back.
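
To illustrate the semantics this buys us, here is a hypothetical
sketch (assuming the netlock becomes the struct rrwlock below; the
sys/systm.h hunk is not shown in this diff): a nested acquisition
just bumps the write-recursion count instead of the thread
deadlocking against itself.

#include <sys/rwlock.h>

struct rrwlock netlock;		/* rrw_init(&netlock, "netlock") at boot */

void
inner(void)
{
	rrw_enter(&netlock, RW_WRITE);	/* rrwl_wcnt 1 -> 2; a plain
					 * rwlock would deadlock here */
	/* ... */
	rrw_exit(&netlock);		/* rrwl_wcnt 2 -> 1 */
}

void
outer(void)
{
	rrw_enter(&netlock, RW_WRITE);	/* rrwl_wcnt 0 -> 1 */
	inner();
	rrw_exit(&netlock);		/* rrwl_wcnt 1 -> 0, released */
}

Sleeping is the one place where the recursion must be undone: the
rrwsleep() below releases every recursive hold with rrw_release_all()
before going to sleep and re-takes the same count with
rrw_acquire_count() afterwards.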

Index: kern/kern_rwlock.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_rwlock.c,v
retrieving revision 1.27
diff -u -p -r1.27 kern_rwlock.c
--- kern/kern_rwlock.c	14 Mar 2015 07:33:42 -0000	1.27
+++ kern/kern_rwlock.c	3 Jan 2017 14:23:56 -0000
@@ -346,3 +346,23 @@ rrw_status(struct rrwlock *rrwl)
 {
	return (rw_status(&rrwl->rrwl_lock));
 }
+
+int
+rrw_release_all(struct rrwlock *rrwl)
+{
+   int rv;
+
+   rv = rrwl->rrwl_wcnt;
+   KASSERT(rv >= 1);
+   rrwl->rrwl_wcnt = 1;
+   rrw_exit(rrwl);
+
+   return (rv);
+}
+
+void
+rrw_acquire_count(struct rrwlock *rrwl, int count, int flags)
+{
+   while (count--)
+   rrw_enter(rrwl, flags);
+}
Index: kern/kern_synch.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.135
diff -u -p -r1.135 kern_synch.c
--- kern/kern_synch.c	13 Sep 2016 08:32:44 -0000	1.135
+++ kern/kern_synch.c	3 Jan 2017 14:07:23 -0000
@@ -231,27 +231,28 @@ msleep(const volatile void *ident, struc
 * entered the sleep queue we drop the lock. After sleeping we re-lock.
  */
 int
-rwsleep(const volatile void *ident, struct rwlock *wl, int priority,
+rrwsleep(const volatile void *ident, struct rrwlock *rrwl, int priority,
 const char *wmesg, int timo)
 {
struct sleep_state sls;
int error, error1;
+   int count;
 
KASSERT((priority & ~(PRIMASK | PCATCH | PNORELOCK)) == 0);
-   rw_assert_wrlock(wl);
+   KASSERT(rrw_status(rrwl) == RW_WRITE);
 
	sleep_setup(&sls, ident, priority, wmesg);
	sleep_setup_timeout(&sls, timo);
	sleep_setup_signal(&sls, priority);
 
-   rw_exit_write(wl);
+   count = rrw_release_all(rrwl);
 
	sleep_finish(&sls, 1);
	error1 = sleep_finish_timeout(&sls);
	error = sleep_finish_signal(&sls);
 
if ((priority & PNORELOCK) == 0)
-   rw_enter_write(wl);
+   rrw_acquire_count(rrwl, count, RW_WRITE);
 
/* Signal errors are higher priority than timeouts. */
if (error == 0 && error1 != 0)
Index: kern/uipc_socket.c
===================================================================
RCS file: /cvs/src/sys/kern/uipc_socket.c,v
retrieving revision 1.171
diff -u -p -r1.171 uipc_socket.c
--- kern/uipc_socket.c	29 Dec 2016 12:12:43 -0000	1.171
+++ kern/uipc_socket.c	3 Jan 2017 14:04:42 -0000
@@ -256,7 +256,7 @@ soclose(struct socket *so)
(so->so_state & SS_NBIO))
goto drop;
while (so->so_state & SS_ISCONNECTED) {
-   error = tsleep(&so->so_timeo,
+   error = rrwsleep(&so->so_timeo, &netlock,
PSOCK | PCATCH, "netcls",
so->so_linger * hz);
if (error)
@@ -1039,7 +1039,7 @@ sorflush(struct socket *so)
struct sockbuf asb;
 
sb->sb_flags |= SB_NOINTR;
-   (void) sblock(sb, M_WAITOK, NULL);
+   (void) sblock(sb, M_WAITOK, &netlock);
socantrcvmore(so);
sbunlock(sb);
asb = *sb;
Index: kern/uipc_socket2.c
===================================================================
RCS file: /cvs/src/sys/kern/uipc_socket2.c,v
retrieving revision 1.70
diff -u -p -r1.70 uipc_socket2.c
--- kern/uipc_socket2.c	29 Dec 2016 12:12:43 -0000	1.70
+++ kern/uipc_socket2.c	3 Jan 2017 14:06:54 -0000
@@ -53,7 +53,7 @@ u_long	sb_max = SB_MAX;	/* patchable */
 extern struct pool mclpools[];
 extern struct pool mbpool;
 
-int sbsleep(struct sockbuf *, struct rwlock *);
+int sbsleep(struct sockbuf *, struct rrwlock *);
 
 /*
  * Procedures to manipulate state flags of socket
@@ -276,18 +276,18 @@ sbwait(struct sockbuf *sb)
NET_ASSERT_LOCKED();
 
sb->sb_flagsintr |= SB_WAIT;
-   return (tsleep(&sb->sb_cc,
+   return (rrwsleep(&sb->sb_cc, &netlock,
(sb->sb_flags & SB_NOINTR) ? PSOCK : PSOCK | PCATCH, "netio",
sb->sb_timeo));
 }
 
 int
-sbsleep(struct sockbuf *sb, struct rwlock *lock)
+sbsleep(struct sockbuf *sb, struct rrwlock *rrwl)
 {
int error, prio = (sb->sb_flags & SB_NOINTR) ? PSOCK : PSOCK | PCATCH;
 
-   if (lock != NULL)
-       error = rwsleep(&sb->sb_flags, lock, prio, "netlck", 0);
+   if (rrwl != NULL)
+       error = rrwsleep(&sb->sb_flags, rrwl, prio, "netlck", 0);
	else
		error = tsleep(&sb->sb_flags, prio, "netlck", 0);
 
@@ -295,7 +295,7 @@ sbsleep(struct sockbuf *sb, struct