Re: Use SMR instead of SRP list in rtsock.c

2022-08-10 Thread Visa Hankala
On Wed, Aug 10, 2022 at 11:08:06AM +0200, Claudio Jeker wrote:
> On Fri, Jul 01, 2022 at 04:03:21PM +, Visa Hankala wrote:
> > On Fri, Jul 01, 2022 at 09:59:11AM +0200, Claudio Jeker wrote:
> > > On Thu, Jun 30, 2022 at 03:46:35PM +, Visa Hankala wrote:
> > > > On Thu, Jun 30, 2022 at 11:51:52AM +0200, Claudio Jeker wrote:
> > > > > After discussing this with mpi@ and jmatthew@ we came to the 
> > > > > conclusion
> > > > > that we need to smr_barrier() before refcnt_finalize() to ensure that 
> > > > > no
> > > > > other CPU is between the SMR_TAILQ_FOREACH, refcnt_take() and
> > > > > smr_read_leave().
> > > > 
> > > > [...]
> > > > 
> > > > > @@ -509,7 +487,8 @@ route_input(struct mbuf *m0, struct sock
> > > > >   return;
> > > > >   }
> > > > >  
> > > > > - SRPL_FOREACH(rop, &sr, &rtptable.rtp_list, rop_list) {
> > > > > + smr_read_enter();
> > > > > + SMR_TAILQ_FOREACH(rop, &rtptable.rtp_list, rop_list) {
> > > > >   /*
> > > > >* If route socket is bound to an address family only 
> > > > > send
> > > > >* messages that match the address family. Address 
> > > > > family
> > > > > @@ -519,7 +498,8 @@ route_input(struct mbuf *m0, struct sock
> > > > >   rop->rop_proto != sa_family)
> > > > >   continue;
> > > > >  
> > > > > -
> > > > > + refcnt_take(&rop->rop_refcnt);
> > > > > + smr_read_leave();
> > > > >   so = rop->rop_socket;
> > > > >   solock(so);
> > > > >  
> > > > > @@ -579,8 +559,10 @@ route_input(struct mbuf *m0, struct sock
> > > > >   rtm_sendup(so, m);
> > > > >  next:
> > > > >   sounlock(so);
> > > > > + smr_read_enter();
> > > > > + refcnt_rele_wake(&rop->rop_refcnt);
> > > > 
> > > > This does not look correct.
> > > > 
> > > > smr_barrier() can proceed after smr_read_leave(), so refcnt_rele_wake()
> > > > might drop the final reference and this thread can no longer access
> > > > rop safely (SMR_TAILQ_NEXT() inside SMR_TAILQ_FOREACH()).
> > > > 
> > > > Also, SMR_TAILQ_NEXT() of rop becomes potentially dangling after
> > > > smr_read_leave(). After this thread leaves the read-side critical
> > > > section, another thread might free rop's successor.
> > > 
> > > So we need to either smr_barrier() before and after the refcnt_finalize()
> > > to make sure that the rop pointer remains stable in both cases or we alter
> > > the SMR_TAILQ_FOREACH() loop so that SMR_TAILQ_NEXT can be grabbed before
> > > refcnt_rele_wake().
> > > 
> > > While the double smr_barrier() is trivial it is not ideal and I think it
> > > is better to adjust the loop since SMR loops with sleep points is a
> > > somewhat common issue and so we should have a good clear way on how to
> > > solve it.
> > 
> > Adjusting SMR_TAILQ_FOREACH() will not help.
> > 
> > In general, a reader cannot resume a lockless iteration after it has
> > left the read-side critical section and crossed a sleep point. The
> > guarantee of consistent(-looking) forward linkage is gone. The reader
> > no longer knows if the value of SMR_TAILQ_NEXT() is valid. If the
> > reader wants to continue with the list, it has to re-enter the read-side
> > critical section and restart the iteration.
> 
> This is not a real SMR_TAILQ_FOREACH() use case so trying to use
> SMR_TAILQ_FOREACH() here is not right. The code wants to walk the list of
> route pcbs linked via rop_list. The code just needs to walk all active
> connections and does not care about races with sockets that are concurrently
> closed or opened. In the first case SS_CANTRCVMORE will be set and the
> socket is skipped and in the second case the socket is simply ignored
> because new sockets are inserted at the head of the list.
>  
> It is not a lockless iteration over the full list. It is not required to
> be either. The only thing that matters is that the forward linkage is
> consitent (not pointing to invalid objects).
> 
> There is no need to restart the iteration because element on the list can
> not be reinserted. They can only be removed and a removed element does not
> get any message anyway (either by not visiting the object or by skipping
> it in the loop).
> 
> The refcnt ensures that the currently used pcb is not freed before the
> next element is picked. As long as the refcnt is hold the object can't be
> removed.

Lets assume that another thread begins to detach rop while
route_input() sleeps. The reference prevents early freeing of rop.

Lets assume further that yet another thread detaches and frees the
successor of rop while the first thread is still sleeping. What will
SMR_LIST_NEXT(rop, rop_list) point to?



Re: Use SMR instead of SRP list in rtsock.c

2022-08-10 Thread Claudio Jeker
On Fri, Jul 01, 2022 at 04:03:21PM +, Visa Hankala wrote:
> On Fri, Jul 01, 2022 at 09:59:11AM +0200, Claudio Jeker wrote:
> > On Thu, Jun 30, 2022 at 03:46:35PM +, Visa Hankala wrote:
> > > On Thu, Jun 30, 2022 at 11:51:52AM +0200, Claudio Jeker wrote:
> > > > After discussing this with mpi@ and jmatthew@ we came to the conclusion
> > > > that we need to smr_barrier() before refcnt_finalize() to ensure that no
> > > > other CPU is between the SMR_TAILQ_FOREACH, refcnt_take() and
> > > > smr_read_leave().
> > > 
> > > [...]
> > > 
> > > > @@ -509,7 +487,8 @@ route_input(struct mbuf *m0, struct sock
> > > > return;
> > > > }
> > > >  
> > > > -   SRPL_FOREACH(rop, &sr, &rtptable.rtp_list, rop_list) {
> > > > +   smr_read_enter();
> > > > +   SMR_TAILQ_FOREACH(rop, &rtptable.rtp_list, rop_list) {
> > > > /*
> > > >  * If route socket is bound to an address family only 
> > > > send
> > > >  * messages that match the address family. Address 
> > > > family
> > > > @@ -519,7 +498,8 @@ route_input(struct mbuf *m0, struct sock
> > > > rop->rop_proto != sa_family)
> > > > continue;
> > > >  
> > > > -
> > > > +   refcnt_take(&rop->rop_refcnt);
> > > > +   smr_read_leave();
> > > > so = rop->rop_socket;
> > > > solock(so);
> > > >  
> > > > @@ -579,8 +559,10 @@ route_input(struct mbuf *m0, struct sock
> > > > rtm_sendup(so, m);
> > > >  next:
> > > > sounlock(so);
> > > > +   smr_read_enter();
> > > > +   refcnt_rele_wake(&rop->rop_refcnt);
> > > 
> > > This does not look correct.
> > > 
> > > smr_barrier() can proceed after smr_read_leave(), so refcnt_rele_wake()
> > > might drop the final reference and this thread can no longer access
> > > rop safely (SMR_TAILQ_NEXT() inside SMR_TAILQ_FOREACH()).
> > > 
> > > Also, SMR_TAILQ_NEXT() of rop becomes potentially dangling after
> > > smr_read_leave(). After this thread leaves the read-side critical
> > > section, another thread might free rop's successor.
> > 
> > So we need to either smr_barrier() before and after the refcnt_finalize()
> > to make sure that the rop pointer remains stable in both cases or we alter
> > the SMR_TAILQ_FOREACH() loop so that SMR_TAILQ_NEXT can be grabbed before
> > refcnt_rele_wake().
> > 
> > While the double smr_barrier() is trivial it is not ideal and I think it
> > is better to adjust the loop since SMR loops with sleep points is a
> > somewhat common issue and so we should have a good clear way on how to
> > solve it.
> 
> Adjusting SMR_TAILQ_FOREACH() will not help.
> 
> In general, a reader cannot resume a lockless iteration after it has
> left the read-side critical section and crossed a sleep point. The
> guarantee of consistent(-looking) forward linkage is gone. The reader
> no longer knows if the value of SMR_TAILQ_NEXT() is valid. If the
> reader wants to continue with the list, it has to re-enter the read-side
> critical section and restart the iteration.

This is not a real SMR_TAILQ_FOREACH() use case so trying to use
SMR_TAILQ_FOREACH() here is not right. The code wants to walk the list of
route pcbs linked via rop_list. The code just needs to walk all active
connections and does not care about races with sockets that are concurrently
closed or opened. In the first case SS_CANTRCVMORE will be set and the
socket is skipped and in the second case the socket is simply ignored
because new sockets are inserted at the head of the list.
 
It is not a lockless iteration over the full list. It is not required to
be either. The only thing that matters is that the forward linkage is
consitent (not pointing to invalid objects).

There is no need to restart the iteration because element on the list can
not be reinserted. They can only be removed and a removed element does not
get any message anyway (either by not visiting the object or by skipping
it in the loop).

The refcnt ensures that the currently used pcb is not freed before the
next element is picked. As long as the refcnt is hold the object can't be
removed.
 
> I guess I should finish the sleepable variant of SMR that I was
> tinkering with long ago...

I switched from SMR_TAILQ to SMR_LIST since there is no need to access the
tail. Also the code uses just SMR_LIST_FIRST() and SMR_LIST_NEXT() and not
SMR_LIST_FOREACH().

-- 
:wq Claudio

Index: rtsock.c
===
RCS file: /cvs/src/sys/net/rtsock.c,v
retrieving revision 1.334
diff -u -p -r1.334 rtsock.c
--- rtsock.c28 Jun 2022 10:01:13 -  1.334
+++ rtsock.c10 Aug 2022 08:43:31 -
@@ -71,7 +71,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 #include 
 #include 
@@ -107,8 +107,6 @@ struct walkarg {
 };
 
 void   route_prinit(void);
-void   rcb_ref(void *, void *);
-void   rcb_unref(void *, voi

Re: Use SMR instead of SRP list in rtsock.c

2022-07-01 Thread Visa Hankala
On Fri, Jul 01, 2022 at 09:59:11AM +0200, Claudio Jeker wrote:
> On Thu, Jun 30, 2022 at 03:46:35PM +, Visa Hankala wrote:
> > On Thu, Jun 30, 2022 at 11:51:52AM +0200, Claudio Jeker wrote:
> > > After discussing this with mpi@ and jmatthew@ we came to the conclusion
> > > that we need to smr_barrier() before refcnt_finalize() to ensure that no
> > > other CPU is between the SMR_TAILQ_FOREACH, refcnt_take() and
> > > smr_read_leave().
> > 
> > [...]
> > 
> > > @@ -509,7 +487,8 @@ route_input(struct mbuf *m0, struct sock
> > >   return;
> > >   }
> > >  
> > > - SRPL_FOREACH(rop, &sr, &rtptable.rtp_list, rop_list) {
> > > + smr_read_enter();
> > > + SMR_TAILQ_FOREACH(rop, &rtptable.rtp_list, rop_list) {
> > >   /*
> > >* If route socket is bound to an address family only send
> > >* messages that match the address family. Address family
> > > @@ -519,7 +498,8 @@ route_input(struct mbuf *m0, struct sock
> > >   rop->rop_proto != sa_family)
> > >   continue;
> > >  
> > > -
> > > + refcnt_take(&rop->rop_refcnt);
> > > + smr_read_leave();
> > >   so = rop->rop_socket;
> > >   solock(so);
> > >  
> > > @@ -579,8 +559,10 @@ route_input(struct mbuf *m0, struct sock
> > >   rtm_sendup(so, m);
> > >  next:
> > >   sounlock(so);
> > > + smr_read_enter();
> > > + refcnt_rele_wake(&rop->rop_refcnt);
> > 
> > This does not look correct.
> > 
> > smr_barrier() can proceed after smr_read_leave(), so refcnt_rele_wake()
> > might drop the final reference and this thread can no longer access
> > rop safely (SMR_TAILQ_NEXT() inside SMR_TAILQ_FOREACH()).
> > 
> > Also, SMR_TAILQ_NEXT() of rop becomes potentially dangling after
> > smr_read_leave(). After this thread leaves the read-side critical
> > section, another thread might free rop's successor.
> 
> So we need to either smr_barrier() before and after the refcnt_finalize()
> to make sure that the rop pointer remains stable in both cases or we alter
> the SMR_TAILQ_FOREACH() loop so that SMR_TAILQ_NEXT can be grabbed before
> refcnt_rele_wake().
> 
> While the double smr_barrier() is trivial it is not ideal and I think it
> is better to adjust the loop since SMR loops with sleep points is a
> somewhat common issue and so we should have a good clear way on how to
> solve it.

Adjusting SMR_TAILQ_FOREACH() will not help.

In general, a reader cannot resume a lockless iteration after it has
left the read-side critical section and crossed a sleep point. The
guarantee of consistent(-looking) forward linkage is gone. The reader
no longer knows if the value of SMR_TAILQ_NEXT() is valid. If the
reader wants to continue with the list, it has to re-enter the read-side
critical section and restart the iteration.


I guess I should finish the sleepable variant of SMR that I was
tinkering with long ago...



Re: Use SMR instead of SRP list in rtsock.c

2022-07-01 Thread Claudio Jeker
On Thu, Jun 30, 2022 at 03:46:35PM +, Visa Hankala wrote:
> On Thu, Jun 30, 2022 at 11:51:52AM +0200, Claudio Jeker wrote:
> > After discussing this with mpi@ and jmatthew@ we came to the conclusion
> > that we need to smr_barrier() before refcnt_finalize() to ensure that no
> > other CPU is between the SMR_TAILQ_FOREACH, refcnt_take() and
> > smr_read_leave().
> 
> [...]
> 
> > @@ -509,7 +487,8 @@ route_input(struct mbuf *m0, struct sock
> > return;
> > }
> >  
> > -   SRPL_FOREACH(rop, &sr, &rtptable.rtp_list, rop_list) {
> > +   smr_read_enter();
> > +   SMR_TAILQ_FOREACH(rop, &rtptable.rtp_list, rop_list) {
> > /*
> >  * If route socket is bound to an address family only send
> >  * messages that match the address family. Address family
> > @@ -519,7 +498,8 @@ route_input(struct mbuf *m0, struct sock
> > rop->rop_proto != sa_family)
> > continue;
> >  
> > -
> > +   refcnt_take(&rop->rop_refcnt);
> > +   smr_read_leave();
> > so = rop->rop_socket;
> > solock(so);
> >  
> > @@ -579,8 +559,10 @@ route_input(struct mbuf *m0, struct sock
> > rtm_sendup(so, m);
> >  next:
> > sounlock(so);
> > +   smr_read_enter();
> > +   refcnt_rele_wake(&rop->rop_refcnt);
> 
> This does not look correct.
> 
> smr_barrier() can proceed after smr_read_leave(), so refcnt_rele_wake()
> might drop the final reference and this thread can no longer access
> rop safely (SMR_TAILQ_NEXT() inside SMR_TAILQ_FOREACH()).
> 
> Also, SMR_TAILQ_NEXT() of rop becomes potentially dangling after
> smr_read_leave(). After this thread leaves the read-side critical
> section, another thread might free rop's successor.

So we need to either smr_barrier() before and after the refcnt_finalize()
to make sure that the rop pointer remains stable in both cases or we alter
the SMR_TAILQ_FOREACH() loop so that SMR_TAILQ_NEXT can be grabbed before
refcnt_rele_wake().

While the double smr_barrier() is trivial it is not ideal and I think it
is better to adjust the loop since SMR loops with sleep points is a
somewhat common issue and so we should have a good clear way on how to
solve it.

I'll send out a new diff next week
-- 
:wq Claudio



Re: Use SMR instead of SRP list in rtsock.c

2022-06-30 Thread Visa Hankala
On Thu, Jun 30, 2022 at 11:51:52AM +0200, Claudio Jeker wrote:
> After discussing this with mpi@ and jmatthew@ we came to the conclusion
> that we need to smr_barrier() before refcnt_finalize() to ensure that no
> other CPU is between the SMR_TAILQ_FOREACH, refcnt_take() and
> smr_read_leave().

[...]

> @@ -509,7 +487,8 @@ route_input(struct mbuf *m0, struct sock
>   return;
>   }
>  
> - SRPL_FOREACH(rop, &sr, &rtptable.rtp_list, rop_list) {
> + smr_read_enter();
> + SMR_TAILQ_FOREACH(rop, &rtptable.rtp_list, rop_list) {
>   /*
>* If route socket is bound to an address family only send
>* messages that match the address family. Address family
> @@ -519,7 +498,8 @@ route_input(struct mbuf *m0, struct sock
>   rop->rop_proto != sa_family)
>   continue;
>  
> -
> + refcnt_take(&rop->rop_refcnt);
> + smr_read_leave();
>   so = rop->rop_socket;
>   solock(so);
>  
> @@ -579,8 +559,10 @@ route_input(struct mbuf *m0, struct sock
>   rtm_sendup(so, m);
>  next:
>   sounlock(so);
> + smr_read_enter();
> + refcnt_rele_wake(&rop->rop_refcnt);

This does not look correct.

smr_barrier() can proceed after smr_read_leave(), so refcnt_rele_wake()
might drop the final reference and this thread can no longer access
rop safely (SMR_TAILQ_NEXT() inside SMR_TAILQ_FOREACH()).

Also, SMR_TAILQ_NEXT() of rop becomes potentially dangling after
smr_read_leave(). After this thread leaves the read-side critical
section, another thread might free rop's successor.



Re: Use SMR instead of SRP list in rtsock.c

2022-06-30 Thread Vitaliy Makkoveev
On Thu, Jun 30, 2022 at 02:32:03PM +0200, Claudio Jeker wrote:
> On Thu, Jun 30, 2022 at 03:21:40PM +0300, Vitaliy Makkoveev wrote:
> > On Thu, Jun 30, 2022 at 11:56:55AM +0200, Claudio Jeker wrote:
> > > On Thu, Jun 30, 2022 at 12:34:33PM +0300, Vitaliy Makkoveev wrote:
> > > > On Thu, Jun 30, 2022 at 11:08:48AM +0200, Claudio Jeker wrote:
> > > > > This diff converts the SRP list to a SMR list in rtsock.c
> > > > > SRP is a bit strange with how it works and the SMR code is a bit 
> > > > > easier to
> > > > > understand. Since we can sleep in the SMR_TAILQ_FOREACH() we need to 
> > > > > grab
> > > > > a refcount on the route pcb so that we can leave the SMR critical 
> > > > > section
> > > > > and then enter the SMR critical section at the end of the loop before
> > > > > dropping the refcount again.
> > > > > 
> > > > > The diff does not immeditaly explode but I doubt we can exploit
> > > > > parallelism in route_input() so this may fail at some later stage if 
> > > > > it is
> > > > > wrong.
> > > > > 
> > > > > Comments from the lock critics welcome
> > > > 
> > > > We use `so_lock' rwlock(9) to protect route domain sockets. We can't
> > > > convert this SRP list to SMR list because we call solock() within
> > > > foreach loop.
> > > 
> > > because of the so_lock the code uses a refcnt on the route pcb to make
> > > sure that the object is not freed while we sleep. So that is handled by
> > > this diff.
> > >  
> > > > We can easily crash kernel by running in parallel some "route monitor"
> > > > commands and "while true; ifconfig vether0 create ; ifconfig vether0
> > > > destroy; done".
> > > 
> > > That does not cause problem on my system.
> > >  
> > 
> > Sorry, I missed you leave SMR section before solock() call:
> > 
> > > > > @@ -519,7 +498,8 @@ route_input(struct mbuf *m0, struct sock
> > > > >   rop->rop_proto != sa_family)
> > > > >   continue;
> > > > >  
> > > > > -
> > > > > + refcnt_take(&rop->rop_refcnt);
> > > > > + smr_read_leave();
> > > > >   so = rop->rop_socket;
> > > > >   solock(so);
> > > > >  
> > 
> > My system is stable with the second version of this diff and the
> > following test.
> 
> Cool but I fear that the kernel still synchronizes somewhere and we don't
> really get full concurrency here. So I'm not sure if this is able to
> trigger a reference bug.
>  

route_input() and route_{attach,detach}() are fully asynchronous, except
the `rtp_lk' rwlock(9). When called from route_output() path,
route_input() is not kernel or net locked. We can add some such threads
if we want more concurrency.



Re: Use SMR instead of SRP list in rtsock.c

2022-06-30 Thread Claudio Jeker
On Thu, Jun 30, 2022 at 03:21:40PM +0300, Vitaliy Makkoveev wrote:
> On Thu, Jun 30, 2022 at 11:56:55AM +0200, Claudio Jeker wrote:
> > On Thu, Jun 30, 2022 at 12:34:33PM +0300, Vitaliy Makkoveev wrote:
> > > On Thu, Jun 30, 2022 at 11:08:48AM +0200, Claudio Jeker wrote:
> > > > This diff converts the SRP list to a SMR list in rtsock.c
> > > > SRP is a bit strange with how it works and the SMR code is a bit easier 
> > > > to
> > > > understand. Since we can sleep in the SMR_TAILQ_FOREACH() we need to 
> > > > grab
> > > > a refcount on the route pcb so that we can leave the SMR critical 
> > > > section
> > > > and then enter the SMR critical section at the end of the loop before
> > > > dropping the refcount again.
> > > > 
> > > > The diff does not immeditaly explode but I doubt we can exploit
> > > > parallelism in route_input() so this may fail at some later stage if it 
> > > > is
> > > > wrong.
> > > > 
> > > > Comments from the lock critics welcome
> > > 
> > > We use `so_lock' rwlock(9) to protect route domain sockets. We can't
> > > convert this SRP list to SMR list because we call solock() within
> > > foreach loop.
> > 
> > because of the so_lock the code uses a refcnt on the route pcb to make
> > sure that the object is not freed while we sleep. So that is handled by
> > this diff.
> >  
> > > We can easily crash kernel by running in parallel some "route monitor"
> > > commands and "while true; ifconfig vether0 create ; ifconfig vether0
> > > destroy; done".
> > 
> > That does not cause problem on my system.
> >  
> 
> Sorry, I missed you leave SMR section before solock() call:
> 
> > > > @@ -519,7 +498,8 @@ route_input(struct mbuf *m0, struct sock
> > > > rop->rop_proto != sa_family)
> > > > continue;
> > > >  
> > > > -
> > > > +   refcnt_take(&rop->rop_refcnt);
> > > > +   smr_read_leave();
> > > > so = rop->rop_socket;
> > > > solock(so);
> > > >  
> 
> My system is stable with the second version of this diff and the
> following test.

Cool but I fear that the kernel still synchronizes somewhere and we don't
really get full concurrency here. So I'm not sure if this is able to
trigger a reference bug.
 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> 
> static struct ifreq ifr;
> 
> static void *clone(void *arg)
> {
>   int s;
> 
>   if ((s = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
>   err(1, "socket");
>   while (1) {
>   if (ioctl(s, SIOCIFCREATE, &ifr) < 0)
>   if (errno == EINVAL)
>   err(1, "ioctl");
>   if(ioctl(s, SIOCIFDESTROY, &ifr)<0)
>   if(errno == EINVAL)
>   err(1, "ioctl");
>   }
> 
>   return NULL;
> }
> 
> static void *rtsock(void *arg)
> {
>   int s;
> 
>   while (1){
>   if ((s = socket(AF_ROUTE, SOCK_RAW, 0)) < 0)
>   err(1, "socket");
>   close(s);
>   }
> 
>   return NULL;
> }
> 
> int main(int argc, char *argv[])
> {
>   pthread_t thr;
>   int i;
> 
>   if( argc != 2) {
>   fprintf(stderr, "usage: %s ifname\n", getprogname());
>   return 1;
>   }
> 
>   if (getuid() != 0) {
>   fprintf(stderr, "should be root\n");
>   return 1;
>   }
> 
>   memset(&ifr, 0, sizeof(ifr));
>   strlcpy(ifr.ifr_name, argv[1], sizeof(ifr.ifr_name));
> 
>   for(i = 0; i < 8 * 4; ++i) {
>   if (pthread_create(&thr, NULL, clone, NULL)!=0)
>   errx(1, "pthread_create");
>   }
> 
>   for (i = 0; i < 8 * 4; ++i) {
>   if (pthread_create(&thr, NULL, rtsock, NULL) != 0)
>   errx(1, "pthread_create");
>   }
> 
>   select(0, NULL, NULL, NULL, NULL);
> 
>   return 0;
> }
> 

-- 
:wq Claudio



Re: Use SMR instead of SRP list in rtsock.c

2022-06-30 Thread Vitaliy Makkoveev
On Thu, Jun 30, 2022 at 11:51:52AM +0200, Claudio Jeker wrote:
> On Thu, Jun 30, 2022 at 11:08:48AM +0200, Claudio Jeker wrote:
> > This diff converts the SRP list to a SMR list in rtsock.c
> > SRP is a bit strange with how it works and the SMR code is a bit easier to
> > understand. Since we can sleep in the SMR_TAILQ_FOREACH() we need to grab
> > a refcount on the route pcb so that we can leave the SMR critical section
> > and then enter the SMR critical section at the end of the loop before
> > dropping the refcount again.
> > 
> > The diff does not immeditaly explode but I doubt we can exploit
> > parallelism in route_input() so this may fail at some later stage if it is
> > wrong.
> > 
> > Comments from the lock critics welcome
> 
> After discussing this with mpi@ and jmatthew@ we came to the conclusion
> that we need to smr_barrier() before refcnt_finalize() to ensure that no
> other CPU is between the SMR_TAILQ_FOREACH, refcnt_take() and
> smr_read_leave().
> 
> Updated diff below

ok by me

> -- 
> :wq Claudio
> 
> Index: net/rtsock.c
> ===
> RCS file: /cvs/src/sys/net/rtsock.c,v
> retrieving revision 1.334
> diff -u -p -r1.334 rtsock.c
> --- net/rtsock.c  28 Jun 2022 10:01:13 -  1.334
> +++ net/rtsock.c  30 Jun 2022 09:25:53 -
> @@ -71,7 +71,7 @@
>  #include 
>  #include 
>  #include 
> -#include 
> +#include 
>  
>  #include 
>  #include 
> @@ -107,8 +107,6 @@ struct walkarg {
>  };
>  
>  void route_prinit(void);
> -void rcb_ref(void *, void *);
> -void rcb_unref(void *, void *);
>  int  route_output(struct mbuf *, struct socket *, struct sockaddr *,
>   struct mbuf *);
>  int  route_ctloutput(int, struct socket *, int, int, struct mbuf *);
> @@ -149,7 +147,7 @@ intrt_setsource(unsigned int, struct 
>  struct rtpcb {
>   struct socket   *rop_socket;/* [I] */
>  
> - SRPL_ENTRY(rtpcb)   rop_list;
> + SMR_TAILQ_ENTRY(rtpcb)  rop_list;
>   struct refcnt   rop_refcnt;
>   struct timeout  rop_timeout;
>   unsigned introp_msgfilter;  /* [s] */
> @@ -162,8 +160,7 @@ struct rtpcb {
>  #define  sotortpcb(so)   ((struct rtpcb *)(so)->so_pcb)
>  
>  struct rtptable {
> - SRPL_HEAD(, rtpcb)  rtp_list;
> - struct srpl_rc  rtp_rc;
> + SMR_TAILQ_HEAD(, rtpcb) rtp_list;
>   struct rwlock   rtp_lk;
>   unsigned intrtp_count;
>  };
> @@ -185,29 +182,12 @@ struct rtptable rtptable;
>  void
>  route_prinit(void)
>  {
> - srpl_rc_init(&rtptable.rtp_rc, rcb_ref, rcb_unref, NULL);
>   rw_init(&rtptable.rtp_lk, "rtsock");
> - SRPL_INIT(&rtptable.rtp_list);
> + SMR_TAILQ_INIT(&rtptable.rtp_list);
>   pool_init(&rtpcb_pool, sizeof(struct rtpcb), 0,
>   IPL_SOFTNET, PR_WAITOK, "rtpcb", NULL);
>  }
>  
> -void
> -rcb_ref(void *null, void *v)
> -{
> - struct rtpcb *rop = v;
> -
> - refcnt_take(&rop->rop_refcnt);
> -}
> -
> -void
> -rcb_unref(void *null, void *v)
> -{
> - struct rtpcb *rop = v;
> -
> - refcnt_rele_wake(&rop->rop_refcnt);
> -}
> -
>  int
>  route_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *nam,
>  struct mbuf *control, struct proc *p)
> @@ -325,8 +305,7 @@ route_attach(struct socket *so, int prot
>   so->so_options |= SO_USELOOPBACK;
>  
>   rw_enter(&rtptable.rtp_lk, RW_WRITE);
> - SRPL_INSERT_HEAD_LOCKED(&rtptable.rtp_rc, &rtptable.rtp_list, rop,
> - rop_list);
> + SMR_TAILQ_INSERT_HEAD_LOCKED(&rtptable.rtp_list, rop, rop_list);
>   rtptable.rtp_count++;
>   rw_exit(&rtptable.rtp_lk);
>  
> @@ -347,13 +326,13 @@ route_detach(struct socket *so)
>   rw_enter(&rtptable.rtp_lk, RW_WRITE);
>  
>   rtptable.rtp_count--;
> - SRPL_REMOVE_LOCKED(&rtptable.rtp_rc, &rtptable.rtp_list, rop, rtpcb,
> - rop_list);
> + SMR_TAILQ_REMOVE_LOCKED(&rtptable.rtp_list, rop, rop_list);
>   rw_exit(&rtptable.rtp_lk);
>  
>   sounlock(so);
>  
>   /* wait for all references to drop */
> + smr_barrier();
>   refcnt_finalize(&rop->rop_refcnt, "rtsockrefs");
>   timeout_del_barrier(&rop->rop_timeout);
>  
> @@ -501,7 +480,6 @@ route_input(struct mbuf *m0, struct sock
>   struct rtpcb *rop;
>   struct rt_msghdr *rtm;
>   struct mbuf *m = m0;
> - struct srp_ref sr;
>  
>   /* ensure that we can access the rtm_type via mtod() */
>   if (m->m_len < offsetof(struct rt_msghdr, rtm_type) + 1) {
> @@ -509,7 +487,8 @@ route_input(struct mbuf *m0, struct sock
>   return;
>   }
>  
> - SRPL_FOREACH(rop, &sr, &rtptable.rtp_list, rop_list) {
> + smr_read_enter();
> + SMR_TAILQ_FOREACH(rop, &rtptable.rtp_list, rop_list) {
>   /*
>* If route socket is bound to an address family only send
>* messages that match the address family. Address family
> @@ -519,7 +498,8 @@ 

Re: Use SMR instead of SRP list in rtsock.c

2022-06-30 Thread Vitaliy Makkoveev
On Thu, Jun 30, 2022 at 11:56:55AM +0200, Claudio Jeker wrote:
> On Thu, Jun 30, 2022 at 12:34:33PM +0300, Vitaliy Makkoveev wrote:
> > On Thu, Jun 30, 2022 at 11:08:48AM +0200, Claudio Jeker wrote:
> > > This diff converts the SRP list to a SMR list in rtsock.c
> > > SRP is a bit strange with how it works and the SMR code is a bit easier to
> > > understand. Since we can sleep in the SMR_TAILQ_FOREACH() we need to grab
> > > a refcount on the route pcb so that we can leave the SMR critical section
> > > and then enter the SMR critical section at the end of the loop before
> > > dropping the refcount again.
> > > 
> > > The diff does not immeditaly explode but I doubt we can exploit
> > > parallelism in route_input() so this may fail at some later stage if it is
> > > wrong.
> > > 
> > > Comments from the lock critics welcome
> > 
> > We use `so_lock' rwlock(9) to protect route domain sockets. We can't
> > convert this SRP list to SMR list because we call solock() within
> > foreach loop.
> 
> because of the so_lock the code uses a refcnt on the route pcb to make
> sure that the object is not freed while we sleep. So that is handled by
> this diff.
>  
> > We can easily crash kernel by running in parallel some "route monitor"
> > commands and "while true; ifconfig vether0 create ; ifconfig vether0
> > destroy; done".
> 
> That does not cause problem on my system.
>  

Sorry, I missed you leave SMR section before solock() call:

> > > @@ -519,7 +498,8 @@ route_input(struct mbuf *m0, struct sock
> > >   rop->rop_proto != sa_family)
> > >   continue;
> > >  
> > > -
> > > + refcnt_take(&rop->rop_refcnt);
> > > + smr_read_leave();
> > >   so = rop->rop_socket;
> > >   solock(so);
> > >  

My system is stable with the second version of this diff and the
following test.

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

static struct ifreq ifr;

static void *clone(void *arg)
{
int s;

if ((s = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
err(1, "socket");
while (1) {
if (ioctl(s, SIOCIFCREATE, &ifr) < 0)
if (errno == EINVAL)
err(1, "ioctl");
if(ioctl(s, SIOCIFDESTROY, &ifr)<0)
if(errno == EINVAL)
err(1, "ioctl");
}

return NULL;
}

static void *rtsock(void *arg)
{
int s;

while (1){
if ((s = socket(AF_ROUTE, SOCK_RAW, 0)) < 0)
err(1, "socket");
close(s);
}

return NULL;
}

int main(int argc, char *argv[])
{
pthread_t thr;
int i;

if( argc != 2) {
fprintf(stderr, "usage: %s ifname\n", getprogname());
return 1;
}

if (getuid() != 0) {
fprintf(stderr, "should be root\n");
return 1;
}

memset(&ifr, 0, sizeof(ifr));
strlcpy(ifr.ifr_name, argv[1], sizeof(ifr.ifr_name));

for(i = 0; i < 8 * 4; ++i) {
if (pthread_create(&thr, NULL, clone, NULL)!=0)
errx(1, "pthread_create");
}

for (i = 0; i < 8 * 4; ++i) {
if (pthread_create(&thr, NULL, rtsock, NULL) != 0)
errx(1, "pthread_create");
}

select(0, NULL, NULL, NULL, NULL);

return 0;
}



Re: Use SMR instead of SRP list in rtsock.c

2022-06-30 Thread Martin Pieuchot
On 30/06/22(Thu) 11:56, Claudio Jeker wrote:
> On Thu, Jun 30, 2022 at 12:34:33PM +0300, Vitaliy Makkoveev wrote:
> > On Thu, Jun 30, 2022 at 11:08:48AM +0200, Claudio Jeker wrote:
> > > This diff converts the SRP list to a SMR list in rtsock.c
> > > SRP is a bit strange with how it works and the SMR code is a bit easier to
> > > understand. Since we can sleep in the SMR_TAILQ_FOREACH() we need to grab
> > > a refcount on the route pcb so that we can leave the SMR critical section
> > > and then enter the SMR critical section at the end of the loop before
> > > dropping the refcount again.
> > > 
> > > The diff does not immeditaly explode but I doubt we can exploit
> > > parallelism in route_input() so this may fail at some later stage if it is
> > > wrong.
> > > 
> > > Comments from the lock critics welcome
> > 
> > We use `so_lock' rwlock(9) to protect route domain sockets. We can't
> > convert this SRP list to SMR list because we call solock() within
> > foreach loop.

We shouldn't use SRP list either, no?  Or are we allowed to sleep
holding a SRP reference?  That's the question that triggered this diff.

> because of the so_lock the code uses a refcnt on the route pcb to make
> sure that the object is not freed while we sleep. So that is handled by
> this diff.
>  
> > We can easily crash kernel by running in parallel some "route monitor"
> > commands and "while true; ifconfig vether0 create ; ifconfig vether0
> > destroy; done".
> 
> That does not cause problem on my system.
>  
> > > -- 
> > > :wq Claudio
> > > 
> > > Index: sys/net/rtsock.c
> > > ===
> > > RCS file: /cvs/src/sys/net/rtsock.c,v
> > > retrieving revision 1.334
> > > diff -u -p -r1.334 rtsock.c
> > > --- sys/net/rtsock.c  28 Jun 2022 10:01:13 -  1.334
> > > +++ sys/net/rtsock.c  30 Jun 2022 08:02:09 -
> > > @@ -71,7 +71,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > -#include 
> > > +#include 
> > >  
> > >  #include 
> > >  #include 
> > > @@ -107,8 +107,6 @@ struct walkarg {
> > >  };
> > >  
> > >  void route_prinit(void);
> > > -void rcb_ref(void *, void *);
> > > -void rcb_unref(void *, void *);
> > >  int  route_output(struct mbuf *, struct socket *, struct sockaddr *,
> > >   struct mbuf *);
> > >  int  route_ctloutput(int, struct socket *, int, int, struct mbuf *);
> > > @@ -149,7 +147,7 @@ intrt_setsource(unsigned int, struct 
> > >  struct rtpcb {
> > >   struct socket   *rop_socket;/* [I] */
> > >  
> > > - SRPL_ENTRY(rtpcb)   rop_list;
> > > + SMR_TAILQ_ENTRY(rtpcb)  rop_list;
> > >   struct refcnt   rop_refcnt;
> > >   struct timeout  rop_timeout;
> > >   unsigned introp_msgfilter;  /* [s] */
> > > @@ -162,8 +160,7 @@ struct rtpcb {
> > >  #define  sotortpcb(so)   ((struct rtpcb *)(so)->so_pcb)
> > >  
> > >  struct rtptable {
> > > - SRPL_HEAD(, rtpcb)  rtp_list;
> > > - struct srpl_rc  rtp_rc;
> > > + SMR_TAILQ_HEAD(, rtpcb) rtp_list;
> > >   struct rwlock   rtp_lk;
> > >   unsigned intrtp_count;
> > >  };
> > > @@ -185,29 +182,12 @@ struct rtptable rtptable;
> > >  void
> > >  route_prinit(void)
> > >  {
> > > - srpl_rc_init(&rtptable.rtp_rc, rcb_ref, rcb_unref, NULL);
> > >   rw_init(&rtptable.rtp_lk, "rtsock");
> > > - SRPL_INIT(&rtptable.rtp_list);
> > > + SMR_TAILQ_INIT(&rtptable.rtp_list);
> > >   pool_init(&rtpcb_pool, sizeof(struct rtpcb), 0,
> > >   IPL_SOFTNET, PR_WAITOK, "rtpcb", NULL);
> > >  }
> > >  
> > > -void
> > > -rcb_ref(void *null, void *v)
> > > -{
> > > - struct rtpcb *rop = v;
> > > -
> > > - refcnt_take(&rop->rop_refcnt);
> > > -}
> > > -
> > > -void
> > > -rcb_unref(void *null, void *v)
> > > -{
> > > - struct rtpcb *rop = v;
> > > -
> > > - refcnt_rele_wake(&rop->rop_refcnt);
> > > -}
> > > -
> > >  int
> > >  route_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf 
> > > *nam,
> > >  struct mbuf *control, struct proc *p)
> > > @@ -325,8 +305,7 @@ route_attach(struct socket *so, int prot
> > >   so->so_options |= SO_USELOOPBACK;
> > >  
> > >   rw_enter(&rtptable.rtp_lk, RW_WRITE);
> > > - SRPL_INSERT_HEAD_LOCKED(&rtptable.rtp_rc, &rtptable.rtp_list, rop,
> > > - rop_list);
> > > + SMR_TAILQ_INSERT_HEAD_LOCKED(&rtptable.rtp_list, rop, rop_list);
> > >   rtptable.rtp_count++;
> > >   rw_exit(&rtptable.rtp_lk);
> > >  
> > > @@ -347,8 +326,7 @@ route_detach(struct socket *so)
> > >   rw_enter(&rtptable.rtp_lk, RW_WRITE);
> > >  
> > >   rtptable.rtp_count--;
> > > - SRPL_REMOVE_LOCKED(&rtptable.rtp_rc, &rtptable.rtp_list, rop, rtpcb,
> > > - rop_list);
> > > + SMR_TAILQ_REMOVE_LOCKED(&rtptable.rtp_list, rop, rop_list);
> > >   rw_exit(&rtptable.rtp_lk);
> > >  
> > >   sounlock(so);
> > > @@ -356,6 +334,7 @@ route_detach(struct socket *so)
> > >   /* wait for all references to drop */
> > >   refcnt_finalize(&rop->rop_refcnt, "rtsockrefs");
> > 

Re: Use SMR instead of SRP list in rtsock.c

2022-06-30 Thread Claudio Jeker
On Thu, Jun 30, 2022 at 12:34:33PM +0300, Vitaliy Makkoveev wrote:
> On Thu, Jun 30, 2022 at 11:08:48AM +0200, Claudio Jeker wrote:
> > This diff converts the SRP list to a SMR list in rtsock.c
> > SRP is a bit strange with how it works and the SMR code is a bit easier to
> > understand. Since we can sleep in the SMR_TAILQ_FOREACH() we need to grab
> > a refcount on the route pcb so that we can leave the SMR critical section
> > and then enter the SMR critical section at the end of the loop before
> > dropping the refcount again.
> > 
> > The diff does not immeditaly explode but I doubt we can exploit
> > parallelism in route_input() so this may fail at some later stage if it is
> > wrong.
> > 
> > Comments from the lock critics welcome
> 
> We use `so_lock' rwlock(9) to protect route domain sockets. We can't
> convert this SRP list to SMR list because we call solock() within
> foreach loop.

because of the so_lock the code uses a refcnt on the route pcb to make
sure that the object is not freed while we sleep. So that is handled by
this diff.
 
> We can easily crash kernel by running in parallel some "route monitor"
> commands and "while true; ifconfig vether0 create ; ifconfig vether0
> destroy; done".

That does not cause problem on my system.
 
> > -- 
> > :wq Claudio
> > 
> > Index: sys/net/rtsock.c
> > ===
> > RCS file: /cvs/src/sys/net/rtsock.c,v
> > retrieving revision 1.334
> > diff -u -p -r1.334 rtsock.c
> > --- sys/net/rtsock.c28 Jun 2022 10:01:13 -  1.334
> > +++ sys/net/rtsock.c30 Jun 2022 08:02:09 -
> > @@ -71,7 +71,7 @@
> >  #include 
> >  #include 
> >  #include 
> > -#include 
> > +#include 
> >  
> >  #include 
> >  #include 
> > @@ -107,8 +107,6 @@ struct walkarg {
> >  };
> >  
> >  void   route_prinit(void);
> > -void   rcb_ref(void *, void *);
> > -void   rcb_unref(void *, void *);
> >  introute_output(struct mbuf *, struct socket *, struct sockaddr *,
> > struct mbuf *);
> >  introute_ctloutput(int, struct socket *, int, int, struct mbuf *);
> > @@ -149,7 +147,7 @@ int  rt_setsource(unsigned int, struct 
> >  struct rtpcb {
> > struct socket   *rop_socket;/* [I] */
> >  
> > -   SRPL_ENTRY(rtpcb)   rop_list;
> > +   SMR_TAILQ_ENTRY(rtpcb)  rop_list;
> > struct refcnt   rop_refcnt;
> > struct timeout  rop_timeout;
> > unsigned introp_msgfilter;  /* [s] */
> > @@ -162,8 +160,7 @@ struct rtpcb {
> >  #definesotortpcb(so)   ((struct rtpcb *)(so)->so_pcb)
> >  
> >  struct rtptable {
> > -   SRPL_HEAD(, rtpcb)  rtp_list;
> > -   struct srpl_rc  rtp_rc;
> > +   SMR_TAILQ_HEAD(, rtpcb) rtp_list;
> > struct rwlock   rtp_lk;
> > unsigned intrtp_count;
> >  };
> > @@ -185,29 +182,12 @@ struct rtptable rtptable;
> >  void
> >  route_prinit(void)
> >  {
> > -   srpl_rc_init(&rtptable.rtp_rc, rcb_ref, rcb_unref, NULL);
> > rw_init(&rtptable.rtp_lk, "rtsock");
> > -   SRPL_INIT(&rtptable.rtp_list);
> > +   SMR_TAILQ_INIT(&rtptable.rtp_list);
> > pool_init(&rtpcb_pool, sizeof(struct rtpcb), 0,
> > IPL_SOFTNET, PR_WAITOK, "rtpcb", NULL);
> >  }
> >  
> > -void
> > -rcb_ref(void *null, void *v)
> > -{
> > -   struct rtpcb *rop = v;
> > -
> > -   refcnt_take(&rop->rop_refcnt);
> > -}
> > -
> > -void
> > -rcb_unref(void *null, void *v)
> > -{
> > -   struct rtpcb *rop = v;
> > -
> > -   refcnt_rele_wake(&rop->rop_refcnt);
> > -}
> > -
> >  int
> >  route_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *nam,
> >  struct mbuf *control, struct proc *p)
> > @@ -325,8 +305,7 @@ route_attach(struct socket *so, int prot
> > so->so_options |= SO_USELOOPBACK;
> >  
> > rw_enter(&rtptable.rtp_lk, RW_WRITE);
> > -   SRPL_INSERT_HEAD_LOCKED(&rtptable.rtp_rc, &rtptable.rtp_list, rop,
> > -   rop_list);
> > +   SMR_TAILQ_INSERT_HEAD_LOCKED(&rtptable.rtp_list, rop, rop_list);
> > rtptable.rtp_count++;
> > rw_exit(&rtptable.rtp_lk);
> >  
> > @@ -347,8 +326,7 @@ route_detach(struct socket *so)
> > rw_enter(&rtptable.rtp_lk, RW_WRITE);
> >  
> > rtptable.rtp_count--;
> > -   SRPL_REMOVE_LOCKED(&rtptable.rtp_rc, &rtptable.rtp_list, rop, rtpcb,
> > -   rop_list);
> > +   SMR_TAILQ_REMOVE_LOCKED(&rtptable.rtp_list, rop, rop_list);
> > rw_exit(&rtptable.rtp_lk);
> >  
> > sounlock(so);
> > @@ -356,6 +334,7 @@ route_detach(struct socket *so)
> > /* wait for all references to drop */
> > refcnt_finalize(&rop->rop_refcnt, "rtsockrefs");
> > timeout_del_barrier(&rop->rop_timeout);
> > +   smr_barrier();
> >  
> > solock(so);
> >  
> > @@ -501,7 +480,6 @@ route_input(struct mbuf *m0, struct sock
> > struct rtpcb *rop;
> > struct rt_msghdr *rtm;
> > struct mbuf *m = m0;
> > -   struct srp_ref sr;
> >  
> > /* ensure that we can access the rtm_type via mtod() */
> >

Re: Use SMR instead of SRP list in rtsock.c

2022-06-30 Thread Claudio Jeker
On Thu, Jun 30, 2022 at 11:08:48AM +0200, Claudio Jeker wrote:
> This diff converts the SRP list to a SMR list in rtsock.c
> SRP is a bit strange with how it works and the SMR code is a bit easier to
> understand. Since we can sleep in the SMR_TAILQ_FOREACH() we need to grab
> a refcount on the route pcb so that we can leave the SMR critical section
> and then enter the SMR critical section at the end of the loop before
> dropping the refcount again.
> 
> The diff does not immeditaly explode but I doubt we can exploit
> parallelism in route_input() so this may fail at some later stage if it is
> wrong.
> 
> Comments from the lock critics welcome

After discussing this with mpi@ and jmatthew@ we came to the conclusion
that we need to smr_barrier() before refcnt_finalize() to ensure that no
other CPU is between the SMR_TAILQ_FOREACH, refcnt_take() and
smr_read_leave().

Updated diff below
-- 
:wq Claudio

Index: net/rtsock.c
===
RCS file: /cvs/src/sys/net/rtsock.c,v
retrieving revision 1.334
diff -u -p -r1.334 rtsock.c
--- net/rtsock.c28 Jun 2022 10:01:13 -  1.334
+++ net/rtsock.c30 Jun 2022 09:25:53 -
@@ -71,7 +71,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 #include 
 #include 
@@ -107,8 +107,6 @@ struct walkarg {
 };
 
 void   route_prinit(void);
-void   rcb_ref(void *, void *);
-void   rcb_unref(void *, void *);
 introute_output(struct mbuf *, struct socket *, struct sockaddr *,
struct mbuf *);
 introute_ctloutput(int, struct socket *, int, int, struct mbuf *);
@@ -149,7 +147,7 @@ int  rt_setsource(unsigned int, struct 
 struct rtpcb {
struct socket   *rop_socket;/* [I] */
 
-   SRPL_ENTRY(rtpcb)   rop_list;
+   SMR_TAILQ_ENTRY(rtpcb)  rop_list;
struct refcnt   rop_refcnt;
struct timeout  rop_timeout;
unsigned introp_msgfilter;  /* [s] */
@@ -162,8 +160,7 @@ struct rtpcb {
 #definesotortpcb(so)   ((struct rtpcb *)(so)->so_pcb)
 
 struct rtptable {
-   SRPL_HEAD(, rtpcb)  rtp_list;
-   struct srpl_rc  rtp_rc;
+   SMR_TAILQ_HEAD(, rtpcb) rtp_list;
struct rwlock   rtp_lk;
unsigned intrtp_count;
 };
@@ -185,29 +182,12 @@ struct rtptable rtptable;
 void
 route_prinit(void)
 {
-   srpl_rc_init(&rtptable.rtp_rc, rcb_ref, rcb_unref, NULL);
rw_init(&rtptable.rtp_lk, "rtsock");
-   SRPL_INIT(&rtptable.rtp_list);
+   SMR_TAILQ_INIT(&rtptable.rtp_list);
pool_init(&rtpcb_pool, sizeof(struct rtpcb), 0,
IPL_SOFTNET, PR_WAITOK, "rtpcb", NULL);
 }
 
-void
-rcb_ref(void *null, void *v)
-{
-   struct rtpcb *rop = v;
-
-   refcnt_take(&rop->rop_refcnt);
-}
-
-void
-rcb_unref(void *null, void *v)
-{
-   struct rtpcb *rop = v;
-
-   refcnt_rele_wake(&rop->rop_refcnt);
-}
-
 int
 route_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *nam,
 struct mbuf *control, struct proc *p)
@@ -325,8 +305,7 @@ route_attach(struct socket *so, int prot
so->so_options |= SO_USELOOPBACK;
 
rw_enter(&rtptable.rtp_lk, RW_WRITE);
-   SRPL_INSERT_HEAD_LOCKED(&rtptable.rtp_rc, &rtptable.rtp_list, rop,
-   rop_list);
+   SMR_TAILQ_INSERT_HEAD_LOCKED(&rtptable.rtp_list, rop, rop_list);
rtptable.rtp_count++;
rw_exit(&rtptable.rtp_lk);
 
@@ -347,13 +326,13 @@ route_detach(struct socket *so)
rw_enter(&rtptable.rtp_lk, RW_WRITE);
 
rtptable.rtp_count--;
-   SRPL_REMOVE_LOCKED(&rtptable.rtp_rc, &rtptable.rtp_list, rop, rtpcb,
-   rop_list);
+   SMR_TAILQ_REMOVE_LOCKED(&rtptable.rtp_list, rop, rop_list);
rw_exit(&rtptable.rtp_lk);
 
sounlock(so);
 
/* wait for all references to drop */
+   smr_barrier();
refcnt_finalize(&rop->rop_refcnt, "rtsockrefs");
timeout_del_barrier(&rop->rop_timeout);
 
@@ -501,7 +480,6 @@ route_input(struct mbuf *m0, struct sock
struct rtpcb *rop;
struct rt_msghdr *rtm;
struct mbuf *m = m0;
-   struct srp_ref sr;
 
/* ensure that we can access the rtm_type via mtod() */
if (m->m_len < offsetof(struct rt_msghdr, rtm_type) + 1) {
@@ -509,7 +487,8 @@ route_input(struct mbuf *m0, struct sock
return;
}
 
-   SRPL_FOREACH(rop, &sr, &rtptable.rtp_list, rop_list) {
+   smr_read_enter();
+   SMR_TAILQ_FOREACH(rop, &rtptable.rtp_list, rop_list) {
/*
 * If route socket is bound to an address family only send
 * messages that match the address family. Address family
@@ -519,7 +498,8 @@ route_input(struct mbuf *m0, struct sock
rop->rop_proto != sa_family)
continue;
 
-
+   refcnt_take(&rop->rop_refcnt);
+   smr_read_leave();
so = rop->rop_soc

Re: Use SMR instead of SRP list in rtsock.c

2022-06-30 Thread Vitaliy Makkoveev
On Thu, Jun 30, 2022 at 11:08:48AM +0200, Claudio Jeker wrote:
> This diff converts the SRP list to a SMR list in rtsock.c
> SRP is a bit strange with how it works and the SMR code is a bit easier to
> understand. Since we can sleep in the SMR_TAILQ_FOREACH() we need to grab
> a refcount on the route pcb so that we can leave the SMR critical section
> and then enter the SMR critical section at the end of the loop before
> dropping the refcount again.
> 
> The diff does not immeditaly explode but I doubt we can exploit
> parallelism in route_input() so this may fail at some later stage if it is
> wrong.
> 
> Comments from the lock critics welcome

We use `so_lock' rwlock(9) to protect route domain sockets. We can't
convert this SRP list to SMR list because we call solock() within
foreach loop.

We can easily crash kernel by running in parallel some "route monitor"
commands and "while true; ifconfig vether0 create ; ifconfig vether0
destroy; done".

> -- 
> :wq Claudio
> 
> Index: sys/net/rtsock.c
> ===
> RCS file: /cvs/src/sys/net/rtsock.c,v
> retrieving revision 1.334
> diff -u -p -r1.334 rtsock.c
> --- sys/net/rtsock.c  28 Jun 2022 10:01:13 -  1.334
> +++ sys/net/rtsock.c  30 Jun 2022 08:02:09 -
> @@ -71,7 +71,7 @@
>  #include 
>  #include 
>  #include 
> -#include 
> +#include 
>  
>  #include 
>  #include 
> @@ -107,8 +107,6 @@ struct walkarg {
>  };
>  
>  void route_prinit(void);
> -void rcb_ref(void *, void *);
> -void rcb_unref(void *, void *);
>  int  route_output(struct mbuf *, struct socket *, struct sockaddr *,
>   struct mbuf *);
>  int  route_ctloutput(int, struct socket *, int, int, struct mbuf *);
> @@ -149,7 +147,7 @@ intrt_setsource(unsigned int, struct 
>  struct rtpcb {
>   struct socket   *rop_socket;/* [I] */
>  
> - SRPL_ENTRY(rtpcb)   rop_list;
> + SMR_TAILQ_ENTRY(rtpcb)  rop_list;
>   struct refcnt   rop_refcnt;
>   struct timeout  rop_timeout;
>   unsigned introp_msgfilter;  /* [s] */
> @@ -162,8 +160,7 @@ struct rtpcb {
>  #define  sotortpcb(so)   ((struct rtpcb *)(so)->so_pcb)
>  
>  struct rtptable {
> - SRPL_HEAD(, rtpcb)  rtp_list;
> - struct srpl_rc  rtp_rc;
> + SMR_TAILQ_HEAD(, rtpcb) rtp_list;
>   struct rwlock   rtp_lk;
>   unsigned intrtp_count;
>  };
> @@ -185,29 +182,12 @@ struct rtptable rtptable;
>  void
>  route_prinit(void)
>  {
> - srpl_rc_init(&rtptable.rtp_rc, rcb_ref, rcb_unref, NULL);
>   rw_init(&rtptable.rtp_lk, "rtsock");
> - SRPL_INIT(&rtptable.rtp_list);
> + SMR_TAILQ_INIT(&rtptable.rtp_list);
>   pool_init(&rtpcb_pool, sizeof(struct rtpcb), 0,
>   IPL_SOFTNET, PR_WAITOK, "rtpcb", NULL);
>  }
>  
> -void
> -rcb_ref(void *null, void *v)
> -{
> - struct rtpcb *rop = v;
> -
> - refcnt_take(&rop->rop_refcnt);
> -}
> -
> -void
> -rcb_unref(void *null, void *v)
> -{
> - struct rtpcb *rop = v;
> -
> - refcnt_rele_wake(&rop->rop_refcnt);
> -}
> -
>  int
>  route_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *nam,
>  struct mbuf *control, struct proc *p)
> @@ -325,8 +305,7 @@ route_attach(struct socket *so, int prot
>   so->so_options |= SO_USELOOPBACK;
>  
>   rw_enter(&rtptable.rtp_lk, RW_WRITE);
> - SRPL_INSERT_HEAD_LOCKED(&rtptable.rtp_rc, &rtptable.rtp_list, rop,
> - rop_list);
> + SMR_TAILQ_INSERT_HEAD_LOCKED(&rtptable.rtp_list, rop, rop_list);
>   rtptable.rtp_count++;
>   rw_exit(&rtptable.rtp_lk);
>  
> @@ -347,8 +326,7 @@ route_detach(struct socket *so)
>   rw_enter(&rtptable.rtp_lk, RW_WRITE);
>  
>   rtptable.rtp_count--;
> - SRPL_REMOVE_LOCKED(&rtptable.rtp_rc, &rtptable.rtp_list, rop, rtpcb,
> - rop_list);
> + SMR_TAILQ_REMOVE_LOCKED(&rtptable.rtp_list, rop, rop_list);
>   rw_exit(&rtptable.rtp_lk);
>  
>   sounlock(so);
> @@ -356,6 +334,7 @@ route_detach(struct socket *so)
>   /* wait for all references to drop */
>   refcnt_finalize(&rop->rop_refcnt, "rtsockrefs");
>   timeout_del_barrier(&rop->rop_timeout);
> + smr_barrier();
>  
>   solock(so);
>  
> @@ -501,7 +480,6 @@ route_input(struct mbuf *m0, struct sock
>   struct rtpcb *rop;
>   struct rt_msghdr *rtm;
>   struct mbuf *m = m0;
> - struct srp_ref sr;
>  
>   /* ensure that we can access the rtm_type via mtod() */
>   if (m->m_len < offsetof(struct rt_msghdr, rtm_type) + 1) {
> @@ -509,7 +487,8 @@ route_input(struct mbuf *m0, struct sock
>   return;
>   }
>  
> - SRPL_FOREACH(rop, &sr, &rtptable.rtp_list, rop_list) {
> + smr_read_enter();
> + SMR_TAILQ_FOREACH(rop, &rtptable.rtp_list, rop_list) {
>   /*
>* If route socket is bound to an address family only send
>* messages that match the address family. Address famil

Use SMR instead of SRP list in rtsock.c

2022-06-30 Thread Claudio Jeker
This diff converts the SRP list to a SMR list in rtsock.c
SRP is a bit strange with how it works and the SMR code is a bit easier to
understand. Since we can sleep in the SMR_TAILQ_FOREACH() we need to grab
a refcount on the route pcb so that we can leave the SMR critical section
and then enter the SMR critical section at the end of the loop before
dropping the refcount again.

The diff does not immeditaly explode but I doubt we can exploit
parallelism in route_input() so this may fail at some later stage if it is
wrong.

Comments from the lock critics welcome
-- 
:wq Claudio

Index: sys/net/rtsock.c
===
RCS file: /cvs/src/sys/net/rtsock.c,v
retrieving revision 1.334
diff -u -p -r1.334 rtsock.c
--- sys/net/rtsock.c28 Jun 2022 10:01:13 -  1.334
+++ sys/net/rtsock.c30 Jun 2022 08:02:09 -
@@ -71,7 +71,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 #include 
 #include 
@@ -107,8 +107,6 @@ struct walkarg {
 };
 
 void   route_prinit(void);
-void   rcb_ref(void *, void *);
-void   rcb_unref(void *, void *);
 introute_output(struct mbuf *, struct socket *, struct sockaddr *,
struct mbuf *);
 introute_ctloutput(int, struct socket *, int, int, struct mbuf *);
@@ -149,7 +147,7 @@ int  rt_setsource(unsigned int, struct 
 struct rtpcb {
struct socket   *rop_socket;/* [I] */
 
-   SRPL_ENTRY(rtpcb)   rop_list;
+   SMR_TAILQ_ENTRY(rtpcb)  rop_list;
struct refcnt   rop_refcnt;
struct timeout  rop_timeout;
unsigned introp_msgfilter;  /* [s] */
@@ -162,8 +160,7 @@ struct rtpcb {
 #definesotortpcb(so)   ((struct rtpcb *)(so)->so_pcb)
 
 struct rtptable {
-   SRPL_HEAD(, rtpcb)  rtp_list;
-   struct srpl_rc  rtp_rc;
+   SMR_TAILQ_HEAD(, rtpcb) rtp_list;
struct rwlock   rtp_lk;
unsigned intrtp_count;
 };
@@ -185,29 +182,12 @@ struct rtptable rtptable;
 void
 route_prinit(void)
 {
-   srpl_rc_init(&rtptable.rtp_rc, rcb_ref, rcb_unref, NULL);
rw_init(&rtptable.rtp_lk, "rtsock");
-   SRPL_INIT(&rtptable.rtp_list);
+   SMR_TAILQ_INIT(&rtptable.rtp_list);
pool_init(&rtpcb_pool, sizeof(struct rtpcb), 0,
IPL_SOFTNET, PR_WAITOK, "rtpcb", NULL);
 }
 
-void
-rcb_ref(void *null, void *v)
-{
-   struct rtpcb *rop = v;
-
-   refcnt_take(&rop->rop_refcnt);
-}
-
-void
-rcb_unref(void *null, void *v)
-{
-   struct rtpcb *rop = v;
-
-   refcnt_rele_wake(&rop->rop_refcnt);
-}
-
 int
 route_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *nam,
 struct mbuf *control, struct proc *p)
@@ -325,8 +305,7 @@ route_attach(struct socket *so, int prot
so->so_options |= SO_USELOOPBACK;
 
rw_enter(&rtptable.rtp_lk, RW_WRITE);
-   SRPL_INSERT_HEAD_LOCKED(&rtptable.rtp_rc, &rtptable.rtp_list, rop,
-   rop_list);
+   SMR_TAILQ_INSERT_HEAD_LOCKED(&rtptable.rtp_list, rop, rop_list);
rtptable.rtp_count++;
rw_exit(&rtptable.rtp_lk);
 
@@ -347,8 +326,7 @@ route_detach(struct socket *so)
rw_enter(&rtptable.rtp_lk, RW_WRITE);
 
rtptable.rtp_count--;
-   SRPL_REMOVE_LOCKED(&rtptable.rtp_rc, &rtptable.rtp_list, rop, rtpcb,
-   rop_list);
+   SMR_TAILQ_REMOVE_LOCKED(&rtptable.rtp_list, rop, rop_list);
rw_exit(&rtptable.rtp_lk);
 
sounlock(so);
@@ -356,6 +334,7 @@ route_detach(struct socket *so)
/* wait for all references to drop */
refcnt_finalize(&rop->rop_refcnt, "rtsockrefs");
timeout_del_barrier(&rop->rop_timeout);
+   smr_barrier();
 
solock(so);
 
@@ -501,7 +480,6 @@ route_input(struct mbuf *m0, struct sock
struct rtpcb *rop;
struct rt_msghdr *rtm;
struct mbuf *m = m0;
-   struct srp_ref sr;
 
/* ensure that we can access the rtm_type via mtod() */
if (m->m_len < offsetof(struct rt_msghdr, rtm_type) + 1) {
@@ -509,7 +487,8 @@ route_input(struct mbuf *m0, struct sock
return;
}
 
-   SRPL_FOREACH(rop, &sr, &rtptable.rtp_list, rop_list) {
+   smr_read_enter();
+   SMR_TAILQ_FOREACH(rop, &rtptable.rtp_list, rop_list) {
/*
 * If route socket is bound to an address family only send
 * messages that match the address family. Address family
@@ -519,7 +498,8 @@ route_input(struct mbuf *m0, struct sock
rop->rop_proto != sa_family)
continue;
 
-
+   refcnt_take(&rop->rop_refcnt);
+   smr_read_leave();
so = rop->rop_socket;
solock(so);
 
@@ -579,8 +559,10 @@ route_input(struct mbuf *m0, struct sock
rtm_sendup(so, m);
 next:
sounlock(so);
+   smr_read_enter();
+   refcnt_rele_wake(&rop->rop_refcnt);