Re: Use SMR instead of SRP list in rtsock.c
On Wed, Aug 10, 2022 at 11:08:06AM +0200, Claudio Jeker wrote: > On Fri, Jul 01, 2022 at 04:03:21PM +, Visa Hankala wrote: > > On Fri, Jul 01, 2022 at 09:59:11AM +0200, Claudio Jeker wrote: > > > On Thu, Jun 30, 2022 at 03:46:35PM +, Visa Hankala wrote: > > > > On Thu, Jun 30, 2022 at 11:51:52AM +0200, Claudio Jeker wrote: > > > > > After discussing this with mpi@ and jmatthew@ we came to the > > > > > conclusion > > > > > that we need to smr_barrier() before refcnt_finalize() to ensure that > > > > > no > > > > > other CPU is between the SMR_TAILQ_FOREACH, refcnt_take() and > > > > > smr_read_leave(). > > > > > > > > [...] > > > > > > > > > @@ -509,7 +487,8 @@ route_input(struct mbuf *m0, struct sock > > > > > return; > > > > > } > > > > > > > > > > - SRPL_FOREACH(rop, &sr, &rtptable.rtp_list, rop_list) { > > > > > + smr_read_enter(); > > > > > + SMR_TAILQ_FOREACH(rop, &rtptable.rtp_list, rop_list) { > > > > > /* > > > > >* If route socket is bound to an address family only > > > > > send > > > > >* messages that match the address family. Address > > > > > family > > > > > @@ -519,7 +498,8 @@ route_input(struct mbuf *m0, struct sock > > > > > rop->rop_proto != sa_family) > > > > > continue; > > > > > > > > > > - > > > > > + refcnt_take(&rop->rop_refcnt); > > > > > + smr_read_leave(); > > > > > so = rop->rop_socket; > > > > > solock(so); > > > > > > > > > > @@ -579,8 +559,10 @@ route_input(struct mbuf *m0, struct sock > > > > > rtm_sendup(so, m); > > > > > next: > > > > > sounlock(so); > > > > > + smr_read_enter(); > > > > > + refcnt_rele_wake(&rop->rop_refcnt); > > > > > > > > This does not look correct. > > > > > > > > smr_barrier() can proceed after smr_read_leave(), so refcnt_rele_wake() > > > > might drop the final reference and this thread can no longer access > > > > rop safely (SMR_TAILQ_NEXT() inside SMR_TAILQ_FOREACH()). > > > > > > > > Also, SMR_TAILQ_NEXT() of rop becomes potentially dangling after > > > > smr_read_leave(). After this thread leaves the read-side critical > > > > section, another thread might free rop's successor. > > > > > > So we need to either smr_barrier() before and after the refcnt_finalize() > > > to make sure that the rop pointer remains stable in both cases or we alter > > > the SMR_TAILQ_FOREACH() loop so that SMR_TAILQ_NEXT can be grabbed before > > > refcnt_rele_wake(). > > > > > > While the double smr_barrier() is trivial it is not ideal and I think it > > > is better to adjust the loop since SMR loops with sleep points is a > > > somewhat common issue and so we should have a good clear way on how to > > > solve it. > > > > Adjusting SMR_TAILQ_FOREACH() will not help. > > > > In general, a reader cannot resume a lockless iteration after it has > > left the read-side critical section and crossed a sleep point. The > > guarantee of consistent(-looking) forward linkage is gone. The reader > > no longer knows if the value of SMR_TAILQ_NEXT() is valid. If the > > reader wants to continue with the list, it has to re-enter the read-side > > critical section and restart the iteration. > > This is not a real SMR_TAILQ_FOREACH() use case so trying to use > SMR_TAILQ_FOREACH() here is not right. The code wants to walk the list of > route pcbs linked via rop_list. The code just needs to walk all active > connections and does not care about races with sockets that are concurrently > closed or opened. In the first case SS_CANTRCVMORE will be set and the > socket is skipped and in the second case the socket is simply ignored > because new sockets are inserted at the head of the list. > > It is not a lockless iteration over the full list. It is not required to > be either. The only thing that matters is that the forward linkage is > consitent (not pointing to invalid objects). > > There is no need to restart the iteration because element on the list can > not be reinserted. They can only be removed and a removed element does not > get any message anyway (either by not visiting the object or by skipping > it in the loop). > > The refcnt ensures that the currently used pcb is not freed before the > next element is picked. As long as the refcnt is hold the object can't be > removed. Lets assume that another thread begins to detach rop while route_input() sleeps. The reference prevents early freeing of rop. Lets assume further that yet another thread detaches and frees the successor of rop while the first thread is still sleeping. What will SMR_LIST_NEXT(rop, rop_list) point to?
Re: Use SMR instead of SRP list in rtsock.c
On Fri, Jul 01, 2022 at 04:03:21PM +, Visa Hankala wrote: > On Fri, Jul 01, 2022 at 09:59:11AM +0200, Claudio Jeker wrote: > > On Thu, Jun 30, 2022 at 03:46:35PM +, Visa Hankala wrote: > > > On Thu, Jun 30, 2022 at 11:51:52AM +0200, Claudio Jeker wrote: > > > > After discussing this with mpi@ and jmatthew@ we came to the conclusion > > > > that we need to smr_barrier() before refcnt_finalize() to ensure that no > > > > other CPU is between the SMR_TAILQ_FOREACH, refcnt_take() and > > > > smr_read_leave(). > > > > > > [...] > > > > > > > @@ -509,7 +487,8 @@ route_input(struct mbuf *m0, struct sock > > > > return; > > > > } > > > > > > > > - SRPL_FOREACH(rop, &sr, &rtptable.rtp_list, rop_list) { > > > > + smr_read_enter(); > > > > + SMR_TAILQ_FOREACH(rop, &rtptable.rtp_list, rop_list) { > > > > /* > > > > * If route socket is bound to an address family only > > > > send > > > > * messages that match the address family. Address > > > > family > > > > @@ -519,7 +498,8 @@ route_input(struct mbuf *m0, struct sock > > > > rop->rop_proto != sa_family) > > > > continue; > > > > > > > > - > > > > + refcnt_take(&rop->rop_refcnt); > > > > + smr_read_leave(); > > > > so = rop->rop_socket; > > > > solock(so); > > > > > > > > @@ -579,8 +559,10 @@ route_input(struct mbuf *m0, struct sock > > > > rtm_sendup(so, m); > > > > next: > > > > sounlock(so); > > > > + smr_read_enter(); > > > > + refcnt_rele_wake(&rop->rop_refcnt); > > > > > > This does not look correct. > > > > > > smr_barrier() can proceed after smr_read_leave(), so refcnt_rele_wake() > > > might drop the final reference and this thread can no longer access > > > rop safely (SMR_TAILQ_NEXT() inside SMR_TAILQ_FOREACH()). > > > > > > Also, SMR_TAILQ_NEXT() of rop becomes potentially dangling after > > > smr_read_leave(). After this thread leaves the read-side critical > > > section, another thread might free rop's successor. > > > > So we need to either smr_barrier() before and after the refcnt_finalize() > > to make sure that the rop pointer remains stable in both cases or we alter > > the SMR_TAILQ_FOREACH() loop so that SMR_TAILQ_NEXT can be grabbed before > > refcnt_rele_wake(). > > > > While the double smr_barrier() is trivial it is not ideal and I think it > > is better to adjust the loop since SMR loops with sleep points is a > > somewhat common issue and so we should have a good clear way on how to > > solve it. > > Adjusting SMR_TAILQ_FOREACH() will not help. > > In general, a reader cannot resume a lockless iteration after it has > left the read-side critical section and crossed a sleep point. The > guarantee of consistent(-looking) forward linkage is gone. The reader > no longer knows if the value of SMR_TAILQ_NEXT() is valid. If the > reader wants to continue with the list, it has to re-enter the read-side > critical section and restart the iteration. This is not a real SMR_TAILQ_FOREACH() use case so trying to use SMR_TAILQ_FOREACH() here is not right. The code wants to walk the list of route pcbs linked via rop_list. The code just needs to walk all active connections and does not care about races with sockets that are concurrently closed or opened. In the first case SS_CANTRCVMORE will be set and the socket is skipped and in the second case the socket is simply ignored because new sockets are inserted at the head of the list. It is not a lockless iteration over the full list. It is not required to be either. The only thing that matters is that the forward linkage is consitent (not pointing to invalid objects). There is no need to restart the iteration because element on the list can not be reinserted. They can only be removed and a removed element does not get any message anyway (either by not visiting the object or by skipping it in the loop). The refcnt ensures that the currently used pcb is not freed before the next element is picked. As long as the refcnt is hold the object can't be removed. > I guess I should finish the sleepable variant of SMR that I was > tinkering with long ago... I switched from SMR_TAILQ to SMR_LIST since there is no need to access the tail. Also the code uses just SMR_LIST_FIRST() and SMR_LIST_NEXT() and not SMR_LIST_FOREACH(). -- :wq Claudio Index: rtsock.c === RCS file: /cvs/src/sys/net/rtsock.c,v retrieving revision 1.334 diff -u -p -r1.334 rtsock.c --- rtsock.c28 Jun 2022 10:01:13 - 1.334 +++ rtsock.c10 Aug 2022 08:43:31 - @@ -71,7 +71,7 @@ #include #include #include -#include +#include #include #include @@ -107,8 +107,6 @@ struct walkarg { }; void route_prinit(void); -void rcb_ref(void *, void *); -void rcb_unref(void *, voi
Re: Use SMR instead of SRP list in rtsock.c
On Fri, Jul 01, 2022 at 09:59:11AM +0200, Claudio Jeker wrote: > On Thu, Jun 30, 2022 at 03:46:35PM +, Visa Hankala wrote: > > On Thu, Jun 30, 2022 at 11:51:52AM +0200, Claudio Jeker wrote: > > > After discussing this with mpi@ and jmatthew@ we came to the conclusion > > > that we need to smr_barrier() before refcnt_finalize() to ensure that no > > > other CPU is between the SMR_TAILQ_FOREACH, refcnt_take() and > > > smr_read_leave(). > > > > [...] > > > > > @@ -509,7 +487,8 @@ route_input(struct mbuf *m0, struct sock > > > return; > > > } > > > > > > - SRPL_FOREACH(rop, &sr, &rtptable.rtp_list, rop_list) { > > > + smr_read_enter(); > > > + SMR_TAILQ_FOREACH(rop, &rtptable.rtp_list, rop_list) { > > > /* > > >* If route socket is bound to an address family only send > > >* messages that match the address family. Address family > > > @@ -519,7 +498,8 @@ route_input(struct mbuf *m0, struct sock > > > rop->rop_proto != sa_family) > > > continue; > > > > > > - > > > + refcnt_take(&rop->rop_refcnt); > > > + smr_read_leave(); > > > so = rop->rop_socket; > > > solock(so); > > > > > > @@ -579,8 +559,10 @@ route_input(struct mbuf *m0, struct sock > > > rtm_sendup(so, m); > > > next: > > > sounlock(so); > > > + smr_read_enter(); > > > + refcnt_rele_wake(&rop->rop_refcnt); > > > > This does not look correct. > > > > smr_barrier() can proceed after smr_read_leave(), so refcnt_rele_wake() > > might drop the final reference and this thread can no longer access > > rop safely (SMR_TAILQ_NEXT() inside SMR_TAILQ_FOREACH()). > > > > Also, SMR_TAILQ_NEXT() of rop becomes potentially dangling after > > smr_read_leave(). After this thread leaves the read-side critical > > section, another thread might free rop's successor. > > So we need to either smr_barrier() before and after the refcnt_finalize() > to make sure that the rop pointer remains stable in both cases or we alter > the SMR_TAILQ_FOREACH() loop so that SMR_TAILQ_NEXT can be grabbed before > refcnt_rele_wake(). > > While the double smr_barrier() is trivial it is not ideal and I think it > is better to adjust the loop since SMR loops with sleep points is a > somewhat common issue and so we should have a good clear way on how to > solve it. Adjusting SMR_TAILQ_FOREACH() will not help. In general, a reader cannot resume a lockless iteration after it has left the read-side critical section and crossed a sleep point. The guarantee of consistent(-looking) forward linkage is gone. The reader no longer knows if the value of SMR_TAILQ_NEXT() is valid. If the reader wants to continue with the list, it has to re-enter the read-side critical section and restart the iteration. I guess I should finish the sleepable variant of SMR that I was tinkering with long ago...
Re: Use SMR instead of SRP list in rtsock.c
On Thu, Jun 30, 2022 at 03:46:35PM +, Visa Hankala wrote: > On Thu, Jun 30, 2022 at 11:51:52AM +0200, Claudio Jeker wrote: > > After discussing this with mpi@ and jmatthew@ we came to the conclusion > > that we need to smr_barrier() before refcnt_finalize() to ensure that no > > other CPU is between the SMR_TAILQ_FOREACH, refcnt_take() and > > smr_read_leave(). > > [...] > > > @@ -509,7 +487,8 @@ route_input(struct mbuf *m0, struct sock > > return; > > } > > > > - SRPL_FOREACH(rop, &sr, &rtptable.rtp_list, rop_list) { > > + smr_read_enter(); > > + SMR_TAILQ_FOREACH(rop, &rtptable.rtp_list, rop_list) { > > /* > > * If route socket is bound to an address family only send > > * messages that match the address family. Address family > > @@ -519,7 +498,8 @@ route_input(struct mbuf *m0, struct sock > > rop->rop_proto != sa_family) > > continue; > > > > - > > + refcnt_take(&rop->rop_refcnt); > > + smr_read_leave(); > > so = rop->rop_socket; > > solock(so); > > > > @@ -579,8 +559,10 @@ route_input(struct mbuf *m0, struct sock > > rtm_sendup(so, m); > > next: > > sounlock(so); > > + smr_read_enter(); > > + refcnt_rele_wake(&rop->rop_refcnt); > > This does not look correct. > > smr_barrier() can proceed after smr_read_leave(), so refcnt_rele_wake() > might drop the final reference and this thread can no longer access > rop safely (SMR_TAILQ_NEXT() inside SMR_TAILQ_FOREACH()). > > Also, SMR_TAILQ_NEXT() of rop becomes potentially dangling after > smr_read_leave(). After this thread leaves the read-side critical > section, another thread might free rop's successor. So we need to either smr_barrier() before and after the refcnt_finalize() to make sure that the rop pointer remains stable in both cases or we alter the SMR_TAILQ_FOREACH() loop so that SMR_TAILQ_NEXT can be grabbed before refcnt_rele_wake(). While the double smr_barrier() is trivial it is not ideal and I think it is better to adjust the loop since SMR loops with sleep points is a somewhat common issue and so we should have a good clear way on how to solve it. I'll send out a new diff next week -- :wq Claudio
Re: Use SMR instead of SRP list in rtsock.c
On Thu, Jun 30, 2022 at 11:51:52AM +0200, Claudio Jeker wrote: > After discussing this with mpi@ and jmatthew@ we came to the conclusion > that we need to smr_barrier() before refcnt_finalize() to ensure that no > other CPU is between the SMR_TAILQ_FOREACH, refcnt_take() and > smr_read_leave(). [...] > @@ -509,7 +487,8 @@ route_input(struct mbuf *m0, struct sock > return; > } > > - SRPL_FOREACH(rop, &sr, &rtptable.rtp_list, rop_list) { > + smr_read_enter(); > + SMR_TAILQ_FOREACH(rop, &rtptable.rtp_list, rop_list) { > /* >* If route socket is bound to an address family only send >* messages that match the address family. Address family > @@ -519,7 +498,8 @@ route_input(struct mbuf *m0, struct sock > rop->rop_proto != sa_family) > continue; > > - > + refcnt_take(&rop->rop_refcnt); > + smr_read_leave(); > so = rop->rop_socket; > solock(so); > > @@ -579,8 +559,10 @@ route_input(struct mbuf *m0, struct sock > rtm_sendup(so, m); > next: > sounlock(so); > + smr_read_enter(); > + refcnt_rele_wake(&rop->rop_refcnt); This does not look correct. smr_barrier() can proceed after smr_read_leave(), so refcnt_rele_wake() might drop the final reference and this thread can no longer access rop safely (SMR_TAILQ_NEXT() inside SMR_TAILQ_FOREACH()). Also, SMR_TAILQ_NEXT() of rop becomes potentially dangling after smr_read_leave(). After this thread leaves the read-side critical section, another thread might free rop's successor.
Re: Use SMR instead of SRP list in rtsock.c
On Thu, Jun 30, 2022 at 02:32:03PM +0200, Claudio Jeker wrote: > On Thu, Jun 30, 2022 at 03:21:40PM +0300, Vitaliy Makkoveev wrote: > > On Thu, Jun 30, 2022 at 11:56:55AM +0200, Claudio Jeker wrote: > > > On Thu, Jun 30, 2022 at 12:34:33PM +0300, Vitaliy Makkoveev wrote: > > > > On Thu, Jun 30, 2022 at 11:08:48AM +0200, Claudio Jeker wrote: > > > > > This diff converts the SRP list to a SMR list in rtsock.c > > > > > SRP is a bit strange with how it works and the SMR code is a bit > > > > > easier to > > > > > understand. Since we can sleep in the SMR_TAILQ_FOREACH() we need to > > > > > grab > > > > > a refcount on the route pcb so that we can leave the SMR critical > > > > > section > > > > > and then enter the SMR critical section at the end of the loop before > > > > > dropping the refcount again. > > > > > > > > > > The diff does not immeditaly explode but I doubt we can exploit > > > > > parallelism in route_input() so this may fail at some later stage if > > > > > it is > > > > > wrong. > > > > > > > > > > Comments from the lock critics welcome > > > > > > > > We use `so_lock' rwlock(9) to protect route domain sockets. We can't > > > > convert this SRP list to SMR list because we call solock() within > > > > foreach loop. > > > > > > because of the so_lock the code uses a refcnt on the route pcb to make > > > sure that the object is not freed while we sleep. So that is handled by > > > this diff. > > > > > > > We can easily crash kernel by running in parallel some "route monitor" > > > > commands and "while true; ifconfig vether0 create ; ifconfig vether0 > > > > destroy; done". > > > > > > That does not cause problem on my system. > > > > > > > Sorry, I missed you leave SMR section before solock() call: > > > > > > > @@ -519,7 +498,8 @@ route_input(struct mbuf *m0, struct sock > > > > > rop->rop_proto != sa_family) > > > > > continue; > > > > > > > > > > - > > > > > + refcnt_take(&rop->rop_refcnt); > > > > > + smr_read_leave(); > > > > > so = rop->rop_socket; > > > > > solock(so); > > > > > > > > > My system is stable with the second version of this diff and the > > following test. > > Cool but I fear that the kernel still synchronizes somewhere and we don't > really get full concurrency here. So I'm not sure if this is able to > trigger a reference bug. > route_input() and route_{attach,detach}() are fully asynchronous, except the `rtp_lk' rwlock(9). When called from route_output() path, route_input() is not kernel or net locked. We can add some such threads if we want more concurrency.
Re: Use SMR instead of SRP list in rtsock.c
On Thu, Jun 30, 2022 at 03:21:40PM +0300, Vitaliy Makkoveev wrote: > On Thu, Jun 30, 2022 at 11:56:55AM +0200, Claudio Jeker wrote: > > On Thu, Jun 30, 2022 at 12:34:33PM +0300, Vitaliy Makkoveev wrote: > > > On Thu, Jun 30, 2022 at 11:08:48AM +0200, Claudio Jeker wrote: > > > > This diff converts the SRP list to a SMR list in rtsock.c > > > > SRP is a bit strange with how it works and the SMR code is a bit easier > > > > to > > > > understand. Since we can sleep in the SMR_TAILQ_FOREACH() we need to > > > > grab > > > > a refcount on the route pcb so that we can leave the SMR critical > > > > section > > > > and then enter the SMR critical section at the end of the loop before > > > > dropping the refcount again. > > > > > > > > The diff does not immeditaly explode but I doubt we can exploit > > > > parallelism in route_input() so this may fail at some later stage if it > > > > is > > > > wrong. > > > > > > > > Comments from the lock critics welcome > > > > > > We use `so_lock' rwlock(9) to protect route domain sockets. We can't > > > convert this SRP list to SMR list because we call solock() within > > > foreach loop. > > > > because of the so_lock the code uses a refcnt on the route pcb to make > > sure that the object is not freed while we sleep. So that is handled by > > this diff. > > > > > We can easily crash kernel by running in parallel some "route monitor" > > > commands and "while true; ifconfig vether0 create ; ifconfig vether0 > > > destroy; done". > > > > That does not cause problem on my system. > > > > Sorry, I missed you leave SMR section before solock() call: > > > > > @@ -519,7 +498,8 @@ route_input(struct mbuf *m0, struct sock > > > > rop->rop_proto != sa_family) > > > > continue; > > > > > > > > - > > > > + refcnt_take(&rop->rop_refcnt); > > > > + smr_read_leave(); > > > > so = rop->rop_socket; > > > > solock(so); > > > > > > My system is stable with the second version of this diff and the > following test. Cool but I fear that the kernel still synchronizes somewhere and we don't really get full concurrency here. So I'm not sure if this is able to trigger a reference bug. > #include > #include > #include > #include > #include > #include > #include > #include > #include > #include > #include > > static struct ifreq ifr; > > static void *clone(void *arg) > { > int s; > > if ((s = socket(AF_INET, SOCK_DGRAM, 0)) < 0) > err(1, "socket"); > while (1) { > if (ioctl(s, SIOCIFCREATE, &ifr) < 0) > if (errno == EINVAL) > err(1, "ioctl"); > if(ioctl(s, SIOCIFDESTROY, &ifr)<0) > if(errno == EINVAL) > err(1, "ioctl"); > } > > return NULL; > } > > static void *rtsock(void *arg) > { > int s; > > while (1){ > if ((s = socket(AF_ROUTE, SOCK_RAW, 0)) < 0) > err(1, "socket"); > close(s); > } > > return NULL; > } > > int main(int argc, char *argv[]) > { > pthread_t thr; > int i; > > if( argc != 2) { > fprintf(stderr, "usage: %s ifname\n", getprogname()); > return 1; > } > > if (getuid() != 0) { > fprintf(stderr, "should be root\n"); > return 1; > } > > memset(&ifr, 0, sizeof(ifr)); > strlcpy(ifr.ifr_name, argv[1], sizeof(ifr.ifr_name)); > > for(i = 0; i < 8 * 4; ++i) { > if (pthread_create(&thr, NULL, clone, NULL)!=0) > errx(1, "pthread_create"); > } > > for (i = 0; i < 8 * 4; ++i) { > if (pthread_create(&thr, NULL, rtsock, NULL) != 0) > errx(1, "pthread_create"); > } > > select(0, NULL, NULL, NULL, NULL); > > return 0; > } > -- :wq Claudio
Re: Use SMR instead of SRP list in rtsock.c
On Thu, Jun 30, 2022 at 11:51:52AM +0200, Claudio Jeker wrote: > On Thu, Jun 30, 2022 at 11:08:48AM +0200, Claudio Jeker wrote: > > This diff converts the SRP list to a SMR list in rtsock.c > > SRP is a bit strange with how it works and the SMR code is a bit easier to > > understand. Since we can sleep in the SMR_TAILQ_FOREACH() we need to grab > > a refcount on the route pcb so that we can leave the SMR critical section > > and then enter the SMR critical section at the end of the loop before > > dropping the refcount again. > > > > The diff does not immeditaly explode but I doubt we can exploit > > parallelism in route_input() so this may fail at some later stage if it is > > wrong. > > > > Comments from the lock critics welcome > > After discussing this with mpi@ and jmatthew@ we came to the conclusion > that we need to smr_barrier() before refcnt_finalize() to ensure that no > other CPU is between the SMR_TAILQ_FOREACH, refcnt_take() and > smr_read_leave(). > > Updated diff below ok by me > -- > :wq Claudio > > Index: net/rtsock.c > === > RCS file: /cvs/src/sys/net/rtsock.c,v > retrieving revision 1.334 > diff -u -p -r1.334 rtsock.c > --- net/rtsock.c 28 Jun 2022 10:01:13 - 1.334 > +++ net/rtsock.c 30 Jun 2022 09:25:53 - > @@ -71,7 +71,7 @@ > #include > #include > #include > -#include > +#include > > #include > #include > @@ -107,8 +107,6 @@ struct walkarg { > }; > > void route_prinit(void); > -void rcb_ref(void *, void *); > -void rcb_unref(void *, void *); > int route_output(struct mbuf *, struct socket *, struct sockaddr *, > struct mbuf *); > int route_ctloutput(int, struct socket *, int, int, struct mbuf *); > @@ -149,7 +147,7 @@ intrt_setsource(unsigned int, struct > struct rtpcb { > struct socket *rop_socket;/* [I] */ > > - SRPL_ENTRY(rtpcb) rop_list; > + SMR_TAILQ_ENTRY(rtpcb) rop_list; > struct refcnt rop_refcnt; > struct timeout rop_timeout; > unsigned introp_msgfilter; /* [s] */ > @@ -162,8 +160,7 @@ struct rtpcb { > #define sotortpcb(so) ((struct rtpcb *)(so)->so_pcb) > > struct rtptable { > - SRPL_HEAD(, rtpcb) rtp_list; > - struct srpl_rc rtp_rc; > + SMR_TAILQ_HEAD(, rtpcb) rtp_list; > struct rwlock rtp_lk; > unsigned intrtp_count; > }; > @@ -185,29 +182,12 @@ struct rtptable rtptable; > void > route_prinit(void) > { > - srpl_rc_init(&rtptable.rtp_rc, rcb_ref, rcb_unref, NULL); > rw_init(&rtptable.rtp_lk, "rtsock"); > - SRPL_INIT(&rtptable.rtp_list); > + SMR_TAILQ_INIT(&rtptable.rtp_list); > pool_init(&rtpcb_pool, sizeof(struct rtpcb), 0, > IPL_SOFTNET, PR_WAITOK, "rtpcb", NULL); > } > > -void > -rcb_ref(void *null, void *v) > -{ > - struct rtpcb *rop = v; > - > - refcnt_take(&rop->rop_refcnt); > -} > - > -void > -rcb_unref(void *null, void *v) > -{ > - struct rtpcb *rop = v; > - > - refcnt_rele_wake(&rop->rop_refcnt); > -} > - > int > route_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *nam, > struct mbuf *control, struct proc *p) > @@ -325,8 +305,7 @@ route_attach(struct socket *so, int prot > so->so_options |= SO_USELOOPBACK; > > rw_enter(&rtptable.rtp_lk, RW_WRITE); > - SRPL_INSERT_HEAD_LOCKED(&rtptable.rtp_rc, &rtptable.rtp_list, rop, > - rop_list); > + SMR_TAILQ_INSERT_HEAD_LOCKED(&rtptable.rtp_list, rop, rop_list); > rtptable.rtp_count++; > rw_exit(&rtptable.rtp_lk); > > @@ -347,13 +326,13 @@ route_detach(struct socket *so) > rw_enter(&rtptable.rtp_lk, RW_WRITE); > > rtptable.rtp_count--; > - SRPL_REMOVE_LOCKED(&rtptable.rtp_rc, &rtptable.rtp_list, rop, rtpcb, > - rop_list); > + SMR_TAILQ_REMOVE_LOCKED(&rtptable.rtp_list, rop, rop_list); > rw_exit(&rtptable.rtp_lk); > > sounlock(so); > > /* wait for all references to drop */ > + smr_barrier(); > refcnt_finalize(&rop->rop_refcnt, "rtsockrefs"); > timeout_del_barrier(&rop->rop_timeout); > > @@ -501,7 +480,6 @@ route_input(struct mbuf *m0, struct sock > struct rtpcb *rop; > struct rt_msghdr *rtm; > struct mbuf *m = m0; > - struct srp_ref sr; > > /* ensure that we can access the rtm_type via mtod() */ > if (m->m_len < offsetof(struct rt_msghdr, rtm_type) + 1) { > @@ -509,7 +487,8 @@ route_input(struct mbuf *m0, struct sock > return; > } > > - SRPL_FOREACH(rop, &sr, &rtptable.rtp_list, rop_list) { > + smr_read_enter(); > + SMR_TAILQ_FOREACH(rop, &rtptable.rtp_list, rop_list) { > /* >* If route socket is bound to an address family only send >* messages that match the address family. Address family > @@ -519,7 +498,8 @@
Re: Use SMR instead of SRP list in rtsock.c
On Thu, Jun 30, 2022 at 11:56:55AM +0200, Claudio Jeker wrote: > On Thu, Jun 30, 2022 at 12:34:33PM +0300, Vitaliy Makkoveev wrote: > > On Thu, Jun 30, 2022 at 11:08:48AM +0200, Claudio Jeker wrote: > > > This diff converts the SRP list to a SMR list in rtsock.c > > > SRP is a bit strange with how it works and the SMR code is a bit easier to > > > understand. Since we can sleep in the SMR_TAILQ_FOREACH() we need to grab > > > a refcount on the route pcb so that we can leave the SMR critical section > > > and then enter the SMR critical section at the end of the loop before > > > dropping the refcount again. > > > > > > The diff does not immeditaly explode but I doubt we can exploit > > > parallelism in route_input() so this may fail at some later stage if it is > > > wrong. > > > > > > Comments from the lock critics welcome > > > > We use `so_lock' rwlock(9) to protect route domain sockets. We can't > > convert this SRP list to SMR list because we call solock() within > > foreach loop. > > because of the so_lock the code uses a refcnt on the route pcb to make > sure that the object is not freed while we sleep. So that is handled by > this diff. > > > We can easily crash kernel by running in parallel some "route monitor" > > commands and "while true; ifconfig vether0 create ; ifconfig vether0 > > destroy; done". > > That does not cause problem on my system. > Sorry, I missed you leave SMR section before solock() call: > > > @@ -519,7 +498,8 @@ route_input(struct mbuf *m0, struct sock > > > rop->rop_proto != sa_family) > > > continue; > > > > > > - > > > + refcnt_take(&rop->rop_refcnt); > > > + smr_read_leave(); > > > so = rop->rop_socket; > > > solock(so); > > > My system is stable with the second version of this diff and the following test. #include #include #include #include #include #include #include #include #include #include #include static struct ifreq ifr; static void *clone(void *arg) { int s; if ((s = socket(AF_INET, SOCK_DGRAM, 0)) < 0) err(1, "socket"); while (1) { if (ioctl(s, SIOCIFCREATE, &ifr) < 0) if (errno == EINVAL) err(1, "ioctl"); if(ioctl(s, SIOCIFDESTROY, &ifr)<0) if(errno == EINVAL) err(1, "ioctl"); } return NULL; } static void *rtsock(void *arg) { int s; while (1){ if ((s = socket(AF_ROUTE, SOCK_RAW, 0)) < 0) err(1, "socket"); close(s); } return NULL; } int main(int argc, char *argv[]) { pthread_t thr; int i; if( argc != 2) { fprintf(stderr, "usage: %s ifname\n", getprogname()); return 1; } if (getuid() != 0) { fprintf(stderr, "should be root\n"); return 1; } memset(&ifr, 0, sizeof(ifr)); strlcpy(ifr.ifr_name, argv[1], sizeof(ifr.ifr_name)); for(i = 0; i < 8 * 4; ++i) { if (pthread_create(&thr, NULL, clone, NULL)!=0) errx(1, "pthread_create"); } for (i = 0; i < 8 * 4; ++i) { if (pthread_create(&thr, NULL, rtsock, NULL) != 0) errx(1, "pthread_create"); } select(0, NULL, NULL, NULL, NULL); return 0; }
Re: Use SMR instead of SRP list in rtsock.c
On 30/06/22(Thu) 11:56, Claudio Jeker wrote: > On Thu, Jun 30, 2022 at 12:34:33PM +0300, Vitaliy Makkoveev wrote: > > On Thu, Jun 30, 2022 at 11:08:48AM +0200, Claudio Jeker wrote: > > > This diff converts the SRP list to a SMR list in rtsock.c > > > SRP is a bit strange with how it works and the SMR code is a bit easier to > > > understand. Since we can sleep in the SMR_TAILQ_FOREACH() we need to grab > > > a refcount on the route pcb so that we can leave the SMR critical section > > > and then enter the SMR critical section at the end of the loop before > > > dropping the refcount again. > > > > > > The diff does not immeditaly explode but I doubt we can exploit > > > parallelism in route_input() so this may fail at some later stage if it is > > > wrong. > > > > > > Comments from the lock critics welcome > > > > We use `so_lock' rwlock(9) to protect route domain sockets. We can't > > convert this SRP list to SMR list because we call solock() within > > foreach loop. We shouldn't use SRP list either, no? Or are we allowed to sleep holding a SRP reference? That's the question that triggered this diff. > because of the so_lock the code uses a refcnt on the route pcb to make > sure that the object is not freed while we sleep. So that is handled by > this diff. > > > We can easily crash kernel by running in parallel some "route monitor" > > commands and "while true; ifconfig vether0 create ; ifconfig vether0 > > destroy; done". > > That does not cause problem on my system. > > > > -- > > > :wq Claudio > > > > > > Index: sys/net/rtsock.c > > > === > > > RCS file: /cvs/src/sys/net/rtsock.c,v > > > retrieving revision 1.334 > > > diff -u -p -r1.334 rtsock.c > > > --- sys/net/rtsock.c 28 Jun 2022 10:01:13 - 1.334 > > > +++ sys/net/rtsock.c 30 Jun 2022 08:02:09 - > > > @@ -71,7 +71,7 @@ > > > #include > > > #include > > > #include > > > -#include > > > +#include > > > > > > #include > > > #include > > > @@ -107,8 +107,6 @@ struct walkarg { > > > }; > > > > > > void route_prinit(void); > > > -void rcb_ref(void *, void *); > > > -void rcb_unref(void *, void *); > > > int route_output(struct mbuf *, struct socket *, struct sockaddr *, > > > struct mbuf *); > > > int route_ctloutput(int, struct socket *, int, int, struct mbuf *); > > > @@ -149,7 +147,7 @@ intrt_setsource(unsigned int, struct > > > struct rtpcb { > > > struct socket *rop_socket;/* [I] */ > > > > > > - SRPL_ENTRY(rtpcb) rop_list; > > > + SMR_TAILQ_ENTRY(rtpcb) rop_list; > > > struct refcnt rop_refcnt; > > > struct timeout rop_timeout; > > > unsigned introp_msgfilter; /* [s] */ > > > @@ -162,8 +160,7 @@ struct rtpcb { > > > #define sotortpcb(so) ((struct rtpcb *)(so)->so_pcb) > > > > > > struct rtptable { > > > - SRPL_HEAD(, rtpcb) rtp_list; > > > - struct srpl_rc rtp_rc; > > > + SMR_TAILQ_HEAD(, rtpcb) rtp_list; > > > struct rwlock rtp_lk; > > > unsigned intrtp_count; > > > }; > > > @@ -185,29 +182,12 @@ struct rtptable rtptable; > > > void > > > route_prinit(void) > > > { > > > - srpl_rc_init(&rtptable.rtp_rc, rcb_ref, rcb_unref, NULL); > > > rw_init(&rtptable.rtp_lk, "rtsock"); > > > - SRPL_INIT(&rtptable.rtp_list); > > > + SMR_TAILQ_INIT(&rtptable.rtp_list); > > > pool_init(&rtpcb_pool, sizeof(struct rtpcb), 0, > > > IPL_SOFTNET, PR_WAITOK, "rtpcb", NULL); > > > } > > > > > > -void > > > -rcb_ref(void *null, void *v) > > > -{ > > > - struct rtpcb *rop = v; > > > - > > > - refcnt_take(&rop->rop_refcnt); > > > -} > > > - > > > -void > > > -rcb_unref(void *null, void *v) > > > -{ > > > - struct rtpcb *rop = v; > > > - > > > - refcnt_rele_wake(&rop->rop_refcnt); > > > -} > > > - > > > int > > > route_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf > > > *nam, > > > struct mbuf *control, struct proc *p) > > > @@ -325,8 +305,7 @@ route_attach(struct socket *so, int prot > > > so->so_options |= SO_USELOOPBACK; > > > > > > rw_enter(&rtptable.rtp_lk, RW_WRITE); > > > - SRPL_INSERT_HEAD_LOCKED(&rtptable.rtp_rc, &rtptable.rtp_list, rop, > > > - rop_list); > > > + SMR_TAILQ_INSERT_HEAD_LOCKED(&rtptable.rtp_list, rop, rop_list); > > > rtptable.rtp_count++; > > > rw_exit(&rtptable.rtp_lk); > > > > > > @@ -347,8 +326,7 @@ route_detach(struct socket *so) > > > rw_enter(&rtptable.rtp_lk, RW_WRITE); > > > > > > rtptable.rtp_count--; > > > - SRPL_REMOVE_LOCKED(&rtptable.rtp_rc, &rtptable.rtp_list, rop, rtpcb, > > > - rop_list); > > > + SMR_TAILQ_REMOVE_LOCKED(&rtptable.rtp_list, rop, rop_list); > > > rw_exit(&rtptable.rtp_lk); > > > > > > sounlock(so); > > > @@ -356,6 +334,7 @@ route_detach(struct socket *so) > > > /* wait for all references to drop */ > > > refcnt_finalize(&rop->rop_refcnt, "rtsockrefs"); > >
Re: Use SMR instead of SRP list in rtsock.c
On Thu, Jun 30, 2022 at 12:34:33PM +0300, Vitaliy Makkoveev wrote: > On Thu, Jun 30, 2022 at 11:08:48AM +0200, Claudio Jeker wrote: > > This diff converts the SRP list to a SMR list in rtsock.c > > SRP is a bit strange with how it works and the SMR code is a bit easier to > > understand. Since we can sleep in the SMR_TAILQ_FOREACH() we need to grab > > a refcount on the route pcb so that we can leave the SMR critical section > > and then enter the SMR critical section at the end of the loop before > > dropping the refcount again. > > > > The diff does not immeditaly explode but I doubt we can exploit > > parallelism in route_input() so this may fail at some later stage if it is > > wrong. > > > > Comments from the lock critics welcome > > We use `so_lock' rwlock(9) to protect route domain sockets. We can't > convert this SRP list to SMR list because we call solock() within > foreach loop. because of the so_lock the code uses a refcnt on the route pcb to make sure that the object is not freed while we sleep. So that is handled by this diff. > We can easily crash kernel by running in parallel some "route monitor" > commands and "while true; ifconfig vether0 create ; ifconfig vether0 > destroy; done". That does not cause problem on my system. > > -- > > :wq Claudio > > > > Index: sys/net/rtsock.c > > === > > RCS file: /cvs/src/sys/net/rtsock.c,v > > retrieving revision 1.334 > > diff -u -p -r1.334 rtsock.c > > --- sys/net/rtsock.c28 Jun 2022 10:01:13 - 1.334 > > +++ sys/net/rtsock.c30 Jun 2022 08:02:09 - > > @@ -71,7 +71,7 @@ > > #include > > #include > > #include > > -#include > > +#include > > > > #include > > #include > > @@ -107,8 +107,6 @@ struct walkarg { > > }; > > > > void route_prinit(void); > > -void rcb_ref(void *, void *); > > -void rcb_unref(void *, void *); > > introute_output(struct mbuf *, struct socket *, struct sockaddr *, > > struct mbuf *); > > introute_ctloutput(int, struct socket *, int, int, struct mbuf *); > > @@ -149,7 +147,7 @@ int rt_setsource(unsigned int, struct > > struct rtpcb { > > struct socket *rop_socket;/* [I] */ > > > > - SRPL_ENTRY(rtpcb) rop_list; > > + SMR_TAILQ_ENTRY(rtpcb) rop_list; > > struct refcnt rop_refcnt; > > struct timeout rop_timeout; > > unsigned introp_msgfilter; /* [s] */ > > @@ -162,8 +160,7 @@ struct rtpcb { > > #definesotortpcb(so) ((struct rtpcb *)(so)->so_pcb) > > > > struct rtptable { > > - SRPL_HEAD(, rtpcb) rtp_list; > > - struct srpl_rc rtp_rc; > > + SMR_TAILQ_HEAD(, rtpcb) rtp_list; > > struct rwlock rtp_lk; > > unsigned intrtp_count; > > }; > > @@ -185,29 +182,12 @@ struct rtptable rtptable; > > void > > route_prinit(void) > > { > > - srpl_rc_init(&rtptable.rtp_rc, rcb_ref, rcb_unref, NULL); > > rw_init(&rtptable.rtp_lk, "rtsock"); > > - SRPL_INIT(&rtptable.rtp_list); > > + SMR_TAILQ_INIT(&rtptable.rtp_list); > > pool_init(&rtpcb_pool, sizeof(struct rtpcb), 0, > > IPL_SOFTNET, PR_WAITOK, "rtpcb", NULL); > > } > > > > -void > > -rcb_ref(void *null, void *v) > > -{ > > - struct rtpcb *rop = v; > > - > > - refcnt_take(&rop->rop_refcnt); > > -} > > - > > -void > > -rcb_unref(void *null, void *v) > > -{ > > - struct rtpcb *rop = v; > > - > > - refcnt_rele_wake(&rop->rop_refcnt); > > -} > > - > > int > > route_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *nam, > > struct mbuf *control, struct proc *p) > > @@ -325,8 +305,7 @@ route_attach(struct socket *so, int prot > > so->so_options |= SO_USELOOPBACK; > > > > rw_enter(&rtptable.rtp_lk, RW_WRITE); > > - SRPL_INSERT_HEAD_LOCKED(&rtptable.rtp_rc, &rtptable.rtp_list, rop, > > - rop_list); > > + SMR_TAILQ_INSERT_HEAD_LOCKED(&rtptable.rtp_list, rop, rop_list); > > rtptable.rtp_count++; > > rw_exit(&rtptable.rtp_lk); > > > > @@ -347,8 +326,7 @@ route_detach(struct socket *so) > > rw_enter(&rtptable.rtp_lk, RW_WRITE); > > > > rtptable.rtp_count--; > > - SRPL_REMOVE_LOCKED(&rtptable.rtp_rc, &rtptable.rtp_list, rop, rtpcb, > > - rop_list); > > + SMR_TAILQ_REMOVE_LOCKED(&rtptable.rtp_list, rop, rop_list); > > rw_exit(&rtptable.rtp_lk); > > > > sounlock(so); > > @@ -356,6 +334,7 @@ route_detach(struct socket *so) > > /* wait for all references to drop */ > > refcnt_finalize(&rop->rop_refcnt, "rtsockrefs"); > > timeout_del_barrier(&rop->rop_timeout); > > + smr_barrier(); > > > > solock(so); > > > > @@ -501,7 +480,6 @@ route_input(struct mbuf *m0, struct sock > > struct rtpcb *rop; > > struct rt_msghdr *rtm; > > struct mbuf *m = m0; > > - struct srp_ref sr; > > > > /* ensure that we can access the rtm_type via mtod() */ > >
Re: Use SMR instead of SRP list in rtsock.c
On Thu, Jun 30, 2022 at 11:08:48AM +0200, Claudio Jeker wrote: > This diff converts the SRP list to a SMR list in rtsock.c > SRP is a bit strange with how it works and the SMR code is a bit easier to > understand. Since we can sleep in the SMR_TAILQ_FOREACH() we need to grab > a refcount on the route pcb so that we can leave the SMR critical section > and then enter the SMR critical section at the end of the loop before > dropping the refcount again. > > The diff does not immeditaly explode but I doubt we can exploit > parallelism in route_input() so this may fail at some later stage if it is > wrong. > > Comments from the lock critics welcome After discussing this with mpi@ and jmatthew@ we came to the conclusion that we need to smr_barrier() before refcnt_finalize() to ensure that no other CPU is between the SMR_TAILQ_FOREACH, refcnt_take() and smr_read_leave(). Updated diff below -- :wq Claudio Index: net/rtsock.c === RCS file: /cvs/src/sys/net/rtsock.c,v retrieving revision 1.334 diff -u -p -r1.334 rtsock.c --- net/rtsock.c28 Jun 2022 10:01:13 - 1.334 +++ net/rtsock.c30 Jun 2022 09:25:53 - @@ -71,7 +71,7 @@ #include #include #include -#include +#include #include #include @@ -107,8 +107,6 @@ struct walkarg { }; void route_prinit(void); -void rcb_ref(void *, void *); -void rcb_unref(void *, void *); introute_output(struct mbuf *, struct socket *, struct sockaddr *, struct mbuf *); introute_ctloutput(int, struct socket *, int, int, struct mbuf *); @@ -149,7 +147,7 @@ int rt_setsource(unsigned int, struct struct rtpcb { struct socket *rop_socket;/* [I] */ - SRPL_ENTRY(rtpcb) rop_list; + SMR_TAILQ_ENTRY(rtpcb) rop_list; struct refcnt rop_refcnt; struct timeout rop_timeout; unsigned introp_msgfilter; /* [s] */ @@ -162,8 +160,7 @@ struct rtpcb { #definesotortpcb(so) ((struct rtpcb *)(so)->so_pcb) struct rtptable { - SRPL_HEAD(, rtpcb) rtp_list; - struct srpl_rc rtp_rc; + SMR_TAILQ_HEAD(, rtpcb) rtp_list; struct rwlock rtp_lk; unsigned intrtp_count; }; @@ -185,29 +182,12 @@ struct rtptable rtptable; void route_prinit(void) { - srpl_rc_init(&rtptable.rtp_rc, rcb_ref, rcb_unref, NULL); rw_init(&rtptable.rtp_lk, "rtsock"); - SRPL_INIT(&rtptable.rtp_list); + SMR_TAILQ_INIT(&rtptable.rtp_list); pool_init(&rtpcb_pool, sizeof(struct rtpcb), 0, IPL_SOFTNET, PR_WAITOK, "rtpcb", NULL); } -void -rcb_ref(void *null, void *v) -{ - struct rtpcb *rop = v; - - refcnt_take(&rop->rop_refcnt); -} - -void -rcb_unref(void *null, void *v) -{ - struct rtpcb *rop = v; - - refcnt_rele_wake(&rop->rop_refcnt); -} - int route_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *nam, struct mbuf *control, struct proc *p) @@ -325,8 +305,7 @@ route_attach(struct socket *so, int prot so->so_options |= SO_USELOOPBACK; rw_enter(&rtptable.rtp_lk, RW_WRITE); - SRPL_INSERT_HEAD_LOCKED(&rtptable.rtp_rc, &rtptable.rtp_list, rop, - rop_list); + SMR_TAILQ_INSERT_HEAD_LOCKED(&rtptable.rtp_list, rop, rop_list); rtptable.rtp_count++; rw_exit(&rtptable.rtp_lk); @@ -347,13 +326,13 @@ route_detach(struct socket *so) rw_enter(&rtptable.rtp_lk, RW_WRITE); rtptable.rtp_count--; - SRPL_REMOVE_LOCKED(&rtptable.rtp_rc, &rtptable.rtp_list, rop, rtpcb, - rop_list); + SMR_TAILQ_REMOVE_LOCKED(&rtptable.rtp_list, rop, rop_list); rw_exit(&rtptable.rtp_lk); sounlock(so); /* wait for all references to drop */ + smr_barrier(); refcnt_finalize(&rop->rop_refcnt, "rtsockrefs"); timeout_del_barrier(&rop->rop_timeout); @@ -501,7 +480,6 @@ route_input(struct mbuf *m0, struct sock struct rtpcb *rop; struct rt_msghdr *rtm; struct mbuf *m = m0; - struct srp_ref sr; /* ensure that we can access the rtm_type via mtod() */ if (m->m_len < offsetof(struct rt_msghdr, rtm_type) + 1) { @@ -509,7 +487,8 @@ route_input(struct mbuf *m0, struct sock return; } - SRPL_FOREACH(rop, &sr, &rtptable.rtp_list, rop_list) { + smr_read_enter(); + SMR_TAILQ_FOREACH(rop, &rtptable.rtp_list, rop_list) { /* * If route socket is bound to an address family only send * messages that match the address family. Address family @@ -519,7 +498,8 @@ route_input(struct mbuf *m0, struct sock rop->rop_proto != sa_family) continue; - + refcnt_take(&rop->rop_refcnt); + smr_read_leave(); so = rop->rop_soc
Re: Use SMR instead of SRP list in rtsock.c
On Thu, Jun 30, 2022 at 11:08:48AM +0200, Claudio Jeker wrote: > This diff converts the SRP list to a SMR list in rtsock.c > SRP is a bit strange with how it works and the SMR code is a bit easier to > understand. Since we can sleep in the SMR_TAILQ_FOREACH() we need to grab > a refcount on the route pcb so that we can leave the SMR critical section > and then enter the SMR critical section at the end of the loop before > dropping the refcount again. > > The diff does not immeditaly explode but I doubt we can exploit > parallelism in route_input() so this may fail at some later stage if it is > wrong. > > Comments from the lock critics welcome We use `so_lock' rwlock(9) to protect route domain sockets. We can't convert this SRP list to SMR list because we call solock() within foreach loop. We can easily crash kernel by running in parallel some "route monitor" commands and "while true; ifconfig vether0 create ; ifconfig vether0 destroy; done". > -- > :wq Claudio > > Index: sys/net/rtsock.c > === > RCS file: /cvs/src/sys/net/rtsock.c,v > retrieving revision 1.334 > diff -u -p -r1.334 rtsock.c > --- sys/net/rtsock.c 28 Jun 2022 10:01:13 - 1.334 > +++ sys/net/rtsock.c 30 Jun 2022 08:02:09 - > @@ -71,7 +71,7 @@ > #include > #include > #include > -#include > +#include > > #include > #include > @@ -107,8 +107,6 @@ struct walkarg { > }; > > void route_prinit(void); > -void rcb_ref(void *, void *); > -void rcb_unref(void *, void *); > int route_output(struct mbuf *, struct socket *, struct sockaddr *, > struct mbuf *); > int route_ctloutput(int, struct socket *, int, int, struct mbuf *); > @@ -149,7 +147,7 @@ intrt_setsource(unsigned int, struct > struct rtpcb { > struct socket *rop_socket;/* [I] */ > > - SRPL_ENTRY(rtpcb) rop_list; > + SMR_TAILQ_ENTRY(rtpcb) rop_list; > struct refcnt rop_refcnt; > struct timeout rop_timeout; > unsigned introp_msgfilter; /* [s] */ > @@ -162,8 +160,7 @@ struct rtpcb { > #define sotortpcb(so) ((struct rtpcb *)(so)->so_pcb) > > struct rtptable { > - SRPL_HEAD(, rtpcb) rtp_list; > - struct srpl_rc rtp_rc; > + SMR_TAILQ_HEAD(, rtpcb) rtp_list; > struct rwlock rtp_lk; > unsigned intrtp_count; > }; > @@ -185,29 +182,12 @@ struct rtptable rtptable; > void > route_prinit(void) > { > - srpl_rc_init(&rtptable.rtp_rc, rcb_ref, rcb_unref, NULL); > rw_init(&rtptable.rtp_lk, "rtsock"); > - SRPL_INIT(&rtptable.rtp_list); > + SMR_TAILQ_INIT(&rtptable.rtp_list); > pool_init(&rtpcb_pool, sizeof(struct rtpcb), 0, > IPL_SOFTNET, PR_WAITOK, "rtpcb", NULL); > } > > -void > -rcb_ref(void *null, void *v) > -{ > - struct rtpcb *rop = v; > - > - refcnt_take(&rop->rop_refcnt); > -} > - > -void > -rcb_unref(void *null, void *v) > -{ > - struct rtpcb *rop = v; > - > - refcnt_rele_wake(&rop->rop_refcnt); > -} > - > int > route_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *nam, > struct mbuf *control, struct proc *p) > @@ -325,8 +305,7 @@ route_attach(struct socket *so, int prot > so->so_options |= SO_USELOOPBACK; > > rw_enter(&rtptable.rtp_lk, RW_WRITE); > - SRPL_INSERT_HEAD_LOCKED(&rtptable.rtp_rc, &rtptable.rtp_list, rop, > - rop_list); > + SMR_TAILQ_INSERT_HEAD_LOCKED(&rtptable.rtp_list, rop, rop_list); > rtptable.rtp_count++; > rw_exit(&rtptable.rtp_lk); > > @@ -347,8 +326,7 @@ route_detach(struct socket *so) > rw_enter(&rtptable.rtp_lk, RW_WRITE); > > rtptable.rtp_count--; > - SRPL_REMOVE_LOCKED(&rtptable.rtp_rc, &rtptable.rtp_list, rop, rtpcb, > - rop_list); > + SMR_TAILQ_REMOVE_LOCKED(&rtptable.rtp_list, rop, rop_list); > rw_exit(&rtptable.rtp_lk); > > sounlock(so); > @@ -356,6 +334,7 @@ route_detach(struct socket *so) > /* wait for all references to drop */ > refcnt_finalize(&rop->rop_refcnt, "rtsockrefs"); > timeout_del_barrier(&rop->rop_timeout); > + smr_barrier(); > > solock(so); > > @@ -501,7 +480,6 @@ route_input(struct mbuf *m0, struct sock > struct rtpcb *rop; > struct rt_msghdr *rtm; > struct mbuf *m = m0; > - struct srp_ref sr; > > /* ensure that we can access the rtm_type via mtod() */ > if (m->m_len < offsetof(struct rt_msghdr, rtm_type) + 1) { > @@ -509,7 +487,8 @@ route_input(struct mbuf *m0, struct sock > return; > } > > - SRPL_FOREACH(rop, &sr, &rtptable.rtp_list, rop_list) { > + smr_read_enter(); > + SMR_TAILQ_FOREACH(rop, &rtptable.rtp_list, rop_list) { > /* >* If route socket is bound to an address family only send >* messages that match the address family. Address famil
Use SMR instead of SRP list in rtsock.c
This diff converts the SRP list to a SMR list in rtsock.c SRP is a bit strange with how it works and the SMR code is a bit easier to understand. Since we can sleep in the SMR_TAILQ_FOREACH() we need to grab a refcount on the route pcb so that we can leave the SMR critical section and then enter the SMR critical section at the end of the loop before dropping the refcount again. The diff does not immeditaly explode but I doubt we can exploit parallelism in route_input() so this may fail at some later stage if it is wrong. Comments from the lock critics welcome -- :wq Claudio Index: sys/net/rtsock.c === RCS file: /cvs/src/sys/net/rtsock.c,v retrieving revision 1.334 diff -u -p -r1.334 rtsock.c --- sys/net/rtsock.c28 Jun 2022 10:01:13 - 1.334 +++ sys/net/rtsock.c30 Jun 2022 08:02:09 - @@ -71,7 +71,7 @@ #include #include #include -#include +#include #include #include @@ -107,8 +107,6 @@ struct walkarg { }; void route_prinit(void); -void rcb_ref(void *, void *); -void rcb_unref(void *, void *); introute_output(struct mbuf *, struct socket *, struct sockaddr *, struct mbuf *); introute_ctloutput(int, struct socket *, int, int, struct mbuf *); @@ -149,7 +147,7 @@ int rt_setsource(unsigned int, struct struct rtpcb { struct socket *rop_socket;/* [I] */ - SRPL_ENTRY(rtpcb) rop_list; + SMR_TAILQ_ENTRY(rtpcb) rop_list; struct refcnt rop_refcnt; struct timeout rop_timeout; unsigned introp_msgfilter; /* [s] */ @@ -162,8 +160,7 @@ struct rtpcb { #definesotortpcb(so) ((struct rtpcb *)(so)->so_pcb) struct rtptable { - SRPL_HEAD(, rtpcb) rtp_list; - struct srpl_rc rtp_rc; + SMR_TAILQ_HEAD(, rtpcb) rtp_list; struct rwlock rtp_lk; unsigned intrtp_count; }; @@ -185,29 +182,12 @@ struct rtptable rtptable; void route_prinit(void) { - srpl_rc_init(&rtptable.rtp_rc, rcb_ref, rcb_unref, NULL); rw_init(&rtptable.rtp_lk, "rtsock"); - SRPL_INIT(&rtptable.rtp_list); + SMR_TAILQ_INIT(&rtptable.rtp_list); pool_init(&rtpcb_pool, sizeof(struct rtpcb), 0, IPL_SOFTNET, PR_WAITOK, "rtpcb", NULL); } -void -rcb_ref(void *null, void *v) -{ - struct rtpcb *rop = v; - - refcnt_take(&rop->rop_refcnt); -} - -void -rcb_unref(void *null, void *v) -{ - struct rtpcb *rop = v; - - refcnt_rele_wake(&rop->rop_refcnt); -} - int route_usrreq(struct socket *so, int req, struct mbuf *m, struct mbuf *nam, struct mbuf *control, struct proc *p) @@ -325,8 +305,7 @@ route_attach(struct socket *so, int prot so->so_options |= SO_USELOOPBACK; rw_enter(&rtptable.rtp_lk, RW_WRITE); - SRPL_INSERT_HEAD_LOCKED(&rtptable.rtp_rc, &rtptable.rtp_list, rop, - rop_list); + SMR_TAILQ_INSERT_HEAD_LOCKED(&rtptable.rtp_list, rop, rop_list); rtptable.rtp_count++; rw_exit(&rtptable.rtp_lk); @@ -347,8 +326,7 @@ route_detach(struct socket *so) rw_enter(&rtptable.rtp_lk, RW_WRITE); rtptable.rtp_count--; - SRPL_REMOVE_LOCKED(&rtptable.rtp_rc, &rtptable.rtp_list, rop, rtpcb, - rop_list); + SMR_TAILQ_REMOVE_LOCKED(&rtptable.rtp_list, rop, rop_list); rw_exit(&rtptable.rtp_lk); sounlock(so); @@ -356,6 +334,7 @@ route_detach(struct socket *so) /* wait for all references to drop */ refcnt_finalize(&rop->rop_refcnt, "rtsockrefs"); timeout_del_barrier(&rop->rop_timeout); + smr_barrier(); solock(so); @@ -501,7 +480,6 @@ route_input(struct mbuf *m0, struct sock struct rtpcb *rop; struct rt_msghdr *rtm; struct mbuf *m = m0; - struct srp_ref sr; /* ensure that we can access the rtm_type via mtod() */ if (m->m_len < offsetof(struct rt_msghdr, rtm_type) + 1) { @@ -509,7 +487,8 @@ route_input(struct mbuf *m0, struct sock return; } - SRPL_FOREACH(rop, &sr, &rtptable.rtp_list, rop_list) { + smr_read_enter(); + SMR_TAILQ_FOREACH(rop, &rtptable.rtp_list, rop_list) { /* * If route socket is bound to an address family only send * messages that match the address family. Address family @@ -519,7 +498,8 @@ route_input(struct mbuf *m0, struct sock rop->rop_proto != sa_family) continue; - + refcnt_take(&rop->rop_refcnt); + smr_read_leave(); so = rop->rop_socket; solock(so); @@ -579,8 +559,10 @@ route_input(struct mbuf *m0, struct sock rtm_sendup(so, m); next: sounlock(so); + smr_read_enter(); + refcnt_rele_wake(&rop->rop_refcnt);