Re: RFC: mpsafe bridge and NIC drivers (vioif and wm)
Hi, One more thing. http://www.netbsd.org/~ozaki-r/wm-split-mutex.diff This patch splits the mutex of wm into two: one for tx and the other for rx. By doing so, lock contentions can be reduced. We lock both for other operations that need locking, e.g., init, stop and ioctl. I didn't do it before because I saw performance degradation by doing so. However, I tested again and confirmed that there is no visible degradation. Any comments? ozaki-r
Re: RFC: mpsafe bridge and NIC drivers (vioif and wm)
Hi all, http://www.netbsd.org/~ozaki-r/mpsafe-vioif.diff https://github.com/ozaki-r/netbsd-src/commits/experimental/mpsafe-vioif This is the last piece of my MPSAFE series and it makes if_vioif MPSAFE. The modifications are mostly same as what I did for if_wm. One point I should note here is that sc_flags is added to virtio_softc, which is used to tell that the interrupt handler of vioif is MPSAFE and the others (viomb and ld_virtio) are not. As usual, the MPSAFE feature is off by default. You can enable it by setting NET_MPSAFE in if.h or your kernel config file. Any comments are appreciated. Regards, ozaki-r
Re: RFC: mpsafe bridge and NIC drivers (vioif and wm)
No comment or objection? Then, I'll commit the patch tomorrow. ozaki-r On Wed, Jul 9, 2014 at 12:55 PM, Ryota Ozaki wrote: > On Tue, Jul 8, 2014 at 12:54 PM, Ryota Ozaki wrote: >> Hi, >> >> A new patch has come: http://www.netbsd.org/~ozaki-r/mpsafe-bridge.diff > > I confirmed the patch doesn't add new failures > in both NET_MPSAFE and non-NET_MPSAFE cases. > > ozaki-r > >> >> The patch makes bridge forwarding MPSAFE. As same as wm, >> it introduces BRIDGE_MPSAFE to switch MPSAFE and non-MPSAFE >> codes. However, in the case of bridge, some locking codes >> are always enabled to reduce ifdef switches. I think it's >> not a problem because the codes are not performance critical. >> And also some splnet are still there for the same reason. >> >> Another note is about bif (bridge member list entry) object >> reference counting. It enables fine-grain locking for bridge >> member lists by allowing to not hold a lock during touching >> a bif. In order to do so, bridge_release_member is added >> to decrement the reference count and a condition variable >> to do bridge_delete_member graceful. >> >> You can try the patch with MPSAFE enabled by defining >> NET_MPSAFE in if.h or your kernel config file. If your >> machine has Intel 1G NICs (wm), by applying a patch(*), >> you can see bridge_forward running in parallel. >> >> (*) >> https://github.com/ozaki-r/netbsd-src/commit/2879f184e336376574c7a07d9ab34d9d55449a7b >> >> Have fun, >> ozaki-r
Re: RFC: mpsafe bridge and NIC drivers (vioif and wm)
On Tue, Jul 8, 2014 at 12:54 PM, Ryota Ozaki wrote: > Hi, > > A new patch has come: http://www.netbsd.org/~ozaki-r/mpsafe-bridge.diff I confirmed the patch doesn't add new failures in both NET_MPSAFE and non-NET_MPSAFE cases. ozaki-r > > The patch makes bridge forwarding MPSAFE. As same as wm, > it introduces BRIDGE_MPSAFE to switch MPSAFE and non-MPSAFE > codes. However, in the case of bridge, some locking codes > are always enabled to reduce ifdef switches. I think it's > not a problem because the codes are not performance critical. > And also some splnet are still there for the same reason. > > Another note is about bif (bridge member list entry) object > reference counting. It enables fine-grain locking for bridge > member lists by allowing to not hold a lock during touching > a bif. In order to do so, bridge_release_member is added > to decrement the reference count and a condition variable > to do bridge_delete_member graceful. > > You can try the patch with MPSAFE enabled by defining > NET_MPSAFE in if.h or your kernel config file. If your > machine has Intel 1G NICs (wm), by applying a patch(*), > you can see bridge_forward running in parallel. > > (*) > https://github.com/ozaki-r/netbsd-src/commit/2879f184e336376574c7a07d9ab34d9d55449a7b > > Have fun, > ozaki-r
Re: RFC: mpsafe bridge and NIC drivers (vioif and wm)
Hi, A new patch has come: http://www.netbsd.org/~ozaki-r/mpsafe-bridge.diff The patch makes bridge forwarding MPSAFE. As same as wm, it introduces BRIDGE_MPSAFE to switch MPSAFE and non-MPSAFE codes. However, in the case of bridge, some locking codes are always enabled to reduce ifdef switches. I think it's not a problem because the codes are not performance critical. And also some splnet are still there for the same reason. Another note is about bif (bridge member list entry) object reference counting. It enables fine-grain locking for bridge member lists by allowing to not hold a lock during touching a bif. In order to do so, bridge_release_member is added to decrement the reference count and a condition variable to do bridge_delete_member graceful. You can try the patch with MPSAFE enabled by defining NET_MPSAFE in if.h or your kernel config file. If your machine has Intel 1G NICs (wm), by applying a patch(*), you can see bridge_forward running in parallel. (*) https://github.com/ozaki-r/netbsd-src/commit/2879f184e336376574c7a07d9ab34d9d55449a7b Have fun, ozaki-r
Re: RFC: mpsafe bridge and NIC drivers (vioif and wm)
On Wed, Jun 11, 2014 at 5:20 PM, Ryota Ozaki wrote: > On Tue, Jun 3, 2014 at 10:06 PM, Ryota Ozaki wrote: >> Hi Darren, >> >> On Tue, Jun 3, 2014 at 9:16 PM, Darren Reed wrote: >>> I assume this to be related to what rmind is doing, yes? >> >> Yes and no. We have a same goal (networking parallelism), but we're >> working on different components; he is mainly at L3 and above, and >> we are below L3. We collaborate a little to avoid overlapping each >> effort, but we are working separately. >> >>> >>> Is there a projects page that outlines the approach, etc? >> >> Not yet. I'll propose it our members. > > The page will be somewhere under > https://github.com/IIJ-NetBSD/netbsd-src/wiki . https://github.com/IIJ-NetBSD/netbsd-src/wiki/smpnet Here you are. Any comments are appreciated :) Regards, ozaki-r > > ozaki-r > >> >>> >>> This URL: https://github.com/ozaki-r/netbsd-src/tree/experimental/mpsafe-bridge-wm-vioif >>> isn't really very useful in a web browser. Do you have one that is? >> >> Please look at >> >> https://github.com/ozaki-r/netbsd-src/compare/master...experimental%2Fmpsafe-bridge-wm-vioif >> for a readable diff, or >> http://www.netbsd.org/~ozaki-r/mpsafe-bridge-wm-vioif.diff >> for a raw patch. >> >> Regards, >> ozaki-r >> >>> >>> Darren >>>
Re: RFC: mpsafe bridge and NIC drivers (vioif and wm)
On Mon, Jun 30, 2014 at 4:09 PM, Ryota Ozaki wrote: > Hi, > > http://www.netbsd.org/~ozaki-r/mpsafe-ifq-wm.diff > > I've updated the patch. The changes include: > - Make callouts MPSAFE > - Reduce splnet as much as possible > - Make ifconfig up/down work correctly under load > > if_wm is now MPSAFEd as much as possible at this point. I've tested the combinations of the two cases: - NET_MPSAFE or not - NEWQUEUE or not (Rangeley or KVM) All works fine for me. I'll commit the patch soon, but any comments and suggestions are still not too late! Thanks, ozaki-r > > Thanks, > ozaki-r
Re: RFC: mpsafe bridge and NIC drivers (vioif and wm)
Hi, http://www.netbsd.org/~ozaki-r/mpsafe-ifq-wm.diff I've updated the patch. The changes include: - Make callouts MPSAFE - Reduce splnet as much as possible - Make ifconfig up/down work correctly under load if_wm is now MPSAFEd as much as possible at this point. Thanks, ozaki-r
Re: RFC: mpsafe bridge and NIC drivers (vioif and wm)
On Sun, Jun 22, 2014 at 7:14 AM, Matt Thomas wrote: > > On Jun 21, 2014, at 5:56 AM, Ryota Ozaki wrote: > >> On Sat, Jun 21, 2014 at 10:00 AM, Matt Thomas wrote: >>> >>> >>> On Jun 20, 2014, at 5:57 AM, Ryota Ozaki wrote: >>> Hi, I've prepared a trial patch of MPSAFE networking. http://www.netbsd.org/~ozaki-r/mpsafe-wm.diff >>> >>> The kmutex_t in ifqueue, etc. should be pointers and not in the structure >>> themselves. >>> That can simply the macros to test for a NULL pointer for the locks in the >>> non-WM case. >>> >>> Consider using mutex_obj_alloc to get mutexes instead of the embedding them >>> in the >>> structures. >> >> Well...do you mean that the macros should be like these? >> >> #define WM_LOCK(_sc) if ((_sc)->sc_txrx_lock) >> mutex_enter((_sc)->sc_txrx_lock) >> #define WM_UNLOCK(_sc) if ((_sc)->sc_txrx_lock) >> mutex_exit((_sc)->sc_txrx_lock) > > I more thinking of the ifq macros. > I see. So I updated the patch: http://www.netbsd.org/~ozaki-r/mpsafe-wm.diff Regards, ozaki-r
Re: RFC: mpsafe bridge and NIC drivers (vioif and wm)
On Jun 21, 2014, at 9:48 PM, Darren Reed wrote: > On 22/06/2014 8:13 AM, Matt Thomas wrote: >> On Jun 21, 2014, at 4:56 AM, Darren Reed wrote: >>> On 21/06/2014 11:00 AM, Matt Thomas wrote: On Jun 20, 2014, at 5:57 AM, Ryota Ozaki wrote: > Hi, > > I've prepared a trial patch of MPSAFE networking. > > http://www.netbsd.org/~ozaki-r/mpsafe-wm.diff > The kmutex_t in ifqueue, etc. should be pointers and not in the structure themselves. That can simply the macros to test for a NULL pointer for the locks in the non-WM case. Consider using mutex_obj_alloc to get mutexes instead of the embedding them in the structures. >>> >>> This coding pattern goes against what is done almost everywhere else. >>> >>> What's the rationale behind it? >> >> That's not true. uvm_object's, sockets, etc. all use external kmutex's. >> In this isntance It allows for the ifqueue to use device kmutex is >> there is one. Also, mutexes should be cacheline aligned and that is >> difficult to do in a structure where mutex_obj_alloc does that >> autmoatuiccally. > > Ah, I didn't know that this type of approach was used elsewhere in > NetBSD. Are there any compiler hints that can be embedded in structs > that will result in padding and correct alignment? My solution has > been to always put the lock(s) at the front of a structure. > Can __attribute__((aligned(X)) be used appropriately here or is it > likely to be ignored in the middle of a structure? The problem is that the memory allocators won't honor it since they will return a maximum alignment of ALIGNBYTES+1, where COHERENCY_UNIT can be much higher than that.
Re: RFC: mpsafe bridge and NIC drivers (vioif and wm)
On 22/06/2014 8:13 AM, Matt Thomas wrote: > On Jun 21, 2014, at 4:56 AM, Darren Reed wrote: >> On 21/06/2014 11:00 AM, Matt Thomas wrote: >>> On Jun 20, 2014, at 5:57 AM, Ryota Ozaki wrote: >>> Hi, I've prepared a trial patch of MPSAFE networking. http://www.netbsd.org/~ozaki-r/mpsafe-wm.diff >>> The kmutex_t in ifqueue, etc. should be pointers and not in the structure >>> themselves. >>> That can simply the macros to test for a NULL pointer for the locks in the >>> non-WM case. >>> >>> Consider using mutex_obj_alloc to get mutexes instead of the embedding them >>> in the >>> structures. >> >> This coding pattern goes against what is done almost everywhere else. >> >> What's the rationale behind it? > > That's not true. uvm_object's, sockets, etc. all use external kmutex's. > In this isntance It allows for the ifqueue to use device kmutex is > there is one. Also, mutexes should be cacheline aligned and that is > difficult to do in a structure where mutex_obj_alloc does that > autmoatuiccally. Ah, I didn't know that this type of approach was used elsewhere in NetBSD. Are there any compiler hints that can be embedded in structs that will result in padding and correct alignment? My solution has been to always put the lock(s) at the front of a structure. Can __attribute__((aligned(X)) be used appropriately here or is it likely to be ignored in the middle of a structure? Darren
Re: RFC: mpsafe bridge and NIC drivers (vioif and wm)
On Jun 21, 2014, at 5:56 AM, Ryota Ozaki wrote: > On Sat, Jun 21, 2014 at 10:00 AM, Matt Thomas wrote: >> >> >> On Jun 20, 2014, at 5:57 AM, Ryota Ozaki wrote: >> >>> Hi, >>> >>> I've prepared a trial patch of MPSAFE networking. >>> >>> http://www.netbsd.org/~ozaki-r/mpsafe-wm.diff >>> >> >> The kmutex_t in ifqueue, etc. should be pointers and not in the structure >> themselves. >> That can simply the macros to test for a NULL pointer for the locks in the >> non-WM case. >> >> Consider using mutex_obj_alloc to get mutexes instead of the embedding them >> in the >> structures. > > Well...do you mean that the macros should be like these? > > #define WM_LOCK(_sc) if ((_sc)->sc_txrx_lock) > mutex_enter((_sc)->sc_txrx_lock) > #define WM_UNLOCK(_sc) if ((_sc)->sc_txrx_lock) > mutex_exit((_sc)->sc_txrx_lock) I more thinking of the ifq macros.
Re: RFC: mpsafe bridge and NIC drivers (vioif and wm)
On Jun 21, 2014, at 4:56 AM, Darren Reed wrote: > On 21/06/2014 11:00 AM, Matt Thomas wrote: >> On Jun 20, 2014, at 5:57 AM, Ryota Ozaki wrote: >> >>> Hi, >>> >>> I've prepared a trial patch of MPSAFE networking. >>> >>> http://www.netbsd.org/~ozaki-r/mpsafe-wm.diff >>> >> The kmutex_t in ifqueue, etc. should be pointers and not in the structure >> themselves. >> That can simply the macros to test for a NULL pointer for the locks in the >> non-WM case. >> >> Consider using mutex_obj_alloc to get mutexes instead of the embedding them >> in the >> structures. > > This coding pattern goes against what is done almost everywhere else. > > What's the rationale behind it? That's not true. uvm_object's, sockets, etc. all use external kmutex's. In this isntance It allows for the ifqueue to use device kmutex is there is one. Also, mutexes should be cacheline aligned and that is difficult to do in a structure where mutex_obj_alloc does that autmoatuiccally.
Re: RFC: mpsafe bridge and NIC drivers (vioif and wm)
On Sat, Jun 21, 2014 at 10:00 AM, Matt Thomas wrote: > > > On Jun 20, 2014, at 5:57 AM, Ryota Ozaki wrote: > > > Hi, > > > > I've prepared a trial patch of MPSAFE networking. > > > > http://www.netbsd.org/~ozaki-r/mpsafe-wm.diff > > > > The kmutex_t in ifqueue, etc. should be pointers and not in the structure > themselves. > That can simply the macros to test for a NULL pointer for the locks in the > non-WM case. > > Consider using mutex_obj_alloc to get mutexes instead of the embedding them > in the > structures. Well...do you mean that the macros should be like these? #define WM_LOCK(_sc) if ((_sc)->sc_txrx_lock) mutex_enter((_sc)->sc_txrx_lock) #define WM_UNLOCK(_sc) if ((_sc)->sc_txrx_lock) mutex_exit((_sc)->sc_txrx_lock) > > > Some of your macros are missing 'do's :) Sure :) Thanks! ozaki-r > > > It enables the interrupt handler of if_wm to run without > > KERNEL_LOCK; an interrupt context and a LWP context (e.g., > > wm_start) run in parallel safely. > > > > You can try it by applying the patch to -current > > and commenting in NET_MPSAFE in sys/net/if.h. > > > > A complete patch of my work can be found at usual places: > > - http://www.netbsd.org/~ozaki-r/mpsafe-bridge-wm-vioif.diff > > - > > https://github.com/ozaki-r/netbsd-src/tree/experimental/mpsafe-bridge-wm-vioif > > > > ozaki-r >
Re: RFC: mpsafe bridge and NIC drivers (vioif and wm)
On 21/06/2014 11:00 AM, Matt Thomas wrote: > On Jun 20, 2014, at 5:57 AM, Ryota Ozaki wrote: > >> Hi, >> >> I've prepared a trial patch of MPSAFE networking. >> >> http://www.netbsd.org/~ozaki-r/mpsafe-wm.diff >> > The kmutex_t in ifqueue, etc. should be pointers and not in the structure > themselves. > That can simply the macros to test for a NULL pointer for the locks in the > non-WM case. > > Consider using mutex_obj_alloc to get mutexes instead of the embedding them > in the > structures. This coding pattern goes against what is done almost everywhere else. What's the rationale behind it? Darren
Re: RFC: mpsafe bridge and NIC drivers (vioif and wm)
On Jun 20, 2014, at 5:57 AM, Ryota Ozaki wrote: > Hi, > > I've prepared a trial patch of MPSAFE networking. > > http://www.netbsd.org/~ozaki-r/mpsafe-wm.diff > The kmutex_t in ifqueue, etc. should be pointers and not in the structure themselves. That can simply the macros to test for a NULL pointer for the locks in the non-WM case. Consider using mutex_obj_alloc to get mutexes instead of the embedding them in the structures. Some of your macros are missing 'do's :) > It enables the interrupt handler of if_wm to run without > KERNEL_LOCK; an interrupt context and a LWP context (e.g., > wm_start) run in parallel safely. > > You can try it by applying the patch to -current > and commenting in NET_MPSAFE in sys/net/if.h. > > A complete patch of my work can be found at usual places: > - http://www.netbsd.org/~ozaki-r/mpsafe-bridge-wm-vioif.diff > - > https://github.com/ozaki-r/netbsd-src/tree/experimental/mpsafe-bridge-wm-vioif > > ozaki-r
Re: RFC: mpsafe bridge and NIC drivers (vioif and wm)
Hi, I've prepared a trial patch of MPSAFE networking. http://www.netbsd.org/~ozaki-r/mpsafe-wm.diff It enables the interrupt handler of if_wm to run without KERNEL_LOCK; an interrupt context and a LWP context (e.g., wm_start) run in parallel safely. You can try it by applying the patch to -current and commenting in NET_MPSAFE in sys/net/if.h. A complete patch of my work can be found at usual places: - http://www.netbsd.org/~ozaki-r/mpsafe-bridge-wm-vioif.diff - https://github.com/ozaki-r/netbsd-src/tree/experimental/mpsafe-bridge-wm-vioif ozaki-r
Re: RFC: mpsafe bridge and NIC drivers (vioif and wm)
On Tue, Jun 3, 2014 at 10:06 PM, Ryota Ozaki wrote: > Hi Darren, > > On Tue, Jun 3, 2014 at 9:16 PM, Darren Reed wrote: >> I assume this to be related to what rmind is doing, yes? > > Yes and no. We have a same goal (networking parallelism), but we're > working on different components; he is mainly at L3 and above, and > we are below L3. We collaborate a little to avoid overlapping each > effort, but we are working separately. > >> >> Is there a projects page that outlines the approach, etc? > > Not yet. I'll propose it our members. The page will be somewhere under https://github.com/IIJ-NetBSD/netbsd-src/wiki . ozaki-r > >> >> This URL: >>> >>> https://github.com/ozaki-r/netbsd-src/tree/experimental/mpsafe-bridge-wm-vioif >> isn't really very useful in a web browser. Do you have one that is? > > Please look at > > https://github.com/ozaki-r/netbsd-src/compare/master...experimental%2Fmpsafe-bridge-wm-vioif > for a readable diff, or > http://www.netbsd.org/~ozaki-r/mpsafe-bridge-wm-vioif.diff > for a raw patch. > > Regards, > ozaki-r > >> >> Darren >>
Re: RFC: mpsafe bridge and NIC drivers (vioif and wm)
On Wed, Jun 4, 2014 at 12:23 AM, Darren Reed wrote: > On 3/06/2014 11:06 PM, Ryota Ozaki wrote: >> >> Hi Darren, >> >> On Tue, Jun 3, 2014 at 9:16 PM, Darren Reed wrote: >>> >>> I assume this to be related to what rmind is doing, yes? >> >> Yes and no. We have a same goal (networking parallelism), but we're >> working on different components; he is mainly at L3 and above, and >> we are below L3. We collaborate a little to avoid overlapping each >> effort, but we are working separately. >> > > In so much as it is possible, I would suggest following on what rmind > is doing to make the code paths lock free as much as possible ... or are > you introducing locks that he's removing? For example, looking at the > code, you're adding locks around IFQ_DEQUEUE but he's removing that > macro and locks ... or? I think there are two steps to make the networking code MP-capable. The first step is to remove big kernel locks and make it work with some fine-grain locks (MP-safe), and the second is to make it scalable, for example, by employing lockless data structures. We are still on the first step while rmind is on the second. I think we can use rmind's pktqueue for bridge's forwarding queues to get rid of the locks around IFQ_DEQUEUE. Once pktqueue is merged, I will try it for bridge. > > Are you all committing to the same branch/tree? Not yet, but we (IIJ guys) plan to prepare an integrated branch for convenience to test. ozaki-r > > Darren >
Re: RFC: mpsafe bridge and NIC drivers (vioif and wm)
On 3/06/2014 11:06 PM, Ryota Ozaki wrote: Hi Darren, On Tue, Jun 3, 2014 at 9:16 PM, Darren Reed wrote: I assume this to be related to what rmind is doing, yes? Yes and no. We have a same goal (networking parallelism), but we're working on different components; he is mainly at L3 and above, and we are below L3. We collaborate a little to avoid overlapping each effort, but we are working separately. In so much as it is possible, I would suggest following on what rmind is doing to make the code paths lock free as much as possible ... or are you introducing locks that he's removing? For example, looking at the code, you're adding locks around IFQ_DEQUEUE but he's removing that macro and locks ... or? Are you all committing to the same branch/tree? Darren
Re: RFC: mpsafe bridge and NIC drivers (vioif and wm)
Hi Darren, On Tue, Jun 3, 2014 at 9:16 PM, Darren Reed wrote: > I assume this to be related to what rmind is doing, yes? Yes and no. We have a same goal (networking parallelism), but we're working on different components; he is mainly at L3 and above, and we are below L3. We collaborate a little to avoid overlapping each effort, but we are working separately. > > Is there a projects page that outlines the approach, etc? Not yet. I'll propose it our members. > > This URL: >> >> https://github.com/ozaki-r/netbsd-src/tree/experimental/mpsafe-bridge-wm-vioif > isn't really very useful in a web browser. Do you have one that is? Please look at https://github.com/ozaki-r/netbsd-src/compare/master...experimental%2Fmpsafe-bridge-wm-vioif for a readable diff, or http://www.netbsd.org/~ozaki-r/mpsafe-bridge-wm-vioif.diff for a raw patch. Regards, ozaki-r > > Darren >
Re: RFC: mpsafe bridge and NIC drivers (vioif and wm)
I assume this to be related to what rmind is doing, yes? Is there a projects page that outlines the approach, etc? This URL: > https://github.com/ozaki-r/netbsd-src/tree/experimental/mpsafe-bridge-wm-vioif isn't really very useful in a web browser. Do you have one that is? Darren
RFC: mpsafe bridge and NIC drivers (vioif and wm)
Hi all, We're working on making the network stack (mainly below L3) and device drivers MP-safe. This RFC is one of the efforts, making bridge (but STP), vioif and wm MP-safe. Bridging (L2 forwarding) works in parallel. The patch set can be found here: https://github.com/ozaki-r/netbsd-src/tree/experimental/mpsafe-bridge-wm-vioif Note that applying only this patch has less fun because all interrupts are delivered to CPU#0 so that bridge's softint runs on only the CPU. Please apply another patch set (*); it allows to change a destination CPU of an IRQ to a CPU but CPU#0. The following example changes the destination CPU of IRQ#22 to CPU#2: sysctl -w kern.cpu_affinity.irq="22:2" With the patch set you can fully try the above new feature. (*) https://github.com/knakahara/netbsd-src/commits/k-nakahara-POC-irq-affinity Any comments are appreciated. Thanks, ozaki-r