Re: svn commit: r254520 - in head/sys: kern sys
On 21.08.2013 21:59, Navdeep Parhar wrote: On 08/21/13 12:41, Scott Long wrote: On Aug 21, 2013, at 8:59 AM, Andre Oppermann wrote: On 19.08.2013 23:45, Navdeep Parhar wrote: On 08/19/13 13:58, Andre Oppermann wrote: On 19.08.2013 19:33, Navdeep Parhar wrote: On 08/19/13 04:16, Andre Oppermann wrote: Author: andre Date: Mon Aug 19 11:16:53 2013 New Revision: 254520 URL: http://svnweb.freebsd.org/changeset/base/254520 Log: Remove the unused M_NOFREE mbuf flag. It didn't have any in-tree users for a very long time, if ever. Should such a functionality ever be needed again the appropriate and much better way to do it is through a custom EXT_SOMETHING external mbuf type together with a dedicated *ext_free function. Discussed with:trociny, glebius Modified: head/sys/kern/kern_mbuf.c head/sys/kern/uipc_mbuf.c head/sys/sys/mbuf.h Hello Andre, Is this just garbage collection or is there some other reason for this? This is garbage collection and removal of not quite right, rotten, functionality. I recently tried some experiments to reduce the number of mbuf and cluster allocations in a 40G NIC driver. M_NOFREE and EXT_EXTREF proved very useful and the code changes to the kernel were minimal. See user/np/cxl_tuning. The experiment was quite successful and I was planning to bring in most of those changes to HEAD. I was hoping to get some runtime mileage on the approach in general before tweaking the ctors/dtors for jumpbo, jumbo9, jumbo16 to allow for an mbuf+refcnt within the cluster. But now M_NOFREE has vanished without a warning... I'm looking through your experimental code and that is some really good numbers you're achieving there! However a couple things don't feel quite right, hackish even, and not fit for HEAD. This is a bit the same situation we had with some of the first 1GigE cards quite a number of years back (mostly ti(4)). There we ended up with a couple of just good enough hacks to make it fast. Most of the remains I've collected today. If M_NOFREE and EXT_EXTREF are properly supported in the tree (and I'm arguing that they were, before r254520) then the changes are perfectly legitimate. The only hackish part was that I was getting the cluster from the jumbop zone while bypassing its normal refcnt mechanism. This I did so as to use the same zone as m_uiotombuf to keep it "hot" for all consumers (driver + network stack). If you insist I'll revert the commit removing M_NOFREE. EXT_EXTREF isn't touched yet, but should get better support. The hackish part for me is that the driver again manages its own memory pool. Windows works that way, NetBSD is moving towards it while FreeBSD has and remains at a central network memory pool. The latter (our current) way of doing it seems more efficient overall especially on heavily loaded networked machines. There may be significant queues building (think app blocked having many sockets buffer fill up) up delaying the freeing and returning of network memory resources. Together with fragmentation this can lead to bad very outcomes. Router applications with many interfaces also greatly benefit from central memory pools. So I'm really not sure that we should move back in the driver owned pool direction with lots of code duplication and copy-pasting (see NetBSD). Also it is kinda weird to have a kernel based pool for data going down the stack and another one in each driver for those going up. Actually I'm of the opinion that we should stay with the central memory pool and fix so that it works just as well for those cases a driver pool currently performs better. The central memory pool approach is too slow, unfortunately. There's a reason that other OS's are moving to them. At Netflix we are currently working on some approaches to private memory pools in order to achieve better efficiency, and we're closely watching and anticipating Navdeep's work. I should point out that I went to great lengths to use the jumbop zone in my experiments, and not create my own pool of memory for the rx buffers. The hope was to share cache warmth (sounds very cosy :-) with the likes of m_uiotombuf (which uses jumbop too) etc. So I'm actually in the camp that prefers central pools. I'm just trying out ways to reduce the trips we have to make to the pool(s) involved. Laying down mbufs within clusters, and packing multiple frames per cluster clearly helps. Careful cluster recycling within the NIC seems to work too. What you describe does make a lot of sense. Jumbop is the optimal size for the VM. We should really look at and pushing forward to have a nicer M_NOFREE+M_EXTREF API for 10 and new HEAD going forward. As always it seems to depend on the use case and what is being measured too. Is it single stream performance? Concurrent streams? Both ways or only one way, in or out? Each makes very different use of many parts of the stack and driver leading to different bottlenecks. The Netfli
Re: svn commit: r254520 - in head/sys: kern sys
On 08/21/13 12:41, Scott Long wrote: > > On Aug 21, 2013, at 8:59 AM, Andre Oppermann wrote: > >> On 19.08.2013 23:45, Navdeep Parhar wrote: >>> On 08/19/13 13:58, Andre Oppermann wrote: On 19.08.2013 19:33, Navdeep Parhar wrote: > On 08/19/13 04:16, Andre Oppermann wrote: >> Author: andre >> Date: Mon Aug 19 11:16:53 2013 >> New Revision: 254520 >> URL: http://svnweb.freebsd.org/changeset/base/254520 >> >> Log: >>Remove the unused M_NOFREE mbuf flag. It didn't have any in-tree >> users >>for a very long time, if ever. >> >>Should such a functionality ever be needed again the appropriate and >>much better way to do it is through a custom EXT_SOMETHING >> external mbuf >>type together with a dedicated *ext_free function. >> >>Discussed with:trociny, glebius >> >> Modified: >>head/sys/kern/kern_mbuf.c >>head/sys/kern/uipc_mbuf.c >>head/sys/sys/mbuf.h >> > > Hello Andre, > > Is this just garbage collection or is there some other reason for this? This is garbage collection and removal of not quite right, rotten, functionality. > I recently tried some experiments to reduce the number of mbuf and > cluster allocations in a 40G NIC driver. M_NOFREE and EXT_EXTREF proved > very useful and the code changes to the kernel were minimal. See > user/np/cxl_tuning. The experiment was quite successful and I was > planning to bring in most of those changes to HEAD. I was hoping to get > some runtime mileage on the approach in general before tweaking the > ctors/dtors for jumpbo, jumbo9, jumbo16 to allow for an mbuf+refcnt > within the cluster. But now M_NOFREE has vanished without a warning... I'm looking through your experimental code and that is some really good numbers you're achieving there! However a couple things don't feel quite right, hackish even, and not fit for HEAD. This is a bit the same situation we had with some of the first 1GigE cards quite a number of years back (mostly ti(4)). There we ended up with a couple of just good enough hacks to make it fast. Most of the remains I've collected today. >>> >>> If M_NOFREE and EXT_EXTREF are properly supported in the tree (and I'm >>> arguing that they were, before r254520) then the changes are perfectly >>> legitimate. The only hackish part was that I was getting the cluster >>> from the jumbop zone while bypassing its normal refcnt mechanism. This >>> I did so as to use the same zone as m_uiotombuf to keep it "hot" for all >>> consumers (driver + network stack). >> >> If you insist I'll revert the commit removing M_NOFREE. EXT_EXTREF isn't >> touched yet, but should get better support. >> >> The hackish part for me is that the driver again manages its own memory >> pool. Windows works that way, NetBSD is moving towards it while FreeBSD >> has and remains at a central network memory pool. The latter (our current) >> way of doing it seems more efficient overall especially on heavily loaded >> networked machines. There may be significant queues building (think app >> blocked having many sockets buffer fill up) up delaying the freeing and >> returning of network memory resources. Together with fragmentation this >> can lead to bad very outcomes. Router applications with many interfaces >> also greatly benefit from central memory pools. >> >> So I'm really not sure that we should move back in the driver owned pool >> direction with lots of code duplication and copy-pasting (see NetBSD). >> Also it is kinda weird to have a kernel based pool for data going down >> the stack and another one in each driver for those going up. >> >> Actually I'm of the opinion that we should stay with the central memory >> pool and fix so that it works just as well for those cases a driver pool >> currently performs better. > > The central memory pool approach is too slow, unfortunately. There's a > reason that other OS's are moving to them. At Netflix we are > currently working on some approaches to private memory pools in order to > achieve better efficiency, and we're closely watching and anticipating > Navdeep's > work. I should point out that I went to great lengths to use the jumbop zone in my experiments, and not create my own pool of memory for the rx buffers. The hope was to share cache warmth (sounds very cosy :-) with the likes of m_uiotombuf (which uses jumbop too) etc. So I'm actually in the camp that prefers central pools. I'm just trying out ways to reduce the trips we have to make to the pool(s) involved. Laying down mbufs within clusters, and packing multiple frames per cluster clearly helps. Careful cluster recycling within the NIC seems to work too. Regards, Navdeep ___ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-s
Re: svn commit: r254520 - in head/sys: kern sys
On Aug 21, 2013, at 8:59 AM, Andre Oppermann wrote: > On 19.08.2013 23:45, Navdeep Parhar wrote: >> On 08/19/13 13:58, Andre Oppermann wrote: >>> On 19.08.2013 19:33, Navdeep Parhar wrote: On 08/19/13 04:16, Andre Oppermann wrote: > Author: andre > Date: Mon Aug 19 11:16:53 2013 > New Revision: 254520 > URL: http://svnweb.freebsd.org/changeset/base/254520 > > Log: >Remove the unused M_NOFREE mbuf flag. It didn't have any in-tree > users >for a very long time, if ever. > >Should such a functionality ever be needed again the appropriate and >much better way to do it is through a custom EXT_SOMETHING > external mbuf >type together with a dedicated *ext_free function. > >Discussed with:trociny, glebius > > Modified: >head/sys/kern/kern_mbuf.c >head/sys/kern/uipc_mbuf.c >head/sys/sys/mbuf.h > Hello Andre, Is this just garbage collection or is there some other reason for this? >>> >>> This is garbage collection and removal of not quite right, rotten, >>> functionality. >>> I recently tried some experiments to reduce the number of mbuf and cluster allocations in a 40G NIC driver. M_NOFREE and EXT_EXTREF proved very useful and the code changes to the kernel were minimal. See user/np/cxl_tuning. The experiment was quite successful and I was planning to bring in most of those changes to HEAD. I was hoping to get some runtime mileage on the approach in general before tweaking the ctors/dtors for jumpbo, jumbo9, jumbo16 to allow for an mbuf+refcnt within the cluster. But now M_NOFREE has vanished without a warning... >>> >>> I'm looking through your experimental code and that is some really good >>> numbers you're achieving there! >>> >>> However a couple things don't feel quite right, hackish even, and not >>> fit for HEAD. This is a bit the same situation we had with some of the >>> first 1GigE cards quite a number of years back (mostly ti(4)). There >>> we ended up with a couple of just good enough hacks to make it fast. >>> Most of the remains I've collected today. >> >> If M_NOFREE and EXT_EXTREF are properly supported in the tree (and I'm >> arguing that they were, before r254520) then the changes are perfectly >> legitimate. The only hackish part was that I was getting the cluster >> from the jumbop zone while bypassing its normal refcnt mechanism. This >> I did so as to use the same zone as m_uiotombuf to keep it "hot" for all >> consumers (driver + network stack). > > If you insist I'll revert the commit removing M_NOFREE. EXT_EXTREF isn't > touched yet, but should get better support. > > The hackish part for me is that the driver again manages its own memory > pool. Windows works that way, NetBSD is moving towards it while FreeBSD > has and remains at a central network memory pool. The latter (our current) > way of doing it seems more efficient overall especially on heavily loaded > networked machines. There may be significant queues building (think app > blocked having many sockets buffer fill up) up delaying the freeing and > returning of network memory resources. Together with fragmentation this > can lead to bad very outcomes. Router applications with many interfaces > also greatly benefit from central memory pools. > > So I'm really not sure that we should move back in the driver owned pool > direction with lots of code duplication and copy-pasting (see NetBSD). > Also it is kinda weird to have a kernel based pool for data going down > the stack and another one in each driver for those going up. > > Actually I'm of the opinion that we should stay with the central memory > pool and fix so that it works just as well for those cases a driver pool > currently performs better. The central memory pool approach is too slow, unfortunately. There's a reason that other OS's are moving to them. At Netflix we are currently working on some approaches to private memory pools in order to achieve better efficiency, and we're closely watching and anticipating Navdeep's work. Scott ___ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"
Re: svn commit: r254520 - in head/sys: kern sys
Hi Andre, On 21.08.2013 17:03, Peter Grehan wrote: The way to go should be 4K clusters as they are native to the architecture. IIRC a PCIe DMA can't cross a 4K boundary anyway That's a 4G boundary, for some devices. I meant a single PCIe DMA transaction can be at most 4K before it has to set up another one? 4K is the maximum TLP size but that's rarely (ever?) used - 128 or 256 bytes is more common. A DMA operation is just a sequence of TLPs originating from the endpoint. later, Peter. ___ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"
Re: svn commit: r254520 - in head/sys: kern sys
On 21.08.2013 17:03, Peter Grehan wrote: The way to go should be 4K clusters as they are native to the architecture. IIRC a PCIe DMA can't cross a 4K boundary anyway That's a 4G boundary, for some devices. I meant a single PCIe DMA transaction can be at most 4K before it has to set up another one? -- Andre ___ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"
Re: svn commit: r254520 - in head/sys: kern sys
The way to go should be 4K clusters as they are native to the architecture. IIRC a PCIe DMA can't cross a 4K boundary anyway That's a 4G boundary, for some devices. later, Peter. ___ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"
Re: svn commit: r254520 - in head/sys: kern sys
On 20.08.2013 00:03, Navdeep Parhar wrote: On 08/19/13 14:08, Andre Oppermann wrote: On 19.08.2013 19:40, Peter Grehan wrote: I recently tried some experiments to reduce the number of mbuf and cluster allocations in a 40G NIC driver. M_NOFREE and EXT_EXTREF proved very useful and the code changes to the kernel were minimal. See user/np/cxl_tuning. The experiment was quite successful and I was planning to bring in most of those changes to HEAD. I was hoping to get some runtime mileage on the approach in general before tweaking the ctors/dtors for jumpbo, jumbo9, jumbo16 to allow for an mbuf+refcnt within the cluster. But now M_NOFREE has vanished without a warning... I also had a virtualization work-in-progress where static mbufs were allocated in the kernel and M_NOFREE set. Might be worth sending a prior heads-up for these type of changes. I'm sorry for ambushing but this stuff has to be done. I have provided an alternative way of handling it and I'm happy to help you with your use case to make it good for you and to prevent the mbuf system from getting bloated and hackish again. I don't know what Peter's use case is but I'm curious about the already-available alternative to M_NOFREE, if that's what you meant. Can you please elaborate? When you supply your own (*ext_free) function you can simply omit freeing the mbuf itself. Should make sure not to leak it though. -- Andre ___ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"
Re: svn commit: r254520 - in head/sys: kern sys
On 19.08.2013 23:45, Navdeep Parhar wrote: On 08/19/13 13:58, Andre Oppermann wrote: On 19.08.2013 19:33, Navdeep Parhar wrote: On 08/19/13 04:16, Andre Oppermann wrote: Author: andre Date: Mon Aug 19 11:16:53 2013 New Revision: 254520 URL: http://svnweb.freebsd.org/changeset/base/254520 Log: Remove the unused M_NOFREE mbuf flag. It didn't have any in-tree users for a very long time, if ever. Should such a functionality ever be needed again the appropriate and much better way to do it is through a custom EXT_SOMETHING external mbuf type together with a dedicated *ext_free function. Discussed with:trociny, glebius Modified: head/sys/kern/kern_mbuf.c head/sys/kern/uipc_mbuf.c head/sys/sys/mbuf.h Hello Andre, Is this just garbage collection or is there some other reason for this? This is garbage collection and removal of not quite right, rotten, functionality. I recently tried some experiments to reduce the number of mbuf and cluster allocations in a 40G NIC driver. M_NOFREE and EXT_EXTREF proved very useful and the code changes to the kernel were minimal. See user/np/cxl_tuning. The experiment was quite successful and I was planning to bring in most of those changes to HEAD. I was hoping to get some runtime mileage on the approach in general before tweaking the ctors/dtors for jumpbo, jumbo9, jumbo16 to allow for an mbuf+refcnt within the cluster. But now M_NOFREE has vanished without a warning... I'm looking through your experimental code and that is some really good numbers you're achieving there! However a couple things don't feel quite right, hackish even, and not fit for HEAD. This is a bit the same situation we had with some of the first 1GigE cards quite a number of years back (mostly ti(4)). There we ended up with a couple of just good enough hacks to make it fast. Most of the remains I've collected today. If M_NOFREE and EXT_EXTREF are properly supported in the tree (and I'm arguing that they were, before r254520) then the changes are perfectly legitimate. The only hackish part was that I was getting the cluster from the jumbop zone while bypassing its normal refcnt mechanism. This I did so as to use the same zone as m_uiotombuf to keep it "hot" for all consumers (driver + network stack). If you insist I'll revert the commit removing M_NOFREE. EXT_EXTREF isn't touched yet, but should get better support. The hackish part for me is that the driver again manages its own memory pool. Windows works that way, NetBSD is moving towards it while FreeBSD has and remains at a central network memory pool. The latter (our current) way of doing it seems more efficient overall especially on heavily loaded networked machines. There may be significant queues building (think app blocked having many sockets buffer fill up) up delaying the freeing and returning of network memory resources. Together with fragmentation this can lead to bad very outcomes. Router applications with many interfaces also greatly benefit from central memory pools. So I'm really not sure that we should move back in the driver owned pool direction with lots of code duplication and copy-pasting (see NetBSD). Also it is kinda weird to have a kernel based pool for data going down the stack and another one in each driver for those going up. Actually I'm of the opinion that we should stay with the central memory pool and fix so that it works just as well for those cases a driver pool currently performs better. I believe most of the improvements you've shown can be implemented in a more generic and safe way into the mbuf system. Also a number of things in your experimental code may have side-effects in situations other than netperf runs. Agreed. As I mentioned my long term plan was to tweak the jumboX zones to allow them to allocate cluster with embedded mbuf + refcount. The M_NOFREE+EXT_EXTREF approach is the perfect bridge from here to there. It is non-intrusive and lends itself well to experimentation. Agreed, full support on fixing the refcount issue. To summarize what I get from your experimental branch commits: - the Chelsio T4/T5 card can DMA multiple ethernet frames (packets) into a single memory buffer, provided it is large enough. - you make use of that feature and point multiple m_ext mbufs into that buffer to separate the packets and send them up the stack. - you embed the m_ext refcount into the single memory buffer as well. yes, yes, and yes. - you recycle mbufs? (I'm not entirely clear on that as I'm not familiar with the cxgbe code) I recycle the cluster (and the embedded mbuf in it) when possible. All depends on whether it's still in use by the time the rx queue needs it back. There's always a couple of problems with driver managed pools. Driver shutdown/ unloading requires all such managed mbufs to have returned before it can proceed. This may take a undetermined long time as mbufs are sitting in socket buffers or other q
Re: svn commit: r254520 - in head/sys: kern sys
On 08/19/13 14:08, Andre Oppermann wrote: > On 19.08.2013 19:40, Peter Grehan wrote: >>> I recently tried some experiments to reduce the number of mbuf and >>> cluster allocations in a 40G NIC driver. M_NOFREE and EXT_EXTREF proved >>> very useful and the code changes to the kernel were minimal. See >>> user/np/cxl_tuning. The experiment was quite successful and I was >>> planning to bring in most of those changes to HEAD. I was hoping to get >>> some runtime mileage on the approach in general before tweaking the >>> ctors/dtors for jumpbo, jumbo9, jumbo16 to allow for an mbuf+refcnt >>> within the cluster. But now M_NOFREE has vanished without a warning... >> >> I also had a virtualization work-in-progress where static mbufs were >> allocated in the kernel and >> M_NOFREE set. >> >> Might be worth sending a prior heads-up for these type of changes. > > I'm sorry for ambushing but this stuff has to be done. I have provided > an alternative way of handling it and I'm happy to help you with your > use case to make it good for you and to prevent the mbuf system from > getting bloated and hackish again. I don't know what Peter's use case is but I'm curious about the already-available alternative to M_NOFREE, if that's what you meant. Can you please elaborate? Regards, Navdeep ___ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"
Re: svn commit: r254520 - in head/sys: kern sys
On 08/19/13 13:58, Andre Oppermann wrote: > On 19.08.2013 19:33, Navdeep Parhar wrote: >> On 08/19/13 04:16, Andre Oppermann wrote: >>> Author: andre >>> Date: Mon Aug 19 11:16:53 2013 >>> New Revision: 254520 >>> URL: http://svnweb.freebsd.org/changeset/base/254520 >>> >>> Log: >>>Remove the unused M_NOFREE mbuf flag. It didn't have any in-tree >>> users >>>for a very long time, if ever. >>> >>>Should such a functionality ever be needed again the appropriate and >>>much better way to do it is through a custom EXT_SOMETHING >>> external mbuf >>>type together with a dedicated *ext_free function. >>> >>>Discussed with:trociny, glebius >>> >>> Modified: >>>head/sys/kern/kern_mbuf.c >>>head/sys/kern/uipc_mbuf.c >>>head/sys/sys/mbuf.h >>> >> >> Hello Andre, >> >> Is this just garbage collection or is there some other reason for this? > > This is garbage collection and removal of not quite right, rotten, > functionality. > >> I recently tried some experiments to reduce the number of mbuf and >> cluster allocations in a 40G NIC driver. M_NOFREE and EXT_EXTREF proved >> very useful and the code changes to the kernel were minimal. See >> user/np/cxl_tuning. The experiment was quite successful and I was >> planning to bring in most of those changes to HEAD. I was hoping to get >> some runtime mileage on the approach in general before tweaking the >> ctors/dtors for jumpbo, jumbo9, jumbo16 to allow for an mbuf+refcnt >> within the cluster. But now M_NOFREE has vanished without a warning... > > I'm looking through your experimental code and that is some really good > numbers you're achieving there! > > However a couple things don't feel quite right, hackish even, and not > fit for HEAD. This is a bit the same situation we had with some of the > first 1GigE cards quite a number of years back (mostly ti(4)). There > we ended up with a couple of just good enough hacks to make it fast. > Most of the remains I've collected today. If M_NOFREE and EXT_EXTREF are properly supported in the tree (and I'm arguing that they were, before r254520) then the changes are perfectly legitimate. The only hackish part was that I was getting the cluster from the jumbop zone while bypassing its normal refcnt mechanism. This I did so as to use the same zone as m_uiotombuf to keep it "hot" for all consumers (driver + network stack). > I believe most of the improvements you've shown can be implemented in > a more generic and safe way into the mbuf system. Also a number of things > in your experimental code may have side-effects in situations other than > netperf runs. Agreed. As I mentioned my long term plan was to tweak the jumboX zones to allow them to allocate cluster with embedded mbuf + refcount. The M_NOFREE+EXT_EXTREF approach is the perfect bridge from here to there. It is non-intrusive and lends itself well to experimentation. > > To summarize what I get from your experimental branch commits: > - the Chelsio T4/T5 card can DMA multiple ethernet frames (packets) into > a single memory buffer, provided it is large enough. > - you make use of that feature and point multiple m_ext mbufs into that > buffer to separate the packets and send them up the stack. > - you embed the m_ext refcount into the single memory buffer as well. yes, yes, and yes. > - you recycle mbufs? (I'm not entirely clear on that as I'm not familiar > with the cxgbe code) I recycle the cluster (and the embedded mbuf in it) when possible. All depends on whether it's still in use by the time the rx queue needs it back. > > Lets examine and discuss these parts: > - M_NOFREE wasn't really safe with bpf anyway at least for packets going > down the stack. I see. Can you point out the parts of bpf unable to deal with M_NOFREE? > - Instead of M_NOFREE a custom *ext_free should be used that has the same > and even more functionality. Yes, that's what I was planning too, with the jumboX zone changes. It would be faster than running the m_ext's free function (which is a function dereference+call). > - Recycling mbufs may cause imbalances to the mbuf pool with multiple cores > and possibly NUMA in the future. Not that we are very good with it at > the moment but bypassing the mbuf allocator shouldn't become the norm. > If it is a problem/bottleneck again it should be fixed, not worked around > and copy-pasted n-times in so many drivers. If/when a cluster is recycled, it is given back to the same rx ithread that originally allocated it, and not not any other queue. If the ithread stays in the same NUMA domain (and it really should) then recycling the cluster in the same queue really shouldn't cause imbalances. > - jumbo9 and jumbo16 mbufs should not be used because they are more special > than necessary with being KVM and physically contiguous. This probably > isn't necessary for the T4/T5 cards and any other modern DMA engine. > Under heavy diverse network the kernel memory becom
Re: svn commit: r254520 - in head/sys: kern sys
fwiw - +1 on everything you've said to date about this. -adrian ___ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"
Re: svn commit: r254520 - in head/sys: kern sys
On 19.08.2013 19:40, Peter Grehan wrote: I recently tried some experiments to reduce the number of mbuf and cluster allocations in a 40G NIC driver. M_NOFREE and EXT_EXTREF proved very useful and the code changes to the kernel were minimal. See user/np/cxl_tuning. The experiment was quite successful and I was planning to bring in most of those changes to HEAD. I was hoping to get some runtime mileage on the approach in general before tweaking the ctors/dtors for jumpbo, jumbo9, jumbo16 to allow for an mbuf+refcnt within the cluster. But now M_NOFREE has vanished without a warning... I also had a virtualization work-in-progress where static mbufs were allocated in the kernel and M_NOFREE set. Might be worth sending a prior heads-up for these type of changes. I'm sorry for ambushing but this stuff has to be done. I have provided an alternative way of handling it and I'm happy to help you with your use case to make it good for you and to prevent the mbuf system from getting bloated and hackish again. Mbuf is a core system for the kernel and we should avoid to kitchen-sink it again while at the same time to keep speedy enough to keep up with the speed requirements. I believe it would be bad to have Navdeep, you and others invent their own network buffer management routines over again in slightly different ways tailored to each immediate use case. Can you please describe your intended use of M_NOFREE to better understand the shortcomings of the current mbuf systems and the additional advantages of the M_NOFREE case? -- Andre ___ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"
Re: svn commit: r254520 - in head/sys: kern sys
On 19.08.2013 19:33, Navdeep Parhar wrote: On 08/19/13 04:16, Andre Oppermann wrote: Author: andre Date: Mon Aug 19 11:16:53 2013 New Revision: 254520 URL: http://svnweb.freebsd.org/changeset/base/254520 Log: Remove the unused M_NOFREE mbuf flag. It didn't have any in-tree users for a very long time, if ever. Should such a functionality ever be needed again the appropriate and much better way to do it is through a custom EXT_SOMETHING external mbuf type together with a dedicated *ext_free function. Discussed with: trociny, glebius Modified: head/sys/kern/kern_mbuf.c head/sys/kern/uipc_mbuf.c head/sys/sys/mbuf.h Hello Andre, Is this just garbage collection or is there some other reason for this? This is garbage collection and removal of not quite right, rotten, functionality. I recently tried some experiments to reduce the number of mbuf and cluster allocations in a 40G NIC driver. M_NOFREE and EXT_EXTREF proved very useful and the code changes to the kernel were minimal. See user/np/cxl_tuning. The experiment was quite successful and I was planning to bring in most of those changes to HEAD. I was hoping to get some runtime mileage on the approach in general before tweaking the ctors/dtors for jumpbo, jumbo9, jumbo16 to allow for an mbuf+refcnt within the cluster. But now M_NOFREE has vanished without a warning... I'm looking through your experimental code and that is some really good numbers you're achieving there! However a couple things don't feel quite right, hackish even, and not fit for HEAD. This is a bit the same situation we had with some of the first 1GigE cards quite a number of years back (mostly ti(4)). There we ended up with a couple of just good enough hacks to make it fast. Most of the remains I've collected today. I believe most of the improvements you've shown can be implemented in a more generic and safe way into the mbuf system. Also a number of things in your experimental code may have side-effects in situations other than netperf runs. To summarize what I get from your experimental branch commits: - the Chelsio T4/T5 card can DMA multiple ethernet frames (packets) into a single memory buffer, provided it is large enough. - you make use of that feature and point multiple m_ext mbufs into that buffer to separate the packets and send them up the stack. - you embed the m_ext refcount into the single memory buffer as well. - you recycle mbufs? (I'm not entirely clear on that as I'm not familiar with the cxgbe code) Lets examine and discuss these parts: - M_NOFREE wasn't really safe with bpf anyway at least for packets going down the stack. - Instead of M_NOFREE a custom *ext_free should be used that has the same and even more functionality. - Recycling mbufs may cause imbalances to the mbuf pool with multiple cores and possibly NUMA in the future. Not that we are very good with it at the moment but bypassing the mbuf allocator shouldn't become the norm. If it is a problem/bottleneck again it should be fixed, not worked around and copy-pasted n-times in so many drivers. - jumbo9 and jumbo16 mbufs should not be used because they are more special than necessary with being KVM and physically contiguous. This probably isn't necessary for the T4/T5 cards and any other modern DMA engine. Under heavy diverse network the kernel memory becomes fragmented and can't find memory fulfilling both criteria anymore. In fact both are an artifact of the early GigE hacks when high speed DMA engines were in their infancy. Both jumbo9 and jumbo16 should go away without direct replacement. In your T4/T5 case the alternative would be either to a) allocate your own memory directly from KVM with the necessary properties (KVM and/or phys contig); b) have such a generic kernel mbuf type fulfilling the same role. There may be some cache line issues on non-x86 systems that have to be though and taken care of. - Refcounts should have the option of being separately allocated. It was mandatory to use the refcount zone so far because there wasn't any other type of refcount. Now that we have one we should revisit the issue. Actually the entire mbuf allocation and initialization could be streamlined as a whole. On a side note a different mbuf handling for the RX DMA rings may give some significant improvements as well: allocate just a cluster without a mbuf through m_cljget() and put it into the RX ring. Only when the DMA has put a packet into it allocated attach the mbuf (in the drivers RX function). This avoids the cache pollution from touching the mbuf fields during initial allocation, including the refcount. Another nice trick would be to shorten the mbuf by 4 bytes (in ext_size) and put the refcount there. Lets work on these together. -- Andre ___ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, sen
Re: svn commit: r254520 - in head/sys: kern sys
I recently tried some experiments to reduce the number of mbuf and cluster allocations in a 40G NIC driver. M_NOFREE and EXT_EXTREF proved very useful and the code changes to the kernel were minimal. See user/np/cxl_tuning. The experiment was quite successful and I was planning to bring in most of those changes to HEAD. I was hoping to get some runtime mileage on the approach in general before tweaking the ctors/dtors for jumpbo, jumbo9, jumbo16 to allow for an mbuf+refcnt within the cluster. But now M_NOFREE has vanished without a warning... I also had a virtualization work-in-progress where static mbufs were allocated in the kernel and M_NOFREE set. Might be worth sending a prior heads-up for these type of changes. later, Peter. ___ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"
Re: svn commit: r254520 - in head/sys: kern sys
On 08/19/13 04:16, Andre Oppermann wrote: > Author: andre > Date: Mon Aug 19 11:16:53 2013 > New Revision: 254520 > URL: http://svnweb.freebsd.org/changeset/base/254520 > > Log: > Remove the unused M_NOFREE mbuf flag. It didn't have any in-tree users > for a very long time, if ever. > > Should such a functionality ever be needed again the appropriate and > much better way to do it is through a custom EXT_SOMETHING external mbuf > type together with a dedicated *ext_free function. > > Discussed with: trociny, glebius > > Modified: > head/sys/kern/kern_mbuf.c > head/sys/kern/uipc_mbuf.c > head/sys/sys/mbuf.h > Hello Andre, Is this just garbage collection or is there some other reason for this? I recently tried some experiments to reduce the number of mbuf and cluster allocations in a 40G NIC driver. M_NOFREE and EXT_EXTREF proved very useful and the code changes to the kernel were minimal. See user/np/cxl_tuning. The experiment was quite successful and I was planning to bring in most of those changes to HEAD. I was hoping to get some runtime mileage on the approach in general before tweaking the ctors/dtors for jumpbo, jumbo9, jumbo16 to allow for an mbuf+refcnt within the cluster. But now M_NOFREE has vanished without a warning... Regards, Navdeep ___ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"
Re: svn commit: r254520 - in head/sys: kern sys
On 19.08.2013 13:23, Davide Italiano wrote: On Mon, Aug 19, 2013 at 1:16 PM, Andre Oppermann wrote: Modified: head/sys/sys/mbuf.h == --- head/sys/sys/mbuf.h Mon Aug 19 11:08:36 2013(r254519) +++ head/sys/sys/mbuf.h Mon Aug 19 11:16:53 2013(r254520) @@ -200,7 +200,7 @@ struct mbuf { /* 0x8000free */ #defineM_VLANTAG 0x0001 /* ether_vtag is valid */ #defineM_PROMISC 0x0002 /* packet was not for us */ -#defineM_NOFREE0x0004 /* do not free mbuf, embedded in cluster */ +/* 0x0004free */ I think you should just use M_UNUSED or something similar here for consistency, like it's happening in td_pflags (sys/sys/proc.h), e.g.: #define TDP_UNUSED9 0x0100 /* --available-- */ There's a couple of more changes upcoming that will take care of it. -- Andre ___ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"
Re: svn commit: r254520 - in head/sys: kern sys
On Mon, Aug 19, 2013 at 1:16 PM, Andre Oppermann wrote: > Author: andre > Date: Mon Aug 19 11:16:53 2013 > New Revision: 254520 > URL: http://svnweb.freebsd.org/changeset/base/254520 > > Log: > Remove the unused M_NOFREE mbuf flag. It didn't have any in-tree users > for a very long time, if ever. > > Should such a functionality ever be needed again the appropriate and > much better way to do it is through a custom EXT_SOMETHING external mbuf > type together with a dedicated *ext_free function. > > Discussed with: trociny, glebius > > Modified: > head/sys/kern/kern_mbuf.c > head/sys/kern/uipc_mbuf.c > head/sys/sys/mbuf.h > > Modified: head/sys/kern/kern_mbuf.c > == > --- head/sys/kern/kern_mbuf.c Mon Aug 19 11:08:36 2013(r254519) > +++ head/sys/kern/kern_mbuf.c Mon Aug 19 11:16:53 2013(r254520) > @@ -474,7 +474,6 @@ mb_dtor_mbuf(void *mem, int size, void * > if ((flags & MB_NOTAGS) == 0 && (m->m_flags & M_PKTHDR) != 0) > m_tag_delete_chain(m, NULL); > KASSERT((m->m_flags & M_EXT) == 0, ("%s: M_EXT set", __func__)); > - KASSERT((m->m_flags & M_NOFREE) == 0, ("%s: M_NOFREE set", __func__)); > #ifdef INVARIANTS > trash_dtor(mem, size, arg); > #endif > > Modified: head/sys/kern/uipc_mbuf.c > == > --- head/sys/kern/uipc_mbuf.c Mon Aug 19 11:08:36 2013(r254519) > +++ head/sys/kern/uipc_mbuf.c Mon Aug 19 11:16:53 2013(r254520) > @@ -278,17 +278,10 @@ m_extadd(struct mbuf *mb, caddr_t buf, u > void > mb_free_ext(struct mbuf *m) > { > - int skipmbuf; > > KASSERT((m->m_flags & M_EXT) == M_EXT, ("%s: M_EXT not set", > __func__)); > KASSERT(m->m_ext.ref_cnt != NULL, ("%s: ref_cnt not set", __func__)); > > - > - /* > -* check if the header is embedded in the cluster > -*/ > - skipmbuf = (m->m_flags & M_NOFREE); > - > /* Free attached storage if this mbuf is the only reference to it. */ > if (*(m->m_ext.ref_cnt) == 1 || > atomic_fetchadd_int(m->m_ext.ref_cnt, -1) == 1) { > @@ -329,8 +322,6 @@ mb_free_ext(struct mbuf *m) > ("%s: unknown ext_type", __func__)); > } > } > - if (skipmbuf) > - return; > > /* > * Free this mbuf back to the mbuf zone with all m_ext > @@ -395,7 +386,7 @@ m_demote(struct mbuf *m0, int all) > m_freem(m->m_nextpkt); > m->m_nextpkt = NULL; > } > - m->m_flags = m->m_flags & (M_EXT|M_RDONLY|M_NOFREE); > + m->m_flags = m->m_flags & (M_EXT|M_RDONLY); > } > } > > > Modified: head/sys/sys/mbuf.h > == > --- head/sys/sys/mbuf.h Mon Aug 19 11:08:36 2013(r254519) > +++ head/sys/sys/mbuf.h Mon Aug 19 11:16:53 2013(r254520) > @@ -200,7 +200,7 @@ struct mbuf { > /* 0x8000free */ > #defineM_VLANTAG 0x0001 /* ether_vtag is valid */ > #defineM_PROMISC 0x0002 /* packet was not for us */ > -#defineM_NOFREE0x0004 /* do not free mbuf, embedded in > cluster */ > +/* 0x0004free */ I think you should just use M_UNUSED or something similar here for consistency, like it's happening in td_pflags (sys/sys/proc.h), e.g.: #define TDP_UNUSED9 0x0100 /* --available-- */ > #defineM_PROTO60x0008 /* protocol-specific */ > #defineM_PROTO70x0010 /* protocol-specific */ > #defineM_PROTO80x0020 /* protocol-specific */ > @@ -515,7 +515,7 @@ m_free(struct mbuf *m) > > if (m->m_flags & M_EXT) > mb_free_ext(m); > - else if ((m->m_flags & M_NOFREE) == 0) > + else > uma_zfree(zone_mbuf, m); > return (n); > } -- Davide "There are no solved problems; there are only problems that are more or less solved" -- Henri Poincare ___ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"
svn commit: r254520 - in head/sys: kern sys
Author: andre Date: Mon Aug 19 11:16:53 2013 New Revision: 254520 URL: http://svnweb.freebsd.org/changeset/base/254520 Log: Remove the unused M_NOFREE mbuf flag. It didn't have any in-tree users for a very long time, if ever. Should such a functionality ever be needed again the appropriate and much better way to do it is through a custom EXT_SOMETHING external mbuf type together with a dedicated *ext_free function. Discussed with: trociny, glebius Modified: head/sys/kern/kern_mbuf.c head/sys/kern/uipc_mbuf.c head/sys/sys/mbuf.h Modified: head/sys/kern/kern_mbuf.c == --- head/sys/kern/kern_mbuf.c Mon Aug 19 11:08:36 2013(r254519) +++ head/sys/kern/kern_mbuf.c Mon Aug 19 11:16:53 2013(r254520) @@ -474,7 +474,6 @@ mb_dtor_mbuf(void *mem, int size, void * if ((flags & MB_NOTAGS) == 0 && (m->m_flags & M_PKTHDR) != 0) m_tag_delete_chain(m, NULL); KASSERT((m->m_flags & M_EXT) == 0, ("%s: M_EXT set", __func__)); - KASSERT((m->m_flags & M_NOFREE) == 0, ("%s: M_NOFREE set", __func__)); #ifdef INVARIANTS trash_dtor(mem, size, arg); #endif Modified: head/sys/kern/uipc_mbuf.c == --- head/sys/kern/uipc_mbuf.c Mon Aug 19 11:08:36 2013(r254519) +++ head/sys/kern/uipc_mbuf.c Mon Aug 19 11:16:53 2013(r254520) @@ -278,17 +278,10 @@ m_extadd(struct mbuf *mb, caddr_t buf, u void mb_free_ext(struct mbuf *m) { - int skipmbuf; KASSERT((m->m_flags & M_EXT) == M_EXT, ("%s: M_EXT not set", __func__)); KASSERT(m->m_ext.ref_cnt != NULL, ("%s: ref_cnt not set", __func__)); - - /* -* check if the header is embedded in the cluster -*/ - skipmbuf = (m->m_flags & M_NOFREE); - /* Free attached storage if this mbuf is the only reference to it. */ if (*(m->m_ext.ref_cnt) == 1 || atomic_fetchadd_int(m->m_ext.ref_cnt, -1) == 1) { @@ -329,8 +322,6 @@ mb_free_ext(struct mbuf *m) ("%s: unknown ext_type", __func__)); } } - if (skipmbuf) - return; /* * Free this mbuf back to the mbuf zone with all m_ext @@ -395,7 +386,7 @@ m_demote(struct mbuf *m0, int all) m_freem(m->m_nextpkt); m->m_nextpkt = NULL; } - m->m_flags = m->m_flags & (M_EXT|M_RDONLY|M_NOFREE); + m->m_flags = m->m_flags & (M_EXT|M_RDONLY); } } Modified: head/sys/sys/mbuf.h == --- head/sys/sys/mbuf.h Mon Aug 19 11:08:36 2013(r254519) +++ head/sys/sys/mbuf.h Mon Aug 19 11:16:53 2013(r254520) @@ -200,7 +200,7 @@ struct mbuf { /* 0x8000free */ #defineM_VLANTAG 0x0001 /* ether_vtag is valid */ #defineM_PROMISC 0x0002 /* packet was not for us */ -#defineM_NOFREE0x0004 /* do not free mbuf, embedded in cluster */ +/* 0x0004free */ #defineM_PROTO60x0008 /* protocol-specific */ #defineM_PROTO70x0010 /* protocol-specific */ #defineM_PROTO80x0020 /* protocol-specific */ @@ -515,7 +515,7 @@ m_free(struct mbuf *m) if (m->m_flags & M_EXT) mb_free_ext(m); - else if ((m->m_flags & M_NOFREE) == 0) + else uma_zfree(zone_mbuf, m); return (n); } ___ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"