Re: nmbclusters: how do we want to fix this for 8.3 ?
On Thu, Feb 23, 2012 at 07:19, Ivan Voras wrote:
> On 23/02/2012 09:19, Fabien Thomas wrote:
>> I think this is more reasonable to setup interface with one queue.
>
> Unfortunately, the moment you do that, two things will happen:
> 1) users will start complaining again how FreeBSD is slow
> 2) the setting will become a "sacred cow" and nobody will change this
> default for the next 10 years.

Is this any better than making queue-per-core a sacred cow? Even very small systems with comparatively low memory these days have an increasing number of cores. They also usually have more RAM to go with those cores, but not always.

Queue-per-core isn't even optimal for some kinds of workloads, and is harmful to overall performance at higher levels. It also assumes that every core should be utilized for the exciting task of receiving packets. This makes sense on some systems, but not all. Plus, more queues don't necessarily equal better performance even on systems where you have the memory and cores to spare. On systems with non-uniform memory architectures, routinely processing queues on different physical packages can make networking performance worse. More queues are not a magic wand; they can be roughly the equivalent of go-faster stripes. Queue-per-core has a sort of logic to it, but is not necessarily sensible, like the funroll-all-loops school of system optimization.

Which sounds slightly off-topic, except that dedicating loads of mbufs to receive queues that will sit empty on the vast majority of systems, and receive a few packets per second, in the service of some kind of magical thinking or excitement about multiqueue reception may be a little ill-advised. On my desktop with hardware supporting multiple queues, do I really want to use the maximum number of them just to handle a few thousand packets per second? One core can do that just fine.
FreeBSD's great to drop in on forwarding systems that will have moderate load, but it seems the best justification for this default is so users need fewer reboots to get more queues to spread what is assumed to be an evenly-distributed load over other cores.

In practice, isn't the real problem that we have no facility for changing the number of queues at runtime? If the number of queues weren't fixed at boot, users could actually find the number that suits them best with a plausible amount of work, and the point about FreeBSD being "slow" goes away, since it's perhaps one more sysctl to set (or one per interface) or one (or one-per) ifconfig line to run, along with enabling forwarding, etc.

The big commitment that multi-queue drivers ask for when they use the maximum number of queues on boot, and then demand to fill those queues up with mbufs, is unreasonable, even if it can be met on a growing number of systems without much in the way of pain. It's unreasonable, but perhaps it feels good to see all those interrupts bouncing around, all those threads running from time to time in top. Maybe it makes FreeBSD seem more serious, or perhaps it's something that gets people excited. It gives the appearance of doing quite a bit behind the scenes, and perhaps that's beneficial in and of itself, and will keep users from imagining that FreeBSD is slow, to your point. We should be clear, though, whether we are motivated by technical or psychological constraints and benefits.

Thanks,
Juli.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: nmbclusters: how do we want to fix this for 8.3 ?
On 02/22/2012 13:51, Jack Vogel wrote:
> On Wed, Feb 22, 2012 at 1:44 PM, Luigi Rizzo <ri...@iet.unipi.it> wrote:
>
>> On Wed, Feb 22, 2012 at 09:09:46PM +0000, Ben Hutchings wrote:
>>> On Wed, 2012-02-22 at 21:52 +0100, Luigi Rizzo wrote:
>> ...
>>>> I have hit this problem recently, too. Maybe the issue mostly/only exists on 32-bit systems.
>>>
>>> No, we kept hitting mbuf pool limits on 64-bit systems when we started working on FreeBSD support.
>>
>> ok never mind then, the mechanism would be the same, though the limits (especially VM_LIMIT) would be different.
>>
>>>> Here is a possible approach:
>>>>
>>>> 1. nmbclusters consume the kernel virtual address space so there must be some upper limit, say
>>>>
>>>>        VM_LIMIT = 256000 (translates to 512MB of address space)
>>>>
>>>> 2. also you don't want the clusters to take up too much of the available memory. This one would only trigger for minimal-memory systems, or virtual machines, but still...
>>>>
>>>>        MEM_LIMIT = (physical_ram / 2) / 2048
>>>>
>>>> 3. one may try to set a suitably large, desirable number of buffers
>>>>
>>>>        TARGET_CLUSTERS = 128000
>>>>
>>>> 4. and finally we could use the current default as the absolute minimum
>>>>
>>>>        MIN_CLUSTERS = 1024 + maxusers*64
>>>>
>>>> Then at boot the system could say
>>>>
>>>>        nmbclusters = min(TARGET_CLUSTERS, VM_LIMIT, MEM_LIMIT)
>>>>        nmbclusters = max(nmbclusters, MIN_CLUSTERS)
>>>>
>>>> In turn, i believe interfaces should do their part and by default never try to allocate more than a fraction of the total number of buffers,
>>>
>>> Well what fraction should that be? It surely depends on how many interfaces are in the system and how many queues the other interfaces have.
>>
>>>> if necessary reducing the number of active queues.
>>>
>>> So now I have too few queues on my interface even after I increase the limit.
>>>
>>> There ought to be a standard way to configure numbers of queues and default queue lengths.
>>
>> Jack raised the problem that there is a poorly chosen default for nmbclusters, causing one interface to consume all the buffers. If the user explicitly overrides the value then the number of clusters should be what the user asks (memory permitting). The next step is on devices: if there are no overrides, the default for a driver is to be lean. I would say that capping the request between 1/4 and 1/8 of the total buffers is surely better than the current situation. Of course if there is an explicit override, then use it whatever happens to the others.
>>
>> cheers
>> luigi
>
> Hmmm, well, I could make the default use only 1 queue or something like that, was thinking more of what actual users of the hardware would want.
>
> After the installed kernel is booted and the admin would do whatever post-install modifications they wish it could be changed, along with nmbclusters.
>
> This was why i sought opinions, of the algorithm itself, but also anyone using ixgbe and igb in heavy use, what would you find most convenient?
>
> Jack

The default setting is a thorn in our (with my iXsystems servers-for-FreeBSD hat on) side. A system with a quad-port igb card and two onboard igb NICs won't boot stable/8 or 8.x-R to multiuser. Ditto for a dual-port chelsio or ixgbe alongside dual onboard igb interfaces.

My vote would be having systems over some minimum threshold of system RAM come up with a higher default for nmbclusters.
You don't see too many 10gbe NICs in systems with 2GB of RAM.

--
Thanks,
Josh Paetzel
FreeBSD -- The power to serve
Re: nmbclusters: how do we want to fix this for 8.3 ?
On 23/02/2012 09:19, Fabien Thomas wrote:
> I think this is more reasonable to setup interface with one queue.

Unfortunately, the moment you do that, two things will happen:

1) users will start complaining again how FreeBSD is slow
2) the setting will become a "sacred cow" and nobody will change this default for the next 10 years.

If it really comes down to enabling only one queue, something needs to complain extremely loudly that this isn't an optimal setting. Only printing it out at boot may not be enough - what's needed is possibly a script in periodic/daily which checks "system sanity" every day and e-mails the operator.
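A daily sanity check of the kind suggested above could be sketched roughly as follows. This is purely a hypothetical illustration, not an existing periodic script; it assumes the mbuf cluster line format that FreeBSD's `netstat -m` prints ("NNN/NNN/NNN/NNN mbuf clusters in use (current/cache/total/max)"), and the warning threshold is an arbitrary choice.

```python
import re

# Hypothetical check in the spirit of a periodic/daily "system sanity" script:
# warn the operator when mbuf cluster usage approaches the nmbclusters limit.
# Assumed input format (from FreeBSD "netstat -m"):
#   "515/1405/1920/25600 mbuf clusters in use (current/cache/total/max)"

def check_mbuf_clusters(netstat_m_output, warn_fraction=0.8):
    """Return a warning string if total cluster usage exceeds
    warn_fraction of the configured maximum, else None."""
    m = re.search(r'(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use',
                  netstat_m_output)
    if not m:
        return None
    current, cache, total, maximum = map(int, m.groups())
    if total >= warn_fraction * maximum:
        return ("mbuf clusters at %d/%d (%.0f%%); consider raising "
                "kern.ipc.nmbclusters or reducing NIC queues"
                % (total, maximum, 100.0 * total / maximum))
    return None

# Example with canned output (a real script would run "netstat -m"):
sample = "23000/1405/24405/25600 mbuf clusters in use (current/cache/total/max)"
print(check_mbuf_clusters(sample))
```

A real periodic script would of course e-mail this output rather than print it, as the other periodic/daily checks do.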
Re: nmbclusters: how do we want to fix this for 8.3 ?
On Thu, Feb 23, 2012 at 9:19 AM, Fabien Thomas wrote:
>
> On 22 Feb 2012, at 22:51, Jack Vogel wrote:
>
>> On Wed, Feb 22, 2012 at 1:44 PM, Luigi Rizzo wrote:
>>
>>> On Wed, Feb 22, 2012 at 09:09:46PM +0000, Ben Hutchings wrote:
>>>> On Wed, 2012-02-22 at 21:52 +0100, Luigi Rizzo wrote:
>>> ...
>>>>> I have hit this problem recently, too. Maybe the issue mostly/only exists on 32-bit systems.
>>>>
>>>> No, we kept hitting mbuf pool limits on 64-bit systems when we started working on FreeBSD support.
>>>
>>> ok never mind then, the mechanism would be the same, though the limits (especially VM_LIMIT) would be different.
>>>
>>>>> Here is a possible approach:
>>>>>
>>>>> 1. nmbclusters consume the kernel virtual address space so there must be some upper limit, say
>>>>>
>>>>>        VM_LIMIT = 256000 (translates to 512MB of address space)
>>>>>
>>>>> 2. also you don't want the clusters to take up too much of the available memory. This one would only trigger for minimal-memory systems, or virtual machines, but still...
>>>>>
>>>>>        MEM_LIMIT = (physical_ram / 2) / 2048
>>>>>
>>>>> 3. one may try to set a suitably large, desirable number of buffers
>>>>>
>>>>>        TARGET_CLUSTERS = 128000
>>>>>
>>>>> 4. and finally we could use the current default as the absolute minimum
>>>>>
>>>>>        MIN_CLUSTERS = 1024 + maxusers*64
>>>>>
>>>>> Then at boot the system could say
>>>>>
>>>>>        nmbclusters = min(TARGET_CLUSTERS, VM_LIMIT, MEM_LIMIT)
>>>>>        nmbclusters = max(nmbclusters, MIN_CLUSTERS)
>>>>>
>>>>> In turn, i believe interfaces should do their part and by default never try to allocate more than a fraction of the total number of buffers,
>>>>
>>>> Well what fraction should that be? It surely depends on how many interfaces are in the system and how many queues the other interfaces have.
>>>
>>>>> if necessary reducing the number of active queues.
>>>>
>>>> So now I have too few queues on my interface even after I increase the limit.
>>>>
>>>> There ought to be a standard way to configure numbers of queues and default queue lengths.
>>>
>>> Jack raised the problem that there is a poorly chosen default for nmbclusters, causing one interface to consume all the buffers. If the user explicitly overrides the value then the number of clusters should be what the user asks (memory permitting). The next step is on devices: if there are no overrides, the default for a driver is to be lean. I would say that capping the request between 1/4 and 1/8 of the total buffers is surely better than the current situation. Of course if there is an explicit override, then use it whatever happens to the others.
>>>
>>> cheers
>>> luigi
>>
>> Hmmm, well, I could make the default use only 1 queue or something like that, was thinking more of what actual users of the hardware would want.
>
> I think this is more reasonable to setup interface with one queue. Even if the cluster count does not hit the max you will end up with an unbalanced setting that leaves a very low mbuf count for other uses.

If interfaces have the possibility to use more queues, they should, imo, so I'm all for raising the default size. For those systems with very limited memory it's easily changed.

>> After the installed kernel is booted and the admin would do whatever post-install modifications they wish it could be changed, along with nmbclusters.
>>
>> This was why i sought opinions, of the algorithm itself, but also anyone using ixgbe and igb in heavy use, what would you find most convenient?
>>
>> Jack
>> ___
>> freebsd-...@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: nmbclusters: how do we want to fix this for 8.3 ?
On 22 Feb 2012, at 22:51, Jack Vogel wrote:

> On Wed, Feb 22, 2012 at 1:44 PM, Luigi Rizzo wrote:
>
>> On Wed, Feb 22, 2012 at 09:09:46PM +0000, Ben Hutchings wrote:
>>> On Wed, 2012-02-22 at 21:52 +0100, Luigi Rizzo wrote:
>> ...
>>>> I have hit this problem recently, too. Maybe the issue mostly/only exists on 32-bit systems.
>>>
>>> No, we kept hitting mbuf pool limits on 64-bit systems when we started working on FreeBSD support.
>>
>> ok never mind then, the mechanism would be the same, though the limits (especially VM_LIMIT) would be different.
>>
>>>> Here is a possible approach:
>>>>
>>>> 1. nmbclusters consume the kernel virtual address space so there must be some upper limit, say
>>>>
>>>>        VM_LIMIT = 256000 (translates to 512MB of address space)
>>>>
>>>> 2. also you don't want the clusters to take up too much of the available memory. This one would only trigger for minimal-memory systems, or virtual machines, but still...
>>>>
>>>>        MEM_LIMIT = (physical_ram / 2) / 2048
>>>>
>>>> 3. one may try to set a suitably large, desirable number of buffers
>>>>
>>>>        TARGET_CLUSTERS = 128000
>>>>
>>>> 4. and finally we could use the current default as the absolute minimum
>>>>
>>>>        MIN_CLUSTERS = 1024 + maxusers*64
>>>>
>>>> Then at boot the system could say
>>>>
>>>>        nmbclusters = min(TARGET_CLUSTERS, VM_LIMIT, MEM_LIMIT)
>>>>        nmbclusters = max(nmbclusters, MIN_CLUSTERS)
>>>>
>>>> In turn, i believe interfaces should do their part and by default never try to allocate more than a fraction of the total number of buffers,
>>>
>>> Well what fraction should that be? It surely depends on how many interfaces are in the system and how many queues the other interfaces have.
>>
>>>> if necessary reducing the number of active queues.
>>>
>>> So now I have too few queues on my interface even after I increase the limit.
>>>
>>> There ought to be a standard way to configure numbers of queues and default queue lengths.
>>
>> Jack raised the problem that there is a poorly chosen default for nmbclusters, causing one interface to consume all the buffers. If the user explicitly overrides the value then the number of clusters should be what the user asks (memory permitting). The next step is on devices: if there are no overrides, the default for a driver is to be lean. I would say that capping the request between 1/4 and 1/8 of the total buffers is surely better than the current situation. Of course if there is an explicit override, then use it whatever happens to the others.
>>
>> cheers
>> luigi
>
> Hmmm, well, I could make the default use only 1 queue or something like that, was thinking more of what actual users of the hardware would want.

I think this is more reasonable to setup interface with one queue. Even if the cluster count does not hit the max you will end up with an unbalanced setting that leaves a very low mbuf count for other uses.

> After the installed kernel is booted and the admin would do whatever post-install modifications they wish it could be changed, along with nmbclusters.
>
> This was why i sought opinions, of the algorithm itself, but also anyone using ixgbe and igb in heavy use, what would you find most convenient?
>
> Jack
Re: nmbclusters: how do we want to fix this for 8.3 ?
It could do some good to think of the scale of the problem; maybe the driver can tune to the hardware.

First, is 8k packet buffers a reasonable default on a GigE port? Well... on a GigE port, you could have from 100k pps (packets per second) at 1500 bytes, to 500k pps at around 300 bytes, to truly pathological rates of packets (2M pps at the Ethernet minimum of 64 bytes). 8k buffers vanish in 1/10th of a second in the 1500-byte case, and that doesn't even really speak to the buffers getting emptied by other software.

Do you maybe want to have a switch whereby the GigE port is in performance or non-performance mode? Do you want to assume that systems with GigE ports are also not pathologically low on memory? Perhaps in 10 or 100 megabit mode, the driver should make smaller rings?

For that matter, if mbufs come in a page's worth at a time, what's the drawback of scaling them up and down with network vs. memory vs. cache pressure?
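The back-of-the-envelope figures above can be checked directly. This sketch ignores Ethernet preamble and inter-frame-gap overhead, as the round numbers in the message do:

```python
# Rough packet-rate and buffer-drain arithmetic for a GigE port,
# ignoring framing overhead (preamble, inter-frame gap, FCS already
# counted in the frame size).

LINK_BPS = 1_000_000_000  # 1 Gb/s
RING_BUFFERS = 8 * 1024   # "8k packet buffers"

def pps(frame_bytes):
    """Packets per second at full line rate for a given frame size."""
    return LINK_BPS / (frame_bytes * 8)

def drain_seconds(frame_bytes):
    """Time to exhaust the ring at line rate if nothing empties it."""
    return RING_BUFFERS / pps(frame_bytes)

for size in (1500, 300, 64):
    print("%4d-byte frames: %9.0f pps, 8k buffers last %.3f s"
          % (size, pps(size), drain_seconds(size)))
```

At 1500 bytes this gives roughly 83k pps and about 0.1 s to fill 8k buffers, matching the "1/10th of a second" figure; at 64 bytes it gives just under 2M pps.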
Re: nmbclusters: how do we want to fix this for 8.3 ?
On Wed, Feb 22, 2012 at 1:44 PM, Luigi Rizzo wrote:

> On Wed, Feb 22, 2012 at 09:09:46PM +0000, Ben Hutchings wrote:
>> On Wed, 2012-02-22 at 21:52 +0100, Luigi Rizzo wrote:
> ...
>>> I have hit this problem recently, too. Maybe the issue mostly/only exists on 32-bit systems.
>>
>> No, we kept hitting mbuf pool limits on 64-bit systems when we started working on FreeBSD support.
>
> ok never mind then, the mechanism would be the same, though the limits (especially VM_LIMIT) would be different.
>
>>> Here is a possible approach:
>>>
>>> 1. nmbclusters consume the kernel virtual address space so there must be some upper limit, say
>>>
>>>        VM_LIMIT = 256000 (translates to 512MB of address space)
>>>
>>> 2. also you don't want the clusters to take up too much of the available memory. This one would only trigger for minimal-memory systems, or virtual machines, but still...
>>>
>>>        MEM_LIMIT = (physical_ram / 2) / 2048
>>>
>>> 3. one may try to set a suitably large, desirable number of buffers
>>>
>>>        TARGET_CLUSTERS = 128000
>>>
>>> 4. and finally we could use the current default as the absolute minimum
>>>
>>>        MIN_CLUSTERS = 1024 + maxusers*64
>>>
>>> Then at boot the system could say
>>>
>>>        nmbclusters = min(TARGET_CLUSTERS, VM_LIMIT, MEM_LIMIT)
>>>        nmbclusters = max(nmbclusters, MIN_CLUSTERS)
>>>
>>> In turn, i believe interfaces should do their part and by default never try to allocate more than a fraction of the total number of buffers,
>>
>> Well what fraction should that be? It surely depends on how many interfaces are in the system and how many queues the other interfaces have.
>
>>> if necessary reducing the number of active queues.
>>
>> So now I have too few queues on my interface even after I increase the limit.
>>
>> There ought to be a standard way to configure numbers of queues and default queue lengths.
>
> Jack raised the problem that there is a poorly chosen default for nmbclusters, causing one interface to consume all the buffers. If the user explicitly overrides the value then the number of clusters should be what the user asks (memory permitting). The next step is on devices: if there are no overrides, the default for a driver is to be lean. I would say that capping the request between 1/4 and 1/8 of the total buffers is surely better than the current situation. Of course if there is an explicit override, then use it whatever happens to the others.
>
> cheers
> luigi

Hmmm, well, I could make the default use only 1 queue or something like that; was thinking more of what actual users of the hardware would want.

After the installed kernel is booted and the admin would do whatever post-install modifications they wish, it could be changed, along with nmbclusters.

This was why i sought opinions, of the algorithm itself, but also anyone using ixgbe and igb in heavy use: what would you find most convenient?

Jack
Re: nmbclusters: how do we want to fix this for 8.3 ?
On Wed, Feb 22, 2012 at 09:09:46PM +0000, Ben Hutchings wrote:
> On Wed, 2012-02-22 at 21:52 +0100, Luigi Rizzo wrote:
...
> > I have hit this problem recently, too. Maybe the issue mostly/only exists on 32-bit systems.
>
> No, we kept hitting mbuf pool limits on 64-bit systems when we started working on FreeBSD support.

ok never mind then, the mechanism would be the same, though the limits (especially VM_LIMIT) would be different.

> > Here is a possible approach:
> >
> > 1. nmbclusters consume the kernel virtual address space so there must be some upper limit, say
> >
> >        VM_LIMIT = 256000 (translates to 512MB of address space)
> >
> > 2. also you don't want the clusters to take up too much of the available memory. This one would only trigger for minimal-memory systems, or virtual machines, but still...
> >
> >        MEM_LIMIT = (physical_ram / 2) / 2048
> >
> > 3. one may try to set a suitably large, desirable number of buffers
> >
> >        TARGET_CLUSTERS = 128000
> >
> > 4. and finally we could use the current default as the absolute minimum
> >
> >        MIN_CLUSTERS = 1024 + maxusers*64
> >
> > Then at boot the system could say
> >
> >        nmbclusters = min(TARGET_CLUSTERS, VM_LIMIT, MEM_LIMIT)
> >        nmbclusters = max(nmbclusters, MIN_CLUSTERS)
> >
> > In turn, i believe interfaces should do their part and by default never try to allocate more than a fraction of the total number of buffers,
>
> Well what fraction should that be? It surely depends on how many interfaces are in the system and how many queues the other interfaces have.

> > if necessary reducing the number of active queues.
>
> So now I have too few queues on my interface even after I increase the limit.
>
> There ought to be a standard way to configure numbers of queues and default queue lengths.

Jack raised the problem that there is a poorly chosen default for nmbclusters, causing one interface to consume all the buffers. If the user explicitly overrides the value then the number of clusters should be what the user asks (memory permitting). The next step is on devices: if there are no overrides, the default for a driver is to be lean. I would say that capping the request between 1/4 and 1/8 of the total buffers is surely better than the current situation. Of course if there is an explicit override, then use it whatever happens to the others.

cheers
luigi
Re: nmbclusters: how do we want to fix this for 8.3 ?
On Wed, Feb 22, 2012 at 11:56:29AM -0800, Jack Vogel wrote:
> Using igb and/or ixgbe on a reasonably powered server requires 1K mbuf clusters per MSIX vector; that's how many are in a ring. Either driver will configure 8 queues on a system with that many or more cores, so 8K clusters per port...
>
> My test engineer has a system with 2 igb ports, and 2 10G ixgbe; this is hardly heavy duty, and yet this exceeds the default mbuf pool on the installed kernel (1024 + maxusers * 64).
>
> Now, this can be immediately fixed by a sysadmin after that first boot, but it does result in the second driver that gets started complaining about inadequate buffers.
>
> I think the default calculation is dated and should be changed, but am not sure the best way, so are there suggestions/opinions about this, and might we get it fixed before 8.3 is baked?

I have hit this problem recently, too. Maybe the issue mostly/only exists on 32-bit systems.

Here is a possible approach:

1. nmbclusters consume the kernel virtual address space so there must be some upper limit, say

       VM_LIMIT = 256000 (translates to 512MB of address space)

2. also you don't want the clusters to take up too much of the available memory. This one would only trigger for minimal-memory systems, or virtual machines, but still...

       MEM_LIMIT = (physical_ram / 2) / 2048

3. one may try to set a suitably large, desirable number of buffers

       TARGET_CLUSTERS = 128000

4. and finally we could use the current default as the absolute minimum

       MIN_CLUSTERS = 1024 + maxusers*64

Then at boot the system could say

       nmbclusters = min(TARGET_CLUSTERS, VM_LIMIT, MEM_LIMIT)
       nmbclusters = max(nmbclusters, MIN_CLUSTERS)

In turn, i believe interfaces should do their part and by default never try to allocate more than a fraction of the total number of buffers, if necessary reducing the number of active queues.

what do people think ?
cheers
luigi
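Luigi's proposed boot-time calculation can be sketched as follows. This is only an illustration of the formulas in his message; the constants are the ones he suggests, 2048 bytes is the standard mbuf cluster size, and the example RAM/maxusers values are made up:

```python
# Sketch of the proposed nmbclusters default calculation.

VM_LIMIT = 256000          # cap from kernel virtual address space (~512MB)
TARGET_CLUSTERS = 128000   # desirable number of buffers

def default_nmbclusters(physical_ram_bytes, maxusers):
    # at most half of physical RAM, in 2KB clusters
    mem_limit = (physical_ram_bytes // 2) // 2048
    # the current default, kept as an absolute floor
    min_clusters = 1024 + maxusers * 64
    n = min(TARGET_CLUSTERS, VM_LIMIT, mem_limit)
    return max(n, min_clusters)

# A 4GB machine with maxusers=384 gets the full target;
# a 256MB VM is held to half its RAM instead.
print(default_nmbclusters(4 * 2**30, 384))   # 128000
print(default_nmbclusters(256 * 2**20, 64))  # 65536
```

The interesting property is that the target only wins when neither the address-space cap nor the half-of-RAM cap bites, and tiny systems still fall back to today's `1024 + maxusers*64` floor.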
nmbclusters: how do we want to fix this for 8.3 ?
Using igb and/or ixgbe on a reasonably powered server requires 1K mbuf clusters per MSI-X vector; that's how many are in a ring. Either driver will configure 8 queues on a system with that many or more cores, so 8K clusters per port...

My test engineer has a system with 2 igb ports and 2 10G ixgbe; this is hardly heavy duty, and yet it exceeds the default mbuf pool on the installed kernel (1024 + maxusers * 64).

Now, this can be immediately fixed by a sysadmin after that first boot, but it does result in the second driver that gets started complaining about inadequate buffers.

I think the default calculation is dated and should be changed, but am not sure the best way, so are there suggestions/opinions about this, and might we get it fixed before 8.3 is baked?

Cheers,

Jack
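For concreteness, the arithmetic behind this example works out as below. The port/queue/ring figures are the ones Jack gives; the maxusers value of 384 is an assumed, merely plausible autotuned value used here for illustration:

```python
# Jack's test system: 2 igb + 2 ixgbe ports, each driver bringing up
# 8 queues with 1K mbuf clusters per ring.

CLUSTERS_PER_RING = 1024
QUEUES_PER_PORT = 8
PORTS = 4                  # 2 igb + 2 10G ixgbe

demand = PORTS * QUEUES_PER_PORT * CLUSTERS_PER_RING

# The default pool: 1024 + maxusers * 64.
# maxusers = 384 is an assumed value for illustration only.
maxusers = 384
default_pool = 1024 + maxusers * 64

print("NIC demand:   %d clusters" % demand)        # 32768
print("default pool: %d clusters" % default_pool)  # 25600
print("shortfall:    %d clusters" % (demand - default_pool))
```

So even before any clusters are used for actual traffic or other subsystems, the ring allocations alone overrun the default pool, which is why the second driver to attach complains.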
Re: nmbclusters: how do we want to fix this for 8.3 ?
Hi,

On Wed, Feb 22, 2012 at 2:56 PM, Jack Vogel wrote:
> Using igb and/or ixgbe on a reasonably powered server requires 1K mbuf clusters per MSIX vector, that's how many are in a ring. Either driver will configure 8 queues on a system with that many or more cores, so 8K clusters per port...
>
> My test engineer has a system with 2 igb ports, and 2 10G ixgbe, this is hardly heavy duty, and yet this exceeds the default mbuf pool on the installed kernel (1024 + maxusers * 64).
>
> Now, this can be immediately fixed by a sysadmin after that first boot, but it does result in the second driver that gets started to complain about inadequate buffers.
>
> I think the default calculation is dated and should be changed, but am not sure the best way, so are there suggestions/opinions about this, and might we get it fixed before 8.3 is baked?

Get rid of the limit once and for all; it is pointless.

- Arnaud
Re: nmbclusters
On 29/03/06, Robert Watson <[EMAIL PROTECTED]> wrote:
>
> On Wed, 29 Mar 2006, Dag-Erling Smørgrav wrote:
>
> > "Conrad J. Sabatier" <[EMAIL PROTECTED]> writes:
> >> Chris <[EMAIL PROTECTED]> wrote:
> >>> so [kern.ipc.nmbclusters] has no affect, has this become a read only
> >>> tunable again only settable in loader.conf?
> >> To the best of my knowledge, this has *always* been a loader tunable,
> >> not configurable on-the-fly.
> >
> > kern.ipc.nmbclusters is normally computed at boot time. A compile-time
> > option to override it was introduced in 2.0-CURRENT. At that time, it was
> > defined in param.c. A read-only sysctl was introduced in 3.0-CURRENT. It
> > moved from param.c to uipc_mbuf.c in 4.0-CURRENT, then to subr_mbuf.c when
> > mballoc was introduced in 5.0-CURRENT; became a tunable at some point after
> > that; then moved again to kern_mbuf.c when mballoc was replaced with mbuma
> > in 6.0-CURRENT. That is the point where it became read-write, for no good
> > reason that I can see; setting it at runtime has no effect, because the size
> > of the mbuf zone is determined at boot time. Perhaps Bosko (who wrote both
> > mballoc and mbuma, IIRC) knows.
>
> Paul Saab from Yahoo! has a set of patches that allow run-time nmbclusters
> changes to be implemented -- while it won't cause the freeing of clusters
> referenced, it goes through and recalculates dependent variables, propagates
> them into UMA, etc. I believe they're running with this patch on 6.x, and I
> expect that they will be merged to -CURRENT and -STABLE in the relatively near
> future. Not before 6.1, however.
>
> If the nmbclusters setting really has no effect right now, we should mark the
> sysctl as read-only to make it more clear it doesn't, since allowing it to be
> set without taking effect is counter-intuitive.
>
> Robert N M Watson

Thanks for everyone's responses. My 5.4 servers seem to accept the tunable being changed at runtime, although this could be a bug and it isn't really changing?

4.x, I know, was a boot-only tunable. I have tested this now in loader.conf and it works, so it seems it's a minor bug where the error message is replaced by a false success output.

Chris
Re: nmbclusters
Chris <[EMAIL PROTECTED]> writes:
> thanks for everyones responses. My 5.4 servers seem to accept the
> tunable been changed at runtime although this could be a bug and it
> isnt really changing?

You can change it, but it has no effect.

DES
--
Dag-Erling Smørgrav - [EMAIL PROTECTED]
Re: nmbclusters
On Wed, 29 Mar 2006, Dag-Erling Smørgrav wrote:

> "Conrad J. Sabatier" <[EMAIL PROTECTED]> writes:
>> Chris <[EMAIL PROTECTED]> wrote:
>>> so [kern.ipc.nmbclusters] has no affect, has this become a read only
>>> tunable again only settable in loader.conf?
>> To the best of my knowledge, this has *always* been a loader tunable,
>> not configurable on-the-fly.
>
> kern.ipc.nmbclusters is normally computed at boot time. A compile-time
> option to override it was introduced in 2.0-CURRENT. At that time, it was
> defined in param.c. A read-only sysctl was introduced in 3.0-CURRENT. It
> moved from param.c to uipc_mbuf.c in 4.0-CURRENT, then to subr_mbuf.c when
> mballoc was introduced in 5.0-CURRENT; became a tunable at some point after
> that; then moved again to kern_mbuf.c when mballoc was replaced with mbuma
> in 6.0-CURRENT. That is the point where it became read-write, for no good
> reason that I can see; setting it at runtime has no effect, because the size
> of the mbuf zone is determined at boot time. Perhaps Bosko (who wrote both
> mballoc and mbuma, IIRC) knows.

Paul Saab from Yahoo! has a set of patches that allow run-time nmbclusters changes to be implemented -- while it won't cause the freeing of clusters referenced, it goes through and recalculates dependent variables, propagates them into UMA, etc. I believe they're running with this patch on 6.x, and I expect that they will be merged to -CURRENT and -STABLE in the relatively near future. Not before 6.1, however.

If the nmbclusters setting really has no effect right now, we should mark the sysctl as read-only to make it more clear it doesn't, since allowing it to be set without taking effect is counter-intuitive.

Robert N M Watson
Re: nmbclusters
It's always only been boot-time tunable (well, "always" is of course relative to my time with FreeBSD -- Dag-Erling has been around longer and therefore recounts a more comprehensive history). In 6.0-CURRENT there was an intention to make it a sysctl (runtime) tunable, as it finally became at least theoretically possible to do so. I have recently seen a patch floating around from Paul Saab (ps@) who has finally made it runtime tunable -- at least enough so that it can be _increased_. Not sure if he has committed it yet. Note that _decreasing_ nmbclusters at run-time will probably never be possible -- implementing it is too difficult for what it would be worth. Cheers, Bosko On 3/29/06, Dag-Erling Smørgrav <[EMAIL PROTECTED]> wrote: > "Conrad J. Sabatier" <[EMAIL PROTECTED]> writes: > > Chris <[EMAIL PROTECTED]> wrote: > > > so [kern.ipc.nmbclusters] has no effect, has this become a read-only > > > tunable again, only settable in loader.conf? > > To the best of my knowledge, this has *always* been a loader tunable, > > not configurable on-the-fly. > > kern.ipc.nmbclusters is normally computed at boot time. A compile-time > option to override it was introduced in 2.0-CURRENT. At that > time, it was defined in param.c. A read-only sysctl was introduced in > 3.0-CURRENT. It moved from param.c to uipc_mbuf.c in 4.0-CURRENT, > then to subr_mbuf.c when mballoc was introduced in 5.0-CURRENT; became > a tunable at some point after that; then moved again to kern_mbuf.c > when mballoc was replaced with mbuma in 6.0-CURRENT. That is the > point where it became read-write, for no good reason that I can see; > setting it at runtime has no effect, because the size of the mbuf zone > is determined at boot time. Perhaps Bosko (who wrote both mballoc and > mbuma, IIRC) knows. 
> > DES > -- > Dag-Erling Smørgrav - [EMAIL PROTECTED] > -- Bosko Milekic <[EMAIL PROTECTED]> To see all the content I generate on the web, check out my Peoplefeeds profile at: http://peoplefeeds.com/bosko
Re: nmbclusters
"Conrad J. Sabatier" <[EMAIL PROTECTED]> writes: > Chris <[EMAIL PROTECTED]> wrote: > > so [kern.ipc.nmbclusters] has no effect, has this become a read-only > > tunable again, only settable in loader.conf? > To the best of my knowledge, this has *always* been a loader tunable, > not configurable on-the-fly. kern.ipc.nmbclusters is normally computed at boot time. A compile-time option to override it was introduced in 2.0-CURRENT. At that time, it was defined in param.c. A read-only sysctl was introduced in 3.0-CURRENT. It moved from param.c to uipc_mbuf.c in 4.0-CURRENT, then to subr_mbuf.c when mballoc was introduced in 5.0-CURRENT; became a tunable at some point after that; then moved again to kern_mbuf.c when mballoc was replaced with mbuma in 6.0-CURRENT. That is the point where it became read-write, for no good reason that I can see; setting it at runtime has no effect, because the size of the mbuf zone is determined at boot time. Perhaps Bosko (who wrote both mballoc and mbuma, IIRC) knows. DES -- Dag-Erling Smørgrav - [EMAIL PROTECTED]
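[Editorial aside: since DES's history above concludes that the zone size is fixed when the kernel boots, the only reliable way to raise the limit on these releases is the loader tunable. A minimal sketch; the value 32768 is purely illustrative, size it to your RAM and workload per tuning(7):

```
# /boot/loader.conf -- read by the loader before the kernel
# initializes the mbuf cluster zone, so it takes effect at boot
kern.ipc.nmbclusters="32768"
```

After a reboot, both `sysctl kern.ipc.nmbclusters` and the "max" column of `netstat -m` should reflect the new value; if only the sysctl changed, the setting did not actually take effect.]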
Re: nmbclusters
On Tue, 28 Mar 2006 20:34:18 +0100 Chris <[EMAIL PROTECTED]> wrote: > Using 6.0 release, latest security branch. > > netstat -m > 69/576/645 mbufs in use (current/cache/total) > 65/261/326/33792 mbuf clusters in use (current/cache/total/max) > 0/38/8704 sfbufs in use (current/peak/max) > 147K/666K/813K bytes allocated to network (current/cache/total) > 0 requests for sfbufs denied > 0 requests for sfbufs delayed > 29780 requests for I/O initiated by sendfile > 633 calls to protocol drain routines > > sysctl kern.ipc.nmbclusters > kern.ipc.nmbclusters: 65536 > > sysctl kern.ipc.nmbclusters=25000 > kern.ipc.nmbclusters: 65536 -> 25000 > > 70/575/645 mbufs in use (current/cache/total) > 64/262/326/33792 mbuf clusters in use (current/cache/total/max) > 0/38/8704 sfbufs in use (current/peak/max) > 145K/667K/813K bytes allocated to network (current/cache/total) > 0 requests for sfbufs denied > 0 requests for sfbufs delayed > 29780 requests for I/O initiated by sendfile > 633 calls to protocol drain routines > > so the sysctl variable has no effect, has this become a read-only > tunable again, only settable in loader.conf? To the best of my knowledge, this has *always* been a loader tunable, not configurable on-the-fly. Myself, ever since the introduction quite some time ago of the "friendly" setting of 0 (for unlimited mbufs), I've always used that in my /boot/loader.conf, i.e., kern.ipc.nmbclusters="0". Can't really comment on your other questions, I'm afraid. :-) > If yes, then there is a bug > where the sysctl command shows no error; or if it is supposedly > settable, then there is a bug where it doesn't work, or netstat -m shows > incorrect info. Or has this setting been deprecated? > > Also, if the machine stops responding and no kernel panic is logged, does > it mean a livelock/deadlock? Have been seeing issues on 3 different 6.0 > release servers which simply go dead. 
2 were rolled back to 5.4 and > immediately became stable, and I left this one on 6.0 to try and resolve > the problems, but it's difficult with no log entries. > > Thanks > > Chris -- Conrad J. Sabatier <[EMAIL PROTECTED]> -- "In Unix veritas"
nmbclusters
Using 6.0 release, latest security branch. netstat -m 69/576/645 mbufs in use (current/cache/total) 65/261/326/33792 mbuf clusters in use (current/cache/total/max) 0/38/8704 sfbufs in use (current/peak/max) 147K/666K/813K bytes allocated to network (current/cache/total) 0 requests for sfbufs denied 0 requests for sfbufs delayed 29780 requests for I/O initiated by sendfile 633 calls to protocol drain routines sysctl kern.ipc.nmbclusters kern.ipc.nmbclusters: 65536 sysctl kern.ipc.nmbclusters=25000 kern.ipc.nmbclusters: 65536 -> 25000 70/575/645 mbufs in use (current/cache/total) 64/262/326/33792 mbuf clusters in use (current/cache/total/max) 0/38/8704 sfbufs in use (current/peak/max) 145K/667K/813K bytes allocated to network (current/cache/total) 0 requests for sfbufs denied 0 requests for sfbufs delayed 29780 requests for I/O initiated by sendfile 633 calls to protocol drain routines so the sysctl variable has no effect; has this become a read-only tunable again, only settable in loader.conf? If yes, then there is a bug where the sysctl command shows no error; or if it is supposedly settable, then there is a bug where it doesn't work, or netstat -m shows incorrect info. Or has this setting been deprecated? Also, if the machine stops responding and no kernel panic is logged, does it mean a livelock/deadlock? Have been seeing issues on 3 different 6.0 release servers which simply go dead. 2 were rolled back to 5.4 and immediately became stable, and I left this one on 6.0 to try and resolve the problems, but it's difficult with no log entries. Thanks Chris
Re: mbuf clusters behavior (NMBCLUSTERS)
If someone can answer this question, I can probably start troubleshooting the problem. After the nfs server uses up the mbuf clusters, how are the mbuf clusters flushed? Should I try to change "maxusers 0" to a number like 10? One thing I forgot to mention is that the file system uses soft updates. Would it matter? At Mon, 15 Jul 2002 18:37:59 -0400, Bosko Milekic wrote: > > On Mon, Jul 15, 2002 at 05:43:24PM -0400, Naoyuki Tai wrote: > [...] > > If I do not suspend the copy command, the mbuf clusters hit the > > max and the server starts to drop the packets. It slows down the nfs > > serving severely due to its nfs retry. > > Are you sure that you're actually running out of address space? (i.e., > does `netstat -m' finally show a totally exhausted cluster pool?) > It is also possible to get the messages you mention if, for example, > you run out of available RAM. > > > How can I prevent this "mbuf clusters exhaustion"? > > Assuming this is really you running out of clusters or mbufs because > NMBCLUSTERS is too low and not because you're out of RAM, you can take > a look at setting per-UID sbsize (socket buffer) limits. > > > Increasing mbuf clusters is not an option. There is no way that I > > can allocate big enough for all of the files I copy. > > > > Relating to this, I do not understand why the mbuf clusters > > are not freed fast enough. I watched "top" and it does seem to be > > that CPU is not exhausted. > > After all, I'm copying less than 10Mbyte/sec, probably 6 - 7 Mbytes > > at most. > > Hard disk is a Seagate ATA/IV 60Gbyte. The drive is hooked up to a > > Promise PCI ATA/UDMA 100 controller card. > > bonnie shows that it can sustain 15M - 20M bytes read/write. > > Uhm, I've been following you up to this point. I'm not sure why > clusters are not being released faster, either, however: what exactly > are you copying from, and what are you copying to? I'm assuming you're > copying to an NFS mount somewhere? Please read the original message again. 
I believe that I included all the information I can think of. In summary: 1. Simple nfs file copy exhausts the mbuf clusters 2. When mbuf clusters are exhausted, the whole copy process suspends for 10 - 20 seconds. 3. One NFS server: FreeBSD 4.6-stable and Client: RH 7.3 4. I tried various sizes of mbuf clusters. 5. NMBCLUSTERS is not too low. I assigned more than recommended. 32768 (64M) is for a multiuser large scale server. For this case, one server and one client, I gave it double that, and it is still no go. 6. The size of the mbuf clusters is 128Mbytes and I can not increase it any more. > > Is there anything I can try? > > Check your memory usage in top. Why? What do you think is happening? Thrashing? Very unlikely. The FreeBSD server has no other client for the test or any other major service running. If so, I'd have mentioned it in the original message. > > Thank you! > > > > -- > > [EMAIL PROTECTED], Naoyuki "Tai" Tai > > > > P.S. > > > > I sent this message to freebsd-questions, but, did not get any > > response. If you've seen this message before, sorry. > > Regards, > -- > Bosko Milekic > [EMAIL PROTECTED] > [EMAIL PROTECTED] > Have a good day! -- [EMAIL PROTECTED], Naoyuki "Tai" Tai To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-stable" in the body of the message
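[Editorial aside: Bosko's suggestion above of per-UID sbsize (socket buffer) limits is done through login classes. A minimal sketch; the class name "limited" and the 4M cap are illustrative assumptions, not values from the thread:

```
# /etc/login.conf -- cap total socket buffer memory per user.
# "limited" is a hypothetical class name; 4194304 (4M) is an example cap.
limited:\
	:sbsize=4194304:\
	:tc=default:
```

After editing, rebuild the database with `cap_mkdb /etc/login.conf` and assign the class to the relevant users (e.g. via `pw usermod -L limited`). Note this bounds socket buffer usage per UID; it does not directly cap IP reassembly-queue usage.]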
Re: AW: mbuf clusters behavior (NMBCLUSTERS)
Alexander Maret writes: > > -Original Message- > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] > > > > When I copy a 500M file (like an iso image) from the workstation to the > > server, the server starts to emit: > > > > Jul 12 09:28:54 nile /kernel: All mbuf clusters exhausted, > > please see tuning(7). > > I'm getting the same error here with 4.6-R. > > > So, I bumped up the nmbclusters to kern.ipc.nmbclusters=65536 > > I allocated 128Mbytes to the mbuf clusters, hoping that it is > > big enough. > > But, it still shows the same > > > > All mbuf clusters exhausted, please see tuning(7). > > The same for me. > > > > How can I prevent this "mbuf clusters exhaustion"? > > I would be interested in an answer, too. RH Linux has a (really dumb) default of using UDP mounts with a 16K read and write size. A small percentage of packet loss will cause many fragments to build up on the IP reassembly queues and run you out of mbuf clusters. There are 3 things you could do to fix this: a) Use more reasonable NFS mount options on the linux boxes. 8K UDP should help quite a bit; 32K TCP should totally eliminate the problem. b) Drastically reduce net.inet.ip.maxfragpackets (make sure that (maxfragpackets * (16384/1472)) < (maxmbufclusters - some_constant)). This will limit the number of packets (and therefore mbufs) on the IP reassembly queues. This could degrade your observed NFS write performance from the linux side if you do not also implement (a) above. c) Eliminate all packet loss in your network between the client and server. Not likely, as I suspect some of the packet loss might even be inside the linux or freebsd boxes. I suggest you do both (a) and (b). > > > Relating to this, I do not understand why the mbuf clusters > > are not freed fast enough. I watched "top" and it does seem to be IP reassembly queues are timed out after IPFRAGTTL, which is quite a long time. 
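[Editorial aside: Drew's sizing rule in (b) is easy to sanity-check with shell arithmetic. The 1024-cluster headroom below is an arbitrary placeholder for his unspecified "some_constant", and 65536 is the nmbclusters value used earlier in the thread:

```shell
#!/bin/sh
# Each 16K UDP NFS write arrives as ~12 IP fragments (~1472 bytes of
# payload per Ethernet-sized fragment), each holding an mbuf cluster
# while it waits on the reassembly queue.
nmbclusters=65536                               # kern.ipc.nmbclusters
frags_per_write=$(( (16384 + 1471) / 1472 ))    # ceiling of 16384/1472 = 12
headroom=1024                                   # placeholder for "some_constant"
maxfragpackets=$(( (nmbclusters - headroom) / frags_per_write ))
echo "keep net.inet.ip.maxfragpackets below $maxfragpackets"
```

With these illustrative numbers, the bound works out to 5376, well below typical defaults only when nmbclusters is small; the point is that each queued partial write pins a dozen clusters at once.]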
Drew
AW: mbuf clusters behavior (NMBCLUSTERS)
> -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] > > When I copy a 500M file (like an iso image) from the workstation to the > server, the server starts to emit: > > Jul 12 09:28:54 nile /kernel: All mbuf clusters exhausted, > please see tuning(7). I'm getting the same error here with 4.6-R. > So, I bumped up the nmbclusters to kern.ipc.nmbclusters=65536 > I allocated 128Mbytes to the mbuf clusters, hoping that it is > big enough. > But, it still shows the same > > All mbuf clusters exhausted, please see tuning(7). The same for me. > How can I prevent this "mbuf clusters exhaustion"? I would be interested in an answer, too. > Relating to this, I do not understand why the mbuf clusters > are not freed fast enough. I watched "top" and it does seem to be > that CPU is not exhausted. > After all, I'm copying less than 10Mbyte/sec, probably 6 - 7 Mbytes > at most. > Hard disk is a Seagate ATA/IV 60Gbyte. The drive is hooked up to a > Promise PCI ATA/UDMA 100 controller card. > bonnie shows that it can sustain 15M - 20M bytes read/write. I'm trying to copy data from a 1GHz Mobile PIII Linux installation with IDE disks to an AMD Athlon 700 FBSD 4.6-R installation with Ultra 160 SCSI disks over NFS. I'm wondering too why the FreeBSD box can't cope with my lousy linux installation. Didn't have time to investigate, though. > Is there anything I can try? Greetings, Alex