Re: nmbclusters: how do we want to fix this for 8.3 ?

2012-02-23 Thread Juli Mallett
On Thu, Feb 23, 2012 at 07:19, Ivan Voras  wrote:
> On 23/02/2012 09:19, Fabien Thomas wrote:
>> I think it is more reasonable to set up the interface with one queue.
>
> Unfortunately, the moment you do that, two things will happen:
> 1) users will start complaining again how FreeBSD is slow
2) the setting will become a "sacred cow" and nobody will change this
> default for the next 10 years.

Is this any better than making queue-per-core a sacred cow?  Even very
small systems with comparatively-low memory these days have an
increasing number of cores.  They also usually have more RAM to go
with those cores, but not always.  Queue-per-core isn't even optimal
for some kinds of workloads, and is harmful to overall performance at
higher levels.  It also assumes that every core should be utilized for
the exciting task of receiving packets.  This makes sense on some
systems, but not all.

Plus more queues doesn't necessarily equal better performance even on
systems where you have the memory and cores to spare.  On systems with
non-uniform memory architectures, routinely processing queues on
different physical packages can make networking performance worse.

More queues are not a magic wand; they can be roughly the equivalent of
go-faster stripes.  Queue-per-core has a sort of logic to it, but is
not necessarily sensible, like the funroll-all-loops school of system
optimization.

Which sounds slightly off-topic, except that dedicating loads of mbufs
to receive queues that will sit empty on the vast majority of systems
and receive a few packets per second in the service of some kind of
magical thinking or excitement about multiqueue reception may be a
little ill-advised.  On my desktop with hardware supporting multiple
queues, do I really want to use the maximum number of them just to
handle a few thousand packets per second?  One core can do that just
fine.

FreeBSD's great to drop in on forwarding systems that will have
moderate load, but it seems the best justification for this default is
so users need fewer reboots to get more queues to spread what is
assumed to be an evenly-distributed load over other cores.  In
practice, isn't the real problem that we have no facility for changing
the number of queues at runtime?

If the number of queues weren't fixed at boot, users could actually
find the number that suits them best with a plausible amount of work,
and the point about FreeBSD being "slow" goes away, since it's perhaps
one more sysctl (or one per interface) to set, or one ifconfig line (or
one per interface) to run, along with enabling forwarding, etc.

The big commitment that multi-queue drivers ask for when they use the
maximum number of queues on boot and then demand to fill those queues
up with mbufs is unreasonable, even if it can be met on a growing
number of systems without much in the way of pain.  It's unreasonable,
but perhaps it feels good to see all those interrupts bouncing around,
all those threads running from time to time in top.  Maybe it makes
FreeBSD seem more serious, or perhaps it's something that gets people
excited.  It gives the appearance of doing quite a bit behind the
scenes, and perhaps that's beneficial in and of itself, and will keep
users from imagining that FreeBSD is slow, to your point.  We should
be clear, though, whether we are motivated by technical or
psychological constraints and benefits.

Thanks,
Juli.


Re: nmbclusters: how do we want to fix this for 8.3 ?

2012-02-23 Thread Josh Paetzel

On 02/22/2012 13:51, Jack Vogel wrote:
> 
> 
> On Wed, Feb 22, 2012 at 1:44 PM, Luigi Rizzo <ri...@iet.unipi.it> wrote:
> 
> On Wed, Feb 22, 2012 at 09:09:46PM +, Ben Hutchings wrote:
>> On Wed, 2012-02-22 at 21:52 +0100, Luigi Rizzo wrote:
> ...
>>> I have hit this problem recently, too. Maybe the issue
>>> mostly/only exists on 32-bit systems.
>> 
>> No, we kept hitting mbuf pool limits on 64-bit systems when we
>> started working on FreeBSD support.
> 
> ok never mind then, the mechanism would be the same, though the
> limits (especially VM_LIMIT) would be different.
> 
>>> Here is a possible approach:
>>> 
>>> 1. nmbclusters consume the kernel virtual address space so
>>> there must be some upper limit, say
>>> 
>>> VM_LIMIT = 256000 (translates to 512MB of address space)
>>> 
>>> 2. also you don't want the clusters to take up too much of the
> available
>>> memory. This one would only trigger for minimal-memory
>>> systems, or virtual machines, but still...
>>> 
>>> MEM_LIMIT = (physical_ram / 2) / 2048
>>> 
>>> 3. one may try to set a suitably large, desirable number of
>>> buffers
>>> 
>>> TARGET_CLUSTERS = 128000
>>> 
>>> 4. and finally we could use the current default as the
>>> absolute
> minimum
>>> 
>>> MIN_CLUSTERS = 1024 + maxusers*64
>>> 
>>> Then at boot the system could say
>>> 
>>> nmbclusters = min(TARGET_CLUSTERS, VM_LIMIT, MEM_LIMIT)
>>> 
>>> nmbclusters = max(nmbclusters, MIN_CLUSTERS)
>>> 
>>> 
>>> In turn, i believe interfaces should do their part and by
>>> default never try to allocate more than a fraction of the total
>>> number of buffers,
>> 
>> Well what fraction should that be?  It surely depends on how
>> many interfaces are in the system and how many queues the other
>> interfaces have.
> 
>>> if necessary reducing the number of active queues.
>> 
>> So now I have too few queues on my interface even after I
>> increase the limit.
>> 
>> There ought to be a standard way to configure numbers of queues
>> and default queue lengths.
> 
> Jack raised the problem that there is a poorly chosen default for 
> nmbclusters, causing one interface to consume all the buffers. If
> the user explicitly overrides the value then the number of cluster
> should be what the user asks (memory permitting). The next step is
> on devices: if there are no overrides, the default for a driver is
> to be lean. I would say that topping the request between 1/4 and
> 1/8 of the total buffers is surely better than the current 
> situation. Of course if there is an explicit override, then use it
> whatever happens to the others.
> 
> cheers luigi
> 
> 
> Hmmm, well, I could make the default use only 1 queue or something
> like that, was thinking more of what actual users of the hardware
> would want.
> 
> After the installed kernel is booted and the admin would do
> whatever post install modifications they wish it could be changed,
> along with nmbclusters.
> 
> This was why i sought opinions, of the algorithm itself, but also
> anyone using ixgbe and igb in heavy use, what would you find most
> convenient?
> 
> Jack
> 

The default setting is a thorn in our (with my iXsystems servers for
FreeBSD hat on) side.  A system with a quad-port igb card and two
onboard igb NICs won't boot stable/8 or 8.x-R to multiuser.  Ditto for
a dual port chelsio or ixgbe alongside dual onboard igb interfaces.

My vote would be for systems over some minimum threshold of system
RAM to come up with a higher default for nmbclusters.  You don't see
too many 10GbE NICs in systems with 2GB of RAM.


-- 
Thanks,

Josh Paetzel
FreeBSD -- The power to serve


Re: nmbclusters: how do we want to fix this for 8.3 ?

2012-02-23 Thread Ivan Voras
On 23/02/2012 09:19, Fabien Thomas wrote:

> I think it is more reasonable to set up the interface with one queue.

Unfortunately, the moment you do that, two things will happen:
1) users will start complaining again how FreeBSD is slow
2) the setting will become a "sacred cow" and nobody will change this
default for the next 10 years.

If it really comes down to enabling only one queue, something needs to
complain extremely loudly that this isn't an optimal setting. Only
printing it out at boot may not be enough - what's needed is possibly a
script in periodic/daily which checks "system sanity" every day and
e-mails the operator.





Re: nmbclusters: how do we want to fix this for 8.3 ?

2012-02-23 Thread Andreas Nilsson
On Thu, Feb 23, 2012 at 9:19 AM, Fabien Thomas wrote:

>
> On 22 Feb. 2012, at 22:51, Jack Vogel wrote:
>
> > On Wed, Feb 22, 2012 at 1:44 PM, Luigi Rizzo  wrote:
> >
> >> On Wed, Feb 22, 2012 at 09:09:46PM +, Ben Hutchings wrote:
> >>> On Wed, 2012-02-22 at 21:52 +0100, Luigi Rizzo wrote:
> >> ...
> >>>> I have hit this problem recently, too.
> >>>> Maybe the issue mostly/only exists on 32-bit systems.
> >>>
> >>> No, we kept hitting mbuf pool limits on 64-bit systems when we started
> >>> working on FreeBSD support.
> >>
> >> ok never mind then, the mechanism would be the same, though
> >> the limits (especially VM_LIMIT) would be different.
> >>
> >>>> Here is a possible approach:
> >>>>
> >>>> 1. nmbclusters consume the kernel virtual address space so there
> >>>>   must be some upper limit, say
> >>>>
> >>>>VM_LIMIT = 256000 (translates to 512MB of address space)
> >>>>
> >>>> 2. also you don't want the clusters to take up too much of the
> >> available
> >>>>   memory. This one would only trigger for minimal-memory systems,
> >>>>   or virtual machines, but still...
> >>>>
> >>>>MEM_LIMIT = (physical_ram / 2) / 2048
> >>>>
> >>>> 3. one may try to set a suitably large, desirable number of buffers
> >>>>
> >>>>TARGET_CLUSTERS = 128000
> >>>>
> >>>> 4. and finally we could use the current default as the absolute
> minimum
> >>>>
> >>>>MIN_CLUSTERS = 1024 + maxusers*64
> >>>>
> >>>> Then at boot the system could say
> >>>>
> >>>>nmbclusters = min(TARGET_CLUSTERS, VM_LIMIT, MEM_LIMIT)
> >>>>
> >>>>nmbclusters = max(nmbclusters, MIN_CLUSTERS)
> >>>>
> >>>>
> >>>> In turn, i believe interfaces should do their part and by default
> >>>> never try to allocate more than a fraction of the total number
> >>>> of buffers,
> >>>
> >>> Well what fraction should that be?  It surely depends on how many
> >>> interfaces are in the system and how many queues the other interfaces
> >>> have.
> >>
> >>>> if necessary reducing the number of active queues.
> >>>
> >>> So now I have too few queues on my interface even after I increase the
> >>> limit.
> >>>
> >>> There ought to be a standard way to configure numbers of queues and
> >>> default queue lengths.
> >>
> >> Jack raised the problem that there is a poorly chosen default for
> >> nmbclusters, causing one interface to consume all the buffers.
> >> If the user explicitly overrides the value then
> >> the number of cluster should be what the user asks (memory permitting).
> >> The next step is on devices: if there are no overrides, the default
> >> for a driver is to be lean. I would say that topping the request between
> >> 1/4 and 1/8 of the total buffers is surely better than the current
> >> situation. Of course if there is an explicit override, then use
> >> it whatever happens to the others.
> >>
> >> cheers
> >> luigi
> >>
> >
> > Hmmm, well, I could make the default use only 1 queue or something like
> > that,
> > was thinking more of what actual users of the hardware would want.
> >
>
> I think it is more reasonable to set up the interface with one queue.
> Even if the cluster count does not hit the max, you will end up with an
> unbalanced setting that leaves a very low mbuf count for other uses.
>

If interfaces have the possibility to use more queues, they should, IMO,
so I'm all for raising the default size.

For those systems with very limited memory it's easily changed.


>
> > After the installed kernel is booted and the admin would do whatever post
> > install
> > modifications they wish it could be changed, along with nmbclusters.
> >
> > This was why i sought opinions, of the algorithm itself, but also anyone
> > using
> > ixgbe and igb in heavy use, what would you find most convenient?
> >
> > Jack


Re: nmbclusters: how do we want to fix this for 8.3 ?

2012-02-23 Thread Fabien Thomas

On 22 Feb. 2012, at 22:51, Jack Vogel wrote:

> On Wed, Feb 22, 2012 at 1:44 PM, Luigi Rizzo  wrote:
> 
>> On Wed, Feb 22, 2012 at 09:09:46PM +, Ben Hutchings wrote:
>>> On Wed, 2012-02-22 at 21:52 +0100, Luigi Rizzo wrote:
>> ...
>>>> I have hit this problem recently, too.
>>>> Maybe the issue mostly/only exists on 32-bit systems.
>>> 
>>> No, we kept hitting mbuf pool limits on 64-bit systems when we started
>>> working on FreeBSD support.
>> 
>> ok never mind then, the mechanism would be the same, though
>> the limits (especially VM_LIMIT) would be different.
>> 
>>>> Here is a possible approach:
>>>> 
>>>> 1. nmbclusters consume the kernel virtual address space so there
>>>>   must be some upper limit, say
>>>> 
>>>>VM_LIMIT = 256000 (translates to 512MB of address space)
>>>> 
>>>> 2. also you don't want the clusters to take up too much of the
>> available
>>>>   memory. This one would only trigger for minimal-memory systems,
>>>>   or virtual machines, but still...
>>>> 
>>>>MEM_LIMIT = (physical_ram / 2) / 2048
>>>> 
>>>> 3. one may try to set a suitably large, desirable number of buffers
>>>> 
>>>>TARGET_CLUSTERS = 128000
>>>> 
>>>> 4. and finally we could use the current default as the absolute minimum
>>>> 
>>>>MIN_CLUSTERS = 1024 + maxusers*64
>>>> 
>>>> Then at boot the system could say
>>>> 
>>>>nmbclusters = min(TARGET_CLUSTERS, VM_LIMIT, MEM_LIMIT)
>>>> 
>>>>nmbclusters = max(nmbclusters, MIN_CLUSTERS)
>>>> 
>>>> 
>>>> In turn, i believe interfaces should do their part and by default
>>>> never try to allocate more than a fraction of the total number
>>>> of buffers,
>>> 
>>> Well what fraction should that be?  It surely depends on how many
>>> interfaces are in the system and how many queues the other interfaces
>>> have.
>> 
>>>> if necessary reducing the number of active queues.
>>> 
>>> So now I have too few queues on my interface even after I increase the
>>> limit.
>>> 
>>> There ought to be a standard way to configure numbers of queues and
>>> default queue lengths.
>> 
>> Jack raised the problem that there is a poorly chosen default for
>> nmbclusters, causing one interface to consume all the buffers.
>> If the user explicitly overrides the value then
>> the number of cluster should be what the user asks (memory permitting).
>> The next step is on devices: if there are no overrides, the default
>> for a driver is to be lean. I would say that topping the request between
>> 1/4 and 1/8 of the total buffers is surely better than the current
>> situation. Of course if there is an explicit override, then use
>> it whatever happens to the others.
>> 
>> cheers
>> luigi
>> 
> 
> Hmmm, well, I could make the default use only 1 queue or something like
> that,
> was thinking more of what actual users of the hardware would want.
> 

I think it is more reasonable to set up the interface with one queue.
Even if the cluster count does not hit the max, you will end up with an
unbalanced setting that leaves a very low mbuf count for other uses.


> After the installed kernel is booted and the admin would do whatever post
> install
> modifications they wish it could be changed, along with nmbclusters.
> 
> This was why i sought opinions, of the algorithm itself, but also anyone
> using
> ixgbe and igb in heavy use, what would you find most convenient?
> 
> Jack



Re: nmbclusters: how do we want to fix this for 8.3 ?

2012-02-22 Thread Zaphod Beeblebrox
It could do some good to think of the scale of the problem and maybe
the driver can tune to the hardware.

First, are 8k packet buffers a reasonable default on a GigE port?
Well... on a GigE port, you could have from 100k pps (packets per
second) at 1500 bytes, to 500k pps at around 300 bytes, to truly
pathological rates of packets (2M pps at the Ethernet minimum of 64
bytes).  8k buffers vanish in 1/10th of a second in the 1500-byte case
and that doesn't even really speak to the buffers getting emptied by
other software.
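
A back-of-the-envelope check of those rates, as a small C program (a
sketch only; the 8192-buffer pool and the on-wire frame sizes, including
headers, FCS, preamble and inter-frame gap, are illustrative assumptions):

    /*
     * Rough line-rate arithmetic for a GigE port: how quickly a fixed
     * pool of receive buffers is consumed if nothing drains it.  The
     * 8192-buffer pool and the on-wire byte counts are assumed values,
     * not figures taken from any driver.
     */
    #include <stdio.h>

    int
    main(void)
    {
        const double link_bps = 1e9;                 /* gigabit Ethernet */
        const double nbuffers = 8192.0;              /* assumed ring depth */
        const int wire_bytes[] = { 1538, 338, 84 };  /* ~1500B, ~300B, 64B frames */

        for (int i = 0; i < 3; i++) {
            double pps = link_bps / (wire_bytes[i] * 8.0);
            printf("%4d bytes on the wire: %8.0f pps, pool gone in %.0f ms\n",
                wire_bytes[i], pps, nbuffers / pps * 1000.0);
        }
        return (0);
    }

Even at full-size frames the assumed 8k pool lasts only about a tenth of
a second if nothing drains it, which matches the figure above; at
minimum-size frames it is gone in a few milliseconds.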

Do you maybe want to have a switch whereby the GigE port is in
performance or non-performance mode?  Do you want to assume that
systems with GigE ports are also not pathologically low in memory?
Perhaps in 10 or 100 megabit mode, the driver should make smaller
rings?

For that matter, if mbufs come in a page's worth at a time, what's the
drawback of scaling them up and down with network vs. memory vs. cache
pressure?


Re: nmbclusters: how do we want to fix this for 8.3 ?

2012-02-22 Thread Jack Vogel
On Wed, Feb 22, 2012 at 1:44 PM, Luigi Rizzo  wrote:

> On Wed, Feb 22, 2012 at 09:09:46PM +, Ben Hutchings wrote:
> > On Wed, 2012-02-22 at 21:52 +0100, Luigi Rizzo wrote:
> ...
> > > I have hit this problem recently, too.
> > > Maybe the issue mostly/only exists on 32-bit systems.
> >
> > No, we kept hitting mbuf pool limits on 64-bit systems when we started
> > working on FreeBSD support.
>
> ok never mind then, the mechanism would be the same, though
> the limits (especially VM_LIMIT) would be different.
>
> > > Here is a possible approach:
> > >
> > > 1. nmbclusters consume the kernel virtual address space so there
> > >must be some upper limit, say
> > >
> > > VM_LIMIT = 256000 (translates to 512MB of address space)
> > >
> > > 2. also you don't want the clusters to take up too much of the
> available
> > >memory. This one would only trigger for minimal-memory systems,
> > >or virtual machines, but still...
> > >
> > > MEM_LIMIT = (physical_ram / 2) / 2048
> > >
> > > 3. one may try to set a suitably large, desirable number of buffers
> > >
> > > TARGET_CLUSTERS = 128000
> > >
> > > 4. and finally we could use the current default as the absolute minimum
> > >
> > > MIN_CLUSTERS = 1024 + maxusers*64
> > >
> > > Then at boot the system could say
> > >
> > > nmbclusters = min(TARGET_CLUSTERS, VM_LIMIT, MEM_LIMIT)
> > >
> > > nmbclusters = max(nmbclusters, MIN_CLUSTERS)
> > >
> > >
> > > In turn, i believe interfaces should do their part and by default
> > > never try to allocate more than a fraction of the total number
> > > of buffers,
> >
> > Well what fraction should that be?  It surely depends on how many
> > interfaces are in the system and how many queues the other interfaces
> > have.
>
> > > if necessary reducing the number of active queues.
> >
> > So now I have too few queues on my interface even after I increase the
> > limit.
> >
> > There ought to be a standard way to configure numbers of queues and
> > default queue lengths.
>
> Jack raised the problem that there is a poorly chosen default for
> nmbclusters, causing one interface to consume all the buffers.
> If the user explicitly overrides the value then
> the number of cluster should be what the user asks (memory permitting).
> The next step is on devices: if there are no overrides, the default
> for a driver is to be lean. I would say that topping the request between
> 1/4 and 1/8 of the total buffers is surely better than the current
> situation. Of course if there is an explicit override, then use
> it whatever happens to the others.
>
> cheers
> luigi
>

Hmmm, well, I could make the default use only 1 queue or something like
that; I was thinking more of what actual users of the hardware would want.

After the installed kernel is booted, and the admin does whatever
post-install modifications they wish, it could be changed, along with
nmbclusters.

This was why I sought opinions, on the algorithm itself, but also from
anyone using ixgbe and igb in heavy use: what would you find most
convenient?

Jack


Re: nmbclusters: how do we want to fix this for 8.3 ?

2012-02-22 Thread Luigi Rizzo
On Wed, Feb 22, 2012 at 09:09:46PM +, Ben Hutchings wrote:
> On Wed, 2012-02-22 at 21:52 +0100, Luigi Rizzo wrote:
...
> > I have hit this problem recently, too.
> > Maybe the issue mostly/only exists on 32-bit systems.
> 
> No, we kept hitting mbuf pool limits on 64-bit systems when we started
> working on FreeBSD support.

ok never mind then, the mechanism would be the same, though
the limits (especially VM_LIMIT) would be different.

> > Here is a possible approach:
> > 
> > 1. nmbclusters consume the kernel virtual address space so there 
> >must be some upper limit, say 
> > 
> > VM_LIMIT = 256000 (translates to 512MB of address space)
> > 
> > 2. also you don't want the clusters to take up too much of the available
> >memory. This one would only trigger for minimal-memory systems,
> >or virtual machines, but still...
> > 
> > MEM_LIMIT = (physical_ram / 2) / 2048
> > 
> > 3. one may try to set a suitably large, desirable number of buffers
> > 
> > TARGET_CLUSTERS = 128000
> > 
> > 4. and finally we could use the current default as the absolute minimum
> > 
> > MIN_CLUSTERS = 1024 + maxusers*64
> > 
> > Then at boot the system could say
> > 
> > nmbclusters = min(TARGET_CLUSTERS, VM_LIMIT, MEM_LIMIT)
> > 
> > nmbclusters = max(nmbclusters, MIN_CLUSTERS)
> > 
> > 
> > In turn, i believe interfaces should do their part and by default
> > never try to allocate more than a fraction of the total number
> > of buffers,
> 
> Well what fraction should that be?  It surely depends on how many
> interfaces are in the system and how many queues the other interfaces
> have.

> > if necessary reducing the number of active queues.
> 
> So now I have too few queues on my interface even after I increase the
> limit.
> 
> There ought to be a standard way to configure numbers of queues and
> default queue lengths.

Jack raised the problem that there is a poorly chosen default for
nmbclusters, causing one interface to consume all the buffers.
If the user explicitly overrides the value, then the number of
clusters should be what the user asks for (memory permitting).
The next step is on devices: if there are no overrides, the default
for a driver is to be lean. I would say that capping the request at
between 1/4 and 1/8 of the total buffers is surely better than the
current situation. Of course, if there is an explicit override, then use
it whatever happens to the others.
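
A sketch of what that driver-side discipline could look like at attach
time (hypothetical helper, not code from igb or ixgbe; the 1024-entry
ring and the 1/4 share are simply the figures from this thread):

    /*
     * Hypothetical attach-time helper sketching the suggestion above:
     * choose the number of RX queues so that one interface's cluster
     * demand stays under a fraction of the pool, unless the administrator
     * asked for a specific count.  Not taken from any in-tree driver.
     */
    #include <stdio.h>

    #define RING_SIZE       1024    /* clusters per RX queue */
    #define DRIVER_SHARE    4       /* cap one NIC at 1/4 of nmbclusters */
    #define MAX_HW_QUEUES   8       /* what igb/ixgbe offer */

    static int
    pick_rx_queues(int ncpus, int nmbclusters, int admin_override)
    {
        int nqueues;

        if (admin_override > 0)         /* an explicit setting always wins */
            return (admin_override);

        nqueues = ncpus < MAX_HW_QUEUES ? ncpus : MAX_HW_QUEUES;
        while (nqueues > 1 &&
            nqueues * RING_SIZE > nmbclusters / DRIVER_SHARE)
            nqueues--;                  /* shed queues, stay lean */
        return (nqueues);
    }

    int
    main(void)
    {
        /* the 8-core box and 25600-cluster default discussed in the thread */
        printf("queues chosen: %d\n", pick_rx_queues(8, 25600, 0));
        return (0);
    }

With the stock default of 25600 clusters, a port would fall back to 6
queues instead of claiming 8192 clusters, while an explicit tunable still
overrides the heuristic.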

cheers
luigi


Re: nmbclusters: how do we want to fix this for 8.3 ?

2012-02-22 Thread Luigi Rizzo
On Wed, Feb 22, 2012 at 11:56:29AM -0800, Jack Vogel wrote:
> Using igb and/or ixgbe on a reasonably powered server requires 1K mbuf
> clusters per MSIX vector,
> that's how many are in a ring. Either driver will configure 8 queues on a
> system with that many or more
> cores, so 8K clusters per port...
> 
> My test engineer has a system with 2 igb ports, and 2 10G ixgbe, this is
> hardly heavy duty, and yet this
> exceeds the default mbuf pool on the installed kernel (1024 + maxusers *
> 64).
> 
> Now, this can be immediately fixed by a sysadmin after that first boot, but
> it does result in the second
> driver that gets started to complain about inadequate buffers.
> 
> I think the default calculation is dated and should be changed, but am not
> sure the best way, so are
> there suggestions/opinions about this, and might we get it fixed before 8.3
> is baked?

I have hit this problem recently, too.
Maybe the issue mostly/only exists on 32-bit systems.
Here is a possible approach:

1. nmbclusters consume the kernel virtual address space so there 
   must be some upper limit, say 

VM_LIMIT = 256000 (translates to 512MB of address space)

2. also you don't want the clusters to take up too much of the available
   memory. This one would only trigger for minimal-memory systems,
   or virtual machines, but still...

MEM_LIMIT = (physical_ram / 2) / 2048

3. one may try to set a suitably large, desirable number of buffers

TARGET_CLUSTERS = 128000

4. and finally we could use the current default as the absolute minimum

MIN_CLUSTERS = 1024 + maxusers*64

Then at boot the system could say

    nmbclusters = min(TARGET_CLUSTERS, VM_LIMIT, MEM_LIMIT)

nmbclusters = max(nmbclusters, MIN_CLUSTERS)


In turn, I believe interfaces should do their part and by default
never try to allocate more than a fraction of the total number
of buffers, if necessary reducing the number of active queues.
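
Spelled out as a standalone C sketch (user-space illustration only, not a
kernel patch; the 2GB of RAM and the maxusers value of 384 are assumed
inputs that the kernel would obtain at boot):

    /*
     * Sketch of the proposed boot-time default for nmbclusters.  The
     * constants are the ones proposed above; the RAM size and maxusers
     * value are assumptions for illustration.
     */
    #include <stdio.h>

    #define TARGET_CLUSTERS 128000
    #define VM_LIMIT        256000          /* ~512MB of KVA at 2KB each */
    #define MCLBYTES        2048

    static long long
    min3(long long a, long long b, long long c)
    {
        long long m = a < b ? a : b;
        return (m < c ? m : c);
    }

    int
    main(void)
    {
        long long physical_ram = 2048LL * 1024 * 1024;  /* assumed 2GB box */
        long long maxusers = 384;                       /* assumed auto-tuned value */
        long long mem_limit = (physical_ram / 2) / MCLBYTES;
        long long min_clusters = 1024 + maxusers * 64;
        long long nmbclusters;

        nmbclusters = min3(TARGET_CLUSTERS, VM_LIMIT, mem_limit);
        if (nmbclusters < min_clusters)
            nmbclusters = min_clusters;

        printf("nmbclusters = %lld (%lld MB of clusters, floor %lld)\n",
            nmbclusters, nmbclusters * MCLBYTES / (1024 * 1024), min_clusters);
        return (0);
    }

On that assumed 2GB machine the 128000-cluster target wins, i.e. 250MB
worth of clusters, comfortably above the old floor of 25600.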

What do people think?

cheers
luigi


nmbclusters: how do we want to fix this for 8.3 ?

2012-02-22 Thread Jack Vogel
Using igb and/or ixgbe on a reasonably powered server requires 1K mbuf
clusters per MSI-X vector; that's how many are in a ring. Either driver
will configure 8 queues on a system with that many or more cores, so 8K
clusters per port...

My test engineer has a system with 2 igb ports and 2 10G ixgbe; this is
hardly heavy duty, and yet it
exceeds the default mbuf pool on the installed kernel (1024 + maxusers *
64).
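
For concreteness, the arithmetic behind that (the maxusers value of 384
is an assumption about what the auto-tuning arrives at on such a box, not
a figure quoted here):

    /*
     * The gap described above: four multiqueue ports, 8 queues of 1024
     * clusters each, versus the old formula.  The maxusers value is an
     * assumed auto-tuned figure for a well-provisioned box.
     */
    #include <stdio.h>

    int
    main(void)
    {
        int maxusers = 384;                     /* assumed auto-tuned value */
        int default_pool = 1024 + maxusers * 64;
        int ports = 4;                          /* 2 igb + 2 ixgbe */
        int demand = ports * 8 * 1024;          /* queues * ring size */

        printf("default pool %d clusters, driver demand %d clusters\n",
            default_pool, demand);              /* 25600 vs 32768 */
        return (0);
    }

That is 25600 clusters available against 32768 requested, so the last
driver to attach comes up short.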

Now, this can be immediately fixed by a sysadmin after that first boot,
but it does result in the second driver that gets started complaining
about inadequate buffers.

I think the default calculation is dated and should be changed, but I am
not sure of the best way.  Are there suggestions/opinions about this, and
might we get it fixed before 8.3 is baked?

Cheers,

Jack


Re: nmbclusters: how do we want to fix this for 8.3 ?

2012-02-22 Thread Arnaud Lacombe
Hi,

On Wed, Feb 22, 2012 at 2:56 PM, Jack Vogel  wrote:
> Using igb and/or ixgbe on a reasonably powered server requires 1K mbuf
> clusters per MSIX vector,
> that's how many are in a ring. Either driver will configure 8 queues on a
> system with that many or more
> cores, so 8K clusters per port...
>
> My test engineer has a system with 2 igb ports, and 2 10G ixgbe, this is
> hardly heavy duty, and yet this
> exceeds the default mbuf pool on the installed kernel (1024 + maxusers *
> 64).
>
> Now, this can be immediately fixed by a sysadmin after that first boot, but
> it does result in the second
> driver that gets started to complain about inadequate buffers.
>
> I think the default calculation is dated and should be changed, but am not
> sure the best way, so are
> there suggestions/opinions about this, and might we get it fixed before 8.3
> is baked?
>
Get rid of the limit once and for all; it is pointless.

 - Arnaud


Re: nmbclusters

2006-03-29 Thread Chris
On 29/03/06, Robert Watson <[EMAIL PROTECTED]> wrote:
>
> On Wed, 29 Mar 2006, Dag-Erling Smørgrav wrote:
>
> > "Conrad J. Sabatier" <[EMAIL PROTECTED]> writes:
> >> Chris <[EMAIL PROTECTED]> wrote:
> >>> so [kern.ipc.nmbclusters] has no affect, has this become a read only
> >>> tunable again only settable in loader.conf?
> >> To the best of my knowledge, this has *always* been a loader tunable,
> >> not configurable on-the-fly.
> >
> > kern.ipc.nmbclusters is normally computed at boot time.  A compile- time
> > option to override it was introduced in 2.0-CURRENT.  At that time, it was
> > defined in param.c.  A read-only sysctl was introduced in 3.0-CURRENT.  It
> > moved from param.c to uipc_mbuf.c in 4.0-CURRENT, then to subr_mbuf.c when
> > mballoc was introduced in 5.0-CURRENT; became a tunable at some point after
> > that; then moved again to kern_mbuf.c when mballoc was replaced with mbuma
> > in 6.0-CURRENT.  That is the point where it became read-write, for no good
> > reason that I can see; setting it at runtime has no effect, because the size
> > of the mbuf zone is determined at boot time.  Perhaps Bosko (who wrote both
> > mballoc and mbuma, IIRC) knows.
>
> Paul Saab from Yahoo! has a set of patches that allow run-time nmbclusters
> changes to be implemented -- while it won't cause the freeing of clusters
> referenced, it goes through and recalculates dependent variables, propagates
> them into UMA, etc.  I believe they're running with this patch on 6.x, and I
> expect that they will be merged to -CURRENT and -STABLE in the relatively near
> future.  Not before 6.1, however.
>
> If the nmbclusters setting really has no effect right now, we should mark the
> sysctl as read-only to make it more clear it doesn't, since allowing it to be
> set without taking effect is counter-intuitive.
>
> Robert N M Watson
>

Thanks for everyone's responses.  My 5.4 servers seem to accept the
tunable being changed at runtime, although this could be a bug and it
isn't really changing?  4.x, I know, was a boot-only tunable.  I have
tested this now in loader.conf and it works, so it seems it's a minor
bug where the error message is replaced by a false success output.

Chris


Re: nmbclusters

2006-03-29 Thread Dag-Erling Smørgrav
Chris <[EMAIL PROTECTED]> writes:
> thanks for everyones responses.  My 5.4 servers seem to accept the
> tunable been changed at runtime although this could be a bug and it
> isnt really changing?

You can change it, but it has no effect.

DES
-- 
Dag-Erling Smørgrav - [EMAIL PROTECTED]


Re: nmbclusters

2006-03-29 Thread Robert Watson


On Wed, 29 Mar 2006, Dag-Erling Smørgrav wrote:

> "Conrad J. Sabatier" <[EMAIL PROTECTED]> writes:
>> Chris <[EMAIL PROTECTED]> wrote:
>>> so [kern.ipc.nmbclusters] has no affect, has this become a read only
>>> tunable again only settable in loader.conf?
>> To the best of my knowledge, this has *always* been a loader tunable,
>> not configurable on-the-fly.
>
> kern.ipc.nmbclusters is normally computed at boot time.  A compile-time
> option to override it was introduced in 2.0-CURRENT.  At that time, it was
> defined in param.c.  A read-only sysctl was introduced in 3.0-CURRENT.  It
> moved from param.c to uipc_mbuf.c in 4.0-CURRENT, then to subr_mbuf.c when
> mballoc was introduced in 5.0-CURRENT; became a tunable at some point after
> that; then moved again to kern_mbuf.c when mballoc was replaced with mbuma
> in 6.0-CURRENT.  That is the point where it became read-write, for no good
> reason that I can see; setting it at runtime has no effect, because the size
> of the mbuf zone is determined at boot time.  Perhaps Bosko (who wrote both
> mballoc and mbuma, IIRC) knows.


Paul Saab from Yahoo! has a set of patches that allow run-time nmbclusters 
changes to be implemented -- while it won't cause the freeing of clusters 
referenced, it goes through and recalculates dependent variables, propagates 
them into UMA, etc.  I believe they're running with this patch on 6.x, and I 
expect that they will be merged to -CURRENT and -STABLE in the relatively near 
future.  Not before 6.1, however.


If the nmbclusters setting really has no effect right now, we should mark the 
sysctl as read-only to make it more clear it doesn't, since allowing it to be 
set without taking effect is counter-intuitive.


Robert N M Watson

Re: nmbclusters

2006-03-29 Thread Bosko Milekic
It's always only been boot-time tunable (well, "always" is of course
relative to my time with FreeBSD -- Dag-Erling has been around longer
and therefore recounts a more comprehensive history).  In 6.0-CURRENT
there was an intention to make it sysctl (runtime) tunable, as it
finally became at least theoretically possible to do so.

I have recently seen a patch floating around from Paul Saab (ps@) who
has finally made it runtime tunable -- at least enough so that it can
be _increased_.  Not sure if he has committed it, yet.  Note that
_decreasing_ nmbclusters at run-time will probably never be possible
-- implementing it is too difficult for what it would be worth.

Cheers,
Bosko

On 3/29/06, Dag-Erling Smørgrav <[EMAIL PROTECTED]> wrote:
> "Conrad J. Sabatier" <[EMAIL PROTECTED]> writes:
> > Chris <[EMAIL PROTECTED]> wrote:
> > > so [kern.ipc.nmbclusters] has no affect, has this become a read only
> > > tunable again only settable in loader.conf?
> > To the best of my knowledge, this has *always* been a loader tunable,
> > not configurable on-the-fly.
>
> kern.ipc.nmbclusters is normally computed at boot time.  A compile-
> time option to override it was introduced in 2.0-CURRENT.  At that
> time, it was defined in param.c.  A read-only sysctl was introduced in
> 3.0-CURRENT.  It moved from param.c to uipc_mbuf.c in 4.0-CURRENT,
> then to subr_mbuf.c when mballoc was introduced in 5.0-CURRENT; became
> a tunable at some point after that; then moved again to kern_mbuf.c
> when mballoc was replaced with mbuma in 6.0-CURRENT.  That is the
> point where it became read-write, for no good reason that I can see;
> setting it at runtime has no effect, because the size of the mbuf zone
> is determined at boot time.  Perhaps Bosko (who wrote both mballoc and
> mbuma, IIRC) knows.
>
> DES
> --
> Dag-Erling Smørgrav - [EMAIL PROTECTED]
>


--
Bosko Milekic <[EMAIL PROTECTED]>

To see all the content I generate on the web, check out my Peoplefeeds
profile at:
http://peoplefeeds.com/bosko


Re: nmbclusters

2006-03-29 Thread Dag-Erling Smørgrav
"Conrad J. Sabatier" <[EMAIL PROTECTED]> writes:
> Chris <[EMAIL PROTECTED]> wrote:
> > so [kern.ipc.nmbclusters] has no affect, has this become a read only
> > tunable again only settable in loader.conf?
> To the best of my knowledge, this has *always* been a loader tunable,
> not configurable on-the-fly.

kern.ipc.nmbclusters is normally computed at boot time.  A compile-
time option to override it was introduced in 2.0-CURRENT.  At that
time, it was defined in param.c.  A read-only sysctl was introduced in
3.0-CURRENT.  It moved from param.c to uipc_mbuf.c in 4.0-CURRENT,
then to subr_mbuf.c when mballoc was introduced in 5.0-CURRENT; became
a tunable at some point after that; then moved again to kern_mbuf.c
when mballoc was replaced with mbuma in 6.0-CURRENT.  That is the
point where it became read-write, for no good reason that I can see;
setting it at runtime has no effect, because the size of the mbuf zone
is determined at boot time.  Perhaps Bosko (who wrote both mballoc and
mbuma, IIRC) knows.

DES
-- 
Dag-Erling Smørgrav - [EMAIL PROTECTED]


Re: nmbclusters

2006-03-29 Thread Conrad J. Sabatier
On Tue, 28 Mar 2006 20:34:18 +0100
Chris <[EMAIL PROTECTED]> wrote:

> Using 6.0 release latest security branch.
> 
> netstat -m
> 69/576/645 mbufs in use (current/cache/total)
> 65/261/326/33792 mbuf clusters in use (current/cache/total/max)
> 0/38/8704 sfbufs in use (current/peak/max)
> 147K/666K/813K bytes allocated to network (current/cache/total)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 29780 requests for I/O initiated by sendfile
> 633 calls to protocol drain routines
> 
> sysctl kern.ipc.nmbclusters
> kern.ipc.nmbclusters: 65536
> 
> sysctl kern.ipc.nmbclusters=25000
> kern.ipc.nmbclusters: 65536 -> 25000
> 
> 70/575/645 mbufs in use (current/cache/total)
> 64/262/326/33792 mbuf clusters in use (current/cache/total/max)
> 0/38/8704 sfbufs in use (current/peak/max)
> 145K/667K/813K bytes allocated to network (current/cache/total)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 29780 requests for I/O initiated by sendfile
> 633 calls to protocol drain routines
> 
> so the sysctl variable has no affect, has this become a read only
> tunable again only settable in loader.conf?

To the best of my knowledge, this has *always* been a loader tunable,
not configurable on-the-fly.

Myself, ever since the introduction quite some time ago of the
"friendly" setting of 0 (for unlimited mbufs), I've always used that in
my /boot/loader.conf, i.e., kern.ipc.nmbclusters="0".

Can't really comment on your other questions, I'm afraid.  :-)

> if yes then their is a bug
> where it shows no error on sysctl command, or is it suppoedbly
> settable then their is a bug where it doesnt work or netstat -m shows
> inccorect info.  Or is this setting been depreciated?
> 
> Also if the machine stops responding, and no kernel panic logged does
> it mean a livelock/deadlock?  Have been seeing issues on 3 diff 6.0
> release servers which simply go dead.  2 were rolled back to 5.4 and
> immediatly became stable and I left this one on 6.0 to try and resolve
> problems but diffilcult with no log entries.
> 
> Thanks
> 
> Chris

-- 
Conrad J. Sabatier <[EMAIL PROTECTED]> -- "In Unix veritas"


nmbclusters

2006-03-28 Thread Chris
Using 6.0 release latest security branch.

netstat -m
69/576/645 mbufs in use (current/cache/total)
65/261/326/33792 mbuf clusters in use (current/cache/total/max)
0/38/8704 sfbufs in use (current/peak/max)
147K/666K/813K bytes allocated to network (current/cache/total)
0 requests for sfbufs denied
0 requests for sfbufs delayed
29780 requests for I/O initiated by sendfile
633 calls to protocol drain routines

sysctl kern.ipc.nmbclusters
kern.ipc.nmbclusters: 65536

sysctl kern.ipc.nmbclusters=25000
kern.ipc.nmbclusters: 65536 -> 25000

70/575/645 mbufs in use (current/cache/total)
64/262/326/33792 mbuf clusters in use (current/cache/total/max)
0/38/8704 sfbufs in use (current/peak/max)
145K/667K/813K bytes allocated to network (current/cache/total)
0 requests for sfbufs denied
0 requests for sfbufs delayed
29780 requests for I/O initiated by sendfile
633 calls to protocol drain routines

So the sysctl variable has no effect; has this become a read-only
tunable again, only settable in loader.conf?  If yes, then there is a
bug where it shows no error on the sysctl command; or, if it is
supposedly settable, then there is a bug where it doesn't work or
netstat -m shows incorrect info.  Or has this setting been deprecated?

Also, if the machine stops responding and no kernel panic is logged,
does it mean a livelock/deadlock?  I have been seeing issues on 3
different 6.0 release servers which simply go dead.  2 were rolled back
to 5.4 and immediately became stable, and I left this one on 6.0 to try
and resolve the problems, but it is difficult with no log entries.

Thanks

Chris


Re: mbuf clusters behavior (NMBCLUSTERS)

2002-07-16 Thread Naoyuki Tai


If someone can answer this question, I can probably start troubleshooting
the problem.

After the NFS server uses up the mbuf clusters, how are the mbuf clusters
flushed?

Should I try to change "maxusers 0" to a number like 10?

One thing I forgot to mention is that the file system uses soft updates.
Would that matter?

At Mon, 15 Jul 2002 18:37:59 -0400,
Bosko Milekic wrote:
> 
> On Mon, Jul 15, 2002 at 05:43:24PM -0400, Naoyuki Tai wrote:
> [...]
> > If I do not suspend the copy command, the mbuf clusters hit the
> > max and the server starts to drop the packets. It slows down the nfs 
> > serving severely due to its nfs retry.
> 
>  Are you sure that you're actually running out of address space? (i.e.,
>  does `netstat -m' finally show a totally exhausted cluster pool?)
>  It is also possible to get the messages you mention if, for example,
>  you run out of available RAM.
> 
> > How can I prevent this "mbuf clusters exhaustion"?
> 
>  Assuming this is really you running out of clusters or mbufs because
>  NMBCLUSTERS is too low and not because you're out of RAM, you can take
>  a look at setting per-UID sbsize (socket buffer) limits.
>
> > Increasing mbuf clusters is not an option. There is no way that I 
> > can allocate big enough for all of files I copy.
> > 
> > Relating to this, I do not understand why that the mbuf clusters
> > are not freed fast enough. I watched "top" and it does seem to be
> > that CPU is not exhausted.
> > After all, I'm copying less than 10Mbyte/sec, probably 6 - 7 Mbytes
> > at most.
> > Hard disk is a Seagate ATA/IV 60Gbyte.The drive is hooked up to a 
> > Promise PCI ATA/UDMA 100 controller card.
> > bonnie shows that it can sustain 15M - 20M bytes read/write. 
> 
>  Uhm, I've been following you up to this point.  I'm not sure why
>  clusters are not being released faster, either, however: what exactly
>  are you copying from, and what are you copying to?  I'm assuming you're
>  copying to an NFS mount somewhere?

Please read the original message again. I believe that I included all
the information I can think of. In summary:

1. Simple nfs file copy exhausts the mbuf clusters
2. When mbuf clusters are exhausted, the whole copy process suspends
   for 10 - 20 seconds.
3. One NFS server: FreeBSD 4.6-stable and Client: RH 7.3
4. I tried various size of mbuf clusters.
5. NMBCLUSTERS is not too low. I assigned more than recommended.
   32768 (64MB) is the recommendation for a large-scale multiuser server.
   For this case, one server and one client, I gave it double that, and
   it is still no go.
6. The mbuf cluster pool is 128MB and I cannot increase it any more.

> > Is there anything I can try?
> 
>  Check your memory usage in top.

Why? What do you think is happening?

Thrashing? Very unlikely. The FreeBSD server has no other client for
the test or any other major service running. If it did, I'd have
mentioned it in the original message.


> > Thank you!
> > 
> > -- 
> > [EMAIL PROTECTED], Naoyuki "Tai" Tai
> > 
> > P.S.
> > 
> > I sent this message to freebsd-questions, but, did not get any
> > response. If you've seen this message before, sorry.
> 
> Regards,
> -- 
> Bosko Milekic
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> 

Have a good day!

-- 
[EMAIL PROTECTED], Naoyuki "Tai" Tai




Re: AW: mbuf clusters behavior (NMBCLUSTERS)

2002-07-16 Thread Andrew Gallatin



Alexander Maret writes:
 > > -Original Message-
 > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
 > > 
 > > When I copy a 500M file (like iso image) from the workstation to the
 > > server, the server starts to emit:
 > > 
 > > Jul 12 09:28:54 nile /kernel: All mbuf clusters exhausted, 
 > > please see tuning(7).
 > 
 > I'm getting the same error here with 4.6-R.
 >  
 > > So, I bumped up the nmbclusters to kern.ipc.nmbclusters=65536
 > > I allocated 128Mbytes to the mbuf clusters, hoping that it is 
 > > big enough.
 > > But, it still shows that the same 
 > > 
 > > All mbuf clusters exhausted, please see tuning(7).
 > 
 > The same for me.
 > 
 > 
 > > How can I prevent this "mbuf clusters exhaustion"?
 > 
 > I would be interested in an answer, too.


RH Linux has a (really dumb) default of using UDP mounts with a 16K
read and write size.  A small percentage of packet loss will cause
many fragments to build up on the IP reassembly queues and run you out
of mbuf clusters.

There are 3 things you could do to fix this:

a) Use more reasonable NFS mount options on the linux boxes.  8K UDP
should help quite a bit, 32K TCP should totally eliminate the problem.

b) Drastically reduce net.inet.ip.maxfragpackets (make sure that
(maxfragpackets * (16384/1472)) < (maxmbufclusters - some_constant);
see the worked numbers after this list).  This will limit the number of
packets (and therefore mbufs) on the IP reassembly queues.  This could
destroy your observed NFS write performance from the Linux side if you
do not also implement (a) above.

c) Eliminate all packet loss in your network between the client and
server.  Not likely,  as I suspect some of the packet loss might even
be inside the linux or freebsd boxes.

I suggest you do both (a) and (b).
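
A sketch of that bound with concrete numbers plugged in (the
65536-cluster pool and the 8192-cluster reserve are illustrative
assumptions, not recommended values):

    /*
     * Bound on net.inet.ip.maxfragpackets so that reassembly of 16KB UDP
     * NFS writes cannot exhaust the cluster pool.  The pool size and the
     * reserve kept for everything else are assumed for illustration.
     */
    #include <stdio.h>

    int
    main(void)
    {
        int nmbclusters = 65536;        /* assumed kern.ipc.nmbclusters */
        int reserve = 8192;             /* assumed headroom for other traffic */
        int frags = (16384 + 1471) / 1472;      /* fragments per 16KB datagram */
        int limit = (nmbclusters - reserve) / frags;

        printf("each 16KB datagram can pin %d clusters in reassembly\n", frags);
        printf("keep net.inet.ip.maxfragpackets below about %d\n", limit);
        return (0);
    }

With those assumptions each 16KB datagram can pin about a dozen clusters
in reassembly, so maxfragpackets would need to stay below roughly 4800.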


 >  
 > > Relating to this, I do not understand why that the mbuf clusters
 > > are not freed fast enough. I watched "top" and it does seem to be


IP reassembly queues are timed out after IPFRAGTTL, which is quite
a long time.

Drew




AW: mbuf clusters behavior (NMBCLUSTERS)

2002-07-16 Thread Alexander Maret
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> 
> When I copy a 500M file (like iso image) from the workstation to the
> server, the server starts to emit:
> 
> Jul 12 09:28:54 nile /kernel: All mbuf clusters exhausted, 
> please see tuning(7).

I'm getting the same error here with 4.6-R.
 
> So, I bumped up the nmbclusters to kern.ipc.nmbclusters=65536
> I allocated 128Mbytes to the mbuf clusters, hoping that it is 
> big enough.
> But, it still shows that the same 
> 
> All mbuf clusters exhausted, please see tuning(7).

The same for me.


> How can I prevent this "mbuf clusters exhaustion"?

I would be interested in an answer, too.

 
> Relating to this, I do not understand why that the mbuf clusters
> are not freed fast enough. I watched "top" and it does seem to be
> that CPU is not exhausted.
> After all, I'm copying less than 10Mbyte/sec, probably 6 - 7 Mbytes
> at most.
> Hard disk is a Seagate ATA/IV 60Gbyte.The drive is hooked up to a 
> Promise PCI ATA/UDMA 100 controller card.
> bonnie shows that it can sustain 15M - 20M bytes read/write. 

I'm trying to copy data from a 1GHz Mobile PIII Linux installation
with IDE disks to an AMD Athlon 700 FBSD 4.6-R installation with
Ultra 160 SCSI disks over NFS. I'm wondering too why the FreeBSD box can't
cope with my lousy linux installation. Didn't have time to investigate
though.


> Is there anything I can try?


Greetings,
Alex
