Re: Proposed patch, convert IFQ_MAXLEN to kernel tunable...

2008-09-25 Thread Bruce Evans

On Wed, 24 Sep 2008 [EMAIL PROTECTED] wrote:


At Wed, 24 Sep 2008 12:53:31 -0700,
John-Mark Gurney wrote:


George V. Neville-Neil wrote this message on Tue, Sep 23, 2008 at 15:29 -0400:

It turns out that the last time anyone looked at this constant was
before 1994 and it's very likely time to turn it into a kernel
tunable.  On hosts that have a high rate of packet transmission
packets can be dropped at the interface queue because this value is
too small.  Rather than make a sweeping code change I propose the
following change to the macro and updating a couple of places in the
IP and IPv6 stacks that were using this macro to set their own global
variables.


The global value is rarely used (except by low-end and/or 10 Mbps hardware
where it doesn't matter).  One important place where it is used (that you
changed) is for ipintrq, but this should have a different default anyway
and already has its own sysctl to fix up the broken default.
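
For reference, that fix-up looks roughly like this in netinet/ip_input.c
of this era (exact flags and placement may differ):

SYSCTL_INT(_net_inet_ip, IPCTL_INTRQMAXLEN, intr_queue_maxlen,
    CTLFLAG_RW, &ipintrq.ifq_maxlen, 0,
    "Maximum size of the IP input queue");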


The better solution is to resurrect rwatson's patch that eliminates the
interface queue, and does direct dispatch to the ethernet driver..


If this were really better, then rwatson would have committed it.


Usually the driver has a queue of 512 or more packets already, so putting
them into a second queue doesn't provide much benefit besides increasing
the amount of locking necessary to deliver packets...


No, the second queue provides the fairly large benefit of doing watermark
processing via double buffering.  Most or all network drivers do no
watermark processing except accidentally via the second queue.  Even
with watermarks in the driver queue, it still takes a fairly large
external queue to prevent the transmitter running dry under load.
E.g., in my version of the bge driver, there is essentially a watermark
at 112 descriptors before the end of the driver tx queue.  Under load,
the driver tx queue is normally filled with 496 descriptors on every
output interrupt.  The driver enters state IFF_DRV_OACTIVE and upper
layers stop trying to transfer packets to the driver queue.  The next
output interrupt occurs after 384 of the 496 descriptors have been
handled, so that 112 remain.  Then IFF_DRV_OACTIVE is cleared and upper
layers resume transferring packets to the driver queue.  For output
to stream, and for efficiency, it is essential for upper layers to
have plenty of packets to transfer at this point, but without the
double buffering they would have none.  When things are working right,
the upper layers produce more than 384 descriptors by transferring at
this point so that the driver queue fills again.  For output to stream,
it is essential to produce 1 new descriptor faster than the hardware
can handle the 112 remaining ones and then keep producing new descriptors
faster than the hardware can handle the previous new ones.  With my
hardware, the 112 descriptors take about 150 usec to handle.  Buffering
in userland is too far away to deliver new packets within this time in
many cases (though syscalls are fast enough, scheduling delays are
usually too long).
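
A minimal sketch of that double buffering, reusing the bge numbers from
above; the drv_* names and struct drv_softc are hypothetical, and real
drivers differ in detail:

#define TX_WATERMARK	112	/* descriptors left when txeof runs */

static void
drv_txeof(struct drv_softc *sc)		/* output-interrupt handler */
{
	struct ifnet *ifp = sc->drv_ifp;

	/* Reclaim completed descriptors; in the example above, 384
	 * of the 496 queued descriptors complete here. */
	sc->drv_tx_cnt -= drv_reclaim(sc);

	if (sc->drv_tx_cnt <= TX_WATERMARK) {
		/* Watermark reached (~112 left): let upper layers
		 * transfer packets to us again... */
		ifp->if_drv_flags &= ~IFF_DRV_OACTIVE;
		/* ...and refill the ring from the software queue
		 * (ifp->if_snd), the second buffer that must be
		 * non-empty here or the transmitter runs dry. */
		drv_start(ifp);
	}
}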


Actually I am making this change because I found on 10G hardware the
queue is too small.  Also, there are many systems where you might want
to up this, usually ones that are highly biased towards transmit only,
like a multicast repeater of some sort.


You would rarely want the global default to apply to all interfaces.

Anyway, interfaces with a hardware queue length (er, ring count) of
512 normally don't use the default, but use their ring count (bogusly
less 1) for the software queue length too.  This can be too small too.
In some cases, since select() for writing on sockets attached to
hardware doesn't work (it should block while all queues are full and
return soon after IFF_DRV_OACTIVE transitions from set to clear and
the queues become non-full, but it only understands the socket buffer
and doesn't look at the queues), context switches to the application
producing the packets are delayed until at least the next scheduling
tick or two, which happens 2-20 msec later.  I use sendq lengths of
~2k for bge and em to prevent the queues running dry with a scheduling
latency of 20 msec.  (10 Gbps would require preposterous queue lengths
of ~20k for the same result.)  However, even a queue length of
(512+512-1) is bad for performance.  Cycling through mbufs in a
circular way ensures busting of caches once the queue length is large
enough for the memory to exceed the cache size[s].  With a queue
length of 50 and today's cache sizes, it is almost possible to fit
everything into the L1 cache.
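
The arithmetic behind those queue lengths, as a stand-alone check
(assuming full-sized 1518-byte frames; a different packet mix changes
the numbers):

#include <stdio.h>

int
main(void)
{
	double bps = 1e9;		/* GigE line rate */
	double frame_bits = 1538 * 8;	/* 1518-byte frame + preamble/IFG */
	double pps = bps / frame_bits;	/* ~81,000 packets/sec */
	double latency = 0.020;		/* 20 msec scheduling latency */

	/* Packets the software queues must hold to cover the latency. */
	printf("queue length >= %.0f packets\n", pps * latency);
	return (0);
}

This prints about 1600 for 1 Gbps, hence the ~2k above, and ten times
that for 10 Gbps.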

Bruce


Re: Proposed patch, convert IFQ_MAXLEN to kernel tunable...

2008-09-24 Thread Bruce M. Simpson

Hi,

I agree with the intent of the change that IPv4 and IPv6 input queues 
should have a tunable queue length. However, the change provided is 
going to make the definition of IFQ_MAXLEN global and dependent upon a 
variable.


[EMAIL PROTECTED] wrote:

Hi,

It turns out that the last time anyone looked at this constant was
before 1994 and it's very likely time to turn it into a kernel
tunable.  On hosts that have a high rate of packet transmission
packets can be dropped at the interface queue because this value is
too small.  Rather than make a sweeping code change I propose the
following change to the macro and updating a couple of places in the
IP and IPv6 stacks that were using this macro to set their own global
variables.
  


This isn't appropriate for many uses of ifq's which might be internal to 
a given driver or subsystem, and which may use IFQ_MAXLEN for 
convenience, as Ruslan has pointed out. I have code elsewhere which does 
this.


Can you please do this on a per-protocol stack basis? i.e. give IPv4 and 
IPv6 their own TUNABLE queue length.
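
A sketch of what that could look like for the IPv6 side, mirroring the
existing IPv4 knob; the tunable name and placement are illustrative,
not a committed interface:

/* Hypothetical, in netinet6/ip6_input.c: */
static int ip6intrq_maxlen = IFQ_MAXLEN;	/* compile-time default */

/* Overridable from loader.conf before the queue is set up. */
TUNABLE_INT("net.inet6.ip6.intr_queue_maxlen", &ip6intrq_maxlen);

/* And adjustable at run time, like net.inet.ip.intr_queue_maxlen. */
SYSCTL_INT(_net_inet6_ip6, OID_AUTO, intr_queue_maxlen, CTLFLAG_RW,
    &ip6intrq.ifq_maxlen, 0, "Maximum size of the IPv6 input queue");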


thanks
BMS


Re: Proposed patch, convert IFQ_MAXLEN to kernel tunable...

2008-09-24 Thread Bruce M. Simpson

[EMAIL PROTECTED] wrote:

...
I found no occurrences of the above in our code base.  I used cscope
to search all of src/sys.  Are you aware of any occurrences of this?
  


I have been using IFQ_MAXLEN to size buffer queues internal to some 
IGMPv3 stuff.
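
For instance, a queue internal to a protocol module is often sized like
this (a hypothetical fragment, not the actual IGMPv3 code):

struct ifqueue gq;		/* per-link report queue, say */

gq.ifq_maxlen = IFQ_MAXLEN;	/* used as a handy constant, nothing
				 * to do with any interface's queue */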


I don't feel comfortable with a change which sizes the queues for both 
IPv4 and IPv6 stacks, from a variable which is obscured by a macro.


thanks
BMS


Re: Proposed patch, convert IFQ_MAXLEN to kernel tunable...

2008-09-24 Thread gnn
At Wed, 24 Sep 2008 15:50:32 +0100,
Bruce M. Simpson wrote:
 
 Hi,
 
 I agree with the intent of the change that IPv4 and IPv6 input queues 
 should have a tunable queue length. However, the change provided is 
 going to make the definition of IFQ_MAXLEN global and dependent upon a 
 variable.
 
 [EMAIL PROTECTED] wrote:
  Hi,
 
  It turns out that the last time anyone looked at this constant was
  before 1994 and it's very likely time to turn it into a kernel
  tunable.  On hosts that have a high rate of packet transmission
  packets can be dropped at the interface queue because this value is
  too small.  Rather than make a sweeping code change I propose the
  following change to the macro and updating a couple of places in the
  IP and IPv6 stacks that were using this macro to set their own global
  variables.

 
 This isn't appropriate for many uses of ifq's which might be internal to 
 a given driver or subsystem, and which may use IFQ_MAXLEN for 
 convenience, as Ruslan has pointed out. I have code elsewhere which does 
 this.
 
 Can you please do this on a per-protocol stack basis? i.e. give IPv4 and 
 IPv6 their own TUNABLE queue length.
 

Actually, what we'd need is N of these, since my target is the
send queue, not the input queue.  Let me look at this some more.
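
For the send-queue side, a hedged sketch of what a driver could do at
attach time; ifqmaxlen is the existing global default, while the
tunable name and the drv_* identifiers are hypothetical:

static int drv_sendq_len = 0;		/* 0 = keep the defaults */
TUNABLE_INT("hw.drv.sendq_len", &drv_sendq_len);

static void
drv_attach_sendq(struct ifnet *ifp, int ring_cnt)
{
	int len;

	/* Today's common default is the ring size less 1, or the
	 * global ifqmaxlen; let the tunable override either. */
	len = (drv_sendq_len > 0) ? drv_sendq_len :
	    imax(ring_cnt - 1, ifqmaxlen);
	IFQ_SET_MAXLEN(&ifp->if_snd, len);
	ifp->if_snd.ifq_drv_maxlen = len;
	IFQ_SET_READY(&ifp->if_snd);
}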

Best,
George


Re: Proposed patch, convert IFQ_MAXLEN to kernel tunable...

2008-09-24 Thread John-Mark Gurney
George V. Neville-Neil wrote this message on Tue, Sep 23, 2008 at 15:29 -0400:
 It turns out that the last time anyone looked at this constant was
 before 1994 and it's very likely time to turn it into a kernel
 tunable.  On hosts that have a high rate of packet transmission
 packets can be dropped at the interface queue because this value is
 too small.  Rather than make a sweeping code change I propose the
 following change to the macro and updating a couple of places in the
 IP and IPv6 stacks that were using this macro to set their own global
 variables.

The better solution is to resurrect rwatson's patch that eliminates the
interface queue, and does direct dispatch to the ethernet driver..
Usually the driver has a queue of 512 or more packets already, so putting
them into a second queue doesn't provide much benefit besides increasing
the amount of locking necessary to deliver packets...
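
Conceptually, the two paths compare like this; drv_transmit() is a
hypothetical stand-in, since rwatson's actual patch isn't shown here:

/* Today: enqueue on the software queue, then kick the driver
 * (one queue lock/unlock plus an enqueue/dequeue per packet). */
IFQ_HANDOFF(ifp, m, error);

/* Direct dispatch: skip ifp->if_snd and hand the mbuf straight
 * to the driver, which queues it on its own ring. */
error = drv_transmit(ifp, m);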

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 All that I will do, has been done, All that I have, has not.


Re: Proposed patch, convert IFQ_MAXLEN to kernel tunable...

2008-09-24 Thread gnn
At Wed, 24 Sep 2008 12:53:31 -0700,
John-Mark Gurney wrote:
 
 George V. Neville-Neil wrote this message on Tue, Sep 23, 2008 at 15:29 -0400:
  It turns out that the last time anyone looked at this constant was
  before 1994 and it's very likely time to turn it into a kernel
  tunable.  On hosts that have a high rate of packet transmission
  packets can be dropped at the interface queue because this value is
  too small.  Rather than make a sweeping code change I propose the
  following change to the macro and updating a couple of places in the
  IP and IPv6 stacks that were using this macro to set their own global
  variables.
 
 The better solution is to resurrect rwatson's patch that eliminates the
 interface queue, and does direct dispatch to the ethernet driver..
 Usually the driver has a queue of 512 or more packets already, so putting
 them into a second queue doesn't provide much benefit besides increasing
 the amount of locking necessary to deliver packets...

Actually I am making this change because I found on 10G hardware the
queue is too small.  Also, there are many systems where you might want
to up this, usually ones that are highly biased towards transmit only,
like a multicast repeater of some sort.

Best,
George


Re: Proposed patch, convert IFQ_MAXLEN to kernel tunable...

2008-09-24 Thread Julian Elischer

[EMAIL PROTECTED] wrote:

At Wed, 24 Sep 2008 12:53:31 -0700,
John-Mark Gurney wrote:

George V. Neville-Neil wrote this message on Tue, Sep 23, 2008 at 15:29 -0400:

It turns out that the last time anyone looked at this constant was
before 1994 and it's very likely time to turn it into a kernel
tunable.  On hosts that have a high rate of packet transmission
packets can be dropped at the interface queue because this value is
too small.  Rather than make a sweeping code change I propose the
following change to the macro and updating a couple of places in the
IP and IPv6 stacks that were using this macro to set their own global
variables.

The better solution is to resurrect rwatson's patch that eliminates the
interface queue, and does direct dispatch to the ethernet driver..
Usually the driver has a queue of 512 or more packets already, so putting
them into a second queue doesn't provide much benefit besides increasing
the amount of locking necessary to deliver packets...


Actually I am making this change because I found on 10G hardware the
queue is too small.  Also, there are many systems where you might want
to up this, usually ones that are highly biased towards transmit only,
like a multicast repeater of some sort.



One system I have seen, which I thought made sense, defined the
queue length globally in msecs, and each interface interpreted
that as a different packet count.
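
A sketch of that scheme -- one global delay target, converted per
interface; the names and the default here are illustrative:

static int ifq_delay_msec = 20;		/* transmit time to buffer */
TUNABLE_INT("net.ifq_delay_msec", &ifq_delay_msec);

/* Each interface turns the delay into its own packet count, from
 * its line rate and an assumed average packet size. */
static int
ifq_len_from_delay(struct ifnet *ifp, int avg_pkt_bytes)
{
	uint64_t pps;

	pps = ifp->if_baudrate / (uint64_t)(avg_pkt_bytes * 8);
	return ((int)(pps * ifq_delay_msec / 1000));
}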



Best,
George


Re: Proposed patch, convert IFQ_MAXLEN to kernel tunable...

2008-09-23 Thread Ruslan Ermilov
Hi,

On Tue, Sep 23, 2008 at 03:29:06PM -0400, [EMAIL PROTECTED] wrote:
 It turns out that the last time anyone looked at this constant was
 before 1994 and it's very likely time to turn it into a kernel
 tunable.  On hosts that have a high rate of packet transmission
 packets can be dropped at the interface queue because this value is
 too small.  Rather than make a sweeping code change I propose the
 following change to the macro and updating a couple of places in the
 IP and IPv6 stacks that were using this macro to set their own global
 variables.
 
 I have tested this in my test lab at work, it is not as yet in
 production at my day job, but will be soon.
 
It's not that bad -- most modern Ethernet drivers initialize interface
input queues themselves, and don't depend on IFQ_MAXLEN.  The IPv4
input queue is tunable via net.inet.ip.intr_queue_maxlen.  The IPv6
queue can similarly be made tunable.  I agree that ifqmaxlen can be
made tunable, because there are still a lot of drivers (mostly for old
hardware) that use ifqmaxlen and IFQ_MAXLEN, but I'm against changing
the definition of IFQ_MAXLEN.  Imagine some code like this:

void *x[IFQ_MAXLEN];	// here it's 50

And some function that does:

for (i = 0; i < IFQ_MAXLEN; i++) {	// not necessarily 50
	x[i] = NULL;
}
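
The hazard is clearer next to the shape of the proposed change (again
hypothetical; the patch itself isn't reproduced in this thread):

/* In net/if.h, roughly: */
extern int ifqmaxlen;			/* settable via a tunable */
#define	IFQ_MAXLEN	ifqmaxlen	/* no longer a constant */

With that definition the file-scope array above no longer compiles (its
size isn't a constant expression), and inside a function it becomes a
variable-length array whose size can disagree with a later use of
IFQ_MAXLEN if the variable has changed in between.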


Cheers,
-- 
Ruslan Ermilov
[EMAIL PROTECTED]
FreeBSD committer


Re: Proposed patch, convert IFQ_MAXLEN to kernel tunable...

2008-09-23 Thread gnn
At Wed, 24 Sep 2008 00:17:18 +0400,
Ruslan Ermilov wrote:
 
 Hi,
 
 On Tue, Sep 23, 2008 at 03:29:06PM -0400, [EMAIL PROTECTED] wrote:
  It turns out that the last time anyone looked at this constant was
  before 1994 and it's very likely time to turn it into a kernel
  tunable.  On hosts that have a high rate of packet transmission
  packets can be dropped at the interface queue because this value is
  too small.  Rather than make a sweeping code change I propose the
  following change to the macro and updating a couple of places in the
  IP and IPv6 stacks that were using this macro to set their own global
  variables.
  
  I have tested this in my test lab at work, it is not as yet in
  production at my day job, but will be soon.
  
 It's not that bad -- most modern Ethernet drivers initialize interface
 input queues themselves, and don't depend on IFQ_MAXLEN.  The IPv4
 input queue is tunable via net.inet.ip.intr_queue_maxlen.  The IPv6
 queue can similarly be made tunable.  I agree that ifqmaxlen can be
 made tunable, because there are still a lot of drivers (mostly for old
 hardware) that use ifqmaxlen and IFQ_MAXLEN, but I'm against changing
 the definition of IFQ_MAXLEN.  Imagine some code like this:
 

Sorry, this is about the output queue, not the input queue.

Though there are both input and output queues that depend on this.

 void *x[IFQ_MAXLEN];	// here it's 50
 
 And some function that does:
 
 for (i = 0; i < IFQ_MAXLEN; i++) {	// not necessarily 50
 	x[i] = NULL;
 }
 

I found no occurrences of the above in our code base.  I used cscope
to search all of src/sys.  Are you aware of any occurrences of this?

Best,
George