Re: Preallocated skb's?

2000-09-16 Thread jamal



On Fri, 15 Sep 2000, Bogdan Costescu wrote:

> On Fri, 15 Sep 2000, jamal wrote:
> 
> > Only the timer runs at HZ granularity ;-<
> 
> Some cards provide their own high resolution timers; latest 3Com cards
> provide several with different purposes (none currently used). The
> question is how many of these also provide the Rx early interrupts.
> You also mentioned an auto-tunable Rx mitigation scheme. How do you
> implement it without using hardware timers?
> 

Oh, the tulip 21143 explicitly has an interrupt mitigation timer; this is
for both tx and rx. But I see you can also use a general-purpose timer
on the NIC to simulate mitigation:
disable rx interrupts and other sources of noise (e.g. rx no-buf) and set
the timer to wait a certain number of packet times.
Donald's drivers generally have this scheme built in; however, it is a
one-shot mode on rx work overload (mostly there for interrupt sharing,
according to one of Donald's old posts).
So what you do instead is have a table of these 'mitigation' values[1]
and select the appropriate one; i.e. you have a pointer that moves up and
down the table and, based on the load, picks the correct mitigation value.
When Robert OKs the current tulip, you should be able to see how it is
done there.

> > 20 usec is probably too much time. If my math is not wrong, 1 bit time in
> > a 100Mbps is 1 ns; 64 bytes is 512ns.
> 
> I think you are wrong by a factor of 10 here; 1 bit time at 100Mbps
> should be 10 ns. Then 64 bytes is 5.12 us (u=micro). Anyway, this is
> comparable with the time needed to reach the ISR, so you can have several
> (but a small number of) packets already waiting for processing.
> 
> > You use the period (5-10 micros), while waiting
> > for full packet arrival, to make the route decision (lookup etc.);
> > i.e. this will allow for better FF; it will not offload things.
> 
> Just that you span several layers by doing this; it's not driver-specific
> anymore.

I think we should heed Donald's advice on this early rx. I would take
Donald's word for it; he's been there, done that. He knows.
E.g. the PCI burst issue makes a lot of sense. Unless someone with the
right tools (e.g. PCI bus monitors) does some measurements and maybe
challenges Donald ;->

cheers,
jamal

[1] the table would look something like:
table[0] == 1 packet per interrupt (default); disable timer
table[1] == 2 packets per interrupt
table[2] == 3 packets per interrupt
.
.
etc.
Use 64 bytes as the packet size, since it is the smallest ethernet size.
As you pointed out, that is 5.12 microsecs; so 10.24 microsecs for table[1].
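
In C, the idea would look something like this (a sketch only; the struct
and all names are invented for illustration, not taken from any driver):

    /* Footnote [1] as code: a table of mitigation values plus the
     * index ('pointer') that moves up and down it with the load. */
    struct mit_entry {
            unsigned int pkts_per_irq;  /* packets coalesced per interrupt */
            unsigned int timer_usecs;   /* NIC timer period; ~5.12us per
                                           64-byte packet time at 100Mbps */
    };

    static const struct mit_entry mit_table[] = {
            { 1,  0 },  /* table[0]: default, mitigation timer disabled */
            { 2, 10 },  /* table[1]: ~10.24 usecs */
            { 3, 15 },  /* table[2]: ~15.36 usecs */
            { 4, 20 },
    };
    static unsigned int mit_idx;        /* moves with the measured load */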





Re: Preallocated skb's?

2000-09-15 Thread Donald Becker

On Fri, 15 Sep 2000, Bogdan Costescu wrote:

> On Fri, 15 Sep 2000, jamal wrote:
> > You use the period (5-10 micros), while waiting
> > for full packet arrival, to make the route decision (lookup etc.);
> > i.e. this will allow for better FF; it will not offload things.
> 
> Just that you span several layers by doing this; it's not driver-specific
> anymore.

Many chips have some sort of early-Rx feature, but it's still a bad idea for
the many reasons I've pointed out before.

An additional reason not to use early-Rx is that chips such as the 3c905C are
most efficient at using the PCI bus when transferring a whole packet in a
single PCI burst (plus two smaller bursts initially reading and later
writing the descriptor).  Using an early-Rx interrupt scheme means using
multiple smaller bursts.

The early-Rx scheme worked well on the ISA bus, where transfers were slow
and non-bursting.

Also note: it is possible to drop an Rx packet after the early Rx
interrupt.

Donald Becker   [EMAIL PROTECTED]
Scyld Computing Corporation http://www.scyld.com
410 Severn Ave. Suite 210   Beowulf-II Cluster Distribution
Annapolis MD 21403




Re: Preallocated skb's?

2000-09-15 Thread Bogdan Costescu

On Fri, 15 Sep 2000, jamal wrote:

> Only the timer runs at HZ granularity ;-<

Some cards provide their own high resolution timers; latest 3Com cards
provide several with different purposes (none currently used). The
question is how many of these also provide the Rx early interrupts.
You also mentioned an auto-tunable Rx mitigation scheme. How do you
implement it without using hardware timers?

> 20 usec is probably too much time. If my math is not wrong, 1 bit time in
> a 100Mbps is 1 ns; 64 bytes is 512ns.

I think you are wrong by a factor of 10 here; 1 bit time at 100Mbps
should be 10 ns. Then 64 bytes is 5.12 us (u=micro). Anyway, this is
comparable with the time needed to reach the ISR, so you can have several
(but a small number of) packets already waiting for processing.
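
To put numbers on it (my arithmetic; the ~20 us ISR overhead is Andrew's
figure from earlier in the thread, assumed here):

    /* Wire times at 100Mbps: 1 bit = 10 ns. */
    #define BIT_TIME_NS      10
    #define FRAME_TIME_NS(b) ((b) * 8 * BIT_TIME_NS)    /* bytes -> ns */
    /* FRAME_TIME_NS(64)   =   5120 ns = 5.12 us (minimum frame)
     * FRAME_TIME_NS(1500) = 120000 ns = 120 us  (maximum frame)
     * With ~20 us of overhead before the ISR runs, 20000 / 5120 means
     * roughly 3-4 minimum-size frames can already be waiting. */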

> You use the period (5-10 micros), while waiting
> for full packet arrival, to make the route decision (lookup etc.);
> i.e. this will allow for better FF; it will not offload things.

Just that you span several layers by doing this; it's not driver-specific
anymore.

Sincerely,

Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: [EMAIL PROTECTED]





Re: Preallocated skb's?

2000-09-15 Thread jamal



On Fri, 15 Sep 2000, Bogdan Costescu wrote:

> On Thu, 14 Sep 2000, jamal wrote:
> 
> The 3Com cards can generate this interrupt, however this is not used in
> current 3c59x.c. I suggested this to Andrew, but he is already worried
> about the current interrupt rate and unhappy that 3Com cards do not
> provide hardware support for Rx mitigation.
> 
> An idea might be to combine Rx early interrupts with some kind of
> software timer-based mitigation. 

Only the timer runs at HZ granularity ;-<

>  IMHO this has 2 advantages:
> - because of the overhead that Andrew pointed out, by the time the CPU
> reaches the ISR code and the skbuff allocation is done, the entire packet
> might already be transferred; 

20 usec is probably too much time. If my math is not wrong, 1 bit time in
a 100Mbps is 1 ns; 64 bytes is 512ns. If you are waiting for 640 bytes
more, that is ~5 microsecs, or say 10 microsecs for an average 1000 bytes
in a LAN. For a 10Mbps connection, ~100 microsecs. But we don't have
problems with 10Mbps.
I think this is an interesting heuristic to use. The 20 usec given by
Andrew appears to me to be x86-specific and processor-dependent, though.
Can you guarantee it on a 600MHz Alpha processor?
This is just one of those schemes which are useful, in my opinion, for
quick header inspection: while the packet is still coming in, you have
enough data to make a call. You use the period (5-10 micros), while waiting
for full packet arrival, to make the route decision (lookup etc.);
i.e. this will allow for better FF; it will not offload things.
Instead of using full-rx interrupts as is done today, it will make sense to
receive mid-packet interrupts so that you are ready for the above scheme.
I know, I know, Linux is a general-purpose OS.
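
Roughly, the ISR side of such a scheme could look like this (a sketch
only: early_rx_event(), rx_buf(), route_lookup() and struct rx_ring are
invented names, and the DMA-ordering issues Donald raises are glossed over):

    /* On an early-rx interrupt the first ~64 bytes are already in the
     * buffer, so the headers can be inspected while the rest arrives.
     * Assumes <linux/ip.h> and <linux/if_ether.h>. */
    static void rx_irq(struct rx_ring *ring)
    {
            if (early_rx_event(ring)) {
                    struct iphdr *iph =
                            (struct iphdr *)(rx_buf(ring) + ETH_HLEN);
                    ring->pending_route = route_lookup(iph->daddr);
            } else {
                    finish_rx(ring);    /* full packet: route decision
                                         * was already made above */
            }
    }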

> however, a check has to be done to ensure
> that the packet was not dropped by the hardware and that you don't try to
> fit a packet into an skbuff sized for the previous packet (in case several
> packets can be transferred during the "overhead" time)

Most schemes like that don't drop the packet once you have received
partial pieces. Other incoming packets will be dropped, though, if there
is no space.

cheers,
jamal




Re: Preallocated skb's?

2000-09-15 Thread Bogdan Costescu

On Thu, 14 Sep 2000, jamal wrote:

> If I remember correctly some of the 3Coms still give this 'mid-interrupt',
> no? It could be useful to, say, quickly read the header and make routing
> decisions as in fast routing, but not under heavy load.

The 3Com cards can generate this interrupt, however this is not used in
current 3c59x.c. I suggested this to Andrew, but he is already worried
about the current interrupt rate and unhappy that 3Com cards do not
provide hardware support for Rx mitigation.

An idea might be to combine Rx early interrupts with some kind of
software timer-based mitigation. IMHO this has 2 advantages:
- because of the overhead that Andrew pointed out, by the time the CPU
reaches the ISR code and the skbuff allocation is done, the entire packet
might already be transferred; however, a check has to be done to ensure
that the packet was not dropped by the hardware and that you don't try to
fit a packet into an skbuff sized for the previous packet (in case several
packets can be transferred during the "overhead" time)
- under load, because interrupts occur anyway (the Rx early ones), you
don't lose anything in terms of latency.

Sincerely,

Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: [EMAIL PROTECTED]




Re: Preallocated skb's?

2000-09-14 Thread yodaiken

On Thu, Sep 14, 2000 at 10:26:08PM -0400, jamal wrote:
> 
> 
> One of the things we need to measure still is the latency. The scheme
> currently used, with dynamic adjustment of the mitigation parameters, might
> not affect latency much -- simply because the adjustment is based on the
> load. We still have to prove this. The theory is:
> under a lot of congestion, you delay longer because the layers above
> you are congested, as gauged from a feedback; and under low congestion, you
> should theoretically adjust all the way down to 1 interrupt/packet. Under
> heavy load, your latency is already screwed anyway because of the large
> backlog queue; this is regardless of mitigation.

Or maybe the extra delay in congested circumstances will cause more 
timeouts and that's precisely when you need to improve latency?


-- 
-
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com




Re: Preallocated skb's?

2000-09-14 Thread jamal



On Thu, 14 Sep 2000, Donald Becker wrote:

> No, because I know I sound like a broken record.   

;->

> What we measured is that the cache impact of allocating and initializing our
> (ever-larger) skbuffs is huge.  So we pay some CPU time getting a new
> skbuff, and some more CPU time later reloading the cache with useful data.
> 
> The skbuff is added to the end of the driver Rx buffer list, so the memory
> lines are out of the cache by the time we need them.

So is there a workable solution to this?

> 
> The Rx ring should be able to hold at least
>(interrupt-latency * 100/1000Mbps) bits
> and 
>(interrupt-latency * 100/1000Mbps)/(64 bytes/packet * 8 bits/byte) packets
> 

Cool. Assuming 64 bytes because it is the minimum-sized ethernet packet?

> The PCI drivers make some effort to always allocate the same size skbuff, so
> recycling skbuffs, or otherwise optimizing their allocation, is useful.
> 

Good to hear this from you.

> The only significant advantage of interrupt mitigation is cache locality
> when allocating new skbuffs, and having an additional mechanism to drop
> packets under overwhelming load.
> 
> The disadvantage of Rx interrupt mitigation is adding latency just where it
> might matter the most.  

One of the things we need to measure still is the latency. The scheme
currently used, with dynamic adjustment of the mitigation parameters, might
not affect latency much -- simply because the adjustment is based on the
load. We still have to prove this. The theory is:
under a lot of congestion, you delay longer because the layers above
you are congested, as gauged from a feedback; and under low congestion, you
should theoretically adjust all the way down to 1 interrupt/packet. Under
heavy load, your latency is already screwed anyway because of the large
backlog queue; this is regardless of mitigation.
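
The adjustment itself would be something like this (a sketch; the backlog
gauge and the helper names are invented, reusing the mit_idx/table shape
from my earlier note):

    /* Walk the mitigation table up while the layers above are congested,
     * back down toward 1 interrupt/packet when the load drops. */
    static void mit_feedback(void)
    {
            unsigned int backlog = backlog_len();   /* feedback gauge */

            if (backlog > MIT_HIGH_WATER && mit_idx < MIT_MAX_IDX)
                    mit_idx++;                      /* delay longer */
            else if (backlog < MIT_LOW_WATER && mit_idx > 0)
                    mit_idx--;                      /* toward 1 irq/pkt */

            nic_set_mitigation(mit_idx);            /* program the NIC */
    }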

> Remember that the hot ticket for old-IPX performance
> was taking an *extra* early interrupt for each Rx packet.

If I remember correctly some of the 3Coms still give this 'mid-interrupt',
no? It could be useful to, say, quickly read the header and make routing
decisions as in fast routing, but not under heavy load.

cheers,
jamal




Re: Preallocated skb's?

2000-09-14 Thread Donald Becker

On Thu, 14 Sep 2000, jamal wrote:
> On Thu, 14 Sep 2000, Andrew Morton wrote:
> > But for 3c59x (which is not a very efficient driver (yet)), it takes 6
> > usecs to even get into the ISR, and around 4 usecs to traverse it. 
> > Guess another 4 to leave the ISR, guess half as much again for whoever
> > got interrupted to undo the resulting cache pollution.
> > 
> > That's 20 usec per interrupt, of which 1 usec could be saved by skb
> > pooling.
> 
> With these numbers, plus how long it takes to queue the packets in
> netif_rx(), I would say you should roughly be able to tune your DMA
> ring appropriately. 
> 
> Roughly your DMA ring should be able to hold:
> 
> (PCI_Burst_bandwidth * ((20 * 10^-6) + PCI_bus_latency)) bits.
> 
> Did I hear Donald say something? ;->

No, because I know I sound like a broken record.  

What we measured is that the cache impact of allocating and initializing our
(ever-larger) skbuffs is huge.  So we pay some CPU time getting a new
skbuff, and some more CPU time later reloading the cache with useful data.

The skbuff is added to the end of the driver Rx buffer list, so the memory
lines are out of the cache by the time we need them.

The Rx ring should be able to hold at least
   (interrupt-latency * 100/1000Mbps) bits
and 
   (interrupt-latency * 100/1000Mbps)/(64 bytes/packet * 8 bits/byte) packets
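
For example, plugging in the ~20 usec interrupt-overhead figure quoted
earlier in this thread (illustrative arithmetic only):
   20e-6 sec * 100Mbps  =  2,000 bits  ->  ~4 minimum-size (512 bit) packets
   20e-6 sec * 1000Mbps = 20,000 bits  -> ~39 minimum-size packets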


> > If you don't do Rx interrupt mitigation there's no point in even
> > thinking about skb pooling.
> 
> FF does not use mitigation and as Robert was pointing out this was adding
> a lot of value.

The PCI drivers make some effort to always allocate the same size skbuff, so
recycling skbuffs, or otherwise optimizing their allocation, is useful.

The only significant advantage of interrupt mitigation is cache locality
when allocating new skbuffs, and having an additional mechanism to drop
packets under overwhelming load.

The disadvantage of Rx interrupt mitigation is adding latency just where it
might matter the most.  Remember that the hot ticket for old-IPX performance
was taking an *extra* early interrupt for each Rx packet.

Donald Becker   [EMAIL PROTECTED]
Scyld Computing Corporation http://www.scyld.com
410 Severn Ave. Suite 210   Beowulf-II Cluster Distribution
Annapolis MD 21403




Re: Preallocated skb's?

2000-09-14 Thread jamal



On Thu, 14 Sep 2000, Andrew Morton wrote:

> But for 3c59x (which is not a very efficient driver (yet)), it takes 6
> usecs to even get into the ISR, and around 4 usecs to traverse it. 
> Guess another 4 to leave the ISR, guess half as much again for whoever
> got interrupted to undo the resulting cache pollution.
> 
> That's 20 usec per interrupt, of which 1 usec could be saved by skb
> pooling.
> 

With these numbers, plus how long it takes to queue the packets in
netif_rx(), I would say you should roughly be able to tune your DMA
ring appropriately. 

Roughly your DMA ring should be able to hold:

(PCI_Burst_bandwidth * ((20 * 10^-6) + PCI_bus_latency)) bits.

Did I hear Donald say something? ;->

> 
> If you don't do Rx interrupt mitigation there's no point in even
> thinking about skb pooling.
> 

FF does not use mitigation, and as Robert was pointing out, this was adding
a lot of value.

cheers,
jamal




Re: Preallocated skb's?

2000-09-14 Thread jamal



What Alexey's code does is _not_ preallocation -- it does recycling.
On tx completion, the skb is recycled onto a recycle queue unless the
queue is full (the queue size is a tunable parameter), in which case it
is freed. This is more sensible than doing preallocation during idle
times or other smart schemes. On a busy system this queue will always
have something.

What I meant by aging is to have a separate thread that prunes the queue
based on age, i.e. how long the skb has been sitting there, etc. I think Jes
had a bottom-half running there; a simple per-CPU timer might suffice.
The heuristics (such as the timer decay etc.) for this part need a study,
and that's what Robert and I are planning to do.
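
A minimal sketch of the two pieces together (RECYCLE_MAX and
RECYCLE_MAX_AGE are made-up tunables, locking is elided, and the entry
time is stashed in skb->cb purely for illustration):

    static struct sk_buff_head recycle_q;   /* skb_queue_head_init() once */

    /* Called on tx completion: keep the skb unless the queue is full. */
    static void recycle_skb(struct sk_buff *skb)
    {
            if (skb_queue_len(&recycle_q) >= RECYCLE_MAX) {
                    dev_kfree_skb(skb);
                    return;
            }
            *(unsigned long *)skb->cb = jiffies;    /* note entry time */
            skb_queue_tail(&recycle_q, skb);
    }

    /* Called from the per-CPU timer: prune by age. */
    static void recycle_prune(void)
    {
            struct sk_buff *skb;

            while ((skb = skb_peek(&recycle_q)) &&
                   time_after(jiffies,
                              *(unsigned long *)skb->cb + RECYCLE_MAX_AGE))
                    dev_kfree_skb(skb_dequeue(&recycle_q));
    }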

cheers,
jamal





Re: Preallocated skb's?

2000-09-14 Thread Robert Olsson



Yes !

The FF experiments with 2.1.X indicated an improvement factor of about 2-3
times with skb recycling. With the combination of FF and skb recycling we
could reach fast Ethernet wire-speed forwarding on a 400 MHz CPU: about
~147 kpps.
As jamal reported, the improvement is much less today, but the forwarding
performance is impressive even without FF and skb recycling. Slab seems
to do a good job, especially when the debug is disabled. :-)


--ro

Andi Kleen writes:
 > On Thu, Sep 14, 2000 at 11:59:32PM +1100, Andrew Morton wrote:
 > > That's 20 usec per interrupt, of which 1 usec could be saved by skb
 > > pooling.
 > 
 > FF usually runs with interrupt mitigation at higher rates (8-16 or even
 > more packets / interrupt). I agree though that it probably does not 
 > make too much difference.  alloc_skb could probably be made cheaper 
 > for the FF case by being more clever in the slab constructor (I think
 > there was some bitrot during 2.3 on the cache line usage -- 2.2 pretty
 > much only needed 2 cache lines in the header for a FF packet) 
 > 
 > 
 > -Andi



Re: Preallocated skb's?

2000-09-14 Thread Andi Kleen

On Thu, Sep 14, 2000 at 11:59:32PM +1100, Andrew Morton wrote:
> That's 20 usec per interrupt, of which 1 usec could be saved by skb
> pooling.

FF usually runs with interrupt mitigation at higher rates (8-16 or even
more packets / interrupt). I agree though that it probably does not 
make too much difference.  alloc_skb could probably be made cheaper 
for the FF case by being more clever in the slab constructor (I think
there was some bitrot during 2.3 on the cache line usage -- 2.2 pretty
much only needed 2 cache lines in the header for a FF packet) 


-Andi



Re: Preallocated skb's?

2000-09-14 Thread Andrew Morton

jamal wrote:
> 
> The FF code of the tulip does have skb recycling code.
> And i belive Jes' acenic code does or did at some point.

But this isn't preallocation.  Unless you got cute, this scheme would
limit the "preallocation" to the DMA ring size.

For network-intensive applications, a larger pool of preallocated
buffers would allow a driver to handle Rx packet bursts more
gracefully.   Make the pool size tunable to match the desired max burst
size, as well as to avoid davem's Oracle problem.

At present we perform the allocation at interrupt time which is
precisely when we shouldn't.  A background kernel thread which keeps the
pool topped up would even things out and would give a significant
increase in our peak Rx rates, up to a particular burst size.

This is considerably smarter than simply tweaking the driver for huge
DMA ring sizes.  It is pretty specialised though.

One big bonus: the packets can then be allocated GFP_KERNEL rather than
GFP_ATOMIC.
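
The shape of it (a sketch; thread setup, locking and wakeups are elided,
and POOL_TARGET/POOL_SKB_SIZE are invented tunables):

    static struct sk_buff_head skb_pool;    /* skb_queue_head_init() once */

    /* Background thread body: top the pool up with GFP_KERNEL skbs. */
    static void skb_pool_refill(void)
    {
            while (skb_queue_len(&skb_pool) < POOL_TARGET) {
                    struct sk_buff *skb =
                            alloc_skb(POOL_SKB_SIZE, GFP_KERNEL);
                    if (!skb)
                            break;          /* retry on the next pass */
                    skb_queue_tail(&skb_pool, skb);
            }
    }

    /* Interrupt-time fast path: dequeue, fall back if the pool is dry. */
    static struct sk_buff *pool_alloc_skb(unsigned int size)
    {
            struct sk_buff *skb = skb_dequeue(&skb_pool);

            return skb ? skb : dev_alloc_skb(size);
    }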

Oh.

A quick measurement says dev_alloc_skb(1000) takes about 450 cycles
on x86.  At 150 kpps that's a potential 10% loss from your peak
burst rate.

But for 3c59x (which is not a very efficient driver (yet)), it takes 6
usecs to even get into the ISR, and around 4 usecs to traverse it. 
Guess another 4 to leave the ISR, guess half as much again for whoever
got interrupted to undo the resulting cache pollution.

That's 20 usec per interrupt, of which 1 usec could be saved by skb
pooling.


If you don't do Rx interrupt mitigation there's no point in even
thinking about skb pooling.



Re: Preallocated skb's?

2000-09-14 Thread Jes Sorensen

> "jamal" == jamal  <[EMAIL PROTECTED]> writes:

jamal> The FF code of the tulip does have skb recycling code.  And I
jamal> believe Jes' acenic code does or did at some point.  Robert
jamal> Olson and I were thinking of taking that code out of the
jamal> tulip for reasons such as you talk about (and the thought maybe
jamal> that the per-CPU slab might have obsoleted that
jamal> requirement). We did some tests with 2.4.0-test7 and were
jamal> surprised to observe that at a high rate of input packets, it
jamal> still made as big a difference as 7000 packets per second ;->
jamal> i.e. we got 7Kpps more by using skb recycling.

I tried recycling in the acenic driver, but after adding Ingo's early
per CPU slab caches I couldn't see any measurable performance gain
from using recycling.

Jes



Re: Preallocated skb's?

2000-09-14 Thread Andi Kleen

On Thu, Sep 14, 2000 at 04:55:16AM -0700, David S. Miller wrote:
>Date: Thu, 14 Sep 2000 06:53:37 -0400 (EDT)
>From: jamal <[EMAIL PROTECTED]>
> 
>Dave, would a scheme with an aging of the skbs in the recycle queue
>and an upper bound of the number of packets sitting on the queue be
>acceptable?
> 
> This sounds more reasonable, certainly.  Perhaps you and Jeff should
> collaborate :-)

Instead of explicit aging you could just use the skb_head slab cache for
that: just kmalloc/kfree the data area in the slab constructor/destructor.
This could probably give you most of the advantages of recycled skbs (fewer
header setups, etc.) in the slab environment.  You may need to tune the
slab cache pruning algorithms a bit for that, though, because the weight
of an skb head would be much heavier (e.g. by giving the skb-head slab
cache a bigger priority for pruning).
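
Roughly (a sketch against the slab interface with constructor/destructor,
untested; RX_DATA_SIZE is a made-up size, and error handling plus the
ctor-atomic flag are elided):

    /* Give every skb head a preallocated data area from its slab
     * constructor, and free it in the destructor, so a "fresh" skb
     * is usually just a slab cache hit. */
    static void skb_head_ctor(void *obj, kmem_cache_t *cache,
                              unsigned long flags)
    {
            struct sk_buff *skb = obj;

            skb->head = kmalloc(RX_DATA_SIZE, GFP_KERNEL);
    }

    static void skb_head_dtor(void *obj, kmem_cache_t *cache,
                              unsigned long flags)
    {
            struct sk_buff *skb = obj;

            kfree(skb->head);
    }

    /* skbuff_head_cache = kmem_cache_create("skbuff_head_cache",
     *         sizeof(struct sk_buff), 0, SLAB_HWCACHE_ALIGN,
     *         skb_head_ctor, skb_head_dtor); */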


-Andi





Re: Preallocated skb's?

2000-09-14 Thread David S. Miller

   Date: Thu, 14 Sep 2000 06:53:37 -0400 (EDT)
   From: jamal <[EMAIL PROTECTED]>

   Dave, would a scheme with an aging of the skbs in the recycle queue
   and an upper bound of the number of packets sitting on the queue be
   acceptable?

This sounds more reasonable, certainly.  Perhaps you and Jeff should
collaborate :-)

Later,
David S. Miller
[EMAIL PROTECTED]



Re: Preallocated skb's?

2000-09-14 Thread Matthew Kirkwood

On Thu, 14 Sep 2000, David S. Miller wrote:

>Does anyone think that allocating skbs during system idle time
>would be useful?
> 
> I really don't like these sorts of things, because it makes an
> assumption as to what memory is about to be used for.

I agree.  Surely The Linux Way (tm) would be to make the
allocations so cheap that preallocation would gain you
nothing.

Matthew.




Re: Preallocated skb's?

2000-09-14 Thread jamal



On Thu, 14 Sep 2000, David S. Miller wrote:

>Date:  Thu, 14 Sep 2000 04:44:53 -0400
>From: Jeff Garzik <[EMAIL PROTECTED]>
> 
>Does anyone think that allocating skbs during system idle time
>would be useful?
> 
> I really don't like these sorts of things, because it makes an
> assumption as to what memory is about to be used for.
> 
> What if you were to preallocate skbs while idle, then the next thing
> which happens is some userland program walks over a 2gb dataset and
> no network activity happens at all.
> 

The FF code of the tulip does have skb recycling code.
And I believe Jes' acenic code does or did at some point. 
Robert Olson and I were thinking of taking that code out of the
tulip for reasons such as you talk about (and the thought maybe that
the per-CPU slab might have obsoleted that requirement). We did some tests
with 2.4.0-test7 and were surprised to observe that at a high rate of input
packets, it still made as big a difference as 7000 packets per second
;-> i.e. we got 7Kpps more by using skb recycling.

Dave, would a scheme with an aging of the skbs in the recycle queue
and an upper bound of the number of packets sitting on the queue be
acceptable?
Maybe ANK can make a comment as well.
Robert and I plan to play with such a scheme for a long time under many
different scenarios and come up with numbers (throughput etc.) instead of
"here's a patch and intuitively it makes sense".
This is really a 2.5 thing if acceptable.

cheers,
jamal

PS:- OLS patch coming soon; a few more tests (as time permits);-> 




Preallocated skb's?

2000-09-14 Thread Jeff Garzik

Does anyone think that allocating skbs during system idle time would be
useful?

Net drivers (well, ethernet at least) often wind up allocating
maximum-sized skb's for use in Rx descriptors.  It seems to me that it
would be useful at interrupt time to have an skb already allocated,
falling back on current dev_alloc_skb behavior during times of system
load.

Jeff



-- 
Jeff Garzik  | Windows NT Performance,
Building 1024| on the next "In Search Of"
MandrakeSoft, Inc.   |



Re: Preallocated skb's?

2000-09-14 Thread David S. Miller

   Date:Thu, 14 Sep 2000 04:44:53 -0400
   From: Jeff Garzik <[EMAIL PROTECTED]>

   Does anyone think that allocating skbs during system idle time
   would be useful?

I really don't like these sorts of things, because it makes an
assumption as to what memory is about to be used for.

What if you were to preallocate skbs while idle, then the next thing
which happens is some userland program walks over a 2gb dataset and
no network activity happens at all.

Later,
David S. Miller
[EMAIL PROTECTED]


