Re: Preallocated skb's?
On Fri, 15 Sep 2000, Bogdan Costescu wrote:

> On Fri, 15 Sep 2000, jamal wrote:
>
> > Only the timer runs at HZ granularity ;-<
>
> Some cards provide their own high-resolution timers; the latest 3Com
> cards provide several with different purposes (none currently used).
> The question is how many of these also provide the Rx early interrupts.
> You also mentioned an auto-tunable Rx mitigation scheme. How do you
> implement it without using hardware timers?

Oh, the tulip 21143 explicitly has an interrupt mitigation timer; this
covers both the tx and rx sides. But I see you can also use a general
purpose timer on the NIC to simulate mitigation: disable rx interrupts
and other sources of noise (e.g. rx-no-buffer) and set the timer to wait
a certain number of packet times. Donald's drivers generally have this
scheme built in; however, it is a one-shot mode on rx work overload
(mostly there for interrupt sharing, according to one of Donald's old
posts). So what you do instead is keep a table of these 'mitigation'
values [1] and select the appropriate one; i.e. you have a pointer that
moves up and down the table and, based on the load, picks the correct
mitigation value. When Robert OKs the current tulip, you should be able
to see how it is done there.

> > 20 usec is probably too much time. If my math is not wrong, 1 bit
> > time at 100Mbps is 1 ns; 64 bytes is 512 ns.
>
> I think you are wrong by a factor of 10 here; 1 bit time at 100Mbps
> should be 10 ns. Then 64 bytes is 5.12 us (u=micro). Anyway, this is
> comparable with the time needed to reach the ISR, so you can have
> several (but a small number of) packets already waiting for processing.
>
> > You use the period (5-10 micros), while waiting
> > for full packet arrival, to make the route decision (lookup etc.);
> > i.e. this will allow for a better FF; it will not offload things.
>
> Just that you span several layers by doing this; it's not driver
> specific anymore.

I think we should heed Donald's advice on this early rx. I would take
Donald's word for it; he's been there, done that. He knows. E.g. the
PCI burst issue makes a lot of sense. Unless someone with the right
tools (e.g. PCI bus monitors) does some measurements and maybe
challenges Donald ;->

cheers,
jamal

[1] The table would look something like:

table[0] == 1 packet per interrupt (default); disable timer
table[1] == 2 packets per interrupt
table[2] == 3 packets per interrupt
.
.
etc. Use 64 bytes as the packet size since it is the smallest ethernet
size. As you pointed out, that is 5.12 microsecs, so 10.24 microsecs
for table[1].
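A minimal sketch of the adaptive mitigation table described above.
Everything here is hypothetical -- the entry values, the congestion
signal, and the nic_set_mitigation() helper are stand-ins for whatever
the real tulip CSR programming would be:

    /* Hypothetical adaptive-mitigation table: the index is nudged up
     * under load and decays back toward interrupt-per-packet when idle.
     * Timer bounds assume 5.12 us per minimum-sized packet at 100Mbps,
     * rounded up for slack. */

    #define MIT_TABLE_SIZE 6

    struct mit_entry {
            unsigned int pkts_per_irq;  /* packets coalesced per Rx irq */
            unsigned int timer_us;      /* NIC timer bound in microsecs */
    };

    static const struct mit_entry mit_table[MIT_TABLE_SIZE] = {
            { 1,  0 },      /* table[0]: default, timer disabled */
            { 2, 11 },
            { 3, 16 },
            { 4, 21 },
            { 6, 31 },
            { 8, 41 },
    };

    static unsigned int mit_index;

    /* stub: a real driver would write e->pkts_per_irq and e->timer_us
     * into the NIC's mitigation/general-purpose timer registers here
     * (hardware specific) */
    static void nic_set_mitigation(long ioaddr, const struct mit_entry *e)
    {
    }

    /* called from the Rx path with a congestion hint from above */
    static void mit_adjust(long ioaddr, int congested)
    {
            if (congested && mit_index < MIT_TABLE_SIZE - 1)
                    mit_index++;
            else if (!congested && mit_index > 0)
                    mit_index--;

            nic_set_mitigation(ioaddr, &mit_table[mit_index]);
    }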
Re: Preallocated skb's?
On Fri, 15 Sep 2000, Bogdan Costescu wrote:

> On Fri, 15 Sep 2000, jamal wrote:
>
> > You use the period (5-10 micros), while waiting
> > for full packet arrival, to make the route decision (lookup etc.);
> > i.e. this will allow for a better FF; it will not offload things.
>
> Just that you span several layers by doing this; it's not driver
> specific anymore.

Many chips have some sort of early-Rx feature, but it's still a bad
idea for the many reasons I've pointed out before.

An additional reason not to use early-Rx is that chips such as the
3c905C are most efficient at using the PCI bus when transferring a
whole packet in a single PCI burst (plus two smaller bursts initially
reading and later writing the descriptor). Using an early-Rx interrupt
scheme means using multiple smaller bursts. The early-Rx scheme worked
well on the ISA bus, where transfers were slow and not bursting.

Also note: it is possible to drop an Rx packet after the early-Rx
interrupt.

Donald Becker                          [EMAIL PROTECTED]
Scyld Computing Corporation            http://www.scyld.com
410 Severn Ave. Suite 210              Beowulf-II Cluster Distribution
Annapolis MD 21403
Re: Preallocated skb's?
On Fri, 15 Sep 2000, jamal wrote:

> Only the timer runs at HZ granularity ;-<

Some cards provide their own high-resolution timers; the latest 3Com
cards provide several with different purposes (none currently used).
The question is how many of these also provide the Rx early interrupts.
You also mentioned an auto-tunable Rx mitigation scheme. How do you
implement it without using hardware timers?

> 20 usec is probably too much time. If my math is not wrong, 1 bit time
> at 100Mbps is 1 ns; 64 bytes is 512 ns.

I think you are wrong by a factor of 10 here; 1 bit time at 100Mbps
should be 10 ns. Then 64 bytes is 5.12 us (u=micro). Anyway, this is
comparable with the time needed to reach the ISR, so you can have
several (but a small number of) packets already waiting for processing.

> You use the period (5-10 micros), while waiting
> for full packet arrival, to make the route decision (lookup etc.);
> i.e. this will allow for a better FF; it will not offload things.

Just that you span several layers by doing this; it's not driver
specific anymore.

Sincerely,

Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: [EMAIL PROTECTED]
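As a sanity check on the arithmetic in this exchange, a small
standalone computation of the wire times (illustrative only; the
constants are just Ethernet frame sizes and link rates):

    #include <stdio.h>

    /* Wire time for a frame: bits on the wire divided by the link rate.
     * At 100 Mbps one bit takes 10 ns, so a minimum 64-byte frame takes
     * 64 * 8 * 10 ns = 5.12 us, matching the corrected math above. */
    static double wire_time_us(unsigned bytes, double mbps)
    {
            return (bytes * 8.0) / mbps;  /* bits / (Mbit/s) == usecs */
    }

    int main(void)
    {
            printf("64B   @ 100Mbps: %6.2f us\n", wire_time_us(64, 100.0));
            printf("1000B @ 100Mbps: %6.2f us\n", wire_time_us(1000, 100.0));
            printf("64B   @  10Mbps: %6.2f us\n", wire_time_us(64, 10.0));
            return 0;   /* prints 5.12, 80.00, 51.20 */
    }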
Re: Preallocated skb's?
On Fri, 15 Sep 2000, Bogdan Costescu wrote:

> On Thu, 14 Sep 2000, jamal wrote:
>
> The 3Com cards can generate this interrupt; however, this is not used
> in the current 3c59x.c. I suggested this to Andrew, but he is already
> worried about the current interrupt rate and unhappy that 3Com cards
> do not provide hardware support for Rx mitigation.
>
> An idea might be to combine Rx early interrupts with some kind of
> software timer-based mitigation.

Only the timer runs at HZ granularity ;-<

> IMHO this has 2 advantages:
> - because of the overhead that Andrew pointed out, by the time the CPU
> reaches the ISR code and the skbuff allocation is done, the entire
> packet might already be transferred;

20 usec is probably too much time. If my math is not wrong, 1 bit time
at 100Mbps is 1 ns; 64 bytes is 512 ns. If you are waiting for 640 more
bytes, that is ~5 microsecs, or say 10 microsecs for an average 1000
bytes in a LAN. For a 10Mbps connection, ~100 microsecs. But we don't
have problems with 10Mbps. I think this is an interesting heuristic to
use. The 20 usec given by Andrew appears to me to be x86 specific and
processor dependent, though. Can you guarantee it on a 600Mhz Alpha
processor?

This is just one of those schemes which are useful, in my opinion, for
quick header inspection: while the packet is still coming in, you have
enough data to make a call. You use the period (5-10 micros), while
waiting for full packet arrival, to make the route decision (lookup
etc.); i.e. this will allow for a better FF; it will not offload
things. Instead of using full-rx interrupts as is done today, it will
make sense to receive mid-interrupts so that you are ready for the
above scheme. I know, I know: Linux is a general purpose OS.

> however, a check has to be done to assure
> that the packet was not dropped by the hardware and that you are not
> trying to fit a packet into a skbuff sized for the previous packet (in
> case several packets can be transferred during the "overhead" time)

Most schemes like that don't drop the packet once you receive partial
pieces. Other incoming packets will be dropped, though, if there is no
space.

cheers,
jamal
Re: Preallocated skb's?
On Thu, 14 Sep 2000, jamal wrote:

> If I remember correctly some of the 3coms still give this
> 'mid-interrupt', no? It could be useful to just, say, quickly read the
> header and make routing decisions as in fast routing, but not under
> heavy load.

The 3Com cards can generate this interrupt; however, this is not used
in the current 3c59x.c. I suggested this to Andrew, but he is already
worried about the current interrupt rate and unhappy that 3Com cards do
not provide hardware support for Rx mitigation.

An idea might be to combine Rx early interrupts with some kind of
software timer-based mitigation. IMHO this has 2 advantages:
- because of the overhead that Andrew pointed out, by the time the CPU
  reaches the ISR code and the skbuff allocation is done, the entire
  packet might already be transferred; however, a check has to be done
  to assure that the packet was not dropped by the hardware and that
  you are not trying to fit a packet into a skbuff sized for the
  previous packet (in case several packets can be transferred during
  the "overhead" time);
- under load, because interrupts occur anyway (the Rx early ones), you
  don't lose anything in terms of latency.

Sincerely,

Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: [EMAIL PROTECTED]
Re: Preallocated skb's?
On Thu, Sep 14, 2000 at 10:26:08PM -0400, jamal wrote:
> One of the things we need to measure still is the latency. The scheme
> currently used, with dynamically adjusting the mitigation parameters,
> might not affect latency much -- simply because the adjustment is
> based on the load. We still have to prove this. The theory is:
> under a lot of congestion, you delay longer because the layers above
> you are congested, as gauged from a feedback; and under low
> congestion, you should theoretically adjust all the way down to
> 1 interrupt/packet. Under heavy load, your latency is already screwed
> anyway because of the large backlog queue; this is regardless of
> mitigation.

Or maybe the extra delay in congested circumstances will cause more
timeouts, and that's precisely when you need to improve latency?

--
Victor Yodaiken
Finite State Machine Labs: The RTLinux Company.
www.fsmlabs.com  www.rtlinux.com
Re: Preallocated skb's?
On Thu, 14 Sep 2000, Donald Becker wrote:

> No, because I know I sound like a broken record.

<skip><skip> ;->

> What we measured is that the cache impact of allocating and
> initializing our (ever-larger) skbuffs is huge. So we pay some CPU
> time getting a new skbuff, and some more CPU time later reloading the
> cache with useful data.
>
> The skbuff is added to the end of the driver Rx buffer list, so the
> memory lines are out of the cache by the time we need them.

So is there a workable solution to this?

> The Rx ring should be able to hold at least
>     (interrupt-latency * 100/1000Mbps) bits
> and
>     (interrupt-latency * 100/1000Mbps)/(64 bytes/packet * 8 bits/byte) packets

cool. Assuming 64 bytes because it is the minimal-sized ethernet
packet?

> The PCI drivers make some effort to always allocate the same size
> skbuff, so recycling skbuffs, or otherwise optimizing their
> allocation, is useful.

Good to hear this from you.

> The only significant advantage of interrupt mitigation is cache
> locality when allocating new skbuffs, and having an additional
> mechanism to drop packets under overwhelming load.
>
> The disadvantage of Rx interrupt mitigation is adding latency just
> where it might matter the most.

One of the things we need to measure still is the latency. The scheme
currently used, with dynamically adjusting the mitigation parameters,
might not affect latency much -- simply because the adjustment is based
on the load. We still have to prove this. The theory is:
under a lot of congestion, you delay longer because the layers above
you are congested, as gauged from a feedback; and under low congestion,
you should theoretically adjust all the way down to 1 interrupt/packet.
Under heavy load, your latency is already screwed anyway because of the
large backlog queue; this is regardless of mitigation.

> Remember that the hot ticket for old-IPX performance
> was taking an *extra* early interrupt for each Rx packet.

If I remember correctly some of the 3coms still give this
'mid-interrupt', no? It could be useful to just, say, quickly read the
header and make routing decisions as in fast routing, but not under
heavy load.

cheers,
jamal
Re: Preallocated skb's?
On Thu, 14 Sep 2000, jamal wrote:

> On Thu, 14 Sep 2000, Andrew Morton wrote:
> > But for 3c59x (which is not a very efficient driver (yet)), it takes
> > 6 usecs to even get into the ISR, and around 4 usecs to traverse it.
> > Guess another 4 to leave the ISR, guess half as much again for
> > whoever got interrupted to undo the resulting cache pollution.
> >
> > That's 20 usec per interrupt, of which 1 usec could be saved by skb
> > pooling.
>
> With these numbers + how long it takes to queue the packets in
> netif_rx(), I would say you roughly should be able to tune your DMA
> ring appropriately.
>
> Roughly your DMA ring should be able to hold:
>
>     (PCI_burst_bandwidth * ((20 * 10^-6) + PCI_bus_latency)) bits
>
> Did I hear Donald say something? ;->

No, because I know I sound like a broken record.

What we measured is that the cache impact of allocating and
initializing our (ever-larger) skbuffs is huge. So we pay some CPU time
getting a new skbuff, and some more CPU time later reloading the cache
with useful data.

The skbuff is added to the end of the driver Rx buffer list, so the
memory lines are out of the cache by the time we need them.

The Rx ring should be able to hold at least
    (interrupt-latency * 100/1000Mbps) bits
and
    (interrupt-latency * 100/1000Mbps)/(64 bytes/packet * 8 bits/byte) packets

> > If you don't do Rx interrupt mitigation there's no point in even
> > thinking about skb pooling.
>
> FF does not use mitigation, and as Robert was pointing out, this was
> adding a lot of value.

The PCI drivers make some effort to always allocate the same size
skbuff, so recycling skbuffs, or otherwise optimizing their allocation,
is useful.

The only significant advantage of interrupt mitigation is cache
locality when allocating new skbuffs, and having an additional
mechanism to drop packets under overwhelming load.

The disadvantage of Rx interrupt mitigation is adding latency just
where it might matter the most. Remember that the hot ticket for
old-IPX performance was taking an *extra* early interrupt for each Rx
packet.

Donald Becker                          [EMAIL PROTECTED]
Scyld Computing Corporation            http://www.scyld.com
410 Severn Ave. Suite 210              Beowulf-II Cluster Distribution
Annapolis MD 21403
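Donald's sizing rule is easy to sanity-check with a standalone
computation. The ~20 usec latency figure below is just Andrew's
estimate from earlier in the thread, not a measured constant:

    #include <stdio.h>

    /* Rx ring sizing rule of thumb: the ring must absorb everything
     * the wire can deliver during one interrupt-service latency;
     * minimum-sized 64-byte frames are the worst case for the packet
     * count. */
    int main(void)
    {
            double latency_s = 20e-6;               /* ~20 usec */
            double rate_bps[] = { 100e6, 1000e6 };  /* 100Mbps, 1Gbps */
            int i;

            for (i = 0; i < 2; i++) {
                    double bits = latency_s * rate_bps[i];
                    double pkts = bits / (64 * 8);
                    printf("%5.0f Mbps: %6.0f bits  >= %5.1f min-size packets\n",
                           rate_bps[i] / 1e6, bits, pkts);
            }
            return 0;   /* ~4 packets at 100Mbps, ~39 at 1Gbps */
    }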
Re: Preallocated skb's?
On Thu, 14 Sep 2000, Andrew Morton wrote:

> But for 3c59x (which is not a very efficient driver (yet)), it takes 6
> usecs to even get into the ISR, and around 4 usecs to traverse it.
> Guess another 4 to leave the ISR, guess half as much again for whoever
> got interrupted to undo the resulting cache pollution.
>
> That's 20 usec per interrupt, of which 1 usec could be saved by skb
> pooling.

With these numbers + how long it takes to queue the packets in
netif_rx(), I would say you roughly should be able to tune your DMA
ring appropriately.

Roughly your DMA ring should be able to hold:

    (PCI_burst_bandwidth * ((20 * 10^-6) + PCI_bus_latency)) bits

Did I hear Donald say something? ;->

> If you don't do Rx interrupt mitigation there's no point in even
> thinking about skb pooling.

FF does not use mitigation, and as Robert was pointing out, this was
adding a lot of value.

cheers,
jamal
Re: Preallocated skb's?
What Alexey's code does is _not_ preallocation -- it does recycling. On
tx completion, the skb is recycled onto a recycle queue unless the
queue is full (the limit is a tunable parameter), in which case it is
freed. This is more sensible than doing preallocation during idle times
or other smart schemes. On a busy system this queue will always have
something.

What I meant by aging is to have a separate thread that prunes the
queue based on age, i.e. how long the skb has been sitting there, etc.
I think Jes had a bottom-half running there; a simple per-CPU timer
might suffice. The heuristics (such as the timer decay etc.) for this
part need a study, and that's what Robert and I are planning to do.

cheers,
jamal
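A minimal sketch of the recycle queue described above -- not Alexey's
actual code. All names are hypothetical; a real version would need
per-CPU queues, a reset of the skb state before reuse, and an
skb_queue_head_init() call at driver init:

    #include <linux/skbuff.h>
    #include <linux/timer.h>

    #define RECYCLE_MAX     64      /* tunable upper bound */

    static struct sk_buff_head recycle_q;

    /* tx-completion path: keep the skb instead of freeing it, if room */
    static void recycle_skb(struct sk_buff *skb)
    {
            if (skb_queue_len(&recycle_q) >= RECYCLE_MAX) {
                    dev_kfree_skb_irq(skb); /* queue full: really free */
                    return;
            }
            skb_queue_tail(&recycle_q, skb);
    }

    /* rx-refill path: prefer a recycled skb, else hit the allocator */
    static struct sk_buff *get_rx_skb(unsigned int size)
    {
            struct sk_buff *skb = skb_dequeue(&recycle_q);

            if (!skb)
                    skb = dev_alloc_skb(size);
            return skb;
    }

    /* the "aging" idea: a periodic per-CPU timer pruning stale entries
     * (2.4-era timer callback signature) */
    static void recycle_prune(unsigned long data)
    {
            struct sk_buff *skb;

            /* crude decay: drop half the queue each tick when idle */
            while (skb_queue_len(&recycle_q) > RECYCLE_MAX / 2 &&
                   (skb = skb_dequeue(&recycle_q)) != NULL)
                    dev_kfree_skb(skb);
    }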
Re: Preallocated skb's?
Yes! The FF experiments with 2.1.X indicated an improvement factor of
about 2-3 times with skb recycling. With the combination of FF and skb
recycling we could reach fast Ethernet wire-speed forwarding on a 400
Mhz CPU: about ~147 kpps.

As jamal reported, the improvement is much less today, but the
forwarding performance is impressive even without FF and skb recycling.
Slab seems to do a good job, especially when the debug is disabled. :-)

--ro

Andi Kleen writes:
> On Thu, Sep 14, 2000 at 11:59:32PM +1100, Andrew Morton wrote:
> > That's 20 usec per interrupt, of which 1 usec could be saved by skb
> > pooling.
>
> FF usually runs with interrupt mitigation at higher rates (8-16 or
> even more packets / interrupt). I agree though that it probably does
> not make too much difference. alloc_skb could probably be made cheaper
> for the FF case by being more clever in the slab constructor (I think
> there was some bitrot during 2.3 on the cache line usage -- 2.2 pretty
> much only needed 2 cache lines in the header for a FF packet)
>
> -Andi
Re: Preallocated skb's?
On Thu, Sep 14, 2000 at 11:59:32PM +1100, Andrew Morton wrote:
> That's 20 usec per interrupt, of which 1 usec could be saved by skb
> pooling.

FF usually runs with interrupt mitigation at higher rates (8-16 or even
more packets / interrupt). I agree though that it probably does not
make too much difference. alloc_skb could probably be made cheaper for
the FF case by being more clever in the slab constructor (I think there
was some bitrot during 2.3 on the cache line usage -- 2.2 pretty much
only needed 2 cache lines in the header for a FF packet)

-Andi
Re: Preallocated skb's?
jamal wrote:
>
> The FF code of the tulip does have skb recycling code.
> And I believe Jes' acenic code does or did at some point.

But this isn't preallocation. Unless you got cute, this scheme would
limit the "preallocation" to the DMA ring size.

For network-intensive applications, a larger pool of preallocated
buffers would allow a driver to handle Rx packet bursts more
gracefully. Make the pool size tunable to match the desired max burst
size, as well as to avoid davem's Oracle problem.

At present we perform the allocation at interrupt time, which is
precisely when we shouldn't. A background kernel thread which keeps the
pool topped up would even things out and would give a significant
increase in our peak Rx rates, up to a particular burst size. This is
considerably smarter than simply tweaking the driver for huge DMA ring
sizes. It is pretty specialised, though. One big bonus: the packets can
then be allocated GFP_KERNEL rather than GFP_ATOMIC.

Oh. A quick measurement says dev_alloc_skb(1000) takes about 450 cycles
on x86. At 150 kpps that's a potential 10% loss from your peak burst
rate.

But for 3c59x (which is not a very efficient driver (yet)), it takes 6
usecs to even get into the ISR, and around 4 usecs to traverse it.
Guess another 4 to leave the ISR, guess half as much again for whoever
got interrupted to undo the resulting cache pollution.

That's 20 usec per interrupt, of which 1 usec could be saved by skb
pooling.

If you don't do Rx interrupt mitigation there's no point in even
thinking about skb pooling.
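A minimal sketch of the pool idea Andrew describes -- a background
thread keeps a reserve of max-sized skbs topped up with GFP_KERNEL, and
the Rx path drains it before falling back to the atomic allocator. All
names are hypothetical; a real version needs skb_queue_head_init() at
startup, kernel_thread() to launch the refiller, and a wakeup on
pool_wait when the pool runs low:

    #include <linux/skbuff.h>
    #include <linux/sched.h>

    #define POOL_TARGET     256     /* tunable: desired max burst size */
    #define RX_BUF_SZ       1536    /* max-sized ethernet rx buffer */

    static struct sk_buff_head skb_pool;
    static DECLARE_WAIT_QUEUE_HEAD(pool_wait);

    /* interrupt-time refill: take from the pool, else GFP_ATOMIC */
    static struct sk_buff *pool_alloc_skb(void)
    {
            struct sk_buff *skb = skb_dequeue(&skb_pool);

            if (!skb)
                    skb = dev_alloc_skb(RX_BUF_SZ); /* atomic fallback */
            return skb;
    }

    /* background kernel thread: top the pool up at leisure */
    static int pool_thread(void *unused)
    {
            for (;;) {
                    while (skb_queue_len(&skb_pool) < POOL_TARGET) {
                            struct sk_buff *skb =
                                    alloc_skb(RX_BUF_SZ + 16, GFP_KERNEL);

                            if (!skb)
                                    break;  /* memory pressure: retry later */
                            skb_reserve(skb, 16); /* mirror dev_alloc_skb */
                            skb_queue_tail(&skb_pool, skb);
                    }
                    /* sleep until the Rx path signals the pool is low */
                    interruptible_sleep_on(&pool_wait);
            }
            return 0;
    }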
Re: Preallocated skb's?
>>>>> "jamal" == jamal <[EMAIL PROTECTED]> writes:

jamal> The FF code of the tulip does have skb recycling code. And I
jamal> believe Jes' acenic code does or did at some point. Robert
jamal> Olson and I were thinking of taking that code out of the
jamal> tulip for reasons such as you talk about (and the thought that
jamal> maybe the per-CPU slab might have obsoleted that
jamal> requirement). We did some tests with 2.4.0-test7 and were
jamal> surprised to observe that at a high rate of input packets, it
jamal> still made as big a difference as 7000 packets per second ;->
jamal> i.e. we got 7 kpps more by using skb recycling.

I tried recycling in the acenic driver, but after adding Ingo's early
per-CPU slab caches I couldn't see any measurable performance gain from
using recycling.

Jes
Re: Preallocated skb's?
On Thu, Sep 14, 2000 at 04:55:16AM -0700, David S. Miller wrote:
>    Date: Thu, 14 Sep 2000 06:53:37 -0400 (EDT)
>    From: jamal <[EMAIL PROTECTED]>
>
>    Dave, would a scheme with an aging of the skbs in the recycle
>    queue and an upper bound on the number of packets sitting on the
>    queue be acceptable?
>
> This sounds more reasonable, certainly. Perhaps you and Jeff should
> collaborate :-)

Instead of explicit aging you could just use the skb_head slab cache
for that: just kmalloc/kfree the data area in the slab
constructor/destructor. This could probably give you most of the
advantages of recycled skbs (fewer header setups) etc. in the slab
environment. You may need to tune the slab cache pruning algorithms for
that a bit, though, because the weight of a skbhead would be much
heavier (e.g. by giving the skbhead slab cache a bigger priority for
pruning).

-Andi
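A rough sketch of Andi's idea under the 2.4-era slab API -- purely
illustrative, not the actual net/core/skbuff.c code. The constructor
attaches a data area when the slab is populated, so a "new" skb from
the cache already carries its buffer; the destructor frees it when the
slab is reaped. A real version must check kmalloc failure and honor
SLAB_CTOR_ATOMIC in the constructor flags (using GFP_ATOMIC then):

    #include <linux/skbuff.h>
    #include <linux/slab.h>
    #include <linux/init.h>

    static kmem_cache_t *skb_head_cache;

    static void skb_head_ctor(void *p, kmem_cache_t *cache,
                              unsigned long flags)
    {
            struct sk_buff *skb = p;

            /* hypothetical: give every cached head a max-sized buffer */
            skb->head = kmalloc(1536, GFP_KERNEL);
    }

    static void skb_head_dtor(void *p, kmem_cache_t *cache,
                              unsigned long flags)
    {
            struct sk_buff *skb = p;

            kfree(skb->head);
    }

    static int __init skb_head_cache_init(void)
    {
            skb_head_cache = kmem_cache_create("skb_head_demo",
                                               sizeof(struct sk_buff),
                                               0, SLAB_HWCACHE_ALIGN,
                                               skb_head_ctor,
                                               skb_head_dtor);
            return skb_head_cache ? 0 : -1;
    }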
Re: Preallocated skb's?
   Date: Thu, 14 Sep 2000 06:53:37 -0400 (EDT)
   From: jamal <[EMAIL PROTECTED]>

   Dave, would a scheme with an aging of the skbs in the recycle queue
   and an upper bound on the number of packets sitting on the queue be
   acceptable?

This sounds more reasonable, certainly. Perhaps you and Jeff should
collaborate :-)

Later,
David S. Miller
[EMAIL PROTECTED]
Re: Preallocated skb's?
On Thu, 14 Sep 2000, David S. Miller wrote:

>    Does anyone think that allocating skbs during system idle time
>    would be useful?
>
> I really don't like these sorts of things, because it makes an
> assumption as to what memory is about to be used for.

I agree. Surely The Linux Way (tm) would be to make the allocations so
cheap that preallocation would gain you nothing.

Matthew.
Re: Preallocated skb's?
On Thu, 14 Sep 2000, David S. Miller wrote:

>    Date: Thu, 14 Sep 2000 04:44:53 -0400
>    From: Jeff Garzik <[EMAIL PROTECTED]>
>
>    Does anyone think that allocating skbs during system idle time
>    would be useful?
>
> I really don't like these sorts of things, because it makes an
> assumption as to what memory is about to be used for.
>
> What if you were to preallocate skbs while idle, then the next thing
> which happens is some userland program walks over a 2gb dataset and
> no network activity happens at all.

The FF code of the tulip does have skb recycling code. And I believe
Jes' acenic code does or did at some point. Robert Olson and I were
thinking of taking that code out of the tulip for reasons such as you
talk about (and the thought that maybe the per-CPU slab might have
obsoleted that requirement). We did some tests with 2.4.0-test7 and
were surprised to observe that at a high rate of input packets, it
still made as big a difference as 7000 packets per second ;-> i.e. we
got 7 kpps more by using skb recycling.

Dave, would a scheme with an aging of the skbs in the recycle queue and
an upper bound on the number of packets sitting on the queue be
acceptable? Maybe ANK can make a comment as well. Robert and I plan to
play with such a scheme for a long time under many different scenarios
and come up with numbers (throughput etc.) instead of "here's a patch
and intuitively it makes sense". This is really a 2.5 thing, if
acceptable.

cheers,
jamal

PS: OLS patch coming soon; a few more tests (as time permits) ;->
Preallocated skb's?
Does anyone think that allocating skbs during system idle time would be
useful?

Net drivers (well, ethernet at least) often wind up allocating
maximum-sized skbs for use in Rx descriptors. It seems to me that it
would be useful at interrupt time to have an skb already allocated,
falling back on the current dev_alloc_skb behavior during times of
system load.

Jeff

--
Jeff Garzik              | Windows NT Performance,
Building 1024            | on the next "In Search Of"
MandrakeSoft, Inc.       |
Re: Preallocated skb's?
   Date: Thu, 14 Sep 2000 04:44:53 -0400
   From: Jeff Garzik <[EMAIL PROTECTED]>

   Does anyone think that allocating skbs during system idle time
   would be useful?

I really don't like these sorts of things, because it makes an
assumption as to what memory is about to be used for.

What if you were to preallocate skbs while idle, and then the next
thing which happens is some userland program walks over a 2gb dataset
and no network activity happens at all?

Later,
David S. Miller
[EMAIL PROTECTED]