Re: [pktgen script v2 0/2] Add a pktgen sample script of NUMA awareness
On Mon, 18 Sep 2017 11:06:21 +0200 Jesper Dangaard Brouer wrote: > On Sun, 17 Sep 2017 20:36:36 +0800 Robert Hoo > wrote: > > > Change log > > v2: > > Rebased to > > https://github.com/netoptimizer/network-testing/tree/master/pktgen > > Hi Robert, > > Thank you for submitting this against my git tree[1]. I skimmed the > patches and they looked okay. I'll give them a test run, before I > accept them into my tree. > > Later I'll synchronize my pktgen scripts/git-tree with the kernel via > regular patches against DaveM's net-next tree[2] (and I'll try to > remember to give you author credit). > > [1] https://github.com/netoptimizer/network-testing/tree/master/pktgen > [2] > https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/samples/pktgen FYI, I've applied and pushed these patches to my tree. https://github.com/netoptimizer/network-testing/commits?author=robert-hoo https://github.com/netoptimizer/network-testing/commit/1b9b4b797a4f112 https://github.com/netoptimizer/network-testing/commit/65efc2352f63dde https://github.com/netoptimizer/network-testing/commit/54eb5178aaf4031 I fixed the description a bit, and only made one simple change: https://github.com/netoptimizer/network-testing/commit/9ff58568b3f8c91 Thanks for working on improving the pktgen scripts :-) -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
Re: [pktgen script v2 0/2] Add a pktgen sample script of NUMA awareness
On Sun, 17 Sep 2017 20:36:36 +0800 Robert Hoo wrote: > Change log > v2: > Rebased to https://github.com/netoptimizer/network-testing/tree/master/pktgen Hi Robert, Thank you for submitting this against my git tree[1]. I skimmed the patches and they looked okay. I'll give them a test run, before I accept them into my tree. Later I'll synchronize my pktgen scripts/git-tree with the kernel via regular patches against DaveM's net-next tree[2] (and I'll try to remember to give you author credit). [1] https://github.com/netoptimizer/network-testing/tree/master/pktgen [2] https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/samples/pktgen -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
Re: pktgen question
From: Steve Wise <[EMAIL PROTECTED]> Date: Mon, 08 Oct 2007 17:46:26 -0500 > We're talking about just for pktgen...eh? My bad, I'm happy to review a patch that uses the skb->destructor in pktgen to achieve this. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen question
David Miller wrote: From: Ben Greear <[EMAIL PROTECTED]> Date: Mon, 08 Oct 2007 14:57:13 -0700 This skb recycling can certainly work and has been done several times before. It never gets into the kernel though. Because it doesn't work. A socket can hang onto a receive packet essentially forever. You cannot therefore rely upon the network stack to give you the packet back in some reasonable finite amount of time. This is simply the nature of the beast. Which means that you either: 1) Starve and stop receiving packets when the recycling ring runs out because all of those packets are stuck in socket buffers. This is easily DoS'able by users on your system 2) End up allocating new packets anyways, but then what's the point of the recycling ring? It's just a hack that works when everything goes as planned, and in real life that is rarely the case. It also makes the driver locking more complicated. Packet input occurs in NAPI drivers via netif_receive_skb() which would be capable of recycling packets back to the same driver in a recycling scheme. But the recycling can occur from other contexts too. The netif_receive_skb() caller already has atomic access to the receive queue, but those other callback cases do not. That locking issue is just the tip of the iceberg. Once you start discussing solutions, all sorts of new issues begin to pop up. SKB recycling, just say no. We're talking about just for pktgen...eh? - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen question
From: Ben Greear <[EMAIL PROTECTED]> Date: Mon, 08 Oct 2007 14:57:13 -0700 > This skb recycling can certainly work and has been done several > times before. It never gets into the kernel though. Because it doesn't work. A socket can hang onto a receive packet essentially forever. You cannot therefore rely upon the network stack to give you the packet back in some reasonable finite amount of time. This is simply the nature of the beast. Which means that you either: 1) Starve and stop receiving packets when the recycling ring runs out because all of those packets are stuck in socket buffers. This is easily DoS'able by users on your system 2) End up allocating new packets anyways, but then what's the point of the recycling ring? It's just a hack that works when everything goes as planned, and in real life that is rarely the case. It also makes the driver locking more complicated. Packet input occurs in NAPI drivers via netif_receive_skb() which would be capable of recycling packets back to the same driver in a recycling scheme. But the recycling can occur from other contexts too. The netif_receive_skb() caller already has atomic access to the receive queue, but those other callback cases do not. That locking issue is just the tip of the iceberg. Once you start discussing solutions, all sorts of new issues begin to pop up. SKB recycling, just say no. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen question
Steve Wise wrote: Ben Greear wrote: Rick Jones wrote: Perf-wise, you could clone the skbs up front, then deliver them to the nic in a tight loop. This would mitigate the added overhead introduced by calling skb_clone() in the loop doing transmits... That only works if you are sending a small number of skbs. You can't pre-clone several minutes worth of 10Gbe traffic with any normal amount of RAM. Does pktgen really need to allocate anything more than some smallish fraction more than the depth of the driver's transmit queue? If you want to send sustained high rates of traffic, for more than just a trivial amount of time, then you either have to play the current trick with the skb_get(), or you have to allocate a real packet each time (maybe with skb_clone() or similar, but it's still more overhead than the skb_get which only bumps a reference count.) I see no other way, but if you can think of one, please let me know. You can keep freed skb's that were cloned on a free list, then reuse them once freed. You can detect when the driver frees them by adding a destroy function to the skb. So what will happen is the set of cloned skbs needed will eventually settled down to a constent amount and the amount will be based on the latency involved in transmitting a single skb. And it should be bounded by the max txq depth. Yes? (or am I all wet :) So you would pay the overhead of cloning only until you hit this steady state. Whatchathink? This skb recycling can certainly work and has been done several times before. It never gets into the kernel though. Also, still a lot more overhead than incrementing the in-use counter since you will have to re-initialize the pkt (you don't know what all the underlying devices might have done to it.) The number needed would be bound by all of the queues involved. There could be several queues if you are running pktgen on virtual devices (with potential queues between virtual devs and hardware devs). Still, I at least would be interested in seeing a patch if you put one together. You should get buy-in from Robert and/or DaveM if you want it to have a chance to get into the kernel. Thanks, Ben -- Ben Greear <[EMAIL PROTECTED]> Candela Technologies Inc http://www.candelatech.com - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen question
Ben Greear wrote: Rick Jones wrote: Perf-wise, you could clone the skbs up front, then deliver them to the nic in a tight loop. This would mitigate the added overhead introduced by calling skb_clone() in the loop doing transmits... That only works if you are sending a small number of skbs. You can't pre-clone several minutes worth of 10Gbe traffic with any normal amount of RAM. Does pktgen really need to allocate anything more than some smallish fraction more than the depth of the driver's transmit queue? If you want to send sustained high rates of traffic, for more than just a trivial amount of time, then you either have to play the current trick with the skb_get(), or you have to allocate a real packet each time (maybe with skb_clone() or similar, but it's still more overhead than the skb_get which only bumps a reference count.) I see no other way, but if you can think of one, please let me know. You can keep freed skb's that were cloned on a free list, then reuse them once freed. You can detect when the driver frees them by adding a destroy function to the skb. So what will happen is the set of cloned skbs needed will eventually settled down to a constent amount and the amount will be based on the latency involved in transmitting a single skb. And it should be bounded by the max txq depth. Yes? (or am I all wet :) So you would pay the overhead of cloning only until you hit this steady state. Whatchathink? Thanks, Ben - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen question
Rick Jones wrote: Perf-wise, you could clone the skbs up front, then deliver them to the nic in a tight loop. This would mitigate the added overhead introduced by calling skb_clone() in the loop doing transmits... That only works if you are sending a small number of skbs. You can't pre-clone several minutes worth of 10Gbe traffic with any normal amount of RAM. Does pktgen really need to allocate anything more than some smallish fraction more than the depth of the driver's transmit queue? If you want to send sustained high rates of traffic, for more than just a trivial amount of time, then you either have to play the current trick with the skb_get(), or you have to allocate a real packet each time (maybe with skb_clone() or similar, but it's still more overhead than the skb_get which only bumps a reference count.) I see no other way, but if you can think of one, please let me know. Thanks, Ben -- Ben Greear <[EMAIL PROTECTED]> Candela Technologies Inc http://www.candelatech.com - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen question
Perf-wise, you could clone the skbs up front, then deliver them to the nic in a tight loop. This would mitigate the added overhead introduced by calling skb_clone() in the loop doing transmits... That only works if you are sending a small number of skbs. You can't pre-clone several minutes worth of 10Gbe traffic with any normal amount of RAM. Does pktgen really need to allocate anything more than some smallish fraction more than the depth of the driver's transmit queue? Of course, even that won't fit in the L1 cache of a processor, so if one is really just trying to max-out the NIC it might still have too much overhead, but then does pktgen+driver et al actually fit in an L1 cache? rick jones - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen question
From: Steve Wise <[EMAIL PROTECTED]> Date: Mon, 24 Sep 2007 08:54:13 -0500 > I think pktgen should be cloning the skbs using skb_clone(). Then it > will work for all devices, eh? The problem is that skb_clone() is (relatively) expensive and pktgen is trying to just grab a reference to the SKB in the absolutely cheapest way possible. In my personal opinion, I think what the drivers that don't work with pktgen are doing is wrong and they should find another way to pass state around other than to depend upon being able to use the ->cb[] area at-will. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen question
Hi, Steve Wise writes: > I think pktgen should be cloning the skbs using skb_clone(). Then it > will work for all devices, eh? pktgen assumes for "fastpath" sending exclusive ownership of the skb. And does a skb_get to avoid final skb destruction so the same skb can be sent over and over. The idea is to avoid memory allocation and keep things in cache to give very high packet rates with identical packets. I But if you need to alter the packet then the skb_get trick can't be done. And you have to turn off "fastpath" with clone_skb > Perf-wise, you could clone the skbs up front, then deliver them to the > nic in a tight loop. This would mitigate the added overhead introduced > by calling skb_clone() in the loop doing transmits... Sure it's can be done. It could replay sequences etc but it will not beat the skb_get trick in sending identical packets. It has been proposed before but I've avoided such efforts to keep things relatively small and simple. Really pktgen should be reworked to have s small skim in kernel and move the rest of the stuff to userland. Cheers. --ro - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen question
Steve Wise wrote: Ben Greear wrote: Steve Wise wrote: I think pktgen should be cloning the skbs using skb_clone(). Then it will work for all devices, eh? That might work, but it would decrease performance slightly (or, increase CPU load at least). Perf-wise, you could clone the skbs up front, then deliver them to the nic in a tight loop. This would mitigate the added overhead introduced by calling skb_clone() in the loop doing transmits... That only works if you are sending a small number of skbs. You can't pre-clone several minutes worth of 10Gbe traffic with any normal amount of RAM. Maybe a new option: multi_clone If the current code is busted, I think it should be fixed. Well, it works fine when used correctly :) Ben -- Ben Greear <[EMAIL PROTECTED]> Candela Technologies Inc http://www.candelatech.com - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen question
Ben Greear wrote: Steve Wise wrote: I think pktgen should be cloning the skbs using skb_clone(). Then it will work for all devices, eh? That might work, but it would decrease performance slightly (or, increase CPU load at least). Perf-wise, you could clone the skbs up front, then deliver them to the nic in a tight loop. This would mitigate the added overhead introduced by calling skb_clone() in the loop doing transmits... Maybe a new option: multi_clone If the current code is busted, I think it should be fixed. My 2 cents. Steve. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen question
Steve Wise wrote: I think pktgen should be cloning the skbs using skb_clone(). Then it will work for all devices, eh? That might work, but it would decrease performance slightly (or, increase CPU load at least). Maybe a new option: multi_clone Ben -- Ben Greear <[EMAIL PROTECTED]> Candela Technologies Inc http://www.candelatech.com - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen question
I think pktgen should be cloning the skbs using skb_clone(). Then it will work for all devices, eh? Ben Greear wrote: jamal wrote: On Sun, 2007-23-09 at 12:55 -0500, Steve Wise wrote: Its a hack that breaks cxgb3 because cxgb3 uses the skb->cb area for each skb passed down. So cxgb3 is at fault then? IE a driver cannot use the skb->cb field if the users count is > 1? Or maybe a driver can _never_ use the cb field? any layer can use the cb structure whichever way they wish. There are violations, e.g: the vlan code also uses the cb field to pass vlan details for hardware acceleration. How does pktgen affect it though, clone() will just copy the cb field and pktgen doesnt touch it. In retrospect, pktgen may have to use clone - ccing Robert Olsson. Pktgen abuses the ref count, as far as I can tell. It works with most ethernet drivers, but that multi-pkt will also fail with things like vlans because the skb->dev is changed as it is transmitted (to the lower-level device). I'd say just don't use the multi-pkt with pktgen on devices that can't handle it. Thanks, Ben - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen question
jamal wrote: On Sun, 2007-23-09 at 12:55 -0500, Steve Wise wrote: Its a hack that breaks cxgb3 because cxgb3 uses the skb->cb area for each skb passed down. So cxgb3 is at fault then? IE a driver cannot use the skb->cb field if the users count is > 1? Or maybe a driver can _never_ use the cb field? any layer can use the cb structure whichever way they wish. There are violations, e.g: the vlan code also uses the cb field to pass vlan details for hardware acceleration. How does pktgen affect it though, clone() will just copy the cb field and pktgen doesnt touch it. In retrospect, pktgen may have to use clone - ccing Robert Olsson. Pktgen abuses the ref count, as far as I can tell. It works with most ethernet drivers, but that multi-pkt will also fail with things like vlans because the skb->dev is changed as it is transmitted (to the lower-level device). I'd say just don't use the multi-pkt with pktgen on devices that can't handle it. Thanks, Ben -- Ben Greear <[EMAIL PROTECTED]> Candela Technologies Inc http://www.candelatech.com - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen question
On Sun, 2007-23-09 at 12:55 -0500, Steve Wise wrote: > Its a hack that breaks cxgb3 because cxgb3 uses the skb->cb area for > each skb passed down. So cxgb3 is at fault then? IE a driver cannot > use the skb->cb field if the users count is > 1? Or maybe a driver can > _never_ use the cb field? any layer can use the cb structure whichever way they wish. There are violations, e.g: the vlan code also uses the cb field to pass vlan details for hardware acceleration. How does pktgen affect it though, clone() will just copy the cb field and pktgen doesnt touch it. In retrospect, pktgen may have to use clone - ccing Robert Olsson. cheers, jamal - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen question
Hi Steve. On Sun, Sep 23, 2007 at 11:12:12AM -0500, Steve Wise ([EMAIL PROTECTED]) wrote: > The pktgen module provides a way to "clone" the skb its using for > transmission, and allows passing N clones of the originally created skb > to the driver under test.However, it doesn't really use skb_clone(), > but rather it just bumps the skb->users count for each "clone" and > passes the same skb ptr to the driver. > > Q: Is that a valid use of skb->users or should pktgen really be cloning > the skbuff? It's a hack, but since skb is owned by pktgen only (no copies in some outside queues or some other access) it is allowed just to bump reference counter (i.e. 'share' skb in usual notation). -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen question
Evgeniy Polyakov wrote: Hi Steve. On Sun, Sep 23, 2007 at 11:12:12AM -0500, Steve Wise ([EMAIL PROTECTED]) wrote: The pktgen module provides a way to "clone" the skb its using for transmission, and allows passing N clones of the originally created skb to the driver under test.However, it doesn't really use skb_clone(), but rather it just bumps the skb->users count for each "clone" and passes the same skb ptr to the driver. Q: Is that a valid use of skb->users or should pktgen really be cloning the skbuff? It's a hack, but since skb is owned by pktgen only (no copies in some outside queues or some other access) it is allowed just to bump reference counter (i.e. 'share' skb in usual notation). Its a hack that breaks cxgb3 because cxgb3 uses the skb->cb area for each skb passed down. So cxgb3 is at fault then? IE a driver cannot use the skb->cb field if the users count is > 1? Or maybe a driver can _never_ use the cb field? Steve. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [sis900] convert to NAPI, WAS Re: pktgen terminating condition
On Wed, 2007-05-09 at 22:06 -0700, Mandeep Singh Baines wrote: > jamal ([EMAIL PROTECTED]) wrote: > > If you read the paper: There are no issues with high throughput - NAPI > > kicks in. > > The challenge to be overcome is at low traffic, if you have a real fast > > processor your cpu-cycles-used/bits-processed ratio is high > > I'm not sure cpu-cycles-used/bits-processed is the correct metric to use. > An alternative would be to look at cpu-cycles-used/unit-time (i.e. > CPU utilization or load) for a given bit-rate or packet-rate. I wasnt explicit but we are saying the same thing. The paper is more specific: if you make the packet count used constant - this translates to a unit of time given low traffic rate. If you fix the time, cpu-cycles are easier to extrapolate from. > This would make an interesting graph. If you are to plot a cpu-util vs packet rate, on most modern hardware you will see a spike before you hit the MLFRR and then utilization will go down and slowly start going up. > At low packet-rate, CPU utilization is low so doing extra work per packet > is probably OK. Thats the arguement weve used in the past ;-> Unfortunately, with the relative cost of IO going up you cant keep making that arguement (at least thats the position presented in the paper). Your mileage may vary. If you are running a machine dishing out bulk transfers probably not a big deal. If you are doing database transactions, you loose in benchmarks etc. > Utilizing 2% of CPU vesus 1% is negligible. But at higher > rate, when there is more CPU utilization, using 40% of CPU versus 60% is > significant. I think the absolute different in CPU utilization is more > important than the relative difference. iow, 2% versus 1%, even though > a 2x difference in cpu-cycles/packet, is negligible compared to 40% > versus 60%. Refer to above. > Using a timer might also behave better in a tick-less (CONFIG_NO_HZ) > configuration. good point. cheers, jamal - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [sis900] convert to NAPI, WAS Re: pktgen terminating condition
jamal ([EMAIL PROTECTED]) wrote: > On Wed, 2007-05-09 at 14:55 +0100, James Chapman wrote: > > > Thanks Jamal. Yes, I'd already read your paper. I think my idea is > > different to the ideas described in your paper > > I am hoping you can pick from the lessons of what has been tried and > failed and the justification for critiqueing something as "Failed". > If you have - i feel i accomplished something useful writting the paper. > > > and I'm in the process of > > writing it up as an RFC to post to netdev. > > Please cc me if you want my feedback - I am backlogged by about 3000 > messages on netdev. > > > Briefly though, the driver's > > ->poll() holds off doing netif_rx_complete() for 1-2 jiffies and keeps > > itself in poll_on mode for that time, consuming no quota. > > The net_rx > > softirq processing loop is modified to detect the case when the only > > devices in its poll list are doing no work (consuming no quota). The > > driver's ->poll() samples jiffies while it is considering when to do the > > netif_rx_complete() like your Parked state - no timers are used. > > Ok, so the difference seems to be you actually poll instead for those > jiffies instead starting a timer in the parked state - is that right? > > > If I missed that this approach has already been tried before and > > rejected, please let me know. I see better throughput and latency in my > > packet forwarding and LAN setups using it. > > If you read the paper: There are no issues with high throughput - NAPI > kicks in. > The challenge to be overcome is at low traffic, if you have a real fast > processor your cpu-cycles-used/bits-processed ratio is high I'm not sure cpu-cycles-used/bits-processed is the correct metric to use. An alternative would be to look at cpu-cycles-used/unit-time (i.e. CPU utilization or load) for a given bit-rate or packet-rate. This would make an interesting graph. At low packet-rate, CPU utilization is low so doing extra work per packet is probably OK. Utilizing 2% of CPU vesus 1% is negligible. But at higher rate, when there is more CPU utilization, using 40% of CPU versus 60% is significant. I think the absolute different in CPU utilization is more important than the relative difference. iow, 2% versus 1%, even though a 2x difference in cpu-cycles/packet, is negligible compared to 40% versus 60%. > If you are polling (softirqs have high prio and depending on the cpu, > there could be a few gazillion within those 1-2 jiffies), then isnt the > end result still a high cpu-cycles used? > Thats what the timer tries to avoid (do nothing until some deffered > point). Using a timer might also behave better in a tick-less (CONFIG_NO_HZ) configuration. > If you waste cpu cycles and poll, I can see that (to emphasize: For > FastCPU-LowTraffic scenario), you will end up _not_ having latency > issues i pointed out, but you surely have digressed from the original > goal which is to address the cpu abuse at low traffic (because you abuse > more cpu). > > One thing that may be valuable is to show that the timers and polls are > not much different in terms of cpu abuse (It's theoretically not true, > but reality may not match). > The other thing is now hrestimers are on, can you try using a piece of > hardware that can get those kicked and see if you see any useful results > on latency. > > cheers, > jamal > - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [sis900] convert to NAPI, WAS Re: pktgen terminating condition
On Wed, 2007-05-09 at 14:55 +0100, James Chapman wrote: > Thanks Jamal. Yes, I'd already read your paper. I think my idea is > different to the ideas described in your paper I am hoping you can pick from the lessons of what has been tried and failed and the justification for critiqueing something as "Failed". If you have - i feel i accomplished something useful writting the paper. > and I'm in the process of > writing it up as an RFC to post to netdev. Please cc me if you want my feedback - I am backlogged by about 3000 messages on netdev. > Briefly though, the driver's > ->poll() holds off doing netif_rx_complete() for 1-2 jiffies and keeps > itself in poll_on mode for that time, consuming no quota. > The net_rx > softirq processing loop is modified to detect the case when the only > devices in its poll list are doing no work (consuming no quota). The > driver's ->poll() samples jiffies while it is considering when to do the > netif_rx_complete() like your Parked state - no timers are used. Ok, so the difference seems to be you actually poll instead for those jiffies instead starting a timer in the parked state - is that right? > If I missed that this approach has already been tried before and > rejected, please let me know. I see better throughput and latency in my > packet forwarding and LAN setups using it. If you read the paper: There are no issues with high throughput - NAPI kicks in. The challenge to be overcome is at low traffic, if you have a real fast processor your cpu-cycles-used/bits-processed ratio is high If you are polling (softirqs have high prio and depending on the cpu, there could be a few gazillion within those 1-2 jiffies), then isnt the end result still a high cpu-cycles used? Thats what the timer tries to avoid (do nothing until some deffered point). If you waste cpu cycles and poll, I can see that (to emphasize: For FastCPU-LowTraffic scenario), you will end up _not_ having latency issues i pointed out, but you surely have digressed from the original goal which is to address the cpu abuse at low traffic (because you abuse more cpu). One thing that may be valuable is to show that the timers and polls are not much different in terms of cpu abuse (It's theoretically not true, but reality may not match). The other thing is now hrestimers are on, can you try using a piece of hardware that can get those kicked and see if you see any useful results on latency. cheers, jamal - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [sis900] convert to NAPI, WAS Re: pktgen terminating condition
jamal wrote: On Wed, 2007-05-09 at 13:03 +0100, James Chapman wrote: I have a patch that solves the high interrupt rate problem by keeping the driver in polled mode longer. It's written for the latest NAPI version that DaveM posted recently. I'll try to get some time to write it up and post it for comment. Theres interesting challenges to be tackled with resolving that. Just so you dont reinvent the wheel or to help invent a better one; please read: http://kernel.org/pub/linux/kernel/people/hadi/docs/UKUUG2005.pdf if you have, what are the differences in your approach - and when do you find it useful? Thanks Jamal. Yes, I'd already read your paper. I think my idea is different to the ideas described in your paper and I'm in the process of writing it up as an RFC to post to netdev. Briefly though, the driver's ->poll() holds off doing netif_rx_complete() for 1-2 jiffies and keeps itself in poll_on mode for that time, consuming no quota. The net_rx softirq processing loop is modified to detect the case when the only devices in its poll list are doing no work (consuming no quota). The driver's ->poll() samples jiffies while it is considering when to do the netif_rx_complete() like your Parked state - no timers are used. If I missed that this approach has already been tried before and rejected, please let me know. I see better throughput and latency in my packet forwarding and LAN setups using it. -- James Chapman Katalix Systems Ltd http://www.katalix.com Catalysts for your Embedded Linux software development - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [sis900] convert to NAPI, WAS Re: pktgen terminating condition
On Wed, 2007-05-09 at 13:03 +0100, James Chapman wrote: > I have a patch that solves the high interrupt rate problem by keeping > the driver in polled mode longer. It's written for the latest NAPI > version that DaveM posted recently. I'll try to get some time to write > it up and post it for comment. Theres interesting challenges to be tackled with resolving that. Just so you dont reinvent the wheel or to help invent a better one; please read: http://kernel.org/pub/linux/kernel/people/hadi/docs/UKUUG2005.pdf if you have, what are the differences in your approach - and when do you find it useful? cheers, jamal PS:- Mandeep, i was going to suggest thresholds but you beat me to it with your latest patch. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [sis900] convert to NAPI, WAS Re: pktgen terminating condition
Mandeep Singh Baines wrote: Daniele Venzano ([EMAIL PROTECTED]) wrote: The patch looks good and I think it can be pushed higher (-mm ?) for some wider testing. I don't have the hardware available to do some tests myself, unfortunately, but it would be similar to yours anyway. I'd like to know how this works for people with less memory and slower CPU, but any kind of test run will be appreciated. Hi Daniele, I tested the driver under different setting of CPU clock. Under a lower clock, I get less interrupts (what I was hoping for) and slightly less performance. Under higher clock, I get better performance and almost 1 interrupt per packet. I have a patch that solves the high interrupt rate problem by keeping the driver in polled mode longer. It's written for the latest NAPI version that DaveM posted recently. I'll try to get some time to write it up and post it for comment. -- James Chapman Katalix Systems Ltd http://www.katalix.com Catalysts for your Embedded Linux software development - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [sis900] convert to NAPI, WAS Re: pktgen terminating condition
Daniele Venzano ([EMAIL PROTECTED]) wrote: > The patch looks good and I think it can be pushed higher (-mm ?) for some > wider > testing. I don't have the hardware available to do some tests myself, > unfortunately, but it would be similar to yours anyway. > > I'd like to know how this works for people with less memory and slower CPU, > but any > kind of test run will be appreciated. Hi Daniele, I tested the driver under different setting of CPU clock. Under a lower clock, I get less interrupts (what I was hoping for) and slightly less performance. Under higher clock, I get better performance and almost 1 interrupt per packet. I also made a tiny change to the driver which resulted in slightly better performance and less interrupts. I made a one-line change to finish_xmit, which now wakes up the transmit queue if there are at least 16 available slots in the tx ring. Previously, the driver would wake up the transmit queue if there was just a single slot available. This should result in better hysteresis. Attached are my test results and a new patch. Signed-off-by: Mandeep Singh Baines <[EMAIL PROTECTED]> -- diff --git a/drivers/net/sis900.c b/drivers/net/sis900.c index 7c6e480..40c1aee 100644 --- a/drivers/net/sis900.c +++ b/drivers/net/sis900.c @@ -185,7 +185,6 @@ struct sis900_private { dma_addr_t tx_ring_dma; dma_addr_t rx_ring_dma; - unsigned int tx_full; /* The Tx queue is full. */ u8 host_bridge_rev; u8 chipset_rev; }; @@ -202,8 +201,10 @@ MODULE_PARM_DESC(max_interrupt_work, "SiS 900/7016 maximum events handled per in MODULE_PARM_DESC(sis900_debug, "SiS 900/7016 bitmapped debugging message level"); #ifdef CONFIG_NET_POLL_CONTROLLER -static void sis900_poll(struct net_device *dev); +static void sis900_poll_controller(struct net_device *dev); #endif + +static int sis900_poll(struct net_device *dev, int *budget); static int sis900_open(struct net_device *net_dev); static int sis900_mii_probe (struct net_device * net_dev); static void sis900_init_rxfilter (struct net_device * net_dev); @@ -216,8 +217,8 @@ static void sis900_tx_timeout(struct net_device *net_dev); static void sis900_init_tx_ring(struct net_device *net_dev); static void sis900_init_rx_ring(struct net_device *net_dev); static int sis900_start_xmit(struct sk_buff *skb, struct net_device *net_dev); -static int sis900_rx(struct net_device *net_dev); -static void sis900_finish_xmit (struct net_device *net_dev); +static int sis900_rx(struct net_device *net_dev, int limit); +static int sis900_finish_xmit (struct net_device *net_dev); static irqreturn_t sis900_interrupt(int irq, void *dev_instance); static int sis900_close(struct net_device *net_dev); static int mii_ioctl(struct net_device *net_dev, struct ifreq *rq, int cmd); @@ -474,9 +475,11 @@ static int __devinit sis900_probe(struct pci_dev *pci_dev, net_dev->tx_timeout = sis900_tx_timeout; net_dev->watchdog_timeo = TX_TIMEOUT; net_dev->ethtool_ops = &sis900_ethtool_ops; + net_dev->poll = &sis900_poll; + net_dev->weight = 64; #ifdef CONFIG_NET_POLL_CONTROLLER -net_dev->poll_controller = &sis900_poll; +net_dev->poll_controller = &sis900_poll_controller; #endif if (sis900_debug > 0) @@ -979,13 +982,44 @@ static u16 sis900_reset_phy(struct net_device *net_dev, int phy_addr) return status; } +static int sis900_poll(struct net_device *dev, int *budget) +{ + struct sis900_private *sis_priv = dev->priv; + long ioaddr = dev->base_addr; + int limit = min_t(int, dev->quota, *budget); + int rx_work_done; + int tx_work_done; + + /* run TX completion thread */ + spin_lock(sis_priv->lock); + tx_work_done = sis900_finish_xmit(dev); + spin_unlock(sis_priv->lock); + + /* run RX thread */ + rx_work_done = sis900_rx(dev, limit); + *budget -= rx_work_done; + dev->quota -= rx_work_done; + + /* re-enable interrupts if no work done */ + if (rx_work_done + tx_work_done == 0) { + netif_rx_complete(dev); + /* Enable all known interrupts. */ + outl((RxSOVR|RxORN|RxERR|RxOK|TxURN|TxERR|TxOK), ioaddr + imr); + /* Handle rotting packet */ + sis900_rx(dev, NUM_RX_DESC); + return 0; + } + + return 1; +} + #ifdef CONFIG_NET_POLL_CONTROLLER /* * Polling 'interrupt' - used by things like netconsole to send skbs * without having to re-enable interrupts. It's not called while * the interrupt routine is executing. */ -static void sis900_poll(struct net_device *dev) +static void sis900_poll_controller(struct net_device *dev) { disable_irq(dev->irq); sis900_interrupt(dev->irq, dev); @@ -1032,7 +1066,7 @@ sis900_open(struct net_device *net_dev) sis900_set_mode(ioaddr, HW_SPEED_10_MBPS, FDX_CAPABLE_HALF_SELECTED); /* Enable all known interrupts by setting the inter
Re: [PATCH] [sis900] convert to NAPI, WAS Re: pktgen terminating condition
Cool. I'll try to see if I can clock my pc lower and run the experiments again. I'll measure cpu utilization also this time around. That should be useful for extrapolating. Regards, Mandeep On 9/4/07, Daniele Venzano <[EMAIL PROTECTED]> wrote: > - Message d'origine - > De: Mandeep Singh Baines <[EMAIL PROTECTED]> > Date: Mon, 3 Sep 2007 20:20:36 -0700 > Sujet: [PATCH] [sis900] convert to NAPI, WAS Re: pktgen terminating condition > > >Hi Daniele, > > > >Attached is a patch for converting the sis900 driver to NAPI. Please take a > >look at let me know what you think. I'm not 100% sure if I'm handling the > >rotting packet issue correctly or that I have the locking right in > >tx_timeout. > >These might be areas to look at closely. > > > >I didn't see much saving in interrupts on my machine (too fast, I guess). But > >I still think its beneficial: pushing work out of the interrupt handler into > >a bottom half is a good thing and we no longer need to disable interrupts > >in start_xmit. > > > >I did see a significant boost to tx performance by optimizing start_xmit: > >more > >than double pps in pktgen. > > > >I'm also attaching some test results for various iterations of development. > > The patch looks good and I think it can be pushed higher (-mm ?) for some > wider > testing. I don't have the hardware available to do some tests myself, > unfortunately, but it would be similar to yours anyway. > > I'd like to know how this works for people with less memory and slower CPU, > but any > kind of test run will be appreciated. > > Thanks, bye. > > > -- > Daniele Venzano > [EMAIL PROTECTED] > - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [sis900] convert to NAPI, WAS Re: pktgen terminating condition
On 9/4/07, jamal <[EMAIL PROTECTED]> wrote: > On Mon, 2007-03-09 at 20:20 -0700, Mandeep Singh Baines wrote: > > > I didn't see much saving in interrupts on my machine (too fast, I guess). > > You could try the idea suggested by Dave earlier and just turn interupts > for every nth packet. That should cut down the numbers. > I wanted to do that but I couldn't figure out how to free the last skb if it happened to be a transmit that didn't generate a tx completion interrupt. Let's say I interrupt every 4th packet.I've already sent packets 0 to 4. Now I send packets 5 and 6. I can't figure out how to ensure that the skb's for these packets get freed in a deterministic amount of time if there is no interrupt? They will get freed when packet 8 gets transmitted but there is no upper bound on when that will happen. Maybe I could create a tx skb cleanup timer that I kickoff in hard_start_xmit(). Every call to hard_start_xmit would reset the timer. I could try this for a future patch. Not sure if the extra code would cost me more than I get back in savings. > > I did see a significant boost to tx performance by optimizing start_xmit: > > more > > than double pps in pktgen. > > 148Kpps on a slow piece of hardware aint bad - Good Stuff. I wonder how > much CPU is being abused. > > If you wanna go one extra mile (separate future patch): get rid of that > tx lock and use netif_tx_lock on the interupt path. Look at some sane > driver like tg3 for reference. > Cool. I'll check out the tg3. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [sis900] convert to NAPI, WAS Re: pktgen terminating condition
- Message d'origine - De: Mandeep Singh Baines <[EMAIL PROTECTED]> Date: Mon, 3 Sep 2007 20:20:36 -0700 Sujet: [PATCH] [sis900] convert to NAPI, WAS Re: pktgen terminating condition >Hi Daniele, > >Attached is a patch for converting the sis900 driver to NAPI. Please take a >look at let me know what you think. I'm not 100% sure if I'm handling the >rotting packet issue correctly or that I have the locking right in tx_timeout. >These might be areas to look at closely. > >I didn't see much saving in interrupts on my machine (too fast, I guess). But >I still think its beneficial: pushing work out of the interrupt handler into >a bottom half is a good thing and we no longer need to disable interrupts >in start_xmit. > >I did see a significant boost to tx performance by optimizing start_xmit: more >than double pps in pktgen. > >I'm also attaching some test results for various iterations of development. The patch looks good and I think it can be pushed higher (-mm ?) for some wider testing. I don't have the hardware available to do some tests myself, unfortunately, but it would be similar to yours anyway. I'd like to know how this works for people with less memory and slower CPU, but any kind of test run will be appreciated. Thanks, bye. -- Daniele Venzano [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [sis900] convert to NAPI, WAS Re: pktgen terminating condition
On Mon, 2007-03-09 at 20:20 -0700, Mandeep Singh Baines wrote: > I didn't see much saving in interrupts on my machine (too fast, I guess). You could try the idea suggested by Dave earlier and just turn interupts for every nth packet. That should cut down the numbers. > I did see a significant boost to tx performance by optimizing start_xmit: more > than double pps in pktgen. 148Kpps on a slow piece of hardware aint bad - Good Stuff. I wonder how much CPU is being abused. If you wanna go one extra mile (separate future patch): get rid of that tx lock and use netif_tx_lock on the interupt path. Look at some sane driver like tg3 for reference. cheers, jamal - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] [sis900] convert to NAPI, WAS Re: pktgen terminating condition
Hi Daniele, Attached is a patch for converting the sis900 driver to NAPI. Please take a look at let me know what you think. I'm not 100% sure if I'm handling the rotting packet issue correctly or that I have the locking right in tx_timeout. These might be areas to look at closely. I didn't see much saving in interrupts on my machine (too fast, I guess). But I still think its beneficial: pushing work out of the interrupt handler into a bottom half is a good thing and we no longer need to disable interrupts in start_xmit. I did see a significant boost to tx performance by optimizing start_xmit: more than double pps in pktgen. I'm also attaching some test results for various iterations of development. Regards, Mandeep Daniele Venzano ([EMAIL PROTECTED]) wrote: > - Message d'origine - > De: jamal <[EMAIL PROTECTED]> > Date: Fri, 31 Aug 2007 09:50:03 -0400 > Sujet: Re: Re: pktgen terminating condition > > >I dont know if you followed the discussion - by defering the freeing of > >skbs, you will be slowing down socket apps sending from the local > >machine. It may be ok if the socket buffers were huge, but that comes at > >the cost of system memory (which may not be a big deal) > >Do you by any chance recall why you used the idle interupt instead of > >txok to kick the prunning of tx descriptors? > > That should be asked to the original author of the driver, that I wasn't able > to contact when I took over the maintainership several years ago. > I think that since this chip is usually used on cheap/low performance > (relatively speaking) devices it was felt that generating an interrupt for > each transmitted packet was going to affect the performance of the system too > much. At the start, if I remember correctly, this was a chip thought for > consumer use, where transmissions won't happen continuously for long periods > of time. Then the sis900 started to get used everywhere, from embedded > systems to (cheap) server motherboards and the usage scenario changed. > > > -- > Daniele Venzano > [EMAIL PROTECTED] Tested using pktgen. cpuinfo: processor : 0 vendor_id : AuthenticAMD cpu family : 6 model : 8 model name : AMD Geode NX stepping: 1 cpu MHz : 1397.641 cache size : 256 KB fdiv_bug: no hlt_bug : no f00f_bug: no coma_bug: no fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mp mmxext 3dnowext 3dnow ts fid vid bogomips: 2796.00 clflush size: 32 meminfo: MemTotal: 907092 kB Results in pps. Sending 40 60-byte packets. Other than Iteration 5 and 6, there is 1 interrupt per packet. Iteration 0 (replace TxIDLE with TxOK): 6, 62052, 62335 Iteration 1 (optimize sis900_start_xmit): 148815, 148815, 148829 Iteration 2 (convert Rx to NAPI): 148669, 148635, 148677 Iteration 3 (remove tx_full dev state variable): 148677, 148528, 148532 Iteration 4 (optimize finish_xmit): 148677, 148676, 148689 Iteration 5 (move tx completion code into NAPI poll): 147751, 147658, 147809 Interrupts: 359270, 360747, 358010 Iteration 6 (cleanup + rotting packet handling + rx optimization): 148652, 148680, 148681 Interrupts: 399927, 399841, 399835 diff --git a/drivers/net/sis900.c b/drivers/net/sis900.c index 7c6e480..862e15a 100644 --- a/drivers/net/sis900.c +++ b/drivers/net/sis900.c @@ -185,7 +185,6 @@ struct sis900_private { dma_addr_t tx_ring_dma; dma_addr_t rx_ring_dma; - unsigned int tx_full; /* The Tx queue is full. */ u8 host_bridge_rev; u8 chipset_rev; }; @@ -202,8 +201,10 @@ MODULE_PARM_DESC(max_interrupt_work, "SiS 900/7016 maximum events handled per in MODULE_PARM_DESC(sis900_debug, "SiS 900/7016 bitmapped debugging message level"); #ifdef CONFIG_NET_POLL_CONTROLLER -static void sis900_poll(struct net_device *dev); +static void sis900_poll_controller(struct net_device *dev); #endif + +static int sis900_poll(struct net_device *dev, int *budget); static int sis900_open(struct net_device *net_dev); static int sis900_mii_probe (struct net_device * net_dev); static void sis900_init_rxfilter (struct net_device * net_dev); @@ -216,8 +217,8 @@ static void sis900_tx_timeout(struct net_device *net_dev); static void sis900_init_tx_ring(struct net_device *net_dev); static void sis900_init_rx_ring(struct net_device *net_dev); static int sis900_start_xmit(struct sk_buff *skb, struct net_device *net_dev); -static int sis900_rx(struct net_device *net_dev); -static void sis900_finish_xmit (struct net_device *net_dev); +static int sis900_rx(struct net_device *net_dev, int limit); +static int sis900_finish_xmit (struct net_device *net_dev); static
Re: pktgen terminating condition
- Message d'origine - De: jamal <[EMAIL PROTECTED]> Date: Fri, 31 Aug 2007 09:50:03 -0400 Sujet: Re: Re: pktgen terminating condition >I dont know if you followed the discussion - by defering the freeing of >skbs, you will be slowing down socket apps sending from the local >machine. It may be ok if the socket buffers were huge, but that comes at >the cost of system memory (which may not be a big deal) >Do you by any chance recall why you used the idle interupt instead of >txok to kick the prunning of tx descriptors? That should be asked to the original author of the driver, that I wasn't able to contact when I took over the maintainership several years ago. I think that since this chip is usually used on cheap/low performance (relatively speaking) devices it was felt that generating an interrupt for each transmitted packet was going to affect the performance of the system too much. At the start, if I remember correctly, this was a chip thought for consumer use, where transmissions won't happen continuously for long periods of time. Then the sis900 started to get used everywhere, from embedded systems to (cheap) server motherboards and the usage scenario changed. -- Daniele Venzano [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: pktgen terminating condition
On Fri, 2007-31-08 at 14:17 +0200, Daniele Venzano wrote: > I don't regard the TxOK solution as something usable for mainline, but it has > its > use for the users of pktgen. I dont know if you followed the discussion - by defering the freeing of skbs, you will be slowing down socket apps sending from the local machine. It may be ok if the socket buffers were huge, but that comes at the cost of system memory (which may not be a big deal) Do you by any chance recall why you used the idle interupt instead of txok to kick the prunning of tx descriptors? cheers, jamal - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen terminating condition
On Thu, 2007-30-08 at 22:19 -0700, David Miller wrote: > You could implement this quite simply using skb->destructor. Thats what i was thinking .. > It will add some atomics, so on weaker pktgen source systems > it might decrease the generators rate. Indeed. So maybe a config option instead; it has value afaics in the batching case only. Need to experiment. cheers, jamal - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: pktgen terminating condition
- Message d'origine - De: "Mandeep Baines" <[EMAIL PROTECTED]> Date: Wed, 29 Aug 2007 09:59:42 -0700 Sujet: Re: pktgen terminating condition >> Looks good to me given the desire. I would bounce it by whoever the >> maintainer is - they may have some insights on the lazy tx prune habit. >> > >+ [EMAIL PROTECTED] >+ [EMAIL PROTECTED] > >I'll work on a NAPI patch. Here I'm (the maintainer). Sorry for the delay replying. I don't regard the TxOK solution as something usable for mainline, but it has its use for the users of pktgen. NAPI is the way to go. Thanks, bye. -- Daniele Venzano [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen terminating condition
From: jamal <[EMAIL PROTECTED]> Date: Thu, 30 Aug 2007 08:08:40 -0400 > I was confusing it with another issue where pktgen could send a lot of > packets without waiting for them to be freed; there are some drivers > (10G) which may hold onto 8K skbs. A gazillion ooms start spewing ;-> My > thinking in resolving that was to do something like waht sockets do and > charge pktgen so it doesnt have too many outstanding packets in flight. You could implement this quite simply using skb->destructor. It will add some atomics, so on weaker pktgen source systems it might decrease the generators rate. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen terminating condition
On Wed, 2007-29-08 at 18:32 +0200, Robert Olsson wrote: > Yes it's synchronization issue... the test is over and we have sent > all pkts to the device but pktgen cannot free the skb for it still > has refcounts. Ok, right. I was confusing it with another issue where pktgen could send a lot of packets without waiting for them to be freed; there are some drivers (10G) which may hold onto 8K skbs. A gazillion ooms start spewing ;-> My thinking in resolving that was to do something like waht sockets do and charge pktgen so it doesnt have too many outstanding packets in flight. > IMO must drivers have provisions to handle situation like. Mandeep was saying he found less than a handful that didnt conform. > I'll > guess we can discuss last-resort timer if it should be us, ms > or possibly seconds but shouldn't need ping to make this happen. > If so we probably have a ICMP packet sitting there waiting instead. I think as long as it doesnt affect throughput calculation (it just adds to idle time) its fine. cheers, jamal - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen terminating condition
On Wed, 2007-29-08 at 09:59 -0700, Mandeep Baines wrote: > I'll work on a NAPI patch. It's a GoodThing - go for it. cheers, jamal - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen terminating condition
On 8/29/07, jamal <[EMAIL PROTECTED]> wrote: > On Tue, 2007-28-08 at 21:43 -0700, Mandeep Singh Baines wrote: > > > I interpret this to mean that the interrupt gets generates after a packet > > is transferred to the TFIFO on the NIC and the next packet in the ring is > > NULL. > > iow, when tx transits to idle .. > > > This interrupt gets generates less often then TxOK which gets generated > > after every completed packet transmit. So I'm thinking that maybe the > > driver was written to use this interrupt instead of TxOK for this reason. > > Really just my speculation. > > It does make sense if you are trying to cut down on tx interupts. It's a > little extreme - and if you dont have much memory it may not be the best > scheme. OTOH, you have moved to the other extreme imo ;-> You are > generating an interupt per tx packet. This may not matter on modern > hardware because that NIC seems to be Fast Ethernet. > You may wanna heed Dave's advice and just always set the idle on a > per-descriptor basis as well as txcomplete on every Xth packet (4 was > suggested). My hardware at home is not so modern:( I'm running a AMD Geode NX1750. I like to punish myself:) Actually, it does the job and is very power efficient. Agreed, interrupting every packet is inefficient. A better solution is probably NAPI. I'll work on a new patch. > Looking at sis900_start_xmit() at the spot where OWN is set > on the descriptor, theres possibly another bit in tx_ring[entry].cmdsts > you may could set that will ask for tx complete; which makes me wonder: > is that "outl(TxENA | inl(ioaddr + cr), ioaddr + cr)" really needed on a > per-packet basis? Should it not be sufficient to turn it on once at > open()? > You have to set this bit to restart the Tx State Machine. You could optimize this if you use the TxIdle interrupt to tell you when you really need to set this bit. > > Anyway, if I rewrite the driver to use TxOK instead of TxIdle, pktgen > > works fine. > > I think its a good thing pktgen caught this; i am unsure however if it > is doing the right thing. Hoping Robert would respond. > One thing pktgen could do is restrict the amount of outstanding buffers > by using accounting the way sockets do - this at least wont stop > progress as it did in your case. > > > I'm attaching the patch. > > Looks good to me given the desire. I would bounce it by whoever the > maintainer is - they may have some insights on the lazy tx prune habit. > + [EMAIL PROTECTED] + [EMAIL PROTECTED] I'll work on a NAPI patch. Regards, Mandeep - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen terminating condition
jamal writes: > On Tue, 2007-28-08 at 21:43 -0700, Mandeep Singh Baines wrote: > I think its a good thing pktgen caught this; i am unsure however if it > is doing the right thing. Hoping Robert would respond. > One thing pktgen could do is restrict the amount of outstanding buffers > by using accounting the way sockets do - this at least wont stop > progress as it did in your case. Hello, Yes it's synchronization issue... the test is over and we have sent all pkts to the device but pktgen cannot free the skb for it still has refcounts. IMO must drivers have provisions to handle situation like. I'll guess we can discuss last-resort timer if it should be us, ms or possibly seconds but shouldn't need ping to make this happen. If so we probably have a ICMP packet sitting there waiting instead. Cheers --ro - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen terminating condition
On 8/29/07, Mandeep Baines <[EMAIL PROTECTED]> wrote: ... > Unfortunately, the TxIdle interrupt would not free the last odd packet. > However, if we want to quiet interrupts couldn't the driver just be > converted NAPI. If this seems reasonable I could work on a patch. It seems reasonable and that's what I was thinking when you submitted the patch to use TXOk. I think NAPI could be used with either TxOK or TxIdle interrupts. For TxIdle to work, the driver would have to recognize one more packet is still in flight and poll for completion of that packet - then re-enable interrupts and switch back to interrupt mode of operation. I just don't know if netperf TCP_RR perf numbers will suffer (assuming TCP_RR is more important than CPU utilization/efficiency on heavy TX loads like pktgen.) grant - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen terminating condition
On Tue, 2007-28-08 at 21:43 -0700, Mandeep Singh Baines wrote: > I interpret this to mean that the interrupt gets generates after a packet > is transferred to the TFIFO on the NIC and the next packet in the ring is > NULL. iow, when tx transits to idle .. > This interrupt gets generates less often then TxOK which gets generated > after every completed packet transmit. So I'm thinking that maybe the > driver was written to use this interrupt instead of TxOK for this reason. > Really just my speculation. It does make sense if you are trying to cut down on tx interupts. It's a little extreme - and if you dont have much memory it may not be the best scheme. OTOH, you have moved to the other extreme imo ;-> You are generating an interupt per tx packet. This may not matter on modern hardware because that NIC seems to be Fast Ethernet. You may wanna heed Dave's advice and just always set the idle on a per-descriptor basis as well as txcomplete on every Xth packet (4 was suggested). Looking at sis900_start_xmit() at the spot where OWN is set on the descriptor, theres possibly another bit in tx_ring[entry].cmdsts you may could set that will ask for tx complete; which makes me wonder: is that "outl(TxENA | inl(ioaddr + cr), ioaddr + cr)" really needed on a per-packet basis? Should it not be sufficient to turn it on once at open()? > Anyway, if I rewrite the driver to use TxOK instead of TxIdle, pktgen > works fine. I think its a good thing pktgen caught this; i am unsure however if it is doing the right thing. Hoping Robert would respond. One thing pktgen could do is restrict the amount of outstanding buffers by using accounting the way sockets do - this at least wont stop progress as it did in your case. > I'm attaching the patch. Looks good to me given the desire. I would bounce it by whoever the maintainer is - they may have some insights on the lazy tx prune habit. cheers, jamal - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen terminating condition
From: Mandeep Singh Baines <[EMAIL PROTECTED]> Date: Tue, 28 Aug 2007 21:43:52 -0700 > Here is what the datasheet has to say about TxIdle: > > "This event is signaled when the transmit state machine enters the idle state > from a non-idle state. This will happen whenever the state machine encounters > an "end-of-list" condition (NULL link field or a descriptor with OWN clear)." > > I interpret this to mean that the interrupt gets generates after a packet > is transferred to the TFIFO on the NIC and the next packet in the ring is > NULL. > > This interrupt gets generates less often then TxOK which gets generated > after every completed packet transmit. So I'm thinking that maybe the > driver was written to use this interrupt instead of TxOK for this reason. > Really just my speculation. I see, so essentially it doesn't interrupt until the entire TX ring is empty and has been sent onto the wire. Yes, this would be exactly sub-optimal for pktgen or in fact any application :-) It seems that the INTbit in the TX descriptor status of the SIS190 can be an interrupt trigger. In that case, a reasonable and quite common scheme would be to set that bit every 1/4 of the TX ring. And also enable the TX idle interrupt. So if the TX ring size is 8 entries and you received a set of TX sends you'd set the interrupt status bits like this: TX descr interrupt bit packet 0: none packet 1: INTbit packet 2: none packet 3: INTbit packet 4: none packet 5: INTbit packet 6: none packet 7: INTbit And you'd get 4 TX descriptor based interrupts, one for each INTbit and probably free up 2 TX packets each time (or more if there is some overlap). The idle interrupt bit take care of the case where you have an odd number of packets (say removing packet 7 in the trace above) to make sure those sub-1/4 group of TX frames get freed up and processed in a deterministic amount of time. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen terminating condition
On Tue, 2007-28-08 at 20:28 -0400, jamal wrote: > On Tue, 2007-28-08 at 17:06 -0700, David Miller wrote: > > > Devices need to free packets in a deterministic amount of time, no > > matter what. > > If i understood the driver pointed to - "finite amount of time" spec is > still met. Seems some interupt is generated (TX_IDLE) when the tx path > gets idle and this will kick the tx cleanup. Mandeep, correct me if i am > wrong. If my statement is correct, i wouldnt call this a driver bug;-> > > Robert, I think i asked you this same question on that piece of code > once - and iirc you said its because you had to synchronise and make > sure the packet made it to the wire. In this scenario, the packet has > made it to the wire - just hasnt been freed. Note that when teaching > pktgen to do batching, i removed that logic and things work fine. Hi Jamal, Here is what the datasheet has to say about TxIdle: "This event is signaled when the transmit state machine enters the idle state from a non-idle state. This will happen whenever the state machine encounters an "end-of-list" condition (NULL link field or a descriptor with OWN clear)." I interpret this to mean that the interrupt gets generates after a packet is transferred to the TFIFO on the NIC and the next packet in the ring is NULL. This interrupt gets generates less often then TxOK which gets generated after every completed packet transmit. So I'm thinking that maybe the driver was written to use this interrupt instead of TxOK for this reason. Really just my speculation. Anyway, if I rewrite the driver to use TxOK instead of TxIdle, pktgen works fine. I'm attaching the patch. Signed-off-by: Mandeep Singh Baines <[EMAIL PROTECTED]> --- diff --git a/drivers/net/sis900.c b/drivers/net/sis900.c index 7c6e480..69e5a57 100644 --- a/drivers/net/sis900.c +++ b/drivers/net/sis900.c @@ -1032,7 +1032,7 @@ sis900_open(struct net_device *net_dev) sis900_set_mode(ioaddr, HW_SPEED_10_MBPS, FDX_CAPABLE_HALF_SELECTED); /* Enable all known interrupts by setting the interrupt mask. */ - outl((RxSOVR|RxORN|RxERR|RxOK|TxURN|TxERR|TxIDLE), ioaddr + imr); + outl((RxSOVR|RxORN|RxERR|RxOK|TxURN|TxERR|TxOK), ioaddr + imr); outl(RxENA | inl(ioaddr + cr), ioaddr + cr); outl(IE, ioaddr + ier); @@ -1557,7 +1557,7 @@ static void sis900_tx_timeout(struct net_device *net_dev) outl(sis_priv->tx_ring_dma, ioaddr + txdp); /* Enable all known interrupts by setting the interrupt mask. */ - outl((RxSOVR|RxORN|RxERR|RxOK|TxURN|TxERR|TxIDLE), ioaddr + imr); + outl((RxSOVR|RxORN|RxERR|RxOK|TxURN|TxERR|TxOK), ioaddr + imr); return; } @@ -1655,7 +1655,7 @@ static irqreturn_t sis900_interrupt(int irq, void *dev_instance) do { status = inl(ioaddr + isr); - if ((status & (HIBERR|TxURN|TxERR|TxIDLE|RxORN|RxERR|RxOK)) == 0) + if ((status & (HIBERR|TxURN|TxERR|TxOK|RxORN|RxERR|RxOK)) == 0) /* nothing intresting happened */ break; handled = 1; @@ -1665,7 +1665,7 @@ static irqreturn_t sis900_interrupt(int irq, void *dev_instance) /* Rx interrupt */ sis900_rx(net_dev); - if (status & (TxURN | TxERR | TxIDLE)) + if (status & (TxURN | TxERR | TxOK)) /* Tx interrupt */ sis900_finish_xmit(net_dev); @@ -2462,7 +2462,7 @@ static int sis900_resume(struct pci_dev *pci_dev) sis900_set_mode(ioaddr, HW_SPEED_10_MBPS, FDX_CAPABLE_HALF_SELECTED); /* Enable all known interrupts by setting the interrupt mask. */ - outl((RxSOVR|RxORN|RxERR|RxOK|TxURN|TxERR|TxIDLE), ioaddr + imr); + outl((RxSOVR|RxORN|RxERR|RxOK|TxURN|TxERR|TxOK), ioaddr + imr); outl(RxENA | inl(ioaddr + cr), ioaddr + cr); outl(IE, ioaddr + ier);
Re: pktgen terminating condition
On Tue, 2007-28-08 at 17:06 -0700, David Miller wrote: > Devices need to free packets in a deterministic amount of time, no > matter what. If i understood the driver pointed to - "finite amount of time" spec is still met. Seems some interupt is generated (TX_IDLE) when the tx path gets idle and this will kick the tx cleanup. Mandeep, correct me if i am wrong. If my statement is correct, i wouldnt call this a driver bug;-> Robert, I think i asked you this same question on that piece of code once - and iirc you said its because you had to synchronise and make sure the packet made it to the wire. In this scenario, the packet has made it to the wire - just hasnt been freed. Note that when teaching pktgen to do batching, i removed that logic and things work fine. cheers, jamal - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen terminating condition
I don't even understand why this needs to be discussed to be honest with you :-) Just my (possibly frustrating :) habit of asking questions which help further my understanding of the code :) rick jones - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen terminating condition
From: Rick Jones <[EMAIL PROTECTED]> Date: Tue, 28 Aug 2007 16:48:24 -0700 > Not to defend any one driver, but could it be that the behaviour of TCP > is such that we've not noticed until now? > > Does TCP particularly care for an SKB carrying an ACK? For ones carrying > data there would be an ACK arriving before TCP could "really" free them > right? And that ACK (rx event) could very well come after some other TX > completion, or perhaps trigger cleaning up TXs in the driver at that time? > > And a UDP app is as likely as not to be request/response in nature, with the > response behaving at the driver level rather like a TCP ACK in terms of the > above. The issue is not ACKs or anything of that nature, but socket buffer accounting. We can't release the socket until all transmit packets have been liberated and their size subtracted from the send buffer quota. Because these packets have their skb->destructor and skb->sk set to reference that socket and the protocol code that manages it's send buffer allocations. In the worst possible case, using UDP as an example, let's say one big packets fits in the send buffer and this is the one last packet that gets into the devices TX ring. It never liberates the packet, and if that is the only application generating traffic in the system we deadlock. Devices need to free packets in a deterministic amount of time, no matter what. I don't even understand why this needs to be discussed to be honest with you :-) - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen terminating condition
David Miller wrote: From: Mandeep Singh Baines <[EMAIL PROTECTED]> Date: Tue, 28 Aug 2007 15:47:18 -0700 It seems that some drivers do not immediately free skbs on transmit complete. They will hold the skb until there are more packets to free. One example is the sis900 driver which uses TX_IDLE (tx state-machine idle) instead of TX_OK (tx completed) as a way of coalescing interrupts. I've also seen another driver which does something similar. Not sure how prevalent this technique is. So is this a bug in the drivers or a bug in pktgen? This is a bug since it can stall TCP sockets, and even non-TCP sockets will not be freeable until these SKBs are released by the device. Not to defend any one driver, but could it be that the behaviour of TCP is such that we've not noticed until now? Does TCP particularly care for an SKB carrying an ACK? For ones carrying data there would be an ACK arriving before TCP could "really" free them right? And that ACK (rx event) could very well come after some other TX completion, or perhaps trigger cleaning up TXs in the driver at that time? And a UDP app is as likely as not to be request/response in nature, with the response behaving at the driver level rather like a TCP ACK in terms of the above. Both of those are things I don't think we typically see happen in pktgen right? All transmit packets must be released in a finite amount of time by the driver without requiring more traffic to induce this release. Yes, but for what definition of finite? rick jones - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen terminating condition
From: Mandeep Singh Baines <[EMAIL PROTECTED]> Date: Tue, 28 Aug 2007 15:47:18 -0700 > It seems that some drivers do not immediately free skbs on transmit > complete. They will hold the skb until there are more packets to > free. One example is the sis900 driver which uses TX_IDLE (tx > state-machine idle) instead of TX_OK (tx completed) as a way of > coalescing interrupts. I've also seen another driver which does > something similar. Not sure how prevalent this technique is. > > So is this a bug in the drivers or a bug in pktgen? This is a bug since it can stall TCP sockets, and even non-TCP sockets will not be freeable until these SKBs are released by the device. All transmit packets must be released in a finite amount of time by the driver without requiring more traffic to induce this release. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen Multiqueue support [PATCH]
From: Robert Olsson <[EMAIL PROTECTED]> Date: Mon, 27 Aug 2007 16:55:19 +0200 > Below some pktgen support to send into different TX queues. > This can of course be feed into input queues on other machines > > Signed-off-by: Robert Olsson <[EMAIL PROTECTED]> Thanks for implementing this Robert, patch applied. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen multiqueue oops
From: Robert Olsson <[EMAIL PROTECTED]> Date: Mon, 27 Aug 2007 11:16:19 +0200 > Initially pkt_dev can be NULL this causes netif_subqueue_stopped to > oops. The patch below should cure it. But maybe the pktgen TX logic > should be reworked to better support the new multiqueue support. > > Signed-off-by: Robert Olsson <[EMAIL PROTECTED]> Patch applied, thanks Robert. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen
From: Robert Olsson <[EMAIL PROTECTED]> Date: Fri, 1 Dec 2006 09:14:01 +0100 > > David Miller writes: > > > Agreed. > > > > Robert, please fix this by using a completion so that we can > > wait for the threads to start up, something like this: > > Included. It passes my test but Alexey and others test. Ok, I'm going to put Robert's version of the fix into 2.6.19 and previous -stable branches. But for 2.6.20 I'd like to do the following, based upon Christoph's suggestion to convert to kthread. Can folks please give this a spin? I've tested that the module loads and unloads properly, the threads startup and shutdown, but that's it. Thanks! commit b3010a665cc33596fbf4be3fc6c3c5c80aeefb65 Author: David S. Miller <[EMAIL PROTECTED]> Date: Mon Jan 1 20:51:53 2007 -0800 [PKTGEN]: Convert to kthread API. Based upon a suggestion from Christoph Hellwig. This fixes various races in module load/unload handling too. Signed-off-by: David S. Miller <[EMAIL PROTECTED]> diff --git a/net/core/pktgen.c b/net/core/pktgen.c index 1897a3a..04d4b93 100644 --- a/net/core/pktgen.c +++ b/net/core/pktgen.c @@ -148,6 +148,7 @@ #include #include #include +#include #include #include #include @@ -360,8 +361,7 @@ struct pktgen_thread { spinlock_t if_lock; struct list_head if_list; /* All device here */ struct list_head th_list; - int removed; - char name[32]; + struct task_struct *tsk; char result[512]; u32 max_before_softirq; /* We'll call do_softirq to prevent starvation. */ @@ -1689,7 +1689,7 @@ static int pktgen_thread_show(struct seq_file *seq, void *v) BUG_ON(!t); seq_printf(seq, "Name: %s max_before_softirq: %d\n", - t->name, t->max_before_softirq); + t->tsk->comm, t->max_before_softirq); seq_printf(seq, "Running: "); @@ -3112,7 +3112,7 @@ static void pktgen_rem_thread(struct pktgen_thread *t) { /* Remove from the thread list */ - remove_proc_entry(t->name, pg_proc_dir); + remove_proc_entry(t->tsk->comm, pg_proc_dir); mutex_lock(&pktgen_thread_lock); @@ -3260,58 +3260,40 @@ out:; * Main loop of the thread goes here */ -static void pktgen_thread_worker(struct pktgen_thread *t) +static int pktgen_thread_worker(void *arg) { DEFINE_WAIT(wait); + struct pktgen_thread *t = arg; struct pktgen_dev *pkt_dev = NULL; int cpu = t->cpu; - sigset_t tmpsig; u32 max_before_softirq; u32 tx_since_softirq = 0; - daemonize("pktgen/%d", cpu); - - /* Block all signals except SIGKILL, SIGSTOP and SIGTERM */ - - spin_lock_irq(¤t->sighand->siglock); - tmpsig = current->blocked; - siginitsetinv(¤t->blocked, - sigmask(SIGKILL) | sigmask(SIGSTOP) | sigmask(SIGTERM)); - - recalc_sigpending(); - spin_unlock_irq(¤t->sighand->siglock); - - /* Migrate to the right CPU */ - set_cpus_allowed(current, cpumask_of_cpu(cpu)); - if (smp_processor_id() != cpu) - BUG(); + BUG_ON(smp_processor_id() != cpu); init_waitqueue_head(&t->queue); - t->control &= ~(T_TERMINATE); - t->control &= ~(T_RUN); - t->control &= ~(T_STOP); - t->control &= ~(T_REMDEVALL); - t->control &= ~(T_REMDEV); - t->pid = current->pid; PG_DEBUG(printk("pktgen: starting pktgen/%d: pid=%d\n", cpu, current->pid)); max_before_softirq = t->max_before_softirq; - __set_current_state(TASK_INTERRUPTIBLE); - mb(); + set_current_state(TASK_INTERRUPTIBLE); - while (1) { - - __set_current_state(TASK_RUNNING); + while (!kthread_should_stop()) { + pkt_dev = next_to_run(t); - /* -* Get next dev to xmit -- if any. -*/ + if (!pkt_dev && + (t->control & (T_STOP | T_RUN | T_REMDEVALL | T_REMDEV)) + == 0) { + prepare_to_wait(&(t->queue), &wait, + TASK_INTERRUPTIBLE); + schedule_timeout(HZ / 10); + finish_wait(&(t->queue), &wait); + } - pkt_dev = next_to_run(t); + __set_current_state(TASK_RUNNING); if (pkt_dev) { @@ -3329,21 +3311,8 @@ static void pktgen_thread_worker(struct pktgen_thread *t) do_softirq(); tx_since_softirq = 0; } - } else { - prepare_to_wait(&(t->queue), &wait, TASK_INTERRUPTIBLE); - schedule_timeout(HZ / 10); - finish_wait(&(t->queue), &wait); } - /* -* Back from sleep, either due to the timeout or signal.
Re: pktgen
From: "Alexey Dobriyan" <[EMAIL PROTECTED]> Date: Fri, 1 Dec 2006 12:51:53 +0300 > On 12/1/06, Robert Olsson <[EMAIL PROTECTED]> wrote: > > David Miller writes: > > > Agreed. > > > > > > Robert, please fix this by using a completion so that we can > > > wait for the threads to start up, something like this: > > > > Included. It passes my test but Alexey and others test. > > Confused now. Is my "t->control &= ~(T_TERMINATE);" fix deprecated by > completions? The completions solve the bug completely. But later on we should integate a change that eliminates these spurious t->control bit clears at the start of the pktgen thread execution, it just isn't needed to fix the bug so we can make it later. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen
From: Christoph Hellwig <[EMAIL PROTECTED]> Date: Fri, 1 Dec 2006 08:22:25 + > On Thu, Nov 30, 2006 at 08:14:23PM -0800, David Miller wrote: > > Agreed. > > > > Robert, please fix this by using a completion so that we can > > wait for the threads to start up, something like this: > > No, that's wrong aswell :) Please use the kthread_ API that takes > care of all this. kernel_thread is going away mid-term, so you'd > have to do this work anyway. I was going to suggest this Christophe, but I wanted a change small and easy enough to verify in order to merge the fix into the -stable branch. Converting the thing over to kthread would make the change more risky in such a context. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen
Alexey Dobriyan writes: > > Confused now. Is my "t->control &= ~(T_TERMINATE);" fix deprecated by > completions? It's not needed with completion patch as this does the job a bit more mainstream. The T_TERMINATE seems to work well I've tested on machine with CPU:s. Only once I noticed that only 2 of 4 threads got started but it could be something else... And the race is hardly seen with any real use of pktgen.. .Let's hear what others... and completions was out-of-date. Cheers. --ro - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen
On 12/1/06, Robert Olsson <[EMAIL PROTECTED]> wrote: David Miller writes: > Agreed. > > Robert, please fix this by using a completion so that we can > wait for the threads to start up, something like this: Included. It passes my test but Alexey and others test. Confused now. Is my "t->control &= ~(T_TERMINATE);" fix deprecated by completions? --- a/net/core/pktgen.c +++ b/net/core/pktgen.c @@ -147,6 +147,7 @@ #include #include #include +#include #include #include #include @@ -160,7 +161,7 @@ #include/* do_div */ #include -#define VERSION "pktgen v2.68: Packet Generator for packet performance testing.\n" +#define VERSION "pktgen v2.69: Packet Generator for packet performance testing.\n" /* #define PG_DEBUG(a) a */ #define PG_DEBUG(a) @@ -206,6 +207,11 @@ static struct proc_dir_entry *pg_proc_di #define VLAN_TAG_SIZE(x) ((x)->vlan_id == 0x ? 0 : 4) #define SVLAN_TAG_SIZE(x) ((x)->svlan_id == 0x ? 0 : 4) +struct pktgen_thread_info { + struct pktgen_thread *t; + struct completion *c; +}; + struct flow_state { __u32 cur_daddr; int count; @@ -3264,10 +3270,11 @@ out:; * Main loop of the thread goes here */ -static void pktgen_thread_worker(struct pktgen_thread *t) +static void pktgen_thread_worker(struct pktgen_thread_info *info) { DEFINE_WAIT(wait); struct pktgen_dev *pkt_dev = NULL; + struct pktgen_thread *t = info->t; int cpu = t->cpu; sigset_t tmpsig; u32 max_before_softirq; @@ -3307,6 +3314,8 @@ static void pktgen_thread_worker(struct __set_current_state(TASK_INTERRUPTIBLE); mb(); +complete(info->c); + while (1) { __set_current_state(TASK_RUNNING); @@ -3518,6 +3527,8 @@ static struct pktgen_thread *__init pktg static int __init pktgen_create_thread(const char *name, int cpu) { int err; + struct pktgen_thread_info info; +struct completion started; struct pktgen_thread *t = NULL; struct proc_dir_entry *pe; @@ -3558,7 +3569,11 @@ static int __init pktgen_create_thread(c t->removed = 0; - err = kernel_thread((void *)pktgen_thread_worker, (void *)t, + init_completion(&started); +info.t = t; +info.c = &started; + + err = kernel_thread((void *)pktgen_thread_worker, (void *)&info, CLONE_FS | CLONE_FILES | CLONE_SIGHAND); if (err < 0) { printk("pktgen: kernel_thread() failed for cpu %d\n", t->cpu); @@ -3568,6 +3583,7 @@ static int __init pktgen_create_thread(c return err; } + wait_for_completion(&started); return 0; } - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen
On Thu, Nov 30, 2006 at 08:14:23PM -0800, David Miller wrote: > Agreed. > > Robert, please fix this by using a completion so that we can > wait for the threads to start up, something like this: No, that's wrong aswell :) Please use the kthread_ API that takes care of all this. kernel_thread is going away mid-term, so you'd have to do this work anyway. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen
David Miller writes: > Agreed. > > Robert, please fix this by using a completion so that we can > wait for the threads to start up, something like this: Included. It passes my test but Alexey and others test. Cheers. --ro diff --git a/net/core/pktgen.c b/net/core/pktgen.c index 733d86d..a630a73 100644 --- a/net/core/pktgen.c +++ b/net/core/pktgen.c @@ -147,6 +147,7 @@ #include #include #include +#include #include #include #include @@ -160,7 +161,7 @@ #include /* do_div */ #include -#define VERSION "pktgen v2.68: Packet Generator for packet performance testing.\n" +#define VERSION "pktgen v2.69: Packet Generator for packet performance testing.\n" /* #define PG_DEBUG(a) a */ #define PG_DEBUG(a) @@ -206,6 +207,11 @@ static struct proc_dir_entry *pg_proc_di #define VLAN_TAG_SIZE(x) ((x)->vlan_id == 0x ? 0 : 4) #define SVLAN_TAG_SIZE(x) ((x)->svlan_id == 0x ? 0 : 4) +struct pktgen_thread_info { + struct pktgen_thread *t; + struct completion *c; +}; + struct flow_state { __u32 cur_daddr; int count; @@ -3264,10 +3270,11 @@ out:; * Main loop of the thread goes here */ -static void pktgen_thread_worker(struct pktgen_thread *t) +static void pktgen_thread_worker(struct pktgen_thread_info *info) { DEFINE_WAIT(wait); struct pktgen_dev *pkt_dev = NULL; + struct pktgen_thread *t = info->t; int cpu = t->cpu; sigset_t tmpsig; u32 max_before_softirq; @@ -3307,6 +3314,8 @@ static void pktgen_thread_worker(struct __set_current_state(TASK_INTERRUPTIBLE); mb(); +complete(info->c); + while (1) { __set_current_state(TASK_RUNNING); @@ -3518,6 +3527,8 @@ static struct pktgen_thread *__init pktg static int __init pktgen_create_thread(const char *name, int cpu) { int err; + struct pktgen_thread_info info; +struct completion started; struct pktgen_thread *t = NULL; struct proc_dir_entry *pe; @@ -3558,7 +3569,11 @@ static int __init pktgen_create_thread(c t->removed = 0; - err = kernel_thread((void *)pktgen_thread_worker, (void *)t, + init_completion(&started); +info.t = t; +info.c = &started; + + err = kernel_thread((void *)pktgen_thread_worker, (void *)&info, CLONE_FS | CLONE_FILES | CLONE_SIGHAND); if (err < 0) { printk("pktgen: kernel_thread() failed for cpu %d\n", t->cpu); @@ -3568,6 +3583,7 @@ static int __init pktgen_create_thread(c return err; } + wait_for_completion(&started); return 0; } - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen
From: Ben Greear <[EMAIL PROTECTED]> Date: Thu, 30 Nov 2006 09:33:43 -0800 > Robert Olsson wrote: > > @@ -3673,6 +3673,8 @@ static void __exit pg_cleanup(void) > > struct list_head *q, *n; > > wait_queue_head_t queue; > > init_waitqueue_head(&queue); > > + > > + schedule_timeout_interruptible(msecs_to_jiffies(125)); > > > > /* Stop all interfaces & threads */ > > > > That strikes me as a hack..surely there is a better method than just adding > a sleep?? Agreed. Robert, please fix this by using a completion so that we can wait for the threads to start up, something like this: ... #include ... struct pktgen_thread_info { struct pktgen_thread *t; struct completion *c; }; static void pktgen_thread_worker(struct pktgen_thread_info *info) { struct pktgen_thread *t = info->t; ... __set_current_state(TASK_INTERRUPTIBLE); mb(); complete(info->c); ... } static int __init pktgen_create_thread(const char *name, int cpu) { ... struct pktgen_thread_info info; struct complation started; ... init_completion(&started); info.t = t; info.c = &started; err = kernel_thread((void *)pktgen_thread_worker, (void *)t, CLONE_FS | CLONE_FILES | CLONE_SIGHAND); if (err < 0) { printk("pktgen: kernel_thread() failed for cpu %d\n", t->cpu); remove_proc_entry(t->name, pg_proc_dir); list_del(&t->th_list); kfree(t); return err; } wait_for_completion(&started); return 0; } We can horse around with fixing the initial t->control bit flipping in pktgen_thread_worker() in a future changeset if we want. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen
Robert Olsson wrote: Hello! Seems you found a race when rmmod is done before it's fully started Try: diff --git a/net/core/pktgen.c b/net/core/pktgen.c index 733d86d..ac0b4b1 100644 --- a/net/core/pktgen.c +++ b/net/core/pktgen.c @@ -160,7 +160,7 @@ #include/* do_div */ #include -#define VERSION "pktgen v2.68: Packet Generator for packet performance testing.\n" +#define VERSION "pktgen v2.69: Packet Generator for packet performance testing.\n" /* #define PG_DEBUG(a) a */ #define PG_DEBUG(a) @@ -3673,6 +3673,8 @@ static void __exit pg_cleanup(void) struct list_head *q, *n; wait_queue_head_t queue; init_waitqueue_head(&queue); + + schedule_timeout_interruptible(msecs_to_jiffies(125)); /* Stop all interfaces & threads */ That strikes me as a hack..surely there is a better method than just adding a sleep?? Ben -- Ben Greear <[EMAIL PROTECTED]> Candela Technologies Inc http://www.candelatech.com - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen
Hello! Seems you found a race when rmmod is done before it's fully started Try: diff --git a/net/core/pktgen.c b/net/core/pktgen.c index 733d86d..ac0b4b1 100644 --- a/net/core/pktgen.c +++ b/net/core/pktgen.c @@ -160,7 +160,7 @@ #include /* do_div */ #include -#define VERSION "pktgen v2.68: Packet Generator for packet performance testing.\n" +#define VERSION "pktgen v2.69: Packet Generator for packet performance testing.\n" /* #define PG_DEBUG(a) a */ #define PG_DEBUG(a) @@ -3673,6 +3673,8 @@ static void __exit pg_cleanup(void) struct list_head *q, *n; wait_queue_head_t queue; init_waitqueue_head(&queue); + + schedule_timeout_interruptible(msecs_to_jiffies(125)); /* Stop all interfaces & threads */ for i in 1 2 3 4 5 ; do modprobe pktgen ; rmmod pktgen ; done In dmesg pktgen v2.69: Packet Generator for packet performance testing. pktgen v2.69: Packet Generator for packet performance testing. pktgen v2.69: Packet Generator for packet performance testing. pktgen v2.69: Packet Generator for packet performance testing. pktgen v2.69: Packet Generator for packet performance testing. Cheers. --ro Alexey Dobriyan writes: > On 11/30/06, David Miller <[EMAIL PROTECTED]> wrote: > > From: Alexey Dobriyan <[EMAIL PROTECTED]> > > Date: Wed, 29 Nov 2006 23:04:37 +0300 > > > > > Looks like worker thread strategically clears it if scheduled at wrong > > > moment. > > > > > > --- a/net/core/pktgen.c > > > +++ b/net/core/pktgen.c > > > @@ -3292,7 +3292,6 @@ static void pktgen_thread_worker(struct > > > > > > init_waitqueue_head(&t->queue); > > > > > > -t->control &= ~(T_TERMINATE); > > > t->control &= ~(T_RUN); > > > t->control &= ~(T_STOP); > > > t->control &= ~(T_REMDEVALL); > > > > Good catch Alexey. Did you rerun the load/unload test with > > this fix applied? If it fixes things, I'll merge it. > > Well, yes, it fixes things, except Ctrl+C getting you out of > modprobe/rmmod loop will spit > backtrace again. And other flags: T_RUN, T_STOP. Clearance is not > needed due to kZalloc and > create bugs as demostrated. > > Give me some time. > - > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen
On 11/30/06, David Miller <[EMAIL PROTECTED]> wrote: From: Alexey Dobriyan <[EMAIL PROTECTED]> Date: Wed, 29 Nov 2006 23:04:37 +0300 > Looks like worker thread strategically clears it if scheduled at wrong > moment. > > --- a/net/core/pktgen.c > +++ b/net/core/pktgen.c > @@ -3292,7 +3292,6 @@ static void pktgen_thread_worker(struct > >init_waitqueue_head(&t->queue); > > - t->control &= ~(T_TERMINATE); >t->control &= ~(T_RUN); >t->control &= ~(T_STOP); >t->control &= ~(T_REMDEVALL); Good catch Alexey. Did you rerun the load/unload test with this fix applied? If it fixes things, I'll merge it. Well, yes, it fixes things, except Ctrl+C getting you out of modprobe/rmmod loop will spit backtrace again. And other flags: T_RUN, T_STOP. Clearance is not needed due to kZalloc and create bugs as demostrated. Give me some time. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen
From: Alexey Dobriyan <[EMAIL PROTECTED]> Date: Wed, 29 Nov 2006 23:04:37 +0300 > Looks like worker thread strategically clears it if scheduled at wrong > moment. > > --- a/net/core/pktgen.c > +++ b/net/core/pktgen.c > @@ -3292,7 +3292,6 @@ static void pktgen_thread_worker(struct > > init_waitqueue_head(&t->queue); > > - t->control &= ~(T_TERMINATE); > t->control &= ~(T_RUN); > t->control &= ~(T_STOP); > t->control &= ~(T_REMDEVALL); Good catch Alexey. Did you rerun the load/unload test with this fix applied? If it fixes things, I'll merge it. Thanks! - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen
On Tue, Nov 28, 2006 at 03:33:25PM -0800, David Miller wrote: > From: Alexey Dobriyan <[EMAIL PROTECTED]> > Date: Wed, 22 Nov 2006 00:22:51 +0300 > > > [CCing netdev, bug in pktgen] > > > > [build modular pktgen] > > while true; do modprobe pktgen && rmmod pktgen; done > > > > BUG: warning at fs/proc/generic.c:732/remove_proc_entry() > > [] remove_proc_entry+0x161/0x1ca > > [] pg_cleanup+0xd5/0xdc [pktgen] > > [] autoremove_wake_function+0x0/0x35 > > [] sys_delete_module+0x162/0x189 > > [] remove_vma+0x31/0x36 > > [] do_munmap+0x193/0x1ac > > [] sysenter_past_esp+0x56/0x79 > > [] fn_hash_delete+0x4f/0x1c7 > > > > On Tue, Nov 21, 2006 at 09:36:46PM +0100, Pavol Gono wrote: > > > I am going to add two more: > > > for i in 1 2 3 4 5 ; do modprobe pktgen ; rmmod pktgen ; done > > > > Looks like it creates /proc/net/pktgen/kpktgen_%i but forgets to remove > > them. > > It's pretty careful to delete all of the entries under > /proc/net/pktgen/. > > When the module is brought down, it walks the list of threads > and brings them down by setting T_TERMINATE in t->control. Looks like worker thread strategically clears it if scheduled at wrong moment. --- a/net/core/pktgen.c +++ b/net/core/pktgen.c @@ -3292,7 +3292,6 @@ static void pktgen_thread_worker(struct init_waitqueue_head(&t->queue); - t->control &= ~(T_TERMINATE); t->control &= ~(T_RUN); t->control &= ~(T_STOP); t->control &= ~(T_REMDEVALL); > This makes the thread break out of it's loop and run: > > pktgen_stop(t); > pktgen_rem_all_ifs(t); > pktgen_rem_thread(t); Kernel seeems to survive, but when I hit Ctrl+C after half a minute backtrace is back being the very last dmesg lines. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen
From: Alexey Dobriyan <[EMAIL PROTECTED]> Date: Wed, 22 Nov 2006 00:22:51 +0300 > [CCing netdev, bug in pktgen] > > [build modular pktgen] > while true; do modprobe pktgen && rmmod pktgen; done > > BUG: warning at fs/proc/generic.c:732/remove_proc_entry() >[] remove_proc_entry+0x161/0x1ca >[] pg_cleanup+0xd5/0xdc [pktgen] > [] autoremove_wake_function+0x0/0x35 >[] sys_delete_module+0x162/0x189 >[] remove_vma+0x31/0x36 >[] do_munmap+0x193/0x1ac >[] sysenter_past_esp+0x56/0x79 >[] fn_hash_delete+0x4f/0x1c7 > > On Tue, Nov 21, 2006 at 09:36:46PM +0100, Pavol Gono wrote: > > I am going to add two more: > > for i in 1 2 3 4 5 ; do modprobe pktgen ; rmmod pktgen ; done > > Looks like it creates /proc/net/pktgen/kpktgen_%i but forgets to remove > them. It's pretty careful to delete all of the entries under /proc/net/pktgen/. When the module is brought down, it walks the list of threads and brings them down by setting T_TERMINATE in t->control. This makes the thread break out of it's loop and run: pktgen_stop(t); pktgen_rem_all_ifs(t); pktgen_rem_thread(t); pktgen_rem_all_ifs() will delete all device entries in /proc/net/pktgen/ Next, pktgen_rem_thread() will kill off the entry for /proc/net/pktgen/kpkgen_%i Finally, the top-level module remove code will delete the control file right before it tries to kill off the directory via: /* Clean up proc file system */ remove_proc_entry(PGCTRL, pg_proc_dir); proc_net_remove(PG_PROC_DIR); So I don't see any bugs that could cause this. One possibility is that the thread's t->name is being corrupted somehow, that would make the remove_proc_entry() fail and cause the behavior you see. But I can't see anything obvious that might do that either. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen patch available for perusal.
Ben Greear writes: > > Changes: > > * use a nano-second timer based on the scheduler timer (TSC) for relative > > times, instead of get_time_of_day. Seems I missed to set tsc as clocksource. It makes a difference. Performance is normal and I'm less confused. e1000 82546GB @ 1.6 GHz Opteron. Kernel 2.6.19-rc1_Bifrost_-g18e199c6-dirty echo acpi_pm> /sys/devices/system/clocksource/clocksource0/current_clocksource psize pps - 60 556333 124 526942 252 452981 508 234996 1020 119748 1496 82248 echo tsc> /sys/devices/system/clocksource/clocksource0/current_clocksource psize pps - 60 819914 124 747286 252 452975 508 234993 1020 119749 1496 82247 Cheers. --ro - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen patch available for perusal.
jamal writes: > If you are listening then start with: > > 1) Do a simple test with just udp traffic as above, doing simple > accounting. This helps you to get a feel on how things work. > 2) modify the matching rules to match your magic cookie > 3) write a simple action invoked by your matching rules and use tc to > add it to the policy. > 4) integrate in your app now that you know what you are doing. Yes. Sounds like simple and general solution. No call-backs, no #ifdef's no extra modules. Just a little recipe in pktgen.txt Cheers. --ro - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen patch available for perusal.
jamal wrote: On Sat, 2006-04-11 at 09:29 -0800, Ben Greear wrote: You can ofcourse add many of these based on other header info. It seems to me your magic header is inside the UDP packet, correct? In that case you will have to play with matches since you can specify arbitraty offsets and values inside the packet. Yes, I want to be able to randomize source/dest MAC, IP, and IP Port, so I need to match only on the magic cookie. I'll soon add support to deal with TCP packets as well, but that is only going to require calculating a different offset to look for the magic value. Of course you can do this from your application instead of using tc. I think that will be the best place to control this and sync with pktgen sending. You mean the equiv of calling system(tc filter .) from the app, or do you mean something else entirely? the rest can be in a method called after you match in your script? If the packet/byte counters that the drop action provides are not good enough, you can write yourself a little module that will do the accounting the way you want it. For example this will be necessary to the out-of-sequence cheques below. Timestamps are already being updated Look at net/sched/act_simple.c and its associated user space code in iproute2. Now, Ben - are you listening really this time or am i wasting my time for the nth time giving you all these details? ;-> You always give details that are pertinent to your setup and not to mine, and you always have some 'small' step that is 'write a kernel module and hack on tc' if you want it to do something other than the generic thing. If you are listening then start with: 1) Do a simple test with just udp traffic as above, doing simple accounting. This helps you to get a feel on how things work. 2) modify the matching rules to match your magic cookie These first two do not look too difficult. I certainly need more control/stats than what the generic counters would be, however, so I assume I need to move to step 3. 3) write a simple action invoked by your matching rules and use tc to add it to the policy. I have exactly no idea of what you mean by this. How do I get this tc framework to send the packet with 32-bit value FOO at offset X to some method called pktgen_rx_pkt(skb) in the pktgen module? That is all I want in a hook: The ability to match on a pkt and send it to my kernel module and to let my kernel module and have it traverse the stack no farther unless my kernel module decides to re-insert it for some reason. If this takes significant work, then please don't waste more time trying to talk me into this, as my small patch to dev.c already does exactly what I need and the code is very easy to understand. If this is easy, how about you write the module and make the tc changes. I'll work on any changes needed in pktgen and polish up a nice patch for Robert et al Then you can get the latency distributions, OOO, Dup, Dropped and other counters that my pktgen logic provides for your own testing purposes... Thanks, Ben -- Ben Greear <[EMAIL PROTECTED]> Candela Technologies Inc http://www.candelatech.com - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen patch available for perusal.
On Sat, 2006-04-11 at 09:29 -0800, Ben Greear wrote: > jamal wrote: > Please do send a script. I match based on this method below. Probably > you only > need the part that checks for the MAGIC, What i do in my case is send to the SUT to UDP port 9. The SUT loops back the packet to me after processing and i just have a simple check in the traffic generator of: a) Packet = IPV4, IP address matches what i want, port =9, b) If this matches i account and drop it. This means all other traffic (like ssh etc goes through fine). I dont really need to check for length because _i know_ udp port 9 to the SUT is test traffic. For such a setup, the script would be as simple as (assuming SUT hooked to eth0): -- #interested in incoming packets on eth0 tc qdisc add dev eth0 ingress #interested in packets coming from SUT(10.0.0.21) on UDP port 9 tc filter add dev eth0 parent : protocol ip prio 6 u32 \ match ip src 10.0.0.21/32 \ match udp port 9 0x \ flowid 1:16 \ action drop --- When you do a listing of this filter you will see accounting info. You list by saying: --- tc -s filter ls dev eth0 parent : --- You can ofcourse add many of these based on other header info. It seems to me your magic header is inside the UDP packet, correct? In that case you will have to play with matches since you can specify arbitraty offsets and values inside the packet. Of course you can do this from your application instead of using tc. I think that will be the best place to control this and sync with pktgen sending. > the rest can be in a method > called after you match in your script? > If the packet/byte counters that the drop action provides are not good enough, you can write yourself a little module that will do the accounting the way you want it. For example this will be necessary to the out-of-sequence cheques below. Timestamps are already being updated Look at net/sched/act_simple.c and its associated user space code in iproute2. Now, Ben - are you listening really this time or am i wasting my time for the nth time giving you all these details? ;-> If you are listening then start with: 1) Do a simple test with just udp traffic as above, doing simple accounting. This helps you to get a feel on how things work. 2) modify the matching rules to match your magic cookie 3) write a simple action invoked by your matching rules and use tc to add it to the policy. 4) integrate in your app now that you know what you are doing. If you follow these steps or a variant-of i will spend time and help. cheers, jamal - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen patch available for perusal.
jamal wrote: On Wed, 2006-01-11 at 11:11 -0800, Ben Greear wrote: I'd be thrilled to have the receive logic go into pktgen, even if it was #if 0 with a comment showing how to patch dev.c to get it working. It would make my out-of-tree patch smaller and should help others who are doing research and driver development... I use pktgen extensively these days for ipsec testing. I also record stats for pkts i receive using a tc action drop. Very simple and works great. Ben, you keep insisting on doing this hook (every 3 months) because you dont want to invest time to do it with netlink (I thought you sold this as part of your product - the investment of your time is really not that high). If you tell me what the packet trigger is, I will send you a script ;-> Please do send a script. I match based on this method below. Probably you only need the part that checks for the MAGIC, the rest can be in a method called after you match in your script? /* Returns < 0 if the skb is not a pktgen buffer. */ int pktgen_receive(struct sk_buff* skb) { /* See if we have a pktgen packet */ /* TODO: Add support for detecting IPv6, TCP packets too. This will only * catch UDP at the moment. --Ben */ /*printk("pktgen-rcv, skb->len: %d\n", skb->len);*/ if ((skb->len >= (20 + 8 + sizeof(struct pktgen_hdr))) && (skb->protocol == __constant_htons(ETH_P_IP))) { struct pktgen_hdr* pgh; /* It's IP, and long enough, lets check the magic number. * TODO: This is a hack not always guaranteed to catch the right * packets. */ /*printk("Length & protocol passed, skb->data: %p, raw: %p\n", skb->data, skb->h.raw);*/ pgh = (struct pktgen_hdr*)(skb->data + 20 + 8); /* tmp = (char*)(skb->data); for (i = 0; i<90; i++) { printk("%02hx ", tmp[i]); if (((i + 1) % 15) == 0) { printk("\n"); } } printk("\n"); */ if (pgh->pgh_magic == __constant_ntohl(PKTGEN_MAGIC)) { struct net_device* dev = skb->dev; struct pktgen_dev* pkt_dev; __u32 seq = ntohl(pgh->seq_num); // TODO: Need lock..maybe pkt_dev = dev->pkt_dev; if (!pkt_dev) { return -1; } pkt_dev->pkts_rcvd++; pkt_dev->bytes_rcvd += ((skb->tail - skb->mac.raw) + 4); /* +4 for the checksum */ /* Check for out-of-sequence packets */ if (pkt_dev->last_seq_rcvd == seq) { pkt_dev->dup_rcvd++; pkt_dev->dup_since_incr++; } else { __s64 rx; __s64 tx; struct timeval txtv; if (skb->tstamp.off_sec || skb->tstamp.off_usec) { skb_get_timestamp(skb, &txtv); } else { do_gettimeofday(&txtv); skb_set_timestamp(skb, &txtv); } rx = tv_to_us(&txtv); txtv.tv_usec = ntohl(pgh->tv_usec); txtv.tv_sec = ntohl(pgh->tv_sec); tx = tv_to_us(&txtv); record_latency(pkt_dev, rx - tx); if ((pkt_dev->last_seq_rcvd + 1) == seq) { if ((pkt_dev->peer_clone_skb > 1) && (pkt_dev->peer_clone_skb > (pkt_dev->dup_since_incr + 1))) { pkt_dev->seq_gap_rcvd += (pkt_dev->peer_clone_skb - pkt_dev->dup_since_incr - 1); } /* Great, in order...all is well */ } else if (pkt_dev->last_seq_rcvd < seq) { /* sequence gap, means we dropped a pkt most likely */ if (pkt_dev->peer_clone_skb > 1) { /* We dropped more than one sequence number's worth, * and if we're using clone_skb, then this is quite * a few. This n
Re: pktgen patch available for perusal.
On Wed, 2006-01-11 at 11:11 -0800, Ben Greear wrote: > I'd be thrilled to have the receive logic go into pktgen, even if > it was #if 0 with a comment > showing how to patch dev.c to get it working. It would make my > out-of-tree patch smaller > and should help others who are doing research and driver development... > I use pktgen extensively these days for ipsec testing. I also record stats for pkts i receive using a tc action drop. Very simple and works great. Ben, you keep insisting on doing this hook (every 3 months) because you dont want to invest time to do it with netlink (I thought you sold this as part of your product - the investment of your time is really not that high). If you tell me what the packet trigger is, I will send you a script ;-> Robert, I have started writing some generic netlink messaging to replace the /proc that i will send your way. I need to get consensus on some tunnel-mode IPSEC patches first then i will send the about three patchsets your way (to replace the old ones i sent earlier). cheers, jamal - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen patch available for perusal.
Robert Olsson wrote: Ben Greear writes: > It requires a hook in dev.c, or at least that is the only way I can > think to implement it. Well the hook be placed along the packet path even in drivers. In tulip I didn't even take packet of the ring in some experiments. And there plenty of existing hooks already i.e PRE_ROUTE? For my particular application the hook needs to be right after the bridge hook. My dev.c hook looks like this: #if defined(CONFIG_NET_PKTGEN) || defined(CONFIG_NET_PKTGEN_MODULE) #include "pktgen.h" #warning "Compiling dev.c for pktgen."; int (*handle_pktgen_hook)(struct sk_buff *skb) = NULL; EXPORT_SYMBOL(handle_pktgen_hook); static __inline__ int handle_pktgen_rcv(struct sk_buff* skb) { if (handle_pktgen_hook) { return handle_pktgen_hook(skb); } return -1; } #endif . #if defined(CONFIG_NET_PKTGEN) || defined(CONFIG_NET_PKTGEN_MODULE) if ((skb->dev->pkt_dev) && (handle_pktgen_rcv(skb) >= 0)) { /* Pktgen may consume the packet, no need to send * to further protocols. */ goto out; } #endif If the rx-hook logic is already in pktgen module, and it's symbol exported, perhaps a second hook module could be written that would just be the bridge between a driver or a PRE-ROUTE hook and pktgen? The hook module would probably not be much larger than the code above... Thanks, Ben -- Ben Greear <[EMAIL PROTECTED]> Candela Technologies Inc http://www.candelatech.com - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen patch available for perusal.
Ben Greear writes: > It requires a hook in dev.c, or at least that is the only way I can > think to implement it. Well the hook be placed along the packet path even in drivers. In tulip I didn't even take packet of the ring in some experiments. And there plenty of existing hooks already i.e PRE_ROUTE? > I don't think that hook is going to be allowed into the kernel, so the > best I was hoping for was to have the code in pktgen with #if 0 and let > users patch their kernel to add the hook... Right so what I was trying to say was rather having #if 0 and dead code in pktgen it could be kept separate. Cheers. --ro - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen patch available for perusal.
Robert Olsson wrote: Ben Greear writes: > I'd be thrilled to have the receive logic go into pktgen, even if it was #if 0 with a comment > showing how to patch dev.c to get it working. It would make my out-of-tree patch smaller > and should help others who are doing research and driver development... Just an idea... RX part is pretty separate from TX. How about if we keep the this code in a small seprate optional module? It can developed into some general RX probe. It requires a hook in dev.c, or at least that is the only way I can think to implement it. I don't think that hook is going to be allowed into the kernel, so the best I was hoping for was to have the code in pktgen with #if 0 and let users patch their kernel to add the hook... Ben Cheers. --ro - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html -- Ben Greear <[EMAIL PROTECTED]> Candela Technologies Inc http://www.candelatech.com - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen patch available for perusal.
Ben Greear writes: > I'd be thrilled to have the receive logic go into pktgen, even if it was #if > 0 with a comment > showing how to patch dev.c to get it working. It would make my out-of-tree > patch smaller > and should help others who are doing research and driver development... Just an idea... RX part is pretty separate from TX. How about if we keep the this code in a small seprate optional module? It can developed into some general RX probe. Cheers. --ro - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen patch available for perusal.
Robert Olsson wrote: Ben Greear writes: > I've completed the first pass of my changes to pktgen in 2.6.18. > Many of these features are probably DOA based on previous conversations, > but perhaps this will help someone Thanks. Well sometimes there is a need to capture and drop pkts and various points, so it handy to have code fragments ready and just add the hook when it's needed in driver or rx softirq etc. I'd be thrilled to have the receive logic go into pktgen, even if it was #if 0 with a comment showing how to patch dev.c to get it working. It would make my out-of-tree patch smaller and should help others who are doing research and driver development... If you'd accept a patch like this, please let me know. > Changes: > * use a nano-second timer based on the scheduler timer (TSC) for relative times, instead of get_time_of_day. Any chance you extract this one so we can compare with get_time_of_day. This pops up in the profiles and TX numbers are not what used to be. I'll work on that. Thanks, Ben Cheers. --ro - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html -- Ben Greear <[EMAIL PROTECTED]> Candela Technologies Inc http://www.candelatech.com - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PKTGEN: Adds thread limit parameter.
On Mon, 13 Mar 2006, Luiz Fernando Capitulino wrote: BTW Robert, my TODO list for pktgen so far is: * A new command called 'rem_device' to remove one device at a time (currently, we can only remove all devices in one shoot with 'rem_devices_all') This should be very simple to implement - there's now a "T_REMDEV" flag in addition to "T_REMDEVALL", though no "rem_device" method has been added. ("T_REMDEV" is currently used only when a NETDEV_UNREGISTER notification is received for the device.) -- Arthur - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PKTGEN: Adds thread limit parameter.
On Mon, 13 Mar 2006 14:29:17 +0100 Robert Olsson <[EMAIL PROTECTED]> wrote: | | Luiz Fernando Capitulino writes: | | > Well, I wouldn't say it's _really_ needed. But it really avoids having | > too many thread entries in the pktgen's /proc directory, and as a good | > result, you will not have pending threads which will never run as well. | > | > Also note that the patch is trivial, if you look at it in detail, | > you'll see that the biggest change we have is the 'if' part. The rest I | > would call cosmetic because the behaivor is the same. | > | > | If so -- Wouldn't a concept of a bitmask to control also which CPU's | > | that runs the threads be more general? | > | > Sounds like a bit complex, and would be my turn to ask if it's | > really needed. :) | | A bit set for each CPU there will a pktgen process running. Looks interesting. Could you give a few more details? Or an example somewhere in the kernel (if any) for what you have in mind. | > * Some minor fixes and cleanups, like functions returns being not | > checked. | > | > * A new command called 'rem_device' to remove one device at a time | > (currently, we can only remove all devices in one shoot with | > 'rem_devices_all') | > | > * Ports pktgen to use the kernel thread API | | The current model was chosen simplicity and to bound a device | to a specific CPU so it never cahanges. A change will need to | carefully tested. Sure thing. | > * cleanup the debug function usage | | I would like to remove the do_softirq stuff from pktgen... Added to the TODO list. :) -- Luiz Fernando N. Capitulino - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PKTGEN: Adds thread limit parameter.
Luiz Fernando Capitulino writes: > Well, I wouldn't say it's _really_ needed. But it really avoids having > too many thread entries in the pktgen's /proc directory, and as a good > result, you will not have pending threads which will never run as well. > > Also note that the patch is trivial, if you look at it in detail, > you'll see that the biggest change we have is the 'if' part. The rest I > would call cosmetic because the behaivor is the same. > > | If so -- Wouldn't a concept of a bitmask to control also which CPU's > | that runs the threads be more general? > > Sounds like a bit complex, and would be my turn to ask if it's > really needed. :) A bit set for each CPU there will a pktgen process running. > * Some minor fixes and cleanups, like functions returns being not > checked. > > * A new command called 'rem_device' to remove one device at a time > (currently, we can only remove all devices in one shoot with > 'rem_devices_all') > > * Ports pktgen to use the kernel thread API The current model was chosen simplicity and to bound a device to a specific CPU so it never cahanges. A change will need to carefully tested. > * cleanup the debug function usage I would like to remove the do_softirq stuff from pktgen... Cheers. --ro - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PKTGEN: Adds thread limit parameter.
Hi Robert, On Mon, 13 Mar 2006 10:44:49 +0100 Robert Olsson <[EMAIL PROTECTED]> wrote: | | Hello! | | Really needed? Well, I wouldn't say it's _really_ needed. But it really avoids having too many thread entries in the pktgen's /proc directory, and as a good result, you will not have pending threads which will never run as well. Also note that the patch is trivial, if you look at it in detail, you'll see that the biggest change we have is the 'if' part. The rest I would call cosmetic because the behaivor is the same. | If so -- Wouldn't a concept of a bitmask to control also which CPU's | that runs the threads be more general? Sounds like a bit complex, and would be my turn to ask if it's really needed. :) BTW Robert, my TODO list for pktgen so far is: * Some minor fixes and cleanups, like functions returns being not checked. * A new command called 'rem_device' to remove one device at a time (currently, we can only remove all devices in one shoot with 'rem_devices_all') * Ports pktgen to use the kernel thread API * cleanup the debug function usage Would be good to hear from you if they really makes sense. Thank you for the feedback, | Luiz Fernando Capitulino writes: | > | > Currently, pktgen will create one thread for each online CPU in the | > system. That behaivor may be annoying if you're using pktgen in a | > large SMP system. | > | > This patch adds a new module parameter called 'pg_max_threads', which | > can be set to the maximum number of threads pktgen should create. | > | > For example, if you're playing with pktgen in SMP system with 8 | > CPUs, but want only two pktgen's threads, you can do: | > | >modprobe pktgen pg_max_threads=2 | > | > Signed-off-by: Luiz Capitulino <[EMAIL PROTECTED]> | > | > --- | > | > net/core/pktgen.c | 23 +-- | > 1 files changed, 17 insertions(+), 6 deletions(-) | > | > e210ad47d0db52496fdaabdd50bfe6ee0bcc53cd | > diff --git a/net/core/pktgen.c b/net/core/pktgen.c | > index 37b25a6..994aef1 100644 | > --- a/net/core/pktgen.c | > +++ b/net/core/pktgen.c | > @@ -154,7 +154,7 @@ | > #include /* do_div */ | > #include | > | > -#define VERSION "pktgen v2.66: Packet Generator for packet performance testing.\n" | > +#define VERSION "pktgen v2.67: Packet Generator for packet performance testing.\n" | > | > /* #define PG_DEBUG(a) a */ | > #define PG_DEBUG(a) | > @@ -488,6 +488,7 @@ static unsigned int fmt_ip6(char *s, con | > static int pg_count_d = 1000; /* 1000 pkts by default */ | > static int pg_delay_d; | > static int pg_clone_skb_d; | > +static int pg_max_threads; | > static int debug; | > | > static DEFINE_MUTEX(pktgen_thread_lock); | > @@ -3184,7 +3185,7 @@ static int pktgen_remove_device(struct p | > | > static int __init pg_init(void) | > { | > - int cpu; | > + int i, threads; | >struct proc_dir_entry *pe; | > | >printk(version); | > @@ -3208,15 +3209,24 @@ static int __init pg_init(void) | >/* Register us to receive netdevice events */ | >register_netdevice_notifier(&pktgen_notifier_block); | > | > - for_each_online_cpu(cpu) { | > + threads = num_online_cpus(); | > + | > + /* | > + * Check if we should have less threads than the number | > + * of online CPUs | > + */ | > + if ((pg_max_threads > 0) && (pg_max_threads < threads)) | > + threads = pg_max_threads; | > + | > + for (i = 0; i < threads; i++) { | >int err; | >char buf[30]; | > | > - sprintf(buf, "kpktgend_%i", cpu); | > - err = pktgen_create_thread(buf, cpu); | > + sprintf(buf, "kpktgend_%i", i); | > + err = pktgen_create_thread(buf, i); | >if (err) | >printk("pktgen: WARNING: Cannot create thread for cpu %d (%d)\n", | > - cpu, err); | > + i, err); | >} | > | >if (list_empty(&pktgen_threads)) { | > @@ -3263,4 +3273,5 @@ MODULE_LICENSE("GPL"); | > module_param(pg_count_d, int, 0); | > module_param(pg_delay_d, int, 0); | > module_param(pg_clone_skb_d, int, 0); | > +module_param(pg_max_threads, int, 0); | > module_param(debug, int, 0); | > -- | > 1.2.4.gbe76 | > | > | > | > -- | > Luiz Fernando N. Capitulino -- Luiz Fernando N. Capitulino - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PKTGEN 3/3]: Updates version.
From: Luiz Fernando Capitulino <[EMAIL PROTECTED]> Date: Wed, 1 Mar 2006 23:05:54 -0300 > > Due to the thread's lock changes, we're at a new version now. > > Signed-off-by: Luiz Capitulino <[EMAIL PROTECTED]> Applied, thanks a lot. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PKTGEN 2/3]: Removes thread_{un,}lock() macros.
From: Luiz Fernando Capitulino <[EMAIL PROTECTED]> Date: Wed, 1 Mar 2006 23:05:48 -0300 > > As suggested by Arnaldo, this patch replaces the > thread_lock()/thread_unlock() > by directly calls to mutex_lock()/mutex_unlock(). > > This change makes the code a bit more readable, and the direct calls are used > everywhere in the kernel. > > Signed-off-by: Luiz Capitulino <[EMAIL PROTECTED]> Applied. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PKTGEN 1/3]: Convert thread lock to mutexes.
From: Luiz Fernando Capitulino <[EMAIL PROTECTED]> Date: Wed, 1 Mar 2006 23:05:43 -0300 > > pktgen's thread semaphores are strict mutexes, convert them to the mutex > implementation. > > Signed-off-by: Luiz Capitulino <[EMAIL PROTECTED]> Applied. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen + napi == kaboom
On 2/22/06, Simon Kirby <[EMAIL PROTECTED]> wrote: > Of course, now it doesn't send as fast. Hrmph. :) On this older Xeon > 2.4 Ghz w/533 FSB and e1000 & tg3 @ PCI-X 133 Mhz 64 bit, SMP kernel, > single pktgen thread, I'm only seeing: > > clone_skb=0, 802.1Q tagging, 60 byte: > e1000: 558526pps 268Mb/sec (268092480bps) errors: 0 > tg3: 621260pps 298Mb/sec (298204800bps) errors: 0 > > clone_skb=0, no 802.1Q, 60 byte: > e1000: 664558pps 318Mb/sec (318987840bps) errors: 0 > tg3: 772650pps 370Mb/sec (370872000bps) errors: 0 > > clone_skb=16384, no 802.1Q, 60 byte: > e1000: 684206pps 328Mb/sec (328418880bps) errors: 0 > tg3: 1069830pps 513Mb/sec (513518400bps) errors: 0 > > I tried on an Opteron 140 box and it was faster for both cards, but not > by much. oprofile showed a lot of do_getttimeofday, so I hacked a bunch > of calls out of pktgen -- I noticed the CPU time shifted around but the > throughput was still the same as before, as if it's card or bus limited. > > Why is it so difficult to actually get 1 Gbps of small packets? I also > tried changing ring buffer sizes, txqueuelen, interrupt coalescing > settings, etc... all I was able to do was make it slower or very > slightly faster. Its difficult because you have *loads* of transactions going over the bus. Linux's single transmit packet at a time methodology also exacerbates this, as we're unable to coalesce transmit tail (TDT) writes to the bus and are probably interrupting the previous DMA *a lot* (see thread below) You'll probably be able to get a little better throughput by switching to a UP kernel, but from my experience you're getting pretty close to the max for pci-x e1000 adapters. There are some previous messages about this like: http://oss.sgi.com/projects/netdev/archive/2004-12/msg00017.html beware the hardware bug when enabling TXDMAC! Jesse - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen + napi == kaboom
On Wed, Feb 22, 2006 at 10:56:31AM -0800, Ben Greear wrote: > For VLANs, make sure that the 'multi-skb' is always zero. This is because > the VLAN code modifies the skb by re-assigning the skb->dev to the > underlying > device. > > I'm not sure if this is the problem you hit, but it could be... Yup, that was it. Of course, now it doesn't send as fast. Hrmph. :) On this older Xeon 2.4 Ghz w/533 FSB and e1000 & tg3 @ PCI-X 133 Mhz 64 bit, SMP kernel, single pktgen thread, I'm only seeing: clone_skb=0, 802.1Q tagging, 60 byte: e1000: 558526pps 268Mb/sec (268092480bps) errors: 0 tg3: 621260pps 298Mb/sec (298204800bps) errors: 0 clone_skb=0, no 802.1Q, 60 byte: e1000: 664558pps 318Mb/sec (318987840bps) errors: 0 tg3: 772650pps 370Mb/sec (370872000bps) errors: 0 clone_skb=16384, no 802.1Q, 60 byte: e1000: 684206pps 328Mb/sec (328418880bps) errors: 0 tg3: 1069830pps 513Mb/sec (513518400bps) errors: 0 I tried on an Opteron 140 box and it was faster for both cards, but not by much. oprofile showed a lot of do_getttimeofday, so I hacked a bunch of calls out of pktgen -- I noticed the CPU time shifted around but the throughput was still the same as before, as if it's card or bus limited. Why is it so difficult to actually get 1 Gbps of small packets? I also tried changing ring buffer sizes, txqueuelen, interrupt coalescing settings, etc... all I was able to do was make it slower or very slightly faster. Simon- - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen + napi == kaboom
Robert Olsson wrote: Simon Kirby writes: > Just tried to do some benchmarks for outgoing packet rates with the > e1000 and tg3. When I tried some vlan tagging with pktgen, it blew > up immediately: Hello! No pktgen has no support vlan as-is .Guess there should be some config option to select vlan and enable it and fill vlan header in the packet. If you're motivated give it a try. VLAN devices will add the header for you...pktgen should work with no modification. It will work *best* if you use my patch to give stack feedback so that pktgen + vlan doesn't drop so many pkts on transmit. diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c --- a/net/8021q/vlan.c +++ b/net/8021q/vlan.c @@ -1,4 +1,4 @@ -/* +/* -*- linux-c -*- * INET802.1Q VLAN * Ethernet-type device handling. * diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c --- a/net/8021q/vlan_dev.c +++ b/net/8021q/vlan_dev.c @@ -438,6 +438,11 @@ int vlan_dev_hard_start_xmit(struct sk_b struct net_device_stats *stats = vlan_dev_get_stats(dev); struct vlan_ethhdr *veth = (struct vlan_ethhdr *)(skb->data); + /* Please note, dev_queue_xmit consumes the pkt regardless of the +* return value. So, will copy the skb first and free if successful. +*/ + struct sk_buff* skb2 = skb_get(skb); + /* Handle non-VLAN frames if they are sent to us, for example by DHCP. * * NOTE: THIS ASSUMES DIX ETHERNET, SPECIFICALLY NOT SUPPORTING @@ -467,6 +472,10 @@ int vlan_dev_hard_start_xmit(struct sk_b skb = __vlan_put_tag(skb, veth_TCI); if (!skb) { stats->tx_dropped++; + /* Free the extra copy, assuming this is a non-recoverable +* issue and we don't want calling code to retry. +*/ + kfree_skb(skb2); return 0; } @@ -484,13 +493,24 @@ int vlan_dev_hard_start_xmit(struct sk_b veth->h_vlan_proto, veth->h_vlan_TCI, veth->h_vlan_encapsulated_proto); #endif - stats->tx_packets++; /* for statics only */ - stats->tx_bytes += skb->len; - skb->dev = VLAN_DEV_INFO(dev)->real_dev; - dev_queue_xmit(skb); - return 0; + { + int rv = dev_queue_xmit(skb); + if (rv == 0) { + /* Was success, need to free the skb reference since +* we bumped up the user count above. If there was an +* error instead, then the skb2 will not be freed, and so +* the calling code will be able to re-send it. +*/ + + stats->tx_packets++; /* for statics only */ + stats->tx_bytes += skb2->len; + + kfree_skb(skb2); + } + return rv; + } } int vlan_dev_hwaccel_hard_start_xmit(struct sk_buff *skb, struct net_device *dev) Thanks, Ben -- Ben Greear <[EMAIL PROTECTED]> Candela Technologies Inc http://www.candelatech.com - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pktgen + napi == kaboom
Simon Kirby wrote: 2.6.15.4: Just tried to do some benchmarks for outgoing packet rates with the e1000 and tg3. When I tried some vlan tagging with pktgen, it blew up immediately: For VLANs, make sure that the 'multi-skb' is always zero. This is because the VLAN code modifies the skb by re-assigning the skb->dev to the underlying device. I'm not sure if this is the problem you hit, but it could be... Ben -- Ben Greear <[EMAIL PROTECTED]> Candela Technologies Inc http://www.candelatech.com - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html