Re: [WIP][PATCHES] Network xmit batching - tg3 support
On Wed, 2007-04-07 at 09:49 +0530, Krishna Kumar2 wrote: Do you see any contention for tx_lock which can justify having a prep handler? As I understood, no other CPU can be in the xmit code at the same time since RUNNING bit is held. Hence getting this lock early or late should not matter for the xmit side (and you are also holding dev-queue_lock while running prep so no new skbs can also be added to the dev during this time). And I couldn't find any driver that uses the same tx_lock on rx, On any non-LLTX driver: netif_tx_lock() is grabbed on both tx and receive paths. I have to admit, the e1000 is too clever for its own good: there is no documentation/description (and Herbert never explained) but it seems to be able to play with only certain parts of the ring on tx side and others on rx side so that a lock becomes unnecessary. So even for non-LLTX, some lock needs to be held on rx just to be clean (and theres a lot of grunty-ness against LLTX drivers out there). so where is the saving by doing prep ? You would agree i think that if there is contention between two CPUs for the same lock, gut feeling is there is benefit to do things outside the lock when possible. This becomes more important with batching when amortizing the cost of a lock. Theres a little more than that and I could write up a longer description if needed. Also note: not all drivers may be able to move things outside of the lock - infact the tun driver i converted is hard to do anything with since it doesnt have any hardware features that makes it easy to move things outside. For this reason this api is optional (and tun infact doesnt use it); and as i have said a few times now, i will be more than happy to get rid of it if experiments show it is unnecessary. For now my take is someone needs to disprove it is valuable (most prefered with experiments). cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [WIP][PATCHES] Network xmit batching - tg3 support
On Mon, 2007-02-07 at 17:21 -0700, Michael Chan wrote: [Matt, please include the count in the fix per previous email] Only base_flags and mss are needed and these can be determined right before sending the frame. So is it better not to store these in the skb-cb at all? long answer: My goal with storing these values and computing them was to do certain things that dont require holding the netif_tx_lock within a batch as well. Evaluating the packet metadata and formating the packet to be ready for stashing into DMA was one thing i could do outside of holding the lock easily - and running that in loop of 100 packets amortizes the instruction cache and allows me (when i get to it) to hold the lock for a lot less computation. In the e1000 for example i was able to go as far as computing how many descriptors are needed per skb and stashing those in the skb-cb; the way the tg3 is structured makes doing all that needing some big changes. So to answer your question: It would be nice if i could keep the skb-cb for now and we can get rid of it later if deemed a nuisance for batching. + dcount = tg3_tx_avail(tp); if (unlikely(netif_queue_stopped(tp-dev) - (tg3_tx_avail(tp) TG3_TX_WAKEUP_THRESH(tp { + (dcount TG3_TX_WAKEUP_THRESH(tp { netif_tx_lock(tp-dev); + tp-dev-xmit_win = 1; if (netif_queue_stopped(tp-dev) - (tg3_tx_avail(tp) TG3_TX_WAKEUP_THRESH(tp))) + (dcount TG3_TX_WAKEUP_THRESH(tp))) { netif_wake_queue(tp-dev); + tp-dev-xmit_win = dcount; + } netif_tx_unlock(tp-dev); } } This is also not right. tg3_tx() runs without netif_tx_lock(). tg3_tx_avail() can change after you get the netif_tx_lock() and we must get the updated value again. If we just rely on dcount, we can call wake_queue() when the ring is full, or there may be no wakeup when the ring is empty. You are right, I was trying to be a smart-ass there so i didnt have to fold the lines (for readability). Can you or Matt restore it to the way it was originally while keeping the xmit_win update and just send me a patch? Thanks a lot Michael and Matt for looking at this and improving it. Maybe i should switch all my nics to tg3 from now on ;- cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [WIP][PATCHES] Network xmit batching - tg3 support
From: jamal [EMAIL PROTECTED] Date: Tue, 03 Jul 2007 09:09:48 -0400 [It sounds very dangerous to me the way skb-cb is being used by the vlan code (i.e requires human intervention/knowledge to catch it as an issue). I had no freaking idea the vlan code was using it. Maybe a huge comment somewhere on how these cbs are being used by drivers would help or even a registration on startup to just make sure there is no conflict at a layer (i have been meaning to do the later for years now). In any case this is is a digression]. This is how the VLAN layer passed in the tag for the transmit function. The CB is owned by the caller until you spam it with your local data at your level, so if you need any information the previous layer provides you have to sample it before using the -cb[] So you can simply fix this by reading the VLAN tag, then using the -cb[] however you like. It's always been like this, don't be surprised :) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [WIP][PATCHES] Network xmit batching - tg3 support
On Tue, 2007-03-07 at 12:31 -0700, Matt Carlson wrote: I had planned on using netperf, but pktgen sounds like a more controlled environment. Thanks for the tip. I can help more if you use pktgen - netperf is more involved. Also pktgen is much closer to the driver so it would let you see closely if any improvements are obvious or not. Do you see any reason why we couldn't just add the VLAN code to the prep stage and simply overwrite that portion of the skb-cb? Our driver would just store its value in the base_flags member of tg3_tx_cbdata. That would be a much better approach;- Go for it. cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [WIP][PATCHES] Network xmit batching - tg3 support
Hi Jamal, J Hadi Salim [EMAIL PROTECTED] wrote on 07/03/2007 06:56:20 PM: On Mon, 2007-02-07 at 17:21 -0700, Michael Chan wrote: [Matt, please include the count in the fix per previous email] long answer: My goal with storing these values and computing them was to do certain things that dont require holding the netif_tx_lock within a batch as well. Evaluating the packet metadata and formating the packet to be ready for stashing into DMA was one thing i could do outside of holding the lock easily - and running that in loop of 100 packets amortizes the instruction cache and allows me (when i get to it) to hold the lock for a lot less computation. Do you see any contention for tx_lock which can justify having a prep handler? As I understood, no other CPU can be in the xmit code at the same time since RUNNING bit is held. Hence getting this lock early or late should not matter for the xmit side (and you are also holding dev-queue_lock while running prep so no new skbs can also be added to the dev during this time). And I couldn't find any driver that uses the same tx_lock on rx, so where is the saving by doing prep ? - KK - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [WIP][PATCHES] Network xmit batching - tg3 support
On Mon, 2007-07-02 at 14:20 -0700, Matt Carlson wrote: Also, I think the count, max_per_txd, and nr_frags fields of the tg3_tx_cbdata struct are not needed. The count field is not needed also. +struct tg3_tx_cbdata { + u32 base_flags; + int count; + unsigned int max_per_txd; + unsigned int nr_frags; + unsigned int mss; +}; +#define TG3_SKB_CB(__skb) ((struct tg3_tx_cbdata *)((__skb)-cb[0])) Only base_flags and mss are needed and these can be determined right before sending the frame. So is it better not to store these in the skb-cb at all? @@ -3118,12 +3120,16 @@ static void tg3_tx(struct tg3 *tp) */ smp_mb(); + dcount = tg3_tx_avail(tp); if (unlikely(netif_queue_stopped(tp-dev) -(tg3_tx_avail(tp) TG3_TX_WAKEUP_THRESH(tp { +(dcount TG3_TX_WAKEUP_THRESH(tp { netif_tx_lock(tp-dev); + tp-dev-xmit_win = 1; if (netif_queue_stopped(tp-dev) - (tg3_tx_avail(tp) TG3_TX_WAKEUP_THRESH(tp))) + (dcount TG3_TX_WAKEUP_THRESH(tp))) { netif_wake_queue(tp-dev); + tp-dev-xmit_win = dcount; + } netif_tx_unlock(tp-dev); } } This is also not right. tg3_tx() runs without netif_tx_lock(). tg3_tx_avail() can change after you get the netif_tx_lock() and we must get the updated value again. If we just rely on dcount, we can call wake_queue() when the ring is full, or there may be no wakeup when the ring is empty. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[WIP][PATCHES] Network xmit batching - tg3 support
peoplez, I have added support for tg3 on batching. I see equivalent performance improvement for pktgen as i did with e1000 when using gige. I have only tested on two machines (one being a laptop which does 10/100Mbps). Unfortunately in both cases these are considered to be in the class of buggy tg3s (which take a longer code path). To the tg3 folks - can you double check if am off on something? I have split a few things that you may like as well. I havent upgraded the tree - it is still circa 2.6.22-rc4 based; at some point i will sync with Daves net-26 Anyone who has tg3 based hardware: I would appreciate any testing and results ... The git tree is at: git://git.kernel.org/pub/scm/linux/kernel/git/hadi/batch-lin26.git but i have attached the patch in case you just wanna stare. cheers, jamal commit 91859b60521653a2f72ac70dfe9bfada4fdb28cb Author: Jamal Hadi Salim [EMAIL PROTECTED] Date: Wed Jun 27 19:50:35 2007 -0400 [NET_BATCH] Add tg3 batch support Make tg3 use the batch api. I have tested on my old laptop and another server class machine; they all seem to work - unfortunately they are both considered old class tg3. I am sure theres improvements to be made, but this is as a functional good start. Signed-off-by: Jamal Hadi Salim [EMAIL PROTECTED] diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c index 2f31841..be03cbd 100644 --- a/drivers/net/tg3.c +++ b/drivers/net/tg3.c @@ -581,6 +581,7 @@ static inline void tg3_netif_stop(struct tg3 *tp) static inline void tg3_netif_start(struct tg3 *tp) { netif_wake_queue(tp-dev); + tp-dev-xmit_win = TG3_TX_RING_SIZE 2; /* NOTE: unconditional netif_wake_queue is only appropriate * so long as all callers are assured to have free tx slots * (such as after tg3_init_hw) @@ -3066,6 +3067,7 @@ static inline u32 tg3_tx_avail(struct tg3 *tp) */ static void tg3_tx(struct tg3 *tp) { + int dcount; u32 hw_idx = tp-hw_status-idx[0].tx_consumer; u32 sw_idx = tp-tx_cons; @@ -3118,12 +3120,16 @@ static void tg3_tx(struct tg3 *tp) */ smp_mb(); + dcount = tg3_tx_avail(tp); if (unlikely(netif_queue_stopped(tp-dev) -(tg3_tx_avail(tp) TG3_TX_WAKEUP_THRESH(tp { +(dcount TG3_TX_WAKEUP_THRESH(tp { netif_tx_lock(tp-dev); + tp-dev-xmit_win = 1; if (netif_queue_stopped(tp-dev) - (tg3_tx_avail(tp) TG3_TX_WAKEUP_THRESH(tp))) + (dcount TG3_TX_WAKEUP_THRESH(tp))) { netif_wake_queue(tp-dev); + tp-dev-xmit_win = dcount; + } netif_tx_unlock(tp-dev); } } @@ -3877,47 +3883,56 @@ static void tg3_set_txd(struct tg3 *tp, int entry, txd-vlan_tag = vlan_tag TXD_VLAN_TAG_SHIFT; } -/* hard_start_xmit for devices that don't have any bugs and - * support TG3_FLG2_HW_TSO_2 only. - */ -static int tg3_start_xmit(struct sk_buff *skb, struct net_device *dev) -{ - struct tg3 *tp = netdev_priv(dev); - dma_addr_t mapping; - u32 len, entry, base_flags, mss; - - len = skb_headlen(skb); +struct tg3_tx_cbdata { + u32 base_flags; + int count; + unsigned int max_per_txd; + unsigned int nr_frags; + unsigned int mss; +}; +#define TG3_SKB_CB(__skb) ((struct tg3_tx_cbdata *)((__skb)-cb[0])) +#define NETDEV_TX_DROPPED -5 - /* We are running in BH disabled context with netif_tx_lock -* and TX reclaim runs via tp-poll inside of a software -* interrupt. Furthermore, IRQ processing runs lockless so we have -* no IRQ context deadlocks to worry about either. Rejoice! -*/ - if (unlikely(tg3_tx_avail(tp) = (skb_shinfo(skb)-nr_frags + 1))) { - if (!netif_queue_stopped(dev)) { - netif_stop_queue(dev); +static int tg3_prep_bug_frame(struct sk_buff *skb, struct net_device *dev) +{ + struct tg3_tx_cbdata *cb = TG3_SKB_CB(skb); - /* This is a hard error, log it. */ - printk(KERN_ERR PFX %s: BUG! Tx Ring full when - queue awake!\n, dev-name); + cb-base_flags = 0; + cb-mss = skb_shinfo(skb)-gso_size; + if (cb-mss != 0) { + if (skb_header_cloned(skb) + pskb_expand_head(skb, 0, 0, GFP_ATOMIC)) { + dev_kfree_skb(skb); + return NETDEV_TX_DROPPED; } - return NETDEV_TX_BUSY; + + cb-base_flags |= (TXD_FLAG_CPU_PRE_DMA | + TXD_FLAG_CPU_POST_DMA); } - entry = tp-tx_prod; - base_flags = 0; - mss = 0; - if ((mss = skb_shinfo(skb)-gso_size) != 0) { + if (skb-ip_summed == CHECKSUM_PARTIAL) + cb-base_flags |=