Re: [WIP][PATCHES] Network xmit batching - tg3 support

2007-07-04 Thread jamal
On Wed, 2007-04-07 at 09:49 +0530, Krishna Kumar2 wrote:

 Do you see any contention for tx_lock which can justify having a prep
 handler? As I understood, no other CPU can be in the xmit code at the
 same time since RUNNING bit is held. Hence getting this lock early or
 late should not matter for the xmit side (and you are also holding
 dev-queue_lock while running prep so no new skbs can also be added to
 the dev during this time). And I couldn't find any driver that uses the
 same tx_lock on rx, 

On any non-LLTX driver:
netif_tx_lock() is grabbed on both tx and receive paths. 

I have to admit, the e1000 is too clever for its own good:
there is no documentation/description (and Herbert never explained) but
it seems to be able to play with only certain parts of the ring on tx
side and others on rx side so that a lock becomes unnecessary. So even
for non-LLTX, some lock needs to be held on rx just to be clean (and
theres a lot of grunty-ness against LLTX drivers out there).

 so where is the saving by doing prep ?

You would agree i think that if there is contention between two CPUs for
the same lock, gut feeling is there is benefit to do things outside the
lock when possible. This becomes more important with batching when
amortizing the cost of a lock.
Theres a little more than that and I could write up a longer description
if needed.

Also note: not all drivers may be able to move things outside of the
lock - infact the tun driver i converted is hard to do anything with
since it doesnt have any hardware features that makes it easy to move
things outside. For this reason this api is optional (and tun infact
doesnt use it); and as i have said a few times now, i will be more than
happy to get rid of it if experiments show it is unnecessary. For now my
take is someone needs to disprove it is valuable (most prefered with
experiments).

cheers,
jamal

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [WIP][PATCHES] Network xmit batching - tg3 support

2007-07-03 Thread jamal
On Mon, 2007-02-07 at 17:21 -0700, Michael Chan wrote:

[Matt, please include the count in the fix per previous email]

 Only base_flags and mss are needed and these can be determined right
 before sending the frame.  So is it better not to store these in the
 skb-cb at all?

long answer:
My goal with storing these values and computing them was to do certain
things that dont require holding the netif_tx_lock within a batch as
well. Evaluating the packet metadata and formating the packet to be
ready for stashing into DMA was one thing i could do outside of holding
the lock easily - and running that in loop of 100 packets amortizes the
instruction cache and allows me (when i get to it) to hold the lock for
a lot less computation.
In the e1000 for example i was able to go as far as computing how many
descriptors are needed per skb and stashing those in the skb-cb; the
way the tg3 is structured makes doing all that needing some big changes.
So to answer your question: It would be nice if i could keep the skb-cb
for now and we can get rid of it later if deemed a nuisance for
batching.

  
 + dcount = tg3_tx_avail(tp);
   if (unlikely(netif_queue_stopped(tp-dev) 
 -  (tg3_tx_avail(tp)  TG3_TX_WAKEUP_THRESH(tp {
 +  (dcount  TG3_TX_WAKEUP_THRESH(tp {
   netif_tx_lock(tp-dev);
 + tp-dev-xmit_win = 1;
   if (netif_queue_stopped(tp-dev) 
 - (tg3_tx_avail(tp)  TG3_TX_WAKEUP_THRESH(tp)))
 + (dcount  TG3_TX_WAKEUP_THRESH(tp))) {
   netif_wake_queue(tp-dev);
 + tp-dev-xmit_win = dcount;
 + }
   netif_tx_unlock(tp-dev);
   }
  }
 
 This is also not right.  tg3_tx() runs without netif_tx_lock().
 tg3_tx_avail() can change after you get the netif_tx_lock() and we must
 get the updated value again.  If we just rely on dcount, we can call
 wake_queue() when the ring is full, or there may be no wakeup when the
 ring is empty.

You are right, I was trying to be a smart-ass there so i didnt have to
fold the lines (for readability). Can you or Matt restore it to the way
it was originally while keeping the xmit_win update and just send me a
patch?

Thanks a lot Michael and Matt for looking at this and improving it.
Maybe i should switch all my nics to tg3 from now on ;-

cheers,
jamal

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [WIP][PATCHES] Network xmit batching - tg3 support

2007-07-03 Thread David Miller
From: jamal [EMAIL PROTECTED]
Date: Tue, 03 Jul 2007 09:09:48 -0400

 [It sounds very dangerous to me the way skb-cb is being used by the
 vlan code (i.e requires human intervention/knowledge to catch it as an
 issue). I had no freaking idea the vlan code was using it. Maybe a huge
 comment somewhere on how these cbs are being used by drivers would help
 or even a registration on startup to just make sure there is no conflict
 at a layer (i have been meaning to do the later for years now). In any
 case this is is a digression]. 

This is how the VLAN layer passed in the tag for the transmit
function.

The CB is owned by the caller until you spam it with your local
data at your level, so if you need any information the previous
layer provides you have to sample it before using the -cb[]

So you can simply fix this by reading the VLAN tag, then using
the -cb[] however you like.

It's always been like this, don't be surprised :)
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [WIP][PATCHES] Network xmit batching - tg3 support

2007-07-03 Thread jamal
On Tue, 2007-03-07 at 12:31 -0700, Matt Carlson wrote:

 I had planned on using netperf, but pktgen sounds like a more controlled
 environment.  Thanks for the tip.

I can help more if you use pktgen - netperf is more involved. Also
pktgen is much closer to the driver so it would let you see closely
if any improvements are obvious or not. 

 Do you see any reason why we couldn't just add the VLAN code to the prep
 stage and simply overwrite that portion of the skb-cb?  Our driver would
 just store its value in the base_flags member of tg3_tx_cbdata.

That would be a much better approach;- Go for it.

cheers,
jamal


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [WIP][PATCHES] Network xmit batching - tg3 support

2007-07-03 Thread Krishna Kumar2
Hi Jamal,

J Hadi Salim [EMAIL PROTECTED] wrote on 07/03/2007 06:56:20 PM:

 On Mon, 2007-02-07 at 17:21 -0700, Michael Chan wrote:

 [Matt, please include the count in the fix per previous email]
 long answer:
 My goal with storing these values and computing them was to do certain
 things that dont require holding the netif_tx_lock within a batch as
 well. Evaluating the packet metadata and formating the packet to be
 ready for stashing into DMA was one thing i could do outside of holding
 the lock easily - and running that in loop of 100 packets amortizes the
 instruction cache and allows me (when i get to it) to hold the lock for
 a lot less computation.

Do you see any contention for tx_lock which can justify having a prep
handler? As I understood, no other CPU can be in the xmit code at the
same time since RUNNING bit is held. Hence getting this lock early or
late should not matter for the xmit side (and you are also holding
dev-queue_lock while running prep so no new skbs can also be added to
the dev during this time). And I couldn't find any driver that uses the
same tx_lock on rx, so where is the saving by doing prep ?

- KK

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [WIP][PATCHES] Network xmit batching - tg3 support

2007-07-02 Thread Michael Chan
On Mon, 2007-07-02 at 14:20 -0700, Matt Carlson wrote:

 
 Also, I think the count, max_per_txd, and nr_frags fields of the
 tg3_tx_cbdata struct are not needed.

The count field is not needed also.

+struct tg3_tx_cbdata {
+   u32 base_flags;
+   int count;
+   unsigned int max_per_txd;
+   unsigned int nr_frags;
+   unsigned int mss;
+};
+#define TG3_SKB_CB(__skb)   ((struct tg3_tx_cbdata *)((__skb)-cb[0]))

Only base_flags and mss are needed and these can be determined right
before sending the frame.  So is it better not to store these in the
skb-cb at all?

@@ -3118,12 +3120,16 @@ static void tg3_tx(struct tg3 *tp)
 */
smp_mb();
 
+   dcount = tg3_tx_avail(tp);
if (unlikely(netif_queue_stopped(tp-dev) 
-(tg3_tx_avail(tp)  TG3_TX_WAKEUP_THRESH(tp {
+(dcount  TG3_TX_WAKEUP_THRESH(tp {
netif_tx_lock(tp-dev);
+   tp-dev-xmit_win = 1;
if (netif_queue_stopped(tp-dev) 
-   (tg3_tx_avail(tp)  TG3_TX_WAKEUP_THRESH(tp)))
+   (dcount  TG3_TX_WAKEUP_THRESH(tp))) {
netif_wake_queue(tp-dev);
+   tp-dev-xmit_win = dcount;
+   }
netif_tx_unlock(tp-dev);
}
 }

This is also not right.  tg3_tx() runs without netif_tx_lock().
tg3_tx_avail() can change after you get the netif_tx_lock() and we must
get the updated value again.  If we just rely on dcount, we can call
wake_queue() when the ring is full, or there may be no wakeup when the
ring is empty.



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[WIP][PATCHES] Network xmit batching - tg3 support

2007-06-27 Thread jamal
peoplez,

I have added support for tg3 on batching. I see equivalent performance
improvement for pktgen as i did with e1000 when using gige. 
I have only tested on two machines (one being a laptop which does
10/100Mbps). Unfortunately in both cases these are considered to be in
the class of  buggy tg3s (which take a longer code path).

To the tg3 folks - can you double check if am off on something?
I have split a few things that you may like as well.
I havent upgraded the tree - it is still circa 2.6.22-rc4 based; at some
point i will sync with Daves net-26

Anyone who has tg3 based hardware: I would appreciate any testing and
results ...

The git tree is at:
git://git.kernel.org/pub/scm/linux/kernel/git/hadi/batch-lin26.git
but i have attached the patch in case you just wanna stare.

cheers,
jamal
commit 91859b60521653a2f72ac70dfe9bfada4fdb28cb
Author: Jamal Hadi Salim [EMAIL PROTECTED]
Date:   Wed Jun 27 19:50:35 2007 -0400

[NET_BATCH] Add tg3 batch support
Make tg3 use the batch api.

I have tested on my old laptop and another server class
machine; they all seem to work - unfortunately they are
both considered old class tg3.

I am sure theres improvements to be made, but this is as a
functional good start.

Signed-off-by: Jamal Hadi Salim [EMAIL PROTECTED]

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 2f31841..be03cbd 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -581,6 +581,7 @@ static inline void tg3_netif_stop(struct tg3 *tp)
 static inline void tg3_netif_start(struct tg3 *tp)
 {
netif_wake_queue(tp-dev);
+   tp-dev-xmit_win = TG3_TX_RING_SIZE  2;
/* NOTE: unconditional netif_wake_queue is only appropriate
 * so long as all callers are assured to have free tx slots
 * (such as after tg3_init_hw)
@@ -3066,6 +3067,7 @@ static inline u32 tg3_tx_avail(struct tg3 *tp)
  */
 static void tg3_tx(struct tg3 *tp)
 {
+   int dcount;
u32 hw_idx = tp-hw_status-idx[0].tx_consumer;
u32 sw_idx = tp-tx_cons;
 
@@ -3118,12 +3120,16 @@ static void tg3_tx(struct tg3 *tp)
 */
smp_mb();
 
+   dcount = tg3_tx_avail(tp);
if (unlikely(netif_queue_stopped(tp-dev) 
-(tg3_tx_avail(tp)  TG3_TX_WAKEUP_THRESH(tp {
+(dcount  TG3_TX_WAKEUP_THRESH(tp {
netif_tx_lock(tp-dev);
+   tp-dev-xmit_win = 1;
if (netif_queue_stopped(tp-dev) 
-   (tg3_tx_avail(tp)  TG3_TX_WAKEUP_THRESH(tp)))
+   (dcount  TG3_TX_WAKEUP_THRESH(tp))) {
netif_wake_queue(tp-dev);
+   tp-dev-xmit_win = dcount;
+   }
netif_tx_unlock(tp-dev);
}
 }
@@ -3877,47 +3883,56 @@ static void tg3_set_txd(struct tg3 *tp, int entry,
txd-vlan_tag = vlan_tag  TXD_VLAN_TAG_SHIFT;
 }
 
-/* hard_start_xmit for devices that don't have any bugs and
- * support TG3_FLG2_HW_TSO_2 only.
- */
-static int tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
-{
-   struct tg3 *tp = netdev_priv(dev);
-   dma_addr_t mapping;
-   u32 len, entry, base_flags, mss;
-
-   len = skb_headlen(skb);
+struct tg3_tx_cbdata {
+   u32 base_flags;
+   int count;
+   unsigned int max_per_txd;
+   unsigned int nr_frags;
+   unsigned int mss;
+};
+#define TG3_SKB_CB(__skb)   ((struct tg3_tx_cbdata *)((__skb)-cb[0]))
+#define NETDEV_TX_DROPPED   -5
 
-   /* We are running in BH disabled context with netif_tx_lock
-* and TX reclaim runs via tp-poll inside of a software
-* interrupt.  Furthermore, IRQ processing runs lockless so we have
-* no IRQ context deadlocks to worry about either.  Rejoice!
-*/
-   if (unlikely(tg3_tx_avail(tp) = (skb_shinfo(skb)-nr_frags + 1))) {
-   if (!netif_queue_stopped(dev)) {
-   netif_stop_queue(dev);
+static int tg3_prep_bug_frame(struct sk_buff *skb, struct net_device *dev)
+{
+   struct tg3_tx_cbdata *cb = TG3_SKB_CB(skb);
 
-   /* This is a hard error, log it. */
-   printk(KERN_ERR PFX %s: BUG! Tx Ring full when 
-  queue awake!\n, dev-name);
+   cb-base_flags = 0;
+   cb-mss = skb_shinfo(skb)-gso_size;
+   if (cb-mss != 0) {
+   if (skb_header_cloned(skb) 
+   pskb_expand_head(skb, 0, 0, GFP_ATOMIC)) {
+   dev_kfree_skb(skb);
+   return NETDEV_TX_DROPPED;
}
-   return NETDEV_TX_BUSY;
+
+   cb-base_flags |= (TXD_FLAG_CPU_PRE_DMA |
+  TXD_FLAG_CPU_POST_DMA);
}
 
-   entry = tp-tx_prod;
-   base_flags = 0;
-   mss = 0;
-   if ((mss = skb_shinfo(skb)-gso_size) != 0) {
+   if (skb-ip_summed == CHECKSUM_PARTIAL)
+   cb-base_flags |=