Re: [WIP][PATCHES] Network xmit batching - tg3 support

2007-07-04 Thread jamal
On Wed, 2007-04-07 at 09:49 +0530, Krishna Kumar2 wrote:

> Do you see any contention for tx_lock which can justify having a prep
> handler? As I understood, no other CPU can be in the xmit code at the
> same time since RUNNING bit is held. Hence getting this lock early or
> late should not matter for the xmit side (and you are also holding
> dev->queue_lock while running prep so no new skbs can also be added to
> the dev during this time). And I couldn't find any driver that uses the
> same tx_lock on rx, 

On any non-LLTX driver:
netif_tx_lock() is grabbed on both tx and receive paths. 

I have to admit, the e1000 is too clever for its own good:
there is no documentation/description (and Herbert never explained) but
it seems to be able to play with only certain parts of the ring on tx
side and others on rx side so that a lock becomes unnecessary. So even
for non-LLTX, some lock needs to be held on rx just to be clean (and
theres a lot of grunty-ness against LLTX drivers out there).

> so where is the saving by doing prep ?

You would agree i think that if there is contention between two CPUs for
the same lock, gut feeling is there is benefit to do things outside the
lock when possible. This becomes more important with batching when
amortizing the cost of a lock.
Theres a little more than that and I could write up a longer description
if needed.

Also note: not all drivers may be able to move things outside of the
lock - infact the tun driver i converted is hard to do anything with
since it doesnt have any hardware features that makes it easy to move
things outside. For this reason this api is optional (and tun infact
doesnt use it); and as i have said a few times now, i will be more than
happy to get rid of it if experiments show it is unnecessary. For now my
take is someone needs to disprove it is valuable (most prefered with
experiments).

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [WIP][PATCHES] Network xmit batching - tg3 support

2007-07-03 Thread Krishna Kumar2
Hi Jamal,

J Hadi Salim <[EMAIL PROTECTED]> wrote on 07/03/2007 06:56:20 PM:

> On Mon, 2007-02-07 at 17:21 -0700, Michael Chan wrote:
>
> [Matt, please include the count in the fix per previous email]
> long answer:
> My goal with storing these values and computing them was to do certain
> things that dont require holding the netif_tx_lock within a batch as
> well. Evaluating the packet metadata and formating the packet to be
> ready for stashing into DMA was one thing i could do outside of holding
> the lock easily - and running that in loop of 100 packets amortizes the
> instruction cache and allows me (when i get to it) to hold the lock for
> a lot less computation.

Do you see any contention for tx_lock which can justify having a prep
handler? As I understood, no other CPU can be in the xmit code at the
same time since RUNNING bit is held. Hence getting this lock early or
late should not matter for the xmit side (and you are also holding
dev->queue_lock while running prep so no new skbs can also be added to
the dev during this time). And I couldn't find any driver that uses the
same tx_lock on rx, so where is the saving by doing prep ?

- KK

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [WIP][PATCHES] Network xmit batching - tg3 support

2007-07-03 Thread jamal
On Tue, 2007-03-07 at 12:31 -0700, Matt Carlson wrote:

> I had planned on using netperf, but pktgen sounds like a more controlled
> environment.  Thanks for the tip.

I can help more if you use pktgen - netperf is more involved. Also
pktgen is much closer to the driver so it would let you see closely
if any improvements are obvious or not. 

> Do you see any reason why we couldn't just add the VLAN code to the prep
> stage and simply overwrite that portion of the skb-cb?  Our driver would
> just store its value in the base_flags member of tg3_tx_cbdata.

That would be a much better approach;-> Go for it.

cheers,
jamal


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [WIP][PATCHES] Network xmit batching - tg3 support

2007-07-03 Thread David Miller
From: jamal <[EMAIL PROTECTED]>
Date: Tue, 03 Jul 2007 09:09:48 -0400

> [It sounds very dangerous to me the way skb->cb is being used by the
> vlan code (i.e requires human intervention/knowledge to catch it as an
> issue). I had no freaking idea the vlan code was using it. Maybe a huge
> comment somewhere on how these cbs are being used by drivers would help
> or even a registration on startup to just make sure there is no conflict
> at a layer (i have been meaning to do the later for years now). In any
> case this is is a digression]. 

This is how the VLAN layer passed in the tag for the transmit
function.

The CB is owned by the caller until you spam it with your local
data at your level, so if you need any information the previous
layer provides you have to sample it before using the ->cb[]

So you can simply fix this by reading the VLAN tag, then using
the ->cb[] however you like.

It's always been like this, don't be surprised :)
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [WIP][PATCHES] Network xmit batching - tg3 support

2007-07-03 Thread Matt Carlson
On Tue, 2007-07-03 at 09:09 -0400, jamal wrote:
> On Mon, 2007-02-07 at 14:20 -0700, Matt Carlson wrote:
> 
> 
> > 
> > Hi Jamal.  I'll be testing your patch soon,
> 
> much thanks. Please let me know if you need help while doing this.
> What tools are you planning to test with? I have tested this patch
> with pktgen on a dual opteron/tg3-buggy. 
> There is an outstanding issue in regards to clocksource - however the
> batch does well regardless of the clock source.

I had planned on using netperf, but pktgen sounds like a more controlled
environment.  Thanks for the tip.

> >  but I wanted to point out a
> > bug in the patch.  The patch defines TG3_SKB_CB() as follows :
> > 
> > #define TG3_SKB_CB(__skb) ((struct tg3_tx_cbdata *)&((__skb)->cb[0]))
> > 
> > This definition will collide with the VLAN macros if TG3_VLAN_TAG_USED
> > is set.  vlan_tx_tag_get() is defined as :
> > 
> > #define vlan_tx_tag_get(__skb)  (VLAN_TX_SKB_CB(__skb)->vlan_tag)
> > 
> > VLAN_TX_SKB_CB is defined as :
> > 
> > #define VLAN_TX_SKB_CB(__skb) \
> > ((struct vlan_skb_tx_cookie *)&((__skb)->cb[0]))
> > 
> 
> yikes. Thanks for catching that - I thought i had this pretty much
> covered after scanning the source. So that bug exists on the e1000 as
> well.
> [It sounds very dangerous to me the way skb->cb is being used by the
> vlan code (i.e requires human intervention/knowledge to catch it as an
> issue). I had no freaking idea the vlan code was using it. Maybe a huge
> comment somewhere on how these cbs are being used by drivers would help
> or even a registration on startup to just make sure there is no conflict
> at a layer (i have been meaning to do the later for years now). In any
> case this is is a digression]. 
> 
> In the meantime, changing it to use byte 8 and above should do it? i.e.
> #define TG3_SKB_CB(__skb) ((struct tg3_tx_cbdata *)&((__skb)->cb[8]))
> 
> There are 48 bytes there on the skb-cb, so there should be plenty.

Do you see any reason why we couldn't just add the VLAN code to the prep
stage and simply overwrite that portion of the skb-cb?  Our driver would
just store its value in the base_flags member of tg3_tx_cbdata.

> > Also, I think the count, max_per_txd, and nr_frags fields of the
> > tg3_tx_cbdata struct are not needed.
> 
> Yes, you are right. That was a result of the LinuxWay(tm) (aka cutnpaste
> from the e1000 which needs them), Can you send me a patch for that and
> TG3_SKB_CB? 

Once we iron out the skb-cb issue, sure.

> 
> cheers,
> jamal
> 
> 

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [WIP][PATCHES] Network xmit batching - tg3 support

2007-07-03 Thread jamal
On Mon, 2007-02-07 at 17:21 -0700, Michael Chan wrote:

[Matt, please include the count in the fix per previous email]

> Only base_flags and mss are needed and these can be determined right
> before sending the frame.  So is it better not to store these in the
> skb->cb at all?

long answer:
My goal with storing these values and computing them was to do certain
things that dont require holding the netif_tx_lock within a batch as
well. Evaluating the packet metadata and formating the packet to be
ready for stashing into DMA was one thing i could do outside of holding
the lock easily - and running that in loop of 100 packets amortizes the
instruction cache and allows me (when i get to it) to hold the lock for
a lot less computation.
In the e1000 for example i was able to go as far as computing how many
descriptors are needed per skb and stashing those in the skb->cb; the
way the tg3 is structured makes doing all that needing some big changes.
So to answer your question: It would be nice if i could keep the skb->cb
for now and we can get rid of it later if deemed a nuisance for
batching.

>  
> + dcount = tg3_tx_avail(tp);
>   if (unlikely(netif_queue_stopped(tp->dev) &&
> -  (tg3_tx_avail(tp) > TG3_TX_WAKEUP_THRESH(tp {
> +  (dcount > TG3_TX_WAKEUP_THRESH(tp {
>   netif_tx_lock(tp->dev);
> + tp->dev->xmit_win = 1;
>   if (netif_queue_stopped(tp->dev) &&
> - (tg3_tx_avail(tp) > TG3_TX_WAKEUP_THRESH(tp)))
> + (dcount > TG3_TX_WAKEUP_THRESH(tp))) {
>   netif_wake_queue(tp->dev);
> + tp->dev->xmit_win = dcount;
> + }
>   netif_tx_unlock(tp->dev);
>   }
>  }
> 
> This is also not right.  tg3_tx() runs without netif_tx_lock().
> tg3_tx_avail() can change after you get the netif_tx_lock() and we must
> get the updated value again.  If we just rely on dcount, we can call
> wake_queue() when the ring is full, or there may be no wakeup when the
> ring is empty.

You are right, I was trying to be a smart-ass there so i didnt have to
fold the lines (for readability). Can you or Matt restore it to the way
it was originally while keeping the xmit_win update and just send me a
patch?

Thanks a lot Michael and Matt for looking at this and improving it.
Maybe i should switch all my nics to tg3 from now on ;->

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [WIP][PATCHES] Network xmit batching - tg3 support

2007-07-03 Thread jamal
On Mon, 2007-02-07 at 14:20 -0700, Matt Carlson wrote:


> 
> Hi Jamal.  I'll be testing your patch soon,

much thanks. Please let me know if you need help while doing this.
What tools are you planning to test with? I have tested this patch
with pktgen on a dual opteron/tg3-buggy. 
There is an outstanding issue in regards to clocksource - however the
batch does well regardless of the clock source.

>  but I wanted to point out a
> bug in the patch.  The patch defines TG3_SKB_CB() as follows :
> 
> #define TG3_SKB_CB(__skb) ((struct tg3_tx_cbdata *)&((__skb)->cb[0]))
> 
> This definition will collide with the VLAN macros if TG3_VLAN_TAG_USED
> is set.  vlan_tx_tag_get() is defined as :
> 
> #define vlan_tx_tag_get(__skb)  (VLAN_TX_SKB_CB(__skb)->vlan_tag)
> 
> VLAN_TX_SKB_CB is defined as :
> 
> #define VLAN_TX_SKB_CB(__skb) \
> ((struct vlan_skb_tx_cookie *)&((__skb)->cb[0]))
> 

yikes. Thanks for catching that - I thought i had this pretty much
covered after scanning the source. So that bug exists on the e1000 as
well.
[It sounds very dangerous to me the way skb->cb is being used by the
vlan code (i.e requires human intervention/knowledge to catch it as an
issue). I had no freaking idea the vlan code was using it. Maybe a huge
comment somewhere on how these cbs are being used by drivers would help
or even a registration on startup to just make sure there is no conflict
at a layer (i have been meaning to do the later for years now). In any
case this is is a digression]. 

In the meantime, changing it to use byte 8 and above should do it? i.e.
#define TG3_SKB_CB(__skb) ((struct tg3_tx_cbdata *)&((__skb)->cb[8]))

There are 48 bytes there on the skb-cb, so there should be plenty.

> Also, I think the count, max_per_txd, and nr_frags fields of the
> tg3_tx_cbdata struct are not needed.

Yes, you are right. That was a result of the LinuxWay(tm) (aka cutnpaste
from the e1000 which needs them), Can you send me a patch for that and
TG3_SKB_CB? 

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [WIP][PATCHES] Network xmit batching - tg3 support

2007-07-02 Thread Michael Chan
On Mon, 2007-07-02 at 14:20 -0700, Matt Carlson wrote:

> 
> Also, I think the count, max_per_txd, and nr_frags fields of the
> tg3_tx_cbdata struct are not needed.

The count field is not needed also.

+struct tg3_tx_cbdata {
+   u32 base_flags;
+   int count;
+   unsigned int max_per_txd;
+   unsigned int nr_frags;
+   unsigned int mss;
+};
+#define TG3_SKB_CB(__skb)   ((struct tg3_tx_cbdata *)&((__skb)->cb[0]))

Only base_flags and mss are needed and these can be determined right
before sending the frame.  So is it better not to store these in the
skb->cb at all?

@@ -3118,12 +3120,16 @@ static void tg3_tx(struct tg3 *tp)
 */
smp_mb();
 
+   dcount = tg3_tx_avail(tp);
if (unlikely(netif_queue_stopped(tp->dev) &&
-(tg3_tx_avail(tp) > TG3_TX_WAKEUP_THRESH(tp {
+(dcount > TG3_TX_WAKEUP_THRESH(tp {
netif_tx_lock(tp->dev);
+   tp->dev->xmit_win = 1;
if (netif_queue_stopped(tp->dev) &&
-   (tg3_tx_avail(tp) > TG3_TX_WAKEUP_THRESH(tp)))
+   (dcount > TG3_TX_WAKEUP_THRESH(tp))) {
netif_wake_queue(tp->dev);
+   tp->dev->xmit_win = dcount;
+   }
netif_tx_unlock(tp->dev);
}
 }

This is also not right.  tg3_tx() runs without netif_tx_lock().
tg3_tx_avail() can change after you get the netif_tx_lock() and we must
get the updated value again.  If we just rely on dcount, we can call
wake_queue() when the ring is full, or there may be no wakeup when the
ring is empty.



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [WIP][PATCHES] Network xmit batching - tg3 support

2007-07-02 Thread Matt Carlson
On Wed, 2007-06-27 at 20:05 -0400, jamal wrote:
> peoplez,
> 
> I have added support for tg3 on batching. I see equivalent performance
> improvement for pktgen as i did with e1000 when using gige. 
> I have only tested on two machines (one being a laptop which does
> 10/100Mbps). Unfortunately in both cases these are considered to be in
> the class of  "buggy" tg3s (which take a longer code path).
> 
> To the tg3 folks - can you double check if am off on something?
> I have split a few things that you may like as well.
> I havent upgraded the tree - it is still circa 2.6.22-rc4 based; at some
> point i will sync with Daves net-26
> 
> Anyone who has tg3 based hardware: I would appreciate any testing and
> results ...
> 
> The git tree is at:
> git://git.kernel.org/pub/scm/linux/kernel/git/hadi/batch-lin26.git
> but i have attached the patch in case you just wanna stare.
> 
> cheers,
> jamal

Hi Jamal.  I'll be testing your patch soon, but I wanted to point out a
bug in the patch.  The patch defines TG3_SKB_CB() as follows :

#define TG3_SKB_CB(__skb) ((struct tg3_tx_cbdata *)&((__skb)->cb[0]))

This definition will collide with the VLAN macros if TG3_VLAN_TAG_USED
is set.  vlan_tx_tag_get() is defined as :

#define vlan_tx_tag_get(__skb)  (VLAN_TX_SKB_CB(__skb)->vlan_tag)

VLAN_TX_SKB_CB is defined as :

#define VLAN_TX_SKB_CB(__skb) \
((struct vlan_skb_tx_cookie *)&((__skb)->cb[0]))

Also, I think the count, max_per_txd, and nr_frags fields of the
tg3_tx_cbdata struct are not needed.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html