Re: netif_tx_disable and lockless TX

2006-06-14 Thread jamal
On Wed, 2006-31-05 at 19:52 +0200, Robert Olsson wrote:
> jamal writes:
> 
>  > Latency-wise: TX completion interrupt provides the best latency.
>  > Processing in the poll() -aka softirq- was almost close to the hardirq
>  > variant. So if you can make things run in a softirq such as transmit
>  > one, then the numbers will likely stay the same.
>  
>  I don't remember we tried tasklet for TX a la Herbert's suggestion but we 
>  used use tasklets for controlling RX processing to avoid hardirq livelock
>  in pre-NAPI versions.
> 

Hrm - it may have been a private thing i did then. I could swear we did
that experiment together ...
Perhaps Herbert's motivation was not really to optimize but rather to
get something unstuck in the transmit path state machine maybe in a
context of netconsole? The conditions for which that tasklet would even
run require a CPU collision to the transmit. Sorry, I didnt quiet follow
the motivation/discussion that ended in that patch.

>  Had variants of tulip driver with both TX cleaning at ->poll and TX
>  cleaning at hardirq and didn't see any performance difference. The 
>  ->poll was much cleaner but we kept Alexey's original work for tulip.
> 

It certainly is cleaner - but i do recall the hardirq variant had better
latency much observable under high packet rates aka small packets. 

>  > Sorry, I havent been following discussions on netchannels[1] so i am not
>  > qualified to comment on the "replacement" part Dave mentioned earlier.
>  > What I can say is the tx processing doesnt have to be part of the NAPI
>  > poll() and still use hardirq.
> 
>  Yes true but I see TX numbers with newer boards (wire rate small pakets)
>  with cleaing in ->poll. Also now linux is very safe in network "overload" 
>  situations. Moving work to hardirq may change that.
> 

Oh, I am not suggesting a change - i am a lot more conservative than
that ;-> these areas are delicate (not code-delicate Acme ;->) but
rather what seems obvious requires a lot of experimental results first.

Robert, your transmit results Intel or AMD based?

cheers,
jamal



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_tx_disable and lockless TX

2006-06-02 Thread Robert Olsson

Stephen Hemminger writes:

 > I also noticed that you really don't save much by doing TX cleaning at 
 > hardirq, because in hardirq you need to do dev_kfree_irq and that causes 
 > a softirq (for the routing case where users=1). So when routing it 
 > doesn't make much difference, both methods cause the softirq delayed 
 > processing to be invoked. For locally generated packets which are 
 > cloned, the hardirq will drop the ref count, and that is faster than 
 > doing the whole softirq round trip.

 Right. Also the other way around, repeated ->poll can avoid TX hardirq's.

 Cheers.
--ro
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_tx_disable and lockless TX

2006-06-01 Thread Stephen Hemminger

Robert Olsson wrote:

jamal writes:

 > Latency-wise: TX completion interrupt provides the best latency.
 > Processing in the poll() -aka softirq- was almost close to the hardirq
 > variant. So if you can make things run in a softirq such as transmit
 > one, then the numbers will likely stay the same.
 
 I don't remember we tried tasklet for TX a la Herbert's suggestion but we 
 used use tasklets for controlling RX processing to avoid hardirq livelock

 in pre-NAPI versions.

 Had variants of tulip driver with both TX cleaning at ->poll and TX
 cleaning at hardirq and didn't see any performance difference. The 
 ->poll was much cleaner but we kept Alexey's original work for tulip.


 > Sorry, I havent been following discussions on netchannels[1] so i am not
 > qualified to comment on the "replacement" part Dave mentioned earlier.
 > What I can say is the tx processing doesnt have to be part of the NAPI
 > poll() and still use hardirq.

 Yes true but I see TX numbers with newer boards (wire rate small pakets)
 with cleaing in ->poll. Also now linux is very safe in network "overload" 
 situations. Moving work to hardirq may change that.



  
I also noticed that you really don't save much by doing TX cleaning at 
hardirq, because in hardirq you need to do dev_kfree_irq and that causes 
a softirq (for the routing case where users=1). So when routing it 
doesn't make much difference, both methods cause the softirq delayed 
processing to be invoked. For locally generated packets which are 
cloned, the hardirq will drop the ref count, and that is faster than 
doing the whole softirq round trip.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_tx_disable and lockless TX

2006-05-31 Thread Herbert Xu
On Wed, May 31, 2006 at 05:27:53PM -0700, Michael Chan wrote:
> 
> Now that I think about it some more, we can eliminate most of the
> tx_lock in tg3 without converting to xmit_lock. In most places, we get
> the tx_lock after we call netif_tx_disable(), etc, and quiesce the chip.
> So the tx_lock or xmit_lock is really not necessary in most places. The
> tp->lock is sufficient after we stop everything.

Sounds good.

> The only few remaining places where we need it is around
> netif_stop_queue/netif_wake_queue in hard_start_xmit and tx completion.
> We can either keep the tx_lock or convert to xmit_lock and either way
> makes little difference because these are not in the normal fast path.
> If we keep the tx_lock, it will be just like BNX2.

Now that I look at it the BNX2 method looks perfect.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_tx_disable and lockless TX

2006-05-31 Thread Michael Chan
On Thu, 2006-06-01 at 10:42 +1000, Herbert Xu wrote:
> On Wed, May 31, 2006 at 04:01:06PM -0700, Michael Chan wrote:
> > 
> > Oh, I thought your idea was to keep the LLTX and just replace tx_lock
> > with xmit_lock in tg3. But I suppose we can also clear LLTX, remove the
> > tx_lock in hard_start_xmit and convert the rest to avoid changing
> > netpoll.
> 
> That's what I meant.  This has the advantage that things like AF_PACKET
> won't see duplicate packets.

Now that I think about it some more, we can eliminate most of the
tx_lock in tg3 without converting to xmit_lock. In most places, we get
the tx_lock after we call netif_tx_disable(), etc, and quiesce the chip.
So the tx_lock or xmit_lock is really not necessary in most places. The
tp->lock is sufficient after we stop everything.

The only few remaining places where we need it is around
netif_stop_queue/netif_wake_queue in hard_start_xmit and tx completion.
We can either keep the tx_lock or convert to xmit_lock and either way
makes little difference because these are not in the normal fast path.
If we keep the tx_lock, it will be just like BNX2.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_tx_disable and lockless TX

2006-05-31 Thread Herbert Xu
On Wed, May 31, 2006 at 04:01:06PM -0700, Michael Chan wrote:
> 
> Oh, I thought your idea was to keep the LLTX and just replace tx_lock
> with xmit_lock in tg3. But I suppose we can also clear LLTX, remove the
> tx_lock in hard_start_xmit and convert the rest to avoid changing
> netpoll.

That's what I meant.  This has the advantage that things like AF_PACKET
won't see duplicate packets.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_tx_disable and lockless TX

2006-05-31 Thread Michael Chan
On Thu, 2006-06-01 at 10:25 +1000, Herbert Xu wrote:
> On Wed, May 31, 2006 at 05:09:08PM -0700, David Miller wrote:
> > From: "Michael Chan" <[EMAIL PROTECTED]>
> > Date: Wed, 31 May 2006 14:20:29 -0700
> > 
> > > David, So do we want to fix this issue as proposed by Herbert to replace
> > > tx_lock with xmit_lock? It seems quite straightforward to do. For this
> > > change to work, netpoll also needs to be fixed up a bit to check for
> > > LLTX before getting the xmit_lock.
> > 
> > Oh yes, netpoll needs changes, thanks for noticing that.
> 
> Why does it need to change? The idea is to take the LLTX flag off tg3
> after the conversion to xmit_lock.

Oh, I thought your idea was to keep the LLTX and just replace tx_lock
with xmit_lock in tg3. But I suppose we can also clear LLTX, remove the
tx_lock in hard_start_xmit and convert the rest to avoid changing
netpoll.


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_tx_disable and lockless TX

2006-05-31 Thread Herbert Xu
On Wed, May 31, 2006 at 05:09:08PM -0700, David Miller wrote:
> From: "Michael Chan" <[EMAIL PROTECTED]>
> Date: Wed, 31 May 2006 14:20:29 -0700
> 
> > David, So do we want to fix this issue as proposed by Herbert to replace
> > tx_lock with xmit_lock? It seems quite straightforward to do. For this
> > change to work, netpoll also needs to be fixed up a bit to check for
> > LLTX before getting the xmit_lock.
> 
> Oh yes, netpoll needs changes, thanks for noticing that.

Why does it need to change? The idea is to take the LLTX flag off tg3
after the conversion to xmit_lock.

> I think this netpoll wrinkle means we also have to make
> sure to set the xmit_lock_owner across the board.

You're right.  In fact this can deadlock today for those drivers that
already make use of xmit_lock without setting the owner.  So I suppose
something like net_xmit_lock to obtain xmit_lock is called for.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_tx_disable and lockless TX

2006-05-31 Thread David Miller
From: "Michael Chan" <[EMAIL PROTECTED]>
Date: Wed, 31 May 2006 14:20:29 -0700

> David, So do we want to fix this issue as proposed by Herbert to replace
> tx_lock with xmit_lock? It seems quite straightforward to do. For this
> change to work, netpoll also needs to be fixed up a bit to check for
> LLTX before getting the xmit_lock.

Oh yes, netpoll needs changes, thanks for noticing that.

That's why I had the idea to make xmit_lock a pointer, so
that things like that netpoll case could be transparent.

I think this netpoll wrinkle means we also have to make
sure to set the xmit_lock_owner across the board.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_tx_disable and lockless TX

2006-05-31 Thread Michael Chan
On Tue, 2006-05-30 at 22:11 -0700, David Miller wrote:
> From: Herbert Xu <[EMAIL PROTECTED]>
> Date: Wed, 31 May 2006 14:58:11 +1000
> 
> > Yes, TG3 does not disable IRQs when taking its TX lock.  So do you see
> > any problems with replacing the TG3 TX lock using xmit_lock?
> 
> I don't see any.
> 
David, So do we want to fix this issue as proposed by Herbert to replace
tx_lock with xmit_lock? It seems quite straightforward to do. For this
change to work, netpoll also needs to be fixed up a bit to check for
LLTX before getting the xmit_lock.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_tx_disable and lockless TX

2006-05-31 Thread Robert Olsson

jamal writes:

 > Latency-wise: TX completion interrupt provides the best latency.
 > Processing in the poll() -aka softirq- was almost close to the hardirq
 > variant. So if you can make things run in a softirq such as transmit
 > one, then the numbers will likely stay the same.
 
 I don't remember we tried tasklet for TX a la Herbert's suggestion but we 
 used use tasklets for controlling RX processing to avoid hardirq livelock
 in pre-NAPI versions.

 Had variants of tulip driver with both TX cleaning at ->poll and TX
 cleaning at hardirq and didn't see any performance difference. The 
 ->poll was much cleaner but we kept Alexey's original work for tulip.

 > Sorry, I havent been following discussions on netchannels[1] so i am not
 > qualified to comment on the "replacement" part Dave mentioned earlier.
 > What I can say is the tx processing doesnt have to be part of the NAPI
 > poll() and still use hardirq.

 Yes true but I see TX numbers with newer boards (wire rate small pakets)
 with cleaing in ->poll. Also now linux is very safe in network "overload" 
 situations. Moving work to hardirq may change that.

 Cheers.

--ro
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_tx_disable and lockless TX

2006-05-31 Thread jamal
On Wed, 2006-31-05 at 22:40 +1000, Herbert Xu wrote:
> On Wed, May 31, 2006 at 08:36:12AM -0400, jamal wrote:
> >
[..]
> > 
> > Been done in the past, bad numbers especially in SMP for reasons of
> > latency and likelihood that a tasklet will run in a totally different
> > CPU.
> 
> Was the bad latency measured with code like the above or was it with
> an unconditional tasklet schedule?
> 

The idea was to prune those skbs for reuse within reason so as to
not sacrifice perfomance. Perfomance is sacrificed if you have too many
TX EOLs, or you dont prune the descriptors fast enough. The dynamics of
proved to be hard to nail in a generic way because behavior varies based
on traffic patterns.
So the "within reason" part will unconditionally schedule a tasklet on
the tx if certain DMA ring thresholds are crossed, way after the tx lock
has been grabbed which is different from your suggestion to do it when
you cant grab a lock.
Otherwise you let the hardirq take care of things. Tasklets proved to be
a bad idea and instead we ended reclaiming the skb/descriptors right at
the point where  we initially had scheduled tasklets. I dont know what
happened to using that technique - Robert can testify better than i.

> Please note that my code will do the completion in the IRQ handler
> most of the time.  Only if there is contention for the spin lock will
> it defer work to the softirq.

that part is different and I think there may be potential - it will
require experimenting with a variety of traffic patterns. I wish i had
time to help.

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_tx_disable and lockless TX

2006-05-31 Thread Herbert Xu
On Wed, May 31, 2006 at 08:36:12AM -0400, jamal wrote:
>
> > The IRQ handler would look like
> > 
> > if (!spin_trylock(&dev->xmit_lock)) {
> > tasklet_schedule(&priv->tx_completion_tasklet);
> > return;
> > }
> > 
> > handle_tx_completion();
> > spin_unlock(&dev->xmit_lock);
> > 
> > Where the TX completion tasklet would simply do
> > 
> > spin_lock(&dev->xmit_lock);
> > handle_tx_completion();
> > spin_unlock(&dev->xmit_lock);
> > 
> > What do you think?
> 
> Been done in the past, bad numbers especially in SMP for reasons of
> latency and likelihood that a tasklet will run in a totally different
> CPU.

Was the bad latency measured with code like the above or was it with
an unconditional tasklet schedule?

Please note that my code will do the completion in the IRQ handler
most of the time.  Only if there is contention for the spin lock will
it defer work to the softirq.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_tx_disable and lockless TX

2006-05-31 Thread jamal
On Wed, 2006-31-05 at 22:06 +1000, Herbert Xu wrote:
> On Wed, May 31, 2006 at 12:08:18AM -0700, David Miller wrote:
> > 
> > I understand what you're saying.  What I'm talking about is that in
> > net channel drivers we might go back to IRQ disabling locks again.
> 
> OK, let's assume that the TX completion will go back into the IRQ
> handler.  I contend that we can still get by with a BH-disabling
> xmit_lock.  This is how it would work:
> 
> The IRQ handler would look like
> 
>   if (!spin_trylock(&dev->xmit_lock)) {
>   tasklet_schedule(&priv->tx_completion_tasklet);
>   return;
>   }
> 
>   handle_tx_completion();
>   spin_unlock(&dev->xmit_lock);
> 
> Where the TX completion tasklet would simply do
> 
>   spin_lock(&dev->xmit_lock);
>   handle_tx_completion();
>   spin_unlock(&dev->xmit_lock);
> 
> What do you think?

Been done in the past, bad numbers especially in SMP for reasons of
latency and likelihood that a tasklet will run in a totally different
CPU.

Latency-wise: TX completion interrupt provides the best latency.
Processing in the poll() -aka softirq- was almost close to the hardirq
variant. So if you can make things run in a softirq such as transmit
one, then the numbers will likely stay the same.

Sorry, I havent been following discussions on netchannels[1] so i am not
qualified to comment on the "replacement" part Dave mentioned earlier.
What I can say is the tx processing doesnt have to be part of the NAPI
poll() and still use hardirq.

If you look at the earlier NAPI drivers such as the tulip or the
broadcom 1250, the TX EOL was always made an exception and was never
touched by the ->poll() code rather by the main interrupt routine. I do
prefer the scheme of leaving the TX EOL out of the NAPI poll because i
believe that it provides better performance. The caveat is there's a lot
of fscked hardware out there that does unconditional clear-on-read of
the int status register;-> I had a lot of fun with bcm1250 and ended
discovering that there were infact two status regs ;-> (it pays to know
people for undocumented things;->) one was for debug version which
doesnt clear on writing. So in one spot i would read the real register
and in the other the debug version. I had numbers which showed doing it
this way provided better performance than doing it in the poll() routine
(scouring the code just now it seems my version never made it in, so you
will see something along the lines of e1000/tg3).

BTW, Andi Kleen on CC is the person who posted some numbers on the LLTX
(as well as the patch) at some point.
I have also CCed Robert who may have comments on the use of tasklets. 

cheers,
jamal

[1] Such a shame, such an exciting topic, such few cycles. I will have
time RealSoonNow and you will probably see a_patch_from_me_too.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_tx_disable and lockless TX

2006-05-31 Thread Herbert Xu
On Wed, May 31, 2006 at 12:08:18AM -0700, David Miller wrote:
> 
> I understand what you're saying.  What I'm talking about is that in
> net channel drivers we might go back to IRQ disabling locks again.

OK, let's assume that the TX completion will go back into the IRQ
handler.  I contend that we can still get by with a BH-disabling
xmit_lock.  This is how it would work:

The IRQ handler would look like

if (!spin_trylock(&dev->xmit_lock)) {
tasklet_schedule(&priv->tx_completion_tasklet);
return;
}

handle_tx_completion();
spin_unlock(&dev->xmit_lock);

Where the TX completion tasklet would simply do

spin_lock(&dev->xmit_lock);
handle_tx_completion();
spin_unlock(&dev->xmit_lock);

What do you think?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_tx_disable and lockless TX

2006-05-31 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 31 May 2006 16:31:52 +1000

> On Tue, May 30, 2006 at 11:26:26PM -0700, David Miller wrote:
> > 
> > I don't think it will be worthwhile to keep NAPI around just for
> > TX completion.  Sure the dev_kfree_skb() will schedule software
> > interrupt work to do the actual free, but the TX ring walking
> > and dev_kfree_skb() calls will be in hard IRQ context.
> 
> Sure we won't need all of NAPI.  But would it be that bad to schedule
> a tasklet to do the TX completion?

It seems just an extra level of indirection. :)

> Oh, I'm not proposing that we disable IRQs on xmit_lock at all.
 ...

I understand what you're saying.  What I'm talking about is that in
net channel drivers we might go back to IRQ disabling locks again.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_tx_disable and lockless TX

2006-05-30 Thread Herbert Xu
On Tue, May 30, 2006 at 11:26:26PM -0700, David Miller wrote:
> 
> I don't think it will be worthwhile to keep NAPI around just for
> TX completion.  Sure the dev_kfree_skb() will schedule software
> interrupt work to do the actual free, but the TX ring walking
> and dev_kfree_skb() calls will be in hard IRQ context.

Sure we won't need all of NAPI.  But would it be that bad to schedule
a tasklet to do the TX completion?

> > > The only regret I have about that is we will go back to not being able
> > > to profile ->hard_start_xmit() very well in such drivers.
> > 
> > Can you elborate on that? I think I've already removed all references
> > to this in my memory :)
> 
> If you disable IRQs in the ->hard_start_xmit() handler, you don't
> get timer based profiling ticks.   Currently we do.

Oh, I'm not proposing that we disable IRQs on xmit_lock at all.  What
I'm saying is that for tg3, since both xmit_lock and tx_lock are BH-
disabling locks, they are equivalent and therefore we can replace
tx_lock with xmit_lock.

For other lockless NICs that have IRQ disabling tx_locks, they can
be converted to BH-disabling ones and then be able to use xmit_lock.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_tx_disable and lockless TX

2006-05-30 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 31 May 2006 15:14:51 +1000

> On Tue, May 30, 2006 at 10:11:17PM -0700, David Miller wrote:
> > 
> > It will come back when net channels arrive :-) This is because net
> > channel drivers will do all the work in hard IRQs and not in a NAPI or
> > NAPI-like context.
> 
> I thought the current channel stuff is RX only.  Is TX completion moving
> to IRQ context as well?

I don't think it will be worthwhile to keep NAPI around just for
TX completion.  Sure the dev_kfree_skb() will schedule software
interrupt work to do the actual free, but the TX ring walking
and dev_kfree_skb() calls will be in hard IRQ context.

> > The only regret I have about that is we will go back to not being able
> > to profile ->hard_start_xmit() very well in such drivers.
> 
> Can you elborate on that? I think I've already removed all references
> to this in my memory :)

If you disable IRQs in the ->hard_start_xmit() handler, you don't
get timer based profiling ticks.   Currently we do.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_tx_disable and lockless TX

2006-05-30 Thread Herbert Xu
On Tue, May 30, 2006 at 10:30:25PM -0700, Michael Chan wrote:
> 
> I also don't see any problem. It looks like we don't have to set
> xmit_lock_owner, right?

Right, unless you want to call dev_queue_xmit recursively :)
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_tx_disable and lockless TX

2006-05-30 Thread Michael Chan
David Miller wrote:

> From: Herbert Xu <[EMAIL PROTECTED]>
> Date: Wed, 31 May 2006 14:58:11 +1000
> 
> > Yes, TG3 does not disable IRQs when taking its TX lock.  So do you
see
> > any problems with replacing the TG3 TX lock using xmit_lock?
> 
> I don't see any.
> 

I also don't see any problem. It looks like we don't have to set
xmit_lock_owner, right?

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_tx_disable and lockless TX

2006-05-30 Thread Herbert Xu
On Tue, May 30, 2006 at 10:11:17PM -0700, David Miller wrote:
> 
> It will come back when net channels arrive :-) This is because net
> channel drivers will do all the work in hard IRQs and not in a NAPI or
> NAPI-like context.

I thought the current channel stuff is RX only.  Is TX completion moving
to IRQ context as well?

> The only regret I have about that is we will go back to not being able
> to profile ->hard_start_xmit() very well in such drivers.

Can you elborate on that? I think I've already removed all references
to this in my memory :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_tx_disable and lockless TX

2006-05-30 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 31 May 2006 14:58:11 +1000

> Hi Michael:
> 
> On Tue, May 30, 2006 at 09:51:03PM -0700, Michael Chan wrote:
> >
> > > That's why I suggest that every NIC that uses this feature be forced
> > > to do what TG3 does so only BH disabling is needed.  Once that's done
> > > they can just use xmit_lock and everyone will be happy again.
> > 
> > As long as the tx completion is all done in NAPI, it can use BH
> > disabling
> > without irq disabling.
> 
> Yes, TG3 does not disable IRQs when taking its TX lock.  So do you see
> any problems with replacing the TG3 TX lock using xmit_lock?

I don't see any.

Thanks for reminding me about the IRQ vs. BH disabling issue.

It will come back when net channels arrive :-) This is because net
channel drivers will do all the work in hard IRQs and not in a NAPI or
NAPI-like context.

The only regret I have about that is we will go back to not being able
to profile ->hard_start_xmit() very well in such drivers.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_tx_disable and lockless TX

2006-05-30 Thread Herbert Xu
Hi Michael:

On Tue, May 30, 2006 at 09:51:03PM -0700, Michael Chan wrote:
>
> > That's why I suggest that every NIC that uses this feature be forced
> > to do what TG3 does so only BH disabling is needed.  Once that's done
> > they can just use xmit_lock and everyone will be happy again.
> 
> As long as the tx completion is all done in NAPI, it can use BH
> disabling
> without irq disabling.

Yes, TG3 does not disable IRQs when taking its TX lock.  So do you see
any problems with replacing the TG3 TX lock using xmit_lock?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_tx_disable and lockless TX

2006-05-30 Thread Michael Chan
Herbert Xu wrote:

> On Tue, May 30, 2006 at 09:13:22PM -0700, David Miller wrote:
> > As per the bug, I always keep thinking about changing how we
> > do the lockless stuff by simply making xmit_lock a pointer.
> > Drivers like tg3 can make it point to their local driver lock.
> 
> This is equivalent to just using the xmit_lock in the driver, right?
> IIRC the problem was with IRQ disabling vs. BH disabling.

For drivers that always take a private tx lock between tx completion
and hard_start_xmit on top of the xmit_lock, getting rid of the
xmit_lock using LLTX is a net gain.

For drivers that don't need a private tx lock or rarely need it between
tx completion and hard_start_xmit, it makes little difference whether
you use xmit_lock or private lock with LLTX.

> 
> That's why I suggest that every NIC that uses this feature be forced
> to do what TG3 does so only BH disabling is needed.  Once that's done
> they can just use xmit_lock and everyone will be happy again.
> 

As long as the tx completion is all done in NAPI, it can use BH
disabling
without irq disabling.

TG3 uses LLTX with a private tx_lock. BNX2 uses xmit_lock with a private
tx_lock that is rarely used. Both drivers use BH disabling. The profiles
of the 2 drivers are very similar.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_tx_disable and lockless TX

2006-05-30 Thread Herbert Xu
On Tue, May 30, 2006 at 09:13:22PM -0700, David Miller wrote:
> 
> There were numbers posted for real drivers at one point, I
> do remember that, and the gains were non-trivial.  At least
> enough to justify the feature.

I wonder if there are other ways of achieving the same improvement.

> As per the bug, I always keep thinking about changing how we
> do the lockless stuff by simply making xmit_lock a pointer.
> Drivers like tg3 can make it point to their local driver lock.

This is equivalent to just using the xmit_lock in the driver, right?
IIRC the problem was with IRQ disabling vs. BH disabling.

That's why I suggest that every NIC that uses this feature be forced
to do what TG3 does so only BH disabling is needed.  Once that's done
they can just use xmit_lock and everyone will be happy again.

> This leaves still the cases like loopback which do no locking
> whatsoever.  I don't know how to fit it into the above idea.

As far as I'm concerned loopback can do whatever it wants as long as
it doesn't create bugs in real NICs :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_tx_disable and lockless TX

2006-05-30 Thread Roland Dreier
Herbert> However, lockless drivers do not take the xmit_lock so
Herbert> this method is ineffective.  Such drivers need to do
Herbert> their own checking inside whatever locks that they do
Herbert> take.  For example, tg3 could get around this by checking
Herbert> whether the queue is stopped in its hard_start_xmit
Herbert> function.

Yes, I had to add this to the IPoIB driver, because calling
netif_stop_queue() when the transmit ring was full was sometimes still
allowing hard_start_xmit to be called again:

/*
 * Check if our queue is stopped.  Since we have the LLTX bit
 * set, we can't rely on netif_stop_queue() preventing our
 * xmit function from being called with a full queue.
 */
if (unlikely(netif_queue_stopped(dev))) {
spin_unlock_irqrestore(&priv->tx_lock, flags);
return NETDEV_TX_BUSY;
}

this bug started a long thread a while back, but I don't remember if
there was any resolution.

Herbert> I must say though that I'm becoming less and less
Herbert> impressed by the lockless feature based on the number of
Herbert> problems that it has caused.  Does anyone have any hard
Herbert> figures as to its effectiveness (excluding any stats
Herbert> relating to the loopback interface which can be easily
Herbert> separated from normal NIC drivers).

I don't have exact figures at hand, but I remember something like a 2 or
3 percent throughput improvement for IPoIB.

 - R.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_tx_disable and lockless TX

2006-05-30 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 31 May 2006 14:03:07 +1000

> I must say though that I'm becoming less and less impressed by the
> lockless feature based on the number of problems that it has caused.
> Does anyone have any hard figures as to its effectiveness (excluding
> any stats relating to the loopback interface which can be easily
> separated from normal NIC drivers).

There were numbers posted for real drivers at one point, I
do remember that, and the gains were non-trivial.  At least
enough to justify the feature.

As per the bug, I always keep thinking about changing how we
do the lockless stuff by simply making xmit_lock a pointer.
Drivers like tg3 can make it point to their local driver lock.

This leaves still the cases like loopback which do no locking
whatsoever.  I don't know how to fit it into the above idea.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


netif_tx_disable and lockless TX

2006-05-30 Thread Herbert Xu
Hi:

It has occured to me that the function netif_tx_disable may be unsafe
when used in drivers that do lockless transmission.  In particular,
the function relies on taking the xmit_lock to ensure that no further
transmissions will occur until the queue is started again.

However, lockless drivers do not take the xmit_lock so this method
is ineffective.  Such drivers need to do their own checking inside
whatever locks that they do take.  For example, tg3 could get around
this by checking whether the queue is stopped in its hard_start_xmit
function.

I must say though that I'm becoming less and less impressed by the
lockless feature based on the number of problems that it has caused.
Does anyone have any hard figures as to its effectiveness (excluding
any stats relating to the loopback interface which can be easily
separated from normal NIC drivers).

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html