Re: [2.6 patch] net/llc/llc_conn.c: fix possible NULL dereference

2007-05-18 Thread David Miller
From: Eugene Teo <[EMAIL PROTECTED]>
Date: Sat, 19 May 2007 13:49:11 +0800

> Spotted by the Coverity checker.

Why am I not surprised :-(

There is no bug here, if Coverity warns every single time skb_peek()
is used and not tested against NULL, that's a very serious shortcoming
of Coverity or what it has been taught about SKB queues.

It needs to learn that skb_queue_len() != 0 implies skb_peek() will
not return NULL sans locking errors.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [2.6 patch] net/llc/llc_conn.c: fix possible NULL dereference

2007-05-18 Thread Eugene Teo
Randy Dunlap wrote:
> On Sat, 19 May 2007 13:13:07 +0800 Eugene Teo wrote:
> 
>> skb_peek() might return an empty list. skb should be checked before calling
>> llc_pdu_sn_hdr() with it.
>>
>> Spotted by the Coverity checker.
>>
>> Signed-off-by: Eugene Teo <[EMAIL PROTECTED]>
[...]
> 
> Oh, and your patch has spaces instead of tabs.  It's a hassle to
> get thunderbird to send a patch that preserves tabs.  See if this:
>http://mbligh.org/linuxdocs/Email/Clients/Thunderbird
> helps you any.

Here's a resend:

skb_peek() might return an empty list. skb should be checked before calling
llc_pdu_sn_hdr() with it.

Spotted by the Coverity checker.

Signed-off-by: Eugene Teo <[EMAIL PROTECTED]>
diff --git a/net/llc/llc_conn.c b/net/llc/llc_conn.c
index 3b8cfbe..6d3a07e 100644
--- a/net/llc/llc_conn.c
+++ b/net/llc/llc_conn.c
@@ -324,6 +324,8 @@ int llc_conn_remove_acked_pdus(struct sock *sk, u8 nr, u16 *how_many_unacked)
 	if (!q_len)
 		goto out;
 	skb = skb_peek(&llc->pdu_unack_q);
+	if (!skb)
+		goto out;
 	pdu = llc_pdu_sn_hdr(skb);
 
 	/* finding position of last acked pdu in queue */


Re: [2.6 patch] net/llc/llc_conn.c: fix possible NULL dereference

2007-05-18 Thread Herbert Xu
Eugene Teo <[EMAIL PROTECTED]> wrote:
>
> diff --git a/net/llc/llc_conn.c b/net/llc/llc_conn.c
> index 3b8cfbe..28a3994 100644
> --- a/net/llc/llc_conn.c
> +++ b/net/llc/llc_conn.c
> @@ -323,7 +323,8 @@ int llc_conn_remove_acked_pdus(struct sock *sk, u8 nr, u16
> *how_many_unacked)
> 
>if (!q_len)
>goto out;
> -   skb = skb_peek(&llc->pdu_unack_q);
> +   if (! (skb = skb_peek(&llc->pdu_unack_q)))
> +   goto out;

Actually we just checked that the queue length is non-zero so there
must be a packet there unless someone's just removed it.  If it were
possible for someone else to remove it in parallel, then we've got
bigger problems to worry about :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] libertas: skb dereferenced after netif_rx

2007-05-18 Thread Jeff Garzik

Stephen Hemminger wrote:

netif_rx() used to return a value in older kernels.


netif_rx() returns a value in current kernels.

Jeff


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] libertas: skb dereferenced after netif_rx

2007-05-18 Thread Stephen Hemminger
On Fri, 18 May 2007 14:09:03 -0400
"John W. Linville" <[EMAIL PROTECTED]> wrote:

> On Wed, May 16, 2007 at 05:01:27PM -0400, Florin Malita wrote:
> > In libertas_process_rxed_packet() and process_rxed_802_11_packet() the 
> > skb is dereferenced after being passed to netif_rx (called from 
> > libertas_upload_rx_packet). Spotted by Coverity (1658, 1659).
>  
> Relocating the libertas_upload_rx_packet call is fine, but...
> 
> > Also, libertas_upload_rx_packet() unconditionally returns 0 so the error 
> > check is dead code - might as well take it out.
> 
> Is this merely an implementation detail?  Or an absolute fact?
> If the former is true, then we should preserve the error
> checking.  If the latter, then we should change the signature of
> libertas_upload_rx_packet to return void.

netif_rx() used to return a value in older kernels.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Purpose of bit shifts in tcp_get_info

2007-05-18 Thread Herbert Xu
Daniel Schaffrath <[EMAIL PROTECTED]> wrote:
> 
> I was wondering what the purpose of the bit shifts in tcp_get_info  
> right after the jiffies conversion might be. What's the time unit  
> after that shift?
> 
> info->tcpi_rtt = jiffies_to_usecs(tp->srtt)>>3;
> info->tcpi_rttvar = jiffies_to_usecs(tp->mdev)>>2;
> [...]
> info->tcpi_rcv_rtt = jiffies_to_usecs(tp->rcv_rtt_est.rtt)>>3;
> 
> Maybe somebody has a hint for me?

Checkout the comment in tcp_rtt_estimator() in net/ipv4/tcp_input.c.
It's commented in the actual struct too for srtt at least.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: e1000: assertion hit in e1000_clean(), kernel 2.6.21.1

2007-05-18 Thread Curtis Doty

4:03pm Kok, Auke said:


Chuck Ebbert wrote:

 We have several reports now of hitting this assertion in
 netif_rx_complete(), inlined in e1000_clean():

 BUG_ON(!test_bit(__LINK_STATE_RX_SCHED, &dev->state));

  [] __queue_work+0x51/0x5e
  [] et_rx_action+0x94/0x185
  [] __do_softirq+0x5d/0xba
  [] do_softirq+0x59/0xb1
  [] local_bh_enable_ip+0x35/0x40
  [] dev_open+0x44/0x62
  [] dev_change_flags+0x46/0xe3
  [] devinet_ioctl+0x250/0x56a

 The second function is "net_rx_action", corrupted
 by the serial connection.
 The source file has four extra lines at the top because of a
 trivial wireless patch, so 898 in that code is really 894 in
 the stock kernel.


please shared that code then.



http://cvs.fedora.redhat.com/viewcvs/devel/kernel/linux-2.6-wireless.patch

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: e1000: assertion hit in e1000_clean(), kernel 2.6.21.1

2007-05-18 Thread Kok, Auke

Chuck Ebbert wrote:

We have several reports now of hitting this assertion in
netif_rx_complete(), inlined in e1000_clean():

BUG_ON(!test_bit(__LINK_STATE_RX_SCHED, &dev->state));

 [] __queue_work+0x51/0x5e
 [] et_rx_action+0x94/0x185
 [] __do_softirq+0x5d/0xba
 [] do_softirq+0x59/0xb1
 [] local_bh_enable_ip+0x35/0x40
 [] dev_open+0x44/0x62
 [] dev_change_flags+0x46/0xe3
 [] devinet_ioctl+0x250/0x56a

The second function is "net_rx_action", corrupted
by the serial connection. 


The source file has four extra lines at the top because of a
trivial wireless patch, so 898 in that code is really 894 in
the stock kernel.


please shared that code then.


https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=240339


this is lacking a lot of debugging info. Please post *all* the dmesg output, 
lspci -vvv, ethtool -e ethX, etc. in the bugzilla.


Auke
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


e1000: assertion hit in e1000_clean(), kernel 2.6.21.1

2007-05-18 Thread Chuck Ebbert
We have several reports now of hitting this assertion in
netif_rx_complete(), inlined in e1000_clean():

BUG_ON(!test_bit(__LINK_STATE_RX_SCHED, &dev->state));

 [] __queue_work+0x51/0x5e
 [] et_rx_action+0x94/0x185
 [] __do_softirq+0x5d/0xba
 [] do_softirq+0x59/0xb1
 [] local_bh_enable_ip+0x35/0x40
 [] dev_open+0x44/0x62
 [] dev_change_flags+0x46/0xe3
 [] devinet_ioctl+0x250/0x56a

The second function is "net_rx_action", corrupted
by the serial connection. 

The source file has four extra lines at the top because of a
trivial wireless patch, so 898 in that code is really 894 in
the stock kernel.

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=240339
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [git patches] net driver updates

2007-05-18 Thread Gene Heskett
On Friday 18 May 2007, Andrew Morton wrote:
>On Fri, 18 May 2007 23:46:21 +0200
>
>Mariusz Koz__owski <[EMAIL PROTECTED]> wrote:
>> Hello,
>>
>> > diff --git a/drivers/net/smc91x.h b/drivers/net/smc91x.h
>> > index 7053026..111f23d 100644
>> > --- a/drivers/net/smc91x.h
>> > +++ b/drivers/net/smc91x.h
>> > @@ -279,6 +279,40 @@ SMC_outw(u16 val, void __iomem *ioaddr, int reg)
>>
>> ...
>>
>> > +#define SMC_outb(v, a, r) __ __
>> > __outw(((inw((a)+((r)&~1))*(0xff<<8*(r%2 | ((v)<<(8*(r&2, (a) +
>> > ((r)&~1))
>>
>> This one has unbalanced parenthesis.
>
>I dunno how you can tell - I can't count that high.
>
Try removing the one in front of the comma, it makes the count -1.  And I 
wrote, probably a good 20 years ago, a gizmo I called cntx that found that 
stuff for you.  I occasionally use it to run over a section of the kernel 
tree, but it rarely finds anything to fuss about.  And I really do need to 
make it handle shell expansions so you don't have write a quick script to 
have it scan everything in a directory.  However I suspect that most of you 
already have such a tool in your bag of tricks.

>Can this be programmed in C, rather than in cpp?
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to [EMAIL PROTECTED]
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at  http://www.tux.org/lkml/



-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
I'm definitely not in Omaha!
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [git patches] net driver updates

2007-05-18 Thread Mariusz Kozlowski
> > > diff --git a/drivers/net/smc91x.h b/drivers/net/smc91x.h
> > > index 7053026..111f23d 100644
> > > --- a/drivers/net/smc91x.h
> > > +++ b/drivers/net/smc91x.h
> > > @@ -279,6 +279,40 @@ SMC_outw(u16 val, void __iomem *ioaddr, int reg)
> > 
> > ...
> > 
> > > +#define SMC_outb(v, a, r) __ __ 
> > > __outw(((inw((a)+((r)&~1))*(0xff<<8*(r%2 | ((v)<<(8*(r&2, (a) + 
> > > ((r)&~1))
> > 
> > This one has unbalanced parenthesis.
> > 
> 
> I dunno how you can tell - I can't count that high.

Me neither. Simple script does that for me ;) Not sure where exactly
the parenthesis is missing so I leave it up to someone (the author?)
who knows better.

Regards,

Mariusz
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [git patches] net driver updates

2007-05-18 Thread Andrew Morton
On Fri, 18 May 2007 23:46:21 +0200
Mariusz Koz__owski <[EMAIL PROTECTED]> wrote:

> Hello, 
> 
> > diff --git a/drivers/net/smc91x.h b/drivers/net/smc91x.h
> > index 7053026..111f23d 100644
> > --- a/drivers/net/smc91x.h
> > +++ b/drivers/net/smc91x.h
> > @@ -279,6 +279,40 @@ SMC_outw(u16 val, void __iomem *ioaddr, int reg)
> 
> ...
> 
> > +#define SMC_outb(v, a, r) __ __ 
> > __outw(((inw((a)+((r)&~1))*(0xff<<8*(r%2 | ((v)<<(8*(r&2, (a) + 
> > ((r)&~1))
> 
> This one has unbalanced parenthesis.
> 

I dunno how you can tell - I can't count that high.

Can this be programmed in C, rather than in cpp?
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [git patches] net driver updates

2007-05-18 Thread Mariusz Kozłowski
Hello, 

> diff --git a/drivers/net/smc91x.h b/drivers/net/smc91x.h
> index 7053026..111f23d 100644
> --- a/drivers/net/smc91x.h
> +++ b/drivers/net/smc91x.h
> @@ -279,6 +279,40 @@ SMC_outw(u16 val, void __iomem *ioaddr, int reg)

...

> +#define SMC_outb(v, a, r)      outw(((inw((a)+((r)&~1))*(0xff<<8*(r%2 | 
> ((v)<<(8*(r&2, (a) + ((r)&~1))

This one has unbalanced parenthesis.

Regards,

Mariusz
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][af_key]pfkey_add: Optimize SA adds and algorithm probes

2007-05-18 Thread Herbert Xu
On Fri, May 18, 2007 at 02:34:12PM +1000, Herbert Xu wrote:
> 
> Actually, I think we should just probe for the specific algorithm
> requested rather than everything.  See patch below.

Doh, forgot to actually remove the probe call :)

[IPSEC] pfkey: Load specific algorithm in pfkey_add rather than all

This is a natural extension of the changeset

[XFRM]: Probe selected algorithm only.

which only removed the probe call for xfrm_user.  This patch does exactly
the same thing for af_key.  In other words, we load the algorithm requested
by the user rather than everything when adding xfrm states in af_key.

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/net/key/af_key.c b/net/key/af_key.c
index a994441..d302dda 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -1448,8 +1448,6 @@ static int pfkey_add(struct sock *sk, struct sk_buff 
*skb, struct sadb_msg *hdr,
int err;
struct km_event c;
 
-   xfrm_probe_algs();
-
x = pfkey_msg2xfrm_state(hdr, ext_hdrs);
if (IS_ERR(x))
return PTR_ERR(x);
diff --git a/net/xfrm/xfrm_algo.c b/net/xfrm/xfrm_algo.c
index 6249a94..8a72def 100644
--- a/net/xfrm/xfrm_algo.c
+++ b/net/xfrm/xfrm_algo.c
@@ -347,67 +347,44 @@ static inline int calg_entries(void)
return ARRAY_SIZE(calg_list);
 }
 
-/* Todo: generic iterators */
-struct xfrm_algo_desc *xfrm_aalg_get_byid(int alg_id)
-{
-   int i;
-
-   for (i = 0; i < aalg_entries(); i++) {
-   if (aalg_list[i].desc.sadb_alg_id == alg_id) {
-   if (aalg_list[i].available)
-   return &aalg_list[i];
-   else
-   break;
-   }
-   }
-   return NULL;
-}
-EXPORT_SYMBOL_GPL(xfrm_aalg_get_byid);
-
-struct xfrm_algo_desc *xfrm_ealg_get_byid(int alg_id)
-{
-   int i;
+struct xfrm_algo_list {
+   struct xfrm_algo_desc *algs;
+   int entries;
+   u32 type;
+   u32 mask;
+};
 
-   for (i = 0; i < ealg_entries(); i++) {
-   if (ealg_list[i].desc.sadb_alg_id == alg_id) {
-   if (ealg_list[i].available)
-   return &ealg_list[i];
-   else
-   break;
-   }
-   }
-   return NULL;
-}
-EXPORT_SYMBOL_GPL(xfrm_ealg_get_byid);
+static const struct xfrm_algo_list xfrm_aalg_list = {
+   .algs = aalg_list,
+   .entries = ARRAY_SIZE(aalg_list),
+   .type = CRYPTO_ALG_TYPE_HASH,
+   .mask = CRYPTO_ALG_TYPE_HASH_MASK | CRYPTO_ALG_ASYNC,
+};
 
-struct xfrm_algo_desc *xfrm_calg_get_byid(int alg_id)
-{
-   int i;
+static const struct xfrm_algo_list xfrm_ealg_list = {
+   .algs = ealg_list,
+   .entries = ARRAY_SIZE(ealg_list),
+   .type = CRYPTO_ALG_TYPE_BLKCIPHER,
+   .mask = CRYPTO_ALG_TYPE_MASK | CRYPTO_ALG_ASYNC,
+};
 
-   for (i = 0; i < calg_entries(); i++) {
-   if (calg_list[i].desc.sadb_alg_id == alg_id) {
-   if (calg_list[i].available)
-   return &calg_list[i];
-   else
-   break;
-   }
-   }
-   return NULL;
-}
-EXPORT_SYMBOL_GPL(xfrm_calg_get_byid);
+static const struct xfrm_algo_list xfrm_calg_list = {
+   .algs = calg_list,
+   .entries = ARRAY_SIZE(calg_list),
+   .type = CRYPTO_ALG_TYPE_COMPRESS,
+   .mask = CRYPTO_ALG_TYPE_MASK | CRYPTO_ALG_ASYNC,
+};
 
-static struct xfrm_algo_desc *xfrm_get_byname(struct xfrm_algo_desc *list,
- int entries, u32 type, u32 mask,
- char *name, int probe)
+static struct xfrm_algo_desc *xfrm_find_algo(
+   const struct xfrm_algo_list *algo_list,
+   int match(const struct xfrm_algo_desc *entry, const void *data),
+   const void *data, int probe)
 {
+   struct xfrm_algo_desc *list = algo_list->algs;
int i, status;
 
-   if (!name)
-   return NULL;
-
-   for (i = 0; i < entries; i++) {
-   if (strcmp(name, list[i].name) &&
-   (!list[i].compat || strcmp(name, list[i].compat)))
+   for (i = 0; i < algo_list->entries; i++) {
+   if (!match(list + i, data))
continue;
 
if (list[i].available)
@@ -416,8 +393,8 @@ static struct xfrm_algo_desc *xfrm_get_byname(struct 
xfrm_algo_desc *list,
if (!probe)
break;
 
-   status = crypto_has_alg(list[i].name, type,
-   mask | CRYPTO_ALG_ASYNC);
+   status = crypto_has_alg(list[i].name, algo_list->type,
+   

Re: [patch 2/7] CAN: Add PF_CAN core module

2007-05-18 Thread Oliver Hartkopp

Hello Paul,

as you may see in the attached SVN-log i changed some code according 
your suggestions.

I additionally made some clarifications of function names.

If you would like to see the af_can.c completely please visit:
http://svn.berlios.de/svnroot/repos/socketcan/trunk/kernel/2.6/net/can/af_can.c

But this code in the projekt SVN is working for several kernel versions 
(which is removed by some scripts when creating a LKML netdev patch) - 
just for your information.


Best regards,
Oliver

 Original-Nachricht 
Betreff:r311 - trunk/kernel/2.6/net/can
Datum:  Fri, 18 May 2007 21:47:15 +0200
Von:[EMAIL PROTECTED]
An: [EMAIL PROTECTED]



Author: hartkopp
Date: 2007-05-18 21:47:14 +0200 (Fri, 18 May 2007)
New Revision: 311

Modified:
  trunk/kernel/2.6/net/can/af_can.c
Log:
Updated RCU removal of dev_rcv_lists structures in the case of CAN-interfaces
going down (in can_notifier). Thanks to Paul E. McKenny (IBM) who gave this
hint on netdev kernel mailinglist.


Modified: trunk/kernel/2.6/net/can/af_can.c
===
--- trunk/kernel/2.6/net/can/af_can.c   2007-05-16 09:03:21 UTC (rev 310)
+++ trunk/kernel/2.6/net/can/af_can.c   2007-05-18 19:47:14 UTC (rev 311)
@@ -454,31 +454,48 @@
}
EXPORT_SYMBOL(can_rx_register);

-static void can_rcv_lists_delete(struct rcu_head *rp)
+static void can_rx_delete_list(struct hlist_head *rl)
{
+   struct receiver *r;
+   struct hlist_node *n;
+
+   hlist_for_each_entry_rcu(r, n, rl, list) {
+   hlist_del_rcu(&r->list);
+   kmem_cache_free(rcv_cache, r);
+   }
+}
+
+/*
+ * can_rx_delete_device - rcu callback for dev_rcv_lists structure removal
+ */
+static void can_rx_delete_device(struct rcu_head *rp)
+{
struct dev_rcv_lists *d = container_of(rp, struct dev_rcv_lists, rcu);
+   int i;

+   /* remove all receivers hooked at this netdevice */
+   can_rx_delete_list(&d->rx_err);
+   can_rx_delete_list(&d->rx_all);
+   can_rx_delete_list(&d->rx_fil);
+   can_rx_delete_list(&d->rx_inv);
+   can_rx_delete_list(&d->rx_eff);
+
+   for (i = 0; i < 2048; i++)
+   can_rx_delete_list(&d->rx_sff[i]);
+
kfree(d);
}

-static void can_rx_delete(struct rcu_head *rp)
+/*
+ * can_rx_delete_receiver - rcu callback for single receiver entry removal
+ */
+static void can_rx_delete_receiver(struct rcu_head *rp)
{
struct receiver *r = container_of(rp, struct receiver, rcu);

kmem_cache_free(rcv_cache, r);
}

-static void can_rx_delete_all(struct hlist_head *rl)
-{
-   struct receiver *r;
-   struct hlist_node *n;
-
-   hlist_for_each_entry_rcu(r, n, rl, list) {
-   hlist_del_rcu(&r->list);
-   call_rcu(&r->rcu, can_rx_delete);
-   }
-}
-
/**
 * can_rx_unregister - unsubscribe CAN frames from a specific interface
 * @dev: pointer to netdevice (NULL => unsubcribe from 'all' CAN devices list)
@@ -556,7 +573,7 @@

/* schedule the receiver item for deletion */
if (r)
-   call_rcu(&r->rcu, can_rx_delete);
+   call_rcu(&r->rcu, can_rx_delete_receiver);

return ret;
}
@@ -945,7 +962,6 @@
struct net_device *dev = (struct net_device *)data;
struct notifier *n;
struct dev_rcv_lists *d;
-   int i;

DBG("called for %s, msg = %lu\n", dev->name, msg);

@@ -986,25 +1002,16 @@
spin_lock_bh(&rcv_lists_lock);

d = find_dev_rcv_lists(dev);
-   if (d) {
+   if (d)
hlist_del_rcu(&d->list);
-
-   /* remove all receivers hooked at this netdevice */
-   can_rx_delete_all(&d->rx_err);
-   can_rx_delete_all(&d->rx_all);
-   can_rx_delete_all(&d->rx_fil);
-   can_rx_delete_all(&d->rx_inv);
-   can_rx_delete_all(&d->rx_eff);
-   for (i = 0; i < 2048; i++)
-   can_rx_delete_all(&d->rx_sff[i]);
-   } else
+   else
printk(KERN_ERR "can: notifier: receive list not "
   "found for dev %s\n", dev->name);

spin_unlock_bh(&rcv_lists_lock);

if (d)
-   call_rcu(&d->rcu, can_rcv_lists_delete);
+   call_rcu(&d->rcu, can_rx_delete_device);

break;
}

___
Socketcan-commit mailing list
[EMAIL PROTECTED]
https://lists.berlios.de/mailman/listinfo/socketcan-commit


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [TCP] Sysctl: document tcp_max_ssthresh (Limited Slow-Start)

2007-05-18 Thread John Heffner

Rick Jones wrote:
as an asside, "tcp_max_ssthresh" sounds like the maximum value ssthresh 
can take-on.  is that correct, or is this more of a "once ssthresh is 
above this, behave in this new way?"  If that is the case, while the 


I don't like it either, but you'll have to talk to Sally Floyd about 
that one.. ;)


In general, I would like the documentation to emphasize more how to set 
the parameter than describe the algorithm.  The max_ssthresh parameter 
should ideally be set to the bottleneck queue size, or more 
realistically a conservative value that's likely to be smaller than the 
bottleneck queue size.  When max_ssthresh is smaller than the bottleneck 
queue, (limited) slow start will not overflow it until cwnd has fully 
ramped up to the appropriate size.


  -John
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] libertas: skb dereferenced after netif_rx

2007-05-18 Thread Florin Malita

John W. Linville wrote:
Also, libertas_upload_rx_packet() unconditionally returns 0 so the error 
check is dead code - might as well take it out.



Is this merely an implementation detail?  Or an absolute fact?
  


I believe it's an absolute fact that got lost among implementation 
details ;)


All libertas_upload_rx_packet does is set a few fields in the skb, then 
pass it up to the stack via netif_rx:


139 int libertas_upload_rx_packet(wlan_private * priv, struct sk_buff *skb)
140 {
141 lbs_pr_debug(1, "skb->data=%p\n", skb->data);
142
143 if(IS_MESH_FRAME(skb))
144 skb->dev = priv->mesh_dev;
145 else
146 skb->dev = priv->wlan_dev.netdev;
147 skb->protocol = eth_type_trans(skb, priv->wlan_dev.netdev);
148 skb->ip_summed = CHECKSUM_UNNECESSARY;
149
150 netif_rx(skb);
151
152 return 0;
153 }


Since netif_rx always succeeds, so should libertas_upload_rx_packet - 
there's no reason for passing back a success code (especially one that's 
hardcoded to 0).



If the latter, then we should change the signature of
libertas_upload_rx_packet to return void.
  


Makes sense, updated patch below.


Another potential patch is to remove the "ret = 0" line before the
"done" label, since ret is initialized at the head of the function.
Come to think of it, you can probably remove the "= 0" part of ret's
declaration as well (in both functions).
  


Right, even more: looks like both process_rxed_802_11_packet & 
libertas_process_rxed_packet can only return 0 so we could drop the 
return code altogether and change their signature to void too (nobody 
seems to care about their return code anyway). I will send a separate 
cleanup patch but this might be leaning more on the implementation 
detail side (planning to extend the functions and make the return code 
meaningful in the future?) so somebody familiar with the driver should 
make the call.


Thanks,
Florin


Signed-off-by: Florin Malita <[EMAIL PROTECTED]>
---

decl.h |2 +-
rx.c   |   22 +-
2 files changed, 6 insertions(+), 18 deletions(-)

diff --git a/drivers/net/wireless/libertas/decl.h 
b/drivers/net/wireless/libertas/decl.h
index 606bdd0..dfe2764 100644
--- a/drivers/net/wireless/libertas/decl.h
+++ b/drivers/net/wireless/libertas/decl.h
@@ -46,7 +46,7 @@ u32 libertas_index_to_data_rate(u8 index);
u8 libertas_data_rate_to_index(u32 rate);
void libertas_get_fwversion(wlan_adapter * adapter, char *fwversion, int 
maxlen);

-int libertas_upload_rx_packet(wlan_private * priv, struct sk_buff *skb);
+void libertas_upload_rx_packet(wlan_private * priv, struct sk_buff *skb);

/** The proc fs interface */
int libertas_process_rx_command(wlan_private * priv);
diff --git a/drivers/net/wireless/libertas/rx.c 
b/drivers/net/wireless/libertas/rx.c
index d17924f..b19b5aa 100644
--- a/drivers/net/wireless/libertas/rx.c
+++ b/drivers/net/wireless/libertas/rx.c
@@ -136,7 +136,7 @@ static void wlan_compute_rssi(wlan_private * priv, struct 
rxpd *p_rx_pd)
LEAVE();
}

-int libertas_upload_rx_packet(wlan_private * priv, struct sk_buff *skb)
+void libertas_upload_rx_packet(wlan_private * priv, struct sk_buff *skb)
{
lbs_pr_debug(1, "skb->data=%p\n", skb->data);

@@ -148,8 +148,6 @@ int libertas_upload_rx_packet(wlan_private * priv, struct 
sk_buff *skb)
skb->ip_summed = CHECKSUM_UNNECESSARY;

netif_rx(skb);
-
-   return 0;
}

/**
@@ -269,15 +267,11 @@ int libertas_process_rxed_packet(wlan_private * priv, 
struct sk_buff *skb)
wlan_compute_rssi(priv, p_rx_pd);

lbs_pr_debug(1, "RX Data: size of actual packet = %d\n", skb->len);
-   if (libertas_upload_rx_packet(priv, skb)) {
-   lbs_pr_debug(1, "RX error: libertas_upload_rx_packet"
-  " returns failure\n");
-   ret = -1;
-   goto done;
-   }
priv->stats.rx_bytes += skb->len;
priv->stats.rx_packets++;

+   libertas_upload_rx_packet(priv, skb);
+
ret = 0;
done:
LEAVE();
@@ -438,17 +432,11 @@ static int process_rxed_802_11_packet(wlan_private * 
priv, struct sk_buff *skb)
wlan_compute_rssi(priv, prxpd);

lbs_pr_debug(1, "RX Data: size of actual packet = %d\n", skb->len);
-
-   if (libertas_upload_rx_packet(priv, skb)) {
-   lbs_pr_debug(1, "RX error: libertas_upload_rx_packet "
-   "returns failure\n");
-   ret = -1;
-   goto done;
-   }
-
priv->stats.rx_bytes += skb->len;
priv->stats.rx_packets++;

+   libertas_upload_rx_packet(priv, skb);
+
ret = 0;
done:
LEAVE();

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [TCP] Sysctl: document tcp_max_ssthresh (Limited Slow-Start)

2007-05-18 Thread Rick Jones

Baruch Even wrote:

Ilpo Järvinen wrote:


Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
---
Documentation/networking/ip-sysctl.txt |   13 +
1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt 
b/Documentation/networking/ip-sysctl.txt
index ce16e6a..44ba8d4 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -239,6 +239,19 @@ tcp_max_orphans - INTEGER
more aggressively. Let me to remind again: each orphan eats
up to ~64K of unswappable memory.

+tcp_max_ssthresh - INTEGER
+   Limited Slow-Start for TCP with Large Congestion Windows defined in
+   RFC3742. Limited slow-start is a mechanism to limit grow of the



s/grow/growth/



+   congestion window on the region where congestion window is larger than
+   tcp_max_ssthresh. A TCP connection with a large congestion window could
+   have its congestion window increased by thousand (or even more)
+   segments per RTT by the traditional slow-start procedure which might be
+   counter-productive to TCP performance when packet losses start to
+   occur. With limited slow-start TCP increments congestion window at
+   most tcp_max_ssthresh/2 segments per RTT when the congestion window is



I'm not a native English speaker but "at most" sounds a bit awkward to
me, maybe change it to "by no more than". But I'm sure someone can find
a better phrasing.


It could be "by at most" or "by no more than" but indeed it should not just be 
"at most."


Also, it should be "With limited slow-start TCP increments the congestion 
window..."   Combining with other nits yeilds:


Limited Slow-Start for TCP with Large Congestion Windows is defined in RFC3742. 
 Limited slow-start is a mechanism to limit growth of the congestion window 
when the congestion window is larger than tcp_max_ssthresh.  A TCP connection 
with a large congestion window could have its congestion window increased by 
thousands of segments (or more) per RTT via the traditional slow-start 
procedures. This might degrade TCP performance when packet losses start to 
occur.  With limited slow-start, TCP increments the congestion window by at most 
tcp_max_ssthresh/2 segments per RTT when the congestion window is is above 
tcp_max_ssthresh.


rick jones

as an asside, "tcp_max_ssthresh" sounds like the maximum value ssthresh can 
take-on.  is that correct, or is this more of a "once ssthresh is above this, 
behave in this new way?"  If that is the case, while the horse has probably left 
the barn, perhaps another name would be better?




Baruch
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] libertas: skb dereferenced after netif_rx

2007-05-18 Thread John W. Linville
On Wed, May 16, 2007 at 05:01:27PM -0400, Florin Malita wrote:
> In libertas_process_rxed_packet() and process_rxed_802_11_packet() the 
> skb is dereferenced after being passed to netif_rx (called from 
> libertas_upload_rx_packet). Spotted by Coverity (1658, 1659).
 
Relocating the libertas_upload_rx_packet call is fine, but...

> Also, libertas_upload_rx_packet() unconditionally returns 0 so the error 
> check is dead code - might as well take it out.

Is this merely an implementation detail?  Or an absolute fact?
If the former is true, then we should preserve the error
checking.  If the latter, then we should change the signature of
libertas_upload_rx_packet to return void.

> Signed-off-by: Florin Malita <[EMAIL PROTECTED]>

>   lbs_pr_debug(1, "RX Data: size of actual packet = %d\n", skb->len);
> - if (libertas_upload_rx_packet(priv, skb)) {
> - lbs_pr_debug(1, "RX error: libertas_upload_rx_packet"
> -" returns failure\n");
> - ret = -1;
> - goto done;
> - }
>   priv->stats.rx_bytes += skb->len;
>   priv->stats.rx_packets++;
> 
> + libertas_upload_rx_packet(priv, skb);
> +
>   ret = 0;
> done:
>   LEAVE();

Another potential patch is to remove the "ret = 0" line before the
"done" label, since ret is initialized at the head of the function.
Come to think of it, you can probably remove the "= 0" part of ret's
declaration as well (in both functions).

Hth!

John

P.S.  Also, please make sure to send wireless patches to
[EMAIL PROTECTED] and CC me.
-- 
John W. Linville
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [TCP] Sysctl: document tcp_max_ssthresh (Limited Slow-Start)

2007-05-18 Thread Baruch Even
Ilpo Järvinen wrote:
> Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
> ---
>  Documentation/networking/ip-sysctl.txt |   13 +
>  1 files changed, 13 insertions(+), 0 deletions(-)
> 
> diff --git a/Documentation/networking/ip-sysctl.txt 
> b/Documentation/networking/ip-sysctl.txt
> index ce16e6a..44ba8d4 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -239,6 +239,19 @@ tcp_max_orphans - INTEGER
>   more aggressively. Let me to remind again: each orphan eats
>   up to ~64K of unswappable memory.
>  
> +tcp_max_ssthresh - INTEGER
> + Limited Slow-Start for TCP with Large Congestion Windows defined in
> + RFC3742. Limited slow-start is a mechanism to limit grow of the

s/grow/growth/

> + congestion window on the region where congestion window is larger than
> + tcp_max_ssthresh. A TCP connection with a large congestion window could
> + have its congestion window increased by thousand (or even more)
> + segments per RTT by the traditional slow-start procedure which might be
> + counter-productive to TCP performance when packet losses start to
> + occur. With limited slow-start TCP increments congestion window at
> + most tcp_max_ssthresh/2 segments per RTT when the congestion window is

I'm not a native English speaker but "at most" sounds a bit awkward to
me, maybe change it to "by no more than". But I'm sure someone can find
a better phrasing.

Baruch
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix e100 rx path on ARM (was [PATCH] e100 rx: or s and el bits)

2007-05-18 Thread Kok, Auke

David Acker wrote:

Kok, Auke wrote:
First impression just came in: It seems RX performance is dropped to 
10mbit. TX is unaffected and runs at 94mbit/tcp, but RX the new code 
seems to misbehave and  fluctuate, dropping below 10mbit after a few 
netperf runs and staying there...


ideas?


I found the problem.  Another casualty of working with two different kernels at 
once...arg.
The blank rfd needs to have its el-bit clear now.  Here is the new and improved 
patch.




-   nic->blank_rfd.command = cpu_to_le16(cb_el & cb_s);
+   nic->blank_rfd.command = 0;


OK, that fixed it and it's getting 94mbit/tcp rx now without issues. Milton, can 
you look at this patch? I'd like some more feedback. Meanwhile I will try to get 
this tested on a variety of e100 NICs, which will take a few days.


Thanks,

Auke
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][af_key]pfkey_add: Optimize SA adds and algorithm probes

2007-05-18 Thread Mark Huth

Herbert Xu wrote:

Mark Huth <[EMAIL PROTECTED]> wrote:
  

This patch provides a performance optimization in the pfkey_add path.
Prior versions have a serious performance problem when adding a large
number of SAs to a node.  For example, if a backup node needs to be
loaded with the SAs previously held by a failed active node, thousands
of SAs may need to be added as rapidly as possible.  Tests show that
without this patch, such additions may take several minutes.  The
cause is that the available algorithm modules are probed each time
instead of only when needed.  This patch changes the unconditional
call to xfrm_probe_algs() to only be done when it may be needed.



Thanks for the patch!
 
  
 static int pfkey_add(struct sock *sk, struct sk_buff *skb, struct 
sadb_msg *hdr, void **ext_hdrs)

 {
   struct xfrm_state *x;
-   int err;
+   int err, probe_done = 0;
   struct km_event c;

-   xfrm_probe_algs();
-
   x = pfkey_msg2xfrm_state(hdr, ext_hdrs);
   if (IS_ERR(x))
   return PTR_ERR(x);



I don't think it works when then algorithm isn't loaded though :)
If the algorithm isn't present pfkey_msg2xfrm_state will return
-ENOSYS so we need to do the probe here.
  
Okay.  I tested with the algorithms not loaded, and they were all loaded 
after the test, but I'm sure you understand this much better than I do.

Actually, I think we should just probe for the specific algorithm
requested rather than everything.  See patch below.

[IPSEC] pfkey: Load specific algorithm in pfkey_add rather than all

This is a natural extension of the changeset

[XFRM]: Probe selected algorithm only.

which only removed the probe call for xfrm_user.  This patch does exactly
the same thing for af_key.  In other words, we load the algorithm requested
by the user rather than everything when adding xfrm states in af_key.

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Cheers,
  
Thanks for the patch.  I'll try it later today and confirm that it fixes 
our problem.


Mark Huth
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix e100 rx path on ARM (was [PATCH] e100 rx: or s and el bits)

2007-05-18 Thread David Acker

Kok, Auke wrote:
First impression just came in: It seems RX performance is dropped to 
10mbit. TX is unaffected and runs at 94mbit/tcp, but RX the new code 
seems to misbehave and  fluctuate, dropping below 10mbit after a few 
netperf runs and staying there...


ideas?


I found the problem.  Another casualty of working with two different kernels at 
once...arg.
The blank rfd needs to have its el-bit clear now.  Here is the new and improved 
patch.

On the ARM, their is a race condition between software allocating a new receive
buffer and hardware writing into a buffer.  The two race on touching the last
Receive Frame Descriptor (RFD).  It has its el-bit set and its next link equal
to 0.  When hardware encounters this buffer it attempts to write data to it
and then update Status Word bits and Actual Count in the RFD.  At the same time
software may try to clear the el-bit and set the link address to a new buffer.
Since the entire RFD is once cache-line, the two write operations can
collide.  This can lead to the receive unit stalling or freed receive buffers
getting written to.

The fix is to set the el-bit on and the size to 0 on the next to last buffer
in the chain.  When the hardware encounters this buffer it stops and does
not write to it at all.  The hardware issues an RNR interrupt with the
receive unit in the No Resources state.  When software allocates buffers,
it can update the tail of the list because it knows the hardware will stop
at the buffer before it.  Once it has a new next to last buffer prepared,
it can clear the el-bit and set the size on the previous one.  The race on
this buffer is safe since the link already points to a valid next buffer.
If the hardware sees the el-bit cleared without the size set, it will
move on to the next buffer and complete that one in error.  If it sees
the size set but the el-bit still set, it will complete that buffer
and then RNR interrupt and wait.


Signed-off-by: David Acker <[EMAIL PROTECTED]>

---

--- linux-2.6.22-rc1/drivers/net/e100.c.orig2007-05-18 13:08:01.0 
-0400
+++ linux-2.6.22-rc1/drivers/net/e100.c 2007-05-18 13:08:24.0 -0400
@@ -285,6 +285,12 @@ enum scb_status {
rus_mask = 0x3C,
 };

+enum ru_state  {
+   RU_SUSPENDED = 0,
+   RU_RUNNING   = 1,
+   RU_UNINITIALIZED = -1,
+};
+
 enum scb_stat_ack {
stat_ack_not_ours= 0x00,
stat_ack_sw_gen  = 0x04,
@@ -526,6 +532,7 @@ struct nic {
struct rx *rx_to_use;
struct rx *rx_to_clean;
struct rfd blank_rfd;
+   enum ru_state ru_running;

spinlock_t cb_lock  cacheline_aligned;
spinlock_t cmd_lock;
@@ -947,7 +954,7 @@ static void e100_get_defaults(struct nic
((nic->mac >= mac_82558_D101_A4) ? cb_cid : cb_i));

/* Template for a freshly allocated RFD */
-   nic->blank_rfd.command = cpu_to_le16(cb_el & cb_s);
+   nic->blank_rfd.command = 0;
nic->blank_rfd.rbd = 0x;
nic->blank_rfd.size = cpu_to_le16(VLAN_ETH_FRAME_LEN);

@@ -1742,11 +1749,19 @@ static int e100_alloc_cbs(struct nic *ni
return 0;
 }

-static inline void e100_start_receiver(struct nic *nic)
+static inline void e100_start_receiver(struct nic *nic, struct rx *rx)
 {
-   /* Start if RFA is non-NULL */
-   if(nic->rx_to_clean->skb)
-   e100_exec_cmd(nic, ruc_start, nic->rx_to_clean->dma_addr);
+   if(!nic->rxs) return;
+   if(RU_SUSPENDED != nic->ru_running) return;
+
+   /* handle init time starts */
+   if(!rx) rx = nic->rxs;
+
+   /* (Re)start RU if suspended or idle and RFA is non-NULL */
+   if(rx->skb) {
+   e100_exec_cmd(nic, ruc_start, rx->dma_addr);
+   nic->ru_running = RU_RUNNING;
+   }
 }

 #define RFD_BUF_LEN (sizeof(struct rfd) + VLAN_ETH_FRAME_LEN)
@@ -1769,13 +1784,12 @@ static int e100_rx_alloc_skb(struct nic
}

/* Link the RFD to end of RFA by linking previous RFD to
-* this one, and clearing EL bit of previous.  */
+* this one.  We are safe to touch the previous RFD because
+* it is protected the before last buffer's el bit being set  */
if(rx->prev->skb) {
struct rfd *prev_rfd = (struct rfd *)rx->prev->skb->data;
put_unaligned(cpu_to_le32(rx->dma_addr),
(u32 *)&prev_rfd->link);
-   wmb();
-   prev_rfd->command &= ~cpu_to_le16(cb_el & cb_s);
pci_dma_sync_single_for_device(nic->pdev, rx->prev->dma_addr,
sizeof(struct rfd), PCI_DMA_TODEVICE);
}
@@ -1801,8 +1815,12 @@ static int e100_rx_indicate(struct nic *
DPRINTK(RX_STATUS, DEBUG, "status=0x%04X\n", rfd_status);

/* If data isn't ready, nothing to indicate */
-   if(unlikely(!(rfd_status & cb_complete)))
+   if(unlikely(!(rfd_status & cb_complete))) {
+   /* this allows for a fast restart withou

Re: [Cbe-oss-dev] [PATCH 2/10] spidernet: beautify error messages

2007-05-18 Thread Linas Vepstas
On Thu, May 17, 2007 at 09:28:53AM +1000, Michael Ellerman wrote:
> On Wed, 2007-05-16 at 17:00 -0500, Linas Vepstas wrote:
> > -   pr_err("Not enough memory to allocate rx buffer\n");
> > +   pr_err("%s: Not enough memory to allocate rx buffer\n",
> > +   card->netdev->name);
> 
> Isn't that what dev_err() is for?

Ahh, did not know that. These patches just got pushed upstream, I'll send 
a separate round to clean this up.

--linas


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/10] spidernet: node-aware skbuff allocation

2007-05-18 Thread Linas Vepstas
On Thu, May 17, 2007 at 08:46:00PM -0400, Jeff Garzik wrote:
> Linas Vepstas wrote:
> 
> applied

Thanks

--linas

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [TCP] Sysctl: document tcp_max_ssthresh (Limited Slow-Start)

2007-05-18 Thread Ilpo Järvinen

Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
---
 Documentation/networking/ip-sysctl.txt |   13 +
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt 
b/Documentation/networking/ip-sysctl.txt
index ce16e6a..44ba8d4 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -239,6 +239,19 @@ tcp_max_orphans - INTEGER
more aggressively. Let me to remind again: each orphan eats
up to ~64K of unswappable memory.
 
+tcp_max_ssthresh - INTEGER
+   Limited Slow-Start for TCP with Large Congestion Windows defined in
+   RFC3742. Limited slow-start is a mechanism to limit grow of the
+   congestion window on the region where congestion window is larger than
+   tcp_max_ssthresh. A TCP connection with a large congestion window could
+   have its congestion window increased by thousand (or even more)
+   segments per RTT by the traditional slow-start procedure which might be
+   counter-productive to TCP performance when packet losses start to
+   occur. With limited slow-start TCP increments congestion window at
+   most tcp_max_ssthresh/2 segments per RTT when the congestion window is
+   above tcp_max_ssthresh.
+   Default: 0 (off)
+
 tcp_max_syn_backlog - INTEGER
Maximal number of remembered connection requests, which are
still did not receive an acknowledgment from connecting client.
-- 
1.5.0.6


Re: [PATCH] fix e100 rx path on ARM (was [PATCH] e100 rx: or s and el bits)

2007-05-18 Thread Kok, Auke

David Acker wrote:

Kok, Auke wrote:

David Acker wrote:

David Acker wrote:
Done.  Below is a patch against 2.6.22-rc1.  It combines removing the 
s-bit patch and applying the patch I previously sent.
Oops.  I missed one state in that patch.  Since the el-bit buffer will 
normally not complete due to a zero size, we need to check if the 
buffer with no data has the el-bit set.  Without this, you have to 
wait for the interrupt.  Sorry about that...this was in the code I 
tested on my embedded system but got lost in the regular kernel patch.

OK. Thanks.

If you don't mind I'm going to have some testing on this patch done for 
a bit now (mostly x86 hardware of course) to see if there's no pitfalls 
in it. It'll be a few days because of the weekend before I get back on it.




Cool.  I will see if I can get some more tests running over the weekend on our PXA255 
platform.


First impression just came in: It seems RX performance is dropped to 10mbit. TX 
is unaffected and runs at 94mbit/tcp, but RX the new code seems to misbehave and 
 fluctuate, dropping below 10mbit after a few netperf runs and staying there...


ideas?

Auke
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix e100 rx path on ARM (was [PATCH] e100 rx: or s and el bits)

2007-05-18 Thread David Acker

Kok, Auke wrote:

David Acker wrote:

David Acker wrote:
Done.  Below is a patch against 2.6.22-rc1.  It combines removing the 
s-bit patch and applying the patch I previously sent.


Oops.  I missed one state in that patch.  Since the el-bit buffer will 
normally not complete due to a zero size, we need to check if the 
buffer with no data has the el-bit set.  Without this, you have to 
wait for the interrupt.  Sorry about that...this was in the code I 
tested on my embedded system but got lost in the regular kernel patch.


OK. Thanks.

If you don't mind I'm going to have some testing on this patch done for 
a bit now (mostly x86 hardware of course) to see if there's no pitfalls 
in it. It'll be a few days because of the weekend before I get back on it.




Cool.  I will see if I can get some more tests running over the weekend on our PXA255 
platform.


-Ack
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/5] ucc_geth: eliminate max-speed, change interface-type to phy-connection-type

2007-05-18 Thread Kim Phillips
On Fri, 18 May 2007 09:07:42 -0500
Kumar Gala <[EMAIL PROTECTED]> wrote:

> 
> On May 17, 2007, at 8:53 PM, Jeff Garzik wrote:
> 
> > Kim Phillips wrote:
> >> On Tue, 15 May 2007 17:45:19 -0400
> >> Jeff Garzik <[EMAIL PROTECTED]> wrote:
> >>> Kim Phillips wrote:
>  It was agreed that phy-connection-type was a better name for
>  the interface-type property, so this patch renames it.
> 
>  Also, the max-speed property name was determined too generic,
>  and is therefore eliminated in favour of phy-connection-type
>  derivation logic.
> 
>  includes corrections to copyright text.
> 
>  Signed-off-by: Kim Phillips <[EMAIL PROTECTED]>
>  ---
>   drivers/net/ucc_geth.c |   40 +++ 
>  +
>   drivers/net/ucc_geth_mii.c |9 +
>   drivers/net/ucc_geth_mii.h |   10 +-
>   3 files changed, 26 insertions(+), 33 deletions(-)
> >>> applied to #upstream
> >>>
> >> just to be clear; only the mpc8323e-mds board works on your  
> >> current #upstream-{fixes,linus}.
> >> The mpc8323e-rdb and mpc8360e-mds require
> >> phylib: add the ICPlus IP175C PHY driver
> >> and phylib: enable RGMII-ID on the Marvell m88e PHY
> >> respectively in order to work.
> >> I'm ok with the SGMII patches:
> >> phylib: m88e: enable SGMII mode
> >> gianfar: add support for SGMII
> >> staying on #upstream, but 2 out of 3 boards are currently broken  
> >> for #upstream-{fixes,linus}.
> >
> > New hardware support is always "broken" until merged.  Nonetheless  
> > it is still new and outside the merge window.
> 
> I think new is a little relative here.  The MPC8232-RDB & MPC8360E- 
> MDS boards worked with 2.6.21 and the UCC. They had support for the  
> PHYs that Kim's listed built directly into the ucc driver and not as  
> part of the phy lib.  Since we've moved to using the phy lib in  
> 2.6.22 we've broken these boards since their PHY drivers aren't  
> currently in linus's tree.
> 
> Kim, correct me if I'm wrong.
> 

thanks for clarifying that - it is indeed the case.

Kim
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix e100 rx path on ARM (was [PATCH] e100 rx: or s and el bits)

2007-05-18 Thread Kok, Auke

David Acker wrote:

David Acker wrote:

Kok, Auke wrote:

Jeff Garzik wrote:

Can you resend against the latest kernel (2.6.22-rc1)?

And what does Intel think?
I'm expecting at least a reply from Milton as the patch was sent to 
him. I haven't yet tested it but will certainly do so. At first glance 
it looks OK, and I'll try to put it under my colleague's noses who 
know e100 best.


A resend against 2.6.22-rc1 would be nice.

Done.  Below is a patch against 2.6.22-rc1.  It combines removing the 
s-bit patch and applying the patch I previously sent.


Oops.  I missed one state in that patch.  Since the el-bit buffer will normally not 
complete due to a zero size, we need to check if the buffer with no data has the el-bit 
set.  Without this, you have to wait for the interrupt.  Sorry about that...this was in 
the code I tested on my embedded system but got lost in the regular kernel patch.


OK. Thanks.

If you don't mind I'm going to have some testing on this patch done for a bit 
now (mostly x86 hardware of course) to see if there's no pitfalls in it. It'll 
be a few days because of the weekend before I get back on it.


Auke



On the ARM, their is a race condition between software allocating a new receive
buffer and hardware writing into a buffer.  The two race on touching the last
Receive Frame Descriptor (RFD).  It has its el-bit set and its next link equal
to 0.  When hardware encounters this buffer it attempts to write data to it
and then update Status Word bits and Actual Count in the RFD.  At the same time
software may try to clear the el-bit and set the link address to a new buffer.
Since the entire RFD is once cache-line, the two write operations can
collide.  This can lead to the receive unit stalling or freed receive buffers
getting written to.

The fix is to set the el-bit on and the size to 0 on the next to last buffer
in the chain.  When the hardware encounters this buffer it stops and does
not write to it at all.  The hardware issues an RNR interrupt with the
receive unit in the No Resources state.  When software allocates buffers,
it can update the tail of the list because it knows the hardware will stop
at the buffer before it.  Once it has a new next to last buffer prepared,
it can clear the el-bit and set the size on the previous one.  The race on
this buffer is safe since the link already points to a valid next buffer.
If the hardware sees the el-bit cleared without the size set, it will
move on to the next buffer and complete that one in error.  If it sees
the size set but the el-bit still set, it will complete that buffer
and then RNR interrupt and wait.


Signed-off-by: David Acker <[EMAIL PROTECTED]>

---


--- linux-2.6.22-rc1/drivers/net/e100.c.orig2007-05-18 10:16:03.0 
-0400
+++ linux-2.6.22-rc1/drivers/net/e100.c 2007-05-18 10:15:53.0 -0400
@@ -285,6 +285,12 @@ enum scb_status {
rus_mask = 0x3C,
  };

+enum ru_state  {
+   RU_SUSPENDED = 0,
+   RU_RUNNING   = 1,
+   RU_UNINITIALIZED = -1,
+};
+
  enum scb_stat_ack {
stat_ack_not_ours= 0x00,
stat_ack_sw_gen  = 0x04,
@@ -526,6 +532,7 @@ struct nic {
struct rx *rx_to_use;
struct rx *rx_to_clean;
struct rfd blank_rfd;
+   enum ru_state ru_running;

spinlock_t cb_lock  cacheline_aligned;
spinlock_t cmd_lock;
@@ -947,7 +954,7 @@ static void e100_get_defaults(struct nic
((nic->mac >= mac_82558_D101_A4) ? cb_cid : cb_i));

/* Template for a freshly allocated RFD */
-   nic->blank_rfd.command = cpu_to_le16(cb_el & cb_s);
+   nic->blank_rfd.command = cpu_to_le16(cb_el);
nic->blank_rfd.rbd = 0x;
nic->blank_rfd.size = cpu_to_le16(VLAN_ETH_FRAME_LEN);

@@ -1742,11 +1749,19 @@ static int e100_alloc_cbs(struct nic *ni
return 0;
  }

-static inline void e100_start_receiver(struct nic *nic)
+static inline void e100_start_receiver(struct nic *nic, struct rx *rx)
  {
-   /* Start if RFA is non-NULL */
-   if(nic->rx_to_clean->skb)
-   e100_exec_cmd(nic, ruc_start, nic->rx_to_clean->dma_addr);
+   if(!nic->rxs) return;
+   if(RU_SUSPENDED != nic->ru_running) return;
+
+   /* handle init time starts */
+   if(!rx) rx = nic->rxs;
+
+   /* (Re)start RU if suspended or idle and RFA is non-NULL */
+   if(rx->skb) {
+   e100_exec_cmd(nic, ruc_start, rx->dma_addr);
+   nic->ru_running = RU_RUNNING;
+   }
  }

  #define RFD_BUF_LEN (sizeof(struct rfd) + VLAN_ETH_FRAME_LEN)
@@ -1769,13 +1784,12 @@ static int e100_rx_alloc_skb(struct nic
}

/* Link the RFD to end of RFA by linking previous RFD to
-* this one, and clearing EL bit of previous.  */
+* this one.  We are safe to touch the previous RFD because
+* it is protected the before last buffer's el bit being set  */
if(rx->prev->skb) {
struct rfd *prev_rfd = (str

Re: [patch 2/7] CAN: Add PF_CAN core module

2007-05-18 Thread Oliver Hartkopp

Hm, this is indeed one step further, than i thought :-)
Thanks for this nifty solution!

I will doublecheck your suggestion with Urs and then we'll change it in 
our next patch update (after some more feedback on this mailing list).

Additional feedback is welcome.

Tnx & best regards,
Oliver


Paul E. McKenney wrote:

On Fri, May 18, 2007 at 11:19:01AM +0200, Oliver Hartkopp wrote:
  

Hi Urs, Hello Paul,

i assume Paul refers to the can_rx_delete_all() function that adds each 
receive list entry for rcu removal using the can_rx_delete RCU callback, 
right?


So the idea would be to create a second RCU callback - e.g. 
can_rx_delete_list() - that removes the complete list inside the RCU 
callback?!?
The list removal would therefore be processed inside this new 
can_rx_delete_list() in RCU context and not inside can_rx_delete_all().


@Paul: Was this your intention?



My intention was that the list-removing be placed into can_rcv_lists_delete(),
perhaps as follows:

static void can_rx_delete_all(struct hlist_head *rl)
{
struct receiver *r;
struct hlist_node *n;

hlist_for_each_entry(r, n, rl, list) {
hlist_del(&r->list);
kmem_cache_free(rcv_cache, r);
}
}

static void can_rcv_lists_delete(struct rcu_head *rp)
{
struct dev_rcv_lists *d = container_of(rp, struct dev_rcv_lists, rcu);

/* remove all receivers hooked at this netdevice */
can_rx_delete_all(&d->rx_err);
can_rx_delete_all(&d->rx_all);
can_rx_delete_all(&d->rx_fil);
can_rx_delete_all(&d->rx_inv);
can_rx_delete_all(&d->rx_eff);
for (i = 0; i < 2048; i++)
can_rx_delete_all(&d->rx_sff[i]);
kfree(d);
}

Then the code in can_notifier() can reduce to the following:

if (d) {
hlist_del_rcu(&d->list);

/* used to be a string of can_rx_delete_all(). */
} else
printk(KERN_ERR "can: notifier: receive list not "
   "found for dev %s\n", dev->name);

spin_lock_bh(&rcv_lists_lock);

if (d) {
call_rcu(&d->rcu, can_rcv_lists_delete);
}

This moves the traversal work into the callback function.  This is not
a problem for CONFIG_PREEMPT_RT and non-CONFIG_PREEMPT, but not sure
about CONFIG_PREEMPT.

But it sure has the potential to cut down on a bunch of call_rcu()
work...

Thanx, Paul
  



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 2/7] CAN: Add PF_CAN core module

2007-05-18 Thread Paul E. McKenney
On Fri, May 18, 2007 at 11:19:01AM +0200, Oliver Hartkopp wrote:
> Hi Urs, Hello Paul,
> 
> i assume Paul refers to the can_rx_delete_all() function that adds each 
> receive list entry for rcu removal using the can_rx_delete RCU callback, 
> right?
> 
> So the idea would be to create a second RCU callback - e.g. 
> can_rx_delete_list() - that removes the complete list inside the RCU 
> callback?!?
> The list removal would therefore be processed inside this new 
> can_rx_delete_list() in RCU context and not inside can_rx_delete_all().
> 
> @Paul: Was this your intention?

My intention was that the list-removing be placed into can_rcv_lists_delete(),
perhaps as follows:

static void can_rx_delete_all(struct hlist_head *rl)
{
struct receiver *r;
struct hlist_node *n;

hlist_for_each_entry(r, n, rl, list) {
hlist_del(&r->list);
kmem_cache_free(rcv_cache, r);
}
}

static void can_rcv_lists_delete(struct rcu_head *rp)
{
struct dev_rcv_lists *d = container_of(rp, struct dev_rcv_lists, rcu);

/* remove all receivers hooked at this netdevice */
can_rx_delete_all(&d->rx_err);
can_rx_delete_all(&d->rx_all);
can_rx_delete_all(&d->rx_fil);
can_rx_delete_all(&d->rx_inv);
can_rx_delete_all(&d->rx_eff);
for (i = 0; i < 2048; i++)
can_rx_delete_all(&d->rx_sff[i]);
kfree(d);
}

Then the code in can_notifier() can reduce to the following:

if (d) {
hlist_del_rcu(&d->list);

/* used to be a string of can_rx_delete_all(). */
} else
printk(KERN_ERR "can: notifier: receive list not "
   "found for dev %s\n", dev->name);

spin_lock_bh(&rcv_lists_lock);

if (d) {
call_rcu(&d->rcu, can_rcv_lists_delete);
}

This moves the traversal work into the callback function.  This is not
a problem for CONFIG_PREEMPT_RT and non-CONFIG_PREEMPT, but not sure
about CONFIG_PREEMPT.

But it sure has the potential to cut down on a bunch of call_rcu()
work...

Thanx, Paul

> Best regards,
> Oliver
> 
> Paul E. McKenney wrote:
> >On Wed, May 16, 2007 at 04:51:02PM +0200, Urs Thuermann wrote:
> >  
> >>This patch adds the CAN core functionality but no protocols or drivers.
> >>No protocol implementations are included here.  They come as separate
> >>patches.  Protocol numbers are already in include/linux/can.h.
> >>
> >
> >Interesting!  One question called out below -- why do call_rcu() on each
> >piece of the struct dev_rcv_lists, instead of doing call_rcu() on the
> >whole thing and having the RCU callback free up the pieces?  Given that
> >all the pieces are call_rcu()ed separately, there had better not be
> >persistent pointers to the pieces, right?
> >
> >Doing it in one chunk would make the code a bit simpler and also reduce
> >the RCU overhead a bit.
> >
> >Or am I missing something subtle here?
> >
> > Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix e100 rx path on ARM (was [PATCH] e100 rx: or s and el bits)

2007-05-18 Thread David Acker

David Acker wrote:

Kok, Auke wrote:

Jeff Garzik wrote:

Can you resend against the latest kernel (2.6.22-rc1)?

And what does Intel think?


I'm expecting at least a reply from Milton as the patch was sent to 
him. I haven't yet tested it but will certainly do so. At first glance 
it looks OK, and I'll try to put it under my colleague's noses who 
know e100 best.


A resend against 2.6.22-rc1 would be nice.



Done.  Below is a patch against 2.6.22-rc1.  It combines removing the 
s-bit patch and applying the patch I previously sent.


Oops.  I missed one state in that patch.  Since the el-bit buffer will normally not 
complete due to a zero size, we need to check if the buffer with no data has the el-bit 
set.  Without this, you have to wait for the interrupt.  Sorry about that...this was in 
the code I tested on my embedded system but got lost in the regular kernel patch.


On the ARM, their is a race condition between software allocating a new receive
buffer and hardware writing into a buffer.  The two race on touching the last
Receive Frame Descriptor (RFD).  It has its el-bit set and its next link equal
to 0.  When hardware encounters this buffer it attempts to write data to it
and then update Status Word bits and Actual Count in the RFD.  At the same time
software may try to clear the el-bit and set the link address to a new buffer.
Since the entire RFD is once cache-line, the two write operations can
collide.  This can lead to the receive unit stalling or freed receive buffers
getting written to.

The fix is to set the el-bit on and the size to 0 on the next to last buffer
in the chain.  When the hardware encounters this buffer it stops and does
not write to it at all.  The hardware issues an RNR interrupt with the
receive unit in the No Resources state.  When software allocates buffers,
it can update the tail of the list because it knows the hardware will stop
at the buffer before it.  Once it has a new next to last buffer prepared,
it can clear the el-bit and set the size on the previous one.  The race on
this buffer is safe since the link already points to a valid next buffer.
If the hardware sees the el-bit cleared without the size set, it will
move on to the next buffer and complete that one in error.  If it sees
the size set but the el-bit still set, it will complete that buffer
and then RNR interrupt and wait.


Signed-off-by: David Acker <[EMAIL PROTECTED]>

---


--- linux-2.6.22-rc1/drivers/net/e100.c.orig2007-05-18 10:16:03.0 
-0400
+++ linux-2.6.22-rc1/drivers/net/e100.c 2007-05-18 10:15:53.0 -0400
@@ -285,6 +285,12 @@ enum scb_status {
rus_mask = 0x3C,
 };

+enum ru_state  {
+   RU_SUSPENDED = 0,
+   RU_RUNNING   = 1,
+   RU_UNINITIALIZED = -1,
+};
+
 enum scb_stat_ack {
stat_ack_not_ours= 0x00,
stat_ack_sw_gen  = 0x04,
@@ -526,6 +532,7 @@ struct nic {
struct rx *rx_to_use;
struct rx *rx_to_clean;
struct rfd blank_rfd;
+   enum ru_state ru_running;

spinlock_t cb_lock  cacheline_aligned;
spinlock_t cmd_lock;
@@ -947,7 +954,7 @@ static void e100_get_defaults(struct nic
((nic->mac >= mac_82558_D101_A4) ? cb_cid : cb_i));

/* Template for a freshly allocated RFD */
-   nic->blank_rfd.command = cpu_to_le16(cb_el & cb_s);
+   nic->blank_rfd.command = cpu_to_le16(cb_el);
nic->blank_rfd.rbd = 0x;
nic->blank_rfd.size = cpu_to_le16(VLAN_ETH_FRAME_LEN);

@@ -1742,11 +1749,19 @@ static int e100_alloc_cbs(struct nic *ni
return 0;
 }

-static inline void e100_start_receiver(struct nic *nic)
+static inline void e100_start_receiver(struct nic *nic, struct rx *rx)
 {
-   /* Start if RFA is non-NULL */
-   if(nic->rx_to_clean->skb)
-   e100_exec_cmd(nic, ruc_start, nic->rx_to_clean->dma_addr);
+   if(!nic->rxs) return;
+   if(RU_SUSPENDED != nic->ru_running) return;
+
+   /* handle init time starts */
+   if(!rx) rx = nic->rxs;
+
+   /* (Re)start RU if suspended or idle and RFA is non-NULL */
+   if(rx->skb) {
+   e100_exec_cmd(nic, ruc_start, rx->dma_addr);
+   nic->ru_running = RU_RUNNING;
+   }
 }

 #define RFD_BUF_LEN (sizeof(struct rfd) + VLAN_ETH_FRAME_LEN)
@@ -1769,13 +1784,12 @@ static int e100_rx_alloc_skb(struct nic
}

/* Link the RFD to end of RFA by linking previous RFD to
-* this one, and clearing EL bit of previous.  */
+* this one.  We are safe to touch the previous RFD because
+* it is protected the before last buffer's el bit being set  */
if(rx->prev->skb) {
struct rfd *prev_rfd = (struct rfd *)rx->prev->skb->data;
put_unaligned(cpu_to_le32(rx->dma_addr),
(u32 *)&prev_rfd->link);
-   wmb();
-   prev_rfd->command &= ~cpu_to_le16(cb_el & cb_s);
pci_dma_sync_single_for_d

Re: [PATCH 3/5] ucc_geth: eliminate max-speed, change interface-type to phy-connection-type

2007-05-18 Thread Kumar Gala


On May 17, 2007, at 8:53 PM, Jeff Garzik wrote:


Kim Phillips wrote:

On Tue, 15 May 2007 17:45:19 -0400
Jeff Garzik <[EMAIL PROTECTED]> wrote:

Kim Phillips wrote:

It was agreed that phy-connection-type was a better name for
the interface-type property, so this patch renames it.

Also, the max-speed property name was determined too generic,
and is therefore eliminated in favour of phy-connection-type
derivation logic.

includes corrections to copyright text.

Signed-off-by: Kim Phillips <[EMAIL PROTECTED]>
---
 drivers/net/ucc_geth.c |   40 +++ 
+

 drivers/net/ucc_geth_mii.c |9 +
 drivers/net/ucc_geth_mii.h |   10 +-
 3 files changed, 26 insertions(+), 33 deletions(-)

applied to #upstream

just to be clear; only the mpc8323e-mds board works on your  
current #upstream-{fixes,linus}.

The mpc8323e-rdb and mpc8360e-mds require
phylib: add the ICPlus IP175C PHY driver
and phylib: enable RGMII-ID on the Marvell m88e PHY
respectively in order to work.
I'm ok with the SGMII patches:
phylib: m88e: enable SGMII mode
gianfar: add support for SGMII
staying on #upstream, but 2 out of 3 boards are currently broken  
for #upstream-{fixes,linus}.


New hardware support is always "broken" until merged.  Nonetheless  
it is still new and outside the merge window.


I think new is a little relative here.  The MPC8232-RDB & MPC8360E- 
MDS boards worked with 2.6.21 and the UCC. They had support for the  
PHYs that Kim's listed built directly into the ucc driver and not as  
part of the phy lib.  Since we've moved to using the phy lib in  
2.6.22 we've broken these boards since their PHY drivers aren't  
currently in linus's tree.


Kim, correct me if I'm wrong.

- k
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix e100 rx path on ARM (was [PATCH] e100 rx: or s and el bits)

2007-05-18 Thread David Acker

Kok, Auke wrote:

Jeff Garzik wrote:

Can you resend against the latest kernel (2.6.22-rc1)?

And what does Intel think?


I'm expecting at least a reply from Milton as the patch was sent to him. 
I haven't yet tested it but will certainly do so. At first glance it 
looks OK, and I'll try to put it under my colleague's noses who know 
e100 best.


A resend against 2.6.22-rc1 would be nice.



Done.  Below is a patch against 2.6.22-rc1.  It combines removing the s-bit patch and 
applying the patch I previously sent.



On the ARM, their is a race condition between software allocating a new receive
buffer and hardware writing into a buffer.  The two race on touching the last
Receive Frame Descriptor (RFD).  It has its el-bit set and its next link equal
to 0.  When hardware encounters this buffer it attempts to write data to it
and then update Status Word bits and Actual Count in the RFD.  At the same time
software may try to clear the el-bit and set the link address to a new buffer.
Since the entire RFD is once cache-line, the two write operations can
collide.  This can lead to the receive unit stalling or freed receive buffers
getting written to.

The fix is to set the el-bit on and the size to 0 on the next to last buffer
in the chain.  When the hardware encounters this buffer it stops and does
not write to it at all.  The hardware issues an RNR interrupt with the
receive unit in the No Resources state.  When software allocates buffers,
it can update the tail of the list because it knows the hardware will stop
at the buffer before it.  Once it has a new next to last buffer prepared,
it can clear the el-bit and set the size on the previous one.  The race on
this buffer is safe since the link already points to a valid next buffer.
If the hardware sees the el-bit cleared without the size set, it will
move on to the next buffer and complete that one in error.  If it sees
the size set but the el-bit still set, it will complete that buffer
and then RNR interrupt and wait.


Signed-off-by: David Acker <[EMAIL PROTECTED]>

---


--- linux-2.6.22-rc1/drivers/net/e100.c.orig2007-05-18 10:01:52.0 
-0400
+++ linux-2.6.22-rc1/drivers/net/e100.c 2007-05-18 09:59:33.0 -0400
@@ -285,6 +285,12 @@ enum scb_status {
rus_mask = 0x3C,
 };

+enum ru_state  {
+   RU_SUSPENDED = 0,
+   RU_RUNNING   = 1,
+   RU_UNINITIALIZED = -1,
+};
+
 enum scb_stat_ack {
stat_ack_not_ours= 0x00,
stat_ack_sw_gen  = 0x04,
@@ -526,6 +532,7 @@ struct nic {
struct rx *rx_to_use;
struct rx *rx_to_clean;
struct rfd blank_rfd;
+   enum ru_state ru_running;

spinlock_t cb_lock  cacheline_aligned;
spinlock_t cmd_lock;
@@ -947,7 +954,7 @@ static void e100_get_defaults(struct nic
((nic->mac >= mac_82558_D101_A4) ? cb_cid : cb_i));

/* Template for a freshly allocated RFD */
-   nic->blank_rfd.command = cpu_to_le16(cb_el & cb_s);
+   nic->blank_rfd.command = cpu_to_le16(cb_el);
nic->blank_rfd.rbd = 0x;
nic->blank_rfd.size = cpu_to_le16(VLAN_ETH_FRAME_LEN);

@@ -1742,11 +1749,19 @@ static int e100_alloc_cbs(struct nic *ni
return 0;
 }

-static inline void e100_start_receiver(struct nic *nic)
+static inline void e100_start_receiver(struct nic *nic, struct rx *rx)
 {
-   /* Start if RFA is non-NULL */
-   if(nic->rx_to_clean->skb)
-   e100_exec_cmd(nic, ruc_start, nic->rx_to_clean->dma_addr);
+   if(!nic->rxs) return;
+   if(RU_SUSPENDED != nic->ru_running) return;
+
+   /* handle init time starts */
+   if(!rx) rx = nic->rxs;
+
+   /* (Re)start RU if suspended or idle and RFA is non-NULL */
+   if(rx->skb) {
+   e100_exec_cmd(nic, ruc_start, rx->dma_addr);
+   nic->ru_running = RU_RUNNING;
+   }
 }

 #define RFD_BUF_LEN (sizeof(struct rfd) + VLAN_ETH_FRAME_LEN)
@@ -1769,13 +1784,12 @@ static int e100_rx_alloc_skb(struct nic
}

/* Link the RFD to end of RFA by linking previous RFD to
-* this one, and clearing EL bit of previous.  */
+* this one.  We are safe to touch the previous RFD because
+* it is protected the before last buffer's el bit being set  */
if(rx->prev->skb) {
struct rfd *prev_rfd = (struct rfd *)rx->prev->skb->data;
put_unaligned(cpu_to_le32(rx->dma_addr),
(u32 *)&prev_rfd->link);
-   wmb();
-   prev_rfd->command &= ~cpu_to_le16(cb_el & cb_s);
pci_dma_sync_single_for_device(nic->pdev, rx->prev->dma_addr,
sizeof(struct rfd), PCI_DMA_TODEVICE);
}
@@ -1813,6 +1827,10 @@ static int e100_rx_indicate(struct nic *
pci_unmap_single(nic->pdev, rx->dma_addr,
RFD_BUF_LEN, PCI_DMA_FROMDEVICE);

+   /* this allows for a fast restart without re-enabling interrupts */
+   if(le16

[PATCH 0/4] [Net] Support Xen accelerated network plugin modules

2007-05-18 Thread Kieran Mansley
This is a repost of some earlier patches to the xen-devel mailing list,
with a number of changes thanks to some useful suggestions from others.
Apologies for the short delay in getting this next version ready.

I've also CC'd netdev@vger.kernel.org as some of the files being patched
may be merged into upstream linux soon, and so folks there may have
opinions too.

This set of patches provides the hooks and support necessary for
accelerated network plugin modules to attach to Xen's netback and
netfront.  These modules provide a fast path for network traffic where
there is hardware support available for the netfront driver to send and
receive packets directly to a NIC (such as those available from
Solarflare).

As there are currently no available plugins, I've attached a couple of
dummy ones to illustrate how the hooks could be used.  These are
incomplete (and clearly wouldn't even compile) in that they only include
code to show the interface between the accelerated module and
netfront/netback.  A lot of the comments hint at what code should go
where.  They don't show any interface between the accelerated frontend
and accelerated backend, or hardware access, for example, as those would
both be specific to the implementation.  I hope they help illustrate
this, but if you have any questions I'm happy to provide more
information.

A brief overview of the operation of the plugins:  When the accelerated
modules are loaded, a VI is created by the accelerated backend to allow
the accelerated frontend to safely access portions of the NIC.  For RX,
when packets are received by the accelerated backend, it will examine
them and if appropriate insert filters into the NIC to deliver future
packets on that address directly to the accelerated frontend's VI.  For
TX, netfront gives each accelerated frontend the option of sending each
packet, which it can accept (if it wants to send it directly to the
hardware) or decline (if it thinks this is more appropriate to send via
the normal network path).

We have tried to ensure that the hooks are hardware-agnostic, i.e. would
be relevant to hardware other than our own, without providing all
possible ways of doing each task (but if others need to extend it, that
would be welcomed).

We have found that using this approach to accelerating network traffic,
domU to domU connections (across the network) can achieve close to the
performance of dom0 to dom0 connections on a 10Gbps ethernet.  This is
roughly double the bandwidth seen with unmodified Xen. 

Kieran
/***/
/*! \file dumm_accel_backend.c Dummy accelerated plugin module

Copyright 2006 Solarflare Communications Inc,
   9501 Jeronimo Road, Suite 250,
   Irvine, CA 92618, USA

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License version 2 as published by the Free
Software Foundation, incorporated herein by reference.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA

*/
/***/

static struct netback_accel_hooks accel_hooks = {
&netback_accel_probe,
&netback_accel_remove
};

static const char *frontend_name = "dummynetaccel";

static int netback_accel_init(void)
{
/* Initialise the rest of the module... */

/* Tell the netback that we're here */
netback_connect_accelerator(0, frontend_name, &accel_hooks);
}
module_init(netback_accel_init);


static void __exit netback_accel_exit(void)
{
netback_disconnect_accelerator(0, frontend_name);

/* ...and take down the rest of the module */
}
module_exit(netback_accel_exit);




int netback_accel_probe(struct xenbus_device *dev)
{
struct backend_info *binfo;

/* Setup per-device internal state */

/* Store internal state for future access */
binfo = (struct backend_info *) dev->dev.driver_data;
binfo->netback_accel_priv = my_internal_state;

/* Setup watch on accel-state, so we know when frontend
   acceleration plugin is loaded */
setup_domu_accel_watch(dev, my_internal_state);
}


int netback_accel_remove(struct xenbus_device *dev)
{
/* Cleanup as the device is going away */
}


void netback_accel_frontend_changed(struct xenbus_device *dev,
XenbusState frontend_state)
{
switch(frontend_state) {
case XenbusStateConnected:
/* Frontend has loaded and is ready to go */
/* I

[PATCH 2/4] [Net] Support Xen accelerated network plugin modules

2007-05-18 Thread Kieran Mansley
Add accel option to vif xend config to allow users to specify which
interfaces should be accelerated using which plugin modules.

diff -r 194f5b88d257 tools/python/xen/xend/server/netif.py
--- a/tools/python/xen/xend/server/netif.py Tue Apr 17 09:04:51 2007
+0100
+++ b/tools/python/xen/xend/server/netif.py Tue Apr 17 09:06:58 2007
+0100
@@ -107,6 +107,7 @@ class NetifController(DevController):
 uuid= config.get('uuid')
 ipaddr  = config.get('ip')
 model   = config.get('model')
+accel   = config.get('accel')

 if not typ:
 typ = xoptions.netback_type
@@ -131,6 +132,8 @@ class NetifController(DevController):
 back['uuid'] = uuid
 if model:
 back['model'] = model
+if accel:
+back['accel'] = accel

 config_path = "device/%s/%d/" % (self.deviceClass, devid)
 for x in back:
@@ -157,10 +160,10 @@ class NetifController(DevController):
 config_path = "device/%s/%d/" % (self.deviceClass, devid)
 devinfo = ()
 for x in ( 'script', 'ip', 'bridge', 'mac',
-   'type', 'vifname', 'rate', 'uuid', 'model' ):
+   'type', 'vifname', 'rate', 'uuid', 'model',
'accel'):
 y = self.vm._readVm(config_path + x)
 devinfo += (y,)
-(script, ip, bridge, mac, typ, vifname, rate, uuid, model) =
devinfo
+(script, ip, bridge, mac, typ, vifname, rate, uuid, model,
accel) = devinfo

 if script:
 result['script'] = script
@@ -180,5 +183,7 @@ class NetifController(DevController):
 result['uuid'] = uuid
 if model:
 result['model'] = model
+if accel:
+result['accel'] = accel
 
 return result
diff -r 194f5b88d257 tools/python/xen/xm/create.py
--- a/tools/python/xen/xm/create.py Tue Apr 17 09:04:51 2007 +0100
+++ b/tools/python/xen/xm/create.py Tue Apr 17 09:04:54 2007 +0100
@@ -710,7 +710,7 @@ def configure_vifs(config_devs, vals):

 def f(k):
 if k not in ['backend', 'bridge', 'ip', 'mac', 'script',
'type',
- 'vifname', 'rate', 'model']:
+ 'vifname', 'rate', 'model', 'accel']:
 err('Invalid vif option: ' + k)

 config_vif.append([k, d[k]])

Add accel option to vif xend config

diff -r 194f5b88d257 tools/python/xen/xend/server/netif.py
--- a/tools/python/xen/xend/server/netif.py Tue Apr 17 09:04:51 2007 +0100
+++ b/tools/python/xen/xend/server/netif.py Tue Apr 17 09:06:58 2007 +0100
@@ -107,6 +107,7 @@ class NetifController(DevController):
 uuid= config.get('uuid')
 ipaddr  = config.get('ip')
 model   = config.get('model')
+accel   = config.get('accel')
 
 if not typ:
 typ = xoptions.netback_type
@@ -131,6 +132,8 @@ class NetifController(DevController):
 back['uuid'] = uuid
 if model:
 back['model'] = model
+if accel:
+back['accel'] = accel
 
 config_path = "device/%s/%d/" % (self.deviceClass, devid)
 for x in back:
@@ -157,10 +160,10 @@ class NetifController(DevController):
 config_path = "device/%s/%d/" % (self.deviceClass, devid)
 devinfo = ()
 for x in ( 'script', 'ip', 'bridge', 'mac',
-   'type', 'vifname', 'rate', 'uuid', 'model' ):
+   'type', 'vifname', 'rate', 'uuid', 'model', 'accel'):
 y = self.vm._readVm(config_path + x)
 devinfo += (y,)
-(script, ip, bridge, mac, typ, vifname, rate, uuid, model) = devinfo
+(script, ip, bridge, mac, typ, vifname, rate, uuid, model, accel) = 
devinfo
 
 if script:
 result['script'] = script
@@ -180,5 +183,7 @@ class NetifController(DevController):
 result['uuid'] = uuid
 if model:
 result['model'] = model
+if accel:
+result['accel'] = accel
 
 return result
diff -r 194f5b88d257 tools/python/xen/xm/create.py
--- a/tools/python/xen/xm/create.py Tue Apr 17 09:04:51 2007 +0100
+++ b/tools/python/xen/xm/create.py Tue Apr 17 09:04:54 2007 +0100
@@ -710,7 +710,7 @@ def configure_vifs(config_devs, vals):
 
 def f(k):
 if k not in ['backend', 'bridge', 'ip', 'mac', 'script', 'type',
- 'vifname', 'rate', 'model']:
+ 'vifname', 'rate', 'model', 'accel']:
 err('Invalid vif option: ' + k)
 
 config_vif.append([k, d[k]])


[PATCH 4/4] [Net] Support Xen accelerated network plugin modules

2007-05-18 Thread Kieran Mansley
Add support to Xen netback to support accelerated plugin module

diff -r da9639486bf2 linux-2.6-xen-sparse/drivers/xen/netback/common.h
--- a/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri May 18
10:36:47 2007 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri May 18
11:02:49 2007 +0100
@@ -114,6 +114,41 @@ typedef struct netif_st {
 #define netback_carrier_off(netif) ((netif)->carrier = 0)
 #define netback_carrier_ok(netif)  ((netif)->carrier)
 
+
+#include 
+
+/* Function pointers into netback accelerator plugin modules */
+struct netback_accel_hooks {
+int  (*probe)(struct xenbus_device *dev);
+int (*remove)(struct xenbus_device *dev);
+};
+
+/* Structure to track the state of a netback accelerator plugin */
+struct netback_accelerator {
+struct list_head link;
+int id;
+char *frontend;
+struct netback_accel_hooks *hooks;
+};
+
+/* Connect an accelerator plugin module to netback */
+extern void netback_connect_accelerator(int id, const char *frontend, 
+struct netback_accel_hooks
*hooks);
+/* Disconnect a previously connected accelerator pluging module */
+extern void netback_disconnect_accelerator(int id, const char
*frontend);
+
+struct backend_info {
+   struct xenbus_device *dev;
+   netif_t *netif;
+   enum xenbus_state frontend_state;
+
+/* State relating to the netback accelerator */
+void *netback_accel_priv;
+/* The accelerator that this backend is currently using */
+struct netback_accelerator *accelerator;
+};
+
+
 #define NET_TX_RING_SIZE __RING_SIZE((netif_tx_sring_t *)0, PAGE_SIZE)
 #define NET_RX_RING_SIZE __RING_SIZE((netif_rx_sring_t *)0, PAGE_SIZE)
 
diff -r da9639486bf2 linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c
--- a/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri May 18
10:36:47 2007 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri May 18
11:02:54 2007 +0100
@@ -1,6 +1,7 @@
 /*  Xenbus code for netif backend
 Copyright (C) 2005 Rusty Russell <[EMAIL PROTECTED]>
 Copyright (C) 2005 XenSource Ltd
+Copyright (c) 2007 Solarflare Communications, Inc.
 
 This program is free software; you can redistribute it and/or
modify
 it under the terms of the GNU General Public License as published
by
@@ -28,11 +29,126 @@
 printk("netback/xenbus (%s:%d) " fmt ".\n", __FUNCTION__, __LINE__,
##args)
 #endif
 
-struct backend_info {
-   struct xenbus_device *dev;
-   netif_t *netif;
-   enum xenbus_state frontend_state;
-};
+/* 
+ * A list of available netback accelerator plugin modules (each list
+ * entry is of type struct netback_accelerator) 
+ */ 
+static struct list_head accelerators_list;
+/* Lock used to protect access to accelerators_list */
+static spinlock_t accelerators_lock;
+
+/* 
+ * Compare a backend to an accelerator, and decide if they are
+ * compatible (i.e. if the accelerator should be used by the
+ * backend) 
+ */
+static int match_accelerator(struct backend_info *be, 
+ struct netback_accelerator *accelerator)
+{
+/*
+ * This could do with being more sophisticated.  For example,
+ * determine which hardware is being used by each backend from
+ * the bridge and network topology of the domain
+ */
+if ( be->accelerator == NULL )
+return 1;
+else
+return 0;
+}
+
+/*
+ * Notify all suitable backends that a new accelerator is available
+ * and connected.  This will also notify the accelerator plugin module
+ * that it is being used for a device through the probe hook.
+ */
+static int netback_accelerator_tell_backend(struct device *dev, void
*arg)
+{
+struct netback_accelerator *accelerator = 
+(struct netback_accelerator *)arg;
+struct xenbus_device *xendev = to_xenbus_device(dev);
+
+if( !strcmp("vif", xendev->devicetype) ) {
+struct backend_info *be = xendev->dev.driver_data;
+
+if ( match_accelerator(be, accelerator) ) {
+ be->accelerator = accelerator;
+ be->accelerator->hooks->probe(xendev);
+}
+}
+return 0;
+}
+
+
+/*
+ * Entry point for an netback accelerator plugin module.  Called to
+ * advertise its presence, and connect to any suitable backends.
+ */
+void netback_connect_accelerator(int id, const char *frontend, 
+ struct netback_accel_hooks *hooks)
+{
+struct netback_accelerator *new_accelerator = 
+kmalloc(sizeof(struct netback_accelerator),
GFP_KERNEL);
+unsigned frontend_len, flags;
+
+if ( new_accelerator ) {
+new_accelerator->id = id;
+
+frontend_len = strlen(frontend)+1;
+new_accelerator->frontend = kmalloc(frontend_len,
GFP_KERNEL);
+i

[PATCH 1/4] [Net] Support Xen accelerated network plugin modules

2007-05-18 Thread Kieran Mansley
Provide two helper functions (xenbus_for_each_frontend,
xenbus_for_each_backend) to allow drivers to iterate the xenbus frontend
and backend buses.

diff -r 4f67d849e788 linux-2.6-xen-
sparse/drivers/xen/xenbus/xenbus_probe.c
--- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe.c Tue Apr 03
09:03:51 2007 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe.c Tue Apr 03
13:32:33 2007 +0100
@@ -4,6 +4,7 @@
  * Copyright (C) 2005 Rusty Russell, IBM Corporation
  * Copyright (C) 2005 Mike Wray, Hewlett-Packard
  * Copyright (C) 2005, 2006 XenSource Ltd
+ * Copyright (C) 2007 Solarflare Communications, Inc.
  * 
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License version
2
@@ -1079,3 +1080,10 @@ static int __init boot_wait_for_devices(

late_initcall(boot_wait_for_devices);
#endif
+
+
+int xenbus_for_each_frontend(void *arg, int (*fn)(struct device *, void
*))
+{
+return bus_for_each_dev(&xenbus_frontend.bus, NULL, arg, fn);
+}
+EXPORT_SYMBOL_GPL(xenbus_for_each_frontend);
diff -r 4f67d849e788 linux-2.6-xen-
sparse/drivers/xen/xenbus/xenbus_probe_backend.c
--- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe_backend.c Tue
Apr 03 09:03:51 2007 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe_backend.c Tue
Apr 03 13:32:33 2007 +0100
@@ -4,6 +4,7 @@
  * Copyright (C) 2005 Rusty Russell, IBM Corporation
  * Copyright (C) 2005 Mike Wray, Hewlett-Packard
  * Copyright (C) 2005, 2006 XenSource Ltd
+ * Copyright (C) 2007 Solarflare Communications, Inc.
  * 
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License version
2
@@ -284,3 +285,10 @@ void xenbus_backend_device_register(void
   xenbus_backend.error);
}
}
+
+int xenbus_for_each_backend(void *arg, int (*fn)(struct device *, void
*))
+{
+return bus_for_each_dev(&xenbus_backend.bus, NULL, arg, fn);
+}
+EXPORT_SYMBOL_GPL(xenbus_for_each_backend);
+
diff -r 4f67d849e788 linux-2.6-xen-sparse/include/xen/xenbus.h
--- a/linux-2.6-xen-sparse/include/xen/xenbus.h Tue Apr 03 09:03:51 2007
+0100
+++ b/linux-2.6-xen-sparse/include/xen/xenbus.h Tue Apr 03 13:32:33 2007
+0100
@@ -299,4 +299,7 @@ int xenbus_dev_is_online(struct xenbus_d
int xenbus_dev_is_online(struct xenbus_device *dev);
int xenbus_frontend_closed(struct xenbus_device *dev);

+extern int xenbus_for_each_backend(void *arg, int (*fn)(struct device
*, void *));
+extern int xenbus_for_each_frontend(void *arg, int (*fn)(struct device
*, void *));
+
#endif /* _XEN_XENBUS_H */

Add xenbus_for_each_[back,front]end functions to iterate each bus

diff -r 4f67d849e788 linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe.c
--- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe.cTue Apr 03 
09:03:51 2007 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe.cTue Apr 03 
13:32:33 2007 +0100
@@ -4,6 +4,7 @@
  * Copyright (C) 2005 Rusty Russell, IBM Corporation
  * Copyright (C) 2005 Mike Wray, Hewlett-Packard
  * Copyright (C) 2005, 2006 XenSource Ltd
+ * Copyright (C) 2007 Solarflare Communications, Inc.
  * 
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License version 2
@@ -1079,3 +1080,10 @@ static int __init boot_wait_for_devices(
 
 late_initcall(boot_wait_for_devices);
 #endif
+
+
+int xenbus_for_each_frontend(void *arg, int (*fn)(struct device *, void *))
+{
+return bus_for_each_dev(&xenbus_frontend.bus, NULL, arg, fn);
+}
+EXPORT_SYMBOL_GPL(xenbus_for_each_frontend);
diff -r 4f67d849e788 
linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe_backend.c
--- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe_backend.cTue Apr 
03 09:03:51 2007 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe_backend.cTue Apr 
03 13:32:33 2007 +0100
@@ -4,6 +4,7 @@
  * Copyright (C) 2005 Rusty Russell, IBM Corporation
  * Copyright (C) 2005 Mike Wray, Hewlett-Packard
  * Copyright (C) 2005, 2006 XenSource Ltd
+ * Copyright (C) 2007 Solarflare Communications, Inc.
  * 
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License version 2
@@ -284,3 +285,10 @@ void xenbus_backend_device_register(void
   xenbus_backend.error);
}
 }
+
+int xenbus_for_each_backend(void *arg, int (*fn)(struct device *, void *))
+{
+return bus_for_each_dev(&xenbus_backend.bus, NULL, arg, fn);
+}
+EXPORT_SYMBOL_GPL(xenbus_for_each_backend);
+
diff -r 4f67d849e788 linux-2.6-xen-sparse/include/xen/xenbus.h
--- a/linux-2.6-xen-sparse/include/xen/xenbus.h Tue Apr 03 09:03:51 2007 +0100
+++ b/linux-2.6-xen-sparse/include/xen/xenbus.h Tue Apr 03 13:32:33 2007 +0100
@@ -299,4 +299,7 @@ int xenbus_dev_is_online(struct xenbus_d
 int xenbus_dev_is_online(struct xenbus_device *dev);
 int xenbus_frontend_closed(struct xenbus_device

Re: select(0, ..) is valid ?

2007-05-18 Thread Andi Kleen
On Wednesday 16 May 2007 17:37, Anton Blanchard wrote:
> Hi Hugh,
>
> > It's interesting that compat_core_sys_select() shows this kmalloc(0)
> > failure but core_sys_select() does not.  That's because core_sys_select()
> > avoids kmalloc by using a buffer on the stack for small allocations (and
> > 0 sure is small).  Shouldn't compat_core_sys_select() do just the same?
> > Or is SLUB going to be so efficient that doing so is a waste of time?
>
> Nice catch, the original optimisation from Andi is:
>
> http://git.kernel.org/git-new/?p=linux/kernel/git/torvalds/linux-2.6.git;a=
>commit;h=70674f95c0a2ea694d5c39f4e514f538a09be36f
>
> And I think it makes sense for the compat code to do it too.

Yes agreed. I just forgot the copy'n'pasted code when doing the original
change.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [WIP] [PATCH] WAS Re: [RFC] New driver API to speed up small packets xmits

2007-05-18 Thread jamal
On Wed, 2007-16-05 at 23:25 -0400, jamal wrote:

> 
> This patch now includes two changed drivers (tun and e1000). I have
> tested tun with this patch. I tested e1000 earlier and i couldnt find
> any issues - although as the tittle says its a WIP.
> 
> As before you need net-2.6. You also need the qdisc restart cleanup
> patch.
> 

I have found a bug on e1000 which manifests under a lot of pounding with
traffic.
I am not gonna be posting the patch anymore - but will continue to fix
and add things. If you want the patches email me privately.

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] NetXen: Updates for register access routines

2007-05-18 Thread Mithlesh Thukral
On Friday 18 May 2007 06:17, Jeff Garzik wrote:
> Mithlesh Thukral wrote:
> > NetXen: Changes related to registers and their access routines
> > This patch updates the various access routines to access different
> > control and status settings present in different register locations.
> > This will fix problems related to working of different ports in
> > multi Port card.
> >
> > Signed-off by: Milan Bag <[EMAIL PROTECTED]>
> > Signed-off by: Adhiraj Joshi <[EMAIL PROTECTED]>
> > Acked-by: Mithlesh Thukral <[EMAIL PROTECTED]>
> > ---
> >  drivers/net/netxen/netxen_nic.h  |   96 ++---
> >  drivers/net/netxen/netxen_nic_hw.h   |   65 +++-
> >  drivers/net/netxen/netxen_nic_init.c |   17 ++--
> >  3 files changed, 65 insertions(+), 113 deletions(-)
>
> not applied.  This patch removes a bunch of endianness changes.
Ok. I will take care of these changes in the next patchset that i submit.
Thanks for applying the first patch.

--
Mithlesh Thukral
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 2/7] CAN: Add PF_CAN core module

2007-05-18 Thread Oliver Hartkopp

Hi Urs, Hello Paul,

i assume Paul refers to the can_rx_delete_all() function that adds each 
receive list entry for rcu removal using the can_rx_delete RCU callback, 
right?


So the idea would be to create a second RCU callback - e.g. 
can_rx_delete_list() - that removes the complete list inside the RCU 
callback?!?
The list removal would therefore be processed inside this new 
can_rx_delete_list() in RCU context and not inside can_rx_delete_all().


@Paul: Was this your intention?

Best regards,
Oliver

Paul E. McKenney wrote:

On Wed, May 16, 2007 at 04:51:02PM +0200, Urs Thuermann wrote:
  

This patch adds the CAN core functionality but no protocols or drivers.
No protocol implementations are included here.  They come as separate
patches.  Protocol numbers are already in include/linux/can.h.



Interesting!  One question called out below -- why do call_rcu() on each
piece of the struct dev_rcv_lists, instead of doing call_rcu() on the
whole thing and having the RCU callback free up the pieces?  Given that
all the pieces are call_rcu()ed separately, there had better not be
persistent pointers to the pieces, right?

Doing it in one chunk would make the code a bit simpler and also reduce
the RCU overhead a bit.

Or am I missing something subtle here?

Thanx, Paul

  



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-2.6 2/2] [TCP] FRTO: Prevent state inconsistency in corner cases

2007-05-18 Thread Ilpo Järvinen
State could become inconsistent in two cases:

1) Userspace disabled FRTO by tuning sysctl when one of the TCP
   flows was in the middle of FRTO algorithm (and then RTO is
   again triggered)

2) SACK reneging occurs during FRTO algorithm

A simple solution is just to abort the previous FRTO when such
obscure condition occurs...

Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
---
 net/ipv4/tcp_input.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 7ecdc89..38cb25b 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1501,6 +1501,8 @@ void tcp_enter_loss(struct sock *sk, int how)
tcp_set_ca_state(sk, TCP_CA_Loss);
tp->high_seq = tp->snd_nxt;
TCP_ECN_queue_cwr(tp);
+   /* Abort FRTO algorithm if one is in progress */
+   tp->frto_counter = 0;
 
clear_all_retrans_hints(tp);
 }
-- 
1.5.0.6


[PATCH RESEND net-2.6] [TCP] FRTO: Add missing ECN CWR sending to one of the responses

2007-05-18 Thread Ilpo Järvinen
The conservative spurious RTO response did not queue CWR even
though the sending rate was lowered. Whenever reduction happens
regardless of reason, CWR should be sent (forgetting to send it
is not very fatal though).

A better approach would be to queue CWR when one of the sending
rate reducing responses (rate-halving one or this conservative
response) is used already at RTO. Doing that would allow CWR to
be sent along with the two new data segments that are sent
during FRTO. However, it's a bit "racy" because userland could
tune the response sysctl to a more aggressive one in between.

Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
---
 net/ipv4/tcp_input.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 7641b27..7ecdc89 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2608,6 +2608,7 @@ static void tcp_conservative_spur_to_response(struct 
tcp_sock *tp)
 {
tp->snd_cwnd = min(tp->snd_cwnd, tp->snd_ssthresh);
tp->snd_cwnd_cnt = 0;
+   TCP_ECN_queue_cwr(tp);
tcp_moderate_cwnd(tp);
 }
 
-- 
1.5.0.6


Re: [IPV4] LVS: Allow to send ICMP unreachable responses when real-servers are removed

2007-05-18 Thread David Miller
From: Julian Anastasov <[EMAIL PROTECTED]>
Date: Fri, 18 May 2007 11:40:54 +0300 (EEST)

> On Thu, 17 May 2007, Patrick McHardy wrote:
> 
> > In any case some better solution than the current one needs to be
> > found, allowing users to send spoofed packets is far worse than
> > using a non-desired source address for ICMP packets.
> 
>   yes, I would prefer the sysctl_ip_nonlocal_bind change to be
> removed until such solution is found.

Ok, I'll revert it.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [IPV4] LVS: Allow to send ICMP unreachable responses when real-servers are removed

2007-05-18 Thread Julian Anastasov

Hello,

On Thu, 17 May 2007, Patrick McHardy wrote:

> > But what is preferred is to use VIP in ICMP.
> > 
> > ip route add local VIP dev lo table user_defined
> > 
> > returns RTCF_LOCAL but inet_addr_type() does not return RTN_LOCAL,
> > we fix one thing but break another :)
> 
> 
> Actually thats exactly the case that my patch handles. Why does it
> matter which source address the ICMP packet uses, as long as its
> routed properly?

It should work for most of the cases but it can cause problems
in closely connected hosts where using the right subnet matters.
If inet_addr_type is not considered slow for routers and this
local route justifies it then i have no more objections. May be
Janusz should test it first without sysctl_ip_nonlocal_bind change.

> In any case some better solution than the current one needs to be
> found, allowing users to send spoofed packets is far worse than
> using a non-desired source address for ICMP packets.

yes, I would prefer the sysctl_ip_nonlocal_bind change to be
removed until such solution is found.

Regards

--
Julian Anastasov <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/2] s2io: add PCI error recovery support

2007-05-18 Thread Sivakumar Subramani
Hi,

Fix looks good. I have couple of comments,

1) Return from s2io_updt_stats function if the PCI bus is offline
(pci_channel_offline). 
 if (pci_channel_offline(pdev))
return;

2) No Need to call netif_wake_queue() in s2io_io_resume as
netif_device_attach() will take care of calling it.

3) We need to return from s2io_msix_ring_handle(),
s2io_msix_fifo_handle(), s2io_msi_handle() if the PCI channel is
offline. Some thing similary to the below condition that is added in the
s2io_isr()

 -4117,6 +4129,10 @@ static irqreturn_t s2io_isr(int irq, voi
struct mac_info *mac_control;
struct config_param *config;

+   /* Pretend we handled any irq's from a disconnected card */
+   if (pci_channel_offline(sp->pdev))
+   return IRQ_NONE;

Thanks,
~Siva

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Linas Vepstas
Sent: Tuesday, May 15, 2007 5:08 AM
To: Jeff Garzik
Cc: [EMAIL PROTECTED]; netdev@vger.kernel.org;
[EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: [PATCH 1/2] s2io: add PCI error recovery support


This patch adds PCI error recovery support to the s2io 10-Gigabit
ethernet device driver. Third revision, blocks interrupts and the
watchdog.

Tested, seems to work well.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>
Acked-by: Ramkrishna Vepa <[EMAIL PROTECTED]>
Cc: Raghavendra Koushik <[EMAIL PROTECTED]>
Cc: Wen Xiong <[EMAIL PROTECTED]>



Jeff, please apply for 2.6.22

--linas


 drivers/net/s2io.c |  116
++---
 drivers/net/s2io.h |5 ++
 2 files changed, 116 insertions(+), 5 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/s2io.c
===
--- linux-2.6.22-rc1.orig/drivers/net/s2io.c2007-05-14
17:05:03.0 -0500
+++ linux-2.6.22-rc1/drivers/net/s2io.c 2007-05-14 17:23:45.0
-0500
@@ -480,11 +480,18 @@ static struct pci_device_id s2io_tbl[] _
 
 MODULE_DEVICE_TABLE(pci, s2io_tbl);
 
+static struct pci_error_handlers s2io_err_handler = {
+   .error_detected = s2io_io_error_detected,
+   .slot_reset = s2io_io_slot_reset,
+   .resume = s2io_io_resume,
+};
+
 static struct pci_driver s2io_driver = {
   .name = "S2IO",
   .id_table = s2io_tbl,
   .probe = s2io_init_nic,
   .remove = __devexit_p(s2io_rem_nic),
+  .err_handler = &s2io_err_handler,
 };
 
 /* A simplifier macro used both by init and free shared_mem Fns(). */
@@ -2700,6 +2707,9 @@ static void s2io_netpoll(struct net_devi
u64 val64 = 0xULL;
int i;
 
+   if (pci_channel_offline(nic->pdev))
+   return;
+
disable_irq(dev->irq);
 
atomic_inc(&nic->isr_cnt);
@@ -3225,6 +3235,8 @@ static void alarm_intr_handler(struct s2
int i;
if (atomic_read(&nic->card_state) == CARD_DOWN)
return;
+   if (pci_channel_offline(nic->pdev))
+   return;
nic->mac_control.stats_info->sw_stat.ring_full_cnt = 0;
/* Handling the XPAK counters update */
if(nic->mac_control.stats_info->xpak_stat.xpak_timer_count <
72000) { @@ -4324,6 +4336,10 @@ static irqreturn_t s2io_isr(int irq, voi
struct mac_info *mac_control;
struct config_param *config;
 
+   /* Pretend we handled any irq's from a disconnected card */
+   if (pci_channel_offline(sp->pdev))
+   return IRQ_NONE;
+
atomic_inc(&sp->isr_cnt);
mac_control = &sp->mac_control;
config = &sp->config;
@@ -6579,7 +6595,7 @@ static void s2io_rem_isr(struct s2io_nic
} while(cnt < 5);
 }
 
-static void s2io_card_down(struct s2io_nic * sp)
+static void do_s2io_card_down(struct s2io_nic * sp, int do_io)
 {
int cnt = 0;
struct XENA_dev_config __iomem *bar0 = sp->bar0; @@ -6594,7
+6610,8 @@ static void s2io_card_down(struct s2io_n
atomic_set(&sp->card_state, CARD_DOWN);
 
/* disable Tx and Rx traffic on the NIC */
-   stop_nic(sp);
+   if (do_io)
+   stop_nic(sp);
 
s2io_rem_isr(sp);
 
@@ -6602,7 +6619,7 @@ static void s2io_card_down(struct s2io_n
tasklet_kill(&sp->task);
 
/* Check if the device is Quiescent and then Reset the NIC */
-   do {
+   while(do_io) {
/* As per the HW requirement we need to replenish the
 * receive buffer to avoid the ring bump. Since there is
 * no intention of processing the Rx frame at this
pointwe are @@ -6627,8 +6644,9 @@ static void s2io_card_down(struct
s2io_n
  (unsigned long long) val64);
break;
}
-   } while (1);
-   s2io_reset(sp);
+   }
+   if (do_io)
+   s2io_reset(sp);
 
spin_lock_irqsave(&sp->tx_lock, flags);
/* Free all Tx buffers */
@@ -6643,6 +6661,11 @@ static void s2io_card_down(struct s2io_n
 

Re: [RFC] New driver API to speed up small packets xmits

2007-05-18 Thread Michael Chan
David Miller wrote:

> From: "Michael Chan" <[EMAIL PROTECTED]>
> Date: Tue, 15 May 2007 15:05:28 -0700
> 
> > For each gso_seg, there will be a header and the payload may span 2
> > pages for 1500-byte packets.  We always assume 1500-byte packets
> > because the buggy chips do not support jumbo frames.
> 
> Correct.
> 
> I think there may be a case where you could see up to 4 segments.
> If the user corks the TCP socket, does a sendmsg() (which puts
> the data in the per-socket page) then does a sendfile() you'll
> see something like:
> 
> skb->data IP, TCP, ethernet headers, etc.
> page0 sendmsg() data
> page1 sendfile
> page2 sendfile
> 
> Ie. this can happen if the sendfile() part starts near the
> end of a page, so it would get split even for a 1500 MTU
> frame.
> 
> Even more complex variants are possible if the user does
> tiny sendfile() requests to different pages within the file.
> 
> So in fact it can span up to N pages.
> 
> But there is an upper limit defined by the original GSO
> frame, and that is controlled by MAX_SKB_FRAGS, so at most
> you would see MAX_SKB_FRAGS plus some small constant.
> 
> 

I see.  Is the following a better estimate?

(gso_segs * 2) + skb_shinfo(skb)->nr_frags

For each gso_seg, you get at least a header segment and a
segment from a page for 1500-byte packets.  You can get more
segments if packets cross page boundaries or if there are tiny
sendfile pages or some sendmsg pages mixed in.  But these
additional segments will never be more than nr_frags.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html