[RTNETLINK]: Fix use of wrong skb in do_getlink()

2006-10-11 Thread Patrick McHardy
[RTNETLINK]: Fix use of wrong skb in do_getlink()

skb is the netlink query, nskb is the reply message.

Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>

---
commit 7d6fc0796ea33cd86cf20ccd51c351538903dcd9
tree 15bd1793a030f471f4a70762b079cf187bca8207
parent 587b522ced8a8ca3bddc99b05ec5118f11cfe164
author Patrick McHardy <[EMAIL PROTECTED]> Thu, 12 Oct 2006 08:38:03 +0200
committer Patrick McHardy <[EMAIL PROTECTED]> Thu, 12 Oct 2006 08:38:03 +0200

 net/core/rtnetlink.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 221e403..02f3c79 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -602,7 +602,7 @@ #endif  /* CONFIG_NET_WIRELESS_RTNETLINK 
goto errout;
}
 
-   err = rtnl_unicast(skb, NETLINK_CB(skb).pid);
+   err = rtnl_unicast(nskb, NETLINK_CB(skb).pid);
 errout:
kfree(iw_buf);
dev_put(dev);


Re: [RFC] Question about potential problem in net/ipv4/route.c

2006-10-11 Thread Eric Dumazet

David Miller a écrit :

From: Eric Dumazet <[EMAIL PROTECTED]>
Date: Thu, 12 Oct 2006 07:48:20 +0200


Not on my gcc here (gcc version 3.4.4) : It wont zeros out the padding bytes


Patrick just proved this too :)


Well, on this machine I have these oprofile numbers :

: /* rt_intern_hash total: 993464  0.3619 */

  31751  0.0116 :803e8d26:   repz cmpsb %es:(%rdi),%ds:(%rsi)
433438  0.1579 :803e8d28:   jne803e8f80 



Indeed, numbers talk bullshit walks :)

How about something like this as a start?

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index c41ddba..4d4b1bd 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -566,9 +566,13 @@ static inline u32 rt_score(struct rtable
 
 static inline int compare_keys(struct flowi *fl1, struct flowi *fl2)

 {
-   return memcmp(&fl1->nl_u.ip4_u, &fl2->nl_u.ip4_u, sizeof(fl1->nl_u.ip4_u)) == 0 
&&
-  fl1->oif == fl2->oif &&
-  fl1->iif == fl2->iif;
+   return ((fl1->nl_u.ip4_u.daddr ^ fl2->nl_u.ip4_u.daddr) |
+   (fl1->nl_u.ip4_u.saddr ^ fl2->nl_u.ip4_u.saddr) |
+   (fl1->nl_u.ip4_u.fwmark ^ fl2->nl_u.ip4_u.fwmark) |
+   (*(u16 *)&fl1->nl_u.ip4_u.tos ^
+*(u16 *)&fl2->nl_u.ip4_u.tos) |
+   (fl1->oif ^ fl2->oif) |
+   (fl1->iif ^ fl2->iif)) == 0;
 }
 
 #ifdef CONFIG_IP_ROUTE_MULTIPATH_CACHED





Yes it seems a nice start :)

How about avoiding the fwmark thing if !CONFIG_IP_ROUTE_FWMARK

> +  return ((fl1->nl_u.ip4_u.daddr ^ fl2->nl_u.ip4_u.daddr) |
> +  (fl1->nl_u.ip4_u.saddr ^ fl2->nl_u.ip4_u.saddr) |

#ifdef CONFIG_IP_ROUTE_FWMARK

> +  (fl1->nl_u.ip4_u.fwmark ^ fl2->nl_u.ip4_u.fwmark) |

#endif

> +  (*(u16 *)&fl1->nl_u.ip4_u.tos ^
> +   *(u16 *)&fl2->nl_u.ip4_u.tos) |
> +  (fl1->oif ^ fl2->oif) |
> +  (fl1->iif ^ fl2->iif)) == 0;


Eric

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Question about potential problem in net/ipv4/route.c

2006-10-11 Thread David Miller
From: Patrick McHardy <[EMAIL PROTECTED]>
Date: Thu, 12 Oct 2006 08:10:43 +0200

> David Miller wrote:
> > Indeed, numbers talk bullshit walks :)
> 
> :)
> 
> > How about something like this as a start?
> > 
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> 
> The same problem is also present in dn_route.c (3 uninitialized
> bytes in dn_u).

This should take care of that case too:

diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index dd0761e..33ccc56 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -267,9 +267,12 @@ static void dn_dst_link_failure(struct s
 
 static inline int compare_keys(struct flowi *fl1, struct flowi *fl2)
 {
-   return memcmp(&fl1->nl_u.dn_u, &fl2->nl_u.dn_u, sizeof(fl1->nl_u.dn_u)) 
== 0 &&
-   fl1->oif == fl2->oif &&
-   fl1->iif == fl2->iif;
+   return ((fl1->nl_u.dn_u.daddr ^ fl2->nl_u.dn_u.daddr) |
+   (fl1->nl_u.dn_u.saddr ^ fl2->nl_u.dn_u.saddr) |
+   (fl1->nl_u.dn_u.fwmark ^ fl2->nl_u.dn_u.fwmark) |
+   (fl1->nl_u.dn_u.scope ^ fl2->nl_u.dn_u.scope) |
+   (fl1->oif ^ fl2->oif) |
+   (fl1->iif ^ fl2->iif)) == 0;
 }
 
 static int dn_insert_route(struct dn_route *rt, unsigned hash, struct dn_route 
**rp)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index c41ddba..4d4b1bd 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -566,9 +566,13 @@ static inline u32 rt_score(struct rtable
 
 static inline int compare_keys(struct flowi *fl1, struct flowi *fl2)
 {
-   return memcmp(&fl1->nl_u.ip4_u, &fl2->nl_u.ip4_u, 
sizeof(fl1->nl_u.ip4_u)) == 0 &&
-  fl1->oif == fl2->oif &&
-  fl1->iif == fl2->iif;
+   return ((fl1->nl_u.ip4_u.daddr ^ fl2->nl_u.ip4_u.daddr) |
+   (fl1->nl_u.ip4_u.saddr ^ fl2->nl_u.ip4_u.saddr) |
+   (fl1->nl_u.ip4_u.fwmark ^ fl2->nl_u.ip4_u.fwmark) |
+   (*(u16 *)&fl1->nl_u.ip4_u.tos ^
+*(u16 *)&fl2->nl_u.ip4_u.tos) |
+   (fl1->oif ^ fl2->oif) |
+   (fl1->iif ^ fl2->iif)) == 0;
 }
 
 #ifdef CONFIG_IP_ROUTE_MULTIPATH_CACHED
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Question about potential problem in net/ipv4/route.c

2006-10-11 Thread Patrick McHardy
David Miller wrote:
> Indeed, numbers talk bullshit walks :)

:)

> How about something like this as a start?
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c

The same problem is also present in dn_route.c (3 uninitialized
bytes in dn_u).

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Question about potential problem in net/ipv4/route.c

2006-10-11 Thread David Miller
From: Eric Dumazet <[EMAIL PROTECTED]>
Date: Thu, 12 Oct 2006 07:48:20 +0200

> Not on my gcc here (gcc version 3.4.4) : It wont zeros out the padding bytes

Patrick just proved this too :)

> Well, on this machine I have these oprofile numbers :
> 
> : /* rt_intern_hash total: 993464  0.3619 */
> 
>   31751  0.0116 :803e8d26:   repz cmpsb %es:(%rdi),%ds:(%rsi)
> 433438  0.1579 :803e8d28:   jne803e8f80 
> 

Indeed, numbers talk bullshit walks :)

How about something like this as a start?

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index c41ddba..4d4b1bd 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -566,9 +566,13 @@ static inline u32 rt_score(struct rtable
 
 static inline int compare_keys(struct flowi *fl1, struct flowi *fl2)
 {
-   return memcmp(&fl1->nl_u.ip4_u, &fl2->nl_u.ip4_u, 
sizeof(fl1->nl_u.ip4_u)) == 0 &&
-  fl1->oif == fl2->oif &&
-  fl1->iif == fl2->iif;
+   return ((fl1->nl_u.ip4_u.daddr ^ fl2->nl_u.ip4_u.daddr) |
+   (fl1->nl_u.ip4_u.saddr ^ fl2->nl_u.ip4_u.saddr) |
+   (fl1->nl_u.ip4_u.fwmark ^ fl2->nl_u.ip4_u.fwmark) |
+   (*(u16 *)&fl1->nl_u.ip4_u.tos ^
+*(u16 *)&fl2->nl_u.ip4_u.tos) |
+   (fl1->oif ^ fl2->oif) |
+   (fl1->iif ^ fl2->iif)) == 0;
 }
 
 #ifdef CONFIG_IP_ROUTE_MULTIPATH_CACHED
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Question about potential problem in net/ipv4/route.c

2006-10-11 Thread David Miller
From: Patrick McHardy <[EMAIL PROTECTED]>
Date: Thu, 12 Oct 2006 07:31:12 +0200

> This program demonstrates the effect, it doesn't output the expected
> "1 2" but "1 4294967042" on my x86_64 (gcc-Version 4.1.2 20060901
> (prerelease) (Debian 4.1.1-13)). The initialization doesn't touch
> the padding bytes:
> 
> 0x00400494 :movl   $0x1,0xfff0(%rbp)
> 0x0040049b :   movb   $0x2,0xfff4(%rbp)

Crap, ok we need to fix this.

I even looked at the assembler for a little test foo.c bit
of code... must have misread the sparc64 asm :)
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Question about potential problem in net/ipv4/route.c

2006-10-11 Thread Eric Dumazet

David Miller a écrit :

From: Eric Dumazet <[EMAIL PROTECTED]>
Date: Wed, 11 Oct 2006 15:11:18 +0200

Using memcmp(ptr1, ptr2, sizeof(SOMEFIELD)) is dangerous because 
sizeof(SOMEFIELD) can be larger than the underlying object, because of 
alignment constraints.


In this case, sizeof(fl1->nl_u.ip4_u) is 16, while fl1->nl_u.ip4_u is :

struct {
__u32   daddr;
__u32   saddr;
__u32   fwmark;
__u8tos;
__u8scope;
} ip4_u;

So 14 bytes are really initialized, and 2 padding bytes might contain random 
values...


We always explicitly initialize the flows, and even for local stack
assignment based initialization, gcc zeros out the padding bytes
always.  For non-stack-local cases we do explicit memset()'s over the
entire object.  So I think while not perfect, we're very much safe
here.



Not on my gcc here (gcc version 3.4.4) : It wont zeros out the padding bytes

# cat bug.c
struct s1 {
long d;
char c;
};

void bar()
{
struct s1 s = { .d = 123, .c = 'a'};
foo(&s, sizeof(s));
}
# gcc -O2 -S bug.c
# more bug.s
.globl bar
.type   bar, @function
bar:
.LFB2:
subq$24, %rsp
.LCFI0:
movl$16, %esi
xorl%eax, %eax
movq%rsp, %rdi
movq$123, (%rsp)
movb$97, 8(%rsp)
callfoo
addq$24, %rsp
ret

So 9(%rsp) -> 15(%rsp) are not initialized

Same on more recent gcc (4.1.1)

fast comparison, we should do some clever long/int XOR operations to avoid 
many test/branches, like the optim we did in compare_ether_addr())


As shown in profiles, "repz cmpsb" is really a dog... (and its use of 
esi/edi/ecx registers a high pressure for the compiler/optimizer)


Yes, for the fast comparisons it is almost certainly worth it to do
something saner in compare_keys().

But looking at this further, compare_keys() is only used in hotpath
situations where we are optimizing for a miss, such as during hash
insert.  The optimization therefore might be less justified as a
result.


Well, on this machine I have these oprofile numbers :

: /* rt_intern_hash total: 993464  0.3619 */

 31751  0.0116 :803e8d26:   repz cmpsb %es:(%rdi),%ds:(%rsi)
433438  0.1579 :803e8d28:   jne803e8f80 


Eric



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Question about potential problem in net/ipv4/route.c

2006-10-11 Thread Patrick McHardy
David Miller wrote:
> We always explicitly initialize the flows, and even for local stack
> assignment based initialization, gcc zeros out the padding bytes
> always.

I thought so too until I added the iptables compat functions recently
and noticed uninitialized padding of on-stack structures, which
confused iptables since it also uses memcmp.

This program demonstrates the effect, it doesn't output the expected
"1 2" but "1 4294967042" on my x86_64 (gcc-Version 4.1.2 20060901
(prerelease) (Debian 4.1.1-13)). The initialization doesn't touch
the padding bytes:

0x00400494 :movl   $0x1,0xfff0(%rbp)
0x0040049b :   movb   $0x2,0xfff4(%rbp)

#include 

struct x1 {
	unsigned int	x;
	char		y;
};

struct x2 {
	unsigned int	x;
	unsigned int	y;
};

void pollute(void)
{
	struct x2 x = {
		.x	= ~0,
		.y	= ~0,
	};
}

void test(void)
{
	struct x1 x1 = {
		.x	= 1,
		.y	= 2,
	};
	struct x2 *x2 = (struct x2 *)&x1;

	printf("%u %u\n", x2->x, x2->y);
}

int main(int argc, char **argv)
{
	pollute();
	test();
	return 0;
}


Re: [RFC] Question about potential problem in net/ipv4/route.c

2006-10-11 Thread David Miller
From: Eric Dumazet <[EMAIL PROTECTED]>
Date: Wed, 11 Oct 2006 15:11:18 +0200

> Using memcmp(ptr1, ptr2, sizeof(SOMEFIELD)) is dangerous because 
> sizeof(SOMEFIELD) can be larger than the underlying object, because of 
> alignment constraints.
> 
> In this case, sizeof(fl1->nl_u.ip4_u) is 16, while fl1->nl_u.ip4_u is :
> 
> struct {
> __u32   daddr;
> __u32   saddr;
> __u32   fwmark;
> __u8tos;
> __u8scope;
> } ip4_u;
> 
> So 14 bytes are really initialized, and 2 padding bytes might contain random 
> values...

We always explicitly initialize the flows, and even for local stack
assignment based initialization, gcc zeros out the padding bytes
always.  For non-stack-local cases we do explicit memset()'s over the
entire object.  So I think while not perfect, we're very much safe
here.

> fast comparison, we should do some clever long/int XOR operations to avoid 
> many test/branches, like the optim we did in compare_ether_addr())
> 
> As shown in profiles, "repz cmpsb" is really a dog... (and its use of 
> esi/edi/ecx registers a high pressure for the compiler/optimizer)

Yes, for the fast comparisons it is almost certainly worth it to do
something saner in compare_keys().

But looking at this further, compare_keys() is only used in hotpath
situations where we are optimizing for a miss, such as during hash
insert.  The optimization therefore might be less justified as a
result.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What is current sundance.c status

2006-10-11 Thread Andrew Morton
On Thu, 12 Oct 2006 10:29:37 +0800
"Jesse Huang" <[EMAIL PROTECTED]> wrote:

> Would you tell me what is the current IP100A status? Should I re-generate 
> patches again. Would it put into kernel or not?

I'm sitting on a copy of them.  I didn't send them to Jeff last time
because:

sundance-remove-txstartthresh-and-rxearlythresh.patch

 There's no description of what this patent issue is.

sundance-fix-tx-pause-bug-reset_tx-intr_handler.patch

 There's no description of the bug which got fixed, nor how this patch
 fixes it.

sundance-change-phy-address-search-from-phy=1-to-phy=0.patch

 There's a (small) possibility that this will break on hardware which
 _doesn't_ have a phy at address 0.

sundance-correct-initial-and-close-hardware-step.patch

 There's no real description of the bug which is being fixed, nor of how
 this patch fixes it.

sundance-solve-host-error-problem-in-low-performance-embedded.patch

 No description of what the "host error problem" is, nor of what causes
 it, nor of how this patch fixes it.


So generally these patches are a bit worrying, and it is hard to gauge what
their risk factor is.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] NET/atm: handle sysfs errors

2006-10-11 Thread Jeff Garzik


Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>

---

 net/atm/atm_sysfs.c |   15 ---

diff --git a/net/atm/atm_sysfs.c b/net/atm/atm_sysfs.c
index c0a4ae2..62f6ed1 100644
--- a/net/atm/atm_sysfs.c
+++ b/net/atm/atm_sysfs.c
@@ -141,7 +141,7 @@ static struct class atm_class = {
 int atm_register_sysfs(struct atm_dev *adev)
 {
struct class_device *cdev = &adev->class_dev;
-   int i, err;
+   int i, j, err;
 
cdev->class = &atm_class;
class_set_devdata(cdev, adev);
@@ -151,10 +151,19 @@ int atm_register_sysfs(struct atm_dev *a
if (err < 0)
return err;
 
-   for (i = 0; atm_attrs[i]; i++)
-   class_device_create_file(cdev, atm_attrs[i]);
+   for (i = 0; atm_attrs[i]; i++) {
+   err = class_device_create_file(cdev, atm_attrs[i]);
+   if (err)
+   goto err_out;
+   }
 
return 0;
+
+err_out:
+   for (j = 0; j < i; j++)
+   class_device_remove_file(cdev, atm_attrs[j]);
+   class_device_del(cdev);
+   return err;
 }
 
 void atm_unregister_sysfs(struct atm_dev *adev)
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] NET/bluetooth: handle sysfs errors

2006-10-11 Thread Jeff Garzik


Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>

---

 net/bluetooth/l2cap.c   |   19 ++-
 net/bluetooth/rfcomm/core.c |   21 ++---
 net/bluetooth/rfcomm/sock.c |7 ++-
 net/bluetooth/sco.c |   17 -
 4 files changed, 50 insertions(+), 14 deletions(-)

diff --git a/net/bluetooth/l2cap.c b/net/bluetooth/l2cap.c
index d56f60b..7dda7ce 100644
--- a/net/bluetooth/l2cap.c
+++ b/net/bluetooth/l2cap.c
@@ -2206,24 +2206,33 @@ static int __init l2cap_init(void)
err = bt_sock_register(BTPROTO_L2CAP, &l2cap_sock_family_ops);
if (err < 0) {
BT_ERR("L2CAP socket registration failed");
-   goto error;
+   goto err_proto;
}
 
err = hci_register_proto(&l2cap_hci_proto);
if (err < 0) {
BT_ERR("L2CAP protocol registration failed");
-   bt_sock_unregister(BTPROTO_L2CAP);
-   goto error;
+   goto err_sock;
}
 
-   class_create_file(bt_class, &class_attr_l2cap);
+   err = class_create_file(bt_class, &class_attr_l2cap);
+   if (err) {
+   BT_ERR("L2CAP class registration failed");
+   goto err_proto2;
+   }
 
BT_INFO("L2CAP ver %s", VERSION);
BT_INFO("L2CAP socket layer initialized");
 
return 0;
 
-error:
+err_proto2:
+   if (hci_unregister_proto(&l2cap_hci_proto) < 0)
+   BT_ERR("L2CAP protocol unregistration failed");
+err_sock:
+   if (bt_sock_unregister(BTPROTO_L2CAP) < 0)
+   BT_ERR("L2CAP socket unregistration failed");
+err_proto:
proto_unregister(&l2cap_proto);
return err;
 }
diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c
index 468df3b..102d668 100644
--- a/net/bluetooth/rfcomm/core.c
+++ b/net/bluetooth/rfcomm/core.c
@@ -2052,13 +2052,21 @@ static CLASS_ATTR(rfcomm_dlc, S_IRUGO, r
 /*  Initialization  */
 static int __init rfcomm_init(void)
 {
+   int rc;
+
l2cap_load();
 
-   hci_register_cb(&rfcomm_cb);
+   rc = hci_register_cb(&rfcomm_cb);
+   if (rc)
+   goto err;
 
-   kernel_thread(rfcomm_run, NULL, CLONE_KERNEL);
+   rc = class_create_file(bt_class, &class_attr_rfcomm_dlc);
+   if (rc)
+   goto err_hcireg;
 
-   class_create_file(bt_class, &class_attr_rfcomm_dlc);
+   rc = kernel_thread(rfcomm_run, NULL, CLONE_KERNEL);
+   if (rc < 0)
+   goto err_file;
 
rfcomm_init_sockets();
 
@@ -2069,6 +2077,13 @@ #endif
BT_INFO("RFCOMM ver %s", VERSION);
 
return 0;
+
+err_file:
+   class_remove_file(bt_class, &class_attr_rfcomm_dlc);
+err_hcireg:
+   hci_unregister_cb(&rfcomm_cb);
+err:
+   return rc;
 }
 
 static void __exit rfcomm_exit(void)
diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
index 220fee0..ff9b3f1 100644
--- a/net/bluetooth/rfcomm/sock.c
+++ b/net/bluetooth/rfcomm/sock.c
@@ -944,12 +944,17 @@ int __init rfcomm_init_sockets(void)
if (err < 0)
goto error;
 
-   class_create_file(bt_class, &class_attr_rfcomm);
+   err = class_create_file(bt_class, &class_attr_rfcomm);
+   if (err)
+   goto error_sock;
 
BT_INFO("RFCOMM socket layer initialized");
 
return 0;
 
+error_sock:
+   if (bt_sock_unregister(BTPROTO_RFCOMM) < 0)
+   BT_ERR("RFCOMM socket layer unregistration failed");
 error:
BT_ERR("RFCOMM socket layer registration failed");
proto_unregister(&rfcomm_proto);
diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c
index 7714a2e..f28392a 100644
--- a/net/bluetooth/sco.c
+++ b/net/bluetooth/sco.c
@@ -957,24 +957,31 @@ static int __init sco_init(void)
err = bt_sock_register(BTPROTO_SCO, &sco_sock_family_ops);
if (err < 0) {
BT_ERR("SCO socket registration failed");
-   goto error;
+   goto err_proto;
}
 
err = hci_register_proto(&sco_hci_proto);
if (err < 0) {
BT_ERR("SCO protocol registration failed");
-   bt_sock_unregister(BTPROTO_SCO);
-   goto error;
+   goto err_sock;
}
 
-   class_create_file(bt_class, &class_attr_sco);
+   err = class_create_file(bt_class, &class_attr_sco);
+   if (err)
+   goto err_proto2;
 
BT_INFO("SCO (Voice Link) ver %s", VERSION);
BT_INFO("SCO socket layer initialized");
 
return 0;
 
-error:
+err_proto2:
+   if (hci_unregister_proto(&sco_hci_proto) < 0)
+   BT_ERR("SCO protocol unregistration failed");
+err_sock:
+   if (bt_sock_unregister(BTPROTO_SCO) < 0)
+   BT_ERR("SCO socket unregistration failed");
+err_proto:
proto_unregister(&sco_proto);
return err;
 }
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PR

Re: [PATCH] Customizable TCP backoff patch

2006-10-11 Thread YOSHIFUJI Hideaki / 吉藤英明
In article <[EMAIL PROTECTED]> (at Wed, 11 Oct 2006 17:51:24 -0700), Ben 
Woodard <[EMAIL PROTECTED]> says:

> diff -ru linux-2.6.18/net/ipv4/sysctl_net_ipv4.c 
> linux-2.6.18.new/net/ipv4/sysctl_net_ipv4.c
> --- linux-2.6.18/net/ipv4/sysctl_net_ipv4.c   2006-09-19 20:42:06.0 
> -0700
> +++ linux-2.6.18.new/net/ipv4/sysctl_net_ipv4.c   2006-10-11 
> 16:00:37.0 -0700
> @@ -128,6 +128,8 @@
>   return ret;
>  }
>  
> +static unsigned long tcp_rto_min=0;
> +static unsigned long tcp_rto_max=65535;
>  
>  ctl_table ipv4_table[] = {
>  {
> @@ -697,6 +699,26 @@
>   .mode   = 0644,
>   .proc_handler   = &proc_dointvec
>   },
> + {
> + .ctl_name   = NET_TCP_RTO_MAX,
> + .procname   = "tcp_rto_max",
> + .data   = &sysctl_tcp_rto_max,
> + .maxlen = sizeof(unsigned),

sizeof(unsigned long)

> + .mode   = 0644,
> + .proc_handler   = &proc_doulongvec_ms_jiffies_minmax,
> + .extra1 = &tcp_rto_min_constant,
> + .extra2 = &tcp_rto_max_constant,
> + },
> + {
> + .ctl_name   = NET_TCP_RTO_INIT,
> + .procname   = "tcp_rto_init",
> + .data   = &sysctl_tcp_rto_init,
> + .maxlen = sizeof(unsigned),

sizeof(unsigned long)

> + .mode   = 0644,
> + .proc_handler   = &proc_doulongvec_ms_jiffies_minmax,
> + .extra1 = &tcp_rto_min_constant,
> + .extra2 = &tcp_rto_max_constant,
> + },
>   { .ctl_name = 0 }
>  };
>  
> diff -ru linux-2.6.18/net/ipv4/tcp.c linux-2.6.18.new/net/ipv4/tcp.c
> --- linux-2.6.18/net/ipv4/tcp.c   2006-09-19 20:42:06.0 -0700
> +++ linux-2.6.18.new/net/ipv4/tcp.c   2006-10-11 16:00:37.0 -0700
> @@ -1764,6 +1764,8 @@
>   return err;
>  }
>  
> +#define TCP_BACKOFF_MAXVAL 65535
> +
>  /*
>   *   Socket option code for TCP.
>   */
> @@ -1939,6 +1941,20 @@
>   }
>   break;
>  
> + case TCP_BACKOFF_MAX:
> + if (val < 1 || val > TCP_BACKOFF_MAXVAL)
> + err = -EINVAL;
> + else
> + tp->rto_max = val;
> + break;
> + 
> + case TCP_BACKOFF_INIT:
> + if (val < 1 || val > TCP_BACKOFF_MAXVAL)
> + err = -EINVAL;
> + else
> + tp->rto_init = val;
> + break;
> + 
>   default:
>   err = -ENOPROTOOPT;
>   break;
> @@ -2110,6 +2126,12 @@
>   if (copy_to_user(optval, icsk->icsk_ca_ops->name, len))
>   return -EFAULT;
>   return 0;
> + case TCP_BACKOFF_MAX:
> + val = jiffies_to_msecs(tcp_rto_max(tp));
> + break;

tp->rto_max

> + case TCP_BACKOFF_INIT:
> + val = jiffies_to_msecs(tcp_rto_init(tp));
> + break;
>   default:

tp->rto_init

--yoshfuji
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Customizable TCP backoff patch

2006-10-11 Thread Ben Woodard

Here we go again. The changes are as follows:
1) the spaces are gone and now it is tabs.
2) used msecs_to_jiffies() and jiffies_to_msecs(). I'm much happier with 
 that. I didn't know those functions existed.
3) changed over to proc_doulongvec_ms_jiffies_minmax(). Last night's 
test compile failed because I was out of disk. I figured that the code I 
changed had compiled. I guess not.


I really didn't understand Yoshifuji Hideaki's comment. (sorry)

-ben


Vlad Yasevich wrote:

Ben Woodard wrote:
 

diff -ru linux-2.6.18/include/net/tcp.h linux-2.6.18.new/include/net/tcp.h
--- linux-2.6.18/include/net/tcp.h  2006-09-19 20:42:06.0 -0700
+++ linux-2.6.18.new/include/net/tcp.h  2006-10-10 18:42:00.0 -0700
@@ -227,11 +227,23 @@
 extern int sysctl_tcp_base_mss;
 extern int sysctl_tcp_workaround_signed_windows;
 extern int sysctl_tcp_slow_start_after_idle;
+extern unsigned long sysctl_tcp_rto_max;
+extern unsigned long sysctl_tcp_rto_init;
 
 extern atomic_t tcp_memory_allocated;

 extern atomic_t tcp_sockets_allocated;
 extern int tcp_memory_pressure;
 
+static inline unsigned long tcp_rto_max(struct tcp_sock *tp)

+{
+return tp->rto_max ? tp->rto_max*HZ/1000 : sysctl_tcp_rto_max;

^^^
That should probably be msecs_to_jiffies(tp->rto_max)


+}
+
+static inline unsigned long tcp_rto_init(struct tcp_sock *tp)
+{
+return tp->rto_init ? tp->rto_init*HZ/1000 : sysctl_tcp_rto_init;


Ditto.


+}
+
 /*
  * The next routines deal with comparing 32 bit unsigned ints
  * and worry about wraparound (automatic with unsigned arithmetic).
diff -ru linux-2.6.18/net/ipv4/sysctl_net_ipv4.c 
linux-2.6.18.new/net/ipv4/sysctl_net_ipv4.c
--- linux-2.6.18/net/ipv4/sysctl_net_ipv4.c 2006-09-19 20:42:06.0 
-0700
+++ linux-2.6.18.new/net/ipv4/sysctl_net_ipv4.c 2006-10-10 16:32:08.0 
-0700
@@ -128,6 +128,8 @@
return ret;
 }
 
+static unsigned long tcp_rto_min=0;

+static unsigned long tcp_rto_max=65535;
 
 ctl_table ipv4_table[] = {

 {
@@ -697,6 +699,26 @@
.mode   = 0644,
.proc_handler   = &proc_dointvec
},
+   {
+   .ctl_name   = NET_TCP_RTO_MAX,
+   .procname   = "tcp_rto_max",
+   .data   = &sysctl_tcp_rto_max,
+   .maxlen = sizeof(unsigned),
+   .mode   = 0644,
+   .proc_handler   = &proc_doulongvec_ms_jiffies,
+   .extra1 = &tcp_rto_min_constant,
+   .extra2 = &tcp_rto_max_constant,
+   },
+   {
+   .ctl_name   = NET_TCP_RTO_INIT,
+   .procname   = "tcp_rto_init",
+   .data   = &sysctl_tcp_rto_init,
+   .maxlen = sizeof(unsigned),
+   .mode   = 0644,
+   .proc_handler   = &proc_doulongvec_ms_jiffies,
+   .extra1 = &tcp_rto_min_constant,
+   .extra2 = &tcp_rto_max_constant,
+   },
{ .ctl_name = 0 }
 };


Try as I might, I can't find proc_doulongvec_ms_jiffies() anywhere.  I think
you meant proc_doulongvec_ms_jiffies_minmax()?

Also, this function has a bug in that it doesn't do corrections for potentially 
different
HZ values.  Look at the msecs_to_jiffies...  When I've used 
proc_doulongvec_ms_jiffies_minmax()
before, I would seen rather interesting truncation errors such that sysctl 
input didn't match
sysctl output.

 
diff -ru linux-2.6.18/net/ipv4/tcp.c linux-2.6.18.new/net/ipv4/tcp.c

--- linux-2.6.18/net/ipv4/tcp.c 2006-09-19 20:42:06.0 -0700
+++ linux-2.6.18.new/net/ipv4/tcp.c 2006-10-10 18:37:40.0 -0700
@@ -1764,6 +1764,8 @@
return err;
 }
 
+#define TCP_BACKOFF_MAXVAL 65535

+
 /*
  * Socket option code for TCP.
  */
@@ -1939,6 +1941,21 @@
}
break;
 
+case TCP_BACKOFF_MAX:

+if (val < 1 || val > TCP_BACKOFF_MAXVAL)
+err = -EINVAL;
+else
+tp->rto_max = val;
+break;
+ 
+case TCP_BACKOFF_INIT:

+if (val < 1 || val > TCP_BACKOFF_MAXVAL)
+err = -EINVAL;
+else
+tp->rto_init = val;
+break;
+ 
+ 
 	default:

err = -ENOPROTOOPT;
break;
@@ -2110,6 +2127,12 @@
if (copy_to_user(optval, icsk->icsk_ca_ops->name, len))
return -EFAULT;
return 0;
+case TCP_BACKOFF_MAX:
+   val = tcp_rto_max(tp)*1000/HZ;
+break;
+case TCP_BACKOFF_INIT:
+   val = tcp_rto_init(tp)*1000/HZ;


The two divisions above introduce truncation errors such the input doesn't
match the output.  Better to use jiffies_t

Re: [PATCH] sotftmac: fix a slab corruption in WEP restricted key association

2006-10-11 Thread Daniel Drake

Laurent Riffard wrote:

Fix a slab corruption in ieee80211softmac_auth(). The size of a buffer
was miscomputed.

see http://bugzilla.kernel.org/show_bug.cgi?id=7245

Acked-by: Daniel Drake <[EMAIL PROTECTED]>"
Signed-off-by: Laurent Riffard <[EMAIL PROTECTED]>


Apparently the CC list didn't work out. Adding Johannes and netdev.


net/ieee80211/softmac/ieee80211softmac_io.c |2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6-mm/net/ieee80211/softmac/ieee80211softmac_io.c
===
--- linux-2.6-mm.orig/net/ieee80211/softmac/ieee80211softmac_io.c
+++ linux-2.6-mm/net/ieee80211/softmac/ieee80211softmac_io.c
@@ -304,7 +304,7 @@ ieee80211softmac_auth(struct ieee80211_a
2 +/* Auth Transaction Seq */
2 +/* Status Code */
 /* Challenge Text IE */
-is_shared_response ? 0 : 1 + 1 + net->challenge_len
+(is_shared_response ? 1 + 1 + net->challenge_len : 0)
);
if (unlikely((*pkt) == NULL))
return 0;



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] NetLabel: only deref the CIPSOv4 standard map fields when using standard mapping

2006-10-11 Thread paul . moore
From: Paul Moore <[EMAIL PROTECTED]>

Fix several places in the CIPSO code where it was dereferencing fields which
did not have valid pointers by moving those pointer dereferences into code
blocks where the pointers are valid.

Signed-off-by: Paul Moore <[EMAIL PROTECTED]>
---
 net/ipv4/cipso_ipv4.c |   18 --
 1 files changed, 12 insertions(+), 6 deletions(-)

Index: net-2.6_bugfix_2/net/ipv4/cipso_ipv4.c
===
--- net-2.6_bugfix_2.orig/net/ipv4/cipso_ipv4.c
+++ net-2.6_bugfix_2/net/ipv4/cipso_ipv4.c
@@ -773,13 +773,15 @@ static int cipso_v4_map_cat_rbm_valid(co
 {
int cat = -1;
u32 bitmap_len_bits = bitmap_len * 8;
-   u32 cipso_cat_size = doi_def->map.std->cat.cipso_size;
-   u32 *cipso_array = doi_def->map.std->cat.cipso;
+   u32 cipso_cat_size;
+   u32 *cipso_array;
 
switch (doi_def->type) {
case CIPSO_V4_MAP_PASS:
return 0;
case CIPSO_V4_MAP_STD:
+   cipso_cat_size = doi_def->map.std->cat.cipso_size;
+   cipso_array = doi_def->map.std->cat.cipso;
for (;;) {
cat = cipso_v4_bitmap_walk(bitmap,
   bitmap_len_bits,
@@ -825,8 +827,8 @@ static int cipso_v4_map_cat_rbm_hton(con
u32 net_spot_max = 0;
u32 host_clen_bits = host_cat_len * 8;
u32 net_clen_bits = net_cat_len * 8;
-   u32 host_cat_size = doi_def->map.std->cat.local_size;
-   u32 *host_cat_array = doi_def->map.std->cat.local;
+   u32 host_cat_size;
+   u32 *host_cat_array;
 
switch (doi_def->type) {
case CIPSO_V4_MAP_PASS:
@@ -838,6 +840,8 @@ static int cipso_v4_map_cat_rbm_hton(con
memcpy(net_cat, host_cat, net_spot_max);
return net_spot_max;
case CIPSO_V4_MAP_STD:
+   host_cat_size = doi_def->map.std->cat.local_size;
+   host_cat_array = doi_def->map.std->cat.local;
for (;;) {
host_spot = cipso_v4_bitmap_walk(host_cat,
 host_clen_bits,
@@ -893,8 +897,8 @@ static int cipso_v4_map_cat_rbm_ntoh(con
int net_spot = -1;
u32 net_clen_bits = net_cat_len * 8;
u32 host_clen_bits = host_cat_len * 8;
-   u32 net_cat_size = doi_def->map.std->cat.cipso_size;
-   u32 *net_cat_array = doi_def->map.std->cat.cipso;
+   u32 net_cat_size;
+   u32 *net_cat_array;
 
switch (doi_def->type) {
case CIPSO_V4_MAP_PASS:
@@ -903,6 +907,8 @@ static int cipso_v4_map_cat_rbm_ntoh(con
memcpy(host_cat, net_cat, net_cat_len);
return net_cat_len;
case CIPSO_V4_MAP_STD:
+   net_cat_size = doi_def->map.std->cat.cipso_size;
+   net_cat_array = doi_def->map.std->cat.cipso;
for (;;) {
net_spot = cipso_v4_bitmap_walk(net_cat,
net_clen_bits,

--
paul moore
linux security @ hp
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] NetLabel: better error handling involving mls_export_cat()

2006-10-11 Thread paul . moore
From: Paul Moore <[EMAIL PROTECTED]>

Upon inspection it looked like the error handling for mls_export_cat() was
rather poor.  This patch addresses this by NULL'ing out kfree()'d pointers
before returning and checking the return value of the function everywhere
it is called.

Signed-off-by: Paul Moore <[EMAIL PROTECTED]>
---
 security/selinux/ss/ebitmap.c  |8 ++--
 security/selinux/ss/mls.c  |   17 ++---
 security/selinux/ss/services.c |   18 ++
 3 files changed, 30 insertions(+), 13 deletions(-)

Index: net-2.6_bugfix_2/security/selinux/ss/ebitmap.c
===
--- net-2.6_bugfix_2.orig/security/selinux/ss/ebitmap.c
+++ net-2.6_bugfix_2/security/selinux/ss/ebitmap.c
@@ -93,11 +93,15 @@ int ebitmap_export(const struct ebitmap 
size_t bitmap_byte;
unsigned char bitmask;
 
+   if (src->highbit == 0) {
+   *dst = NULL;
+   *dst_len = 0;
+   return 0;
+   }
+
bitmap_len = src->highbit / 8;
if (src->highbit % 7)
bitmap_len += 1;
-   if (bitmap_len == 0)
-   return -EINVAL;
 
bitmap = kzalloc((bitmap_len & ~(sizeof(MAPTYPE) - 1)) +
 sizeof(MAPTYPE),
Index: net-2.6_bugfix_2/security/selinux/ss/mls.c
===
--- net-2.6_bugfix_2.orig/security/selinux/ss/mls.c
+++ net-2.6_bugfix_2/security/selinux/ss/mls.c
@@ -640,8 +640,13 @@ int mls_export_cat(const struct context 
 {
int rc = -EPERM;
 
-   if (!selinux_mls_enabled)
+   if (!selinux_mls_enabled) {
+   *low = NULL;
+   *low_len = 0;
+   *high = NULL;
+   *high_len = 0;
return 0;
+   }
 
if (low != NULL) {
rc = ebitmap_export(&context->range.level[0].cat,
@@ -661,10 +666,16 @@ int mls_export_cat(const struct context 
return 0;
 
 export_cat_failure:
-   if (low != NULL)
+   if (low != NULL) {
kfree(*low);
-   if (high != NULL)
+   *low = NULL;
+   *low_len = 0;
+   }
+   if (high != NULL) {
kfree(*high);
+   *high = NULL;
+   *high_len = 0;
+   }
return rc;
 }
 
Index: net-2.6_bugfix_2/security/selinux/ss/services.c
===
--- net-2.6_bugfix_2.orig/security/selinux/ss/services.c
+++ net-2.6_bugfix_2/security/selinux/ss/services.c
@@ -2399,31 +2399,33 @@ static int selinux_netlbl_socket_setsid(
if (!ss_initialized)
return 0;
 
+   netlbl_secattr_init(&secattr);
+
POLICY_RDLOCK;
 
ctx = sidtab_search(&sidtab, sid);
if (ctx == NULL)
goto netlbl_socket_setsid_return;
 
-   netlbl_secattr_init(&secattr);
secattr.domain = kstrdup(policydb.p_type_val_to_name[ctx->type - 1],
 GFP_ATOMIC);
mls_export_lvl(ctx, &secattr.mls_lvl, NULL);
secattr.mls_lvl_vld = 1;
-   mls_export_cat(ctx,
-  &secattr.mls_cat,
-  &secattr.mls_cat_len,
-  NULL,
-  NULL);
+   rc = mls_export_cat(ctx,
+   &secattr.mls_cat,
+   &secattr.mls_cat_len,
+   NULL,
+   NULL);
+   if (rc != 0)
+   goto netlbl_socket_setsid_return;
 
rc = netlbl_socket_setattr(sock, &secattr);
if (rc == 0)
sksec->nlbl_state = NLBL_LABELED;
 
-   netlbl_secattr_destroy(&secattr);
-
 netlbl_socket_setsid_return:
POLICY_RDUNLOCK;
+   netlbl_secattr_destroy(&secattr);
return rc;
 }
 

--
paul moore
linux security @ hp
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] NetLabel: the CIPSOv4 passthrough mapping does not pass categories correctly

2006-10-11 Thread paul . moore
From: Paul Moore <[EMAIL PROTECTED]>

The CIPSO passthrough mapping had a problem when sending categories which
would cause no or incorrect categories to be sent on the wire with a packet.
This patch fixes the problem which was a simple off-by-one bug.

Signed-off-by: Paul Moore <[EMAIL PROTECTED]>
---
 net/ipv4/cipso_ipv4.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

Index: net-2.6_bugfix_2/net/ipv4/cipso_ipv4.c
===
--- net-2.6_bugfix_2.orig/net/ipv4/cipso_ipv4.c
+++ net-2.6_bugfix_2/net/ipv4/cipso_ipv4.c
@@ -832,8 +832,8 @@ static int cipso_v4_map_cat_rbm_hton(con
 
switch (doi_def->type) {
case CIPSO_V4_MAP_PASS:
-   net_spot_max = host_cat_len - 1;
-   while (net_spot_max > 0 && host_cat[net_spot_max] == 0)
+   net_spot_max = host_cat_len;
+   while (net_spot_max > 0 && host_cat[net_spot_max - 1] == 0)
net_spot_max--;
if (net_spot_max > net_cat_len)
return -EINVAL;

--
paul moore
linux security @ hp
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] Collection of small NetLabel bugfixes

2006-10-11 Thread paul . moore
When doing some more testing today I ran into a few bugs, this patchset
addresses those bugs.  This patchset is backed against today's net-2.6 git
tree.

Please apply these patches for 2.6.19, thanks.

--
paul moore
linux security @ hp
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 01/10] sky2: MSI test is only a warning

2006-10-11 Thread Stephen Hemminger
Some motherboards don't implement MSI correctly. The driver handles this
but the warning is too verbose and overly cautious.


Signed-off-by: Stephe Hemminger <[EMAIL PROTECTED]>


--- sky2.orig/drivers/net/sky2.c2006-10-11 11:43:08.0 -0700
+++ sky2/drivers/net/sky2.c 2006-10-11 11:43:33.0 -0700
@@ -3326,9 +3326,8 @@
 
if (!hw->msi_detected) {
/* MSI test failed, go back to INTx mode */
-   printk(KERN_WARNING PFX "%s: No interrupt was generated using 
MSI, "
-  "switching to INTx mode. Please report this failure to "
-  "the PCI maintainer and include system chipset 
information.\n",
+   printk(KERN_INFO PFX "%s: No interrupt generated using MSI, "
+  "switching to INTx mode.\n",
   pci_name(pdev));
 
err = -EOPNOTSUPP;
@@ -3336,6 +3335,7 @@
}
 
sky2_write32(hw, B0_IMSK, 0);
+   sky2_read32(hw, B0_IMSK);
 
free_irq(pdev->irq, hw);
 

--
Stephen Hemminger <[EMAIL PROTECTED]>

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread Stephen Hemminger
O
> > 
> > You might want to try ignoring the check in dev.c and testing
> > to see if there is a performance gain.  It wouldn't be hard to test
> > a modified version and validate the performance change.
> 
> Yes. With my patch, there is a huge performance gain by increasing MTU to 64K.
> And it seems the only way to do this is by S/G.
> 
> > You could even do what I suggested and use skb_checksum_help()
> > to do inplace checksumming, as a performance test.
> 
> I can. But as network algorithmics says (chapter 5)
> "Since such bus reads are expensive, the CPU might as well piggyback
> the checksum computation with the copy process".
> 
> It speaks about onboard the adapter buffers, but memory bus reads are also 
> much slower
> than CPU nowdays.  So I think even if this works well in benchmark in real 
> life
> single copy should better.
> 

The other alternative might be to make copy/checksum code smarter about using
fragments rather than allocating a large buffer. It should avoid second order
allocations (effective size > PAGESIZE).

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 09/10] sky2: no message on rx fifo overflow

2006-10-11 Thread Stephen Hemminger
Under high load it is possible to make the receiver FIFO get overloaded.
The driver/hardware recover properly, so there is no reason to fill the log
with lots of extra messages, just update counter.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

---
 drivers/net/sky2.c |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

--- sky2.orig/drivers/net/sky2.c2006-10-11 12:05:46.0 -0700
+++ sky2/drivers/net/sky2.c 2006-10-11 12:05:54.0 -0700
@@ -2014,6 +2014,10 @@
 
 error:
++sky2->net_stats.rx_errors;
+   if (status & GMR_FS_RX_FF_OV) {
+   sky2->net_stats.rx_fifo_errors++;
+   goto resubmit;
+   }
 
if (netif_msg_rx_err(sky2) && net_ratelimit())
printk(KERN_INFO PFX "%s: rx error, status 0x%x length %d\n",
@@ -2025,8 +2029,6 @@
sky2->net_stats.rx_frame_errors++;
if (status & GMR_FS_CRC_ERR)
sky2->net_stats.rx_crc_errors++;
-   if (status & GMR_FS_RX_FF_OV)
-   sky2->net_stats.rx_fifo_errors++;
 
goto resubmit;
 }

--
Stephen Hemminger <[EMAIL PROTECTED]>

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 06/10] sky2: use duplex result bits

2006-10-11 Thread Stephen Hemminger
The result of duplex negotiation is avaliable in the phy status
register, so use that to simplify code and avoid rereading the PHY.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

---
 drivers/net/sky2.c |   21 +
 1 file changed, 1 insertion(+), 20 deletions(-)

--- sky2.orig/drivers/net/sky2.c2006-10-11 11:43:47.0 -0700
+++ sky2/drivers/net/sky2.c 2006-10-11 11:43:50.0 -0700
@@ -1714,26 +1714,7 @@
}
 
sky2->speed = sky2_phy_speed(hw, aux);
-   if (sky2->speed == SPEED_1000) {
-   u16 ctl2 = gm_phy_read(hw, port, PHY_MARV_1000T_CTRL);
-   u16 lpa2 = gm_phy_read(hw, port, PHY_MARV_1000T_STAT);
-   if (lpa2  & PHY_B_1000S_MSF) {
-   printk(KERN_ERR PFX "%s: master/slave fault",
-  sky2->netdev->name);
-   return -1;
-   }
-
-   if ((ctl2 & PHY_M_1000C_AFD) && (lpa2 & PHY_B_1000S_LP_FD))
-   sky2->duplex = DUPLEX_FULL;
-   else
-   sky2->duplex = DUPLEX_HALF;
-   } else {
-   u16 adv = gm_phy_read(hw, port, PHY_MARV_AUNE_ADV);
-   if ((aux & adv) & PHY_AN_FULL)
-   sky2->duplex = DUPLEX_FULL;
-   else
-   sky2->duplex = DUPLEX_HALF;
-   }
+   sky2->duplex = (aux & PHY_M_PS_FULL_DUP) ? DUPLEX_FULL : DUPLEX_HALF;
 
/* Pause bits are offset (9..8) */
if (hw->chip_id == CHIP_ID_YUKON_XL || hw->chip_id == 
CHIP_ID_YUKON_EC_U)

--
Stephen Hemminger <[EMAIL PROTECTED]>

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread Michael S. Tsirkin
Quoting r. Stephen Hemminger <[EMAIL PROTECTED]>:
> Subject: Re: Dropping NETIF_F_SG since no checksum feature.
> 
> On Wed, 11 Oct 2006 21:11:38 +0100
> Steven Whitehouse <[EMAIL PROTECTED]> wrote:
> 
> > Hi,
> > 
> > On Wed, Oct 11, 2006 at 05:01:03PM +0200, Michael S. Tsirkin wrote:
> > > Quoting Steven Whitehouse <[EMAIL PROTECTED]>:
> > > > > ssize_t tcp_sendpage(struct socket *sock, struct page *page, int 
> > > > > offset,
> > > > >  size_t size, int flags)
> > > > > {
> > > > > ssize_t res;
> > > > > struct sock *sk = sock->sk;
> > > > > 
> > > > > if (!(sk->sk_route_caps & NETIF_F_SG) ||
> > > > > !(sk->sk_route_caps & NETIF_F_ALL_CSUM))
> > > > > return sock_no_sendpage(sock, page, offset, size, 
> > > > > flags);
> > > > > 
> > > > > 
> > > > > So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
> > > > > data will be copied over rather than sent directly.
> > > > > So why does dev.c have to force set NETIF_F_SG to off then?
> > > > >
> > > > I agree with that analysis,
> > > 
> > > So, would you Ack something like the following then?
> > >
> > 
> > In so far as I'm able to ack it, then yes, but with the following
> > caveats: that you also need to look at the tcp code's checks for
> > NETIF_F_SG (aside from the interface to tcp_sendpage which I think
> > we've agreed is ok) and ensure that this patch will not change their
> > behaviour, and here I'm thinking of the test in net/ipv4/tcp.c:select_size()
> > in particular - there may be others but thats the only one I can think
> > of off the top of my head. I think this is what davem was getting at
> > with his comment about copy & sum for smaller packets.
> > 
> > Also all subject to approval by davem and shemminger of course :-)
> > 
> > My general feeling is that devices should advertise the features that
> > they actually have and that the protocols should make the decision
> > as to which ones to use or not depending on the combinations available
> > (which I think is pretty much your argument).
> > 
> > Steve.
> > 
> 
> You might want to try ignoring the check in dev.c and testing
> to see if there is a performance gain.  It wouldn't be hard to test
> a modified version and validate the performance change.

Yes. With my patch, there is a huge performance gain by increasing MTU to 64K.
And it seems the only way to do this is by S/G.

> You could even do what I suggested and use skb_checksum_help()
> to do inplace checksumming, as a performance test.

I can. But as network algorithmics says (chapter 5)
"Since such bus reads are expensive, the CPU might as well piggyback
the checksum computation with the copy process".

It speaks about onboard the adapter buffers, but memory bus reads are also much 
slower
than CPU nowdays.  So I think even if this works well in benchmark in real life
single copy should better.

-- 
MST
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 03/10] sky2: phy irq on shutdown

2006-10-11 Thread Stephen Hemminger
When PHY is turned off on shutdown, it causes the IRQ to get stuck on.
Make sure and disable the IRQ first, and if IRQ occurs when device
is not running, don't access PHY because that will hang.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>


--- sky2.orig/drivers/net/sky2.c2006-10-11 11:43:40.0 -0700
+++ sky2/drivers/net/sky2.c 2006-10-11 11:43:43.0 -0700
@@ -1499,6 +1499,11 @@
/* Stop more packets from being queued */
netif_stop_queue(dev);
 
+   /* Disable port IRQ */
+   imask = sky2_read32(hw, B0_IMSK);
+   imask &= ~portirq_msk[port];
+   sky2_write32(hw, B0_IMSK, imask);
+
sky2_gmac_reset(hw, port);
 
/* Stop transmitter */
@@ -1549,11 +1554,6 @@
sky2_write8(hw, SK_REG(port, RX_GMF_CTRL_T), GMF_RST_SET);
sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_SET);
 
-   /* Disable port IRQ */
-   imask = sky2_read32(hw, B0_IMSK);
-   imask &= ~portirq_msk[port];
-   sky2_write32(hw, B0_IMSK, imask);
-
sky2_phy_power(hw, port, 0);
 
/* turn off LED's */
@@ -1750,13 +1750,13 @@
struct sky2_port *sky2 = netdev_priv(dev);
u16 istatus, phystat;
 
+   if (!netif_running(dev))
+   return;
+
spin_lock(&sky2->phy_lock);
istatus = gm_phy_read(hw, port, PHY_MARV_INT_STAT);
phystat = gm_phy_read(hw, port, PHY_MARV_PHY_STAT);
 
-   if (!netif_running(dev))
-   goto out;
-
if (netif_msg_intr(sky2))
printk(KERN_INFO PFX "%s: phy interrupt status 0x%x 0x%x\n",
   sky2->netdev->name, istatus, phystat);

--
Stephen Hemminger <[EMAIL PROTECTED]>

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 08/10] sky2: flow control setting fixes

2006-10-11 Thread Stephen Hemminger
The result of flow control negotiation should not limit the next
negotiatition. If board is plugged into an old half duplex 10Mbit port,
without pause, then replugged into a gigabit port, it should negotiate
what is desired, not inherit that last negotiation.

P.s: many other drivers have this bug.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

---
 drivers/net/sky2.c |  132 ++---
 drivers/net/sky2.h |   13 -
 2 files changed, 86 insertions(+), 59 deletions(-)

--- sky2.orig/drivers/net/sky2.c2006-10-11 12:00:37.0 -0700
+++ sky2/drivers/net/sky2.c 2006-10-11 12:05:46.0 -0700
@@ -284,6 +284,31 @@
gma_write16(hw, port, GM_RX_CTRL, reg);
 }
 
+/* flow control to advertise bits */
+static const u16 copper_fc_adv[] = {
+   [FC_NONE]   = 0,
+   [FC_TX] = PHY_M_AN_ASP,
+   [FC_RX] = PHY_M_AN_PC,
+   [FC_BOTH]   = PHY_M_AN_PC | PHY_M_AN_ASP,
+};
+
+/* flow control to advertise bits when using 1000BaseX */
+static const u16 fiber_fc_adv[] = {
+   [FC_BOTH] = PHY_M_P_BOTH_MD_X,
+   [FC_TX]   = PHY_M_P_ASYM_MD_X,
+   [FC_RX]   = PHY_M_P_SYM_MD_X,
+   [FC_NONE] = PHY_M_P_NO_PAUSE_X,
+};
+
+/* flow control to GMA disable bits */
+static const u16 gm_fc_disable[] = {
+   [FC_NONE] = GM_GPCR_FC_RX_DIS | GM_GPCR_FC_TX_DIS,
+   [FC_TX]   = GM_GPCR_FC_RX_DIS,
+   [FC_RX]   = GM_GPCR_FC_TX_DIS,
+   [FC_BOTH] = 0,
+};
+
+
 static void sky2_phy_init(struct sky2_hw *hw, unsigned port)
 {
struct sky2_port *sky2 = netdev_priv(hw->dev[port]);
@@ -376,29 +401,14 @@
if (sky2->advertising & ADVERTISED_10baseT_Half)
adv |= PHY_M_AN_10_HD;
 
-   /* desired flow control */
-   if (sky2->tx_pause && sky2->rx_pause)   /* both */
-   adv |= PHY_M_AN_PC | PHY_M_AN_ASP;
-   else if (sky2->tx_pause)
-   adv |= PHY_M_AN_ASP;
-   else if (sky2->rx_pause)
-   adv |= PHY_M_AN_PC;
-
-
+   adv |= copper_fc_adv[sky2->flow_mode];
} else {/* special defines for FIBER (88E1040S only) */
if (sky2->advertising & ADVERTISED_1000baseT_Full)
adv |= PHY_M_AN_1000X_AFD;
if (sky2->advertising & ADVERTISED_1000baseT_Half)
adv |= PHY_M_AN_1000X_AHD;
 
-   if (sky2->tx_pause && sky2->rx_pause)   /* both */
-   adv |= PHY_M_P_BOTH_MD_X;
-   else if (sky2->tx_pause)
-   adv |= PHY_M_P_ASYM_MD_X;
-   else if (sky2->rx_pause)
-   adv |= PHY_M_P_SYM_MD_X;
-   else
-   adv |= PHY_M_P_NO_PAUSE_X;
+   adv |= fiber_fc_adv[sky2->flow_mode];
}
 
/* Restart Auto-negotiation */
@@ -424,20 +434,14 @@
if (sky2->duplex == DUPLEX_FULL) {
reg |= GM_GPCR_DUP_FULL;
ctrl |= PHY_CT_DUP_MD;
-   } else if (sky2->speed != SPEED_1000 && hw->chip_id != 
CHIP_ID_YUKON_EC_U) {
-   /* Turn off flow control for 10/100mbps */
-   sky2->rx_pause = 0;
-   sky2->tx_pause = 0;
-   }
+   } else if (sky2->speed < SPEED_1000)
+   sky2->flow_mode = FC_NONE;
 
-   if (!sky2->rx_pause)
-   reg |= GM_GPCR_FC_RX_DIS;
 
-   if (!sky2->tx_pause)
-   reg |= GM_GPCR_FC_TX_DIS;
+   reg |= gm_fc_disable[sky2->flow_mode];
 
/* Forward pause packets to GMAC? */
-   if (sky2->tx_pause || sky2->rx_pause)
+   if (sky2->flow_mode & FC_RX)
sky2_write8(hw, SK_REG(port, GMAC_CTRL), GMC_PAUSE_ON);
else
sky2_write8(hw, SK_REG(port, GMAC_CTRL), GMC_PAUSE_OFF);
@@ -1605,6 +1609,12 @@
struct sky2_hw *hw = sky2->hw;
unsigned port = sky2->port;
u16 reg;
+   static const char *fc_name[] = {
+   [FC_NONE]   = "none",
+   [FC_TX] = "tx",
+   [FC_RX] = "rx",
+   [FC_BOTH]   = "both",
+   };
 
/* enable Rx/Tx */
reg = gma_read16(hw, port, GM_GP_CTRL);
@@ -1648,8 +1658,7 @@
   "%s: Link is up at %d Mbps, %s duplex, flow control 
%s\n",
   sky2->netdev->name, sky2->speed,
   sky2->duplex == DUPLEX_FULL ? "full" : "half",
-  (sky2->tx_pause && sky2->rx_pause) ? "both" :
-  sky2->tx_paus

Re: [Bug 7296] New: Connected networks persist in routing table despite no IF_RUNNING

2006-10-11 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Wed, 11 Oct 2006 14:12:13 -0700

> I rejected this because I don't consider "sticky routes" when link is down
> a bug.

This definitely sounds like it is in the realm of userspace policy.
The routing daemon should adjust the routes, as appropriate, if
a link loses it's IF_RUNNING and it wants to use a backup route.

IFF_RUNNING does not cause routes to get flushed, only IFF_UP events
do.

Changing routes based upon IFF_RUNNING state is a userland policy
issue.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 05/10] sky2: advertising register 16 bits

2006-10-11 Thread Stephen Hemminger
The advertising bits (from ethtool.h) fit in 16 bits.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

--- sky2.orig/drivers/net/sky2.h2006-10-11 11:56:25.0 -0700
+++ sky2/drivers/net/sky2.h 2006-10-11 11:56:42.0 -0700
@@ -1860,7 +1860,7 @@
 
dma_addr_t   rx_le_map;
dma_addr_t   tx_le_map;
-   u32  advertising;   /* ADVERTISED_ bits */
+   u16  advertising;   /* ADVERTISED_ bits */
u16  speed; /* SPEED_1000, SPEED_100, ... */
u8   autoneg;   /* AUTONEG_ENABLE, AUTONEG_DISABLE */
u8   duplex;/* DUPLEX_HALF, DUPLEX_FULL */

--
Stephen Hemminger <[EMAIL PROTECTED]>

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 00/10] sky2: version 1.9 patches

2006-10-11 Thread Stephen Hemminger
Several little patches to sky2 driver. Mostly having to do with
PHY handling and flow control.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 04/10] sky2: fiber pause bits

2006-10-11 Thread Stephen Hemminger
The advertisement bits for flow control are located in
different location on fiber (1000baseX)

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>


--- sky2.orig/drivers/net/sky2.c2006-10-11 11:43:43.0 -0700
+++ sky2/drivers/net/sky2.c 2006-10-11 11:43:47.0 -0700
@@ -384,20 +384,31 @@
adv |= PHY_M_AN_10_FD;
if (sky2->advertising & ADVERTISED_10baseT_Half)
adv |= PHY_M_AN_10_HD;
+
+   /* desired flow control */
+   if (sky2->tx_pause && sky2->rx_pause)   /* both */
+   adv |= PHY_M_AN_PC | PHY_M_AN_ASP;
+   else if (sky2->tx_pause)
+   adv |= PHY_M_AN_ASP;
+   else if (sky2->rx_pause)
+   adv |= PHY_M_AN_PC;
+
+
} else {/* special defines for FIBER (88E1040S only) */
if (sky2->advertising & ADVERTISED_1000baseT_Full)
adv |= PHY_M_AN_1000X_AFD;
if (sky2->advertising & ADVERTISED_1000baseT_Half)
adv |= PHY_M_AN_1000X_AHD;
-   }
 
-   /* Set Flow-control capabilities */
-   if (sky2->tx_pause && sky2->rx_pause)
-   adv |= PHY_AN_PAUSE_CAP;/* symmetric */
-   else if (sky2->rx_pause && !sky2->tx_pause)
-   adv |= PHY_AN_PAUSE_ASYM | PHY_AN_PAUSE_CAP;
-   else if (!sky2->rx_pause && sky2->tx_pause)
-   adv |= PHY_AN_PAUSE_ASYM;   /* local */
+   if (sky2->tx_pause && sky2->rx_pause)   /* both */
+   adv |= PHY_M_P_BOTH_MD_X;
+   else if (sky2->tx_pause)
+   adv |= PHY_M_P_ASYM_MD_X;
+   else if (sky2->rx_pause)
+   adv |= PHY_M_P_SYM_MD_X;
+   else
+   adv |= PHY_M_P_NO_PAUSE_X;
+   }
 
/* Restart Auto-negotiation */
ctrl |= PHY_CT_ANE | PHY_CT_RE_CFG;

--
Stephen Hemminger <[EMAIL PROTECTED]>

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 10/10] sky2: version 1.9

2006-10-11 Thread Stephen Hemminger
Mark version, this has been a lot of patches.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

--- sky2.orig/drivers/net/sky2.c2006-10-11 14:19:14.0 -0700
+++ sky2/drivers/net/sky2.c 2006-10-11 14:19:23.0 -0700
@@ -50,7 +50,7 @@
 #include "sky2.h"
 
 #define DRV_NAME   "sky2"
-#define DRV_VERSION"1.9"
+#define DRV_VERSION"1.10"
 #define PFXDRV_NAME " "
 
 /*

--
Stephen Hemminger <[EMAIL PROTECTED]>

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Fw: [Bug 7296] New: Connected networks persist in routing table despite no IF_RUNNING

2006-10-11 Thread Stephen Hemminger
I rejected this because I don't consider "sticky routes" when link is down
a bug.

Date: Tue, 10 Oct 2006 02:17:26 -0700
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: [Bug 7296] New: Connected networks persist in routing table despite no 
IF_RUNNING


http://bugzilla.kernel.org/show_bug.cgi?id=7296

   Summary: Connected networks persist in routing table despite no
IF_RUNNING
Kernel Version: 2.6.18
Status: NEW
  Severity: normal
 Owner: [EMAIL PROTECTED]
 Submitter: [EMAIL PROTECTED]


Most recent kernel where this bug did not occur: All affected AFAIK
Distribution: Debian Sarge
Hardware Environment: Dual (Intel) Ethernet, PPro, SATA RAID, etc
:02:06.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 
0d) 
:02:07.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 
0d)
Software Environment: K:2.6.18 and more. (Quagga: 0.98.3-7.2)

Problem Description:

Introduction

Our servers have two ethernet interfaces to offer resiliency.
They run a routing protocol (OSPF) in order to learn exit paths via either of
these interfaces. 
These routing protocols are independant to this problem, however from experience
the problem curtails the effectiveness of any such protocols in resilient
architectures. 
Therefore with growing interest shown in utilising Linux as a router, IMO this
should be considered a fairly serious bug.

Problem
---
Whilst the nature of my specific topology is rather complex, I can define the
problem generally as:

A "connected" IP network whose interface subsequently loses its IF_RUNNING flag
(ie unusable for routing), persists in the Linux routing table apparently as a
"kernel" route.

In the case of running routing protocols on dual interface servers this
connected network is likely, following the destruction of the connected
interface, to be reachable via the other.
If the route persists however, it will never use this alternate path and black
hole any traffic by routing it via a dead interface.

See below example:
--
# ifconfig
eth0  Link encap:Ethernet  HWaddr 00:20:ED:35:D4:C8
  inet addr:192.168.0.143  Bcast:192.168.0.191  Mask:255.255.255.192
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1  Link encap:Ethernet  HWaddr 00:20:ED:35:D4:C9
  inet addr:192.168.0.207  Bcast:192.168.0.255  Mask:255.255.255.192
  UP BROADCAST MULTICAST  MTU:1500  Metric:1

# ip route show
192.168.0.128/26 dev eth0  proto kernel  scope link  src 192.168.0.143
192.168.0.192/26 dev eth1  proto kernel  scope link  src 192.168.0.207  <-- NO!!
192.168.0.192/26 via 192.168.0.130 dev eth0  proto zebra  metric 60 equalize

# ping {anything else on 192.168.0.192/26}


The path for 192.168.0.192 is learned via 192.168.0.130 (current ospf dr -
irrelevant), but it'll never use it because of the kernel sourced directly
connected route still sitting in there. 
If I then IFDOWN eth1, everything is fine and the route withdraws correctly, but
this kinda defeats the object of dynamic routing =:/

Not sure whether this is a "driver tells the kernel" or a "kernel checks the
driver at {n} intervals" issue - I would suggest the former would be more
correct, but it is a problem regardless.

Steps to reproduce:

Ingredients
---
1 x Linux (mine is Debian sarge but it may affect all kernels)
1 x Dual Ethernet server (see Intel cards above for my exact build)
1 x Switch or Hub or similar (cheap will do)

Connect the Server ports to the switch ports so the server interfaces are
RUNNING in ifconfig

On the server, configure:
eth0 as 192.168.0.1/24
eth1 as 192.168.1.1/24

NOW
"ip route show"

Pull the plug on the eth0 interface, or admin shut it on the SWITCH end if
you're a ciscoite.

If buggy then "ip route show" will still show the network 192.168.0.0/24 in the
routing table. It really shouldn't do this - if a path to this network had been
learned via a protocol via the other interface, it would never use that path
because the network persists as directly connected and thus supercedes any
dynamic path in true cisco style.

Try ifdown, the route will withdraw correctly, it needs to do that if IF_RUNNING
goes away.

Rgds
Shaun Kemp

--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.


-- 
Stephen Hemminger <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 07/10] sky2: dont reset PHY twice

2006-10-11 Thread Stephen Hemminger
Don't need to reset PHY twice on startup.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>


--- sky2.orig/drivers/net/sky2.c2006-10-11 11:59:45.0 -0700
+++ sky2/drivers/net/sky2.c 2006-10-11 14:21:36.0 -0700
@@ -356,16 +356,7 @@
gm_phy_write(hw, port, PHY_MARV_EXT_ADR, pg);
}
 
-   ctrl = gm_phy_read(hw, port, PHY_MARV_CTRL);
-   if (sky2->autoneg == AUTONEG_DISABLE)
-   ctrl &= ~PHY_CT_ANE;
-   else
-   ctrl |= PHY_CT_ANE;
-
-   ctrl |= PHY_CT_RESET;
-   gm_phy_write(hw, port, PHY_MARV_CTRL, ctrl);
-
-   ctrl = 0;
+   ctrl = PHY_CT_RESET;
ct1000 = 0;
adv = PHY_AN_CSMA;
reg = 0;
@@ -450,8 +441,6 @@
sky2_write8(hw, SK_REG(port, GMAC_CTRL), GMC_PAUSE_ON);
else
sky2_write8(hw, SK_REG(port, GMAC_CTRL), GMC_PAUSE_OFF);
-
-   ctrl |= PHY_CT_RESET;
}
 
gma_write16(hw, port, GM_GP_CTRL, reg);

--
Stephen Hemminger <[EMAIL PROTECTED]>

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread Michael S. Tsirkin
Quoting r. David Miller <[EMAIL PROTECTED]>:
> Subject: Re: Dropping NETIF_F_SG since no checksum feature.
> 
> From: "Michael S. Tsirkin" <[EMAIL PROTECTED]>
> Date: Wed, 11 Oct 2006 17:01:03 +0200
> 
> > Quoting Steven Whitehouse <[EMAIL PROTECTED]>:
> > > > ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset,
> > > >  size_t size, int flags)
> > > > {
> > > > ssize_t res;
> > > > struct sock *sk = sock->sk;
> > > > 
> > > > if (!(sk->sk_route_caps & NETIF_F_SG) ||
> > > > !(sk->sk_route_caps & NETIF_F_ALL_CSUM))
> > > > return sock_no_sendpage(sock, page, offset, size, 
> > > > flags);
> > > > 
> > > > 
> > > > So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
> > > > data will be copied over rather than sent directly.
> > > > So why does dev.c have to force set NETIF_F_SG to off then?
> > > >
> > > I agree with that analysis,
> > 
> > So, would you Ack something like the following then?
> 
> I certainly don't.
> 

Dave, sorry, you lost me.
Would a new flag NETIF_F_SW_CSUM be better, to tell network core that device
computes checksums in software, so we should piggyback the checksum
computation with the copy process if possible?

Or is there some other issue?

-- 
MST
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread Michael S. Tsirkin
Quoting r. Stephen Hemminger <[EMAIL PROTECTED]>:
> Subject: Re: Dropping NETIF_F_SG since no checksum feature.
> 
> O
> > > 
> > > You might want to try ignoring the check in dev.c and testing
> > > to see if there is a performance gain.  It wouldn't be hard to test
> > > a modified version and validate the performance change.
> > 
> > Yes. With my patch, there is a huge performance gain by increasing MTU to 
> > 64K.
> > And it seems the only way to do this is by S/G.
> > 
> > > You could even do what I suggested and use skb_checksum_help()
> > > to do inplace checksumming, as a performance test.
> > 
> > I can. But as network algorithmics says (chapter 5)
> > "Since such bus reads are expensive, the CPU might as well piggyback
> > the checksum computation with the copy process".
> > 
> > It speaks about onboard the adapter buffers, but memory bus reads are also 
> > much slower
> > than CPU nowdays.  So I think even if this works well in benchmark in real 
> > life
> > single copy should better.
> > 
> 
> The other alternative might be to make copy/checksum code smarter about using
> fragments rather than allocating a large buffer. It should avoid second order
> allocations (effective size > PAGESIZE).
 
In my testing, it seems quite smart already - once I declare F_SG and clear 
F_...CSUM
I start getting properly checksummed packets with lots of s/g fragments.
But I'm not catching the drift - an alternative to what this would be?

-- 
MST
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread David Miller
From: "Michael S. Tsirkin" <[EMAIL PROTECTED]>
Date: Wed, 11 Oct 2006 23:23:39 +0200

> With my patch, there is a huge performance gain by increasing MTU to 64K.
> And it seems the only way to do this is by S/G.

Numbers?
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 02/10] sky2: turn of workaround timer

2006-10-11 Thread Stephen Hemminger
The workaround timer is not needed in most systems with proper IRQ
routing and by perodically waking up it adds to laptop power consumption.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

---
 drivers/net/sky2.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- sky2.orig/drivers/net/sky2.c2006-10-11 11:43:33.0 -0700
+++ sky2/drivers/net/sky2.c 2006-10-11 11:43:40.0 -0700
@@ -96,9 +96,9 @@
 module_param(disable_msi, int, 0);
 MODULE_PARM_DESC(disable_msi, "Disable Message Signaled Interrupt (MSI)");
 
-static int idle_timeout = 100;
+static int idle_timeout = 0;
 module_param(idle_timeout, int, 0);
-MODULE_PARM_DESC(idle_timeout, "Idle timeout workaround for lost interrupts 
(ms)");
+MODULE_PARM_DESC(idle_timeout, "Watchdog timer for lost interrupts (ms)");
 
 static const struct pci_device_id sky2_id_table[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_SYSKONNECT, 0x9000) },

--
Stephen Hemminger <[EMAIL PROTECTED]>

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 21/21]: powerpc/cell spidernet DMA coalescing

2006-10-11 Thread Benjamin Herrenschmidt

> I started writingthe patch thinking it will have some huge effect on
> performance, based on a false assumption on how i/o was done on this
> machine
> 
> *If* this were another pSeries system, then each call to 
> pci_map_single() chews up an actual hardware "translation 
> control entry" (TCE) that maps pci bus addresses into 
> system RAM addresses. These are somewhat limited resources,
> and so one shouldn't squander them.  Furthermore, I thouhght
> TCE's have TLB's associated with them (similar to how virtual
> memory page tables are backed by hardware page TLB's), of which 
> there are even less of. I was thinking that TLB thrashing would 
> have a big hit on performance. 
> 
> Turns out that there was no difference to performance at all, 
> and a quick look at "cell_map_single()" in arch/powerpc/platforms/cell
> made it clear why: there's no fancy i/o address mapping.
> 
> Thus, the patch has only mrginal benefit; I submit it only in the 
> name of "its the right thing to do anyway".

Well, there is no fancy iommu mapping ... yet.

It's been implemented and is coming after we put together some
workarounds for various other spider hardware issues that trigger when
using it (bogus prefetches and bogus pci ordering).

I think the hypervisor based platforms will be happy with that patch
too.

Ben.


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread Stephen Hemminger
On Wed, 11 Oct 2006 21:11:38 +0100
Steven Whitehouse <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> On Wed, Oct 11, 2006 at 05:01:03PM +0200, Michael S. Tsirkin wrote:
> > Quoting Steven Whitehouse <[EMAIL PROTECTED]>:
> > > > ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset,
> > > >  size_t size, int flags)
> > > > {
> > > > ssize_t res;
> > > > struct sock *sk = sock->sk;
> > > > 
> > > > if (!(sk->sk_route_caps & NETIF_F_SG) ||
> > > > !(sk->sk_route_caps & NETIF_F_ALL_CSUM))
> > > > return sock_no_sendpage(sock, page, offset, size, 
> > > > flags);
> > > > 
> > > > 
> > > > So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
> > > > data will be copied over rather than sent directly.
> > > > So why does dev.c have to force set NETIF_F_SG to off then?
> > > >
> > > I agree with that analysis,
> > 
> > So, would you Ack something like the following then?
> >
> 
> In so far as I'm able to ack it, then yes, but with the following
> caveats: that you also need to look at the tcp code's checks for
> NETIF_F_SG (aside from the interface to tcp_sendpage which I think
> we've agreed is ok) and ensure that this patch will not change their
> behaviour, and here I'm thinking of the test in net/ipv4/tcp.c:select_size()
> in particular - there may be others but thats the only one I can think
> of off the top of my head. I think this is what davem was getting at
> with his comment about copy & sum for smaller packets.
> 
> Also all subject to approval by davem and shemminger of course :-)
> 
> My general feeling is that devices should advertise the features that
> they actually have and that the protocols should make the decision
> as to which ones to use or not depending on the combinations available
> (which I think is pretty much your argument).
> 
> Steve.
> 

You might want to try ignoring the check in dev.c and testing
to see if there is a performance gain.  It wouldn't be hard to test
a modified version and validate the performance change.

You could even do what I suggested and use skb_checksum_help()
to do inplace checksumming, as a performance test.


-- 
Stephen Hemminger <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread Michael S. Tsirkin
Quoting r. Steven Whitehouse <[EMAIL PROTECTED]>:
> Subject: Re: Dropping NETIF_F_SG since no checksum feature.
> 
> Hi,
> 
> On Wed, Oct 11, 2006 at 05:01:03PM +0200, Michael S. Tsirkin wrote:
> > Quoting Steven Whitehouse <[EMAIL PROTECTED]>:
> > > > ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset,
> > > >  size_t size, int flags)
> > > > {
> > > > ssize_t res;
> > > > struct sock *sk = sock->sk;
> > > > 
> > > > if (!(sk->sk_route_caps & NETIF_F_SG) ||
> > > > !(sk->sk_route_caps & NETIF_F_ALL_CSUM))
> > > > return sock_no_sendpage(sock, page, offset, size, 
> > > > flags);
> > > > 
> > > > 
> > > > So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
> > > > data will be copied over rather than sent directly.
> > > > So why does dev.c have to force set NETIF_F_SG to off then?
> > > >
> > > I agree with that analysis,
> > 
> > So, would you Ack something like the following then?
> >
> 
> In so far as I'm able to ack it, then yes, but with the following
> caveats: that you also need to look at the tcp code's checks for
> NETIF_F_SG (aside from the interface to tcp_sendpage which I think
> we've agreed is ok) and ensure that this patch will not change their
> behaviour, and here I'm thinking of the test in net/ipv4/tcp.c:select_size()
> in particular - there may be others but thats the only one I can think
> of off the top of my head. I think this is what davem was getting at
> with his comment about copy & sum for smaller packets.

Will do - thanks for the tips.

> Also all subject to approval by davem and shemminger of course :-)

Goes without saying :)

> My general feeling is that devices should advertise the features that
> they actually have and that the protocols should make the decision
> as to which ones to use or not depending on the combinations available
> (which I think is pretty much your argument).
> 
> Steve.

-- 
MST
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread David Miller
From: "Michael S. Tsirkin" <[EMAIL PROTECTED]>
Date: Wed, 11 Oct 2006 17:01:03 +0200

> Quoting Steven Whitehouse <[EMAIL PROTECTED]>:
> > > ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset,
> > >  size_t size, int flags)
> > > {
> > > ssize_t res;
> > > struct sock *sk = sock->sk;
> > > 
> > > if (!(sk->sk_route_caps & NETIF_F_SG) ||
> > > !(sk->sk_route_caps & NETIF_F_ALL_CSUM))
> > > return sock_no_sendpage(sock, page, offset, size, flags);
> > > 
> > > 
> > > So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
> > > data will be copied over rather than sent directly.
> > > So why does dev.c have to force set NETIF_F_SG to off then?
> > >
> > I agree with that analysis,
> 
> So, would you Ack something like the following then?

I certainly don't.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] d80211: extend extra_hdr_room to be a bytecount

2006-10-11 Thread Michael Buesch
On Wednesday 11 October 2006 16:59, David Kimdon wrote:
> Perhaps rename it to extra_tx_headroom?
>  - existing users would then need to take notice of the change
>  - the name 'extra_tx_headroom' is more descriptive of what it actually is

Oh, yeah. This was lazyman's solution ;)
I will send an updated patch.

-- 
Greetings Michael.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread Steven Whitehouse
Hi,

On Wed, Oct 11, 2006 at 05:01:03PM +0200, Michael S. Tsirkin wrote:
> Quoting Steven Whitehouse <[EMAIL PROTECTED]>:
> > > ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset,
> > >  size_t size, int flags)
> > > {
> > > ssize_t res;
> > > struct sock *sk = sock->sk;
> > > 
> > > if (!(sk->sk_route_caps & NETIF_F_SG) ||
> > > !(sk->sk_route_caps & NETIF_F_ALL_CSUM))
> > > return sock_no_sendpage(sock, page, offset, size, flags);
> > > 
> > > 
> > > So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
> > > data will be copied over rather than sent directly.
> > > So why does dev.c have to force set NETIF_F_SG to off then?
> > >
> > I agree with that analysis,
> 
> So, would you Ack something like the following then?
>

In so far as I'm able to ack it, then yes, but with the following
caveats: that you also need to look at the tcp code's checks for
NETIF_F_SG (aside from the interface to tcp_sendpage which I think
we've agreed is ok) and ensure that this patch will not change their
behaviour, and here I'm thinking of the test in net/ipv4/tcp.c:select_size()
in particular - there may be others but thats the only one I can think
of off the top of my head. I think this is what davem was getting at
with his comment about copy & sum for smaller packets.

Also all subject to approval by davem and shemminger of course :-)

My general feeling is that devices should advertise the features that
they actually have and that the protocols should make the decision
as to which ones to use or not depending on the combinations available
(which I think is pretty much your argument).

Steve.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 00/11] The _entire_ secid reconciliation patchset (tada!)

2006-10-11 Thread Venkat Yekkirala
> From an initial review of this patchset, it doesn't look 
> quite ready to 
> queue for 2.6.20 (which I plan to to via git once it is).
> 
> Outstanding items include resolving the igmp skb hook issue 
> generally, 
> testing to verify both the design and implementation, and 
> ensuring that 
> all the related policy changes are merged upstream first.
> 
Regarding the igmp hook issue, we could do a generic hook
like Paul suggested. Would that be more palatable you think?
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [openib-general] Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread Michael Krause

At 02:46 AM 10/11/2006, Michael S. Tsirkin wrote:

Quoting r. David Miller <[EMAIL PROTECTED]>:
> Subject: Re: Dropping NETIF_F_SG since no checksum feature.
>
> From: "Michael S. Tsirkin" <[EMAIL PROTECTED]>
> Date: Wed, 11 Oct 2006 11:05:04 +0200
>
> > So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
> > data will be copied over rather than sent directly.
> > So why does dev.c have to force set NETIF_F_SG to off then?
>
> Because it's more efficient to copy into a linear destination
> buffer of an SKB than page sub-chunks when doing checksum+copy.
>

Thanks for the explanation.
Obviously its true as long as you can allocate the skb that big.
I think you won't realistically be able to get 64K in a
linear SKB on a busy system, though, is not that right?

OTOH, having large MTU (e.g. 64K) helps performance a lot since it reduces 
receive side processing overhead.


One thing to keep in mind is while it may help performance in a 
micro-benchmark, the system performance or the QoS impacts to other flows 
can be negatively impacted depending upon implementation.  For example, 
consider multiple messages interleaving (heaven help implementations that 
are not able to interleave multiple messages) on either the transmit or 
receive HCA / RNIC and how the time-to-completion of any message is 
extended out in time as a result of the interleave.  The effective 
throughput in terms of useful units of work can be lower as a result.   The 
same effect can be observed when there are a significant number connections 
in a device being simultaneously processed.


Also, if the copy-checksum is not performed on the processor where the 
application resides, then the performance can also be negatively impacted 
(want to have the right cache hot when initiated or concluded).  While the 
aggregate computational performance of systems may be increasing at a 
significant rate (set aside the per core vs. aggregate core debate), the 
memory performance gains are much less.  If you examine the longer term 
trends, there may be a flattening out of memory performance improvements by 
2009/10 without some major changes in the way controllers and subsystems 
are designed.


Mike 



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] e1000: Real time packets and bytes statistics

2006-10-11 Thread Jesse Brandeburg

On 10/11/06, Jean Delvare <[EMAIL PROTECTED]> wrote:

Hi all,

This patch is posted for review and comments.

Let the e1000 driver report the most important statistics (rx/tx_bytes
and rx/tx_packets) in real time, rather than every other second. This
is similar to what the e100 driver is doing.

The current asynchronous statistics refresh model makes it impossible
to monitor the network traffic with an interval which isn't a multiple
of 2 seconds. For example, an interval of 5 seconds would result in a
sawtooth diagram (+20%, -20%) for a constant transfer rate. With a 1
second interval it's even worse (0, 200%) of course. This has been
annoying users for years, but was never actually fixed:


I think the idea is good, however, see below.


rx/tx_bytes will show slightly lower values than before, because the
hardware appears to include the 4-byte ethernet frame CRC into the
frame length, while the driver doesn't. It's probably OK as the
e100, 3c59x and 8139too drivers don't include it either.


this is okay.


I additionally noted a difference of 6 bytes on some TX frames, which
I am not able to explain. It's probably small and rare enough not to
be considered a problem, but if someone can explain it, I would be
grateful.


now, that sounds odd, however, once again, see below.


Signed-off-by: Jean Delvare <[EMAIL PROTECTED]>
---
 drivers/net/e1000/e1000_main.c |   14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

--- linux-2.6.19-rc1.orig/drivers/net/e1000/e1000_main.c2006-10-11 
10:53:49.0 +0200
+++ linux-2.6.19-rc1/drivers/net/e1000/e1000_main.c 2006-10-11 
11:34:41.0 +0200
@@ -3118,6 +3118,8 @@
   e1000_tx_map(adapter, tx_ring, skb, first,
max_per_txd, nr_frags, mss));

+   adapter->net_stats.tx_packets++;
+   adapter->net_stats.tx_bytes += skb->len;
netdev->trans_start = jiffies;


this is the part I'm most worried about.  as I believe it to be
incorrect for TSO packets.  Maybe something like?
+   if (skb_shinfo(skb)->gso_segs)
+  adapter->net_stats.tx_packets += skb_shinfo(skb)->gso_segs;
+   else
+  adapter->net_stats.tx_packets++;
+   adapter->net_stats.tx_bytes += skb->len;
   netdev->trans_start = jiffies;

skb len will still be off by some amount, because the skb->data
(header) is replicated across each gso segment but only counted once
this way, but hopefully someone will pipe up with a good way to
compute that.

The rest of the patch seems fine, barring any other comments.

Jesse
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/21]: powerpc/cell spidernet bugfixes, etc.

2006-10-11 Thread Arnd Bergmann
On Tuesday 10 October 2006 22:49, Linas Vepstas wrote:
> Andrew, please apply/forward upstream.
> 
> The following set of 21 patches (!) are all aimed at the the 
> spidernet ethernet device driver. The spidernet is an etherenet
> controller built into the Toshiba southbridge for the PowerPC Cell
> processor. (This is the only device in existance that with this
> ethernet hardware in it).
> 
> These patches re-package/re-order/re-cleanup a previous
> set of patches I've previously mailed. Thus, some have
> been previously Acked-by lines, most do not. Most of
> these patches are tiny, and handle problems that cropped
> up during testing. Sorry about there being so many of them.
> 
> The first set of 12 patches fix a large variety of mostly 
> minor bugs. 
> 
> The important patches are 13 through 17: these overcome a 
> debilitating performance problem on transmit (6 megabits
> per second !!) on transmit of patches 500 bytes or larger.
> After applying these, I am able to get the following:
> 
> pkt sz   speed (100K buffs)       speed (4M buffs)
> --   -        
> 1500     700 Mbits/sec            951 Mbits/sec
> 1000     658 Mbits/sec            770
> 800      600                      648
> 500      500                      500
> 300      372                      372
> 60        70                       70
> 
> Above buf size refers to /proc/sys/net/core/wmem_default

Excellent work! I guess this the best tx performance we've
seen so far on this hardware.

Consider this as an Acked-by: for all the patches, I'll save
the effort of replying to each one of them separately.

Jeff, do you plan on merging these fixes for 2.6.19?

Arnd <><
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 21/21]: powerpc/cell spidernet DMA coalescing

2006-10-11 Thread Geoff Levand
Linas Vepstas wrote:
> On Tue, Oct 10, 2006 at 06:46:08PM -0700, Geoff Levand wrote:
>> > Linas Vepstas wrote:
>> >> The current driver code performs 512 DMA mappns of a bunch of 
>> >> 32-byte structures. This is silly, as they are all in contiguous 
>> >> memory. Ths patch changes the code to DMA map the entie area
>> >> with just one call.
>> 
>> Linas, 
>> 
>> Is the motivation for this change to improve performance by reducing the 
>> overhead
>> of the mapping calls?  
> 
> Yes.
> 
>> If so, there may be some benefit for some systems.  Could
>> you please elaborate?
> 
> I started writingthe patch thinking it will have some huge effect on
> performance, based on a false assumption on how i/o was done on this
> machine
> 
> *If* this were another pSeries system, then each call to 
> pci_map_single() chews up an actual hardware "translation 
> control entry" (TCE) that maps pci bus addresses into 
> system RAM addresses. These are somewhat limited resources,
> and so one shouldn't squander them.  Furthermore, I thouhght
> TCE's have TLB's associated with them (similar to how virtual
> memory page tables are backed by hardware page TLB's), of which 
> there are even less of. I was thinking that TLB thrashing would 
> have a big hit on performance. 
> 
> Turns out that there was no difference to performance at all, 
> and a quick look at "cell_map_single()" in arch/powerpc/platforms/cell
> made it clear why: there's no fancy i/o address mapping.

OK, thanks for the explanation.  Actually, the current cell DMA mapping
implementation uses a simple 'linear' mapping, in that, all of RAM is
mapped into the bus DMA address space at once, and in fact, it is all
just done at system startup.

There is ongoing work to implement 'dynamic' mapping, where DMA pages are
mapped into the bus DMA address space on demand.  I think a key point to
understand the benefit to this is that the cell processor's I/O controller
maps pages per device, so you can map one DMA page to one device.  I
currently have this working for my platform, but have not released that
work.  There is some overhead to managing the mapped buffers and to request
pages be mapped by the hypervisor, etc., so I was thinking that is this work
of yours to consolidate the memory buffers prior to requesting the mapping
could be of benefit if it was in an often executed code path.

-Geoff

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 21/21]: powerpc/cell spidernet DMA coalescing

2006-10-11 Thread Linas Vepstas
On Tue, Oct 10, 2006 at 06:46:08PM -0700, Geoff Levand wrote:
> > Linas Vepstas wrote:
> >> The current driver code performs 512 DMA mappns of a bunch of 
> >> 32-byte structures. This is silly, as they are all in contiguous 
> >> memory. Ths patch changes the code to DMA map the entie area
> >> with just one call.
> 
> Linas, 
> 
> Is the motivation for this change to improve performance by reducing the 
> overhead
> of the mapping calls?  

Yes.

> If so, there may be some benefit for some systems.  Could
> you please elaborate?

I started writingthe patch thinking it will have some huge effect on
performance, based on a false assumption on how i/o was done on this
machine

*If* this were another pSeries system, then each call to 
pci_map_single() chews up an actual hardware "translation 
control entry" (TCE) that maps pci bus addresses into 
system RAM addresses. These are somewhat limited resources,
and so one shouldn't squander them.  Furthermore, I thouhght
TCE's have TLB's associated with them (similar to how virtual
memory page tables are backed by hardware page TLB's), of which 
there are even less of. I was thinking that TLB thrashing would 
have a big hit on performance. 

Turns out that there was no difference to performance at all, 
and a quick look at "cell_map_single()" in arch/powerpc/platforms/cell
made it clear why: there's no fancy i/o address mapping.

Thus, the patch has only mrginal benefit; I submit it only in the 
name of "its the right thing to do anyway".

--linas
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread Michael S. Tsirkin
Quoting Steven Whitehouse <[EMAIL PROTECTED]>:
> > ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset,
> >  size_t size, int flags)
> > {
> > ssize_t res;
> > struct sock *sk = sock->sk;
> > 
> > if (!(sk->sk_route_caps & NETIF_F_SG) ||
> > !(sk->sk_route_caps & NETIF_F_ALL_CSUM))
> > return sock_no_sendpage(sock, page, offset, size, flags);
> > 
> > 
> > So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
> > data will be copied over rather than sent directly.
> > So why does dev.c have to force set NETIF_F_SG to off then?
> >
> I agree with that analysis,

So, would you Ack something like the following then?

==

Enabling NETIF_F_SG without NETIF_F_ALL_CSUM actually seems to work fine by
doing an old-fashioned data copy in tcp_sendpage.
And for devices that do not calculate IP checksum in hardware (e.g. InfiniBand)
calculating the checksum for all packets in network driver is worse than have
the CPU piggyback the checksum compitation with the copy process.
Finally, note that NETIF_F_SG is necessary to be able to allocate skbs >
PAGE_SIZE on busy systems.

So, let's allow that combination, again, for drivers that want it.

Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>

---

diff --git a/net/core/dev.c b/net/core/dev.c
index d4a1ec3..2d731a0 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2930,14 +2930,6 @@ #endif
}
}
 
-   /* Fix illegal SG+CSUM combinations. */
-   if ((dev->features & NETIF_F_SG) &&
-   !(dev->features & NETIF_F_ALL_CSUM)) {
-   printk(KERN_NOTICE "%s: Dropping NETIF_F_SG since no checksum 
feature.\n",
-  dev->name);
-   dev->features &= ~NETIF_F_SG;
-   }
-
/* TSO requires that SG is present as well. */
if ((dev->features & NETIF_F_TSO) &&
!(dev->features & NETIF_F_SG)) {

-- 
MST
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] d80211: extend extra_hdr_room to be a bytecount

2006-10-11 Thread David Kimdon
Perhaps rename it to extra_tx_headroom?
 - existing users would then need to take notice of the change
 - the name 'extra_tx_headroom' is more descriptive of what it actually is

-David

On Wed, Oct 11, 2006 at 11:58:28AM +0200, Michael Buesch wrote:
> Extend ieee80211_hw's extra_hdr_room to be a bytecount for
> a device specific TX header instead of being a hardcoded
> 0/2 byte choice.
> 
> Signed-off-by: Michael Buesch <[EMAIL PROTECTED]>
> 
> diff --git a/include/net/d80211.h b/include/net/d80211.h
> index a80f48b..9e9709f 100644
> --- a/include/net/d80211.h
> +++ b/include/net/d80211.h
> @@ -476,10 +476,6 @@ struct ieee80211_hw {
>   /* Force software encryption for TKIP packets if WMM is enabled. */
>   unsigned int no_tkip_wmm_hwaccel:1;
>  
> - /* set if the payload needs to be padded at even boundaries after the
> -  * header */
> - unsigned int extra_hdr_room:1;
> -
>   /* Some devices handle Michael MIC internally and do not include MIC in
>* the received packets passed up. device_strips_mic must be set
>* for such devices. The 'encryption' frame control bit is expected to
> @@ -496,6 +492,9 @@ struct ieee80211_hw {
>* i.e. more than one skb per frame */
>   unsigned int fraglist:1;
>  
> + /* Set to the size of a needed device specific skb headroom for TX 
> skbs. */
> + unsigned int extra_hdr_room;
> +
>  /* This is the time in us to change channels
>   */
>  int channel_change_time;
> diff --git a/net/d80211/ieee80211.c b/net/d80211/ieee80211.c
> index 1ef2707..7d52c3b 100644
> --- a/net/d80211/ieee80211.c
> +++ b/net/d80211/ieee80211.c
> @@ -1551,7 +1551,7 @@ static int ieee80211_subif_start_xmit(st
>* build in headroom in __dev_alloc_skb() (linux/skbuff.h) and
>* alloc_skb() (net/core/skbuff.c)
>*/
> - head_need = hdrlen + encaps_len + (local->hw->extra_hdr_room ? 2 : 0);
> + head_need = hdrlen + encaps_len + local->hw->extra_hdr_room;
>   head_need -= skb_headroom(skb);
>  
>   /* We are going to modify skb data, so make a copy of it if happens to
> 
> 
> -- 
> Greetings Michael.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Customizable TCP backoff patch

2006-10-11 Thread Vlad Yasevich
Ben Woodard wrote:
 
> diff -ru linux-2.6.18/include/net/tcp.h linux-2.6.18.new/include/net/tcp.h
> --- linux-2.6.18/include/net/tcp.h2006-09-19 20:42:06.0 -0700
> +++ linux-2.6.18.new/include/net/tcp.h2006-10-10 18:42:00.0 
> -0700
> @@ -227,11 +227,23 @@
>  extern int sysctl_tcp_base_mss;
>  extern int sysctl_tcp_workaround_signed_windows;
>  extern int sysctl_tcp_slow_start_after_idle;
> +extern unsigned long sysctl_tcp_rto_max;
> +extern unsigned long sysctl_tcp_rto_init;
>  
>  extern atomic_t tcp_memory_allocated;
>  extern atomic_t tcp_sockets_allocated;
>  extern int tcp_memory_pressure;
>  
> +static inline unsigned long tcp_rto_max(struct tcp_sock *tp)
> +{
> +return tp->rto_max ? tp->rto_max*HZ/1000 : sysctl_tcp_rto_max;
^^^
That should probably be msecs_to_jiffies(tp->rto_max)

> +}
> +
> +static inline unsigned long tcp_rto_init(struct tcp_sock *tp)
> +{
> +return tp->rto_init ? tp->rto_init*HZ/1000 : sysctl_tcp_rto_init;

Ditto.

> +}
> +
>  /*
>   * The next routines deal with comparing 32 bit unsigned ints
>   * and worry about wraparound (automatic with unsigned arithmetic).
> diff -ru linux-2.6.18/net/ipv4/sysctl_net_ipv4.c 
> linux-2.6.18.new/net/ipv4/sysctl_net_ipv4.c
> --- linux-2.6.18/net/ipv4/sysctl_net_ipv4.c   2006-09-19 20:42:06.0 
> -0700
> +++ linux-2.6.18.new/net/ipv4/sysctl_net_ipv4.c   2006-10-10 
> 16:32:08.0 -0700
> @@ -128,6 +128,8 @@
>   return ret;
>  }
>  
> +static unsigned long tcp_rto_min=0;
> +static unsigned long tcp_rto_max=65535;
>  
>  ctl_table ipv4_table[] = {
>  {
> @@ -697,6 +699,26 @@
>   .mode   = 0644,
>   .proc_handler   = &proc_dointvec
>   },
> + {
> + .ctl_name   = NET_TCP_RTO_MAX,
> + .procname   = "tcp_rto_max",
> + .data   = &sysctl_tcp_rto_max,
> + .maxlen = sizeof(unsigned),
> + .mode   = 0644,
> + .proc_handler   = &proc_doulongvec_ms_jiffies,
> + .extra1 = &tcp_rto_min_constant,
> + .extra2 = &tcp_rto_max_constant,
> + },
> + {
> + .ctl_name   = NET_TCP_RTO_INIT,
> + .procname   = "tcp_rto_init",
> + .data   = &sysctl_tcp_rto_init,
> + .maxlen = sizeof(unsigned),
> + .mode   = 0644,
> + .proc_handler   = &proc_doulongvec_ms_jiffies,
> + .extra1 = &tcp_rto_min_constant,
> + .extra2 = &tcp_rto_max_constant,
> + },
>   { .ctl_name = 0 }
>  };

Try as I might, I can't find proc_doulongvec_ms_jiffies() anywhere.  I think
you meant proc_doulongvec_ms_jiffies_minmax()?

Also, this function has a bug in that it doesn't do corrections for potentially 
different
HZ values.  Look at the msecs_to_jiffies...  When I've used 
proc_doulongvec_ms_jiffies_minmax()
before, I would seen rather interesting truncation errors such that sysctl 
input didn't match
sysctl output.

>  
> diff -ru linux-2.6.18/net/ipv4/tcp.c linux-2.6.18.new/net/ipv4/tcp.c
> --- linux-2.6.18/net/ipv4/tcp.c   2006-09-19 20:42:06.0 -0700
> +++ linux-2.6.18.new/net/ipv4/tcp.c   2006-10-10 18:37:40.0 -0700
> @@ -1764,6 +1764,8 @@
>   return err;
>  }
>  
> +#define TCP_BACKOFF_MAXVAL 65535
> +
>  /*
>   *   Socket option code for TCP.
>   */
> @@ -1939,6 +1941,21 @@
>   }
>   break;
>  
> +case TCP_BACKOFF_MAX:
> +if (val < 1 || val > TCP_BACKOFF_MAXVAL)
> +err = -EINVAL;
> +else
> +tp->rto_max = val;
> +break;
> + 
> +case TCP_BACKOFF_INIT:
> +if (val < 1 || val > TCP_BACKOFF_MAXVAL)
> +err = -EINVAL;
> +else
> +tp->rto_init = val;
> +break;
> + 
> + 
>   default:
>   err = -ENOPROTOOPT;
>   break;
> @@ -2110,6 +2127,12 @@
>   if (copy_to_user(optval, icsk->icsk_ca_ops->name, len))
>   return -EFAULT;
>   return 0;
> +case TCP_BACKOFF_MAX:
> + val = tcp_rto_max(tp)*1000/HZ;
> +break;
> +case TCP_BACKOFF_INIT:
> + val = tcp_rto_init(tp)*1000/HZ;

The two divisions above introduce truncation errors such the input doesn't
match the output.  Better to use jiffies_to_msecs().

Regards
-vlad
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC] Question about potential problem in net/ipv4/route.c

2006-10-11 Thread Eric Dumazet
Hi David

While browsing net/ipv4/route.c  I discovered compare_keys() function, and a 
potential bug in it.

static inline int compare_keys(struct flowi *fl1, struct flowi *fl2)
{
return memcmp(&fl1->nl_u.ip4_u, &fl2->nl_u.ip4_u, 
sizeof(fl1->nl_u.ip4_u)) == 0 &&
   fl1->oif == fl2->oif &&
   fl1->iif == fl2->iif;
}


Using memcmp(ptr1, ptr2, sizeof(SOMEFIELD)) is dangerous because 
sizeof(SOMEFIELD) can be larger than the underlying object, because of 
alignment constraints.

In this case, sizeof(fl1->nl_u.ip4_u) is 16, while fl1->nl_u.ip4_u is :

struct {
__u32   daddr;
__u32   saddr;
__u32   fwmark;
__u8tos;
__u8scope;
} ip4_u;

So 14 bytes are really initialized, and 2 padding bytes might contain random 
values...

So at the very minimum, we should avoid doing the memcmp() including those 
last two bytes : It would be less bugy, and faster too...  (But to get really 
fast comparison, we should do some clever long/int XOR operations to avoid 
many test/branches, like the optim we did in compare_ether_addr())

As shown in profiles, "repz cmpsb" is really a dog... (and its use of 
esi/edi/ecx registers a high pressure for the compiler/optimizer)

Eric

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] drivers/net/sis900.c: Replacing yield() with a better alternative

2006-10-11 Thread Amol Lad
*Please check if timeout should be increased*

Signed-off-by: Amol Lad <[EMAIL PROTECTED]>
---
diff -uprN -X linux-2.6.19-rc1-orig/Documentation/dontdiff 
linux-2.6.19-rc1-orig/drivers/net/wireless/hostap/hostap_hw.c 
linux-2.6.19-rc1/drivers/net/wireless/hostap/hostap_hw.c
--- linux-2.6.19-rc1-orig/drivers/net/wireless/hostap/hostap_hw.c   
2006-09-21 10:15:37.0 +0530
+++ linux-2.6.19-rc1/drivers/net/wireless/hostap/hostap_hw.c2006-10-11 
17:41:25.0 +0530
@@ -993,7 +993,7 @@ static u16 hfa384x_allocate_fid(struct n
delay = jiffies + HFA384X_ALLOC_COMPL_TIMEOUT;
while (!(HFA384X_INW(HFA384X_EVSTAT_OFF) & HFA384X_EV_ALLOC) &&
   time_before(jiffies, delay))
-   yield();
+   schedule_timeout_interruptible(1);
if (!(HFA384X_INW(HFA384X_EVSTAT_OFF) & HFA384X_EV_ALLOC)) {
printk("%s: fid allocate, len=%d - timeout\n", dev->name, len);
return 0x;
@@ -1333,7 +1333,7 @@ static int prism2_hw_init(struct net_dev
delay = jiffies + HFA384X_INIT_TIMEOUT;
while (!(HFA384X_INW(HFA384X_EVSTAT_OFF) & HFA384X_EV_CMD) &&
   time_before(jiffies, delay))
-   yield();
+   schedule_timeout_interruptible(1);
if (!(HFA384X_INW(HFA384X_EVSTAT_OFF) & HFA384X_EV_CMD)) {
printk(KERN_DEBUG "%s: assuming no Primary image in "
   "flash - card initialization not completed\n",


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] drivers/net/depca.c: Replacing yield() with a better alternative

2006-10-11 Thread Amol Lad
In 2.6, the semantics of calling yield() changed from "sleep for a
bit" to "I really don't want to run for a while".  This matches POSIX
better, but there's a lot of drivers still using yield() when they mean
cond_resched(), schedule() or even schedule_timeout().

For this driver schedule_timeout_interruptible seems to be a better
alternative

Tested compile only

Signed-off-by: Amol Lad <[EMAIL PROTECTED]>
---
diff -uprN -X linux-2.6.19-rc1-orig/Documentation/dontdiff 
linux-2.6.19-rc1-orig/drivers/net/depca.c linux-2.6.19-rc1/drivers/net/depca.c
--- linux-2.6.19-rc1-orig/drivers/net/depca.c   2006-10-05 14:00:47.0 
+0530
+++ linux-2.6.19-rc1/drivers/net/depca.c2006-10-11 17:57:02.0 
+0530
@@ -738,7 +738,7 @@ static int __init depca_hw_init (struct 
   interrupts. For now we will always get a DMA error. */
if (dev->irq < 2) {
unsigned char irqnum;
-   unsigned long irq_mask, delay;
+   unsigned long irq_mask;
 
irq_mask = probe_irq_on();
 
@@ -767,9 +767,7 @@ static int __init depca_hw_init (struct 
/* Trigger an initialization just for the interrupt. */
outw(INEA | INIT, DEPCA_DATA);
 
-   delay = jiffies + HZ/50;
-   while (time_before(jiffies, delay))
-   yield();
+   schedule_timeout_interruptible(jiffies + HZ/50);
 
irqnum = probe_irq_off(irq_mask);
 


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] drivers/net/sb1000.c: Replacing yield() with a better alternative

2006-10-11 Thread Amol Lad
cond_resched() seems to be a better alternative

Signed-off-by: Amol Lad <[EMAIL PROTECTED]>
---
diff -uprN -X linux-2.6.19-rc1-orig/Documentation/dontdiff 
linux-2.6.19-rc1-orig/drivers/net/sb1000.c linux-2.6.19-rc1/drivers/net/sb1000.c
--- linux-2.6.19-rc1-orig/drivers/net/sb1000.c  2006-10-05 14:00:48.0 
+0530
+++ linux-2.6.19-rc1/drivers/net/sb1000.c   2006-10-11 17:57:02.0 
+0530
@@ -264,7 +264,7 @@ card_wait_for_busy_clear(const int ioadd
timeout = jiffies + TimeOutJiffies;
while (a & 0x80 || a & 0x40) {
/* a little sleep */
-   yield();
+   cond_resched();
 
a = inb(ioaddr[0] + 7);
if (time_after_eq(jiffies, timeout)) {
@@ -288,7 +288,7 @@ card_wait_for_ready(const int ioaddr[], 
timeout = jiffies + TimeOutJiffies;
while (a & 0x80 || !(a & 0x40)) {
/* a little sleep */
-   yield();
+   cond_resched();
 
a = inb(ioaddr[1] + 6);
if (time_after_eq(jiffies, timeout)) {


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] drivers/net/sis900.c: Replacing yield() with a better alternative

2006-10-11 Thread Amol Lad
*Please check if the timout value is ok or it should be increased*

Signed-off-by: Amol Lad <[EMAIL PROTECTED]>
---
diff -uprN -X linux-2.6.19-rc1-orig/Documentation/dontdiff 
linux-2.6.19-rc1-orig/drivers/net/sis900.c linux-2.6.19-rc1/drivers/net/sis900.c
--- linux-2.6.19-rc1-orig/drivers/net/sis900.c  2006-10-05 14:00:48.0 
+0530
+++ linux-2.6.19-rc1/drivers/net/sis900.c   2006-10-11 17:57:02.0 
+0530
@@ -664,7 +664,7 @@ static int __init sis900_mii_probe(struc
 
if(status & MII_STAT_LINK){
while (poll_bit) {
-   yield();
+   schedule_timeout_interruptible(1);
 
poll_bit ^= (mdio_read(net_dev, sis_priv->cur_phy, 
MII_STATUS) & poll_bit);
if (time_after_eq(jiffies, timeout)) {


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bugme-new] [Bug 7278] New: forcedeth slowed down by traffic shaping

2006-10-11 Thread Patrick McHardy
Jarek Poplawski wrote:
> On 07-10-2006 02:22, Andrew Morton wrote:
> 
>>On Fri, 6 Oct 2006 17:18:13 -0700
>>[EMAIL PROTECTED] wrote:
>>
>>
>>>http://bugzilla.kernel.org/show_bug.cgi?id=7278
>>>
>>>   Summary: forcedeth slowed down by traffic shaping
>>>Kernel Version: 2.6.16
>>>Status: NEW
>>>  Severity: normal
>>> Owner: [EMAIL PROTECTED]
>>> Submitter: [EMAIL PROTECTED]
>>>
>>>
>>>When assigninig a simple Tocket Bucket Filter (TBF) classless trafic shaping
>>>policy to the eth0 device (provided by forcedeth), the device's upstream 
>>>speed
>>>drops to ~12kbyte/s, regardless of the traffic limit I set in the shaping 
>>>rule.
>>>In fact, the traffic limit values could be larger than the connection's
>>>throughput and it'll still limit to mere 12kbyte/s. Once I revert the rule to
>>>pfifo_fast, the rate resumes to the connection's real one (64kbyte/s).
>>>
>>>If I set the TBF lower than 96kbit on eth0, it indeed applies; in other 
>>>words, I
>>>can make the connection slower than 12kbyte/s but not faster.
>>>
>>>If I assign the same rule to ppp0, it works just as intended, for rules 
>>>higher
>>>than 12kbyte/s, leading me to believe forcedeth has something to do with it.
>>>
>>>(Is there any module parameter I can try tweaking?)
> 
> 
> Isn't it to unpolite to expect from the submitter some details
> like this tbf rule or testing with other kernel version?


The TBF command would be a good start to make sure its not a
configuration error.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bugme-new] [Bug 7278] New: forcedeth slowed down by traffic shaping

2006-10-11 Thread Jarek Poplawski
On 07-10-2006 02:22, Andrew Morton wrote:
> On Fri, 6 Oct 2006 17:18:13 -0700
> [EMAIL PROTECTED] wrote:
> 
>> http://bugzilla.kernel.org/show_bug.cgi?id=7278
>>
>>Summary: forcedeth slowed down by traffic shaping
>> Kernel Version: 2.6.16
>> Status: NEW
>>   Severity: normal
>>  Owner: [EMAIL PROTECTED]
>>  Submitter: [EMAIL PROTECTED]
>>
>>
>> When assigninig a simple Tocket Bucket Filter (TBF) classless trafic shaping
>> policy to the eth0 device (provided by forcedeth), the device's upstream 
>> speed
>> drops to ~12kbyte/s, regardless of the traffic limit I set in the shaping 
>> rule.
>> In fact, the traffic limit values could be larger than the connection's
>> throughput and it'll still limit to mere 12kbyte/s. Once I revert the rule to
>> pfifo_fast, the rate resumes to the connection's real one (64kbyte/s).
>>
>> If I set the TBF lower than 96kbit on eth0, it indeed applies; in other 
>> words, I
>> can make the connection slower than 12kbyte/s but not faster.
>>
>> If I assign the same rule to ppp0, it works just as intended, for rules 
>> higher
>> than 12kbyte/s, leading me to believe forcedeth has something to do with it.
>>
>> (Is there any module parameter I can try tweaking?)

Isn't it to unpolite to expect from the submitter some details
like this tbf rule or testing with other kernel version?

Jarek P.  
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] e1000: Real time packets and bytes statistics

2006-10-11 Thread Jean Delvare
Hi all,

This patch is posted for review and comments.

Let the e1000 driver report the most important statistics (rx/tx_bytes
and rx/tx_packets) in real time, rather than every other second. This
is similar to what the e100 driver is doing.

The current asynchronous statistics refresh model makes it impossible
to monitor the network traffic with an interval which isn't a multiple
of 2 seconds. For example, an interval of 5 seconds would result in a
sawtooth diagram (+20%, -20%) for a constant transfer rate. With a 1
second interval it's even worse (0, 200%) of course. This has been
annoying users for years, but was never actually fixed:

LKML thread "e1000 statistics timer", August 2003:
  http://lkml.org/lkml/2003/8/2/127
Bug filled against SUSE LINUX 9.0 Professional, September 2003:
  https://bugzilla.novell.com/show_bug.cgi?id=46464
  (access is restricted, sorry about that)
LKML thread "minor e1000 bug", December 2003:
  http://marc.theaimsgroup.com/?t=10719095701
Bug filled against Ubuntu Dapper Drake, August 2006:
  https://launchpad.net/distros/ubuntu/+source/linux-source-2.6.15/+bug/55989
Kernel bug #6986, August 2006:
  http://bugzilla.kernel.org/show_bug.cgi?id=6986

rx/tx_bytes will show slightly lower values than before, because the
hardware appears to include the 4-byte ethernet frame CRC into the
frame length, while the driver doesn't. It's probably OK as the
e100, 3c59x and 8139too drivers don't include it either.

I additionally noted a difference of 6 bytes on some TX frames, which
I am not able to explain. It's probably small and rare enough not to
be considered a problem, but if someone can explain it, I would be
grateful.

Signed-off-by: Jean Delvare <[EMAIL PROTECTED]>
---
 drivers/net/e1000/e1000_main.c |   14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

--- linux-2.6.19-rc1.orig/drivers/net/e1000/e1000_main.c2006-10-11 
10:53:49.0 +0200
+++ linux-2.6.19-rc1/drivers/net/e1000/e1000_main.c 2006-10-11 
11:34:41.0 +0200
@@ -3118,6 +3118,8 @@
   e1000_tx_map(adapter, tx_ring, skb, first,
max_per_txd, nr_frags, mss));
 
+   adapter->net_stats.tx_packets++;
+   adapter->net_stats.tx_bytes += skb->len;
netdev->trans_start = jiffies;
 
/* Make sure there is space in the ring for the next send. */
@@ -3384,10 +3386,8 @@
 
/* Fill out the OS statistics structure */
 
-   adapter->net_stats.rx_packets = adapter->stats.gprc;
-   adapter->net_stats.tx_packets = adapter->stats.gptc;
-   adapter->net_stats.rx_bytes = adapter->stats.gorcl;
-   adapter->net_stats.tx_bytes = adapter->stats.gotcl;
+   /* rx/tx_packets and rx/tx_bytes are computed in real time by
+  the driver */
adapter->net_stats.multicast = adapter->stats.mprc;
adapter->net_stats.collisions = adapter->stats.colc;
 
@@ -3833,6 +3833,7 @@
  ((uint32_t)(rx_desc->errors) << 24),
  le16_to_cpu(rx_desc->csum), skb);
 
+   length = skb->len;  /* Save actual length for stats */
skb->protocol = eth_type_trans(skb, netdev);
 #ifdef CONFIG_E1000_NAPI
if (unlikely(adapter->vlgrp &&
@@ -3853,6 +3854,8 @@
netif_rx(skb);
}
 #endif /* CONFIG_E1000_NAPI */
+   adapter->net_stats.rx_packets++;
+   adapter->net_stats.rx_bytes += length;
netdev->last_rx = jiffies;
 
 next_desc:
@@ -4009,6 +4012,7 @@
 copydone:
e1000_rx_checksum(adapter, staterr,
  
le16_to_cpu(rx_desc->wb.lower.hi_dword.csum_ip.csum), skb);
+   length = skb->len;  /* Save actual length for stats */
skb->protocol = eth_type_trans(skb, netdev);
 
if (likely(rx_desc->wb.upper.header_status &
@@ -4031,6 +4035,8 @@
netif_rx(skb);
}
 #endif /* CONFIG_E1000_NAPI */
+   adapter->net_stats.rx_packets++;
+   adapter->net_stats.rx_bytes += length;
netdev->last_rx = jiffies;
 
 next_desc:

Thanks,
-- 
Jean Delvare
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug ? IF_RUNNING/routing table updates

2006-10-11 Thread Jarek Poplawski
On 09-10-2006 12:55, Shaun Kemp wrote:
...
> An interface (+ connected IP network) which loses its IF_RUNNING flag (ie
> unusable for routing) persists in the routing table as a kernel route.
> Thus rather than responding to a dynamically announced route to this
> connected network (the connected being unreachable due to the interface
> being down, but the dynamic offering an alternate path), the box insists on
> trying to route it out of the broken interface via this ?kernel? sourced
> route.
...
> The path for 192.168.0.192 is learned via 192.168.0.130 (current ospf dr -
> irrelevant), but it'll never use it presumably (from Cisco experience)
> because of the kernel sourced directly connected route still sitting in
> there. Furthermore, if I then IFDOWN eth1, everything is fine but I don't
> want to do this manually everytime there's an interface problem because
> that's why we run ospf ! =:D
> 
> Not sure whether this is a "driver tells the kernel" or a "kernel checks the
> driver at {n} intervals" issue - I would suggest the former would be more
> correct, but it is a problem regardless.
> 
> Maybe it's just these Intel drivers ? :/

IMHO it should be done by the same part which changed the
flag - for this purpose routing frontend is listening for
notify events.
 
Jarek P.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Pull request for 'jg-20061009-00' tag

2006-10-11 Thread Francois Romieu
(removed the forcedeth people from the Cc:)

Jeff Garzik <[EMAIL PROTECTED]> :
[...]
> I merged the forcedeth change via another source.  Could I pull from a 
> tag with just the r8169 change?

Yes, you can pull from:
git://electric-eye.fr.zoreil.com/home/romieu/linux-2.6.git jg-20061011-00

-- 
Ueimor
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] d80211: extend extra_hdr_room to be a bytecount

2006-10-11 Thread Michael Buesch
Extend ieee80211_hw's extra_hdr_room to be a bytecount for
a device specific TX header instead of being a hardcoded
0/2 byte choice.

Signed-off-by: Michael Buesch <[EMAIL PROTECTED]>

diff --git a/include/net/d80211.h b/include/net/d80211.h
index a80f48b..9e9709f 100644
--- a/include/net/d80211.h
+++ b/include/net/d80211.h
@@ -476,10 +476,6 @@ struct ieee80211_hw {
/* Force software encryption for TKIP packets if WMM is enabled. */
unsigned int no_tkip_wmm_hwaccel:1;
 
-   /* set if the payload needs to be padded at even boundaries after the
-* header */
-   unsigned int extra_hdr_room:1;
-
/* Some devices handle Michael MIC internally and do not include MIC in
 * the received packets passed up. device_strips_mic must be set
 * for such devices. The 'encryption' frame control bit is expected to
@@ -496,6 +492,9 @@ struct ieee80211_hw {
 * i.e. more than one skb per frame */
unsigned int fraglist:1;
 
+   /* Set to the size of a needed device specific skb headroom for TX 
skbs. */
+   unsigned int extra_hdr_room;
+
 /* This is the time in us to change channels
  */
 int channel_change_time;
diff --git a/net/d80211/ieee80211.c b/net/d80211/ieee80211.c
index 1ef2707..7d52c3b 100644
--- a/net/d80211/ieee80211.c
+++ b/net/d80211/ieee80211.c
@@ -1551,7 +1551,7 @@ static int ieee80211_subif_start_xmit(st
 * build in headroom in __dev_alloc_skb() (linux/skbuff.h) and
 * alloc_skb() (net/core/skbuff.c)
 */
-   head_need = hdrlen + encaps_len + (local->hw->extra_hdr_room ? 2 : 0);
+   head_need = hdrlen + encaps_len + local->hw->extra_hdr_room;
head_need -= skb_headroom(skb);
 
/* We are going to modify skb data, so make a copy of it if happens to


-- 
Greetings Michael.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread Michael S. Tsirkin
Quoting r. David Miller <[EMAIL PROTECTED]>:
> Subject: Re: Dropping NETIF_F_SG since no checksum feature.
> 
> From: "Michael S. Tsirkin" <[EMAIL PROTECTED]>
> Date: Wed, 11 Oct 2006 11:05:04 +0200
> 
> > So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
> > data will be copied over rather than sent directly.
> > So why does dev.c have to force set NETIF_F_SG to off then?
> 
> Because it's more efficient to copy into a linear destination
> buffer of an SKB than page sub-chunks when doing checksum+copy.
> 

Thanks for the explanation.
Obviously its true as long as you can allocate the skb that big.
I think you won't realistically be able to get 64K in a
linear SKB on a busy system, though, is not that right?

OTOH, having large MTU (e.g. 64K) helps performance a lot since it
reduces receive side processing overhead.

So, if I understand what you are saying correctly,
things do work correctly (just slower for small skb) if NETIF_F_SG is set bug
clear, it seems that all we need to do is drop the following in dev.c:

/* Fix illegal SG+CSUM combinations. */
if ((dev->features & NETIF_F_SG) &&
!(dev->features & NETIF_F_ALL_CSUM)) {
printk(KERN_NOTICE "%s: Dropping NETIF_F_SG since no checksum 
feature.\n",
   dev->name);
dev->features &= ~NETIF_F_SG;
}

is that right?

-- 
MST
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 21/21]: powerpc/cell spidernet DMA coalescing

2006-10-11 Thread Arnd Bergmann
On Wednesday 11 October 2006 03:46, Geoff Levand wrote:
> > 
> > The others look good, but this one complicates the code and doesn't have 
> > any benefit.  20 
> > for 21 isn't bad.
> 
> Is the motivation for this change to improve performance by reducing the 
> overhead
> of the mapping calls?  If so, there may be some benefit for some systems.  
> Could
> you please elaborate?
> 

>From what I understand, this patch drastically reduces the number of
I/O PTEs that are needed in the iommu. With the current static IOMMU
mapping, it should only make a difference during initialization, but
any platform that uses a dynamic mapping of iommu entries will benefit
a lot from it, because:

- the card can do better prefetching of consecutive memory
- there are more I/O ptes available for other drivers.

Arnd <><
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread David Miller
From: "Michael S. Tsirkin" <[EMAIL PROTECTED]>
Date: Wed, 11 Oct 2006 11:05:04 +0200

> So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
> data will be copied over rather than sent directly.
> So why does dev.c have to force set NETIF_F_SG to off then?

Because it's more efficient to copy into a linear destination
buffer of an SKB than page sub-chunks when doing checksum+copy.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread Steven Whitehouse
Hi,

On Wed, Oct 11, 2006 at 11:05:04AM +0200, Michael S. Tsirkin wrote:
> Quoting r. David Miller <[EMAIL PROTECTED]>:
> > Subject: Re: Dropping NETIF_F_SG since no checksum feature.
> > 
> > From: "Michael S. Tsirkin" <[EMAIL PROTECTED]>
> > Date: Wed, 11 Oct 2006 02:13:38 +0200
> > 
> > > Maybe I can patch linux to allow SG without checksum?
> > > Dave, maybe you could drop a hint or two on whether this is worthwhile
> > > and what are the issues that need addressing to make this work?
> > > 
> > > I imagine it's not just the matter of changing net/core/dev.c :).
> > 
> > You can't, it's a quality of implementation issue.  We sendfile()
> > pages directly out of the filesystem page cache without any
> > blocking of modifications to the page contents, and the only way
> > that works is if the card computes the checksum for us.
> > 
> > If we sendfile() a page directly, we must compute a correct checksum
> > no matter what the contents.  We can't do this on the cpu before the
> > data hits the device because another thread of execution can go in and
> > modify the page contents which would invalidate the checksum and thus
> > invalidating the packet.  We cannot allow this.
> > 
> > Blocking modifications is too expensive, so that's not an option
> > either.
> > 
>

I would argue that SG does make sense without checksum for protocols that
don't need/use a checksum. DECnet for example could do zero-copy without
caring about the checksum since it doesn't have one. One of these days
I'll get around to writing that bit of code :-)
 
> But copying still works fine, does it not?
> Dave, could you clarify this please?
> 
> ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset,
>  size_t size, int flags)
> {
> ssize_t res;
> struct sock *sk = sock->sk;
> 
> if (!(sk->sk_route_caps & NETIF_F_SG) ||
> !(sk->sk_route_caps & NETIF_F_ALL_CSUM))
> return sock_no_sendpage(sock, page, offset, size, flags);
> 
> 
> So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
> data will be copied over rather than sent directly.
> So why does dev.c have to force set NETIF_F_SG to off then?
>
I agree with that analysis,

Steve.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-11 Thread Michael S. Tsirkin
Quoting r. David Miller <[EMAIL PROTECTED]>:
> Subject: Re: Dropping NETIF_F_SG since no checksum feature.
> 
> From: "Michael S. Tsirkin" <[EMAIL PROTECTED]>
> Date: Wed, 11 Oct 2006 02:13:38 +0200
> 
> > Maybe I can patch linux to allow SG without checksum?
> > Dave, maybe you could drop a hint or two on whether this is worthwhile
> > and what are the issues that need addressing to make this work?
> > 
> > I imagine it's not just the matter of changing net/core/dev.c :).
> 
> You can't, it's a quality of implementation issue.  We sendfile()
> pages directly out of the filesystem page cache without any
> blocking of modifications to the page contents, and the only way
> that works is if the card computes the checksum for us.
> 
> If we sendfile() a page directly, we must compute a correct checksum
> no matter what the contents.  We can't do this on the cpu before the
> data hits the device because another thread of execution can go in and
> modify the page contents which would invalidate the checksum and thus
> invalidating the packet.  We cannot allow this.
> 
> Blocking modifications is too expensive, so that's not an option
> either.
> 

But copying still works fine, does it not?
Dave, could you clarify this please?

ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset,
 size_t size, int flags)
{
ssize_t res;
struct sock *sk = sock->sk;

if (!(sk->sk_route_caps & NETIF_F_SG) ||
!(sk->sk_route_caps & NETIF_F_ALL_CSUM))
return sock_no_sendpage(sock, page, offset, size, flags);


So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
data will be copied over rather than sent directly.
So why does dev.c have to force set NETIF_F_SG to off then?

-- 
MST
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] beginning of 8390 fixes - generic and arm/etherh

2006-10-11 Thread Jeff Garzik
applied 1-4 to #upstream (queued for 2.6.20).  will let it simmer in -mm 
for a cycle, to make sure it gets some wide testing.


Jeff



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Pull request for 'jg-20061009-00' tag

2006-10-11 Thread Jeff Garzik

Francois Romieu wrote:

Please pull from tag 'jg-20061009-00' in repository
 
git://electric-eye.fr.zoreil.com/home/romieu/linux-2.6.git jg-20061009-00


to get the changes below.

Distance from 'upstream-fixes'
-

$ git rev-list 2f614fe04f4463ff22234133319067d7361f54e5..jg-20061009-00
91a6ba7162852092080b0e710dc0ba0f35496308
73f5e28b336772c4b08ee82e5bf28ab872898ee1

Diffstat


 drivers/net/forcedeth.c |   43 +++
 drivers/net/r8169.c |1 +
 2 files changed, 44 insertions(+), 0 deletions(-)


I merged the forcedeth change via another source.  Could I pull from a 
tag with just the r8169 change?


Jeff



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH-update][RFC] net: consolidated UDP / UDP-Lite code

2006-10-11 Thread Gerrit Renker
|  >  csum_copy_err:
|  > -  UDP_INC_STATS_BH(UDP_MIB_INERRORS);
|  > +  UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_udplite);
|  > +  UDP_DEC_STATS_BH(UDP_MIB_INDATAGRAMS, is_udplite);
|  
|  I'm not a big fan at all of these "statistic corrections"
|  we're starting to place in various spots.
I am not really fond of this solution either. It evolved via discussion, from a
previous suggestion to place the increment of InDatagrams into udp_recvmsg(). 
The problem with that alternative was in dealing with applications which use the
data_ready handler (such as sunrpc).

|  I really don't think it's the end of the world if we count as
|  INDATAGRAMS a packet that we later discover has a bad checksum.
It would have been nice to say "all these counters count correctly". Maybe that 
has been over-ambitious - in  tcp_ipv4.c I found that tcp_v4_rcv also counts 
incoming segments even if they are bad.

I will restore the original state and remove the counter decrements in the
next upcoming version of the patch.


Thank you,

-- Gerrit
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: is each frag of a skb always less than 1 page?

2006-10-11 Thread Jarek Poplawski
On 11-10-2006 01:26, Ronghua Zhang wrote:
...
> The reason I asked this is that I saw the following code in forthdeth 
> drvier:
> 
> #define NV_TX2_TSO_MAX_SHIFT) 14
> /* add fragments to entries count */
> for (i = 0; i < fragments; i++) {
>entries += (skb_shinfo(skb)->frags[i].size >> NV_TX2_TSO_MAX_SHIFT) +
>((skb_shinfo(skb)->frags[i].size &
> (NV_TX2_TSO_MAX_SIZE-1)) ? 1 : 0);
> }
> 
> This looks unnecessary if each frag is guaranteed not to span pages.

Even if PAGE_SIZE > NV_TX2_TSO_MAX_SIZE?

Jarek P.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 21/21]: powerpc/cell spidernet DMA coalescing

2006-10-11 Thread Benjamin Herrenschmidt
On Tue, 2006-10-10 at 18:20 -0500, jschopp wrote:
> Linas Vepstas wrote:
> > The current driver code performs 512 DMA mappns of a bunch of 
> > 32-byte structures. This is silly, as they are all in contiguous 
> > memory. Ths patch changes the code to DMA map the entie area
> > with just one call.
> > 
> > Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>
> > Cc: James K Lewis <[EMAIL PROTECTED]>
> > Cc: Arnd Bergmann <[EMAIL PROTECTED]>
> 
> The others look good, but this one complicates the code and doesn't have any 
> benefit.  20 
> for 21 isn't bad.

Hi Joel !

I'm not sure what you mean here (especially your 20 to 21 comment).

The patch looks perfectly fine to me, and in fact removes more lines of
code than it adds :)

Cheers,
Ben.


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html