Re: [BUG] New Kernel Bugs

2007-11-13 Thread Denys Vlasenko
On Wednesday 14 November 2007 00:27, Adrian Bunk wrote:
> You missed the following in my email:
> "we slowly scare them away due to the many bug reports without any
>  reaction."
>
> The problem is that bug reports take time. If you go away from easy
> things like compile errors then even things like describing what does
> no longer work, ideally producing a scenario where you can reproduce it
> and verifying whether it was present in previous kernels can easily take
> many hours that are spent before the initial bug report.
>
> If the bug report then gets ignored we discourage the person who sent
> the bug report to do any work related to the kernel again.

Cannot agree more. I am in a similar position right now.
My patch to the aic7xxx driver was submitted four times
with not much reaction from the SCSI guys.

Finally they replied and asked to rediff it against their
git tree. I did that and sent patches back. No reply since then.

And mind you, the patch is not trying to do anything
complex, it mostly moves code around, removes 'inline',
adds 'const'. What should I think about it?
--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] PATCH 1/2 [SCHED 2.6.24]: Check subqueue status before calling hard_start_xmit

2007-11-13 Thread Krishna Kumar2
Hi Dave,

David Miller <[EMAIL PROTECTED]> wrote on 11/14/2007 11:44:39 AM:

> > You could optimize this by getting HARD_TX_LOCK after the check. I
> > assume that netif_stop_subqueue (from another CPU) would always be
> > called by the driver xmit, and that is not possible since we hold
> > the __LINK_STATE_QDISC_RUNNING bit. Does that sound correct?
>
> I don't think this is a critical optimization at this time,
> but something to certainly do along with the surgery
> we'll undoubtedly be doing here in the future :-)

Agree. Will keep in mind for later :)

Thanks,

- KK



[PATCH] NET : move Qdisc_class_ops and Qdisc_ops in appropriate sections

2007-11-13 Thread Eric Dumazet

Hi David

Please find this patch against net-2.6.25

Thank you


[PATCH] NET : move Qdisc_class_ops and Qdisc_ops in appropriate sections

Qdisc_class_ops are const, and Qdisc_ops are mostly read.

Using "const" and "__read_mostly" qualifiers helps to reduce false sharing.
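A minimal userspace sketch of the idea, not the kernel code: a table that is never modified is declared const, while a structure that is written only on registration is grouped with other rarely written data. All demo_* names and the demo_read_mostly macro are hypothetical stand-ins, and the section name is an assumption modelled on how such annotations are commonly built with GCC on Linux.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a "read mostly" annotation: group rarely
 * written objects in their own section so they do not share cache lines
 * with frequently written data. The section name is an assumption.
 */
#if defined(__GNUC__) && defined(__linux__)
#define demo_read_mostly __attribute__((__section__(".data.read_mostly")))
#else
#define demo_read_mostly
#endif

struct demo_class_ops {
	int (*leaf)(int classid);
};

struct demo_qdisc_ops {
	struct demo_qdisc_ops *next;		/* written only on registration */
	const struct demo_class_ops *cl_ops;	/* never written through */
	char id[16];
};

static int demo_leaf(int classid)
{
	return classid + 1;
}

/* Never modified after build time, so it can live in read-only data. */
static const struct demo_class_ops demo_cl_ops = {
	.leaf = demo_leaf,
};

/* Modified only when linked into the list, hence "read mostly". */
struct demo_qdisc_ops demo_ops demo_read_mostly = {
	.next = NULL,
	.cl_ops = &demo_cl_ops,
	.id = "demo",
};

static struct demo_qdisc_ops *demo_ops_list;

/* The one rare write: link the ops structure into a global list. */
void demo_register(struct demo_qdisc_ops *ops)
{
	assert(ops != NULL);
	ops->next = demo_ops_list;
	demo_ops_list = ops;
}

/* Callers only need a pointer to const to invoke class operations. */
int demo_call_leaf(const struct demo_qdisc_ops *ops, int classid)
{
	const struct demo_class_ops *cops = ops->cl_ops;

	return cops ? cops->leaf(classid) : -1;
}
```

The const qualifier on the cl_ops member is what lets every caller hold a "const struct ... *cops" local, which is the bulk of the mechanical change in the diff that follows.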

Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>

 include/net/sch_generic.h |    2 +-
 net/mac80211/wme.c        |    4 ++--
 net/sched/cls_api.c       |    4 ++--
 net/sched/sch_api.c       |   12 ++++++------
 net/sched/sch_atm.c       |    4 ++--
 net/sched/sch_blackhole.c |    2 +-
 net/sched/sch_cbq.c       |    4 ++--
 net/sched/sch_dsmark.c    |    4 ++--
 net/sched/sch_fifo.c      |    4 ++--
 net/sched/sch_generic.c   |   10 +++++-----
 net/sched/sch_gred.c      |    2 +-
 net/sched/sch_hfsc.c      |    4 ++--
 net/sched/sch_htb.c       |    4 ++--
 net/sched/sch_ingress.c   |    4 ++--
 net/sched/sch_netem.c     |    6 +++---
 net/sched/sch_prio.c      |    6 +++---
 net/sched/sch_red.c       |    4 ++--
 net/sched/sch_sfq.c       |    2 +-
 net/sched/sch_tbf.c       |    4 ++--
 19 files changed, 43 insertions(+), 43 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index c926551..60b4b35 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -86,7 +86,7 @@ struct Qdisc_class_ops
 struct Qdisc_ops
 {
    struct Qdisc_ops    *next;
-   struct Qdisc_class_ops  *cl_ops;
+   const struct Qdisc_class_ops    *cl_ops;
    char    id[IFNAMSIZ];
    int priv_size;
 
diff --git a/net/mac80211/wme.c b/net/mac80211/wme.c
index 5b8a157..8dbdede 100644
--- a/net/mac80211/wme.c
+++ b/net/mac80211/wme.c
@@ -527,7 +527,7 @@ static struct tcf_proto ** wme_classop_find_tcf(struct Qdisc *qd,
 
 /* this qdisc is classful (i.e. has classes, some of which may have leaf qdiscs attached)
  * - these are the operations on the classes */
-static struct Qdisc_class_ops class_ops =
+static const struct Qdisc_class_ops class_ops =
 {
.graft = wme_classop_graft,
.leaf = wme_classop_leaf,
@@ -547,7 +547,7 @@ static struct Qdisc_class_ops class_ops =
 
 
 /* queueing discipline operations */
-static struct Qdisc_ops wme_qdisc_ops =
+static struct Qdisc_ops wme_qdisc_ops __read_mostly =
 {
.next = NULL,
.cl_ops = &class_ops,
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 0365797..bb98045 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -130,7 +130,7 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
struct tcf_proto **back, **chain;
struct tcf_proto *tp;
struct tcf_proto_ops *tp_ops;
-   struct Qdisc_class_ops *cops;
+   const struct Qdisc_class_ops *cops;
unsigned long cl;
unsigned long fh;
int err;
@@ -382,7 +382,7 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
struct tcf_proto *tp, **chain;
struct tcmsg *tcm = (struct tcmsg*)NLMSG_DATA(cb->nlh);
unsigned long cl = 0;
-   struct Qdisc_class_ops *cops;
+   const struct Qdisc_class_ops *cops;
struct tcf_dump_args arg;
 
if (cb->nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*tcm)))
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 8ae137e..259321b 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -195,7 +195,7 @@ static struct Qdisc *qdisc_leaf(struct Qdisc *p, u32 classid)
 {
unsigned long cl;
struct Qdisc *leaf;
-   struct Qdisc_class_ops *cops = p->ops->cl_ops;
+   const struct Qdisc_class_ops *cops = p->ops->cl_ops;
 
if (cops == NULL)
return NULL;
@@ -373,7 +373,7 @@ dev_graft_qdisc(struct net_device *dev, struct Qdisc *qdisc)
 
 void qdisc_tree_decrease_qlen(struct Qdisc *sch, unsigned int n)
 {
-   struct Qdisc_class_ops *cops;
+   const struct Qdisc_class_ops *cops;
unsigned long cl;
u32 parentid;
 
@@ -417,7 +417,7 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
*old = dev_graft_qdisc(dev, new);
}
} else {
-   struct Qdisc_class_ops *cops = parent->ops->cl_ops;
+   const struct Qdisc_class_ops *cops = parent->ops->cl_ops;
 
err = -EINVAL;
 
@@ -581,7 +581,7 @@ static int
 check_loop_fn(struct Qdisc *q, unsigned long cl, struct qdisc_walker *w)
 {
struct Qdisc *leaf;
-   struct Qdisc_class_ops *cops = q->ops->cl_ops;
+   const struct Qdisc_class_ops *cops = q->ops->cl_ops;
struct check_loop_arg *arg = (struct check_loop_arg *)w;
 
leaf = cops->leaf(q, cl);
@@ -924,7 +924,7 @@ static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
struct rtattr **tca = arg;
struct net_device *dev;
struct Qdisc *q = NULL;
-   struct Qdisc_class_ops *cops;
+   const struct Qdisc_class_ops *cops;
un

[GIT PULL] RFC3484 Configurable Policy Table.

2007-11-13 Thread YOSHIFUJI Hideaki / 吉藤英明
David,

Please consider pulling from
inet6-2.6.25
branch at

which contains the following commits on top of your net-2.6.25 tree.

Regards,

--

HEADLINES
---------

[IPV6] ADDRCONF: Rename ipv6_saddr_label() to ipv6_addr_label().
[IPV6] ADDRCONF: Allow address selection policy with ifindex.
[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.

DIFFSTAT
--------

 include/linux/if_addrlabel.h |   32 ++
 include/linux/rtnetlink.h|7 +
 include/net/addrconf.h   |8 +
 net/ipv6/Makefile|1 
 net/ipv6/addrconf.c  |   50 +---
 net/ipv6/addrlabel.c |  551 ++
 6 files changed, 616 insertions(+), 33 deletions(-)

CHANGESETS
----------

commit d4e85a45763d572e39f3c040253a4d7b45ca67aa
Author: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
Date:   Wed Nov 14 15:55:29 2007 +0900

[IPV6] ADDRCONF: Rename ipv6_saddr_label() to ipv6_addr_label().

This patch renames ipv6_saddr_label() to ipv6_addr_label() because
address label is used for both of source address and destination
address.

Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index f825b92..a30f4e3 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -875,7 +875,7 @@ static inline int ipv6_saddr_preferred(int type)
 }
 
 /* static matching label */
-static inline int ipv6_saddr_label(const struct in6_addr *addr, int type)
+static inline int ipv6_addr_label(const struct in6_addr *addr, int type)
 {
  /*
   *prefix (longest match)  label
@@ -910,7 +910,7 @@ int ipv6_dev_get_saddr(struct net_device *daddr_dev,
struct inet6_ifaddr *ifa_result = NULL;
int daddr_type = __ipv6_addr_type(daddr);
int daddr_scope = __ipv6_addr_src_scope(daddr_type);
-   u32 daddr_label = ipv6_saddr_label(daddr, daddr_type);
+   u32 daddr_label = ipv6_addr_label(daddr, daddr_type);
struct net_device *dev;
 
memset(&hiscore, 0, sizeof(hiscore));
@@ -1083,11 +1083,13 @@ int ipv6_dev_get_saddr(struct net_device *daddr_dev,
 
/* Rule 6: Prefer matching label */
if (hiscore.rule < 6) {
-   if (ipv6_saddr_label(&ifa_result->addr, hiscore.addr_type) == daddr_label)
+   if (ipv6_addr_label(&ifa_result->addr,
+   hiscore.addr_type) == daddr_label)
hiscore.attrs |= IPV6_SADDR_SCORE_LABEL;
hiscore.rule++;
}
-   if (ipv6_saddr_label(&ifa->addr, score.addr_type) == daddr_label) {
+   if (ipv6_addr_label(&ifa->addr,
+   score.addr_type) == daddr_label) {
score.attrs |= IPV6_SADDR_SCORE_LABEL;
if (!(hiscore.attrs & IPV6_SADDR_SCORE_LABEL)) {
score.rule = 6;

---
commit 010ae4d3b1209c724050f10e1832fd8a54d40ef2
Author: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
Date:   Wed Nov 14 15:56:15 2007 +0900

[IPV6] ADDRCONF: Allow address selection policy with ifindex.

This patch allows ifindex to be a key for address selection policy table.

Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index a30f4e3..0076010 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -875,7 +875,8 @@ static inline int ipv6_saddr_preferred(int type)
 }
 
 /* static matching label */
-static inline int ipv6_addr_label(const struct in6_addr *addr, int type)
+static inline int ipv6_addr_label(const struct in6_addr *addr, int type,
+ int ifindex)
 {
  /*
   *prefix (longest match)  label
@@ -910,7 +911,8 @@ int ipv6_dev_get_saddr(struct net_device *daddr_dev,
struct inet6_ifaddr *ifa_result = NULL;
int daddr_type = __ipv6_addr_type(daddr);
int daddr_scope = __ipv6_addr_src_scope(daddr_type);
-   u32 daddr_label = ipv6_addr_label(daddr, daddr_type);
+   int daddr_ifindex = daddr_dev ? daddr_dev->ifindex : 0;
+   u32 daddr_label = ipv6_addr_label(daddr, daddr_type, daddr_ifindex);
struct net_device *dev;
 
memset(&hiscore, 0, sizeof(hiscore));
@@ -1084,12 +1086,14 @@ int ipv6_dev_get_saddr(struct net_device *daddr_dev,
/* Rule 6: Prefer matching label */
if (hiscore.rule < 6) {
if (ipv6_addr_label(&ifa_result->addr,
-   hiscore.addr_type) == daddr_label)
+   hiscore.addr_type,
+   ifa_result->idev->dev->ifindex) == daddr_label)
   

[IPSEC]: Fix ip_local_out when NETFILTER is off

2007-11-13 Thread Herbert Xu
Hi Dave:

[IPSEC]: Fix ip_local_out when NETFILTER is off

Thanks for testing with NETFILTER off because I obviously didn't :)
The code for ip_local_out/ip6_local_out is completely broken in
that case.

This patch fixes that and also makes the IPsec path use the correct
local_out function so that the loop executes properly and we
don't end up nesting too deep and overrunning the stack.
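To illustrate why aliasing the outer function to the inner one breaks the non-NETFILTER build, here is a rough userspace model. It assumes (this is not shown in the diff) that the inner function returns 1 to mean "nothing consumed the packet, the caller should continue", and that only the outer function performs the actual output step. All demo_* names are hypothetical.

```c
#include <assert.h>

struct demo_pkt {
	int sent;	/* how many times the packet reached the output path */
};

/* Stand-in for dst_output(): hands the packet to the next layer. */
static int demo_dst_output(struct demo_pkt *pkt)
{
	pkt->sent++;
	return 0;
}

/*
 * Stand-in for the hook call when no hooks are compiled in. Assumption:
 * a return value of 1 means "not consumed, caller should continue".
 */
static int demo_hook(struct demo_pkt *pkt)
{
	(void)pkt;
	return 1;
}

/* Models __ip_local_out(): prepares the packet and runs the hook only. */
int demo_inner_local_out(struct demo_pkt *pkt)
{
	return demo_hook(pkt);
}

/* Models ip_local_out(): also performs the output step when asked to. */
int demo_local_out(struct demo_pkt *pkt)
{
	int err;

	err = demo_inner_local_out(pkt);
	if (err == 1)
		err = demo_dst_output(pkt);

	return err;
}
```

Under that assumption, a "#define ip_local_out __ip_local_out" alias would leave ordinary callers with a packet that is never sent, while a caller that runs its own loop (as the IPsec path is described as doing) wants the inner variant so that it can continue from the returned 1 itself instead of nesting another output call.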

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/include/net/ip.h b/include/net/ip.h
index dc60ea4..83fb9f1 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -103,11 +103,7 @@ extern int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *));
 extern int ip_do_nat(struct sk_buff *skb);
 extern void ip_send_check(struct iphdr *ip);
 extern int __ip_local_out(struct sk_buff *skb);
-#ifdef CONFIG_NETFILTER
 extern int ip_local_out(struct sk_buff *skb);
-#else
-#define ip_local_out   __ip_local_out
-#endif
 extern int ip_queue_xmit(struct sk_buff *skb, int ipfragok);
 extern void ip_init(void);
 extern int ip_append_data(struct sock *sk,
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 5078605..e90f962 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -510,11 +510,7 @@ extern int ip6_input(struct sk_buff *skb);
 extern int ip6_mc_input(struct sk_buff *skb);
 
 extern int __ip6_local_out(struct sk_buff *skb);
-#ifdef CONFIG_NETFILTER
 extern int ip6_local_out(struct sk_buff *skb);
-#else
-#define ip6_local_out  __ip6_local_out
-#endif
 
 /*
  * Extension header (options) processing
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index c652ddb..ad8106c 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -101,7 +101,6 @@ int __ip_local_out(struct sk_buff *skb)
   dst_output);
 }
 
-#ifdef CONFIG_NETFILTER
 int ip_local_out(struct sk_buff *skb)
 {
int err;
@@ -112,7 +111,6 @@ int ip_local_out(struct sk_buff *skb)
 
return err;
 }
-#endif
 EXPORT_SYMBOL_GPL(ip_local_out);
 
 /* dev_loopback_xmit for use with netfilter. */
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index d6232df..b4948c1 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -237,7 +237,7 @@ static struct dst_ops xfrm4_dst_ops = {
.update_pmtu =  xfrm4_update_pmtu,
.destroy =  xfrm4_dst_destroy,
.ifdown =   xfrm4_dst_ifdown,
-   .local_out =  ip_local_out,
+   .local_out =  __ip_local_out,
.gc_thresh =  1024,
.entry_size =   sizeof(struct xfrm_dst),
 };
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index a0a366f..45a0f39 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -83,7 +83,6 @@ int __ip6_local_out(struct sk_buff *skb)
   dst_output);
 }
 
-#ifdef CONFIG_NETFILTER
 int ip6_local_out(struct sk_buff *skb)
 {
int err;
@@ -94,7 +93,6 @@ int ip6_local_out(struct sk_buff *skb)
 
return err;
 }
-#endif
 EXPORT_SYMBOL_GPL(ip6_local_out);
 
 static int ip6_output_finish(struct sk_buff *skb)
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 6ee0de1..31456c7 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -253,7 +253,7 @@ static struct dst_ops xfrm6_dst_ops = {
.update_pmtu =  xfrm6_update_pmtu,
.destroy =  xfrm6_dst_destroy,
.ifdown =   xfrm6_dst_ifdown,
-   .local_out =  ip6_local_out,
+   .local_out =  __ip6_local_out,
.gc_thresh =  1024,
.entry_size =   sizeof(struct xfrm_dst),
 };


Re: [BUG] New Kernel Bugs

2007-11-13 Thread Adrian Bunk
On Tue, Nov 13, 2007 at 05:39:45PM -0700, Denys Vlasenko wrote:
> On Tuesday 13 November 2007 10:56, Adrian Bunk wrote:
> > On Tue, Nov 13, 2007 at 12:13:56PM -0500, Theodore Tso wrote:
> > > On Tue, Nov 13, 2007 at 04:52:32PM +0100, Benoit Boissinot wrote:
> > > > Btw, I used to test every -mm kernel. But since I've switched distros
> > > > (gentoo->ubuntu)
> > > > and I have less time, I feel it's harder to test -rc or -mm kernels (I
> > > > know this isn't a lkml problem
> > > > but more a distro problem, but I would love having an ubuntu blessed
> > > > repo with current dev kernel
> > > > for the latest stable ubuntu release).
> > >
> > > There are two parts to this.  One is a Ubuntu development kernel which
> > > we can give to large numbers of people to expand our testing pool.
> > > But if we don't do a better job of responding to bug reports that
> > > would be generated by expanded testing this won't necessarily help us.
> > >...
> >
> > The main problem aren't missing testers [1] - we already have relatively
> > experienced people testing kernels and/or reporting bugs, and we slowly
> > scare them away due to the many bug reports without any reaction.
> >
> > The main problem is finding experienced developers who spend time on
> > looking into bug reports.
> >
> > Getting many relatively unexperienced users (who need more guidance for
> > debugging issues) as additional testers is therefore IMHO not
> > necessarily a good idea.
> 
> And where experienced developrs are coming from?
> They are not born with Linux kernel skills.
> They grow up from within user base.
> 
> Bigger user base -> more developers (eventually)

You missed the following in my email:
"we slowly scare them away due to the many bug reports without any 
 reaction."

The problem is that bug reports take time. If you go away from easy 
things like compile errors then even things like describing what does
no longer work, ideally producing a scenario where you can reproduce it 
and verifying whether it was present in previous kernels can easily take 
many hours that are spent before the initial bug report.

If the bug report then gets ignored we discourage the person who sent 
the bug report to do any work related to the kernel again.

> vda

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: [PATCH 2/2] SOCK: add raw6 drops counter

2007-11-13 Thread Wang Chen
Eric Dumazet said the following on 2007-11-14 13:48:
>> diff -Nurp linux-2.6.24-rc2.org/net/ipv6/raw.c linux-2.6.24-rc2/net/ipv6/raw.c
>> --- linux-2.6.24-rc2.org/net/ipv6/raw.c 2007-11-09 16:38:05.0 +0800
>> +++ linux-2.6.24-rc2/net/ipv6/raw.c 2007-11-14 09:46:54.0 +0800
>> @@ -354,14 +354,14 @@ static inline int rawv6_rcv_skb(struct s
>>  {
>>  if ((raw6_sk(sk)->checksum || sk->sk_filter) &&
>>  skb_checksum_complete(skb)) {
>> -/* FIXME: increment a raw6 drops counter here */
>> +atomic_inc(&sk->sk_drops);
> 
> I am not sure the comment was referring to a per-socket counter here.
> 
> If the frame checksum is bad, we cannot be sure the socket is OK, since
> the corrupted bits could be in the tuple that identifies the socket.
> 
> Maybe here we want to increment a global raw6 drop counter (well, for
> the given ipv6 instance)
> 

/proc/net/raw6 shows statistics about raw sockets in the IPv6 stack,
one socket per row. So I think it is better to count drops per socket.



Re: [PATCH 2/2] [IPSEC]: Add async resume support on input

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 14 Nov 2007 14:41:54 +0800

> On Tue, Nov 13, 2007 at 10:38:03PM -0800, David Miller wrote:
> > 
> > We need to fix something else up first :-)
> > 
> > net/ipv4/xfrm4_input.c: In function 'xfrm4_transport_finish':
> > net/ipv4/xfrm4_input.c:65: error: implicit declaration of function 'xfrm4_rcv_encap_finish'
> > net/ipv4/xfrm4_input.c:67: error: 'nexthdr' undeclared (first use in this function)
> > net/ipv4/xfrm4_input.c:67: error: (Each undeclared identifier is reported only once
> > net/ipv4/xfrm4_input.c:67: error: for each function it appears in.)
> > make[2]: *** [net/ipv4/xfrm4_input.o] Error 1
> > make[1]: *** [net/ipv4] Error 2
> > make: *** [net] Error 2
> 
> No netfilter, surely not :)
> 
> Does this patch help?
> 
> [IPSEC]: Fix build problem with netfilter off
> 
> The function xfrm4_rcv_encap_finish is now used even with NETFILTER
> off.  So we need to remove the ifdefs around it.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Nope, it needs a little more than that :-)  I just checked
in the following:

From befb75b758f8a4d4de4c535db9f845726eae05eb Mon Sep 17 00:00:00 2001
From: David S. Miller <[EMAIL PROTECTED]>
Date: Tue, 13 Nov 2007 22:45:41 -0800
Subject: [PATCH] [IPV4] xfrm4_input.c: Fix build in non-NETFILTER case.

Based in part on a patch from Herbert Xu.

Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
---
 net/ipv4/xfrm4_input.c |4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
index cd25351..d5890c8 100644
--- a/net/ipv4/xfrm4_input.c
+++ b/net/ipv4/xfrm4_input.c
@@ -21,7 +21,6 @@ int xfrm4_extract_input(struct xfrm_state *x, struct sk_buff *skb)
return xfrm4_extract_header(skb);
 }
 
-#ifdef CONFIG_NETFILTER
 static inline int xfrm4_rcv_encap_finish(struct sk_buff *skb)
 {
if (skb->dst == NULL) {
@@ -36,7 +35,6 @@ drop:
kfree_skb(skb);
return NET_RX_DROP;
 }
-#endif
 
 int xfrm4_rcv_encap(struct sk_buff *skb, int nexthdr, __be32 spi,
int encap_type)
@@ -64,7 +62,7 @@ int xfrm4_transport_finish(struct sk_buff *skb, int async)
if (async)
return xfrm4_rcv_encap_finish(skb);
 
-   return -nexthdr;
+   return -iph->protocol;
 #endif
 }
 
-- 
1.5.3.5



Re: [PATCH 2/2] [IPSEC]: Add async resume support on input

2007-11-13 Thread Herbert Xu
On Tue, Nov 13, 2007 at 10:38:03PM -0800, David Miller wrote:
> 
> We need to fix something else up first :-)
> 
> net/ipv4/xfrm4_input.c: In function 'xfrm4_transport_finish':
> net/ipv4/xfrm4_input.c:65: error: implicit declaration of function 'xfrm4_rcv_encap_finish'
> net/ipv4/xfrm4_input.c:67: error: 'nexthdr' undeclared (first use in this function)
> net/ipv4/xfrm4_input.c:67: error: (Each undeclared identifier is reported only once
> net/ipv4/xfrm4_input.c:67: error: for each function it appears in.)
> make[2]: *** [net/ipv4/xfrm4_input.o] Error 1
> make[1]: *** [net/ipv4] Error 2
> make: *** [net] Error 2

No netfilter, surely not :)

Does this patch help?

[IPSEC]: Fix build problem with netfilter off

The function xfrm4_rcv_encap_finish is now used even with NETFILTER
off.  So we need to remove the ifdefs around it.

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
index cd25351..7795938 100644
--- a/net/ipv4/xfrm4_input.c
+++ b/net/ipv4/xfrm4_input.c
@@ -21,7 +21,6 @@ int xfrm4_extract_input(struct xfrm_state *x, struct sk_buff *skb)
return xfrm4_extract_header(skb);
 }
 
-#ifdef CONFIG_NETFILTER
 static inline int xfrm4_rcv_encap_finish(struct sk_buff *skb)
 {
if (skb->dst == NULL) {
@@ -36,7 +35,6 @@ drop:
kfree_skb(skb);
return NET_RX_DROP;
 }
-#endif
 
 int xfrm4_rcv_encap(struct sk_buff *skb, int nexthdr, __be32 spi,
int encap_type)


Re: [PATCH 2/2] [IPSEC]: Add async resume support on input

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 14 Nov 2007 14:36:58 +0800

> On Tue, Nov 13, 2007 at 10:33:56PM -0800, David Miller wrote:
> >
> > Also applied to net-2.6.25, thanks.
> > 
> > I'll work on integrating Patrick's NF_INET_* patch next.
> 
> Thanks! With that you would be able to remove the nf_post_routing
> field from xfrm_state_afinfo.

We need to fix something else up first :-)

net/ipv4/xfrm4_input.c: In function 'xfrm4_transport_finish':
net/ipv4/xfrm4_input.c:65: error: implicit declaration of function 'xfrm4_rcv_encap_finish'
net/ipv4/xfrm4_input.c:67: error: 'nexthdr' undeclared (first use in this function)
net/ipv4/xfrm4_input.c:67: error: (Each undeclared identifier is reported only once
net/ipv4/xfrm4_input.c:67: error: for each function it appears in.)
make[2]: *** [net/ipv4/xfrm4_input.o] Error 1
make[1]: *** [net/ipv4] Error 2
make: *** [net] Error 2


Re: 2.6.24-rc2: Network commit causes SLUB performance regression with tbench

2007-11-13 Thread David Miller
From: Nick Piggin <[EMAIL PROTECTED]>
Date: Wed, 14 Nov 2007 05:14:27 +1100

> On Wednesday 14 November 2007 17:12, David Miller wrote:
> > Is your test system using HIGHMEM?
> >
> > That's one thing the page vector in the sk_buff can do a lot,
> > kmaps.
> 
> No, it's an x86-64, so no highmem.

Ok.

> What's also interesting is that SLAB apparently doesn't have this
> condition. The first thing that sprung to mind is that SLAB caches
> order > 0 allocations, while SLUB does not. However if anything,
> that should actually favour the SLUB numbers if network is avoiding
> order > 0 allocations.
> 
> I'm doing some oprofile runs now to see if I can get any more info.

Here are some other things you can play around with:

1) Monitor the values of skb->len and skb->data_len for packets
   going over loopback.

2) Try removing NETIF_F_SG in drivers/net/loopback.c's dev->features
   setting.


Re: [PATCH 2/2] [IPSEC]: Add async resume support on input

2007-11-13 Thread Herbert Xu
On Tue, Nov 13, 2007 at 10:33:56PM -0800, David Miller wrote:
>
> Also applied to net-2.6.25, thanks.
> 
> I'll work on integrating Patrick's NF_INET_* patch next.

Thanks! With that you would be able to remove the nf_post_routing
field from xfrm_state_afinfo.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 2/2] [IPSEC]: Add async resume support on input

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 14 Nov 2007 14:28:34 +0800

> [IPSEC]: Add async resume support on input
> 
> This patch adds support for async resumptions on input.  To do so, the
> transform would return -EINPROGRESS and subsequently invoke the function
> xfrm_input_resume to resume processing.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Also applied to net-2.6.25, thanks.

I'll work on integrating Patrick's NF_INET_* patch next.


Re: [PATCH 1/2] [IPSEC]: Remove nhoff from xfrm_input

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 14 Nov 2007 14:27:58 +0800

> [IPSEC]: Remove nhoff from xfrm_input
> 
> The nhoff field isn't actually necessary in xfrm_input.  For tunnel mode
> transforms we now throw away the output IP header so it makes no sense to
> fill in the nexthdr field.  For transport mode we can now let the function
> transport_finish do the setting and it knows where the nexthdr field is.
> 
> The only other thing that needs the nexthdr field to be set is the header
> extraction code.  However, we can simply move the protocol extraction out
> of the generic header extraction.
> 
> We want to minimise the amount of info we have to carry around between
> transforms as this simplifies the resumption process for async crypto.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied to net-2.6.25


[PATCH 1/2] [IPSEC]: Remove nhoff from xfrm_input

2007-11-13 Thread Herbert Xu
On Tue, Nov 13, 2007 at 09:46:09PM -0800, David Miller wrote:
>
> Applied to net-2.6.25

Thanks for merging Dave!

It looks like I forgot to post the last 2 patches of the series
that complete the async support for the input path.  So here
they are.

[IPSEC]: Remove nhoff from xfrm_input

The nhoff field isn't actually necessary in xfrm_input.  For tunnel mode
transforms we now throw away the output IP header so it makes no sense to
fill in the nexthdr field.  For transport mode we can now let the function
transport_finish do the setting and it knows where the nexthdr field is.

The only other thing that needs the nexthdr field to be set is the header
extraction code.  However, we can simply move the protocol extraction out
of the generic header extraction.

We want to minimise the amount of info we have to carry around between
transforms as this simplifies the resumption process for async crypto.

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>
---

 include/net/xfrm.h      |    1 -
 net/ipv4/xfrm4_input.c  |   11 +++++++----
 net/ipv4/xfrm4_output.c |    2 ++
 net/ipv4/xfrm4_state.c  |    1 -
 net/ipv6/xfrm6_input.c  |    4 +++-
 net/ipv6/xfrm6_output.c |    3 ++-
 net/ipv6/xfrm6_state.c  |    2 --
 net/xfrm/xfrm_input.c   |    5 ++---
 8 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 5c457b0..8023795 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -534,7 +534,6 @@ struct xfrm_spi_skb_cb {
struct inet6_skb_parm h6;
} header;
 
-   unsigned int nhoff;
unsigned int daddroff;
 };
 
diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
index e374903..b19890f 100644
--- a/net/ipv4/xfrm4_input.c
+++ b/net/ipv4/xfrm4_input.c
@@ -41,7 +41,6 @@ drop:
 int xfrm4_rcv_encap(struct sk_buff *skb, int nexthdr, __be32 spi,
int encap_type)
 {
-   XFRM_SPI_SKB_CB(skb)->nhoff = offsetof(struct iphdr, protocol);
XFRM_SPI_SKB_CB(skb)->daddroff = offsetof(struct iphdr, daddr);
return xfrm_input(skb, nexthdr, spi, encap_type);
 }
@@ -49,16 +48,20 @@ EXPORT_SYMBOL(xfrm4_rcv_encap);
 
 int xfrm4_transport_finish(struct sk_buff *skb, int async)
 {
+   struct iphdr *iph = ip_hdr(skb);
+
+   iph->protocol = XFRM_MODE_SKB_CB(skb)->protocol;
+
 #ifdef CONFIG_NETFILTER
__skb_push(skb, skb->data - skb_network_header(skb));
-   ip_hdr(skb)->tot_len = htons(skb->len);
-   ip_send_check(ip_hdr(skb));
+   iph->tot_len = htons(skb->len);
+   ip_send_check(iph);
 
NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, skb->dev, NULL,
xfrm4_rcv_encap_finish);
return 0;
 #else
-   return -ip_hdr(skb)->protocol;
+   return -nexthdr;
 #endif
 }
 
diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
index 2fb4efa..1900200 100644
--- a/net/ipv4/xfrm4_output.c
+++ b/net/ipv4/xfrm4_output.c
@@ -47,6 +47,8 @@ int xfrm4_extract_output(struct xfrm_state *x, struct sk_buff *skb)
if (err)
return err;
 
+   XFRM_MODE_SKB_CB(skb)->protocol = ip_hdr(skb)->protocol;
+
return xfrm4_extract_header(skb);
 }
 
diff --git a/net/ipv4/xfrm4_state.c b/net/ipv4/xfrm4_state.c
index 3b067e8..d837784 100644
--- a/net/ipv4/xfrm4_state.c
+++ b/net/ipv4/xfrm4_state.c
@@ -56,7 +56,6 @@ int xfrm4_extract_header(struct sk_buff *skb)
XFRM_MODE_SKB_CB(skb)->frag_off = iph->frag_off;
XFRM_MODE_SKB_CB(skb)->tos = iph->tos;
XFRM_MODE_SKB_CB(skb)->ttl = iph->ttl;
-   XFRM_MODE_SKB_CB(skb)->protocol = iph->protocol;
memset(XFRM_MODE_SKB_CB(skb)->flow_lbl, 0,
   sizeof(XFRM_MODE_SKB_CB(skb)->flow_lbl));
 
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index 3b9eedf..5c006c8 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -23,7 +23,6 @@ int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb)
 
 int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi)
 {
-   XFRM_SPI_SKB_CB(skb)->nhoff = IP6CB(skb)->nhoff;
XFRM_SPI_SKB_CB(skb)->daddroff = offsetof(struct ipv6hdr, daddr);
return xfrm_input(skb, nexthdr, spi, 0);
 }
@@ -31,6 +30,9 @@ EXPORT_SYMBOL(xfrm6_rcv_spi);
 
 int xfrm6_transport_finish(struct sk_buff *skb, int async)
 {
+   skb_network_header(skb)[IP6CB(skb)->nhoff] =
+   XFRM_MODE_SKB_CB(skb)->protocol;
+
 #ifdef CONFIG_NETFILTER
ipv6_hdr(skb)->payload_len = htons(skb->len);
__skb_push(skb, skb->data - skb_network_header(skb));
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
index a0a9249..318669a 100644
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -53,7 +53,8 @@ int xfrm6_extract_output(struct xfrm_state *x, struct sk_buff *skb)
if (err)
return err;
 
-   IP6CB(skb)->nhoff = offsetof(struct ipv6hdr, nexthdr);
+   XFRM_MODE_SKB_CB(skb)->protocol = ipv6_hdr(skb)->nexthdr;
+
return xfrm6_extract_header(s

[PATCH 2/2] [IPSEC]: Add async resume support on input

2007-11-13 Thread Herbert Xu
[IPSEC]: Add async resume support on input

This patch adds support for async resumptions on input.  To do so, the
transform would return -EINPROGRESS and subsequently invoke the function
xfrm_input_resume to resume processing.

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>
---

 include/net/xfrm.h |1 +
 net/ipv4/xfrm4_input.c |3 +++
 net/ipv6/xfrm6_input.c |3 +++
 net/xfrm/xfrm_input.c  |   38 +-
 4 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 8023795..944fdad 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1138,6 +1138,7 @@ extern int xfrm_init_state(struct xfrm_state *x);
 extern int xfrm_prepare_input(struct xfrm_state *x, struct sk_buff *skb);
 extern int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi,
  int encap_type);
+extern int xfrm_input_resume(struct sk_buff *skb, int nexthdr);
 extern int xfrm_output_resume(struct sk_buff *skb, int err);
 extern int xfrm_output(struct sk_buff *skb);
 extern int xfrm4_extract_header(struct sk_buff *skb);
diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
index b19890f..cd25351 100644
--- a/net/ipv4/xfrm4_input.c
+++ b/net/ipv4/xfrm4_input.c
@@ -61,6 +61,9 @@ int xfrm4_transport_finish(struct sk_buff *skb, int async)
xfrm4_rcv_encap_finish);
return 0;
 #else
+   if (async)
+   return xfrm4_rcv_encap_finish(skb);
+
return -nexthdr;
 #endif
 }
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index 5c006c8..e317d08 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -41,6 +41,9 @@ int xfrm6_transport_finish(struct sk_buff *skb, int async)
ip6_rcv_finish);
return -1;
 #else
+   if (async)
+   return ip6_rcv_finish(skb);
+
return 1;
 #endif
 }
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index cce9d45..96f42c1 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -101,8 +101,17 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
int err;
__be32 seq;
struct xfrm_state *x;
+   xfrm_address_t *daddr;
int decaps = 0;
-   unsigned int daddroff = XFRM_SPI_SKB_CB(skb)->daddroff;
+   int async = 0;
+
+   /* A negative encap_type indicates async resumption. */
+   if (encap_type < 0) {
+   async = 1;
+   x = skb->sp->xvec[skb->sp->len - 1];
+   seq = XFRM_SKB_CB(skb)->seq;
+   goto resume;
+   }
 
/* Allocate new secpath or COW existing one. */
if (!skb->sp || atomic_read(&skb->sp->refcnt) != 1) {
@@ -116,6 +125,9 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
skb->sp = sp;
}
 
+   daddr = (xfrm_address_t *)(skb_network_header(skb) +
+  XFRM_SPI_SKB_CB(skb)->daddroff);
+
seq = 0;
if (!spi && (err = xfrm_parse_spi(skb, nexthdr, &spi, &seq)) != 0)
goto drop;
@@ -124,9 +136,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
if (skb->sp->len == XFRM_MAX_DEPTH)
goto drop;
 
-   x = xfrm_state_lookup((xfrm_address_t *)
- (skb_network_header(skb) + daddroff),
- spi, nexthdr, AF_INET);
+   x = xfrm_state_lookup(daddr, spi, nexthdr, AF_INET);
if (x == NULL)
goto drop;
 
@@ -147,8 +157,14 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 
spin_unlock(&x->lock);
 
+   XFRM_SKB_CB(skb)->seq = seq;
+
nexthdr = x->type->input(x, skb);
 
+   if (nexthdr == -EINPROGRESS)
+   return 0;
+
+resume:
spin_lock(&x->lock);
if (nexthdr <= 0) {
if (nexthdr == -EBADMSG)
@@ -177,6 +193,12 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
break;
}
 
+   /*
+* We need the inner address.  However, we only get here for
+* transport mode so the outer address is identical.
+*/
+   daddr = &x->id.daddr;
+
err = xfrm_parse_spi(skb, nexthdr, &spi, &seq);
if (err < 0)
goto drop;
@@ -190,7 +212,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
netif_rx(skb);
return 0;
} else {
-   return x->inner_mode->afinfo->transport_finish(skb, 0);
+   return x->inner_mode->afinfo->transport_finish(skb, async);
}
 
 drop_unlock:
@@ -201,6 +223,12 @@ drop:
 }
 EXPORT_SYMBOL(xfrm_inpu

Re: 2.6.24-rc2: Network commit causes SLUB performance regression with tbench

2007-11-13 Thread Nick Piggin
On Wednesday 14 November 2007 17:12, David Miller wrote:
> From: Nick Piggin <[EMAIL PROTECTED]>
> Date: Wed, 14 Nov 2007 04:36:24 +1100
>
> > On Wednesday 14 November 2007 12:58, David Miller wrote:
> > > I suspect the issue is about having a huge skb->data linear area for
> > > TCP sends over loopback.  We're likely getting a much smaller
> > > skb->data linear data area after the patch in question, the rest using
> > > the sk_buff scatterlist pages which are a little bit more expensive to
> > > process.
> >
> > It didn't seem to be noticeable at 1 client. Unless scatterlist
> > processing is going to cause cacheline bouncing, I don't see why this
> > hurts more as you add CPUs?
>
> Is your test system using HIGHMEM?
>
> That's one thing the page vector in the sk_buff can do a lot,
> kmaps.

No, it's an x86-64, so no highmem.

What's also interesting is that SLAB apparently doesn't have this
condition. The first thing that sprung to mind is that SLAB caches
order > 0 allocations, while SLUB does not. However if anything,
that should actually favour the SLUB numbers if network is avoiding
order > 0 allocations.

I'm doing some oprofile runs now to see if I can get any more info.


Re: [PATCH 2/2] [e1000 VLAN] Disable vlan hw accel when promiscuous mode

2007-11-13 Thread Joonwoo Park
> > I'll work e1000e too :-)
>
> awesome, looking forward to that.
>

BTW, it seems that e1000e needs Patrick's unicast patch first.
I'm looking forward to that too.

Thanks
Joonwoo


Re: [PATCH] PATCH 1/2 [SCHED 2.6.24]: Check subqueue status before calling hard_start_xmit

2007-11-13 Thread David Miller
From: Krishna Kumar <[EMAIL PROTECTED]>
Date: Wed, 14 Nov 2007 11:34:12 +0530

> Hi Peter,
> 
> Peter wrote on 11/13/2007 11:14:50 PM:
> > @@ -134,7 +134,7 @@ static inline int qdisc_restart(struct net_device *dev)
> >  {
> > struct Qdisc *q = dev->qdisc;
> > struct sk_buff *skb;
> > -   int ret;
> > +   int ret = NETDEV_TX_BUSY;
> >  
> > /* Dequeue packet */
> > if (unlikely((skb = dev_dequeue_skb(dev, q)) == NULL))
> > @@ -145,7 +145,8 @@ static inline int qdisc_restart(struct net_device *dev)
> > spin_unlock(&dev->queue_lock);
> >  
> > HARD_TX_LOCK(dev, smp_processor_id());
> > -   ret = dev_hard_start_xmit(skb, dev);
> > +   if (!netif_subqueue_stopped(dev, skb))
> > +  ret = dev_hard_start_xmit(skb, dev);
> > HARD_TX_UNLOCK(dev);
> 
> You could optimize this by getting HARD_TX_LOCK after the check. I
> assume that netif_stop_subqueue (from another CPU) would always be
> called by the driver xmit, and that is not possible since we hold
> the __LINK_STATE_QDISC_RUNNING bit. Does that sound correct?

I don't think this is a critical optimization at this time,
but something to certainly do along with the surgery
we'll undoubtedly be doing here in the future :-)


Re: [BUG] New Kernel Bugs

2007-11-13 Thread David Miller
From: Sam Ravnborg <[EMAIL PROTECTED]>
Date: Wed, 14 Nov 2007 06:56:06 +0100

> > 
> > > If so, MAINTAINERS claims that it is subscribers-only.  That would cause
> > > some bug reporters to give up and go away.
> > 
> > Find some other mailing list; I'm not hosting *nor* am I willing to run a
> > non-subscribers only mailing list.  Period.  Not negotiable, so don't even
> > try to change my mind.
> 
> The postmasters at vger are pretty good at running mailing lists.
> For linux-kbuild my effort so far has been to request it.
> That's not a big deal.
> 
> So if they accept it you could have [EMAIL PROTECTED] for zero
> overhead for you.

I already did, get a little deeper in your mailbox before
replying :-)


Re: 2.6.24-rc2: Network commit causes SLUB performance regression with tbench

2007-11-13 Thread David Miller
From: Nick Piggin <[EMAIL PROTECTED]>
Date: Wed, 14 Nov 2007 04:36:24 +1100

> On Wednesday 14 November 2007 12:58, David Miller wrote:
> > I suspect the issue is about having a huge skb->data linear area for
> > TCP sends over loopback.  We're likely getting a much smaller
> > skb->data linear data area after the patch in question, the rest using
> > the sk_buff scatterlist pages which are a little bit more expensive to
> > process.
> 
> It didn't seem to be noticeable at 1 client. Unless scatterlist
> processing is going to cause cacheline bouncing, I don't see why this
> hurts more as you add CPUs?

Is your test system using HIGHMEM?

That's one thing the page vector in the sk_buff can do a lot,
kmaps.


Re: [PATCH] IPV4: add raw drops counter

2007-11-13 Thread Eric Dumazet

Wang Chen wrote:
> Eric Dumazet said the following on 2007-11-13 19:11:
> > Wang Chen wrote:
> > > Add raw drops counter for IPv4 in /proc/net/raw .
> > >
> > > +   atomic_t sk_drops;
> >
> > This doesn't need an atomic_t, just an 'unsigned int' is OK, since
> > sock_queue_rcv_skb() is called on a locked socket.
>
> Yes, sock_queue_rcv_skb() is called on a locked socket. But sk_drops
> will be used not only with sock_queue_rcv_skb(), but also with
> xfrm4_policy_check(), skb_checksum_complete(), skb_kill_datagram(), etc.
>
> So, atomic_t ensures that sk_drops is incremented atomically.
>
> > Also, I suggest doing the sk_drops increment in sock_queue_rcv_skb() so
> > that it can be used for other sockets as well?
>
> As I described before, sk_drops will be used under the different
> conditions in which a raw drop happens.
> So doing the sk_drops increment in the upper caller is better than in
> sock_queue_rcv_skb().
>
> Thank you for your suggestion, I will make a new patch to add sk_drops
> increments in other places.

Thank you for the clarifications; I was not aware that further patches were planned.
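
The argument for an atomic counter can be sketched in userspace C11. This is only an illustration under stated assumptions: `struct sock_sketch` and the `drop_*` helpers are hypothetical names, and `atomic_uint` from `<stdatomic.h>` stands in for the kernel's `atomic_t`. The idea is that a counter bumped from several drop sites, not all of which hold the socket lock, needs an atomic increment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sock; only the drop counter is modelled. */
struct sock_sketch {
    atomic_uint drops;              /* plays the role of sk_drops */
};

static void sock_sketch_init(struct sock_sketch *sk)
{
    assert(sk != NULL);
    atomic_init(&sk->drops, 0);
}

/* Drop site that runs with the socket locked (e.g. receive queue full). */
static void drop_queue_full(struct sock_sketch *sk)
{
    atomic_fetch_add_explicit(&sk->drops, 1, memory_order_relaxed);
}

/* Drop site that may run without the socket lock (policy or checksum failure). */
static void drop_bad_packet(struct sock_sketch *sk)
{
    atomic_fetch_add_explicit(&sk->drops, 1, memory_order_relaxed);
}

/* Reader side, e.g. what a /proc line would print. */
static unsigned int sock_sketch_drops(struct sock_sketch *sk)
{
    return atomic_load_explicit(&sk->drops, memory_order_relaxed);
}
```

Relaxed ordering is used because the value is a statistic and orders nothing else; if every increment really did happen under one lock, a plain `unsigned int` would be enough, which is the trade-off discussed above.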




Re: [PATCH] PATCH 1/2 [SCHED 2.6.24]: Check subqueue status before calling hard_start_xmit

2007-11-13 Thread Krishna Kumar
Hi Peter,

Peter wrote on 11/13/2007 11:14:50 PM:
> @@ -134,7 +134,7 @@ static inline int qdisc_restart(struct net_device *dev)
>  {
> struct Qdisc *q = dev->qdisc;
> struct sk_buff *skb;
> -   int ret;
> +   int ret = NETDEV_TX_BUSY;
>  
> /* Dequeue packet */
> if (unlikely((skb = dev_dequeue_skb(dev, q)) == NULL))
> @@ -145,7 +145,8 @@ static inline int qdisc_restart(struct net_device *dev)
> spin_unlock(&dev->queue_lock);
>  
> HARD_TX_LOCK(dev, smp_processor_id());
> -   ret = dev_hard_start_xmit(skb, dev);
> +   if (!netif_subqueue_stopped(dev, skb))
> +  ret = dev_hard_start_xmit(skb, dev);
> HARD_TX_UNLOCK(dev);

You could optimize this by getting HARD_TX_LOCK after the check. I
assume that netif_stop_subqueue (from another CPU) would always be
called by the driver xmit, and that is not possible since we hold
the __LINK_STATE_QDISC_RUNNING bit. Does that sound correct?

PATCH
--

diff -ruNp 1/net/sched/sch_generic.c 2/net/sched/sch_generic.c
--- 1/net/sched/sch_generic.c   2007-11-14 11:14:10.0 +0530
+++ 2/net/sched/sch_generic.c   2007-11-14 11:18:27.0 +0530
@@ -144,10 +144,11 @@ static inline int qdisc_restart(struct n
/* And release queue */
spin_unlock(&dev->queue_lock);
 
-   HARD_TX_LOCK(dev, smp_processor_id());
-   if (!netif_subqueue_stopped(dev, skb))
+   if (!netif_subqueue_stopped(dev, skb)) {
+   HARD_TX_LOCK(dev, smp_processor_id());
ret = dev_hard_start_xmit(skb, dev);
-   HARD_TX_UNLOCK(dev);
+   HARD_TX_UNLOCK(dev);
+   }
 
spin_lock(&dev->queue_lock);
q = dev->qdisc;

Thanks,

- KK
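
The difference between the two orderings can be shown with a small single-threaded model. Everything here is hypothetical (`struct dev_sketch`, `tx_lock`, the `*_SKETCH` constants); it only counts lock acquisitions and says nothing about whether skipping the lock is safe against a concurrent stop, which is the open question in this thread.

```c
#include <assert.h>
#include <stdbool.h>

enum { TX_OK_SKETCH = 0, TX_BUSY_SKETCH = 1 };

/* Hypothetical device model: a stopped flag plus bookkeeping for the demo. */
struct dev_sketch {
    bool subqueue_stopped;
    bool tx_locked;
    unsigned int lock_acquisitions;
    unsigned int packets_sent;
};

static void tx_lock(struct dev_sketch *dev)
{
    assert(!dev->tx_locked);
    dev->tx_locked = true;
    dev->lock_acquisitions++;
}

static void tx_unlock(struct dev_sketch *dev)
{
    assert(dev->tx_locked);
    dev->tx_locked = false;
}

static int hard_start_xmit_sketch(struct dev_sketch *dev)
{
    assert(dev->tx_locked);         /* the driver expects the TX lock held */
    dev->packets_sent++;
    return TX_OK_SKETCH;
}

/* Lock first, then test the subqueue (shape of the original patch). */
static int xmit_lock_then_check(struct dev_sketch *dev)
{
    int ret = TX_BUSY_SKETCH;

    tx_lock(dev);
    if (!dev->subqueue_stopped)
        ret = hard_start_xmit_sketch(dev);
    tx_unlock(dev);
    return ret;
}

/* Test first, take the lock only when transmitting (suggested variant). */
static int xmit_check_then_lock(struct dev_sketch *dev)
{
    int ret = TX_BUSY_SKETCH;

    if (!dev->subqueue_stopped) {
        tx_lock(dev);
        ret = hard_start_xmit_sketch(dev);
        tx_unlock(dev);
    }
    return ret;
}
```

With a stopped subqueue the first variant still pays for one lock round trip and the second pays for none; with a running subqueue both behave the same.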


Re: [BUG] New Kernel Bugs

2007-11-13 Thread Sam Ravnborg
On Wed, Nov 14, 2007 at 06:56:06AM +0100, Sam Ravnborg wrote:
> > 
> > > If so, MAINTAINERS claims that it is subscribers-only.  That would cause
> > > some bug reporters to give up and go away.
> > 
> > Find some other mailing list; I'm not hosting *nor* am I willing to run a
> > non-subscribers only mailing list.  Period.  Not negotiable, so don't even
> > try to change my mind.
> 
> The postmasters at vger are pretty good at running mailing lists.
> For linux-kbuild my effort so far has been to request it.
> That's not a big deal.
> 
> So if they accept it you could have [EMAIL PROTECTED] for zero
> overhead for you.

And in a later mail I saw davem already created it.

Sam


Re: [BUG] New Kernel Bugs

2007-11-13 Thread Sam Ravnborg
> 
> > If so, MAINTAINERS claims that it is subscribers-only.  That would cause
> > some bug reporters to give up and go away.
> 
> Find some other mailing list; I'm not hosting *nor* am I willing to run a
> non-subscribers only mailing list.  Period.  Not negotiable, so don't even
> try to change my mind.

The postmasters at vger are pretty good at running mailing lists.
For linux-kbuild my effort so far has been to request it.
That's not a big deal.

So if they accept it you could have [EMAIL PROTECTED] for zero
overhead for you.

Sam


Re: [PATCH 2/2] SOCK: add raw6 drops counter

2007-11-13 Thread Eric Dumazet

Wang Chen wrote:
> Add raw drops counter for IPv6 in /proc/net/raw6 .
>
> Signed-off-by: Wang Chen <[EMAIL PROTECTED]>
> ---
>  raw.c |   15 ---
>  1 files changed, 8 insertions(+), 7 deletions(-)
>
> diff -Nurp linux-2.6.24-rc2.org/net/ipv6/raw.c linux-2.6.24-rc2/net/ipv6/raw.c
> --- linux-2.6.24-rc2.org/net/ipv6/raw.c 2007-11-09 16:38:05.0 +0800
> +++ linux-2.6.24-rc2/net/ipv6/raw.c 2007-11-14 09:46:54.0 +0800
> @@ -354,14 +354,14 @@ static inline int rawv6_rcv_skb(struct s
>  {
> if ((raw6_sk(sk)->checksum || sk->sk_filter) &&
> skb_checksum_complete(skb)) {
> -   /* FIXME: increment a raw6 drops counter here */
> +   atomic_inc(&sk->sk_drops);

I am not sure the comment was referring to a per-socket counter here.

If the frame checksum is bad, we cannot be sure the socket is the right
one, since the corrupted bits could be in the tuple that identifies the
socket.

Maybe here we want to increment a global raw6 drop counter (well, for the
given ipv6 instance).

> kfree_skb(skb);
> return 0;
> }
>
>  /* Charge it to the socket. */
> if (sock_queue_rcv_skb(sk,skb)<0) {
> -   /* FIXME: increment a raw6 drops counter here */
> +   atomic_inc(&sk->sk_drops);
> kfree_skb(skb);
> return 0;
> }
> @@ -382,6 +382,7 @@ int rawv6_rcv(struct sock *sk, struct sk
> struct raw6_sock *rp = raw6_sk(sk);
>
>  if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb)) {
> +   atomic_inc(&sk->sk_drops);
> kfree_skb(skb);
> return NET_RX_DROP;
> }
> @@ -405,7 +406,7 @@ int rawv6_rcv(struct sock *sk, struct sk
>
>  if (inet->hdrincl) {
> if (skb_checksum_complete(skb)) {
> -   /* FIXME: increment a raw6 drops counter here */
> +   atomic_inc(&sk->sk_drops);

Same remark here.



Re: [PATCH 13/24] [IPSEC]: Move x->outer_mode->output out of locked section

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Tue, 13 Nov 2007 19:51:19 +0800

> On Tue, Nov 13, 2007 at 03:33:48AM -0800, David Miller wrote:
> >
> > Make 'lastused' an 'unsigned long' (that's all that get_seconds()
> > gives to us anyways), fix up the nla_total_size(x->lastused) thing in
> > net/xfrm/xfrm_user.c, and then you can remove this lock acquisition
> > completely because the store into x->lastused will now be atomic and
> > therefore locks aren't protecting anything.
> 
> Brilliant, make that patch 25/25 :)
> 
> [IPSEC]: Make x->lastused an unsigned long
> 
> Currently x->lastused is u64 which means that it cannot be read/written
> atomically on all architectures.  David Miller observed that the value
> stored in it is only an unsigned long which is always atomic.
> 
> So based on his suggestion this patch changes the internal representation
> from u64 to unsigned long while the user-interface still refers to it as
> u64.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied to net-2.6.25 :)
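
A rough userspace sketch of the representation change described in the quoted commit message, with hypothetical names (`struct state_sketch`, `state_touch`, `state_lastused_for_user`). The field is kept as an `unsigned long` internally and widened to 64 bits only at the user-facing boundary. Note that ISO C itself does not promise that a plain store is atomic; the kernel relies on architecture behaviour for aligned word-sized stores, and this sketch only shows the types involved.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the xfrm state; only lastused is modelled. */
struct state_sketch {
    unsigned long lastused;         /* seconds, word-sized internally */
};

/* Widening to the 64-bit user format must never lose bits. */
_Static_assert(sizeof(unsigned long) <= sizeof(uint64_t),
               "unsigned long must fit in the 64-bit user format");

/* Writer side: a single word-sized store, no lock taken in this sketch. */
static void state_touch(struct state_sketch *x, unsigned long now_seconds)
{
    assert(x != 0);
    x->lastused = now_seconds;
}

/* User-interface side: the value is still reported as a u64. */
static uint64_t state_lastused_for_user(const struct state_sketch *x)
{
    return (uint64_t)x->lastused;
}
```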


Re: [PATCH 24/24] [IPSEC]: Move state lock into x->type->input

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 07 Nov 2007 22:08:41 +0800

> [IPSEC]: Move state lock into x->type->input
> 
> This patch releases the lock on the state before calling x->type->input.
> It also adds the lock to the spots where they're currently needed.
> 
> Most of those places (all except mip6) are expected to disappear with
> async crypto.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied to net-2.6.25


Re: [PATCH 23/24] [IPSEC]: Move integrity stat collection into xfrm_input

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 07 Nov 2007 22:08:40 +0800

> [IPSEC]: Move integrity stat collection into xfrm_input
> 
> Similar to the moving out of the replay processing on the output,
> this patch moves the integrity stat collection from x->type->input
> into xfrm_input.
> 
> This would eventually allow transforms such as AH/ESP to be lockless.
> 
> The error value EBADMSG (currently unused in the crypto layer) is used
> to indicate a failed integrity check.  In future this error can be
> directly returned by the crypto layer once we switch to aead algorithms.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied to net-2.6.25


Re: [PATCH 22/24] [IPSEC]: Store xfrm states in security path directly

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 07 Nov 2007 22:08:39 +0800

> [IPSEC]: Store xfrm states in security path directly
> 
> As it is xfrm_input first collects a list of xfrm states on the stack
> before storing them in the packet's security path just before it returns.
> For async crypto, this construction presents an obstacle since we may
> need to leave the loop after each transform.
> 
> In fact, it's much easier to just skip the stack completely and always
> store to the security path.  This is proven by the fact that this patch
> actually shrinks the code.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied to net-2.6.25


Re: [PATCH 19/24] [IPSEC]: Merge most of the output path

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 07 Nov 2007 22:08:36 +0800

> [IPSEC]: Merge most of the output path
> 
> As part of the work on asynchronous cryptographic operations, we need to
> be able to resume from the spot where they occur.  As such, it helps if
> we isolate them to one spot.
> 
> This patch moves most of the remaining family-specific processing into
> the common output code.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied to net-2.6.25


Re: [PATCH 20/24] [IPSEC]: Add async resume support on output

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 07 Nov 2007 22:08:37 +0800

> [IPSEC]: Add async resume support on output
> 
> This patch adds support for async resumptions on output.  To do so, the
> transform would return -EINPROGRESS and subsequently invoke the function
> xfrm_output_resume to resume processing.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

This stuff is so nice now :-)

Applied to net-2.6.25
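
The resume pattern described in the quoted commit message can be modelled in a few lines of userspace C. This is a simplified sketch, not the kernel code: `struct packet_sketch`, `transform_sketch`, `output_sketch` and `output_resume_sketch` are hypothetical names, and unlike the real output path the sketch hands `-EINPROGRESS` back to the caller so that the parked state is visible.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical packet: a number of transform steps and a delivery flag. */
struct packet_sketch {
    int  steps_total;   /* transforms that must complete */
    int  steps_done;    /* transforms completed so far */
    bool defer_next;    /* make the next transform go asynchronous once */
    bool delivered;
};

/* Hypothetical transform: 0 when finished, -EINPROGRESS when queued. */
static int transform_sketch(struct packet_sketch *pkt)
{
    if (pkt->defer_next) {
        pkt->defer_next = false;
        return -EINPROGRESS;
    }
    return 0;
}

/*
 * Continue processing. err is the result of the transform that has just
 * finished, either synchronously or via a completion callback.
 */
static int output_resume_sketch(struct packet_sketch *pkt, int err)
{
    assert(pkt->steps_done < pkt->steps_total);
    while (err == 0) {
        pkt->steps_done++;
        if (pkt->steps_done == pkt->steps_total) {
            pkt->delivered = true;
            return 0;
        }
        err = transform_sketch(pkt);
    }
    return err;     /* -EINPROGRESS: parked; anything else: failure */
}

/* Entry point: start the first transform and fall into the resume loop. */
static int output_sketch(struct packet_sketch *pkt)
{
    return output_resume_sketch(pkt, transform_sketch(pkt));
}
```

A completion handler would later call `output_resume_sketch(pkt, 0)` to pick up exactly where processing stopped, which is why it helps to have the transform call isolated in one spot.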


Re: [PATCH 21/24] [IPSEC]: Merge most of the input path

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 07 Nov 2007 22:08:38 +0800

> [IPSEC]: Merge most of the input path
> 
> As part of the work on asynchronous cryptographic operations, we need to
> be able to resume from the spot where they occur.  As such, it helps if
> we isolate them to one spot.
> 
> This patch moves most of the remaining family-specific processing into
> the common input code.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied to net-2.6.25


Re: [PATCH 17/24] [IPV4]: Add ip_local_out

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 07 Nov 2007 22:08:33 +0800

> [IPV4]: Add ip_local_out
> 
> Most callers of the LOCAL_OUT chain will set the IP packet length
> and header checksum before doing so.  They also share the same output
> function dst_output.
> 
> This patch creates a new function called ip_local_out which does all
> of that and converts the appropriate users over to it.
> 
> Apart from removing duplicate code, it will also help in merging the
> IPsec output path once the same thing is done for IPv6.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied to net-2.6.25
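
The "set the packet length and header checksum" step that the quoted message says is shared by LOCAL_OUT callers can be sketched in portable C. The helper names (`ipv4_header_checksum`, `local_out_prepare_sketch`) are hypothetical and the header is treated as a raw byte array; the checksum itself is the standard one's-complement sum over 16-bit big-endian words (RFC 1071).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { IPV4_MIN_HDR_LEN = 20, OFF_TOT_LEN = 2, OFF_CHECK = 10 };

/* One's-complement sum over big-endian 16-bit words, folded and inverted. */
static uint16_t ipv4_header_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Hypothetical helper: fill in total length, then recompute the checksum. */
static void local_out_prepare_sketch(uint8_t *hdr, size_t hdr_len,
                                     uint16_t total_len)
{
    uint16_t check;

    assert(hdr_len >= IPV4_MIN_HDR_LEN && hdr_len % 4 == 0);

    hdr[OFF_TOT_LEN]     = (uint8_t)(total_len >> 8);
    hdr[OFF_TOT_LEN + 1] = (uint8_t)(total_len & 0xff);

    /* The checksum field must be zero while the sum is computed. */
    hdr[OFF_CHECK]     = 0;
    hdr[OFF_CHECK + 1] = 0;
    check = ipv4_header_checksum(hdr, hdr_len);
    hdr[OFF_CHECK]     = (uint8_t)(check >> 8);
    hdr[OFF_CHECK + 1] = (uint8_t)(check & 0xff);
}
```

A receiver can validate the header by running the same sum over all of it, checksum field included; a correct header yields zero.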


Re: [PATCH 18/24] [IPV6]: Add ip6_local_out

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 07 Nov 2007 22:08:34 +0800

> [IPV6]: Add ip6_local_out
> 
> Most callers of the LOCAL_OUT chain will set the IP packet length
> before doing so.  They also share the same output function dst_output.
> 
> This patch creates a new function called ip6_local_out which does all
> of that and converts the appropriate users over to it.
> 
> Apart from removing duplicate code, it will also help in merging the
> IPsec output path.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied to net-2.6.25


Re: [PATCH 16/24] [IPSEC]: Separate inner/outer mode processing on input

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 07 Nov 2007 22:08:32 +0800

> [IPSEC]: Separate inner/outer mode processing on input
> 
> With inter-family transforms the inner mode differs from the outer mode.
> Attempting to handle both sides from the same function means that it
> needs to handle both IPv4 and IPv6 which creates duplication and confusion.
> 
> This patch separates the two parts on the input path so that each function
> deals with one family only.
> 
> In particular, the functions xfrm4_extract_input/xfrm6_extract_input
> move the pertinent fields from the IPv4/IPv6 IP headers into a neutral
> format stored in skb->cb.  This is then used by the inner mode input
> functions to modify the inner IP header.  In this way the input function
> no longer has to know about the outer address family.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied to net-2.6.25
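
The "neutral format stored in skb->cb" idea can be illustrated with a small userspace model. All names here are hypothetical (`struct skb_sketch`, `struct mode_cb_sketch`, `extract4_sketch`, `extract6_sketch`), the 48-byte scratch size is an assumption, and only three fields are modelled. The point is that the family-specific extract step writes one common record, so the consumer never looks at an IPv4 or IPv6 header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct sk_buff: only the scratch area exists. */
struct skb_sketch {
    unsigned char cb[48];
};

/* Family-neutral record kept in the scratch area. */
struct mode_cb_sketch {
    uint8_t tos;
    uint8_t ttl;
    uint8_t protocol;
};

_Static_assert(sizeof(struct mode_cb_sketch) <=
               sizeof(((struct skb_sketch *)0)->cb),
               "neutral record must fit in the scratch area");

/* Minimal per-family views of the fields being extracted. */
struct ipv4_fields_sketch { uint8_t tos, ttl, protocol; };
struct ipv6_fields_sketch { uint8_t traffic_class, hop_limit, nexthdr; };

static void extract4_sketch(struct skb_sketch *skb,
                            const struct ipv4_fields_sketch *h)
{
    struct mode_cb_sketch cb = { .tos = h->tos, .ttl = h->ttl,
                                 .protocol = h->protocol };
    memcpy(skb->cb, &cb, sizeof(cb));   /* memcpy avoids aliasing issues */
}

static void extract6_sketch(struct skb_sketch *skb,
                            const struct ipv6_fields_sketch *h)
{
    struct mode_cb_sketch cb = { .tos = h->traffic_class,
                                 .ttl = h->hop_limit,
                                 .protocol = h->nexthdr };
    memcpy(skb->cb, &cb, sizeof(cb));
}

/* Consumer side: family-agnostic, reads only the neutral record. */
static struct mode_cb_sketch mode_cb_load_sketch(const struct skb_sketch *skb)
{
    struct mode_cb_sketch cb;

    assert(skb != 0);
    memcpy(&cb, skb->cb, sizeof(cb));
    return cb;
}
```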


Re: [PATCH 14/24] [INET]: Give outer DSCP directly to ip*_copy_dscp

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 07 Nov 2007 22:08:30 +0800

> [INET]: Give outer DSCP directly to ip*_copy_dscp
> 
> This patch changes the prototype of ipv4_copy_dscp and ipv6_copy_dscp so
> that they directly take the outer DSCP rather than the outer IP header.
> This will help us to unify the code for inter-family tunnels.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied to net-2.6.25


Re: [PATCH 15/24] [IPSEC]: Separate inner/outer mode processing on output

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 07 Nov 2007 22:08:31 +0800

> [IPSEC]: Separate inner/outer mode processing on output
> 
> With inter-family transforms the inner mode differs from the outer mode.
> Attempting to handle both sides from the same function means that it
> needs to handle both IPv4 and IPv6 which creates duplication and confusion.
> 
> This patch separates the two parts on the output path so that each function
> deals with one family only.
> 
> In particular, the functions xfrm4_extract_output/xfrm6_extract_output
> move the pertinent fields from the IPv4/IPv6 IP headers into a neutral
> format stored in skb->cb.  This is then used by the outer mode output
> functions to write the outer IP header.  In this way the output function
> no longer has to know about the inner address family.
> 
> Since the extract functions are only called by tunnel modes (the only
> modes that can support inter-family transforms), I've also moved the
> xfrm*_tunnel_check_size calls into them.  This allows the correct ICMP
> message to be sent as opposed to now where you might call icmp_send with
> an IPv6 packet and vice versa.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied to net-2.6.25


Re: [PATCH 13/24] [IPSEC]: Move x->outer_mode->output out of locked section

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 07 Nov 2007 22:08:29 +0800

> [IPSEC]: Move x->outer_mode->output out of locked section
> 
> RO mode is the only one that requires a locked output function.  So it's
> easier to move the lock into that function rather than requiring everyone
> else to run under the lock.
> 
> In particular, this allows us to move the size check into the output
> function without causing a potential dead-lock should the ICMP error
> somehow hit the same SA on transmission.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied to net-2.6.25


Re: [PATCH 12/24] [IPSEC]: Forbid BEET + ipcomp for now

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 07 Nov 2007 22:08:28 +0800

> [IPSEC]: Forbid BEET + ipcomp for now
> 
> While BEET can theoretically work with IPComp the current code can't do that
> because it tries to construct a BEET mode tunnel type which doesn't (and
> cannot) exist.  In fact as it is it won't even attach a tunnel object at
> all for BEET which is bogus.
> 
> To support this fully we'd also need to change the policy checks on input
> to recognise a plain tunnel as a legal variant of an optional BEET transform.
> 
> This patch simply fails such constructions for now.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied to net-2.6.25


Re: [PATCH 10/24] [IPSEC]: Move flow construction into xfrm_dst_lookup

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 07 Nov 2007 22:08:26 +0800

> [IPSEC]: Move flow construction into xfrm_dst_lookup
> 
> This patch moves the flow construction from the callers of xfrm_dst_lookup
> into that function.  It also changes xfrm_dst_lookup so that it takes an
> xfrm state as its argument instead of explicit addresses.
> 
> This removes any address-specific logic from the callers of xfrm_dst_lookup
> which is needed to correctly support inter-family transforms.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied to net-2.6.25


Re: [PATCH 11/24] [IPSEC]: Merge common code into xfrm_bundle_create

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 07 Nov 2007 22:08:27 +0800

> [IPSEC]: Merge common code into xfrm_bundle_create
> 
> Half of the code in xfrm4_bundle_create and xfrm6_bundle_create is common.
> This patch extracts that logic and puts it into xfrm_bundle_create.  The
> rest of it is then accessed through afinfo.
> 
> As a result this fixes the problem with inter-family transforms where we
> treat every xfrm dst in the bundle as if it belongs to the top family.
> 
> This patch also fixes a long-standing error-path bug where we may free the
> xfrm states twice.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied to net-2.6.25


Re: 2.6.24-rc2: Network commit causes SLUB performance regression with tbench

2007-11-13 Thread Nick Piggin
On Wednesday 14 November 2007 12:58, David Miller wrote:
> From: Nick Piggin <[EMAIL PROTECTED]>
> Date: Tue, 13 Nov 2007 22:41:58 +1100
>
> > On Tuesday 13 November 2007 06:44, Christoph Lameter wrote:
> > > On Sat, 10 Nov 2007, Nick Piggin wrote:
> > > > BTW. your size-2048 kmalloc cache is order-1 in the default setup,
> > > > whereas kmalloc(1024) or kmalloc(4096) will be order-0 allocations.
> > > > And SLAB also uses order-0 for size-2048. It would be nice if SLUB
> > > > did the same...
> > >
> > > You can try to see the effect that order 0 would have by booting with
> > >
> > > slub_max_order=0
> >
> > Yeah, that didn't help much, but in general I think it would give
> > more consistent and reliable behaviour from slub.
>
> Just a note that I'm not ignoring this issue, I just don't have time
> to get to it yet.

No problem. I would like to have helped more, but it's slow going given
my lack of network stack knowledge. If I get any more interesting data,
I'll send it.


> I suspect the issue is about having a huge skb->data linear area for
> TCP sends over loopback.  We're likely getting a much smaller
> skb->data linear data area after the patch in question, the rest using
> the sk_buff scatterlist pages which are a little bit more expensive to
> process.

It didn't seem to be noticeable at 1 client. Unless scatterlist
processing is going to cause cacheline bouncing, I don't see why this
hurts more as you add CPUs?


Re: [PATCH 9/24] [IPSEC]: Replace x->type->{local,remote}_addr with flags

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 07 Nov 2007 22:08:25 +0800

> [IPSEC]: Replace x->type->{local,remote}_addr with flags
> 
> The functions local_addr and remote_addr are more than what they're needed
> for.  The same thing can be done easily with flags on the type object.
> This patch does that and simplifies the wrapper functions in xfrm6_policy
> accordingly.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied to net-2.6.25


Re: [PATCH 8/24] [IPSEC]: Make sure idev is consistent with dev in xfrm_dst

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 07 Nov 2007 22:08:24 +0800

> [IPSEC]: Make sure idev is consistent with dev in xfrm_dst
> 
> Previously we took the device from the bottom route and idev from the top
> route.  This is bad because idev may well point to a different device.
> This patch changes it so that we get the idev from the device directly.
> 
> It also makes it an error if either dev or idev is NULL.  This is consistent
> with the rest of the routing code which also treats these cases as errors.
> 
> I've removed the err initialisation in xfrm6_policy.c because it achieves
> no purpose and hid a bug when an initial version of this patch neglected
> to set err to -ENODEV (fortunately the IPv4 version warned about it).
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied to net-2.6.25


Re: [PATCH 7/24] [IPSEC]: Set dst->input to dst_discard

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 07 Nov 2007 22:08:23 +0800

> [IPSEC]: Set dst->input to dst_discard
> 
> The input function should never be invoked on IPsec dst objects.  This is
> because we don't apply IPsec on input until after we've made the routing
> decision.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied to net-2.6.25


Re: [PATCH 5/24] [NET]: Remove unnecessary inclusion of dst.h

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 07 Nov 2007 22:08:21 +0800

> [NET]: Remove unnecessary inclusion of dst.h
> 
> The file net/netevent.h only refers to struct dst_entry * so it doesn't
> need to include dst.h.  I've replaced it with a forward declaration.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied to net-2.6.25
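
A minimal sketch of the forward-declaration idea described in the patch above. This is not the real net/netevent.h; `demo_event` and `demo_event_has_dst` are hypothetical names used only for illustration. Code that merely stores or passes a `struct dst_entry *` can compile against an incomplete type:

```c
#include <stddef.h>

/* Forward declaration: the struct layout is never needed here. */
struct dst_entry;

/* Hypothetical event payload that only carries a pointer. */
struct demo_event {
	struct dst_entry *dst;	/* pointer to an incomplete type is allowed */
	int code;
};

/* Works without the full definition because dst is never dereferenced. */
static int demo_event_has_dst(const struct demo_event *ev)
{
	return ev->dst != NULL;
}
```

Any file that actually dereferences the pointer would still need the full definition from its own include.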


Re: [PATCH 6/24] [IPSEC]: Only set neighbour on top xfrm dst

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 07 Nov 2007 22:08:22 +0800

> [IPSEC]: Only set neighbour on top xfrm dst
> 
> The neighbour field is only used by dst_confirm which only ever happens on
> the top-most xfrm dst.  So it's a waste to duplicate for every other xfrm
> dst.  This patch moves its setting out of the loop so that only the top one
> gets set.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied to net-2.6.25


Re: [PATCH 4/24] [NET]: Eliminate duplicate copies of dst_discard

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 07 Nov 2007 22:08:20 +0800

> [NET]: Eliminate duplicate copies of dst_discard
> 
> We have a number of copies of dst_discard scattered around the place which
> all do the same thing, namely free a packet on the input or output paths.
> 
> This patch deletes all of them except dst_discard and points all the users
> to it.
> 
> The only non-trivial bit is decnet where it returns an error.  However,
> conceptually this is identical to the blackhole functions used in IPv4
> and IPv6 which do not return errors.  So they should either all return
> errors or all return zero.  For now I've stuck with the majority and
> picked zero as the return value.
> 
> It doesn't really matter in practice since few, if any, drivers would react
> differently depending on a zero return value or NET_RX_DROP.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied to net-2.6.25
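
A rough user-space sketch of the consolidation described above, assuming a simplified packet type. `demo_pkt`, `demo_discard` and `demo_route_ops` are hypothetical stand-ins, not kernel APIs. One discard routine frees the packet and returns zero, and both the input and output hooks point at it:

```c
#include <stdlib.h>

/* Hypothetical stand-in for a packet buffer. */
struct demo_pkt {
	size_t len;
};

/* Single shared discard routine: free the packet, report zero. */
static int demo_discard(struct demo_pkt *pkt)
{
	free(pkt);
	return 0;
}

/* Hypothetical per-route hooks; both directions share one function. */
struct demo_route_ops {
	int (*input)(struct demo_pkt *pkt);
	int (*output)(struct demo_pkt *pkt);
};

static const struct demo_route_ops demo_blackhole_ops = {
	.input = demo_discard,
	.output = demo_discard,
};
```

Returning zero everywhere matches the "majority" choice mentioned in the commit message; an error-returning variant would only change the return statement.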


Re: [PATCH 3/24] [IPV6]: Move nfheader_len into rt6_info

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 07 Nov 2007 22:08:19 +0800

> [IPV6]: Move nfheader_len into rt6_info
> 
> The dst member nfheader_len is only used by IPv6.  It's also currently
> creating a rather ugly alignment hole in struct dst.  Therefore this patch
> moves it from there into struct rt6_info.
> 
> It also reorders the fields in rt6_info to minimize holes.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied to net-2.6.25
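
To illustrate the alignment-hole point above, here is a small sketch with hypothetical structs (these are not the real struct dst or rt6_info layouts). A 16-bit field placed between pointers usually forces padding, while grouping the small fields together lets them share one slot:

```c
#include <stddef.h>

/* Hypothetical layout with a small field wedged between pointers. */
struct demo_holey {
	void *next;
	unsigned short nfheader_len;	/* padding usually follows */
	void *dev;
	unsigned short flags;		/* tail padding usually follows */
};

/* Same members, small fields grouped at the end. */
struct demo_packed {
	void *next;
	void *dev;
	unsigned short nfheader_len;
	unsigned short flags;
};

/* Bytes saved by reordering; the exact value depends on the ABI. */
static size_t demo_bytes_saved(void)
{
	return sizeof(struct demo_holey) - sizeof(struct demo_packed);
}
```

The saving is ABI-dependent, so the sketch deliberately reports a difference rather than hard-coding sizes.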


Re: [PATCH 2/24] [IPSEC]: Use dst->header_len when resizing on output

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 07 Nov 2007 22:08:18 +0800

> [IPSEC]: Use dst->header_len when resizing on output
> 
> Currently we use x->props.header_len when resizing on output.  However,
> if we're resizing at all we might as well go the whole hog and do it
> for the whole dst.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied to net-2.6.25


Re: [PATCH 1/24] [IPV6]: Only set nfheader_len for top xfrm dst

2007-11-13 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 07 Nov 2007 22:08:16 +0800

> [IPV6]: Only set nfheader_len for top xfrm dst
> 
> We only need to set nfheader_len in the top xfrm dst.  This is because
> we only ever read the nfheader_len from the top xfrm dst.
> 
> It is also easier to count nfheader_len as part of header_len which
> then lets us remove the ugly wrapper functions for incrementing and
> decrementing header lengths in xfrm6_policy.c.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied to net-2.6.25


Re: [Bugme-new] [Bug 9375] New: divide error: 0000 [#1] with VIA Velocity when unplugged

2007-11-13 Thread Andrew Morton

(please respond via emailed reply-to-all)

On Tue, 13 Nov 2007 20:48:44 -0800 (PST) [EMAIL PROTECTED] wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=9375
> 
>Summary: divide error:  [#1] with VIA Velocity when unplugged
>Product: Other
>Version: 2.5
>  KernelVersion: 2.6.22.12 (openSUSE 10.3)
>   Platform: All
> OS/Version: Linux
>   Tree: Mainline
> Status: NEW
>   Severity: normal
>   Priority: P1
>  Component: Other
> AssignedTo: [EMAIL PROTECTED]
> ReportedBy: [EMAIL PROTECTED]
> 
> 
> Most recent kernel where this bug did not occur: I've never seen it before.
> Distribution: openSUSE 10.3
> Hardware Environment: AMD Athlon 2200+
> Software Environment: openSUSE 10.3
> Problem Description: I unplugged my VIA Velocity and plugged it back into a
> switch and shortly thereafter it gacked.
> 
> I ran 'ip -s -s link show dev eth1' which sigsegd (this machine has been rock
> solid for YEARS so I know it's not the hardware) and the machine locked up.
> The blinkenlights on the switch continued to blink but according to tcpdump no
> traffic was flowing from this machine. A reboot later confirmed a problem and
> I found this in the /var/log/messages file:
> 
> 
> Nov 13 22:26:54 frank kernel: divide error:  [#1]
> Nov 13 22:26:54 frank kernel: SMP
> Nov 13 22:26:54 frank kernel: last sysfs file: /block/drbd0/range
> Nov 13 22:26:54 frank kernel: Modules linked in: drbd xt_tcpudp xt_pkttype
> ipt_LOG xt_limit snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device nfs lockd
> nfs_acl sunrpc af_packet ipt_REJECT xt_state iptable_mangle iptable_nat nf_nat
> iptable_filter nf_conntrack_ipv4 nf_conntrack nfnetlink ip_tables ip6_tables
> x_tables tcp_bic apparmor dm_crypt loop dm_mirror dm_log dm_mod snd_intel8x0
> snd_ac97_codec ac97_bus snd_pcm snd_timer snd i2c_sis96x soundcore parport_pc
> button sr_mod via_velocity sis_agp rtc_cmos shpchp i2c_sis630 cdrom i2c_co
> re parport agpgart snd_page_alloc rtc_core rtc_lib pci_hotplug crc_ccitt sg
> usbhid hid ff_memless ehci_hcd sd_mod ohci_hcd usbcore piix sis5513 ide_core
> edd ext3 mbcache jbd fan pata_sis libata scsi_mod thermal processor
> Nov 13 22:26:54 frank kernel: CPU:0
> Nov 13 22:26:54 frank kernel: EIP:0060:[]Tainted: G  N
> VLI
> Nov 13 22:26:54 frank kernel: EFLAGS: 00200287   (2.6.22.12-0.1-default #1)
> Nov 13 22:26:54 frank kernel: EIP is at sys_socketcall+0x21/0x261
> Nov 13 22:26:54 frank kernel: eax: 0001   ebx: 000c   ecx: 0001  
> edx: ffea
> Nov 13 22:26:54 frank kernel: esi: bfdaa27c   edi:    ebp: ef1b8000  
> esp: ef1b9f78
> Nov 13 22:26:54 frank kernel: ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss:
> 0068
> Nov 13 22:26:54 frank kernel: Process ip (pid: 4350, ti=ef1b8000 task=f3d4eab0
> task.ti=ef1b8000)
> Nov 13 22:26:54 frank kernel: Stack: 4003  f3d4ebd8 08073160
> bfdaa140 ef1b9fb8  c0107e49
> Nov 13 22:26:54 frank kernel:bfdaa140 08073160 bfdaa27c 0001
> 0001 bfdaa27c 08073184 c0104ea2
> Nov 13 22:26:54 frank kernel:0001 bfdaa140 08073160 bfdaa27c
> 08073184 bfdaa178 ffda 007b
> Nov 13 22:26:54 frank kernel: Call Trace:
> Nov 13 22:26:54 frank kernel:  [] do_syscall_trace+0x12c/0x173
> Nov 13 22:26:54 frank kernel:  [] syscall_call+0x7/0xb
> Nov 13 22:26:54 frank kernel:  ===
> Nov 13 22:26:54 frank kernel: Code: c4 94 00 00 00 5b 5e 5f 5d c3 57 ba ea ff
> ff ff 56 53 83 ec 30 8b 44 24 40 8d 78 ff 83 ff 10 0f 87
>  3d 02 00 00 8a 98 68 32 2e c0 <8d> 74 24 18 8b 54 24 44 89 f0 0f b6 cb e8 2a
> 9b f6 ff ba f2 ff
> Nov 13 22:26:54 frank kernel: EIP: [] sys_socketcall+0x21/0x261
> SS:ESP 0068:ef1b9f78
> 
> 

hm, I see no divide instruction near the start of 2.6.22's
sys_socketcall().  I'm wondering if some patch which opensuse has added is
causing this.  If you have the source handy can you show us what it looks like?
That's net/socket.c, the 50-odd lines after

asmlinkage long sys_socketcall(int call, unsigned long __user *args)


What caused the kernel taint, btw?

Thanks.


Re: [PATCH] bonding: Fix resource use after free

2007-11-13 Thread David Miller
From: Jay Vosburgh <[EMAIL PROTECTED]>
Date: Mon, 12 Nov 2007 18:15:36 -0800

>   Fix bond_destroy and bond_free_all to not reference the
> struct net_device after calling unregister_netdevice.
> 
>   Bug and offending change reported by Moni Shoua <[EMAIL PROTECTED]>
> 
> Signed-off-by: Jay Vosburgh <[EMAIL PROTECTED]>

Applied to net-2.6, thanks Jay.


Re: [PATCH] Fix warning for token-ring from sysctl checker

2007-11-13 Thread David Miller
From: Olof Johansson <[EMAIL PROTECTED]>
Date: Tue, 13 Nov 2007 01:23:13 -0600

> As seen when booting ppc64_defconfig:
> 
> sysctl table check failed: /net/token-ring .3.14 procname does not match binary path procname
> 
> 
> Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>

Patch applied, thanks Olof.


Re: [PATCH][NET] Convert init_timer into setup_timer

2007-11-13 Thread David Miller
From: "Arnaldo Carvalho de Melo" <[EMAIL PROTECTED]>
Date: Tue, 13 Nov 2007 11:43:40 -0200

> Em Tue, Nov 13, 2007 at 05:34:21AM -0800, David Miller escreveu:
> > From: Pavel Emelyanov <[EMAIL PROTECTED]>
> > Date: Tue, 13 Nov 2007 16:10:03 +0300
> > 
> > > A lot of code in the kernel initializes timer->function
> > > and timer->data together with calling init_timer(timer). There
> > > is already a helper for this. Use it for networking code.
> > > 
> > > The patch is HUGE, but makes the code 130 lines shorter 
> > > (98 insertions(+), 228 deletions(-)).
> > > 
> > > Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]>
> > 
> > I have no objection to this patch, but it is 2.6.25
> > material for sure.
> 
> Agreed, Pavel, if you want please stick:
> 
> Acked-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>

No need, I've applied it to net-2.6.25 :-)
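
For readers unfamiliar with the conversion, here is a user-space model of the pattern, not the kernel's timer code. `demo_timer`, `demo_init_timer` and `demo_setup_timer` are hypothetical names that mimic the shape of init_timer() followed by manual field assignment versus a single setup call:

```c
#include <stddef.h>

/* Hypothetical, simplified model of a timer object. */
struct demo_timer {
	void (*function)(unsigned long);
	unsigned long data;
	int initialized;
};

/* Models the "bare" initializer: callback and data are left unset. */
static void demo_init_timer(struct demo_timer *t)
{
	t->function = NULL;
	t->data = 0;
	t->initialized = 1;
}

/* Models the helper: initialize and set callback plus argument at once. */
static void demo_setup_timer(struct demo_timer *t,
			     void (*fn)(unsigned long), unsigned long data)
{
	demo_init_timer(t);
	t->function = fn;
	t->data = data;
}

/* Example callback used only to have something to point at. */
static void demo_expire(unsigned long data)
{
	(void)data;
}
```

With the helper, three statements at each call site collapse into one, which is where the line-count reduction quoted above comes from.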


Re: [NET] random : secure_tcp_sequence_number should not assume CONFIG_KTIME_SCALAR

2007-11-13 Thread David Miller
From: Eric Dumazet <[EMAIL PROTECTED]>
Date: Tue, 13 Nov 2007 14:41:19 +0100

> I discovered one other incorrect use of .tv64 (coming from me, I must confess)
> 
> I guess this patch is  needed for 2.6.24 and stable (2.6.22 & 2.6.23)
 ...
> [NET] random : secure_tcp_sequence_number should not assume CONFIG_KTIME_SCALAR

Applied thanks Eric, I'll queue it up for -stable too.

Perhaps ktime_t->tv64 should be renamed to "->__tv64" or
similar to prevent future mistakes like this.  Only
the ktime implementation should be touching that thing.
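
A sketch of the encapsulation idea, under the assumption that callers only ever need nanoseconds. `demo_ktime_t` and its helpers are hypothetical and are not the kernel's ktime API; the private member is spelled `priv_tv64` here because identifiers starting with two underscores are reserved in user-space C:

```c
#include <stdint.h>

/* Hypothetical time type; the member name signals "hands off". */
typedef struct {
	int64_t priv_tv64;	/* representation detail, may change */
} demo_ktime_t;

/* Constructor: the only place that knows how the value is stored. */
static demo_ktime_t demo_ktime_set(int64_t secs, int64_t nsecs)
{
	demo_ktime_t kt = { secs * 1000000000LL + nsecs };
	return kt;
}

/* Accessor: callers use this instead of reading the member directly. */
static int64_t demo_ktime_to_ns(demo_ktime_t kt)
{
	return kt.priv_tv64;
}
```

If the stored form later became a split seconds/nanoseconds pair, only these two helpers would need to change.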


Re: [PATCH 2/2] [e1000 VLAN] Disable vlan hw accel when promiscuous mode

2007-11-13 Thread Kok, Auke
Joonwoo Park wrote:
> 2007/11/14, Kok, Auke <[EMAIL PROTECTED]>:
>> Patrick McHardy wrote:
>>> Kok, Auke wrote:
>>>> Patrick McHardy wrote:
>>>>
>>>>> I already posted a patch for this, not sure what happened to it.
>>>>> Auke, any news on merging the secondary unicast address support?
>>>> I dropped the ball on that one. Care to resend it and send me one for
>>>> e1000e as well?
>>> Patch for e1000 attached.
>>>
>>> Does e1000e also work with PCI cards if I add the proper IDs?
>>> Otherwise I could only send an untested patch.
>>
>> Joonwoo,
>>
>> your patch unfortunately does not apply after patrick's unicast patch,
>>
>> also, ich8lan support is removed from e1000 in the e1000 version in
>> jgarzik/netdev-2.6 #upstream as planned (moved over to e1000e!).
>>
>> Can you resend your patch so that it applies to jgarzik/netdev-2.6 #upstream
>> with Patrick's patch applied? That would help a lot. And possibly do the
>> e1000e patch as well :)
>>
> 
> This is a patch against jgarzik/netdev-2.6 #upstream with Patrick's patch
> applied. The ich8lan code has not been removed in this patch.


no, my TODO list is insane at the moment :)

thanks for the patch. I'll apply and rip the ich8 workarounds out myself later
when appropriate.

> I'll work e1000e too :-)

awesome, looking forward to that.

Auke

> 
> Thanks.
> Joonwoo
> 
> [E1000]: Disable vlan hw accel when promiscuous mode
> 
> When a netdevice is in promiscuous mode, it should receive all ingress
> packets. This patch disables VLAN filtering when an e1000 device configured
> with VLAN hw accel goes into promiscuous mode, which makes packets visible to
> sniffers even when they do not carry one of the device's own VLAN IDs.
> 
> Signed-off-by: Joonwoo Park <[EMAIL PROTECTED]>
> 
> ---
>  drivers/net/e1000/e1000_main.c |   23 ++-
>  1 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
> index 5fd5f51..edf2ced 100644
> --- a/drivers/net/e1000/e1000_main.c
> +++ b/drivers/net/e1000/e1000_main.c
> @@ -2425,7 +2425,7 @@ e1000_set_rx_mode(struct net_device *netdev)
>   struct e1000_hw *hw = &adapter->hw;
>   struct dev_addr_list *uc_ptr;
>   struct dev_addr_list *mc_ptr;
> - uint32_t rctl;
> + uint32_t rctl, ctrl;
>   uint32_t hash_value;
>   int i, rar_entries = E1000_RAR_ENTRIES;
>   int mta_reg_count = (hw->mac_type == e1000_ich8lan) ?
> @@ -2442,13 +2442,23 @@ e1000_set_rx_mode(struct net_device *netdev)
>   /* Check for Promiscuous and All Multicast modes */
>  
>   rctl = E1000_READ_REG(hw, RCTL);
> + ctrl = E1000_READ_REG(hw, CTRL);
>  
>   if (netdev->flags & IFF_PROMISC) {
>   rctl |= (E1000_RCTL_UPE | E1000_RCTL_MPE);
> - } else if (netdev->flags & IFF_ALLMULTI) {
> - rctl |= E1000_RCTL_MPE;
> + if (adapter->hw.mac_type != e1000_ich8lan) {
> + if (ctrl & E1000_CTRL_VME)
> + rctl &= ~E1000_RCTL_VFE;
> + }
>   } else {
> - rctl &= ~E1000_RCTL_MPE;
> + if (adapter->hw.mac_type != e1000_ich8lan) {
> + if (ctrl & E1000_CTRL_VME)
> + rctl |= E1000_RCTL_VFE;
> + } else if (netdev->flags & IFF_ALLMULTI) {
> + rctl |= E1000_RCTL_MPE;
> + } else {
> + rctl &= ~E1000_RCTL_MPE;
> + }
>   }
>  
>   uc_ptr = NULL;
> @@ -4967,7 +4977,10 @@ e1000_vlan_rx_register(struct net_device *netdev, struct vlan_group *grp)
>   if (adapter->hw.mac_type != e1000_ich8lan) {
>   /* enable VLAN receive filtering */
>   rctl = E1000_READ_REG(&adapter->hw, RCTL);
> - rctl |= E1000_RCTL_VFE;
> + if (netdev->flags & IFF_PROMISC)
> + rctl &= ~E1000_RCTL_VFE;
> + else
> + rctl |= E1000_RCTL_VFE;
>   rctl &= ~E1000_RCTL_CFIEN;
>   E1000_WRITE_REG(&adapter->hw, RCTL, rctl);
>   e1000_update_mng_vlan(adapter);
> ---
> 
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
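
The behaviour the commit message describes can be sketched as a pure function. This is a simplified model, not the patch's code: the `DEMO_*` bit values are made up for illustration and do not reflect the real RCTL register layout, and the ich8lan special case and unicast list handling are left out:

```c
#include <stdint.h>

/* Hypothetical bit values for illustration only. */
#define DEMO_RCTL_UPE		(1u << 0)	/* unicast promiscuous */
#define DEMO_RCTL_MPE		(1u << 1)	/* multicast promiscuous */
#define DEMO_RCTL_VFE		(1u << 2)	/* VLAN filter enable */
#define DEMO_IFF_PROMISC	(1u << 0)
#define DEMO_IFF_ALLMULTI	(1u << 1)

/* Compute the new receive-control word from interface flags. */
static uint32_t demo_rx_mode(uint32_t rctl, uint32_t if_flags, int vlan_accel)
{
	if (if_flags & DEMO_IFF_PROMISC) {
		rctl |= DEMO_RCTL_UPE | DEMO_RCTL_MPE;
		if (vlan_accel)
			rctl &= ~DEMO_RCTL_VFE;	/* let every VLAN through */
		return rctl;
	}

	rctl &= ~DEMO_RCTL_UPE;
	if (if_flags & DEMO_IFF_ALLMULTI)
		rctl |= DEMO_RCTL_MPE;
	else
		rctl &= ~DEMO_RCTL_MPE;
	if (vlan_accel)
		rctl |= DEMO_RCTL_VFE;		/* back to filtering */
	return rctl;
}
```

Keeping the decision in one function makes it easy to check that leaving promiscuous mode restores VLAN filtering.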



Re: [PATCH]iwlwifi not correctly dealing with hotunplug

2007-11-13 Thread David Miller
From: Zhu Yi <[EMAIL PROTECTED]>
Date: Wed, 14 Nov 2007 09:30:54 +0800

> 
> On Tue, 2007-11-13 at 19:26 +0100, Oliver Neukum wrote:
> > It makes no sense to enable interrupts if a device has been unplugged.
> > In addition if in doubt IRQ_HANDLED should be returned.
> > 
> > Signed-off-by: Oliver Neukum <[EMAIL PROTECTED]>
> 
> ACK.

I've applied Oliver's patch, thanks.
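
A hedged sketch of the two rules stated in the patch description, using a simulated device rather than real iwlwifi code. All names are hypothetical, and treating an all-ones status read as "device gone" is an assumption of this sketch (a common convention for reads from removed PCI hardware):

```c
#include <stdint.h>

enum demo_irqreturn { DEMO_IRQ_NONE = 0, DEMO_IRQ_HANDLED = 1 };

/* Simulated device state. */
struct demo_dev {
	uint32_t int_status;	/* simulated interrupt status register */
	int irqs_enabled;	/* simulated interrupt mask state */
};

static enum demo_irqreturn demo_isr(struct demo_dev *dev)
{
	uint32_t status = dev->int_status;

	dev->irqs_enabled = 0;	/* mask while we look at the status */

	if (status == 0xFFFFFFFFu) {
		/* Assumed "device gone" pattern: leave interrupts off. */
		return DEMO_IRQ_HANDLED;
	}
	if (status == 0) {
		/* Nothing pending for us: unmask and pass. */
		dev->irqs_enabled = 1;
		return DEMO_IRQ_NONE;
	}

	dev->int_status = 0;	/* acknowledge what we saw */
	dev->irqs_enabled = 1;
	return DEMO_IRQ_HANDLED;
}
```

The unplugged path returns "handled" without re-enabling interrupts, which is the combination the description argues for.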



Re: [PATCH 05/05] ipv6: RFC4214 Support (3)

2007-11-13 Thread Stephen Hemminger

David Miller wrote:
> From: Stephen Hemminger <[EMAIL PROTECTED]>
> Date: Tue, 13 Nov 2007 12:53:12 -0800
>
>> On Fri, 09 Nov 2007 16:35:59 -0800
>> osprey67 <[EMAIL PROTECTED]> wrote:
>>
>>> From: Fred L. Templin <[EMAIL PROTECTED]>
>>>
>>> This message attaches the combined diffs from
>>> messages 01/05 through 04/05. This file should be
>>> suitable for use with the patch utility.
>>>
>>> Signed-off-by: Fred L. Templin <[EMAIL PROTECTED]>
>>
>> Isn't increasing the size of struct ip_tunnel_parm
>> going to cause kernel ABI changes?
>
> Yeah it is, unfortunately.

So we can't take it.  It might be possible to extend the structure if you put
the new parameters at the end and handled the compatibility cases correctly.
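
One way to picture "new parameters at the end" is the sketch below. The structs are hypothetical and are not the real struct ip_tunnel_parm, and the sketch assumes the caller's buffer length is known, which a fixed-size ioctl interface does not give you for free:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical original layout that old binaries were built against. */
struct demo_parm_v1 {
	char name[16];
	int link;
};

/* Hypothetical extended layout: new members only at the end. */
struct demo_parm_v2 {
	char name[16];
	int link;
	int new_flags;	/* absent in v1 callers */
};

/*
 * Copy a caller-supplied buffer of `len` bytes into the v2 layout.
 * An old (v1-sized) buffer leaves the new members zeroed. Returns -1
 * if `len` matches neither known layout.
 */
static int demo_parm_import(struct demo_parm_v2 *dst, const void *src,
			    size_t len)
{
	if (len != sizeof(struct demo_parm_v1) &&
	    len != sizeof(struct demo_parm_v2))
		return -1;
	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, len);
	return 0;
}
```

Because the old layout is a strict prefix of the new one, old callers keep working and simply get default values for the added fields.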


Re: [PATCH] powerpc: Fix fs_enet module build

2007-11-13 Thread David Miller
From: Jochen Friedrich <[EMAIL PROTECTED]>
Date: Tue, 13 Nov 2007 19:32:08 +0100

> If fs_enet is built as a module, on PPC_CPM_NEW_BINDING platforms
> mii-fec/mii-bitbang should be built as modules as well. On other
> platforms, mii-fec/mii-bitbang must be included into the main module.
> Otherwise some symbols remain undefined. Additionally, fs_enet uses
> libphy, so add a select PHYLIB.
> 
>   Building modules, stage 2.
>   MODPOST 5 modules
> ERROR: "fs_scc_ops" [drivers/net/fs_enet/fs_enet.ko] undefined!
> make[1]: *** [__modpost] Error 1
> make: *** [modules] Error 2
> 
> Signed-off-by: Jochen Friedrich <[EMAIL PROTECTED]>

This is truly ugly and creates an unnecessarily hard to
maintain and complex driver.

Please find a way to fix this for real, so that the
PPC_CPM_NEW_BINDING ifdef is not necessary at all and
things get built modular or not naturally as we handle
things for other cases like this.

Thanks.


Re: [PATCH 05/05] ipv6: RFC4214 Support (3)

2007-11-13 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Tue, 13 Nov 2007 12:53:12 -0800

> On Fri, 09 Nov 2007 16:35:59 -0800
> osprey67 <[EMAIL PROTECTED]> wrote:
> 
> > From: Fred L. Templin <[EMAIL PROTECTED]>
> > 
> > This message attaches the combined diffs from
> > messages 01/05 through 04/05. This file should be
> > suitable for use with the patch utility.
> > 
> > Signed-off-by: Fred L. Templin <[EMAIL PROTECTED]>
> > 
> 
> Isn't increasing the size of struct ip_tunnel_parm
> going to cause kernel ABI changes?

Yeah it is, unfortunately.


Re: Re : Oops preceded by WARNING: at net/ipv4/tcp_input.c:1571 tcp_remove_reno_sacks()

2007-11-13 Thread David Miller
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Tue, 13 Nov 2007 23:35:39 +0200 (EET)

> [PATCH] [TCP] FRTO: Plug potential LOST-bit leak
> 
> It might be possible that, in some extreme scenario that
> I just cannot now construct in my mind, end_seq <=
> frto_highmark check does not match causing the lost_out
> and LOST bits become out-of-sync due to clearing and
> recounting in the loop.
> 
> This may fix LOST-bit leak reported by Chazarain Guillaume
> <[EMAIL PROTECTED]>.
> 
> Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>

This patch looks correct to me, so I added it to net-2.6

Chazarain please let us know if it does indeed cure your
problem.

Thanks.


Re: [PATCH] [TCP] FRTO: Limit snd_cwnd if TCP was application limited

2007-11-13 Thread David Miller
From: "Ilpo_Järvinen" <[EMAIL PROTECTED]>
Date: Wed, 14 Nov 2007 00:08:18 +0200 (EET)

> 
> Otherwise TCP might violate packet ordering principles that FRTO
> is based on. If conventional recovery path is chosen, this won't
> be significant at all. In practice, any small enough value will
> be sufficient to provide proper operation for FRTO, yet other
> users of snd_cwnd might benefit from a "close enough" value.
> 
> FRTO's formula is now equal to what tcp_enter_cwr() uses.
> 
> FRTO used to check application limitedness a bit differently but
> I changed that in commit 575ee7140dabe9b9c4f66f4f867039b97e548867
> and as a result checking for application limitedness became
> completely non-existing.
> 
> Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>

Applied to net-2.6, thanks Ilpo.


Re: [PATCH 6/6] e1000: fix schedule while atomic when called from mii-tool

2007-11-13 Thread David Miller
From: Auke Kok <[EMAIL PROTECTED]>
Date: Tue, 13 Nov 2007 15:11:38 -0800

> From: Jesse Brandeburg <[EMAIL PROTECTED]>
> 
> mii-tool can cause the driver to call msleep during nway reset,
> bugzilla.kernel.org bug 8430.  Fix by simply calling reinit_locked
> outside of the spinlock, which is safe from ethtool, so it should be
> safe from here.
> 
> Signed-off-by: Jesse Brandeburg <[EMAIL PROTECTED]>
> Signed-off-by: Auke Kok <[EMAIL PROTECTED]>

Looks good.

A bug fix, so queued to net-2.6, thanks.


Re: [PATCH 5/6] e1000: Secondary unicast address support

2007-11-13 Thread David Miller
From: Auke Kok <[EMAIL PROTECTED]>
Date: Tue, 13 Nov 2007 15:11:33 -0800

> From: Patrick McHardy <[EMAIL PROTECTED]>
> 
> Add support for configuring secondary unicast addresses. Unicast
> addresses take precedence over multicast addresses when filling
> the exact address filters to avoid going to promiscuous mode.
> When more unicast addresses are present than filter slots,
> unicast filtering is disabled and all slots can be used for
> multicast addresses.
> 
> Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>
> Signed-off-by: Auke Kok <[EMAIL PROTECTED]>

Applied to netdev-2.6, thanks!


Re: [PATCH 4/6] e1000e: convert register test macros to functions

2007-11-13 Thread David Miller
From: Auke Kok <[EMAIL PROTECTED]>
Date: Tue, 13 Nov 2007 15:11:28 -0800

> From: Joe Perches <[EMAIL PROTECTED]>
> 
> Add functions for reg_pattern_test and reg_set_and_check.
> Changed macros to use these functions.
> 
> Compiled x86, untested
> 
> Size decreased ~2K
> 
> old:
> 
> $ size drivers/net/e1000e/ethtool.o
>    text    data     bss     dec     hex filename
>   14461       0       0   14461    387d drivers/net/e1000e/ethtool.o
> 
> new:
> 
> $ size drivers/net/e1000e/ethtool.o
>    text    data     bss     dec     hex filename
>   12498       0       0   12498    30d2 drivers/net/e1000e/ethtool.o
> 
> 
> Signed-off-by: Joe Perches <[EMAIL PROTECTED]>
> Signed-off-by: Auke Kok <[EMAIL PROTECTED]>

Applied to netdev-2.6, but I had to fix up the following
whitespace issues by hand:

Adds trailing whitespace.
diff:28:bool reg_pattern_test_array(struct e1000_adapter *adapter, u64 *data, 
Adds trailing whitespace.
diff:33:static const u32 test[] = 
warning: 2 lines add whitespace errors.

Please correct this before submission in the future.

Thanks!


Re: [PATCH 2/2] SOCK: add raw6 drops counter

2007-11-13 Thread Wang Chen
David Miller said the following on 2007-11-14 12:31:
> From: Wang Chen <[EMAIL PROTECTED]>
> Date: Wed, 14 Nov 2007 11:15:57 +0800
> 
>> Add raw drops counter for IPv6 in /proc/net/raw6 .
>>
>> Signed-off-by: Wang Chen <[EMAIL PROTECTED]>
> 
> Applied to net-2.6.25, but again more whitespace problems:
> 
> Adds trailing whitespace.
> diff:9:   atomic_inc(&sk->sk_drops);  
> 

Sorry for that. I will pay more attention to it.
Thank you.




Re: [PATCH 3/6] e1000: convert regtest macro's to functions

2007-11-13 Thread David Miller
From: Auke Kok <[EMAIL PROTECTED]>
Date: Tue, 13 Nov 2007 15:11:23 -0800

> Minimal macro to function conversion in e1000_ethtool.c
> 
> Adds functions reg_pattern_test and reg_set_and_check
> Changes REG_PATTERN_TEST and REG_SET_AND_CHECK macros
> to call these functions.
> 
> Saves ~2.5KB
> 
> Compiled x86, untested (no hardware)
> 
> old:
> 
> $ size drivers/net/e1000/e1000_ethtool.o
>    text    data     bss     dec     hex filename
>   16778       0       0   16778    418a drivers/net/e1000/e1000_ethtool.o
> 
> new:
> 
> $ size drivers/net/e1000/e1000_ethtool.o
>    text    data     bss     dec     hex filename
>   14128       0       0   14128    3730 drivers/net/e1000/e1000_ethtool.o
> 
> Signed-off-by: Joe Perches <[EMAIL PROTECTED]>
> Signed-off-by: Auke Kok <[EMAIL PROTECTED]>

Definitely this looks nicer :-)

Applied to netdev-2.6, thanks!
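
A small sketch of why the macro-to-function conversion shrinks the object file, using a simulated register instead of real e1000 hardware. Every `demo_`/`DEMO_` name is hypothetical. The loop lives in one function, and the macro becomes a thin wrapper that only calls it and bails out of the caller on failure:

```c
#include <stddef.h>
#include <stdint.h>

/* Simulated register: only bits set in `writable` can be changed. */
struct demo_reg {
	uint32_t value;
	uint32_t writable;
};

static void demo_write(struct demo_reg *r, uint32_t v)
{
	r->value = v & r->writable;
}

static uint32_t demo_read(const struct demo_reg *r)
{
	return r->value;
}

/* Returns 0 if every pattern reads back as expected, 1 otherwise. */
static int demo_reg_pattern_test(struct demo_reg *r, uint32_t mask,
				 uint32_t write)
{
	static const uint32_t patterns[] = {
		0x5A5A5A5Au, 0xA5A5A5A5u, 0x00000000u, 0xFFFFFFFFu
	};
	size_t i;

	for (i = 0; i < sizeof(patterns) / sizeof(patterns[0]); i++) {
		demo_write(r, patterns[i] & write);
		if (demo_read(r) != (patterns[i] & write & mask))
			return 1;
	}
	return 0;
}

/* Thin wrapper: one call per use instead of an expanded loop. */
#define DEMO_REG_PATTERN_TEST(r, mask, write)				\
	do {								\
		if (demo_reg_pattern_test((r), (mask), (write)))	\
			return 1;					\
	} while (0)

/* Example caller: each macro use now costs a call, not a loop body. */
static int demo_run_tests(struct demo_reg *r)
{
	DEMO_REG_PATTERN_TEST(r, 0xFFFFFFFFu, 0xFFFFFFFFu);
	DEMO_REG_PATTERN_TEST(r, 0x0000FFFFu, 0xFFFFFFFFu);
	return 0;
}
```

With many registers under test, replacing dozens of expanded loops by dozens of calls is the kind of change that could plausibly account for a saving of a couple of kilobytes.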


Re: [PATCH 2/6] e1000: update netstats traffic counters realtime

2007-11-13 Thread David Miller
From: Auke Kok <[EMAIL PROTECTED]>
Date: Tue, 13 Nov 2007 15:11:17 -0800

> formerly e1000/e1000e only updated traffic counters once every
> 2 seconds with the register values of bytes/packets. With newer
> code however in the interrupt and polling code we can real-time
> fill in these values in the netstats struct for users to see.
> 
> Signed-off-by: Auke Kok <[EMAIL PROTECTED]>

Applied to netdev-2.6, thanks.


Re: [PATCH 1/6] e1000e: update netstats traffic counters realtime

2007-11-13 Thread David Miller
From: Auke Kok <[EMAIL PROTECTED]>
Date: Tue, 13 Nov 2007 15:11:10 -0800

> formerly e1000/e1000e only updated traffic counters once every
> 2 seconds with the register values of bytes/packets. With newer
> code however in the interrupt and polling code we can real-time
> fill in these values in the netstats struct for users to see.
> 
> Signed-off-by: Auke Kok <[EMAIL PROTECTED]>

Applied to netdev-2.6


RE: [PATCH 2/2] [e1000 VLAN] Disable vlan hw accel when promiscuous mode

2007-11-13 Thread Joonwoo Park
2007/11/14, Kok, Auke <[EMAIL PROTECTED]>:
> Patrick McHardy wrote:
> > Kok, Auke wrote:
> >> Patrick McHardy wrote:
> >>
> >>> I already posted a patch for this, not sure what happened to it.
> >>> Auke, any news on merging the secondary unicast address support?
> >>
> >> I dropped the ball on that one. Care to resend it and send me one for
> >> e1000e as well?
> >
> > Patch for e1000 attached.
> >
> > Does e1000e also work with PCI cards if I add the proper IDs?
> > Otherwise I could only send an untested patch.
> 
> 
> Joonwoo,
> 
> your patch unfortunately does not apply after Patrick's unicast patch.
> 
> Also, ich8lan support is removed from e1000 in the e1000 version in
> jgarzik/netdev-2.6 #upstream as planned (moved over to e1000e!).
> 
> Can you resend your patch so that it applies to jgarzik/netdev-2.6 #upstream
> with Patrick's patch applied? That would help a lot. And possibly do the
> e1000e patch as well :)
> 
> 

This patch is against jgarzik/netdev-2.6 #upstream with Patrick's patch
applied.
Note that it does not remove the ich8lan code.
I'll work on e1000e too :-)

Thanks.
Joonwoo

[E1000]: Disable vlan hw accel when promiscuous mode

When a netdevice is in promiscuous mode, it should receive all ingress
packets.
This patch disables VLAN filtering when an e1000 device configured with VLAN
hw accel goes into promiscuous mode.
This makes packets visible to sniffers even when they carry a VLAN ID that is
not configured on the device.

Signed-off-by: Joonwoo Park <[EMAIL PROTECTED]>

---
 drivers/net/e1000/e1000_main.c |   23 ++-
 1 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 5fd5f51..edf2ced 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -2425,7 +2425,7 @@ e1000_set_rx_mode(struct net_device *netdev)
struct e1000_hw *hw = &adapter->hw;
struct dev_addr_list *uc_ptr;
struct dev_addr_list *mc_ptr;
-   uint32_t rctl;
+   uint32_t rctl, ctrl;
uint32_t hash_value;
int i, rar_entries = E1000_RAR_ENTRIES;
int mta_reg_count = (hw->mac_type == e1000_ich8lan) ?
@@ -2442,13 +2442,23 @@ e1000_set_rx_mode(struct net_device *netdev)
/* Check for Promiscuous and All Multicast modes */
 
rctl = E1000_READ_REG(hw, RCTL);
+   ctrl = E1000_READ_REG(hw, CTRL);
 
if (netdev->flags & IFF_PROMISC) {
rctl |= (E1000_RCTL_UPE | E1000_RCTL_MPE);
-   } else if (netdev->flags & IFF_ALLMULTI) {
-   rctl |= E1000_RCTL_MPE;
+   if (adapter->hw.mac_type != e1000_ich8lan) {
+   if (ctrl & E1000_CTRL_VME)
+   rctl &= ~E1000_RCTL_VFE;
+   }
} else {
-   rctl &= ~E1000_RCTL_MPE;
+   if (adapter->hw.mac_type != e1000_ich8lan) {
+   if (ctrl & E1000_CTRL_VME)
+   rctl |= E1000_RCTL_VFE;
+   } else if (netdev->flags & IFF_ALLMULTI) {
+   rctl |= E1000_RCTL_MPE;
+   } else {
+   rctl &= ~E1000_RCTL_MPE;
+   }
}
 
uc_ptr = NULL;
@@ -4967,7 +4977,10 @@ e1000_vlan_rx_register(struct net_device *netdev, struct vlan_group *grp)
if (adapter->hw.mac_type != e1000_ich8lan) {
/* enable VLAN receive filtering */
rctl = E1000_READ_REG(&adapter->hw, RCTL);
-   rctl |= E1000_RCTL_VFE;
+   if (netdev->flags & IFF_PROMISC)
+   rctl &= ~E1000_RCTL_VFE;
+   else
+   rctl |= E1000_RCTL_VFE;
rctl &= ~E1000_RCTL_CFIEN;
E1000_WRITE_REG(&adapter->hw, RCTL, rctl);
e1000_update_mng_vlan(adapter);
---
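
A simplified sketch of the register logic the patch description talks about may help when reading the hunks above. It only models the promiscuous / VLAN-filter interaction and leaves out IFF_ALLMULTI and the ich8lan special case, and clearing UPE/MPE when leaving promiscuous mode is a simplification for the sketch, not something the patch does. The DEMO_* constants and demo_rx_mode() are illustrative names; the bit values should be treated as placeholders and checked against e1000_hw.h.

```c
#include <stdint.h>

/* Placeholder bit masks for illustration; verify against e1000_hw.h. */
#define DEMO_RCTL_UPE 0x00000008u /* unicast promiscuous enable */
#define DEMO_RCTL_MPE 0x00000010u /* multicast promiscuous enable */
#define DEMO_RCTL_VFE 0x00040000u /* VLAN filter enable */
#define DEMO_CTRL_VME 0x40000000u /* VLAN mode (hw accel) enable */

/*
 * Compute a new RCTL value from the current RCTL/CTRL values.
 * When VLAN hw accel is on, the VLAN filter is switched off while the
 * device is promiscuous and switched back on when it is not.
 */
uint32_t demo_rx_mode(uint32_t rctl, uint32_t ctrl, int promisc)
{
	if (promisc) {
		rctl |= DEMO_RCTL_UPE | DEMO_RCTL_MPE;
		if (ctrl & DEMO_CTRL_VME)
			rctl &= ~DEMO_RCTL_VFE;
	} else {
		rctl &= ~(DEMO_RCTL_UPE | DEMO_RCTL_MPE);
		if (ctrl & DEMO_CTRL_VME)
			rctl |= DEMO_RCTL_VFE;
	}
	return rctl;
}
```

The point of the design is that a sniffer on a promiscuous interface should also see frames tagged with VLAN IDs that are not registered on the device, which the hardware filter would otherwise discard.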



Re: [PATCH] PATCH 1/2 [SCHED 2.6.24]: Check subqueue status before calling hard_start_xmit

2007-11-13 Thread David Miller
From: PJ Waskiewicz <[EMAIL PROTECTED]>
Date: Tue, 13 Nov 2007 09:44:50 -0800

> The only qdiscs that check subqueue state before dequeue'ing are PRIO
> and RR.  The other qdiscs, including the default pfifo_fast qdisc, will
> allow traffic bound for subqueue 0 through to hard_start_xmit.  The check
> for netif_queue_stopped() is done above in pkt_sched.h, so it is
> unnecessary for qdisc_restart().  However, if the underlying driver is
> multiqueue capable, and only sets queue states on subqueues, this will
> allow packets to enter the driver when it's currently unable to process
> packets, resulting in expensive requeues and driver entries.  This patch
> re-adds the check for the subqueue status before calling hard_start_xmit,
> so we can try and avoid the driver entry when the queues are stopped.
> 
> Signed-off-by: Peter P Waskiewicz Jr <[EMAIL PROTECTED]>

Applied, and I'll queue up the other one for -stable, thanks.
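
As a rough model of what the quoted description says, the sketch below checks the per-subqueue stopped state before entering the driver, so a stopped subqueue causes an early requeue rather than a wasted driver call. Everything here (demo_dev, demo_hard_start_xmit, demo_try_xmit) is a hypothetical stand-in, not the kernel's net_device or qdisc_restart(); the device-wide netif_queue_stopped() check is assumed to have been done by the caller.

```c
#include <stdbool.h>

#define DEMO_NUM_SUBQUEUES 4

/* Hypothetical model of a multiqueue device. */
struct demo_dev {
	bool subqueue_stopped[DEMO_NUM_SUBQUEUES]; /* per-subqueue state */
	int xmit_calls;                            /* times the driver was entered */
};

/* Stand-in for the driver's transmit entry point. */
int demo_hard_start_xmit(struct demo_dev *dev, int queue)
{
	(void)queue;
	dev->xmit_calls++;
	return 0;
}

/*
 * Returns true if the packet was handed to the driver, false if the
 * subqueue is stopped and the packet should be requeued instead.
 */
bool demo_try_xmit(struct demo_dev *dev, int queue)
{
	if (dev->subqueue_stopped[queue])
		return false;
	demo_hard_start_xmit(dev, queue);
	return true;
}
```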


Re: [PATCH 0/5] UDP memory accounting and limitation (take 7)

2007-11-13 Thread David Miller
From: Hideo AOKI <[EMAIL PROTECTED]>
Date: Tue, 13 Nov 2007 21:39:18 -0500

> I updated UDP memory accounting and limitation patch set.

I'm dropping this for now, please resubmit after the expensive
divide is fixed.


Re: [PATCH 2/2] SOCK: add raw6 drops counter

2007-11-13 Thread David Miller
From: Wang Chen <[EMAIL PROTECTED]>
Date: Wed, 14 Nov 2007 11:15:57 +0800

> Add raw drops counter for IPv6 in /proc/net/raw6 .
> 
> Signed-off-by: Wang Chen <[EMAIL PROTECTED]>

Applied to net-2.6.25, but again more whitespace problems:

Adds trailing whitespace.
diff:9: atomic_inc(&sk->sk_drops);  


Re: [PATCH 1/2] SOCK: add raw drops counter

2007-11-13 Thread David Miller
From: Wang Chen <[EMAIL PROTECTED]>
Date: Wed, 14 Nov 2007 11:12:18 +0800

> Add raw drops counter for IPv4 in /proc/net/raw .
> 
> Signed-off-by: Wang Chen <[EMAIL PROTECTED]>

Applied to net-2.6.25, but your patch has a lot of whitespace
problems:

Adds trailing whitespace.
diff:9:  *  @sk_flags: %SO_LINGER (l_onoff), %SO_BROADCAST, %SO_KEEPALIVE, 
Adds trailing whitespace.
diff:19:  * @sk_prot_creator: sk_prot of original sock creator (see ipv6_setsockopt, 
Adds trailing whitespace.
diff:23:  * @sk_err_soft: errors that don't cause failure but are the cause of a 
Adds trailing whitespace.
diff:56:atomic_inc(&sk->sk_drops);  
warning: 4 lines add whitespace errors.

I fixed it by hand this time, but please fix this yourself before
future submissions.

Thank you.


Re: [PATCH] bonding: Documentation update

2007-11-13 Thread David Miller
From: Andy Gospodarek <[EMAIL PROTECTED]>
Date: Tue, 13 Nov 2007 22:22:56 -0500

> On Tue, Nov 13, 2007 at 06:58:53PM -0800, Jay Vosburgh wrote:
> > 
> > Update the bonding documentation: more discussion on
> > initialization and configuration, changes to discussion of packet
> > reordering in balance-rr, update some out of date information.
> > 
> > Based in part on input from Rick Jones <[EMAIL PROTECTED]> and
> > Andy Gospodarek <[EMAIL PROTECTED]>.
> > 
> > Signed-off-by: Jay Vosburgh <[EMAIL PROTECTED]>
> > 
> 
> Looks good.  Thanks, Jay!
> 
> Acked-by: Andy Gospodarek <[EMAIL PROTECTED]>

Applied to netdev-2.6, thanks.


2.6.24 very strict patch acceptance mode...

2007-11-13 Thread David Miller

Starting now I am going to be very strict about patches
for 2.6.24

This means:

1) No cleanups
2) No symbol export removals
3) No bug fixes with cleanups attached
4) No code refactoring
5) No code consolidation
6) No whitespace removals

And I really mean it.

It has to fix a bug and it has to do so without requiring
cleanups or other unrelated changes.

Thanks.


Re: [PATCH 2/5] accounting unit and variable

2007-11-13 Thread David Miller
From: Hideo AOKI <[EMAIL PROTECTED]>
Date: Tue, 13 Nov 2007 22:27:13 -0500

> Herbert Xu wrote:
> > On Mon, Oct 29, 2007 at 05:23:10PM -0400, Hideo AOKI wrote:
> >>  
> >> +#define SK_DATAGRAM_MEM_QUANTUM ((int)PAGE_SIZE)
> >> +
> >> +static inline int sk_datagram_pages(int amt)
> >> +{
> >> +  return DIV_ROUND_UP(amt, SK_DATAGRAM_MEM_QUANTUM);
> >> +}
> > 
> > Does this really have to be int? Unsigned would let the compiler
> > optimise this to a simple shift.
> 
> Thank you for the comment.
> 
> This inline function is used to calculate the first argument of atomic_add()
> and atomic_sub(). Since the argument is int, I believe that using int is
> better than using unsigned int.

If you know the values will always be positive, as you will know here,
it is OK to use unsigned int here, and that avoids the unacceptably
expensive divide instruction.

Please fix this.
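
To make the point concrete: for unsigned operands, dividing by a power-of-two constant gives the same result as a right shift. The sketch below assumes a 4096-byte page and uses illustrative DEMO_* names rather than the kernel's PAGE_SIZE and DIV_ROUND_UP; the function names are likewise hypothetical, not the ones a revised patch would use.

```c
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1u << DEMO_PAGE_SHIFT) /* assumed 4096-byte page */
#define DEMO_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Unsigned variant: the divide by a power of two can become a shift. */
unsigned int demo_datagram_pages(unsigned int amt)
{
	return DEMO_DIV_ROUND_UP(amt, DEMO_PAGE_SIZE);
}

/* The same computation written as an explicit shift, for comparison. */
unsigned int demo_datagram_pages_shift(unsigned int amt)
{
	return (amt + DEMO_PAGE_SIZE - 1) >> DEMO_PAGE_SHIFT;
}
```

With a signed int the compiler has to preserve round-toward-zero behaviour for negative inputs, so it cannot reduce the division to a single shift.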


Re: [BUG] New Kernel Bugs

2007-11-13 Thread David Miller
From: Andrew Morton <[EMAIL PROTECTED]>
Date: Tue, 13 Nov 2007 18:27:00 -0800

> Let me just say - I'm astonished at how little spam gets through the vger
> lists.  Considering how many times those email addresses must have been
> added to spam databases.
> 
> It must be a lot of work, and whoever is doing it does it well.
> 
> I don't even know.  Is it Matti?  You?

Matti gets all the credit for setting up the bayesian et al.
filters we have and training them as needed.

> 

Yes, sourceforge is a complete joke.


Re: [PATCH 2/5] accounting unit and variable

2007-11-13 Thread Herbert Xu
On Tue, Nov 13, 2007 at 10:27:13PM -0500, Hideo AOKI wrote:
> 
> Herbert Xu wrote:
> >> 
> >>+#define SK_DATAGRAM_MEM_QUANTUM ((int)PAGE_SIZE)
> >>+
> >>+static inline int sk_datagram_pages(int amt)
> >>+{
> >>+   return DIV_ROUND_UP(amt, SK_DATAGRAM_MEM_QUANTUM);
> >>+}
> >
> >Does this really have to be int? Unsigned would let the compiler
> >optimise this to a simple shift.
> 
> Thank you for the comment.
> 
> This inline function is used to calculate the first argument of atomic_add()
> and atomic_sub(). Since the argument is int, I believe that using int is
> better than using unsigned int.

That doesn't really answer my question.  Is this quantity ever
negative? If not you should make it unsigned for the reason I
gave above.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] bonding: Documentation update

2007-11-13 Thread Andy Gospodarek
On Tue, Nov 13, 2007 at 06:58:53PM -0800, Jay Vosburgh wrote:
> 
>   Update the bonding documentation: more discussion on
> initialization and configuration, changes to discussion of packet
> reordering in balance-rr, update some out of date information.
> 
>   Based in part on input from Rick Jones <[EMAIL PROTECTED]> and
> Andy Gospodarek <[EMAIL PROTECTED]>.
> 
> Signed-off-by: Jay Vosburgh <[EMAIL PROTECTED]>
> 

Looks good.  Thanks, Jay!

Acked-by: Andy Gospodarek <[EMAIL PROTECTED]>



Re: [PATCH 2/5] accounting unit and variable

2007-11-13 Thread Hideo AOKI

Hello,

Herbert Xu wrote:
> On Mon, Oct 29, 2007 at 05:23:10PM -0400, Hideo AOKI wrote:
>>
>> +#define SK_DATAGRAM_MEM_QUANTUM ((int)PAGE_SIZE)
>> +
>> +static inline int sk_datagram_pages(int amt)
>> +{
>> +   return DIV_ROUND_UP(amt, SK_DATAGRAM_MEM_QUANTUM);
>> +}
>
> Does this really have to be int? Unsigned would let the compiler
> optimise this to a simple shift.

Thank you for the comment.

This inline function is used to calculate the first argument of atomic_add()
and atomic_sub(). Since the argument is int, I believe that using int is
better than using unsigned int.

Best regards,
Hideo

--
Hitachi Computer Products (America) Inc.


Re: [PATCH 1/5] fix send buffer check

2007-11-13 Thread Hideo AOKI

Hello,

I'm sorry that my response is always late.

Herbert Xu wrote:
> On Mon, Oct 29, 2007 at 05:22:53PM -0400, Hideo AOKI wrote:
>> This patch introduces sndbuf size check before memory allocation for
>> send buffer.
>
> Looks good, what about IPv6?

I'm going to develop the IPv6 part if the IPv4 patch set is accepted.

Many thanks,
Hideo

--
Hitachi Computer Products (America) Inc.


[PATCH 2/2] SOCK: add raw6 drops counter

2007-11-13 Thread Wang Chen
Add raw drops counter for IPv6 in /proc/net/raw6 .

Signed-off-by: Wang Chen <[EMAIL PROTECTED]>
---
 raw.c |   15 ---
 1 files changed, 8 insertions(+), 7 deletions(-)

diff -Nurp linux-2.6.24-rc2.org/net/ipv6/raw.c linux-2.6.24-rc2/net/ipv6/raw.c
--- linux-2.6.24-rc2.org/net/ipv6/raw.c 2007-11-09 16:38:05.0 +0800
+++ linux-2.6.24-rc2/net/ipv6/raw.c 2007-11-14 09:46:54.0 +0800
@@ -354,14 +354,14 @@ static inline int rawv6_rcv_skb(struct s
 {
if ((raw6_sk(sk)->checksum || sk->sk_filter) &&
skb_checksum_complete(skb)) {
-   /* FIXME: increment a raw6 drops counter here */
+   atomic_inc(&sk->sk_drops);  
kfree_skb(skb);
return 0;
}
 
/* Charge it to the socket. */
if (sock_queue_rcv_skb(sk,skb)<0) {
-   /* FIXME: increment a raw6 drops counter here */
+   atomic_inc(&sk->sk_drops);
kfree_skb(skb);
return 0;
}
@@ -382,6 +382,7 @@ int rawv6_rcv(struct sock *sk, struct sk
struct raw6_sock *rp = raw6_sk(sk);
 
if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb)) {
+   atomic_inc(&sk->sk_drops);
kfree_skb(skb);
return NET_RX_DROP;
}
@@ -405,7 +406,7 @@ int rawv6_rcv(struct sock *sk, struct sk
 
if (inet->hdrincl) {
if (skb_checksum_complete(skb)) {
-   /* FIXME: increment a raw6 drops counter here */
+   atomic_inc(&sk->sk_drops);
kfree_skb(skb);
return 0;
}
@@ -496,7 +497,7 @@ csum_copy_err:
   as some normal condition.
 */
err = (flags&MSG_DONTWAIT) ? -EAGAIN : -EHOSTUNREACH;
-   /* FIXME: increment a raw6 drops counter here */
+   atomic_inc(&sk->sk_drops);
goto out;
 }
 
@@ -1251,7 +1252,7 @@ static void raw6_sock_seq_show(struct se
srcp  = inet_sk(sp)->num;
seq_printf(seq,
   "%4d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X "
-  "%02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %p\n",
+  "%02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %p %d\n",
   i,
   src->s6_addr32[0], src->s6_addr32[1],
   src->s6_addr32[2], src->s6_addr32[3], srcp,
@@ -1263,7 +1264,7 @@ static void raw6_sock_seq_show(struct se
   0, 0L, 0,
   sock_i_uid(sp), 0,
   sock_i_ino(sp),
-  atomic_read(&sp->sk_refcnt), sp);
+  atomic_read(&sp->sk_refcnt), sp, atomic_read(&sp->sk_drops));
 }
 
 static int raw6_seq_show(struct seq_file *seq, void *v)
@@ -1274,7 +1275,7 @@ static int raw6_seq_show(struct seq_file
   "local_address "
   "remote_address"
   "st tx_queue rx_queue tr tm->when retrnsmt"
-  "   uid  timeout inode\n");
+  "   uid  timeout inode  drops\n");
else
raw6_sock_seq_show(seq, v, raw6_seq_private(seq)->bucket);
return 0;
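
The pattern in this patch is small enough to model in user space: one atomic counter per socket, incremented on every drop path and printed as an extra trailing column. The sketch below uses C11 atomics in place of the kernel's atomic_t, and all names (demo_sock, demo_rcv, demo_format_drops) are hypothetical.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-in for the part of struct sock that matters here. */
struct demo_sock {
	atomic_int drops;
};

void demo_sock_init(struct demo_sock *sk)
{
	atomic_init(&sk->drops, 0);
}

/* Returns 0 when the packet is queued, -1 when it is dropped. */
int demo_rcv(struct demo_sock *sk, int queue_ok)
{
	if (!queue_ok) {
		atomic_fetch_add(&sk->drops, 1); /* count the drop */
		return -1;
	}
	return 0;
}

/* Format the counter the way an extra column in a /proc line might look. */
int demo_format_drops(struct demo_sock *sk, char *buf, size_t len)
{
	return snprintf(buf, len, "drops=%d", atomic_load(&sk->drops));
}
```

An atomic counter fits here because the drop paths can run concurrently on different CPUs while /proc is being read.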



reproducible cxgb kernel panic in FC8 kernel 2.6.23.1-49

2007-11-13 Thread Ben Greear

This panic happens (almost?) immediately after starting TCP traffic between
the cxgb nic on this system and another.  We also got at least one crash
on a custom/tainted 2.6.20.12 kernel, but it would run for at least
a few minutes at ~1Gbps first.

I think my serial console chomped some of this..but it's very reproducible,
so if you need more info I can make the terminal wider and do it again.

I'm not sure it matters..but the peer NIC (directly connected w/fibre) is
a similar cxgb NIC but with TOE support (the longer, more expensive one).


[EMAIL PROTECTED] ~]# BUG: unable to handle kernel NULL pointer dereference at virtual address 0194
printing eip: f8a80b67 *pde = 7d0ac067
Oops: 0002 [#1] SMP
Modules linked in: arc4 michael_mic 8021q cxgb e1000 macvlan pktgen autofs4 sunrpc ipv6 loop dm_multipath i50d
CPU:1
EIP:0060:[]Not tainted VLI
EFLAGS: 00010206   (2.6.23.1-49.fc8 #1)
EIP is at t1_poll+0x2e0/0x64a [cxgb]
eax: fffd7d78   ebx: f6e56e02   ecx: f6e20500   edx: 
esi: f6ed8846   edi: f6f63428   ebp: f6820500   esp: c0789f7c
ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
Process swapper (pid: 0, ti=c0789000 task=f7c42c20 task.ti=c211d000)
Stack:    c0789fd4 f6e2 f6e20500  
   f69f2060 f6f63448 0040 f6f63428 f6e20500 f6f63400  
    f6e2 c2017714 c2017700 c05bdc74 fffd7d78 012c 0001
Call Trace:
 [] net_rx_action+0x9a/0x196
 [] __do_softirq+0x66/0xd3
 [] do_softirq+0x6c/0xce
 [] tick_do_update_jiffies64+0x15/0xa8
 [] ktime_get+0xf/0x2b
 [] handle_edge_irq+0x0/0xfc
 [] irq_exit+0x38/0x6b
 [] do_IRQ+0x9f/0xb9
 [] hrtimer_start+0xe6/0xf0
 [] common_interrupt+0x23/0x28
 [] mwait_idle_with_hints+0x3b/0x3f
 [] mwait_idle+0x0/0x13
 [] cpu_idle+0xab/0xcc
 ===
Code: 68 b3 c7 e9 ef 01 00 00 8b 45 50 83 e8 08 3b 45 54 89 45 50 73 04 0f 0b eb fe 8d 43 08 8b 55 14 89 85 a
EIP: [] t1_poll+0x2e0/0x64a [cxgb] SS:ESP 0068:c0789f7c
Kernel panic - not syncing: Fatal exception in interrupt


lspci is below:

[EMAIL PROTECTED] ~]# lspci
00:00.0 Host bridge: Intel Corporation 5000V Chipset Memory Controller Hub (rev b1)
00:02.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 2-3 (rev b1)
00:08.0 System peripheral: Intel Corporation 5000 Series Chipset DMA Engine (rev b1)
00:10.0 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
00:10.1 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
00:10.2 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
00:11.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev b1)
00:13.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev b1)
00:15.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev b1)
00:16.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev b1)
00:1c.0 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI Express Root Port 1 (rev 09)
00:1d.0 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #1 (rev 09)
00:1d.1 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #2 (rev 09)
00:1d.2 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #3 (rev 09)
00:1d.3 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #4 (rev 09)
00:1d.7 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset EHCI USB2 Controller (rev 09)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9)
00:1f.0 ISA bridge: Intel Corporation 631xESB/632xESB/3100 Chipset LPC Interface Controller (rev 09)
00:1f.1 IDE interface: Intel Corporation 631xESB/632xESB IDE Controller (rev 09)
00:1f.2 IDE interface: Intel Corporation 631xESB/632xESB/3100 Chipset SATA IDE Controller (rev 09)
00:1f.3 SMBus: Intel Corporation 631xESB/632xESB/3100 Chipset SMBus Controller (rev 09)
01:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Upstream Port (rev 01)
01:00.3 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express to PCI-X Bridge (rev 01)
02:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E1 (rev 01)
02:02.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E3 (rev 01)
04:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
04:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
05:01.0 Ethernet controller: Chelsio Communications Inc Unknown device 000a
07:01.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
[EMAIL PROTECTED] ~]#



--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com


[PATCH 1/2] SOCK: add raw drops counter

2007-11-13 Thread Wang Chen
Add raw drops counter for IPv4 in /proc/net/raw .

Signed-off-by: Wang Chen <[EMAIL PROTECTED]>
---
 include/net/sock.h |   11 ---
 net/core/sock.c|1 +
 net/ipv4/raw.c |   17 ++---
 3 files changed, 19 insertions(+), 10 deletions(-)

diff -Nurp linux-2.6.24-rc2.org/include/net/sock.h linux-2.6.24-rc2/include/net/sock.h
--- linux-2.6.24-rc2.org/include/net/sock.h 2007-11-09 16:37:08.0 +0800
+++ linux-2.6.24-rc2/include/net/sock.h 2007-11-14 10:57:51.0 +0800
@@ -145,7 +145,8 @@ struct sock_common {
   *@sk_forward_alloc: space allocated forward
   *@sk_allocation: allocation mode
   *@sk_sndbuf: size of send buffer in bytes
-  *@sk_flags: %SO_LINGER (l_onoff), %SO_BROADCAST, %SO_KEEPALIVE, %SO_OOBINLINE settings
+  *@sk_flags: %SO_LINGER (l_onoff), %SO_BROADCAST, %SO_KEEPALIVE, 
+  *   %SO_OOBINLINE settings
   *@sk_no_check: %SO_NO_CHECK setting, wether or not checkup packets
   *@sk_route_caps: route capabilities (e.g. %NETIF_F_TSO)
   *@sk_gso_type: GSO type (e.g. %SKB_GSO_TCPV4)
@@ -153,9 +154,12 @@ struct sock_common {
   *@sk_backlog: always used with the per-socket spinlock held
   *@sk_callback_lock: used with the callbacks in the end of this struct
   *@sk_error_queue: rarely used
-  *@sk_prot_creator: sk_prot of original sock creator (see ipv6_setsockopt, IPV6_ADDRFORM for instance)
+  *@sk_prot_creator: sk_prot of original sock creator (see ipv6_setsockopt, 
+  *  IPV6_ADDRFORM for instance)
   *@sk_err: last error
-  *@sk_err_soft: errors that don't cause failure but are the cause of a persistent failure not just 'timed out'
+  *@sk_err_soft: errors that don't cause failure but are the cause of a 
+  *  persistent failure not just 'timed out'
+  *@sk_drops: raw drops counter
   *@sk_ack_backlog: current listen backlog
   *@sk_max_ack_backlog: listen backlog set in listen()
   *@sk_priority: %SO_PRIORITY setting
@@ -239,6 +243,7 @@ struct sock {
 	rwlock_t		sk_callback_lock;
 	int			sk_err,
 				sk_err_soft;
+	atomic_t		sk_drops;
 	unsigned short		sk_ack_backlog;
 	unsigned short		sk_max_ack_backlog;
 	__u32			sk_priority;
diff -Nurp linux-2.6.24-rc2.org/net/core/sock.c linux-2.6.24-rc2/net/core/sock.c
--- linux-2.6.24-rc2.org/net/core/sock.c 2007-11-09 16:37:40.0 +0800
+++ linux-2.6.24-rc2/net/core/sock.c 2007-11-13 15:20:53.0 +0800
@@ -1611,6 +1611,7 @@ void sock_init_data(struct socket *sock,
sk->sk_stamp = ktime_set(-1L, -1L);
 
atomic_set(&sk->sk_refcnt, 1);
+   atomic_set(&sk->sk_drops, 0);
 }
 
 void fastcall lock_sock_nested(struct sock *sk, int subclass)
diff -Nurp linux-2.6.24-rc2.org/net/ipv4/raw.c linux-2.6.24-rc2/net/ipv4/raw.c
--- linux-2.6.24-rc2.org/net/ipv4/raw.c 2007-11-09 16:37:56.0 +0800
+++ linux-2.6.24-rc2/net/ipv4/raw.c 2007-11-14 09:32:02.0 +0800
@@ -241,7 +241,7 @@ static int raw_rcv_skb(struct sock * sk,
/* Charge it to the socket. */
 
if (sock_queue_rcv_skb(sk, skb) < 0) {
-   /* FIXME: increment a raw drops counter here */
+   atomic_inc(&sk->sk_drops);  
kfree_skb(skb);
return NET_RX_DROP;
}
@@ -252,6 +252,7 @@ static int raw_rcv_skb(struct sock * sk,
 int raw_rcv(struct sock *sk, struct sk_buff *skb)
 {
if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb)) {
+   atomic_inc(&sk->sk_drops);
kfree_skb(skb);
return NET_RX_DROP;
}
@@ -866,28 +867,30 @@ static __inline__ char *get_raw_sock(str
  srcp  = inet->num;
 
sprintf(tmpbuf, "%4d: %08X:%04X %08X:%04X"
-   " %02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %p",
+   " %02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %p %d",
i, src, srcp, dest, destp, sp->sk_state,
atomic_read(&sp->sk_wmem_alloc),
atomic_read(&sp->sk_rmem_alloc),
0, 0L, 0, sock_i_uid(sp), 0, sock_i_ino(sp),
-   atomic_read(&sp->sk_refcnt), sp);
+   atomic_read(&sp->sk_refcnt), sp, atomic_read(&sp->sk_drops));
return tmpbuf;
 }
 
+#define TMPSZ 128
+
 static int raw_seq_show(struct seq_file *seq, void *v)
 {
-   char tmpbuf[129];
+   char tmpbuf[TMPSZ+1];
 
if (v == SEQ_START_TOKEN)
-   seq_printf(seq, "%-127s\n",
+   seq_printf(seq, "%-*s\n", TMPSZ-1,
   "  sl  local_address rem_address   st tx_queue "
   "rx_queue tr tm->when retrnsmt   uid  timeout "
-  "inode");
+  "inode  drops");
else {
struct raw_iter_state *state = ra

[PATCH] bonding: Documentation update

2007-11-13 Thread Jay Vosburgh

Update the bonding documentation: more discussion on
initialization and configuration, changes to discussion of packet
reordering in balance-rr, update some out of date information.

Based in part on input from Rick Jones <[EMAIL PROTECTED]> and
Andy Gospodarek <[EMAIL PROTECTED]>.

Signed-off-by: Jay Vosburgh <[EMAIL PROTECTED]>

---

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 1134062..eda0f06 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -1,7 +1,7 @@
 
Linux Ethernet Bonding Driver HOWTO
 
-   Latest update: 24 April 2006
+   Latest update: 12 November 2007
 
 Initial release : Thomas Davis 
 Corrections, HA extensions : 2000/10/03-15 :
@@ -166,12 +166,17 @@ to use ifenslave.
 2. Bonding Driver Options
 =
 
-   Options for the bonding driver are supplied as parameters to
-the bonding module at load time.  They may be given as command line
-arguments to the insmod or modprobe command, but are usually specified
-in either the /etc/modules.conf or /etc/modprobe.conf configuration
-file, or in a distro-specific configuration file (some of which are
-detailed in the next section).
+   Options for the bonding driver are supplied as parameters to the
+bonding module at load time, or are specified via sysfs.
+
+   Module options may be given as command line arguments to the
+insmod or modprobe command, but are usually specified in either the
+/etc/modules.conf or /etc/modprobe.conf configuration file, or in a
+distro-specific configuration file (some of which are detailed in the next
+section).
+
+   Details on bonding support for sysfs is provided in the
+"Configuring Bonding Manually via Sysfs" section, below.
 
The available bonding driver parameters are listed below. If a
 parameter is not specified the default value is used.  When initially
@@ -787,11 +792,13 @@ the system /etc/modules.conf or /etc/modprobe.conf configuration file.
 3.2 Configuration with Initscripts Support
 --
 
-   This section applies to distros using a version of initscripts
-with bonding support, for example, Red Hat Linux 9 or Red Hat
-Enterprise Linux version 3 or 4.  On these systems, the network
-initialization scripts have some knowledge of bonding, and can be
-configured to control bonding devices.
+   This section applies to distros using a recent version of
+initscripts with bonding support, for example, Red Hat Enterprise Linux
+version 3 or later, Fedora, etc.  On these systems, the network
+initialization scripts have knowledge of bonding, and can be configured to
+control bonding devices.  Note that older versions of the initscripts
+package have lower levels of support for bonding; this will be noted where
+applicable.
 
These distros will not automatically load the network adapter
 driver unless the ethX device is configured with an IP address.
@@ -839,11 +846,31 @@ USERCTL=no
Be sure to change the networking specific lines (IPADDR,
 NETMASK, NETWORK and BROADCAST) to match your network configuration.
 
-   Finally, it is necessary to edit /etc/modules.conf (or
-/etc/modprobe.conf, depending upon your distro) to load the bonding
-module with your desired options when the bond0 interface is brought
-up.  The following lines in /etc/modules.conf (or modprobe.conf) will
-load the bonding module, and select its options:
+   For later versions of initscripts, such as that found with Fedora
+7 and Red Hat Enterprise Linux version 5 (or later), it is possible, and,
+indeed, preferable, to specify the bonding options in the ifcfg-bond0
+file, e.g. a line of the format:
+
+BONDING_OPTS="mode=active-backup arp_interval=60 arp_ip_target=+192.168.1.254"
+
+   will configure the bond with the specified options.  The options
+specified in BONDING_OPTS are identical to the bonding module parameters
+except for the arp_ip_target field.  Each target should be included as a
+separate option and should be preceded by a '+' to indicate it should be
+added to the list of queried targets, e.g.,
+
+   arp_ip_target=+192.168.1.1 arp_ip_target=+192.168.1.2
+
+   is the proper syntax to specify multiple targets.  When specifying
+options via BONDING_OPTS, it is not necessary to edit /etc/modules.conf or
+/etc/modprobe.conf.
+
+   For older versions of initscripts that do not support
+BONDING_OPTS, it is necessary to edit /etc/modules.conf (or
+/etc/modprobe.conf, depending upon your distro) to load the bonding module
+with your desired options when the bond0 interface is brought up.  The
+following lines in /etc/modules.conf (or modprobe.conf) will load the
+bonding module, and select its options:
 
 alias bond0 bonding
 options bond0 mode=balance-alb miimon=100
@@ -858,9 +885,10 @@ up and running.
 3.2.1 Using DHCP with Initscripts
 -

[PATCH 4/5] memory limitation by using udp_mem

2007-11-13 Thread Hideo AOKI

This patch introduces memory limitation for UDP.

signed-off-by: Satoshi Oshima <[EMAIL PROTECTED]>
signed-off-by: Hideo Aoki <[EMAIL PROTECTED]>
---

 Documentation/networking/ip-sysctl.txt |6 
 include/net/udp.h  |3 ++
 net/ipv4/af_inet.c |3 ++
 net/ipv4/ip_output.c   |   47 ++---
 net/ipv4/sysctl_net_ipv4.c |   11 +++
 net/ipv4/udp.c |   24 
 6 files changed, 91 insertions(+), 3 deletions(-)

diff -pruN net-2.6-udp-p3/Documentation/networking/ip-sysctl.txt net-2.6-udp-p4/Documentation/networking/ip-sysctl.txt
--- net-2.6-udp-p3/Documentation/networking/ip-sysctl.txt   2007-11-13 08:19:30.0 -0500
+++ net-2.6-udp-p4/Documentation/networking/ip-sysctl.txt   2007-11-13 16:12:26.0 -0500
@@ -446,6 +446,12 @@ tcp_dma_copybreak - INTEGER
and CONFIG_NET_DMA is enabled.
Default: 4096

+UDP variables:
+
+udp_mem - INTEGER
+   Number of pages allowed for queueing by all UDP sockets.
+   Default is calculated at boot time from amount of available memory.
+
 CIPSOv4 Variables:

 cipso_cache_enable - BOOLEAN
diff -pruN net-2.6-udp-p3/include/net/udp.h net-2.6-udp-p4/include/net/udp.h
--- net-2.6-udp-p3/include/net/udp.h2007-11-13 16:10:05.0 -0500
+++ net-2.6-udp-p4/include/net/udp.h2007-11-13 16:12:26.0 -0500
@@ -66,6 +66,7 @@ extern rwlock_t udp_hash_lock;
 extern struct proto udp_prot;

 extern atomic_t udp_memory_allocated;
+extern int sysctl_udp_mem;

 struct sk_buff;

@@ -175,4 +176,6 @@ extern void udp_proc_unregister(struct u
 extern int  udp4_proc_init(void);
 extern void udp4_proc_exit(void);
 #endif
+
+extern void udp_init(void);
 #endif /* _UDP_H */
diff -pruN net-2.6-udp-p3/net/ipv4/af_inet.c net-2.6-udp-p4/net/ipv4/af_inet.c
--- net-2.6-udp-p3/net/ipv4/af_inet.c   2007-11-13 16:12:24.0 -0500
+++ net-2.6-udp-p4/net/ipv4/af_inet.c   2007-11-13 16:12:26.0 -0500
@@ -1446,6 +1446,9 @@ static int __init inet_init(void)
/* Setup TCP slab cache for open requests. */
tcp_init();

+   /* Setup UDP memory threshold */
+   udp_init();
+
/* Add UDP-Lite (RFC 3828) */
udplite4_register();

diff -pruN net-2.6-udp-p3/net/ipv4/ip_output.c net-2.6-udp-p4/net/ipv4/ip_output.c
--- net-2.6-udp-p3/net/ipv4/ip_output.c 2007-11-13 16:12:24.0 -0500
+++ net-2.6-udp-p4/net/ipv4/ip_output.c 2007-11-13 16:12:26.0 -0500
@@ -75,6 +75,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -699,6 +700,20 @@ csum_page(struct page *page, int offset,
return csum;
 }

+static inline int __ip_check_max_skb_pages(struct sock *sk, int size)
+{
+   switch(sk->sk_protocol) {
+   case IPPROTO_UDP:
+   if (atomic_read(sk->sk_prot->memory_allocated) + size
+   > sk->sk_prot->sysctl_mem[0])
+   return -ENOBUFS;
+   /* Fall through */  
+   default:
+   break;
+   }
+   return 0;
+}
+
 static inline int ip_ufo_append_data(struct sock *sk,
int getfrag(void *from, char *to, int offset, int len,
   int odd, struct sk_buff *skb),
@@ -707,16 +722,20 @@ static inline int ip_ufo_append_data(str
 {
struct sk_buff *skb;
int err;
+   int size = 0;

/* There is support for UDP fragmentation offload by network
 * device, so create one single skb packet containing complete
 * udp datagram
 */
if ((skb = skb_peek_tail(&sk->sk_write_queue)) == NULL) {
-   skb = sock_alloc_send_skb(sk,
-   hh_len + fragheaderlen + transhdrlen + 20,
-   (flags & MSG_DONTWAIT), &err);
+   size = hh_len + fragheaderlen + transhdrlen + 20;
+   err = __ip_check_max_skb_pages(sk, sk_datagram_pages(size));
+   if (err)
+   return err;

+   skb = sock_alloc_send_skb(sk, size, (flags & MSG_DONTWAIT),
+ &err);
if (skb == NULL)
return err;

@@ -737,6 +756,10 @@ static inline int ip_ufo_append_data(str
sk->sk_sndmsg_off = 0;
}

+   err = __ip_check_max_skb_pages(sk, sk_datagram_pages(size + length -
+transhdrlen));
+   if (err)
+   goto fail;
err = skb_append_datato_frags(sk,skb, getfrag, from,
   (length - transhdrlen));
if (!err) {
@@ -752,6 +775,7 @@ static inline int ip_ufo_append_data(str
/* There is not enough support do UFO ,
 * so follow normal path
 */
+fail:
kfree_skb(skb);
return err;
 }
@@ -910,6 +934,12 @@ alloc_new_skb:
if (datalen == le

[PATCH 5/5] add udp_rmem_min and udp_wmem_min

2007-11-13 Thread Hideo AOKI

This patch adds /proc/sys/net/ipv4/udp_rmem_min and
/proc/sys/net/ipv4/udp_wmem_min. A UDP packet is dropped when the
number of pages used for socket buffers exceeds the limit and the socket
already consumes its minimum buffer size.

Cc: Satoshi Oshima <[EMAIL PROTECTED]>
signed-off-by: Hideo Aoki <[EMAIL PROTECTED]>
---

 Documentation/networking/ip-sysctl.txt |   12 
 include/net/udp.h  |4 
 net/ipv4/ip_output.c   |4 +++-
 net/ipv4/sysctl_net_ipv4.c |   20 
 net/ipv4/udp.c |   13 +++--
 5 files changed, 50 insertions(+), 3 deletions(-)

diff -pruN net-2.6-udp-p4/Documentation/networking/ip-sysctl.txt net-2.6-udp-p5/Documentation/networking/ip-sysctl.txt
--- net-2.6-udp-p4/Documentation/networking/ip-sysctl.txt   2007-11-13 16:12:26.0 -0500
+++ net-2.6-udp-p5/Documentation/networking/ip-sysctl.txt   2007-11-13 16:12:35.0 -0500
@@ -452,6 +452,18 @@ udp_mem - INTEGER
Number of pages allowed for queueing by all UDP sockets.
Default is calculated at boot time from amount of available memory.

+udp_rmem_min - INTEGER
+   Minimal size of receive buffer used by UDP sockets. Each UDP socket
+   is able to use the size for receiving data, even if total pages of UDP
+   sockets exceed udp_mem. The unit is byte.
+   Default: 4096
+
+udp_wmem_min - INTEGER
+   Minimal size of send buffer used by UDP sockets. Each UDP socket is
+   able to use the size for sending data, even if total pages of UDP
+   sockets exceed udp_mem. The unit is byte.
+   Default: 4096
+
 CIPSOv4 Variables:

 cipso_cache_enable - BOOLEAN
diff -pruN net-2.6-udp-p4/include/net/udp.h net-2.6-udp-p5/include/net/udp.h
--- net-2.6-udp-p4/include/net/udp.h2007-11-13 16:12:26.0 -0500
+++ net-2.6-udp-p5/include/net/udp.h2007-11-13 16:12:35.0 -0500
@@ -66,7 +66,11 @@ extern rwlock_t udp_hash_lock;
 extern struct proto udp_prot;

 extern atomic_t udp_memory_allocated;
+
+/* sysctl variables for udp */
 extern int sysctl_udp_mem;
+extern int sysctl_udp_rmem_min;
+extern int sysctl_udp_wmem_min;

 struct sk_buff;

diff -pruN net-2.6-udp-p4/net/ipv4/ip_output.c net-2.6-udp-p5/net/ipv4/ip_output.c
--- net-2.6-udp-p4/net/ipv4/ip_output.c 2007-11-13 16:12:26.0 -0500
+++ net-2.6-udp-p5/net/ipv4/ip_output.c 2007-11-13 16:12:35.0 -0500
@@ -705,7 +705,9 @@ static inline int __ip_check_max_skb_pag
switch(sk->sk_protocol) {
case IPPROTO_UDP:
if (atomic_read(sk->sk_prot->memory_allocated) + size
-   > sk->sk_prot->sysctl_mem[0])
+   > sk->sk_prot->sysctl_mem[0] &&
+   atomic_read(&sk->sk_wmem_alloc) + size
+   > sk->sk_prot->sysctl_wmem[0])
return -ENOBUFS;
/* Fall through */  
default:
diff -pruN net-2.6-udp-p4/net/ipv4/sysctl_net_ipv4.c net-2.6-udp-p5/net/ipv4/sysctl_net_ipv4.c
--- net-2.6-udp-p4/net/ipv4/sysctl_net_ipv4.c   2007-11-13 16:12:26.0 -0500
+++ net-2.6-udp-p5/net/ipv4/sysctl_net_ipv4.c   2007-11-13 16:12:35.0 -0500
@@ -896,6 +896,26 @@ ctl_table ipv4_table[] = {
.strategy   = &sysctl_intvec,
.extra1 = &zero
},
+   {
+   .ctl_name   = CTL_UNNUMBERED,
+   .procname   = "udp_rmem_min",
+   .data   = &sysctl_udp_rmem_min,
+   .maxlen = sizeof(sysctl_udp_rmem_min),
+   .mode   = 0644,
+   .proc_handler   = &proc_dointvec_minmax,
+   .strategy   = &sysctl_intvec,
+   .extra1 = &zero
+   },
+   {
+   .ctl_name   = CTL_UNNUMBERED,
+   .procname   = "udp_wmem_min",
+   .data   = &sysctl_udp_wmem_min,
+   .maxlen = sizeof(sysctl_udp_wmem_min),
+   .mode   = 0644,
+   .proc_handler   = &proc_dointvec_minmax,
+   .strategy   = &sysctl_intvec,
+   .extra1 = &zero
+   },
{ .ctl_name = 0 }
 };

diff -pruN net-2.6-udp-p4/net/ipv4/udp.c net-2.6-udp-p5/net/ipv4/udp.c
--- net-2.6-udp-p4/net/ipv4/udp.c   2007-11-13 16:12:26.0 -0500
+++ net-2.6-udp-p5/net/ipv4/udp.c   2007-11-13 16:12:35.0 -0500
@@ -117,6 +117,8 @@ DEFINE_RWLOCK(udp_hash_lock);

 atomic_t udp_memory_allocated;
 int sysctl_udp_mem __read_mostly;
+int sysctl_udp_rmem_min __read_mostly;
+int sysctl_udp_wmem_min __read_mostly;

 static inline int __udp_lib_lport_inuse(__u16 num,
const struct hlist_head udptable[])
@@ -1026,8 +1028,10 @@ int udp_queue_rcv_skb(struct sock * sk,
}

if ((atomic_read(sk->sk_prot->memory_allocated)
-  + sk_datagram_pages(skb->truesize))
-

[PATCH 2/5] accounting unit and variable

2007-11-13 Thread Hideo AOKI

This patch introduces a global variable for UDP memory accounting.
The accounting unit is the page.

signed-off-by: Satoshi Oshima <[EMAIL PROTECTED]>
signed-off-by: Hideo Aoki <[EMAIL PROTECTED]>
---

 include/net/sock.h |7 +++
 include/net/udp.h  |2 ++
 net/ipv4/proc.c|3 ++-
 net/ipv4/udp.c |2 ++
 4 files changed, 13 insertions(+), 1 deletion(-)

diff -pruN net-2.6-udp-p1/include/net/sock.h net-2.6-udp-p2/include/net/sock.h
--- net-2.6-udp-p1/include/net/sock.h   2007-11-13 08:19:56.0 -0500
+++ net-2.6-udp-p2/include/net/sock.h   2007-11-13 16:10:05.0 -0500
@@ -778,6 +778,13 @@ static inline int sk_stream_wmem_schedul
   sk_stream_mem_schedule(sk, size, 0);
 }

+#define SK_DATAGRAM_MEM_QUANTUM ((int)PAGE_SIZE)
+
+static inline int sk_datagram_pages(int amt)
+{
+   return DIV_ROUND_UP(amt, SK_DATAGRAM_MEM_QUANTUM);
+}
+
 /* Used by processes to "lock" a socket state, so that
  * interrupts and bottom half handlers won't change it
  * from under us. It essentially blocks any incoming
diff -pruN net-2.6-udp-p1/include/net/udp.h net-2.6-udp-p2/include/net/udp.h
--- net-2.6-udp-p1/include/net/udp.h2007-11-13 08:19:56.0 -0500
+++ net-2.6-udp-p2/include/net/udp.h2007-11-13 16:10:05.0 -0500
@@ -65,6 +65,8 @@ extern rwlock_t udp_hash_lock;

 extern struct proto udp_prot;

+extern atomic_t udp_memory_allocated;
+
 struct sk_buff;

 /*
diff -pruN net-2.6-udp-p1/net/ipv4/proc.c net-2.6-udp-p2/net/ipv4/proc.c
--- net-2.6-udp-p1/net/ipv4/proc.c  2007-11-13 08:19:57.0 -0500
+++ net-2.6-udp-p2/net/ipv4/proc.c  2007-11-13 16:11:48.0 -0500
@@ -56,7 +56,8 @@ static int sockstat_seq_show(struct seq_
   sock_prot_inuse(&tcp_prot), atomic_read(&tcp_orphan_count),
   tcp_death_row.tw_count, atomic_read(&tcp_sockets_allocated),
   atomic_read(&tcp_memory_allocated));
-   seq_printf(seq, "UDP: inuse %d\n", sock_prot_inuse(&udp_prot));
+   seq_printf(seq, "UDP: inuse %d mem %d\n", sock_prot_inuse(&udp_prot),
+  atomic_read(&udp_memory_allocated));
seq_printf(seq, "UDPLITE: inuse %d\n", sock_prot_inuse(&udplite_prot));
seq_printf(seq, "RAW: inuse %d\n", sock_prot_inuse(&raw_prot));
seq_printf(seq,  "FRAG: inuse %d memory %d\n",
diff -pruN net-2.6-udp-p1/net/ipv4/udp.c net-2.6-udp-p2/net/ipv4/udp.c
--- net-2.6-udp-p1/net/ipv4/udp.c   2007-11-13 08:19:57.0 -0500
+++ net-2.6-udp-p2/net/ipv4/udp.c   2007-11-13 16:10:05.0 -0500
@@ -114,6 +114,8 @@ DEFINE_SNMP_STAT(struct udp_mib, udp_sta
 struct hlist_head udp_hash[UDP_HTABLE_SIZE];
 DEFINE_RWLOCK(udp_hash_lock);

+atomic_t udp_memory_allocated;
+
 static inline int __udp_lib_lport_inuse(__u16 num,
const struct hlist_head udptable[])
 {
--
Hideo Aoki
Hitachi Computer Products (America) Inc.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/5] memory accounting

2007-11-13 Thread Hideo AOKI

This patch adds UDP memory usage accounting in IPv4.

signed-off-by: Satoshi Oshima <[EMAIL PROTECTED]>
signed-off-by: Hideo Aoki <[EMAIL PROTECTED]>
---

 af_inet.c   |   30 +-
 ip_output.c |   25 ++---
 udp.c   |9 +
 3 files changed, 60 insertions(+), 4 deletions(-)

diff -pruN net-2.6-udp-p2/net/ipv4/af_inet.c net-2.6-udp-p3/net/ipv4/af_inet.c
--- net-2.6-udp-p2/net/ipv4/af_inet.c   2007-11-13 08:19:57.0 -0500
+++ net-2.6-udp-p3/net/ipv4/af_inet.c   2007-11-13 16:12:24.0 -0500
@@ -126,13 +126,41 @@ extern void ip_mc_drop_socket(struct soc
 static struct list_head inetsw[SOCK_MAX];
 static DEFINE_SPINLOCK(inetsw_lock);

+/**
+ * __skb_queue_purge_and_sub_memory_allocated
+ * - empty a list and subtract memory allocation counter
+ * @sk:   sk
+ * @list: list to empty
+ * Delete all buffers on an &sk_buff list and subtract the
+ * truesize of the sk_buff for memory accounting. Each buffer
+ * is removed from the list and one reference dropped. This
+ * function does not take the list lock and the caller must
+ * hold the relevant locks to use it.
+ */
+static inline void __skb_queue_purge_and_sub_memory_allocated(struct sock *sk,
+   struct sk_buff_head *list)
+{
+   struct sk_buff *skb;
+   int purged_skb_size = 0;
+   while ((skb = __skb_dequeue(list)) != NULL) {
+   purged_skb_size += sk_datagram_pages(skb->truesize);
+   kfree_skb(skb);
+   }
+   atomic_sub(purged_skb_size, sk->sk_prot->memory_allocated);
+}
+
 /* New destruction routine */

 void inet_sock_destruct(struct sock *sk)
 {
struct inet_sock *inet = inet_sk(sk);

-   __skb_queue_purge(&sk->sk_receive_queue);
+   if (sk->sk_prot->memory_allocated && sk->sk_type != SOCK_STREAM)
+   __skb_queue_purge_and_sub_memory_allocated(sk,
+   &sk->sk_receive_queue);
+   else
+   __skb_queue_purge(&sk->sk_receive_queue);
+
__skb_queue_purge(&sk->sk_error_queue);

if (sk->sk_type == SOCK_STREAM && sk->sk_state != TCP_CLOSE) {
diff -pruN net-2.6-udp-p2/net/ipv4/ip_output.c net-2.6-udp-p3/net/ipv4/ip_output.c
--- net-2.6-udp-p2/net/ipv4/ip_output.c 2007-11-13 16:10:03.0 -0500
+++ net-2.6-udp-p3/net/ipv4/ip_output.c 2007-11-13 16:12:24.0 -0500
@@ -743,6 +743,8 @@ static inline int ip_ufo_append_data(str
/* specify the length of each IP datagram fragment*/
skb_shinfo(skb)->gso_size = mtu - fragheaderlen;
skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
+   atomic_add(sk_datagram_pages(skb->truesize),
+  sk->sk_prot->memory_allocated);
__skb_queue_tail(&sk->sk_write_queue, skb);

return 0;
@@ -924,6 +926,9 @@ alloc_new_skb:
}
if (skb == NULL)
goto error;
+   if (sk->sk_prot->memory_allocated)
+   atomic_add(sk_datagram_pages(skb->truesize),
+  sk->sk_prot->memory_allocated);

/*
 *  Fill in the control structures
@@ -1023,6 +1028,8 @@ alloc_new_skb:
frag = &skb_shinfo(skb)->frags[i];
skb->truesize += PAGE_SIZE;
atomic_add(PAGE_SIZE, &sk->sk_wmem_alloc);
+   if (sk->sk_prot->memory_allocated)
+   atomic_inc(sk->sk_prot->memory_allocated);
} else {
err = -EMSGSIZE;
goto error;
@@ -1123,7 +1130,9 @@ ssize_t   ip_append_page(struct sock *sk,
if (unlikely(!skb)) {
err = -ENOBUFS;
goto error;
-   }
+   } else if (sk->sk_prot->memory_allocated)
+   atomic_add(sk_datagram_pages(skb->truesize),
+  sk->sk_prot->memory_allocated);

/*
 *  Fill in the control structures
@@ -1213,13 +1222,14 @@ int ip_push_pending_frames(struct sock *
struct iphdr *iph;
__be16 df = 0;
__u8 ttl;
-   int err = 0;
+   int err = 0, send_page_size;

if ((skb = __skb_dequeue(&sk->sk_write_queue)) == NULL)
goto out;
tail_skb = &(skb_shinfo(skb)->frag_list);

/* move skb->data to ip header from ext header */
+   send_page_size = sk_datagram_pages(skb->truesize);
if (skb->data < skb_network_header(skb))
__skb_pull(skb, skb_network_offset(skb));
while ((tmp_skb = __skb_dequeue(&sk-

[PATCH 1/5] fix send buffer check

2007-11-13 Thread Hideo AOKI

This patch introduces a sndbuf size check before memory is allocated for
the send buffer.

signed-off-by: Satoshi Oshima <[EMAIL PROTECTED]>
signed-off-by: Hideo Aoki <[EMAIL PROTECTED]>
---

 ip_output.c |5 +
 1 file changed, 5 insertions(+)

diff -pruN net-2.6/net/ipv4/ip_output.c net-2.6-udp-p1/net/ipv4/ip_output.c
--- net-2.6/net/ipv4/ip_output.c2007-11-13 08:19:57.0 -0500
+++ net-2.6-udp-p1/net/ipv4/ip_output.c 2007-11-13 16:10:03.0 -0500
@@ -1004,6 +1004,11 @@ alloc_new_skb:
frag = &skb_shinfo(skb)->frags[i];
}
} else if (i < MAX_SKB_FRAGS) {
+   if (atomic_read(&sk->sk_wmem_alloc) + PAGE_SIZE
+   > 2 * sk->sk_sndbuf) {
+   err = -ENOBUFS;
+   goto error;
+   }
if (copy > PAGE_SIZE)
copy = PAGE_SIZE;
page = alloc_pages(sk->sk_allocation, 0);
--
Hideo Aoki
Hitachi Computer Products (America) Inc.


[PATCH 0/5] UDP memory accounting and limitation (take 7)

2007-11-13 Thread Hideo AOKI

Hello,

I updated the UDP memory accounting and limitation patch set.

Following the comments on the previous take, I renamed udp_[rw]mem to
udp_[rw]mem_min. This patch set is for net-2.6.


Changelog take 6 -> take 7:
 * renamed /proc/sys/net/ipv4/udp_rmem to
   /proc/sys/net/ipv4/udp_rmem_min
 * renamed /proc/sys/net/ipv4/udp_wmem to
   /proc/sys/net/ipv4/udp_wmem_min
 * rebased to net-2.6


Changelog take 5 -> take 6:

 * removed minimal limit of /proc/sys/net/ipv4/udp_mem
 * added udp_init() for default value calculation of parameters
 * added /proc/sys/net/ipv4/udp_rmem and
   /proc/sys/net/ipv4/udp_wmem
 * added limitation code to ip_ufo_append_data()
 * improved accounting for receiving packet
 * fixed typos
 * rebased to 2.6.24-rc1


Changelog take 4 -> take 5:

 * removing unnecessary EXPORT_SYMBOLs
 * adding minimal limit of /proc/sys/net/ipv4/udp_mem
 * bugfix of UDP limit affecting protocol other than UDP
 * introducing __ip_check_max_skb_pages()
 * using CTL_UNNUMBERED
 * adding udp_mem usage to Documentation/networking/ip-sysctl.txt


Best regards,
Hideo Aoki

--
Hitachi Computer Products (America) Inc.


Re: [BUG] New Kernel Bugs

2007-11-13 Thread Andrew Morton
On Tue, 13 Nov 2007 17:55:51 -0800 (PST) David Miller <[EMAIL PROTECTED]> wrote:

> I've created [EMAIL PROTECTED]

Let me just say - I'm astonished at how little spam gets through the vger
lists.  Considering how many times those email addresses must have been
added to spam databases.

It must be a lot of work, and whoever is doing it does it well.

I don't even know.  Is it Matti?  You?




Re: [PATCH] - [2/15] - remove defconfig ptr comparisons to 0 - drivers/net

2007-11-13 Thread Michael Chan
On Tue, 2007-11-13 at 18:04 -0800, Joe Perches wrote:
> Remove defconfig ptr comparison to 0
> 
> Remove sparse warning: Using plain integer as NULL pointer
> 
> Signed-off-by: Joe Perches <[EMAIL PROTECTED]>
> 

Acked-by: Michael Chan <[EMAIL PROTECTED]>

Thanks.




Re: [BUG] New Kernel Bugs

2007-11-13 Thread Andrew Morton
On Tue, 13 Nov 2007 17:11:36 -0800 Stephen Hemminger <[EMAIL PROTECTED]> wrote:

> On Tue, 13 Nov 2007 19:52:17 -0500
> Chuck Ebbert <[EMAIL PROTECTED]> wrote:
> 
> > On 11/13/2007 04:12 PM, Alan Cox wrote:
> > >> Bug fixing is not about finding someone to blame, it's about getting the 
> > >> bug fixed.
> > > 
> > > Partly - its also about understanding why the bug occurred and making it
> > > not happen again.
> > 
> > Very few people think about that part.
> 
> Why does the kernel have very few useful tests?

Tests would of course be nice, but they aren't very useful(!)

Looking at this list which Natalie has generated I see around thirty which
are dependent on the right hardware and ten which are not.  This ratio is
typical, I think.  In fact I'd say that more than 75% of reported bugs are
dependent on hardware.

So the best test of all for the kernel is "run it on a different machine". 
This is why we are so dependent upon our volunteer testers/reporters to
be able to do kernel development.

>  Lack of interest? resources? expertise?
> Ideally each new feature would just be a small add on to an existing test.

Sure.  For system-call-visible features it would be good to do that.

But this tends not to be where bugs get exposed.  Because the original
developer can 100% exercise such code.  That isn't the case with
driver/arch/platform changes.

> Unlike developing new features which seems to grow well with more developers.
> Bug fixing also seems to be a scarcity process. There often seems to be
> a very few people that understand the problem well enough or have the
> necessary hardware to reproduce and fix the problem.

We're 100% dead if "having the hardware" is a prerequisite to fixing a bug.
The terminal state there is that the kernel runs on about 200 machines
worldwide.  We have to work with reporters via email to fix these sorts of
things.  As we of course do.

> Recent changes like tickless and scheduler rework were well thought out and
> caused very little impact to 90% of the users. The problem is the 10% who do
> have problems. Worse, the developers often only hear about a small sample of
> those.

Yes.  An unknown number of people just shrug and go back to an old kernel.



[PATCH] - [15/15] - remove defconfig ptr comparisons to 0 - net/sunrpc

2007-11-13 Thread Joe Perches
Remove defconfig ptr comparison to 0

Remove sparse warning: Using plain integer as NULL pointer

Signed-off-by: Joe Perches <[EMAIL PROTECTED]>

---

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 76be83e..1600df2 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -157,7 +157,7 @@ static struct rpc_clnt * rpc_new_client(struct rpc_xprt *xprt, char *servname, s
clnt->cl_server = clnt->cl_inline_name;
if (len > sizeof(clnt->cl_inline_name)) {
char *buf = kmalloc(len, GFP_KERNEL);
-   if (buf != 0)
+   if (buf)
clnt->cl_server = buf;
else
len = sizeof(clnt->cl_inline_name);




[PATCH] - [2/15] - remove defconfig ptr comparisons to 0 - drivers/net

2007-11-13 Thread Joe Perches
Remove defconfig ptr comparison to 0

Remove sparse warning: Using plain integer as NULL pointer

Signed-off-by: Joe Perches <[EMAIL PROTECTED]>

---

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index da767d3..cec3cb4 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -2495,7 +2495,7 @@ reuse_rx:
}
 
 #ifdef BCM_VLAN
-   if ((status & L2_FHDR_STATUS_L2_VLAN_TAG) && (bp->vlgrp != 0)) {
+   if ((status & L2_FHDR_STATUS_L2_VLAN_TAG) && bp->vlgrp) {
vlan_hwaccel_receive_skb(skb, bp->vlgrp,
rx_hdr->l2_fhdr_vlan_tag);
}
@@ -5134,7 +5134,7 @@ bnx2_start_xmit(struct sk_buff *skb, struct net_device *dev)
vlan_tag_flags |= TX_BD_FLAGS_TCP_UDP_CKSUM;
}
 
-   if (bp->vlgrp != 0 && vlan_tx_tag_present(skb)) {
+   if (bp->vlgrp && vlan_tx_tag_present(skb)) {
vlan_tag_flags |=
(TX_BD_FLAGS_VLAN_TAG | (vlan_tx_tag_get(skb) << 16));
}
diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 4942f7d..b189c47 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -12576,7 +12576,7 @@ static int __devinit tg3_init_one(struct pci_dev *pdev,
tg3reg_len = pci_resource_len(pdev, 2);
 
tp->aperegs = ioremap_nocache(tg3reg_base, tg3reg_len);
-   if (tp->aperegs == 0UL) {
+   if (!tp->aperegs) {
printk(KERN_ERR PFX "Cannot map APE registers, "
   "aborting.\n");
err = -ENOMEM;




Re: 2.6.24-rc2: Network commit causes SLUB performance regression with tbench

2007-11-13 Thread David Miller
From: Nick Piggin <[EMAIL PROTECTED]>
Date: Tue, 13 Nov 2007 22:41:58 +1100

> On Tuesday 13 November 2007 06:44, Christoph Lameter wrote:
> > On Sat, 10 Nov 2007, Nick Piggin wrote:
> > > BTW. your size-2048 kmalloc cache is order-1 in the default setup,
> > > whereas kmalloc(1024) or kmalloc(4096) will be order-0 allocations. And
> > > SLAB also uses order-0 for size-2048. It would be nice if SLUB did the
> > > same...
> >
> > You can try to see the effect that order 0 would have by booting with
> >
> > slub_max_order=0
> 
> Yeah, that didn't help much, but in general I think it would give
> more consistent and reliable behaviour from slub.

Just a note that I'm not ignoring this issue, I just don't have time
to get to it yet.

I suspect the issue is about having a huge skb->data linear area for
TCP sends over loopback.  We're likely getting a much smaller
skb->data linear data area after the patch in question, the rest using
the sk_buff scatterlist pages which are a little bit more expensive to
process.


