Re: [PATCH 2.6.24 1/1] sch_htb: fix too many events situation

2008-02-18 Thread Martin Devera

up to a single jiffy interval and then delay the remainder to another
jiffy.

Signed-off-by: Martin Devera [EMAIL PROTECTED]


I think we would be wise to use something other than loops_per_jiffy.

Depending upon the loop calibration method used by a particular
architecture, it can be one of many different things.

Some platforms don't even make use of it and thus leave it at its


aha, ok, I'm not so informed about cross-platform issues.
I was also thinking about looking at the jiffies value and stopping once
it is startjiffy+2, but with NO_HZ introduced, are jiffies
still incremented?
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: tbench regression in 2.6.25-rc1

2008-02-18 Thread Zhang, Yanmin
On Fri, 2008-02-15 at 15:22 -0800, David Miller wrote:
 From: Eric Dumazet [EMAIL PROTECTED]
 Date: Fri, 15 Feb 2008 15:21:48 +0100
 
  On linux-2.6.25-rc1 x86_64 :
  
  offsetof(struct dst_entry, lastuse)=0xb0
  offsetof(struct dst_entry, __refcnt)=0xb8
  offsetof(struct dst_entry, __use)=0xbc
  offsetof(struct dst_entry, next)=0xc0
  
  So it should be optimal... I don't know why tbench prefers __refcnt being 
  on 0xc0, since in this case lastuse will be on a different cache line...
  
  Each incoming IP packet will need to change lastuse, __refcnt and __use, 
  so keeping them in the same cache line is a win.
  
  I suspect then that even this patch could help tbench, since it avoids 
  writing lastuse...
 
 I think your suspicions are right, and even more so
 it helps to keep __refcnt out of the same cache line
 as input/output/ops which are read-almost-entirely :-)

I think you are right. The issue is these three variables sharing the same
cache line with input/output/ops.
 
 I haven't done an exhaustive analysis, but it seems that
 the write traffic to lastuse and __refcnt are about the
 same.  However if we find that __refcnt gets hit more
 than lastuse in this workload, it explains the regression.
I also think __refcnt is the key. I ran a new test, adding two unsigned longs of
padding before lastuse so that the three members move to the next cache line.
The performance is recovered.
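A userspace sketch of the padding experiment just described, assuming 64-byte cache lines. struct toy_entry and its fields are hypothetical stand-ins for struct dst_entry (uint64_t plays the role of a 64-bit unsigned long); the sketch only shows that two words of padding move the write-hot counters off the line that holds the read-mostly fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumed cache line size */

/* Hypothetical layout: read-mostly fields, then per-packet counters. */
struct toy_entry {
    char     read_mostly[48]; /* stands in for input/output/ops etc. */
    uint64_t lastuse;         /* written on every packet */
    int      refcnt;          /* written on every packet */
    int      use;             /* written on every packet */
};

/* Same fields with two 64-bit words of padding before the hot ones. */
struct toy_entry_padded {
    char     read_mostly[48];
    uint64_t pad[2];
    uint64_t lastuse;
    int      refcnt;
    int      use;
};

_Static_assert(offsetof(struct toy_entry_padded, lastuse) % CACHE_LINE == 0,
               "padding should start lastuse on a fresh cache line");

/* Index of the cache line containing byte offset 'off' (struct assumed
 * to start on a line boundary). */
static size_t cache_line_of(size_t off)
{
    return off / CACHE_LINE;
}
```

Without padding, refcnt shares line 0 with read_mostly; with padding, lastuse, refcnt and use all land in line 1.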

How about below patch? Almost all performance is recovered with the new patch.

Signed-off-by: Zhang Yanmin [EMAIL PROTECTED]

---

--- linux-2.6.25-rc1/include/net/dst.h  2008-02-21 14:33:43.0 +0800
+++ linux-2.6.25-rc1_work/include/net/dst.h 2008-02-21 14:36:22.0 +0800
@@ -52,11 +52,10 @@ struct dst_entry
    unsigned short  header_len; /* more space at head required */
    unsigned short  trailer_len;/* space to reserve at tail */
 
-   u32 metrics[RTAX_MAX];
-   struct dst_entry    *path;
-
-   unsigned long   rate_last;  /* rate limiting for ICMP */
    unsigned int    rate_tokens;
+   unsigned long   rate_last;  /* rate limiting for ICMP */
+
+   struct dst_entry    *path;
 
 #ifdef CONFIG_NET_CLS_ROUTE
    __u32   tclassid;
@@ -70,10 +69,12 @@ struct dst_entry
    int (*output)(struct sk_buff*);
 
    struct  dst_ops *ops;
-   
-   unsigned long   lastuse;
+
+   u32 metrics[RTAX_MAX];
+
    atomic_t    __refcnt;   /* client references*/
    int __use;
+   unsigned long   lastuse;
    union {
        struct dst_entry *next;
        struct rtable    *rt_next;




Re: [PATCH 2.6.24 1/1] sch_htb: fix too many events situation

2008-02-18 Thread David Miller
From: Martin Devera [EMAIL PROTECTED]
Date: Mon, 18 Feb 2008 09:03:52 +0100

 aha, ok, I'm not so informed about crossplatform issues.
 I was also thinking about looking at jiffies value and stop once
 it is startjiffy+2, but with NO_HZ introduction, are jiffies
 still incremented ?

There should always be at least one cpu tasked with incrementing
jiffies.  Lots of stuff would break if not :-)



[PATCH resend] virtio_net: Fix oops on early interrupts - introduced by virtio reset code

2008-02-18 Thread Christian Borntraeger
On Monday, 11 February 2008, Anthony Liguori wrote:
 The reset support is in Linus's tree so we should try to push it for -rc2.

You are right. My repository was borked. I will push it to Jeff Garzik. Thanks

Jeff, can you schedule this fix into your network driver updates? Thanks
---


With the latest virtio_reset patches I got the following oops:

Unable to handle kernel pointer dereference at virtual kernel address 

Oops: 0004 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 Not tainted 2.6.24zlive-guest-10577-g63f5307-dirty #168
Process swapper (pid: 0, task: 0f866040, ksp: 0f86fd78)
Krnl PSW : 040410018000 0047598a (skb_recv_done+0x52/0x98)
   R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 EA:3
Krnl GPRS: 0001  0efd0e60 0001
    0f866040  
   008de4c8 1237 1237 0f977dd8
   0020 001132bc 0f977e08 0f977dd8
Krnl Code: 0047597c: e3104034   lg  %r1,48(%r4)
   00475982: b9040001   lgr %r0,%r1
   00475986: b9810003   ogr %r0,%r3
  0047598a: eb1040300030   csg %r1,%r0,48(%r4)
   00475990: a744fff9   brc 4,475982
   00475994: a7110001   tmll%r1,1
   00475998: a7840009   brc 8,4759aa
   0047599c: e340b0b80004   lg  %r4,184(%r11)
Call Trace:
([01500f978000] 0x1500f978000)
 [004779a6] vring_interrupt+0x72/0x88
 [00491d9c] kvm_extint_handler+0x34/0x44
 [0010d2d4] do_extint+0xc0/0xfc
 [00113b5a] ext_no_vtime+0x1c/0x20
 [0010a0b6] cpu_idle+0x21a/0x230
([0010a096] cpu_idle+0x1fa/0x230)
 [0057dfe4] start_secondary+0xa0/0xb4

We must initialize vdev->priv before we use the notify hypercall, as
vdev->priv is used in skb_recv_done. So let's move the assignment of
vdev->priv before we call try_fill_recv.
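The ordering bug can be modelled outside the kernel. In this sketch, toy_vdev, toy_recv_done() and toy_fill_recv() are hypothetical simplifications, not the virtio API; the assumption is that posting receive buffers may invoke the completion callback immediately, which is why priv has to be assigned first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device whose callback may fire while probe is running. */
struct toy_vdev {
    void *priv;                              /* driver state read by callback */
    void (*callback)(struct toy_vdev *vdev);
};

struct toy_netdev {
    int rx_done;                             /* completed receives */
};

/* Completion callback: dereferences vdev->priv, like skb_recv_done(). */
static void toy_recv_done(struct toy_vdev *vdev)
{
    struct toy_netdev *vi = vdev->priv;
    assert(vi != NULL);                      /* in the kernel this was the oops */
    vi->rx_done++;
}

/* Posting buffers can raise an interrupt right away, so the callback
 * may run before probe returns. */
static void toy_fill_recv(struct toy_vdev *vdev)
{
    vdev->callback(vdev);
}

/* Probe with the fix applied: priv is set before buffers are posted. */
static int toy_probe(struct toy_vdev *vdev, struct toy_netdev *vi)
{
    vdev->priv = vi;                         /* moved up, as in the patch */
    vdev->callback = toy_recv_done;
    toy_fill_recv(vdev);                     /* early "interrupt" is now safe */
    return 0;
}
```

Swapping the first and last statements of toy_probe() reproduces the failure as an assertion in toy_recv_done().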

Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
Acked-by: Anthony Liguori [EMAIL PROTECTED]

---
 drivers/net/virtio_net.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: kvm/drivers/net/virtio_net.c
===
--- kvm.orig/drivers/net/virtio_net.c
+++ kvm/drivers/net/virtio_net.c
@@ -361,6 +361,7 @@ static int virtnet_probe(struct virtio_d
netif_napi_add(dev, &vi->napi, virtnet_poll, napi_weight);
vi->dev = dev;
vi->vdev = vdev;
+   vdev->priv = vi;
 
/* We expect two virtqueues, receive then send. */
vi->rvq = vdev->config->find_vq(vdev, 0, skb_recv_done);
@@ -395,7 +396,6 @@ static int virtnet_probe(struct virtio_d
}
 
pr_debug("virtnet: registered device %s\n", dev->name);
-   vdev->priv = vi;
return 0;
 
 unregister:


e1000: Question about polling

2008-02-18 Thread Badalian Vyacheslav

Hello all.

Here is an interesting thing:

I have a PC that does NAT. Bandwidth is about 600 Mbit/s.

It has 4 CPUs (2x Core 2 Duo, HT off, 3.2 GHz).

irqbalance in the kernel is off.

nat2 ~ # cat /proc/irq/217/smp_affinity
0001
nat2 ~ # cat /proc/irq/218/smp_affinity
0003

Softirq (SI) load on CPU0 and CPU1 is about 90%.
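For readers decoding the masks above: smp_affinity holds a hexadecimal CPU bitmask, so 0001 allows only CPU0 and 0003 allows CPU0 and CPU1. A small sketch of that decoding, assuming a single hex word without the comma-separated groups printed on machines with many CPUs; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse an smp_affinity-style hex mask such as "0003".
 * Assumes one hex word, i.e. at most 64 CPUs. */
static unsigned long long affinity_mask(const char *hex)
{
    return strtoull(hex, NULL, 16);
}

/* Nonzero if bit 'cpu' is set, i.e. the IRQ may be delivered there. */
static int cpu_allowed(unsigned long long mask, unsigned int cpu)
{
    assert(cpu < 64);
    return (int)((mask >> cpu) & 1ULL);
}
```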

OK... then I tried:
echo   /proc/irq/217/smp_affinity
echo   /proc/irq/218/smp_affinity

and got 100% SI on CPU0.

Question: why?

I have heard that binding the IRQ of one net device to one CPU can gain about 30%
performance... but I have 4 CPUs, so I should get even more performance by
spreading the IRQs over them via smp_affinity.


The picture looks like this:
CPUs 0-3 each get over 50% SI, bandwidth goes up, 55% SI... bandwidth goes up...
100% SI on CPU0


I remember a patch that fixed a problem like this by changing the function
e1000_clean... The kernel on this PC has that patch (2.6.24-rc7-git2), and the e1000
driver works much better (I get up to 1.5-2x more bandwidth before hitting 100% SI),
but I think it still does not get all the performance it could =)


Thanks for any answers, and sorry for my English




Re: [PATCH] drivers/base: export gpl (un)register_memory_notifier

2008-02-18 Thread Jan-Bernd Themann
switching to proper mail client...

Dave Hansen [EMAIL PROTECTED] wrote on 15.02.2008 17:55:38:

 I've been thinking about that, and I don't think you really *need* to
 keep a comprehensive map like that. 
 
 When the memory is in a particular configuration (range of memory
 present along with unique set of holes) you get a unique ehea_bmap
 configuration.  That layout is completely predictable.
 
 So, if at any time you want to figure out what the ehea_bmap address for
 a particular *Linux* virtual address is, you just need to pretend that
 you're creating the entire ehea_bmap, use the same algorithm and figure
 out how you would have placed things, and use that result.
 
 Now, that's going to be a slow, crappy linear search (but maybe not as
 slow as recreating the silly thing).  So, you might eventually run into
 some scalability problems with a lot of packets going around.  But, I'd
 be curious if you do in practice.

Up to 14 address translations per packet (sg_list) might be required on
the transmit side; on the receive side it is only 1. Most packets require
only very few translations (1 or sometimes more). However, with more than
700,000 packets per second, this approach does not seem reasonable from a
performance perspective when memory is fragmented as you described.

 
 The other idea is that you create a mapping that is precisely 1:1 with
 kernel memory.  Let's say you have two sections present, 0 and 100.  You
 have a high_section_index of 100, and you vmalloc() a 100 entry array.
 
 You need to create a *CONTIGUOUS* ehea map?  Create one like this:
 
 EHEA_VADDR -> Linux Section
 0 -> 0
 1 -> 0
 2 -> 0
 3 -> 0
 ...
 100 -> 100
 
 It's contiguous.  Each area points to a valid Linux memory address.
 It's also discernible in O(1) to what EHEA address a given Linux address
 is mapped.  You just have a couple of duplicate entries. 
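A sketch of the 1:1 map described above, assuming section 0 is always present. struct toy_map and its helpers are hypothetical; holes are filled with the last present section below them, so the map stays contiguous and a present section's map index is simply its own index. toy_map_bytes() shows how much the registered region would have to span.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One slot per possible section index (high_section_index + 1 slots). */
struct toy_map {
    size_t  nr_slots;
    size_t *slot_section;   /* slot -> Linux section backing it */
};

/* present[i] tells whether Linux section i exists; present[0] must be true. */
static void toy_map_build(struct toy_map *map, const bool *present)
{
    size_t last = 0;

    assert(present[0]);
    for (size_t i = 0; i < map->nr_slots; i++) {
        if (present[i])
            last = i;
        map->slot_section[i] = last;   /* holes duplicate the last valid section */
    }
}

/* O(1) lookup: a present section maps to the slot with its own index. */
static size_t toy_map_lookup(const struct toy_map *map, size_t section)
{
    assert(section < map->nr_slots && map->slot_section[section] == section);
    return section;
}

/* Bytes the contiguous map covers, including the duplicated slots. */
static size_t toy_map_bytes(const struct toy_map *map, size_t section_bytes)
{
    return map->nr_slots * section_bytes;
}
```

With sections 0 and 100 present, the map spans 101 sections while only 2 actually exist, which is the size concern raised in the reply.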

This conflicts with a constraint I mentioned in the previous mail:

- MRs can have a maximum size of the memory available under Linux

That requirement is not met here: the memory region must not be
larger than the memory available to that partition. The create MR
H_CALL will fail (we tried this and discussed it with FW development).


Regards,
Jan-Bernd & Christoph


Re: [PATCH 2.6.24 1/1] sch_htb: fix too many events situation

2008-02-18 Thread Martin Devera

David Miller wrote:

From: Martin Devera [EMAIL PROTECTED]
Date: Mon, 18 Feb 2008 09:03:52 +0100


aha, ok, I'm not so informed about crossplatform issues.
I was also thinking about looking at jiffies value and stop once
it is startjiffy+2, but with NO_HZ introduction, are jiffies
still incremented ?


There should always be at least one cpu tasked with incrementing
jiffies.  Lots of stuff would break if not :-)



Aha, OK: so when (at least one) cpu is busy, I can count on
jiffies being incremented via do_timer, can't I?
Then I'd remove the loop limit altogether and instead limit the work to
1 or 2 jiffies to prevent livelock.
Like
max_jiff = jiffies+2; /* not +1 at we could be at +0. now */
while (jiffies < max_jiff) do_hard_potentionaly_long_work();
if (more_work) schedule_to_next_jiffie();

This will keep event queue work load under 66% of system load which
seems reasonable to me.
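A userspace simulation of the proposed bound, for illustration only: struct toy_clock stands in for jiffies, and the assumption that a fixed number of work items fit into one tick is hypothetical. Real kernel code would compare with time_before(jiffies, max_jiff) rather than a plain < so that jiffies wraparound is handled.

```c
#include <assert.h>

/* Simulated clock: 'ticks' plays the role of jiffies. */
struct toy_clock {
    unsigned long ticks;     /* simulated jiffies */
    unsigned int  steps;     /* items done within the current tick */
    unsigned int  per_tick;  /* items that fit in one tick (assumed) */
};

/* Doing one item advances simulated time by 1/per_tick of a tick. */
static void toy_do_one_item(struct toy_clock *c)
{
    if (++c->steps == c->per_tick) {
        c->steps = 0;
        c->ticks++;
    }
}

/* Process items until the clock reaches start + 2 ticks; return how many
 * remain, which the caller would defer to the next tick. */
static unsigned int toy_run_bounded(struct toy_clock *c, unsigned int pending)
{
    unsigned long max_ticks = c->ticks + 2;  /* +2: we may already be mid-tick */

    assert(c->per_tick > 0);
    while (pending && c->ticks < max_ticks) {
        toy_do_one_item(c);
        pending--;
    }
    return pending;
}
```

Starting at the beginning of a tick with 10 items per tick, 100 pending items leave 80 for later; starting halfway through a tick, only 15 run before the bound, which is why the comment in the proposal uses +2 rather than +1.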

Would you accept such a solution?


Re: tbench regression in 2.6.25-rc1

2008-02-18 Thread Eric Dumazet
On Mon, 18 Feb 2008 16:12:38 +0800
Zhang, Yanmin [EMAIL PROTECTED] wrote:

 On Fri, 2008-02-15 at 15:22 -0800, David Miller wrote:
  From: Eric Dumazet [EMAIL PROTECTED]
  Date: Fri, 15 Feb 2008 15:21:48 +0100
  
   On linux-2.6.25-rc1 x86_64 :
   
   offsetof(struct dst_entry, lastuse)=0xb0
   offsetof(struct dst_entry, __refcnt)=0xb8
   offsetof(struct dst_entry, __use)=0xbc
   offsetof(struct dst_entry, next)=0xc0
   
   So it should be optimal... I don't know why tbench prefers __refcnt being 
   on 0xc0, since in this case lastuse will be on a different cache line...
   
   Each incoming IP packet will need to change lastuse, __refcnt and __use, 
   so keeping them in the same cache line is a win.
   
   I suspect then that even this patch could help tbench, since it avoids 
   writing lastuse...
  
  I think your suspicions are right, and even more so
  it helps to keep __refcnt out of the same cache line
  as input/output/ops which are read-almost-entirely :-)
 
 I think you are right. The issue is these three variables sharing the same
 cache line with input/output/ops.
  
  I haven't done an exhaustive analysis, but it seems that
  the write traffic to lastuse and __refcnt are about the
  same.  However if we find that __refcnt gets hit more
  than lastuse in this workload, it explains the regression.
 I also think __refcnt is the key. I ran a new test, adding two unsigned longs of
 padding before lastuse so that the three members move to the next cache line.
 The performance is recovered.
 
 How about below patch? Almost all performance is recovered with the new patch.
 
 Signed-off-by: Zhang Yanmin [EMAIL PROTECTED]
 
 ---
 
 --- linux-2.6.25-rc1/include/net/dst.h  2008-02-21 14:33:43.0 +0800
 +++ linux-2.6.25-rc1_work/include/net/dst.h   2008-02-21 14:36:22.0 +0800
 @@ -52,11 +52,10 @@ struct dst_entry
   unsigned short  header_len; /* more space at head required */
   unsigned short  trailer_len;/* space to reserve at tail */
  
 - u32 metrics[RTAX_MAX];
 - struct dst_entry *path;
 -
 - unsigned long   rate_last;  /* rate limiting for ICMP */
   unsigned int rate_tokens;
 + unsigned long   rate_last;  /* rate limiting for ICMP */
 +
 + struct dst_entry *path;
  
  #ifdef CONFIG_NET_CLS_ROUTE
   __u32   tclassid;
 @@ -70,10 +69,12 @@ struct dst_entry
   int (*output)(struct sk_buff*);
  
   struct  dst_ops *ops;
 - 
 - unsigned long   lastuse;
 +
 + u32 metrics[RTAX_MAX];
 +
   atomic_t __refcnt;   /* client references*/
   int __use;
 + unsigned long   lastuse;
   union {
   struct dst_entry *next;
   struct rtable *rt_next;
 
 

Well, after this patch, we grow dst_entry by 8 bytes :

sizeof(struct dst_entry)=0xd0
offsetof(struct dst_entry, input)=0x68
offsetof(struct dst_entry, output)=0x70
offsetof(struct dst_entry, __refcnt)=0xb4
offsetof(struct dst_entry, lastuse)=0xc0
offsetof(struct dst_entry, __use)=0xb8
sizeof(struct rtable)=0x140


So we dirty two cache lines instead of one, unless your CPU has 128-byte
cache lines?

I am quite surprised that my patch to not change lastuse if it is already set to
jiffies changes nothing...

If you have some time, could you also test this (unrelated) patch?

It avoids dirtying a cache line of the loopback device on every packet.
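The two-lines-versus-one question can be checked with simple arithmetic. The sketch assumes the structure starts on a cache-line boundary and that each field fits inside one line; the offsets used in the check are the ones quoted in this thread for lastuse, __refcnt and __use.

```c
#include <assert.h>
#include <stddef.h>

/* Cache line holding byte offset 'off' for a given line size. */
static size_t line_of(size_t off, size_t line_size)
{
    assert(line_size != 0);
    return off / line_size;
}

/* Distinct lines touched by ascending field offsets, assuming each
 * field fits within a single line. */
static size_t lines_touched(const size_t *offs, size_t n, size_t line_size)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (i == 0 || line_of(offs[i], line_size) != line_of(offs[i - 1], line_size))
            count++;
    return count;
}
```

With 64-byte lines the pre-patch offsets (0xb0, 0xb8, 0xbc) fall in one line and the post-patch offsets (0xb4, 0xb8, 0xc0) span two; with 128-byte lines the post-patch offsets share a single line.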

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index f2a6e71..0a4186a 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -150,7 +150,10 @@ static int loopback_xmit(struct sk_buff *skb, struct net_device *dev)
return 0;
}
 #endif
-   dev->last_rx = jiffies;
+#ifdef CONFIG_SMP
+   if (dev->last_rx != jiffies)
+#endif
+   dev->last_rx = jiffies;
 
/* it's OK to use per_cpu_ptr() because BHs are off */
pcpu_lstats = netdev_priv(dev);
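The idea behind this change, storing only when the value actually differs, can be sketched in userspace. struct toy_dev and its store counter are hypothetical instrumentation; the sketch just counts how many writes are skipped when many packets arrive within the same jiffy. The patch applies the check only under CONFIG_SMP, presumably because there the extra read is cheaper than bouncing the line between CPUs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device with an instrumented store counter. */
struct toy_dev {
    unsigned long last_rx;
    unsigned long stores;   /* counts actual writes, for the sketch only */
};

/* Write last_rx only when it changes, so repeated calls within the
 * same tick leave the cache line clean. */
static void toy_note_rx(struct toy_dev *dev, unsigned long now)
{
    assert(dev != NULL);
    if (dev->last_rx != now) {
        dev->last_rx = now;
        dev->stores++;
    }
}
```

A thousand packets within the same tick cause a single store instead of a thousand.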



Re: [PATCH 2.6.24 1/1] sch_htb: fix too many events situation

2008-02-18 Thread David Miller
From: Martin Devera [EMAIL PROTECTED]
Date: Mon, 18 Feb 2008 11:08:09 +0100

 Like
 max_jiff = jiffies+2; /* not +1 at we could be at +0. now */
 while (jiffies < max_jiff) do_hard_potentionaly_long_work();
 if (more_work) schedule_to_next_jiffie();
 
 This will keep event queue work load under 66% of system load which
 seems reasonable to me.
 
 Would you accept such a solution?

Sure.


[PATCH][IBMVETH]: Use single_open instead of manual manipulations.

2008-02-18 Thread Pavel Emelyanov
The code that opens the proc entry for each device does
the same thing as single_open does, so remove the
unneeded code.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/drivers/net/ibmveth.c b/drivers/net/ibmveth.c
index 57772be..bb31e09 100644
--- a/drivers/net/ibmveth.c
+++ b/drivers/net/ibmveth.c
@@ -1259,26 +1259,7 @@ static void ibmveth_proc_unregister_driver(void)
remove_proc_entry(IBMVETH_PROC_DIR, init_net.proc_net);
 }
 
-static void *ibmveth_seq_start(struct seq_file *seq, loff_t *pos)
-{
-   if (*pos == 0) {
-   return (void *)1;
-   } else {
-   return NULL;
-   }
-}
-
-static void *ibmveth_seq_next(struct seq_file *seq, void *v, loff_t *pos)
-{
-   ++*pos;
-   return NULL;
-}
-
-static void ibmveth_seq_stop(struct seq_file *seq, void *v)
-{
-}
-
-static int ibmveth_seq_show(struct seq_file *seq, void *v)
+static int ibmveth_show(struct seq_file *seq, void *v)
 {
struct ibmveth_adapter *adapter = seq->private;
char *current_mac = ((char*) adapter->netdev->dev_addr);
@@ -1302,27 +1283,10 @@ static int ibmveth_seq_show(struct seq_file *seq, void *v)
 
return 0;
 }
-static struct seq_operations ibmveth_seq_ops = {
-   .start = ibmveth_seq_start,
-   .next  = ibmveth_seq_next,
-   .stop  = ibmveth_seq_stop,
-   .show  = ibmveth_seq_show,
-};
 
 static int ibmveth_proc_open(struct inode *inode, struct file *file)
 {
-   struct seq_file *seq;
-   struct proc_dir_entry *proc;
-   int rc;
-
-   rc = seq_open(file, &ibmveth_seq_ops);
-   if (!rc) {
-   /* recover the pointer buried in proc_dir_entry data */
-   seq = file->private_data;
-   proc = PDE(inode);
-   seq->private = proc->data;
-   }
-   return rc;
+   return single_open(file, ibmveth_show, PDE(inode)->data);
 }
 
 static const struct file_operations ibmveth_proc_fops = {
@@ -1330,7 +1294,7 @@ static const struct file_operations ibmveth_proc_fops = {
.open= ibmveth_proc_open,
.read= seq_read,
.llseek  = seq_lseek,
-   .release = seq_release,
+   .release = single_release,
 };
 
 static void ibmveth_proc_register_adapter(struct ibmveth_adapter *adapter)


[PATCH][IPV6]: Use BUG_ON instead of if + BUG in fib6_del_route.

2008-02-18 Thread Pavel Emelyanov
Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index f93407c..bab72b6 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1151,7 +1151,7 @@ static void fib6_del_route(struct fib6_node *fn, struct rt6_info **rtp,
fn = fn->parent;
}
/* No more references are possible at this point. */
-   if (atomic_read(&rt->rt6i_ref) != 1) BUG();
+   BUG_ON(atomic_read(&rt->rt6i_ref) != 1);
}
 
inet6_rt_notify(RTM_DELROUTE, rt, info);


Re: Linux 2.6.24.1 - kernel does not boot; IRQ trouble?

2008-02-18 Thread Andrew Morton
On Sun, 17 Feb 2008 00:54:08 + (GMT) Chris Rankin [EMAIL PROTECTED] wrote:

 [Try this again, except this time I'll force the attachment as inline text!]
 
 Hi,
 
 I have managed to boot 2.6.24.1 on this machine, with the NMI watchdog
 enabled, by using the acpi=noirq option. (There does seem to be some
 unhappiness with bridge symlinks in sysfs, though.)
 
 ...

 sysfs: duplicate filename 'bridge' can not be created
 WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
 Pid: 1, comm: swapper Not tainted 2.6.24.1 #1
  [c0105020] show_trace_log_lvl+0x1a/0x2f
  [c0105990] show_trace+0x12/0x14
  [c010613d] dump_stack+0x6c/0x72
  [c01991bf] sysfs_add_one+0x57/0xbc
  [c0199e41] sysfs_create_link+0xc2/0x10d
  [c01bae9a] pci_bus_add_devices+0xbd/0x103
  [c034016c] pci_legacy_init+0x56/0xe3
  [c03274e1] kernel_init+0x157/0x2c3
  [c0104c83] kernel_thread_helper+0x7/0x10
  ===
 pci :00:01.0: Error creating sysfs bridge symlink, continuing...
 sysfs: duplicate filename 'bridge' can not be created
 WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
 Pid: 1, comm: swapper Not tainted 2.6.24.1 #1
  [c0105020] show_trace_log_lvl+0x1a/0x2f
  [c0105990] show_trace+0x12/0x14
  [c010613d] dump_stack+0x6c/0x72
  [c01991bf] sysfs_add_one+0x57/0xbc
  [c0199e41] sysfs_create_link+0xc2/0x10d
  [c01bae9a] pci_bus_add_devices+0xbd/0x103
  [c01bae82] pci_bus_add_devices+0xa5/0x103
  [c034016c] pci_legacy_init+0x56/0xe3
  [c03274e1] kernel_init+0x157/0x2c3
  [c0104c83] kernel_thread_helper+0x7/0x10
  ===

I have a vague feeling that this was fixed, perhaps in 2.6.24.x?


Re: [2.6.25-rc2] e100: Trying to free already-free IRQ 11 during suspend ...

2008-02-18 Thread Andrew Morton
On Sun, 17 Feb 2008 15:36:50 +0300 Andrey Borzenkov [EMAIL PROTECTED] wrote:

 ... and possibly reboot/poweroff (it flows by too fast to be legible).
 
 [ 8803.850634] ACPI: Preparing to enter system sleep state S3
 [ 8803.853141] Suspending console(s)
 [ 8805.287505] serial 00:09: disabled
 [ 8805.291564] Trying to free already-free IRQ 11
 [ 8805.291579] Pid: 6920, comm: pm-suspend Not tainted 2.6.25-rc2-1avb #2
 [ 8805.291628]  [c0152127] free_irq+0xb7/0x130
 [ 8805.291675]  [c024bd80] e100_suspend+0xc0/0x100
 [ 8805.291724]  [c01eaa36] pci_device_suspend+0x26/0x70
 [ 8805.291747]  [c0243674] suspend_device+0x94/0xd0
 [ 8805.291763]  [c02439a3] device_suspend+0x153/0x240
 [ 8805.291784]  [c014314f] suspend_devices_and_enter+0x4f/0xf0
 [ 8805.291808]  [c0143a5f] ? freeze_processes+0x3f/0x80
 [ 8805.291825]  [c01432fa] enter_state+0xaa/0x140
 [ 8805.291840]  [c014341f] state_store+0x8f/0xd0
 [ 8805.291852]  [c0143390] ? state_store+0x0/0xd0
 [ 8805.291866]  [c01d3404] kobj_attr_store+0x24/0x30
 [ 8805.291901]  [c01b547b] sysfs_write_file+0xbb/0x110
 [ 8805.291936]  [c0177d79] vfs_write+0x99/0x130
 [ 8805.291963]  [c01b53c0] ? sysfs_write_file+0x0/0x110
 [ 8805.291979]  [c01782fd] sys_write+0x3d/0x70
 [ 8805.291998]  [c010409a] sysenter_past_esp+0x5f/0xa5
 [ 8805.292038]  ===
 [ 8805.347640] ACPI: PCI interrupt for device :00:06.0 disabled
 [ 8805.361128] ACPI: PCI interrupt for device :00:02.0 disabled
 [ 8803.376670]  hwsleep-0322 [00] enter_sleep_state : Entering sleep state [S3]
 [ 8805.376670] Back to C!
 
 Interface is unused normally (only for netconsole sometimes). dmesg and config
 attached.

Does reverting this:

commit 8543da6672b0994921f014f2250e27ae81645580
Author: Auke Kok [EMAIL PROTECTED]
Date:   Wed Dec 12 16:30:42 2007 -0800

    e100: free IRQ to remove warning when rebooting

with this patch:

--- a/drivers/net/e100.c~revert-1
+++ a/drivers/net/e100.c
@@ -2804,9 +2804,8 @@ static int e100_suspend(struct pci_dev *
pci_enable_wake(pdev, PCI_D3cold, 0);
}
 
-   free_irq(pdev->irq, netdev);
-
pci_disable_device(pdev);
+   free_irq(pdev->irq, netdev);
pci_set_power_state(pdev, PCI_D3hot);
 
return 0;
@@ -2848,8 +2847,6 @@ static void e100_shutdown(struct pci_dev
pci_enable_wake(pdev, PCI_D3cold, 0);
}
 
-   free_irq(pdev->irq, netdev);
-
pci_disable_device(pdev);
pci_set_power_state(pdev, PCI_D3hot);
 }
_

fix it?

 Hmm ... after resume the device has disappeared completely ...
 
 {pts/1}% cat /proc/interrupts
             CPU0
    0:    1290492    XT-PIC-XT    timer
    1:       6675    XT-PIC-XT    i8042
    2:          0    XT-PIC-XT    cascade
    3:          2    XT-PIC-XT
    4:          2    XT-PIC-XT
    5:          3    XT-PIC-XT
    7:          4    XT-PIC-XT    irda0
    8:          0    XT-PIC-XT    rtc0
    9:        583    XT-PIC-XT    acpi
   10:          2    XT-PIC-XT
   11:      31483    XT-PIC-XT    yenta, yenta, yenta, ohci_hcd:usb1, ALI 5451, pcmcia0.0
   12:      28070    XT-PIC-XT    i8042
   14:      21705    XT-PIC-XT    ide0
   15:      82123    XT-PIC-XT    ide1
  NMI:          0    Non-maskable interrupts
  TRM:          0    Thermal event interrupts
  SPU:          0    Spurious interrupts
  ERR:          0

I hope that's not a separate bug...


Re: [2.6.25-rc2, 2.6.24-rc8] page allocation failure...

2008-02-18 Thread Andrew Morton
On Sun, 17 Feb 2008 13:20:59 + Daniel J Blueman [EMAIL PROTECTED] wrote:

 I'm still hitting this with e1000e on 2.6.25-rc2, 10 times again.
 
 It's clearly non-fatal, but then do we expect it to occur?
 
 Daniel
 
 --- [dmesg]
 
 [ 1250.822786] swapper: page allocation failure. order:3, mode:0x4020
 [ 1250.822786] Pid: 0, comm: swapper Not tainted 2.6.25-rc2-119 #2
 [ 1250.822786]
 [ 1250.822786] Call Trace:
 [ 1250.822786]  IRQ  [8025fe9e] __alloc_pages+0x34e/0x3a0
 [ 1250.822786]  [8048c6df] ? __netdev_alloc_skb+0x1f/0x40
 [ 1250.822786]  [8027acc2] __slab_alloc+0x102/0x3d0
 [ 1250.822786]  [8048c6df] ? __netdev_alloc_skb+0x1f/0x40
 [ 1250.822786]  [8027b8cb] __kmalloc_track_caller+0x7b/0xc0
 [ 1250.822786]  [8048b74f] __alloc_skb+0x6f/0x160
 [ 1250.822786]  [8048c6df] __netdev_alloc_skb+0x1f/0x40
 [ 1250.822786]  [8042652d] e1000_alloc_rx_buffers+0x1ed/0x260
 [ 1250.822786]  [80426b5a] e1000_clean_rx_irq+0x22a/0x330
 [ 1250.822786]  [80422981] e1000_clean+0x1e1/0x540
 [ 1250.822786]  [8024b7a5] ? tick_program_event+0x45/0x70
 [ 1250.822786]  [804930ba] net_rx_action+0x9a/0x150
 [ 1250.822786]  [802336b4] __do_softirq+0x74/0xf0
 [ 1250.822786]  [8020c5fc] call_softirq+0x1c/0x30
 [ 1250.822786]  [8020eaad] do_softirq+0x3d/0x80
 [ 1250.822786]  [80233635] irq_exit+0x85/0x90
 [ 1250.822786]  [8020eba5] do_IRQ+0x85/0x100
 [ 1250.822786]  [8020a5b0] ? mwait_idle+0x0/0x50
 [ 1250.822786]  [8020b981] ret_from_intr+0x0/0xa
 [ 1250.822786]  EOI  [8020a5f5] ? mwait_idle+0x45/0x50
 [ 1250.822786]  [80209a92] ? enter_idle+0x22/0x30
 [ 1250.822786]  [8020a534] ? cpu_idle+0x74/0xa0
 [ 1250.822786]  [80527825] ? rest_init+0x55/0x60

They're regularly reported with e1000 too - I don't think anything really
changed.

e1000 has this crazy problem where because of a cascade of follies (mainly
borked hardware) it has to do a 32kb allocation for a 9kb(?) packet.  It
would be sad if that was carried over into e1000e?
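For scale, the order:3 in the trace above means 2^3 contiguous pages. A sketch of the order calculation, assuming 4 KiB pages; toy_page_order() is a hypothetical userspace reimplementation of what the kernel's get_order() computes.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL  /* assumption: 4 KiB pages, as on x86 */

/* Smallest order such that (TOY_PAGE_SIZE << order) >= size. */
static unsigned int toy_page_order(size_t size)
{
    unsigned int order = 0;

    assert(size > 0);
    while ((TOY_PAGE_SIZE << order) < size)
        order++;
    return order;
}
```

So a 32 KiB buffer needs an order-3 (8-page) block, while roughly 9 KB would fit in order 2 (16 KiB); in general, higher orders are harder to satisfy from atomic context once memory is fragmented.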



Re: Linux 2.6.24.1 - kernel does not boot; IRQ trouble?

2008-02-18 Thread Chris Rankin
--- Andrew Morton [EMAIL PROTECTED] wrote:
  sysfs: duplicate filename 'bridge' can not be created
  WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
  Pid: 1, comm: swapper Not tainted 2.6.24.1 #1
   [c0105020] show_trace_log_lvl+0x1a/0x2f
   [c0105990] show_trace+0x12/0x14
   [c010613d] dump_stack+0x6c/0x72
   [c01991bf] sysfs_add_one+0x57/0xbc
   [c0199e41] sysfs_create_link+0xc2/0x10d
   [c01bae9a] pci_bus_add_devices+0xbd/0x103
   [c034016c] pci_legacy_init+0x56/0xe3
   [c03274e1] kernel_init+0x157/0x2c3
   [c0104c83] kernel_thread_helper+0x7/0x10
   ===
  pci :00:01.0: Error creating sysfs bridge symlink, continuing...
  sysfs: duplicate filename 'bridge' can not be created
  WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
  Pid: 1, comm: swapper Not tainted 2.6.24.1 #1
   [c0105020] show_trace_log_lvl+0x1a/0x2f
   [c0105990] show_trace+0x12/0x14
   [c010613d] dump_stack+0x6c/0x72
   [c01991bf] sysfs_add_one+0x57/0xbc
   [c0199e41] sysfs_create_link+0xc2/0x10d
   [c01bae9a] pci_bus_add_devices+0xbd/0x103
   [c01bae82] pci_bus_add_devices+0xa5/0x103
   [c034016c] pci_legacy_init+0x56/0xe3
   [c03274e1] kernel_init+0x157/0x2c3
   [c0104c83] kernel_thread_helper+0x7/0x10
   ===
 
 I have a vague feeling that this was fixed, perhaps in 2.6.24.x?

Obviously not in 2.6.24.1, and I thought that 2.6.24.2 just added the fix for the vmsplice
exploit. So unless 2.6.24.3 has been released...?

Cheers,
Chris





[patch 1/1] claw: make use of DIV_ROUND_UP

2008-02-18 Thread Ursula Braun
From: Julia Lawall [EMAIL PROTECTED]

The kernel.h macro DIV_ROUND_UP performs the computation
(((n) + (d) - 1) / (d)) but is perhaps more readable.
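A quick check of the equivalence, with the macro defined using exactly the expression quoted above. buffers_needed() is a hypothetical helper modelled on the claw_hw_tx() hunk; len and write_size stand in for skb->len and privptr->p_env->write_size.

```c
#include <assert.h>

/* Same expression as quoted in the patch description. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Buffers needed to hold 'len' bytes in chunks of 'write_size'. */
static unsigned int buffers_needed(unsigned int len, unsigned int write_size)
{
    assert(write_size != 0);
    return DIV_ROUND_UP(len, write_size);
}
```

One extra byte past a full buffer costs a whole additional buffer, which is exactly what the open-coded (len + size - 1) / size form computed.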

Signed-off-by: Ursula Braun [EMAIL PROTECTED]
---

 drivers/s390/net/claw.c |   39 ++-
 1 file changed, 18 insertions(+), 21 deletions(-)

Index: linux-2.6-uschi/drivers/s390/net/claw.c
===
--- linux-2.6-uschi.orig/drivers/s390/net/claw.c
+++ linux-2.6-uschi/drivers/s390/net/claw.c
@@ -1851,8 +1851,7 @@ claw_hw_tx(struct sk_buff *skb, struct n
 }
 }
 /*  See how many write buffers are required to hold this data */
-numBuffers= ( skb->len + privptr->p_env->write_size - 1) /
-   ( privptr->p_env->write_size);
+   numBuffers = DIV_ROUND_UP(skb->len, privptr->p_env->write_size);
 
 /*  If that number of buffers isn't available, give up for now */
 if (privptr->write_free_count < numBuffers ||
@@ -2114,8 +2113,7 @@ init_ccw_bk(struct net_device *dev)
 */
 ccw_blocks_perpage= PAGE_SIZE /  CCWBK_SIZE;
 ccw_pages_required=
-   (ccw_blocks_required+ccw_blocks_perpage -1) /
-ccw_blocks_perpage;
+   DIV_ROUND_UP(ccw_blocks_required, ccw_blocks_perpage);
 
 #ifdef DEBUGMSG
 printk(KERN_INFO "%s: %s()  ccw_blocks_perpage=%d\n",
@@ -2131,30 +2129,29 @@ init_ccw_bk(struct net_device *dev)
 * provide good performance. With packing buffers support 32k
 * buffers are used.
  */
-if (privptr->p_env->read_size < PAGE_SIZE) {
-claw_reads_perpage= PAGE_SIZE / privptr->p_env->read_size;
-claw_read_pages= (privptr->p_env->read_buffers +
-   claw_reads_perpage -1) / claw_reads_perpage;
+   if (privptr->p_env->read_size < PAGE_SIZE) {
+   claw_reads_perpage = PAGE_SIZE / privptr->p_env->read_size;
+   claw_read_pages = DIV_ROUND_UP(privptr->p_env->read_buffers,
+   claw_reads_perpage);
  }
  else {   /*  > or equal  */
-privptr->p_buff_pages_perread=
-   (privptr->p_env->read_size + PAGE_SIZE - 1) / PAGE_SIZE;
-claw_read_pages=
-   privptr->p_env->read_buffers * privptr->p_buff_pages_perread;
+   privptr->p_buff_pages_perread =
+   DIV_ROUND_UP(privptr->p_env->read_size, PAGE_SIZE);
+   claw_read_pages = privptr->p_env->read_buffers *
+   privptr->p_buff_pages_perread;
  }
 if (privptr->p_env->write_size < PAGE_SIZE) {
-claw_writes_perpage=
-   PAGE_SIZE / privptr->p_env->write_size;
-claw_write_pages=
-   (privptr->p_env->write_buffers + claw_writes_perpage -1) /
-   claw_writes_perpage;
+   claw_writes_perpage =
+   PAGE_SIZE / privptr->p_env->write_size;
+   claw_write_pages = DIV_ROUND_UP(privptr->p_env->write_buffers,
+   claw_writes_perpage);
 
 }
 else {  /*  > or equal  */
-privptr->p_buff_pages_perwrite=
-(privptr->p_env->read_size + PAGE_SIZE - 1) / PAGE_SIZE;
-claw_write_pages=
-   privptr->p_env->write_buffers * privptr->p_buff_pages_perwrite;
+   privptr->p_buff_pages_perwrite =
+   DIV_ROUND_UP(privptr->p_env->read_size, PAGE_SIZE);
+   claw_write_pages = privptr->p_env->write_buffers *
+   privptr->p_buff_pages_perwrite;
 }
 #ifdef DEBUGMSG
 if (privptr->p_env->read_size < PAGE_SIZE) {

-- 


[patch 0/1] s390: claw - More use DIV_ROUND_UP

2008-02-18 Thread Ursula Braun
-- 
Jeff,

this patch is intended for 2.6.25.
It makes use of the DIV_ROUND_UP macro as proposed by Julia Lawall.

Regards, Ursula Braun
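DIV_ROUND_UP(n, d) replaces the open-coded (n + d - 1) / d rounding in the claw diff above. The userspace sketch below assumes the classic (((n) + (d) - 1) / (d)) definition; the 4 KiB page size and the pages_for_buffers() helper are illustrative assumptions, not code from the driver.

```c
/* Round n / d up to the next whole number (n >= 0, d > 0). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

#define EXAMPLE_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/*
 * Hypothetical helper mirroring the claw read-path arithmetic:
 * buffers smaller than a page are packed several to a page, while
 * buffers of at least a page each get their own whole pages.
 * Precondition: size > 0.
 */
static unsigned long pages_for_buffers(unsigned long size, unsigned long count)
{
	if (size < EXAMPLE_PAGE_SIZE) {
		unsigned long per_page = EXAMPLE_PAGE_SIZE / size;

		return DIV_ROUND_UP(count, per_page);
	}
	return count * DIV_ROUND_UP(size, EXAMPLE_PAGE_SIZE);
}
```

Like the open-coded form, the macro can overflow when n is close to the maximum of its type, since it computes n + d - 1 first.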


ipv6 debugging

2008-02-18 Thread Ferenc Wagner
Hi,

I'd like to ask for some debugging tips on the following problem:
a machine running Linux 2.6.24.2 has several 802.1q VLANs over
active/backup bonding over two physical interfaces.  Everything is
all right, except that after a reboot there's no IPv6, while IPv4
works.  The router's ARP(6) table is empty, and the machine doesn't answer
ping6.  However, if I start "tcpdump -i bond0 ip6", everything is
all right again.  There are some indications that after some period
without IPv6 traffic, the same can happen again.  Are there known
issues which can manifest themselves like this?  Other very similar
setups don't show this erratic behaviour.

I know that the above doesn't give a fully detailed picture, but
thought that I'd better ask before taking the setup apart.
-- 
Thanks for your thoughts,
Feri.


Re: [PATCH 2.6.24-mm1] error compiling net driver NE2000/NE1000

2008-02-18 Thread Pierre Peiffer
Hi,

I don't know whether I should report this or not, but as I didn't find any
discussion, it's probably better to mention it: the compile error reported
below (or here: http://lkml.org/lkml/2008/2/4/173 ) does not seem to be
fixed in 2.6.25-rc2.mm1...  So, I don't know if a fix is in progress somewhere
or if the bug has fallen into a black hole.

(In the original mail, I proposed a patch as a quick fix, but I don't know if
it can be considered a definitive correction or not.)

Thanks,

P.

Andrew Morton wrote:
 On Mon, 4 Feb 2008 16:29:21 +0100
 Pierre Peiffer [EMAIL PROTECTED] wrote:
 
 Hi,

  When I compile the kernel 2.6.24-mm1 with:
 CONFIG_NET_ISA=y
 CONFIG_NE2000=y

 I have the following compile error:
 ...
   GEN .version
   CHK include/linux/compile.h
   UPD include/linux/compile.h
   CC  init/version.o
   LD  init/built-in.o
   LD  .tmp_vmlinux1
 drivers/built-in.o: In function `ne_block_output':
 linux-2.6.24-mm1/drivers/net/ne.c:797: undefined reference to `NS8390_init'
 drivers/built-in.o: In function `ne_drv_resume':
 linux-2.6.24-mm1/drivers/net/ne.c:858: undefined reference to `NS8390_init'
 drivers/built-in.o: In function `ne_probe1':
 linux-2.6.24-mm1/drivers/net/ne.c:539: undefined reference to `NS8390_init'
 make[1]: *** [.tmp_vmlinux1] Error 1
 make: *** [sub-make] Error 2
 
 Thanks for reporting this.
 
 Since the file 8390p.c is compiled for this driver, but not 8390.c, which
 contains NS8390_init(), I fixed this error with the following patch.
 
 Alan's
 8390-split-8390-support-into-a-pausing-and-a-non-pausing-driver-core.patch
 would be a prime suspect.  I assume this bug isn't present in mainline or
 in 2.6.24?
 
 As NS8390p_init() does the same thing as NS8390_init(), I suppose that
 this is the right fix?

 Signed-off-by: Pierre Peiffer [EMAIL PROTECTED]
 ---
  drivers/net/ne.c |6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

 Index: b/drivers/net/ne.c
 ===
 --- a/drivers/net/ne.c
 +++ b/drivers/net/ne.c
 @@ -536,7 +536,7 @@ static int __init ne_probe1(struct net_d
  #ifdef CONFIG_NET_POLL_CONTROLLER
  dev->poll_controller = eip_poll;
  #endif
 -NS8390_init(dev, 0);
 +NS8390p_init(dev, 0);
  
  ret = register_netdev(dev);
  if (ret)
 @@ -794,7 +794,7 @@ retry:
  if (time_after(jiffies, dma_start + 2*HZ/100)) {
 /* 20ms */
  printk(KERN_WARNING "%s: timeout waiting for Tx "
 "RDC.\n", dev->name);
  ne_reset_8390(dev);
 -NS8390_init(dev,1);
 +NS8390p_init(dev,1);
  break;
  }
  
 @@ -855,7 +855,7 @@ static int ne_drv_resume(struct platform
  
  if (netif_running(dev)) {
  ne_reset_8390(dev);
 -NS8390_init(dev, 1);
 +NS8390p_init(dev, 1);
  netif_device_attach(dev);
  }
  return 0;
 
 
 
 

-- 
Pierre Peiffer


Re: [2.6 patch] remove include/linux/netfilter_ipv4/ipt_SAME.h

2008-02-18 Thread Patrick McHardy

Adrian Bunk wrote:

This patch removes the no longer used include/linux/netfilter_ipv4/ipt_SAME.h



We kept it around because old iptables binaries need it to build.
The kernel no longer supports it, but people might still wish to
use a distributor-built iptables binary with old kernels. It will
be removed with a number of other headers kept for compatibility
in 1-2 years.



Re: [PATCH] net/8021q/vlan_dev.c - Use print_mac

2008-02-18 Thread Patrick McHardy

Joe Perches wrote:

On Fri, 2008-02-15 at 02:58 -0800, David Miller wrote:

From: Bruno Randolf [EMAIL PROTECTED]
Date: Fri, 15 Feb 2008 19:48:05 +0900

Is there any chance to include a macro like this for printing MAC
addresses?

Its advantage is that it can be used without the need to declare
buffers for print_mac(), for example:

We specifically removed this sort of thing, please don't
add it back.


Why?


@@ -404,11 +405,8 @@ static int vlan_dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	pr_debug("%s: about to send skb: %p to dev: %s\n",
 __FUNCTION__, skb, skb->dev->name);
-   pr_debug("  " MAC_FMT " " MAC_FMT " %4hx %4hx %4hx\n",
-veth->h_dest[0], veth->h_dest[1], veth->h_dest[2],
-veth->h_dest[3], veth->h_dest[4], veth->h_dest[5],
-veth->h_source[0], veth->h_source[1], veth->h_source[2],
-veth->h_source[3], veth->h_source[4], veth->h_source[5],
+   pr_debug("  %s %s %4hx %4hx %4hx\n",
+print_mac(mac, veth->h_dest), print_mac(mac2, veth->h_source),



This results in print_mac getting called twice per packet even without
debugging. What's the problem with MAC_FMT?


Re: ipv6 debugging

2008-02-18 Thread Jorge Boncompte [DTI2]
   This sounds to me like the same problem I was having with OSPF; I
think ARP(6) uses multicast Ethernet addresses too. Can you check whether the
patch below, which I sent to Patrick McHardy some days ago, fixes your problem?


   Regards,

   Jorge
---
   Hi Patrick,

   Commit a0a400d79e3dd7843e7e81baa3ef2957bdc292d0 from you
   introduced a new field da_synced in struct dev_addr_list that is
   not properly initialized to 0. So when any of the current users (8021q,
   macvlan, mac80211) calls dev_mc_sync/unsync, they corrupt the address
   lists of both devices.

   The attached patch fixes it for me and should avoid future problems.

   Regards,

   Jorge

Signed-off-by: Jorge Boncompte [DTI2] [EMAIL PROTECTED]
---
diff --git a/net/core/dev.c b/net/core/dev.c
index 9549417..f1b6708 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2900,7 +2900,7 @@ int __dev_addr_add(struct dev_addr_list **list, int *count,
   }
   }

-   da = kmalloc(sizeof(*da), GFP_ATOMIC);
+   da = kzalloc(sizeof(*da), GFP_ATOMIC);
   if (da == NULL)
   return -ENOMEM;
   memcpy(da->da_addr, addr, alen);
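kmalloc() returns memory with indeterminate contents, so a field that the allocation path never writes, such as the new da_synced, starts with whatever was there before; kzalloc() returns zeroed memory. The userspace sketch below shows the same pattern with calloc() standing in for kzalloc(). struct example_addr and example_addr_new() are illustrative only and merely borrow the da_addr and da_synced names; they are not the kernel's struct dev_addr_list.

```c
#include <stdlib.h>
#include <string.h>

#define EXAMPLE_ADDR_LEN 6

/* Illustrative stand-in, loosely named after the kernel's dev_addr_list. */
struct example_addr {
	struct example_addr *next;
	unsigned char da_addr[EXAMPLE_ADDR_LEN];
	int da_synced; /* field added later; must start at 0 */
};

/*
 * Allocate and fill a new entry.  calloc() zeroes the whole object,
 * so fields the caller never sets (such as da_synced) start at 0;
 * with malloc() their values would be indeterminate.
 */
static struct example_addr *example_addr_new(const unsigned char *addr,
					     size_t alen)
{
	struct example_addr *da;

	if (alen > EXAMPLE_ADDR_LEN)
		return NULL;
	da = calloc(1, sizeof(*da));
	if (da == NULL)
		return NULL;
	memcpy(da->da_addr, addr, alen);
	return da;
}
```

Zeroing at allocation also covers fields added in the future, which is the "avoid future problems" point in the mail.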

- Original Message - 
From: Ferenc Wagner [EMAIL PROTECTED]

To: netdev@vger.kernel.org
Sent: Monday, February 18, 2008 3:06 PM
Subject: ipv6 debugging



Hi,

I'd like to ask for some debugging tips on the following problem:
a machine running Linux 2.6.24.2 has several 802.1q VLANs over
active/backup bonding over two physical interfaces.  Everything is
all right, except that after a reboot there's no IPv6, while IPv4
works.  The router's ARP(6) table is empty, and the machine doesn't answer
ping6.  However, if I start "tcpdump -i bond0 ip6", everything is
all right again.  There are some indications that after some period
without IPv6 traffic, the same can happen again.  Are there known
issues which can manifest themselves like this?  Other very similar
setups don't show this erratic behaviour.

I know that the above doesn't give a fully detailed picture, but
thought that I'd better ask before taking the setup apart.
--
Thanks for your thoughts,
Feri.


"protocol 0300 is buggy" spam in dmesg when injecting&capturing on same interface

2008-02-18 Thread Pekka Pietikainen
When playing with some L2-level fuzzing I started getting lots of
"protocol 0300 is buggy, dev eth3" spew in dmesg. That interface is also
capturing the traffic that's being sent; that's probably why the
dev_queue_xmit_nit codepath is getting called in the first place.

Tested on 2.6.23 as shipped in F8. I didn't spot any relevant changes in
.24, but can pretty easily verify there too.

Oh. That printk wasn't very easy to find:

  if (net_ratelimit())
	printk(KERN_CRIT "protocol %04x is "
	       "buggy, dev %s\n",


and I naturally grepped for "is buggy".
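The grep missed it because the format is written as two adjacent string literals, which C concatenates at compile time, so "is buggy" never appears on a single source line even though it is in the compiled string. A small userspace demonstration (the array names and formats_match() are illustrative, not kernel code):

```c
#include <string.h>

/* The format split across two literals, as in the dev.c printk. */
static const char split_fmt[] = "protocol %04x is "
				"buggy, dev %s\n";

/* The same format written as one literal. */
static const char joined_fmt[] = "protocol %04x is buggy, dev %s\n";

/* Returns 1 if both spellings produce the identical string. */
static int formats_match(void)
{
	return sizeof(split_fmt) == sizeof(joined_fmt) &&
	       strcmp(split_fmt, joined_fmt) == 0;
}
```

Searching for a fragment that lies inside one literal, such as "buggy, dev", would have found the line.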

Any ideas? Add an "if it came from AF_PACKET, don't print anything" check
to that if-statement?




Re: [PATCH 1/3] fib_trie: move statistics to debugfs

2008-02-18 Thread Stephen Hemminger
On Sun, 17 Feb 2008 22:26:55 -0800 (PST)
David Miller [EMAIL PROTECTED] wrote:

 From: Stephen Hemminger [EMAIL PROTECTED]
 Date: Wed, 13 Feb 2008 11:58:06 -0800
 
  Don't want /proc/net/fib_trie and /proc/net/fib_triestat to become
  permanent kernel space ABI issues, so move to the safer confines of debugfs.
  
  Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
 
 Stephen, the cat is already out of the bag.  We already export this
 thing so if you want to export different stuff you'll have to provide
 it via some other means, somewhere else.
 
 Thanks.

Are we stuck with the format problems?
  * crappy tree printout
  * not printing other tables


Re: Compex FreedomLine 32 PnP-PCI2 broken with de2104x

2008-02-18 Thread Ondrej Zary
On Monday 18 February 2008 04:21:11 Grant Grundler wrote:
 On Wed, Jan 30, 2008 at 09:23:06PM +0100, Ondrej Zary wrote:
  On Saturday 26 January 2008 21:58:10 Ondrej Zary wrote:
   Hello,
   I was having problems with these FreedomLine cards with Linux before
   but tested it thoroughly today. This card uses DEC 21041 chip and has
   TP and BNC connectors:
  
   00:12.0 Ethernet controller [0200]: Digital Equipment Corporation
   DECchip 21041 [Tulip Pass 3] [1011:0014] (rev 21)
  
  
   The de2104x driver was loaded automatically by udev and the card seemed
   to work, until I disconnected the TP cable and plugged it back in after a
   while. The driver then switched to the (non-existent) AUI port and stayed
   there. I tried to set the media to TP using ethtool, and the whole kernel
   crashed because of BUG_ON(de_is_running(de));
   in de_set_media(). It seems that the driver is unable to stop the DMA in
   de_stop_rxtx().

 The BUG_ON() is probably fine normally. But the media selection sounds
 broken. It's possible to select the wrong media type with the 21040 chip,
 but it shouldn't be possible with the 21041. For 21040 support, see
 de21040_get_media_info(). But de21041_get_srom_info() is expected to
 determine which media types are supported from the SEPROM media blocks.
 My guess is that this code is broken, since it seems to work with the
 de4x5 driver. If you care to work out the difference, I'd be happy to
 make a patch to fix that up.

I don't think that BUG_ON() should be there. It should probably printk a 
warning but certainly not crash the whole machine.

 Also, from code review, DE2104X driver still has a few places with
 potential PCI MMIO Write posting issues.  Specifically, I was looking
 in de_stop_hw() and de_set_media(). Several other bits of code correctly
 flush MMIO writes: e.g. tulip_read_eeprom().

   I commented out AUI detection in the driver; this time it switched to
   BNC after unplugging the cable and stayed there. I also attempted to
   reset the chip when de_stop_rxtx failed, but did not succeed.

 You'd basically have to hardcode only one media type and its corresponding
 parameters.

That's bad. It just works with de4x5 with any cable at any time.

   Then I found that there's a de4x5 driver which supports the same cards as
   de2104x (and some others too), and this one works fine! I can plug and
   unplug the cable and even change between TP and BNC ports just by
   unplugging one cable and plugging in the other. Unfortunately, this
   driver is blacklisted by default, at least in Slackware and Debian.

 ISTR there was a time when tulip would compete with de4x5 for devices.
 tulip is the preferred driver. That's clearly no longer the case
 and perhaps both distros need to revisit this.

de4x5 has no MODULE_DEVICE_TABLE for PCI devices anymore, so there are no
conflicts. That's probably good for cards that work with the tulip driver,
but bad for my card and probably also for some other cards that (should)
work with de2104x.


   The question is: why does de2104x exist? Does it work better with some
   hardware?

 de2104x is a work in progress.
 That's why it's marked EXPERIMENTAL in the Kconfig file.

Great, it looks to be 6 years old and it's still experimental. Probably 
because it never worked properly.

I think that de2104x driver should be removed (or at least its 
MODULE_DEVICE_TABLE) and MODULE_DEVICE_TABLE with only 21040 and 21041 PCI 
IDs added to de4x5.

I can send a patch if this is acceptable.


   BTW, I found that the problem has existed since at least 2003:
   http://oss.sgi.com/archives/netdev/2003-08/msg00951.html
 
  Does the de2104x driver work correctly for anyone?

 No idea. I've only used tulip driver.

 thanks for the bug report,
 grant



-- 
Ondrej Zary


Re: Linux 2.6.24.1 - kernel does not boot; IRQ trouble?

2008-02-18 Thread Stephen Hemminger
On Mon, 18 Feb 2008 05:00:49 -0800
Andrew Morton [EMAIL PROTECTED] wrote:

 On Sun, 17 Feb 2008 00:54:08 + (GMT) Chris Rankin [EMAIL PROTECTED] 
 wrote:
 
  [Try this again, except this time I'll force the attachment as inline text!]
  
  Hi,
  
  I have managed to boot 2.6.24.1 on this machine, with the NMI watchdog
  enabled, by using the acpi=noirq option. (There does seem to be some
  unhappiness with bridge symlinks in sysfs, though.)
  
  ...
 
  sysfs: duplicate filename 'bridge' can not be created
  WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
  Pid: 1, comm: swapper Not tainted 2.6.24.1 #1
   [c0105020] show_trace_log_lvl+0x1a/0x2f
   [c0105990] show_trace+0x12/0x14
   [c010613d] dump_stack+0x6c/0x72
   [c01991bf] sysfs_add_one+0x57/0xbc
   [c0199e41] sysfs_create_link+0xc2/0x10d
   [c01bae9a] pci_bus_add_devices+0xbd/0x103
   [c034016c] pci_legacy_init+0x56/0xe3
   [c03274e1] kernel_init+0x157/0x2c3
   [c0104c83] kernel_thread_helper+0x7/0x10
   ===
  pci :00:01.0: Error creating sysfs bridge symlink, continuing...
  sysfs: duplicate filename 'bridge' can not be created
  WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
  Pid: 1, comm: swapper Not tainted 2.6.24.1 #1
   [c0105020] show_trace_log_lvl+0x1a/0x2f
   [c0105990] show_trace+0x12/0x14
   [c010613d] dump_stack+0x6c/0x72
   [c01991bf] sysfs_add_one+0x57/0xbc
   [c0199e41] sysfs_create_link+0xc2/0x10d
   [c01bae9a] pci_bus_add_devices+0xbd/0x103
   [c01bae82] pci_bus_add_devices+0xa5/0x103
   [c034016c] pci_legacy_init+0x56/0xe3
   [c03274e1] kernel_init+0x157/0x2c3
   [c0104c83] kernel_thread_helper+0x7/0x10
   ===
 
 I have a vague feeling that this was fixed, perhaps in 2.6.24.x?

Never heard of this. What initialization script causes this?
Also, do you have the SYSFS_DEPRECATED option configured? That caused issues
with regular network drivers.


Re: tbench regression in 2.6.25-rc1

2008-02-18 Thread Valdis . Kletnieks
On Mon, 18 Feb 2008 16:12:38 +0800, Zhang, Yanmin said:

 I also think __refcnt is the key. I did a new test by adding 2 unsigned
 long padding words before lastuse, so the 3 members are moved to the next
 cache line. The performance is recovered.
 
 How about the patch below? Almost all performance is recovered with the new patch.
 
 Signed-off-by: Zhang Yanmin [EMAIL PROTECTED]

Could you add a comment someplace that says refcnt wants to be on a different
cache line from input/output/ops or performance tanks badly, to warn some
future kernel hacker who starts adding new fields to the structure?
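A sketch of the layout that such a comment would document, using a hypothetical struct (not struct dst_entry) whose field names mirror the ones in the thread. It assumes 64-byte cache lines and uses C11 _Alignas to start the write-hot group on a fresh line; Zhang's test used two explicit unsigned long padding words instead, and in-kernel code would more likely use an annotation such as ____cacheline_aligned_in_smp.

```c
#include <stddef.h>

#define EXAMPLE_CACHE_LINE 64 /* assumption: 64-byte cache lines */

struct example_dst {
	/* Read-mostly: used on every packet, rarely written. */
	int (*input)(void *skb);
	int (*output)(void *skb);
	const void *ops;

	/*
	 * Write-hot: updated for every packet.  Keep this group off the
	 * cache line holding input/output/ops, or each write invalidates
	 * that line on other CPUs (false sharing) and performance drops.
	 */
	_Alignas(EXAMPLE_CACHE_LINE) unsigned long lastuse;
	int refcnt; /* stands in for the kernel's __refcnt */
	int use;    /* stands in for the kernel's __use */
};

/* Index of the cache line a given byte offset falls in. */
static size_t cache_line_of(size_t offset)
{
	return offset / EXAMPLE_CACHE_LINE;
}

/* Compile-time guard against a later field addition undoing the split. */
_Static_assert(offsetof(struct example_dst, refcnt) / EXAMPLE_CACHE_LINE !=
	       offsetof(struct example_dst, ops) / EXAMPLE_CACHE_LINE,
	       "refcnt must not share a cache line with ops");
```

The static assertion plays the role of the warning Valdis asks for: adding fields that pull refcnt back onto the read-mostly line breaks the build instead of silently costing performance.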




Re: [PATHCH 1/16] ServerEngines 10Gb NIC driver

2008-02-18 Thread Subbu Seetharaman

Thanks for all comments.

I had run checkpatch and corrected all errors except a
few errors about some macros and the warning about the
typedefs.  The mail client I used to send the patch folded
lines at arbitrary points and introduced trailing whitespace
in several places.  This was also the reason one of the patches
did not apply cleanly.  We will use git to generate the patches
as suggested.

Our desire to share common code across drivers for other
OSes has been a cause for some ugliness in coding styles.   

I have one question about bit fields. Several of the
headers in the common code are generated by
srcgen from f/w source files.  Some of the structures
in these headers have bit fields (with separate definitions
for little endian and big endian hosts).  Are these unacceptable
in Linux driver submissions?

Thanks.

Subbu
--


From: Stephen Hemminger [mailto:[EMAIL PROTECTED]
To: netdev@vger.kernel.org
Sent: Sun, 17 Feb 2008 09:44:45 -0800
Subject: Re: [PATHCH 1/16] ServerEngines 10Gb NIC driver

Do all vendor drivers have to come in with the same mistakes?
Where is the vendor driver ugly school, and how can the Linux
developers teach there?

Run this through checkpatch script or just read some of the
things that a quick scan shows.

snip 




Re: [2.6.25-rc2] e100: Trying to free already-free IRQ 11 during suspend ...

2008-02-18 Thread Andrey Borzenkov
On Monday 18 February 2008, Andrew Morton wrote:
 On Sun, 17 Feb 2008 15:36:50 +0300 Andrey Borzenkov [EMAIL PROTECTED] wrote:
 
  ... and possibly reboot/poweroff (it flows by too fast to be legible).
  
  [ 8803.850634] ACPI: Preparing to enter system sleep state S3
  [ 8803.853141] Suspending console(s)
  [ 8805.287505] serial 00:09: disabled
  [ 8805.291564] Trying to free already-free IRQ 11
  [ 8805.291579] Pid: 6920, comm: pm-suspend Not tainted 2.6.25-rc2-1avb #2
  [ 8805.291628]  [c0152127] free_irq+0xb7/0x130
  [ 8805.291675]  [c024bd80] e100_suspend+0xc0/0x100
  [ 8805.291724]  [c01eaa36] pci_device_suspend+0x26/0x70
  [ 8805.291747]  [c0243674] suspend_device+0x94/0xd0
  [ 8805.291763]  [c02439a3] device_suspend+0x153/0x240
  [ 8805.291784]  [c014314f] suspend_devices_and_enter+0x4f/0xf0
  [ 8805.291808]  [c0143a5f] ? freeze_processes+0x3f/0x80
  [ 8805.291825]  [c01432fa] enter_state+0xaa/0x140
  [ 8805.291840]  [c014341f] state_store+0x8f/0xd0
  [ 8805.291852]  [c0143390] ? state_store+0x0/0xd0
  [ 8805.291866]  [c01d3404] kobj_attr_store+0x24/0x30
  [ 8805.291901]  [c01b547b] sysfs_write_file+0xbb/0x110
  [ 8805.291936]  [c0177d79] vfs_write+0x99/0x130
  [ 8805.291963]  [c01b53c0] ? sysfs_write_file+0x0/0x110
  [ 8805.291979]  [c01782fd] sys_write+0x3d/0x70
  [ 8805.291998]  [c010409a] sysenter_past_esp+0x5f/0xa5
  [ 8805.292038]  ===
  [ 8805.347640] ACPI: PCI interrupt for device :00:06.0 disabled
  [ 8805.361128] ACPI: PCI interrupt for device :00:02.0 disabled
  [ 8805.376670]  hwsleep-0322 [00] enter_sleep_state : Entering sleep state [S3]
  [ 8805.376670] Back to C!
  
  The interface is normally unused (only for netconsole sometimes). dmesg
  and config attached.
 
 Does reverting this:
 
 commit 8543da6672b0994921f014f2250e27ae81645580
[...]
 fix it?
 

no

  Hmm ... after resume the device has disappeared completely ...
  
  {pts/1}% cat /proc/interrupts
             CPU0
    0:    1290492    XT-PIC-XT        timer
    1:       6675    XT-PIC-XT        i8042
    2:          0    XT-PIC-XT        cascade
    3:          2    XT-PIC-XT
    4:          2    XT-PIC-XT
    5:          3    XT-PIC-XT
    7:          4    XT-PIC-XT        irda0
    8:          0    XT-PIC-XT        rtc0
    9:        583    XT-PIC-XT        acpi
   10:          2    XT-PIC-XT
   11:      31483    XT-PIC-XT        yenta, yenta, yenta, ohci_hcd:usb1, ALI 5451, pcmcia0.0
   12:      28070    XT-PIC-XT        i8042
   14:      21705    XT-PIC-XT        ide0
   15:      82123    XT-PIC-XT        ide1
  NMI:          0    Non-maskable interrupts
  TRM:          0    Thermal event interrupts
  SPU:          0    Spurious interrupts
  ERR:          0
 
 I hope that's not a separate bug...
 
 

This is a red herring. pm-utils restarts the network across suspend; eth0 is
not activated automatically, so it disappears. "ifconfig eth0 up" brings it back.




[PATCH] tlan: add static to function definitions

2008-02-18 Thread Harvey Harrison
The forward declarations were already marked static; make the definitions
static as well.  This also fixes the following sparse warnings:

drivers/net/tlan.c:1403:5: warning: symbol 'TLan_HandleInvalid' was not declared. Should it be static?
drivers/net/tlan.c:1435:5: warning: symbol 'TLan_HandleTxEOF' was not declared. Should it be static?
drivers/net/tlan.c:1521:5: warning: symbol 'TLan_HandleStatOverflow' was not declared. Should it be static?
drivers/net/tlan.c:1557:5: warning: symbol 'TLan_HandleRxEOF' was not declared. Should it be static?
drivers/net/tlan.c:1692:5: warning: symbol 'TLan_HandleDummy' was not declared. Should it be static?
drivers/net/tlan.c:1722:5: warning: symbol 'TLan_HandleTxEOC' was not declared. Should it be static?
drivers/net/tlan.c:1770:5: warning: symbol 'TLan_HandleStatusCheck' was not declared. Should it be static?
drivers/net/tlan.c:1845:5: warning: symbol 'TLan_HandleRxEOC' was not declared. Should it be static?
drivers/net/tlan.c:1905:6: warning: symbol 'TLan_Timer' was not declared. Should it be static?
drivers/net/tlan.c:1986:6: warning: symbol 'TLan_ResetLists' was not declared. Should it be static?
drivers/net/tlan.c:2046:6: warning: symbol 'TLan_FreeLists' was not declared. Should it be static?
drivers/net/tlan.c:2095:6: warning: symbol 'TLan_PrintDio' was not declared. Should it be static?
drivers/net/tlan.c:2130:6: warning: symbol 'TLan_PrintList' was not declared. Should it be static?
drivers/net/tlan.c:2166:6: warning: symbol 'TLan_ReadAndClearStats' was not declared. Should it be static?
drivers/net/tlan.c:2242:1: warning: symbol 'TLan_ResetAdapter' was not declared. Should it be static?
drivers/net/tlan.c:2328:1: warning: symbol 'TLan_FinishReset' was not declared. Should it be static?
drivers/net/tlan.c:2451:6: warning: symbol 'TLan_SetMac' was not declared. Should it be static?
drivers/net/tlan.c:2493:6: warning: symbol 'TLan_PhyPrint' was not declared. Should it be static?
drivers/net/tlan.c:2542:6: warning: symbol 'TLan_PhyDetect' was not declared. Should it be static?
drivers/net/tlan.c:2589:6: warning: symbol 'TLan_PhyPowerDown' was not declared. Should it be static?
drivers/net/tlan.c:2614:6: warning: symbol 'TLan_PhyPowerUp' was not declared. Should it be static?
drivers/net/tlan.c:2635:6: warning: symbol 'TLan_PhyReset' was not declared. Should it be static?
drivers/net/tlan.c:2663:6: warning: symbol 'TLan_PhyStartLink' was not declared. Should it be static?
drivers/net/tlan.c:2750:6: warning: symbol 'TLan_PhyFinishAutoNeg' was not declared. Should it be static?
drivers/net/tlan.c:2906:5: warning: symbol 'TLan_MiiReadReg' was not declared. Should it be static?
drivers/net/tlan.c:2996:6: warning: symbol 'TLan_MiiSendData' was not declared. Should it be static?
drivers/net/tlan.c:3038:6: warning: symbol 'TLan_MiiSync' was not declared. Should it be static?
drivers/net/tlan.c:3077:6: warning: symbol 'TLan_MiiWriteReg' was not declared. Should it be static?
drivers/net/tlan.c:3147:6: warning: symbol 'TLan_EeSendStart' was not declared. Should it be static?
drivers/net/tlan.c:3187:5: warning: symbol 'TLan_EeSendByte' was not declared. Should it be static?
drivers/net/tlan.c:3248:6: warning: symbol 'TLan_EeReceiveByte' was not declared. Should it be static?
drivers/net/tlan.c:3306:5: warning: symbol 'TLan_EeReadByte' was not declared. Should it be static?

Signed-off-by: Harvey Harrison [EMAIL PROTECTED]
---
Kept the style consistent with the rest of the file, so checkpatch will complain.
 drivers/net/tlan.c |   64 ++--
 1 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/drivers/net/tlan.c b/drivers/net/tlan.c
index 3af5b92..0166407 100644
--- a/drivers/net/tlan.c
+++ b/drivers/net/tlan.c
@@ -1400,7 +1400,7 @@ static void TLan_SetMulticastList( struct net_device *dev )
 *
 **/
 
-u32 TLan_HandleInvalid( struct net_device *dev, u16 host_int )
+static u32 TLan_HandleInvalid( struct net_device *dev, u16 host_int )
 {
/* printk( "TLAN:  Invalid interrupt on %s.\n", dev->name ); */
return 0;
@@ -1432,7 +1432,7 @@ u32 TLan_HandleInvalid( struct net_device *dev, u16 host_int )
 *
 **/
 
-u32 TLan_HandleTxEOF( struct net_device *dev, u16 host_int )
+static u32 TLan_HandleTxEOF( struct net_device *dev, u16 host_int )
 {
TLanPrivateInfo *priv = netdev_priv(dev);
int eoc = 0;
@@ -1518,7 +1518,7 @@ u32 TLan_HandleTxEOF( struct net_device *dev, u16 host_int )
 *
 **/
 
-u32 TLan_HandleStatOverflow( struct net_device *dev, u16 host_int )
+static u32 TLan_HandleStatOverflow( struct net_device *dev, u16 host_int )
 {
TLan_ReadAndClearStats( dev, TLAN_RECORD );
 
@@ -1554,7 +1554,7 @@ u32 TLan_HandleStatOverflow( 

keyboard dead with 45b5035

2008-02-18 Thread Pierre Ossman
The patch "[RTNETLINK]: Send a single notification on device state changes."
kills (at least) the keyboard here. Everything seems to work fine in single
user mode, but when init starts spawning logins, the keyboard goes bye-bye.
Even the power button is ignored. :/

I've tried just creating another vt with chvt 2, but that is insufficient to 
trigger the bug.

Rgds
-- 
 -- Pierre Ossman

  Linux kernel, MMC maintainer    http://www.kernel.org
  PulseAudio, core developer  http://pulseaudio.org
  rdesktop, core developer  http://www.rdesktop.org


Re: [PATCH] net/8021q/vlan_dev.c - Use print_mac

2008-02-18 Thread Joe Perches
On Mon, 2008-02-18 at 16:19 +0100, Patrick McHardy wrote:
  @@ -404,11 +405,8 @@ static int vlan_dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
   
  pr_debug("%s: about to send skb: %p to dev: %s\n",
  __FUNCTION__, skb, skb->dev->name);
  -   pr_debug("  " MAC_FMT " " MAC_FMT " %4hx %4hx %4hx\n",
  -veth->h_dest[0], veth->h_dest[1], veth->h_dest[2],
  -veth->h_dest[3], veth->h_dest[4], veth->h_dest[5],
  -veth->h_source[0], veth->h_source[1], veth->h_source[2],
  -veth->h_source[3], veth->h_source[4], veth->h_source[5],
  +   pr_debug("  %s %s %4hx %4hx %4hx\n",
  +print_mac(mac, veth->h_dest), print_mac(mac2, veth->h_source),
 This results in print_mac getting called twice per packet even without
 debugging. What's the problem with MAC_FMT?

It's just a consistency thing.
It identifies code where MAC addresses are used.
An allyesconfig build is a bit smaller (~0.1%).
pr_debug is a no-op when not debugging, and print_mac is optimized away.
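Whether print_mac() runs when debugging is off depends on how pr_debug() is defined in that case: a macro that discards its arguments never evaluates them, while a call to a (possibly inline) function must evaluate them unless the compiler can prove the calls have no side effects. The userspace sketch below contrasts the two. MAC_FMT and print_mac() follow the names in the thread, but their bodies, MAC_ARG(), the call counter and both pr_debug stand-ins are illustrative assumptions, not the kernel's definitions.

```c
#include <stdio.h>

#define MAC_FMT "%02x:%02x:%02x:%02x:%02x:%02x"
#define MAC_ARG(a) (a)[0], (a)[1], (a)[2], (a)[3], (a)[4], (a)[5]

static int print_mac_calls; /* how many times the formatter actually ran */

/* Format a 6-byte address into buf (at least 18 bytes) and return buf. */
static char *print_mac(char *buf, const unsigned char *addr)
{
	print_mac_calls++;
	snprintf(buf, 18, MAC_FMT, MAC_ARG(addr));
	return buf;
}

/* Debugging off, variant A: a macro that throws its arguments away. */
#define pr_debug_macro(fmt, ...) do { } while (0)

/* Debugging off, variant B: a function; its arguments are still evaluated. */
static inline int pr_debug_func(const char *fmt, ...)
{
	(void)fmt;
	return 0;
}
```

With MAC_FMT the extra arguments are plain byte loads with no side effects, so a disabled call is cheap under either definition; with print_mac() the outcome hinges on the pr_debug() definition in the tree being built and on what the compiler can see of print_mac().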



Re: Linux 2.6.24.1 - kernel does not boot; IRQ trouble?

2008-02-18 Thread Chris Rankin
--- Stephen Hemminger [EMAIL PROTECTED] wrote:
   sysfs: duplicate filename 'bridge' can not be created
   WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
   Pid: 1, comm: swapper Not tainted 2.6.24.1 #1
[c0105020] show_trace_log_lvl+0x1a/0x2f
[c0105990] show_trace+0x12/0x14
[c010613d] dump_stack+0x6c/0x72
[c01991bf] sysfs_add_one+0x57/0xbc
[c0199e41] sysfs_create_link+0xc2/0x10d
[c01bae9a] pci_bus_add_devices+0xbd/0x103
[c034016c] pci_legacy_init+0x56/0xe3
[c03274e1] kernel_init+0x157/0x2c3
[c0104c83] kernel_thread_helper+0x7/0x10
===
   pci :00:01.0: Error creating sysfs bridge symlink, continuing...
   sysfs: duplicate filename 'bridge' can not be created
   WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
   Pid: 1, comm: swapper Not tainted 2.6.24.1 #1
[c0105020] show_trace_log_lvl+0x1a/0x2f
[c0105990] show_trace+0x12/0x14
[c010613d] dump_stack+0x6c/0x72
[c01991bf] sysfs_add_one+0x57/0xbc
[c0199e41] sysfs_create_link+0xc2/0x10d
[c01bae9a] pci_bus_add_devices+0xbd/0x103
[c01bae82] pci_bus_add_devices+0xa5/0x103
[c034016c] pci_legacy_init+0x56/0xe3
[c03274e1] kernel_init+0x157/0x2c3
[c0104c83] kernel_thread_helper+0x7/0x10
===
  
  I have a vague feeling that this was fixed, perhaps in 2.6.24.x?
 
 Never heard of this. What initialization script causes this?
 Also, do you have the SYSFS_DEPRECATED option configured? That caused issues
 with regular network drivers.

Yes, SYSFS_DEPRECATED is enabled. And the init scripts are from Fedora 8.

Cheers,
Chris









Re: keyboard dead with 45b5035

2008-02-18 Thread Rafael J. Wysocki
On Monday, 18 of February 2008, Pierre Ossman wrote:
 The patch "[RTNETLINK]: Send a single notification on device state changes."
 kills (at least) the keyboard here. Everything seems to work fine in single
 user mode, but when init starts spawning logins, the keyboard goes bye-bye.
 Even the power button is ignored. :/

Please try with the patch from http://lkml.org/lkml/2008/2/18/331 .

Thanks,
Rafael


Re: Linux 2.6.24.1 - kernel does not boot; IRQ trouble?

2008-02-18 Thread Stephen Hemminger
On Mon, 18 Feb 2008 19:42:25 + (GMT)
Chris Rankin [EMAIL PROTECTED] wrote:

 --- Stephen Hemminger [EMAIL PROTECTED] wrote:
sysfs: duplicate filename 'bridge' can not be created
WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
Pid: 1, comm: swapper Not tainted 2.6.24.1 #1
 [c0105020] show_trace_log_lvl+0x1a/0x2f
 [c0105990] show_trace+0x12/0x14
 [c010613d] dump_stack+0x6c/0x72
 [c01991bf] sysfs_add_one+0x57/0xbc
 [c0199e41] sysfs_create_link+0xc2/0x10d
 [c01bae9a] pci_bus_add_devices+0xbd/0x103
 [c034016c] pci_legacy_init+0x56/0xe3
 [c03274e1] kernel_init+0x157/0x2c3
 [c0104c83] kernel_thread_helper+0x7/0x10
 ===
pci :00:01.0: Error creating sysfs bridge symlink, continuing...
sysfs: duplicate filename 'bridge' can not be created
WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
Pid: 1, comm: swapper Not tainted 2.6.24.1 #1
 [c0105020] show_trace_log_lvl+0x1a/0x2f
 [c0105990] show_trace+0x12/0x14
 [c010613d] dump_stack+0x6c/0x72
 [c01991bf] sysfs_add_one+0x57/0xbc
 [c0199e41] sysfs_create_link+0xc2/0x10d
 [c01bae9a] pci_bus_add_devices+0xbd/0x103
 [c01bae82] pci_bus_add_devices+0xa5/0x103
 [c034016c] pci_legacy_init+0x56/0xe3
 [c03274e1] kernel_init+0x157/0x2c3
 [c0104c83] kernel_thread_helper+0x7/0x10
 ===
   
   I have a vague feeling that this was fixed, perhaps in 2.6.24.x?
  
  Never heard of this. What is the initialization script that causes this?
  Also, do you have the SYSFS_DEPRECATED option configured? That caused issues
  with regular network drivers.
 
 Yes, SYSFS_DEPRECATED is enabled. And the init scripts are from Fedora 8.

There was a bug (fixed in 2.6.24) that had to do with sysfs_create_link
and SYSFS_DEPRECATED; there is probably a similar problem with directories.



[PATCH 1/1] sis190: read the mac address from the eeprom first

2008-02-18 Thread Francois Romieu
Reading a series of zeros from the CMOS SRAM area does not work
well with is_valid_ether_addr(). Let's read the MAC address
from the EEPROM first, as it seems more reliable.

Fix for http://bugzilla.kernel.org/show_bug.cgi?id=9831

Signed-off-by: Francois Romieu [EMAIL PROTECTED]
---
 drivers/net/sis190.c |   15 ++-
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/net/sis190.c b/drivers/net/sis190.c
index 202fdf3..20745fd 100644
--- a/drivers/net/sis190.c
+++ b/drivers/net/sis190.c
@@ -1633,13 +1633,18 @@ static inline void sis190_init_rxfilter(struct net_device *dev)
 static int __devinit sis190_get_mac_addr(struct pci_dev *pdev, 
 					 struct net_device *dev)
 {
-	u8 from;
+	int rc;
+
+	rc = sis190_get_mac_addr_from_eeprom(pdev, dev);
+	if (rc < 0) {
+		u8 reg;
 
-	pci_read_config_byte(pdev, 0x73, &from);
+		pci_read_config_byte(pdev, 0x73, &reg);
 
-	return (from & 0x0001) ?
-		sis190_get_mac_addr_from_apc(pdev, dev) :
-		sis190_get_mac_addr_from_eeprom(pdev, dev);
+		if (reg & 0x0001)
+			rc = sis190_get_mac_addr_from_apc(pdev, dev);
+	}
+	return rc;
 }
 
 static void sis190_set_speed_auto(struct net_device *dev)
-- 
1.5.3.3
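
A userspace sketch of the selection logic this patch changes (illustrative only, not driver code). valid_ether_addr() restates the rule the kernel's is_valid_ether_addr() applies (not multicast, not all zeros); struct mac_source, pick_old() and pick_new() are hypothetical names, with each mac_source standing in for the result of sis190_get_mac_addr_from_eeprom() or sis190_get_mac_addr_from_apc().

```c
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6

/* Same rule as the kernel's is_valid_ether_addr(): the address must not be
 * multicast (bit 0 of the first octet) and must not be all zeros. */
static bool valid_ether_addr(const unsigned char *addr)
{
	static const unsigned char zero[ETH_ALEN];

	return !(addr[0] & 0x01) && memcmp(addr, zero, ETH_ALEN) != 0;
}

/* Hypothetical result of reading one MAC source: rc < 0 means the read failed. */
struct mac_source {
	int rc;
	unsigned char addr[ETH_ALEN];
};

/* Before the patch: bit 0 of PCI config byte 0x73 alone chose the source. */
static const struct mac_source *pick_old(unsigned char reg,
					 const struct mac_source *eeprom,
					 const struct mac_source *apc)
{
	return (reg & 0x01) ? apc : eeprom;
}

/* After the patch: EEPROM first; the APC (CMOS) area is consulted only
 * when the EEPROM read fails and bit 0 of config byte 0x73 is set. */
static const struct mac_source *pick_new(unsigned char reg,
					 const struct mac_source *eeprom,
					 const struct mac_source *apc)
{
	if (eeprom->rc < 0 && (reg & 0x01))
		return apc;
	return eeprom;
}
```

In the scenario the changelog describes (config byte 0x73 = 0x01, a good EEPROM address, and an APC area that reads back as zeros), pick_old() yields the all-zero address that valid_ether_addr() rejects, while pick_new() yields the EEPROM address.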



Re: keyboard dead with 45b5035

2008-02-18 Thread Pierre Ossman
On Mon, 18 Feb 2008 20:50:01 +0100
Rafael J. Wysocki [EMAIL PROTECTED] wrote:

 On Monday, 18 of February 2008, Pierre Ossman wrote:
  The patch "[RTNETLINK]: Send a single notification on device state changes."
  kills (at least) the keyboard here. Everything seems to work fine in single
  user mode, but when init starts spawning logins, the keyboard goes
  bye-bye. Even the power button is ignored. :/
 
 Please try with the patch from http://lkml.org/lkml/2008/2/18/331 .
 

That solved it.

I wonder if that's also why modprobe tends to wedge up with the new USB 
announce thingy... Tomorrow's debugging will tell.

-- 
 -- Pierre Ossman

  Linux kernel, MMC maintainer    http://www.kernel.org
  PulseAudio, core developer  http://pulseaudio.org
  rdesktop, core developer  http://www.rdesktop.org


Re: [PATCH] net/8021q/vlan_dev.c - Use print_mac

2008-02-18 Thread Patrick McHardy

David Miller wrote:
 From: Patrick McHardy [EMAIL PROTECTED]
 Date: Mon, 18 Feb 2008 16:19:40 +0100

  Joe Perches wrote:
   We specifically removed this sort of thing, please don't
   add it back.

  Why?

 We converted the entire tree over to print_mac(), and since
 the MAC_FMT stuff was therefore no longer used we could
 remove it.

 Some references slipped back in somehow, and thus MAC_FMT
 did too.

 There is no reason to keep around a global interface for
 _one_ user when that user can use the recommended interface
 just as equally as the rest of the tree which we converted.

 This is a pr_debug() statement we're talking about here.
 :-)

The way pr_debug is implemented, it still results in two print_mac()
calls per packet, since the compiler doesn't know that print_mac() has
no visible side effects besides modifying the (unused) buffer.
I confirmed this using codiff.


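A small userspace model of why the calls survive, assuming the non-DEBUG pr_debug() of this era is an empty static inline function rather than a macro: C still evaluates every argument before the (empty) call, so print_mac() runs unless the compiler can prove it has no side effects. fake_print_mac() and the calls counter are hypothetical; the counter stands in for the buffer write the compiler cannot discard. PR_DEBUG_DISCARD() is shown only as a contrasting pattern (a dead if (0) branch, whose arguments are never evaluated), not as what this thread adopted.

```c
#include <stdio.h>

/* Hypothetical counter: stands in for the buffer write that keeps the
 * compiler from discarding calls to the real print_mac(). */
static int calls;

/* Hypothetical stand-in for print_mac(): formats addr into buf (18 bytes). */
static char *fake_print_mac(char *buf, const unsigned char *addr)
{
	calls++;
	snprintf(buf, 18, "%02x:%02x:%02x:%02x:%02x:%02x",
		 addr[0], addr[1], addr[2], addr[3], addr[4], addr[5]);
	return buf;
}

/* Model of a non-DEBUG pr_debug() written as an empty inline function:
 * the body does nothing, but the caller still evaluates every argument. */
static inline int __attribute__((format(printf, 1, 2)))
pr_debug_inline(const char *fmt, ...)
{
	(void)fmt;
	return 0;
}

/* Contrasting macro form: the printf() sits in a dead branch, so format
 * checking still happens at compile time but no argument is evaluated. */
#define PR_DEBUG_DISCARD(...) \
	do { if (0) printf(__VA_ARGS__); } while (0)
```

Passing two fake_print_mac() results to pr_debug_inline() bumps the counter by two; the same call through PR_DEBUG_DISCARD() leaves it unchanged. Marking the real print_mac() __pure, as proposed later in the thread, attacks the same problem from the other side by letting GCC drop a call whose result is unused.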


Re: [PATCH] net/8021q/vlan_dev.c - Use print_mac

2008-02-18 Thread David Miller
From: Patrick McHardy [EMAIL PROTECTED]
Date: Mon, 18 Feb 2008 16:19:40 +0100

 Joe Perches wrote:
  On Fri, 2008-02-15 at 02:58 -0800, David Miller wrote:
  From: Bruno Randolf [EMAIL PROTECTED]
  Date: Fri, 15 Feb 2008 19:48:05 +0900
  is there any chance to include a macro like this for printing MAC
  addresses? Its advantage is that it can be used without the need to
  declare buffers for print_mac(), for example:
  We specifically removed this sort of thing, please don't
  add it back.
 
 Why?

We converted the entire tree over to print_mac(), and since
the MAC_FMT stuff was therefore no longer used we could
remove it.

Some references slipped back in somehow, and thus MAC_FMT
did too.

There is no reason to keep around a global interface for
_one_ user when that user can use the recommended interface
just as equally as the rest of the tree which we converted.

This is a pr_debug() statement we're talking about here.
:-)


Re: [PATCH] net/8021q/vlan_dev.c - Use print_mac

2008-02-18 Thread Patrick McHardy

Joe Perches wrote:
 On Mon, 2008-02-18 at 16:19 +0100, Patrick McHardy wrote:
   @@ -404,11 +405,8 @@ static int vlan_dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
    
    	pr_debug("%s: about to send skb: %p to dev: %s\n",
    		__FUNCTION__, skb, skb->dev->name);
   -	pr_debug("  " MAC_FMT " " MAC_FMT " %4hx %4hx %4hx\n",
   -		 veth->h_dest[0], veth->h_dest[1], veth->h_dest[2],
   -		 veth->h_dest[3], veth->h_dest[4], veth->h_dest[5],
   -		 veth->h_source[0], veth->h_source[1], veth->h_source[2],
   -		 veth->h_source[3], veth->h_source[4], veth->h_source[5],
   +	pr_debug("  %s %s %4hx %4hx %4hx\n",
   +		 print_mac(mac, veth->h_dest), print_mac(mac2, veth->h_source),

  This results in print_mac getting called twice per packet even without
  debugging. What's the problem with MAC_FMT?

 It's just a consistency thing.
 It identifies code where MAC addresses are used.
 An allyesconfig is a bit smaller (~.1%).

 pr_debug is a noop when not debugging, print_mac is optimized away.

No, it's not, which I also stated in the commit message that restored
it.

0x60244313 <vlan_dev_hard_start_xmit+433>:  callq  0x60161dbd <print_mac>
0x60244318 <vlan_dev_hard_start_xmit+438>:  lea    -0x50(%rbp),%rdi
0x6024431c <vlan_dev_hard_start_xmit+442>:  mov    %r15,%rsi
0x6024431f <vlan_dev_hard_start_xmit+445>:  callq  0x60161dbd <print_mac>




Re: ipv6 debugging

2008-02-18 Thread Ferenc Wagner
Jorge Boncompte [DTI2] [EMAIL PROTECTED] writes:

 Ferenc Wagner [EMAIL PROTECTED] writes:

 I'm kindly asking for some debugging tips with the following problem:
 a machine is running Linux 2.6.24.2, with several 802.1q VLANs over
 active/backup bonding over two physical interfaces.  Everything is
 all right, except that after a reboot there's no IPv6, while IPv4
 works.  The router's ARP(6) table is empty, the machine doesn't answer
 ping6.  However, if I start tcpdump -i bond0 ip6, everything is
 all right again.  There are some indications that after some period
 without IPv6 traffic, the same can happen again.  Are there known
 issues which can exhibit themselves like this?  Other very similar
 setups don't show this erratic behaviour.

 I know that the above doesn't give a fully detailed picture, but
 thought that I'd better ask before taking the setup into pieces.

 This sounds to me like the same problem that I was having with
 OSPF; I think ARP(6) uses multicast Ethernet addresses too. Can you check
 whether the patch below, which I sent to Patrick McHardy some days ago,
 fixes your problem?

Hi Jorge,

Thank you very much!  Your patch indeed fixes my problem.  I hope the
fix will make it into a stable release soon!
-- 
Regards,
Feri.
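
Some background on why a multicast filtering bug would hit IPv6 but not IPv4: IPv6 neighbour discovery (the "ARP(6)" above) sends neighbour solicitations to a solicited-node multicast group, ff02::1:ffXX:XXXX, built from the low 24 bits of the target address (RFC 4291), and IPv6 multicast maps to Ethernet as 33:33 followed by the group's last 32 bits (RFC 2464). If that MAC never makes it into the receive filter, solicitations are dropped; tcpdump normally switches the interface to promiscuous mode, which would be consistent with IPv6 coming back while it ran. The sketch below computes both addresses; the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Solicited-node multicast group for a unicast IPv6 address (RFC 4291):
 * the ff02::1:ff00:0/104 prefix plus the low 24 bits of the unicast address. */
static void solicited_node_group(const uint8_t unicast[16], uint8_t group[16])
{
	static const uint8_t prefix[13] = {
		0xff, 0x02,
		0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
		0x01, 0xff
	};

	memcpy(group, prefix, sizeof(prefix));
	memcpy(group + 13, unicast + 13, 3);
}

/* Ethernet mapping of an IPv6 multicast address (RFC 2464):
 * 33:33 followed by the last 32 bits of the group address. */
static void ipv6_mcast_to_mac(const uint8_t group[16], uint8_t mac[6])
{
	mac[0] = 0x33;
	mac[1] = 0x33;
	memcpy(mac + 2, group + 12, 4);
}
```

For the documentation address 2001:db8::2aa:ff:fe28:9c5a this gives the group ff02::1:ff28:9c5a and the Ethernet address 33:33:ff:28:9c:5a, which is the frame a broken multicast filter would silently drop.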


[PATCH] net: fix kernel-doc warnings in header files

2008-02-18 Thread Randy Dunlap
From: Randy Dunlap [EMAIL PROTECTED]

Add missing structure kernel-doc descriptions to sock.h and skbuff.h
to fix kernel-doc warnings.

(I think that Stephen H. sent a similar patch, but I can't find it.
I just want to kill the warnings, with either patch.)

Signed-off-by: Randy Dunlap [EMAIL PROTECTED]
---
 include/linux/skbuff.h |2 ++
 include/net/sock.h |1 +
 2 files changed, 3 insertions(+)

--- linux-2625-rc1g4-kdoc.orig/include/linux/skbuff.h
+++ linux-2625-rc1g4-kdoc/include/linux/skbuff.h
@@ -232,6 +232,8 @@ typedef unsigned char *sk_buff_data_t;
  * @mark: Generic packet mark
  * @nfct: Associated connection, if any
  * @ipvs_property: skbuff is owned by ipvs
+ * @peeked: this packet has been seen already, so stats have been
+ * done for it, don't do them again
  * @nf_trace: netfilter packet trace flag
  * @nfctinfo: Relationship of this skb to the connection
  * @nfct_reasm: netfilter conntrack re-assembly pointer
--- linux-2625-rc1g4-kdoc.orig/include/net/sock.h
+++ linux-2625-rc1g4-kdoc/include/net/sock.h
@@ -180,6 +180,7 @@ struct sock_common {
   *@sk_sndmsg_off: cached offset for sendmsg
   *@sk_send_head: front of stuff to transmit
   *@sk_security: used by security modules
+  *@sk_mark: generic packet mark
   *@sk_write_pending: a write to stream socket waits to start
   *@sk_state_change: callback to indicate change in the state of the sock
   *@sk_data_ready: callback to indicate there is data to be processed


[PATCH][IPROUTE] tc filters usage fixes

2008-02-18 Thread Jarek Poplawski
A few usage description fixes of tc filters for some minimal
consistency (FILTER_KIND because of QDISC_KIND).


Signed-off-by: Jarek Poplawski [EMAIL PROTECTED]

---

 tc/f_basic.c   |4 ++--
 tc/f_rsvp.c|2 +-
 tc/f_u32.c |2 +-
 tc/tc_filter.c |6 +++---
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/tc/f_basic.c b/tc/f_basic.c
index ad41633..cf2650b 100644
--- a/tc/f_basic.c
+++ b/tc/f_basic.c
@@ -30,8 +30,8 @@ static void explain(void)
 	fprintf(stderr, "Usage: ... basic [ match EMATCH_TREE ] [ police POLICE_SPEC ]\n");
 	fprintf(stderr, "                 [ action ACTION_SPEC ] [ classid CLASSID ]\n");
 	fprintf(stderr, "\n");
-	fprintf(stderr, "Where: SELECTOR := SAMPLE SAMPLE ...\n");
-	fprintf(stderr, "       FILTERID := X:Y:Z\n");
+	fprintf(stderr, "Where:\n");
+	fprintf(stderr, "       CLASSID := X:Y:Z\n");
 	fprintf(stderr, "\nNOTE: CLASSID is parsed as hexadecimal input.\n");
 }
 
diff --git a/tc/f_rsvp.c b/tc/f_rsvp.c
index 7e1e6d9..8f92e8f 100644
--- a/tc/f_rsvp.c
+++ b/tc/f_rsvp.c
@@ -33,7 +33,7 @@ static void explain(void)
 	fprintf(stderr, "Where: GPI := { flowlabel NUMBER | spi/ah SPI | spi/esp SPI |\n");
 	fprintf(stderr, "                u{8|16|32} NUMBER mask MASK at OFFSET}\n");
 	fprintf(stderr, "       POLICE_SPEC := ... look at TBF\n");
-	fprintf(stderr, "       FILTERID := X:Y\n");
+	fprintf(stderr, "       CLASSID := X:Y\n");
 	fprintf(stderr, "\nNOTE: CLASSID is parsed as hexadecimal input.\n");
 }
 
diff --git a/tc/f_u32.c b/tc/f_u32.c
index 9bc4bb5..957b1b1 100644
--- a/tc/f_u32.c
+++ b/tc/f_u32.c
@@ -38,7 +38,7 @@ static void explain(void)
 	fprintf(stderr, "Where: SELECTOR := SAMPLE SAMPLE ...\n");
 	fprintf(stderr, "       SAMPLE := { ip | ip6 | udp | tcp | icmp |"
 			" u{32|16|8} | mark } SAMPLE_ARGS [divisor DIVISOR]\n");
-	fprintf(stderr, "       FILTERID := X:Y:Z\n");
+	fprintf(stderr, "       CLASSID := X:Y:Z\n");
 	fprintf(stderr, "\nNOTE: CLASSID is parsed at hexadecimal input.\n");
 }
 
diff --git a/tc/tc_filter.c b/tc/tc_filter.c
index d70c656..eb74f89 100644
--- a/tc/tc_filter.c
+++ b/tc/tc_filter.c
@@ -33,12 +33,12 @@ static void usage(void)
 	fprintf(stderr, "Usage: tc filter [ add | del | change | replace | show ] dev STRING\n");
 	fprintf(stderr, "       [ pref PRIO ] protocol PROTO\n");
 	fprintf(stderr, "       [ estimator INTERVAL TIME_CONSTANT ]\n");
-	fprintf(stderr, "       [ root | classid CLASSID ] [ handle FILTERID ]\n");
-	fprintf(stderr, "       [ [ FILTER_TYPE ] [ help | OPTIONS ] ]\n");
+	fprintf(stderr, "       [ root | parent CLASSID ] [ handle FILTERID ]\n");
+	fprintf(stderr, "       [ [ FILTER_KIND ] [ help | OPTIONS ] ]\n");
 	fprintf(stderr, "\n");
 	fprintf(stderr, "       tc filter show [ dev STRING ] [ root | parent CLASSID ]\n");
 	fprintf(stderr, "Where:\n");
-	fprintf(stderr, "FILTER_TYPE := { rsvp | u32 | fw | route | etc. }\n");
+	fprintf(stderr, "FILTER_KIND := { rsvp | u32 | fw | route | etc. }\n");
 	fprintf(stderr, "FILTERID := ... format depends on classifier, see there\n");
 	fprintf(stderr, "OPTIONS := ... try tc filter add <desired FILTER_KIND> help\n");
 	return;


Re: [PATCH 1/3] Support arbitrary initial TCP timestamps

2008-02-18 Thread Glenn Griffin
 Adding yet another member to the already bloated tcp_sock structure to
 implement this is too high a cost.

Yes, I was worried that would be deemed too high of a cost, but it was
the most efficient way I could think to accomplish what I wanted.

 I would instead prefer that there be some global random number
 calculated when the first TCP socket is created, and use that as a
 global offset.  You can even recompute it every few hours if you
 like.

That would work fine if my main purpose was to randomize the tcp
timestamp to mitigate the leak of information regarding uptime, but
despite the brief description, that's only a side effect of what I
intended to do.  What I wanted was a way to choose an initial
tcp timestamp for a particular connection that was not tied directly to
jiffies.

The two patches following this show my intended use case.  I intend to
enhance syncookie support to allow it to support advanced tcp options
(sack and window scaling).  Normally syncookies encode the bare minimum
state of a connection in the ISN they choose, but the 32-bit ISN isn't
enough to encode advanced tcp options so you are left with a working but
crippled tcp stack during a synflood attack.  If in addition to choosing
an ISN we are able to choose an initial tcp timestamp, we are then able
to encode an additional 32 bits of information that can contain more of
the advanced tcp options.
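
To make the idea concrete, here is a sketch of how a few option bits could ride in the low bits of the initial TSval. The layout is an assumption for illustration (4 bits of window-scale shift, with 0xf meaning "no window scaling", plus one SACK-permitted bit), not necessarily the encoding used in Glenn's patches; encode_ts(), decode_ts() and struct syn_opts are hypothetical names. Because the client echoes the value in the TSecr of its ACK, the server can recover the options without keeping per-connection state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: the low 5 bits of the initial TSval carry the
 * options.  Bits 0-3: window-scale shift (0xf means "no window scaling",
 * safe because a valid shift never exceeds 14).  Bit 4: SACK permitted. */
#define TS_OPT_BITS 5
#define TS_OPT_MASK ((1u << TS_OPT_BITS) - 1)

struct syn_opts {
	bool wscale_ok;     /* client sent a window-scale option */
	uint8_t snd_wscale; /* the client's shift count, 0..14 */
	bool sack_ok;       /* client sent SACK-permitted */
};

/* Build the initial TSval for the SYN-ACK from the current clock value. */
static uint32_t encode_ts(uint32_t now, const struct syn_opts *o)
{
	uint32_t opts = o->wscale_ok ? (o->snd_wscale & 0xfu) : 0xfu;
	uint32_t ts;

	opts |= (uint32_t)o->sack_ok << 4;
	ts = (now & ~TS_OPT_MASK) | opts;
	/* Step back one slot if needed so that later timestamps taken from
	 * the real clock do not appear to go backwards (in 32-bit
	 * wrapping arithmetic). */
	if (ts > now)
		ts -= TS_OPT_MASK + 1;
	return ts;
}

/* Recover the options from the TSecr the client echoes in its ACK. */
static struct syn_opts decode_ts(uint32_t tsecr)
{
	uint32_t opts = tsecr & TS_OPT_MASK;
	struct syn_opts o;

	o.wscale_ok = (opts & 0xfu) != 0xfu;
	o.snd_wscale = o.wscale_ok ? (uint8_t)(opts & 0xfu) : 0;
	o.sack_ok = (opts >> 4) & 1u;
	return o;
}
```

For example, with a clock value of 1000, a shift of 7 and SACK permitted, the option bits are 23; 992 | 23 = 1015 is ahead of the clock, so the sketch sends 983, whose low 5 bits still decode to shift 7 with SACK.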

This stems from a discussion about implementing IPv6 support for
syncookies, where the main concern was that syncookies disabled too many
valuable tcp features to be relevant on modern systems.  Many people
stood in opposition to that statement, but it did not seem as though a
general consensus was reached.  http://lkml.org/lkml/2008/2/4/396

I'm always open to alternatives.

--Glenn


TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-18 Thread Tony Battersby
I am experiencing network data corruption with a 3Com 3C996B-T NIC
(Broadcom NetXtreme BCM5701; driver tg3.ko).  I have identified the
following patch as the trigger:

commit fb93134dfc2a6e6fbedc7c270a31da03fce88db9
Author: Herbert Xu [EMAIL PROTECTED]
Date:   Wed Nov 14 15:45:21 2007 -0800

[TCP]: Fix size calculation in sk_stream_alloc_pskb
   
We round up the header size in sk_stream_alloc_pskb so that
TSO packets get zero tail room.  Unfortunately this rounding
up is not coordinated with the select_size() function used by
TCP to calculate the second parameter of sk_stream_alloc_pskb.
   
As a result, we may allocate more than a page of data in the
non-TSO case when exactly one page is desired.
   
In fact, rounding up the head room is detrimental in the non-TSO
case because it makes memory that would otherwise be available to
the payload head room.  TSO doesn't need this either, all it wants
is the guarantee that there is no tail room.
   
So this patch fixes this by adjusting the skb_reserve call so that
exactly the requested amount (which all callers have calculated in
a precise way) is made available as tail room.
   
Signed-off-by: Herbert Xu [EMAIL PROTECTED]
Signed-off-by: David S. Miller [EMAIL PROTECTED]

This patch was included in 2.6.24 and 2.6.23.4 -stable.  I am
experiencing data corruption with kernels 2.6.23.4 - 2.6.23.16, 2.6.24 -
2.6.24.2, and 2.6.25-rc2-git1.  I have verified that reverting the above
patch (by hand) makes the data corruption go away on all affected
kernels (note that in 2.6.25 the function is sk_stream_alloc_skb() in
net/ipv4/tcp.c rather than sk_stream_alloc_pskb() in include/net/sock.h).

(Also note that when testing 2.6.23 - 2.6.23.4, I had to apply the
individual patch "TG3: Fix performance regression on 5705." from 2.6.23.5.)

I do not get data corruption when substituting a SysKonnect 9D21 NIC
(which also uses the tg3.ko driver) or an Intel PRO/1000 82546GB NIC
(which uses the e1000.ko driver).

In addition to the 3Com NIC, my computer has a SCSI HBA with an attached
tape drive.  The network data corruption happens only when reading from
or writing to the tape drive.  I have tried both an LSI MPT Fusion
Ultra320 SCSI HBA (mptspi.ko) and an LSI 53c1010 Ultra160 HBA
(sym53c8xx.ko) with the same results.  The NIC and SCSI HBA are on
separate PCI-X buses and do not share IRQs.  I am using two completely
separate test programs to access the SCSI tape drive and test network
data integrity, so one would expect no interaction between the two tests
other than CPU scheduling and DMA bandwidth.  There is no disk I/O
generated by either test program.

The test program that I am using to debug this problem does the following:

Computer A (kernel 2.6.24.2; 3Com 3C996B-T NIC):
   malloc a 64 KB buf aligned to a 4 KB boundary
   loop {
  fill 64 KB buf with count data pattern
  send(64 KB, MSG_MORE) --- eventually sends corrupted data
   }
   (SCSI tape drive test program runs separately in the background)

Computer B (kernel 2.6.12):
   malloc a 64 KB buf aligned to a 4 KB boundary
   loop {
  recv(64 KB, MSG_WAITALL)
  verify count data pattern in 64 KB buf
   }

After running for a few seconds, the verify on computer B detects data
corruption in the last 4 bytes of the 64 KB buffer.  The last 48 bytes
of the corrupted 64 KB buffer look like this:

D0 D1 D2 D3 | D4 D5 D6 D7 | D8 D9 DA DB | DC DD DE DF
E0 E1 E2 E3 | E4 E5 E6 E7 | E8 E9 EA EB | EC ED EE EF
F0 F1 F2 F3 | F4 F5 F6 F7 | F8 F9 FA FB | F4 F5 F6 F7

The last 4 bytes should be FC FD FE FF but instead are corrupted to
F4 F5 F6 F7, a sequence which came earlier in the data stream.  The
data corruption always occurs at this same buffer offset and with the
same 4 earlier bytes duplicated.  However, it occurs on a different
iteration of the send()/recv() loop each time the test is run.
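
The fill and verify steps of the test program can be sketched as follows, assuming the "count" pattern means that byte i of the buffer holds i mod 256 (consistent with the dump above, where offset 0xFFD0 holds D0). fill_count() and find_corruption() are hypothetical helpers; the socket loop itself is omitted.

```c
#include <stddef.h>

#define BUF_SIZE (64 * 1024) /* 64 KB per send()/recv(), as in the report */

/* Fill buf with the count pattern: byte i holds i mod 256. */
static void fill_count(unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] = (unsigned char)(i & 0xff);
}

/* Return the offset of the first byte that breaks the count pattern,
 * or len if the buffer is intact. */
static size_t find_corruption(const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (buf[i] != (unsigned char)(i & 0xff))
			return i;
	}
	return len;
}
```

Applied to the corrupted buffer described above, find_corruption() would return 65532 (BUF_SIZE - 4); the duplicated bytes F4 F5 F6 F7 originally sit 8 bytes earlier, at offset 65524.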

When I reverse the test so that Computer A does recv() and Computer B
does send(), the test passes with no data corruption.  Therefore, it
appears that the data corruption happens on send() but not recv().

The motherboard that I am using is a Commell LV-672.  This motherboard
has a PCI-express x16 slot but no PCI-X slots.  To plug in the PCI-X NIC
and SCSI HBA, I am using a SuperMicro CSE-RR2UE-AX riser card which
plugs into the PCI-express slot on the motherboard and provides 3 PCI-X
slots (two slots together on one PCI-X bus and one slot on its own PCI-X
bus).  The data corruption happens with every combination of the 2 cards
in the 3 slots.

I assume that the above patch is just exposing some way in which the tg3
driver or the BCM5701 chip is broken.  For now, I am just reverting the
above patch for kernels that I use until a better solution is
forthcoming.  I expect that this problem will be difficult for other
developers to reproduce, but I can test any patches that anyone wants to
send me.  [ In the meantime, should we revert the patch for 2.6.23.x and

Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver

2008-02-18 Thread Jarek Poplawski
On Mon, Feb 18, 2008 at 10:09:24PM +, James Chapman wrote:
 Jarek Poplawski wrote:
 Hi,

 It seems this nice report is still incomplete: could you check whether
 there was anything more?

 Unfortunately the ISP's syslog stops. But I've been able to borrow two
 Quad Xeon boxes and have reproduced the problem.

 Here's a new version of the patch. The patch avoids disabling irqs and
 fixes the sk_dst_get() usage that DaveM mentioned. But even with this
 patch, lockdep still complains if hundreds of ppp sessions are inserted
 into a tunnel as rapidly as possible (lockdep trace is below). I can
 stop these errors by wrapping the call to ppp_input() in
 pppol2tp_recv_dequeue_skb() with local_irq_save/restore. What is a
 better fix?

Hmm... This is a really long report and quite a bit different from
the previous one. I need some time for this. BTW: you previously sent a
lockdep report with an hlist_lock problem. I think this could be fixed
in an independent patch to make this all more readable. Are all
the other changes in this current patch only because of this or the
previous lockdep report, or for some other reasons (or reports) as well?

Regards,
Jarek P.


Re: [PATCH] Add IPv6 support to TCP SYN cookies

2008-02-18 Thread Glenn Griffin
I've posted a series of patches that I believe address Andi's concerns
about syncookies not supporting valuable tcp options (primarily SACK
and window scaling).  The premise is that if the client supports tcp
timestamps, we can encode the additional tcp options in the initial
timestamp we send back to the client, and they will be echoed back to us
in the ACK.  Anyone interested, please have a look and provide any
suggestions you may have.  The new patches are a superset of this patch,
so if they are accepted this one is obsolete.

Support arbitrary initial TCP timestamps
http://lkml.org/lkml/2008/2/15/244

Enable the use of TCP options with syncookies
http://lkml.org/lkml/2008/2/15/245

Add IPv6 Support to TCP SYN cookies
http://lkml.org/lkml/2008/2/15/246

--Glenn


Re: [PATCH][IPROUTE] tc filters usage fixes

2008-02-18 Thread Jarek Poplawski
Jarek Poplawski wrote, On 02/18/2008 11:10 PM:

 A few usage description fixes of tc filters for some minimal
 consistency (FILTER_KIND because of QDISC_KIND).
 
 
 Signed-off-by: Jarek Poplawski [EMAIL PROTECTED]

Don't apply: I've sent 2nd version of this patch.

Sorry,
Jarek P.




[PATCH v2.6.25] gianfar: don't pass NULL dev ptr to DMA ops

2008-02-18 Thread Andy Fleming
From: Becky Bruce [EMAIL PROTECTED]

Change all dma op invocations in gianfar.c to actually pass in the
device pointer.  Currently, the value is ignored, but it will be
used going forward as we implement archdata for 32-bit powerpc.

Signed-off-by: Becky Bruce [EMAIL PROTECTED]
Acked-by: Andy Fleming [EMAIL PROTECTED]
---
 drivers/net/gianfar.c |   14 +++---
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 0431e9e..9a5160b 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -605,7 +605,7 @@ void stop_gfar(struct net_device *dev)
 
 	free_skb_resources(priv);
 
-	dma_free_coherent(NULL,
+	dma_free_coherent(&dev->dev,
 			sizeof(struct txbd8)*priv->tx_ring_size
 			+ sizeof(struct rxbd8)*priv->rx_ring_size,
 			priv->tx_bd_base,
@@ -626,7 +626,7 @@ static void free_skb_resources(struct gfar_private *priv)
 	for (i = 0; i < priv->tx_ring_size; i++) {
 
 		if (priv->tx_skbuff[i]) {
-			dma_unmap_single(NULL, txbdp->bufPtr,
+			dma_unmap_single(&priv->dev->dev, txbdp->bufPtr,
 					txbdp->length,
 					DMA_TO_DEVICE);
 			dev_kfree_skb_any(priv->tx_skbuff[i]);
@@ -643,7 +643,7 @@ static void free_skb_resources(struct gfar_private *priv)
 	if(priv->rx_skbuff != NULL) {
 		for (i = 0; i < priv->rx_ring_size; i++) {
 			if (priv->rx_skbuff[i]) {
-				dma_unmap_single(NULL, rxbdp->bufPtr,
+				dma_unmap_single(&priv->dev->dev, rxbdp->bufPtr,
 						priv->rx_buffer_size,
 						DMA_FROM_DEVICE);
 
@@ -708,7 +708,7 @@ int startup_gfar(struct net_device *dev)
 	gfar_write(&regs->imask, IMASK_INIT_CLEAR);
 
 	/* Allocate memory for the buffer descriptors */
-	vaddr = (unsigned long) dma_alloc_coherent(NULL,
+	vaddr = (unsigned long) dma_alloc_coherent(&dev->dev,
 			sizeof (struct txbd8) * priv->tx_ring_size +
 			sizeof (struct rxbd8) * priv->rx_ring_size,
 			&addr, GFP_KERNEL);
@@ -919,7 +919,7 @@ err_irq_fail:
 rx_skb_fail:
 	free_skb_resources(priv);
 tx_skb_fail:
-	dma_free_coherent(NULL,
+	dma_free_coherent(&dev->dev,
 			sizeof(struct txbd8)*priv->tx_ring_size
 			+ sizeof(struct rxbd8)*priv->rx_ring_size,
 			priv->tx_bd_base,
@@ -1053,7 +1053,7 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	/* Set buffer length and pointer */
 	txbdp->length = skb->len;
-	txbdp->bufPtr = dma_map_single(NULL, skb->data,
+	txbdp->bufPtr = dma_map_single(&dev->dev, skb->data,
 			skb->len, DMA_TO_DEVICE);
 
 	/* Save the skb pointer so we can free it later */
@@ -1332,7 +1332,7 @@ struct sk_buff * gfar_new_skb(struct net_device *dev, struct rxbd8 *bdp)
 	 */
 	skb_reserve(skb, alignamount);
 
-	bdp->bufPtr = dma_map_single(NULL, skb->data,
+	bdp->bufPtr = dma_map_single(&dev->dev, skb->data,
 			priv->rx_buffer_size, DMA_FROM_DEVICE);
 
 	bdp->length = 0;
-- 
1.5.4.23.gef5b9



[PATCH v2][IPROUTE] tc filters usage fixes

2008-02-18 Thread Jarek Poplawski

In f_basic, CLASSID := X:Y:Z is changed to X:Y here, and f_u32 is no
longer changed (it mentions both CLASSID and FILTERID).

- (take 2)

A few usage description fixes of tc filters for some minimal
consistency (FILTER_KIND because of QDISC_KIND).


Signed-off-by: Jarek Poplawski [EMAIL PROTECTED]

---

 tc/f_basic.c   |4 ++--
 tc/f_rsvp.c|2 +-
 tc/tc_filter.c |6 +++---
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/tc/f_basic.c b/tc/f_basic.c
index ad41633..d8a42d9 100644
--- a/tc/f_basic.c
+++ b/tc/f_basic.c
@@ -30,8 +30,8 @@ static void explain(void)
 	fprintf(stderr, "Usage: ... basic [ match EMATCH_TREE ] [ police POLICE_SPEC ]\n");
 	fprintf(stderr, "                 [ action ACTION_SPEC ] [ classid CLASSID ]\n");
 	fprintf(stderr, "\n");
-	fprintf(stderr, "Where: SELECTOR := SAMPLE SAMPLE ...\n");
-	fprintf(stderr, "       FILTERID := X:Y:Z\n");
+	fprintf(stderr, "Where:\n");
+	fprintf(stderr, "       CLASSID := X:Y\n");
 	fprintf(stderr, "\nNOTE: CLASSID is parsed as hexadecimal input.\n");
 }
 
diff --git a/tc/f_rsvp.c b/tc/f_rsvp.c
index 7e1e6d9..8f92e8f 100644
--- a/tc/f_rsvp.c
+++ b/tc/f_rsvp.c
@@ -33,7 +33,7 @@ static void explain(void)
 	fprintf(stderr, "Where: GPI := { flowlabel NUMBER | spi/ah SPI | spi/esp SPI |\n");
 	fprintf(stderr, "                u{8|16|32} NUMBER mask MASK at OFFSET}\n");
 	fprintf(stderr, "       POLICE_SPEC := ... look at TBF\n");
-	fprintf(stderr, "       FILTERID := X:Y\n");
+	fprintf(stderr, "       CLASSID := X:Y\n");
 	fprintf(stderr, "\nNOTE: CLASSID is parsed as hexadecimal input.\n");
 }
 
diff --git a/tc/tc_filter.c b/tc/tc_filter.c
index d70c656..eb74f89 100644
--- a/tc/tc_filter.c
+++ b/tc/tc_filter.c
@@ -33,12 +33,12 @@ static void usage(void)
 	fprintf(stderr, "Usage: tc filter [ add | del | change | replace | show ] dev STRING\n");
 	fprintf(stderr, "       [ pref PRIO ] protocol PROTO\n");
 	fprintf(stderr, "       [ estimator INTERVAL TIME_CONSTANT ]\n");
-	fprintf(stderr, "       [ root | classid CLASSID ] [ handle FILTERID ]\n");
-	fprintf(stderr, "       [ [ FILTER_TYPE ] [ help | OPTIONS ] ]\n");
+	fprintf(stderr, "       [ root | parent CLASSID ] [ handle FILTERID ]\n");
+	fprintf(stderr, "       [ [ FILTER_KIND ] [ help | OPTIONS ] ]\n");
 	fprintf(stderr, "\n");
 	fprintf(stderr, "       tc filter show [ dev STRING ] [ root | parent CLASSID ]\n");
 	fprintf(stderr, "Where:\n");
-	fprintf(stderr, "FILTER_TYPE := { rsvp | u32 | fw | route | etc. }\n");
+	fprintf(stderr, "FILTER_KIND := { rsvp | u32 | fw | route | etc. }\n");
 	fprintf(stderr, "FILTERID := ... format depends on classifier, see there\n");
 	fprintf(stderr, "OPTIONS := ... try tc filter add <desired FILTER_KIND> help\n");
 	return;
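
The usage text notes that CLASSID is parsed as hexadecimal input. A minimal sketch of that parsing, assuming the usual major:minor form where each half is a 16-bit hex number and the handle is (major << 16) | minor; parse_classid() is a hypothetical helper, not iproute2's own parser, and it ignores keywords such as "root" and "none".

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "X:Y" (both hex, each at most 0xffff) into a 32-bit handle.
 * "X:" with an empty minor is accepted and yields minor 0. */
static bool parse_classid(const char *str, uint32_t *out)
{
	char *end;
	unsigned long maj, min = 0;

	maj = strtoul(str, &end, 16);
	if (end == str || *end != ':' || maj > 0xffff)
		return false;

	str = end + 1;
	if (*str != '\0') {
		min = strtoul(str, &end, 16);
		if (end == str || *end != '\0' || min > 0xffff)
			return false;
	}

	*out = (uint32_t)((maj << 16) | min);
	return true;
}
```

So "1:10" becomes 0x00010010 (minor 0x10, not decimal 10), and "ffff:" becomes 0xffff0000.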


Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-18 Thread David Miller
From: Michael Chan [EMAIL PROTECTED]
Date: Mon, 18 Feb 2008 16:32:00 -0800

 On Mon, 2008-02-18 at 17:41 -0500, Tony Battersby wrote:
  I am experiencing network data corruption with a 3Com 3C996B-T NIC
  (Broadcom NetXtreme BCM5701; driver tg3.ko).  I have identified the
  following patch as the trigger:
 
 Assuming this problem is unique to the 5701, I'm not sure how it is
 exposed by Herbert's patch.  One thing unique on the 5701 is that it
 double-copies all RX packets so that the data starts at offset 2, but
 that's quite unrelated to the patch below.

One consequence of Herbert's change is that the chip will see a
different datastream.  The initial skb->data linear area will be
smaller, and the transition to the fragmented area of pages will be
quicker.


Re: [PATCH] net/8021q/vlan_dev.c - Use print_mac

2008-02-18 Thread David Miller
From: David Miller [EMAIL PROTECTED]
Date: Mon, 18 Feb 2008 16:43:05 -0800 (PST)

 I think we can fix this easily by using __attribute_const__
 on the print_mac() declaration.  Let me play with that.

Actually it seems the 'pure' attribute is more important
here.  Although it's not semantically a perfect match,
what we need to tell the compiler is basically that:

1) the return value depends upon the inputs
2) if the input is not used, it's safe to avoid the call

and 'pure' accomplishes that without any unwanted side-effects.

I think this will not result in any unwanted over-optimization.
Because if the inputs change in any way GCC has to emit the
call.
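Below is a minimal sketch of the mechanism, assuming GCC or Clang. print_mac() itself writes into `buf`, which is why `pure` is not a perfect semantic match. The hypothetical mac_sum() below only reads memory, which is the contract GCC documents for `pure`. The inline debug_stub() stands in for the pr_debug() stub used when DEBUG is not defined. Whether the call is actually dropped depends on the compiler version and optimization level.

```c
#include <stddef.h>

/* Hypothetical helper, not print_mac(): reads memory and writes none,
 * which matches what the `pure` attribute promises. */
__attribute__((pure)) unsigned int mac_sum(const unsigned char *addr)
{
	unsigned int sum = 0;

	for (size_t i = 0; i < 6; i++)
		sum += addr[i];
	return sum;
}

/* Stand-in for the !DEBUG pr_debug(): the format attribute keeps argument
 * checking, but nothing is printed and the arguments are ignored. */
static inline __attribute__((format(printf, 1, 2)))
int debug_stub(const char *fmt, ...)
{
	(void)fmt;
	return 0;
}

/* With optimization enabled, the compiler may omit the mac_sum() call here:
 * its result only feeds a function that ignores it, and `pure` says the
 * call has no side effects to preserve. */
void log_addr(const unsigned char *addr)
{
	debug_stub("sum=%u\n", mac_sum(addr));
}
```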

Any objections?

commit 8f789c48448aed74fe1c07af76de8f04adacec7d
Author: David S. Miller [EMAIL PROTECTED]
Date:   Mon Feb 18 16:50:22 2008 -0800

[NET]: Eliminate spurious print_mac() calls.

Patrick McHardy notes that print_mac() can get invoked
even if the result is unused (f.e. as an argument to
pr_debug() when DEBUG is not defined).

Mark this function as __pure to eliminate this problem.

Signed-off-by: David S. Miller [EMAIL PROTECTED]

diff --git a/include/linux/if_ether.h b/include/linux/if_ether.h
index 7a1e011..42dc6a3 100644
--- a/include/linux/if_ether.h
+++ b/include/linux/if_ether.h
@@ -129,7 +129,7 @@ extern ssize_t sysfs_format_mac(char *buf, const unsigned char *addr, int len);
 /*
  * Display a 6 byte device address (MAC) in a readable format.
  */
-extern char *print_mac(char *buf, const unsigned char *addr);
+extern __pure char *print_mac(char *buf, const unsigned char *addr);
 #define MAC_BUF_SIZE   18
 #define DECLARE_MAC_BUF(var) char var[MAC_BUF_SIZE] __maybe_unused
 


Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-18 Thread Michael Chan
On Mon, 2008-02-18 at 16:35 -0800, David Miller wrote:

 One consequence of Herbert's change is that the chip will see a
 different datastream.  The initial skb->data linear area will be
 smaller, and the transition to the fragmented area of pages will be
 quicker.
 

I see.  Perhaps when we get to the end of the data-stream, there is a
tiny frag that the chip cannot handle.  That's the only thing I can
think of.

Please try this patch to see if the problem goes away.  This will
disable SG on 5701 so we always get linear SKBs.

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index db606b6..bb37e76 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -12717,6 +12717,9 @@ static int __devinit tg3_init_one(struct pci_dev *pdev,
} else
tp->tg3_flags &= ~TG3_FLAG_RX_CHECKSUMS;
 
+   if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5701)
+   dev->features &= ~(NETIF_F_IP_CSUM | NETIF_F_SG);
+
/* flow control autonegotiation is default behavior */
tp->tg3_flags |= TG3_FLAG_PAUSE_AUTONEG;
tp->link_config.flowctrl = TG3_FLOW_CTRL_TX | TG3_FLOW_CTRL_RX;
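The quirk clears two bits and leaves the rest of the feature mask alone. Here is a small sketch with hypothetical flag values; the real NETIF_F_* and ASIC_REV_* constants come from the kernel headers.

```c
/* Hypothetical values for illustration only. */
#define FEAT_SG       (1u << 0)
#define FEAT_IP_CSUM  (1u << 1)
#define FEAT_TSO      (1u << 2)

#define REV_5701      0x0u
#define REV_OTHER     0x1u

/* Clear scatter-gather and IP checksum offload for the affected revision.
 * With `&=`, every other feature bit is preserved. A plain `=` would
 * replace the mask with the complement and turn on every other bit. */
unsigned int apply_quirks(unsigned int features, unsigned int asic_rev)
{
	if (asic_rev == REV_5701)
		features &= ~(FEAT_IP_CSUM | FEAT_SG);
	return features;
}
```

Without the SG bit, the stack has to hand the driver linear buffers, and that is the point of the test patch.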




Re: [PATCH] net/8021q/vlan_dev.c - Use print_mac

2008-02-18 Thread Joe Perches
On Mon, 2008-02-18 at 16:50 -0800, David Miller wrote:
 Actually it seems the 'pure' attribute is more important
 here.  Although it's not semantically a perfect match,
 what we need to tell the compiler is basically that:
 
 1) the return value depends upon the inputs
 2) if the input is not used, it's safe to avoid the call
 
 and 'pure' accomplishes that without any unwanted side-effects.
 
 I think this will not result in any unwanted over-optimization.
 Because if the inputs change in any way GCC has to emit the
 call.
 
 Any objections?

Does this need to be done for all function calls
declared with __attribute__((format(printf, x, y)))
{
return 0;
}

ie: pr_debug, dev_dbg, dev_vdbg?

Perhaps it's more sensible to go back to

#ifdef DEBUG
#define pr_debug(fmt, arg...) do {} while (0)
#endif

and give up the printf argument verification



Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-18 Thread Michael Chan
On Mon, 2008-02-18 at 17:41 -0500, Tony Battersby wrote:
 I am experiencing network data corruption with a 3Com 3C996B-T NIC
 (Broadcom NetXtreme BCM5701; driver tg3.ko).  I have identified the
 following patch as the trigger:

Assuming this problem is unique to the 5701, I'm not sure how it is
exposed by Herbert's patch.  One thing unique on the 5701 is that it
double-copies all RX packets so that the data starts at offset 2, but
that's quite unrelated to the patch below.

 
 commit fb93134dfc2a6e6fbedc7c270a31da03fce88db9
 Author: Herbert Xu [EMAIL PROTECTED]
 Date:   Wed Nov 14 15:45:21 2007 -0800
 
 [TCP]: Fix size calculation in sk_stream_alloc_pskb

 

 I do not get data corruption when substituting a SysKonnect 9D21 NIC
 (which also uses the tg3.ko driver)

What Broadcom chip is on the Syskonnect card?





Re: [PATCH] net/8021q/vlan_dev.c - Use print_mac

2008-02-18 Thread David Miller
From: Patrick McHardy [EMAIL PROTECTED]
Date: Mon, 18 Feb 2008 22:17:27 +0100

 The way pr_debug is implemented it still results in two function
 calls per packet since the compiler doesn't know that it doesn't
 have visible side-effects besides modifying the (unused) buffer.
 I confirmed this using codiff.

That's a bug.

I think we can fix this easily by using __attribute_const__
on the print_mac() declaration.  Let me play with that.


Re: [PATCH] net/8021q/vlan_dev.c - Use print_mac

2008-02-18 Thread Philip Craig
Joe Perches wrote:
 Perhaps it's more sensible to go back to
 
 #ifdef DEBUG
 #define pr_debug(fmt, arg...) do {} while (0)
 #endif
 
 and give up the printf argument verification

I think argument verification is important.   Can you keep it
like this:

#ifdef DEBUG
#define pr_debug(fmt, arg...) \
do { \
if (0) \
printk(KERN_DEBUG fmt, ##arg); \
} while (0)
#endif

We still lose the return value though, I'm not sure how to keep that
safely in macros.

But does anything rely on the side effects already?  This would
introduce bugs if so.
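Here is a user-space sketch of Philip's suggestion, with printf() standing in for printk(). It uses the standard C99 `__VA_ARGS__` form instead of the kernel's GNU `arg...` syntax. It also puts the no-op variant in the branch where DEBUG is *not* defined. The hypothetical bump() shows the side-effect concern raised above: the argument is still checked against the format string, but it is never evaluated.

```c
#include <stdio.h>

#ifdef DEBUG
#define dbg(...) printf(__VA_ARGS__)
#else
/* The printf() is dead code, but the compiler still checks the
 * arguments against the format string. */
#define dbg(...)					\
	do {						\
		if (0)					\
			printf(__VA_ARGS__);		\
	} while (0)
#endif

/* Hypothetical caller: with DEBUG unset, (*count)++ is never executed. */
void bump(int *count)
{
	dbg("count=%d\n", (*count)++);
}
```

The do/while form is a statement, so, as noted above, any return value of the real function is not available to callers.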



Re: tbench regression in 2.6.25-rc1

2008-02-18 Thread Zhang, Yanmin
On Mon, 2008-02-18 at 11:11 +0100, Eric Dumazet wrote:
 On Mon, 18 Feb 2008 16:12:38 +0800
 Zhang, Yanmin [EMAIL PROTECTED] wrote:
 
  On Fri, 2008-02-15 at 15:22 -0800, David Miller wrote:
   From: Eric Dumazet [EMAIL PROTECTED]
   Date: Fri, 15 Feb 2008 15:21:48 +0100
   
On linux-2.6.25-rc1 x86_64 :

offsetof(struct dst_entry, lastuse)=0xb0
offsetof(struct dst_entry, __refcnt)=0xb8
offsetof(struct dst_entry, __use)=0xbc
offsetof(struct dst_entry, next)=0xc0

So it should be optimal... I don't know why tbench prefers __refcnt being
on 0xc0, since in this case lastuse will be on a different cache line...

Each incoming IP packet will need to change lastuse, __refcnt and __use,
so keeping them in the same cache line is a win.

I suspect then that even this patch could help tbench, since it avoids 
writing lastuse...
   
   I think your suspicions are right, and even moreso
   it helps to keep __refcnt out of the same cache line
   as input/output/ops which are read-almost-entirely :-)
  I think you are right. The issue is these three variables sharing the same
  cache line with input/output/ops.
   
   I haven't done an exhaustive analysis, but it seems that
   the write traffic to lastuse and __refcnt are about the
   same.  However if we find that __refcnt gets hit more
   than lastuse in this workload, it explains the regression.
  I also think __refcnt is the key. I did a new test by adding 2 unsigned long
  padding before lastuse, so the 3 members are moved to the next cache line. The
  performance is recovered.
  
  How about the patch below? Almost all performance is recovered with the new
  patch.
  
  Signed-off-by: Zhang Yanmin [EMAIL PROTECTED]
  
  ---
  
  --- linux-2.6.25-rc1/include/net/dst.h  2008-02-21 14:33:43.0 +0800
  +++ linux-2.6.25-rc1_work/include/net/dst.h 2008-02-21 14:36:22.0 +0800
  @@ -52,11 +52,10 @@ struct dst_entry
  unsigned short  header_len; /* more space at head required */
  unsigned short  trailer_len;/* space to reserve at tail */
   
  -   u32 metrics[RTAX_MAX];
  -   struct dst_entry*path;
  -
  -   unsigned long   rate_last;  /* rate limiting for ICMP */
  unsigned intrate_tokens;
  +   unsigned long   rate_last;  /* rate limiting for ICMP */
  +
  +   struct dst_entry*path;
   
   #ifdef CONFIG_NET_CLS_ROUTE
  __u32   tclassid;
  @@ -70,10 +69,12 @@ struct dst_entry
  int (*output)(struct sk_buff*);
   
  struct  dst_ops *ops;
  -   
  -   unsigned long   lastuse;
  +
  +   u32 metrics[RTAX_MAX];
  +
  atomic_t__refcnt;   /* client references*/
  int __use;
  +   unsigned long   lastuse;
  union {
  struct dst_entry *next;
  struct rtable*rt_next;
  
  
 
 Well, after this patch, we grow dst_entry by 8 bytes :
With my .config, it doesn't grow, perhaps because I don't enable
CONFIG_NET_CLS_ROUTE. I will move tclassid under ops.

 
 sizeof(struct dst_entry)=0xd0
 offsetof(struct dst_entry, input)=0x68
 offsetof(struct dst_entry, output)=0x70
 offsetof(struct dst_entry, __refcnt)=0xb4
 offsetof(struct dst_entry, lastuse)=0xc0
 offsetof(struct dst_entry, __use)=0xb8
 sizeof(struct rtable)=0x140
 
 
 So we dirty two cache lines instead of one, unless your cpu has 128-byte
 cache lines?
 
 I am quite surprised that my patch to not change lastuse if already set to
 jiffies changes nothing...
 
 If you have some time, could you also test this (unrelated) patch ?
 
 We can avoid dirtying a cache line of the loopback device all the time.
 
 diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
 index f2a6e71..0a4186a 100644
 --- a/drivers/net/loopback.c
 +++ b/drivers/net/loopback.c
 @@ -150,7 +150,10 @@ static int loopback_xmit(struct sk_buff *skb, struct net_device *dev)
 return 0;
 }
  #endif
 -   dev->last_rx = jiffies;
 +#ifdef CONFIG_SMP
 +   if (dev->last_rx != jiffies)
 +#endif
 +   dev->last_rx = jiffies;
  
 /* it's OK to use per_cpu_ptr() because BHs are off */
 pcpu_lstats = netdev_priv(dev);
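Eric's loopback change, like his earlier lastuse patch, follows a general write-avoidance pattern. The code reads first and stores only when the value differs, so packets arriving within the same jiffy do not each dirty the shared cache line. Here is a user-space sketch with a hypothetical structure. The `writes` counter exists only to make the effect visible and is not part of the real net_device.

```c
/* Hypothetical stand-in for a structure that many CPUs touch. */
struct dev_stats {
	unsigned long last_rx;
	unsigned long writes;	/* counts stores, for illustration only */
};

/* Store only when the value changes: repeated calls with the same `now`
 * cost one write in total instead of one write per call. */
void touch_last_rx(struct dev_stats *st, unsigned long now)
{
	if (st->last_rx != now) {
		st->last_rx = now;
		st->writes++;
	}
}
```

Roughly speaking, the read alone lets other CPUs keep their copies of the line in a shared state. Only the store forces the line to bounce between CPUs.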
 
Although I didn't test it, I don't think it's ok. The key is that __refcnt
shares the same cache line with ops/input/output.

-yanmin
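Eric's offsets can be checked with simple arithmetic. This sketch assumes the object starts on a cache-line boundary; for dst_entry that depends on how its slab cache aligns it. With 64-byte lines, __refcnt at 0xb4 and lastuse at 0xc0 fall on lines 2 and 3, which is the "two cache lines instead of one" case. With 128-byte lines, both fall on line 1.

```c
#include <stddef.h>

/* Index of the cache line holding byte `offset` of an object, assuming
 * the object itself starts on a cache-line boundary. */
size_t cache_line(size_t offset, size_t line_size)
{
	return offset / line_size;
}

/* Nonzero if the fields at offsets a and b share a cache line. */
int same_line(size_t a, size_t b, size_t line_size)
{
	return cache_line(a, line_size) == cache_line(b, line_size);
}
```

For example, same_line(0xb4, 0xc0, 64) is 0 and same_line(0xb4, 0xc0, 128) is 1.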




Re: [Bugme-new] [Bug 9920] New: kernel panic when using ebtables redirect target

2008-02-18 Thread Joonwoo Park
On Fri, Feb 08, 2008 at 05:59:42PM -0800, Andrew Morton wrote:
 On Fri,  8 Feb 2008 17:40:20 -0800 (PST) [EMAIL PROTECTED] wrote:
 
  http://bugzilla.kernel.org/show_bug.cgi?id=9920
  
 Summary: kernel panic when using ebtables redirect target
 Product: Networking
 Version: 2.5
   KernelVersion: 2.6.24 and 2.6.24-git
Platform: All
  OS/Version: Linux
Tree: Mainline
  Status: NEW
Severity: normal
Priority: P1
   Component: Other
  AssignedTo: [EMAIL PROTECTED]
  ReportedBy: [EMAIL PROTECTED]
  
  
  Latest working kernel version: 2.6.22 ( did not test 2.6.23 )
  Earliest failing kernel version: 2.6.24 
  Distribution:
  Hardware Environment: 
  Software Environment: bridge working as a router
  Problem Description: when using ebtables to set up target-redirect, there will
  be a kernel panic
  
  Steps to reproduce:
  1. set up a basic bridge br0 with slaves eth0, eth1
  2. on the bridge setup a default router to route traffic
  3. use ebtables to setup target redirect, 
  
  ebtables -t broute -A BROUTING --logical-in br0 \
  -p ipv4  --ip-protocol tcp --ip-destination-port 80 \
  -j redirect --redirect-target ACCEPT
  
  4. from a client which is connect to the bridge, 
  send some traffic to allow the BROUTE chain to be 
  traversed :-
  
  lynx http://www.google.com
  
  5. Kernel panic :-
  
  Pid: 0, comm: swapper Not tainted (2.6.24-tmc #1)
  EIP: 0060:[c69f61aa] EFLAGS: 0217 CPU: 0
  EIP is at ebt_do_table+0x4ea/0x5d0 [ebtables]
  EAX:  EBX:  ECX:  EDX: 0001
  ESI: c69f1178 EDI: c69f1108 EBP: c69f1000 ESP: c0315e20
  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
  Process swapper (pid: 0, ti=c0314000 task=c02f1300 task.ti=c0314000)
  Stack:  c69f11dc 0004  c28c7800 c2b79c20 0005 
  c69de350
0001 0002 c69ed040 c69ed040   c69f1000 
  00b0
00b0 c29b0812  c69f1122   a0c3 
  c29b0812
  Call Trace:
  [c69de032] ebt_broute+0x22/0x30 [ebtable_broute]
  [c69fef48] br_handle_frame+0xb8/0x220 [bridge]
  [c02274ac] netif_receive_skb+0x19c/0x440
  [c0229ffb] process_backlog+0x6b/0xd0
  [c0229a45] net_rx_action+0x105/0x1b0
  [c011f835] __do_softirq+0x75/0xf0
  [c011f8e7] do_softirq+0x37/0x40
  [c011fb25] irq_exit+0x75/0x80
  [c010d877] smp_apic_timer_interrupt+0x57/0x90
  [c0105b34] apic_timer_interrupt+0x28/0x30
  [c0103cd0] default_idle+0x0/0x40
  [c0103cff] default_idle+0x2f/0x40
  [c0103443] cpu_idle+0x73/0xa0
  [c0319cd5] start_kernel+0x2c5/0x340
  [c0319420] unknown_bootoption+0x0/0x1e0
  ===
  Code: 00 00 83 f9 fe 74 64 83 f9 fc 0f 84 d7 fb ff ff 83 f9 fd 0f 84 bb fc 
  ff
  ff 8b 5c 24 30 8b 54 24 34 8d 04 5b 8d 04 82 8b 54 24 20 89 28 42 89 50 
  08 8b
  5f 6c 01 df 89 78 04 8b 6c 24 38 8b 54 24
  EIP: [c69f61aa] ebt_do_table+0x4ea/0x5d0 [ebtables] SS:ESP 0068:c0315e20
  
  

[PATCH] netfilter: fix incorrect use of skb_make_writable

http://bugzilla.kernel.org/show_bug.cgi?id=9920
The function skb_make_writable returns true or false.

Signed-off-by: Joonwoo Park [EMAIL PROTECTED]
---
 net/bridge/netfilter/ebt_dnat.c |2 +-
 net/bridge/netfilter/ebt_redirect.c |2 +-
 net/bridge/netfilter/ebt_snat.c |2 +-
 net/ipv4/netfilter/arpt_mangle.c|2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/bridge/netfilter/ebt_dnat.c b/net/bridge/netfilter/ebt_dnat.c
index e700cbf..1ec671d 100644
--- a/net/bridge/netfilter/ebt_dnat.c
+++ b/net/bridge/netfilter/ebt_dnat.c
@@ -20,7 +20,7 @@ static int ebt_target_dnat(struct sk_buff *skb, unsigned int hooknr,
 {
const struct ebt_nat_info *info = data;
 
-   if (skb_make_writable(skb, 0))
+   if (!skb_make_writable(skb, 0))
return NF_DROP;
 
memcpy(eth_hdr(skb)->h_dest, info->mac, ETH_ALEN);
diff --git a/net/bridge/netfilter/ebt_redirect.c b/net/bridge/netfilter/ebt_redirect.c
index bfdf2fb..bfb9f74 100644
--- a/net/bridge/netfilter/ebt_redirect.c
+++ b/net/bridge/netfilter/ebt_redirect.c
@@ -21,7 +21,7 @@ static int ebt_target_redirect(struct sk_buff *skb, unsigned int hooknr,
 {
const struct ebt_redirect_info *info = data;
 
-   if (skb_make_writable(skb, 0))
+   if (!skb_make_writable(skb, 0))
return NF_DROP;
 
if (hooknr != NF_BR_BROUTING)
diff --git a/net/bridge/netfilter/ebt_snat.c b/net/bridge/netfilter/ebt_snat.c
index e252dab..204f996 100644
--- a/net/bridge/netfilter/ebt_snat.c
+++ b/net/bridge/netfilter/ebt_snat.c
@@ -22,7 +22,7 @@ static int ebt_target_snat(struct sk_buff *skb, unsigned int hooknr,
 {
const struct ebt_nat_info *info = data;
 
-   if 

Re: [PATCH] net/8021q/vlan_dev.c - Use print_mac

2008-02-18 Thread David Miller
From: Joe Perches [EMAIL PROTECTED]
Date: Mon, 18 Feb 2008 17:03:32 -0800

 Does this need to be done for all function calls
 declared with __attribute__((format(printf, x, y)))
 {
 return 0;
 }
 
 ie: pr_debug, dev_dbg, dev_vdbg?

No, I don't think so.

We're adding the tag to teach the compiler that if the
return value isn't used, it is OK not to emit the call
altogether.


Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver

2008-02-18 Thread David Miller
From: James Chapman [EMAIL PROTECTED]
Date: Mon, 18 Feb 2008 22:09:24 +

 Here's a new version of the patch. The patch avoids disabling irqs
 and fixes the sk_dst_get() usage that DaveM mentioned. But even with
 this patch, lockdep still complains if hundreds of ppp sessions are
 inserted into a tunnel as rapidly as possible (lockdep trace is
 below). I can stop these errors by wrapping the call to ppp_input()
 in pppol2tp_recv_dequeue_skb() with local_irq_save/restore. What is
 a better fix?

Firstly, let's fix one thing at a time.  Leave the sk_dst_get()
thing alone until we can prove that it's part of the lockdep
traces.

Next, I can't see why ppp_input() needs to be invoked with
interrupts disabled.  There are many other things that invoke
that in software interrupt context, such as pppoe.

Please provide the lockdep traces without the ppp_input() IRQ
disabling so this can be properly analyzed.


Re: [PATCH 1/2] bluetooth : put hci dev after del conn

2008-02-18 Thread David Miller
From: Dave Young [EMAIL PROTECTED]
Date: Mon, 18 Feb 2008 15:55:55 +0800

 Move hci_dev_put to del_conn to avoid hci dev going away before hci conn.

This looks correct so I have applied it.

 Signed-off-by: Dave Young [EMAIL PROTECTED] 

Please remove the extraneous space at the end of your
signoff line next time :-)

Also, I reworked the loop in del_conn() so that it no longer
generates a compile warning, so I had to apply your patch
by hand.


Re: [PATCH 2/2] bluetooth : do not move child device other than rfcomm

2008-02-18 Thread David Miller
From: Dave Young [EMAIL PROTECTED]
Date: Mon, 18 Feb 2008 15:58:05 +0800

 hci conn child devices other than rfcomm tty should not be moved here.
 This was my mistake; thanks to Barnaby for reporting and testing.
 
 Signed-off-by: Dave Young [EMAIL PROTECTED] 

Applied, thanks Dave.


Re: [PATCH] [IPV6]: dst_entry leak in ip4ip6_err. (resend)

2008-02-18 Thread David Miller
From: Denis V. Lunev [EMAIL PROTECTED]
Date: Mon, 18 Feb 2008 11:59:38 +0300

 The result of the ip_route_output is not assigned to skb. This means that
 - it is leaked
 - possible OOPS below dereferencing skb->dst
 - no ICMP message for this case
 
 Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]

This bug has been there for a few releases :-)

Applied and I'll queue this up for -stable too.

Thanks!


Re: [PATCH][IPV6]: Use BUG_ON instead of if + BUG in fib6_del_route.

2008-02-18 Thread David Miller
From: Pavel Emelyanov [EMAIL PROTECTED]
Date: Mon, 18 Feb 2008 15:50:11 +0300

 Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

Applied, thanks Pavel.


Re: [PATCH] net: fix kernel-doc warnings in header files

2008-02-18 Thread David Miller
From: Randy Dunlap [EMAIL PROTECTED]
Date: Mon, 18 Feb 2008 13:26:47 -0800

 From: Randy Dunlap [EMAIL PROTECTED]
 
 Add missing structure kernel-doc descriptions to sock.h & skbuff.h
 to fix kernel-doc warnings.
 
 (I think that Stephen H. sent a similar patch, but I can't find it.
 I just want to kill the warnings, with either patch.)
 
 Signed-off-by: Randy Dunlap [EMAIL PROTECTED]

Applied, thanks Randy.


Re: [Bugme-new] [Bug 9920] New: kernel panic when using ebtables redirect target

2008-02-18 Thread David Miller
From: Joonwoo Park [EMAIL PROTECTED]
Date: Tue, 19 Feb 2008 11:53:24 +0900

 [PATCH] netfilter: fix incorrect use of skb_make_writable
 
 http://bugzilla.kernel.org/show_bug.cgi?id=9920
 The function skb_make_writable returns true or false.
 
 Signed-off-by: Joonwoo Park [EMAIL PROTECTED]

I'll let Patrick pull this in, thanks!
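This sketch shows why the one-character fix matters. skb_make_writable() returns nonzero on success and zero on failure. The old `if (skb_make_writable(skb, 0)) return NF_DROP;` therefore dropped exactly the frames that had been made writable, and went on to modify the ones that had not. The stub below is hypothetical and only models the control flow of the ebtables targets.

```c
#include <stdbool.h>

enum verdict { VERDICT_ACCEPT, VERDICT_DROP };

/* Hypothetical stand-in for skb_make_writable(): true on success. */
static bool make_writable(bool can_write)
{
	return can_write;
}

/* Pre-fix control flow: the drop branch runs on success. */
enum verdict target_buggy(bool can_write)
{
	if (make_writable(can_write))
		return VERDICT_DROP;
	return VERDICT_ACCEPT;	/* header would be rewritten here */
}

/* Fixed control flow: bail out only when the call fails. */
enum verdict target_fixed(bool can_write)
{
	if (!make_writable(can_write))
		return VERDICT_DROP;
	return VERDICT_ACCEPT;	/* safe to rewrite the header here */
}
```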


[PATCH] bonding: simplify code and get rid of warning

2008-02-18 Thread Stephen Hemminger
Get rid of warning and simplify code that looks up vlan tag.
No need to get tag, then copy it. Also no need for a local status
variable.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
---
Patch against current 2.6.25 version.

--- a/drivers/net/bonding/bond_alb.c	2008-02-18 20:58:53.0 -0800
+++ b/drivers/net/bonding/bond_alb.c	2008-02-18 21:01:10.0 -0800
@@ -678,12 +678,8 @@ static struct slave *rlb_choose_channel(
}
 
if (!list_empty(&bond->vlan_list)) {
-   unsigned short vlan_id;
-   int res = vlan_get_tag(skb, &vlan_id);
-   if (!res) {
+   if (!vlan_get_tag(skb, &client_info->vlan_id))
client_info->tag = 1;
-   client_info->vlan_id = vlan_id;
-   }
}
 
if (!client_info->assigned) {
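The simplification works because vlan_get_tag() fills its result through a pointer and returns 0 when a tag is present, so the destination field can be passed in directly. Here is a sketch with hypothetical stand-in types (`frame`, get_tag()) in place of the real sk_buff and if_vlan.h helpers. The stub leaves the destination untouched when no tag is found.

```c
#include <stdbool.h>

/* Hypothetical stand-ins, not kernel types. */
struct frame {
	bool tagged;
	unsigned short vid;
};

struct client_info {
	int tag;
	unsigned short vlan_id;
};

/* Returns 0 and fills *vid when the frame carries a tag, nonzero otherwise. */
static int get_tag(const struct frame *f, unsigned short *vid)
{
	if (!f->tagged)
		return -1;
	*vid = f->vid;
	return 0;
}

/* Simplified form: write straight into the destination field and set the
 * flag only on success, with no temporaries. */
void record_vlan(struct client_info *ci, const struct frame *f)
{
	if (!get_tag(f, &ci->vlan_id))
		ci->tag = 1;
}
```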


Re: [PATCH] cls_u32 u32_classify()

2008-02-18 Thread David Miller
From: Dzianis Kahanovich [EMAIL PROTECTED]
Date: Wed, 30 Jan 2008 11:16:30 -0200

 Currently u32 "hashkey ... at ..." does not work with relative offsets.
 The simplest fix is to use eat.

So the question is whether 'sel' is defined to be calculated
before all offsets and EAT operations are processed, or after.

I do not understand the U32 classifier enough to know what
this kind of change might or might not break.

Can some u32 expert review this?

Thanks.


Re: keyboard dead with 45b5035

2008-02-18 Thread Pierre Ossman
On Mon, 18 Feb 2008 21:50:12 +0100
Pierre Ossman [EMAIL PROTECTED] wrote:

 On Mon, 18 Feb 2008 20:50:01 +0100
 Rafael J. Wysocki [EMAIL PROTECTED] wrote:
 
  On Monday, 18 of February 2008, Pierre Ossman wrote:
   The patch "[RTNETLINK]: Send a single notification on device state changes." kills (at least)
   the keyboard here. Everything seems to work fine in single user mode, but when init starts
   spawning of logins, the keyboard goes bye-bye. Even the power button is ignored. :/
  
  Please try with the patch from http://lkml.org/lkml/2008/2/18/331 .
  
 
 That solved it.
 

Perhaps not quite. When I returned to my laptop this morning, the keyboard was 
gone again. Did a hard reboot, and the machine locked up a few seconds after 
starting X. I'll see if it can be reproduced...

Rgds
-- 
 -- Pierre Ossman

  Linux kernel, MMC maintainer    http://www.kernel.org
  PulseAudio, core developer      http://pulseaudio.org
  rdesktop, core developer        http://www.rdesktop.org


Re: tbench regression in 2.6.25-rc1

2008-02-18 Thread Zhang, Yanmin
On Mon, 2008-02-18 at 12:33 -0500, [EMAIL PROTECTED] wrote: 
 On Mon, 18 Feb 2008 16:12:38 +0800, Zhang, Yanmin said:
 
  I also think __refcnt is the key. I did a new test by adding 2 unsigned long
  padding before lastuse, so the 3 members are moved to the next cache line. The
  performance is recovered.
  
  How about the patch below? Almost all performance is recovered with the new
  patch.
  
  Signed-off-by: Zhang Yanmin [EMAIL PROTECTED]
 
 Could you add a comment someplace that says refcnt wants to be on a different
 cache line from input/output/ops or performance tanks badly, to warn some
 future kernel hacker who starts adding new fields to the structure?
Ok. Below is the new patch.

1) Move tclassid under ops in case CONFIG_NET_CLS_ROUTE=y, so that
sizeof(dst_entry)=200 no matter whether CONFIG_NET_CLS_ROUTE=y or n. I tested
many patches on my 16-core tigerton, moving tclassid to different places. It
looks like tclassid could also have an impact on performance.
If tclassid is moved before metrics, or not moved at all, the performance
isn't good, so I moved it behind metrics.

2) Add comments before __refcnt.

If CONFIG_NET_CLS_ROUTE=y, the result with below patch is about 18% better than
the one without the patch.

If CONFIG_NET_CLS_ROUTE=n, the result with below patch is about 30% better than
the one without the patch.

Signed-off-by: Zhang Yanmin [EMAIL PROTECTED]

---

--- linux-2.6.25-rc1/include/net/dst.h  2008-02-21 14:33:43.0 +0800
+++ linux-2.6.25-rc1_work/include/net/dst.h 2008-02-22 12:52:19.0 +0800
@@ -52,15 +52,10 @@ struct dst_entry
unsigned short  header_len; /* more space at head required */
unsigned short  trailer_len;/* space to reserve at tail */
 
-   u32 metrics[RTAX_MAX];
-   struct dst_entry*path;
-
-   unsigned long   rate_last;  /* rate limiting for ICMP */
unsigned intrate_tokens;
+   unsigned long   rate_last;  /* rate limiting for ICMP */
 
-#ifdef CONFIG_NET_CLS_ROUTE
-   __u32   tclassid;
-#endif
+   struct dst_entry*path;
 
struct neighbour*neighbour;
struct hh_cache *hh;
@@ -70,10 +65,20 @@ struct dst_entry
int (*output)(struct sk_buff*);
 
struct  dst_ops *ops;
-   
-   unsigned long   lastuse;
+
+   u32 metrics[RTAX_MAX];
+
+#ifdef CONFIG_NET_CLS_ROUTE
+   __u32   tclassid;
+#endif
+
+   /*
+* __refcnt wants to be on a different cache line from
+* input/output/ops or performance tanks badly
+*/
atomic_t__refcnt;   /* client references*/
int __use;
+   unsigned long   lastuse;
union {
struct dst_entry *next;
struct rtable*rt_next;
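A comment warning future editors about field order can be backed by a compile-time check. Here is a sketch using a hypothetical toy struct, not the real dst_entry, with C11 `_Alignas` and `_Static_assert` and assuming 64-byte cache lines. In-tree code would more likely use BUILD_BUG_ON(); the patch above only adds a comment.

```c
#include <stddef.h>

#define CACHE_LINE 64	/* assumption: 64-byte cache lines */

/* Toy layout: _Alignas forces the write-heavy counters onto their own
 * cache line, away from the read-mostly function pointers. */
struct toy_dst {
	int (*input)(void *);
	int (*output)(void *);
	const void *ops;
	_Alignas(CACHE_LINE) int refcnt;	/* written on every packet */
	int use;
	unsigned long lastuse;
};

/* The build fails if someone reshuffles the fields so that refcnt ends up
 * on the same cache line as ops. */
_Static_assert(offsetof(struct toy_dst, refcnt) / CACHE_LINE !=
	       offsetof(struct toy_dst, ops) / CACHE_LINE,
	       "refcnt must not share a cache line with ops");
```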




[PATCH 10/17 net-2.6.26] [NETNS]: Process ip_rt_redirect in the correct namespace.

2008-02-18 Thread Denis V. Lunev
Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/ipv4/route.c |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 525787b..44708ab 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1132,10 +1132,12 @@ void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
__be32  skeys[2] = { saddr, 0 };
int  ikeys[2] = { dev->ifindex, 0 };
struct netevent_redirect netevent;
+   struct net *net;
 
if (!in_dev)
return;
 
+   net = dev->nd_net;
if (new_gw == old_gw || !IN_DEV_RX_REDIRECTS(in_dev)
|| ipv4_is_multicast(new_gw) || ipv4_is_lbcast(new_gw)
|| ipv4_is_zeronet(new_gw))
@@ -1147,7 +1149,7 @@ void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
if (IN_DEV_SEC_REDIRECTS(in_dev) &&
ip_fib_check_default(new_gw, dev))
goto reject_redirect;
} else {
-   if (inet_addr_type(&init_net, new_gw) != RTN_UNICAST)
+   if (inet_addr_type(net, new_gw) != RTN_UNICAST)
goto reject_redirect;
}
 
@@ -1165,7 +1167,8 @@ void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
rth->fl.fl4_src != skeys[i] ||
rth->fl.oif != ikeys[k] ||
rth->fl.iif != 0 ||
-   rth->rt_genid != atomic_read(&rt_genid)) {
+   rth->rt_genid != atomic_read(&rt_genid) ||
+   rth->u.dst.dev->nd_net != net) {
rthp = &rth->u.dst.rt_next;
continue;
}
-- 
1.5.3.rc5



[PATCH 12/17 net-2.6.26] [NETNS]: Process /proc/net/rt_cache inside a namespace.

2008-02-18 Thread Denis V. Lunev
Show routing cache for a particular namespace only.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/ipv4/route.c |   10 +++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 67df872..c11e6bf 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -273,6 +273,7 @@ static unsigned int rt_hash_code(u32 daddr, u32 saddr)
 
 #ifdef CONFIG_PROC_FS
 struct rt_cache_iter_state {
+   struct seq_net_private p;
int bucket;
int genid;
 };
@@ -285,7 +286,8 @@ static struct rtable *rt_cache_get_first(struct rt_cache_iter_state *st)
rcu_read_lock_bh();
r = rcu_dereference(rt_hash_table[st->bucket].chain);
while (r) {
-   if (r->rt_genid == st->genid)
+   if (r->u.dst.dev->nd_net == st->p.net &&
+   r->rt_genid == st->genid)
return r;
r = rcu_dereference(r->u.dst.rt_next);
}
@@ -312,6 +314,8 @@ static struct rtable *rt_cache_get_next(struct rt_cache_iter_state *st,
struct rtable *r)
 {
while ((r = __rt_cache_get_next(st, r)) != NULL) {
+   if (r->u.dst.dev->nd_net != st->p.net)
+   continue;
if (r->rt_genid == st->genid)
break;
}
@@ -398,7 +402,7 @@ static const struct seq_operations rt_cache_seq_ops = {
 
 static int rt_cache_seq_open(struct inode *inode, struct file *file)
 {
-   return seq_open_private(file, &rt_cache_seq_ops,
+   return seq_open_net(inode, file, &rt_cache_seq_ops,
sizeof(struct rt_cache_iter_state));
 }
 
@@ -407,7 +411,7 @@ static const struct file_operations rt_cache_seq_fops = {
.open= rt_cache_seq_open,
.read= seq_read,
.llseek  = seq_lseek,
-   .release = seq_release_private,
+   .release = seq_release_net,
 };
 
 
-- 
1.5.3.rc5
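The filtering added to the /proc iterator can be sketched in user space. Take one hash chain of entries, each tagged with an owning namespace and a generation id, and walk it so that only matching entries are visible. The types and helper names are hypothetical. In the patch, the owner is `r->u.dst.dev->nd_net` and the namespace comes from `st->p.net`.

```c
#include <stddef.h>

/* Hypothetical, simplified types. */
struct net { int id; };

struct entry {
	const struct net *net;
	int genid;
	struct entry *next;
};

/* Return the first entry at or after e that belongs to `net` and carries
 * the current generation id; entries of other namespaces are skipped. */
static struct entry *next_match(struct entry *e, const struct net *net,
				int genid)
{
	for (; e; e = e->next)
		if (e->net == net && e->genid == genid)
			return e;
	return NULL;
}

/* Count the entries one namespace would see in its listing. */
size_t count_visible(struct entry *head, const struct net *net, int genid)
{
	size_t n = 0;

	for (struct entry *e = next_match(head, net, genid); e;
	     e = next_match(e->next, net, genid))
		n++;
	return n;
}
```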



[PATCH 13/17 net-2.6.26] [NETNS]: Register /proc/net/rt_cache for each namespace.

2008-02-18 Thread Denis V. Lunev
Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/ipv4/route.c |   24 +---
 1 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index c11e6bf..5f67eba 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -545,7 +545,7 @@ static int ip_rt_acct_read(char *buffer, char **start, off_t offset,
 }
 #endif
 
-static __init int ip_rt_proc_init(struct net *net)
+static int __net_init ip_rt_do_proc_init(struct net *net)
 {
struct proc_dir_entry *pde;
 
@@ -577,8 +577,26 @@ err2:
 err1:
return -ENOMEM;
 }
+
+static void __net_exit ip_rt_do_proc_exit(struct net *net)
+{
+   remove_proc_entry("rt_cache", net->proc_net_stat);
+   remove_proc_entry("rt_cache", net->proc_net);
+   remove_proc_entry("rt_acct", net->proc_net);
+}
+
+static struct pernet_operations ip_rt_proc_ops __net_initdata =  {
+   .init = ip_rt_do_proc_init,
+   .exit = ip_rt_do_proc_exit,
+};
+
+static int __init ip_rt_proc_init(void)
+{
+   return register_pernet_subsys(&ip_rt_proc_ops);
+}
+
 #else
-static inline int ip_rt_proc_init(struct net *net)
+static inline int ip_rt_proc_init(void)
 {
return 0;
 }
@@ -3056,7 +3074,7 @@ int __init ip_rt_init(void)
ip_rt_secret_interval;
add_timer(&rt_secret_timer);
 
-   if (ip_rt_proc_init(&init_net))
+   if (ip_rt_proc_init())
printk(KERN_ERR "Unable to create route proc files\n");
 #ifdef CONFIG_XFRM
xfrm_init();
-- 
1.5.3.rc5



[PATCH 6/17 net-2.6.26] [NETNS]: Default arp parameters lookup.

2008-02-18 Thread Denis V. Lunev
Default ARP parameters should be findable regardless of the context.
Required to make inetdev_event work.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/core/neighbour.c |4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index c895ad4..45ed620 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1275,9 +1275,7 @@ static inline struct neigh_parms *lookup_neigh_params(struct neigh_table *tbl,
struct neigh_parms *p;
 
for (p = &tbl->parms; p; p = p->next) {
-   if (p->net != net)
-   continue;
-   if ((p->dev && p->dev->ifindex == ifindex) ||
+   if ((p->dev && p->dev->ifindex == ifindex && p->net == net) ||
(!p->dev && !ifindex))
return p;
}
-- 
1.5.3.rc5
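The rewritten condition in lookup_neigh_params() can be checked in isolation with a small userspace model: a per-device entry must match both the ifindex and the namespace, while the table-wide default (no device, ifindex 0) is returned whichever namespace asks. np_net, np_dev, np_parms and np_lookup_parms() are hypothetical simplifications, not the kernel structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct net, struct net_device and
 * struct neigh_parms; only the fields the lookup reads are kept. */
struct np_net { int id; };
struct np_dev { int ifindex; };

struct np_parms {
	struct np_net   *net;
	struct np_dev   *dev;   /* NULL for the table-wide defaults */
	struct np_parms *next;
};

/* Same predicate as the patched lookup_neigh_params(). */
static struct np_parms *np_lookup_parms(struct np_parms *list,
					struct np_net *net, int ifindex)
{
	for (struct np_parms *p = list; p; p = p->next) {
		if ((p->dev && p->dev->ifindex == ifindex && p->net == net) ||
		    (!p->dev && !ifindex))
			return p;   /* defaults match regardless of namespace */
	}
	return NULL;
}
```

A loopback device with ifindex 1 can exist in every namespace, so the per-device match needs the namespace check, while the defaults deliberately skip it.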



[PATCH 9/17 net-2.6.26] [NETNS]: DST cleanup routines should be called inside namespace.

2008-02-18 Thread Denis V. Lunev
A device inside a namespace can be brought up and down, so the active
routing cache should be cleaned up when the device stops.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/core/dst.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/net/core/dst.c b/net/core/dst.c
index 7deef48..3a01a81 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -295,9 +295,6 @@ static int dst_dev_event(struct notifier_block *this, unsigned long event, void
struct net_device *dev = ptr;
struct dst_entry *dst, *last = NULL;
 
-   if (dev->nd_net != &init_net)
-   return NOTIFY_DONE;
-
switch (event) {
case NETDEV_UNREGISTER:
case NETDEV_DOWN:
-- 
1.5.3.rc5



[PATCH 11/17 net-2.6.26] [IPV4]: rt_cache_get_next should take rt_genid into account.

2008-02-18 Thread Denis V. Lunev
Otherwise /proc/net/rt_cache will look inconsistent with respect to
genid.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
Acked-by: Alexey Kuznetsov [EMAIL PROTECTED]
---
 net/ipv4/route.c |   18 +-
 1 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 44708ab..67df872 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -294,7 +294,8 @@ static struct rtable *rt_cache_get_first(struct rt_cache_iter_state *st)
return r;
 }
 
-static struct rtable *rt_cache_get_next(struct rt_cache_iter_state *st, struct rtable *r)
+static struct rtable *__rt_cache_get_next(struct rt_cache_iter_state *st,
+ struct rtable *r)
 {
r = r->u.dst.rt_next;
while (!r) {
@@ -307,16 +308,23 @@ static struct rtable *rt_cache_get_next(struct rt_cache_iter_state *st, struct r
return rcu_dereference(r);
 }
 
+static struct rtable *rt_cache_get_next(struct rt_cache_iter_state *st,
+   struct rtable *r)
+{
+   while ((r = __rt_cache_get_next(st, r)) != NULL) {
+   if (r->rt_genid == st->genid)
+   break;
+   }
+   return r;
+}
+
 static struct rtable *rt_cache_get_idx(struct rt_cache_iter_state *st, loff_t pos)
 {
struct rtable *r = rt_cache_get_first(st);
 
if (r)
-   while (pos && (r = rt_cache_get_next(st, r))) {
-   if (r->rt_genid != st->genid)
-   continue;
+   while (pos && (r = rt_cache_get_next(st, r)))
--pos;
-   }
return pos ? NULL : r;
 }
 
-- 
1.5.3.rc5
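The effect of moving the genid filter into rt_cache_get_next() can be seen on a toy list: stale entries are skipped inside the step function itself, so the position lookup only has to count. The rtc_* names below are hypothetical; the real iterator also walks hash buckets under RCU, which this sketch leaves out, and it assumes the first entry passed in already carries the current genid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a routing-cache entry: only the generation
 * id and the chain pointer matter for this sketch. */
struct rtc_entry {
	int genid;
	struct rtc_entry *next;
};

/* Unfiltered step, playing the role of __rt_cache_get_next(). */
static struct rtc_entry *rtc_raw_next(struct rtc_entry *r)
{
	return r->next;
}

/* Filtered step, playing the role of the new rt_cache_get_next():
 * keep advancing until an entry from the current generation shows up. */
static struct rtc_entry *rtc_get_next(struct rtc_entry *r, int genid)
{
	while ((r = rtc_raw_next(r)) != NULL) {
		if (r->genid == genid)
			break;
	}
	return r;
}

/* Position lookup, as in rt_cache_get_idx(): because stale entries are
 * already skipped by rtc_get_next(), it only has to count. */
static struct rtc_entry *rtc_get_idx(struct rtc_entry *first, int genid,
				     long pos)
{
	struct rtc_entry *r = first;

	if (r)
		while (pos && (r = rtc_get_next(r, genid)))
			--pos;
	return pos ? NULL : r;
}
```

With a chain a(gen 1) -> b(gen 0) -> c(gen 1) -> d(gen 1) and current generation 1, position 1 resolves to c rather than the stale b, and sequential stepping sees exactly the same entries.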



[PATCH 7/17 net-2.6.26] [NETNS]: Disable multicasting configuration inside non-initial namespace.

2008-02-18 Thread Denis V. Lunev
Do not call hooks from device notifiers, and disallow configuration from
the ioctl/netlink layer.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/ipv4/igmp.c |   39 +++
 1 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 732cd07..d3f34a7 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -1198,6 +1198,9 @@ void ip_mc_inc_group(struct in_device *in_dev, __be32 addr)
 
ASSERT_RTNL();
 
+   if (in_dev->dev->nd_net != &init_net)
+   return;
+
for (im=in_dev->mc_list; im; im=im->next) {
if (im->multiaddr == addr) {
im->users++;
@@ -1277,6 +1280,9 @@ void ip_mc_dec_group(struct in_device *in_dev, __be32 addr)
 
ASSERT_RTNL();
 
+   if (in_dev->dev->nd_net != &init_net)
+   return;
+
for (ip=&in_dev->mc_list; (i=*ip)!=NULL; ip=&i->next) {
if (i->multiaddr==addr) {
if (--i->users == 0) {
@@ -1304,6 +1310,9 @@ void ip_mc_down(struct in_device *in_dev)
 
ASSERT_RTNL();
 
+   if (in_dev->dev->nd_net != &init_net)
+   return;
+
for (i=in_dev->mc_list; i; i=i->next)
igmp_group_dropped(i);
 
@@ -1324,6 +1333,9 @@ void ip_mc_init_dev(struct in_device *in_dev)
 {
ASSERT_RTNL();
 
+   if (in_dev->dev->nd_net != &init_net)
+   return;
+
in_dev->mc_tomb = NULL;
 #ifdef CONFIG_IP_MULTICAST
in_dev->mr_gq_running = 0;
@@ -1347,6 +1359,9 @@ void ip_mc_up(struct in_device *in_dev)
 
ASSERT_RTNL();
 
+   if (in_dev->dev->nd_net != &init_net)
+   return;
+
ip_mc_inc_group(in_dev, IGMP_ALL_HOSTS);
 
for (i=in_dev->mc_list; i; i=i->next)
@@ -1363,6 +1378,9 @@ void ip_mc_destroy_dev(struct in_device *in_dev)
 
ASSERT_RTNL();
 
+   if (in_dev->dev->nd_net != &init_net)
+   return;
+
/* Deactivate timers */
ip_mc_down(in_dev);
 
@@ -1744,6 +1762,9 @@ int ip_mc_join_group(struct sock *sk , struct ip_mreqn *imr)
if (!ipv4_is_multicast(addr))
return -EINVAL;
 
+   if (sk->sk_net != &init_net)
+   return -EPROTONOSUPPORT;
+
rtnl_lock();
 
in_dev = ip_mc_find_dev(imr);
@@ -1812,6 +1833,9 @@ int ip_mc_leave_group(struct sock *sk, struct ip_mreqn *imr)
u32 ifindex;
int ret = -EADDRNOTAVAIL;
 
+   if (sk->sk_net != &init_net)
+   return -EPROTONOSUPPORT;
+
rtnl_lock();
in_dev = ip_mc_find_dev(imr);
ifindex = imr->imr_ifindex;
@@ -1857,6 +1881,9 @@ int ip_mc_source(int add, int omode, struct sock *sk, struct
if (!ipv4_is_multicast(addr))
return -EINVAL;
 
+   if (sk->sk_net != &init_net)
+   return -EPROTONOSUPPORT;
+
rtnl_lock();
 
imr.imr_multiaddr.s_addr = mreqs->imr_multiaddr;
@@ -1990,6 +2017,9 @@ int ip_mc_msfilter(struct sock *sk, struct ip_msfilter *msf, int ifindex)
msf->imsf_fmode != MCAST_EXCLUDE)
return -EINVAL;
 
+   if (sk->sk_net != &init_net)
+   return -EPROTONOSUPPORT;
+
rtnl_lock();
 
imr.imr_multiaddr.s_addr = msf->imsf_multiaddr;
@@ -2070,6 +2100,9 @@ int ip_mc_msfget(struct sock *sk, struct ip_msfilter *msf,
if (!ipv4_is_multicast(addr))
return -EINVAL;
 
+   if (sk->sk_net != &init_net)
+   return -EPROTONOSUPPORT;
+
rtnl_lock();
 
imr.imr_multiaddr.s_addr = msf->imsf_multiaddr;
@@ -2132,6 +2165,9 @@ int ip_mc_gsfget(struct sock *sk, struct group_filter *gsf,
if (!ipv4_is_multicast(addr))
return -EINVAL;
 
+   if (sk->sk_net != &init_net)
+   return -EPROTONOSUPPORT;
+
rtnl_lock();
 
err = -EADDRNOTAVAIL;
@@ -2216,6 +2252,9 @@ void ip_mc_drop_socket(struct sock *sk)
if (inet->mc_list == NULL)
return;
 
+   if (sk->sk_net != &init_net)
+   return;
+
rtnl_lock();
while ((iml = inet->mc_list) != NULL) {
struct in_device *in_dev;
-- 
1.5.3.rc5



[PATCH 14/17 net-2.6.26] [NETNS]: Process devinet ioctl in the correct namespace.

2008-02-18 Thread Denis V. Lunev
Add a namespace parameter to devinet_ioctl and look up the device inside
that namespace for state changes.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 include/linux/inetdevice.h |2 +-
 net/ipv4/af_inet.c |7 ---
 net/ipv4/devinet.c |6 +++---
 net/ipv4/ipconfig.c|2 +-
 4 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index fc4e3db..da05ab4 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -129,7 +129,7 @@ extern int unregister_inetaddr_notifier(struct notifier_block *nb);
 
 extern struct net_device *ip_dev_find(struct net *net, __be32 addr);
 extern int inet_addr_onlink(struct in_device *in_dev, __be32 a, __be32 b);
-extern int devinet_ioctl(unsigned int cmd, void __user *);
+extern int devinet_ioctl(struct net *net, unsigned int cmd, void __user *);
 extern void devinet_init(void);
 extern struct in_device *inetdev_by_index(struct net *, int);
 extern __be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 09ca529..c270080 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -784,6 +784,7 @@ int inet_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 {
struct sock *sk = sock->sk;
int err = 0;
+   struct net *net = sk->sk_net;
 
switch (cmd) {
case SIOCGSTAMP:
@@ -795,12 +796,12 @@ int inet_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
case SIOCADDRT:
case SIOCDELRT:
case SIOCRTMSG:
-   err = ip_rt_ioctl(sk->sk_net, cmd, (void __user *)arg);
+   err = ip_rt_ioctl(net, cmd, (void __user *)arg);
break;
case SIOCDARP:
case SIOCGARP:
case SIOCSARP:
-   err = arp_ioctl(sk->sk_net, cmd, (void __user *)arg);
+   err = arp_ioctl(net, cmd, (void __user *)arg);
break;
case SIOCGIFADDR:
case SIOCSIFADDR:
@@ -813,7 +814,7 @@ int inet_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
case SIOCSIFPFLAGS:
case SIOCGIFPFLAGS:
case SIOCSIFFLAGS:
-   err = devinet_ioctl(cmd, (void __user *)arg);
+   err = devinet_ioctl(net, cmd, (void __user *)arg);
break;
default:
if (sk->sk_prot->ioctl)
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 963e711..f7e78b7 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -595,7 +595,7 @@ static __inline__ int inet_abc_len(__be32 addr)
 }
 
 
-int devinet_ioctl(unsigned int cmd, void __user *arg)
+int devinet_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 {
struct ifreq ifr;
struct sockaddr_in sin_orig;
@@ -624,7 +624,7 @@ int devinet_ioctl(unsigned int cmd, void __user *arg)
*colon = 0;
 
 #ifdef CONFIG_KMOD
-   dev_load(&init_net, ifr.ifr_name);
+   dev_load(net, ifr.ifr_name);
 #endif
 
switch (cmd) {
@@ -665,7 +665,7 @@ int devinet_ioctl(unsigned int cmd, void __user *arg)
rtnl_lock();
 
ret = -ENODEV;
-   if ((dev = __dev_get_by_name(&init_net, ifr.ifr_name)) == NULL)
+   if ((dev = __dev_get_by_name(net, ifr.ifr_name)) == NULL)
goto done;
 
if (colon)
diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index a52b585..009d78f 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -291,7 +291,7 @@ static int __init ic_dev_ioctl(unsigned int cmd, struct ifreq *arg)
 
mm_segment_t oldfs = get_fs();
set_fs(get_ds());
-   res = devinet_ioctl(cmd, (struct ifreq __user *) arg);
+   res = devinet_ioctl(&init_net, cmd, (struct ifreq __user *) arg);
set_fs(oldfs);
return res;
 }
-- 
1.5.3.rc5



[PATCH 4/17 net-2.6.26] [NETNS]: Disable inetaddr notifiers in namespaces other than initial.

2008-02-18 Thread Denis V. Lunev
ip_fib_init is kept enabled. It is already namespace-aware.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 drivers/net/bonding/bond_main.c |3 +++
 drivers/net/via-velocity.c  |3 +++
 drivers/s390/net/qeth_main.c|3 +++
 net/sctp/protocol.c |3 +++
 4 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 0942d82..9666434 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3511,6 +3511,9 @@ static int bond_inetaddr_event(struct notifier_block *this, unsigned long event,
struct bonding *bond, *bond_next;
struct vlan_entry *vlan, *vlan_next;
 
+   if (ifa->ifa_dev->dev->nd_net != &init_net)
+   return NOTIFY_DONE;
+
list_for_each_entry_safe(bond, bond_next, &bond_dev_list, bond_list) {
if (bond->dev == event_dev) {
switch (event) {
diff --git a/drivers/net/via-velocity.c b/drivers/net/via-velocity.c
index c50fdee..1525e8a 100644
--- a/drivers/net/via-velocity.c
+++ b/drivers/net/via-velocity.c
@@ -3464,6 +3464,9 @@ static int velocity_netdev_event(struct notifier_block *nb, unsigned long notifi
struct velocity_info *vptr;
unsigned long flags;
 
+   if (dev->nd_net != &init_net)
+   return NOTIFY_DONE;
+
spin_lock_irqsave(&velocity_dev_list_lock, flags);
list_for_each_entry(vptr, &velocity_dev_list, list) {
if (vptr->dev == dev) {
diff --git a/drivers/s390/net/qeth_main.c b/drivers/s390/net/qeth_main.c
index 62606ce..d063e9e 100644
--- a/drivers/s390/net/qeth_main.c
+++ b/drivers/s390/net/qeth_main.c
@@ -8622,6 +8622,9 @@ qeth_ip_event(struct notifier_block *this,
struct qeth_ipaddr *addr;
struct qeth_card *card;
 
+   if (dev->nd_net != &init_net)
+   return NOTIFY_DONE;
+
QETH_DBF_TEXT(trace,3,"ipevent");
card = qeth_get_card_from_dev(dev);
if (!card)
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 22a1657..4475f7e 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -629,6 +629,9 @@ static int sctp_inetaddr_event(struct notifier_block *this, unsigned long ev,
struct sctp_sockaddr_entry *addr = NULL;
struct sctp_sockaddr_entry *temp;
 
+   if (ifa->ifa_dev->dev->nd_net != &init_net)
+   return NOTIFY_DONE;
+
switch (ev) {
case NETDEV_UP:
addr = kmalloc(sizeof(struct sctp_sockaddr_entry), GFP_ATOMIC);
-- 
1.5.3.rc5



[PATCH 8/17 net-2.6.26] [NETNS]: Enable inetdev_event notifier.

2008-02-18 Thread Denis V. Lunev
After all these preparations, it is time to enable the main IPv4 device
initialization routine inside a namespace. It is safe to do this now.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/ipv4/devinet.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index f282b26..963e711 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1044,9 +1044,6 @@ static int inetdev_event(struct notifier_block *this, unsigned long event,
struct net_device *dev = ptr;
struct in_device *in_dev = __in_dev_get_rtnl(dev);
 
-   if (dev->nd_net != &init_net)
-   return NOTIFY_DONE;
-
ASSERT_RTNL();
 
if (!in_dev) {
-- 
1.5.3.rc5



[PATCH 16/17 net-2.6.26] [NETNS]: Enable IPv4 address manipulations inside namespace.

2008-02-18 Thread Denis V. Lunev
Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/ipv4/devinet.c |9 -
 1 files changed, 0 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index f7e78b7..aa23d10 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -446,9 +446,6 @@ static int inet_rtm_deladdr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg
 
ASSERT_RTNL();
 
-   if (net != &init_net)
-   return -EINVAL;
-
err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFA_MAX, ifa_ipv4_policy);
if (err < 0)
goto errout;
@@ -560,9 +557,6 @@ static int inet_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg
 
ASSERT_RTNL();
 
-   if (net != &init_net)
-   return -EINVAL;
-
ifa = rtm_to_ifaddr(net, nlh);
if (IS_ERR(ifa))
return PTR_ERR(ifa);
@@ -1169,9 +1163,6 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
struct in_ifaddr *ifa;
int s_ip_idx, s_idx = cb->args[0];
 
-   if (net != &init_net)
-   return 0;
-
s_ip_idx = ip_idx = cb->args[1];
idx = 0;
for_each_netdev(net, dev) {
-- 
1.5.3.rc5



[PATCH 17/17 net-2.6.26] [NETNS]: Process inet_select_addr inside a namespace.

2008-02-18 Thread Denis V. Lunev
The context is available from a network device passed in.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/ipv4/devinet.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index aa23d10..033670d 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -871,6 +871,7 @@ __be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope)
 {
__be32 addr = 0;
struct in_device *in_dev;
+   struct net *net = dev->nd_net;
 
rcu_read_lock();
in_dev = __in_dev_get_rcu(dev);
@@ -899,7 +900,7 @@ no_in_dev:
 */
read_lock(&dev_base_lock);
rcu_read_lock();
-   for_each_netdev(&init_net, dev) {
+   for_each_netdev(net, dev) {
if ((in_dev = __in_dev_get_rcu(dev)) == NULL)
continue;
 
-- 
1.5.3.rc5



[PATCH 15/17 net-2.6.26] [NETNS]: Enable all routing manipulation via netlink inside namespace.

2008-02-18 Thread Denis V. Lunev
Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/ipv4/route.c |   16 
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 5f67eba..79e2e8a 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2702,9 +2702,6 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr* nlh, void
int err;
struct sk_buff *skb;
 
-   if (net != &init_net)
-   return -EINVAL;
-
err = nlmsg_parse(nlh, sizeof(*rtm), tb, RTA_MAX, rtm_ipv4_policy);
if (err < 0)
goto errout;
@@ -2734,7 +2731,7 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr* nlh, void
if (iif) {
struct net_device *dev;
 
-   dev = __dev_get_by_index(&init_net, iif);
+   dev = __dev_get_by_index(net, iif);
if (dev == NULL) {
err = -ENODEV;
goto errout_free;
@@ -2760,7 +2757,7 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr* nlh, void
},
.oif = tb[RTA_OIF] ? nla_get_u32(tb[RTA_OIF]) : 0,
};
-   err = ip_route_output_key(&init_net, &rt, &fl);
+   err = ip_route_output_key(net, &rt, &fl);
}
 
if (err)
@@ -2771,11 +2768,11 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr* nlh, void
rt->rt_flags |= RTCF_NOTIFY;
 
err = rt_fill_info(skb, NETLINK_CB(in_skb).pid, nlh->nlmsg_seq,
-   RTM_NEWROUTE, 0, 0);
+  RTM_NEWROUTE, 0, 0);
if (err <= 0)
goto errout_free;
 
-   err = rtnl_unicast(skb, &init_net, NETLINK_CB(in_skb).pid);
+   err = rtnl_unicast(skb, net, NETLINK_CB(in_skb).pid);
 errout:
return err;
 
@@ -2789,6 +2786,9 @@ int ip_rt_dump(struct sk_buff *skb,  struct netlink_callback *cb)
struct rtable *rt;
int h, s_h;
int idx, s_idx;
+   struct net *net;
+
+   net = skb->sk->sk_net;
 
s_h = cb->args[0];
if (s_h < 0)
@@ -2798,7 +2798,7 @@ int ip_rt_dump(struct sk_buff *skb,  struct netlink_callback *cb)
rcu_read_lock_bh();
for (rt = rcu_dereference(rt_hash_table[h].chain), idx = 0; rt;
 rt = rcu_dereference(rt->u.dst.rt_next), idx++) {
-   if (idx < s_idx)
+   if (rt->u.dst.dev->nd_net != net || idx < s_idx)
continue;
if (rt->rt_genid != atomic_read(&rt_genid))
continue;
-- 
1.5.3.rc5



[PATCH 5/17 net-2.6.26] [NETNS]: Register neighbour table parameters in the correct namespace.

2008-02-18 Thread Denis V. Lunev
neigh_sysctl_register should register sysctl entries inside the correct
namespace to avoid naming conflicts. A typical example is loopback: entries
for it are present in all namespaces.

This is required to make inetdev_event work.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/core/neighbour.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 7bb6a9a..c895ad4 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2732,7 +2732,8 @@ int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p,
neigh_path[NEIGH_CTL_PATH_PROTO].procname = p_name;
neigh_path[NEIGH_CTL_PATH_PROTO].ctl_name = p_id;
 
-   t->sysctl_header = register_sysctl_paths(neigh_path, t->neigh_vars);
+   t->sysctl_header =
+   register_net_sysctl_table(p->net, neigh_path, t->neigh_vars);
if (!t->sysctl_header)
goto free_procname;
 
-- 
1.5.3.rc5



[PATCH 3/17 net-2.6.26] [NETFILTER]: Consolidate masq_inet_event and masq_device_event.

2008-02-18 Thread Denis V. Lunev
They do exactly the same job.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/ipv4/netfilter/ipt_MASQUERADE.c |   14 ++
 1 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/netfilter/ipt_MASQUERADE.c b/net/ipv4/netfilter/ipt_MASQUERADE.c
index d80fee8..313b3fc 100644
--- a/net/ipv4/netfilter/ipt_MASQUERADE.c
+++ b/net/ipv4/netfilter/ipt_MASQUERADE.c
@@ -139,18 +139,8 @@ static int masq_inet_event(struct notifier_block *this,
   unsigned long event,
   void *ptr)
 {
-   const struct net_device *dev = ((struct in_ifaddr *)ptr)->ifa_dev->dev;
-
-   if (event == NETDEV_DOWN) {
-   /* IP address was deleted.  Search entire table for
-  conntracks which were associated with that device,
-  and forget them. */
-   NF_CT_ASSERT(dev->ifindex != 0);
-
-   nf_ct_iterate_cleanup(device_cmp, (void *)(long)dev->ifindex);
-   }
-
-   return NOTIFY_DONE;
+   struct net_device *dev = ((struct in_ifaddr *)ptr)->ifa_dev->dev;
+   return masq_device_event(this, event, dev);
 }
 
 static struct notifier_block masq_dev_notifier = {
-- 
1.5.3.rc5



[PATCH 2/17 net-2.6.26] [IPV4]: Remove check for ifa->ifa_dev != NULL.

2008-02-18 Thread Denis V. Lunev
This is a callback registered to the inet address notifier chain.
The check is useless because:
- ifa->ifa_dev is always != NULL
- similar checks are absent in all other notifiers.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/atm/clip.c |4 
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/net/atm/clip.c b/net/atm/clip.c
index 86b885e..dd96440 100644
--- a/net/atm/clip.c
+++ b/net/atm/clip.c
@@ -648,10 +648,6 @@ static int clip_inet_event(struct notifier_block *this, unsigned long event,
struct in_device *in_dev;
 
in_dev = ((struct in_ifaddr *)ifa)->ifa_dev;
-   if (!in_dev || !in_dev->dev) {
-   printk(KERN_WARNING "clip_inet_event: no device\n");
-   return NOTIFY_DONE;
-   }
/*
 * Transitions are of the down-change-up type, so it's sufficient to
 * handle the change on up.
-- 
1.5.3.rc5



[PATCH 1/17 net-2.6.26] [IPV4]: Remove ifa != NULL check.

2008-02-18 Thread Denis V. Lunev
This is a callback registered to the inet address notifier chain.
The check is useless because:
- ifa is always != NULL
- similar checks are absent in all other notifiers.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 drivers/net/via-velocity.c |   22 ++
 1 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/net/via-velocity.c b/drivers/net/via-velocity.c
index cc0addb..c50fdee 100644
--- a/drivers/net/via-velocity.c
+++ b/drivers/net/via-velocity.c
@@ -3460,21 +3460,19 @@ static int velocity_resume(struct pci_dev *pdev)
 static int velocity_netdev_event(struct notifier_block *nb, unsigned long notification, void *ptr)
 {
struct in_ifaddr *ifa = (struct in_ifaddr *) ptr;
+   struct net_device *dev = ifa->ifa_dev->dev;
+   struct velocity_info *vptr;
+   unsigned long flags;
 
-   if (ifa) {
-   struct net_device *dev = ifa->ifa_dev->dev;
-   struct velocity_info *vptr;
-   unsigned long flags;
-
-   spin_lock_irqsave(&velocity_dev_list_lock, flags);
-   list_for_each_entry(vptr, &velocity_dev_list, list) {
-   if (vptr->dev == dev) {
-   velocity_get_ip(vptr);
-   break;
-   }
+   spin_lock_irqsave(&velocity_dev_list_lock, flags);
+   list_for_each_entry(vptr, &velocity_dev_list, list) {
+   if (vptr->dev == dev) {
+   velocity_get_ip(vptr);
+   break;
}
-   spin_unlock_irqrestore(&velocity_dev_list_lock, flags);
}
+   spin_unlock_irqrestore(&velocity_dev_list_lock, flags);
+
return NOTIFY_DONE;
 }
 
-- 
1.5.3.rc5



Re: tbench regression in 2.6.25-rc1

2008-02-18 Thread Eric Dumazet

Zhang, Yanmin wrote:
On Mon, 2008-02-18 at 12:33 -0500, [EMAIL PROTECTED] wrote: 

On Mon, 18 Feb 2008 16:12:38 +0800, Zhang, Yanmin said:


I also think __refcnt is the key. I did a new test by adding 2 unsigned longs of
padding before lastuse, so the 3 members are moved to the next cache line. The
performance is recovered.

How about the patch below? Almost all performance is recovered with the new patch.

Signed-off-by: Zhang Yanmin [EMAIL PROTECTED]

Could you add a comment someplace that says refcnt wants to be on a different
cache line from input/output/ops or performance tanks badly, to warn some
future kernel hacker who starts adding new fields to the structure?

Ok. Below is the new patch.

1) Move tclassid under ops in case CONFIG_NET_CLS_ROUTE=y, so sizeof(dst_entry)=200
no matter whether CONFIG_NET_CLS_ROUTE=y or n. I tested many patches on my 16-core
Tigerton by moving tclassid to different places. It looks like tclassid could also
have an impact on performance. If I move tclassid before metrics, or just don't
move tclassid, the performance isn't good. So I move it behind metrics.

2) Add comments before __refcnt.

If CONFIG_NET_CLS_ROUTE=y, the result with the patch below is about 18% better than
the one without the patch.

If CONFIG_NET_CLS_ROUTE=n, the result with the patch below is about 30% better than
the one without the patch.

Signed-off-by: Zhang Yanmin [EMAIL PROTECTED]

---

--- linux-2.6.25-rc1/include/net/dst.h  2008-02-21 14:33:43.0 +0800
+++ linux-2.6.25-rc1_work/include/net/dst.h 2008-02-22 12:52:19.0 +0800
@@ -52,15 +52,10 @@ struct dst_entry
unsigned short  header_len; /* more space at head required */
unsigned short  trailer_len; /* space to reserve at tail */
 
-	u32			metrics[RTAX_MAX];
-   struct dst_entry *path;
-
-   unsigned long   rate_last;  /* rate limiting for ICMP */
unsigned int    rate_tokens;
+   unsigned long   rate_last;  /* rate limiting for ICMP */
 
-#ifdef CONFIG_NET_CLS_ROUTE
-   __u32   tclassid;
-#endif
+   struct dst_entry *path;
 
 	struct neighbour	*neighbour;
struct hh_cache *hh;
@@ -70,10 +65,20 @@ struct dst_entry
int (*output)(struct sk_buff*);
 
 	struct  dst_ops	*ops;
-
-   unsigned long   lastuse;
+
+   u32 metrics[RTAX_MAX];
+
+#ifdef CONFIG_NET_CLS_ROUTE
+   __u32   tclassid;
+#endif
+
+   /*
+    * __refcnt wants to be on a different cache line from
+    * input/output/ops or performance tanks badly
+    */
atomic_t        __refcnt;   /* client references */
int __use;
+   unsigned long   lastuse;
union {
struct dst_entry *next;
struct rtable    *rt_next;





I prefer this patch, but unfortunately your perf numbers are for 64-bit kernels.

Could you please test now with a 32-bit one?

Thank you
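The constraint recorded by the new comment (keep __refcnt off the cache line that holds input/output/ops) could also be checked at build time. The sketch below is a C11 illustration on a hypothetical, much smaller structure; it assumes 64-byte cache lines and a cache-line-aligned object, and TOY_METRICS is an arbitrary stand-in for RTAX_MAX. It is not what the patch does, which relies on the comment alone.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CACHE_LINE 64   /* assumption: 64-byte cache lines */
#define TOY_METRICS    14   /* arbitrary stand-in for RTAX_MAX */

/* Hypothetical, simplified stand-in for struct dst_entry: read-mostly
 * fields first, then the metrics array, then the write-hot counters. */
struct toy_dst {
	int  (*input)(void *skb);
	int  (*output)(void *skb);
	const void *ops;
	unsigned int metrics[TOY_METRICS];
	/* refcnt wants to be on a different cache line from
	 * input/output/ops or performance tanks badly */
	int refcnt;
	int use;
	unsigned long lastuse;
};

/* Line index of a member, assuming the object is cache-line aligned. */
#define TOY_LINE_OF(member) \
	(offsetof(struct toy_dst, member) / TOY_CACHE_LINE)

/* Compilation fails if the layout puts refcnt next to the read-mostly
 * pointers again. */
_Static_assert(TOY_LINE_OF(refcnt) != TOY_LINE_OF(input) &&
	       TOY_LINE_OF(refcnt) != TOY_LINE_OF(output) &&
	       TOY_LINE_OF(refcnt) != TOY_LINE_OF(ops),
	       "refcnt shares a cache line with input/output/ops");
```

Here the metrics array is what pushes refcnt onto the next line on both 32-bit and 64-bit builds; shrinking that array, for example, would trip the assertion at compile time instead of surfacing later as a tbench regression.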


Re: tbench regression in 2.6.25-rc1

2008-02-18 Thread Eric Dumazet

Zhang, Yanmin wrote:

On Mon, 2008-02-18 at 11:11 +0100, Eric Dumazet wrote:

On Mon, 18 Feb 2008 16:12:38 +0800
Zhang, Yanmin [EMAIL PROTECTED] wrote:


On Fri, 2008-02-15 at 15:22 -0800, David Miller wrote:

From: Eric Dumazet [EMAIL PROTECTED]
Date: Fri, 15 Feb 2008 15:21:48 +0100


On linux-2.6.25-rc1 x86_64 :

offsetof(struct dst_entry, lastuse)=0xb0
offsetof(struct dst_entry, __refcnt)=0xb8
offsetof(struct dst_entry, __use)=0xbc
offsetof(struct dst_entry, next)=0xc0

So it should be optimal... I don't know why tbench prefers __refcnt being 
on 0xc0, since in this case lastuse will be on a different cache line...


Each incoming IP packet will need to change lastuse, __refcnt and __use, 
so keeping them in the same cache line is a win.


I suspect then that even this patch could help tbench, since it avoids 
writing lastuse...

I think your suspicions are right, and even more so
it helps to keep __refcnt out of the same cache line
as input/output/ops which are read-almost-entirely :-)

I think you are right. The issue is these three variables sharing the same cache line
with input/output/ops.


I haven't done an exhaustive analysis, but it seems that
the write traffic to lastuse and __refcnt are about the
same.  However if we find that __refcnt gets hit more
than lastuse in this workload, it explains the regression.

I also think __refcnt is the key. I did a new test by adding 2 unsigned longs of
padding before lastuse, so the 3 members are moved to the next cache line. The
performance is recovered.

How about the patch below? Almost all performance is recovered with the new patch.

Signed-off-by: Zhang Yanmin [EMAIL PROTECTED]

---

--- linux-2.6.25-rc1/include/net/dst.h  2008-02-21 14:33:43.0 +0800
+++ linux-2.6.25-rc1_work/include/net/dst.h 2008-02-21 14:36:22.0 +0800
@@ -52,11 +52,10 @@ struct dst_entry
unsigned short  header_len; /* more space at head required */
unsigned short  trailer_len; /* space to reserve at tail */
 
-	u32			metrics[RTAX_MAX];
-   struct dst_entry *path;
-
-   unsigned long   rate_last;  /* rate limiting for ICMP */
unsigned int    rate_tokens;
+   unsigned long   rate_last;  /* rate limiting for ICMP */
+
+   struct dst_entry *path;
 
 #ifdef CONFIG_NET_CLS_ROUTE
__u32   tclassid;
@@ -70,10 +69,12 @@ struct dst_entry
int (*output)(struct sk_buff*);
 
 	struct  dst_ops	*ops;
-
-   unsigned long   lastuse;
+
+   u32 metrics[RTAX_MAX];
+
atomic_t        __refcnt;   /* client references */
int __use;
+   unsigned long   lastuse;
union {
struct dst_entry *next;
struct rtable    *rt_next;



Well, after this patch, we grow dst_entry by 8 bytes :

With my .config, it doesn't grow, perhaps because I don't enable
CONFIG_NET_CLS_ROUTE. I will move tclassid under ops.


sizeof(struct dst_entry)=0xd0
offsetof(struct dst_entry, input)=0x68
offsetof(struct dst_entry, output)=0x70
offsetof(struct dst_entry, __refcnt)=0xb4
offsetof(struct dst_entry, lastuse)=0xc0
offsetof(struct dst_entry, __use)=0xb8
sizeof(struct rtable)=0x140
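To make the two-cache-line argument concrete: assuming 64-byte lines and a dst_entry that starts on a line boundary (neither is stated explicitly in the thread), each offset above maps to line offset / 64. The hypothetical helpers below count how many distinct lines a set of field offsets falls on, here the per-packet writes to __refcnt, __use and lastuse.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the cache line holding a given byte offset, assuming the
 * object starts on a cache-line boundary. */
static size_t cache_line_of(size_t offset, size_t line_size)
{
	return offset / line_size;
}

/* Number of distinct cache lines covered by a set of field offsets. */
static int lines_touched(const size_t *offs, int n, size_t line_size)
{
	int count = 0;

	for (int i = 0; i < n; i++) {
		int seen = 0;

		for (int j = 0; j < i; j++)
			if (cache_line_of(offs[j], line_size) ==
			    cache_line_of(offs[i], line_size))
				seen = 1;
		if (!seen)
			count++;
	}
	return count;
}
```

With the offsets quoted above, __refcnt (0xb4) and __use (0xb8) land on line 2 and lastuse (0xc0) on line 3, so two lines are dirtied; with 128-byte lines all three fall on line 1, matching the caveat about 128-byte cache lines.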


So we dirty two cache lines instead of one, unless your CPU has 128-byte
cache lines?

I am quite surprised that my patch to not change lastuse if it is already set to
jiffies changes nothing...

If you have some time, could you also test this (unrelated) patch?

We can avoid dirtying a cache line of the loopback device all the time.

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index f2a6e71..0a4186a 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -150,7 +150,10 @@ static int loopback_xmit(struct sk_buff *skb, struct net_device *dev)
return 0;
}
 #endif
-   dev->last_rx = jiffies;
+#ifdef CONFIG_SMP
+   if (dev->last_rx != jiffies)
+#endif
+   dev->last_rx = jiffies;
 
/* it's OK to use per_cpu_ptr() because BHs are off */

pcpu_lstats = netdev_priv(dev);
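The loopback change is an instance of a general check-before-store idiom: when the value is unchanged, a load can be served from a shared copy of the cache line, whereas an unconditional store forces the writing CPU to take the line exclusively on every packet. A hypothetical userspace sketch, with a store counter added purely to make the effect visible (lb_dev and lb_update_last_rx() are not kernel names):

```c
#include <assert.h>

/* Hypothetical stand-in for the net_device field being updated. */
struct lb_dev {
	unsigned long last_rx;
	unsigned long writes;   /* number of stores to last_rx */
};

/* Store only when the value actually changes, as the SMP branch of the
 * patch does; repeated calls within one jiffy just read the field. */
static void lb_update_last_rx(struct lb_dev *dev, unsigned long now)
{
	if (dev->last_rx != now) {
		dev->last_rx = now;
		dev->writes++;
	}
}
```

Two thousand updates spread over two jiffies cause only two stores. The patch keeps the unconditional store on !CONFIG_SMP builds, where there is no other CPU to bounce the line to.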


Although I didn't test it, I don't think it's OK. The key is that __refcnt shares the same
cache line with ops/input/output.



Note it was unrelated to struct dst; it was about dirtying one cache line of the
'loopback netdevice'.


I tested it, and the tbench result was better with this patch: 890 MB/s instead
of 870 MB/s on a machine with two dual-core CPUs.



I was curious about the potential gain on your 16-core (4x4) machine.


[PATCH 0/17] Finish IPv4 infrastructure namespacing.

2008-02-18 Thread Denis V. Lunev
This set finally allows manipulating network devices inside a namespace
and configuring them [via netlink]. 'route' is not yet supported (though
it is prepared for) as it requires a socket.

Additionally, better routing cache support is added.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
