Re: [PATCH 2.6.24 1/1] sch_htb: fix too many events situation
> > up to single jiffy interval and then delay remainder to other jiffy.
> > Signed-off-by: Martin Devera [EMAIL PROTECTED]
>
> I think we would be wise to use something other than loops_per_jiffy.
> Depending upon the loop calibration method used by a particular
> architecture it can be one of many different things. Some platforms
> don't even make use of it and thus leave it at its

Aha, ok, I'm not so well informed about cross-platform issues. I was also thinking about looking at the jiffies value and stopping once it reaches startjiffy+2, but with the introduction of NO_HZ, are jiffies still incremented?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: tbench regression in 2.6.25-rc1
On Fri, 2008-02-15 at 15:22 -0800, David Miller wrote:
> From: Eric Dumazet [EMAIL PROTECTED]
> Date: Fri, 15 Feb 2008 15:21:48 +0100
>
> > On linux-2.6.25-rc1 x86_64 :
> >
> > offsetof(struct dst_entry, lastuse)=0xb0
> > offsetof(struct dst_entry, __refcnt)=0xb8
> > offsetof(struct dst_entry, __use)=0xbc
> > offsetof(struct dst_entry, next)=0xc0
> >
> > So it should be optimal... I don't know why tbench prefers __refcnt
> > being on 0xc0, since in this case lastuse will be on a different
> > cache line...
> >
> > Each incoming IP packet will need to change lastuse, __refcnt and
> > __use, so keeping them in the same cache line is a win.
> >
> > I suspect then that even this patch could help tbench, since it
> > avoids writing lastuse...
>
> I think your suspicions are right, and even more so it helps to keep
> __refcnt out of the same cache line as input/output/ops which are
> read-almost-entirely :-)

I think you are right. The issue is these three variables sharing the same cache line with input/output/ops.

> I haven't done an exhaustive analysis, but it seems that the write
> traffic to lastuse and __refcnt are about the same. However if we
> find that __refcnt gets hit more than lastuse in this workload, it
> explains the regression.

I also think __refcnt is the key. I did a new test by adding 2 unsigned long of padding before lastuse, so the 3 members are moved to the next cache line. The performance is recovered.

How about the patch below? Almost all performance is recovered with the new patch.
Signed-off-by: Zhang Yanmin [EMAIL PROTECTED]

---

--- linux-2.6.25-rc1/include/net/dst.h 2008-02-21 14:33:43.0 +0800
+++ linux-2.6.25-rc1_work/include/net/dst.h 2008-02-21 14:36:22.0 +0800
@@ -52,11 +52,10 @@ struct dst_entry
         unsigned short          header_len;     /* more space at head required */
         unsigned short          trailer_len;    /* space to reserve at tail */
 
-        u32                     metrics[RTAX_MAX];
-        struct dst_entry        *path;
-
-        unsigned long           rate_last;      /* rate limiting for ICMP */
         unsigned int            rate_tokens;
+        unsigned long           rate_last;      /* rate limiting for ICMP */
+
+        struct dst_entry        *path;
 
 #ifdef CONFIG_NET_CLS_ROUTE
         __u32                   tclassid;
@@ -70,10 +69,12 @@ struct dst_entry
         int                     (*output)(struct sk_buff*);
 
         struct dst_ops          *ops;
-
-        unsigned long           lastuse;
+
+        u32                     metrics[RTAX_MAX];
+
         atomic_t                __refcnt;       /* client references */
         int                     __use;
+        unsigned long           lastuse;
         union {
                 struct dst_entry *next;
                 struct rtable    *rt_next;
Re: [PATCH 2.6.24 1/1] sch_htb: fix too many events situation
From: Martin Devera [EMAIL PROTECTED]
Date: Mon, 18 Feb 2008 09:03:52 +0100

> aha, ok, I'm not so informed about crossplatform issues. I was also
> thinking about looking at jiffies value and stop once it is
> startjiffy+2, but with NO_HZ introduction, are jiffies still
> incremented ?

There should always be at least one cpu tasked with incrementing jiffies. Lots of stuff would break if not :-)
[PATCH resend] virtio_net: Fix oops on early interrupts - introduced by virtio reset code
On Monday, 11 February 2008, Anthony Liguori wrote:
> The reset support is in Linus's tree so we should try to push it for
> -rc2.

You are right. My repository was borked. Will push it to Jeff Garzik. Thanks.

Jeff, can you schedule this fix into your network driver updates? Thanks.

---

With the latest virtio_reset patches I got the following oops:

Unable to handle kernel pointer dereference at virtual kernel address
Oops: 0004 [#1]
PREEMPT SMP
Modules linked in:
CPU: 1 Not tainted 2.6.24zlive-guest-10577-g63f5307-dirty #168
Process swapper (pid: 0, task: 0f866040, ksp: 0f86fd78)
Krnl PSW : 040410018000 0047598a (skb_recv_done+0x52/0x98)
           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 EA:3
Krnl GPRS: 0001 0efd0e60 0001 0f866040 008de4c8 1237 1237 0f977dd8 0020 001132bc 0f977e08 0f977dd8
Krnl Code: 0047597c: e3104034 lg %r1,48(%r4)
           00475982: b9040001 lgr %r0,%r1
           00475986: b9810003 ogr %r0,%r3
           0047598a: eb1040300030 csg %r1,%r0,48(%r4)
           00475990: a744fff9 brc 4,475982
           00475994: a7110001 tmll %r1,1
           00475998: a7840009 brc 8,4759aa
           0047599c: e340b0b80004 lg %r4,184(%r11)
Call Trace:
([01500f978000] 0x1500f978000)
 [004779a6] vring_interrupt+0x72/0x88
 [00491d9c] kvm_extint_handler+0x34/0x44
 [0010d2d4] do_extint+0xc0/0xfc
 [00113b5a] ext_no_vtime+0x1c/0x20
 [0010a0b6] cpu_idle+0x21a/0x230
([0010a096] cpu_idle+0x1fa/0x230)
 [0057dfe4] start_secondary+0xa0/0xb4

We must initialize vdev->priv before we use the notify hypercall, as vdev->priv is used in skb_recv_done. So let's move the assignment of vdev->priv before we call try_fill_recv.
Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
Acked-by: Anthony Liguori [EMAIL PROTECTED]
---
 drivers/net/virtio_net.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: kvm/drivers/net/virtio_net.c
===================================================================
--- kvm.orig/drivers/net/virtio_net.c
+++ kvm/drivers/net/virtio_net.c
@@ -361,6 +361,7 @@ static int virtnet_probe(struct virtio_d
         netif_napi_add(dev, &vi->napi, virtnet_poll, napi_weight);
         vi->dev = dev;
         vi->vdev = vdev;
+        vdev->priv = vi;
 
         /* We expect two virtqueues, receive then send. */
         vi->rvq = vdev->config->find_vq(vdev, 0, skb_recv_done);
@@ -395,7 +396,6 @@ static int virtnet_probe(struct virtio_d
         }
 
         pr_debug("virtnet: registered device %s\n", dev->name);
-        vdev->priv = vi;
         return 0;
 
 unregister:
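The ordering problem described above can be reproduced in a small userspace sketch. Everything here (the demo_* names and structures) is hypothetical and only mimics the shape of the driver: a callback that reads a private pointer, and a probe routine that may or may not set that pointer before the callback can run. The NULL check in the callback exists only so the sketch does not crash; the oops above suggests the real handler has no such guard.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the virtio structures; not the kernel's types. */
struct demo_vdev {
        void *priv;                             /* driver-private pointer */
        void (*recv_done)(struct demo_vdev *);  /* callback run on "interrupt" */
};

struct demo_info {
        int rx_events;                          /* events seen with valid priv */
};

/* Like skb_recv_done in the report, this reads vdev->priv right away. */
static void demo_recv_done(struct demo_vdev *vdev)
{
        struct demo_info *vi = vdev->priv;

        if (vi)                 /* guard added for the sketch only */
                vi->rx_events++;
}

/* Registers the callback and simulates an interrupt firing immediately. */
static void demo_register_and_fire(struct demo_vdev *vdev)
{
        vdev->recv_done = demo_recv_done;
        vdev->recv_done(vdev);
}

/*
 * Returns 1 if the early event found a valid priv pointer, 0 otherwise.
 * set_priv_first selects the patched ordering (1) or the old one (0).
 */
static int demo_probe(struct demo_vdev *vdev, struct demo_info *vi,
                      int set_priv_first)
{
        vi->rx_events = 0;
        vdev->priv = NULL;

        if (set_priv_first)
                vdev->priv = vi;        /* patched: assign before callbacks */

        demo_register_and_fire(vdev);

        if (!set_priv_first)
                vdev->priv = vi;        /* old: assigned too late */

        return vi->rx_events == 1;
}
```

Calling demo_probe() with set_priv_first = 1 lets the early event see the private data; with 0 the event arrives while priv is still NULL, which is the window the patch closes.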
e1000: Question about polling
Hello all.

An interesting thing: I have a PC that does NAT, with about 600 mbs of bandwidth. It has 4 CPUs (2x Core 2 Duo, HT off, 3.2 GHz). In-kernel irqbalance is off.

nat2 ~ # cat /proc/irq/217/smp_affinity
0001
nat2 ~ # cat /proc/irq/218/smp_affinity
0003

SI load on CPU0 and CPU1 is about 90%. Good... Then I try:

echo /proc/irq/217/smp_affinity
echo /proc/irq/218/smp_affinity

and get 100% SI on CPU0. Question: why?

I have heard that binding the IRQ of one net device to one CPU can give 30% more performance... but I have 4 CPUs, so I should get more performance if I change smp_affinity. The picture looks like this: CPUs 0-3 get over 50% SI; bandwidth goes up: 55% SI... bandwidth goes up: ... 100% SI on CPU0.

I remember a patch that fixed a problem like this; it patched the function e1000_clean. The kernel on this PC has that patch (2.6.24-rc7-git2), and the e1000 driver works much better (I get 1.5-2x the bandwidth before hitting 100% SI), but I think it still does not deliver 100% of what it can =)

Thanks for any answers, and sorry for my English.
Re: [PATCH] drivers/base: export gpl (un)register_memory_notifier
switching to proper mail client...

Dave Hansen [EMAIL PROTECTED] wrote on 15.02.2008 17:55:38:

> I've been thinking about that, and I don't think you really *need* to
> keep a comprehensive map like that.
>
> When the memory is in a particular configuration (range of memory
> present along with unique set of holes) you get a unique ehea_bmap
> configuration. That layout is completely predictable.
>
> So, if at any time you want to figure out what the ehea_bmap address
> for a particular *Linux* virtual address is, you just need to pretend
> that you're creating the entire ehea_bmap, use the same algorithm and
> figure out how you would have placed things, and use that result.
>
> Now, that's going to be a slow, crappy linear search (but maybe not as
> slow as recreating the silly thing). So, you might eventually run
> into some scalability problems with a lot of packets going around.
> But, I'd be curious if you do in practice.

Up to 14 address translations per packet (sg_list) might be required on the transmit side. On the receive side it is only 1. Most packets require only very few translations (1 or sometimes more). However, with more than 700.000 packets per second this approach does not seem reasonable from a performance perspective when memory is fragmented as you described.

> The other idea is that you create a mapping that is precisely 1:1 with
> kernel memory. Let's say you have two sections present, 0 and 100.
> You have a high_section_index of 100, and you vmalloc() a 100 entry
> array.
>
> You need to create a *CONTIGUOUS* ehea map? Create one like this:
>
> EHEA_VADDR->Linux Section
> 0->0
> 1->0
> 2->0
> 3->0
> ...
> 100->100
>
> It's contiguous. Each area points to a valid Linux memory address.
> It's also discernable in O(1) to what EHEA address a given Linux
> address is mapped. You just have a couple of duplicate entries.
This has a serious issue with the constraint I mentioned in the previous mail:

- MRs can have a maximum size of the memory available under linux

The requirement that the memory region must not be larger than the available memory for that partition is not met. The create MR H_CALL will fail (we tried this and discussed it with FW development).

Regards,
Jan-Bernd Christoph
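For readers following the thread, the 1:1 table from the quoted proposal can be sketched as below. This is only an illustration of the lookup idea, not ehea driver code: the demo_* names are hypothetical, sections are reduced to plain indices, and the sketch ignores the MR size constraint that, according to the reply above, makes the approach unusable in practice.

```c
#include <stddef.h>

#define DEMO_NO_SECTION (-1)

/*
 * Fill map[0..high] so that map[i] holds the highest present section
 * index that is <= i. present[i] is nonzero when section i exists.
 * Indices below the first present section get DEMO_NO_SECTION.
 */
static void demo_build_map(const int *present, int *map, size_t high)
{
        int last = DEMO_NO_SECTION;
        size_t i;

        for (i = 0; i <= high; i++) {
                if (present[i])
                        last = (int)i;
                map[i] = last;          /* holes repeat the previous entry */
        }
}

/* O(1) lookup: one array access, no search over the memory layout. */
static int demo_lookup(const int *map, size_t idx)
{
        return map[idx];
}
```

With sections 0 and 100 present, indices 0 through 99 all resolve to section 0 and index 100 resolves to section 100, matching the table in the quoted mail. Note that covering indices 0 through 100 takes 101 array slots.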
Re: [PATCH 2.6.24 1/1] sch_htb: fix too many events situation
David Miller wrote:
> From: Martin Devera [EMAIL PROTECTED]
> Date: Mon, 18 Feb 2008 09:03:52 +0100
>
>> aha, ok, I'm not so informed about crossplatform issues. I was also
>> thinking about looking at jiffies value and stop once it is
>> startjiffy+2, but with NO_HZ introduction, are jiffies still
>> incremented ?
>
> There should always be at least one cpu tasked with incrementing
> jiffies. Lots of stuff would break if not :-)

Aha ok, so when (at least one) cpu is busy I can count on jiffies being incremented via do_timer, can't I? Then I'd remove the loop limit altogether and instead limit the loop to 1 or 2 jiffies to prevent livelock. Like:

    max_jiff = jiffies+2; /* not +1 as we could be at +0. now */
    while (jiffies < max_jiff)
        do_hard_potentionaly_long_work();
    if (more_work)
        schedule_to_next_jiffie();

This will keep the event queue work load under 66% of system load, which seems reasonable to me. Would you accept such a solution?
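The time-bounded loop proposed above can be tried out in userspace with a simulated tick counter. This is a minimal sketch, not the sch_htb code: demo_jiffies, struct demo_queue and the helper functions are all made up for illustration, and the loop also stops when no work is left, which the pseudocode above leaves implicit. Real kernel code would normally compare jiffies with the time_before()/time_after() helpers so that counter wraparound is handled.

```c
/* Simulated tick counter; in the kernel this role is played by jiffies. */
static unsigned long demo_jiffies;

/* Hypothetical event queue used only by this sketch. */
struct demo_queue {
        int pending;            /* events still waiting */
        int calls_per_tick;     /* how many work items fit into one tick */
        int calls;              /* work items processed so far */
};

/* Processes one event; every calls_per_tick events the clock advances. */
static void demo_do_work(struct demo_queue *q)
{
        q->pending--;
        if (++q->calls % q->calls_per_tick == 0)
                demo_jiffies++;
}

/*
 * Works on the queue for at most two ticks, then gives up.
 * Returns the number of events left for the next invocation.
 */
static int demo_run_events(struct demo_queue *q)
{
        unsigned long max_jiff = demo_jiffies + 2;

        while (q->pending > 0 && demo_jiffies < max_jiff)
                demo_do_work(q);

        return q->pending;      /* caller would reschedule if this is > 0 */
}
```

A caller that gets a nonzero return value would defer the remainder to the next tick, which corresponds to the schedule_to_next_jiffie() step in the mail.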
Re: tbench regression in 2.6.25-rc1
On Mon, 18 Feb 2008 16:12:38 +0800 Zhang, Yanmin [EMAIL PROTECTED] wrote:

> On Fri, 2008-02-15 at 15:22 -0800, David Miller wrote:
> > From: Eric Dumazet [EMAIL PROTECTED]
> > Date: Fri, 15 Feb 2008 15:21:48 +0100
> >
> > > On linux-2.6.25-rc1 x86_64 :
> > >
> > > offsetof(struct dst_entry, lastuse)=0xb0
> > > offsetof(struct dst_entry, __refcnt)=0xb8
> > > offsetof(struct dst_entry, __use)=0xbc
> > > offsetof(struct dst_entry, next)=0xc0
> > >
> > > So it should be optimal... I don't know why tbench prefers __refcnt
> > > being on 0xc0, since in this case lastuse will be on a different
> > > cache line...
> > >
> > > Each incoming IP packet will need to change lastuse, __refcnt and
> > > __use, so keeping them in the same cache line is a win.
> > >
> > > I suspect then that even this patch could help tbench, since it
> > > avoids writing lastuse...
> >
> > I think your suspicions are right, and even more so it helps to keep
> > __refcnt out of the same cache line as input/output/ops which are
> > read-almost-entirely :-)
>
> I think you are right. The issue is these three variables sharing the
> same cache line with input/output/ops.
>
> > I haven't done an exhaustive analysis, but it seems that the write
> > traffic to lastuse and __refcnt are about the same. However if we
> > find that __refcnt gets hit more than lastuse in this workload, it
> > explains the regression.
>
> I also think __refcnt is the key. I did a new test by adding 2
> unsigned long of padding before lastuse, so the 3 members are moved to
> the next cache line. The performance is recovered.
>
> How about the patch below? Almost all performance is recovered with
> the new patch.
> Signed-off-by: Zhang Yanmin [EMAIL PROTECTED]
>
> ---
>
> --- linux-2.6.25-rc1/include/net/dst.h 2008-02-21 14:33:43.0 +0800
> +++ linux-2.6.25-rc1_work/include/net/dst.h 2008-02-21 14:36:22.0 +0800
> @@ -52,11 +52,10 @@ struct dst_entry
>          unsigned short          header_len;     /* more space at head required */
>          unsigned short          trailer_len;    /* space to reserve at tail */
>
> -        u32                     metrics[RTAX_MAX];
> -        struct dst_entry        *path;
> -
> -        unsigned long           rate_last;      /* rate limiting for ICMP */
>          unsigned int            rate_tokens;
> +        unsigned long           rate_last;      /* rate limiting for ICMP */
> +
> +        struct dst_entry        *path;
>
>  #ifdef CONFIG_NET_CLS_ROUTE
>          __u32                   tclassid;
> @@ -70,10 +69,12 @@ struct dst_entry
>          int                     (*output)(struct sk_buff*);
>
>          struct dst_ops          *ops;
> -
> -        unsigned long           lastuse;
> +
> +        u32                     metrics[RTAX_MAX];
> +
>          atomic_t                __refcnt;       /* client references */
>          int                     __use;
> +        unsigned long           lastuse;
>          union {
>                  struct dst_entry *next;
>                  struct rtable    *rt_next;

Well, after this patch, we grow dst_entry by 8 bytes:

sizeof(struct dst_entry)=0xd0
offsetof(struct dst_entry, input)=0x68
offsetof(struct dst_entry, output)=0x70
offsetof(struct dst_entry, __refcnt)=0xb4
offsetof(struct dst_entry, lastuse)=0xc0
offsetof(struct dst_entry, __use)=0xb8
sizeof(struct rtable)=0x140

So we dirty two cache lines instead of one, unless your cpu has 128-byte cache lines?

I am quite surprised that my patch to not change lastuse if it is already set to jiffies changes nothing...

If you have some time, could you also test this (unrelated) patch? We can avoid dirtying a cache line of the loopback device all the time.
diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index f2a6e71..0a4186a 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -150,7 +150,10 @@ static int loopback_xmit(struct sk_buff *skb, struct net_device *dev)
                 return 0;
         }
 #endif
-        dev->last_rx = jiffies;
+#ifdef CONFIG_SMP
+        if (dev->last_rx != jiffies)
+#endif
+                dev->last_rx = jiffies;
 
         /* it's OK to use per_cpu_ptr() because BHs are off */
         pcpu_lstats = netdev_priv(dev);
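The cache-line argument in this message can be checked with simple arithmetic on the offsets quoted above. The helpers below are a sketch under two assumptions that the thread does not state explicitly: the structure starts on a cache-line boundary, and no field straddles a line. The demo_* names are made up for the example.

```c
#include <stddef.h>

/* Index of the cache line that a given structure offset falls into. */
static size_t demo_cache_line(size_t offset, size_t line_size)
{
        return offset / line_size;
}

/* Nonzero when two offsets land in the same cache line. */
static int demo_same_line(size_t a, size_t b, size_t line_size)
{
        return demo_cache_line(a, line_size) == demo_cache_line(b, line_size);
}
```

Plugging in the post-patch offsets, __refcnt (0xb4) and __use (0xb8) share line 2 with 64-byte lines while lastuse (0xc0) moves to line 3, so two lines get written per packet. With 128-byte lines all three fall into line 1. In both cases input (0x68) and output (0x70) stay on a different line from __refcnt when lines are 64 bytes.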
Re: [PATCH 2.6.24 1/1] sch_htb: fix too many events situation
From: Martin Devera [EMAIL PROTECTED]
Date: Mon, 18 Feb 2008 11:08:09 +0100

> Like
>
>     max_jiff = jiffies+2; /* not +1 at we could be at +0. now */
>     while (jiffies < max_jiff)
>         do_hard_potentionaly_long_work();
>     if (more_work)
>         schedule_to_next_jiffie();
>
> This will keep event queue work load under 66% of system load which
> seems reasonable to me. Would you accept such solution ?

Sure.
[PATCH][IBMVETH]: Use single_open instead of manual manipulations.
The code opening the proc entry for each device does the same thing as single_open does, so remove the unneeded code.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/drivers/net/ibmveth.c b/drivers/net/ibmveth.c
index 57772be..bb31e09 100644
--- a/drivers/net/ibmveth.c
+++ b/drivers/net/ibmveth.c
@@ -1259,26 +1259,7 @@ static void ibmveth_proc_unregister_driver(void)
         remove_proc_entry(IBMVETH_PROC_DIR, init_net.proc_net);
 }
 
-static void *ibmveth_seq_start(struct seq_file *seq, loff_t *pos)
-{
-        if (*pos == 0) {
-                return (void *)1;
-        } else {
-                return NULL;
-        }
-}
-
-static void *ibmveth_seq_next(struct seq_file *seq, void *v, loff_t *pos)
-{
-        ++*pos;
-        return NULL;
-}
-
-static void ibmveth_seq_stop(struct seq_file *seq, void *v)
-{
-}
-
-static int ibmveth_seq_show(struct seq_file *seq, void *v)
+static int ibmveth_show(struct seq_file *seq, void *v)
 {
         struct ibmveth_adapter *adapter = seq->private;
         char *current_mac = ((char*) &adapter->netdev->dev_addr);
@@ -1302,27 +1283,10 @@ static int ibmveth_seq_show(struct seq_file *seq, void *v)
 
         return 0;
 }
-static struct seq_operations ibmveth_seq_ops = {
-        .start = ibmveth_seq_start,
-        .next  = ibmveth_seq_next,
-        .stop  = ibmveth_seq_stop,
-        .show  = ibmveth_seq_show,
-};
 
 static int ibmveth_proc_open(struct inode *inode, struct file *file)
 {
-        struct seq_file *seq;
-        struct proc_dir_entry *proc;
-        int rc;
-
-        rc = seq_open(file, &ibmveth_seq_ops);
-        if (!rc) {
-                /* recover the pointer buried in proc_dir_entry data */
-                seq = file->private_data;
-                proc = PDE(inode);
-                seq->private = proc->data;
-        }
-        return rc;
+        return single_open(file, ibmveth_show, PDE(inode)->data);
 }
 
 static const struct file_operations ibmveth_proc_fops = {
@@ -1330,7 +1294,7 @@ static const struct file_operations ibmveth_proc_fops = {
         .open    = ibmveth_proc_open,
         .read    = seq_read,
         .llseek  = seq_lseek,
-        .release = seq_release,
+        .release = single_release,
 };
 
 static void ibmveth_proc_register_adapter(struct ibmveth_adapter *adapter)
[PATCH][IPV6]: Use BUG_ON instead of if + BUG in fib6_del_route.
Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index f93407c..bab72b6 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1151,7 +1151,7 @@ static void fib6_del_route(struct fib6_node *fn, struct rt6_info **rtp,
                         fn = fn->parent;
                 }
                 /* No more references are possible at this point. */
-                if (atomic_read(&rt->rt6i_ref) != 1) BUG();
+                BUG_ON(atomic_read(&rt->rt6i_ref) != 1);
         }
 
         inet6_rt_notify(RTM_DELROUTE, rt, info);
Re: Linux 2.6.24.1 - kernel does not boot; IRQ trouble?
On Sun, 17 Feb 2008 00:54:08 + (GMT) Chris Rankin [EMAIL PROTECTED] wrote:

> [Try this again, except this time I'll force the attachment as inline text!]
>
> Hi,
>
> I have managed to boot 2.6.24.1 on this machine, with the NMI watchdog
> enabled, by using the acpi=noirq option. (There does seem to be some
> unhappiness with bridge symlinks in sysfs, though.)
>
> ...
>
> sysfs: duplicate filename 'bridge' can not be created
> WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
> Pid: 1, comm: swapper Not tainted 2.6.24.1 #1
>  [c0105020] show_trace_log_lvl+0x1a/0x2f
>  [c0105990] show_trace+0x12/0x14
>  [c010613d] dump_stack+0x6c/0x72
>  [c01991bf] sysfs_add_one+0x57/0xbc
>  [c0199e41] sysfs_create_link+0xc2/0x10d
>  [c01bae9a] pci_bus_add_devices+0xbd/0x103
>  [c034016c] pci_legacy_init+0x56/0xe3
>  [c03274e1] kernel_init+0x157/0x2c3
>  [c0104c83] kernel_thread_helper+0x7/0x10
>  ===
> pci :00:01.0: Error creating sysfs bridge symlink, continuing...
> sysfs: duplicate filename 'bridge' can not be created
> WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
> Pid: 1, comm: swapper Not tainted 2.6.24.1 #1
>  [c0105020] show_trace_log_lvl+0x1a/0x2f
>  [c0105990] show_trace+0x12/0x14
>  [c010613d] dump_stack+0x6c/0x72
>  [c01991bf] sysfs_add_one+0x57/0xbc
>  [c0199e41] sysfs_create_link+0xc2/0x10d
>  [c01bae9a] pci_bus_add_devices+0xbd/0x103
>  [c01bae82] pci_bus_add_devices+0xa5/0x103
>  [c034016c] pci_legacy_init+0x56/0xe3
>  [c03274e1] kernel_init+0x157/0x2c3
>  [c0104c83] kernel_thread_helper+0x7/0x10
>  ===

I have a vague feeling that this was fixed, perhaps in 2.6.24.x?
Re: [2.6.25-rc2] e100: Trying to free already-free IRQ 11 during suspend ...
On Sun, 17 Feb 2008 15:36:50 +0300 Andrey Borzenkov [EMAIL PROTECTED] wrote:

> ... and possibly reboot/poweroff (it flows by too fast to be legible).
>
> [ 8803.850634] ACPI: Preparing to enter system sleep state S3
> [ 8803.853141] Suspending console(s)
> [ 8805.287505] serial 00:09: disabled
> [ 8805.291564] Trying to free already-free IRQ 11
> [ 8805.291579] Pid: 6920, comm: pm-suspend Not tainted 2.6.25-rc2-1avb #2
> [ 8805.291628] [c0152127] free_irq+0xb7/0x130
> [ 8805.291675] [c024bd80] e100_suspend+0xc0/0x100
> [ 8805.291724] [c01eaa36] pci_device_suspend+0x26/0x70
> [ 8805.291747] [c0243674] suspend_device+0x94/0xd0
> [ 8805.291763] [c02439a3] device_suspend+0x153/0x240
> [ 8805.291784] [c014314f] suspend_devices_and_enter+0x4f/0xf0
> [ 8805.291808] [c0143a5f] ? freeze_processes+0x3f/0x80
> [ 8805.291825] [c01432fa] enter_state+0xaa/0x140
> [ 8805.291840] [c014341f] state_store+0x8f/0xd0
> [ 8805.291852] [c0143390] ? state_store+0x0/0xd0
> [ 8805.291866] [c01d3404] kobj_attr_store+0x24/0x30
> [ 8805.291901] [c01b547b] sysfs_write_file+0xbb/0x110
> [ 8805.291936] [c0177d79] vfs_write+0x99/0x130
> [ 8805.291963] [c01b53c0] ? sysfs_write_file+0x0/0x110
> [ 8805.291979] [c01782fd] sys_write+0x3d/0x70
> [ 8805.291998] [c010409a] sysenter_past_esp+0x5f/0xa5
> [ 8805.292038] ===
> [ 8805.347640] ACPI: PCI interrupt for device :00:06.0 disabled
> [ 8805.361128] ACPI: PCI interrupt for device :00:02.0 disabled
> [ 8805.376670] hwsleep-0322 [00] enter_sleep_state : Entering sleep state [S3]
> [ 8805.376670] Back to C!
>
> Interface is unused normally (only for netconsole sometimes). dmesg
> and config attached.
Does reverting this:

commit 8543da6672b0994921f014f2250e27ae81645580
Author: Auke Kok [EMAIL PROTECTED]
Date:   Wed Dec 12 16:30:42 2007 -0800

    e100: free IRQ to remove warningwhenrebooting

with this patch:

--- a/drivers/net/e100.c~revert-1
+++ a/drivers/net/e100.c
@@ -2804,9 +2804,8 @@ static int e100_suspend(struct pci_dev *
                 pci_enable_wake(pdev, PCI_D3cold, 0);
         }
 
-        free_irq(pdev->irq, netdev);
-
         pci_disable_device(pdev);
+        free_irq(pdev->irq, netdev);
         pci_set_power_state(pdev, PCI_D3hot);
 
         return 0;
@@ -2848,8 +2847,6 @@ static void e100_shutdown(struct pci_dev
                 pci_enable_wake(pdev, PCI_D3cold, 0);
         }
 
-        free_irq(pdev->irq, netdev);
-
         pci_disable_device(pdev);
         pci_set_power_state(pdev, PCI_D3hot);
 }
_

fix it?

> Hmm ... after resume device has disappeared at all ...
>
> {pts/1}% cat /proc/interrupts
>            CPU0
>   0:    1290492    XT-PIC-XT        timer
>   1:       6675    XT-PIC-XT        i8042
>   2:          0    XT-PIC-XT        cascade
>   3:          2    XT-PIC-XT
>   4:          2    XT-PIC-XT
>   5:          3    XT-PIC-XT
>   7:          4    XT-PIC-XT        irda0
>   8:          0    XT-PIC-XT        rtc0
>   9:        583    XT-PIC-XT        acpi
>  10:          2    XT-PIC-XT
>  11:      31483    XT-PIC-XT        yenta, yenta, yenta, ohci_hcd:usb1, ALI 5451, pcmcia0.0
>  12:      28070    XT-PIC-XT        i8042
>  14:      21705    XT-PIC-XT        ide0
>  15:      82123    XT-PIC-XT        ide1
> NMI:          0   Non-maskable interrupts
> TRM:          0   Thermal event interrupts
> SPU:          0   Spurious interrupts
> ERR:          0

I hope that's not a separate bug...
Re: [2.6.25-rc2, 2.6.24-rc8] page allocation failure...
On Sun, 17 Feb 2008 13:20:59 + Daniel J Blueman [EMAIL PROTECTED] wrote:

> I'm still hitting this with e1000e on 2.6.25-rc2, 10 times again.
>
> It's clearly non-fatal, but then do we expect it to occur?
>
> Daniel
>
> --- [dmesg]
>
> [ 1250.822786] swapper: page allocation failure. order:3, mode:0x4020
> [ 1250.822786] Pid: 0, comm: swapper Not tainted 2.6.25-rc2-119 #2
> [ 1250.822786]
> [ 1250.822786] Call Trace:
> [ 1250.822786] IRQ [8025fe9e] __alloc_pages+0x34e/0x3a0
> [ 1250.822786] [8048c6df] ? __netdev_alloc_skb+0x1f/0x40
> [ 1250.822786] [8027acc2] __slab_alloc+0x102/0x3d0
> [ 1250.822786] [8048c6df] ? __netdev_alloc_skb+0x1f/0x40
> [ 1250.822786] [8027b8cb] __kmalloc_track_caller+0x7b/0xc0
> [ 1250.822786] [8048b74f] __alloc_skb+0x6f/0x160
> [ 1250.822786] [8048c6df] __netdev_alloc_skb+0x1f/0x40
> [ 1250.822786] [8042652d] e1000_alloc_rx_buffers+0x1ed/0x260
> [ 1250.822786] [80426b5a] e1000_clean_rx_irq+0x22a/0x330
> [ 1250.822786] [80422981] e1000_clean+0x1e1/0x540
> [ 1250.822786] [8024b7a5] ? tick_program_event+0x45/0x70
> [ 1250.822786] [804930ba] net_rx_action+0x9a/0x150
> [ 1250.822786] [802336b4] __do_softirq+0x74/0xf0
> [ 1250.822786] [8020c5fc] call_softirq+0x1c/0x30
> [ 1250.822786] [8020eaad] do_softirq+0x3d/0x80
> [ 1250.822786] [80233635] irq_exit+0x85/0x90
> [ 1250.822786] [8020eba5] do_IRQ+0x85/0x100
> [ 1250.822786] [8020a5b0] ? mwait_idle+0x0/0x50
> [ 1250.822786] [8020b981] ret_from_intr+0x0/0xa
> [ 1250.822786] EOI [8020a5f5] ? mwait_idle+0x45/0x50
> [ 1250.822786] [80209a92] ? enter_idle+0x22/0x30
> [ 1250.822786] [8020a534] ? cpu_idle+0x74/0xa0
> [ 1250.822786] [80527825] ? rest_init+0x55/0x60

They're regularly reported with e1000 too - I don't think anything really changed.

e1000 has this crazy problem where, because of a cascade of follies (mainly borked hardware), it has to do a 32kb allocation for a 9kb(?) packet. It would be sad if that was carried over into e1000e?
Re: Linux 2.6.24.1 - kernel does not boot; IRQ trouble?
--- Andrew Morton [EMAIL PROTECTED] wrote:

> > sysfs: duplicate filename 'bridge' can not be created
> > WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
> > Pid: 1, comm: swapper Not tainted 2.6.24.1 #1
> >  [c0105020] show_trace_log_lvl+0x1a/0x2f
> >  [c0105990] show_trace+0x12/0x14
> >  [c010613d] dump_stack+0x6c/0x72
> >  [c01991bf] sysfs_add_one+0x57/0xbc
> >  [c0199e41] sysfs_create_link+0xc2/0x10d
> >  [c01bae9a] pci_bus_add_devices+0xbd/0x103
> >  [c034016c] pci_legacy_init+0x56/0xe3
> >  [c03274e1] kernel_init+0x157/0x2c3
> >  [c0104c83] kernel_thread_helper+0x7/0x10
> >  ===
> > pci :00:01.0: Error creating sysfs bridge symlink, continuing...
> > sysfs: duplicate filename 'bridge' can not be created
> > WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
> > Pid: 1, comm: swapper Not tainted 2.6.24.1 #1
> >  [c0105020] show_trace_log_lvl+0x1a/0x2f
> >  [c0105990] show_trace+0x12/0x14
> >  [c010613d] dump_stack+0x6c/0x72
> >  [c01991bf] sysfs_add_one+0x57/0xbc
> >  [c0199e41] sysfs_create_link+0xc2/0x10d
> >  [c01bae9a] pci_bus_add_devices+0xbd/0x103
> >  [c01bae82] pci_bus_add_devices+0xa5/0x103
> >  [c034016c] pci_legacy_init+0x56/0xe3
> >  [c03274e1] kernel_init+0x157/0x2c3
> >  [c0104c83] kernel_thread_helper+0x7/0x10
> >  ===
>
> I have a vague feeling that this was fixed, perhaps in 2.6.24.x?

Obviously not in 2.6.24.1, and I thought that 2.6.24.2 just added the fix for the vmsplice exploit. So unless 2.6.24.3 has been released...?

Cheers,
Chris
[patch 1/1] claw: make use of DIV_ROUND_UP
From: Julia Lawall [EMAIL PROTECTED]

The kernel.h macro DIV_ROUND_UP performs the computation (((n) + (d) - 1) / (d)) but is perhaps more readable.

Signed-off-by: Ursula Braun [EMAIL PROTECTED]
---

 drivers/s390/net/claw.c |   39 ++++++++++++++++++---------------------
 1 file changed, 18 insertions(+), 21 deletions(-)

Index: linux-2.6-uschi/drivers/s390/net/claw.c
===================================================================
--- linux-2.6-uschi.orig/drivers/s390/net/claw.c
+++ linux-2.6-uschi/drivers/s390/net/claw.c
@@ -1851,8 +1851,7 @@ claw_hw_tx(struct sk_buff *skb, struct n
                 }
         }
         /* See how many write buffers are required to hold this data */
-        numBuffers= ( skb->len + privptr->p_env->write_size - 1) /
-                        ( privptr->p_env->write_size);
+        numBuffers = DIV_ROUND_UP(skb->len, privptr->p_env->write_size);
 
         /* If that number of buffers isn't available, give up for now */
         if (privptr->write_free_count < numBuffers ||
@@ -2114,8 +2113,7 @@ init_ccw_bk(struct net_device *dev)
         */
         ccw_blocks_perpage= PAGE_SIZE / CCWBK_SIZE;
         ccw_pages_required=
-                (ccw_blocks_required+ccw_blocks_perpage -1) /
-                         ccw_blocks_perpage;
+                DIV_ROUND_UP(ccw_blocks_required, ccw_blocks_perpage);
 
 #ifdef DEBUGMSG
         printk(KERN_INFO "%s: %s() ccw_blocks_perpage=%d\n",
@@ -2131,30 +2129,29 @@ init_ccw_bk(struct net_device *dev)
         * provide good performance. With packing buffers support 32k
         * buffers are used.
         */
-        if (privptr->p_env->read_size < PAGE_SIZE) {
-            claw_reads_perpage= PAGE_SIZE / privptr->p_env->read_size;
-            claw_read_pages= (privptr->p_env->read_buffers +
-                claw_reads_perpage -1) / claw_reads_perpage;
+        if (privptr->p_env->read_size < PAGE_SIZE) {
+                claw_reads_perpage = PAGE_SIZE / privptr->p_env->read_size;
+                claw_read_pages = DIV_ROUND_UP(privptr->p_env->read_buffers,
+                                                claw_reads_perpage);
         }
         else { /* > or equal */
-            privptr->p_buff_pages_perread=
-                (privptr->p_env->read_size + PAGE_SIZE - 1) / PAGE_SIZE;
-            claw_read_pages=
-                privptr->p_env->read_buffers * privptr->p_buff_pages_perread;
+                privptr->p_buff_pages_perread =
+                        DIV_ROUND_UP(privptr->p_env->read_size, PAGE_SIZE);
+                claw_read_pages = privptr->p_env->read_buffers *
+                        privptr->p_buff_pages_perread;
         }
         if (privptr->p_env->write_size < PAGE_SIZE) {
-            claw_writes_perpage=
-                PAGE_SIZE / privptr->p_env->write_size;
-            claw_write_pages=
-                (privptr->p_env->write_buffers + claw_writes_perpage -1) /
-                        claw_writes_perpage;
+                claw_writes_perpage =
+                        PAGE_SIZE / privptr->p_env->write_size;
+                claw_write_pages = DIV_ROUND_UP(privptr->p_env->write_buffers,
+                                                claw_writes_perpage);
 
         }
         else { /* > or equal */
-            privptr->p_buff_pages_perwrite=
-                 (privptr->p_env->read_size + PAGE_SIZE - 1) / PAGE_SIZE;
-            claw_write_pages=
-                privptr->p_env->write_buffers * privptr->p_buff_pages_perwrite;
+                privptr->p_buff_pages_perwrite =
+                        DIV_ROUND_UP(privptr->p_env->read_size, PAGE_SIZE);
+                claw_write_pages = privptr->p_env->write_buffers *
+                        privptr->p_buff_pages_perwrite;
         }
 #ifdef DEBUGMSG
         if (privptr->p_env->read_size < PAGE_SIZE) {
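To see what the macro computes, here is a small standalone sketch. DEMO_DIV_ROUND_UP uses the same expression that the patch description quotes for DIV_ROUND_UP; the demo_buffers_needed() helper and its parameter names are invented for this example and only mirror the numBuffers calculation in claw_hw_tx.

```c
/* Same expression as quoted in the patch description for DIV_ROUND_UP. */
#define DEMO_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Number of buffers of buf_size bytes needed to hold len bytes.
 * A partially filled last buffer still counts as a whole buffer.
 */
static unsigned int demo_buffers_needed(unsigned int len, unsigned int buf_size)
{
        return DEMO_DIV_ROUND_UP(len, buf_size);
}
```

For a 4096-byte buffer size, 4096 bytes need one buffer and 4097 bytes need two. As with the open-coded form, the addition can overflow for values near the top of the type's range, so the macro is a readability change rather than a behavioural one.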
[patch 0/1] s390: claw - More use DIV_ROUND_UP
Jeff,

this patch is intended for 2.6.25. It makes use of the DIV_ROUND_UP macro as proposed by Julia Lawall.

Regards,
Ursula Braun
ipv6 debugging
Hi,

I'm kindly asking for some debugging tips with the following problem: a machine is running Linux 2.6.24.2 with several 802.1q VLANs over active/backup bonding over two physical interfaces. Everything is all right, except that after a reboot there's no IPv6, while IPv4 works. The router's ARP(6) table is empty, and the machine doesn't answer ping6. However, if I start tcpdump -i bond0 ip6, everything is all right again. There are some indications that after some period without IPv6 traffic, the same can happen again.

Are there known issues which can exhibit themselves like this? Other very similar setups don't show this erratic behaviour. I know that the above doesn't give a fully detailed picture, but I thought that I'd better ask before taking the setup to pieces.

--
Thanks for your thoughts,
Feri.
Re: [PATCH 2.6.24-mm1] error compiling net driver NE2000/NE1000
Hi,

I don't know if I have to warn about this or not, but as I didn't find any discussion, it's probably better to mention it: the compile error reported below (or here: http://lkml.org/lkml/2008/2/4/173 ) does not seem to be corrected in 2.6.25-rc2.mm1...

So, I don't know if a fix is under way somewhere or if the bug has fallen into a black hole.

(In the original mail, I proposed a patch as a quick fix, but I don't know if it can be considered a definitive correction or not.)

Thanks,

P.

Andrew Morton wrote:
> On Mon, 4 Feb 2008 16:29:21 +0100 Pierre Peiffer [EMAIL PROTECTED] wrote:
>
>> Hi,
>>
>> When I compile the kernel 2.6.24-mm1 with:
>> CONFIG_NET_ISA=y
>> CONFIG_NE2000=y
>>
>> I have the following compile error:
>> ...
>>   GEN .version
>>   CHK include/linux/compile.h
>>   UPD include/linux/compile.h
>>   CC init/version.o
>>   LD init/built-in.o
>>   LD .tmp_vmlinux1
>> drivers/built-in.o: In function `ne_block_output':
>> linux-2.6.24-mm1/drivers/net/ne.c:797: undefined reference to `NS8390_init'
>> drivers/built-in.o: In function `ne_drv_resume':
>> linux-2.6.24-mm1/drivers/net/ne.c:858: undefined reference to `NS8390_init'
>> drivers/built-in.o: In function `ne_probe1':
>> linux-2.6.24-mm1/drivers/net/ne.c:539: undefined reference to `NS8390_init'
>> make[1]: *** [.tmp_vmlinux1] Error 1
>> make: *** [sub-make] Error 2
>
> Thanks for reporting this.
>
>> As I saw that the file 8390p.c is compiled for this driver, but not
>> the file 8390.c which contains this function NS8390_init(), I fixed
>> this error with the following patch.
>
> Alan's
> 8390-split-8390-support-into-a-pausing-and-a-non-pausing-driver-core.patch
> would be a prime suspect.
>
> I assume this bug isn't present in mainline or in 2.6.24?
>
>> As NS8390p_init() does the same thing as NS8390_init(), I suppose
>> that this is the right fix ?
Signed-off-by: Pierre Peiffer [EMAIL PROTECTED]
---
 drivers/net/ne.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Index: b/drivers/net/ne.c
===================================================================
--- a/drivers/net/ne.c
+++ b/drivers/net/ne.c
@@ -536,7 +536,7 @@ static int __init ne_probe1(struct net_d
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	dev->poll_controller = eip_poll;
 #endif
-	NS8390_init(dev, 0);
+	NS8390p_init(dev, 0);

 	ret = register_netdev(dev);
 	if (ret)
@@ -794,7 +794,7 @@ retry:
 		if (time_after(jiffies, dma_start + 2*HZ/100)) {	/* 20ms */
 			printk(KERN_WARNING "%s: timeout waiting for Tx RDC.\n", dev->name);
 			ne_reset_8390(dev);
-			NS8390_init(dev,1);
+			NS8390p_init(dev,1);
 			break;
 		}

@@ -855,7 +855,7 @@ static int ne_drv_resume(struct platform

 	if (netif_running(dev)) {
 		ne_reset_8390(dev);
-		NS8390_init(dev, 1);
+		NS8390p_init(dev, 1);
 		netif_device_attach(dev);
 	}
 	return 0;
--
Pierre Peiffer
Re: [2.6 patch] remove include/linux/netfilter_ipv4/ipt_SAME.h
Adrian Bunk wrote: This patch removes the no longer used include/linux/netfilter_ipv4/ipt_SAME.h We kept it around because old iptables binaries need it to build. The kernel no longer supports it, but people might still wish to use a distributor-built iptables binary with old kernels. It will be removed with a number of other headers kept for compatibility in 1-2 years. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] net/8021q/vlan_dev.c - Use print_mac
Joe Perches wrote: On Fri, 2008-02-15 at 02:58 -0800, David Miller wrote: From: Bruno Randolf [EMAIL PROTECTED] Date: Fri, 15 Feb 2008 19:48:05 +0900 Is there any chance to include a macro like this for printing MAC addresses? Its advantage is that it can be used without the need to declare buffers for print_mac(), for example: We specifically removed this sort of thing, please don't add it back. Why?

@@ -404,11 +405,8 @@ static int vlan_dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	pr_debug("%s: about to send skb: %p to dev: %s\n",
 		__FUNCTION__, skb, skb->dev->name);
-	pr_debug(" " MAC_FMT " " MAC_FMT " %4hx %4hx %4hx\n",
-		 veth->h_dest[0], veth->h_dest[1], veth->h_dest[2],
-		 veth->h_dest[3], veth->h_dest[4], veth->h_dest[5],
-		 veth->h_source[0], veth->h_source[1], veth->h_source[2],
-		 veth->h_source[3], veth->h_source[4], veth->h_source[5],
+	pr_debug(" %s %s %4hx %4hx %4hx\n",
+		 print_mac(mac, veth->h_dest), print_mac(mac2, veth->h_source),

This results in print_mac getting called twice per packet even without debugging. What's the problem with MAC_FMT?
Re: ipv6 debugging
This sounds to me like the same problem that I was having with OSPF; I think ARP(6) uses multicast ethernet addresses too. Can you check whether the patch below, which I sent to Patrick McHardy some days ago, fixes your problem? Regards, Jorge

---
Hi Patrick, Commit a0a400d79e3dd7843e7e81baa3ef2957bdc292d0 from you introduced a new field da_synced to struct dev_addr_list that is not properly initialized to 0. So when any of the current users (8021q, macvlan, mac80211) calls dev_mc_sync/unsync they mess up the address list for both devices. The attached patch fixed it for me and avoids future problems. Regards, Jorge

Signed-off-by: Jorge Boncompte [DTI2] [EMAIL PROTECTED]
---
diff --git a/net/core/dev.c b/net/core/dev.c
index 9549417..f1b6708 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2900,7 +2900,7 @@ int __dev_addr_add(struct dev_addr_list **list, int *count,
 		}
 	}

-	da = kmalloc(sizeof(*da), GFP_ATOMIC);
+	da = kzalloc(sizeof(*da), GFP_ATOMIC);
 	if (da == NULL)
 		return -ENOMEM;
 	memcpy(da->da_addr, addr, alen);

- Original Message - From: Ferenc Wagner [EMAIL PROTECTED] To: netdev@vger.kernel.org Sent: Monday, February 18, 2008 3:06 PM Subject: ipv6 debugging

Hi, I'm kindly asking for some debugging tips with the following problem: a machine is running Linux 2.6.24.2, with several 802.1q VLANs over active/backup bonding over two physical interfaces. Everything is all right, except that after a reboot there's no IPv6, while IPv4 works. The router's ARP(6) table is empty, and the machine doesn't answer ping6. However, if I start tcpdump -i bond0 ip6, everything is all right again. There are some indications that after some period without IPv6 traffic, the same can happen again. Are there known issues which can exhibit themselves like this? Other very similar setups don't show this erratic behaviour. I know that the above doesn't give a fully detailed picture, but thought that I'd better ask before taking the setup to pieces. -- Thanks for your thoughts, Feri.
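The effect described in the patch mail (a newly added field left holding stale bytes because the allocator does not zero memory) can be sketched in user space. This is only an illustration under stated assumptions: `struct addr_entry`, `dirty_alloc`, `zeroed_alloc` and `entry_new` are hypothetical names, and `dirty_alloc` merely simulates non-zeroed memory by filling it with 0xAA; it is not how kmalloc() behaves internally.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct dev_addr_list; only the idea matters. */
struct addr_entry {
	unsigned char addr[6];
	int users;
	int synced;	/* the newly added field that the constructor never sets */
};

/* Simulates a non-zeroing allocator: memory comes back holding stale bytes. */
static void *dirty_alloc(size_t size)
{
	void *p = malloc(size);

	if (p)
		memset(p, 0xAA, size);
	return p;
}

/* Simulates a zeroing allocator: memory comes back zero-filled. */
static void *zeroed_alloc(size_t size)
{
	return calloc(1, size);
}

/* Like the patched function, this sets only the fields it already knew about. */
static struct addr_entry *entry_new(void *(*alloc)(size_t),
				    const unsigned char *addr)
{
	struct addr_entry *e = alloc(sizeof(*e));

	if (!e)
		return NULL;
	memcpy(e->addr, addr, sizeof(e->addr));
	e->users = 1;
	return e;	/* e->synced is whatever the allocator left behind */
}
```

With `dirty_alloc` the `synced` field comes back non-zero although nobody set it; with `zeroed_alloc` it is 0, which is why switching the allocator is enough to cover fields added later.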
protocol 0300 is buggy spam in dmesg when injectingcapturing on same interface
When playing with some L2 level fuzzing I started getting lots of "protocol 0300 is buggy, dev eth3" spew in dmesg. That interface is also capturing the traffic that's being sent, which is probably why the dev_queue_xmit_nit codepath is getting called in the first place. Tested on 2.6.23-as-shipped-in-F8. I didn't spot any relevant changes in .24 but can pretty easily verify there too. Oh. That printk wasn't very easy to find: if (net_ratelimit()) printk(KERN_CRIT "protocol %04x is buggy, dev %s\n", and I naturally grepped for "is buggy". Any ideas? Add an "if it came from AF_PACKET, don't print out anything" to that if-statement? -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
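As a small illustration of why the message is awkward to search for: the number in the dmesg line is produced by the %04x conversion, so the full text as it appears in dmesg never occurs literally in the source. The sketch below reproduces the formatting in user space with snprintf(); `format_buggy_msg` is a hypothetical helper, and the values 0x0300 and "eth3" are simply taken from the report above.

```c
#include <stdio.h>

/*
 * Formats the message the same way the quoted printk format string would.
 * Returns the number of characters snprintf() would have written.
 */
static int format_buggy_msg(char *buf, size_t len,
			    unsigned int proto, const char *dev)
{
	return snprintf(buf, len, "protocol %04x is buggy, dev %s\n",
			proto, dev);
}
```

Calling `format_buggy_msg(buf, sizeof(buf), 0x0300, "eth3")` yields the line seen in dmesg, whereas only the fixed fragments ("protocol", "is buggy, dev") are present in the source file.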
Re: [PATCH 1/3] fib_trie: move statistics to debugfs
On Sun, 17 Feb 2008 22:26:55 -0800 (PST) David Miller [EMAIL PROTECTED] wrote: From: Stephen Hemminger [EMAIL PROTECTED] Date: Wed, 13 Feb 2008 11:58:06 -0800 Don't want /proc/net/fib_trie and /proc/net/fib_triestat to become permanent kernel space ABI issues, so move to the safer confines of debugfs. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] Stephen, the cat is already out of the bag. We already export this thing so if you want to export different stuff you'll have to provide it via some other means, somewhere else. Thanks. Are we stuck with the format problems? * crappy tree printout * not printing other tables -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Compex FreedomLine 32 PnP-PCI2 broken with de2104x
On Monday 18 February 2008 04:21:11 Grant Grundler wrote: On Wed, Jan 30, 2008 at 09:23:06PM +0100, Ondrej Zary wrote: On Saturday 26 January 2008 21:58:10 Ondrej Zary wrote: Hello, I was having problems with these FreedomLine cards with Linux before but tested it thoroughly today. This card uses DEC 21041 chip and has TP and BNC connectors: 00:12.0 Ethernet controller [0200]: Digital Equipment Corporation DECchip 21041 [Tulip Pass 3] [1011:0014] (rev 21) de2104x driver was loaded automatically by udev and card seemed to work. Until I disconnected the TP cable and putting it back after a while. The driver then switched to (non-existing) AUI port and remained there. I tried to set media to TP using ethtool - and the whole kernel crashed because of BUG_ON(de_is_running(de)); in de_set_media(). Seems that the driver is unable to stop the DMA in de_stop_rxtx(). The BUG_ON() is probably fine normally. But the media selection sounds broken. It's possible to select the wrong media type with 21040 chip but shouldn't be possible with 21041. For 21040 support, see de21040_get_media_info(). But de21041_get_srom_info() is expected to determine which media types are supported from SEPROM media blocks. My guess is that code is broken since it seems to work with de405 driver. If you care to work the difference, I'd be happy to make a patch to fix that up. I don't think that BUG_ON() should be there. It should probably printk a warning but certainly not crash the whole machine. Also, from code review, DE2104X driver still has a few places with potential PCI MMIO Write posting issues. Specifically, I was looking in de_stop_hw() and de_set_media(). Several other bits of code correctly flush MMIO writes: e.g. tulip_read_eeprom(). I commented out AUI detection in the driver - this time it switched to BNC after unplugging the cable and remained there. I also attempted to reset the chip when de_stop_rxtx failed but failed to do it. 
You'd have to basically hardcode only one media type and its corresponding parameters. That's bad. It just works with de4x5 with any cable at any time. Then I found that there's a de4x5 driver which supports the same cards as de2104x (and some others too) - and this one works fine! I can plug and unplug the cable and even change between TP and BNC ports just by unplugging one and plugging the other cable in. Unfortunately, this driver is blacklisted by default - at least in Slackware and Debian. ISTR there was a time when tulip would compete with de4x5 for devices. tulip is the preferred driver. That's clearly no longer the case and perhaps both distros need to revisit this. de4x5 has no MODULE_DEVICE_TABLE for PCI devices anymore, so no conflicts. That's probably good for cards that work with the tulip driver but bad for my card and also probably for some other cards that (should) work with de2104x. The question is: why does de2104x exist? Does it work better with some hardware? de2104x is a work in progress. That's why it's marked EXPERIMENTAL in the Kconfig file. Great, it looks to be 6 years old and it's still experimental. Probably because it never worked properly. I think that the de2104x driver should be removed (or at least its MODULE_DEVICE_TABLE) and a MODULE_DEVICE_TABLE with only the 21040 and 21041 PCI IDs added to de4x5. I can send a patch if this is acceptable. BTW, I found that the problem has existed at least since 2003: http://oss.sgi.com/archives/netdev/2003-08/msg00951.html Does the de2104x driver work correctly for anyone? No idea. I've only used the tulip driver.
thanks for the bug report, grant -- Ondrej Zary
Re: Linux 2.6.24.1 - kernel does not boot; IRQ trouble?
On Mon, 18 Feb 2008 05:00:49 -0800 Andrew Morton [EMAIL PROTECTED] wrote: On Sun, 17 Feb 2008 00:54:08 + (GMT) Chris Rankin [EMAIL PROTECTED] wrote: [Try this again, except this time I'll force the attachment as inline text!] Hi, I have managed to boot 2.6.24.1 on this machine, with the NMI watchdog enabled, by using the acpi=noirq option. (There does seem to be some unhappiness with bridge symlinks in sysfs, though.) ... sysfs: duplicate filename 'bridge' can not be created WARNING: at fs/sysfs/dir.c:424 sysfs_add_one() Pid: 1, comm: swapper Not tainted 2.6.24.1 #1 [c0105020] show_trace_log_lvl+0x1a/0x2f [c0105990] show_trace+0x12/0x14 [c010613d] dump_stack+0x6c/0x72 [c01991bf] sysfs_add_one+0x57/0xbc [c0199e41] sysfs_create_link+0xc2/0x10d [c01bae9a] pci_bus_add_devices+0xbd/0x103 [c034016c] pci_legacy_init+0x56/0xe3 [c03274e1] kernel_init+0x157/0x2c3 [c0104c83] kernel_thread_helper+0x7/0x10 === pci :00:01.0: Error creating sysfs bridge symlink, continuing... sysfs: duplicate filename 'bridge' can not be created WARNING: at fs/sysfs/dir.c:424 sysfs_add_one() Pid: 1, comm: swapper Not tainted 2.6.24.1 #1 [c0105020] show_trace_log_lvl+0x1a/0x2f [c0105990] show_trace+0x12/0x14 [c010613d] dump_stack+0x6c/0x72 [c01991bf] sysfs_add_one+0x57/0xbc [c0199e41] sysfs_create_link+0xc2/0x10d [c01bae9a] pci_bus_add_devices+0xbd/0x103 [c01bae82] pci_bus_add_devices+0xa5/0x103 [c034016c] pci_legacy_init+0x56/0xe3 [c03274e1] kernel_init+0x157/0x2c3 [c0104c83] kernel_thread_helper+0x7/0x10 === I have a vague feeling that this was fixed, perhaps in 2.6.24.x? Never heard of this, what is the initialization script that causes this? Also do you have the SYSFS_DEPRECATED option configured? that caused issues with regular network drivers. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: tbench regression in 2.6.25-rc1
On Mon, 18 Feb 2008 16:12:38 +0800, Zhang, Yanmin said: I also think __refcnt is the key. I did a new test by adding 2 unsigned long padding fields before lastuse, so the 3 members are moved to the next cache line. The performance is recovered. How about the patch below? Almost all performance is recovered with the new patch. Signed-off-by: Zhang Yanmin [EMAIL PROTECTED] Could you add a comment someplace that says "refcnt wants to be on a different cache line from input/output/ops or performance tanks badly", to warn some future kernel hacker who starts adding new fields to the structure?
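To make the padding idea concrete, here is a minimal sketch using offsetof(). It is not the real struct dst_entry layout: `entry_packed`, `entry_padded`, `cache_line_of` and the 64-byte `CACHE_LINE` are assumptions for illustration, and the sketch further assumes the struct itself starts on a cache-line boundary.

```c
#include <stddef.h>

#define CACHE_LINE 64	/* assumed line size; real hardware may differ */

/* Read-mostly pointers and per-packet counters share one line here. */
struct entry_packed {
	void *input;		/* read-mostly */
	void *output;		/* read-mostly */
	void *ops;		/* read-mostly */
	unsigned long lastuse;	/* written for each packet */
	int refcnt;		/* written for each packet */
	int use;		/* written for each packet */
};

/* Same fields, with padding that pushes the hot counters to the next line. */
struct entry_padded {
	void *input;
	void *output;
	void *ops;
	char pad[CACHE_LINE - 3 * sizeof(void *)];
	unsigned long lastuse;
	int refcnt;
	int use;
};

/* Index of the cache line a given byte offset falls into. */
static size_t cache_line_of(size_t offset)
{
	return offset / CACHE_LINE;
}
```

Comparing `cache_line_of(offsetof(struct entry_padded, ops))` with `cache_line_of(offsetof(struct entry_padded, refcnt))` shows the two groups on different lines, while in `entry_packed` they land on the same one. A comment next to the padding, as requested in the mail, is what keeps a later field addition from silently undoing this.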
Re: [PATHCH 1/16] ServerEngines 10Gb NIC driver
Thanks for all the comments. I had run checkpatch and corrected all errors except for a few about some macros and the warning about the typedefs. The mail client I used to send the patch folded lines at arbitrary points and introduced trailing whitespace. This was also the reason for one of the patches not applying cleanly. We will use git to generate the patches as suggested. Our desire to share common code across drivers for other OSes has been a cause of some ugliness in coding style. I have one question about bit fields. Several of the headers in the common code are generated by srcgen from f/w source files. Some of the structures in these headers have bit fields (with separate definitions for little endian and big endian hosts). Are these unacceptable in Linux driver submissions? Thanks. Subbu -- From: Stephen Hemminger [mailto:[EMAIL PROTECTED] To: netdev@vger.kernel.org Sent: Sun, 17 Feb 2008 09:44:45 -0800 Subject: Re: [PATHCH 1/16] ServerEngines 10Gb NIC driver Do all vendor drivers have to come in with the same mistakes? Where is the vendor driver ugly school, and how can the Linux developers teach there? Run this through the checkpatch script or just read some of the things that a quick scan shows. snip
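For readers unfamiliar with the pattern being asked about, the sketch below shows what "separate definitions for little endian and big endian hosts" tends to look like, next to a shift-and-mask form that needs only one definition. Everything here is hypothetical: the descriptor layout (16-bit length, 8-bit queue id, 8-bit flags), the names `desc_bits`, `desc_pack`, `desc_len`, `desc_qid` and `desc_flags` are invented for illustration and are not taken from the ServerEngines headers, and the byte-order test relies on the GCC/Clang predefined `__BYTE_ORDER__` macros. It does not answer the question of what is acceptable for submission.

```c
#include <stdint.h>

/* Bit-field form: the field order has to be mirrored per host byte order. */
struct desc_bits {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	uint32_t flags : 8;
	uint32_t qid   : 8;
	uint32_t len   : 16;
#else
	uint32_t len   : 16;
	uint32_t qid   : 8;
	uint32_t flags : 8;
#endif
};

/* Shift/mask form: one definition, independent of bit-field layout rules. */
#define DESC_LEN_MASK	0xffffu
#define DESC_QID_SHIFT	16
#define DESC_QID_MASK	0xffu
#define DESC_FLAG_SHIFT	24
#define DESC_FLAG_MASK	0xffu

static inline uint32_t desc_pack(uint32_t len, uint32_t qid, uint32_t flags)
{
	return (len & DESC_LEN_MASK) |
	       ((qid & DESC_QID_MASK) << DESC_QID_SHIFT) |
	       ((flags & DESC_FLAG_MASK) << DESC_FLAG_SHIFT);
}

static inline uint32_t desc_len(uint32_t w)
{
	return w & DESC_LEN_MASK;
}

static inline uint32_t desc_qid(uint32_t w)
{
	return (w >> DESC_QID_SHIFT) & DESC_QID_MASK;
}

static inline uint32_t desc_flags(uint32_t w)
{
	return (w >> DESC_FLAG_SHIFT) & DESC_FLAG_MASK;
}
```

The accessor form works on a host-order 32-bit value, so any conversion to or from the device's byte order would still have to happen separately when the word is read or written.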
Re: [2.6.25-rc2] e100: Trying to free already-free IRQ 11 during suspend ...
On Monday 18 February 2008, Andrew Morton wrote: On Sun, 17 Feb 2008 15:36:50 +0300 Andrey Borzenkov [EMAIL PROTECTED] wrote: ... and possibly reboot/poweroff (it flows by too fast to be legible). [ 8803.850634] ACPI: Preparing to enter system sleep state S3 [ 8803.853141] Suspending console(s) [ 8805.287505] serial 00:09: disabled [ 8805.291564] Trying to free already-free IRQ 11 [ 8805.291579] Pid: 6920, comm: pm-suspend Not tainted 2.6.25-rc2-1avb #2 [ 8805.291628] [c0152127] free_irq+0xb7/0x130 [ 8805.291675] [c024bd80] e100_suspend+0xc0/0x100 [ 8805.291724] [c01eaa36] pci_device_suspend+0x26/0x70 [ 8805.291747] [c0243674] suspend_device+0x94/0xd0 [ 8805.291763] [c02439a3] device_suspend+0x153/0x240 [ 8805.291784] [c014314f] suspend_devices_and_enter+0x4f/0xf0 [ 8805.291808] [c0143a5f] ? freeze_processes+0x3f/0x80 [ 8805.291825] [c01432fa] enter_state+0xaa/0x140 [ 8805.291840] [c014341f] state_store+0x8f/0xd0 [ 8805.291852] [c0143390] ? state_store+0x0/0xd0 [ 8805.291866] [c01d3404] kobj_attr_store+0x24/0x30 [ 8805.291901] [c01b547b] sysfs_write_file+0xbb/0x110 [ 8805.291936] [c0177d79] vfs_write+0x99/0x130 [ 8805.291963] [c01b53c0] ? sysfs_write_file+0x0/0x110 [ 8805.291979] [c01782fd] sys_write+0x3d/0x70 [ 8805.291998] [c010409a] sysenter_past_esp+0x5f/0xa5 [ 8805.292038] === [ 8805.347640] ACPI: PCI interrupt for device :00:06.0 disabled [ 8805.361128] ACPI: PCI interrupt for device :00:02.0 disabled [ 8805.376670] hwsleep-0322 [00] enter_sleep_state : Entering sleep state [S3] [ 8805.376670] Back to C! Interface is unused normally (only for netconsole sometimes). dmesg and config attached. Does reverting this: commit 8543da6672b0994921f014f2250e27ae81645580 [...] fix it? no Hmm ... after resume device has disappeared at all ... 
{pts/1}% cat /proc/interrupts
           CPU0
  0:    1290492    XT-PIC-XT        timer
  1:       6675    XT-PIC-XT        i8042
  2:          0    XT-PIC-XT        cascade
  3:          2    XT-PIC-XT
  4:          2    XT-PIC-XT
  5:          3    XT-PIC-XT
  7:          4    XT-PIC-XT        irda0
  8:          0    XT-PIC-XT        rtc0
  9:        583    XT-PIC-XT        acpi
 10:          2    XT-PIC-XT
 11:      31483    XT-PIC-XT        yenta, yenta, yenta, ohci_hcd:usb1, ALI 5451, pcmcia0.0
 12:      28070    XT-PIC-XT        i8042
 14:      21705    XT-PIC-XT        ide0
 15:      82123    XT-PIC-XT        ide1
NMI:          0    Non-maskable interrupts
TRM:          0    Thermal event interrupts
SPU:          0    Spurious interrupts
ERR:          0

I hope that's not a separate bug...

This is a red herring. pm-utils restarts the network across suspend; eth0 is not activated automatically so it disappears. ifconfig eth0 up brings it back.
[PATCH] tlan: add static to function definitions
The forward declarations were already marked static, make the definitions be static as well. Fixes the sparse warnings as well.

drivers/net/tlan.c:1403:5: warning: symbol 'TLan_HandleInvalid' was not declared. Should it be static?
drivers/net/tlan.c:1435:5: warning: symbol 'TLan_HandleTxEOF' was not declared. Should it be static?
drivers/net/tlan.c:1521:5: warning: symbol 'TLan_HandleStatOverflow' was not declared. Should it be static?
drivers/net/tlan.c:1557:5: warning: symbol 'TLan_HandleRxEOF' was not declared. Should it be static?
drivers/net/tlan.c:1692:5: warning: symbol 'TLan_HandleDummy' was not declared. Should it be static?
drivers/net/tlan.c:1722:5: warning: symbol 'TLan_HandleTxEOC' was not declared. Should it be static?
drivers/net/tlan.c:1770:5: warning: symbol 'TLan_HandleStatusCheck' was not declared. Should it be static?
drivers/net/tlan.c:1845:5: warning: symbol 'TLan_HandleRxEOC' was not declared. Should it be static?
drivers/net/tlan.c:1905:6: warning: symbol 'TLan_Timer' was not declared. Should it be static?
drivers/net/tlan.c:1986:6: warning: symbol 'TLan_ResetLists' was not declared. Should it be static?
drivers/net/tlan.c:2046:6: warning: symbol 'TLan_FreeLists' was not declared. Should it be static?
drivers/net/tlan.c:2095:6: warning: symbol 'TLan_PrintDio' was not declared. Should it be static?
drivers/net/tlan.c:2130:6: warning: symbol 'TLan_PrintList' was not declared. Should it be static?
drivers/net/tlan.c:2166:6: warning: symbol 'TLan_ReadAndClearStats' was not declared. Should it be static?
drivers/net/tlan.c:2242:1: warning: symbol 'TLan_ResetAdapter' was not declared. Should it be static?
drivers/net/tlan.c:2328:1: warning: symbol 'TLan_FinishReset' was not declared. Should it be static?
drivers/net/tlan.c:2451:6: warning: symbol 'TLan_SetMac' was not declared. Should it be static?
drivers/net/tlan.c:2493:6: warning: symbol 'TLan_PhyPrint' was not declared. Should it be static?
drivers/net/tlan.c:2542:6: warning: symbol 'TLan_PhyDetect' was not declared. Should it be static?
drivers/net/tlan.c:2589:6: warning: symbol 'TLan_PhyPowerDown' was not declared. Should it be static?
drivers/net/tlan.c:2614:6: warning: symbol 'TLan_PhyPowerUp' was not declared. Should it be static?
drivers/net/tlan.c:2635:6: warning: symbol 'TLan_PhyReset' was not declared. Should it be static?
drivers/net/tlan.c:2663:6: warning: symbol 'TLan_PhyStartLink' was not declared. Should it be static?
drivers/net/tlan.c:2750:6: warning: symbol 'TLan_PhyFinishAutoNeg' was not declared. Should it be static?
drivers/net/tlan.c:2906:5: warning: symbol 'TLan_MiiReadReg' was not declared. Should it be static?
drivers/net/tlan.c:2996:6: warning: symbol 'TLan_MiiSendData' was not declared. Should it be static?
drivers/net/tlan.c:3038:6: warning: symbol 'TLan_MiiSync' was not declared. Should it be static?
drivers/net/tlan.c:3077:6: warning: symbol 'TLan_MiiWriteReg' was not declared. Should it be static?
drivers/net/tlan.c:3147:6: warning: symbol 'TLan_EeSendStart' was not declared. Should it be static?
drivers/net/tlan.c:3187:5: warning: symbol 'TLan_EeSendByte' was not declared. Should it be static?
drivers/net/tlan.c:3248:6: warning: symbol 'TLan_EeReceiveByte' was not declared. Should it be static?
drivers/net/tlan.c:3306:5: warning: symbol 'TLan_EeReadByte' was not declared. Should it be static?

Signed-off-by: Harvey Harrison [EMAIL PROTECTED]
---
Kept the style consistent with the rest of the file, checkpatch will complain.
 drivers/net/tlan.c | 64 ++--
 1 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/drivers/net/tlan.c b/drivers/net/tlan.c
index 3af5b92..0166407 100644
--- a/drivers/net/tlan.c
+++ b/drivers/net/tlan.c
@@ -1400,7 +1400,7 @@ static void TLan_SetMulticastList( struct net_device *dev )
  *
  **/

-u32 TLan_HandleInvalid( struct net_device *dev, u16 host_int )
+static u32 TLan_HandleInvalid( struct net_device *dev, u16 host_int )
 {
 	/* printk( "TLAN: Invalid interrupt on %s.\n", dev->name ); */
 	return 0;
@@ -1432,7 +1432,7 @@ u32 TLan_HandleInvalid( struct net_device *dev, u16 host_int )
  *
  **/

-u32 TLan_HandleTxEOF( struct net_device *dev, u16 host_int )
+static u32 TLan_HandleTxEOF( struct net_device *dev, u16 host_int )
 {
 	TLanPrivateInfo *priv = netdev_priv(dev);
 	int eoc = 0;
@@ -1518,7 +1518,7 @@ u32 TLan_HandleTxEOF( struct net_device *dev, u16 host_int )
  *
  **/

-u32 TLan_HandleStatOverflow( struct net_device *dev, u16 host_int )
+static u32 TLan_HandleStatOverflow( struct net_device *dev, u16 host_int )
 {
 	TLan_ReadAndClearStats( dev, TLAN_RECORD );

@@ -1554,7 +1554,7 @@ u32 TLan_HandleStatOverflow(
keyboard dead with 45b5035
The patch [RTNETLINK]: Send a single notification on device state changes. kills (at least) the keyboard here. Everything seems to work fine in single user mode, but when init starts spawning of logins, the keyboard goes bye-bye. Even the power button is ignored. :/ I've tried just creating another vt with chvt 2, but that is insufficient to trigger the bug. Rgds -- -- Pierre Ossman Linux kernel, MMC maintainerhttp://www.kernel.org PulseAudio, core developer http://pulseaudio.org rdesktop, core developer http://www.rdesktop.org -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] net/8021q/vlan_dev.c - Use print_mac
On Mon, 2008-02-18 at 16:19 +0100, Patrick McHardy wrote:

@@ -404,11 +405,8 @@ static int vlan_dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	pr_debug("%s: about to send skb: %p to dev: %s\n",
 		__FUNCTION__, skb, skb->dev->name);
-	pr_debug(" " MAC_FMT " " MAC_FMT " %4hx %4hx %4hx\n",
-		 veth->h_dest[0], veth->h_dest[1], veth->h_dest[2],
-		 veth->h_dest[3], veth->h_dest[4], veth->h_dest[5],
-		 veth->h_source[0], veth->h_source[1], veth->h_source[2],
-		 veth->h_source[3], veth->h_source[4], veth->h_source[5],
+	pr_debug(" %s %s %4hx %4hx %4hx\n",
+		 print_mac(mac, veth->h_dest), print_mac(mac2, veth->h_source),

This results in print_mac getting called twice per packet even without debugging. Whats the problem with MAC_FMT?

It's just a consistency thing. It identifies code where MAC addresses are used. an allyesconfig is a bit smaller (~.1%). pr_debug is a noop when not debugging, print_mac is optimized away.
Re: Linux 2.6.24.1 - kernel does not boot; IRQ trouble?
--- Stephen Hemminger [EMAIL PROTECTED] wrote: sysfs: duplicate filename 'bridge' can not be created WARNING: at fs/sysfs/dir.c:424 sysfs_add_one() Pid: 1, comm: swapper Not tainted 2.6.24.1 #1 [c0105020] show_trace_log_lvl+0x1a/0x2f [c0105990] show_trace+0x12/0x14 [c010613d] dump_stack+0x6c/0x72 [c01991bf] sysfs_add_one+0x57/0xbc [c0199e41] sysfs_create_link+0xc2/0x10d [c01bae9a] pci_bus_add_devices+0xbd/0x103 [c034016c] pci_legacy_init+0x56/0xe3 [c03274e1] kernel_init+0x157/0x2c3 [c0104c83] kernel_thread_helper+0x7/0x10 === pci :00:01.0: Error creating sysfs bridge symlink, continuing... sysfs: duplicate filename 'bridge' can not be created WARNING: at fs/sysfs/dir.c:424 sysfs_add_one() Pid: 1, comm: swapper Not tainted 2.6.24.1 #1 [c0105020] show_trace_log_lvl+0x1a/0x2f [c0105990] show_trace+0x12/0x14 [c010613d] dump_stack+0x6c/0x72 [c01991bf] sysfs_add_one+0x57/0xbc [c0199e41] sysfs_create_link+0xc2/0x10d [c01bae9a] pci_bus_add_devices+0xbd/0x103 [c01bae82] pci_bus_add_devices+0xa5/0x103 [c034016c] pci_legacy_init+0x56/0xe3 [c03274e1] kernel_init+0x157/0x2c3 [c0104c83] kernel_thread_helper+0x7/0x10 === I have a vague feeling that this was fixed, perhaps in 2.6.24.x? Never heard of this, what is the initialization script that causes this? Also do you have the SYSFS_DEPRECATED option configured? that caused issues with regular network drivers. Yes, SYSFS_DEPRECATED is enabled. And the init scripts are from Fedora 8. Cheers, Chris __ Sent from Yahoo! Mail - a smarter inbox http://uk.mail.yahoo.com -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: keyboard dead with 45b5035
On Monday, 18 of February 2008, Pierre Ossman wrote: The patch [RTNETLINK]: Send a single notification on device state changes. kills (at least) the keyboard here. Everything seems to work fine in single user mode, but when init starts spawning of logins, the keyboard goes bye-bye. Even the power button is ignored. :/ Please try with the patch from http://lkml.org/lkml/2008/2/18/331 . Thanks, Rafael -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Linux 2.6.24.1 - kernel does not boot; IRQ trouble?
On Mon, 18 Feb 2008 19:42:25 + (GMT) Chris Rankin [EMAIL PROTECTED] wrote: --- Stephen Hemminger [EMAIL PROTECTED] wrote: sysfs: duplicate filename 'bridge' can not be created WARNING: at fs/sysfs/dir.c:424 sysfs_add_one() Pid: 1, comm: swapper Not tainted 2.6.24.1 #1 [c0105020] show_trace_log_lvl+0x1a/0x2f [c0105990] show_trace+0x12/0x14 [c010613d] dump_stack+0x6c/0x72 [c01991bf] sysfs_add_one+0x57/0xbc [c0199e41] sysfs_create_link+0xc2/0x10d [c01bae9a] pci_bus_add_devices+0xbd/0x103 [c034016c] pci_legacy_init+0x56/0xe3 [c03274e1] kernel_init+0x157/0x2c3 [c0104c83] kernel_thread_helper+0x7/0x10 === pci :00:01.0: Error creating sysfs bridge symlink, continuing... sysfs: duplicate filename 'bridge' can not be created WARNING: at fs/sysfs/dir.c:424 sysfs_add_one() Pid: 1, comm: swapper Not tainted 2.6.24.1 #1 [c0105020] show_trace_log_lvl+0x1a/0x2f [c0105990] show_trace+0x12/0x14 [c010613d] dump_stack+0x6c/0x72 [c01991bf] sysfs_add_one+0x57/0xbc [c0199e41] sysfs_create_link+0xc2/0x10d [c01bae9a] pci_bus_add_devices+0xbd/0x103 [c01bae82] pci_bus_add_devices+0xa5/0x103 [c034016c] pci_legacy_init+0x56/0xe3 [c03274e1] kernel_init+0x157/0x2c3 [c0104c83] kernel_thread_helper+0x7/0x10 === I have a vague feeling that this was fixed, perhaps in 2.6.24.x? Never heard of this, what is the initialization script that causes this? Also do you have the SYSFS_DEPRECATED option configured? that caused issues with regular network drivers. Yes, SYSFS_DEPRECATED is enabled. And the init scripts are from Fedora 8. There was a bug (fixed in 2.6.24) that had to do with sysfs_create_link and SYSFS_DEPRECATED probably there is a similar problem with directories. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/1] sis190: read the mac address from the eeprom first
Reading a series of zeros from the CMOS SRAM area does not work well with is_valid_ether_addr(). Let's read the MAC address from the EEPROM first as it seems more reliable. Fix for http://bugzilla.kernel.org/show_bug.cgi?id=9831

Signed-off-by: Francois Romieu [EMAIL PROTECTED]
---
 drivers/net/sis190.c |   15 ++++++++++-----
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/net/sis190.c b/drivers/net/sis190.c
index 202fdf3..20745fd 100644
--- a/drivers/net/sis190.c
+++ b/drivers/net/sis190.c
@@ -1633,13 +1633,18 @@ static inline void sis190_init_rxfilter(struct net_device *dev)
 static int __devinit sis190_get_mac_addr(struct pci_dev *pdev,
 					  struct net_device *dev)
 {
-	u8 from;
+	int rc;
+
+	rc = sis190_get_mac_addr_from_eeprom(pdev, dev);
+	if (rc < 0) {
+		u8 reg;

-	pci_read_config_byte(pdev, 0x73, &from);
+		pci_read_config_byte(pdev, 0x73, &reg);

-	return (from & 0x0001) ?
-		sis190_get_mac_addr_from_apc(pdev, dev) :
-		sis190_get_mac_addr_from_eeprom(pdev, dev);
+		if (reg & 0x0001)
+			rc = sis190_get_mac_addr_from_apc(pdev, dev);
+	}
+	return rc;
 }

 static void sis190_set_speed_auto(struct net_device *dev)
--
1.5.3.3
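The control flow the patch introduces (try the EEPROM first, fall back to the APC/CMOS path only on failure and only if the config bit is set) can be sketched in user space as follows. The reader functions and `get_mac_addr` are hypothetical stand-ins, not the driver's functions; a negative return value is assumed to mean failure, mirroring the `rc < 0` test in the patch.

```c
#include <string.h>

/* A reader fills in a 6-byte MAC and returns 0, or a negative value on failure. */
typedef int (*mac_reader)(unsigned char mac[6]);

/* Hypothetical stubs standing in for the EEPROM and APC/CMOS read paths. */
static int eeprom_ok(unsigned char mac[6])
{
	static const unsigned char m[6] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55};

	memcpy(mac, m, 6);
	return 0;
}

static int eeprom_fail(unsigned char mac[6])
{
	(void)mac;
	return -1;
}

static int apc_ok(unsigned char mac[6])
{
	static const unsigned char m[6] = {0x00, 0xaa, 0xbb, 0xcc, 0xdd, 0xee};

	memcpy(mac, m, 6);
	return 0;
}

/*
 * Same ordering as the patched function: EEPROM first, then the APC path
 * only when the EEPROM read failed and bit 0 of the config byte is set.
 */
static int get_mac_addr(mac_reader eeprom, mac_reader apc,
			unsigned char cfg_reg, unsigned char mac[6])
{
	int rc = eeprom(mac);

	if (rc < 0) {
		if (cfg_reg & 0x01)
			rc = apc(mac);
	}
	return rc;
}
```

One consequence of this ordering is that when the EEPROM read fails and the config bit is clear, the error from the EEPROM path is what the caller sees.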
Re: keyboard dead with 45b5035
On Mon, 18 Feb 2008 20:50:01 +0100 Rafael J. Wysocki [EMAIL PROTECTED] wrote: On Monday, 18 of February 2008, Pierre Ossman wrote: The patch [RTNETLINK]: Send a single notification on device state changes. kills (at least) the keyboard here. Everything seems to work fine in single user mode, but when init starts spawning of logins, the keyboard goes bye-bye. Even the power button is ignored. :/ Please try with the patch from http://lkml.org/lkml/2008/2/18/331 . That solved it. I wonder if that's also why modprobe tends to wedge up with the new USB announce thingy... Tomorrow's debugging will tell. -- -- Pierre Ossman Linux kernel, MMC maintainerhttp://www.kernel.org PulseAudio, core developer http://pulseaudio.org rdesktop, core developer http://www.rdesktop.org -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] net/8021q/vlan_dev.c - Use print_mac
David Miller wrote: From: Patrick McHardy [EMAIL PROTECTED] Date: Mon, 18 Feb 2008 16:19:40 +0100 Joe Perches wrote: We specifically removed this sort of thing, please don't add it back. Why? We converted the entire tree over the print_mac(), and since the MAC_FMT stuff was therefore no longer used we could remove it. Some references slipped back in somehow, and thus MAC_FMT did too. There is no reason to keep around a global interface for _one_ user when that user can use the recommended interface just as equally as the rest of the tree which we converted. This is a pr_debug() statement we're talking about here. :-) The way pr_debug is implemented it still results in two function calls per packet since the compiler doesn't know that it doesn't have visible side-effects besides modifying the (unused) buffer. I confirmed this using codiff. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
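The point about side effects can be demonstrated outside the kernel. The sketch below assumes, as the message implies, that the disabled pr_debug() is a real (inline) function rather than an empty macro; in that case C still evaluates its arguments before the call. `fmt_mac`, `debug_fn`, `debug_macro` and the call counter are hypothetical names for illustration, not kernel interfaces.

```c
#include <stdio.h>

static int calls;	/* counts how often the formatter actually runs */

/* Stand-in for print_mac(): formats into buf and has a visible side effect. */
static const char *fmt_mac(char *buf, size_t len, const unsigned char *addr)
{
	calls++;
	snprintf(buf, len, "%02x:%02x:%02x:%02x:%02x:%02x",
		 addr[0], addr[1], addr[2], addr[3], addr[4], addr[5]);
	return buf;
}

/* No-op as a function: the arguments are still evaluated at the call site. */
static inline int debug_fn(const char *fmt, ...)
{
	(void)fmt;
	return 0;
}

/* No-op as a macro: the arguments disappear during preprocessing. */
#define debug_macro(fmt, ...) do { } while (0)

static const unsigned char src[6] = {0, 1, 2, 3, 4, 5};
static const unsigned char dst[6] = {6, 7, 8, 9, 10, 11};

static int calls_via_function(void)
{
	char a[18], b[18];

	calls = 0;
	debug_fn("%s %s\n", fmt_mac(a, sizeof(a), src),
		 fmt_mac(b, sizeof(b), dst));
	return calls;
}

static int calls_via_macro(void)
{
	char a[18], b[18];

	(void)a;
	(void)b;
	calls = 0;
	debug_macro("%s %s\n", fmt_mac(a, sizeof(a), src),
		    fmt_mac(b, sizeof(b), dst));
	return calls;
}
```

`calls_via_function()` reports two formatter calls and `calls_via_macro()` reports none. Passing raw bytes with a format macro such as MAC_FMT avoids the question entirely, since no helper is called in the argument list.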
Re: [PATCH] net/8021q/vlan_dev.c - Use print_mac
From: Patrick McHardy [EMAIL PROTECTED]
Date: Mon, 18 Feb 2008 16:19:40 +0100

Joe Perches wrote:

On Fri, 2008-02-15 at 02:58 -0800, David Miller wrote:

From: Bruno Randolf [EMAIL PROTECTED]
Date: Fri, 15 Feb 2008 19:48:05 +0900

is there any chance to include a macro like this for printing mac addresses? its advantage is that it can be used without the need to declare buffers for print_mac(), for example:

We specifically removed this sort of thing, please don't add it back.

Why?

We converted the entire tree over the print_mac(), and since the MAC_FMT stuff was therefore no longer used we could remove it. Some references slipped back in somehow, and thus MAC_FMT did too. There is no reason to keep around a global interface for _one_ user when that user can use the recommended interface just as equally as the rest of the tree which we converted.

This is a pr_debug() statement we're talking about here. :-)
Re: [PATCH] net/8021q/vlan_dev.c - Use print_mac
Joe Perches wrote:

On Mon, 2008-02-18 at 16:19 +0100, Patrick McHardy wrote:

@@ -404,11 +405,8 @@ static int vlan_dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	pr_debug("%s: about to send skb: %p to dev: %s\n",
 		 __FUNCTION__, skb, skb->dev->name);
-	pr_debug(" " MAC_FMT " " MAC_FMT " %4hx %4hx %4hx\n",
-		 veth->h_dest[0], veth->h_dest[1], veth->h_dest[2],
-		 veth->h_dest[3], veth->h_dest[4], veth->h_dest[5],
-		 veth->h_source[0], veth->h_source[1], veth->h_source[2],
-		 veth->h_source[3], veth->h_source[4], veth->h_source[5],
+	pr_debug(" %s %s %4hx %4hx %4hx\n",
+		 print_mac(mac, veth->h_dest), print_mac(mac2, veth->h_source),

This results in print_mac getting called twice per packet even without debugging. Whats the problem with MAC_FMT?

It's just a consistency thing. It identifies code where MAC addresses are used. an allyesconfig is a bit smaller (~.1%). pr_debug is a noop when not debugging, print_mac is optimized away.

No its not, which I also stated in the commit message that restored it.

0x60244313 <vlan_dev_hard_start_xmit+433>: callq  0x60161dbd <print_mac>
0x60244318 <vlan_dev_hard_start_xmit+438>: lea    -0x50(%rbp),%rdi
0x6024431c <vlan_dev_hard_start_xmit+442>: mov    %r15,%rsi
0x6024431f <vlan_dev_hard_start_xmit+445>: callq  0x60161dbd <print_mac>
Re: ipv6 debugging
Jorge Boncompte [DTI2] [EMAIL PROTECTED] writes:

Ferenc Wagner [EMAIL PROTECTED] writes:

I'm kindly asking for some debugging tips with the following problem: a machine is running Linux 2.6.24.2, with several 802.1q VLANs over active/backup bonding over two physical interfaces. Everything is all right, except that after a reboot there's no IPv6, while IPv4 works. The router's ARP(6) table is empty, and the machine doesn't answer ping6. However, if I start tcpdump -i bond0 ip6, everything is all right again. There are some indications that after some period without IPv6 traffic, the same can happen again. Are there known issues which can exhibit themselves like this? Other very similar setups don't show this erratic behaviour. I know that the above doesn't give a fully detailed picture, but thought that I'd better ask before taking the setup into pieces.

This sounds to me like the same problem that I was having with OSPF, I think ARP(6) uses multicast ethernet address too. Can you try if the patch below, that I sent Patrick McHardy some days ago, fixes your problem?

Hi Jorge,

Thank you very much! Your patch indeed fixes my problem. I hope the fix will make it into a stable release soon!
--
Regards,
Feri.
[PATCH] net: fix kernel-doc warnings in header files
From: Randy Dunlap [EMAIL PROTECTED] Add missing structure kernel-doc descriptions to sock.h skbuff.h to fix kernel-doc warnings. (I think that Stephen H. sent a similar patch, but I can't find it. I just want to kill the warnings, with either patch.) Signed-off-by: Randy Dunlap [EMAIL PROTECTED] --- include/linux/skbuff.h |2 ++ include/net/sock.h |1 + 2 files changed, 3 insertions(+) --- linux-2625-rc1g4-kdoc.orig/include/linux/skbuff.h +++ linux-2625-rc1g4-kdoc/include/linux/skbuff.h @@ -232,6 +232,8 @@ typedef unsigned char *sk_buff_data_t; * @mark: Generic packet mark * @nfct: Associated connection, if any * @ipvs_property: skbuff is owned by ipvs + * @peeked: this packet has been seen already, so stats have been + * done for it, don't do them again * @nf_trace: netfilter packet trace flag * @nfctinfo: Relationship of this skb to the connection * @nfct_reasm: netfilter conntrack re-assembly pointer --- linux-2625-rc1g4-kdoc.orig/include/net/sock.h +++ linux-2625-rc1g4-kdoc/include/net/sock.h @@ -180,6 +180,7 @@ struct sock_common { *@sk_sndmsg_off: cached offset for sendmsg *@sk_send_head: front of stuff to transmit *@sk_security: used by security modules + *@sk_mark: generic packet mark *@sk_write_pending: a write to stream socket waits to start *@sk_state_change: callback to indicate change in the state of the sock *@sk_data_ready: callback to indicate there is data to be processed -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH][IPROUTE] tc filters usage fixes
A few usage description fixes of tc filters for some minimal consistency (FILTER_KIND because of QDISC_KIND). Signed-off-by: Jarek Poplawski [EMAIL PROTECTED] --- tc/f_basic.c |4 ++-- tc/f_rsvp.c|2 +- tc/f_u32.c |2 +- tc/tc_filter.c |6 +++--- 4 files changed, 7 insertions(+), 7 deletions(-) diff --git a/tc/f_basic.c b/tc/f_basic.c index ad41633..cf2650b 100644 --- a/tc/f_basic.c +++ b/tc/f_basic.c @@ -30,8 +30,8 @@ static void explain(void) fprintf(stderr, Usage: ... basic [ match EMATCH_TREE ] [ police POLICE_SPEC ]\n); fprintf(stderr, [ action ACTION_SPEC ] [ classid CLASSID ]\n); fprintf(stderr, \n); - fprintf(stderr, Where: SELECTOR := SAMPLE SAMPLE ...\n); - fprintf(stderr,FILTERID := X:Y:Z\n); + fprintf(stderr, Where:\n); + fprintf(stderr,CLASSID := X:Y:Z\n); fprintf(stderr, \nNOTE: CLASSID is parsed as hexadecimal input.\n); } diff --git a/tc/f_rsvp.c b/tc/f_rsvp.c index 7e1e6d9..8f92e8f 100644 --- a/tc/f_rsvp.c +++ b/tc/f_rsvp.c @@ -33,7 +33,7 @@ static void explain(void) fprintf(stderr, Where: GPI := { flowlabel NUMBER | spi/ah SPI | spi/esp SPI |\n); fprintf(stderr, u{8|16|32} NUMBER mask MASK at OFFSET}\n); fprintf(stderr,POLICE_SPEC := ... 
look at TBF\n); - fprintf(stderr,FILTERID := X:Y\n); + fprintf(stderr,CLASSID := X:Y\n); fprintf(stderr, \nNOTE: CLASSID is parsed as hexadecimal input.\n); } diff --git a/tc/f_u32.c b/tc/f_u32.c index 9bc4bb5..957b1b1 100644 --- a/tc/f_u32.c +++ b/tc/f_u32.c @@ -38,7 +38,7 @@ static void explain(void) fprintf(stderr, Where: SELECTOR := SAMPLE SAMPLE ...\n); fprintf(stderr,SAMPLE := { ip | ip6 | udp | tcp | icmp | u{32|16|8} | mark } SAMPLE_ARGS [divisor DIVISOR]\n); - fprintf(stderr,FILTERID := X:Y:Z\n); + fprintf(stderr,CLASSID := X:Y:Z\n); fprintf(stderr, \nNOTE: CLASSID is parsed at hexadecimal input.\n); } diff --git a/tc/tc_filter.c b/tc/tc_filter.c index d70c656..eb74f89 100644 --- a/tc/tc_filter.c +++ b/tc/tc_filter.c @@ -33,12 +33,12 @@ static void usage(void) fprintf(stderr, Usage: tc filter [ add | del | change | replace | show ] dev STRING\n); fprintf(stderr,[ pref PRIO ] protocol PROTO\n); fprintf(stderr,[ estimator INTERVAL TIME_CONSTANT ]\n); - fprintf(stderr,[ root | classid CLASSID ] [ handle FILTERID ]\n); - fprintf(stderr,[ [ FILTER_TYPE ] [ help | OPTIONS ] ]\n); + fprintf(stderr,[ root | parent CLASSID ] [ handle FILTERID ]\n); + fprintf(stderr,[ [ FILTER_KIND ] [ help | OPTIONS ] ]\n); fprintf(stderr, \n); fprintf(stderr,tc filter show [ dev STRING ] [ root | parent CLASSID ]\n); fprintf(stderr, Where:\n); - fprintf(stderr, FILTER_TYPE := { rsvp | u32 | fw | route | etc. }\n); + fprintf(stderr, FILTER_KIND := { rsvp | u32 | fw | route | etc. }\n); fprintf(stderr, FILTERID := ... format depends on classifier, see there\n); fprintf(stderr, OPTIONS := ... try tc filter add desired FILTER_KIND help\n); return; -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] Support arbitrary initial TCP timestamps
Adding yet another member to the already bloated tcp_sock structure to implement this is too high a cost.

Yes, I was worried that would be deemed too high of a cost, but it was the most efficient way I could think of to accomplish what I wanted.

I would instead prefer that there be some global random number calculated when the first TCP socket is created, and use that as a global offset. You can even recompute it every few hours if you like.

That would work fine if my main purpose was to randomize the tcp timestamp to mitigate the leak of information regarding uptime, but despite the brief description, that's only a side effect of what I intended to do. What I wanted was a way to be able to choose an initial tcp timestamp for a particular connection that was not tied directly to jiffies.

The two patches following this show my intended use case. I intend to enhance syncookie support to allow it to support advanced tcp options (sack and window scaling). Normally syncookies encode the bare minimum state of a connection in the ISN they choose, but the 32bit ISN isn't enough to encode advanced tcp options, so you are left with a working but crippled tcp stack during a synflood attack. If in addition to choosing an ISN we are able to choose an initial tcp timestamp, we are then able to encode an additional 32 bits of information that can contain more of the advanced tcp options.

This stems from a discussion about implementing IPv6 support for syncookies, and the main concern being that syncookies disabled too many valuable tcp features to be relevant on modern systems. Many people stood in opposition to that statement, but it did not seem as though a general consensus was reached.

http://lkml.org/lkml/2008/2/4/396

I'm always open to alternatives.

--Glenn
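To make the idea of carrying TCP options in the initial timestamp concrete, here is a minimal user-space sketch. It is not the code from the patches: the bit layout (window scale in the low 4 bits, a SACK flag in bit 4) and the names tsval_encode(), tsecr_decode() and struct cookie_opts are assumptions made purely for illustration. The only property taken from the discussion is that the client echoes the timestamp value back, so the server can recover state it never stored.

```c
#include <stdint.h>

/* Hypothetical bit layout, chosen for illustration only. */
#define TSOPT_WSCALE_MASK 0x0fu	/* low 4 bits: window scale shift */
#define TSOPT_SACK_BIT    0x10u	/* bit 4: SACK permitted */
#define TSOPT_ALL_BITS    (TSOPT_WSCALE_MASK | TSOPT_SACK_BIT)

struct cookie_opts {
	unsigned int wscale;	/* window scale shift */
	unsigned int sack_ok;	/* 1 if SACK was offered */
};

/* Build the timestamp sent in the SYN-ACK: keep the high bits of the
 * base clock value and replace the low bits with the encoded options. */
static uint32_t tsval_encode(uint32_t base, unsigned int wscale,
			     unsigned int sack_ok)
{
	uint32_t bits = (wscale & TSOPT_WSCALE_MASK) |
			(sack_ok ? TSOPT_SACK_BIT : 0u);

	return (base & ~TSOPT_ALL_BITS) | bits;
}

/* Recover the options from the timestamp echoed back in the ACK. */
static struct cookie_opts tsecr_decode(uint32_t tsecr)
{
	struct cookie_opts o;

	o.wscale = tsecr & TSOPT_WSCALE_MASK;
	o.sack_ok = (tsecr & TSOPT_SACK_BIT) ? 1u : 0u;
	return o;
}
```

Overwriting the low bits costs a little timestamp resolution on that one segment, which is one reason a per-connection timestamp offset is needed rather than a value tied directly to jiffies.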
TG3 network data corruption regression 2.6.24/2.6.23.4
I am experiencing network data corruption with a 3Com 3C996B-T NIC (Broadcom NetXtreme BCM5701; driver tg3.ko). I have identified the following patch as the trigger: commit fb93134dfc2a6e6fbedc7c270a31da03fce88db9 Author: Herbert Xu [EMAIL PROTECTED] Date: Wed Nov 14 15:45:21 2007 -0800 [TCP]: Fix size calculation in sk_stream_alloc_pskb We round up the header size in sk_stream_alloc_pskb so that TSO packets get zero tail room. Unfortunately this rounding up is not coordinated with the select_size() function used by TCP to calculate the second parameter of sk_stream_alloc_pskb. As a result, we may allocate more than a page of data in the non-TSO case when exactly one page is desired. In fact, rounding up the head room is detrimental in the non-TSO case because it makes memory that would otherwise be available to the payload head room. TSO doesn't need this either, all it wants is the guarantee that there is no tail room. So this patch fixes this by adjusting the skb_reserve call so that exactly the requested amount (which all callers have calculated in a precise way) is made available as tail room. Signed-off-by: Herbert Xu [EMAIL PROTECTED] Signed-off-by: David S. Miller [EMAIL PROTECTED] This patch was included in 2.6.24 and 2.6.23.4 -stable. I am experiencing data corruption with kernels 2.6.23.4 - 2.6.23.16, 2.6.24 - 2.6.24.2, and 2.6.25-rc2-git1. I have verified that reverting the above patch (by hand) makes the data corruption go away on all affected kernels (note that in 2.6.25 the function is sk_stream_alloc_skb() in net/ipv4/tcp.c rather than sk_stream_alloc_pskb() in include/net/sock.h). (Also note that when testing 2.6.23 - 2.6.23.4, I had to apply the individual patch TG3: Fix performance regression on 5705. from 2.6.23.5.) I do not get data corruption when substituting a SysKonnect 9D21 NIC (which also uses the tg3.ko driver) or a Intel PRO/1000 82546GB NIC (which uses the e1000.ko driver). 
In addition to the 3Com NIC, my computer has a SCSI HBA with an attached tape drive. The network data corruption happens only when reading from or writing to the tape drive. I have tried both a LSI MPT Fusion Ultra320 SCSI HBA (mptspi.ko) and a LSI 53c1010 Ultra160 HBA (sym53c8xx.ko) with the same results. The NIC and SCSI HBA are on separate PCI-X buses and do not share IRQs. I am using two completely separate test programs to access the SCSI tape drive and test network data integrity, so one would expect no interaction between the two tests other than CPU scheduling and DMA bandwidth. There is no disk I/O generated by either test program.

The test program that I am using to debug this problem does the following:

Computer A (kernel 2.6.24.2; 3Com 3C996B-T NIC):
   malloc a 64 KB buf aligned to a 4 KB boundary
   loop {
      fill 64 KB buf with count data pattern
      send(64 KB, MSG_MORE)   <--- eventually sends corrupted data
   }
   (SCSI tape drive test program runs separately in the background)

Computer B (kernel 2.6.12):
   malloc a 64 KB buf aligned to a 4 KB boundary
   loop {
      recv(64 KB, MSG_WAITALL)
      verify count data pattern in 64 KB buf
   }

After running for a few seconds, the verify on computer B detects data corruption in the last 4 bytes of the 64 KB buffer. The last 48 bytes of the corrupted 64 KB buffer look like this:

D0 D1 D2 D3 | D4 D5 D6 D7 | D8 D9 DA DB | DC DD DE DF
E0 E1 E2 E3 | E4 E5 E6 E7 | E8 E9 EA EB | EC ED EE EF
F0 F1 F2 F3 | F4 F5 F6 F7 | F8 F9 FA FB | F4 F5 F6 F7

The last 4 bytes should be FC FD FE FF but instead are corrupted to F4 F5 F6 F7, a sequence which came earlier in the data stream. The data corruption always occurs at this same buffer offset and with the same 4 earlier bytes duplicated. However, it occurs on a different iteration of the send()/recv() loop each time the test is run.

When I reverse the test so that Computer A does recv() and Computer B does send(), the test passes with no data corruption.
Therefore, it appears that the data corruption happens on send() but not recv(). The motherboard that I am using is a Commell LV-672. This motherboard has a PCI-express x16 slot but no PCI-X slots. To plug in the PCI-X NIC and SCSI HBA, I am using a SuperMicro CSE-RR2UE-AX riser card which plugs into the PCI-express slot on the motherboard and provides 3 PCI-X slots (two slots together on one PCI-X bus and one slot on its own PCI-X bus). The data corruption happens with every combination of the 2 cards in the 3 slots. I assume that the above patch is just exposing some way in which the tg3 driver or the BCM5701 chip are broken. For now, I am just reverting the above patch for kernels that I use until a better solution is forthcoming. I expect that this problem will be difficult for other developers to reproduce, but I can test any patches that anyone wants to send me. [ In the meantime, should we revert the patch for 2.6.23.x and
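For anyone who wants to build a similar checker, the fill and verify steps of the test loop described above might look roughly like the sketch below. The exact "count data pattern" used by the reporter's program is not shown in the message; the assumption here, which is consistent with the hex dump, is that byte i of the buffer holds i modulo 256. All function names are hypothetical, and the last helper only simulates the reported symptom in memory instead of going through send()/recv().

```c
#include <stddef.h>

#define TEST_BUF_SIZE (64 * 1024)

/* Assumed pattern: each byte holds its own offset modulo 256. */
static void fill_count_pattern(unsigned char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		buf[i] = (unsigned char)(i & 0xff);
}

/* Returns the offset of the first byte that breaks the pattern,
 * or len when the whole buffer is intact. */
static size_t first_mismatch(const unsigned char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (buf[i] != (unsigned char)(i & 0xff))
			return i;
	return len;
}

/* Simulates the reported symptom: the last 4 bytes are replaced by
 * the 4 bytes that appear 8 bytes earlier (F4..F7 instead of FC..FF). */
static size_t simulate_reported_corruption(void)
{
	static unsigned char buf[TEST_BUF_SIZE];
	size_t i;

	fill_count_pattern(buf, sizeof(buf));
	for (i = 0; i < 4; i++)
		buf[sizeof(buf) - 4 + i] = buf[sizeof(buf) - 12 + i];
	return first_mismatch(buf, sizeof(buf));
}
```

Reporting the offset of the first mismatch, and not just pass or fail, is what makes it possible to notice that the corruption always lands in the final 4 bytes of the buffer.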
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
On Mon, Feb 18, 2008 at 10:09:24PM +, James Chapman wrote:

Jarek Poplawski wrote:

Hi, It seems this nice report is still incomplete: could you check if there could have been something more yet?

Unfortunately the ISP's syslog stops. But I've been able to borrow two Quad Xeon boxes and have reproduced the problem. Here's a new version of the patch. The patch avoids disabling irqs and fixes the sk_dst_get() usage that DaveM mentioned. But even with this patch, lockdep still complains if hundreds of ppp sessions are inserted into a tunnel as rapidly as possible (lockdep trace is below). I can stop these errors by wrapping the call to ppp_input() in pppol2tp_recv_dequeue_skb() with local_irq_save/restore. What is a better fix?

Hmm... This is a really long report and quite a bit different from the previous one. I need some time for this.

BTW: you sent before a lockdep report with hlist_lock problem. I think this could be fixed in some independent patch to make this all more readable. Are all the other changes in this current patch only because of this or previous lockdep report or for some other reasons (or reports) yet?

Regards,
Jarek P.
Re: [PATCH] Add IPv6 support to TCP SYN cookies
I've posted a series of patches that I believe address Andi's concerns about syncookies not supporting valuable tcp options (primarily SACK and window scaling). The premise is that if the client supports tcp timestamps, we can encode the additional tcp options in the initial timestamp we send back to the client, and they will be echoed back to us in the ack.

Anyone interested, have a look and provide any suggestions you may have. The new patches are a superset of this patch, so if they are accepted this one is obsolete.

Support arbitrary initial TCP timestamps
http://lkml.org/lkml/2008/2/15/244

Enable the use of TCP options with syncookies
http://lkml.org/lkml/2008/2/15/245

Add IPv6 Support to TCP SYN cookies
http://lkml.org/lkml/2008/2/15/246

--Glenn
Re: [PATCH][IPROUTE] tc filters usage fixes
Jarek Poplawski wrote, On 02/18/2008 11:10 PM: A few usage description fixes of tc filters for some minimal consistency (FILTER_KIND because of QDISC_KIND). Signed-off-by: Jarek Poplawski [EMAIL PROTECTED] Don't apply: I've sent 2nd version of this patch. Sorry, Jarek P.
[PATCH v2.6.25] gianfar: don't pass NULL dev ptr to DMA ops
From: Becky Bruce [EMAIL PROTECTED]

Change all dma op invocations in gianfar.c to actually pass in the device pointer. Currently, the value is ignored, but it will be used going forward as we implement archdata for 32-bit powerpc.

Signed-off-by: Becky Bruce [EMAIL PROTECTED]
Acked-by: Andy Fleming [EMAIL PROTECTED]
---
 drivers/net/gianfar.c |   14 +++---
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 0431e9e..9a5160b 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -605,7 +605,7 @@ void stop_gfar(struct net_device *dev)
 	free_skb_resources(priv);
-	dma_free_coherent(NULL,
+	dma_free_coherent(&dev->dev,
 			sizeof(struct txbd8)*priv->tx_ring_size
 			+ sizeof(struct rxbd8)*priv->rx_ring_size,
 			priv->tx_bd_base,
@@ -626,7 +626,7 @@ static void free_skb_resources(struct gfar_private *priv)
 	for (i = 0; i < priv->tx_ring_size; i++) {
 		if (priv->tx_skbuff[i]) {
-			dma_unmap_single(NULL, txbdp->bufPtr,
+			dma_unmap_single(&priv->dev->dev, txbdp->bufPtr,
 					txbdp->length,
 					DMA_TO_DEVICE);
 			dev_kfree_skb_any(priv->tx_skbuff[i]);
@@ -643,7 +643,7 @@ static void free_skb_resources(struct gfar_private *priv)
 	if(priv->rx_skbuff != NULL) {
 		for (i = 0; i < priv->rx_ring_size; i++) {
 			if (priv->rx_skbuff[i]) {
-				dma_unmap_single(NULL, rxbdp->bufPtr,
+				dma_unmap_single(&priv->dev->dev, rxbdp->bufPtr,
 						priv->rx_buffer_size,
 						DMA_FROM_DEVICE);
@@ -708,7 +708,7 @@ int startup_gfar(struct net_device *dev)
 	gfar_write(&regs->imask, IMASK_INIT_CLEAR);
 	/* Allocate memory for the buffer descriptors */
-	vaddr = (unsigned long) dma_alloc_coherent(NULL,
+	vaddr = (unsigned long) dma_alloc_coherent(&dev->dev,
 			sizeof (struct txbd8) * priv->tx_ring_size +
 			sizeof (struct rxbd8) * priv->rx_ring_size,
 			&addr, GFP_KERNEL);
@@ -919,7 +919,7 @@ err_irq_fail:
 rx_skb_fail:
 	free_skb_resources(priv);
 tx_skb_fail:
-	dma_free_coherent(NULL,
+	dma_free_coherent(&dev->dev,
 			sizeof(struct txbd8)*priv->tx_ring_size
 			+ sizeof(struct rxbd8)*priv->rx_ring_size,
 			priv->tx_bd_base,
@@ -1053,7 +1053,7 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	/* Set buffer length and pointer */
 	txbdp->length = skb->len;
-	txbdp->bufPtr = dma_map_single(NULL, skb->data,
+	txbdp->bufPtr = dma_map_single(&dev->dev, skb->data,
 			skb->len, DMA_TO_DEVICE);
 	/* Save the skb pointer so we can free it later */
@@ -1332,7 +1332,7 @@ struct sk_buff * gfar_new_skb(struct net_device *dev, struct rxbd8 *bdp)
 	 */
 	skb_reserve(skb, alignamount);
-	bdp->bufPtr = dma_map_single(NULL, skb->data,
+	bdp->bufPtr = dma_map_single(&dev->dev, skb->data,
 			priv->rx_buffer_size, DMA_FROM_DEVICE);
 	bdp->length = 0;
--
1.5.4.23.gef5b9
[PATCH v2][IPROUTE] tc filters usage fixes
CLASSID := X:Y:Z == X:Y in f_basic is changed here and no change for f_u32 (it has both CLASSID and FILTERID mentioned). - (take 2) A few usage description fixes of tc filters for some minimal consistency (FILTER_KIND because of QDISC_KIND). Signed-off-by: Jarek Poplawski [EMAIL PROTECTED] --- tc/f_basic.c |4 ++-- tc/f_rsvp.c|2 +- tc/tc_filter.c |6 +++--- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/tc/f_basic.c b/tc/f_basic.c index ad41633..d8a42d9 100644 --- a/tc/f_basic.c +++ b/tc/f_basic.c @@ -30,8 +30,8 @@ static void explain(void) fprintf(stderr, Usage: ... basic [ match EMATCH_TREE ] [ police POLICE_SPEC ]\n); fprintf(stderr, [ action ACTION_SPEC ] [ classid CLASSID ]\n); fprintf(stderr, \n); - fprintf(stderr, Where: SELECTOR := SAMPLE SAMPLE ...\n); - fprintf(stderr,FILTERID := X:Y:Z\n); + fprintf(stderr, Where:\n); + fprintf(stderr,CLASSID := X:Y\n); fprintf(stderr, \nNOTE: CLASSID is parsed as hexadecimal input.\n); } diff --git a/tc/f_rsvp.c b/tc/f_rsvp.c index 7e1e6d9..8f92e8f 100644 --- a/tc/f_rsvp.c +++ b/tc/f_rsvp.c @@ -33,7 +33,7 @@ static void explain(void) fprintf(stderr, Where: GPI := { flowlabel NUMBER | spi/ah SPI | spi/esp SPI |\n); fprintf(stderr, u{8|16|32} NUMBER mask MASK at OFFSET}\n); fprintf(stderr,POLICE_SPEC := ... 
look at TBF\n); - fprintf(stderr,FILTERID := X:Y\n); + fprintf(stderr,CLASSID := X:Y\n); fprintf(stderr, \nNOTE: CLASSID is parsed as hexadecimal input.\n); } diff --git a/tc/tc_filter.c b/tc/tc_filter.c index d70c656..eb74f89 100644 --- a/tc/tc_filter.c +++ b/tc/tc_filter.c @@ -33,12 +33,12 @@ static void usage(void) fprintf(stderr, Usage: tc filter [ add | del | change | replace | show ] dev STRING\n); fprintf(stderr,[ pref PRIO ] protocol PROTO\n); fprintf(stderr,[ estimator INTERVAL TIME_CONSTANT ]\n); - fprintf(stderr,[ root | classid CLASSID ] [ handle FILTERID ]\n); - fprintf(stderr,[ [ FILTER_TYPE ] [ help | OPTIONS ] ]\n); + fprintf(stderr,[ root | parent CLASSID ] [ handle FILTERID ]\n); + fprintf(stderr,[ [ FILTER_KIND ] [ help | OPTIONS ] ]\n); fprintf(stderr, \n); fprintf(stderr,tc filter show [ dev STRING ] [ root | parent CLASSID ]\n); fprintf(stderr, Where:\n); - fprintf(stderr, FILTER_TYPE := { rsvp | u32 | fw | route | etc. }\n); + fprintf(stderr, FILTER_KIND := { rsvp | u32 | fw | route | etc. }\n); fprintf(stderr, FILTERID := ... format depends on classifier, see there\n); fprintf(stderr, OPTIONS := ... try tc filter add desired FILTER_KIND help\n); return; -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TG3 network data corruption regression 2.6.24/2.6.23.4
From: Michael Chan [EMAIL PROTECTED]
Date: Mon, 18 Feb 2008 16:32:00 -0800

On Mon, 2008-02-18 at 17:41 -0500, Tony Battersby wrote:

I am experiencing network data corruption with a 3Com 3C996B-T NIC (Broadcom NetXtreme BCM5701; driver tg3.ko). I have identified the following patch as the trigger:

Assuming this problem is unique to the 5701, I'm not sure how it is exposed by Herbert's patch. One thing unique on the 5701 is that it double-copies all RX packets so that the data starts at offset 2, but that's quite unrelated to the patch below.

One consequence of Herbert's change is that the chip will see a different datastream. The initial skb->data linear area will be smaller, and the transition to the fragmented area of pages will be quicker.
Re: [PATCH] net/8021q/vlan_dev.c - Use print_mac
From: David Miller [EMAIL PROTECTED]
Date: Mon, 18 Feb 2008 16:43:05 -0800 (PST)

I think we can fix this easily by using __attribute_const__ on the print_mac() declaration. Let me play with that.

Actually it seems the 'pure' attribute is more important here. Although it's not semantically a perfect match, what we need to tell the compiler is basically that:

1) the return value depends upon the inputs
2) if the input is not used, it's safe to avoid the call

and 'pure' accomplishes that without any unwanted side-effects. I think this will not result in any unwanted over-optimization. Because if the inputs change in any way GCC has to emit the call.

Any objections?

commit 8f789c48448aed74fe1c07af76de8f04adacec7d
Author: David S. Miller [EMAIL PROTECTED]
Date: Mon Feb 18 16:50:22 2008 -0800

    [NET]: Elminate spurious print_mac() calls.

    Patrick McHardy notes that print_mac() can get invoked even if the result it unused (f.e. as an argument to pr_debug() when DEBUG is not defined). Mark this function as __pure to eliminate this problem.

    Signed-off-by: David S. Miller [EMAIL PROTECTED]

diff --git a/include/linux/if_ether.h b/include/linux/if_ether.h
index 7a1e011..42dc6a3 100644
--- a/include/linux/if_ether.h
+++ b/include/linux/if_ether.h
@@ -129,7 +129,7 @@ extern ssize_t sysfs_format_mac(char *buf, const unsigned char *addr, int len);
 /*
  * Display a 6 byte device address (MAC) in a readable format.
  */
-extern char *print_mac(char *buf, const unsigned char *addr);
+extern __pure char *print_mac(char *buf, const unsigned char *addr);
 #define MAC_BUF_SIZE 18
 #define DECLARE_MAC_BUF(var) char var[MAC_BUF_SIZE] __maybe_unused
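As a small user-space illustration of what the 'pure' attribute tells GCC, consider the sketch below. It is not kernel code: mac_byte_sum(), no_debug() and checked_sum() are hypothetical names, and the helper was deliberately chosen so that it only reads its argument, which makes 'pure' an exact fit (print_mac() also writes into its buffer, which is presumably why the attribute is described above as not a perfect semantic match). The stub imitates a compiled-out debug call implemented as an empty function with printf format checking.

```c
#include <stddef.h>

/* Hypothetical example address used by the sketch. */
static const unsigned char sample_mac[6] = {
	0x00, 0x11, 0x22, 0x33, 0x44, 0x55
};

/* Only reads its argument and returns a value, so 'pure' is accurate:
 * a call whose result is unused may be removed by the compiler. */
static __attribute__((pure)) unsigned int
mac_byte_sum(const unsigned char *addr)
{
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < 6; i++)
		sum += addr[i];
	return sum;
}

/* Stub in the style of a disabled debug print: the caller still
 * evaluates the arguments, the stub then ignores them. */
static inline __attribute__((format(printf, 1, 2))) int
no_debug(const char *fmt, ...)
{
	(void)fmt;
	return 0;
}

static unsigned int checked_sum(const unsigned char *addr)
{
	/* The result only feeds the stub, so with 'pure' the compiler
	 * is permitted to drop this call entirely. */
	no_debug("sum=%u\n", mac_byte_sum(addr));

	/* Here the result is used, so the call has to stay. */
	return mac_byte_sum(addr);
}
```

Whether a given call really disappears depends on the compiler and optimization level; comparing the generated code with and without the attribute, as was done with codiff earlier in the thread, is the way to confirm it.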
Re: TG3 network data corruption regression 2.6.24/2.6.23.4
On Mon, 2008-02-18 at 16:35 -0800, David Miller wrote:

One consequence of Herbert's change is that the chip will see a different datastream. The initial skb->data linear area will be smaller, and the transition to the fragmented area of pages will be quicker.

I see. Perhaps when we get to the end of the data-stream, there is a tiny frag that the chip cannot handle. That's the only thing I can think of.

Please try this patch to see if the problem goes away. This will disable SG on 5701 so we always get linear SKBs.

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index db606b6..bb37e76 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -12717,6 +12717,9 @@ static int __devinit tg3_init_one(struct pci_dev *pdev,
 	} else
 		tp->tg3_flags &= ~TG3_FLAG_RX_CHECKSUMS;
 
+	if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5701)
+		dev->features &= ~(NETIF_F_IP_CSUM | NETIF_F_SG);
+
 	/* flow control autonegotiation is default behavior */
 	tp->tg3_flags |= TG3_FLAG_PAUSE_AUTONEG;
 	tp->link_config.flowctrl = TG3_FLOW_CTRL_TX | TG3_FLOW_CTRL_RX;
Re: [PATCH] net/8021q/vlan_dev.c - Use print_mac
On Mon, 2008-02-18 at 16:50 -0800, David Miller wrote:

Actually it seems the 'pure' attribute is more important here. Although it's not semantically a perfect match, what we need to tell the compiler is basically that:

1) the return value depends upon the inputs
2) if the input is not used, it's safe to avoid the call

and 'pure' accomplishes that without any unwanted side-effects. I think this will not result in any unwanted over-optimization. Because if the inputs change in any way GCC has to emit the call.

Any objections?

Does this need to be done for all function calls declared with __attribute__((format(printf, x, y))) { return 0; } ie: pr_debug, dev_dbg, dev_vdbg?

Perhaps it's more sensible to go back to

#ifdef DEBUG
#define pr_debug(fmt, arg...) do {} while (0)
#endif

and give up the printf argument verification
Re: TG3 network data corruption regression 2.6.24/2.6.23.4
On Mon, 2008-02-18 at 17:41 -0500, Tony Battersby wrote:

I am experiencing network data corruption with a 3Com 3C996B-T NIC (Broadcom NetXtreme BCM5701; driver tg3.ko). I have identified the following patch as the trigger:

Assuming this problem is unique to the 5701, I'm not sure how it is exposed by Herbert's patch. One thing unique on the 5701 is that it double-copies all RX packets so that the data starts at offset 2, but that's quite unrelated to the patch below.

commit fb93134dfc2a6e6fbedc7c270a31da03fce88db9
Author: Herbert Xu [EMAIL PROTECTED]
Date: Wed Nov 14 15:45:21 2007 -0800

    [TCP]: Fix size calculation in sk_stream_alloc_pskb

I do not get data corruption when substituting a SysKonnect 9D21 NIC (which also uses the tg3.ko driver)

What Broadcom chip is on the Syskonnect card?
Re: [PATCH] net/8021q/vlan_dev.c - Use print_mac
From: Patrick McHardy [EMAIL PROTECTED]
Date: Mon, 18 Feb 2008 22:17:27 +0100

The way pr_debug is implemented it still results in two function calls per packet since the compiler doesn't know that it doesn't have visible side-effects besides modifying the (unused) buffer. I confirmed this using codiff.

That's a bug. I think we can fix this easily by using __attribute_const__ on the print_mac() declaration. Let me play with that.
Re: [PATCH] net/8021q/vlan_dev.c - Use print_mac
Joe Perches wrote:

Perhaps it's more sensible to go back to

#ifdef DEBUG
#define pr_debug(fmt, arg...) do {} while (0)
#endif

and give up the printf argument verification

I think argument verification is important. Can you keep it like this:

#ifdef DEBUG
#define pr_debug(fmt, arg...) \
	do { \
		if (0) \
			printk(KERN_DEBUG fmt, ##arg); \
	} while (0)
#endif

We still lose the return value though, I'm not sure how to keep that safely in macros. But does anything rely on the side effects already? This would introduce bugs if so.
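The "if (0)" idea above can be tried outside the kernel with a short sketch. Everything here is a stand-in: my_debug(), MY_DEBUG, format_id() and count_format_calls() are hypothetical names, and fprintf() replaces printk(). The sketch puts the compiled-out variant in the #else branch, which is an assumption about the intended arrangement. It shows the two properties being discussed: the format string is still checked against its arguments, and the argument expressions are never evaluated when debugging is off.

```c
#include <stdio.h>

#ifdef MY_DEBUG
#define my_debug(fmt, ...) fprintf(stderr, fmt, ##__VA_ARGS__)
#else
/* Dead branch: the compiler still type-checks fmt against the
 * arguments, but nothing inside if (0) is ever executed. */
#define my_debug(fmt, ...) \
	do { \
		if (0) \
			fprintf(stderr, fmt, ##__VA_ARGS__); \
	} while (0)
#endif

static int format_calls;

/* Stand-in for a helper like print_mac(): has a visible side effect
 * (the counter) so that skipped calls can be observed. */
static const char *format_id(int id)
{
	static char buf[16];

	format_calls++;
	snprintf(buf, sizeof(buf), "#%d", id);
	return buf;
}

/* Returns how many times format_id() actually ran. */
static int count_format_calls(int packets)
{
	int i;

	format_calls = 0;
	for (i = 0; i < packets; i++)
		my_debug("sending %s\n", format_id(i));
	return format_calls;
}
```

With MY_DEBUG undefined, count_format_calls() reports that format_id() never ran, which also illustrates the concern raised above: any caller that relied on a side effect inside the argument list would silently change behaviour.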
Re: tbench regression in 2.6.25-rc1
On Mon, 2008-02-18 at 11:11 +0100, Eric Dumazet wrote: On Mon, 18 Feb 2008 16:12:38 +0800 Zhang, Yanmin [EMAIL PROTECTED] wrote: On Fri, 2008-02-15 at 15:22 -0800, David Miller wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Fri, 15 Feb 2008 15:21:48 +0100 On linux-2.6.25-rc1 x86_64 : offsetof(struct dst_entry, lastuse)=0xb0 offsetof(struct dst_entry, __refcnt)=0xb8 offsetof(struct dst_entry, __use)=0xbc offsetof(struct dst_entry, next)=0xc0 So it should be optimal... I dont know why tbench prefers __refcnt being on 0xc0, since in this case lastuse will be on a different cache line... Each incoming IP packet will need to change lastuse, __refcnt and __use, so keeping them in the same cache line is a win. I suspect then that even this patch could help tbench, since it avoids writing lastuse... I think your suspicions are right, and even moreso it helps to keep __refcnt out of the same cache line as input/output/ops which are read-almost-entirely :- I think you are right. The issue is these three variables sharing the same cache line with input/output/ops. ) I haven't done an exhaustive analysis, but it seems that the write traffic to lastuse and __refcnt are about the same. However if we find that __refcnt gets hit more than lastuse in this workload, it explains the regression. I also think __refcnt is the key. I did a new testing by adding 2 unsigned long pading before lastuse, so the 3 members are moved to next cache line. The performance is recovered. How about below patch? Almost all performance is recovered with the new patch. 
Signed-off-by: Zhang Yanmin [EMAIL PROTECTED] --- --- linux-2.6.25-rc1/include/net/dst.h 2008-02-21 14:33:43.0 +0800 +++ linux-2.6.25-rc1_work/include/net/dst.h 2008-02-21 14:36:22.0 +0800 @@ -52,11 +52,10 @@ struct dst_entry unsigned short header_len; /* more space at head required */ unsigned short trailer_len;/* space to reserve at tail */ - u32 metrics[RTAX_MAX]; - struct dst_entry*path; - - unsigned long rate_last; /* rate limiting for ICMP */ unsigned intrate_tokens; + unsigned long rate_last; /* rate limiting for ICMP */ + + struct dst_entry*path; #ifdef CONFIG_NET_CLS_ROUTE __u32 tclassid; @@ -70,10 +69,12 @@ struct dst_entry int (*output)(struct sk_buff*); struct dst_ops *ops; - - unsigned long lastuse; + + u32 metrics[RTAX_MAX]; + atomic_t__refcnt; /* client references*/ int __use; + unsigned long lastuse; union { struct dst_entry *next; struct rtable*rt_next; Well, after this patch, we grow dst_entry by 8 bytes : With my .config, it doesn't grow. Perhaps because of CONFIG_NET_CLS_ROUTE, I don't enable it. I will move tclassid under ops. sizeof(struct dst_entry)=0xd0 offsetof(struct dst_entry, input)=0x68 offsetof(struct dst_entry, output)=0x70 offsetof(struct dst_entry, __refcnt)=0xb4 offsetof(struct dst_entry, lastuse)=0xc0 offsetof(struct dst_entry, __use)=0xb8 sizeof(struct rtable)=0x140 So we dirty two cache lines instead of one, unless your cpu have 128 bytes cache lines ? I am quite suprised that my patch to not change lastuse if already set to jiffies changes nothing... If you have some time, could you also test this (unrelated) patch ? We can avoid dirty all the time a cache line of loopback device. 
diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index f2a6e71..0a4186a 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -150,7 +150,10 @@ static int loopback_xmit(struct sk_buff *skb, struct net_device *dev)
 		return 0;
 	}
 #endif
-	dev->last_rx = jiffies;
+#ifdef CONFIG_SMP
+	if (dev->last_rx != jiffies)
+#endif
+		dev->last_rx = jiffies;
 
 	/* it's OK to use per_cpu_ptr() because BHs are off */
 	pcpu_lstats = netdev_priv(dev);

Although I didn't test it, I don't think it's ok. The key is __refcnt shares the same cache line with ops/input/output.

-yanmin
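The cache-line arguments in this thread come down to dividing member offsets by the line size. A minimal sketch, assuming 64-byte cache lines (the thread itself raises 128-byte lines as a possibility); `cache_line_of` and `same_cache_line` are illustrative names, not kernel helpers:

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE_BYTES 64u

static_assert((CACHE_LINE_BYTES & (CACHE_LINE_BYTES - 1)) == 0,
	      "cache line size must be a power of two");

/* Index of the cache line that holds the byte at a given struct offset. */
size_t cache_line_of(size_t offset)
{
	return offset / CACHE_LINE_BYTES;
}

/* Non-zero when two member offsets fall into the same cache line. */
int same_cache_line(size_t a, size_t b)
{
	return cache_line_of(a) == cache_line_of(b);
}
```

Applied to the offsets Eric quotes: in 2.6.25-rc1, lastuse (0xb0), __refcnt (0xb8) and __use (0xbc) all sit in line 2, and next (0xc0) starts line 3. With the first patch, __refcnt (0xb4) and __use (0xb8) stay in line 2 while lastuse (0xc0) moves to line 3, which is why he says two lines are dirtied instead of one.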
Re: [Bugme-new] [Bug 9920] New: kernel panic when using ebtables redirect target
On Fri, Feb 08, 2008 at 05:59:42PM -0800, Andrew Morton wrote: On Fri, 8 Feb 2008 17:40:20 -0800 (PST) [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=9920 Summary: kernel panic when using ebtables redirect target Product: Networking Version: 2.5 KernelVersion: 2.6.24 and 2.6.24-git Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: Other AssignedTo: [EMAIL PROTECTED] ReportedBy: [EMAIL PROTECTED] Latest working kernel version: 2.6.22 ( did not test 2.6.23 ) Earliest failing kernel version: 2.6.24 Distribution: Hardware Environment: Software Environment: bridge working as a router Problem Description: when using ebtables to set up target-redirect, there will be kernel panic Steps to reproduce: 1. set up a basic bridge br0 with slaves eth0, eth1 2. on the bridge setup a default router to route traffic 3. use ebtables to setup target redirect, ebtables -t broute -A BROUTING --logical-in br0 \ -p ipv4 --ip-protocol tcp --ip-destination-port 80 \ -j redirect --redirect-target ACCEPT 4. from a client which is connect to the bridge, send some traffic to allow the BROUTE chain to be traversed :- lynx http://www.google.com 5. 
Kernel panic :- Pid: 0, comm: swapper Not tainted (2.6.24-tmc #1) EIP: 0060:[c69f61aa] EFLAGS: 0217 CPU: 0 EIP is at ebt_do_table+0x4ea/0x5d0 [ebtables] EAX: EBX: ECX: EDX: 0001 ESI: c69f1178 EDI: c69f1108 EBP: c69f1000 ESP: c0315e20 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 Process swapper (pid: 0, ti=c0314000 task=c02f1300 task.ti=c0314000) Stack: c69f11dc 0004 c28c7800 c2b79c20 0005 c69de350 0001 0002 c69ed040 c69ed040 c69f1000 00b0 00b0 c29b0812 c69f1122 a0c3 c29b0812 Call Trace: [c69de032] ebt_broute+0x22/0x30 [ebtable_broute] [c69fef48] br_handle_frame+0xb8/0x220 [bridge] [c02274ac] netif_receive_skb+0x19c/0x440 [c0229ffb] process_backlog+0x6b/0xd0 [c0229a45] net_rx_action+0x105/0x1b0 [c011f835] __do_softirq+0x75/0xf0 [c011f8e7] do_softirq+0x37/0x40 [c011fb25] irq_exit+0x75/0x80 [c010d877] smp_apic_timer_interrupt+0x57/0x90 [c0105b34] apic_timer_interrupt+0x28/0x30 [c0103cd0] default_idle+0x0/0x40 [c0103cff] default_idle+0x2f/0x40 [c0103443] cpu_idle+0x73/0xa0 [c0319cd5] start_kernel+0x2c5/0x340 [c0319420] unknown_bootoption+0x0/0x1e0 === Code: 00 00 83 f9 fe 74 64 83 f9 fc 0f 84 d7 fb ff ff 83 f9 fd 0f 84 bb fc ff ff 8b 5c 24 30 8b 54 24 34 8d 04 5b 8d 04 82 8b 54 24 20 89 28 42 89 50 08 8b 5f 6c 01 df 89 78 04 8b 6c 24 38 8b 54 24 EIP: [c69f61aa] ebt_do_table+0x4ea/0x5d0 [ebtables] SS:ESP 0068:c0315e20 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html [PATCH] netfilter: fix incorrect use of skb_make_writable http://bugzilla.kernel.org/show_bug.cgi?id=9920 The function skb_make_writable returns true or false. 
Signed-off-by: Joonwoo Park [EMAIL PROTECTED] --- net/bridge/netfilter/ebt_dnat.c |2 +- net/bridge/netfilter/ebt_redirect.c |2 +- net/bridge/netfilter/ebt_snat.c |2 +- net/ipv4/netfilter/arpt_mangle.c|2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/net/bridge/netfilter/ebt_dnat.c b/net/bridge/netfilter/ebt_dnat.c index e700cbf..1ec671d 100644 --- a/net/bridge/netfilter/ebt_dnat.c +++ b/net/bridge/netfilter/ebt_dnat.c @@ -20,7 +20,7 @@ static int ebt_target_dnat(struct sk_buff *skb, unsigned int hooknr, { const struct ebt_nat_info *info = data; - if (skb_make_writable(skb, 0)) + if (!skb_make_writable(skb, 0)) return NF_DROP; memcpy(eth_hdr(skb)-h_dest, info-mac, ETH_ALEN); diff --git a/net/bridge/netfilter/ebt_redirect.c b/net/bridge/netfilter/ebt_redirect.c index bfdf2fb..bfb9f74 100644 --- a/net/bridge/netfilter/ebt_redirect.c +++ b/net/bridge/netfilter/ebt_redirect.c @@ -21,7 +21,7 @@ static int ebt_target_redirect(struct sk_buff *skb, unsigned int hooknr, { const struct ebt_redirect_info *info = data; - if (skb_make_writable(skb, 0)) + if (!skb_make_writable(skb, 0)) return NF_DROP; if (hooknr != NF_BR_BROUTING) diff --git a/net/bridge/netfilter/ebt_snat.c b/net/bridge/netfilter/ebt_snat.c index e252dab..204f996 100644 --- a/net/bridge/netfilter/ebt_snat.c +++ b/net/bridge/netfilter/ebt_snat.c @@ -22,7 +22,7 @@ static int ebt_target_snat(struct sk_buff *skb, unsigned int hooknr, { const struct ebt_nat_info *info = data; - if
Re: [PATCH] net/8021q/vlan_dev.c - Use print_mac
From: Joe Perches [EMAIL PROTECTED] Date: Mon, 18 Feb 2008 17:03:32 -0800 Does this need to be done for all function calls declared with __attribute__((format(printf, x, y))) { return 0; } ie: pr_debug, dev_dbg, dev_vdbg? No, I don't think so. We're adding the tag to teach the compiler that if the return value isn't used, it is OK not to emit the call altogether.
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
From: James Chapman [EMAIL PROTECTED] Date: Mon, 18 Feb 2008 22:09:24 + Here's a new version of the patch. The patch avoids disabling irqs and fixes the sk_dst_get() usage that DaveM mentioned. But even with this patch, lockdep still complains if hundreds of ppp sessions are inserted into a tunnel as rapidly as possible (lockdep trace is below). I can stop these errors by wrapping the call to ppp_input() in pppol2tp_recv_dequeue_skb() with local_irq_save/restore. What is a better fix? Firstly, let's fix one thing at a time. Leave the sk_dst_get() thing alone until we can prove that it's part of the lockdep traces. Next, I can't see why ppp_input() needs to be invoked with interrupts disabled. There are many other things that invoke that in software interrupt context, such as pppoe. Please provide the lockdep traces without the ppp_input() IRQ disabling so this can be properly analyzed.
Re: [PATCH 1/2] bluetooth : put hci dev after del conn
From: Dave Young [EMAIL PROTECTED] Date: Mon, 18 Feb 2008 15:55:55 +0800 Move hci_dev_put to del_conn to avoid hci dev going away before hci conn. This looks correct so I have applied it. Signed-off-by: Dave Young [EMAIL PROTECTED] Please remove the extraneous space at the end of your signoff line next time :-) Also, I reworked the loop in del_conn() so that it no longer generates a compile warning, so I had to apply your patch by hand.
Re: [PATCH 2/2] bluetooth : do not move child device other than rfcomm
From: Dave Young [EMAIL PROTECTED] Date: Mon, 18 Feb 2008 15:58:05 +0800 hci conn child devices other than rfcomm tty should not be moved here. This is my lost, thanks for Barnaby's reporting and testing. Signed-off-by: Dave Young [EMAIL PROTECTED] Applied, thanks Dave.
Re: [PATCH] [IPV6]: dst_entry leak in ip4ip6_err. (resend)
From: Denis V. Lunev [EMAIL PROTECTED] Date: Mon, 18 Feb 2008 11:59:38 +0300

The result of the ip_route_output is not assigned to skb. This means that
- it is leaked
- possible OOPS below dereferencing skb->dst
- no ICMP message for this case

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]

This bug has been there for a few releases :-) Applied and I'll queue this up for -stable too. Thanks!
Re: [PATCH][IPV6]: Use BUG_ON instead of if + BUG in fib6_del_route.
From: Pavel Emelyanov [EMAIL PROTECTED] Date: Mon, 18 Feb 2008 15:50:11 +0300 Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] Applied, thanks Pavel.
Re: [PATCH] net: fix kernel-doc warnings in header files
From: Randy Dunlap [EMAIL PROTECTED] Date: Mon, 18 Feb 2008 13:26:47 -0800 From: Randy Dunlap [EMAIL PROTECTED] Add missing structure kernel-doc descriptions to sock.h skbuff.h to fix kernel-doc warnings. (I think that Stephen H. sent a similar patch, but I can't find it. I just want to kill the warnings, with either patch.) Signed-off-by: Randy Dunlap [EMAIL PROTECTED] Applied, thanks Randy.
Re: [Bugme-new] [Bug 9920] New: kernel panic when using ebtables redirect target
From: Joonwoo Park [EMAIL PROTECTED] Date: Tue, 19 Feb 2008 11:53:24 +0900 [PATCH] netfilter: fix incorrect use of skb_make_writable http://bugzilla.kernel.org/show_bug.cgi?id=9920 The function skb_make_writable returns true or false. Signed-off-by: Joonwoo Park [EMAIL PROTECTED] I'll let Patrick pull this in, thanks!
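The bug class fixed by this patch is a success-returning boolean tested as if it were an error code. A minimal user-space sketch, in which `struct pkt`, `pkt_make_writable` and `rewrite_dest` are hypothetical names modelled on the patch description rather than the netfilter API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum verdict { VERDICT_DROP, VERDICT_ACCEPT };

/* Hypothetical packet buffer; "shared" means it must not be modified. */
struct pkt {
	bool shared;
	unsigned char dest[6];
};

/* Hypothetical helper modelled on the description: true on success. */
bool pkt_make_writable(const struct pkt *p)
{
	assert(p != NULL);
	return !p->shared;
}

/* Rewrite the destination address, dropping packets we cannot modify. */
enum verdict rewrite_dest(struct pkt *p, const unsigned char mac[6])
{
	if (!pkt_make_writable(p))	/* note the negation: false means failure */
		return VERDICT_DROP;

	memcpy(p->dest, mac, sizeof(p->dest));
	return VERDICT_ACCEPT;
}
```

Without the negation, every packet that could be made writable would be dropped and every packet that could not would be modified anyway, which matches the direction of the fix in the patch.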
[PATCH] bonding: simplify code and get rid of warning
Get rid of warning and simplify code that looks up vlan tag. No need to get tag, then copy it. Also no need for a local status variable.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
---
Patch against current 2.6.25 version.

--- a/drivers/net/bonding/bond_alb.c	2008-02-18 20:58:53.0 -0800
+++ b/drivers/net/bonding/bond_alb.c	2008-02-18 21:01:10.0 -0800
@@ -678,12 +678,8 @@ static struct slave *rlb_choose_channel(
 	}
 
 	if (!list_empty(&bond->vlan_list)) {
-		unsigned short vlan_id;
-		int res = vlan_get_tag(skb, &vlan_id);
-		if (!res) {
+		if (!vlan_get_tag(skb, &client_info->vlan_id))
 			client_info->tag = 1;
-			client_info->vlan_id = vlan_id;
-		}
 	}
 
 	if (!client_info->assigned) {
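The simplification in this patch is to hand the final destination straight to the out-parameter instead of going through a local copy. A hedged sketch with hypothetical types (`struct frame`, `struct client`) and a hypothetical `frame_get_tag` standing in for `vlan_get_tag`; it assumes the helper leaves the output untouched on failure:

```c
#include <assert.h>
#include <stddef.h>

struct frame { int tagged; unsigned short vlan_id; };
struct client { int tag; unsigned short vlan_id; };

/* Hypothetical analogue of vlan_get_tag(): 0 on success, non-zero if untagged. */
int frame_get_tag(const struct frame *f, unsigned short *id)
{
	assert(f != NULL && id != NULL);
	if (!f->tagged)
		return -1;	/* *id is deliberately left untouched */
	*id = f->vlan_id;
	return 0;
}

/* Store the tag directly in the client record; no temporary, no status variable. */
void record_tag(struct client *c, const struct frame *f)
{
	if (!frame_get_tag(f, &c->vlan_id))
		c->tag = 1;
}
```

The shorter form is only equivalent to the original if the helper does not write through the pointer when it fails; the sketch guarantees that, and the same property would need to hold for the real helper.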
Re: [PATCH] cls_u32 u32_classify()
From: Dzianis Kahanovich [EMAIL PROTECTED] Date: Wed, 30 Jan 2008 11:16:30 -0200

Currently fine u32 "hashkey ... at ..." does not work with relative offsets. The simplest fix is to use eat.

So the question is whether 'sel' is defined to be calculated before all offsets and EAT operations are processed or after. I do not understand the U32 classifier enough to know what this kind of change might or might not break. Can some u32 expert review this? Thanks.
Re: keyboard dead with 45b5035
On Mon, 18 Feb 2008 21:50:12 +0100 Pierre Ossman [EMAIL PROTECTED] wrote: On Mon, 18 Feb 2008 20:50:01 +0100 Rafael J. Wysocki [EMAIL PROTECTED] wrote: On Monday, 18 of February 2008, Pierre Ossman wrote: The patch [RTNETLINK]: Send a single notification on device state changes. kills (at least) the keyboard here. Everything seems to work fine in single user mode, but when init starts spawning of logins, the keyboard goes bye-bye. Even the power button is ignored. :/ Please try with the patch from http://lkml.org/lkml/2008/2/18/331 . That solved it.

Perhaps not quite. When I returned to my laptop this morning, the keyboard was gone again. Did a hard reboot, and the machine locked up a few seconds after starting X. I'll see if it can be reproduced...

Rgds
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
Re: tbench regression in 2.6.25-rc1
On Mon, 2008-02-18 at 12:33 -0500, [EMAIL PROTECTED] wrote: On Mon, 18 Feb 2008 16:12:38 +0800, Zhang, Yanmin said: I also think __refcnt is the key. I ran a new test by adding 2 unsigned long padding before lastuse, so the 3 members are moved to the next cache line. The performance is recovered. How about the patch below? Almost all performance is recovered with the new patch. Signed-off-by: Zhang Yanmin [EMAIL PROTECTED]

Could you add a comment someplace that says refcnt wants to be on a different cache line from input/output/ops or performance tanks badly, to warn some future kernel hacker who starts adding new fields to the structure?

Ok. Below is the new patch.

1) Move tclassid under ops in case CONFIG_NET_CLS_ROUTE=y, so sizeof(dst_entry)=200 whether CONFIG_NET_CLS_ROUTE is y or n. I tested many patches on my 16-core tigerton by moving tclassid to different places. It looks like tclassid can also have an impact on performance. If tclassid is moved before metrics, or not moved at all, the performance isn't good. So I moved it behind metrics.

2) Add comments before __refcnt.

If CONFIG_NET_CLS_ROUTE=y, the result with the patch below is about 18% better than the one without the patch.

If CONFIG_NET_CLS_ROUTE=n, the result with the patch below is about 30% better than the one without the patch.
Signed-off-by: Zhang Yanmin [EMAIL PROTECTED]

---

--- linux-2.6.25-rc1/include/net/dst.h	2008-02-21 14:33:43.0 +0800
+++ linux-2.6.25-rc1_work/include/net/dst.h	2008-02-22 12:52:19.0 +0800
@@ -52,15 +52,10 @@ struct dst_entry
 	unsigned short		header_len;	/* more space at head required */
 	unsigned short		trailer_len;	/* space to reserve at tail */
 
-	u32			metrics[RTAX_MAX];
-	struct dst_entry	*path;
-
-	unsigned long		rate_last;	/* rate limiting for ICMP */
 	unsigned int		rate_tokens;
+	unsigned long		rate_last;	/* rate limiting for ICMP */
 
-#ifdef CONFIG_NET_CLS_ROUTE
-	__u32			tclassid;
-#endif
+	struct dst_entry	*path;
 
 	struct neighbour	*neighbour;
 	struct hh_cache		*hh;
@@ -70,10 +65,20 @@ struct dst_entry
 	int			(*output)(struct sk_buff*);
 
 	struct dst_ops		*ops;
-
-	unsigned long		lastuse;
+
+	u32			metrics[RTAX_MAX];
+
+#ifdef CONFIG_NET_CLS_ROUTE
+	__u32			tclassid;
+#endif
+
+	/*
+	 * __refcnt wants to be on a different cache line from
+	 * input/output/ops or performance tanks badly
+	 */
 	atomic_t		__refcnt;	/* client references	*/
 	int			__use;
+	unsigned long		lastuse;
 	union {
 		struct dst_entry *next;
 		struct rtable    *rt_next;
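The constraint stated in the new comment can also be checked at compile time. The sketch below is not what the patch does (the patch only reorders members); it uses explicit C11 alignment instead, on a hypothetical miniature `struct entry`, and assumes 64-byte cache lines:

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64	/* assumption: 64-byte cache lines */

/* Hypothetical miniature of a dst_entry-like object. */
struct entry {
	/* read-mostly: consulted for every packet, rarely written */
	int (*input)(void *pkt);
	int (*output)(void *pkt);
	const void *ops;

	/* write-hot: updated for every packet; forced onto a new cache line */
	_Alignas(CACHE_LINE_BYTES) int refcnt;
	int use;
	unsigned long lastuse;
};

/* Fails the build if someone later moves refcnt next to input/output/ops. */
static_assert(offsetof(struct entry, ops) / CACHE_LINE_BYTES !=
	      offsetof(struct entry, refcnt) / CACHE_LINE_BYTES,
	      "refcnt must not share a cache line with input/output/ops");
```

Explicit alignment costs padding, which is presumably why the thread prefers reordering existing members; the compile-time check works with either approach and serves the same purpose as the requested warning comment.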
[PATCH 10/17 net-2.6.26] [NETNS]: Process ip_rt_redirect in the correct namespace.
Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] --- net/ipv4/route.c |7 +-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 525787b..44708ab 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1132,10 +1132,12 @@ void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw, __be32 skeys[2] = { saddr, 0 }; int ikeys[2] = { dev-ifindex, 0 }; struct netevent_redirect netevent; + struct net *net; if (!in_dev) return; + net = dev-nd_net; if (new_gw == old_gw || !IN_DEV_RX_REDIRECTS(in_dev) || ipv4_is_multicast(new_gw) || ipv4_is_lbcast(new_gw) || ipv4_is_zeronet(new_gw)) @@ -1147,7 +1149,7 @@ void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw, if (IN_DEV_SEC_REDIRECTS(in_dev) ip_fib_check_default(new_gw, dev)) goto reject_redirect; } else { - if (inet_addr_type(init_net, new_gw) != RTN_UNICAST) + if (inet_addr_type(net, new_gw) != RTN_UNICAST) goto reject_redirect; } @@ -1165,7 +1167,8 @@ void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw, rth-fl.fl4_src != skeys[i] || rth-fl.oif != ikeys[k] || rth-fl.iif != 0 || - rth-rt_genid != atomic_read(rt_genid)) { + rth-rt_genid != atomic_read(rt_genid) || + rth-u.dst.dev-nd_net != net) { rthp = rth-u.dst.rt_next; continue; } -- 1.5.3.rc5 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 12/17 net-2.6.26] [NETNS]: Process /proc/net/rt_cache inside a namespace.
Show routing cache for a particular namespace only. Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] --- net/ipv4/route.c | 10 +++--- 1 files changed, 7 insertions(+), 3 deletions(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 67df872..c11e6bf 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -273,6 +273,7 @@ static unsigned int rt_hash_code(u32 daddr, u32 saddr) #ifdef CONFIG_PROC_FS struct rt_cache_iter_state { + struct seq_net_private p; int bucket; int genid; }; @@ -285,7 +286,8 @@ static struct rtable *rt_cache_get_first(struct rt_cache_iter_state *st) rcu_read_lock_bh(); r = rcu_dereference(rt_hash_table[st-bucket].chain); while (r) { - if (r-rt_genid == st-genid) + if (r-u.dst.dev-nd_net == st-p.net + r-rt_genid == st-genid) return r; r = rcu_dereference(r-u.dst.rt_next); } @@ -312,6 +314,8 @@ static struct rtable *rt_cache_get_next(struct rt_cache_iter_state *st, struct rtable *r) { while ((r = __rt_cache_get_next(st, r)) != NULL) { + if (r-u.dst.dev-nd_net != st-p.net) + continue; if (r-rt_genid == st-genid) break; } @@ -398,7 +402,7 @@ static const struct seq_operations rt_cache_seq_ops = { static int rt_cache_seq_open(struct inode *inode, struct file *file) { - return seq_open_private(file, rt_cache_seq_ops, + return seq_open_net(inode, file, rt_cache_seq_ops, sizeof(struct rt_cache_iter_state)); } @@ -407,7 +411,7 @@ static const struct file_operations rt_cache_seq_fops = { .open= rt_cache_seq_open, .read= seq_read, .llseek = seq_lseek, - .release = seq_release_private, + .release = seq_release_net, }; -- 1.5.3.rc5 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 13/17 net-2.6.26] [NETNS]: Register /proc/net/rt_cache for each namespace.
Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] --- net/ipv4/route.c | 24 +--- 1 files changed, 21 insertions(+), 3 deletions(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index c11e6bf..5f67eba 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -545,7 +545,7 @@ static int ip_rt_acct_read(char *buffer, char **start, off_t offset, } #endif -static __init int ip_rt_proc_init(struct net *net) +static int __net_init ip_rt_do_proc_init(struct net *net) { struct proc_dir_entry *pde; @@ -577,8 +577,26 @@ err2: err1: return -ENOMEM; } + +static void __net_exit ip_rt_do_proc_exit(struct net *net) +{ + remove_proc_entry(rt_cache, net-proc_net_stat); + remove_proc_entry(rt_cache, net-proc_net); + remove_proc_entry(rt_acct, net-proc_net); +} + +static struct pernet_operations ip_rt_proc_ops __net_initdata = { + .init = ip_rt_do_proc_init, + .exit = ip_rt_do_proc_exit, +}; + +static int __init ip_rt_proc_init(void) +{ + return register_pernet_subsys(ip_rt_proc_ops); +} + #else -static inline int ip_rt_proc_init(struct net *net) +static inline int ip_rt_proc_init(void) { return 0; } @@ -3056,7 +3074,7 @@ int __init ip_rt_init(void) ip_rt_secret_interval; add_timer(rt_secret_timer); - if (ip_rt_proc_init(init_net)) + if (ip_rt_proc_init()) printk(KERN_ERR Unable to create route proc files\n); #ifdef CONFIG_XFRM xfrm_init(); -- 1.5.3.rc5 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/17 net-2.6.26] [NETNS]: Default arp parameters lookup.
Default ARP parameters should be findable regardless of the context. Required to make inetdev_event work.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/core/neighbour.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index c895ad4..45ed620 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1275,9 +1275,7 @@ static inline struct neigh_parms *lookup_neigh_params(struct neigh_table *tbl,
 	struct neigh_parms *p;
 
 	for (p = &tbl->parms; p; p = p->next) {
-		if (p->net != net)
-			continue;
-		if ((p->dev && p->dev->ifindex == ifindex) ||
+		if ((p->dev && p->dev->ifindex == ifindex && p->net == net) ||
 		    (!p->dev && !ifindex))
 			return p;
 	}
--
1.5.3.rc5
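The effect of moving the namespace test into the per-device branch can be seen in a small model. This is a sketch with hypothetical types: `struct parms` flattens the device pointer into `has_dev`/`ifindex`, and `net_id` stands in for the `struct net` pointer.

```c
#include <assert.h>
#include <stddef.h>

struct parms {
	struct parms *next;
	int has_dev;	/* 0 for the table-wide default entry */
	int ifindex;
	int net_id;	/* hypothetical stand-in for the struct net pointer */
};

/*
 * Per-device entries must match both ifindex and namespace; the default
 * entry (no device, ifindex 0) is returned whatever namespace is asked for.
 */
struct parms *lookup_parms(struct parms *head, int net_id, int ifindex)
{
	struct parms *p;

	assert(ifindex >= 0);
	for (p = head; p; p = p->next) {
		if ((p->has_dev && p->ifindex == ifindex && p->net_id == net_id) ||
		    (!p->has_dev && !ifindex))
			return p;
	}
	return NULL;
}
```

With the old ordering, a lookup for ifindex 0 from a namespace other than the one owning the default entry would have skipped that entry and returned nothing.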
[PATCH 9/17 net-2.6.26] [NETNS]: DST cleanup routines should be called inside namespace.
Device inside the namespace can be started and downed. So, active routing cache should be cleaned up on device stop.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/core/dst.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/net/core/dst.c b/net/core/dst.c
index 7deef48..3a01a81 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -295,9 +295,6 @@ static int dst_dev_event(struct notifier_block *this, unsigned long event, void
 	struct net_device *dev = ptr;
 	struct dst_entry *dst, *last = NULL;
 
-	if (dev->nd_net != &init_net)
-		return NOTIFY_DONE;
-
 	switch (event) {
 	case NETDEV_UNREGISTER:
 	case NETDEV_DOWN:
--
1.5.3.rc5
[PATCH 11/17 net-2.6.26] [IPV4]: rt_cache_get_next should take rt_genid into account.
Otherwise /proc/net/rt_cache will look inconsistent with respect to genid.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
Acked-by: Alexey Kuznetsov [EMAIL PROTECTED]
---
 net/ipv4/route.c |   18 +++++++++++++-----
 1 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 44708ab..67df872 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -294,7 +294,8 @@ static struct rtable *rt_cache_get_first(struct rt_cache_iter_state *st)
 	return r;
 }
 
-static struct rtable *rt_cache_get_next(struct rt_cache_iter_state *st, struct rtable *r)
+static struct rtable *__rt_cache_get_next(struct rt_cache_iter_state *st,
+					  struct rtable *r)
 {
 	r = r->u.dst.rt_next;
 	while (!r) {
@@ -307,16 +308,23 @@ static struct rtable *rt_cache_get_next(struct rt_cache_iter_state *st, struct r
 	return rcu_dereference(r);
 }
 
+static struct rtable *rt_cache_get_next(struct rt_cache_iter_state *st,
+					struct rtable *r)
+{
+	while ((r = __rt_cache_get_next(st, r)) != NULL) {
+		if (r->rt_genid == st->genid)
+			break;
+	}
+	return r;
+}
+
 static struct rtable *rt_cache_get_idx(struct rt_cache_iter_state *st, loff_t pos)
 {
 	struct rtable *r = rt_cache_get_first(st);
 
 	if (r)
-		while (pos && (r = rt_cache_get_next(st, r))) {
-			if (r->rt_genid != st->genid)
-				continue;
+		while (pos && (r = rt_cache_get_next(st, r)))
 			--pos;
-		}
 	return pos ? NULL : r;
 }
 
--
1.5.3.rc5
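The refactoring in this patch splits iteration into a raw step and a filtering wrapper, so every caller sees only current-generation entries. A simplified sketch on a plain singly linked list; `struct node`, `raw_next`, `filtered_next` and `count_after` are hypothetical, and the real `__rt_cache_get_next` also walks hash buckets under RCU:

```c
#include <assert.h>
#include <stddef.h>

struct node {
	struct node *next;
	int genid;
	int value;
};

/* Raw step: next node in the chain, whatever its generation. */
static struct node *raw_next(struct node *n)
{
	return n->next;
}

/* Filtered step: next node whose generation matches, or NULL at the end. */
struct node *filtered_next(struct node *n, int genid)
{
	assert(n != NULL);
	while ((n = raw_next(n)) != NULL) {
		if (n->genid == genid)
			break;
	}
	return n;
}

/* Count the matching nodes that follow head; callers no longer filter. */
int count_after(struct node *head, int genid)
{
	int count = 0;

	while ((head = filtered_next(head, genid)) != NULL)
		count++;
	return count;
}
```

Putting the check in the wrapper means a positional walk such as `rt_cache_get_idx` and a sequential walk skip the same stale entries, which is the consistency the commit message asks for.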
[PATCH 7/17 net-2.6.26] [NETNS]: Disable multicaststing configuration inside non-initial namespace.
Do not calls hooks from device notifiers and disallow configuration from ioctl/netlink layer. Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] --- net/ipv4/igmp.c | 39 +++ 1 files changed, 39 insertions(+), 0 deletions(-) diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c index 732cd07..d3f34a7 100644 --- a/net/ipv4/igmp.c +++ b/net/ipv4/igmp.c @@ -1198,6 +1198,9 @@ void ip_mc_inc_group(struct in_device *in_dev, __be32 addr) ASSERT_RTNL(); + if (in_dev-dev-nd_net != init_net) + return; + for (im=in_dev-mc_list; im; im=im-next) { if (im-multiaddr == addr) { im-users++; @@ -1277,6 +1280,9 @@ void ip_mc_dec_group(struct in_device *in_dev, __be32 addr) ASSERT_RTNL(); + if (in_dev-dev-nd_net != init_net) + return; + for (ip=in_dev-mc_list; (i=*ip)!=NULL; ip=i-next) { if (i-multiaddr==addr) { if (--i-users == 0) { @@ -1304,6 +1310,9 @@ void ip_mc_down(struct in_device *in_dev) ASSERT_RTNL(); + if (in_dev-dev-nd_net != init_net) + return; + for (i=in_dev-mc_list; i; i=i-next) igmp_group_dropped(i); @@ -1324,6 +1333,9 @@ void ip_mc_init_dev(struct in_device *in_dev) { ASSERT_RTNL(); + if (in_dev-dev-nd_net != init_net) + return; + in_dev-mc_tomb = NULL; #ifdef CONFIG_IP_MULTICAST in_dev-mr_gq_running = 0; @@ -1347,6 +1359,9 @@ void ip_mc_up(struct in_device *in_dev) ASSERT_RTNL(); + if (in_dev-dev-nd_net != init_net) + return; + ip_mc_inc_group(in_dev, IGMP_ALL_HOSTS); for (i=in_dev-mc_list; i; i=i-next) @@ -1363,6 +1378,9 @@ void ip_mc_destroy_dev(struct in_device *in_dev) ASSERT_RTNL(); + if (in_dev-dev-nd_net != init_net) + return; + /* Deactivate timers */ ip_mc_down(in_dev); @@ -1744,6 +1762,9 @@ int ip_mc_join_group(struct sock *sk , struct ip_mreqn *imr) if (!ipv4_is_multicast(addr)) return -EINVAL; + if (sk-sk_net != init_net) + return -EPROTONOSUPPORT; + rtnl_lock(); in_dev = ip_mc_find_dev(imr); @@ -1812,6 +1833,9 @@ int ip_mc_leave_group(struct sock *sk, struct ip_mreqn *imr) u32 ifindex; int ret = -EADDRNOTAVAIL; + if (sk-sk_net != init_net) + return 
-EPROTONOSUPPORT; + rtnl_lock(); in_dev = ip_mc_find_dev(imr); ifindex = imr-imr_ifindex; @@ -1857,6 +1881,9 @@ int ip_mc_source(int add, int omode, struct sock *sk, struct if (!ipv4_is_multicast(addr)) return -EINVAL; + if (sk-sk_net != init_net) + return -EPROTONOSUPPORT; + rtnl_lock(); imr.imr_multiaddr.s_addr = mreqs-imr_multiaddr; @@ -1990,6 +2017,9 @@ int ip_mc_msfilter(struct sock *sk, struct ip_msfilter *msf, int ifindex) msf-imsf_fmode != MCAST_EXCLUDE) return -EINVAL; + if (sk-sk_net != init_net) + return -EPROTONOSUPPORT; + rtnl_lock(); imr.imr_multiaddr.s_addr = msf-imsf_multiaddr; @@ -2070,6 +2100,9 @@ int ip_mc_msfget(struct sock *sk, struct ip_msfilter *msf, if (!ipv4_is_multicast(addr)) return -EINVAL; + if (sk-sk_net != init_net) + return -EPROTONOSUPPORT; + rtnl_lock(); imr.imr_multiaddr.s_addr = msf-imsf_multiaddr; @@ -2132,6 +2165,9 @@ int ip_mc_gsfget(struct sock *sk, struct group_filter *gsf, if (!ipv4_is_multicast(addr)) return -EINVAL; + if (sk-sk_net != init_net) + return -EPROTONOSUPPORT; + rtnl_lock(); err = -EADDRNOTAVAIL; @@ -2216,6 +2252,9 @@ void ip_mc_drop_socket(struct sock *sk) if (inet-mc_list == NULL) return; + if (sk-sk_net != init_net) + return; + rtnl_lock(); while ((iml = inet-mc_list) != NULL) { struct in_device *in_dev; -- 1.5.3.rc5 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 14/17 net-2.6.26] [NETNS]: Process devinet ioctl in the correct namespace.
Add namespace parameter to devinet_ioctl and locate device inside it for state changes. Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] --- include/linux/inetdevice.h |2 +- net/ipv4/af_inet.c |7 --- net/ipv4/devinet.c |6 +++--- net/ipv4/ipconfig.c|2 +- 4 files changed, 9 insertions(+), 8 deletions(-) diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h index fc4e3db..da05ab4 100644 --- a/include/linux/inetdevice.h +++ b/include/linux/inetdevice.h @@ -129,7 +129,7 @@ extern int unregister_inetaddr_notifier(struct notifier_block *nb); extern struct net_device *ip_dev_find(struct net *net, __be32 addr); extern int inet_addr_onlink(struct in_device *in_dev, __be32 a, __be32 b); -extern int devinet_ioctl(unsigned int cmd, void __user *); +extern int devinet_ioctl(struct net *net, unsigned int cmd, void __user *); extern voiddevinet_init(void); extern struct in_device*inetdev_by_index(struct net *, int); extern __be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope); diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 09ca529..c270080 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -784,6 +784,7 @@ int inet_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) { struct sock *sk = sock-sk; int err = 0; + struct net *net = sk-sk_net; switch (cmd) { case SIOCGSTAMP: @@ -795,12 +796,12 @@ int inet_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) case SIOCADDRT: case SIOCDELRT: case SIOCRTMSG: - err = ip_rt_ioctl(sk-sk_net, cmd, (void __user *)arg); + err = ip_rt_ioctl(net, cmd, (void __user *)arg); break; case SIOCDARP: case SIOCGARP: case SIOCSARP: - err = arp_ioctl(sk-sk_net, cmd, (void __user *)arg); + err = arp_ioctl(net, cmd, (void __user *)arg); break; case SIOCGIFADDR: case SIOCSIFADDR: @@ -813,7 +814,7 @@ int inet_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) case SIOCSIFPFLAGS: case SIOCGIFPFLAGS: case SIOCSIFFLAGS: - err = devinet_ioctl(cmd, (void __user *)arg); + err 
= devinet_ioctl(net, cmd, (void __user *)arg); break; default: if (sk-sk_prot-ioctl) diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index 963e711..f7e78b7 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -595,7 +595,7 @@ static __inline__ int inet_abc_len(__be32 addr) } -int devinet_ioctl(unsigned int cmd, void __user *arg) +int devinet_ioctl(struct net *net, unsigned int cmd, void __user *arg) { struct ifreq ifr; struct sockaddr_in sin_orig; @@ -624,7 +624,7 @@ int devinet_ioctl(unsigned int cmd, void __user *arg) *colon = 0; #ifdef CONFIG_KMOD - dev_load(init_net, ifr.ifr_name); + dev_load(net, ifr.ifr_name); #endif switch (cmd) { @@ -665,7 +665,7 @@ int devinet_ioctl(unsigned int cmd, void __user *arg) rtnl_lock(); ret = -ENODEV; - if ((dev = __dev_get_by_name(init_net, ifr.ifr_name)) == NULL) + if ((dev = __dev_get_by_name(net, ifr.ifr_name)) == NULL) goto done; if (colon) diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c index a52b585..009d78f 100644 --- a/net/ipv4/ipconfig.c +++ b/net/ipv4/ipconfig.c @@ -291,7 +291,7 @@ static int __init ic_dev_ioctl(unsigned int cmd, struct ifreq *arg) mm_segment_t oldfs = get_fs(); set_fs(get_ds()); - res = devinet_ioctl(cmd, (struct ifreq __user *) arg); + res = devinet_ioctl(init_net, cmd, (struct ifreq __user *) arg); set_fs(oldfs); return res; } -- 1.5.3.rc5 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/17 net-2.6.26] [NETNS]: Disable inetaddr notifiers in namespaces other than initial.
ip_fib_init is kept enabled. It is already namespace-aware.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 drivers/net/bonding/bond_main.c |    3 +++
 drivers/net/via-velocity.c      |    3 +++
 drivers/s390/net/qeth_main.c    |    3 +++
 net/sctp/protocol.c             |    3 +++
 4 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 0942d82..9666434 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3511,6 +3511,9 @@ static int bond_inetaddr_event(struct notifier_block *this, unsigned long event,
 	struct bonding *bond, *bond_next;
 	struct vlan_entry *vlan, *vlan_next;
 
+	if (ifa->ifa_dev->dev->nd_net != &init_net)
+		return NOTIFY_DONE;
+
 	list_for_each_entry_safe(bond, bond_next, &bond_dev_list, bond_list) {
 		if (bond->dev == event_dev) {
 			switch (event) {
diff --git a/drivers/net/via-velocity.c b/drivers/net/via-velocity.c
index c50fdee..1525e8a 100644
--- a/drivers/net/via-velocity.c
+++ b/drivers/net/via-velocity.c
@@ -3464,6 +3464,9 @@ static int velocity_netdev_event(struct notifier_block *nb, unsigned long notifi
 	struct velocity_info *vptr;
 	unsigned long flags;
 
+	if (dev->nd_net != &init_net)
+		return NOTIFY_DONE;
+
 	spin_lock_irqsave(&velocity_dev_list_lock, flags);
 	list_for_each_entry(vptr, &velocity_dev_list, list) {
 		if (vptr->dev == dev) {
diff --git a/drivers/s390/net/qeth_main.c b/drivers/s390/net/qeth_main.c
index 62606ce..d063e9e 100644
--- a/drivers/s390/net/qeth_main.c
+++ b/drivers/s390/net/qeth_main.c
@@ -8622,6 +8622,9 @@ qeth_ip_event(struct notifier_block *this,
 	struct qeth_ipaddr *addr;
 	struct qeth_card *card;
 
+	if (dev->nd_net != &init_net)
+		return NOTIFY_DONE;
+
 	QETH_DBF_TEXT(trace,3,"ipevent");
 	card = qeth_get_card_from_dev(dev);
 	if (!card)
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 22a1657..4475f7e 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -629,6 +629,9 @@ static int sctp_inetaddr_event(struct notifier_block *this, unsigned long ev,
 	struct sctp_sockaddr_entry *addr = NULL;
 	struct sctp_sockaddr_entry *temp;
 
+	if (ifa->ifa_dev->dev->nd_net != &init_net)
+		return NOTIFY_DONE;
+
 	switch (ev) {
 	case NETDEV_UP:
 		addr = kmalloc(sizeof(struct sctp_sockaddr_entry), GFP_ATOMIC);
-- 
1.5.3.rc5
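Editorial illustration (not part of the patch): the guard added to each notifier above compares the device's namespace pointer with the address of the initial namespace and bails out early on a mismatch. The userspace sketch below shows that shape only. `struct net` and `struct net_device` here are hypothetical stand-ins with just the fields needed, `demo_netdev_event` and `events_handled` are invented names, and returning NOTIFY_OK on the handled path is an assumption made so the two paths can be told apart; the real handlers choose their own return values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the fields used here. */
struct net { int id; };
struct net_device {
	struct net *nd_net;	/* namespace the device belongs to */
	int events_handled;	/* illustration only, not a kernel field */
};

/* Same numeric values as the kernel's notifier return codes. */
#define NOTIFY_DONE 0x0000
#define NOTIFY_OK   0x0001

static struct net init_net = { 0 };	/* stand-in for the initial namespace */

/* Ignore events for devices that live outside the initial namespace. */
static int demo_netdev_event(struct net_device *dev)
{
	assert(dev != NULL);

	if (dev->nd_net != &init_net)
		return NOTIFY_DONE;

	dev->events_handled++;
	return NOTIFY_OK;
}
```

A device whose nd_net points at any other `struct net` is skipped without touching driver state, which is the effect the patch aims for in drivers that are not yet namespace-aware.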
[PATCH 8/17 net-2.6.26] [NETNS]: Enable inetdev_event notifier.
After all these preparations it is time to enable the main IPv4 device
initialization routine inside a namespace. It is safe to do this now.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/ipv4/devinet.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index f282b26..963e711 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1044,9 +1044,6 @@ static int inetdev_event(struct notifier_block *this, unsigned long event,
 	struct net_device *dev = ptr;
 	struct in_device *in_dev = __in_dev_get_rtnl(dev);
 
-	if (dev->nd_net != &init_net)
-		return NOTIFY_DONE;
-
 	ASSERT_RTNL();
 
 	if (!in_dev) {
-- 
1.5.3.rc5
[PATCH 16/17 net-2.6.26] [NETNS]: Enable IPv4 address manipulations inside namespace.
Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/ipv4/devinet.c |    9 ---------
 1 files changed, 0 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index f7e78b7..aa23d10 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -446,9 +446,6 @@ static int inet_rtm_deladdr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg
 
 	ASSERT_RTNL();
 
-	if (net != &init_net)
-		return -EINVAL;
-
 	err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFA_MAX, ifa_ipv4_policy);
 	if (err < 0)
 		goto errout;
@@ -560,9 +557,6 @@ static int inet_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg
 
 	ASSERT_RTNL();
 
-	if (net != &init_net)
-		return -EINVAL;
-
 	ifa = rtm_to_ifaddr(net, nlh);
 	if (IS_ERR(ifa))
 		return PTR_ERR(ifa);
@@ -1169,9 +1163,6 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
 	struct in_ifaddr *ifa;
 	int s_ip_idx, s_idx = cb->args[0];
 
-	if (net != &init_net)
-		return 0;
-
 	s_ip_idx = ip_idx = cb->args[1];
 	idx = 0;
 	for_each_netdev(net, dev) {
-- 
1.5.3.rc5
[PATCH 17/17 net-2.6.26] [NETNS]: Process inet_select_addr inside a namespace.
The context is available from a network device passed in.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/ipv4/devinet.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index aa23d10..033670d 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -871,6 +871,7 @@ __be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope)
 {
 	__be32 addr = 0;
 	struct in_device *in_dev;
+	struct net *net = dev->nd_net;
 
 	rcu_read_lock();
 	in_dev = __in_dev_get_rcu(dev);
@@ -899,7 +900,7 @@ no_in_dev:
 	 */
 	read_lock(&dev_base_lock);
 	rcu_read_lock();
-	for_each_netdev(&init_net, dev) {
+	for_each_netdev(net, dev) {
 		if ((in_dev = __in_dev_get_rcu(dev)) == NULL)
 			continue;
 
-- 
1.5.3.rc5
[PATCH 15/17 net-2.6.26] [NETNS]: Enable all routing manipulation via netlink inside namespace.
Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/ipv4/route.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 5f67eba..79e2e8a 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2702,9 +2702,6 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr* nlh, void
 	int err;
 	struct sk_buff *skb;
 
-	if (net != &init_net)
-		return -EINVAL;
-
 	err = nlmsg_parse(nlh, sizeof(*rtm), tb, RTA_MAX, rtm_ipv4_policy);
 	if (err < 0)
 		goto errout;
@@ -2734,7 +2731,7 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr* nlh, void
 	if (iif) {
 		struct net_device *dev;
 
-		dev = __dev_get_by_index(&init_net, iif);
+		dev = __dev_get_by_index(net, iif);
 		if (dev == NULL) {
 			err = -ENODEV;
 			goto errout_free;
@@ -2760,7 +2757,7 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr* nlh, void
 			},
 			.oif = tb[RTA_OIF] ? nla_get_u32(tb[RTA_OIF]) : 0,
 		};
-		err = ip_route_output_key(&init_net, &rt, &fl);
+		err = ip_route_output_key(net, &rt, &fl);
 	}
 
 	if (err)
@@ -2771,11 +2768,11 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr* nlh, void
 		rt->rt_flags |= RTCF_NOTIFY;
 
 	err = rt_fill_info(skb, NETLINK_CB(in_skb).pid, nlh->nlmsg_seq,
-				RTM_NEWROUTE, 0, 0);
+			   RTM_NEWROUTE, 0, 0);
 	if (err <= 0)
 		goto errout_free;
 
-	err = rtnl_unicast(skb, &init_net, NETLINK_CB(in_skb).pid);
+	err = rtnl_unicast(skb, net, NETLINK_CB(in_skb).pid);
 errout:
 	return err;
 
@@ -2789,6 +2786,9 @@ int ip_rt_dump(struct sk_buff *skb, struct netlink_callback *cb)
 	struct rtable *rt;
 	int h, s_h;
 	int idx, s_idx;
+	struct net *net;
+
+	net = skb->sk->sk_net;
 
 	s_h = cb->args[0];
 	if (s_h < 0)
@@ -2798,7 +2798,7 @@ int ip_rt_dump(struct sk_buff *skb, struct netlink_callback *cb)
 		rcu_read_lock_bh();
 		for (rt = rcu_dereference(rt_hash_table[h].chain), idx = 0; rt;
 		     rt = rcu_dereference(rt->u.dst.rt_next), idx++) {
-			if (idx < s_idx)
+			if (rt->u.dst.dev->nd_net != net || idx < s_idx)
 				continue;
 			if (rt->rt_genid != atomic_read(&rt_genid))
 				continue;
-- 
1.5.3.rc5
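Editorial illustration (not part of the patch): the last hunk above makes ip_rt_dump skip routing cache entries that belong to another namespace while still walking one shared hash chain. A minimal userspace sketch of that filter follows. `struct route_entry`, its `owner` field and `demo_dump_count` are hypothetical names standing in for `struct rtable`, `rt->u.dst.dev->nd_net` and the dump loop; the flat array replaces the RCU-protected hash chain purely for brevity.

```c
#include <assert.h>
#include <stddef.h>

struct net { int id; };				/* hypothetical stand-in */
struct route_entry { struct net *owner; };	/* hypothetical cache entry */

/*
 * Count the entries a dump for namespace `net` would emit when resuming
 * at index s_idx. As in the patch, idx advances for every entry in the
 * shared table, including entries owned by other namespaces.
 */
static size_t demo_dump_count(const struct route_entry *tbl, size_t n,
			      const struct net *net, size_t s_idx)
{
	size_t idx, emitted = 0;

	assert(tbl != NULL || n == 0);

	for (idx = 0; idx < n; idx++) {
		if (tbl[idx].owner != net || idx < s_idx)
			continue;
		emitted++;
	}
	return emitted;
}
```

Because the index keeps counting foreign entries, the resume position stored between dump calls stays a position in the shared chain rather than a per-namespace count.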
[PATCH 5/17 net-2.6.26] [NETNS]: Register neighbour table parameters in the correct namespace.
neigh_sysctl_register should register sysctl entries inside the correct
namespace to avoid naming conflicts. A typical example is loopback:
entries for it are present in all namespaces.

This is required to make inetdev_event work.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/core/neighbour.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 7bb6a9a..c895ad4 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2732,7 +2732,8 @@ int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p,
 	neigh_path[NEIGH_CTL_PATH_PROTO].procname = p_name;
 	neigh_path[NEIGH_CTL_PATH_PROTO].ctl_name = p_id;
 
-	t->sysctl_header = register_sysctl_paths(neigh_path, t->neigh_vars);
+	t->sysctl_header =
+		register_net_sysctl_table(p->net, neigh_path, t->neigh_vars);
 	if (!t->sysctl_header)
 		goto free_procname;
 
-- 
1.5.3.rc5
[PATCH 3/17 net-2.6.26] [NETFILTER]: Consolidate masq_inet_event and masq_device_event.
They do exactly the same job.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/ipv4/netfilter/ipt_MASQUERADE.c |   14 ++------------
 1 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/netfilter/ipt_MASQUERADE.c b/net/ipv4/netfilter/ipt_MASQUERADE.c
index d80fee8..313b3fc 100644
--- a/net/ipv4/netfilter/ipt_MASQUERADE.c
+++ b/net/ipv4/netfilter/ipt_MASQUERADE.c
@@ -139,18 +139,8 @@ static int masq_inet_event(struct notifier_block *this,
 			   unsigned long event,
 			   void *ptr)
 {
-	const struct net_device *dev = ((struct in_ifaddr *)ptr)->ifa_dev->dev;
-
-	if (event == NETDEV_DOWN) {
-		/* IP address was deleted.  Search entire table for
-		   conntracks which were associated with that device,
-		   and forget them. */
-		NF_CT_ASSERT(dev->ifindex != 0);
-
-		nf_ct_iterate_cleanup(device_cmp, (void *)(long)dev->ifindex);
-	}
-
-	return NOTIFY_DONE;
+	struct net_device *dev = ((struct in_ifaddr *)ptr)->ifa_dev->dev;
+	return masq_device_event(this, event, dev);
 }
 
 static struct notifier_block masq_dev_notifier = {
-- 
1.5.3.rc5
[PATCH 2/17 net-2.6.26] [IPV4]: Remove check for ifa-ifa_dev != NULL.
This is a callback registered to the inet address notifier chain.
The check is useless as:
- ifa->ifa_dev is always != NULL
- similar checks are absent in all other notifiers.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/atm/clip.c |    4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/net/atm/clip.c b/net/atm/clip.c
index 86b885e..dd96440 100644
--- a/net/atm/clip.c
+++ b/net/atm/clip.c
@@ -648,10 +648,6 @@ static int clip_inet_event(struct notifier_block *this, unsigned long event,
 	struct in_device *in_dev;
 
 	in_dev = ((struct in_ifaddr *)ifa)->ifa_dev;
-	if (!in_dev || !in_dev->dev) {
-		printk(KERN_WARNING "clip_inet_event: no device\n");
-		return NOTIFY_DONE;
-	}
 	/*
 	 * Transitions are of the down-change-up type, so it's sufficient to
 	 * handle the change on up.
-- 
1.5.3.rc5
[PATCH 1/17 net-2.6.26] [IPV4]: Remove ifa != NULL check.
This is a callback registered to the inet address notifier chain.
The check is useless as:
- ifa is always != NULL
- similar checks are absent in all other notifiers.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 drivers/net/via-velocity.c |   22 ++++++++++------------
 1 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/net/via-velocity.c b/drivers/net/via-velocity.c
index cc0addb..c50fdee 100644
--- a/drivers/net/via-velocity.c
+++ b/drivers/net/via-velocity.c
@@ -3460,21 +3460,19 @@ static int velocity_resume(struct pci_dev *pdev)
 static int velocity_netdev_event(struct notifier_block *nb, unsigned long notification, void *ptr)
 {
 	struct in_ifaddr *ifa = (struct in_ifaddr *) ptr;
+	struct net_device *dev = ifa->ifa_dev->dev;
+	struct velocity_info *vptr;
+	unsigned long flags;
 
-	if (ifa) {
-		struct net_device *dev = ifa->ifa_dev->dev;
-		struct velocity_info *vptr;
-		unsigned long flags;
-
-		spin_lock_irqsave(&velocity_dev_list_lock, flags);
-		list_for_each_entry(vptr, &velocity_dev_list, list) {
-			if (vptr->dev == dev) {
-				velocity_get_ip(vptr);
-				break;
-			}
+	spin_lock_irqsave(&velocity_dev_list_lock, flags);
+	list_for_each_entry(vptr, &velocity_dev_list, list) {
+		if (vptr->dev == dev) {
+			velocity_get_ip(vptr);
+			break;
 		}
-		spin_unlock_irqrestore(&velocity_dev_list_lock, flags);
 	}
+	spin_unlock_irqrestore(&velocity_dev_list_lock, flags);
+
 	return NOTIFY_DONE;
 }
 
-- 
1.5.3.rc5
Re: tbench regression in 2.6.25-rc1
Zhang, Yanmin wrote:
> On Mon, 2008-02-18 at 12:33 -0500, [EMAIL PROTECTED] wrote:
>> On Mon, 18 Feb 2008 16:12:38 +0800, Zhang, Yanmin said:
>>> I also think __refcnt is the key. I did a new test by adding 2 unsigned longs
>>> of padding before lastuse, so the 3 members are moved to the next cache line.
>>> The performance is recovered.
>>>
>>> How about below patch? Almost all performance is recovered with the new patch.
>>>
>>> Signed-off-by: Zhang Yanmin [EMAIL PROTECTED]
>>
>> Could you add a comment someplace that says refcnt wants to be on a different
>> cache line from input/output/ops or performance tanks badly, to warn some
>> future kernel hacker who starts adding new fields to the structure?
> Ok. Below is the new patch.
>
> 1) Move tclassid under ops in case CONFIG_NET_CLS_ROUTE=y. So sizeof(dst_entry)=200
> no matter if CONFIG_NET_CLS_ROUTE=y/n. I tested many patches on my 16-core tigerton by
> moving tclassid to different places. It looks like tclassid could also have an impact
> on performance. If I move tclassid before metrics, or don't move it at all, the
> performance isn't good. So I moved it behind metrics.
>
> 2) Add comments before __refcnt.
>
> If CONFIG_NET_CLS_ROUTE=y, the result with below patch is about 18% better than
> the one without the patch.
>
> If CONFIG_NET_CLS_ROUTE=n, the result with below patch is about 30% better than
> the one without the patch.
>
> Signed-off-by: Zhang Yanmin [EMAIL PROTECTED]
>
> ---
>
> --- linux-2.6.25-rc1/include/net/dst.h	2008-02-21 14:33:43.0 +0800
> +++ linux-2.6.25-rc1_work/include/net/dst.h	2008-02-22 12:52:19.0 +0800
> @@ -52,15 +52,10 @@ struct dst_entry
>  	unsigned short		header_len;	/* more space at head required */
>  	unsigned short		trailer_len;	/* space to reserve at tail */
>  
> -	u32			metrics[RTAX_MAX];
> -	struct dst_entry	*path;
> -
> -	unsigned long		rate_last;	/* rate limiting for ICMP */
>  	unsigned int		rate_tokens;
> +	unsigned long		rate_last;	/* rate limiting for ICMP */
>  
> -#ifdef CONFIG_NET_CLS_ROUTE
> -	__u32			tclassid;
> -#endif
> +	struct dst_entry	*path;
>  
>  	struct neighbour	*neighbour;
>  	struct hh_cache		*hh;
> @@ -70,10 +65,20 @@ struct dst_entry
>  	int			(*output)(struct sk_buff*);
>  
>  	struct dst_ops		*ops;
> -
> -	unsigned long		lastuse;
> +
> +	u32			metrics[RTAX_MAX];
> +
> +#ifdef CONFIG_NET_CLS_ROUTE
> +	__u32			tclassid;
> +#endif
> +
> +	/*
> +	 * __refcnt wants to be on a different cache line from
> +	 * input/output/ops or performance tanks badly
> +	 */
>  	atomic_t		__refcnt;	/* client references	*/
>  	int			__use;
> +	unsigned long		lastuse;
>  	union {
>  		struct dst_entry *next;
>  		struct rtable    *rt_next;
>

I prefer this patch, but unfortunately your perf numbers are for 64-bit kernels.

Could you please test with a 32-bit one now?

Thank you
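Editorial illustration (not part of the thread): the layout argument can be checked by mapping the reported offsets to cache line indices. The sketch below assumes 64-byte cache lines and a structure that starts on a line boundary; both are assumptions, and the `demo_` helpers are invented names. The offsets in the check function are the 2.6.25-rc1 x86_64 values quoted earlier in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines, structure aligned to a line boundary. */
#define DEMO_CACHE_LINE 64

/* Index of the cache line that holds the byte at `offset`. */
static size_t demo_cache_line_of(size_t offset)
{
	return offset / DEMO_CACHE_LINE;
}

/* Non-zero when both offsets fall in the same cache line. */
static int demo_same_line(size_t a, size_t b)
{
	return demo_cache_line_of(a) == demo_cache_line_of(b);
}

/* Offsets reported in this thread for 2.6.25-rc1 on x86_64. */
static void demo_check_reported_layout(void)
{
	/* lastuse (0xb0), __refcnt (0xb8) and __use (0xbc) share a line */
	assert(demo_same_line(0xb0, 0xb8));
	assert(demo_same_line(0xb8, 0xbc));
	/* next (0xc0) starts the following line */
	assert(!demo_same_line(0xbc, 0xc0));
}
```

With the offsets Eric Dumazet reports for the first version of the patch (__refcnt at 0xb4, lastuse at 0xc0), the same helper places the two members on different lines, which matches his remark about dirtying two cache lines instead of one.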
Re: tbench regression in 2.6.25-rc1
Zhang, Yanmin wrote:
> On Mon, 2008-02-18 at 11:11 +0100, Eric Dumazet wrote:
>> On Mon, 18 Feb 2008 16:12:38 +0800
>> Zhang, Yanmin [EMAIL PROTECTED] wrote:
>>> On Fri, 2008-02-15 at 15:22 -0800, David Miller wrote:
>>>> From: Eric Dumazet [EMAIL PROTECTED]
>>>> Date: Fri, 15 Feb 2008 15:21:48 +0100
>>>>
>>>>> On linux-2.6.25-rc1 x86_64 :
>>>>>
>>>>> offsetof(struct dst_entry, lastuse)=0xb0
>>>>> offsetof(struct dst_entry, __refcnt)=0xb8
>>>>> offsetof(struct dst_entry, __use)=0xbc
>>>>> offsetof(struct dst_entry, next)=0xc0
>>>>>
>>>>> So it should be optimal... I don't know why tbench prefers __refcnt being
>>>>> on 0xc0, since in this case lastuse will be on a different cache line...
>>>>>
>>>>> Each incoming IP packet will need to change lastuse, __refcnt and __use,
>>>>> so keeping them in the same cache line is a win.
>>>>>
>>>>> I suspect then that even this patch could help tbench, since it avoids
>>>>> writing lastuse...
>>>>
>>>> I think your suspicions are right, and even moreso
>>>> it helps to keep __refcnt out of the same cache line
>>>> as input/output/ops which are read-almost-entirely :-)
>>> I think you are right. The issue is these three variables sharing the same
>>> cache line with input/output/ops.
>>>
>>>> I haven't done an exhaustive analysis, but it seems that
>>>> the write traffic to lastuse and __refcnt are about the
>>>> same. However if we find that __refcnt gets hit more
>>>> than lastuse in this workload, it explains the regression.
>>> I also think __refcnt is the key. I did a new test by adding 2 unsigned longs
>>> of padding before lastuse, so the 3 members are moved to the next cache line.
>>> The performance is recovered.
>>>
>>> How about below patch? Almost all performance is recovered with the new patch.
>>>
>>> Signed-off-by: Zhang Yanmin [EMAIL PROTECTED]
>>>
>>> ---
>>>
>>> --- linux-2.6.25-rc1/include/net/dst.h	2008-02-21 14:33:43.0 +0800
>>> +++ linux-2.6.25-rc1_work/include/net/dst.h	2008-02-21 14:36:22.0 +0800
>>> @@ -52,11 +52,10 @@ struct dst_entry
>>>  	unsigned short		header_len;	/* more space at head required */
>>>  	unsigned short		trailer_len;	/* space to reserve at tail */
>>>  
>>> -	u32			metrics[RTAX_MAX];
>>> -	struct dst_entry	*path;
>>> -
>>> -	unsigned long		rate_last;	/* rate limiting for ICMP */
>>>  	unsigned int		rate_tokens;
>>> +	unsigned long		rate_last;	/* rate limiting for ICMP */
>>> +
>>> +	struct dst_entry	*path;
>>>  
>>>  #ifdef CONFIG_NET_CLS_ROUTE
>>>  	__u32			tclassid;
>>> @@ -70,10 +69,12 @@ struct dst_entry
>>>  	int			(*output)(struct sk_buff*);
>>>  
>>>  	struct dst_ops		*ops;
>>> -
>>> -	unsigned long		lastuse;
>>> +
>>> +	u32			metrics[RTAX_MAX];
>>> +
>>>  	atomic_t		__refcnt;	/* client references	*/
>>>  	int			__use;
>>> +	unsigned long		lastuse;
>>>  	union {
>>>  		struct dst_entry *next;
>>>  		struct rtable    *rt_next;
>>>
>>
>> Well, after this patch, we grow dst_entry by 8 bytes :
> With my .config, it doesn't grow. Perhaps because of CONFIG_NET_CLS_ROUTE, which
> I don't enable. I will move tclassid under ops.
>
>> sizeof(struct dst_entry)=0xd0
>> offsetof(struct dst_entry, input)=0x68
>> offsetof(struct dst_entry, output)=0x70
>> offsetof(struct dst_entry, __refcnt)=0xb4
>> offsetof(struct dst_entry, lastuse)=0xc0
>> offsetof(struct dst_entry, __use)=0xb8
>> sizeof(struct rtable)=0x140
>>
>> So we dirty two cache lines instead of one, unless your cpu has 128-byte
>> cache lines ?
>>
>> I am quite surprised that my patch to not change lastuse if already set to
>> jiffies changes nothing...
>>
>> If you have some time, could you also test this (unrelated) patch ?
>>
>> We can avoid dirtying a cache line of the loopback device all the time.
>>
>> diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
>> index f2a6e71..0a4186a 100644
>> --- a/drivers/net/loopback.c
>> +++ b/drivers/net/loopback.c
>> @@ -150,7 +150,10 @@ static int loopback_xmit(struct sk_buff *skb, struct net_device *dev)
>>  		return 0;
>>  	}
>>  #endif
>> -	dev->last_rx = jiffies;
>> +#ifdef CONFIG_SMP
>> +	if (dev->last_rx != jiffies)
>> +#endif
>> +		dev->last_rx = jiffies;
>>  
>>  	/* it's OK to use per_cpu_ptr() because BHs are off */
>>  	pcpu_lstats = netdev_priv(dev);
> Although I didn't test it, I don't think it's ok. The key is __refcnt shares the same
> cache line with ops/input/output.
>

Note that it was unrelated to struct dst; it is about dirtying one cache line of
the 'loopback netdevice'.

I tested it, and the tbench result was better with this patch : 890 MB/s instead
of 870 MB/s on a machine with two dual-core CPUs.

I was curious about the potential gain on your 16-core (4x4) machine.
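Editorial illustration (not part of the thread): the loopback change above follows a simple pattern, read the field first and store only when the value differs, so that repeated updates with the same jiffies value do not write to a line shared between CPUs. A minimal userspace sketch of the pattern follows; `demo_store_if_changed` is an invented helper, and whether the extra compare pays off on a given machine is workload dependent (the only measurement in this thread is the 890 MB/s versus 870 MB/s tbench figure).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Store `now` into *slot only when it differs from the current value.
 * Returns 1 if a store was performed and 0 if the write was skipped.
 */
static int demo_store_if_changed(unsigned long *slot, unsigned long now)
{
	assert(slot != NULL);

	if (*slot == now)
		return 0;	/* value already current: read only, no write */

	*slot = now;
	return 1;
}
```

In the patch the compare is wrapped in #ifdef CONFIG_SMP, so uniprocessor builds keep the unconditional store.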
[PATCH 0/17] Finish IPv4 infrastructure namespacing.
This set finally allows network devices to be manipulated inside a namespace
and configured [via netlink]. 'route' is not yet supported (though the
groundwork is in place), as it requires a socket.

Additionally, better routing cache support is added.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]