Re: [PATCH RFX]: napi_struct V3
On Mon, 2007-07-23 at 22:47 -0700, David Miller wrote: I don't think it's wise to implement this over and over again in each driver, since we already know at least a handfull of drivers will use this. Yep. Alternative is a napi_struct_with_restart, but I don't think it's worth the few-byte savings per queue. Any objections? On the contrary, this looks good. Thanks! Rusty. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] Add new timeval_to_sec function
Varun Chandramohan wrote: Patrick McHardy wrote: Varun Chandramohan wrote: /** + * timeval_to_sec - Convert timeval to seconds + * @tv: pointer to the timeval variable to be converted + * + * Returns the seconds representation of timeval parameter. + */ +static inline time_t timeval_to_sec(const struct timeval *tv) +{ + return (tv-tv_sec + (tv-tv_usec + 50)/100); +} I don't think you should round down timeout values. Can you elaborate on that? As per the RFC of MIB ,we need only seconds granularity. Taking that as the case i dont understand why round down should not be done? When you like to create any timeout based on your calculated value, you might run into the problem that your calculated value is set to _zero_ even if there was some time before the conversion. This might probably not what you indented to get. So what about rounding up with return (tv-tv_sec + (tv-tv_usec + 99)/100); ??? Btw. isn't here already any solution based on ktime conversions? Regards, Oliver - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] Add new timeval_to_sec function
Oliver Hartkopp wrote: Varun Chandramohan wrote: Patrick McHardy wrote: Varun Chandramohan wrote: /** + * timeval_to_sec - Convert timeval to seconds + * @tv: pointer to the timeval variable to be converted + * + * Returns the seconds representation of timeval parameter. + */ +static inline time_t timeval_to_sec(const struct timeval *tv) +{ + return (tv-tv_sec + (tv-tv_usec + 50)/100); +} I don't think you should round down timeout values. Can you elaborate on that? As per the RFC of MIB ,we need only seconds granularity. Taking that as the case i dont understand why round down should not be done? When you like to create any timeout based on your calculated value, you might run into the problem that your calculated value is set to _zero_ even if there was some time before the conversion. This might probably not what you indented to get. So what about rounding up with return (tv-tv_sec + (tv-tv_usec + 99)/100); ??? This can done. Is this what you were ref to me, Patrick? Btw. isn't here already any solution based on ktime conversions? AFAIK there isint any conversion function to secs. Correct me if iam wrong. But we can have a function or macro to do this conversion. Regards, Oliver Regards, Varun - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: specifying scopid's for link-local IPv6 addrs
On Mon, 23 Jul 2007, Rick Jones wrote: Folks - People running netperf have reported that they have trouble with IPv6 under Linux. Specifically, wereas the use of link-local IPv6 addresses just works in netperf under a number of other OSes they do not under Linux. I'm ass-u-me-ing 2.6 here, but not sure exactly which ones - I've seen it on a 2.6.18-based RHEL5. Some poking about and conversation has suggested that one has to set a sin6_scope_id in the sockaddr_in6. This needs to be an index of one of the interfaces in the system, which I presume means walking some additional structures. Is this a requirement which might be expected to remain in the future, or is it something which might just go away? That will have an effect on netperf future development. thanks, rick jones Rick, I don't see any way around this. For example, on one of my test systems, I have the following link local routes: chance% netstat -A inet6 -rn | grep fe80::/64 fe80::/64 :: U 25600 eth0 fe80::/64 :: U 25600 eth2 fe80::/64 :: U 25600 eth3 fe80::/64 :: U 25600 eth4 fe80::/64 :: U 25600 eth5 fe80::/64 :: U 25600 eth6 So if I want to run a link local test to fe80::202:b3ff:fed4:cd1, the system has no way to choose which is the correct interface to use for the test, and will give an error if the interface isn't specified. Here's an example of this with nuttcp: chance% nuttcp -P5100 fe80::202:b3ff:fed4:cd1 nuttcp-t: Info: attempting to switch to deprecated classic mode nuttcp-t: Info: will use less reliable transmitter side statistics nuttcp-t: v5.5.5: Error: connect: Invalid argument errno=22 You must explicitly specify the desired interface. For example, on my test system, the correct interface is eth6 which is interface 8 (lo eth0 eth1 eth2 ... eth5 eth6). Here is an example nuttcp test specifying interface 8: chance% nuttcp -P5100 fe80::202:b3ff:fed4:cd1%8 1178.5809 MB / 10.02 sec = 986.2728 Mbps 12 %TX 15 %RX nuttcp uses getaddrinfo() which parses the %ifindex field, and then copies the sin6_scope_id from the res structure to the server's sockaddr_in6 structure before initiating the connect(). -Bill - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.20-2.6.21 - networking dies after random time
On Mon, Jul 23, 2007 at 07:44:58AM +0200, Marcin Ślusarz wrote: Ok, I've bisected this problem and found that this patch broke my NIC: 76d2160147f43f982dfe881404cfde9fd0a9da21 is first bad commit commit 76d2160147f43f982dfe881404cfde9fd0a9da21 Author: Ingo Molnar [EMAIL PROTECTED] Date: Fri Feb 16 01:28:24 2007 -0800 [PATCH] genirq: do not mask interrupts by default Never mask interrupts immediately upon request. Disabling interrupts in high-performance codepaths is rare, and on the other hand this change could recover lost edges (or even other types of lost interrupts) by conservatively only masking interrupts after they happen. (NOTE: with this change the highlevel irq-disable code still soft-disables this IRQ line - and if such an interrupt happens then the IRQ flow handler keeps the IRQ masked.) Mark i8529A controllers as 'never loses an edge'. Signed-off-by: Ingo Molnar [EMAIL PROTECTED] Cc: Thomas Gleixner [EMAIL PROTECTED] Signed-off-by: Andrew Morton [EMAIL PROTECTED] Signed-off-by: Linus Torvalds [EMAIL PROTECTED] So, it seems nobody (except the users) cares... BTW, maybe there should be created something like Network Cards Producers Made Rich on Unnecessary Changed Cards Linux Foundation?: On Fri, Jun 29, 2007 at 10:50:20AM +0200, Jean-Baptiste Vignaud wrote: ... 2) changed the 3com cards i replaced by two cards, 01:06.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 42) 01:07.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8029(AS) reinstalled and stressed the network (small download from a laptop) and : Jun 29 09:34:10 loki kernel: NETDEV WATCHDOG: eth0: transmit timed out Jun 29 09:34:51 loki last message repeated 14 times Jun 29 09:35:18 loki last message repeated 8 times ...Of course, no response of any serious developer for this as well. BTW #2: I wonder how true is this (after above-mentioned patch): From include/linux/irq.h: /** * struct irq_chip - hardware interrupt chip descriptor ... * @disable: disable the interrupt (defaults to chip-mask if NULL) Regards, Jarek P. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFX]: napi_struct V3
David Miller [EMAIL PROTECTED] : [...] I also didn't play with turning off NAPI in kconfig where drivers allow that, can we just get rid of that crap already? :-/ Would it be accepted for 2.6.23 or must it be considered post-2.6.23 ? -- Ueimor - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] Initialize and fill IPv6 route age
On Tue, 24 Jul 2007 09:50:57 +0530 Varun Chandramohan [EMAIL PROTECTED] wrote: Stephen Hemminger wrote: On Mon, 23 Jul 2007 10:13:18 +0530 Varun Chandramohan [EMAIL PROTECTED] wrote: The age field of the ipv6 route structures are initilized with the current timeval at the time of route creation. When the route dump is called the route age value stored in the structure is subtracted from the present timeval and the difference is passed on as the route age. Signed-off-by: Varun Chandramohan [EMAIL PROTECTED] --- include/net/ip6_fib.h |1 + include/net/ip6_route.h |3 +++ net/ipv6/addrconf.c |5 + net/ipv6/route.c| 23 +++ 4 files changed, 28 insertions(+), 4 deletions(-) diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h index c48ea87..e30a1cf 100644 --- a/include/net/ip6_fib.h +++ b/include/net/ip6_fib.h @@ -98,6 +98,7 @@ struct rt6_info u32 rt6i_flags; u32 rt6i_metric; + time_t rt6i_age; atomic_trt6i_ref; struct fib6_table *rt6i_table; diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h index 5456fdd..fc9716c 100644 --- a/include/net/ip6_route.h +++ b/include/net/ip6_route.h @@ -36,6 +36,9 @@ struct route_info { #define RT6_LOOKUP_F_REACHABLE0x2 #define RT6_LOOKUP_F_HAS_SADDR0x4 +#define RT6_SET_ROUTE_INFO 0x0 +#define RT6_GET_ROUTE_INFO 0x1 + extern struct rt6_infoip6_null_entry; #ifdef CONFIG_IPV6_MULTIPLE_TABLES diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 5a5f8bd..715c766 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -4187,6 +4187,7 @@ EXPORT_SYMBOL(unregister_inet6addr_notif int __init addrconf_init(void) { + struct timeval tv; int err = 0; /* The addrconf netdev notifier requires that loopback_dev @@ -4214,10 +4215,14 @@ int __init addrconf_init(void) if (err) return err; + do_gettimeofday(tv); Better to use ktime_t or timespec in new code. You are saying not to use timeval as its going to be removed sometime in future? If not, may i know why should we use timespec or ktime? I need only seconds granularity so i was wondering if that matters. Timeval isn't going away, but ktime is easier for basic maths. If all you need is seconds resolution, just use jiffies. Gettimeofday (and ktime_get) both require expensive hardware accesses on some platforms. For the specific case IPV6 route ageing, this isn't a big deal it is just that people tend to copy code. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: tg3 issues
Michael Chan wrote: Last update on this. udelay( 100 + net_random % 300 ) seems to work much better and i have not had a single problem getting the link up within 10 seconds of a cold or warm-boot, and most often the link comes up directly without any sort of delay instead like before when it could hang for 30 seconds before getting a link, if you even got a link. We'll have to do some testing to see if we can find a better solution. Adding up to 400 usec of busy wait is not ideal. Are you connecting two 5701 fiber cards directly to each other in your setup? Hi, Yea, it's and ugly hack, but it's a workaround that at least works for me. Have done some additional tests and to me it seems that the driver just needs to wait a bit longer to detect the link + some random time to get around the issues it might have when chasing the other systems state. Don't have the numbers in front of me, but i did some tests to get the 'optimum' delay to wait and it seems like the larger the wait time (even up to around 40-50ms!) causes the autoneg to go much smoother and faster. And remember that the link can be reported up, but no traffic can be passed via the link before the autoneg is complete and both cards reports back that flowcontrol is enabled so you need to verify the link by trying to send some data and not just look at the link-status. Hope my conclusions might help, or atleast point you in the right direction. I have 2 01:08.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15) ( 14e4:1645 ) connected via a single FC cable. And the two systems are one single-core and one dual-core AMD system. Regards, Patric - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFX]: napi_struct V3
Francois Romieu wrote: David Miller [EMAIL PROTECTED] : [...] I also didn't play with turning off NAPI in kconfig where drivers allow that, can we just get rid of that crap already? :-/ Would it be accepted for 2.6.23 or must it be considered post-2.6.23 ? The merge window is closed, so not 2.6.23. The cases where Kconfig allows selection of NAPI-or-not is generally either (a) the author considers one path new and relatively unproven or (b) the author has not bothered to figure out which is the best choice, and instead just offers both rather than doing the legwork to find out. Either way, individual cases must be examined and it is not necessarily just a cleanup. Jeff - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFX]: napi_struct V3
From: Francois Romieu [EMAIL PROTECTED] Date: Tue, 24 Jul 2007 09:12:01 +0200 David Miller [EMAIL PROTECTED] : [...] I also didn't play with turning off NAPI in kconfig where drivers allow that, can we just get rid of that crap already? :-/ Would it be accepted for 2.6.23 or must it be considered post-2.6.23 ? The 2.6.23 merge window closed already, so this is of course 2.6.24 material. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.20-2.6.21 - networking dies after random time
* Marcin Ślusarz [EMAIL PROTECTED] wrote: Ok, I've bisected this problem and found that this patch broke my NIC: 76d2160147f43f982dfe881404cfde9fd0a9da21 is first bad commit commit 76d2160147f43f982dfe881404cfde9fd0a9da21 Author: Ingo Molnar [EMAIL PROTECTED] Date: Fri Feb 16 01:28:24 2007 -0800 [PATCH] genirq: do not mask interrupts by default thanks for tracking it down! Could you try the patch below (ontop an otherwise unmodified kernel)? This tests the theory whether the problem is related to the disable_irq_nosync() call in the ne2k driver's xmit path. Does this solve the hangs too? Ingo Index: linux/kernel/irq/manage.c === --- linux.orig/kernel/irq/manage.c +++ linux/kernel/irq/manage.c @@ -102,7 +102,19 @@ void disable_irq_nosync(unsigned int irq spin_lock_irqsave(desc-lock, flags); if (!desc-depth++) { desc-status |= IRQ_DISABLED; - desc-chip-disable(irq); + /* +* the _nosync variant of irq-disable suggests that the +* caller is not worried about concurrency but about the +* ordering of the irq flow itself. (such as hardware +* getting confused about certain, normally valid irq +* handling sequences.) So if the default disable handler +* is in place then try the more conservative masking +* instead: +*/ + if (desc-chip-disable == default_disable desc-chip-mask) + desc-chip-mask(irq); + else + desc-chip-disable(irq); } spin_unlock_irqrestore(desc-lock, flags); } - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
On Mon, Jul 23 2007, Andrew Morton wrote: I worked out that the crash I saw was in BUG_ON(!pte_none(*(kmap_pte-idx))); in the read of kmap_pte[idx]. Which would be weird as the caller is using a literal KM_USER0. So maybe I goofed, and that BUG_ON is triggering (it scrolled off, and I am unable to reproduce it now). If that BUG_ON _is_ triggering then it might indicate that someone is doing a __GFP_HIGHMEM|__GFP_ZERO allocation while holding KM_USER0. Or doing double kunmaps, or doing a kunmap_atomic() on the page, not the address. I've seen both of those end up triggering that BUG_ON() in a later kmap. Looking over the 2.6.22..2.6.23-rc1 diff, I found one such error in ocfs2 at least. But you are probably not using that, so I'll keep looking... --- [PATCH] ocfs2: bad kunmap_atomic() kunmap_atomic() takes the virtual address, not the mapped page as argument. Signed-off-by: Jens Axboe [EMAIL PROTECTED] diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c index 5727cd1..c4034f6 100644 --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -2153,7 +2153,7 @@ static int ocfs2_splice_write_actor(struct pipe_inode_info *pipe, src = buf-ops-map(pipe, buf, 1); dst = kmap_atomic(page, KM_USER1); memcpy(dst + offset, src + buf-offset, count); - kunmap_atomic(page, KM_USER1); + kunmap_atomic(dst, KM_USER1); buf-ops-unmap(pipe, buf, src); copied = ocfs2_write_end(file, file-f_mapping, sd-pos, count, count, -- Jens Axboe - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] possible deadlock in tulip driver
Calling flush_scheduled_work() may deadlock if called under rtnl_lock (from dev-stop) as linkwatch_event() may be on the workqueue and it will try to get the rtnl_lock Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] --- tulip_core.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- ./drivers/net/tulip/tulip_core.c.tulip 2007-07-16 12:54:29.0 +0400 +++ ./drivers/net/tulip/tulip_core.c2007-07-23 19:06:24.0 +0400 @@ -726,8 +726,6 @@ static void tulip_down (struct net_devic void __iomem *ioaddr = tp-base_addr; unsigned long flags; - flush_scheduled_work(); - del_timer_sync (tp-timer); #ifdef CONFIG_TULIP_NAPI del_timer_sync (tp-oom_timer); @@ -1788,6 +1786,8 @@ static void __devexit tulip_remove_one ( if (!dev) return; + flush_scheduled_work(); + tp = netdev_priv(dev); unregister_netdev(dev); pci_free_consistent (pdev, - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
On Tue, Jul 24 2007, Jens Axboe wrote: On Mon, Jul 23 2007, Andrew Morton wrote: I worked out that the crash I saw was in BUG_ON(!pte_none(*(kmap_pte-idx))); in the read of kmap_pte[idx]. Which would be weird as the caller is using a literal KM_USER0. So maybe I goofed, and that BUG_ON is triggering (it scrolled off, and I am unable to reproduce it now). If that BUG_ON _is_ triggering then it might indicate that someone is doing a __GFP_HIGHMEM|__GFP_ZERO allocation while holding KM_USER0. Or doing double kunmaps, or doing a kunmap_atomic() on the page, not the address. I've seen both of those end up triggering that BUG_ON() in a later kmap. Looking over the 2.6.22..2.6.23-rc1 diff, I found one such error in ocfs2 at least. But you are probably not using that, so I'll keep looking... What about the new async crypto stuff? I've been looking, but is it guarenteed that async_memcpy() runs in process context with interrupts enabled always? If not, there's a km type bug there. In general, I think the highmem stuff could do with more safety checks: - People ALWAYS get the atomic unmaps wrong, passing in the page instead of the address. I've seen tons of these. And since kunmap_atomic() takes a void pointer, nobody notices until it goes boom. - People easily get the km type wrong - they use KM_USERx in interrupt context, or one of the irq variants without disabling interrupts. If we could just catch these two types of bugs, we've got a lot of these problems covered. -- Jens Axboe - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
On Tue, 24 Jul 2007 10:22:07 +0200 Jens Axboe [EMAIL PROTECTED] wrote: On Tue, Jul 24 2007, Jens Axboe wrote: On Mon, Jul 23 2007, Andrew Morton wrote: I worked out that the crash I saw was in BUG_ON(!pte_none(*(kmap_pte-idx))); in the read of kmap_pte[idx]. Which would be weird as the caller is using a literal KM_USER0. So maybe I goofed, and that BUG_ON is triggering (it scrolled off, and I am unable to reproduce it now). If that BUG_ON _is_ triggering then it might indicate that someone is doing a __GFP_HIGHMEM|__GFP_ZERO allocation while holding KM_USER0. Or doing double kunmaps, or doing a kunmap_atomic() on the page, not the address. I've seen both of those end up triggering that BUG_ON() in a later kmap. Looking over the 2.6.22..2.6.23-rc1 diff, I found one such error in ocfs2 at least. But you are probably not using that, so I'll keep looking... What about the new async crypto stuff? I've been looking, but is it guarenteed that async_memcpy() runs in process context with interrupts enabled always? If not, there's a km type bug there. I think Shannon maintains that now. In general, I think the highmem stuff could do with more safety checks: - People ALWAYS get the atomic unmaps wrong, passing in the page instead of the address. I've seen tons of these. And since kunmap_atomic() takes a void pointer, nobody notices until it goes boom. yeah, it's a real trap. For a while I had a patch which converted kmap_atomic() to return a char*, and kunmap_atomic() to take a char*, so misuse got compile warnings. But it was a pig to maintain so I tossed it. It'd be somewhat easier to do now we've converted a lot of callers to clear_user_highpage() and similar. - People easily get the km type wrong - they use KM_USERx in interrupt context, or one of the irq variants without disabling interrupts. If we could just catch these two types of bugs, we've got a lot of these problems covered. Here's the -mm debug patch: diff -puN arch/i386/mm/highmem.c~kmap_atomic-debugging arch/i386/mm/highmem.c --- a/arch/i386/mm/highmem.c~kmap_atomic-debugging +++ a/arch/i386/mm/highmem.c @@ -30,7 +30,44 @@ void *kmap_atomic(struct page *page, enu { enum fixed_addresses idx; unsigned long vaddr; + static unsigned warn_count = 10; + if (unlikely(warn_count == 0)) + goto skip; + + if (unlikely(in_interrupt())) { + if (in_irq()) { + if (type != KM_IRQ0 type != KM_IRQ1 + type != KM_BIO_SRC_IRQ type != KM_BIO_DST_IRQ + type != KM_BOUNCE_READ) { + WARN_ON(1); + warn_count--; + } + } else if (!irqs_disabled()) { /* softirq */ + if (type != KM_IRQ0 type != KM_IRQ1 + type != KM_SOFTIRQ0 type != KM_SOFTIRQ1 + type != KM_SKB_SUNRPC_DATA + type != KM_SKB_DATA_SOFTIRQ + type != KM_BOUNCE_READ) { + WARN_ON(1); + warn_count--; + } + } + } + + if (type == KM_IRQ0 || type == KM_IRQ1 || type == KM_BOUNCE_READ || + type == KM_BIO_SRC_IRQ || type == KM_BIO_DST_IRQ) { + if (!irqs_disabled()) { + WARN_ON(1); + warn_count--; + } + } else if (type == KM_SOFTIRQ0 || type == KM_SOFTIRQ1) { + if (irq_count() == 0 !irqs_disabled()) { + WARN_ON(1); + warn_count--; + } + } +skip: /* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */ pagefault_disable(); _ - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-rc1 sky2 boot crash in sky2_mac_intr
On Tue, 24 Jul 2007 10:22:05 +0200 Florian Lohoff [EMAIL PROTECTED] wrote: Hi, i am seeing irregular crashes on boot in the sky2_mac_intr. This is an Fujitsu Siemens Lifebook E8110 with a Core Duo. Currently i suspect some strange BIOS issues as the issues i see with the sky2 aka parity errors etc i also see sometimes with the integrated ipw3945 which complains about firmware errors. It seems the BIOS randomly fails to initialize all the hardware. To reproduce this crash and catch it on the serial console it took me around 12 boots. The machine is stable once correctly booted and i work most of the day on it. [ 46.479939] ACPI: PCI Interrupt :02:00.0[A] - GSI 16 (level, low) - IRQ 19 [ 46.568569] sky2 :02:00.0: v1.16 addr 0xf000 irq 19 Yukon-EC Ultra (0xb4) rev 2 [ 46.664555] sky2 eth0: addr 00:17:42:13:45:8c [ 61.958741] sky2 eth1: enabling interface [ 62.010834] sky2 eth1: phy write timeout The problem is related to power management. The PHY has a number of PCI configuration registers for power control, and the function of these changes based on the version and revision of the chip. The driver does work on older versions of the EC-U, in Fujitsu laptop's, it is just the new rev that is broken. The driver should probably fail smarter (by not loading) if the PHY isn't powered up correctly, but that doesn't help your problem. The vendor has provided me with documentation on many versions of the chip, but I don't have doc's on the lastest revision differences of the EC Ultra, so a proper solution is not easily available. The best method for resolving this would be to first try the vendor driver version of sk98lin and see if that fixes it. If so, then it is easy to change sky2, to match the phy setup in the vendor driver. Another possibility is to look for places in sky2 driver where there are places that compare version/revision. The most likely bits that need to change are in PCI registers: 0x80, 0x84 and 0x88 You could also load the windows driver and dump PCI config space (with lspci from cygwin), and see what the settings are there. I am away from my office for a month, and therefore away from any sky2 hardware for testing. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Stalled connection (need help to debug)
On 18-07-2007 20:50, Oleg Verych wrote: Hallo. I have a very strange problem. [] Any advise on how to debug this will be very appreciated. Thanks. Thanks... You're quite welcome... Cheers, Jarek P. PS: Oleg, no offence, but are you sure this was the right problem and the right list? - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] Initialize and fill IPv6 route age
Stephen Hemminger wrote: On Tue, 24 Jul 2007 09:50:57 +0530 Varun Chandramohan [EMAIL PROTECTED] wrote: Stephen Hemminger wrote: On Mon, 23 Jul 2007 10:13:18 +0530 Varun Chandramohan [EMAIL PROTECTED] wrote: The age field of the ipv6 route structures are initilized with the current timeval at the time of route creation. When the route dump is called the route age value stored in the structure is subtracted from the present timeval and the difference is passed on as the route age. Signed-off-by: Varun Chandramohan [EMAIL PROTECTED] --- include/net/ip6_fib.h |1 + include/net/ip6_route.h |3 +++ net/ipv6/addrconf.c |5 + net/ipv6/route.c| 23 +++ 4 files changed, 28 insertions(+), 4 deletions(-) diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h index c48ea87..e30a1cf 100644 --- a/include/net/ip6_fib.h +++ b/include/net/ip6_fib.h @@ -98,6 +98,7 @@ struct rt6_info u32 rt6i_flags; u32 rt6i_metric; + time_t rt6i_age; atomic_trt6i_ref; struct fib6_table *rt6i_table; diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h index 5456fdd..fc9716c 100644 --- a/include/net/ip6_route.h +++ b/include/net/ip6_route.h @@ -36,6 +36,9 @@ struct route_info { #define RT6_LOOKUP_F_REACHABLE0x2 #define RT6_LOOKUP_F_HAS_SADDR0x4 +#define RT6_SET_ROUTE_INFO 0x0 +#define RT6_GET_ROUTE_INFO 0x1 + extern struct rt6_infoip6_null_entry; #ifdef CONFIG_IPV6_MULTIPLE_TABLES diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 5a5f8bd..715c766 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -4187,6 +4187,7 @@ EXPORT_SYMBOL(unregister_inet6addr_notif int __init addrconf_init(void) { + struct timeval tv; int err = 0; /* The addrconf netdev notifier requires that loopback_dev @@ -4214,10 +4215,14 @@ int __init addrconf_init(void) if (err) return err; + do_gettimeofday(tv); Better to use ktime_t or timespec in new code. You are saying not to use timeval as its going to be removed sometime in future? If not, may i know why should we use timespec or ktime? I need only seconds granularity so i was wondering if that matters. Timeval isn't going away, but ktime is easier for basic maths. If all you need is seconds resolution, just use jiffies. Gettimeofday Thanks stephen for the review. As far as using jiffies (which is my original implementation) i got a comment saying that on 32bit platform jiffies wrap every 49 days. That's the reason for the do_gettimeofday() implementation. However i understand that such calls require expensive h/w accesses. Considering the fact that jiffies wrap around rather quickly, can you suggest any other way of doing it? (and ktime_get) both require expensive hardware accesses on some platforms. For the specific case IPV6 route ageing, this isn't a big deal it is just that people tend to copy code. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GENETLINK]: Question: global lock (genl_mutex) possible refinement?
* Richard MUSIL [EMAIL PROTECTED] 2007-07-23 18:45 I have been giving it a second thought and came up with something more complex. The idea is to have locking granularity at the level of individual families. I agree in general, it would make up a better solution. However, your initial patch allows operations and families to be unregistered while message of the same family are being processed which must not be allowed. Please provide a new overall patch which is not based on your initial patch so I can review your idea properly. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.20-2.6.21 - networking dies after random time
* Ingo Molnar [EMAIL PROTECTED] wrote: thanks for tracking it down! Could you try the patch below (ontop an otherwise unmodified kernel)? This tests the theory whether the problem is related to the disable_irq_nosync() call in the ne2k driver's xmit path. Does this solve the hangs too? please try the patch below instead. Ingo Index: linux/kernel/irq/chip.c === --- linux.orig/kernel/irq/chip.c +++ linux/kernel/irq/chip.c @@ -231,7 +231,7 @@ static void default_enable(unsigned int /* * default disable function */ -static void default_disable(unsigned int irq) +void default_disable(unsigned int irq) { } Index: linux/kernel/irq/internals.h === --- linux.orig/kernel/irq/internals.h +++ linux/kernel/irq/internals.h @@ -10,6 +10,8 @@ extern void irq_chip_set_defaults(struct /* Set default handler: */ extern void compat_irq_chip_set_default_handler(struct irq_desc *desc); +extern void default_disable(unsigned int irq); + #ifdef CONFIG_PROC_FS extern void register_irq_proc(unsigned int irq); extern void register_handler_proc(unsigned int irq, struct irqaction *action); Index: linux/kernel/irq/manage.c === --- linux.orig/kernel/irq/manage.c +++ linux/kernel/irq/manage.c @@ -102,7 +102,19 @@ void disable_irq_nosync(unsigned int irq spin_lock_irqsave(desc-lock, flags); if (!desc-depth++) { desc-status |= IRQ_DISABLED; - desc-chip-disable(irq); + /* +* the _nosync variant of irq-disable suggests that the +* caller is not worried about concurrency but about the +* ordering of the irq flow itself. (such as hardware +* getting confused about certain, normally valid irq +* handling sequences.) So if the default disable handler +* is in place then try the more conservative masking +* instead: +*/ + if (desc-chip-disable == default_disable desc-chip-mask) + desc-chip-mask(irq); + else + desc-chip-disable(irq); } spin_unlock_irqrestore(desc-lock, flags); } - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[GENETLINK]: Correctly report errors while registering a multicast group
Signed-off-by: Thomas Graf [EMAIL PROTECTED] Index: net-2.6/net/netlink/genetlink.c === --- net-2.6.orig/net/netlink/genetlink.c2007-07-23 21:54:35.0 +0200 +++ net-2.6/net/netlink/genetlink.c 2007-07-23 21:54:54.0 +0200 @@ -196,7 +196,7 @@ int genl_register_mc_group(struct genl_f genl_ctrl_event(CTRL_CMD_NEWMCAST_GRP, grp); out: genl_unlock(); - return 0; + return err; } EXPORT_SYMBOL(genl_register_mc_group); - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[GENETLINK]: Fix adjustment of number of multicast groups
The current calculation of the maximum number of genetlink multicast groups seems odd, fix it. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Index: net-2.6/net/netlink/genetlink.c === --- net-2.6.orig/net/netlink/genetlink.c2007-07-23 22:03:02.0 +0200 +++ net-2.6/net/netlink/genetlink.c 2007-07-23 22:05:12.0 +0200 @@ -184,7 +184,7 @@ int genl_register_mc_group(struct genl_f } err = netlink_change_ngroups(genl_sock, -sizeof(unsigned long) * NETLINK_GENERIC); +mc_groups_longs * BITS_PER_LONG); if (err) goto out; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[GENETLINK]: Fix race in genl_unregister_mc_groups()
family-mcast_groups is protected by genl_lock so it must be held while accessing the list in genl_unregister_mc_groups(). Requires adding a non-locking variant of genl_unregister_mc_group(). Signed-off-by: Thomas Graf [EMAIL PROTECTED] Index: net-2.6/net/netlink/genetlink.c === --- net-2.6.orig/net/netlink/genetlink.c2007-07-23 22:08:04.0 +0200 +++ net-2.6/net/netlink/genetlink.c 2007-07-23 22:09:08.0 +0200 @@ -200,6 +200,18 @@ int genl_register_mc_group(struct genl_f } EXPORT_SYMBOL(genl_register_mc_group); +static void __genl_unregister_mc_group(struct genl_family *family, + struct genl_multicast_group *grp) +{ + BUG_ON(grp-family != family); + netlink_clear_multicast_users(genl_sock, grp-id); + clear_bit(grp-id, mc_groups); + list_del(grp-list); + genl_ctrl_event(CTRL_CMD_DELMCAST_GRP, grp); + grp-id = 0; + grp-family = NULL; +} + /** * genl_unregister_mc_group - unregister a multicast group * @@ -217,14 +229,8 @@ EXPORT_SYMBOL(genl_register_mc_group); void genl_unregister_mc_group(struct genl_family *family, struct genl_multicast_group *grp) { - BUG_ON(grp-family != family); genl_lock(); - netlink_clear_multicast_users(genl_sock, grp-id); - clear_bit(grp-id, mc_groups); - list_del(grp-list); - genl_ctrl_event(CTRL_CMD_DELMCAST_GRP, grp); - grp-id = 0; - grp-family = NULL; + genl_unregister_mc_group(family, grp); genl_unlock(); } @@ -232,8 +238,10 @@ static void genl_unregister_mc_groups(st { struct genl_multicast_group *grp, *tmp; + genl_lock(); list_for_each_entry_safe(grp, tmp, family-mcast_groups, list) - genl_unregister_mc_group(family, grp); + __genl_unregister_mc_group(family, grp); + genl_unlock(); } /** - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Au1000 eth : fix ioctl handling
Hi all, This patch fixes the handling of unsupported ioctls with the au1000_eth driver. Signed-off-by: Florian Fainelli [EMAIL PROTECTED] -- diff --git a/drivers/net/au1000_eth.c b/drivers/net/au1000_eth.c index c27cfce..99a1c61 100644 --- a/drivers/net/au1000_eth.c +++ b/drivers/net/au1000_eth.c @@ -1316,12 +1316,20 @@ static void set_rx_mode(struct net_device *dev) static int au1000_ioctl(struct net_device *dev, struct ifreq *rq, int cmd) { struct au1000_private *aup = (struct au1000_private *)dev-priv; + struct mii_ioctl_data *data = if_mii(rq); + int rc = -EOPNOTSUPP; if (!netif_running(dev)) return -EINVAL; if (!aup-phy_dev) return -EINVAL; // PHY not controllable - return phy_mii_ioctl(aup-phy_dev, if_mii(rq), cmd); + switch (cmd) { + default: + rc = phy_mii_ioctl(aup-phy_dev, data, cmd); + break; + } + + return rc; } static struct net_device_stats *au1000_get_stats(struct net_device *dev) signature.asc Description: This is a digitally signed message part.
Re: 2.6.23-rc1 sky2 boot crash in sky2_mac_intr
On Tue, Jul 24, 2007 at 09:50:08AM +0100, Stephen Hemminger wrote: The problem is related to power management. The PHY has a number of PCI configuration registers for power control, and the function of these changes based on the version and revision of the chip. The driver does work on older versions of the EC-U, in Fujitsu laptop's, it is just the new rev that is broken. The driver should probably fail smarter (by not loading) if the PHY isn't powered up correctly, but that doesn't help your problem. The vendor has provided me with documentation on many versions of the chip, but I don't have doc's on the lastest revision differences of the EC Ultra, so a proper solution is not easily available. The best method for resolving this would be to first try the vendor driver version of sk98lin and see if that fixes it. If so, then it is easy to change sky2, to match the phy setup in the vendor driver. Another possibility is to look for places in sky2 driver where there are places that compare version/revision. The most likely bits that need to change are in PCI registers: 0x80, 0x84 and 0x88 You could also load the windows driver and dump PCI config space (with lspci from cygwin), and see what the settings are there. I am away from my office for a month, and therefore away from any sky2 hardware for testing. I'll try the above and keep you posted. The crash itself seems to be a 2.6.23-rc1 regression though. I never experienced this with 2.6.22-rc5 which i was running before. Flo -- Florian Lohoff [EMAIL PROTECTED] +49-171-2280134 Those who would give up a little freedom to get a little security shall soon have neither - Benjamin Franklin signature.asc Description: Digital signature
Re: Races in net_rx_action vs netpoll?
On 22-07-2007 09:05, David Miller wrote: From: Stephen Hemminger [EMAIL PROTECTED] Date: Thu, 19 Jul 2007 17:27:47 +0100 Please revisit the requirements that netconsole needs and redesign it from scratch. The existing code is causing too much breakage. Can it be done without breaking the semantics of network devices, or should we rewrite the driver interface to take have a different interface like netdev_sync_send_skb() that is slow, synchronous and non-interrupt (ie polls for completion). Of course, then people will complain that netconsole traffic slows the machine down. for completion. I couldn't agree more. Since netpoll runs outside of all of the normal netdevice locking rules, only the people using netpoll hit all the bugs. That means most of us do not test out these code path, which is bad. So, if anything, a more integrated implementation is essential. But, IMHO, until this will be done some simpler measures could do be done to make poll_napi less dangerous (as a matter of fact I wonder why oopses observed diagnosed by Olaf are so rare). Current locking with netpoll_poll_lock is mainly misleading. It seems somebody who planned napi didn't even think such helpers as poll_napi are possible on other cpus and somebody doing netpoll didn't want to show this all per cpu data needs full locking anyway. But since it's like this there is no reason to invent a wheel again and normal locking should be done: so global (not per device) spin_lock (#ifdef CONFIG_NETPOLL only) held during all net_rx_action and spin_trylock similarly in poll_napi (with STATE_RX_SCHED re-checking) should be minimum needed here. Regards, Jarek P. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
On Mon, 2007-07-23 at 13:24 -0700, Andrew Morton wrote: You're using DEBUG_PAGEALLOC, but I was not, so I think we can rule that out. My box bugged during boot the first time I booted 23-rc1, but nothing made it to the console, and I didn't have a serial console running. I didn't have DEBUG_PAGEALLOC or friends set. I haven't worked out where that kmap_atomic() call is coming from yet. Both traces point up into the page allocator, but I _think_ that's stack gunk. I just enabled all debug options, and was just rewarded with the below. [ 119.079531] eth1: link up, 100Mbps, full-duplex, lpa 0x45E1 [ 119.558867] [ cut here ] [ 119.572197] kernel BUG at arch/i386/mm/highmem.c:38! [ 119.585804] invalid opcode: [#1] [ 119.598013] PREEMPT SMP DEBUG_PAGEALLOC [ 119.610103] Modules linked in: edd button battery ac ip6t_REJECT xt_tcpudp ipt_REJECT xt_state iptable_mangle iptable_nat nf_nat iptable_filter ip6table_mangle nf_conntrack_ipv4 nf_conntrack nfnetlink ip_tables ip6table_filter ip6_tables x_tables nls_iso8859_1 nls_cp437 nls_utf8 snd_intel8x0 snd_ac97_codec ac97_bus snd_mpu401 snd_pcm prism54 snd_timer snd_mpu401_uart snd_rawmidi snd_seq_device snd intel_agp agpgart soundcore snd_page_alloc i2c_i801 fan thermal processor [ 119.698063] CPU:1 [ 119.698065] EIP:0060:[c011cd2d]Not tainted VLI [ 119.698067] EFLAGS: 00010006 (2.6.23-rc1-smp #75) [ 119.736358] EIP is at kmap_atomic_prot+0xa7/0xab [ 119.749647] eax: 3d07f163 ebx: c166db80 ecx: c0750e60 edx: 0007 [ 119.765417] esi: 0022 edi: 0163 ebp: c069dcd4 esp: c069dcc8 [ 119.781273] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068 [ 119.796378] Process udevd (pid: 4775, ti=c069d000 task=f31aea60 task.ti=f477d000) [ 119.804068] Stack: c166db80 c166db80 c069dcdc c011cd3f c069dd40 c015b6e0 0001 [ 119.822272]0044 0163 0001 c165f4e0 0001 c165f4e0 0001 [ 119.840762] 00028020 c061e71c c166db80 0046 0080 0001 c011e4de [ 119.859389] Call Trace: [ 119.881302] [c0105144] show_trace_log_lvl+0x1a/0x30 [ 119.896319] [c01051ff] show_stack_log_lvl+0xa5/0xca [ 119.911171] [c0105420] show_registers+0x1fc/0x343 [ 119.925756] [c0105689] die+0x122/0x249 [ 119.939241] [c0105834] do_trap+0x84/0xad [ 119.952897] [c0105b1c] do_invalid_op+0x88/0x92 [ 119.967118] [c04cf3c2] error_code+0x72/0x78 [ 119.980948] [c011cd3f] kmap_atomic+0xe/0x10 [ 119.994642] [c015b6e0] get_page_from_freelist+0x39e/0x45e [ 120.009485] [c015b7fb] __alloc_pages+0x5b/0x2db [ 120.023342] [c0172872] cache_alloc_refill+0x380/0x6f2 [ 120.037623] [c0172e7a] kmem_cache_alloc+0xa1/0xa5 [ 120.051426] [c03fb397] neigh_create+0x5f/0x506 [ 120.064894] [c046e25d] ndisc_dst_alloc+0x122/0x151 [ 120.078769] [c0471b0b] __ndisc_send+0x8d/0x4fa [ 120.092340] [c0472915] ndisc_send_ns+0x5f/0x7d [ 120.105848] [c0469ff5] addrconf_dad_timer+0xdb/0xe0 [ 120.119758] [c012f8a0] run_timer_softirq+0x130/0x191 [ 120.133717] [c012c06d] __do_softirq+0x76/0xe4 [ 120.147475] [c0106b48] do_softirq+0x63/0xac [ 120.147488] [c012bff5] (gdb) list *neigh_create+0x5f 0xc03fb397 is in neigh_create (include/linux/slab.h:259). 254 /* 255 * Shortcuts 256 */ 257 static inline void *kmem_cache_zalloc(struct kmem_cache *k, gfp_t flags) 258 { 259 return kmem_cache_alloc(k, flags | __GFP_ZERO); 260 } 261 262 /** 263 * kzalloc - allocate memory. The memory is set to zero. (gdb) list *kmem_cache_alloc+0xa1 0xc0172e7a is in kmem_cache_alloc (mm/slab.c:3176). 3171STATS_INC_ALLOCHIT(cachep); 3172ac-touched = 1; 3173objp = ac-entry[--ac-avail]; 3174} else { 3175STATS_INC_ALLOCMISS(cachep); 3176objp = cache_alloc_refill(cachep, flags); 3177} 3178return objp; 3179} 3180 (gdb) list *cache_alloc_refill+0x380 0xc0172872 is in cache_alloc_refill (include/linux/gfp.h:154). 149 150 /* Unknown node is current node */ 151 if (nid 0) 152 nid = numa_node_id(); 153 154 return __alloc_pages(gfp_mask, order, 155 NODE_DATA(nid)-node_zonelists + gfp_zone(gfp_mask)); 156 } 157 158 #ifdef CONFIG_NUMA (gdb) list *__alloc_pages+0x5b 0xc015b7fb is in __alloc_pages (mm/page_alloc.c:1248). 1243if (unlikely(*z == NULL)) { 1244/* Should this ever happen?? */ 1245return NULL; 1246} 1247 1248page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, order, 1249zonelist, ALLOC_WMARK_LOW|ALLOC_CPUSET); 1250if (page) 1251goto got_pg; 1252 (gdb) list *get_page_from_freelist+0x39e 0xc015b6e0 is in get_page_from_freelist (include/linux/highmem.h:122). 117
Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
On Tue, 2007-07-24 at 12:01 +0200, Mike Galbraith wrote: On Mon, 2007-07-23 at 13:24 -0700, Andrew Morton wrote: You're using DEBUG_PAGEALLOC, but I was not, so I think we can rule that out. My box bugged during boot the first time I booted 23-rc1, but nothing made it to the console, and I didn't have a serial console running. I didn't have DEBUG_PAGEALLOC or friends set. I haven't worked out where that kmap_atomic() call is coming from yet. Both traces point up into the page allocator, but I _think_ that's stack gunk. I just enabled all debug options, and was just rewarded with the below. Hm. I just also experienced filesystem corruption when I tried to send from that kernel, and it bugged in the process. My mount table ended up in /etc/resolv.conf along with some binary goop, making nscd rather unhappy after reboot. fsck time. .Mike - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GENETLINK]: Question: global lock (genl_mutex) possible refinement?
Thomas Graf wrote: Please provide a new overall patch which is not based on your initial patch so I can review your idea properly. Here it goes (merging two previous patches). I have diffed against v2.6.22, which I am using currently as my base: include/net/genetlink.h |1 + net/netlink/genetlink.c | 106 +++ 2 files changed, 80 insertions(+), 27 deletions(-) diff --git a/include/net/genetlink.h b/include/net/genetlink.h index b6eaca1..681ad13 100644 --- a/include/net/genetlink.h +++ b/include/net/genetlink.h @@ -25,6 +25,7 @@ struct genl_family struct nlattr **attrbuf;/* private */ struct list_headops_list; /* private */ struct list_headfamily_list;/* private */ + struct mutexlock; /* private */ }; /** diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c index b9ab62f..0104267 100644 --- a/net/netlink/genetlink.c +++ b/net/netlink/genetlink.c @@ -38,6 +38,32 @@ static void genl_unlock(void) genl_sock-sk_data_ready(genl_sock, 0); } +static DEFINE_MUTEX(genl_fam_mutex); /* serialization for family list management */ + +static inline void genl_fam_lock(struct genl_family *family) +{ + mutex_lock(genl_fam_mutex); + if (family) + mutex_lock(family-lock); +} + +static inline void genl_fam_unlock(struct genl_family *family) +{ + if (family) + mutex_unlock(family-lock); + mutex_unlock(genl_fam_mutex); +} + +static inline void genl_onefam_lock(struct genl_family *family) +{ + mutex_lock(family-lock); +} + +static inline void genl_onefam_unlock(struct genl_family *family) +{ + mutex_unlock(family-lock); +} + #define GENL_FAM_TAB_SIZE 16 #define GENL_FAM_TAB_MASK (GENL_FAM_TAB_SIZE - 1) @@ -150,9 +176,9 @@ int genl_register_ops(struct genl_family *family, struct genl_ops *ops) if (ops-policy) ops-flags |= GENL_CMD_CAP_HASPOL; - genl_lock(); + genl_fam_lock(family); list_add_tail(ops-ops_list, family-ops_list); - genl_unlock(); + genl_fam_unlock(family); genl_ctrl_event(CTRL_CMD_NEWOPS, ops); err = 0; @@ -180,16 +206,16 @@ int genl_unregister_ops(struct genl_family *family, struct genl_ops *ops) { struct genl_ops *rc; - genl_lock(); + genl_fam_lock(family); list_for_each_entry(rc, family-ops_list, ops_list) { if (rc == ops) { list_del(ops-ops_list); - genl_unlock(); + genl_fam_unlock(family); genl_ctrl_event(CTRL_CMD_DELOPS, ops); return 0; } } - genl_unlock(); + genl_fam_unlock(family); return -ENOENT; } @@ -216,8 +242,9 @@ int genl_register_family(struct genl_family *family) goto errout; INIT_LIST_HEAD(family-ops_list); + mutex_init(family-lock); - genl_lock(); + genl_fam_lock(family); if (genl_family_find_byname(family-name)) { err = -EEXIST; @@ -251,14 +278,14 @@ int genl_register_family(struct genl_family *family) family-attrbuf = NULL; list_add_tail(family-family_list, genl_family_chain(family-id)); - genl_unlock(); + genl_fam_unlock(family); genl_ctrl_event(CTRL_CMD_NEWFAMILY, family); return 0; errout_locked: - genl_unlock(); + genl_fam_unlock(family); errout: return err; } @@ -275,7 +302,7 @@ int genl_unregister_family(struct genl_family *family) { struct genl_family *rc; - genl_lock(); + genl_fam_lock(family); list_for_each_entry(rc, genl_family_chain(family-id), family_list) { if (family-id != rc-id || strcmp(rc-name, family-name)) @@ -283,14 +310,16 @@ int genl_unregister_family(struct genl_family *family) list_del(rc-family_list); INIT_LIST_HEAD(family-ops_list); - genl_unlock(); + + genl_fam_unlock(family); + mutex_destroy(family-lock); kfree(family-attrbuf); genl_ctrl_event(CTRL_CMD_DELFAMILY, family); return 0; } - genl_unlock(); + genl_fam_unlock(family); return -ENOENT; } @@ -303,38 +332,57 @@ static int genl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) struct genlmsghdr *hdr = nlmsg_data(nlh); int hdrlen, err; + genl_fam_lock(NULL); family = genl_family_find_byid(nlh-nlmsg_type); - if (family == NULL) + if (family == NULL) { + genl_fam_unlock(NULL); return -ENOENT; + } + + /* get particular family lock, but release global family lock +* so registering operations for other families are possible */ +
[PATCH] virtio_net.c gso feature support
Feedback welcome, as always! (There's been talk of a virtualization git tree, in which case there'll be a decent home for these patches soon). Cheers, Rusty. == Add feature and GSO support to virtio net driver. If you don't do GSO, you can simply ignore the first sg element of every outgoing packet, and tack a dummy one on every incoming. Signed-off-by: Rusty Russell [EMAIL PROTECTED] --- drivers/net/virtio_net.c | 130 include/linux/virtio_net.h | 25 +++- 2 files changed, 143 insertions(+), 12 deletions(-) === --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -16,12 +16,13 @@ * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ -//#define DEBUG +#define DEBUG #include linux/netdevice.h #include linux/etherdevice.h #include linux/module.h #include linux/virtio.h #include linux/scatterlist.h +#include linux/virtio_net.h /* FIXME: Make dynamic */ #define MAX_PACKET_LEN (ETH_HLEN+ETH_DATA_LEN) @@ -40,6 +41,18 @@ struct virtnet_info struct sk_buff_head send; }; +static inline struct virtio_net_hdr *skb_vnet_hdr(struct sk_buff *skb) +{ + return (struct virtio_net_hdr *)skb-cb; +} + +static void vnet_hdr_to_sg(struct scatterlist *sg, struct sk_buff *skb) +{ + sg-page = virt_to_page(skb_vnet_hdr(skb)); + sg-offset = offset_in_page(skb_vnet_hdr(skb)); + sg-length = sizeof(struct virtio_net_hdr); +} + static bool skb_xmit_done(struct virtqueue *vq) { struct virtnet_info *vi = vq-priv; @@ -52,12 +65,14 @@ static void receive_skb(struct net_devic static void receive_skb(struct net_device *dev, struct sk_buff *skb, unsigned len) { - if (unlikely(len ETH_HLEN)) { + struct virtio_net_hdr *hdr = skb_vnet_hdr(skb); + + if (unlikely(len sizeof(struct virtio_net_hdr) + ETH_HLEN)) { pr_debug(%s: short packet %i\n, dev-name, len); dev-stats.rx_length_errors++; - dev_kfree_skb(skb); - return; - } + goto drop; + } + len -= sizeof(struct virtio_net_hdr); BUG_ON(len MAX_PACKET_LEN); skb_trim(skb, len); @@ -66,13 +81,70 @@ static void receive_skb(struct net_devic ntohs(skb-protocol), skb-len, skb-pkt_type); dev-stats.rx_bytes += skb-len; dev-stats.rx_packets++; + + if (hdr-flags VIRTIO_NET_F_NEEDS_CSUM) { + pr_debug(Needs csum!\n); + skb-ip_summed = CHECKSUM_PARTIAL; + skb-csum_start = hdr-csum_start; + skb-csum_offset = hdr-csum_offset; + if (skb-csum_start skb-len - 2 + || skb-csum_offset skb-len - 2) { + if (net_ratelimit()) + printk(KERN_WARNING %s: csum=%u/%u len=%u\n, + dev-name, skb-csum_start, + skb-csum_offset, skb-len); + goto frame_err; + } + } + + if (hdr-gso_type != VIRTIO_NET_GSO_NONE) { + pr_debug(GSO!\n); + switch (hdr-gso_type) { + case VIRTIO_NET_GSO_TCP: + skb_shinfo(skb)-gso_type = SKB_GSO_TCPV4; + break; + case VIRTIO_NET_GSO_TCP_ECN: + skb_shinfo(skb)-gso_type = SKB_GSO_TCP_ECN; + break; + case VIRTIO_NET_GSO_UDP: + skb_shinfo(skb)-gso_type = SKB_GSO_UDP; + break; + case VIRTIO_NET_GSO_TCPV6: + skb_shinfo(skb)-gso_type = SKB_GSO_TCPV6; + break; + default: + if (net_ratelimit()) + printk(KERN_WARNING %s: bad gso type %u.\n, + dev-name, hdr-gso_type); + goto frame_err; + } + + skb_shinfo(skb)-gso_size = hdr-gso_size; + if (skb_shinfo(skb)-gso_size == 0) { + if (net_ratelimit()) + printk(KERN_WARNING %s: zero gso size.\n, + dev-name); + goto frame_err; + } + + /* Header must be checked, and gso_segs computed. */ + skb_shinfo(skb)-gso_type |= SKB_GSO_DODGY; + skb_shinfo(skb)-gso_segs = 0; + } + netif_receive_skb(skb); + return; + +frame_err: + dev-stats.rx_frame_errors++; +drop: + dev_kfree_skb(skb); } static void try_fill_recv(struct virtnet_info *vi) { struct sk_buff *skb; - struct scatterlist sg[MAX_SKB_FRAGS]; + struct scatterlist
Re: [PATCH] [2.6.22] Fix a potential NULL pointer dereference in write_bulk_callback() in drivers/net/usb/pegasus.c
ACK :-) cheers, Petko On Mon, 23 Jul 2007, Micah Gruber wrote: This patch fixes a potential null dereference bug where we dereference pegasus before a null check. This patch simply moves the dereferencing after the null check. Signed-off-by: Micah Gruber [EMAIL PROTECTED] --- --- a/drivers/net/usb/pegasus.c +++ b/drivers/net/usb/pegasus.c @@ -768,11 +768,13 @@ static void write_bulk_callback(struct urb *urb) { pegasus_t *pegasus = urb-context; - struct net_device *net = pegasus-net; + struct net_device *net; if (!pegasus) return; + net = pegasus-net; + if (!netif_device_present(net) || !netif_running(net)) return; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2.6.22] TCP: Make TCP_RTO_MAX a variable (take 2)
From: Rick Jones [EMAIL PROTECTED] Subject: Re: [PATCH 2.6.22] TCP: Make TCP_RTO_MAX a variable (take 2) Date: Thu, 12 Jul 2007 15:27:30 -0700 So the problem is that RTO can grows to be twice the failover detection time. So back to the original mail, the scenario has a switch with failover detection of 20seconds. Worst case TCP RTO could grow to 40 seconds. Going back in archive to original mail: Background == When designing a TCP/IP based network system on failover-capable network devices, people want to set timeouts hierarchically in three layers, network device layer, TCP layer, and application layer (bottom-up order), such that: 1. Network device layer detects a failure first and switch to a backup device (say, in 20sec). 2. TCP layer timeout retransmission comes next, _hopefully_ before the application layer timeout. 3. Application layer detects a network failure last (by, say, 30sec timeout) and may trigger a system-level failover. Sounds like the solution is to make the switch failover detection faster. If you get switch failover down to 5sec then TCP RTO shouldn't be bigger than 10sec, and application will survive. That may indeed be the best solution, we'll have to wait to hear if there is any freedom there. When this sort of thing has crossed my path in other contexts, the general answer is that the device failover time is fixed, and the application layer time is similarly constrained by end-user expectation/requirement. Often as not, layer 8 and 9 issues tend to dominate and expect to trump (in this case layer 4 issues). I agree that application will survive if a user makes the application timeout twice the failover timeout. But I'm afraid there is no such freedom. Basically, to minimize downtime, shorter timeouts are preferred as long as the probability of mis-detection is kept low at a certain level. In practice, failover timeouts for bonding, switches, or routers are determined by heuristics. Users know what timeout values and retry counts of probe packets are suitable for detecting failure of a certain combination of network equipments. (e.g., 5sec timeout, retries 4 times) Shorter is better. And application timeout (or system timeout) is given as an end-user requirement. There is little change of negotiation, really. And again shorter (than requirement, if possible) is better. Regards, -- OBATA Noboru ([EMAIL PROTECTED]) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2.6.22] TCP: Make TCP_RTO_MAX a variable (take 2)
From: Rick Jones [EMAIL PROTECTED] Subject: Re: [PATCH 2.6.22] TCP: Make TCP_RTO_MAX a variable (take 2) Date: Thu, 12 Jul 2007 13:51:44 -0700 TCP's timeouts are perfectly fine, and the only thing you might be showing above is that the application timeouts are too short or that TCP needs notifications. The application timeouts are probably being driven by external desires for a given recovery time. Agreed. TCP notifications don't solve anything unless the links in question are local to the machine on which the TCP endpoint resides. Agreed. Thank you for a good explanation. My original discussion using Dom-0 and Dom-U might be misleading, but I was trying to say: * Network failure and recovery(failover) are not necessarily visible locally. ** Dom-0 vs. Dom-U discussion is just an example of the case where a network failure is not visible locally. ** For another example, network switches or routers sitting somewhere in the middle of route are often duplicated with active-standby setting today. * Quick response (retransmission) of TCP upon a recovery of such invisible devices as well is desired. * If the failure and recovery are not visible locally, TCP notifications do not help. Regards, -- OBATA Noboru ([EMAIL PROTECTED]) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] Add new timeval_to_sec function
Varun Chandramohan wrote: Oliver Hartkopp wrote: I don't think you should round down timeout values. Can you elaborate on that? As per the RFC of MIB ,we need only seconds granularity. Taking that as the case i dont understand why round down should not be done? When you like to create any timeout based on your calculated value, you might run into the problem that your calculated value is set to _zero_ even if there was some time before the conversion. This might probably not what you indented to get. So what about rounding up with return (tv-tv_sec + (tv-tv_usec + 99)/100); ??? This can done. Is this what you were ref to me, Patrick? Yes, timeouts should usually be at least as long as specified. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] Add new timeval_to_sec function
Patrick McHardy wrote: Varun Chandramohan wrote: Oliver Hartkopp wrote: I don't think you should round down timeout values. Can you elaborate on that? As per the RFC of MIB ,we need only seconds granularity. Taking that as the case i dont understand why round down should not be done? When you like to create any timeout based on your calculated value, you might run into the problem that your calculated value is set to _zero_ even if there was some time before the conversion. This might probably not what you indented to get. So what about rounding up with return (tv-tv_sec + (tv-tv_usec + 99)/100); ??? This can done. Is this what you were ref to me, Patrick? Yes, timeouts should usually be at least as long as specified. Thanks Patrick and Oliver, ill round it up. :-) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
On 7/24/07, Jens Axboe [EMAIL PROTECTED] wrote: What about the new async crypto stuff? I've been looking, but is it guarenteed that async_memcpy() runs in process context with interrupts enabled always? If not, there's a km type bug there. Currently the only user is the MD raid456 driver, and yes, it only performs copies from the handle_stripe routine which is always run in process context with interrupts enabled. However this is not documented. Would it be advisable to add a WARN_ON for this condition? In general, I think the highmem stuff could do with more safety checks: - People ALWAYS get the atomic unmaps wrong, passing in the page instead of the address. I've seen tons of these. And since kunmap_atomic() takes a void pointer, nobody notices until it goes boom. - People easily get the km type wrong - they use KM_USERx in interrupt context, or one of the irq variants without disabling interrupts. If we could just catch these two types of bugs, we've got a lot of these problems covered. -- Jens Axboe -- Dan - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
On 7/24/07, Andrew Morton [EMAIL PROTECTED] wrote: [...] What about the new async crypto stuff? I've been looking, but is it guarenteed that async_memcpy() runs in process context with interrupts enabled always? If not, there's a km type bug there. I think Shannon maintains that now. I am looking after the async_tx API, I will send a patch to update MAINTAINERS shortly. -- Dan - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-2.6.22-rc7] xfrm state selection update to use inner addresses
Joakim Koskela wrote: This patch modifies the xfrm state selection logic to use the inner addresses where the outer have been (incorrectly) used. This is required for beet mode in general and interfamily setups in both tunnel and beet mode. Looks good. Acked-by: Patrick McHardy [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] e1000: repost work around 82571 completion timout on pseries HW
There seems to have been some discussion about a patch like this in the past but I still haven't noticed any platforms fixes or noticed that this got included, so I'd like to propose this modified version. Thoughts? Signed-off-by: Andy Gospodarek [EMAIL PROTECTED] --- e1000_hw.c |7 +++ e1000_hw.h |1 + 2 files changed, 8 insertions(+) diff --git a/drivers/net/e1000/e1000_hw.c b/drivers/net/e1000/e1000_hw.c index 9be4469..7c75b8b 100644 --- a/drivers/net/e1000/e1000_hw.c +++ b/drivers/net/e1000/e1000_hw.c @@ -1030,6 +1030,13 @@ e1000_init_hw(struct e1000_hw *hw) break; } +#if defined(CONFIG_PPC_PSERIES) +if (hw-mac_type == e1000_82571) { +uint32_t gcr = E1000_READ_REG(hw, GCR); +gcr |= E1000_GCR_DISABLE_TIMEOUT_MECHANISM; +E1000_WRITE_REG(hw, GCR, gcr); +} +#endif if (hw-mac_type == e1000_82573) { uint32_t gcr = E1000_READ_REG(hw, GCR); diff --git a/drivers/net/e1000/e1000_hw.h b/drivers/net/e1000/e1000_hw.h index bd000b8..71f4e2d 100644 --- a/drivers/net/e1000/e1000_hw.h +++ b/drivers/net/e1000/e1000_hw.h @@ -2211,6 +2211,7 @@ struct e1000_host_command_info { #define PCI_EX_82566_SNOOP_ALL PCI_EX_NO_SNOOP_ALL #define E1000_GCR_L1_ACT_WITHOUT_L0S_RX 0x0800 +#define E1000_GCR_DISABLE_TIMEOUT_MECHANISM 0x8000 /* Function Active and Power State to MNG */ #define E1000_FACTPS_FUNC0_POWER_STATE_MASK 0x0003 #define E1000_FACTPS_LAN0_VALID 0x0004 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] e1000: repost work around 82571 completion timout on pseries HW
Andy Gospodarek wrote: There seems to have been some discussion about a patch like this in the past but I still haven't noticed any platforms fixes or noticed that this got included, so I'd like to propose this modified version. Thoughts? I've written a completely new version for IBM based on the response that implements this as a pci quirk (to tag the chipset, not the device). However IBM has spotted an issue due to firmware hiding the id for linux of the bridge involved (on pseries), and they're working on an alternative to fix that. So, the patch will look different, and I'll post it as soon as we have a version for both pseries and xseries that works. Auke Signed-off-by: Andy Gospodarek [EMAIL PROTECTED] --- e1000_hw.c |7 +++ e1000_hw.h |1 + 2 files changed, 8 insertions(+) diff --git a/drivers/net/e1000/e1000_hw.c b/drivers/net/e1000/e1000_hw.c index 9be4469..7c75b8b 100644 --- a/drivers/net/e1000/e1000_hw.c +++ b/drivers/net/e1000/e1000_hw.c @@ -1030,6 +1030,13 @@ e1000_init_hw(struct e1000_hw *hw) break; } +#if defined(CONFIG_PPC_PSERIES) +if (hw-mac_type == e1000_82571) { +uint32_t gcr = E1000_READ_REG(hw, GCR); +gcr |= E1000_GCR_DISABLE_TIMEOUT_MECHANISM; +E1000_WRITE_REG(hw, GCR, gcr); +} +#endif if (hw-mac_type == e1000_82573) { uint32_t gcr = E1000_READ_REG(hw, GCR); diff --git a/drivers/net/e1000/e1000_hw.h b/drivers/net/e1000/e1000_hw.h index bd000b8..71f4e2d 100644 --- a/drivers/net/e1000/e1000_hw.h +++ b/drivers/net/e1000/e1000_hw.h @@ -2211,6 +2211,7 @@ struct e1000_host_command_info { #define PCI_EX_82566_SNOOP_ALL PCI_EX_NO_SNOOP_ALL #define E1000_GCR_L1_ACT_WITHOUT_L0S_RX 0x0800 +#define E1000_GCR_DISABLE_TIMEOUT_MECHANISM 0x8000 /* Function Active and Power State to MNG */ #define E1000_FACTPS_FUNC0_POWER_STATE_MASK 0x0003 #define E1000_FACTPS_LAN0_VALID 0x0004 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][30/37] Clean up duplicate includes in net/netfilter/
Jesper Juhl wrote: This patch cleans up duplicate includes in net/netfilter/ I've queued this one and the bridge-netfilter patch (27/37), thanks Jesper. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Oops in xfrm_bundle_ok
Hello, From: Ken-ichirou MATSUZAWA [EMAIL PROTECTED] Subject: Oops in xfrm_bundle_ok Date: Thu, 05 Jul 2007 02:47:10 +0900 (JST) I got Oops like below. I glanced xfrm_bundle_ok() in xfrm_policy.c and __xfrm4.bundle_create() in xfrm4_policy.c. It seems this was fixed by below, thanks a lot. commit bd0bf0765ea1fba80d7085e1f0375ec045631dc1 Author: Patrick McHardy [EMAIL PROTECTED] Date: Wed Jul 18 01:55:52 2007 -0700 [XFRM]: Fix crash introduced by struct dst_entry reordering @@ -2141,7 +2141,7 @@ int xfrm_bundle_ok(struct xfrm_policy *pol, struct xfrm_dst *first, if (last == first) break; - last = last-u.next; + last = (struct xfrm_dst *)last-u.dst.next; last-child_mtu_cached = mtu; } - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH]: revised make xfrm_audit_log more generic patch
On Tue, 2007-07-24 at 11:04 -0400, Steve Grubb wrote: + audit_log_format(audit_buf, %s: auid=%u, buf, auid); if (sid != 0 security_secid_to_secctx(sid, secctx, secctx_len) == 0) The operation in buf will not be parsed by the user space tools. Let's use op=%s where you have %s: above. Audit record fields are name=value and fields separated by spaces. op is what we are using in other places to mean operation. I know its a change from the records above, but we previously had some detail about what operation was being performed by the record type and this did not matter so much. Now that we only have one event type, the meaning of the event being recorded needs to be parsable and in a field. It also wouldn't hurt to change the text being sent to this function to have a hyphen instead of a space, so SPD delete becomes SPD-delete. This keeps the parser happy. This patch otherwise looks good. Sounds good. I will make the changes and resend. Thanks!! Joy - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
On Tue, 24 Jul 2007 12:01:09 +0200 Mike Galbraith [EMAIL PROTECTED] wrote: On Mon, 2007-07-23 at 13:24 -0700, Andrew Morton wrote: You're using DEBUG_PAGEALLOC, but I was not, so I think we can rule that out. My box bugged during boot the first time I booted 23-rc1, but nothing made it to the console, and I didn't have a serial console running. I didn't have DEBUG_PAGEALLOC or friends set. I haven't worked out where that kmap_atomic() call is coming from yet. Both traces point up into the page allocator, but I _think_ that's stack gunk. I just enabled all debug options, and was just rewarded with the below. doh. It's a slab bug. [ 119.079531] eth1: link up, 100Mbps, full-duplex, lpa 0x45E1 [ 119.558867] [ cut here ] [ 119.572197] kernel BUG at arch/i386/mm/highmem.c:38! [ 119.585804] invalid opcode: [#1] [ 119.598013] PREEMPT SMP DEBUG_PAGEALLOC [ 119.610103] Modules linked in: edd button battery ac ip6t_REJECT xt_tcpudp ipt_REJECT xt_state iptable_mangle iptable_nat nf_nat iptable_filter ip6table_mangle nf_conntrack_ipv4 nf_conntrack nfnetlink ip_tables ip6table_filter ip6_tables x_tables nls_iso8859_1 nls_cp437 nls_utf8 snd_intel8x0 snd_ac97_codec ac97_bus snd_mpu401 snd_pcm prism54 snd_timer snd_mpu401_uart snd_rawmidi snd_seq_device snd intel_agp agpgart soundcore snd_page_alloc i2c_i801 fan thermal processor [ 119.698063] CPU:1 [ 119.698065] EIP:0060:[c011cd2d]Not tainted VLI [ 119.698067] EFLAGS: 00010006 (2.6.23-rc1-smp #75) [ 119.736358] EIP is at kmap_atomic_prot+0xa7/0xab [ 119.749647] eax: 3d07f163 ebx: c166db80 ecx: c0750e60 edx: 0007 [ 119.765417] esi: 0022 edi: 0163 ebp: c069dcd4 esp: c069dcc8 [ 119.781273] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068 [ 119.796378] Process udevd (pid: 4775, ti=c069d000 task=f31aea60 task.ti=f477d000) [ 119.804068] Stack: c166db80 c166db80 c069dcdc c011cd3f c069dd40 c015b6e0 0001 [ 119.822272]0044 0163 0001 c165f4e0 0001 c165f4e0 0001 [ 119.840762] 00028020 c061e71c c166db80 0046 0080 0001 c011e4de [ 119.859389] Call Trace: [ 119.881302] [c0105144] show_trace_log_lvl+0x1a/0x30 [ 119.896319] [c01051ff] show_stack_log_lvl+0xa5/0xca [ 119.911171] [c0105420] show_registers+0x1fc/0x343 [ 119.925756] [c0105689] die+0x122/0x249 [ 119.939241] [c0105834] do_trap+0x84/0xad [ 119.952897] [c0105b1c] do_invalid_op+0x88/0x92 [ 119.967118] [c04cf3c2] error_code+0x72/0x78 [ 119.980948] [c011cd3f] kmap_atomic+0xe/0x10 [ 119.994642] [c015b6e0] get_page_from_freelist+0x39e/0x45e [ 120.009485] [c015b7fb] __alloc_pages+0x5b/0x2db [ 120.023342] [c0172872] cache_alloc_refill+0x380/0x6f2 [ 120.037623] [c0172e7a] kmem_cache_alloc+0xa1/0xa5 [ 120.051426] [c03fb397] neigh_create+0x5f/0x506 [ 120.064894] [c046e25d] ndisc_dst_alloc+0x122/0x151 [ 120.078769] [c0471b0b] __ndisc_send+0x8d/0x4fa [ 120.092340] [c0472915] ndisc_send_ns+0x5f/0x7d [ 120.105848] [c0469ff5] addrconf_dad_timer+0xdb/0xe0 [ 120.119758] [c012f8a0] run_timer_softirq+0x130/0x191 [ 120.133717] [c012c06d] __do_softirq+0x76/0xe4 [ 120.147475] [c0106b48] do_softirq+0x63/0xac [ 120.147488] [c012bff5] (gdb) list *neigh_create+0x5f 0xc03fb397 is in neigh_create (include/linux/slab.h:259). 254 /* 255 * Shortcuts 256 */ 257 static inline void *kmem_cache_zalloc(struct kmem_cache *k, gfp_t flags) 258 { 259 return kmem_cache_alloc(k, flags | __GFP_ZERO); 260 } See, networking's kmem_cache_alloc(..., __GFP_ZERO) ended up calling into the page allocator with __GFP_ZERO. This is the bug - slab isn't supposed to do that: the __GFP_ZERO is supposed to be removed. Now, it's not a highmem page, so prep_zero_page() won't actually establish a kmap, but it will check that the kmap slot is presently unused on this CPU. But networking calls in here from softirq context (illegal for KM_USER0) and sometimes that KM_USER0 slot *will* be in use, so kmap_atomic_prot() will go BUG. I must say it's really really scary that such a low-level function as prep_zero_page() is using KM_USER0. I don't think it has enough debugging checks in there to prevent Bad Stuff from going undetected. I guess this was the bug: --- a/mm/slab.c~a +++ a/mm/slab.c @@ -2776,7 +2776,7 @@ static int cache_grow(struct kmem_cache * 'nodeid'. */ if (!objp) - objp = kmem_getpages(cachep, flags, nodeid); + objp = kmem_getpages(cachep, local_flags, nodeid); if (!objp) goto failed; _ I don't see why you later got fs corruption - afacit we won't actually modify the KM_USER0 slot in this scenario. 262 /** 263 * kzalloc - allocate memory. The memory is set to zero. (gdb) list *kmem_cache_alloc+0xa1 0xc0172e7a is in kmem_cache_alloc
Re: [PATCH] e1000: repost work around 82571 completion timout on pseries HW
On Tue, Jul 24, 2007 at 08:31:08AM -0700, Kok, Auke wrote: Andy Gospodarek wrote: There seems to have been some discussion about a patch like this in the past but I still haven't noticed any platforms fixes or noticed that this got included, so I'd like to propose this modified version. Thoughts? I've written a completely new version for IBM based on the response that implements this as a pci quirk (to tag the chipset, not the device). However IBM has spotted an issue due to firmware hiding the id for linux of the bridge involved (on pseries), and they're working on an alternative to fix that. So, the patch will look different, and I'll post it as soon as we have a version for both pseries and xseries that works. Auke Sounds great. I just wanted to make sure this didn't slip through the cracks. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GENETLINK]: Fix race in genl_unregister_mc_groups()
Thomas Graf wrote: @@ -217,14 +229,8 @@ EXPORT_SYMBOL(genl_register_mc_group); void genl_unregister_mc_group(struct genl_family *family, struct genl_multicast_group *grp) { - BUG_ON(grp-family != family); genl_lock(); - netlink_clear_multicast_users(genl_sock, grp-id); - clear_bit(grp-id, mc_groups); - list_del(grp-list); - genl_ctrl_event(CTRL_CMD_DELMCAST_GRP, grp); - grp-id = 0; - grp-family = NULL; + genl_unregister_mc_group(family, grp); genl_unlock(); } Shouldn't this be __genl_unregister_mc_group(family, grp) ? -Brian - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH]: revised make xfrm_audit_log more generic patch
On Tue, 2007-07-24 at 11:04 -0400, Steve Grubb wrote: It also wouldn't hurt to change the text being sent to this function to have a hyphen instead of a space, so SPD delete becomes SPD-delete. This keeps the parser happy. Steve, more for my education, should all entries have this sort of syntax, that is, a hyphen in it? I imagine some entries might be a bit more wordy and so I was wondering ahead of time how to do it. Thanks!! Joy - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] netxen: bug fixes for multiport adapters
Jeff, Any chance of these patches getting committed soon? Thanks, -Dhananjay Phadke On 7/21/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: These patches include fix for problem with 2nd port of multiport adapters on IBM blades. Also improves interrupt handling for multiport adapters avoiding interrupt flood after interrupt is down. Generated against upstream-fixes. drivers/net/netxen/netxen_nic.h |3 +- drivers/net/netxen/netxen_nic_main.c | 85 +++--- 2 files changed, 43 insertions(+), 45 deletions(-) -- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Netfilter Kconfig: Expose IPv4/6 connection tracking options by selecting NF_CONNTRACK
Sam Ravnborg wrote: On Tue, Jul 24, 2007 at 08:36:33AM +0300, Al Boldi wrote: Replaces NF_CONNTRACK_ENABLED with NF_CONNTRACK and selects it for NF_CONNTRACK_IPV4 and NF_CONNTRACK_IPV6 This exposes IPv4/6 connection tracking options for easier Kconfig setup. Signed-off-by: Al Boldi [EMAIL PROTECTED] Cc: David Miller [EMAIL PROTECTED] Cc: Sam Ravnborg [EMAIL PROTECTED] Cc: Andrew Morton [EMAIL PROTECTED] --- --- a/net/netfilter/Kconfig 2007-07-09 06:38:52.0 +0300 +++ b/net/netfilter/Kconfig 2007-07-24 08:28:06.0 +0300 @@ -25,8 +25,7 @@ config NETFILTER_NETLINK_LOG and is also scheduled to replace the old syslog-based ipt_LOG and ip6t_LOG modules. -# Rename this to NF_CONNTRACK in a 2.6.25 -config NF_CONNTRACK_ENABLED +config NF_CONNTRACK We kept this mainly for an easier upgrade. As the comment states, it should go in 2.6.25, at which time all people having reconfigured their kernel at least once since ip_conntrack was removed will have the NF_CONNTRACK option set to the same value as NF_CONNTRACK_ENABLED. --- a/net/ipv4/netfilter/Kconfig 2007-07-09 06:38:50.0 +0300 +++ b/net/ipv4/netfilter/Kconfig 2007-07-24 08:27:39.0 +0300 @@ -7,7 +7,7 @@ menu IP: Netfilter Configuration config NF_CONNTRACK_IPV4 tristate IPv4 connection tracking support (required for NAT) - depends on NF_CONNTRACK + select NF_CONNTRACK ---help--- Connection tracking keeps a record of what packets have passed through your machine, in order to figure out how they are related --- a/net/ipv6/netfilter/Kconfig 2007-07-09 06:38:51.0 +0300 +++ b/net/ipv6/netfilter/Kconfig 2007-07-24 08:27:54.0 +0300 @@ -7,7 +7,8 @@ menu IPv6: Netfilter Configuration (EXP config NF_CONNTRACK_IPV6 tristate IPv6 connection tracking support (EXPERIMENTAL) - depends on INET IPV6 EXPERIMENTAL NF_CONNTRACK + depends on INET IPV6 EXPERIMENTAL + select NF_CONNTRACK ---help--- Connection tracking keeps a record of what packets have passed through your machine, in order to figure out how they are related This change looks wrong. Due to the reverse nature of select kconfig cannot fulfill the dependencies of selected symbols. So as a rule of thumb select should only select symbols with no menu and no dependencies to avoid some of the problems that have popped up during the last months. In this case it looks OK since the dependencies of IPv4 connection tracking are (besides NF_CONNTRACK) are superset of those of nf_conntrack. But I vaguely recall having tried this myself and it broke somewhere, maybe it was because of the NF_CONNTRACK_ENABLED option, I can't recall anymore. Al, if this also works without removal of NF_CONNTRACK_ENABLED, please resend without that part. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][30/37] Clean up duplicate includes in net/netfilter/
On 24/07/07, Patrick McHardy [EMAIL PROTECTED] wrote: Jesper Juhl wrote: This patch cleans up duplicate includes in net/netfilter/ I've queued this one and the bridge-netfilter patch (27/37), thanks Jesper. Thanks for the feedback Patrick. -- Jesper Juhl [EMAIL PROTECTED] Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html Plain text mails only, please http://www.expita.com/nomime.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH]: revised make xfrm_audit_log more generic patch
On Tuesday 24 July 2007 12:33:26 pm Joy Latten wrote: It also wouldn't hurt to change the text being sent to this function to have a hyphen instead of a space, so SPD delete becomes SPD-delete. This keeps the parser happy. Steve, more for my education, should all entries have this sort of syntax, that is, a hyphen in it? Only if its something that is important to have associated in reports. More that 1 or 2 hyphens is probably not good. I imagine some entries might be a bit more wordy and so I was wondering ahead of time how to do it. The audit logs should be short as possible but contain everything necessary. You can have language in the record that makes it more understandable for people reading the raw record, but it won't necessarily be picked up by report parsers for searching or presentation. If you want me to help review the choices, let me know offline and we can work through it. -Steve - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: specifying scopid's for link-local IPv6 addrs
Rick, I don't see any way around this. For example, on one of my test systems, I have the following link local routes: chance% netstat -A inet6 -rn | grep fe80::/64 fe80::/64 :: U 25600 eth0 fe80::/64 :: U 25600 eth2 fe80::/64 :: U 25600 eth3 fe80::/64 :: U 25600 eth4 fe80::/64 :: U 25600 eth5 fe80::/64 :: U 25600 eth6 So if I want to run a link local test to fe80::202:b3ff:fed4:cd1, the system has no way to choose which is the correct interface to use for the test, and will give an error if the interface isn't specified. Yeah, I was wondering about that. I'm not sure if the attempts on those other OSes happened to involve multiple interfaces or not. Even so, it feels unpleasant for an application to deal with and I wonder if there is a way for a stack to deal with it on the application's behalf. I guess that might involve some sort of layer violation between neightbor discovery and routing (typing while I think about things I know little about...) Is there RFC chapter and verse I might read about routing with multiple link-local's on a system? You must explicitly specify the desired interface. For example, on my test system, the correct interface is eth6 which is interface 8 (lo eth0 eth1 eth2 ... eth5 eth6). Here is an example nuttcp test specifying interface 8: chance% nuttcp -P5100 fe80::202:b3ff:fed4:cd1%8 1178.5809 MB / 10.02 sec = 986.2728 Mbps 12 %TX 15 %RX nuttcp uses getaddrinfo() which parses the %ifindex field, and then copies the sin6_scope_id from the res structure to the server's sockaddr_in6 structure before initiating the connect(). OK, I'll give that a quick try with netperf: [EMAIL PROTECTED] ~]# netperf -H 192.168.2.107 -c -C -i 30,3 -- -s 1M -S 1M -m 64K -H fe80::207:43ff:fe05:9d%2 TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to fe80::207:43ff:fe05:9d%2 (fe80::207:43ff:fe05:9d) port 0 AF_INET6 : +/-2.5% @ 99% conf. Cool - it establishes the data connection just fine. To further demonstrate my ignorance :) is that %n suffix something one might expect in most/all getaddrinfo()'s or is that unique to the one in Linux? rick jones - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: NET_DMA: where do we ever call dma_skb_copy_datagram_iovec() with NULL pinned_list?
On 7/20/07, Al Viro [EMAIL PROTECTED] wrote: AFAICS, all callers of dma_skb_copy_datagram_iovec() are either * recursive for fragments, pass pinned_list unchanged or * called from tcp, with pinned_list coming from tp-ucopy.pinned_list and only when tp-ucopy.dma_chan is non-NULL. Now, all non-NULL assignments to -dma_chan have the same form: if (!tp-ucopy.dma_chan tp-ucopy.pinned_list) tp-ucopy.dma_chan = get_softnet_dma(); IOW, if -ucopy.pinned_list stays NULL, -ucopy.dma_chan will do the same. Moreover, any place that resets -ucopy.pinned_list will also reset -ucopy.dma_chan. IOW, we can't ever get non-NULL tp-ucopy.dma_chan while tp-ucopy.pinned_list is NULL. So how can we ever get to the dma_memcpy_to_kernel_iovec()? Shannon what do you think? Is this a bug, or does it no longer support non-userspace iovecs? Thanks -- Andy - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: specifying scopid's for link-local IPv6 addrs
On Tue, 2007-07-24 at 10:13 -0700, Rick Jones wrote: Rick, I don't see any way around this. For example, on one of my test systems, I have the following link local routes: chance% netstat -A inet6 -rn | grep fe80::/64 fe80::/64 :: U 25600 eth0 fe80::/64 :: U 25600 eth2 fe80::/64 :: U 25600 eth3 fe80::/64 :: U 25600 eth4 fe80::/64 :: U 25600 eth5 fe80::/64 :: U 25600 eth6 So if I want to run a link local test to fe80::202:b3ff:fed4:cd1, the system has no way to choose which is the correct interface to use for the test, and will give an error if the interface isn't specified. Yeah, I was wondering about that. I'm not sure if the attempts on those other OSes happened to involve multiple interfaces or not. Even so, it feels unpleasant for an application to deal with and I wonder if there is a way for a stack to deal with it on the application's behalf. I guess that might involve some sort of layer violation between neightbor discovery and routing (typing while I think about things I know little about...) Is there RFC chapter and verse I might read about routing with multiple link-local's on a system? You must explicitly specify the desired interface. For example, on my test system, the correct interface is eth6 which is interface 8 (lo eth0 eth1 eth2 ... eth5 eth6). Here is an example nuttcp test specifying interface 8: chance% nuttcp -P5100 fe80::202:b3ff:fed4:cd1%8 1178.5809 MB / 10.02 sec = 986.2728 Mbps 12 %TX 15 %RX nuttcp uses getaddrinfo() which parses the %ifindex field, and then copies the sin6_scope_id from the res structure to the server's sockaddr_in6 structure before initiating the connect(). OK, I'll give that a quick try with netperf: [EMAIL PROTECTED] ~]# netperf -H 192.168.2.107 -c -C -i 30,3 -- -s 1M -S 1M -m 64K -H fe80::207:43ff:fe05:9d%2 We can even specify the interface name instead of the interface index link-local%ethX getaddrinfo() uses if_nametoindex() internally to get the index. Thanks Sridhar TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to fe80::207:43ff:fe05:9d%2 (fe80::207:43ff:fe05:9d) port 0 AF_INET6 : +/-2.5% @ 99% conf. Cool - it establishes the data connection just fine. To further demonstrate my ignorance :) is that %n suffix something one might expect in most/all getaddrinfo()'s or is that unique to the one in Linux? rick jones - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: specifying scopid's for link-local IPv6 addrs
You must explicitly specify the desired interface. For example, on my test system, the correct interface is eth6 which is interface 8 (lo eth0 eth1 eth2 ... eth5 eth6). Here is an example nuttcp test specifying interface 8: chance% nuttcp -P5100 fe80::202:b3ff:fed4:cd1%8 1178.5809 MB / 10.02 sec = 986.2728 Mbps 12 %TX 15 %RX nuttcp uses getaddrinfo() which parses the %ifindex field, and then copies the sin6_scope_id from the res structure to the server's sockaddr_in6 structure before initiating the connect(). OK, I'll give that a quick try with netperf: [EMAIL PROTECTED] ~]# netperf -H 192.168.2.107 -c -C -i 30,3 -- -s 1M -S 1M -m 64K -H fe80::207:43ff:fe05:9d%2 TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to fe80::207:43ff:fe05:9d%2 (fe80::207:43ff:fe05:9d) port 0 AF_INET6 : +/-2.5% @ 99% conf. Cool - it establishes the data connection just fine. Well, I spoke too soon - while it got me past my EINVAL, the connection establishement timed-out. Either I picked the wrong value for n, or I may yet need to make some tweaks to netperf. rick jones - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Netfilter Kconfig: Expose IPv4/6 connection tracking options by selecting NF_CONNTRACK
Patrick McHardy wrote: Sam Ravnborg wrote: On Tue, Jul 24, 2007 at 08:36:33AM +0300, Al Boldi wrote: Replaces NF_CONNTRACK_ENABLED with NF_CONNTRACK and selects it for NF_CONNTRACK_IPV4 and NF_CONNTRACK_IPV6 This exposes IPv4/6 connection tracking options for easier Kconfig setup. Signed-off-by: Al Boldi [EMAIL PROTECTED] Cc: David Miller [EMAIL PROTECTED] Cc: Sam Ravnborg [EMAIL PROTECTED] Cc: Andrew Morton [EMAIL PROTECTED] --- --- a/net/netfilter/Kconfig 2007-07-09 06:38:52.0 +0300 +++ b/net/netfilter/Kconfig 2007-07-24 08:28:06.0 +0300 @@ -25,8 +25,7 @@ config NETFILTER_NETLINK_LOG and is also scheduled to replace the old syslog-based ipt_LOG and ip6t_LOG modules. -# Rename this to NF_CONNTRACK in a 2.6.25 -config NF_CONNTRACK_ENABLED +config NF_CONNTRACK We kept this mainly for an easier upgrade. As the comment states, it should go in 2.6.25, at which time all people having reconfigured their kernel at least once since ip_conntrack was removed will have the NF_CONNTRACK option set to the same value as NF_CONNTRACK_ENABLED. --- a/net/ipv4/netfilter/Kconfig2007-07-09 06:38:50.0 +0300 +++ b/net/ipv4/netfilter/Kconfig2007-07-24 08:27:39.0 +0300 @@ -7,7 +7,7 @@ menu IP: Netfilter Configuration config NF_CONNTRACK_IPV4 tristate IPv4 connection tracking support (required for NAT) - depends on NF_CONNTRACK + select NF_CONNTRACK ---help--- Connection tracking keeps a record of what packets have passed through your machine, in order to figure out how they are related --- a/net/ipv6/netfilter/Kconfig2007-07-09 06:38:51.0 +0300 +++ b/net/ipv6/netfilter/Kconfig2007-07-24 08:27:54.0 +0300 @@ -7,7 +7,8 @@ menu IPv6: Netfilter Configuration (EXP config NF_CONNTRACK_IPV6 tristate IPv6 connection tracking support (EXPERIMENTAL) - depends on INET IPV6 EXPERIMENTAL NF_CONNTRACK + depends on INET IPV6 EXPERIMENTAL + select NF_CONNTRACK ---help--- Connection tracking keeps a record of what packets have passed through your machine, in order to figure out how they are related This change looks wrong. Due to the reverse nature of select kconfig cannot fulfill the dependencies of selected symbols. So as a rule of thumb select should only select symbols with no menu and no dependencies to avoid some of the problems that have popped up during the last months. In this case it looks OK since the dependencies of IPv4 connection tracking are (besides NF_CONNTRACK) are superset of those of nf_conntrack. But I vaguely recall having tried this myself and it broke somewhere, maybe it was because of the NF_CONNTRACK_ENABLED option, I can't recall anymore. Al, if this also works without removal of NF_CONNTRACK_ENABLED, please resend without that part. It doesn't. But how about this, if you really can't live without NF_CONNTRACK_ENBLED: == --- Kconfig.old 2007-07-09 06:38:52.0 +0300 +++ Kconfig 2007-07-24 20:24:27.0 +0300 @@ -25,8 +25,7 @@ config NETFILTER_NETLINK_LOG and is also scheduled to replace the old syslog-based ipt_LOG and ip6t_LOG modules. -# Rename this to NF_CONNTRACK in a 2.6.25 -config NF_CONNTRACK_ENABLED +config NF_CONNTRACK tristate Netfilter connection tracking support help Connection tracking keeps a record of what packets have passed @@ -40,9 +39,9 @@ config NF_CONNTRACK_ENABLED To compile it as a module, choose M here. If unsure, say N. -config NF_CONNTRACK +config NF_CONNTRACK_ENABLED tristate - default NF_CONNTRACK_ENABLED + default NF_CONNTRACK config NF_CT_ACCT bool Connection tracking flow accounting == Thanks! -- Al - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Netfilter Kconfig: Expose IPv4/6 connection tracking options by selecting NF_CONNTRACK
Al Boldi wrote: Patrick McHardy wrote: But I vaguely recall having tried this myself and it broke somewhere, maybe it was because of the NF_CONNTRACK_ENABLED option, I can't recall anymore. Al, if this also works without removal of NF_CONNTRACK_ENABLED, please resend without that part. It doesn't. But how about this, if you really can't live without NF_CONNTRACK_ENBLED: == --- Kconfig.old 2007-07-09 06:38:52.0 +0300 +++ Kconfig 2007-07-24 20:24:27.0 +0300 @@ -25,8 +25,7 @@ config NETFILTER_NETLINK_LOG and is also scheduled to replace the old syslog-based ipt_LOG and ip6t_LOG modules. -# Rename this to NF_CONNTRACK in a 2.6.25 -config NF_CONNTRACK_ENABLED +config NF_CONNTRACK tristate Netfilter connection tracking support help Connection tracking keeps a record of what packets have passed @@ -40,9 +39,9 @@ config NF_CONNTRACK_ENABLED To compile it as a module, choose M here. If unsure, say N. -config NF_CONNTRACK +config NF_CONNTRACK_ENABLED tristate - default NF_CONNTRACK_ENABLED + default NF_CONNTRACK config NF_CT_ACCT bool Connection tracking flow accounting That defeats the only purpose why we kept it. How about we change this once we remove it, in 2.6.25? - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
On Mon, Jul 23, 2007 at 02:28:11PM -0700, Linus Torvalds wrote: On Mon, 23 Jul 2007, Andrew Morton wrote: It'd be nice to get a clean trace. Are you able to obtain the full trace with CONFIG_FRAME_POINTER=y? If you are talking about http://userweb.kernel.org/~akpm/dsc03659.jpg then I think that _is_ a full trace. It's certainly not very messy, and it seems accurate. It's just that inlining makes it much harder to see the call-graphs, but that's what inlining does.. For example, missing from the call graph is get_page_from_freelist - buffered_rmqueue - [ missing - inlined ] prep_new_page -[ missing - inlined ] prep_zero_page - [ missing - inlined ] clear_highpage - [ missing - inlined ] kmap_atomic -[ missing - tailcall ] kmap_atomic_prot but I'm pretty sure the call trace is good (and I'm also pretty sure gcc is overly aggressive at inlining, and that it causes us pain for debugging, but whatever) ... For prep_zero_page() and clear_highpage() we can't blame gcc since we force gcc to always inline them. buffered_rmqueue() and prep_new_page() are static functions with only one caller each, and for the normal non-debug case it's a really nice optimization to have them inlined automatically. But it might make sense to add -fno-inline-functions-called-once to the CFLAGS depending on some debug option? Linus cu Adrian -- Is there not promise of rain? Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. Only a promise, Lao Er said. Pearl S. Buck - Dragon Seed - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
On Tue, 24 Jul 2007, Adrian Bunk wrote: buffered_rmqueue() and prep_new_page() are static functions with only one caller each, and for the normal non-debug case it's a really nice optimization to have them inlined automatically. I'm not at all sure I agree. Inlining big functions doesn't actually tend to generally generate any better code, so if gcc's logic is single callsite - always inline, then that logic is likely not right. Linus - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: NET_DMA: where do we ever call dma_skb_copy_datagram_iovec() with NULL pinned_list?
Al Viro: AFAICS, all callers of dma_skb_copy_datagram_iovec() are either * recursive for fragments, pass pinned_list unchanged or * called from tcp, with pinned_list coming from tp-ucopy.pinned_list and only when tp-ucopy.dma_chan is non-NULL. Now, all non-NULL assignments to -dma_chan have the same form: if (!tp-ucopy.dma_chan tp-ucopy.pinned_list) tp-ucopy.dma_chan = get_softnet_dma(); IOW, if -ucopy.pinned_list stays NULL, -ucopy.dma_chan will do the same. Moreover, any place that resets -ucopy.pinned_list will also reset -ucopy.dma_chan. IOW, we can't ever get non-NULL tp-ucopy.dma_chan while tp-ucopy.pinned_list is NULL. So how can we ever get to the dma_memcpy_to_kernel_iovec()? It looks like this is extra code. The history seems to be that it was thought to be useful for internal copying, perhaps for smbfs or iSCSI, as the comment suggests. However, since no one is using it, it can probably come out. If there is no argument, I'll post a patch to remove it. sln -- == Mr. Shannon Nelson LAN Access Division, Intel Corp. [EMAIL PROTECTED]I don't speak for Intel (503) 712-7659Parents can't afford to be squeamish. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
On Tue, 24 Jul 2007, Andrew Morton wrote: I guess this was the bug: Looks very likely to me. Mike, Alexey, does this fix things for you? Linus - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
On Tue, 24 Jul 2007 11:14:21 -0700 (PDT) Linus Torvalds [EMAIL PROTECTED] wrote: On Tue, 24 Jul 2007, Adrian Bunk wrote: buffered_rmqueue() and prep_new_page() are static functions with only one caller each, and for the normal non-debug case it's a really nice optimization to have them inlined automatically. I'm not at all sure I agree. Inlining big functions doesn't actually tend to generally generate any better code, so if gcc's logic is single callsite - always inline, then that logic is likely not right. fwiw, -fno-inline-functions-called-once (who knew?) takes i386 allnoconfig vmlinux .text from 928360 up to 955362 bytes (27k larger). A surprisingly large increase - I wonder if it did something dumb. It appears to still correctly inline those things which we've manually marked inline. hm. It would be nice to defeat the autoinlining for debug purposes though. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Netfilter Kconfig: Expose IPv4/6 connection tracking options by selecting NF_CONNTRACK
Patrick McHardy wrote: Al Boldi wrote: Patrick McHardy wrote: But I vaguely recall having tried this myself and it broke somewhere, maybe it was because of the NF_CONNTRACK_ENABLED option, I can't recall anymore. Al, if this also works without removal of NF_CONNTRACK_ENABLED, please resend without that part. It doesn't. But how about this, if you really can't live without NF_CONNTRACK_ENBLED: == --- Kconfig.old 2007-07-09 06:38:52.0 +0300 +++ Kconfig 2007-07-24 20:24:27.0 +0300 @@ -25,8 +25,7 @@ config NETFILTER_NETLINK_LOG and is also scheduled to replace the old syslog-based ipt_LOG and ip6t_LOG modules. -# Rename this to NF_CONNTRACK in a 2.6.25 -config NF_CONNTRACK_ENABLED +config NF_CONNTRACK tristate Netfilter connection tracking support help Connection tracking keeps a record of what packets have passed @@ -40,9 +39,9 @@ config NF_CONNTRACK_ENABLED To compile it as a module, choose M here. If unsure, say N. -config NF_CONNTRACK +config NF_CONNTRACK_ENABLED tristate - default NF_CONNTRACK_ENABLED + default NF_CONNTRACK config NF_CT_ACCT bool Connection tracking flow accounting That defeats the only purpose why we kept it. I'm not sure how this would defeat the only purpose. Isn't the purpose of this to alias NF_CONNTRACK_ENABLED to NF_CONNTRACK? And as such would yield the same result. Also, we could leave this as is, and select NF_CONNTRACK_ENABLED instead of NF_CONNTRACK. Thanks! -- Al - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
On Tue, 24 Jul 2007, Andrew Morton wrote: fwiw, -fno-inline-functions-called-once (who knew?) takes i386 allnoconfig vmlinux .text from 928360 up to 955362 bytes (27k larger). A surprisingly large increase - I wonder if it did something dumb. It appears to still correctly inline those things which we've manually marked inline. hm. I think inlining small enough functions is worth it, and the thing is, the kernel is actually pretty damn good at having lots of small functions. It's one of the few things I really care about from a coding style standpoint. So I'm not surprised that -fno-inline-functions-called-once makes things larger, because I think it's generally a good idea to inline things that are just called once. But it does make things harder to debug, and the performance advantages become increasingly small for bigger functions. And that's a balancing act. Do we care about performance? Yes. But do we care so much that it's worth inlining something like buffered_rmqueue()? So I would not be surprised if -fno-inline-functions-called-once will disable *all* the inlining heuristics, and say oh, it's not an inline function, and it's only called once, so we won't inline it at all. So called once should probably make the inlining weight bigger (ie inline *larger* functions than you would otherwise), it just shouldn't make it infinite. It's not worth it. Linus - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Netfilter Kconfig: Expose IPv4/6 connection tracking options by selecting NF_CONNTRACK
Al Boldi wrote: Patrick McHardy wrote: Al Boldi wrote: Patrick McHardy wrote: But I vaguely recall having tried this myself and it broke somewhere, maybe it was because of the NF_CONNTRACK_ENABLED option, I can't recall anymore. Al, if this also works without removal of NF_CONNTRACK_ENABLED, please resend without that part. It doesn't. But how about this, if you really can't live without NF_CONNTRACK_ENBLED: == --- Kconfig.old 2007-07-09 06:38:52.0 +0300 +++ Kconfig 2007-07-24 20:24:27.0 +0300 @@ -25,8 +25,7 @@ config NETFILTER_NETLINK_LOG and is also scheduled to replace the old syslog-based ipt_LOG and ip6t_LOG modules. -# Rename this to NF_CONNTRACK in a 2.6.25 -config NF_CONNTRACK_ENABLED +config NF_CONNTRACK tristate Netfilter connection tracking support help Connection tracking keeps a record of what packets have passed @@ -40,9 +39,9 @@ config NF_CONNTRACK_ENABLED To compile it as a module, choose M here. If unsure, say N. -config NF_CONNTRACK +config NF_CONNTRACK_ENABLED tristate -default NF_CONNTRACK_ENABLED +default NF_CONNTRACK config NF_CT_ACCT bool Connection tracking flow accounting That defeats the only purpose why we kept it. I'm not sure how this would defeat the only purpose. Isn't the purpose of this to alias NF_CONNTRACK_ENABLED to NF_CONNTRACK? And as such would yield the same result. The purpose is to avoid forcing people a second time to reconfigure the conntrack options since we've completed nf_conntrack and removed ip_conntrack. Previously NF_CONNTRACK was a bool (selecting the new implementation) and NF_CONNTRACK_ENABLED specified whether to build either nf_conntrack or ip_conntrack modular/static/not at all. So old configs only have the information whether to build modular in NF_CONNTRACK_ENABLED, but NF_CONNTRACK is what actually controls it. With your change, old configs will still build nf_conntrack properly, but they will always choose static linking. Also, we could leave this as is, and select NF_CONNTRACK_ENABLED instead of NF_CONNTRACK. I guess so, and that would have to select NF_CONNTRACK. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/10] Implement batching skb API
KK, On Tue, 2007-24-07 at 09:14 +0530, Krishna Kumar2 wrote: J Hadi Salim [EMAIL PROTECTED] wrote on 07/23/2007 06:02:01 PM: Actually you have not sent netperf results with prep and without prep. My results were based on pktgen (which i explained as testing the driver). I think depending on netperf without further analysis is simplistic. It was like me doing forwarding tests on these patches. So _which_ non-LLTX driver doesnt do that? ;- I have no idea since I haven't looked at all drivers. Can you tell which all non-LLTX drivers does that ? I stated this as the sole criterea. The few i have peeked at all do it. I also think the e1000 should be converted to be non-LLTX. The rest of netdev is screaming to kill LLTX. tun driver doesnt use it either - but i doubt that makes it bloat Adding extra code that is currently not usable (esp from a submission point) is bloat. So far i have converted 3 drivers, 1 of them doesnt use it. Two more driver conversions are on the way, they will both use it. How is this bloat again? A few emails back you said if only IPOIB can use batching then thats good enough justification. You waltz in, have the luxury of looking at my code, presentations, many discussions with me etc ... luxury ? I had implemented the entire thing even before knowing that you are working on something similar! and I had sent the first proposal to netdev, I saw your patch at the end of may (or at least 2 weeks after you said it existed). That patch has very little resemblance to what you just posted conceptwise or codewise. I could post it if you would give me permission. *after* which you told that you have your own code and presentations (which I had never seen earlier - I joined netdev a few months back, earlier I was working on RDMA, Infiniband as you know). I am gonna assume you didnt know of my work - which i have been making public for about 3 years. Infact i talked about this topic when i visited your office in 2006 on a day you were not present, so it is plausible you didnt hear of it. And it didn't give me any great ideas either, remember I had posted results for E1000 at the time of sending the proposals. In mid-June you sent me a series of patches which included anything from changing variable names to combining qdisc_restart and about everything i referred to as being cosmetic differences in your posted patches. I took two of those and incorporated them in. One was an XXX in my code already to allocate the dev-blist (Commit: bb4464c5f67e2a69ffb233fcf07aede8657e4f63). The other one was a mechanical removal of the blist being passed (Commit: 0e9959e5ee6f6d46747c97ca8edc91b3eefa0757). Some of the others i asked you to defer. For example, the reason i gave you for not merging any qdisc_restart_combine changes is because i was waiting for Dave to swallow the qdisc_restart changes i made; otherwise maintainance becomes extremely painful for me. Sridhar actually provided a lot more valuable comments and fixes but has not planted a flag on behalf of the queen of spain like you did. However I do give credit in my proposal to you for what ideas that your provided (without actual code), and the same I did for other people who did the same, like Dave, Sridhar. BTW, you too had discussions with me, and I sent some patches to improve your code too, I incorporated two of your patches and asked for deferal of others. These patches have now shown up in what you claim as the difference. I just call them cosmetic difference not to downplay the importance of having an ethtool interface but because they do not make batching perform any better. The real differences are those two items. I am suprised you havent cannibalized those changes as well. I thought you renamed them to something else; according to your posting: This patch will work with drivers updated by Jamal, Matt Michael Chan with minor modifications - rename xmit_win to xmit_slots rename batch handler. Or maybe thats a future plan you have in mind? so it looks like a two way street to me (and that is how open source works and should). Open source is a lot more transparent than that. You posted a question, which was part of your research. I responded and told you i have patches; you asked me for them and i promptly ported them from pre-2.6.18 to the latest kernel at the time. The nature of this batching work is one of performance. So numbers are important. If you had some strong disagreements on something in the architecture, then it would be of great value to explain it in a technical detail - and more importantly to provide some numbers to say why it is a bad idea. You get numbers by running some tests. You did none of the above. Your effort has been to produce your patch for whatever reasons. This would not have been problematic to me if it actually was based within reasons of optimization because the end goal would have been achieved. I have deleted the rest of the email
Re: 2.6.20-2.6.21 - networking dies after random time
On Tue, 24 Jul 2007, Ingo Molnar wrote: please try the patch below instead. I'm hoping this is just a let's see if the behavior changes patch, not something that you think should be applied if it fixes something? This patch looks like it is trying to paper over (rather than fix) some possible bug in the -disable logic. Makes sense as a let's see if it's that kind of thing, but not as a let's fix it. Or am I missing something? Linus - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
Linus Torvalds [EMAIL PROTECTED] writes: So called once should probably make the inlining weight bigger (ie inline *larger* functions than you would otherwise), it just shouldn't make it infinite. It's not worth it. There's probably a --param where it can be tweaked exactly. The problem is that --params tend to be very gcc version specific and might do something completely different on a newer or older version. So it's better not to use them. -Andi - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
On Tue, Jul 24, 2007 at 12:15:48PM -0700, Linus Torvalds wrote: On Tue, 24 Jul 2007, Andrew Morton wrote: fwiw, -fno-inline-functions-called-once (who knew?) takes i386 allnoconfig vmlinux .text from 928360 up to 955362 bytes (27k larger). A surprisingly large increase - I wonder if it did something dumb. It appears to still correctly inline those things which we've manually marked inline. hm. I think inlining small enough functions is worth it, and the thing is, the kernel is actually pretty damn good at having lots of small functions. It's one of the few things I really care about from a coding style standpoint. So I'm not surprised that -fno-inline-functions-called-once makes things larger, because I think it's generally a good idea to inline things that are just called once. But it does make things harder to debug, and the performance advantages become increasingly small for bigger functions. And that's a balancing act. Do we care about performance? Yes. When using CONFIG_CC_OPTIMIZE_FOR_SIZE=y we even actively tell gcc that we only care about size and do not care about performance... But do we care so much that it's worth inlining something like buffered_rmqueue()? ... Where is the problem with having buffered_rmqueue() inlined? Linus cu Adrian -- Is there not promise of rain? Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. Only a promise, Lao Er said. Pearl S. Buck - Dragon Seed - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
On Tue, 24 Jul 2007, Andi Kleen wrote: There's probably a --param where it can be tweaked exactly. The problem is that --params tend to be very gcc version specific and might do something completely different on a newer or older version. So it's better not to use them. I agree wholeheartedly with that sentiment. We've tried at times (because some gcc snapshots made some *truly* insane choices for a while), and maybe we still have some around. Not worth the pain. Linus - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
On Tue, 24 Jul 2007, Adrian Bunk wrote: But do we care so much that it's worth inlining something like buffered_rmqueue()? ... Where is the problem with having buffered_rmqueue() inlined? In this case, it was a pain to just even try to find the call chain, or read the asm. I would encourage lots of kernel hackers to read the assembler code gcc generates. I suspect people being aware of code generation issues (and writing their code with that in mind) is a *much* bigger performance impact than gcc inlining random functions. So maybe I'm old-fashioned and crazy, but readability of the asm result actually is a worthwhile goal. Not because we care directly, but because I'd like to encourage people to do it, due to the *indirect* benefits. Linus - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] Add new timeval_to_sec function
Oliver Hartkopp [EMAIL PROTECTED] wrote on 07/23/2007 11:22:39 PM: When you like to create any timeout based on your calculated value, you might run into the problem that your calculated value is set to _zero_ even if there was some time before the conversion. This might probably not what you indented to get. BTW, these are stats (for human consumption), not timers. +-DLS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] amso1100: QP init bug in amso driver
Roland: The guys at UNH found this and fixed it. I'm surprised no one has hit this before. I guess it only breaks when the refcount on the QP is non-zero. Initialize the wait_queue_head_t in the c2_qp structure. Signed-off-by: Ethan Burns [EMAIL PROTECTED] Acked-by: Tom Tucker [EMAIL PROTECTED] --- drivers/infiniband/hw/amso1100/c2_qp.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/amso1100/c2_qp.c b/drivers/infiniband/hw/amso1100/c2_qp.c index 420c138..01d0786 100644 --- a/drivers/infiniband/hw/amso1100/c2_qp.c +++ b/drivers/infiniband/hw/amso1100/c2_qp.c @@ -506,6 +506,7 @@ int c2_alloc_qp(struct c2_dev *c2dev, qp-send_sgl_depth = qp_attrs-cap.max_send_sge; qp-rdma_write_sgl_depth = qp_attrs-cap.max_send_sge; qp-recv_sgl_depth = qp_attrs-cap.max_recv_sge; + init_waitqueue_head(qp-wait); /* Initialize the SQ MQ */ q_size = be32_to_cpu(reply-sq_depth); - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Netfilter Kconfig: Expose IPv4/6 connection tracking options by selecting NF_CONNTRACK
Patrick McHardy wrote: Al Boldi wrote: Also, we could leave this as is, and select NF_CONNTRACK_ENABLED instead of NF_CONNTRACK. I guess so, and that would have to select NF_CONNTRACK. Should I resend, or can you take care of it? Thanks! -- Al - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.20-2.6.21 - networking dies after random time
* Linus Torvalds [EMAIL PROTECTED] wrote: On Tue, 24 Jul 2007, Ingo Molnar wrote: please try the patch below instead. I'm hoping this is just a let's see if the behavior changes patch, not something that you think should be applied if it fixes something? This patch looks like it is trying to paper over (rather than fix) some possible bug in the -disable logic. Makes sense as a let's see if it's that kind of thing, but not as a let's fix it. Or am I missing something? yeah - it's a totaly bad and unacceptable hack (i realized how bad it was when i wrote up that comment section ...), i just wanted to see which portion of ne2k/lib8390.c is sensitive to the fact whether an irq line is masked or not. The patch has no SOB line either. the current best fix forward is to undo my original change, unless we find a better fix for this problem. (Note that the other patches posted in this thread are broken too: they only mask the irq but dont reliably unmask it.) here's the current method of handling irqs for Marcin's card: 17: 12 IO-APIC-fasteoi eth1, eth0 and fasteoi is a really simple sequence: no masking/unmasking by the flow handler itself but a NOP at entry and an APIC-EOI at the end. The disable/enable irq thing should thus have minimal effect if done within an irq handler. now ne2k does something uncommon: for xmit (which is normally done outside of irq handlers) it will disable_irq_nosync()/enable_irq() the interrupt. It does it to exclude the handler from _that_ CPU, but due to the _nosync it does not exclude it from any other CPUs. So that's a bit weird already. just in case, i've just re-checked all the genirq bits that change IRQ_DISABLED (that bit accidentally clear would be the only way to truly allow an IRQ handler to interrupt the disable_irq_nosync() critical section on that CPU) - but i can see no way for that to happen: we unconditionally detect and report unbalanced and underflowing irq_desc-depth, and the only other place (besides enable_irq()) that clears IRQ_DISABLED is __set_irq_handler(), and on x86 that is only used during bootup. Marcin, could you try the patch below too? [without having any other patch applied.] It basically turns the critical section into an irqs-off critical section and thus checks whether your problem is related to that particular area of code. Ingo Index: linux/drivers/net/lib8390.c === --- linux.orig/drivers/net/lib8390.c +++ linux/drivers/net/lib8390.c @@ -297,9 +297,7 @@ static int ei_start_xmit(struct sk_buff * Slow phase with lock held. */ - disable_irq_nosync_lockdep_irqsave(dev-irq, flags); - - spin_lock(ei_local-page_lock); + spin_lock_irqsave(ei_local-page_lock, flags); ei_local-irqlock = 1; @@ -376,8 +374,7 @@ static int ei_start_xmit(struct sk_buff ei_local-irqlock = 0; ei_outb_p(ENISR_ALL, e8390_base + EN0_IMR); - spin_unlock(ei_local-page_lock); - enable_irq_lockdep_irqrestore(dev-irq, flags); + spin_unlock_irqrestore(ei_local-page_lock, flags); dev_kfree_skb (skb); ei_local-stat.tx_bytes += send_length; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
On Tue, Jul 24, 2007 at 11:25:22AM -0700, Linus Torvalds wrote: On Tue, 24 Jul 2007, Andrew Morton wrote: I guess this was the bug: Looks very likely to me. Mike, Alexey, does this fix things for you? Yeah, box is running for more than hour, survived LTP, gdb testsuite, portage sync and so on. What new to me is that someone decided TSC is unstable but that's probably OK on full debug kernel: Clocksource tsc unstable (delta = 91724418 ns) Time: pit clocksource has been installed. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [2.6.22] Fix a potential NULL pointer dereference in mace_interrupt() in drivers/net/pcmcia/nmclan_cs.c
Micah Gruber wrote: This patch fixes a potential null dereference bug where we dereference DEV before a null check. This patch simply moves the dereferencing after the null check. Signed-off-by: Micah Gruber [EMAIL PROTECTED] --- --- a/drivers/net/pcmcia/nmclan_cs.c+++ b/drivers/net/pcmcia/nmclan_cs.c@@ -996,7 +996,7 @@ PLEASE PLEASE PLEASE fix your mailer. None of your patches are apply-able via script or patch(1), which is how all Linux maintainers apply patches. If you want to contribute to the kernel, you -must- figure out how to send patches via email. Linux kernel development is done almost exclusively through email, so it is very important to get the details right. Jeff - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] ucc_geth: add ethtool support
Li Yang wrote: The patch enables statistics in ucc_geth and adds ethtool support to ucc_geth driver. Signed-off-by: Li Yang [EMAIL PROTECTED] --- drivers/net/Makefile |2 +- drivers/net/ucc_geth.c | 19 +- drivers/net/ucc_geth.h |6 + drivers/net/ucc_geth_ethtool.c | 388 drivers/net/ucc_geth_mii.c |6 +- 5 files changed, 407 insertions(+), 14 deletions(-) create mode 100644 drivers/net/ucc_geth_ethtool.c applied 1-2 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] netxen: Load firmware during probe, dma watchdog fix.
[EMAIL PROTECTED] wrote: The firmware should be loaded after resetting hardware during PCI probe, besides module unload. This fixes issue with 2nd port of multiport adapter on powerpc blades. This patch also fixes a bug that PCI resources are not freed if dma watchdog shutdown failed. The dma watchdog poll messages during module unload are also suppressed. Signed-off-by: Dhananjay Phadke [EMAIL PROTECTED] Signed-off-by: Milan Bag [EMAIL PROTECTED] Signed-off-by: Wen Xiong [EMAIL PROTECTED] applied - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/10] ps3: fix wrong calculation of rx descriptor address
Masakazu Mokuno wrote: Fixed the bug that calculation of the address of rx descriptor was wrong. Signed-off-by: Masakazu Mokuno [EMAIL PROTECTED] --- drivers/net/ps3_gelic_net.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) applied 1-10 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] atl1: change tpd_avail function name
Jay Cliburn wrote: Change tpd_avail() to atl1_tpd_avail(). Signed-off-by: Jay Cliburn [EMAIL PROTECTED] --- drivers/net/atl1/atl1_main.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) applied 1-5 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2.6 patch] drivers/net/acenic.c: fix check-after-use
Adrian Bunk wrote: The Coverity checker noted that we've already dereferenced dev when we check whether it's NULL. Since it's impossible that dev is NULL at this place this patch removes the NULL check. Signed-off-by: Adrian Bunk [EMAIL PROTECTED] applied - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] eHEA: net_poll support
Jan-Bernd Themann wrote: net_poll support for eHEA added Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED] --- drivers/net/ehea/ehea.h |2 +- drivers/net/ehea/ehea_main.c | 22 +- 2 files changed, 22 insertions(+), 2 deletions(-) applied - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] defxx: Use __maybe_unused rather than a local hack
Maciej W. Rozycki wrote: This is a change to remove a local hack in favour to __maybe_unused that has been recently added. Signed-off-by: Maciej W. Rozycki [EMAIL PROTECTED] --- Hi, It should be obvious. The code builds, therefore it works. Please apply, Maciej applied - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] forcedeth: new device ids in pci_ids.h
Ayaz Abdulla wrote: This patch contains new device ids for MCP73 chipset. Signed-Off-By: Ayaz Abdulla [EMAIL PROTECTED] applied 1-2 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 2/3] netdev: i82596 Ethernet needs asm/cacheflush.h
Geert Uytterhoeven wrote: netdev: i82596 Ethernet needs asm/cacheflush.h on m68k drivers/net/82596.c: In function 'init_rx_bufs': drivers/net/82596.c:552: error: implicit declaration of function 'cache_clear' drivers/net/82596.c: In function 'i596_start_xmit': drivers/net/82596.c:1104: error: implicit declaration of function 'cache_push' The driver still compiles on ia32 (CONFIG_APRICOT=y) Signed-off-by: Geert Uytterhoeven [EMAIL PROTECTED] applied - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] NET_DMA: remove unused dma_memcpy_to_kernel_iovec
Al Viro pointed out that dma_memcpy_to_kernel_iovec() really was unreachable and thus unused. The code originally was there to support in-kernel dma needs, but since it remains unused, we'll pull it out. Signed-off-by: Shannon Nelson [EMAIL PROTECTED] --- drivers/dma/iovlock.c | 27 --- 1 files changed, 0 insertions(+), 27 deletions(-) diff --git a/drivers/dma/iovlock.c b/drivers/dma/iovlock.c index d637555..e763d72 100644 --- a/drivers/dma/iovlock.c +++ b/drivers/dma/iovlock.c @@ -143,29 +143,6 @@ void dma_unpin_iovec_pages(struct dma_pinned_list *pinned_list) kfree(pinned_list); } -static dma_cookie_t dma_memcpy_to_kernel_iovec(struct dma_chan *chan, struct - iovec *iov, unsigned char *kdata, size_t len) -{ - dma_cookie_t dma_cookie = 0; - - while (len 0) { - if (iov-iov_len) { - int copy = min_t(unsigned int, iov-iov_len, len); - dma_cookie = dma_async_memcpy_buf_to_buf( - chan, - iov-iov_base, - kdata, - copy); - kdata += copy; - len -= copy; - iov-iov_len -= copy; - iov-iov_base += copy; - } - iov++; - } - - return dma_cookie; -} /* * We have already pinned down the pages we will be using in the iovecs. @@ -187,10 +164,6 @@ dma_cookie_t dma_memcpy_to_iovec(struct dma_chan *chan, struct iovec *iov, if (!chan) return memcpy_toiovec(iov, kdata, len); - /* - kernel copies (e.g. smbfs) */ - if (!pinned_list) - return dma_memcpy_to_kernel_iovec(chan, iov, kdata, len); - iovec_idx = 0; while (iovec_idx pinned_list-nr_iovecs) { struct dma_page_list *page_list; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GENETLINK]: Fix race in genl_unregister_mc_groups()
* Brian Haley [EMAIL PROTECTED] 2007-07-24 12:14 Thomas Graf wrote: @@ -217,14 +229,8 @@ EXPORT_SYMBOL(genl_register_mc_group); void genl_unregister_mc_group(struct genl_family *family, struct genl_multicast_group *grp) { -BUG_ON(grp-family != family); genl_lock(); -netlink_clear_multicast_users(genl_sock, grp-id); -clear_bit(grp-id, mc_groups); -list_del(grp-list); -genl_ctrl_event(CTRL_CMD_DELMCAST_GRP, grp); -grp-id = 0; -grp-family = NULL; +genl_unregister_mc_group(family, grp); genl_unlock(); } Shouldn't this be __genl_unregister_mc_group(family, grp) ? Yes, thank for you noticing. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[REPOST][GENETLINK]: Fix race in genl_unregister_mc_groups()
family-mcast_groups is protected by genl_lock so it must be held while accessing the list in genl_unregister_mc_groups(). Requires adding a non-locking variant of genl_unregister_mc_group(). Signed-off-by: Thomas Graf [EMAIL PROTECTED] Index: net-2.6/net/netlink/genetlink.c === --- net-2.6.orig/net/netlink/genetlink.c2007-07-23 22:08:04.0 +0200 +++ net-2.6/net/netlink/genetlink.c 2007-07-24 23:51:11.0 +0200 @@ -200,6 +200,18 @@ int genl_register_mc_group(struct genl_f } EXPORT_SYMBOL(genl_register_mc_group); +static void __genl_unregister_mc_group(struct genl_family *family, + struct genl_multicast_group *grp) +{ + BUG_ON(grp-family != family); + netlink_clear_multicast_users(genl_sock, grp-id); + clear_bit(grp-id, mc_groups); + list_del(grp-list); + genl_ctrl_event(CTRL_CMD_DELMCAST_GRP, grp); + grp-id = 0; + grp-family = NULL; +} + /** * genl_unregister_mc_group - unregister a multicast group * @@ -217,14 +229,8 @@ EXPORT_SYMBOL(genl_register_mc_group); void genl_unregister_mc_group(struct genl_family *family, struct genl_multicast_group *grp) { - BUG_ON(grp-family != family); genl_lock(); - netlink_clear_multicast_users(genl_sock, grp-id); - clear_bit(grp-id, mc_groups); - list_del(grp-list); - genl_ctrl_event(CTRL_CMD_DELMCAST_GRP, grp); - grp-id = 0; - grp-family = NULL; + __genl_unregister_mc_group(family, grp); genl_unlock(); } @@ -232,8 +238,10 @@ static void genl_unregister_mc_groups(st { struct genl_multicast_group *grp, *tmp; + genl_lock(); list_for_each_entry_safe(grp, tmp, family-mcast_groups, list) - genl_unregister_mc_group(family, grp); + __genl_unregister_mc_group(family, grp); + genl_unlock(); } /** - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] TIPC: fix tipc_link_create error handling
if printbuf allocation or tipc_node_attach_link() fails, invalid references to the link are left in the associated node and bearer structures. Fix by allocating printbuf early and moving timer initialization and the addition of the new link to the b_ptr-links list after tipc_node_attach_link() succeeded. Signed-off-by: Florian Westphal [EMAIL PROTECTED] --- link.c | 28 +++- 1 file changed, 15 insertions(+), 13 deletions(-) also move k_init_timer(), as suggested by Allan. diff --git a/net/tipc/link.c b/net/tipc/link.c index 5adfdfd..1d674e0 100644 --- a/net/tipc/link.c +++ b/net/tipc/link.c @@ -423,6 +423,17 @@ struct link *tipc_link_create(struct bearer *b_ptr, const u32 peer, return NULL; } + if (LINK_LOG_BUF_SIZE) { + char *pb = kmalloc(LINK_LOG_BUF_SIZE, GFP_ATOMIC); + + if (!pb) { + kfree(l_ptr); + warn(Link creation failed, no memory for print buffer\n); + return NULL; + } + tipc_printbuf_init(l_ptr-print_buf, pb, LINK_LOG_BUF_SIZE); + } + l_ptr-addr = peer; if_name = strchr(b_ptr-publ.name, ':') + 1; sprintf(l_ptr-name, %u.%u.%u:%s-%u.%u.%u:, @@ -432,8 +443,6 @@ struct link *tipc_link_create(struct bearer *b_ptr, const u32 peer, tipc_zone(peer), tipc_cluster(peer), tipc_node(peer)); /* note: peer i/f is appended to link name by reset/activate */ memcpy(l_ptr-media_addr, media_addr, sizeof(*media_addr)); - k_init_timer(l_ptr-timer, (Handler)link_timeout, (unsigned long)l_ptr); - list_add_tail(l_ptr-link_list, b_ptr-links); l_ptr-checkpoint = 1; l_ptr-b_ptr = b_ptr; link_set_supervision_props(l_ptr, b_ptr-media-tolerance); @@ -459,21 +468,14 @@ struct link *tipc_link_create(struct bearer *b_ptr, const u32 peer, l_ptr-owner = tipc_node_attach_link(l_ptr); if (!l_ptr-owner) { + if (LINK_LOG_BUF_SIZE) + kfree(l_ptr-print_buf.buf); kfree(l_ptr); return NULL; } - if (LINK_LOG_BUF_SIZE) { - char *pb = kmalloc(LINK_LOG_BUF_SIZE, GFP_ATOMIC); - - if (!pb) { - kfree(l_ptr); - warn(Link creation failed, no memory for print buffer\n); - return NULL; - } - tipc_printbuf_init(l_ptr-print_buf, pb, LINK_LOG_BUF_SIZE); - } - + k_init_timer(l_ptr-timer, (Handler)link_timeout, (unsigned long)l_ptr); + list_add_tail(l_ptr-link_list, b_ptr-links); tipc_k_signal((Handler)tipc_link_start, (unsigned long)l_ptr); dbg(tipc_link_create(): tolerance = %u,cont intv = %u, abort_limit = %u\n, - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REPOST][GENETLINK]: Fix race in genl_unregister_mc_groups()
From: Thomas Graf [EMAIL PROTECTED] Date: Tue, 24 Jul 2007 23:52:57 +0200 family-mcast_groups is protected by genl_lock so it must be held while accessing the list in genl_unregister_mc_groups(). Requires adding a non-locking variant of genl_unregister_mc_group(). Signed-off-by: Thomas Graf [EMAIL PROTECTED] Applied, thanks Thomas. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] NET: Fix sch_api to properly set sch-parent on the root qdisc
From: Patrick McHardy [EMAIL PROTECTED] Fix sch_api to correctly set sch-parent for both ingress and egress qdiscs in qdisc_create(). Signed-off-by: Patrick McHardy [EMAIL PROTECTED] Signed-off-by: Peter P Waskiewicz Jr [EMAIL PROTECTED] --- net/sched/sch_api.c | 17 - 1 files changed, 12 insertions(+), 5 deletions(-) diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index 13c09bc..dee0d5f 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -380,6 +380,10 @@ void qdisc_tree_decrease_qlen(struct Qdisc *sch, unsigned int n) return; while ((parentid = sch-parent)) { sch = qdisc_lookup(sch-dev, TC_H_MAJ(parentid)); + if (sch == NULL) { + WARN_ON(parentid != TC_H_ROOT); + return; + } cops = sch-ops-cl_ops; if (cops-qlen_notify) { cl = cops-get(sch, parentid); @@ -420,8 +424,6 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent, unsigned long cl = cops-get(parent, classid); if (cl) { err = cops-graft(parent, cl, new, old); - if (new) - new-parent = classid; cops-put(parent, cl); } } @@ -436,7 +438,8 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent, */ static struct Qdisc * -qdisc_create(struct net_device *dev, u32 handle, struct rtattr **tca, int *errp) +qdisc_create(struct net_device *dev, u32 parent, u32 handle, + struct rtattr **tca, int *errp) { int err; struct rtattr *kind = tca[TCA_KIND-1]; @@ -482,6 +485,8 @@ qdisc_create(struct net_device *dev, u32 handle, struct rtattr **tca, int *errp) goto err_out2; } + sch-parent = parent; + if (handle == TC_H_INGRESS) { sch-flags |= TCQ_F_INGRESS; sch-stats_lock = dev-ingress_lock; @@ -758,9 +763,11 @@ create_n_graft: if (!(n-nlmsg_flagsNLM_F_CREATE)) return -ENOENT; if (clid == TC_H_INGRESS) - q = qdisc_create(dev, tcm-tcm_parent, tca, err); + q = qdisc_create(dev, tcm-tcm_parent, tcm-tcm_parent, +tca, err); else - q = qdisc_create(dev, tcm-tcm_handle, tca, err); + q = qdisc_create(dev, tcm-tcm_parent, tcm-tcm_handle, +tca, err); if (q == NULL) { if (err == -EAGAIN) goto replay; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] NET: Fix sch_prio to properly detect the root qdisc on multiqueue
Fix the check in prio_tune() to see if sch-parent is TC_H_ROOT instead of sch-handle to load or reject the qdisc for multiqueue devices. Signed-off-by: Peter P Waskiewicz Jr [EMAIL PROTECTED] --- net/sched/sch_prio.c |6 -- 1 files changed, 4 insertions(+), 2 deletions(-) diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c index 2d8c084..06441db 100644 --- a/net/sched/sch_prio.c +++ b/net/sched/sch_prio.c @@ -239,11 +239,13 @@ static int prio_tune(struct Qdisc *sch, struct rtattr *opt) /* If we're multiqueue, make sure the number of incoming bands * matches the number of queues on the device we're associating with. * If the number of bands requested is zero, then set q-bands to -* dev-egress_subqueue_count. +* dev-egress_subqueue_count. Also, the root qdisc must be the +* only one that is enabled for multiqueue, since it's the only one +* that interacts with the underlying device. */ q-mq = RTA_GET_FLAG(tb[TCA_PRIO_MQ - 1]); if (q-mq) { - if (sch-handle != TC_H_ROOT) + if (sch-parent != TC_H_ROOT) return -EINVAL; if (netif_is_multiqueue(sch-dev)) { if (q-bands == 0) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GENETLINK]: Fix adjustment of number of multicast groups
From: Thomas Graf [EMAIL PROTECTED] Date: Tue, 24 Jul 2007 11:45:14 +0200 The current calculation of the maximum number of genetlink multicast groups seems odd, fix it. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Applied, thanks. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[FIX] NET: Fix sch_api and sch_prio to properly set and detect the root qdisc
This is a patch from Patrick McHardy to fix the sch_api code, which I went ahead and tested and made a slight fix to. This also includes the fix to sch_prio based on Patrick's patch. The sch-parent handle should contain the parent qdisc ID. When the qdisc is the root qdisc (TC_H_ROOT), the parent handle should be the value TC_H_ROOT. This fixes sch_api to set this correctly on qdisc_create() for both ingress and egress qdiscs. Change this check in prio_tune() so that only the root qdisc can be multiqueue-enabled; use sch-parent instead of sch-handle. -- PJ Waskiewicz [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GENETLINK]: Correctly report errors while registering a multicast group
From: Thomas Graf [EMAIL PROTECTED] Date: Tue, 24 Jul 2007 11:44:27 +0200 Signed-off-by: Thomas Graf [EMAIL PROTECTED] Also applied, thanks Thomas. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/2][RFC] Update network drivers to use devres
These patches update the network driver core and the e100 driver to use devres. Devres is a simple resource manager for device drivers, see Documentation/driver-model/devres.txt for more information. If the RFC goes well I will create patches to update all applicable network drivers. Applies against 2.6.22 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2][RFC] Update net core to use devres.
* netdev_pci_remove_one() can replace simple pci device remove functions * devm_alloc_netdev() is like alloc_netdev but allocates memory using devres. alloc_netdev() can be removed once all drivers use devres. Applies against 2.6.22 Signed-off-by: Brandon Philips [EMAIL PROTECTED] --- include/linux/etherdevice.h |2 + include/linux/netdevice.h |5 net/core/dev.c | 48 net/ethernet/eth.c |6 + 4 files changed, 57 insertions(+), 4 deletions(-) Index: linux-2.6/include/linux/netdevice.h === --- linux-2.6.orig/include/linux/netdevice.h +++ linux-2.6/include/linux/netdevice.h @@ -622,6 +622,7 @@ extern int dev_queue_xmit(struct sk_buf extern int register_netdevice(struct net_device *dev); extern voidunregister_netdevice(struct net_device *dev); extern voidfree_netdev(struct net_device *dev); +extern voidnetdev_pci_remove_one(struct pci_dev *pdev); extern voidsynchronize_net(void); extern int register_netdevice_notifier(struct notifier_block *nb); extern int unregister_netdevice_notifier(struct notifier_block *nb); @@ -992,6 +993,10 @@ static inline void netif_tx_disable(stru extern voidether_setup(struct net_device *dev); /* Support for loadable net-drivers */ + +extern struct net_device *devm_alloc_netdev(struct device *dev, + int sizeof_priv, const char *name, + void (*setup)(struct net_device *)); extern struct net_device *alloc_netdev(int sizeof_priv, const char *name, void (*setup)(struct net_device *)); extern int register_netdev(struct net_device *dev); Index: linux-2.6/net/core/dev.c === --- linux-2.6.orig/net/core/dev.c +++ linux-2.6/net/core/dev.c @@ -89,6 +89,7 @@ #include linux/interrupt.h #include linux/if_ether.h #include linux/netdevice.h +#include linux/pci.h #include linux/etherdevice.h #include linux/notifier.h #include linux/skbuff.h @@ -3343,7 +3344,8 @@ static struct net_device_stats *internal } /** - * alloc_netdev - allocate network device + * __alloc_netdev - allocate network device + * @dev: managed device responsible for mem; NULL if not managed * @sizeof_priv: size of private data to allocate space for * @name: device name format string * @setup: callback to initialize device @@ -3351,8 +3353,8 @@ static struct net_device_stats *internal * Allocates a struct net_device with private data area for driver use * and performs basic initialization. */ -struct net_device *alloc_netdev(int sizeof_priv, const char *name, - void (*setup)(struct net_device *)) +struct net_device *__alloc_netdev(struct device *gendev, int sizeof_priv, + const char *name, void (*setup)(struct net_device *)) { void *p; struct net_device *dev; @@ -3364,7 +3366,11 @@ struct net_device *alloc_netdev(int size alloc_size = (sizeof(*dev) + NETDEV_ALIGN_CONST) ~NETDEV_ALIGN_CONST; alloc_size += sizeof_priv + NETDEV_ALIGN_CONST; - p = kzalloc(alloc_size, GFP_KERNEL); + if (dev == NULL) + p = kzalloc(alloc_size, GFP_KERNEL); + else + p = devm_kzalloc(gendev, alloc_size, GFP_KERNEL); + if (!p) { printk(KERN_ERR alloc_netdev: Unable to allocate device.\n); return NULL; @@ -3382,8 +3388,23 @@ struct net_device *alloc_netdev(int size strcpy(dev-name, name); return dev; } + +struct net_device *devm_alloc_netdev(struct device *dev, int sizeof_priv, const + char *name, void (*setup)(struct net_device *)) +{ + return __alloc_netdev(dev, sizeof_priv, name, setup); +} +EXPORT_SYMBOL(devm_alloc_netdev); + +struct net_device *alloc_netdev(int sizeof_priv, const char *name, + void (*setup)(struct net_device *)) +{ + return __alloc_netdev(NULL, sizeof_priv, name, setup); +} EXPORT_SYMBOL(alloc_netdev); + + /** * free_netdev - free network device * @dev: device @@ -3411,6 +3432,25 @@ void free_netdev(struct net_device *dev) #endif } +/** + * free_netdev - free network device + * @dev: device + * + * This function does the last stage of destroying an allocated device + * interface. The reference to the device object is released. + * If this is the last reference then it will be freed. + */ +void netdev_pci_remove_one(struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + if (netdev) { + unregister_netdev(netdev); + pci_set_drvdata(pdev, NULL); +
[PATCH 2/2][RFC] Update e100 driver to use devres.
devres manages device resources and is currently used by all libata low level drivers. It can greatly reduce the complexity of the error handling on probe and the device removal functions. For example the e100_free() function and all of the gotos in e100_probe have been removed. Also, e100_remove() has been deleted and replaced with a much simpler netdev_pci_remove_one() function that should be able to handle any PCI net device that doesn't require any teardown besides resource deallocation. Applies against 2.6.22. Signed-off-by: Brandon Philips [EMAIL PROTECTED] --- drivers/net/e100.c | 73 + 1 file changed, 19 insertions(+), 54 deletions(-) Index: linux-2.6/drivers/net/e100.c === --- linux-2.6.orig/drivers/net/e100.c +++ linux-2.6/drivers/net/e100.c @@ -2514,18 +2514,11 @@ static int e100_do_ioctl(struct net_devi static int e100_alloc(struct nic *nic) { - nic-mem = pci_alloc_consistent(nic-pdev, sizeof(struct mem), - nic-dma_addr); - return nic-mem ? 0 : -ENOMEM; -} + struct device *dev = nic-pdev-dev; -static void e100_free(struct nic *nic) -{ - if(nic-mem) { - pci_free_consistent(nic-pdev, sizeof(struct mem), - nic-mem, nic-dma_addr); - nic-mem = NULL; - } + nic-mem = dmam_alloc_coherent(dev, sizeof(struct mem), + nic-dma_addr, GFP_ATOMIC); + return nic-mem ? 0 : -ENOMEM; } static int e100_open(struct net_device *netdev) @@ -2552,7 +2545,7 @@ static int __devinit e100_probe(struct p struct nic *nic; int err; - if(!(netdev = alloc_etherdev(sizeof(struct nic { + if (!(netdev = devm_alloc_etherdev(pdev-dev, sizeof(struct nic { if(((1 debug) - 1) NETIF_MSG_PROBE) printk(KERN_ERR PFX Etherdev alloc failed, abort.\n); return -ENOMEM; @@ -2582,26 +2575,26 @@ static int __devinit e100_probe(struct p nic-msg_enable = (1 debug) - 1; pci_set_drvdata(pdev, netdev); - if((err = pci_enable_device(pdev))) { + if ((err = pcim_enable_device(pdev))) { DPRINTK(PROBE, ERR, Cannot enable PCI device, aborting.\n); - goto err_out_free_dev; + return err; } if(!(pci_resource_flags(pdev, 0) IORESOURCE_MEM)) { DPRINTK(PROBE, ERR, Cannot find proper PCI device base address, aborting.\n); err = -ENODEV; - goto err_out_disable_pdev; + return err; } if((err = pci_request_regions(pdev, DRV_NAME))) { DPRINTK(PROBE, ERR, Cannot obtain PCI resources, aborting.\n); - goto err_out_disable_pdev; + return err; } if((err = pci_set_dma_mask(pdev, DMA_32BIT_MASK))) { DPRINTK(PROBE, ERR, No usable DMA configuration, aborting.\n); - goto err_out_free_res; + return err; } SET_MODULE_OWNER(netdev); @@ -2610,11 +2603,11 @@ static int __devinit e100_probe(struct p if (use_io) DPRINTK(PROBE, INFO, using i/o access mode\n); - nic-csr = pci_iomap(pdev, (use_io ? 1 : 0), sizeof(struct csr)); + nic-csr = pcim_iomap(pdev, (use_io ? 1 : 0), sizeof(struct csr)); if(!nic-csr) { DPRINTK(PROBE, ERR, Cannot map device registers, aborting.\n); err = -ENOMEM; - goto err_out_free_res; + return err; } if(ent-driver_data) @@ -2647,11 +2640,11 @@ static int __devinit e100_probe(struct p if((err = e100_alloc(nic))) { DPRINTK(PROBE, ERR, Cannot alloc driver memory, aborting.\n); - goto err_out_iounmap; + return err; } if((err = e100_eeprom_load(nic))) - goto err_out_free; + return err; e100_phy_init(nic); @@ -2661,8 +2654,8 @@ static int __devinit e100_probe(struct p if (!eeprom_bad_csum_allow) { DPRINTK(PROBE, ERR, Invalid MAC address from EEPROM, aborting.\n); - err = -EAGAIN; - goto err_out_free; + + return -EAGAIN; } else { DPRINTK(PROBE, ERR, Invalid MAC address from EEPROM, you MUST configure one.\n); @@ -2682,7 +2675,7 @@ static int __devinit e100_probe(struct p strcpy(netdev-name, eth%d); if((err = register_netdev(netdev))) { DPRINTK(PROBE, ERR, Cannot register net device, aborting.\n); - goto err_out_free; + return err; } DPRINTK(PROBE, INFO, addr 0x%llx, irq %d, @@ -2692,36
Re: [PATCH 1/2][RFC] Update net core to use devres.
On Tue, Jul 24, 2007 at 03:39:52PM -0700, [EMAIL PROTECTED] wrote: * netdev_pci_remove_one() can replace simple pci device remove functions * devm_alloc_netdev() is like alloc_netdev but allocates memory using devres. alloc_netdev() can be removed once all drivers use devres. E... What the hell for? To make sure that we have struct device for everything, whether we need it or not? Have you actually read through drivers/net? I mean, _really_ read through it, looking for ugly cases... I've done just that several times and I'm sorry, but I would classify that project as hopeless. It's way, _way_ more diverse than SATA... - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html