Re: 2.6.19: ACPI reports AC not present after resume from STD
On Tuesday 06 March 2007, Rafael J. Wysocki wrote: > [changed Cc list] > > On Sunday, 25 February 2007 18:14, Andrey Borzenkov wrote: > > On Sunday 25 February 2007, Rafael J. Wysocki wrote: > > > On Sunday, 25 February 2007 11:37, Andrey Borzenkov wrote: > > > > On Sunday 25 February 2007, Rafael J. Wysocki wrote: > > > > > On Sunday, 25 February 2007 00:26, Andrey Borzenkov wrote: > > > > > > On Saturday 24 February 2007, Rafael J. Wysocki wrote: > > > > > > > Hi, > > > > > > > > > > > > > > On Saturday, 24 February 2007 10:55, Andrey Borzenkov wrote: > > > > > > > > On Tuesday 13 February 2007, Andrey Borzenkov wrote: > > > > > > > > > On Thursday 07 December 2006, Lebedev, Vladimir P wrote: > > > > > > > > > > Please register a new bug, attach acpidump and dmesg. > > > > > > > > > > > > > > > > > > http://bugzilla.kernel.org/show_bug.cgi?id=7995 > > > > > > > > > > > > > > > > > > regards > > > > > > > > > > > > > > > > Well, this starts looking like ACPI is not at fault. > > > > > > > > > > > > > > > > When reporting AC state ACPI just reads the contents of system > > > > > > > > memory (I presume it gets updated by BIOS/ACPI when AC state > > > > > > > > changes). It looks like this memory area is restored during > > > > > > > > resume from STD. I updated the mentioned bug report with a more > > > > > > > > detailed description. Now if someone could suggest a way to > > > > > > > > catch whether a specific physical address gets saved/restored, this > > > > > > > > would finally explain it. > > > > > > > > > > > > > > First, if you want the reserved memory areas to be left alone > > > > > > > by swsusp, you need to mark them as 'nosave'. On x86_64 this > > > > > > > is done by the function e820_mark_nosave_range() in > > > > > > > arch/x86_64/kernel/e820.c that can be ported to i386 with no > > > > > > > problems.
However, we haven't found that very useful, so far, > > > > > > > since no one has ever reported any problems with the current > > > > > > > approach, which is to save and restore them. > > > > > > > > > > > > Well, the following proof of concept patch fixes this issue for > > > > > > me. Please notice that the original version of > > > > > > e820_mark_nosave_range() could fail to exclude some areas due to > > > > > > alignment issues (exactly what happened to me on the first try), so it > > > > > > still can explain your problem too. > > > > > > > > > > Great job, thanks for the patch! It looks good, so I'm going to > > > > > forward it for merging. > > > > > > > > Please don't; I'm currently testing a slightly more polished version; I > > > > will send it later. > > > > > > OK > > > > > > > Could anybody explain (or give a pointer to) what happens with a region > > > > that is not page-aligned? In particular, the very first one: > > > > > > > > BIOS-e820: - 0009fc00 (usable) > > > > BIOS-e820: 0009fc00 - 000a (reserved) > > > > > > > > Will the kernel allocate a partial page (how?) or will the kernel > > > > ignore the last (first) incomplete page? In the former case, how can those > > > > incomplete pages be detected? > > > > > > Well, on x86_64, if I understand e820_register_active_regions() > > > correctly, the partial pages won't be registered. > > > > It appears that for low memory the kernel will ignore incomplete pages for > > sure. I hope it does the same for high memory - but for now I just throw > > this in and pray :) This also significantly simplifies the patch. > > Well, can you please check if the appended modification of your patch still > works? 
> It works for me, with a caveat: /home/bor/src/linux-git/arch/i386/kernel/e820.c: In function ‘e820_mark_nosave_range’: /home/bor/src/linux-git/arch/i386/kernel/e820.c:328: warning: format ‘%016Lx’ expects type ‘long long unsigned int’, but argument 2 has type ‘long unsigned int’ /home/bor/src/linux-git/arch/i386/kernel/e820.c:328: warning: format ‘%016Lx’ expects type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’ regards -andrey > Thanks, > Rafael > > > --- > arch/i386/kernel/e820.c | 47 > +++ arch/i386/kernel/setup.c | > 1 + > include/asm-i386/e820.h |1 + > 3 files changed, 49 insertions(+) > > Index: linux-2.6.21-rc2/arch/i386/kernel/e820.c > === > --- linux-2.6.21-rc2.orig/arch/i386/kernel/e820.c > +++ linux-2.6.21-rc2/arch/i386/kernel/e820.c > @@ -313,6 +313,53 @@ static int __init request_standard_resou > > subsys_initcall(request_standard_resources); > > +/* > + * Mark pages corresponding to given pfn range as 'nosave'. > + */ > +static void __init > +e820_mark_nosave_range(unsigned long start_pfn, unsigned long end_pfn) > +{ > + unsigned long pfn; > + > + if (start_pfn >= end_pfn) > + return; > + > + printk("Nosave address range: %016Lx - %016Lx\n", > + PFN_PHYS(start_pfn), PFN_PHYS(end_pfn)); > + for (pfn =
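The two issues in this thread, partial pages at e820 boundaries and the `%016Lx` warning, can be illustrated together with a small user-space sketch. It assumes 4 KiB pages and re-creates the semantics of the kernel's `PFN_UP()`/`PFN_DOWN()`/`PFN_PHYS()` helpers; `struct pfn_range` and the function names are hypothetical, and this is not the code that was eventually merged. Usable RAM is rounded inward (matching the observation that incomplete pages are ignored), a nosave area is rounded outward so every page it touches is covered, and the printed values are cast to `unsigned long long`, since the warning shows that `PFN_PHYS()` yields a plain `unsigned long` on i386.

```c
#include <assert.h>
#include <stdio.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Same rounding as the kernel's PFN_UP()/PFN_DOWN()/PFN_PHYS() helpers. */
#define PFN_UP(x)    (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
#define PFN_DOWN(x)  ((x) >> PAGE_SHIFT)
#define PFN_PHYS(x)  ((x) << PAGE_SHIFT)

/* Hypothetical half-open page-frame range [start, end). */
struct pfn_range {
	unsigned long start;
	unsigned long end;
};

/* Usable RAM is rounded inward: a partially usable page is dropped. */
static struct pfn_range usable_pfns(unsigned long start, unsigned long end)
{
	struct pfn_range r = { PFN_UP(start), PFN_DOWN(end) };

	assert(start <= end);
	if (r.start > r.end)		/* less than one whole page */
		r.end = r.start;
	return r;
}

/* A nosave area is rounded outward, so every page it touches is covered. */
static struct pfn_range nosave_pfns(unsigned long start, unsigned long end)
{
	struct pfn_range r = { PFN_DOWN(start), PFN_UP(end) };

	assert(start <= end);
	return r;
}

/*
 * Format a nosave range like the patch's printk(), but cast each value to
 * unsigned long long so the 'll' length modifier matches the argument type
 * even where unsigned long is 32 bits wide (i386).
 */
static int format_nosave(char *buf, size_t len, struct pfn_range r)
{
	return snprintf(buf, len, "Nosave address range: %016llx - %016llx\n",
			(unsigned long long)PFN_PHYS(r.start),
			(unsigned long long)PFN_PHYS(r.end));
}
```

For example, with a usable region assumed to span 0 to 0x9fc00 and a reserved region assumed to span 0x9fc00 to 0xa0000 (hypothetical values modelled on the truncated boot-log lines), `usable_pfns()` gives [0, 0x9f) and `nosave_pfns()` gives [0x9f, 0xa0): the partial page 0x9f is never handed out as RAM and is fully covered by the nosave range. The kernel's `printk()` also accepts the `%Lx` spelling; the sketch uses the standard `%llx`.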
Re: [PATCH 0/20] x86_64 Relocatable bzImage support (V4)
On Thu, Mar 08, 2007 at 10:15:02AM +1100, Nigel Cunningham wrote: > Hi. > > On Thu, 2007-03-08 at 07:49 +1100, Nigel Cunningham wrote: > > Hi. > > > > On Wed, 2007-03-07 at 07:07 -0800, Arjan van de Ven wrote: > > > On Wed, 2007-03-07 at 12:27 +0530, Vivek Goyal wrote: > > > > Hi, > > > > > > > > Here is another attempt on x86_64 relocatable bzImage patches(V4). This > > > > patchset makes a bzImage relocatable and same kernel binary can be > > > > loaded > > > > and run from different physical addresses. > > > > > > > > > have these patches been extensively tested with various suspend > > > scenarios? (S1,S3,S4 in acpi speak or s2ram and s2disk in Linux speak) > > > > We did work on this for RHEL5, getting relocatable kernel support > > working fine with S4. While doing it and since, I've been running > > Suspend2 with the same patch. > > > > Since that work, Vivek has done more modifications, but I can confirm > > that the basic design is reliable with S4. Haven't tried S3, but can do. > > Will report back shortly. > > S3 works okay here with a relocatable x86_64 kernel (2.6.20). > Ok. Got hold of a system which supports Standby mode (S1) and it works fine with 2.6.21-rc2 + relocatable patchset. Thanks Vivek - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Wanted: simple, safe x86 stack overflow detection
Bill Irwin wrote: On Tue, 2007-03-06 at 22:44 -0800, Bill Irwin wrote: What do you see as the obstacle to eliminating nested IRQ's? On Wed, Mar 07, 2007 at 04:34:52AM -0800, Arjan van de Ven wrote: political will, or maybe just the lack of convincing people so far Political issues are significantly more difficult to resolve than technical ones. On Tue, 2007-03-06 at 22:44 -0800, Bill Irwin wrote: It doesn't seem so far out to test for being on the interrupt stack and defer the call to do_IRQ() until after the currently-running instance of do_IRQ() has returned, or to move to per-irq stacks modulo special arrangements for the per-cpu IRQ's. Or did you have other methods in mind? On Wed, Mar 07, 2007 at 04:34:52AM -0800, Arjan van de Ven wrote: it's simpler... irqreturn_t handle_IRQ_event(unsigned int irq, struct irqaction *action) { irqreturn_t ret, retval = IRQ_NONE; unsigned int status = 0; handle_dynamic_tick(action); if (!(action->flags & IRQF_DISABLED)) local_irq_enable_in_hardirq(); just removing the if() and the explicit IRQ enabling already makes irqs no longer nest... I can see why that would raise eyebrows. I can see getting bashed mercilessly with interrupt latency concerns as a result here. Can you suggest any defenses? I don't understand why interrupt latency suffers. Sure, the interrupt that's being masked is delayed, but on the other hand the interrupt that's doing the masking is not. We're moving the latency from the first interrupt to the second, probably with a slight gain in overall throughput. It *does* matter if the interrupts have meaningful priorities. Is that the case here? -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
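The latency argument can be made concrete with a toy model. This is plain C, not kernel code; the types and functions are hypothetical, and the model deliberately ignores per-interrupt entry/exit cost and hardware priorities. It compares when two handlers finish if the second interrupt may preempt the first (nesting) versus waiting until the first handler returns.

```c
#include <assert.h>

/* Hypothetical model of one interrupt: arrival time and handler run time. */
struct irq_event {
	unsigned int arrive;
	unsigned int work;
};

/* Completion times for two interrupts, the first arriving no later. */
struct finish_times {
	unsigned int first;
	unsigned int second;
};

/* Nesting allowed: the second handler preempts the first if it is still running. */
static struct finish_times run_nested(struct irq_event a, struct irq_event b)
{
	struct finish_times t;
	unsigned int a_alone = a.arrive + a.work;

	assert(a.arrive <= b.arrive);
	t.second = b.arrive + b.work;
	/* If b arrived while a was running, a is delayed by b's whole run time. */
	t.first = (b.arrive < a_alone) ? a_alone + b.work : a_alone;
	return t;
}

/* No nesting: the second handler waits until the first has returned. */
static struct finish_times run_unnested(struct irq_event a, struct irq_event b)
{
	struct finish_times t;

	assert(a.arrive <= b.arrive);
	t.first = a.arrive + a.work;
	t.second = (b.arrive > t.first ? b.arrive : t.first) + b.work;
	return t;
}
```

With one interrupt arriving at t=0 needing 10 units of work and a second arriving at t=2 needing 3, nesting finishes them at 13 and 5; without nesting they finish at 10 and 13. The last handler completes at t=13 either way, so only which interrupt absorbs the delay changes, which is why the question of meaningful priorities decides the matter.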
[PATCH][SCTP] Re: lockdep: inconsistent lock state ipv6_add_addr/sctp_v6_copy_addrlist (2.6.21-rc1)
On 25-02-2007 10:08, Simon Arlott wrote: > This happens on every boot if more information is needed: > > [ 37.393715] = > [ 37.393830] [ INFO: inconsistent lock state ] > [ 37.393881] 2.6.21-rc1-git #146 > [ 37.393929] - > [ 37.393979] inconsistent {softirq-on-R} -> {in-softirq-W} usage. > [ 37.394040] hotplug/1072 [HC0[0]:SC1[2]:HE1:SE0] takes: > [ 37.394092] (>lock){-+-?}, at: [] > ipv6_add_addr+0x164/0x1e0 > [ 37.394308] {softirq-on-R} state was registered at: > [ 37.394359] [] __lock_acquire+0x622/0xbb0 > [ 37.394515] [] lock_acquire+0x62/0x80 > [ 37.394678] [] _read_lock+0x35/0x50 > [ 37.394834] [] sctp_v6_copy_addrlist+0x30/0xc0 ... [SCTP] ipv6: inconsistent lock state ipv6_add_addr/sctp_v6_copy_addrlist lockdep found that dev->lock, which is taken from softirq context in ipv6_add_addr, is also taken in sctp_v6_copy_addrlist with softirqs enabled, so a lockup is possible. Noticed-by: Simon Arlott <[EMAIL PROTECTED]> Signed-off-by: Jarek Poplawski <[EMAIL PROTECTED]> --- diff -Nurp linux-2.6.21-rc2-mm2-/net/sctp/ipv6.c linux-2.6.21-rc2-mm2/net/sctp/ipv6.c --- linux-2.6.21-rc2-mm2-/net/sctp/ipv6.c 2007-02-21 19:46:49.0 +0100 +++ linux-2.6.21-rc2-mm2/net/sctp/ipv6.c 2007-03-07 21:57:37.0 +0100 @@ -360,7 +360,7 @@ static void sctp_v6_copy_addrlist(struct return; } - read_lock(&in6_dev->lock); + read_lock_bh(&in6_dev->lock); for (ifp = in6_dev->addr_list; ifp; ifp = ifp->if_next) { /* Add the address to the local list. */ addr = t_new(struct sctp_sockaddr_entry, GFP_ATOMIC); @@ -374,7 +374,7 @@ static void sctp_v6_copy_addrlist(struct } } - read_unlock(&in6_dev->lock); + read_unlock_bh(&in6_dev->lock); rcu_read_unlock(); }
[PATCH] fix BUG_ON check at move_freepages() (Re: 2.6.21-rc3-mm2)
Hello. The BUG_ON() check in move_freepages() is wrong. end_page is start_page + MAX_ORDER_NR_PAGES, i.e. one past the last page of the block, so it can already belong to the next zone. BUG_ON() should check "end_page - 1" instead. This is the fix for it against 2.6.21-rc3-mm2. Signed-off-by: Yasunori Goto <[EMAIL PROTECTED]> --- mm/page_alloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: current_test/mm/page_alloc.c === --- current_test.orig/mm/page_alloc.c 2007-03-08 15:44:10.0 +0900 +++ current_test/mm/page_alloc.c 2007-03-08 16:17:29.0 +0900 @@ -707,7 +707,7 @@ int move_freepages(struct zone *zone, unsigned long order; int blocks_moved = 0; - BUG_ON(page_zone(start_page) != page_zone(end_page)); + BUG_ON(page_zone(start_page) != page_zone(end_page - 1)); for (page = start_page; page < end_page;) { if (!PageBuddy(page)) { -- Yasunori Goto
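The bug is the usual off-by-one for half-open ranges: the loop runs while `page < end_page`, so `end_page` itself is not part of the block. A user-space sketch, with a hypothetical `fake_page` type standing in for `struct page` and `page_zone_id()` standing in for `page_zone()`:

```c
#include <assert.h>

/* Hypothetical stand-in for struct page: just the zone it belongs to. */
struct fake_page {
	int zone;
};

static int page_zone_id(const struct fake_page *page)
{
	return page->zone;
}

/*
 * [start, end) is half-open, like start_page..end_page in move_freepages():
 * the loop runs while page < end, so end is one past the last page and may
 * belong to the next zone (or lie past the end of the array). Only end - 1
 * is guaranteed to be inside the block.
 */
static int block_in_one_zone(const struct fake_page *start,
			     const struct fake_page *end)
{
	assert(end > start);
	return page_zone_id(start) == page_zone_id(end - 1);
}
```

With eight pages where a zone boundary falls after the fourth, checking `&pages[4]` (the end pointer) would report a zone mismatch for the perfectly valid block `pages[0..3]`, while checking `end - 1` does not.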
Re: [PATCH -mm] Blackfin: blackfin i2c driver
On Tue, 6 Mar 2007 23:45:29 -0800, Andrew Morton wrote: > On Wed, 07 Mar 2007 15:39:27 +0800 "Wu, Bryan" <[EMAIL PROTECTED]> wrote: > > > Thanks a lot, could you please give me a script just to kill this > > whitespace? So I can do it before sending you patches. > > > It's pretty simple: > > #!/bin/sh > # > # Strip any trailing whitespace which a unified diff adds. > # > > strip1() > { > TMP=$(mktemp /tmp/XXXXXX) > cp "$1" "$TMP" > sed -e '/^+/s/[[:blank:]]*$//' < "$TMP" > "$1" > rm "$TMP" > } > > for i in "$@" > do > strip1 "$i" > done > > > that'll be in > http://www.zip.com.au/~akpm/linux/patches/patch-scripts-0.20/patch-scripts-0.20.tar.gz > too Alternatively, you can use quilt [1] to manage your patches and enable the --strip-trailing-whitespace option of quilt refresh by default. [1] http://savannah.nongnu.org/projects/quilt/ -- Jean Delvare
Re: 2.6.21-rc1 and 2.6.21-rc2 kwin dies silently
Andrew Morton wrote: (cc restored. Please always do reply-to-all) On Wed, 28 Feb 2007 18:05:13 +0200 [EMAIL PROTECTED] wrote: On Wednesday 28 February 2007 17:19, Sid Boyce wrote: openSUSE 10.3 Alpha and KDE-3.5.6, xorg-x11-7.2. KDE is set up not to require a password to unlock, but it asks for a password. When the screen unlocks, kwin is gone with no errors logged in /var/log/kdm or /var/log/messages. No problems with 2.6.20. Same problem on openSUSE 10.2 x86_64, KDE-3.5.5 and 2.6.21-rc2. Regards Sid. This is the linux kernel mailing list. Perhaps you should post your problem to the opensuse mailing list. 2.6.20 worked. 2.6.21-rc2 did not. Working theory: the kernel broke. Sid, the chances that anyone can work out what caused this are pretty low. It would be great if you could perform a git bisection search sometime in the next few weeks to work out which commit caused this. Thanks. I shall go back to 2.6.20-git3 and work forward. Up to 2.6.20-git2 was OK. Regards Sid. -- Sid Boyce ... Hamradio License G3VBV, Licensed Private Pilot Emeritus IBM/Amdahl Mainframes and Sun/Fujitsu Servers Tech Support Specialist, Cricket Coach Microsoft Windows Free Zone - Linux used for all Computing Tasks
Re: [patch] epoll use a single inode ...
Kyle Moffett wrote: Prefetching is also fairly critical on a Power4 or G5 PowerPC system as they have a long memory latency; an L2-cache miss can cost 200+ cycles. On such systems the "dcbt" prefetch instruction brings in a single 128-byte cacheline and has no serializing effects whatsoever, making it ideal for use in a linked-list-traversal inner loop. OK, 200 cycles... But what is the cost of the conditional branch you added in prefetch(x)? if (!x) return; (correctly predicted or not; but does PowerPC have a BTB?) About the NULL 'potential problem', maybe we could use a dummy nil (but mapped) object, use its address in lists, i.e. compare against that address instead of NULL. This would avoid: - The conditional test in some prefetch() implementations - The potential TLB problem with the NULL value.
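The dummy-object idea can be sketched in user space. The node type and helpers are hypothetical, and GCC/Clang's `__builtin_prefetch()` stands in for the kernel's `prefetch()`. Because every `next` pointer refers to a real object (either a list node or the shared `nil` node), the loop can prefetch unconditionally and terminates on `&nil` rather than on NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked list node. */
struct node {
	struct node *next;
	int value;
};

/*
 * Shared dummy "nil" node: lists end by pointing here instead of at NULL,
 * so every next pointer refers to a real, mapped object.
 */
static struct node nil = { &nil, 0 };

static void list_init(struct node **head)
{
	*head = &nil;
}

static void list_push(struct node **head, struct node *n)
{
	assert(n != NULL && n != &nil);
	n->next = *head;
	*head = n;
}

/* Sum all values; the loop stops at &nil rather than at NULL. */
static long list_sum(const struct node *head)
{
	long sum = 0;

	for (const struct node *p = head; p != &nil; p = p->next) {
		/* No NULL test needed: p->next is always a valid object. */
		__builtin_prefetch(p->next);
		sum += p->value;
	}
	return sum;
}
```

One design consequence: the sentinel is a single object shared by every list of this type, so all traversal code must compare against the same `&nil` address.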
Re: + stupid-hack-to-make-mainline-build.patch added to -mm tree
On Wed, 2007-03-07 at 17:01 -0800, Daniel Arai wrote: > Thomas Gleixner wrote: > > > You managed to avoid the usage of other code (i.e. PIT / HPET) already, > > so why is it sooo desirable to emulate apics instead of substituting it > > by a small and sane replacement ? Just because you happen to have an > > LAPIC emulator ? That's no reason to wire yourself into the kernel code > > and make it harder to change and maintain. > > There are several reasons why it's desirable to emulate the APIC. As you > mentioned, we already have APIC emulation, and APIC emulation isn't a huge > bottleneck on most workloads. Our code works, the Linux code works, and > replacing both pieces of code with something "small and sane" isn't going to > improve performance very much, so why bother? Any hypervisor implementation > is > going to be a tradeoff between what's easy to implement in the hypervisor, > what's easy to implement in the guest operating system, and what's > performance > critical. It is not about performance. It is about maintainability. > Secondly, not all (para-)virtualized operating systems will want to use > abstracted devices. Some virtual operating systems will be given direct > access > to hardware devices, and will need to run the actual driver for that device > and > not some abstracted device driver. So I don't buy your argument that every > piece of the kernel that interacts with a paravirtualized driver should have > a > "small and sane replacement." Err. We are talking about paravirtualized Linux and not about what you have to emulate to get Windows running. I don't care about that at all. Do you really expect that we have to accept your design decisions, just because they allow you to make your life easy? This is exactly what you are using paravirt ops for: a backdoor to throw your hackery at the kernel and leave us with the mess of hardwired crap. 
> But more importantly, we want a kernel that can run both on native hardware > and > in a paravirtualized environment.
Linux doesn't really provide abstractions > for > replacing the appropriate code. We tried to hook into the source code at a > level that seemed possible. Again. You just refuse to change your implementation and you want to keep it by arguing how hard it is because there are no abstractions. I went through the business of creating abstractions into hardwired hairballs twice. I know exactly what I'm talking about. It _IS_ hard work, but in the end it makes the code better and more maintainable. You do nothing for that, but expect that we live with your addons to the hairball. > There's no good way to override __send_IPI_shortcut. I suppose we could add > paravirt ops for __send_IPI_shortcut and every other op that touches the > APIC. > But there are dozens of functions in apic.c that would need to be included in > paravirt ops. And for our implementation, we really just want to override > apic_read and apic_write, since we can make these faster when done through > hypercalls than through memory accesses. If we were to make these paravirt > ops, > their implementations would be the same, except with a different apic_read > and > apic_write. This is a whole lot of useless code duplication. No, it is not. #include is an abstraction and __send_IPI ... is the i386 low level implementation. You insist on hooking yourself into the low level code instead of hooking into the high level code, because it is _YOUR_ implementation and we have to accept it as is. This is completely the wrong way. We get the same crap and discussion for every other architecture we are going to support with paravirt ops. And probably for every other hypervisor implementation, which has a different way of doing things.
While this could be implemented, it seems like a waste of time, > since we can just emulate something similar to a real interrupt system and > not > change things very much. Waste of your precious time. I'm working on low level code and abstractions and from now on I also have to take care not to break _YOUR_ implementation. You are going to waste _MY_ time and I'm going to fight that forever. Your prayer wheel argument of missing abstractions and the ease of emulating things is annoying. If you think it is better to emulate APIC, please emulate it without paravirt ops. If you want the speed improvement, work with us to create the interfaces and abstractions which are necessary for a sane, maintainable implementation that is useful for all hypervisors. tglx
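The distinction being argued, hooking a high-level operation such as "send an IPI to all other CPUs" rather than raw `apic_read()`/`apic_write()`, can be sketched with an ops table selected once at boot. All names here are hypothetical; this is neither the paravirt_ops structure nor the VMI interface, and the backends only record what they would do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical high-level operation table: callers ask for an effect. */
struct ipi_ops {
	void (*send_ipi_allbutself)(int vector);
};

/* Demo bookkeeping: which backend handled the last request, and with what. */
static const char *last_backend = "none";
static int last_vector = -1;

static void native_send_ipi_allbutself(int vector)
{
	/* A real native backend would program the local APIC ICR here. */
	last_backend = "native";
	last_vector = vector;
}

static void hv_send_ipi_allbutself(int vector)
{
	/* A real paravirt backend could issue a single hypercall here. */
	last_backend = "hypervisor";
	last_vector = vector;
}

static const struct ipi_ops native_ipi_ops = { native_send_ipi_allbutself };
static const struct ipi_ops hv_ipi_ops = { hv_send_ipi_allbutself };

/* Chosen once during early boot; all callers go through it afterwards. */
static const struct ipi_ops *cur_ipi_ops = &native_ipi_ops;

static void select_ipi_ops(int running_on_hypervisor)
{
	cur_ipi_ops = running_on_hypervisor ? &hv_ipi_ops : &native_ipi_ops;
}

static void send_ipi_allbutself(int vector)
{
	assert(cur_ipi_ops != NULL);
	cur_ipi_ops->send_ipi_allbutself(vector);
}
```

In this arrangement the native backend keeps its APIC register details private, and a hypervisor backend replaces one operation with one call, instead of emulating the register interface underneath the native code.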
Re: 2.6.21-rc2-mm2
On Wed, 2007-03-07 at 11:52 -0800, Andrew Morton wrote: > On Wed, 7 Mar 2007 16:46:20 -0300 "Luiz Fernando N. Capitulino" <[EMAIL > PROTECTED]> wrote: > > > On Tue, 6 Mar 2007 00:44:08 -0800 > > Andrew Morton <[EMAIL PROTECTED]> wrote: > > > > | > > | Temporarily at > > | > > | http://userweb.kernel.org/~akpm/2.6.21-rc2-mm2/ > > | > > | Will appear later at > > | > > | > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc2/2.6.20-rc2-mm2/ > > > > Getting this while rebooting: > > > > [ 166.588469] BUG: atomic counter underflow at: > > [ 166.588527] [] show_trace_log_lvl+0x1a/0x30 > > [ 166.588632] [] show_trace+0x12/0x20 > > [ 166.588730] [] dump_stack+0x16/0x20 > > [ 166.588828] [] kref_put+0xa1/0x100 > > [ 166.588927] [] kobject_put+0x14/0x20 > > [ 166.589027] [] kobject_unregister+0x22/0x30 > > [ 166.589127] [] bus_remove_driver+0x79/0x90 > > [ 166.589227] [] driver_unregister+0xb/0x20 > > [ 166.589327] [] pci_unregister_driver+0x13/0x70 > > [ 166.589428] [] alsa_card_via82xx_exit+0xd/0xf [snd_via82xx] > > [ 166.589534] [] sys_delete_module+0x140/0x1b0 > > [ 166.589635] [] sysenter_past_esp+0x5f/0x99 > > [ 166.589734] === > > > > Me too. Greg has reverted the offending commit, so now rmmod of the IPMI > driver locks the machine again. Hi, The hang (which /me screwed up fixing) isn't upon rmmod, it's when IPMI is built-in and ipmi_si finds nobody home. The driver tries to back out, and waits forever for completion (about 0.7 seconds into boot). -Mike
Locking function (interrupt handler) in the L1/L2 cache
Hi, I have an MPC8548-based Linux firewall that spends about 80% of its time on packet processing, so most of the time it will be receiving and transmitting packets through the gianfar Ethernet driver. I want to lock this driver's interrupt handler in the L1 cache. 1. Are there any kernel APIs to lock a function and its data in the L1/L2 cache? 2. How can I use "icbtls" (Instruction Cache Block Touch and Lock Set) to lock my interrupt handler? 3. Is "icbtls" the right instruction to be looking at? 4. How do I find the end address of the interrupt handler (or any other function), and how do I pass it to the cache-locking instructions? (The handler may be larger than a cache line, not aligned, etc.) 5. Could request_irq() be extended to take an additional parameter that locks the interrupt handler in the cache? I understand that if my interrupt handler runs most of the time, it is very likely to stay in the cache anyway, but there is no guarantee of that. Regards, Parav Pandit DISCLAIMER: This message (including attachment if any) is confidential and may be privileged. Before opening attachments please check them for viruses and defects. MindTree Consulting Limited (MindTree) will not be responsible for any viruses or defects or any forwarded attachments emanating either from within MindTree or outside. If you have received this message by mistake please notify the sender by return e-mail and delete this message from your system. Any unauthorized use or dissemination of this message in whole or in part is strictly prohibited. Please note that e-mails are susceptible to change and MindTree shall not be liable for any improper, untimely or incomplete transmission.
Re: SATA resume slowness, e1000 MSI warning
Andrew Morton <[EMAIL PROTECTED]> writes: > > That's: > > pci_restore_pcix_state(dev); > pci_restore_msi_state(dev); > WARN_ON(!hlist_empty(&dev->saved_cap_space)); > > return 0; Hmm. Either I am confused or I just found an unanticipated leak. pci_restore_msi_state should be out of the picture as we don't yet have ppc msi support and I don't think the g5 generation hardware supported it either. The only case I can see which might trigger this is if we saved pci-X state and then didn't restore it because we could not find the capability on restore. Any chance you could walk that list and find the cap_nr of the remaining element? Something like: { struct pci_cap_saved_state *tmp; struct hlist_node *pos; hlist_for_each_entry(tmp, pos, _dev->saved_cap_space, next) printk(KERN_INFO "saved_cap: 0x%02x\n", tmp->cap_nr); } Until I get that, the best scenario I can come up with is a tg3 hardware bug that doesn't re-enable the pci-X capability after a restore of power state. Getting that cap_nr will at least allow me to be certain whether I am dealing with msi, pci-X or pci-e. Unanticipated bugs aren't supposed to be this easy to find! Eric
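Eric's hypothesis can be modelled in a few lines: state is saved for a capability, but restore only consumes the saved entry if the capability is found again, so a device that stops advertising PCI-X after a power transition leaves an entry behind. The types and functions below are hypothetical stand-ins for the real PCI code; only the capability IDs (0x05 MSI, 0x07 PCI-X, 0x10 PCI Express) come from the PCI specification.

```c
#include <assert.h>
#include <stddef.h>

/* Capability IDs as defined by the PCI specification. */
#define CAP_ID_MSI  0x05
#define CAP_ID_PCIX 0x07
#define CAP_ID_EXP  0x10

/* Hypothetical saved-state entry, loosely modelled on pci_cap_saved_state. */
struct saved_cap {
	struct saved_cap *next;
	int cap_nr;
};

/* Hypothetical device: whether PCI-X is advertised, plus its saved state. */
struct fake_dev {
	int has_pcix;
	struct saved_cap *saved;
};

static void save_cap(struct fake_dev *dev, struct saved_cap *slot, int cap_nr)
{
	assert(dev != NULL && slot != NULL);
	slot->cap_nr = cap_nr;
	slot->next = dev->saved;
	dev->saved = slot;
}

/*
 * Restore PCI-X state and unlink its saved entry, but only if the
 * capability can still be found; otherwise the entry is left behind.
 */
static void restore_pcix(struct fake_dev *dev)
{
	struct saved_cap **pp;

	if (!dev->has_pcix)
		return;
	for (pp = &dev->saved; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->cap_nr == CAP_ID_PCIX) {
			*pp = (*pp)->next;	/* unlink the consumed entry */
			return;
		}
	}
}

/* What the suggested debug loop would report: the leftover cap_nr, or -1. */
static int first_leftover(const struct fake_dev *dev)
{
	return dev->saved != NULL ? dev->saved->cap_nr : -1;
}

static const char *cap_name(int cap_nr)
{
	switch (cap_nr) {
	case CAP_ID_MSI:  return "msi";
	case CAP_ID_PCIX: return "pci-x";
	case CAP_ID_EXP:  return "pci-e";
	default:          return "unknown";
	}
}
```

In this model, a leftover `cap_nr` of 0x07 would point at the PCI-X path, which is the distinction the debug printk is meant to make.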
Linux 2.6.16.43
New hwmon drivers since 2.6.16.42 for the following hardware:
- National Semiconductor pc87427
- SMSC lpc47m192 and lpc47m997
- Winbond w83791d

Location: ftp://ftp.kernel.org/pub/linux/kernel/v2.6/
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.16.y.git

Changes since 2.6.16.42:

Adrian Bunk (3):
      Linux 2.6.16.43-rc1
      fs/bad_inode.c 64bit fix
      Linux 2.6.16.43

Alexey Dobriyan (1):
      [IPV4/IPV6] multicast: Check add_grhead() return value

Charles Spirakis (2):
      HWMON: w83791d: New hardware monitoring driver for the Winbond W83791D
      w83791d: Documentation update

Francois Romieu (1):
      sis190: failure to set the MAC address from EEPROM

Hartmut Rick (1):
      smsc47m192: New hwmon driver for SMSC LPC47M192/997

Ilpo Järvinen (1):
      [TCP]: Prevent pseudo garbage in SYN's advertized window

Jean Delvare (3):
      hwmon: New PC87427 hardware monitoring driver
      hwmon: Add support for the Winbond W83687THF
      i2c-isa: Restore driver owner

Jim Cromie (2):
      hwmon: Allow sensor attributes arrays
      hwmon: Refactor SENSOR_DEVICE_ATTR_2

Jordan Crouse (1):
      hwmon lm83: Add LM82 support

Kirill Korotaev (1):
      fix ext3 block bitmap leakage

Marcel Siegert (1):
      V4L/DVB: Dvbdev: fix illegal re-usage of fileoperations struct

Martin Devera (1):
      I2C: i2c-piix4: Add Broadcom HT-1000 support

Patrick McHardy (1):
      [DECNET]: Fix sfuzz hanging on 2.6.18

Rudolf Marek (1):
      i2c-piix4: Add ATI IXP200/300/400 support

Stephen Hemminger (6):
      sky2: fix ram buffer allocation settings
      sky2: allow multicast pause frames
      sky2: fix for use on big endian
      sky2: more stats
      sky2: add more pci ids
      sky2: email and version change.
 Documentation/hwmon/lm83            |   16
 Documentation/hwmon/pc87427         |   38
 Documentation/hwmon/smsc47m192      |  102
 Documentation/hwmon/sysfs-interface |    6
 Documentation/hwmon/w83627hf        |    4
 Documentation/hwmon/w83791d         |  120
 Documentation/i2c/busses/i2c-piix4  |    4
 Makefile                            |    2
 drivers/hwmon/Kconfig               |   57
 drivers/hwmon/Makefile              |    3
 drivers/hwmon/it87.c                |    1
 drivers/hwmon/lm78.c                |    1
 drivers/hwmon/lm83.c                |   50 -
 drivers/hwmon/pc87360.c             |    1
 drivers/hwmon/pc87427.c             |  627
 drivers/hwmon/sis5595.c             |    1
 drivers/hwmon/smsc47b397.c          |    1
 drivers/hwmon/smsc47m1.c            |    1
 drivers/hwmon/smsc47m192.c          |  648
 drivers/hwmon/via686a.c             |    1
 drivers/hwmon/vt8231.c              |    1
 drivers/hwmon/w83627ehf.c           |    1
 drivers/hwmon/w83627hf.c            |   73
 drivers/hwmon/w83781d.c             |    1
 drivers/hwmon/w83791d.c             | 1256
 drivers/i2c/busses/Kconfig          |    9
 drivers/i2c/busses/i2c-piix4.c      |   10
 drivers/media/dvb/dvb-core/dvbdev.c |   13
 drivers/net/sis190.c                |    2
 drivers/net/sky2.c                  |  146 ++-
 fs/bad_inode.c                      |    8
 fs/ext3/inode.c                     |    1
 include/linux/hwmon-sysfs.h         |   24
 include/linux/pci_ids.h             |    4
 net/decnet/af_decnet.c              |    4
 net/ipv4/igmp.c                     |    2
 net/ipv4/tcp_output.c               |    4
 net/ipv6/mcast.c                    |    2
 38 files changed, 3130 insertions(+), 115 deletions(-)
Re: + stupid-hack-to-make-mainline-build.patch added to -mm tree
On Wed, 2007-03-07 at 17:23 -0800, Jeremy Fitzhardinge wrote: > Daniel Arai wrote: > > But more importantly, we want a kernel that can run both on native hardware > > and > > in a paravirtualized environment. Linux doesn't really provide > > abstractions for > > replacing the appropriate code. We tried to hook into the source code at > > a > > level that seemed possible. > > > > Xen doesn't support any kind of apic emulation, so we'll need to hook > anything which relies on an apic. The ipi code you quote below will > probably be one of those. > > My opinion is that pv_ops shouldn't have raw apic operations, but > instead have appropriate high-level interfaces to achieve the same > ends. Zach's counter-argument was basically yours: that the VMI code > will use a lot of the native code except for the actual apic operations. > > I can live with VMI emulating apics if it wants, so long as it does it > in private and doesn't make a big scene about it. We'll need the > high-level interfaces regardless. I can't, because it reaches out into non-private parts of the low level implementation and does not help to disentangle things or make the overall code better. No, it forces its own view of the world on us without giving us anything back. tglx
Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!
Matt Helsley <[EMAIL PROTECTED]> writes: > On Thu, 2007-03-08 at 16:32 +1300, Sam Vilain wrote: > > <snip> > > > Kirill, 06032418:36+03: > > > I propose to use "namespace" naming. > > > 1. This is already used in fs. > > > 2. This is what IMHO suits at least OpenVZ/Eric > > > 3. it has good acronym "ns". > > > > Right. So, now I'll also throw into the mix: > > > > - resource groups (I get a strange feeling of déjà vu there) > > <offtopic> > Re: déjà vu: yes! > > It's like that Star Trek episode ... except we can't agree on the name > of the impossible particle we will invent which solves all our problems. > </offtopic> > > At the risk of prolonging the agony I hate to ask: are all of these > groupings really concerned with "resources"? > > > - supply chains (think supply and demand) > > - accounting classes > > CKRM's use of the term "class" drew negative comments from Paul Jackson > and Andrew Morton about this time last year. That led to my suggestion > of "Resource Groups". Unless they've changed their minds... > > > Do any of those sound remotely close? If not, your turn :) > > I'll butt in here: task groups? task sets? confuselets? ;) Generically we can use subsystem now for the individual pieces without confusing anyone. I really don't much care as long as we don't start redefining container as something else. I think the IBM guys took it from Solaris originally, which seems to define a zone as a set of isolated processes (for us, all separate namespaces), and a container as a zone that uses resource control. Not exactly how we have been using the term, but close enough not to confuse someone. As long as we don't go calling the individual subsystems, or the process groups they need to function, a container, I really don't care.
I just know that if we use container for just the subsystem level, it makes effective communication impossible, and code reviews essentially impossible. The description says one thing, the reviewer reads it as another, and then the patch does not match the description, leading to NAKs. Resource groups, at least for the subset of subsystems that aren't namespaces, sounds reasonable. Heck, resource group, resource controller, resource subsystem, resource just about anything seems sane to me. The important part is that we find a vocabulary without doubly defined words so we can communicate, and a small common set we can agree on so people can work on and implement the individual resource controllers/groups, and get the individual pieces merged as they are ready. Eric
[PATCH] chaostables
Hello netfilter-devel,

I would like to submit chaostables (v0.5_svn23) for inclusion. Its primary use is to detect, spoof and slow down various sorts of port scans. Implementation details can be found at http://jengelh.hopto.org/p/chaostables/

If you have any comments or suggestions, do not hesitate to let me know.

Signed-off-by: Jan Engelhardt <[EMAIL PROTECTED]>
---
 include/linux/netfilter/x_tables.h    |    2
 include/linux/netfilter/xt_CHAOS.h    |   14 +
 include/linux/netfilter/xt_portscan.h |    8 +
 net/netfilter/Kconfig                 |   12 +
 net/netfilter/Makefile                |    3
 net/netfilter/x_tables.c              |   12 +
 net/netfilter/xt_CHAOS.c              |  184 +++
 net/netfilter/xt_DELUDE.c             |  259
 net/netfilter/xt_portscan.c           |  271 ++
 9 files changed, 765 insertions(+)

Index: linux-2.6.21-rc3/include/linux/netfilter/x_tables.h
===
--- linux-2.6.21-rc3.orig/include/linux/netfilter/x_tables.h
+++ linux-2.6.21-rc3/include/linux/netfilter/x_tables.h
@@ -292,6 +292,8 @@ extern struct xt_table_info *xt_replace_
 				   int *error);
 
 extern struct xt_match *xt_find_match(int af, const char *name, u8 revision);
+extern struct xt_match *xt_request_find_match(int af, const char *name,
+					u8 revision);
 extern struct xt_target *xt_find_target(int af, const char *name, u8 revision);
 extern struct xt_target *xt_request_find_target(int af, const char *name,
 						u8 revision);
Index: linux-2.6.21-rc3/include/linux/netfilter/xt_CHAOS.h
===
--- /dev/null
+++ linux-2.6.21-rc3/include/linux/netfilter/xt_CHAOS.h
@@ -0,0 +1,14 @@
+#ifndef _LINUX_XT_CHAOS_H
+#define _LINUX_XT_CHAOS_H 1
+
+enum xt_chaos_variant {
+	XTCHAOS_NORMAL,
+	XTCHAOS_TARPIT,
+	XTCHAOS_DELUDE,
+};
+
+struct xt_chaos_info {
+	enum xt_chaos_variant variant;
+};
+
+#endif /* _LINUX_XT_CHAOS_H */
Index: linux-2.6.21-rc3/include/linux/netfilter/xt_portscan.h
===
--- /dev/null
+++ linux-2.6.21-rc3/include/linux/netfilter/xt_portscan.h
@@ -0,0 +1,8 @@
+#ifndef _LINUX_XT_PORTSCAN_H
+#define _LINUX_XT_PORTSCAN_H 1
+
+struct xt_portscan_info {
+	unsigned int match_stealth, match_syn, match_cn, match_gr;
+};
+
+#endif /* _LINUX_XT_PORTSCAN_H */
Index: linux-2.6.21-rc3/net/netfilter/Kconfig
===
--- linux-2.6.21-rc3.orig/net/netfilter/Kconfig
+++ linux-2.6.21-rc3/net/netfilter/Kconfig
@@ -286,6 +286,14 @@ config NETFILTER_XTABLES
 
 # alphabetically ordered list of targets
 
+config NETFILTER_XT_TARGET_CHAOS
+	tristate '"CHAOS" target support'
+	depends on NETFILTER_XTABLES
+
+config NETFILTER_XT_TARGET_DELUDE
+	tristate '"DELUDE" target support'
+	depends on NETFILTER_XTABLES
+
 config NETFILTER_XT_TARGET_CLASSIFY
 	tristate '"CLASSIFY" target support'
 	depends on NETFILTER_XTABLES
@@ -562,6 +570,10 @@ config NETFILTER_XT_MATCH_POLICY
 
 	  To compile it as a module, choose M here.  If unsure, say N.
 
+config NETFILTER_XT_MATCH_PORTSCAN
+	tristate '"portscan" match support'
+	depends on NETFILTER_XTABLES && NF_CONNTRACK
+
 config NETFILTER_XT_MATCH_MULTIPORT
 	tristate "Multiple port match support"
 	depends on NETFILTER_XTABLES
Index: linux-2.6.21-rc3/net/netfilter/Makefile
===
--- linux-2.6.21-rc3.orig/net/netfilter/Makefile
+++ linux-2.6.21-rc3/net/netfilter/Makefile
@@ -37,8 +37,10 @@ obj-$(CONFIG_NF_CONNTRACK_TFTP) += nf_co
 obj-$(CONFIG_NETFILTER_XTABLES) += x_tables.o xt_tcpudp.o
 
 # targets
+obj-$(CONFIG_NETFILTER_XT_TARGET_CHAOS) += xt_CHAOS.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_CLASSIFY) += xt_CLASSIFY.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_CONNMARK) += xt_CONNMARK.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_DELUDE) += xt_DELUDE.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_DSCP) += xt_DSCP.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_MARK) += xt_MARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_NFQUEUE) += xt_NFQUEUE.o
@@ -63,6 +65,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_MAC) +=
 obj-$(CONFIG_NETFILTER_XT_MATCH_MARK) += xt_mark.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_MULTIPORT) += xt_multiport.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_POLICY) += xt_policy.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_PORTSCAN) += xt_portscan.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_PKTTYPE) += xt_pkttype.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_QUOTA) += xt_quota.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_REALM) += xt_realm.o
Index: linux-2.6.21-rc3/net/netfilter/x_tables.c
===
---
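The new xt_request_find_match() declaration mirrors the existing xt_request_find_target(). As far as we can tell, that function tries the lookup first and, if the extension is not registered, asks for its module to be loaded (via request_module()) and looks again. The sketch below models that "find, else load and retry" pattern in user space. The registry, load_module_for() and every other name here are hypothetical and are not the x_tables API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model of a match registry; not the x_tables API. */
#define MAX_MATCHES 8

struct match {
	const char *name;
	unsigned char revision;	/* stands in for the kernel's u8 revision */
};

static struct match registry[MAX_MATCHES];
static size_t nr_matches;

static void register_match(const char *name, unsigned char revision)
{
	if (nr_matches < MAX_MATCHES) {
		registry[nr_matches].name = name;
		registry[nr_matches].revision = revision;
		nr_matches++;
	}
}

/* Plain lookup, playing the role of xt_find_match(). */
static struct match *find_match(const char *name, unsigned char revision)
{
	for (size_t i = 0; i < nr_matches; i++)
		if (strcmp(registry[i].name, name) == 0 &&
		    registry[i].revision == revision)
			return &registry[i];
	return NULL;
}

/* Hypothetical stand-in for request_module(): "loading" registers the match. */
static int load_module_for(const char *name)
{
	if (strcmp(name, "portscan") == 0) {
		register_match("portscan", 0);
		return 0;
	}
	return -1;	/* no module provides this match */
}

/* Find, else load and retry. */
static struct match *request_find_match(const char *name, unsigned char revision)
{
	struct match *m = find_match(name, revision);

	if (m == NULL && load_module_for(name) == 0)
		m = find_match(name, revision);
	return m;
}
```

The retry after loading matters because a successful load only registers the extension; the caller still needs the pointer from a second lookup.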
Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!
Sam Vilain <[EMAIL PROTECTED]> writes: > And do we bother changing IPC namespaces or let that one slide? IPC namespaces work, even down to tiny details: we put the resource limits for the SysV IPC objects inside the namespace. Probably the most instructive example is this. Map a SysV IPC shared memory segment with shmat() and then switch to another SysV IPC namespace. You still have read and write access to that shared memory segment, but you cannot manipulate it, because it no longer has a name. Alternatively, look at the output of ipcs before and after an unshare. SysV IPC really does have its own (very weird) set of global names, and that is essentially all the IPC namespace deals with. I think you have the SysV IPC namespace confused with something else, though (like signal sending). Eric
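Eric's shmat-then-switch example can be tried from user space. This is a minimal sketch under stated assumptions:

- unshare(CLONE_NEWIPC) normally needs CAP_SYS_ADMIN, so when the call is refused the function just reports "not demonstrated".
- The function name ipcns_demo() and its return convention are hypothetical.

Marking the segment IPC_RMID right after attaching is only there so that it is freed on the last detach, even once its id is unreachable from the new namespace.

```c
#define _GNU_SOURCE		/* for unshare() and CLONE_NEWIPC */
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Returns 1 if the namespace switch was demonstrated (the old id is gone but
 * the mapping still works), 0 if unshare() failed (typically EPERM without
 * CAP_SYS_ADMIN), -1 if the shared memory setup itself failed.
 * *value_seen receives what was read back through the original mapping.
 */
static int ipcns_demo(int *value_seen)
{
	int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
	if (id < 0)
		return -1;

	int *mem = shmat(id, NULL, 0);
	if (mem == (void *)-1) {
		shmctl(id, IPC_RMID, NULL);
		return -1;
	}
	/* Destroy on last detach; we may not be able to name it later. */
	shmctl(id, IPC_RMID, NULL);

	mem[0] = 42;

	int demonstrated = 0;
	if (unshare(CLONE_NEWIPC) == 0) {
		struct shmid_ds ds;

		/* In the fresh namespace the old id names nothing. */
		if (shmctl(id, IPC_STAT, &ds) == -1 && errno == EINVAL)
			demonstrated = 1;
	}

	/* The existing mapping keeps working either way. */
	mem[0] += 1;
	*value_seen = mem[0];
	shmdt(mem);
	return demonstrated;
}
```

Running `ipcs -m` before and after the unshare shows the same effect from the shell: the new namespace starts with an empty table.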
MMC: Fix typo in mmc highspeed
MMC: Fix typo in mmc highspeed

Signed-off-by: Kyungmin Park <[EMAIL PROTECTED]>
--
diff --git a/drivers/mmc/mmc.c b/drivers/mmc/mmc.c
index 4a73e8b..3b8f7af 100644
--- a/drivers/mmc/mmc.c
+++ b/drivers/mmc/mmc.c
@@ -1134,7 +1134,7 @@ static void mmc_process_ext_csds(struct mmc_host *host)
 
 		mmc_card_set_highspeed(card);
 
-		host->ios.timing = MMC_TIMING_SD_HS;
+		host->ios.timing = MMC_TIMING_MMC_HS;
 		mmc_set_ios(host);
 	}
 
Re: [PATCH 1/20] x86_64: Assembly safe page.h and pgtable.h
Vivek Goyal <[EMAIL PROTECTED]> writes: > Hi Sam, > > Thanks for the review. This makes sense to me. Move const.h into > asm-generic and let everybody use it. > > This is more of a small cleanup issue and involves changing few header files > in asm-sparc64 and make sure nothing is broken on sparc64. This patchset > is already becoming big and complex. Is it ok if we let the patch > remain unmodified for now and once this gets in and settles down, I can > post another patch to do above modification? Actually, unless there is a reason not to, we can probably move this into include/linux instead of include/asm-generic. I don't see anything in that header file that is architecture-specific in any way, except that it happens to be used only in architecture-specific code. Eric
IP Defragmentation
Hi list, I am using kernel 2.6.20.1. I have written a module which registers a function at the LOCAL_IN hook, and I have found strange behavior with the packets that reach my callback function. Say I send 1500 bytes to this machine from the network with ping -s 1500. 1> When the datagram is fragmented, I get only one packet at the hook. The IP header says this is the reassembled packet (skb->len=1528, offset=0, MF=0). When I dump the data (printing skb->data[i] for i from 0 to 1528), only 1472 bytes are valid data and the remaining 28 bytes are garbage. I verified this with Ethereal. 2> I also dumped these packets in ip_local_deliver() after ip_defrag() and before NF_HOOK, but the result is the same. Can anybody tell me why I am not getting the complete data? Regards, kanhu
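The numbers in this report add up exactly, assuming a 1500-byte MTU, a 20-byte IPv4 header without options and the 8-byte ICMP echo header. The first fragment carries 1480 bytes of IP payload, which is the ICMP header plus 1472 bytes of ping data. The second fragment carries the last 28 bytes. The reassembled datagram is 20 + 8 + 1500 = 1528 bytes.

One plausible explanation, not confirmed in this thread, is that the reassembled skb is non-linear. IPv4 reassembly can chain the later fragments on the skb's frag_list instead of copying them into one buffer. In that case, reading skb->data[0..skb->len) walks past the linear part exactly where the second fragment's 28 bytes begin. Helpers such as skb_copy_bits() or skb_linearize() exist for reading such skbs. The sketch below only reproduces the arithmetic; plan_fragments() is a hypothetical name.

```c
#define IP_HDR_LEN   20u	/* IPv4 header without options (assumption) */
#define ICMP_HDR_LEN 8u		/* ICMP echo request header */

struct frag_plan {
	unsigned int total_len;		/* length of the reassembled IP datagram */
	unsigned int nfrags;
	unsigned int payload[8];	/* IP payload bytes in each fragment */
};

/* Split the ICMP message sent by "ping -s data_len" into IPv4 fragments. */
static struct frag_plan plan_fragments(unsigned int data_len, unsigned int mtu)
{
	struct frag_plan p = {0};
	unsigned int remaining = ICMP_HDR_LEN + data_len;
	/* All fragments except the last must carry a multiple of 8 bytes. */
	unsigned int max_payload = (mtu - IP_HDR_LEN) & ~7u;

	p.total_len = IP_HDR_LEN + remaining;
	while (remaining > 0 && p.nfrags < 8) {
		unsigned int chunk = remaining < max_payload ? remaining : max_payload;

		p.payload[p.nfrags++] = chunk;
		remaining -= chunk;
	}
	return p;
}
```

For `plan_fragments(1500, 1500)` this gives two fragments of 1480 and 28 payload bytes and a total length of 1528, matching the skb->len seen at the hook.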
Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!
On Thu, 2007-03-08 at 16:32 +1300, Sam Vilain wrote: > Kirill, 06032418:36+03: > > I propose to use "namespace" naming. > > 1. This is already used in fs. > > 2. This is what IMHO suites at least OpenVZ/Eric > > 3. it has good acronym "ns". > > Right. So, now I'll also throw into the mix: > > - resource groups (I get a strange feeling of déjà vú there) Re: déjà vú: yes! It's like that Star Trek episode ... except we can't agree on the name of the impossible particle we will invent which solves all our problems. At the risk of prolonging the agony I hate to ask: are all of these groupings really concerned with "resources"? > - supply chains (think supply and demand) > - accounting classes CKRM's use of the term "class" drew negative comments from Paul Jackson and Andrew Morton about this time last year. That led to my suggestion of "Resource Groups". Unless they've changed their minds... > Do any of those sound remotely close? If not, your turn :) I'll butt in here: task groups? task sets? confuselets? ;) Cheers, -Matt Helsley
Re: [PATCH] [RSDL-mm 0/6] RSDL cpu scheduler for -mm
> On Thu, 8 Mar 2007 16:25:05 +1100 Con Kolivas <[EMAIL PROTECTED]> wrote: > > It also boots OK on a very similar but somewhat older Nocona machine. > > Perhaps due to config changes: > > http://userweb.kernel.org/~akpm/ck/config-ok.txt > > Ok I just remembered that not only did I expect the cpu task to never be > scheduled and it _might_ be scheduled on sched_init, it is actually > _consciously_ scheduled on hotplug cpu which I have no way of handling at the > moment. On both your configs I noticed you had hotplug cpu enabled, but > perhaps it isn't really being used on the more conservative config. So this > is something I already know I need to handle. Did your ppc that had > the "bitmap error" have hotplug cpu enabled? It might be an unrelated > bug^Wphenomenon. The powerpc config has CONFIG_HOTPLUG_CPU=n
Re: [PATCH 1/20] x86_64: Assembly safe page.h and pgtable.h
On Wed, Mar 07, 2007 at 08:24:04PM +0100, Sam Ravnborg wrote: > On Wed, Mar 07, 2007 at 12:29:20PM +0530, Vivek Goyal wrote: > > > > > > This patch makes pgtable.h and page.h safe to include > > in assembly files like head.S. Allowing us to use > > symbolic constants instead of hard coded numbers when > > refering to the page tables. > > > > This patch copies asm-sparc64/const.h to asm-x86_64 to > > get a definition of _AC() a very convinient macro that > > allows us to force the type when we are compiling the > > code in C and to drop all of the type information when > > we are using the constant in assembly. > Should this file not live in asm-generic and be useable > for all architectures? > Hi Sam, Thanks for the review. This makes sense to me: move const.h into asm-generic and let everybody use it. This is more of a small cleanup issue, and it involves changing a few header files in asm-sparc64 and making sure nothing is broken on sparc64. This patchset is already becoming big and complex. Is it OK if we leave the patch unmodified for now? Once it gets in and settles down, I can post another patch to do the above modification. Thanks Vivek
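The _AC() trick described above can be seen in isolation. This sketch uses the two-level definition that const.h is generally understood to contain. In C the suffix is token-pasted onto the number, giving the constant a type. When the header is included from a .S file with __ASSEMBLY__ defined, the suffix is dropped so gas sees a plain number. EXAMPLE_PAGE_SIZE, STR and STR_ are hypothetical names used only for the demonstration.

```c
#include <string.h>

#ifdef __ASSEMBLY__
#define _AC(X, Y)	X		/* assembler: no C type suffix */
#else
#define __AC(X, Y)	(X##Y)		/* C: paste suffix, e.g. 1 ## UL -> 1UL */
#define _AC(X, Y)	__AC(X, Y)	/* extra level so X and Y are expanded first */
#endif

/* Helpers to show what a macro expands to, as a string. */
#define STR_(x)	#x
#define STR(x)	STR_(x)

/* A page.h-style constant usable from both C and assembly (hypothetical). */
#define EXAMPLE_PAGE_SIZE	(_AC(1, UL) << 12)
```

In C, `STR(_AC(1, UL))` yields the string `"(1UL)"`, and `sizeof(_AC(1, UL))` equals `sizeof(unsigned long)`. With `-D__ASSEMBLY__` the same expression is just `1`. Nothing in the macro depends on the architecture, which is the basis for Eric's suggestion of include/linux.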
Re: [patch 8/6] mm: fix cpdfio vs fault race
On Wed, Mar 07, 2007 at 01:02:14PM -0800, Andrew Morton wrote: > On Wed, 7 Mar 2007 12:31:21 +0100 > Nick Piggin <[EMAIL PROTECTED]> wrote: > > > Index: linux-2.6/mm/memory.c > > === > > --- linux-2.6.orig/mm/memory.c > > +++ linux-2.6/mm/memory.c > > @@ -1664,6 +1664,15 @@ gotten: > > unlock: > > pte_unmap_unlock(page_table, ptl); > > if (dirty_page) { > > + /* > > +* Yes, Virginia, this is actually required to prevent a race > > +* with clear_page_dirty_for_io() from clearing the page dirty > > +* bit after it clear all dirty ptes, but before a racing > > +* do_wp_page installs a dirty pte. > > +* > > +* do_no_page is protected similarly. > > +*/ > > + wait_on_page_locked(dirty_page); > > set_page_dirty_balance(dirty_page); > > put_page(dirty_page); > > } > > @@ -2316,6 +2325,7 @@ retry: > > unlock: > > pte_unmap_unlock(page_table, ptl); > > if (dirty_page) { > > + wait_on_page_locked(dirty_page); > > set_page_dirty_balance(dirty_page); > > put_page(dirty_page); > > } > > Index: linux-2.6/mm/page-writeback.c > > now that's scary - applying this on top of your > lock-the-page-in-the-fault-handler patches gives: > > if (dirty_page) { > /* >* Yes, Virginia, this is actually required to prevent a race >* with clear_page_dirty_for_io() from clearing the page dirty >* bit after it clear all dirty ptes, but before a racing >* do_wp_page installs a dirty pte. >* >* do_no_page is protected similarly. >*/ > wait_on_page_locked(dirty_page); > wait_on_page_locked(dirty_page); > set_page_dirty_balance(dirty_page); > put_page(dirty_page); > } > > One wonders how on earth patch(1) managed to do that. If it has inserted > the comment twice as well then it might be explicable.. Ouch ;) Yeah that patch I sent was supposed to apply underneath the previous ones, sorry I wasn't clear. > Oh well, let's try this: Yeah that looks like the correct one for applying on top. Thanks. 
> > From: Nick Piggin <[EMAIL PROTECTED]> > > Fix msync data loss and (less importantly) dirty page accounting > inaccuracies due to the race remaining in clear_page_dirty_for_io(). > > The deleted comment explains what the race was, and the added comments > explain how it is fixed. > > Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> > Cc: Linus Torvalds <[EMAIL PROTECTED]> > Cc: Miklos Szeredi <[EMAIL PROTECTED]> > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> > --- > > mm/memory.c |9 + > mm/page-writeback.c | 17 - > 2 files changed, 21 insertions(+), 5 deletions(-) > > diff -puN mm/memory.c~mm-fix-cpdfio-vs-fault-race mm/memory.c > --- a/mm/memory.c~mm-fix-cpdfio-vs-fault-race > +++ a/mm/memory.c > @@ -1669,6 +1669,15 @@ gotten: > unlock: > pte_unmap_unlock(page_table, ptl); > if (dirty_page) { > + /* > + * Yes, Virginia, this is actually required to prevent a race > + * with clear_page_dirty_for_io() from clearing the page dirty > + * bit after it clear all dirty ptes, but before a racing > + * do_wp_page installs a dirty pte. > + * > + * do_no_page is protected similarly. > + */ > + wait_on_page_locked(dirty_page); > set_page_dirty_balance(dirty_page); > put_page(dirty_page); > } > diff -puN mm/page-writeback.c~mm-fix-cpdfio-vs-fault-race mm/page-writeback.c > --- a/mm/page-writeback.c~mm-fix-cpdfio-vs-fault-race > +++ a/mm/page-writeback.c > @@ -903,6 +903,8 @@ int clear_page_dirty_for_io(struct page > { > struct address_space *mapping = page_mapping(page); > > + BUG_ON(!PageLocked(page)); > + > if (mapping && mapping_cap_account_dirty(mapping)) { > /* >* Yes, Virginia, this is indeed insane. > @@ -928,14 +930,19 @@ int clear_page_dirty_for_io(struct page >* We basically use the page "master dirty bit" >* as a serialization point for all the different >* threads doing their things. > - * > - * FIXME! 
We still have a race here: if somebody > - * adds the page back to the page tables in > - * between the "page_mkclean()" and the "TestClearPageDirty()", > - * we might have it mapped without the dirty bit set. >*/ > if (page_mkclean(page)) > set_page_dirty(page); > + /* > + * We carefully synchronise fault handlers against > + * installing a dirty
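The race in the removed FIXME comment can be replayed as a deterministic toy model. In the bad interleaving, the page is re-mapped dirty between page_mkclean() and TestClearPageDirty(), leaving it mapped without its dirty bit set. This is not kernel code. struct page_model and replay() are hypothetical: one boolean stands for "a dirty PTE maps the page" and the other for the page's master dirty bit. The wait_for_lock flag models the wait_on_page_locked() call the patch adds before set_page_dirty_balance().

```c
#include <stdbool.h>

/* Toy model of the state involved; not kernel code. */
struct page_model {
	bool pte_dirty;		/* a writable, dirty PTE maps the page */
	bool page_dirty;	/* the page's "master" dirty bit */
};

/*
 * Replays clear_page_dirty_for_io() on CPU A against a write fault on CPU B.
 * Returns true if the final state is consistent (dirty PTE => dirty page).
 */
static bool replay(bool wait_for_lock)
{
	struct page_model p = { .pte_dirty = true, .page_dirty = true };

	/* A (page locked): page_mkclean() cleans the PTEs and, since it
	 * found a dirty one, set_page_dirty() keeps the page dirty. */
	p.pte_dirty = false;
	p.page_dirty = true;

	/* B: the fault installs a dirty PTE again. */
	p.pte_dirty = true;

	if (!wait_for_lock)
		p.page_dirty = true;	/* B: set_page_dirty() too early, no-op */

	/* A: TestClearPageDirty(), writeback starts, page unlocked later. */
	p.page_dirty = false;

	if (wait_for_lock)
		p.page_dirty = true;	/* B: set_page_dirty() after the unlock */

	return !p.pte_dirty || p.page_dirty;
}
```

`replay(false)` ends with a dirty PTE on a clean page, so a later msync() has no reason to write it back. `replay(true)` ends consistent because B's set_page_dirty() can no longer land before A's TestClearPageDirty().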
Re: 2.6.21-rc1 and 2.6.21-rc2 kwin dies silently
(cc restored. Please always do reply-to-all) > On Wed, 28 Feb 2007 18:05:13 +0200 [EMAIL PROTECTED] wrote: > On Wednesday 28 February 2007 17:19, Sid Boyce wrote: > > openSUSE 10.3 Alpha and KDE-3.5.6, xorg-x11-7.2. KDE is setup not to > > require a password to unlock, but it asks for password. When the screen > > unlocks, kwin is gone with no errors logged in /var/log/kdm or > > /var/log/messages. No problems with 2.6.20. > > > > Same problem on openSUSE 10.2 x86_64, KDE-3.5.5 and 2.6.21-rc2. > > Regards > > Sid. > > This is the linux kernel mailing list. Perhaps you should post your problem > to > the opensuse mailing list. 2.6.20 worked. 2.6.21-rc2 did not. Working theory: the kernel broke. Sid, the chances that anyone can work out what caused this are pretty low. It would be great if you could perform a git bisection search sometime in the next few weeks to work out which commit caused this. Thanks.
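A git bisection is a binary search over the commits between a known-good release (2.6.20) and a known-bad one (2.6.21-rc2). Each step means building and booting a kernel and checking whether kwin survives an unlock. This simplified sketch assumes a linear history and a monotone predicate (once broken, stays broken). Both are assumptions: real histories contain merges, which git bisect handles itself. first_bad(), test_kernel() and the culprit index are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for "build, boot, check kwin": commits from index
 * `culprit` onward are bad. Also counts how many kernels were tested. */
static size_t culprit = 37;
static unsigned int boots;

static bool test_kernel(size_t idx)
{
	boots++;
	return idx >= culprit;
}

/*
 * Linear history of n >= 2 commits: commit 0 is known good and commit n-1
 * is known bad. Returns the index of the first bad commit.
 */
static size_t first_bad(size_t n, bool (*is_bad)(size_t idx))
{
	size_t good = 0, bad = n - 1;	/* invariant: good is good, bad is bad */

	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}
```

With 1000 commits, the search needs at most 10 test boots (about log2 of the range). That is why bisection is feasible even when each step is a full kernel build.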
Re: [RFC: -mm patch] #if 0 mmc_deselect_cards()
Adrian Bunk wrote: > On Tue, Mar 06, 2007 at 12:44:08AM -0800, Andrew Morton wrote: >> ... >> Changes since 2.6.20-rc2-mm1: >> ... >> git-mmc.patch >> ... >> git trees >> ... > > mmc_deselect_cards() is no longer used. > > Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]> > Indeed, but it's probably better to just remove it rather than have old crud lying around. Rgds -- -- Pierre Ossman Linux kernel, MMC maintainer http://www.kernel.org PulseAudio, core developer http://pulseaudio.org rdesktop, core developer http://www.rdesktop.org
Re: [PATCH] [RSDL-mm 0/6] RSDL cpu scheduler for -mm
On Thursday 08 March 2007 15:15, Andrew Morton wrote: > On Wed, 7 Mar 2007 18:54:30 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote: > > On Wed, 7 Mar 2007 17:43:45 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote: > > > On Wed, 7 Mar 2007 12:26:42 +1100 > > > > > > Con Kolivas <[EMAIL PROTECTED]> wrote: > > > > What follows is the same patch series that constitutes the RDSL > > > > "Rotating Staircase DeadLine" cpu scheduler resynced for > > > > 2.6.21-rc2-mm2. > > > > > > Big oops early in boot on x86_64 SMP, in rq_bitmap_error+0x97/0x9f. > > > > > > I stubbed it out with a `return MAX_RT_PRIO;' (I think) but it then > > > oopsed differently. Before netconsole had come up, no serial console, > > > no digital camera. > > > > > > There's stuff in http://userweb.kernel.org/~akpm/ck/ - you can probably > > > boot that kernel on your own machine. > > > > > > I need to do rc3-mm1 now. I might find some time to poke at this > > > further after that, but I have to leave for a week in .jp and it'll be > > > squeezy, sorry. > > > > well it boots os dual pIII and quad powerpc. > > It also boots OK on a very similar but somewhat older Nocona machine. > Perhaps due to config changes: > http://userweb.kernel.org/~akpm/ck/config-ok.txt Ok I just remembered that not only did I expect the cpu task to never be scheduled and it _might_ be scheduled on sched_init, it is actually _consciously_ scheduled on hotplug cpu which I have no way of handling at the moment. On both your configs I noticed you had hotplug cpu enabled, but perhaps it isn't really being used on the more conservative config. So this is something I already know I need to handle. Did your ppc that had the "bitmap error" have hotplug cpu enabled? It might be an unrelated bug^Wphenomenon. -- -ck
Re: [PATCH 15/20] Move swsusp __pa() dependent code to arch portion
On Wed, Mar 07, 2007 at 11:47:40PM +0100, Pavel Machek wrote: > Hi! > > > o __pa() should be used only on kernel linearly mapped virtual addresses > > and not on kernel text and data addresses. > > > > o Hibernation code needs to determine the physical address associated > > with kernel symbol to mark a section boundary which contains pages which > > don't have to be saved and restored during hibernate/resume operation. > > > > o Move this piece of code in arch dependent section. So that architectures > > which don't have kernel text/data mapped into kernel linearly mapped > > region can come up with their own ways of determining physical addresses > > associated with a kernel text. > > > > Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]> > > ...hmm, but that means 3 copies of same code. Can we put the > Actually, it is not exactly the same code. i386 and x86_64 use __pa_symbol() and powerpc uses __pa() to determine the physical address associated with a kernel text symbol. That's the precise intent here: leave it to arch code to decide how to calculate the physical address associated with a kernel symbol. > > +/* > > + * pfn_is_nosave - check if given pfn is in the 'nosave' section > > + */ > > + > > +int pfn_is_nosave(unsigned long pfn) > > +{ > > + unsigned long nosave_begin_pfn = __pa_symbol(&__nosave_begin) >> > > PAGE_SHIFT; > > + unsigned long nosave_end_pfn = PAGE_ALIGN(__pa_symbol(&__nosave_end)) > > >> PAGE_SHIFT; > > + return (pfn >= nosave_begin_pfn) && (pfn < nosave_end_pfn); > > +} > > ...in asm-generic/suspend.h (or something) and then just include it? > Pavel As the code is not exactly the same, we can't put it in asm-generic/suspend.h. Thanks Vivek
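The pfn arithmetic in pfn_is_nosave() is the same on every architecture; only the symbol-to-physical translation differs. The sketch below makes that split explicit by taking the two physical addresses as parameters instead of calling __pa_symbol() or __pa(). It assumes 4 KiB pages. pfn_in_nosave() and the EX_* macros are hypothetical names, and the addresses in the usage note are made up.

```c
#include <stdbool.h>

#define EX_PAGE_SHIFT	12			/* 4 KiB pages assumed */
#define EX_PAGE_SIZE	(1UL << EX_PAGE_SHIFT)
#define EX_PAGE_ALIGN(a)	(((a) + EX_PAGE_SIZE - 1) & ~(EX_PAGE_SIZE - 1))

/*
 * Is pfn inside [begin_phys, end_phys) rounded out to whole pages?
 * The start is rounded down and the end rounded up, so any page that
 * overlaps the nosave section at all is treated as nosave.
 */
static bool pfn_in_nosave(unsigned long pfn, unsigned long begin_phys,
			  unsigned long end_phys)
{
	unsigned long begin_pfn = begin_phys >> EX_PAGE_SHIFT;
	unsigned long end_pfn = EX_PAGE_ALIGN(end_phys) >> EX_PAGE_SHIFT;

	return pfn >= begin_pfn && pfn < end_pfn;
}
```

With a section from 0x5f3000 to 0x5f4800, pfns 0x5f3 and 0x5f4 are nosave and 0x5f2 and 0x5f5 are not. The partially covered page 0x5f4 is included.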
Re: PAGE_SIZE Availability Inconsistency
While I agree, NBPG is a bit of a problem, although it's only needed for aout coredumps AFAICT, but still needed to compile e.g. gdb. Well then, how does gdb deal with ia64? PAGE_SIZE and friends aren't available for that arch, and the same goes for ppc. Looking at the gdb code, they do have places where they define a PAGE_SIZE, but they even mention it's a bug (gdb-6.6/libiberty/getpagesize.c:14). I also grepped through their code looking for includes of page.h and came up with nothing. - David Brown
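User-space code that needs the page size can ask for it at run time instead of depending on a PAGE_SIZE or NBPG constant from kernel headers. The POSIX sysconf(_SC_PAGESIZE) call does this, as does the older getpagesize(). It also copes with architectures such as ia64 and powerpc, where the page size is a kernel configuration choice. A minimal sketch; runtime_page_size() is a hypothetical wrapper name.

```c
#include <unistd.h>

/* Page size as reported by the running kernel; -1 if sysconf() fails. */
static long runtime_page_size(void)
{
	return sysconf(_SC_PAGESIZE);
}
```

The result is whatever the running kernel uses (commonly 4096 on x86, possibly 16384 or 65536 on ia64 or ppc64 builds), so one binary works across configurations.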
Re: [PATCH 16/20] swsusp: do not use virt_to_page on kernel data address
On Wed, Mar 07, 2007 at 11:49:15PM +0100, Pavel Machek wrote: > Hi! > > > o virt_to_page() call should be used on kernel linear addresses and not > > on kernel text and data addresses. Swsusp code uses it on kernel data > > (statically allocated swsusp_header). > > > > o Allocate swsusp_header dynamically so that virt_to_page() can be used > > safely. > > > > o I am changing this because in next few patches, __pa() on x86_64 will > > no longer support kernel text and data addresses and hibernation breaks. > > > > Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]> > > > +static int swsusp_header_init(void) > > +{ > > + swsusp_header = (struct swsusp_header*) __get_free_page(GFP_KERNEL); > > + if (!swsusp_header) > > + panic("Could not allocate memory for swsusp_header\n"); > > + return 0; > > +} > > + > > +core_initcall(swsusp_header_init); > > I do not like the panic, but I guess it is okay as we are running > during boot? (Could you add a comment?) Otherwise ok. > Hi Pavel, Yes, it is an initcall, and this memory page will be allocated at boot time. I'm not sure what comment to put there; to me it seems pretty obvious from the core_initcall(). Thanks Vivek
Re: Raid 10 Problems?
On Mar 7 2007 10:20, dean gaudet wrote: >>> http://gentoo-wiki.com/HOWTO_Install_on_Software_RAID#Write-intent_bitmap >> >> That information has been extremely useful. Thanks a >> lot. I fund a command to do the bitmap internal after >> the array was made so I added that. Seems like some of >> these features should be default. Maybe it's time for >> the raid folks to update what is default? > >the bitmap has performance implications... for example: >http://www.mail-archive.com/linux-raid@vger.kernel.org/msg07229.html I wonder if bitmapping a raid1 volume is faster than bitmapping raid5. The other thing is, the bitmap is supposed to be written out at intervals, not at every write. So the extra head movement for bitmap updates should be really low, and it should not make the tar -xjf process slower by half a minute. Is there a way to tweak the write-bitmap-to-disk interval? Perhaps something in /sys or ye olde /proc. Maybe linux-raid@ knows 8) >note that unless you tweak your init scripts you'll need to put external >bitmaps on your root partition, see this thread: Huh? That statement does not make sense. But I think you meant: when using external bitmaps, adjust the init scripts. That is exactly what internal bitmaps are good for: you don't need to change anything. Jan --
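Whether bitmap updates are cheap depends on when they must reach the disk. This conceptual sketch is not md's implementation, which also keeps per-chunk counters and clears bits from a periodic daemon. In the model, a bit must be written out synchronously only when a chunk goes from clean to dirty, before the data write proceeds. Clearing can be lazy. struct wi_bitmap and its functions are hypothetical, and the chunk index is assumed to be below 64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy write-intent bitmap: one bit per chunk (chunk must be < 64). */
struct wi_bitmap {
	uint64_t bits;			/* bit i set: chunk i may be out of sync */
	unsigned int disk_updates;	/* synchronous bitmap writes performed */
};

/* Before a data write to `chunk`: only a 0 -> 1 transition costs a
 * synchronous bitmap write; later writes to the same chunk are free. */
static void before_write(struct wi_bitmap *b, unsigned int chunk)
{
	uint64_t mask = UINT64_C(1) << chunk;

	if (!(b->bits & mask)) {
		b->bits |= mask;
		b->disk_updates++;
	}
}

/* Later, once the chunk is idle and in sync on all disks, the bit can be
 * cleared lazily, e.g. from a periodic timer. */
static void clear_idle(struct wi_bitmap *b, unsigned int chunk)
{
	b->bits &= ~(UINT64_C(1) << chunk);
}

/* After a crash only chunks with their bit set need resyncing. */
static unsigned int chunks_to_resync(const struct wi_bitmap *b)
{
	unsigned int n = 0;

	for (uint64_t v = b->bits; v != 0; v &= v - 1)
		n++;	/* v &= v - 1 clears the lowest set bit */
	return n;
}
```

In this model, a thousand writes to one chunk cost a single bitmap update. A workload that touches many different chunks, as unpacking a large tarball may, keeps hitting the synchronous path. That could account for some of the slowdown, though that is speculation, not a measurement.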
Re: [PATCH 16/20] swsusp: do not use virt_to_page on kernel data address
On Wed, Mar 07, 2007 at 11:50:06PM +0100, Pavel Machek wrote: > Hi! > > > o virt_to_page() call should be used on kernel linear addresses and not > > on kernel text and data addresses. Swsusp code uses it on kernel data > > (statically allocated swsusp_header). > > > > o Allocate swsusp_header dynamically so that virt_to_page() can be used > > safely. > > > > o I am changing this because in next few patches, __pa() on x86_64 will > > no longer support kernel text and data addresses and hibernation breaks. > > > > Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]> > > (I assume this was tested, too?) > Pavel Yes. I have tested this and it works fine. Thanks Vivek
Re: [PATCH 0/20] x86_64 Relocatable bzImage support (V4)
On Thu, Mar 08, 2007 at 10:15:02AM +1100, Nigel Cunningham wrote: > Hi. > > On Thu, 2007-03-08 at 07:49 +1100, Nigel Cunningham wrote: > > Hi. > > > > On Wed, 2007-03-07 at 07:07 -0800, Arjan van de Ven wrote: > > > On Wed, 2007-03-07 at 12:27 +0530, Vivek Goyal wrote: > > > > Hi, > > > > > > > > Here is another attempt on x86_64 relocatable bzImage patches(V4). This > > > > patchset makes a bzImage relocatable and same kernel binary can be > > > > loaded > > > > and run from different physical addresses. > > > > > > > > > have these patches been extensively tested with various suspend > > > scenarios? (S1,S3,S4 in acpi speak or s2ram and s2disk in Linux speak) > > > > We did work on this for RHEL5, getting relocatable kernel support > > working fine with S4. While doing it and since, I've been running > > Suspend2 with the same patch. > > > > Since that work, Vivek has done more modifications, but I can confirm > > that the basic design is reliable with S4. Haven't tried S3, but can do. > > Will report back shortly. > > S3 works okay here with a relocatable x86_64 kernel (2.6.20). > Hi Nigel, Would it be possible to test S3 with 2.6.21-rc2 kernels as well? Right now I don't have access to any machine that supports S3. I tested it at the time of my last posting and it worked well. I'd appreciate your help. Thanks Vivek
[git pull] Input fixes for 2.6.21-rc3
Hi Linus, Please consider pulling from: git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git for-linus or master.kernel.org:/pub/scm/linux/kernel/git/dtor/input.git for-linus to receive a fix for the AUX IRQ delivery test that causes missing keyboards on some boxes without PS/2 mice. The fix is confirmed to work on the MSI K8M800 and confirmed not to break the previous fix (re. bugzilla 7833). Changelog: -- Dmitry Torokhov (1): Input: i8042 - another attempt to fix AUX delivery checks Diffstat: - i8042.c | 10 ++++++++-- 1 files changed, 8 insertions(+), 2 deletions(-) -- Dmitry
Re: [PATCH 11/20] x86_64: wakeup.S misc cleanups
On Wed, Mar 07, 2007 at 11:41:57PM +0100, Pavel Machek wrote: > Hi! > > > + movw $0x0e00 + 'i', %ds:(0xb8012) > > + movb $0xa8, %al ; outb %al, $0x80; > > + > > > - movw $0x0e00 + 'i', %ds:(0xb8012) > > - movb $0xa8, %al ; outb %al, $0x80; > > Outbs were my debugging hacks, perhaps you can simply remove them at > this point? Not sure how useful "Linux" debug print is, it can > probably be removed, too. > Hi Pavel, I found these debugging hacks useful while debugging a problem with my changes in this code. They help to find out how far the code flow has reached in this assembly code. So I think it's not a bad idea to leave this piece of code there. Thanks Vivek
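The two debugging lines show progress in two places:

- `outb %al, $0x80` writes the code 0xa8 to port 0x80, the traditional POST diagnostic port that a POST card displays.
- The `movw` stores one 16-bit character cell into the VGA colour text buffer at 0xb8000.

Each cell is the character in the low byte and the attribute in the high byte; 0x0e is yellow on black. The sketch below decodes that store, assuming an 80-column text mode and a flat data segment. vga_cell_addr() and vga_cell() are hypothetical helpers.

```c
#include <stdint.h>

#define VGA_TEXT_BASE	0xb8000UL	/* colour text-mode frame buffer */
#define VGA_COLS	80		/* 80x25 text mode assumed */

/* Physical address of the cell at (row, col): two bytes per cell. */
static unsigned long vga_cell_addr(unsigned int row, unsigned int col)
{
	return VGA_TEXT_BASE + 2UL * (row * VGA_COLS + col);
}

/* 16-bit cell value: attribute in the high byte, character in the low byte. */
static uint16_t vga_cell(uint8_t attr, char ch)
{
	return (uint16_t)((attr << 8) | (uint8_t)ch);
}
```

So `%ds:(0xb8012)` is row 0, column 9, and `$0x0e00 + 'i'` puts a yellow "i" there. This is why a glance at the screen (or a POST card) tells how far the wakeup code got.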
Re: [PATCH 11/20] x86_64: wakeup.S misc cleanups
On Wed, Mar 07, 2007 at 11:40:53PM +0100, Pavel Machek wrote: > Hi! > > > o Various cleanups. One of the main purpose of cleanups is that make > > wakeup.S as close as possible to trampoline.S. > > > > o Following are the changes > > - Indentations for comments. > > - Changed the gdt table to compact form and to resemble the > > one in trampoline.S > > - Take the jump to 32bit from real mode using ljmpl. Makes code > > more readable. > > - After enabling long mode, directly take a long jump for 64bit > > mode. No need to take an extra jump to "reach_comaptibility_mode" > > - Stack is not used after real mode. So don't load stack in > > 32 bit mode. > > - No need to enable PGE here. > > - No need to do extra EFER read, anyway we trash the read contents. > > - No need to enable system call (EFER_SCE). Anyway it will be > > enabled when original EFER is restored. > > - No need to set MP, ET, NE, WP, AM bits in cr0. Very soon we will > > reload the original cr0 while restroing the processor state. > > > > Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]> > > Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]> > > ACK, provided it was tested. > Pavel Hi Pavel, Thanks. I tested all the S3-related changes during my last posting, when I had access to an x86_64 box that supported ACPI state S3. This code has not changed since then, and it has been running successfully in RHEL5 kernels. I no longer have access to an x86_64 machine that supports S3, so I can't test suspend to RAM now. But I am confident these patches work, as nothing has changed since the last posting. Nigel has just reported successful suspend-to-RAM results for 2.6.20, and I have asked him to test 2.6.21-rc2 as well, if possible. I have thoroughly tested suspend to disk (S4) and it works fine.
Thanks Vivek
Re: [PATCH] [RSDL-mm 0/6] RSDL cpu scheduler for -mm
On 08/03/07, Andrew Morton <[EMAIL PROTECTED]> wrote: On Wed, 7 Mar 2007 17:43:45 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote: > On Wed, 7 Mar 2007 12:26:42 +1100 > Con Kolivas <[EMAIL PROTECTED]> wrote: > > > What follows is the same patch series that constitutes the RSDL "Rotating > > Staircase DeadLine" cpu scheduler resynced for 2.6.21-rc2-mm2. > > Big oops early in boot on x86_64 SMP, in rq_bitmap_error+0x97/0x9f. > > I stubbed it out with a `return MAX_RT_PRIO;' (I think) but it then oopsed > differently. Before netconsole had come up, no serial console, no digital > camera. > > There's stuff in http://userweb.kernel.org/~akpm/ck/ - you can probably > boot that kernel on your own machine. > > I need to do rc3-mm1 now. I might find some time to poke at this > further after that, but I have to leave for a week in .jp and it'll be > squeezy, sorry. well it boots on dual pIII and quad powerpc. The powerpc says Scheduler bitmap error - bitmap being reconstructed.. during bootup. But it didn't crash like the Nocona machine. Ah thanks. Sorry, I have a very busy day at work and am unable to do anything about it till tonight. I could imagine on Nocona this would be due to the idle task being scheduled on init, which it is not supposed to do, but if you read the comment in sched_init it says it *might*. I have no way of handling that at the moment because I wasn't sure it ever happened any more. As for the powerpc.. I have no idea (from where I am at the moment, where it would be totally inappropriate for me to try to debug :P), sorry. -- -ck
Re: [PATCH] tcp_cubic: use 32 bit math
On Wed, Mar 07, 2007 at 07:10:47PM -0800, Stephen Hemminger wrote: > David Miller wrote: > >From: Stephen Hemminger <[EMAIL PROTECTED]> > >Date: Wed, 7 Mar 2007 17:07:31 -0800 > > > > > >>The basic calculation has to be done in 32 bits to avoid > >>doing a 64 bit divide by 3. The value x is only 22 bits max, > >>so the full 64 bits are only needed for x^2. > >> > >>Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> > >> > > > >Applied, thanks Stephen. > > > >What about Willy Tarreau's supposedly even faster variant? > >Or does this incorporate that set of improvements? > > > That's what this is: >x = (2 * x + (uint32_t)div64_64(a, (uint64_t)x*(uint64_t)x)) / 3; Confirmed, it's the same. BTW, has anyone tested on a 64-bit system whether it makes any difference? Willy
2.6.21-rc3-mm2
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc3/2.6.21-rc3-mm2/

- This is the same as 2.6.21-rc3-mm1, except Con's CPU scheduler changes were dropped. This is for A/B comparison purposes, and because those changes crashed on one test setup.

Changes since 2.6.21-rc3-mm1:

-lists-add-list-splice-tail.patch
-sched-remove-sleepavg-from-proc.patch
-sched-remove-noninteractive-flag.patch
-sched-implement-180-bit-sched-bitmap.patch
-sched-implement-rsdl-cpu-scheduler.patch
-sched-document-rsdl-cpu-scheduler.patch

 Removed.
2.6.21-rc3-mm1
Temporarily at

  http://userweb.kernel.org/~akpm/2.6.21-rc3-mm1/

Will appear later at

  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc3/2.6.21-rc3-mm1/

- The wireless changes in here need a lot of testers, please. It is major rework. Of course the config files got all changed around so `make oldconfig' breaks everything. I was able to get ipw2200 working after some fumbling, but perhaps John can tell people what has been changed in there? What has happened, from a big picture perspective?

- This patchset contains Con's rip-up-and-rewrite of the CPU scheduling algorithm. It oopsed for me on one machine so I'll do an rc3-mm2 without those changes shortly. If 2.6.21-rc3-mm1 crashes and 2.6.21-rc3-mm2 does not, don't forget to Cc: Con Kolivas <[EMAIL PROTECTED]> on the report ;)

  Feedback on this change is sought. Especially from the enterprise-database and volanomark loonies: this stuff might be headed your way so don't tell us afterwards that it hurt.

- Added Nick's lock-the-page-in-the-pagefault-handler patches. These reduce the incidence of one bug and increase the incidence of another. VM is fun.

- Re-added the ext4 development tree to the -mm lineup. It has stuff in it.

Boilerplate:

- See the `hot-fixes' directory for any important updates to this patchset.

- To fetch an -mm tree using git, use (for example)

  git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
  git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1

- -mm kernel commit activity can be reviewed by subscribing to the mm-commits mailing list.

        echo "subscribe mm-commits" | mail [EMAIL PROTECTED]

- If you hit a bug in -mm and it is not obvious which patch caused it, it is most valuable if you can perform a bisection search to identify which patch introduced the bug.
  Instructions for this process are at

        http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt

  But beware that this process takes some time (around ten rebuilds and reboots), so consider reporting the bug first and if we cannot immediately identify the faulty patch, then perform the bisection search.

- When reporting bugs, please try to Cc: the relevant maintainer and mailing list on any email.

- When reporting bugs in this kernel via email, please also rewrite the email Subject: in some manner to reflect the nature of the bug. Some developers filter by Subject: when looking for messages to read.

- Occasional snapshots of the -mm lineup are uploaded to ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ and are announced on the mm-commits list.

Changes since 2.6.21-rc2-mm2:

 origin.patch
 git-acpi.patch
 git-alsa.patch
 git-arm.patch
 git-avr32.patch
 git-cpufreq.patch
 git-drm.patch
 git-dvb.patch
 git-gfs2-nmw.patch
 git-hid.patch
 git-ia64.patch
 git-ieee1394.patch
 git-input.patch
 git-kvm.patch
 git-leds.patch
 git-libata-all.patch
 git-md-accel.patch
 git-md-accel-fixup.patch
 git-mmc.patch
 git-ubi.patch
 git-netdev-all.patch
 git-ioat.patch
 git-ocfs2.patch
 git-parisc.patch
 git-r8169.patch
 git-selinux.patch
 git-pciseg.patch
 git-s390.patch
 git-unionfs.patch
 git-watchdog.patch
 git-wireless.patch
 git-wireless-fixup.patch
 git-ipwireless_cs.patch
 git-gccbug.patch

 git trees

-paravirt-build-fixes.patch
-fix-suspend-resume-with-periodic-tick-devices.patch
-nvidiafb-backlight-fix-implicit-declaration-in-nv_backlight.patch
-atyfb-fix-kconfig-error-part-2.patch
-fbdev-fix-kconfig-error-if-fb_ddc=n.patch
-fix-2621-rfcomm-lockups.patch
-scheduled-removal-of-sa_xxx-interrupt-flags-fixups-3.patch
-i386-make-x86_64-tsc-header-require-i386-rather-than-vice-versa.patch
-hrtimers-fix-hrtimer_cb_irqsafe_no_softirq-description.patch
-hrtimers-hrtimer_clock_base-description-typo.patch
-highres-do-not-run-the-timer_softirq-after-switching-to-highres-mode.patch
-highres-do-not-run-the-timer_softirq-after-switching-to-highres-mode-tweak.patch
-highres-do-not-run-the-timer_softirq-after-switching-to-highres-mode-tweak-fix.patch
-kconfig-update-swsusp-description.patch
-remove-arch-i386-kernel-tscccustom_sched_clock.patch
-mqueue-nested-locking-annotation.patch
-fix-vsyscall-settimeofday.patch
-fs-nobh_truncate_page-fix.patch
-geode-aes-use-unsigned-long-for-spin_lock_irqsave.patch
-publish-rcutorture-module-parameters-via-sysfs-read-only.patch
-cciss-fix-for-2tb-support.patch
-cciss-add-struct-pci_driver-shutdown-support-replaces-reboot-notifier.patch
-initramfs-should-not-depend-on-config_block.patch
-linux-audith-needs-linux-typesh.patch
-uml-fix-formatting-violations-in-signal-delivery-code.patch
-uml-add-a-debugging-message.patch
-uml-comment-the-initialization-of-a-global.patch
-knfsd-use-recv_msg-to-get-peer-address-for-nfsd-instead-of-code-copying.patch
-knfsd-remove-config_ipv6-ifdefs-from-sunrpc-server-code.patch
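The bisection search recommended in the 2.6.21-rc3-mm1 announcement above is a binary search over the patch series. Here is a minimal sketch in C, assuming the bug is introduced by exactly one patch and stays present once that patch is applied. `first_bad_patch()` and the toy predicate `bug_from()` are hypothetical names; the callback stands in for "apply patches 0..k, rebuild, reboot and test".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: find the first patch in a series of n patches whose
 * application makes the bug appear.  test_up_to(k, ctx) stands in for
 * "apply patches 0..k, rebuild, reboot, test" and returns nonzero if the
 * bug shows up.  Assumes the result is monotonic (good ... good bad ... bad).
 * Returns n if no patch triggers the bug; *builds counts the tests run. */
static size_t first_bad_patch(size_t n, int (*test_up_to)(size_t, void *),
                              void *ctx, unsigned *builds)
{
	size_t lo = 0, hi = n;	/* the answer lies in [lo, hi] */

	assert(test_up_to != NULL && builds != NULL);
	*builds = 0;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		++*builds;
		if (test_up_to(mid, ctx))
			hi = mid;	/* bug present: culprit is mid or earlier */
		else
			lo = mid + 1;	/* bug absent: culprit comes after mid */
	}
	return lo;
}

/* Toy predicate for demonstration only: the bug appears from patch
 * *(size_t *)ctx onwards. */
static int bug_from(size_t k, void *ctx)
{
	return k >= *(const size_t *)ctx;
}
```

With a hypothetical series of 1000 patches the loop runs at most 10 times (2^10 = 1024 > 1000), which is in line with the "around ten rebuilds and reboots" estimate.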
Re: [PATCH] [RSDL-mm 0/6] RSDL cpu scheduler for -mm
On Wed, 7 Mar 2007 18:54:30 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote: > On Wed, 7 Mar 2007 17:43:45 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote: > > > On Wed, 7 Mar 2007 12:26:42 +1100 > > Con Kolivas <[EMAIL PROTECTED]> wrote: > > > > > What follows is the same patch series that constitutes the RSDL "Rotating > > > Staircase DeadLine" cpu scheduler resynced for 2.6.21-rc2-mm2. > > > > Big oops early in boot on x86_64 SMP, in rq_bitmap_error+0x97/0x9f. > > > > I stubbed it out with a `return MAX_RT_PRIO;' (I think) but it then oopsed > > differently. Before netconsole had come up, no serial console, no digital > > camera. > > > > There's stuff in http://userweb.kernel.org/~akpm/ck/ - you can probably > > boot that kernel on your own machine. > > > > I need to do rc3-mm1 now. I might find some time to poke at this > > further after that, but I have to leave for a week in .jp and it'll be > > squeezy, sorry. > > well it boots on dual pIII and quad powerpc. > It also boots OK on a very similar but somewhat older Nocona machine. Perhaps due to config changes: http://userweb.kernel.org/~akpm/ck/config-ok.txt
Re: 2.6.21-rc2 regression vs. 2.6.20: AT keyboard only works with pci=noacpi
On Wednesday 07 March 2007 16:50, Dmitry Torokhov wrote: > On 3/7/07, Ash Milsted <[EMAIL PROTECTED]> wrote: > > > > So, I tracked this down to 2.6.21-git7, the first snapshot that gives me > > this problem. Tellingly it does contain an input tree merge. I would git > > bisect but I don't have a local copy of the tree - I tried to get one, but it > > stopped halfway through the clone, probably because I had to use http... So, I hope > > that helps. > > > > Hm, that is strange... 2.6.20-rc7 has i8042 AUX IRQ delivery test fix > and fix for panic blink, both should not really affect your keyboard. > Can I please get full dmesg of boot with "i8042.debug > log_buf_len=131072"? > Argh, I can't believe I forgot to get this into my tree. Could you please tell me if the patch below fixes your issue?

-- 
Dmitry

Input: i8042 - another attempt to fix AUX delivery checks

Do not assume that AUX_LOOP command is broken unless it completes
successfully but returns wrong (unexpected) data.

Signed-off-by: Dmitry Torokhov <[EMAIL PROTECTED]>
---

 drivers/input/serio/i8042.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Index: linux/drivers/input/serio/i8042.c
===
--- linux.orig/drivers/input/serio/i8042.c
+++ linux/drivers/input/serio/i8042.c
@@ -553,7 +553,8 @@ static int __devinit i8042_check_aux(voi
 	 */
 
 	param = 0x5a;
-	if (i8042_command(&param, I8042_CMD_AUX_LOOP) || param != 0x5a) {
+	retval = i8042_command(&param, I8042_CMD_AUX_LOOP);
+	if (retval || param != 0x5a) {
 
 		/*
 		 * External connection test - filters out AT-soldered PS/2 i8042's
@@ -567,7 +568,12 @@ static int __devinit i8042_check_aux(voi
 		    (param && param != 0xfa && param != 0xff))
 			return -1;
 
-		aux_loop_broken = 1;
+		/*
+		 * If AUX_LOOP completed without error but returned unexpected data
+		 * mark it as broken
+		 */
+		if (!retval)
+			aux_loop_broken = 1;
 	}
 
 	/*
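The behavioural change in the patch above can be reduced to a small decision function. The sketch below is a hypothetical model, not code from i8042.c: it assumes the external connection test has already passed (that is, i8042_check_aux() did not return -1) and only answers whether aux_loop_broken would be set, before and after the patch. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the old check: any AUX_LOOP failure (command
 * error or wrong echoed byte) that survived the external connection
 * test set aux_loop_broken. */
static bool marks_broken_old(int retval, unsigned char param)
{
	return retval != 0 || param != 0x5a;
}

/* Hypothetical model of the patched check: only a command that completed
 * (retval == 0) but echoed something other than 0x5a sets
 * aux_loop_broken; a command error on its own no longer does. */
static bool marks_broken_new(int retval, unsigned char param)
{
	bool loop_failed = retval != 0 || param != 0x5a;

	return loop_failed && retval == 0;
}
```

For example, a failed AUX_LOOP command that leaves param at 0x5a no longer marks the loop as broken, whereas the old check did.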
Re: [patch 2/6 -rt] powerpc 2.6.20-rt8: to convert spinlocks to raw ones.
On Thu, Mar 08, 2007 at 02:26:47PM +1100, Paul Mackerras wrote: > Bill Huey (hui) writes: > > > The places that need to be reverted to raw spinlocks are generally either > > acquired by function calls that allocate the spinlock at a terminal of the > > kernel's lock graph or isolated from other callers completely (parts of the > > timer for logic for instance). It's all about the collision of various lock > > (preemptive and non-preemptive) subtrees and how to avoid scheduling within > > atomic violations that lead to deadlocks. The -rt patch gets arbitrary > > preemption abilities by shrinking the non-preemptive sub-tree bit to the > > bare essentials of what will let a system run yet still preserve all of > > the expected locking semantics of a critical section. > > Thanks; that's an interesting explanation. > > It misses the point of what I was saying to Sergei, though, which was > *not* "I don't understand your patch", it was "if this patch goes into > a git tree, someone coming along in 3 years time won't understand the > patch." In other words I was ranting about the need for a decent > description to accompany the patch itself, so it would go into the > permanent record. Yeah, I think it's fear and uncertainty about the technical details of the patch. That is why folks CC Ingo and company, to get some kind of confirmation that this is OK along with comments. There are very few folks in this community who really understand the basic principles of the patch, and that's not going to change any time soon. The mystery, paranoia (FUD) and criticism surrounding it can make folks a bit shy. I'll talk to you and Ben about it if we all get to OLS again. :) bill
Re: [PATCH] tcp_cubic: use 32 bit math
From: Stephen Hemminger <[EMAIL PROTECTED]> Date: Wed, 07 Mar 2007 19:10:47 -0800 > David Miller wrote: > > What about Willy Tarreau's supposedly even faster variant? > > Or does this incorporate that set of improvements? > > > That's what this is: > x = (2 * x + (uint32_t)div64_64(a, (uint64_t)x*(uint64_t)x)) / 3; Great, thanks for the clarification.
trailing whitespace killing (Re: [PATCH -mm] Blackfin: blackfin i2c driver)
> From: Andrew Morton
> Newsgroups: gmane.linux.kernel
> Subject: Re: [PATCH -mm] Blackfin: blackfin i2c driver
> Date: Tue, 6 Mar 2007 23:45:29 -0800
[]
> On Wed, 07 Mar 2007 15:39:27 +0800 "Wu, Bryan" <[EMAIL PROTECTED]> wrote:
>
>> Thanks a lot, could you please give me a script just to kill this
>> whitespace? So I can do it before sending you patches.
>
> Is pretty simple:
>
> #!/bin/sh
> #
> # Strip any trailing whitespace which a unified diff adds.
> #
>
> strip1()
> {
> 	TMP=$(mktemp /tmp/XX)
> 	cp $1 $TMP
> 	sed -e '/^+/s/[ ]*$//' < $TMP > $1
> 	rm $TMP
> }
>
> for i in $*
> do
> 	strip1 $i
> done
>
> that'll be in
> http://www.zip.com.au/~akpm/linux/patches/patch-scripts-0.20/patch-scripts-0.20.tar.gz
> too

It doesn't work for me. Maybe I don't understand what you are trying to do. A general suggestion could be:

    sed -e 's_[ \t]*$__'

(i.e. trailing spaces and tabs are stripped from every line read on stdin, and the result is written to stdout). You can use it as a wrapper for diff, for sending patch bombs, etc. (very nice with pipes):

shell$ diff -Npu2 old new | sed -e 's_[ \t]*$__' > patch.diff
shell$ < patch-set.mbox sed -e 's_[ \t]*$__' | formail -s /usr/sbin/sendmail -bm -t

Similarly in scripts; quilt (a patch-set manager) also warns about trailing whitespace.
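For callers who would rather do this in C than in sed, here is a minimal sketch of the same per-line operation. The function name `strip_trailing_ws()` is hypothetical. It assumes the newline has already been removed from the line, and the `only_added` flag mimics the `/^+/` address in the quoted script, which restricts stripping to lines added by a unified diff.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: remove trailing spaces and tabs from one
 * NUL-terminated line (newline already stripped), in place, like
 * sed -e 's_[ \t]*$__' does per line.  If only_added is nonzero, lines
 * that do not start with '+' are left untouched, like the '/^+/'
 * address in the script quoted above. */
static void strip_trailing_ws(char *line, int only_added)
{
	size_t len;

	assert(line != NULL);
	if (only_added && line[0] != '+')
		return;
	len = strlen(line);
	while (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t'))
		line[--len] = '\0';
}
```

Restricting the edit to '+' lines, as Andrew's script does, avoids altering context lines, which must match the original file for the patch to apply.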
Re: [PATCH 0/20] x86_64 Relocatable bzImage support (V4)
On Wed, Mar 07, 2007 at 07:07:39AM -0800, Arjan van de Ven wrote: > On Wed, 2007-03-07 at 12:27 +0530, Vivek Goyal wrote: > > Hi, > > > > Here is another attempt at x86_64 relocatable bzImage patches (V4). This > > patchset makes a bzImage relocatable and the same kernel binary can be loaded > > and run from different physical addresses. > > > have these patches been extensively tested with various suspend > scenarios? (S1,S3,S4 in acpi speak or s2ram and s2disk in Linux speak) Hi Arjan, I have tested these patches for suspend to RAM and suspend to disk and they work fine. In the past we had a few issues with suspend to disk, and these have now been resolved in this patchset. Thanks Vivek
Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!
Paul Menage wrote: > I made sure to check [...]wikipedia.org[...] when this argument started ... > :-) > Wikipedia?! That's not a referen[...] oh bugger it. I've vented enough today and we're on the same page now I think. >> This is the classic terminology problem between substance and function. >> i.e., some things share characteristics but does that mean they are the >> same thing? >> > > Aren't you arguing my side here? My point is that what I'm trying to > add with "containers" (or whatever name we end up using) can't easily > be subsumed into the "namespace" concept, and you're arguing that they > should go into nsproxy because they share some characteristics. > Ok, they share this characteristic with namespaces: that they group processes. So, they conceptually hang off task_struct. But we put them on ns_proxy because we've got this vague notion that things might be better that way. >> about this you still insist on calling this sub-system specific stuff >> the "container", >> > Uh, no. I'm trying to call a *grouping* of processes a container. > Ok, so is this going to supplant the namespaces too? >> and then go screaming that I am wrong and you are right >> on terminology. >> > > Actually I asked if you/Eric had better suggestions. > Cool, let's review them. Me, 07921311:38+12: > This would suggest rewriting this patchset, part 2 as a "CPUSet > namespace", part 4 as a "CPU scheduling namespace", parts 5 and 6 as > "Resource Limits Namespace" (drop this "BeanCounter" brand), and of > course part 7 falls away. Me, 07022110:58+12: > Did you like the names I came up with in my original reply? > - CPUset namespace for CPU partitioning > - Resource namespaces: >- cpusched namespace for CPU >- ulimit namespace for memory >- quota namespace for disk space >- io namespace for disk activity >- etc Ok, there's nothing original or useful there; I'm obviously quite deliberately still punting on the issue. Eric, 07030718:32-07: > Pretty much.
For most of the other cases I think we are safe referring > to them as resource controls or resource limits. I know that roughly > covers what cpusets and beancounters and ckrm currently do. Let's go back in time to the thread I referred to: Me, 06032209:08+12 and nearby posts > - "vserver" spelt in full > - family > - container > - jail > - task_ns (short for namespace) > Using the term "box" and ID term "boxid": > create_space - creates a new space and "hashes" it Kirill, 06032418:36+03: > I propose to use "namespace" naming. > 1. This is already used in fs. > 2. This is what IMHO suits at least OpenVZ/Eric > 3. it has good acronym "ns". Right. So, now I'll also throw into the mix: - resource groups (I get a strange feeling of déjà vu there) - supply chains (think supply and demand) - accounting classes Do any of those sound remotely close? If not, your turn :) And do we bother changing IPC namespaces or let that one slide? Sam.
Re: [patch 2/6 -rt] powerpc 2.6.20-rt8: to convert spinlocks to raw ones.
Bill Huey (hui) writes: > The places that need to be reverted to raw spinlocks are generally either > acquired by function calls that allocate the spinlock at a terminal of the > kernel's lock graph or isolated from other callers completely (parts of the > timer for logic for instance). It's all about the collision of various lock > (preemptive and non-preemptive) subtrees and how to avoid scheduling within > atomic violations that lead to deadlocks. The -rt patch gets arbitrary > preemption abilities by shrinking the non-preemptive sub-tree bit to the bare > essentials of what will let a system run yet still preserve all of > the expected locking semantics of a critical section. Thanks; that's an interesting explanation. It misses the point of what I was saying to Sergei, though, which was *not* "I don't understand your patch", it was "if this patch goes into a git tree, someone coming along in 3 years' time won't understand the patch." In other words I was ranting about the need for a decent description to accompany the patch itself, so it would go into the permanent record. Regards, Paul.
Re: [patch] epoll use a single inode ...
On Wed, 7 Mar 2007, Michael K. Edwards wrote: > > People's prejudices against prefetch instructions are sometimes > traceable to the 3DNow! prefetch(w) botch, which some processors > "support" as no-ops and others are too aggressive about (Opteron > prefetches are reputed to be "strong", i.e., not dropped on DTLB > miss). No, I just checked, and Intel's own optimization manual makes it clear that you should be careful. They talk about performance penalties due to resource constraints - which makes tons of sense with a core that is good at handling its own resources and could quite possibly use those resources better to actually execute the loads and stores deeper down the instruction pipeline. So it's not just 3DNow! making AMD look bad, or Intel would obviously suggest people use it out of the wazoo ;) > XScale gets it right. Blah. XScale isn't even an OoO CPU, *of*course* it needs prefetching. Calling that "getting it right" is ludicrous. If anything, it gets things so wrong that prefetching is *required* for good performance. I'm talking about real CPUs with real memory pipelines that already do prefetching in hardware. The better the core is, the less the prefetch helps (and often the more it hurts in comparison to how much it helps). But if you mean "doesn't try to fill the TLB on data prefetches", then yes, that's generally the right thing to do. > (Oddly, Prescott seems to have initiated a page table walk on DTLB miss > during software prefetch -- just one of many weird Prescott flaws.) Netburst in general is *very* happy to do speculative TLB fills, I think. > I'm guessing Pentium M and its descendants (Core Solo and Duo) get it > right but I'm having a hell of a time finding out for sure. Can any of > the x86 experts answer this? I just suspect that the upside for Core 2 Duo is likely fairly low. The L2 cache is good, the memory re-ordering is working..
I doubt "prefetch" helps that much in generic code for things like linked list following; you should probably limit it to code that has *known* access patterns, where you know the data is not going to be in the cache. (In other words, I bet prefetching can help a lot with MMX/media kind of code, but I doubt it's a huge win for "for_each_entry()".) Linus
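To illustrate the kind of code where software prefetch is more plausibly worth trying (a known, sequential access pattern over data that may not be cached), here is a minimal sketch using the GCC/Clang builtin `__builtin_prefetch(addr, rw, locality)`. The function name is hypothetical and the prefetch distance of 64 elements is an arbitrary assumption, not a tuned value. On cores whose hardware prefetchers already track sequential streams the hint may make no measurable difference, so any benefit has to be measured.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: an illustrative prefetch distance in elements, not tuned. */
#define PREFETCH_DISTANCE 64

/* Hypothetical example: sum an array with a known sequential access
 * pattern, hinting the element PREFETCH_DISTANCE positions ahead.
 * rw = 0 requests a prefetch for reading; locality = 1 suggests low
 * temporal locality.  The hint never changes the result, only timing. */
static long sum_with_prefetch(const int *a, size_t n)
{
	long sum = 0;

	assert(a != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (i + PREFETCH_DISTANCE < n)
			__builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 1);
		sum += a[i];
	}
	return sum;
}
```

Contrast this with walking a linked list: the address of the next node is only known once the current node has been loaded, so there is little room to issue a useful prefetch far enough ahead.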
Re: [PATCH] tcp_cubic: use 32 bit math
David Miller wrote: From: Stephen Hemminger <[EMAIL PROTECTED]> Date: Wed, 7 Mar 2007 17:07:31 -0800 The basic calculation has to be done in 32 bits to avoid doing a 64 bit divide by 3. The value x is only 22 bits max, so the full 64 bits are only needed for x^2. Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> Applied, thanks Stephen. What about Willy Tarreau's supposedly even faster variant? Or does this incorporate that set of improvements? That's what this is: x = (2 * x + (uint32_t)div64_64(a, (uint64_t)x*(uint64_t)x)) / 3;
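The quoted expression is one Newton-Raphson step for the cube root, x' = (2x + a/x^2)/3, with only x^2 and the division done in 64 bits. Below is a minimal user-space sketch, assuming plain 64-bit division in place of the kernel's div64_64() and a starting guess at or above cbrt(a). The function names are hypothetical and this is not the tcp_cubic code itself.

```c
#include <assert.h>
#include <stdint.h>

/* One Newton-Raphson step for the integer cube root of a:
 *   x' = (2*x + a / x^2) / 3
 * Only x^2 and the division need 64 bits; the quotient is cast back to
 * 32 bits, which is safe while x stays close to cbrt(a). */
static uint32_t cbrt_step(uint64_t a, uint32_t x)
{
	return (2 * x + (uint32_t)(a / ((uint64_t)x * (uint64_t)x))) / 3;
}

/* Iterate from a starting guess x0 >= cbrt(a) (assumption: the caller
 * supplies such an upper bound).  The sequence decreases strictly until
 * it reaches floor(cbrt(a)); the next step then no longer decreases,
 * so that value is returned. */
static uint32_t cbrt_newton(uint64_t a, uint32_t x0)
{
	uint32_t x = x0;

	assert(x0 > 0);
	for (;;) {
		uint32_t next = cbrt_step(a, x);

		if (next >= x)
			return x;
		x = next;
	}
}
```

Starting from the guess 1000, cbrt_newton(1000000, 1000) steps down through 667, 445, 298, 202, 142, 111, 101 and 100, and returns 100.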
Re: [PATCH 1/2] rcfs core patch
Srivatsa Vaddagiri <[EMAIL PROTECTED]> writes: > Heavily based on Paul Menage's (in turn cpuset) work. The big difference > is that the patch uses task->nsproxy to group tasks for resource control > purpose (instead of task->containers). > > The patch retains the same user interface as Paul Menage's patches. In > particular, you can have multiple hierarchies, each hierarchy giving a > different composition/view of task-groups. > > (Ideally this patch should have been split into 2 or 3 sub-patches, but > will do that on a subsequent version post) After looking at the discussion that happened immediately after this was posted, this feels like the right general direction to get the different parties talking to each other. I'm not convinced about the whole idea yet, but this looks like a step in a useful direction. I have a big request. Next time this kind of patch is posted, please add a description of what is happening and why. I have yet to see anyone explain why this is a good idea, or why the current semantics were chosen. The review is still largely happening at the why level but no one is addressing that yet. So please, can we have a why? I have a question: what does rcfs look like if we start with the code that is in the kernel? That is, start with namespaces and nsproxy and just build a filesystem to display/manipulate them, with the code built so it will support adding resource controllers when they are ready?
> Signed-off-by : Srivatsa Vaddagiri <[EMAIL PROTECTED]> > Signed-off-by : Paul Menage <[EMAIL PROTECTED]> > > > --- > > linux-2.6.20-vatsa/include/linux/init_task.h |4 > linux-2.6.20-vatsa/include/linux/nsproxy.h |5 > linux-2.6.20-vatsa/init/Kconfig | 22 > linux-2.6.20-vatsa/init/main.c |1 > linux-2.6.20-vatsa/kernel/Makefile |1 > > > --- > > diff -puN include/linux/nsproxy.h~rcfs include/linux/nsproxy.h > --- linux-2.6.20/include/linux/nsproxy.h~rcfs 2007-03-01 14:20:47.0 > +0530 > +++ linux-2.6.20-vatsa/include/linux/nsproxy.h 2007-03-01 14:20:47.0 > +0530 > @@ -28,6 +28,10 @@ struct nsproxy { We probably want to rename this struct task_proxy And then we can rename most of the users things like: dup_task_proxy, clone_task_proxy, get_task_proxy, free_task_proxy, put_task_proxy, exit_task_proxy, init_task_proxy > struct ipc_namespace *ipc_ns; > struct mnt_namespace *mnt_ns; > struct pid_namespace *pid_ns; > +#ifdef CONFIG_RCFS > + struct list_head list; This extra list of nsproxy's is unneeded and a performance problem the way it is used. In general we want to talk about the individual resource controllers not the nsproxy. > + void *ctlr_data[CONFIG_MAX_RC_SUBSYS]; I still don't understand why these pointers are so abstract, and why we need an array lookup into them? 
> +#endif > }; > extern struct nsproxy init_nsproxy; > > @@ -35,6 +39,12 @@ struct nsproxy *dup_namespaces(struct ns > int copy_namespaces(int flags, struct task_struct *tsk); > void get_task_namespaces(struct task_struct *tsk); > void free_nsproxy(struct nsproxy *ns); > +#ifdef CONFIG_RCFS > +struct nsproxy *find_nsproxy(struct nsproxy *ns); > +int namespaces_init(void); > +#else > +static inline int namespaces_init(void) { return 0;} > +#endif > > static inline void put_nsproxy(struct nsproxy *ns) > { > diff -puN /dev/null include/linux/rcfs.h > --- /dev/null 2006-02-25 03:06:56.0 +0530 > +++ linux-2.6.20-vatsa/include/linux/rcfs.h 2007-03-01 14:20:47.0 > +0530 > @@ -0,0 +1,72 @@ > +#ifndef _LINUX_RCFS_H > +#define _LINUX_RCFS_H > + > +#ifdef CONFIG_RCFS > + > +/* struct cftype: > + * > + * The files in the container filesystem mostly have a very simple read/write > + * handling, some common function will take care of it. Nevertheless some > cases > + * (read tasks) are special and therefore I define this structure for every > + * kind of file. I'm still inclined to think this should be part of /proc, instead of a purely separate fs. But I might be missing something. Eric
Re: [patch 097/101] revert "drivers/net/tulip/dmfe: support basic carrier detection"
On Wed, 7 Mar 2007, Dan Williams wrote: > > Definitely right. If it doesn't work for your card, it needs to be > fixed for your card. Well, regressions are regressions. And they are a *lot* more important than any new features. If it doesn't work, it gets reverted. Linus
[PATCH] Loop device - Tracking page writes made to a loop device through mmap
All comments have been taken care of.

Description:

A file_operations structure variable called loop_fops is initialised with the default block device file operations (def_blk_fops). The mmap operation is overridden with a new function called loop_file_mmap.

A vm_operations structure variable called loop_file_vm_ops is initialised with the default operations for a disk file. The page_mkwrite operation in this variable is initialised to a new function called loop_track_pgwrites.

In the function lo_open, the file operations pointer of the device file is initialised with the address of loop_fops.

The function loop_file_mmap simply calls generic_file_mmap and then initialises the vm_ops of the vma with the address of loop_file_vm_ops.

The function loop_track_pgwrites stores the page offset of the page that is being written to in a red-black tree within the loop device.

A flag lo_track_pgwrite has been added to struct loop_device and struct loop_info64 to turn tracking of page writes on and off.

Two new ioctls have been added. The ioctl cmd LOOP_GET_PGWRITES retrieves the page offsets of pages that have been written to. The ioctl cmd LOOP_CLR_PGWRITES empties the red-black tree.

This functionality would allow us to have a read-only version and a writable version of memory by doing the following:

Associate a normal file as backing storage for the loop device and mmap the loop device. Call this mmapped address space area1.

Mmap a normal file of identical size. Call this mmapped address space area2.

Changes made to area1 can be periodically copied to area2 using the ioctl cmds (retrieve the dirty page offsets and copy the dirty pages from area1 to area2). This facility would provide a quick way of updating the read-only version.

Motivation for new ioctls:

Imagine a business server application which processes messages from clients as they come in (say over a TCP connection). Some of those messages may be transactions, i.e. they cause data changes in the application.
The rest of those messages may be queries, i.e. they get information from the application. The application can consist of two processes. One process will handle the transactions. The other process will handle the queries. Each process will have its own copy of the business data. The process handling transactions can mmap the loop device for its copy of the memory. The loop device must have a normal file for its backing storage. The process handling queries can mmap another normal file for its copy of the memory. Both these memories have identical data at the beginning. Queries and transactions can now be handled simultaneously by the respective processes. The query process can update its memory periodically by obtaining the changes that have happened to the loop device. By using the ioctl call to retrieve the dirty page offsets, only the dirty pages need to be copied over to the query process's copy of memory. We can in fact have multiple processes handling queries, sharing the same memory. During this copy, the transaction process will hold off processing transactions until the update is complete. This would be very useful for high-speed in-memory transaction systems, where the query load can be passed off to other processes. An example of such a system would be a stock trading system, where clients buy and sell stock (equity, options etc.). At the same time a lot of clients would be downloading market data, and this can be done independently of the transactions. This new facility will provide a way of tracking changes made to business data, independent of the application domain.

Test program: Before you run the test program, please create the backing storage file for the loop device as follows:

dd if=/dev/zero of=/root/file bs=4K count=10

Set bs to whatever the page size is on your machine. On my machine it was 4K.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/loop.h>

int main()
{
	int maxPages = 10;
	char* start = 0;
	int fd;
	int dfd;
	int *array = 0;
	int pageSize;
	int elemsPerPage;
	struct loop_info64 info;
	struct loop_pgoff_array pgarray;

	pgarray.max = maxPages;
	pgarray.pgoff = calloc(maxPages, sizeof(long));
	if (pgarray.pgoff == NULL) {
		fprintf(stderr, "can't create pgarray\n");
		exit(1);
	}
	pageSize = getpagesize();
	elemsPerPage = pageSize/sizeof(int);

	/* open the device file */
	if ((fd = open ("/dev/loop0", O_RDWR, S_IRWXU)) < 0) {
		fprintf(stderr, "can't create device file for writing\n");
		goto out5;
	}
	/* open the disk file to set as backing storage */
	if ((dfd = open ("/root/file", O_RDWR, S_IRWXU)) < 0) {
		fprintf(stderr, "can't open backing storage file for writing\n");
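The copy step described in the patch (retrieve the dirty page offsets, then copy only those pages from area1 to area2) can be sketched in user space as below. This is a minimal sketch, not part of the patch: `copy_dirty_pages` is a hypothetical helper, plain buffers stand in for the two mmapped areas, and the `dirty` array stands in for the page offsets that LOOP_GET_PGWRITES is described as returning (the test program above allocates them as `long`).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy only the pages listed in dirty[] from the
 * transaction copy (area1) to the query copy (area2). Each entry is a
 * page offset (page index), not a byte offset. Offsets outside the
 * mapping are skipped. Returns the number of pages copied. */
static size_t copy_dirty_pages(char *area2, const char *area1,
			       size_t npages, size_t page_size,
			       const long *dirty, size_t ndirty)
{
	size_t i, copied = 0;

	assert(page_size > 0);
	for (i = 0; i < ndirty; i++) {
		if (dirty[i] < 0 || (size_t)dirty[i] >= npages)
			continue;
		memcpy(area2 + (size_t)dirty[i] * page_size,
		       area1 + (size_t)dirty[i] * page_size, page_size);
		copied++;
	}
	return copied;
}
```

In the setup described above, the transaction process would hold off while this runs, and LOOP_CLR_PGWRITES would then empty the tree so the next round starts clean; skipping out-of-range offsets is a defensive choice of this sketch only.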
Re: SATA resume slowness, e1000 MSI warning
On Wed, 07 Mar 2007 12:28:11 -0700 [EMAIL PROTECTED] (Eric W. Biederman) wrote:

> Below is an additional set of warnings that should help debug this.
> The old code just got lucky that it triggered a warning when this happens.
>
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 01869b1..5113913 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -613,6 +613,7 @@ int pci_enable_msi(struct pci_dev* dev)
>  		return -EINVAL;
>  
>  	WARN_ON(!!dev->msi_enabled);
> +	WARN_ON(!hlist_empty(&dev->saved_cap_space));
>  
>  	/* Check whether driver already requested for MSI-X irqs */
>  	if (dev->msix_enabled) {
> @@ -638,6 +639,8 @@ void pci_disable_msi(struct pci_dev* dev)
>  	if (!dev->msi_enabled)
>  		return;
>  
> +	WARN_ON(!hlist_empty(&dev->saved_cap_space));
> +
>  	msi_set_enable(dev, 0);
>  	pci_intx(dev, 1);		/* enable intx */
>  	dev->msi_enabled = 0;
> @@ -739,6 +742,7 @@ int pci_enable_msix(struct pci_dev* dev, struct msix_entry *entries, int nvec)
>  		}
>  	}
>  	WARN_ON(!!dev->msix_enabled);
> +	WARN_ON(!hlist_empty(&dev->saved_cap_space));
>  
>  	/* Check whether driver already requested for MSI irq */
>  	if (dev->msi_enabled) {
> @@ -763,6 +767,8 @@ void pci_disable_msix(struct pci_dev* dev)
>  	if (!dev->msix_enabled)
>  		return;
>  
> +	WARN_ON(!hlist_empty(&dev->saved_cap_space));
> +
>  	msix_set_enable(dev, 0);
>  	pci_intx(dev, 1);		/* enable intx */
>  	dev->msix_enabled = 0;
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index bd44a48..4418839 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -677,6 +677,7 @@ pci_restore_state(struct pci_dev *dev)
>  	}
>  	pci_restore_pcix_state(dev);
>  	pci_restore_msi_state(dev);
> +	WARN_ON(!hlist_empty(&dev->saved_cap_space));
>  
>  	return 0;
>  }

Got a hit on a powerpc g5:

PM: Writing back config space on device 0001:05:04.0 at offset 1 (was 2b0, writing 2b6)
[ cut here ]
Badness at drivers/pci/pci.c:679
Call Trace:
[C80F7410] [C0011EFC] .show_stack+0x50/0x1cc (unreliable)
[C80F74C0] [C01AD610] .report_bug+0xa0/0x110
[C80F7550] [C00256E4] .program_check_exception+0xb4/0x670
[C80F7630] [C00046F4] program_check_common+0xf4/0x100
--- Exception: 700 at .pci_restore_state+0x310/0x340
    LR = .pci_restore_state+0x2e0/0x340
[C80F79D0] [C026A174] .tg3_chip_reset+0x19c/0xa04
[C80F7A90] [C026D948] .tg3_reset_hw+0xa4/0x2718
[C80F7BA0] [C0270030] .tg3_init_hw+0x74/0x94
[C80F7C30] [C0270BE0] .tg3_open+0x4c8/0x854
[C80F7CF0] [C03A74A4] .dev_open+0x100/0x12c
[C80F7D90] [C03BAEA8] .netpoll_setup+0x2dc/0x3ec
[C80F7E40] [C0283450] .init_netconsole+0x64/0x8c
[C80F7EC0] [C05C0BE4] .init+0x1d0/0x390
[C80F7F90] [C00271F8] .kernel_thread+0x4c/0x68
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is on for TX and on for RX.
Scheduler bitmap error - bitmap being reconstructed..
netconsole: network logging started
Calling initcall 0xc06bd180: .macio_module_init+0x0/0x3c()

That's:

	pci_restore_pcix_state(dev);
	pci_restore_msi_state(dev);
	WARN_ON(!hlist_empty(&dev->saved_cap_space));

	return 0;

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
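The invariant behind the new WARN_ON in pci_restore_state() can be modelled in a few lines, assuming (as the added check implies) that every save helper adds one entry to the device's saved-capability list and every restore helper consumes the entry it restores. This is a hypothetical user-space model: `struct saved_cap`, `struct dev_model` and the helper names are illustrative and are not the kernel's structures.

```c
#include <stdio.h>
#include <stdlib.h>

#define NCAPS 3 /* hypothetical number of capabilities with saved state */

/* Hypothetical, simplified saved-capability snapshot;
 * not the kernel's struct pci_cap_saved_state. */
struct saved_cap {
	int cap_id;
	unsigned int data;
	struct saved_cap *next;
};

struct dev_model {
	unsigned int regs[NCAPS];          /* live register value per capability */
	struct saved_cap *saved_cap_space; /* snapshots not yet restored */
};

/* Save: snapshot one capability and push it onto the list. */
static void save_cap(struct dev_model *d, int cap_id)
{
	struct saved_cap *s = malloc(sizeof(*s));

	if (!s)
		abort();
	s->cap_id = cap_id;
	s->data = d->regs[cap_id];
	s->next = d->saved_cap_space;
	d->saved_cap_space = s;
}

/* Restore: write back the first matching snapshot, then unlink and free it. */
static void restore_cap(struct dev_model *d, int cap_id)
{
	struct saved_cap **pp;

	for (pp = &d->saved_cap_space; *pp; pp = &(*pp)->next) {
		if ((*pp)->cap_id == cap_id) {
			struct saved_cap *s = *pp;

			d->regs[cap_id] = s->data;
			*pp = s->next;
			free(s);
			return;
		}
	}
}

/* Restore each capability once, then check what the WARN_ON checks:
 * nothing may be left on the list. Returns 1 if the invariant holds. */
static int restore_state(struct dev_model *d)
{
	int id;

	for (id = 0; id < NCAPS; id++)
		restore_cap(d, id);
	if (d->saved_cap_space) {
		fprintf(stderr, "WARN: saved_cap_space not empty after restore\n");
		return 0;
	}
	return 1;
}
```

In the model, saving the same capability twice before one restore leaves an entry behind and trips the check. It only illustrates what a leftover entry means; it does not say why tg3 on the G5 produced one.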
Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!
On 3/7/07, Sam Vilain <[EMAIL PROTECTED]> wrote:
>
> Sorry, I didn't realise I was talking with somebody qualified enough to speak on behalf of the Generally Established Principles of Computer Science.

I made sure to check

http://en.wikipedia.org/wiki/Namespace
http://en.wikipedia.org/wiki/Namespace_%28computer_science%29

when this argument started ... :-)

> This is the classic terminology problem between substance and function. ie, some things share characteristics but does that mean they are the same thing?

Aren't you arguing my side here? My point is that what I'm trying to add with "containers" (or whatever name we end up using) can't easily be subsumed into the "namespace" concept, and you're arguing that they should go into nsproxy because they share some characteristics.

> Look, I already agreed in the earlier thread that the term "namespace" was being stretched beyond belief, yet instead of trying to be useful about this you still insist on calling this sub-system specific stuff the "container",

Uh, no. I'm trying to call a *grouping* of processes a container.

> and then go screaming that I am wrong and you are right on terminology.

Actually I asked if you/Eric had better suggestions.

Paul
Re: [PATCH] tcp_cubic: use 32 bit math
From: Stephen Hemminger <[EMAIL PROTECTED]> Date: Wed, 7 Mar 2007 17:07:31 -0800 > The basic calculation has to be done in 32 bits to avoid > doing 64 bit divide by 3. The value x is only 22bits max > so only need full 64 bits only for x^2. > > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> Applied, thanks Stephen. What about Willy Tarreau's supposedly even faster variant? Or does this incorporate that set of improvements?
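The idea in the changelog, keeping the Newton-Raphson step in 32-bit arithmetic and widening only for x^2, can be illustrated with an integer cube root. This is a sketch of the general technique, not Stephen's patch: `cube_root` is a hypothetical name, and `__builtin_clzll` is a GCC/Clang builtin used here only to pick a starting guess that is never below the true root.

```c
#include <assert.h>
#include <stdint.h>

/* Integer cube root, floor(cbrt(a)), by integer Newton-Raphson:
 * x' = (2x + a / x^2) / 3, iterated downwards from an over-estimate. */
static uint32_t cube_root(uint64_t a)
{
	uint32_t x, x1;

	if (a == 0)
		return 0;

	/* Start at 2^ceil(bits/3), which is >= the true root, so the
	 * iteration descends monotonically. For 64-bit input this is at
	 * most 2^22, so x always fits comfortably in 32 bits. */
	x = 1u << ((64 - __builtin_clzll(a) + 2) / 3);

	for (;;) {
		uint64_t x2 = (uint64_t)x * x;   /* only x^2 needs 64 bits */
		uint32_t q = (uint32_t)(a / x2); /* quotient stays about x or smaller */

		x1 = (2 * x + q) / 3;            /* 32-bit divide by 3 */
		if (x1 >= x)
			break;                   /* converged: x == floor(cbrt(a)) */
		x = x1;
	}
	assert(x < (1u << 22));
	return x;
}
```

Starting above the root matters: from an over-estimate each integer step strictly decreases until it reaches floor(cbrt(a)), so the single `x1 >= x` test is a safe stopping rule.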
Re: [PATCH] [RSDL-mm 0/6] RSDL cpu scheduler for -mm
On Wed, 7 Mar 2007 17:43:45 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Wed, 7 Mar 2007 12:26:42 +1100
> Con Kolivas <[EMAIL PROTECTED]> wrote:
>
> > What follows is the same patch series that constitutes the RDSL "Rotating
> > Staircase DeadLine" cpu scheduler resynced for 2.6.21-rc2-mm2.
>
> Big oops early in boot on x86_64 SMP, in rq_bitmap_error+0x97/0x9f.
>
> I stubbed it out with a `return MAX_RT_PRIO;' (I think) but it then oopsed
> differently. Before netconsole had come up, no serial console, no digital
> camera.
>
> There's stuff in http://userweb.kernel.org/~akpm/ck/ - you can probably
> boot that kernel on your own machine.
>
> I need to do rc3-mm1 now. I might find some time to poke at this
> further after that, but I have to leave for a week in .jp and it'll be
> squeezy, sorry.

Well, it boots on dual pIII and quad powerpc. The powerpc says

Scheduler bitmap error - bitmap being reconstructed..

during bootup. But it didn't crash like the Nocona machine.
Re: [patch] epoll use a single inode ...
On Mar 07, 2007, at 20:25:14, Michael K. Edwards wrote:
> On 3/7/07, Linus Torvalds <[EMAIL PROTECTED]> wrote
>> In general, using software prefetching is just a stupid idea, unless
>>
>>  - the prefetch really is very strict (ie for a linked list you do exactly the above kinds of things to make sure that you don't try to prefetch the non-existent end entry)
>> AND
>>  - the CPU is stupid (in-order in particular).
>>
>> I think Intel even suggests in their optimization manuals to *not* do software prefetching, because hw can usually simply do better without it.
>
> Not the XScale -- it performs quite poorly without prefetch, as people who have run ARMv5-optimized binaries on it can testify.
>
> The Intel XScale(r) core prefetch load instruction is a true prefetch instruction because the load destination is the data or mini-data cache and not a register. Compilers for processors which have data caches, but do not support prefetch, sometimes use a load instruction to preload the data cache. This technique has the disadvantages of using a register to load data and requiring additional registers for subsequent preloads and thus increasing register pressure. By contrast, the prefetch can be used to reduce register pressure instead of increasing it. The prefetch load is a hint instruction and does not guarantee that the data will be loaded. Whenever the load would cause a fault or a table walk, then the processor will ignore the prefetch instruction, the fault or table walk, and continue processing the next instruction. This is particularly advantageous in the case where a linked list or recursive data structure is terminated by a NULL pointer. Prefetching the NULL pointer will not fault program flow.

Prefetching is also fairly critical on a Power4 or G5 PowerPC system as they have a long memory latency; an L2-cache miss can cost 200+ cycles. On such systems the "dcbt" prefetch instruction brings in a single 128-byte cacheline and has no serializing effects whatsoever, making it ideal for use in a linked-list-traversal inner loop.

Cheers,
Kyle Moffett
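Linus's condition (never prefetch the non-existent end entry) and Kyle's linked-list inner loop combine into a short sketch. It uses GCC/Clang's `__builtin_prefetch`, which the compiler typically lowers to the target's data-prefetch instruction where one exists (for example `dcbt` on PowerPC or `pld` on ARM) and otherwise emits nothing for; `struct node` and `sum_list` are hypothetical.

```c
#include <stddef.h>

struct node {
	long value;
	struct node *next;
};

/* Walk a singly linked list, hinting the next node into the cache while
 * the current one is processed. The NULL check means no prefetch is ever
 * issued for the non-existent entry past the end of the list, so a CPU
 * that would otherwise attempt a TLB fill for it never sees the hint. */
static long sum_list(const struct node *n)
{
	long sum = 0;

	while (n) {
		if (n->next)
			__builtin_prefetch(n->next, 0 /* read */, 3 /* high temporal locality */);
		sum += n->value;
		n = n->next;
	}
	return sum;
}
```

On a CPU like the XScale described above, which drops prefetches that would fault or walk the page tables, the NULL check is redundant but costs only a compare.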
Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!
Paul Menage wrote:
> Sorry, I think this statement is wrong, by the generally established
> meaning of the term namespace in computer science.
>

Sorry, I didn't realise I was talking with somebody qualified enough to speak on behalf of the Generally Established Principles of Computer Science.

>> Trying to extend the well-known term namespace to refer to things that
>> are semantically equivalent namespaces is a useful approach, IMHO.
>>
>
> Yes, that would be true. But the kinds of groupings that we're talking
> about are supersets of namespaces, not semantically equivalent to
> them. To use Eric's "shoe" analogy from earlier, it's like insisting
> that we use the term "sneaker" to refer to all footware, including ski
> boots and birkenstocks ...
>

I see it more like insisting that we use the term "clothing" to also refer to "weapons" because for both of them you tell your body to "wear" them in some game.

This is the classic terminology problem between substance and function. ie, some things share characteristics but does that mean they are the same thing?

Look, I already agreed in the earlier thread that the term "namespace" was being stretched beyond belief, yet instead of trying to be useful about this you still insist on calling this sub-system specific stuff the "container", and then go screaming that I am wrong and you are right on terminology.

I've normally recognised[1] these three things as the primary feature groups of vserver:

 - isolation
 - resource limiting
 - resource sharing

So I've got no problem with using "clothing" remaining for isolation and "weapons" for resource sharing and limiting. Or some other suitable terms.

Sam.

1. eg, http://utsl.gen.nz/talks/vserver/slide4c.html
Re: [patch 097/101] revert "drivers/net/tulip/dmfe: support basic carrier detection"
On Wed, 2007-03-07 at 10:14 -0800, Stephen Hemminger wrote: > On Wed, 07 Mar 2007 09:12:12 -0800 > Greg KH <[EMAIL PROTECTED]> wrote: > > > > > From: Andrew Morton <[EMAIL PROTECTED]> > > > > Revert 7628b0a8c01a02966d2228bdf741ddedb128e8f8. Thomas Bachler > > reports: > > > > Commit 7628b0a8c01a02966d2228bdf741ddedb128e8f8 (drivers/net/tulip/dmfe: > > support basic carrier detection) breaks networking on my Davicom DM9009. > > ethtool always reports there is no link. tcpdump shows incoming packets, > > but TX is disabled. Reverting the above patch fixes the problem. > > > > Carrier detection support is important and should be fixed rather than > removed. Definitely right. If it doesn't work for your card, it needs to be fixed for your card. Dan
Re: [patch 0/6 -rt] powerpc 2.6.20-rt8: fix boot/runtime errors/warnings for PowerPC(ppc64)
At Wed, 07 Mar 2007 17:26:50 +0300, Sergei Shtylyov wrote:
>
> Tsutomu OWA wrote:
> > CONFIG_MCOUNT, CONFIG_LATENCY_TRACE and other tracing options nor
> > CONFIG_GENERIC_TIME,
>
> There is PowerPC genTOD patch and it's incorporated into -rt (don't know
> it works for Cell) but it breaks TOD vsyscalls. Several months ago I've posted
> patches removing them for the time being:
>
> > clockevents etc are not yet ported.
>
> Note that there *is* PowerPC clockevents driver already (don't know if it
> works for Cell) -- it just never got merged to -rt:

I should have written something like "... are not yet ported by myself." Anyway, thanks for the info.

--
owa
Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!
"Paul Menage" <[EMAIL PROTECTED]> writes:

> On 3/7/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:
>> The real trick is that I believe these groupings are designed to be something
>> you can setup on login and then not be able to switch out of.
>
> That's going to be the case for most resource controllers - is that
> the case for namespaces? (e.g. can any task unshare say its mount
> namespace?)

With namespaces there are secondary issues with unsharing. Weird things like a simple unshare might allow you to replace /etc/shadow and thus mess up a suid root application.

Once people have worked through those secondary issues, unsharing of namespaces is likely allowable (for someone without CAP_SYS_ADMIN). Although if you pick the truly hierarchical namespaces, the pid namespace unsharing will simply give you a parent of the current namespace.

For resource controls I expect unsharing is likely to be like the pid namespace. You might allow it, but if you do you are forced to be a child, and possibly there will be hierarchy depth restrictions. Assuming you can implement hierarchical accounting without too much expense.

Eric
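Eric's model for resource controls, where an unsharing task can only become a child group, possibly with a depth limit, and usage is accounted hierarchically, can be sketched as follows. This is a hypothetical illustration, not code from any of the proposed patches: `struct rgroup`, `MAX_DEPTH` and the helpers are made up. Each charge walks every ancestor, so its cost grows with depth. That is one reason a depth limit helps keep hierarchical accounting cheap.

```c
#include <stddef.h>

#define MAX_DEPTH 4 /* hypothetical limit on hierarchy depth */

/* Hypothetical resource group; not taken from any of the proposed patches. */
struct rgroup {
	struct rgroup *parent; /* NULL for the root group */
	int depth;             /* root is depth 0 */
	long usage;
	long limit;
};

/* Create a group under parent (or a root if parent is NULL).
 * Refused if the new group would be deeper than MAX_DEPTH. */
static int rgroup_init(struct rgroup *g, struct rgroup *parent, long limit)
{
	int depth = parent ? parent->depth + 1 : 0;

	if (depth > MAX_DEPTH)
		return -1;
	g->parent = parent;
	g->depth = depth;
	g->usage = 0;
	g->limit = limit;
	return 0;
}

/* Charge amount to g and every ancestor. If any level would go over its
 * limit, undo the levels already charged and fail. */
static int rgroup_charge(struct rgroup *g, long amount)
{
	struct rgroup *p, *q;

	for (p = g; p; p = p->parent) {
		if (p->usage + amount > p->limit) {
			for (q = g; q != p; q = q->parent)
				q->usage -= amount;
			return -1;
		}
		p->usage += amount;
	}
	return 0;
}

/* Release an earlier successful charge from g and every ancestor. */
static void rgroup_uncharge(struct rgroup *g, long amount)
{
	for (; g; g = g->parent)
		g->usage -= amount;
}
```

A child can never escape its parent's limit in this scheme: a charge that fits the child but not an ancestor is rolled back in full.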
Re: PAGE_SIZE Availability Inconsistency
Hi,

On Tuesday 06 March 2007 10:29, Christoph Hellwig wrote:

> PAGE_SIZE should not be available at all. Please use getpagesize()
> instead.

While I agree, NBPG is a bit of a problem: it's only needed for aout coredumps AFAICT, but it's still needed to compile e.g. gdb.

bye, Roman
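Christoph's advice, asking the system for the page size at run time instead of compiling in PAGE_SIZE, looks like this in user space. A minimal sketch: `runtime_page_size` is a hypothetical helper, and `sysconf(_SC_PAGESIZE)` is the POSIX spelling of the same query that getpagesize() answers.

```c
#define _DEFAULT_SOURCE /* glibc hides getpagesize() in strict standard modes */
#include <assert.h>
#include <unistd.h>

/* Page size discovered at run time rather than fixed at compile time. */
static long runtime_page_size(void)
{
	long sz = sysconf(_SC_PAGESIZE); /* POSIX interface */

	if (sz <= 0)
		sz = getpagesize();      /* older BSD-derived interface */
	assert(sz > 0 && (sz & (sz - 1)) == 0); /* page sizes are powers of two */
	return sz;
}
```

The same value is what the loop-device test program above obtains with getpagesize() and what its `dd bs=` argument is meant to match.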
Re: f_owner.lock and file->pos updates
I wrote:
> I didn't see any clean way to intersperse overwrites and appends to a record-structured file without using vfs_llseek, which steps on f_pos.

The context, of course, is an attempt to fix -ENOPATCH with regard to the netlink-based AIO submission scheme I outlined a couple of days ago. :-)

Maybe f_pos should be advanced atomically by the number of bytes expected to be read/written, before entering the vfs_(read|write)(|v) call? And then if the read/write doesn't complete normally, f_pos should be decremented by the number of bytes we failed to read/write? Or do we have to make absolutely, positively sure that sampling f_pos from another thread never returns any value outside (before)..(before + bytes read/written)? If so, the only way to cure the worst symptom of the append race appears to be to hold a per-fd lock for the duration of the sys_(read|write).

Cheers,
- Michael
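The first suggestion, advancing the position atomically by the expected byte count before the I/O, can be sketched in user space with a private atomic cursor and pwrite(). This is a hypothetical illustration, not the kernel change under discussion: `struct append_cursor` and `append_record` are made-up names, and the cursor only coordinates writers that go through it.

```c
#define _XOPEN_SOURCE 700
#include <stdatomic.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical end-of-file cursor shared by all writer threads of one
 * file. Initialise it with atomic_init() to the file's current size. */
struct append_cursor {
	atomic_llong end;
};

/* Reserve len bytes by advancing the cursor atomically *before* writing,
 * then write at the reserved offset with pwrite(), which neither reads
 * nor moves the descriptor's own file offset. Concurrent callers sharing
 * the cursor get disjoint ranges. Returns the offset written at, or -1. */
static long long append_record(int fd, struct append_cursor *c,
			       const void *buf, size_t len)
{
	long long off = atomic_fetch_add(&c->end, (long long)len);
	ssize_t n = pwrite(fd, buf, len, (off_t)off);

	if (n < 0 || (size_t)n != len)
		return -1; /* reserved range is left as a hole, not rolled back */
	return off;
}
```

The sketch deliberately does not decrement the cursor after a short write: once another thread has reserved the following range, subtracting would hand out overlapping offsets, which is presumably the same difficulty the proposed decrement of f_pos would run into.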
Re: f_owner.lock and file->pos updates
On 3/7/07, Alan Cox <[EMAIL PROTECTED]> wrote:
> The right way IMHO would be to do the work that was done for pread/pwrite and implement preadv/pwritev. The moment you want to do atomic things with the file->f_pos instead of doing it with a local passed pos value it gets ugly.. why do you need to do it with f_pos ?

I didn't see any clean way to intersperse overwrites and appends to a record-structured file without using vfs_llseek, which steps on f_pos.

Actually, we may already have a problem with append races in sys_write/sys_writev. If it's possible for two threads to write() to the same file (both intending to append), they may wind up passing the same "pos" value into vfs_write(). Or does fget_light/fput_light do some sort of locking that I'm not seeing?

Cheers,
- Michael
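Alan's point is that the position can be passed explicitly, as pread/pwrite already do, instead of going through file->f_pos. preadv()/pwritev() were later added to Linux (2.6.30) and glibc (2.10), so from user space an overwrite or append at a known record offset can be sketched as below; `write_record_at` and the header/payload split are hypothetical.

```c
#define _GNU_SOURCE /* pwritev() with glibc */
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Write a record header and its payload at an explicit offset in one
 * call. The position is an argument, so the descriptor's shared file
 * offset (file->f_pos in the kernel) is neither read nor updated. */
static ssize_t write_record_at(int fd, off_t pos,
			       const void *hdr, size_t hdr_len,
			       const void *payload, size_t payload_len)
{
	struct iovec iov[2] = {
		{ .iov_base = (void *)hdr, .iov_len = hdr_len },
		{ .iov_base = (void *)payload, .iov_len = payload_len },
	};

	return pwritev(fd, iov, 2, pos);
}
```

Because nothing touches the shared offset, overwrites of earlier records and writes at the end can be interleaved without any lseek; choosing the append offset safely is still the caller's job.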
Re: [PATCH] [RSDL-mm 0/6] RSDL cpu scheduler for -mm
On Wed, 7 Mar 2007 12:26:42 +1100 Con Kolivas <[EMAIL PROTECTED]> wrote: > What follows is the same patch series that constitutes the RDSL "Rotating > Staircase DeadLine" cpu scheduler resynced for 2.6.21-rc2-mm2. Big oops early in boot on x86_64 SMP, in rq_bitmap_error+0x97/0x9f. I stubbed it out with a `return MAX_RT_PRIO;' (I think) but it then oopsed differently. Before netconsole had come up, no serial console, no digital camera. There's stuff in http://userweb.kernel.org/~akpm/ck/ - you can probably boot that kernel on your own machine. I need to do rc3-mm1 now. I might find some time to poke at this further after that, but I have to leave for a week in .jp and it'll be squeezy, sorry.
Re: [PATCH] Replace misspelled "PRINTK" with "CONFIG_PRINTK".
On Wednesday March 7, [EMAIL PROTECTED] wrote:
>
> Replace the apparently misspelled preprocessor variable "PRINTK"
> with "CONFIG_PRINTK".

No, it is meant to be "PRINTK". It dates way way back before my time, but presumably the idea was you could -DPRINTK=something and if you didn't do that, it would figure out what it thought you wanted.

Definitely not meant to be CONFIG_PRINTK.

NeilBrown

>
> Signed-off-by: Robert P. J. Day <[EMAIL PROTECTED]>
>
> ---
>
> not sure who the official maintainer here is, sorry.
>
> diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
> index 5554ada..0c09772 100644
> --- a/drivers/md/bitmap.c
> +++ b/drivers/md/bitmap.c
> @@ -53,7 +53,7 @@
>  //#define DPRINTK PRINTK /* set this NULL to avoid verbose debug output */
>  #define DPRINTK(x...) do { } while(0)
>  
> -#ifndef PRINTK
> +#ifndef CONFIG_PRINTK
>  # if DEBUG > 0
>  #define PRINTK(x...) printk(KERN_DEBUG x)
>  # else
>
> --
>
> Robert P. J. Day
> Linux Consulting, Training and Annoying Kernel Pedantry
> Waterloo, Ontario, CANADA
>
> http://fsdev.net/wiki/index.php?title=Main_Page
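Neil's explanation, a default that applies only when the builder has not passed -DPRINTK=..., is the usual `#ifndef` override idiom. A minimal user-space sketch follows; `DEBUG_SINK`, `last_msg` and `report_dirty_chunk` are hypothetical, the sketch defaults DEBUG to 1 so the output is observable, and it uses C99 `__VA_ARGS__` instead of the GNU `x...` form in bitmap.c.

```c
#include <stdio.h>

/* Hypothetical stand-in for printk(KERN_DEBUG ...): format into a buffer
 * so the effect can be observed in user space. */
static char last_msg[128];
#define DEBUG_SINK(...) snprintf(last_msg, sizeof(last_msg), __VA_ARGS__)

#ifndef DEBUG
#define DEBUG 1 /* this sketch defaults to verbose output */
#endif

/* Supply a default PRINTK only if the build did not already define one,
 * e.g. with -DPRINTK=printf. CONFIG_PRINTK plays no part in this choice. */
#ifndef PRINTK
# if DEBUG > 0
#  define PRINTK(...) DEBUG_SINK(__VA_ARGS__)
# else
#  define PRINTK(...) do { } while (0)
# endif
#endif

static void report_dirty_chunk(unsigned long chunk)
{
	PRINTK("bitmap: chunk %lu dirty", chunk);
}
```

Compiling the same file with `-DPRINTK=printf` would send the message to stdout instead, without touching the source; renaming the guard to CONFIG_PRINTK would silently remove that override.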
Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!
On 3/7/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:
>
> Pretty much. For most of the other cases I think we are safe referring to them as resource controls or resource limits. I know that roughly covers what cpusets and beancounters and ckrm currently do.

Plus resource monitoring (which may often be a subset of resource control/limits).

> The real trick is that I believe these groupings are designed to be something you can setup on login and then not be able to switch out of.

That's going to be the case for most resource controllers - is that the case for namespaces? (e.g. can any task unshare say its mount namespace?)

Paul
Re: 2.6.21-rc2-mm2 hang
On Wed, Mar 07, 2007 at 02:12:16PM -0800, Dave Hansen wrote: > I'm seeing weird hangs running ltp on 2.6.21-rc2-mm2. It manifests > itself by the waitpid06 test in LTP hanging. This is very, very > reproducible in about 5 seconds by adding '-s wait' to the ltp command > line. > > I see 4 waitpid06 processes on my 4-way machine spinning in userspace. > But, the weird part is that I can't ssh in once this happens, but I can > log in to the console. I've bisected it down to: > > sched-fix-idle-load-balancing-in-softirqd-context [having some mailer issues. Pl ignore if this is a duplicate] This sounds like an issue in merge we recently had and 2.6.21-rc2-mm2 already has a fix for this. sched-fix-idle-load-balancing-in-softirqd-context-fix.patch Can you please apply both sched-fix-idle-load-balancing-in-softirqd-context sched-fix-idle-load-balancing-in-softirqd-context-fix.patch and see if you still see this problem? thanks, suresh
Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!
"Paul Menage" <[EMAIL PROTECTED]> writes:

> On 3/7/07, Sam Vilain <[EMAIL PROTECTED]> wrote:
>> But "namespace" has well-established historical semantics too - a way
>> of changing the mappings of local * to global objects. This
>> accurately describes things like resource controllers, cpusets, resource
>> monitoring, etc.
>
> Sorry, I think this statement is wrong, by the generally established
> meaning of the term namespace in computer science.
>
>>
>> Trying to extend the well-known term namespace to refer to things that
>> are semantically equivalent namespaces is a useful approach, IMHO.
>>
>
> Yes, that would be true. But the kinds of groupings that we're talking
> about are supersets of namespaces, not semantically equivalent to
> them. To use Eric's "shoe" analogy from earlier, it's like insisting
> that we use the term "sneaker" to refer to all footware, including ski
> boots and birkenstocks ...

Pretty much. For most of the other cases I think we are safe referring to them as resource controls or resource limits. I know that roughly covers what cpusets and beancounters and ckrm currently do.

The real trick is that I believe these groupings are designed to be something you can setup on login and then not be able to switch out of. Which means we can't use sessions and process groups as the grouping entities as those have different semantics.

Eric
Re: [patch] epoll use a single inode ...
On 3/7/07, Linus Torvalds <[EMAIL PROTECTED]> wrote
> Yeah, I'm not at all surprised. Any implementation of "prefetch" that doesn't just turn into a no-op if the TLB entry doesn't exist (which makes them weaker for *actual* prefetching) will generally have a hard time with a NULL pointer. Exactly because it will try to do a totally unnecessary TLB fill - and since most CPU's will not cache negative TLB entries, that unnecessary TLB fill will be done over and over and over again..

Data prefetch instructions should indeed avoid page table walks. (Instruction prefetch mechanisms often do induce table walks on ITLB miss.) Not just because of the null pointer case, but because it's quite normal to run off the end of an array in a loop with an embedded prefetch instruction.

If you have an extra instruction issue unit that shares the same DTLB, and you know you will really want that data, you can sometimes use it to force DTLB preloads by doing an actual data fetch from the foreseeable page. This is potentially one of the best uses of chip multi-threading on an architecture like Sun's Niagara. (I don't think Intel's hyper-threading works for this purpose; the DTLB is shared but the entries are marked as owned by one thread or the other. HT can be used for L2 cache prefetching, although the results so far seem to be mixed: http://www.cgo.org/cgo2004/papers/02_80_Kim_D_REVISED.pdf)

> In general, using software prefetching is just a stupid idea, unless
>
>  - the prefetch really is very strict (ie for a linked list you do exactly the above kinds of things to make sure that you don't try to prefetch the non-existent end entry)
> AND
>  - the CPU is stupid (in-order in particular).
>
> I think Intel even suggests in their optimization manuals to *not* do software prefetching, because hw can usually simply do better without it.

Not the XScale -- it performs quite poorly without prefetch, as people who have run ARMv5-optimized binaries on it can testify.
From the XScale Core Developer's Manual:

The Intel XScale(r) core has a true prefetch load instruction (PLD). The purpose of this instruction is to preload data into the data and mini-data caches. Data prefetching allows hiding of memory transfer latency while the processor continues to execute instructions. The prefetch is important to compiler and assembly code because judicious use of the prefetch instruction can enormously improve throughput performance of the core. Data prefetch can be applied not only to loops but also to any data references within a block of code. Prefetch also applies to data writing when the memory type is enabled as write allocate.

The Intel XScale(r) core prefetch load instruction is a true prefetch instruction because the load destination is the data or mini-data cache and not a register. Compilers for processors which have data caches, but do not support prefetch, sometimes use a load instruction to preload the data cache. This technique has the disadvantages of using a register to load data and requiring additional registers for subsequent preloads and thus increasing register pressure. By contrast, the prefetch can be used to reduce register pressure instead of increasing it.

The prefetch load is a hint instruction and does not guarantee that the data will be loaded. Whenever the load would cause a fault or a table walk, then the processor will ignore the prefetch instruction, the fault or table walk, and continue processing the next instruction. This is particularly advantageous in the case where a linked list or recursive data structure is terminated by a NULL pointer. Prefetching the NULL pointer will not fault program flow.

People's prejudices against prefetch instructions are sometimes traceable to the 3DNow! prefetch(w) botch, which some processors "support" as no-ops and others are too aggressive about (Opteron prefetches are reputed to be "strong", i. e., not dropped on DTLB miss). XScale gets it right.
So do most Pentium 4's using the SSE prefetches, according to the IA-32 optimization manual. (Oddly, Prescott seems to have initiated a page table walk on DTLB miss during software prefetch -- just one of many weird Prescott flaws.) I'm guessing Pentium M and its descendants (Core Solo and Duo) get it right but I'm having a hell of a time finding out for sure. Can any of the x86 experts answer this?

Cheers,
- Michael
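The array case Michael mentions, a loop whose embedded prefetch runs past the end of the data, is harmless only where the hardware drops prefetches that would fault or walk the page tables. Where that cannot be assumed, the look-ahead can simply be bounded, as in this sketch. `PREFETCH_DISTANCE` is an arbitrary illustrative value, `sum_array` is hypothetical, and `__builtin_prefetch` is the GCC/Clang builtin, not a kernel interface.

```c
#include <stddef.h>

#define PREFETCH_DISTANCE 16 /* hypothetical look-ahead, in elements */

/* Sum an array while prefetching a fixed distance ahead. The bound check
 * keeps the loop from issuing prefetches past the end of the array (and
 * from forming an out-of-range pointer, which is undefined in C). */
static long sum_array(const long *a, size_t n)
{
	long sum = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (i + PREFETCH_DISTANCE < n)
			__builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 3);
		sum += a[i];
	}
	return sum;
}
```

The right distance depends on memory latency and per-element work, so 16 is only a placeholder; on the long-latency PowerPC parts mentioned earlier in the thread it would likely need to be larger.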
Re: [BUG] Linux 2.6.20.1 - unable to handle kernel paging request - accessing freed memory?
--- Pekka J Enberg <[EMAIL PROTECTED]> wrote:
> It should give us a better clue which sysfs file is causing the oops.

This BUG happened during boot-up!

The only USB device I have is a pwc webcam:

$ /sbin/lsusb
Bus 004 Device 001: ID :
Bus 003 Device 001: ID :
Bus 002 Device 001: ID :
Bus 001 Device 003: ID 046d:08b4 Logitech, Inc. QuickCam Zoom
Bus 001 Device 001: ID :

Linux version 2.6.20.1 ([EMAIL PROTECTED]) (gcc version 4.1.1 20070105 (Red Hat 4.1.1-51)) #3 SMP PREEMPT Thu Mar 1 12:06:59 GMT 2007
BIOS-provided physical RAM map:
sanitize start
sanitize end
copy_e820_map() start: size: 000a end: 000a type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 000f size: 0001 end: 0010 type: 2
copy_e820_map() start: 0010 size: 7fe75000 end: 7ff75000 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 7ff75000 size: 2000 end: 7ff77000 type: 4
copy_e820_map() start: 7ff77000 size: 00021000 end: 7ff98000 type: 3
copy_e820_map() start: 7ff98000 size: 00068000 end: 8000 type: 2
copy_e820_map() start: fec0 size: 0009 end: fec9 type: 2
copy_e820_map() start: fee0 size: 0001 end: fee1 type: 2
copy_e820_map() start: ffb0 size: 0050 end: 0001 type: 2
BIOS-e820: - 000a (usable)
BIOS-e820: 000f - 0010 (reserved)
BIOS-e820: 0010 - 7ff75000 (usable)
BIOS-e820: 7ff75000 - 7ff77000 (ACPI NVS)
BIOS-e820: 7ff77000 - 7ff98000 (ACPI data)
BIOS-e820: 7ff98000 - 8000 (reserved)
BIOS-e820: fec0 - fec9 (reserved)
BIOS-e820: fee0 - fee1 (reserved)
BIOS-e820: ffb0 - 0001 (reserved)
1151MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000fe710 Entering add_active_range(0, 0, 524149) 0 entries of 256 used Zone PFN ranges: DMA 0 -> 4096 Normal 4096 -> 229376 HighMem229376 -> 524149 early_node_map[1] active PFN ranges 0:0 -> 524149 On node 0 totalpages: 524149 DMA zone: 32 pages used for memmap DMA zone: 0 pages reserved DMA zone: 4064 pages, LIFO batch:0 Normal zone: 1760 pages used for memmap Normal zone: 223520 pages, LIFO batch:31 HighMem zone: 2302 pages used for memmap HighMem zone: 292471 pages, LIFO batch:31 DMI 2.3 present. ACPI: RSDP (v000 DELL ) @ 0x000febc0 ACPI: RSDT (v001 DELLWS 650 0x0009 ASL 0x0061) @ 0x000fd4f1 ACPI: FADT (v001 DELLWS 650 0x0009 ASL 0x0061) @ 0x000fd529 ACPI: SSDT (v001 DELLst_ex 0x1000 MSFT 0x010d) @ 0xfffefafa ACPI: MADT (v001 DELLWS 650 0x0009 ASL 0x0061) @ 0x000fd59d ACPI: BOOT (v001 DELLWS 650 0x0009 ASL 0x0061) @ 0x000fd621 ACPI: ASF! (v016 DELLWS 650 0x0009 ASL 0x0061) @ 0x000fd649 ACPI: DSDT (v001 DELLdt_ex 0x1000 MSFT 0x010d) @ 0x ACPI: PM-Timer IO Port: 0x808 ACPI: Local APIC address 0xfee0 ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled) Processor #0 15:2 APIC version 20 ACPI: LAPIC (acpi_id[0x02] lapic_id[0x06] enabled) Processor #6 15:2 APIC version 20 ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] enabled) Processor #1 15:2 APIC version 20 ACPI: LAPIC (acpi_id[0x04] lapic_id[0x07] enabled) Processor #7 15:2 APIC version 20 ACPI: IOAPIC (id[0x08] address[0xfec0] gsi_base[0]) IOAPIC[0]: apic_id 8, version 32, address 0xfec0, GSI 0-23 ACPI: IOAPIC (id[0x09] address[0xfec8] gsi_base[24]) IOAPIC[1]: apic_id 9, version 32, address 0xfec8, GSI 24-47 ACPI: IOAPIC (id[0x0a] address[0xfec80800] gsi_base[48]) IOAPIC[2]: apic_id 10, version 32, address 0xfec80800, GSI 48-71 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) ACPI: IRQ0 used by override. ACPI: IRQ2 used by override. ACPI: IRQ9 used by override. Enabling APIC mode: Flat. 
Using 3 I/O APICs Using ACPI (MADT) for SMP configuration information Allocating PCI resources starting at 8800 (gap: 8000:7ec0) Detected 2658.187 MHz processor. Built 1 zonelists. Total pages: 520055 Kernel command line: ro root=LABEL=/ nmi_watchdog=1 elevator=cfq console=ttyS0,115200n8 console=tty0 acpi_pm_good mapped APIC to d000 (fee0) mapped IOAPIC to c000 (fec0) mapped IOAPIC to b000 (fec8) mapped IOAPIC to a000 (fec80800) Enabling fast FPU save and restore... done. Enabling unmasked SIMD FPU exception support... done. Initializing CPU#0 CPU 0 irqstacks, hard=c0345000 soft=c033d000 PID hash table entries: 4096 (order: 12, 16384 bytes) Console: colour VGA+ 80x25 Dentry
Re: + stupid-hack-to-make-mainline-build.patch added to -mm tree
Daniel Arai wrote: > But more importantly, we want a kernel that can run both on native hardware and in a paravirtualized environment. Linux doesn't really provide abstractions for replacing the appropriate code. We tried to hook into the source code at a level that seemed possible. > Xen doesn't support any kind of APIC emulation, so we'll need to hook anything that relies on an APIC. The IPI code you quote below will probably be one of those. My opinion is that pv_ops shouldn't have raw APIC operations, but should instead have appropriate high-level interfaces to achieve the same ends. Zach's counter-argument was basically yours: that the VMI code will use a lot of the native code except for the actual APIC operations. I can live with VMI emulating APICs if it wants, so long as it does it in private and doesn't make a big scene about it. We'll need the high-level interfaces regardless. J
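To make the "high-level interface" position concrete, here is a minimal, hypothetical sketch. Callers ask for an IPI to all other CPUs, and a per-platform ops table decides how to deliver it. The names (`struct ipi_ops`, `send_ipi_allbutself`, the two backends) are illustrative assumptions and not the real pv_ops layout. The backends only record the vector instead of touching hardware.

```c
#include <assert.h>

/* Hypothetical high-level interface: callers never see the APIC. */
struct ipi_ops {
    const char *name;
    void (*send_ipi_allbutself)(int vector);
};

/* Stand-ins that record the vector instead of touching hardware. */
static int native_last_vector = -1;
static int hv_last_vector = -1;

static void native_send_ipi_allbutself(int vector)
{
    native_last_vector = vector;    /* native: would program the local APIC */
}

static void hv_send_ipi_allbutself(int vector)
{
    hv_last_vector = vector;        /* guest: would issue one hypercall */
}

static const struct ipi_ops native_ipi_ops = { "native", native_send_ipi_allbutself };
static const struct ipi_ops hv_ipi_ops = { "hypervisor", hv_send_ipi_allbutself };

/* Chosen once at boot, so one kernel image runs native or paravirtualized. */
static const struct ipi_ops *ipi_ops = &native_ipi_ops;

static void kick_other_cpus(int vector)
{
    ipi_ops->send_ipi_allbutself(vector);
}
```

With this shape, a hypervisor without APIC emulation implements one operation per concept rather than emulating register accesses.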
[patch 5/5] signalfd v2 - compat code ...
This patch implements the necessary compat code for the signalfd system calls.

Signed-off-by: Davide Libenzi

- Davide

Index: linux-2.6.20.ep2/fs/compat.c
===
--- linux-2.6.20.ep2.orig/fs/compat.c	2007-03-07 13:28:39.0 -0800
+++ linux-2.6.20.ep2/fs/compat.c	2007-03-07 13:42:18.0 -0800
@@ -46,6 +46,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -2235,3 +2236,41 @@
 	return sys_ni_syscall();
 }
 #endif
+
+asmlinkage long compat_sys_signalfd(int ufd,
+				    const compat_sigset_t __user *sigmask,
+				    compat_size_t sigsetsize)
+{
+	compat_sigset_t ss32;
+	sigset_t tmp;
+	sigset_t __user *ksigmask;
+
+	if (sigsetsize != sizeof(compat_sigset_t))
+		return -EINVAL;
+	if (copy_from_user(&ss32, sigmask, sizeof(ss32)))
+		return -EFAULT;
+	sigset_from_compat(&tmp, &ss32);
+	ksigmask = compat_alloc_user_space(sizeof(sigset_t));
+	if (copy_to_user(ksigmask, &tmp, sizeof(sigset_t)))
+		return -EFAULT;
+
+	return sys_signalfd(ufd, ksigmask, sizeof(sigset_t));
+}
+
+asmlinkage long compat_sys_signalfd_dequeue(int fd,
+					    struct compat_siginfo __user *info,
+					    long timeo)
+{
+	siginfo_t kinfo;
+	long ret;
+	mm_segment_t old_fs = get_fs();
+
+	set_fs(KERNEL_DS);
+	ret = sys_signalfd_dequeue(fd, (siginfo_t __user *) &kinfo, timeo);
+	set_fs(old_fs);
+	if (!ret)
+		ret = copy_siginfo_to_user32(info, &kinfo);
+
+	return ret;
+}
+
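The `sigset_from_compat()` step in `compat_sys_signalfd()` converts the 32-bit task's signal-mask layout into the native one. Here is a rough userspace model of that step. The type and constant names are hypothetical. On a 64-bit kernel, each native 64-bit word is assembled from two consecutive 32-bit compat words, with the low word first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model types: 64 signals, stored as one 64-bit native word
 * or as two 32-bit compat words. */
#define MODEL_NSIG      64
#define NATIVE_WORDS    (MODEL_NSIG / 64)
#define COMPAT_WORDS    (MODEL_NSIG / 32)

typedef struct { uint64_t sig[NATIVE_WORDS]; } model_sigset_t;
typedef struct { uint32_t sig[COMPAT_WORDS]; } model_compat_sigset_t;

/* Combine compat words pairwise: even index is the low half, odd the high. */
static void model_sigset_from_compat(model_sigset_t *set,
                                     const model_compat_sigset_t *compat)
{
    for (int i = 0; i < NATIVE_WORDS; i++)
        set->sig[i] = compat->sig[2 * i] |
                      ((uint64_t)compat->sig[2 * i + 1] << 32);
}
```

Bit 0 of the result is signal 1 and bit 63 is signal 64, so a mask built by the 32-bit task keeps its meaning after the conversion.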
[patch 1/5] signalfd v2 - anonymous inode source ...
This patch adds an anonymous inode source, to be used for files that need an inode only in order to create a file*. We do not care about having an inode for each file, and we do not even care about having different names in the associated dentries (dentry names will be the same for classes of file*). This allows code reuse, and will be used by epoll, signalfd and timerfd (and whatever else there'll be). (Andrew already has this in -mm) Signed-off-by: Davide Libenzi - Davide Index: linux-2.6.20.ep2/fs/anon_inodes.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6.20.ep2/fs/anon_inodes.c 2007-03-07 15:58:01.0 -0800 @@ -0,0 +1,203 @@ +/* + * fs/anon_inodes.c + * + * Copyright (C) 2007 Davide Libenzi + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + + + +static int ainofs_delete_dentry(struct dentry *dentry); +static struct inode *aino_getinode(void); +static struct inode *aino_mkinode(void); +static int ainofs_get_sb(struct file_system_type *fs_type, int flags, +const char *dev_name, void *data, struct vfsmount *mnt); + + + +static struct vfsmount *aino_mnt __read_mostly; +static struct inode *aino_inode; +static struct file_operations aino_fops = { }; +static struct file_system_type aino_fs_type = { + .name = "ainofs", + .get_sb = ainofs_get_sb, + .kill_sb= kill_anon_super, +}; +static struct dentry_operations ainofs_dentry_operations = { + .d_delete = ainofs_delete_dentry, +}; + + + +int aino_getfd(int *pfd, struct inode **pinode, struct file **pfile, + char const *name, const struct file_operations *fops, void *priv) +{ + struct qstr this; + struct dentry *dentry; + struct inode *inode; + struct file *file; + int error, fd; + + error = -ENFILE; + file = get_empty_filp(); + if (!file) + goto eexit_1; + + inode = aino_getinode(); + if (IS_ERR(inode)) { + error = PTR_ERR(inode); + goto eexit_2; + } + + error = get_unused_fd(); + if (error < 0) + goto eexit_3; + fd = error; + + /* +* Link the inode
to a directory entry by creating a unique name +* using the inode sequence number. +*/ + error = -ENOMEM; + this.name = name; + this.len = strlen(name); + this.hash = 0; + dentry = d_alloc(aino_mnt->mnt_sb->s_root, &this); + if (!dentry) + goto eexit_4; + dentry->d_op = &ainofs_dentry_operations; + /* Do not publish this dentry inside the global dentry hash table */ + dentry->d_flags &= ~DCACHE_UNHASHED; + d_instantiate(dentry, inode); + + file->f_path.mnt = mntget(aino_mnt); + file->f_path.dentry = dentry; + file->f_mapping = inode->i_mapping; + + file->f_pos = 0; + file->f_flags = O_RDONLY; + file->f_op = fops; + file->f_mode = FMODE_READ; + file->f_version = 0; + file->private_data = priv; + + fd_install(fd, file); + + *pfd = fd; + *pinode = inode; + *pfile = file; + return 0; + +eexit_4: + put_unused_fd(fd); +eexit_3: + iput(inode); +eexit_2: + put_filp(file); +eexit_1: + return error; +} + + +static int ainofs_delete_dentry(struct dentry *dentry) +{ + /* +* We faked vfs to believe the dentry was hashed when we created it. +* Now we restore the flag so that dput() will work correctly. +*/ + dentry->d_flags |= DCACHE_UNHASHED; + return 1; +} + + +static struct inode *aino_getinode(void) +{ + return igrab(aino_inode); +} + + +/* + * A single inode exist for all aino files. On the contrary of pipes, + * aino inodes has no per-instance data associated, so we can avoid + * the allocation of multiple of them. + */ +static struct inode *aino_mkinode(void) +{ + int error = -ENOMEM; + struct inode *inode = new_inode(aino_mnt->mnt_sb); + + if (!inode) + goto eexit_1; + + inode->i_fop = &aino_fops; + + /* +* Mark the inode dirty from the very beginning, +* that way it will never be moved to the dirty +* list because mark_inode_dirty() will think +* that it already _is_ on the dirty list.
+*/ + inode->i_state = I_DIRTY; + inode->i_mode = S_IRUSR | S_IWUSR; + inode->i_uid = current->fsuid; + inode->i_gid = current->fsgid; + inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME; + return inode; + +eexit_1: + return ERR_PTR(error); +} + + +static int ainofs_get_sb(struct file_system_type *fs_type, int flags, +const char *dev_name, void *data, struct vfsmount *mnt) +{ + return get_sb_pseudo(fs_type, "aino:",
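The design choice in this patch is a single shared inode for every anonymous file, with all per-instance state kept in `file->private_data`. It can be modelled in a few lines of userspace C. Everything here is a hypothetical model, including names such as `model_aino_getfile` and the `"[signalfd]"` class name used in the check. The increment and decrement only stand in for `igrab()` and `iput()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: one refcounted inode shared by every anonymous file. */
struct model_inode {
    int refcount;
};

struct model_file {
    struct model_inode *inode;   /* always the shared inode */
    const char *name;            /* same dentry name for a whole class */
    void *private_data;          /* per-instance state lives here */
};

/* The shared inode; the initial reference belongs to the "filesystem". */
static struct model_inode aino_inode_model = { 1 };

static struct model_file *model_aino_getfile(const char *name, void *priv)
{
    struct model_file *f = malloc(sizeof(*f));

    if (!f)
        return NULL;
    aino_inode_model.refcount++;     /* stands in for igrab() */
    f->inode = &aino_inode_model;
    f->name = name;
    f->private_data = priv;
    return f;
}

static void model_aino_putfile(struct model_file *f)
{
    f->inode->refcount--;            /* stands in for iput() */
    free(f);
}
```

Because the inode carries no per-instance data, creating another file costs one `struct file` and a reference, not a fresh inode allocation.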
[patch 2/5] signalfd v2 - signalfd core ...
This patch series implements the new signalfd() and signalfd_dequeue() system calls. I took part of the original Linus code (and you know how badly it can be broken :), and I added even more breakage ;) The patch had to be almost completely changed. This patch allows multiple signalfds to listen for signals on the same sighand, w/out racing with dequeue_signal. Plus other changes that I don't remember (see here for the original patch http://tinyurl.com/3yuna5 ). This seems to be working fine on my Dual Opteron machine. I made a quick test program for it: http://www.xmailserver.org/signafd-test.c The signalfd() system call implements signal delivery into a file descriptor receiver. The signalfd file descriptor is created with the following API: int signalfd(int ufd, const sigset_t *mask, size_t masksize); The "ufd" parameter allows changing the sigmask of an existing signalfd, w/out going through a close/create cycle (Linus idea). Use "ufd" == -1 if you want a brand new signalfd file. The "mask" parameter specifies the set of signals that we are interested in. The "masksize" parameter is the size of "mask". Note that signalfd delivery and standard signal delivery can go in parallel. So you can receive signals on the signalfd file, and on the signal handlers. This makes the system more flexible IMO. If you don't want to see standard delivery, just pass the same "mask" to sigprocmask(SIG_BLOCK). The signalfd fd supports the poll(2) system call. The poll(2) will return POLLIN when signals are available to be dequeued. As a direct consequence of supporting the Linux poll subsystem, the signalfd fd can be used together with epoll(2) too. A new system call has also been introduced to allow signal dequeueing: int signalfd_dequeue(int fd, siginfo_t *info, long timeo); The "fd" parameter must be a signalfd file descriptor. The "info" parameter is a pointer to the siginfo that will receive the dequeued signal, and "timeo" is a timeout in milliseconds, or -1 for infinite.
The signalfd_dequeue function returns 0 if successful. Signed-off-by: Davide Libenzi - Davide Index: linux-2.6.20.ep2/fs/signalfd.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6.20.ep2/fs/signalfd.c 2007-03-07 17:06:07.0 -0800 @@ -0,0 +1,369 @@ +/* + * fs/signalfd.c + * + * Copyright (C) 2003 Linus Torvalds + * + * Mon Mar 5, 2007: Davide Libenzi + * Changed signal delivery and de-queueing. + * Now using anonymous inode source. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + + + +#define MAX_MSTIMEO min(1000ULL * MAX_SCHEDULE_TIMEOUT / HZ, (LONG_MAX - 999ULL) / HZ) + + + +struct signalfd_ctx { + struct list_head lnk; + wait_queue_head_t wqh; + sigset_t sigmask; + sigset_t pending; + struct list_head squeue[_NSIG]; + long lost_sigs; + struct task_struct *tsk; + struct sighand_struct *sighand; +}; + +struct signalfd_sq { + struct list_head lnk; + siginfo_t info; +}; + + + +static void signalfd_cleanup(struct signalfd_ctx *ctx); +static int signalfd_close(struct inode *inode, struct file *file); +static unsigned int signalfd_poll(struct file *filp, poll_table *wait); +static struct signalfd_sq *signalfd_fetchsig(struct signalfd_ctx *ctx); + + + +static const struct file_operations signalfd_fops = { + .release= signalfd_close, + .poll = signalfd_poll, +}; +static struct kmem_cache *signalfd_ctx_cachep; +static struct kmem_cache *signalfd_sq_cachep; + + +/* + * This must be called with the sighand lock held. + */ +int signalfd_deliver(struct sighand_struct *sighand, int sig, struct siginfo *info) +{ + int nsig = 0; + struct list_head *pos; + struct signalfd_ctx *ctx; + struct signalfd_sq *sq; + + list_for_each(pos, &sighand->sfdlist) { + ctx = list_entry(pos, struct signalfd_ctx, lnk); + /* +* We use a negative signal value as a way to broadcast that the +* sighand has been orphaned, so that we can notify all the +* listeners about this.
+*/ + if (sig < 0) + __wake_up_locked(&ctx->wqh, TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE); + else if (sigismember(&ctx->sigmask, sig) && +(sig >= SIGRTMIN || !sigismember(&ctx->pending, sig))) { + sigaddset(&ctx->pending, sig); + sq = kmem_cache_alloc(signalfd_sq_cachep, GFP_ATOMIC); + if (sq) { + signal_fill_info(&sq->info, sig, info); + list_add_tail(&sq->lnk, &ctx->squeue[sig - 1]); + } else + ctx->lost_sigs++; + __wake_up_locked(&ctx->wqh,
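The queueing test in `signalfd_deliver()` coalesces standard signals but queues every instance of a real-time one. Below is a hypothetical userspace model of just that condition. The 64-signal bitmask and `MODEL_SIGRTMIN = 32` are assumptions and do not reproduce the kernel's `sigset_t`. The allocation-failure path that bumps `lost_sigs` is left out.

```c
#include <assert.h>
#include <string.h>

#define MODEL_NSIG     64   /* assumed number of signals */
#define MODEL_SIGRTMIN 32   /* assumed first real-time signal */

struct model_ctx {
    unsigned long long sigmask;     /* signals this listener wants */
    unsigned long long pending;     /* signals with at least one queued entry */
    int queued[MODEL_NSIG + 1];     /* queued entries per signal number */
};

static unsigned long long sigbit(int sig)
{
    return 1ULL << (sig - 1);
}

static void model_init(struct model_ctx *ctx, unsigned long long mask)
{
    memset(ctx, 0, sizeof(*ctx));
    ctx->sigmask = mask;
}

/* Same condition as the patch: queue a wanted signal if it is real-time,
 * or if no instance of it is pending yet. Returns 1 if queued. */
static int model_deliver(struct model_ctx *ctx, int sig)
{
    if ((ctx->sigmask & sigbit(sig)) &&
        (sig >= MODEL_SIGRTMIN || !(ctx->pending & sigbit(sig)))) {
        ctx->pending |= sigbit(sig);
        ctx->queued[sig]++;
        return 1;
    }
    return 0;
}
```

This mirrors the usual POSIX behaviour, where a second standard signal arriving while one is pending is merged but real-time signals are queued individually.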
[patch 3/5] signalfd v2 - wire i386 syscall ...
This patch wires the signalfd system calls to the i386 architecture.

Signed-off-by: Davide Libenzi

- Davide

Index: linux-2.6.20.ep2/arch/i386/kernel/syscall_table.S
===
--- linux-2.6.20.ep2.orig/arch/i386/kernel/syscall_table.S	2007-03-07 11:07:45.0 -0800
+++ linux-2.6.20.ep2/arch/i386/kernel/syscall_table.S	2007-03-07 12:34:33.0 -0800
@@ -319,3 +319,5 @@
 	.long sys_move_pages
 	.long sys_getcpu
 	.long sys_epoll_pwait
+	.long sys_signalfd		/* 320 */
+	.long sys_signalfd_dequeue
Index: linux-2.6.20.ep2/include/asm-i386/unistd.h
===
--- linux-2.6.20.ep2.orig/include/asm-i386/unistd.h	2007-03-07 11:07:45.0 -0800
+++ linux-2.6.20.ep2/include/asm-i386/unistd.h	2007-03-07 12:34:02.0 -0800
@@ -325,10 +325,12 @@
 #define __NR_move_pages		317
 #define __NR_getcpu		318
 #define __NR_epoll_pwait	319
+#define __NR_signalfd		320
+#define __NR_signalfd_dequeue	321
 #ifdef __KERNEL__
-#define NR_syscalls 320
+#define NR_syscalls 322
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
[patch 4/5] signalfd v2 - wire x86_64 syscall ...
This patch wires the signalfd system calls to the x86_64 architecture.

Signed-off-by: Davide Libenzi

- Davide

Index: linux-2.6.20.ep2/include/asm-x86_64/unistd.h
===
--- linux-2.6.20.ep2.orig/include/asm-x86_64/unistd.h	2007-03-07 13:28:41.0 -0800
+++ linux-2.6.20.ep2/include/asm-x86_64/unistd.h	2007-03-07 13:42:12.0 -0800
@@ -619,8 +619,12 @@
 __SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages		279
 __SYSCALL(__NR_move_pages, sys_move_pages)
+#define __NR_signalfd		280
+__SYSCALL(__NR_signalfd, sys_signalfd)
+#define __NR_signalfd_dequeue	281
+__SYSCALL(__NR_signalfd_dequeue, sys_signalfd_dequeue)
-#define __NR_syscall_max __NR_move_pages
+#define __NR_syscall_max __NR_signalfd_dequeue
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR
Index: linux-2.6.20.ep2/arch/x86_64/ia32/ia32entry.S
===
--- linux-2.6.20.ep2.orig/arch/x86_64/ia32/ia32entry.S	2007-03-07 13:28:41.0 -0800
+++ linux-2.6.20.ep2/arch/x86_64/ia32/ia32entry.S	2007-03-07 13:42:12.0 -0800
@@ -714,8 +714,11 @@
 	.quad compat_sys_get_robust_list
 	.quad sys_splice
 	.quad sys_sync_file_range
-	.quad sys_tee
+	.quad sys_tee		/* 315 */
 	.quad compat_sys_vmsplice
 	.quad compat_sys_move_pages
 	.quad sys_getcpu
-ia32_syscall_end:
+	.quad sys_epoll_pwait
+	.quad sys_signalfd		/* 320 */
+	.quad sys_signalfd_dequeue
+ia32_syscall_end:
[PATCH] tcp_cubic: use 32 bit math
The basic calculation has to be done in 32 bits to avoid doing a 64-bit divide by 3. The value x is at most 22 bits, so full 64 bits are needed only for x^2.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

---
 net/ipv4/tcp_cubic.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- net-2.6.22.orig/net/ipv4/tcp_cubic.c	2007-03-07 15:51:37.0 -0800
+++ net-2.6.22/net/ipv4/tcp_cubic.c	2007-03-07 17:06:02.0 -0800
@@ -96,7 +96,7 @@
  */
 static u32 cubic_root(u64 a)
 {
-	u64 x;
+	u32 x;
 	/* Initial estimate is based on:
 	 * cbrt(x) = exp(log(x) / 3)
@@ -104,9 +104,9 @@
 	x = 1u << (fls64(a)/3);
 	/* converges to 32 bits in 3 iterations */
-	x = (2 * x + div64_64(a, x*x)) / 3;
-	x = (2 * x + div64_64(a, x*x)) / 3;
-	x = (2 * x + div64_64(a, x*x)) / 3;
+	x = (2 * x + (u32)div64_64(a, (u64)x*(u64)x)) / 3;
+	x = (2 * x + (u32)div64_64(a, (u64)x*(u64)x)) / 3;
+	x = (2 * x + (u32)div64_64(a, (u64)x*(u64)x)) / 3;
 	return x;
 }
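Here is a userspace sketch of the patched `cubic_root()` for checking the arithmetic. `fls64()` and `div64_64()` are replaced with assumed stand-ins: a `__builtin_clzll`-based bit scan and a plain 64-bit division. The `a == 0` guard is an addition for this sketch. Without it, `a == 0` would make the second iteration divide by zero.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Stand-in for the kernel's fls64(): 1-based index of the top set bit, 0 for 0. */
static int fls64(u64 x)
{
    return x ? 64 - __builtin_clzll(x) : 0;
}

/* Stand-in for div64_64(): plain 64-by-64 division in userspace. */
static u64 div64_64(u64 dividend, u64 divisor)
{
    return dividend / divisor;
}

/* Newton iteration for cbrt(a): x' = (2x + a / x^2) / 3.
 * x stays in 32 bits; only x*x and the quotient need 64-bit math. */
static u32 cubic_root(u64 a)
{
    u32 x;

    if (a == 0)     /* guard added for this sketch */
        return 0;

    /* Initial estimate: cbrt(a) is about 2^(bits(a) / 3) */
    x = 1u << (fls64(a) / 3);

    x = (2 * x + (u32)div64_64(a, (u64)x * (u64)x)) / 3;
    x = (2 * x + (u32)div64_64(a, (u64)x * (u64)x)) / 3;
    x = (2 * x + (u32)div64_64(a, (u64)x * (u64)x)) / 3;

    return x;
}
```

For the perfect cubes 27, 1000 and 10^6 the three iterations land on the exact root. For other inputs, the result is only an integer approximation.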
Re: [PATCH] Fix get_unmapped_area and fsync for hugetlb shm segments
Bill Irwin <[EMAIL PROTECTED]> writes: > On Wed, Mar 07, 2007 at 04:03:17PM -0700, Eric W. Biederman wrote: >> I think the right answer is most likely to add an extra file method or >> two so we can remove the need for is_file_hugepages. >> There are still 4 calls to is_file_hugepages in ipc/shm.c and >> 2 calls in mm/mmap.c not counting the one in is_file_shm_hugepages. >> The special cases make it difficult to properly wrap hugetlbfs files >> with another file, which is why we have the weird special case above. > > It's not clear to me that the core can be insulated from hugetlb's > distinct pagecache and memory mapping granularities in a Linux-native > manner, but if you come up with something new or manage to get the > known methods past Linus, akpm, et al, more power to you. I agree that there are limits on what can be achieved. However, looking at where we have tests for is_file_hugepages, most of those tests don't appear to have anything inherently to do with huge pages, so it wouldn't surprise me if we could generalize things a little more. > I'm not entirely sure what you're up to, but I'm mostly here to sanction > others' design notions since my own are far too extreme, and, of course, > review and ack patches, take bugreports and write fixes (not that I've > managed to get to any of them first in a long while, if ever), and so on. > I say killing the is_whatever_hugepages() checks with whatever abstraction > is good, since I don't like them myself, provided it's sane. Go for it. Mostly I had reference counting and consistency problems with ipc/shm.c that had horrible leak potential when I exited an ipc namespace. Implementing everything as stacked files made the code simpler and more maintainable. (shm_nattach stopped being a special case, yay!) I'm happy to stop here, but if someone cares to proceed with removing is_file_hugepages I want to encourage that. I don't see any other cleanups short of that which are really worth doing.
Everything in ipc/shm.c could be considered a weird special case, so I'm not going to worry about it too much, although removing those special cases is good. There is some odd accounting logic in mm/mmap.c based on is_file_hugepages, and there is the get_unmapped_area case. For get_unmapped_area I see no reason to presume that huge pages are the only kind of file that must live at a specific address (even if that is the only kind of file where we have that case today). So generalizing that check should be relatively straightforward. Eric
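Eric's suggestion is to replace `is_file_hugepages()` tests with a per-file method. Here is a hypothetical userspace model of that idea. Generic code calls an optional placement hook when the file provides one and falls back to a default allocator otherwise. All names, the toy bump allocator and the 2 MiB `MODEL_HPAGE_SIZE` are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct model_file;

/* Hypothetical per-file-type methods; only the placement hook is modelled. */
struct model_fops {
    unsigned long (*get_unmapped_area)(struct model_file *f, unsigned long len);
};

struct model_file {
    const struct model_fops *f_op;
};

#define MODEL_HPAGE_SIZE (2UL << 20)   /* assumed 2 MiB huge page */

static unsigned long next_free = 0x10000000UL;  /* toy bump allocator */

static unsigned long default_get_unmapped_area(unsigned long len)
{
    unsigned long addr = next_free;

    next_free += len;
    return addr;
}

/* A file type that must be mapped at huge-page-aligned addresses. */
static unsigned long aligned_get_unmapped_area(struct model_file *f, unsigned long len)
{
    unsigned long addr = (next_free + MODEL_HPAGE_SIZE - 1) & ~(MODEL_HPAGE_SIZE - 1);

    (void)f;
    next_free = addr + len;
    return addr;
}

/* Generic code: no "is this hugetlb?" test, just an optional hook. */
static unsigned long model_get_unmapped_area(struct model_file *f, unsigned long len)
{
    if (f && f->f_op && f->f_op->get_unmapped_area)
        return f->f_op->get_unmapped_area(f, len);
    return default_get_unmapped_area(len);
}
```

Any future file type with placement constraints would then supply its own hook, and the generic path would not need another special case.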
Re: + stupid-hack-to-make-mainline-build.patch added to -mm tree
Thomas Gleixner wrote:
> You managed to avoid the usage of other code (i.e. PIT / HPET) already, so why is it sooo desirable to emulate apics instead of substituting it by a small and sane replacement ? Just because you happen to have an LAPIC emulator ? That's no reason to wire yourself into the kernel code and make it harder to change and maintain.

There are several reasons why it's desirable to emulate the APIC. As you mentioned, we already have APIC emulation, and APIC emulation isn't a huge bottleneck on most workloads. Our code works, the Linux code works, and replacing both pieces of code with something "small and sane" isn't going to improve performance very much, so why bother? Any hypervisor implementation is going to be a tradeoff between what's easy to implement in the hypervisor, what's easy to implement in the guest operating system, and what's performance critical.

Secondly, not all (para-)virtualized operating systems will want to use abstracted devices. Some virtual operating systems will be given direct access to hardware devices, and will need to run the actual driver for that device and not some abstracted device driver. So I don't buy your argument that every piece of the kernel that interacts with a paravirtualized driver should have a "small and sane replacement."

But more importantly, we want a kernel that can run both on native hardware and in a paravirtualized environment. Linux doesn't really provide abstractions for replacing the appropriate code. We tried to hook into the source code at a level that seemed possible.

For example, take smp_call_function(). What this essentially does is call send_IPI_allbutself().

void fastcall send_IPI_self(int vector)
{
	__send_IPI_shortcut(APIC_DEST_SELF, vector);
}

void __send_IPI_shortcut(unsigned int shortcut, int vector)
{
	/*
	 * Subtle. In the case of the 'never do double writes' workaround
	 * we have to lock out interrupts to be safe. As we don't care
	 * of the value read we use an atomic rmw access to avoid costly
	 * cli/sti. Otherwise we use an even cheaper single atomic write
	 * to the APIC.
	 */
	unsigned int cfg;

	/*
	 * Wait for idle.
	 */
	apic_wait_icr_idle();

	/*
	 * No need to touch the target chip field
	 */
	cfg = __prepare_ICR(shortcut, vector);

	/*
	 * Send the IPI. The write to APIC_ICR fires this off.
	 */
	apic_write_around(APIC_ICR, cfg);
}

There's no good way to override __send_IPI_shortcut. I suppose we could add paravirt ops for __send_IPI_shortcut and every other op that touches the APIC. But there are dozens of functions in apic.c that would need to be included in paravirt ops. And for our implementation, we really just want to override apic_read and apic_write, since we can make these faster when done through hypercalls than through memory accesses. If we were to make these paravirt ops, their implementations would be the same, except with a different apic_read and apic_write. This is a whole lot of useless code duplication.

Most of the interrupt system is not written in such a way that multiple APIC implementations can be selected at boot time. This is an absolute requirement so that the same kernel can boot both on native hardware and in a paravirtualized environment. While this could be implemented, it seems like a waste of time, since we can just emulate something similar to a real interrupt system and not change things very much.

Dan Arai
VMware, Inc.
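Dan's alternative keeps `__send_IPI_shortcut()`-style logic shared and swaps only the register accessors. The minimal userspace sketch below illustrates that idea. A simulated register array stands in for the APIC page, and the "hypercall" backend merely counts calls. The register offset and bit values are intended to mirror the i386 `apicdef.h` definitions, but treat them, and all the function names, as assumptions. `prepare_icr()` is simplified and omits the destination-mode bits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the i386 local APIC definitions (apicdef.h). */
#define APIC_ICR        0x300
#define APIC_ICR_BUSY   0x01000
#define APIC_DEST_SELF  0x40000
#define APIC_DM_FIXED   0x00000

static uint32_t apic_regs[0x400 / 4];   /* simulated APIC register page */
static unsigned int hypercalls;         /* counts simulated hypercalls */

/* Hypothetical accessor hooks: the only part that differs per platform. */
struct apic_ops {
    uint32_t (*read)(uint32_t reg);
    void (*write)(uint32_t reg, uint32_t val);
};

static uint32_t native_apic_read(uint32_t reg) { return apic_regs[reg / 4]; }
static void native_apic_write(uint32_t reg, uint32_t val) { apic_regs[reg / 4] = val; }

/* Guest backend: same effect, but each access is a (pretend) hypercall. */
static uint32_t hv_apic_read(uint32_t reg) { hypercalls++; return apic_regs[reg / 4]; }
static void hv_apic_write(uint32_t reg, uint32_t val) { hypercalls++; apic_regs[reg / 4] = val; }

static struct apic_ops apic = { native_apic_read, native_apic_write };

static void apic_wait_icr_idle(void)
{
    while (apic.read(APIC_ICR) & APIC_ICR_BUSY)
        ;
}

/* Simplified ICR encoding: shortcut, delivery mode and vector only. */
static uint32_t prepare_icr(uint32_t shortcut, int vector)
{
    return shortcut | APIC_DM_FIXED | (uint32_t)vector;
}

/* Shared IPI logic, identical on native and guest; only the hooks change. */
static void send_ipi_shortcut(uint32_t shortcut, int vector)
{
    apic_wait_icr_idle();
    apic.write(APIC_ICR, prepare_icr(shortcut, vector));
}
```

The trade-off against the high-level approach is visible here. Only two hooks are needed, but the guest must still behave like an APIC register file.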
Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!
On 3/7/07, Sam Vilain <[EMAIL PROTECTED]> wrote:
> But "namespace" has well-established historical semantics too - a way of changing the mappings of local * to global objects. This accurately describes things like resource controllers, cpusets, resource monitoring, etc.

Sorry, I think this statement is wrong, by the generally established meaning of the term namespace in computer science.

> Trying to extend the well-known term namespace to refer to things that are semantically equivalent to namespaces is a useful approach, IMHO.

Yes, that would be true. But the kinds of groupings that we're talking about are supersets of namespaces, not semantically equivalent to them. To use Eric's "shoe" analogy from earlier, it's like insisting that we use the term "sneaker" to refer to all footwear, including ski boots and Birkenstocks ...

Paul
Re: [patch v2] epoll use a single inode ...
On Tue, Mar 06, 2007 at 21:20:33 +0100, Eric Dumazet wrote: ... > I rewrote the reciprocal_div() for i386 so that one multiply is used. > > static inline u32 reciprocal_divide(u32 A, u32 R) > { > #if __i386 > unsigned int edx, eax; > asm("mul %2":"=a" (eax), "=d" (edx):"rm" (R), "0" (A)); ^^^ mul does not work if R is a memory operand; mull should be used instead.
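For reference, the reciprocal-division trick being discussed replaces `a / k` with a multiply by a precomputed `R = ceil(2^32 / k)`, keeping the high 32 bits of the product. The sketch below is a userspace approximation. Its function names and the `k >= 2` restriction are this sketch's assumptions. On x86 it uses the suggested `mull`, whose explicit `l` suffix gives the assembler the operand size even when the compiler picks a memory operand. Elsewhere it uses portable C. The result is exact whenever `a * k < 2^32`, and the checks stay within that range.

```c
#include <assert.h>
#include <stdint.h>

/* R = ceil(2^32 / k). Restricted to k >= 2 here: for k == 1 the value
 * 2^32 does not fit in 32 bits. */
static uint32_t reciprocal_value(uint32_t k)
{
    assert(k >= 2);
    return (uint32_t)(((1ULL << 32) + k - 1) / k);
}

/* Returns the high 32 bits of a * r, i.e. a / k for the k behind r. */
static inline uint32_t reciprocal_divide(uint32_t a, uint32_t r)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t eax, edx;

    /* "mull", not "mul": with an "rm" constraint the compiler may pass r
     * in memory, and a bare "mul" gives the assembler no operand size. */
    __asm__("mull %2" : "=a"(eax), "=d"(edx) : "rm"(r), "0"(a) : "cc");
    (void)eax;          /* low half of the product is not needed */
    return edx;
#else
    return (uint32_t)(((uint64_t)a * r) >> 32);
#endif
}
```

For example, with `k = 7` the reciprocal is 613566757, and `reciprocal_divide(100, 613566757)` yields 14.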
the usage of DEBUG_DRIVER seems ambiguous
the usage of the DEBUG_DRIVER preprocessor variable is a bit confusing:

$ grep -rw DEBUG_DRIVER *
drivers/net/sunlance.c:#undef DEBUG_DRIVER
drivers/net/a2065.c:#ifdef DEBUG_DRIVER
drivers/net/a2065.c:#ifdef DEBUG_DRIVER
drivers/net/7990.c:#ifdef DEBUG_DRIVER
drivers/net/7990.c:#ifdef DEBUG_DRIVER
drivers/base/Kconfig:config DEBUG_DRIVER
...

it's clearly a configuration variable, but it's also being used by itself in a few drivers/net/ source files. is that deliberate?

rday
--
Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA
http://fsdev.net/wiki/index.php?title=Main_Page
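On the naming itself: a Kconfig entry `config DEBUG_DRIVER` reaches C code as the macro `CONFIG_DEBUG_DRIVER`. A bare `#ifdef DEBUG_DRIVER` in a driver is therefore a separate, file-local switch that the Kconfig option never sets. Here is a small self-contained illustration. The `#define CONFIG_DEBUG_DRIVER 1` simulates what the kernel's generated config header would provide when the option is enabled.

```c
#include <assert.h>

/* Simulates the Kconfig option being enabled: the build exposes it with a
 * CONFIG_ prefix. */
#define CONFIG_DEBUG_DRIVER 1

/* A driver-local switch, as tested in drivers/net/a2065.c and 7990.c.
 * Nothing defines it here, just as Kconfig does not. */
/* #define DEBUG_DRIVER */

static int driver_local_debug(void)
{
#ifdef DEBUG_DRIVER
    return 1;
#else
    return 0;
#endif
}

static int kconfig_debug(void)
{
#ifdef CONFIG_DEBUG_DRIVER
    return 1;
#else
    return 0;
#endif
}
```

Enabling the Kconfig option turns on only the `CONFIG_`-prefixed test. The drivers' own `DEBUG_DRIVER` blocks stay off unless the source file defines the macro itself.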
Re: system call time increase when turning on CONFIG_PARAVIRT
On Fri, 2007-03-02 at 16:16 -0800, Jeremy Fitzhardinge wrote: > > Yes, the intent is that running a CONFIG_PARAVIRT kernel on native > hardware will have negligible performance hit compared to running a > non-paravirt kernel. > > J It turned out that the VDSO was turned off by the CONFIG_PARAVIRT option, causing system calls to use the inefficient int 0x80 path, which led to the increased system_call time I was seeing. I noted that Ingo has caught this problem and proposed a patch to correct this issue in another mail thread. Tim
Re: system call time increase when turning on CONFIG_PARAVIRT
Tim Chen wrote: > It turned out that VDSO was turned off by CONFIG_PARAVIRT option, > causing the system call to use inefficient int 0x80 which led to the > increase system_call time I was seeing. I noted that Ingo has caught > this problem and proposed a patch to correct this issue in another mail > thread. Thanks for identifying this. We'll be posting a more general fix for COMPAT_VDSO soon which will address this. J
Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!
Paul Menage wrote: > But "namespace" has well-established historical semantics too - a way > of changing the mappings of local names to global objects. This > doesn't describe things like resource controllers, cpusets, resource > monitoring, etc. > > Trying to extend the well-known term namespace to refer to things that > aren't namespaces isn't a useful approach, IMO. > > Paul > But "namespace" has well-established historical semantics too - a way of changing the mappings of local * to global objects. This accurately describes things like resource controllers, cpusets, resource monitoring, etc. Trying to extend the well-known term namespace to refer to things that are semantically equivalent to namespaces is a useful approach, IMHO. Sam.
Re: [PATCH] cifs: remove useless cargo-cult checks
Christoph Hellwig <[EMAIL PROTECTED]> wrote on 03/07/2007 04:17:46 PM: > On Wed, Mar 07, 2007 at 12:51:04PM -0600, Steven French wrote: > > Is there an easy way to mirror particular patches going into the > > cifs-2.6.git tree (which is pulled into mm) to lkml? > > Maybe some git expert can comment on that. What I would be looking for is a way, e.g. via "git commit" (to my project tree on kernel.org), to pass an option that sends a copy of the patch to lkml or some other list (or perhaps the reverse: set a flag that says don't bother mirroring this patch for review to fsdevel or lkml). With Samba, some people just watch all commits, but for the kernel that is way too many. > > The cifs patches go in mm for at least a week before they go into kernel > > but some of them I would like to post again to lkml. > > polling -mm is a little hard as it's an enormous blob, so posting to > lkml or -fsdevel would definitely be quite helpful. Yes, agreed (watching fsdevel is easier than scanning every new -mm patch), but I would rather not bore people and make them waste time on fsdevel or lkml looking at every single cifs patch. Only about three of the past 10 cifs patches were interesting enough to ask for detailed review (and I would have loved an easier way to get review on those, just as I would love to get more review of Q's interesting DFS patch, but it is hard in practice to make this easy).
Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!
Srivatsa Vaddagiri wrote: > container structure in your patches provides for these things: > > a. A way to group tasks > b. A way to maintain several hierarchies of such groups > > If you consider just a. then I agree that container abstraction is > redundant, esp for vserver resource control (nsproxy can already be used > to group tasks). > > What nsproxy doesn't provide is b - a way to represent hierarchies of > groups. > Well, that's like saying you can't put hierarchical data in a relational database. The hierarchy question is an interesting one, though. However I believe it first needs to be broken down into subsystems and considered on a subsystem-by-subsystem basis again, and if general patterns are observed, then a common solution should stand out. Let's go back to the namespaces we know about and discuss how hierarchies apply to them. Please those able to brainstorm, do so - I call green hat time. 1. UTS namespaces Can a UTS namespace set any value it likes? Can you inspect or set the UTS namespace values of a subservient UTS namespace? 2. IPC namespaces Can a process in an IPC namespace send a signal to those in a subservient namespace? 3. PID namespaces Can a process in a PID namespace see the processes in a subservient namespace? Do the processes in a subservient namespace appear in a higher level namespace mapped to a different set of PIDs? 4. Filesystem namespaces Can we see all of the mounts in a subservient namespace? Does our namespace receive updates when their namespace mounts change? (perhaps under a sub-directory) 5. L2 network namespaces Can we see or alter the subservient network namespace's interfaces/iptables/routing? Are any of the subservient network namespace's interfaces visible in our namespace, and by which mapping? 6. L3 network namespaces Can we bind to a subservient network namespace's addresses? Can we give or remove addresses to and from the subservient network namespace's namespace? 
Can we allow the namespace access to modify particular IP tables? 7. resource namespaces Is the subservient namespace's resource usage counting against ours too? Can we dynamically alter the subservient namespace's resource allocations? 8. anyone else? So, we can see some general trends here - but it's never quite the same question, and I think the best answers will come from a tailored approach for each subsystem. Each one *does* have some common questions - for instance, "is the namespace allowed to create more namespaces of this type". That's probably a capability bit for each, though. So let's bring this back to your patches. If they are providing visibility of ns_proxy, then it should be called namesfs or some such. It doesn't really matter if processes disappear from namespace aggregates, because that's what's really happening anyway. The only problem is that if you try to freeze a namespace that has visibility of things at this level, you might not be able to reconstruct the filesystem in the same way. This may or may not be considered a problem, but open filehandles and directory handles etc surviving a freeze/thaw is part of what we're trying to achieve. Then again, perhaps some visibility is better than none for the time being. If they are restricted entirely to resource control, then don't use the nsproxy directly - use the structure or structures which hang off the nsproxy (or even task_struct) related to resource control. Sam. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
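Question 7 above (is a subservient namespace's resource usage counted against ours too?) has at least one simple answer that can be made concrete: charge every level of the hierarchy, and refuse the charge if any ancestor would exceed its limit. The C below is a minimal userspace model under that assumption; `struct res_ns`, `res_charge()` and `res_uncharge()` are hypothetical names, not part of any posted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-namespace resource counter (not a kernel structure). */
struct res_ns {
    struct res_ns *parent;  /* NULL for the top-level namespace */
    unsigned long usage;    /* units currently charged; kept <= limit */
    unsigned long limit;    /* maximum units allowed at this level */
};

/*
 * Charge `amount` to ns and to every ancestor, all-or-nothing:
 * a child's usage also counts against each parent's limit.
 */
static bool res_charge(struct res_ns *ns, unsigned long amount)
{
    struct res_ns *p;

    /* First pass: check that no level would go over its limit. */
    for (p = ns; p != NULL; p = p->parent)
        if (amount > p->limit - p->usage)  /* relies on usage <= limit */
            return false;

    /* Second pass: commit the charge at every level. */
    for (p = ns; p != NULL; p = p->parent)
        p->usage += amount;
    return true;
}

/* Undo a previous successful charge along the same path. */
static void res_uncharge(struct res_ns *ns, unsigned long amount)
{
    for (struct res_ns *p = ns; p != NULL; p = p->parent) {
        assert(p->usage >= amount);
        p->usage -= amount;
    }
}
```

The two-pass design keeps a failed charge from leaving some levels partially updated; a real implementation would also need locking around both passes.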
Re: [patch 2/6 -rt] powerpc 2.6.20-rt8: to convert spinlocks to raw ones.
On Thu, Mar 08, 2007 at 08:30:43AM +1100, Paul Mackerras wrote: > Sergei Shtylyov writes: > > > I've followed up on my patch with such an explanation. In the context of an -rt patch itself, it was just too clear, hence I didn't go into explanations in > > the patch itself. :-) > > Well, it might be clear, to you, now, with the context in your head. > But if such a patch is to go into a git tree, and somebody comes along > in 3 years time and wants to know exactly why you made that change > (and maybe that somebody is you :), then they will need more detail - > such as how you came to the conclusion that those locks and no others > needed to be changed, for instance. > > At least give some of the reasoning behind your choice of which locks > to convert, so that in future, if the patch turns out to have > introduced a bug somehow, the person debugging it can either identify > that there was a flaw in your logic, or else understand something that > you have seen that they missed. Paul, It has to do with how locking is done in the -rt patch itself. This probably hasn't reached general maintainers yet, since the -rt patch hasn't been fully merged, but I agree a document needs to be written outlining which locks need to be changed to raw spinlocks and which locks can be emulated with the rtmutex.c/rt.c logic. There aren't that many people who know the specifics unless they've tried to map out chunks of the Linux kernel for this purpose in the first place. I only know because of my own parallel effort to make the kernel preemptive (the old mmLinux project that I abandoned for Ingo's stuff). Generally, things that run within interrupt context need to be true spinlocks. The interrupt controller is obviously one of those things; so is the timer interrupt, for practical reasons such as performance, along with other places where locking has to stay outside the direct control and scope of the scheduler.
Of course the scheduler's runqueues need to be spinlocked for the reasons above; otherwise your system is stuck with a kind of chicken-and-egg problem when interacting with the scheduler. The places that need to be reverted to raw spinlocks are generally either acquired by function calls that allocate the spinlock at a terminal of the kernel's lock graph or isolated from other callers completely (parts of the timer logic, for instance). It's all about the collision of the various lock (preemptive and non-preemptive) subtrees and how to avoid "scheduling while atomic" violations that lead to deadlocks. The -rt patch gets arbitrary preemption abilities by shrinking the non-preemptive subtree to the bare essentials that let a system run, while still preserving all of the expected locking semantics of a critical section. Everything else is by default backed by a blocking rtmutex to provide correct preemption behavior within critical sections. That is why these reverts are needed to restore the mathematical correctness of the kernel's locking structures. I hope this is helpful. bill
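The rule bill describes, that nothing which may sleep can be acquired while a raw (truly spinning) lock is held, can be illustrated with a toy lock-nesting checker. This is a hypothetical userspace model, not -rt code: `enum lock_kind`, `model_lock()` and `model_unlock()` are invented for illustration, and `LOCK_SLEEPING` stands in for an rtmutex-backed `spinlock_t`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two lock kinds in an -rt kernel. */
enum lock_kind {
    LOCK_RAW,      /* true spinning lock: the holder must not sleep */
    LOCK_SLEEPING  /* rtmutex-backed lock: acquiring it may schedule */
};

struct lock_state {
    int raw_depth;   /* number of raw locks currently held */
    int violations;  /* "scheduling while atomic" events detected */
};

/*
 * Record an acquisition. Returns false when a lock that may sleep is
 * taken inside a raw-lock (atomic) section, which is exactly the
 * nesting that forces inner locks to be converted to raw ones.
 */
static bool model_lock(struct lock_state *s, enum lock_kind kind)
{
    if (kind == LOCK_RAW) {
        s->raw_depth++;
        return true;
    }
    if (s->raw_depth > 0) {
        s->violations++;
        return false;
    }
    return true;
}

/* Record a release; only raw locks affect the atomic depth. */
static void model_unlock(struct lock_state *s, enum lock_kind kind)
{
    if (kind == LOCK_RAW) {
        assert(s->raw_depth > 0);
        s->raw_depth--;
    }
}
```

In this model, a lock taken under a raw lock (for example from interrupt or scheduler paths) must itself be `LOCK_RAW`, which mirrors why the patch converts a specific set of locks rather than arbitrary ones.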
Re: [PATCH] swsusp: Disable nonboot CPUs before entering platform suspend
On Thursday, 8 March 2007 01:20, Dave Jones wrote: > On Thu, Mar 08, 2007 at 12:13:05AM +0100, Rafael J. Wysocki wrote: > > > > > Well, the WARN_ON() in arch/x86_64/kernel/acpi/sleep.c:init_low_mapping() triggers every time an SMP x86_64 box is suspended to disk using the platform mode (default), which is quite annoying IMHO, and users think something wrong is going on. This will probably cause them to report the problem, and I'd rather avoid handling these reports. ;-) > > > > > > Well sure - if patches were always error-free, we'd always apply them immediately. > > > > > > The question is: is the risk of this patch breaking things exceeded by the benefit which you describe? > > > > Well, it has survived some testing (http://lkml.org/lkml/2007/3/7/16). Also, before the code ordering in 2.6.21-rc* we had been running on one CPU here, so I think the risk is small. > > > > We could remove the WARN_ON() as Pavel has just suggested, but first I'd like to know who put it there and why. > > It was introduced as part of .. > > commit 55b2355eefc2f160246226d4d69fed431173a4d5 > Author: Shaohua Li <[EMAIL PROTECTED]> > Date: Fri Jun 23 02:04:49 2006 -0700 > > [PATCH] don't use flush_tlb_all in suspend time > > flush_tlb_all uses on_each_cpu, which will disable/enable interrupt. > In suspend/resume time, this will make interrupt wrongly enabled. Ah, thanks. So the question is what can go wrong if we ignore the TLBs of the other CPUs that may be online when init_low_mapping() is executed. Frankly, I don't know.
Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!
On 3/7/07, Sam Vilain <[EMAIL PROTECTED]> wrote: Paul Menage wrote: >> In the namespace world when we say container we mean roughly at the level >> of nsproxy and container_group. >> > So you're saying that a task can only be in a single system-wide container. > Nope, we didn't make the mistake of nailing down what a "container" was too far before it is implemented. We talked before about containers-within-containers because, inevitably, if you provide a feature you'll end up having to deal with virtualising systems that in turn use that feature. Sure, my approach allows containers to be nested hierarchically as children of other containers too. > My patch provides multiple potentially-independent ways of dividing up > the tasks on the system - if the "container" is the set of all > divisions that the process is in, what's an appropriate term for the > sub-units? > namespace, since 2.4.x > That assumes the viewpoint that your terminology is "correct" and > other people's needs "fixing". :-) > Absolutely. Please respect the semantics established so far; changing them adds nothing at the cost of much confusion. But "namespace" has well-established historical semantics too - a way of changing the mappings of local names to global objects. This doesn't describe things like resource controllers, cpusets, resource monitoring, etc. Trying to extend the well-known term namespace to refer to things that aren't namespaces isn't a useful approach, IMO. Paul
Re: Sleeping thread not receive signal until it wakes up
On 3/7/07, Lee Revell <[EMAIL PROTECTED]> wrote: On 3/7/07, linux-os (Dick Johnson) <[EMAIL PROTECTED]> wrote: > Interruptible_sleep_on is interruptible, but for your task to > actually be awakened and your alarm handler to get some CPU, > it needs to be scheduled. If the BKL (big kernel lock) is > held, it won't be scheduled until it is released. You can schedule while holding the BKL and it will be dropped and reacquired. Lee My hardware is PowerPC architecture. Does that have anything to do with the kernel locking? Also, I saw CONFIG_LOCK_KERNEL, CONFIG_PREEMPT_BKL and CONFIG_SMP in the file include/linux/smp_lock.h, and CONFIG_PREEMPT in lib/kernel_lock.c, and I don't have any of these macros defined; could that be the reason? I could not find these options when running make menuconfig either. Thanks, LNgo
Re: + stupid-hack-to-make-mainline-build.patch added to -mm tree
Thomas Gleixner wrote: > Sigh. The cut zero hairball is already in mainline. :( > Yes, there were a couple of unfortunate patches in that series, but they got fast-tracked in with the promise they would get fixed asap. > Sure. If the clockevent API is changed, then the users get fixed. This > is not my main concern. The "oh we reuse the PIT interrupt" reachout is > what makes life hard. VMI already does this extensively and I'm frightened > by it. > Well, I think they know what's expected of them now. J
Bad regression v 2.6.19 from the ATA ACPI merge
Every single non-PCI controller has been broken by this code. pata_get_dev_handle() assumes that the passed ata_port is PCI, and the libata-core code does not do any checking. This causes oopses for everyone using pata_pcmcia, for example. There are multiple examples of the bug in our FC7test tree, reported by end users trying the new libata and kernels.
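A minimal sketch of the kind of check that is missing, modelled in userspace: before treating a generic device as PCI, look at which bus it sits on and bail out otherwise. The types, `fake_container_of` and `as_pci_dev()` are hypothetical stand-ins; inside the kernel the analogous test would presumably compare `dev->bus` against `&pci_bus_type` before calling `to_pci_dev()`.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's generic device and bus types. */
enum bus_kind { BUS_PCI, BUS_PCMCIA, BUS_PLATFORM };

struct fake_device {
    enum bus_kind bus;          /* which bus this device hangs off */
};

struct fake_pci_dev {
    int devfn;                  /* PCI-only data */
    struct fake_device dev;     /* embedded generic device */
};

/* Simplified version of the kernel's container_of() idiom. */
#define fake_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Only convert to the PCI wrapper after checking the bus type.
 * Returning NULL lets callers skip PCI-specific work (such as an
 * ACPI handle lookup) for PCMCIA or platform controllers instead
 * of dereferencing memory that is not a PCI device at all.
 */
static struct fake_pci_dev *as_pci_dev(struct fake_device *dev)
{
    if (dev == NULL || dev->bus != BUS_PCI)
        return NULL;
    return fake_container_of(dev, struct fake_pci_dev, dev);
}
```

Without the bus check, `fake_container_of` would happily compute a bogus pointer for a PCMCIA device, which is the shape of the oops reported above.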
Re: e1000 oops on boot [Re: 2.6.21-rc2-mm2]
Randy Dunlap wrote: On Wed, 7 Mar 2007 16:23:15 -0800 Andrew Morton wrote: The below will appear in -rc3-mm1 (hopefully later today) and it will hopefully fix that crash. From: Auke Kok <[EMAIL PROTECTED]> --- drivers/net/e1000/e1000_main.c | 66 +-- 1 files changed, 45 insertions(+), 21 deletions(-) diff -puN drivers/net/e1000/e1000_main.c~e1000-fix-be-ready-for-incoming-irq-at-pci_request_irq drivers/net/e1000/e1000_main.c --- a/drivers/net/e1000/e1000_main.c~e1000-fix-be-ready-for-incoming-irq-at-pci_request_irq +++ a/drivers/net/e1000/e1000_main.c @@ -522,14 +522,15 @@ e1000_release_manageability(struct e1000 } } Auke: Below, please s/@adapter =/@adapter:/ to make it be correct kernel-doc notation. ah, sorry about that :) I'll adjust it later when doing some more cleanups. Thanks for the pointer. Auke
Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!
Paul Menage wrote: >> In the namespace world when we say container we mean roughly at the level >> of nsproxy and container_group. >> > So you're saying that a task can only be in a single system-wide container. > Nope, we didn't make the mistake of nailing down what a "container" was too far before it is implemented. We talked before about containers-within-containers because, inevitably, if you provide a feature you'll end up having to deal with virtualising systems that in turn use that feature. > My patch provides multiple potentially-independent ways of dividing up > the tasks on the system - if the "container" is the set of all > divisions that the process is in, what's an appropriate term for the > sub-units? > namespace, since 2.4.x > That assumes the viewpoint that your terminology is "correct" and > other people's needs "fixing". :-) > Absolutely. Please respect the semantics established so far; changing them adds nothing at the cost of much confusion. > But as I've said I'm not particularly wedded to the term "container" > if that really turned out to be what's blocking acceptance from people > like Andrew or Linus. Do you have a suggestion for a better name? To > me, "process container" seems like the ideal name, since it's an > abstraction that "contains" processes and associates them with some > (subsystem-provided) state. > It's not even really the term, it's the semantics. Sam.
Re: e1000 oops on boot [Re: 2.6.21-rc2-mm2]
On Wed, 7 Mar 2007 16:23:15 -0800 Andrew Morton wrote: > The below will appear in -rc3-mm1 (hopefully later today) and it will > hopefully fix that crash. > > > From: Auke Kok <[EMAIL PROTECTED]> > > --- > > drivers/net/e1000/e1000_main.c | 66 +-- > 1 files changed, 45 insertions(+), 21 deletions(-) > > diff -puN > drivers/net/e1000/e1000_main.c~e1000-fix-be-ready-for-incoming-irq-at-pci_request_irq > drivers/net/e1000/e1000_main.c > --- > a/drivers/net/e1000/e1000_main.c~e1000-fix-be-ready-for-incoming-irq-at-pci_request_irq > +++ a/drivers/net/e1000/e1000_main.c > @@ -522,14 +522,15 @@ e1000_release_manageability(struct e1000 > } > } Auke: Below, please s/@adapter =/@adapter:/ to make it be correct kernel-doc notation. > -int > -e1000_up(struct e1000_adapter *adapter) > +/** > + * e1000_configure - configure the hardware for RX and TX > + * @adapter = private board structure > + **/ > +static void e1000_configure(struct e1000_adapter *adapter) > { --- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code ***
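Randy's note concerns the kernel-doc convention that each parameter line takes the form `@name: description`. The sketch below documents itself in that form and adds a small hypothetical checker, `kdoc_param_ok()`, that accepts the `@adapter:` spelling and rejects `@adapter =`; it only illustrates the rule and does not reproduce the behaviour of the kernel's `scripts/kernel-doc`.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/**
 * kdoc_param_ok - check one kernel-doc parameter line
 * @line: a single line from a kernel-doc comment
 *
 * Returns true if the first '@' on the line is followed by a C
 * identifier and then immediately by ':', i.e. "@name:".
 **/
static bool kdoc_param_ok(const char *line)
{
    const char *p = strchr(line, '@');

    if (p == NULL)
        return false;           /* not a parameter line at all */
    p++;

    /* An identifier must start with a letter or underscore. */
    if (!isalpha((unsigned char)*p) && *p != '_')
        return false;
    while (isalnum((unsigned char)*p) || *p == '_')
        p++;

    /* The name must be followed directly by a colon. */
    return *p == ':';
}
```

With this rule, the line from the patch, ` * @adapter = private board structure`, is rejected, while ` * @adapter: private board structure` is accepted.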
Re: + stupid-hack-to-make-mainline-build.patch added to -mm tree
On Wed, 2007-03-07 at 15:33 -0800, Jeremy Fitzhardinge wrote: > > On the other hand we still see things like: > > > > /* We use normal irq0 handler on cpu0. */ > > time_init_hook(); > > > > Which is just reaching into the kernel code directly and does not handle > > the clock event interrupt in a self-contained way. clockevents is not bound to > > IRQ0 and this kind of hackery is exactly what we need to avoid in order > > to keep this maintainable. > > > > Yes, I'm definitely not arguing with you about this. I think the first > cut vmi time code was pretty questionable, but I have confidence they'll > fix it up before submission. Sigh. The cut zero hairball is already in mainline. :( > The point is that when you put the xen and vmi implementations next to > each other you find that 1) in each case there's a pretty small > abstraction distance between the clock interface and the hypercall > interface, and 2) there's very little code which can be shared between > the two. Which means that adding another layer of abstraction to > protect the clock code from paravirtualized time devices is just going > to add fat without much benefit. Fair enough. > > Yes, if they are used in a sane and self-contained way without reaching > > all over the place and expecting that those functions, which are not > > part of the paravirt interfaces, will work forever. > > > > 100% agree. If the interfaces change, then we'll change the code using > them like any other kernel code would. If the new interfaces are hard > to make work then that's a problem, but one would hope that would get > shaken out as part of the normal kernel development process. Sure. If the clockevent API is changed, then the users get fixed. This is not my main concern. The "oh we reuse the PIT interrupt" reachout is what makes life hard. VMI already does this extensively and I'm frightened by it. 
> The point is that this code under and around the paravirt_ops interface > is just normal Linux code, and we expect to participate in the normal > kernel development process, with all the usual > discussions/arguments/negotiations over interface changes. If the code > loses all its maintainers and becomes orphaned, unresponsive to > interface changes, then it's like any other dead driver: mark it > CONFIG_BROKEN and wait for someone to fix it. But for now and the > foreseeable future these are going to be actively supported and > maintained pieces of code. Ack. > > You are not increasing the entanglement with the rest of the system, > > when you use a self contained device on top of an existing core kernel > > infrastructure, which has a paravirt backend. Quite the contrary, you > > have one piece of virtual hardware which is connected to the kernel and > > interacts with the various incarnations on the other side, which can as > > well live inside the kernel code. Granted it is another level of > > indirection, but I'd be happy to have only to deal with one of those > > beasts. > > > > Right. But at that point the interface doesn't really have much of a > technical basis. It's really a political border at which you can hand > off responsibility and make it ours. I quite understand your > motivation, but I think you're solving a problem that hasn't happened > yet, and one that we'd all like to avoid. Granted. > I know the vmi time code has coloured your view here, but I surely hope > it can be got into a better state before posting. I'm biased of course, > but I would rather hope that all these drivers we're talking about will > be as stylistically clean as the Xen time code (which has room for > improvement, of course). > > There is, however, a median solution which keeps the number of clock > drivers down but also doesn't involve extending pv_ops. 
We can just > create paravirt_clocksource/paravirt_clockevent helper wrappers, with > their own internal interfaces to act as a facade for the > hypervisor-specific code. I don't think there's much point in doing > this now, but maybe it will become appealing once we start dealing with > things like stolen time. We'll see. tglx
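Jeremy's "median solution", helper wrappers that act as a facade over hypervisor-specific time code, can be sketched as a small ops table that generic code calls through. Everything here is hypothetical: `struct pv_clock_backend`, `pv_clock_register()`, `pv_clock_read()` and the two fake backends are invented for illustration and are neither the paravirt_ops nor the clocksource API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hypervisor-specific backend interface. */
struct pv_clock_backend {
    const char *name;
    uint64_t (*read_ns)(void);  /* current time in nanoseconds */
};

/* The facade's single registered backend; generic code never sees it. */
static const struct pv_clock_backend *pv_backend;

/* Called once by whichever hypervisor-specific code detects its platform. */
static void pv_clock_register(const struct pv_clock_backend *b)
{
    pv_backend = b;
}

/* Generic read path; returns 0 if no backend has registered yet. */
static uint64_t pv_clock_read(void)
{
    return pv_backend != NULL ? pv_backend->read_ns() : 0;
}

/* Two toy backends standing in for separate Xen and VMI time code. */
static uint64_t fake_xen_read(void) { return 1000; }
static uint64_t fake_vmi_read(void) { return 2000; }

static const struct pv_clock_backend fake_xen = { "fake-xen", fake_xen_read };
static const struct pv_clock_backend fake_vmi = { "fake-vmi", fake_vmi_read };
```

The trade-off matches the thread: the facade keeps generic clock code from reaching into each hypervisor's internals, at the cost of one extra indirection that only pays off if several backends share the wrapper's logic.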
Re: [PATCH] Replace misspelled "PRINTK" with "CONFIG_PRINTK".
On Wed, 7 Mar 2007, Dave Jones wrote: > On Wed, Mar 07, 2007 at 06:38:32PM -0500, Robert P. J. Day wrote: > > > > Replace the apparently misspelled preprocessor variable "PRINTK" > > with "CONFIG_PRINTK". > > this looks wrong. > > > diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c > > index 5554ada..0c09772 100644 > > --- a/drivers/md/bitmap.c > > +++ b/drivers/md/bitmap.c > > @@ -53,7 +53,7 @@ > > //#define DPRINTK PRINTK /* set this NULL to avoid verbose debug output */ > > #define DPRINTK(x...) do { } while(0) > > > > -#ifndef PRINTK > > +#ifndef CONFIG_PRINTK > > # if DEBUG > 0 > > #define PRINTK(x...) printk(KERN_DEBUG x) > > # else > > the intention here is to only define 'PRINTK' if no-one else > has defined it already. oops, sorry, i misread that. rday -- Robert P. J. Day Linux Consulting, Training and Annoying Kernel Pedantry Waterloo, Ontario, CANADA http://fsdev.net/wiki/index.php?title=Main_Page
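The `#ifndef PRINTK` guard in drivers/md/bitmap.c is the usual define-only-if-undefined idiom: if some earlier header or build flag already defined `PRINTK`, that definition wins; otherwise a default is supplied. A userspace sketch of the same pattern follows, with `printf` standing in for `printk(KERN_DEBUG ...)` and the standard `__VA_ARGS__` form used in place of GCC's named `x...` variadic parameter. `DEBUG` is set here only for illustration.

```c
#include <stdio.h>

#define DEBUG 1  /* illustration only; bitmap.c takes DEBUG from elsewhere */

/*
 * Define PRINTK only if nobody has defined it already. Renaming the
 * guard to CONFIG_PRINTK would change its meaning: the test would then
 * depend on a Kconfig option instead of on PRINTK itself.
 */
#ifndef PRINTK
# if DEBUG > 0
#  define PRINTK(...) printf(__VA_ARGS__)
# else
#  define PRINTK(...) do { } while (0)
# endif
#endif
```

Because the guard tests `PRINTK` itself, a file that defines its own `PRINTK` before this point keeps it untouched, which is the behaviour Dave Jones pointed out.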