Re: [PATCH 1/4 -mm] kexec based hibernation -v7 : kexec jump
On Mon, 2007-12-10 at 19:25 -0700, Eric W. Biederman wrote: > "Huang, Ying" <[EMAIL PROTECTED]> writes: [...] > > /* > > * Do not allocate memory (or fail in any way) in machine_kexec(). > > * We are past the point of no return, committed to rebooting now. > > */ > > -NORET_TYPE void machine_kexec(struct kimage *image) > > +int machine_kexec_vcall(struct kimage *image, unsigned long *ret, > > +unsigned int argc, va_list args) > > { > > Why do we need var arg support? > Can't we do that with a shim we load from user space? If all parameters are provided in user space, the usage model may be as follow: - sys_kexec_load() /* with executable/data/parameters(A) loaded */ - sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,) /* execute physical mode code with parameters(A)*/ - /* jump back */ - sys_kexec_load() /* with executable/data/parameters(B) loaded */ - sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,) /* execute physical mode code with parameters(B)*/ - /* jump back */ That is, the kexec image should be re-loaded if the parameters are different, and there can be no state reserved in kexec image. This is OK for original kexec implementation, because there is no jumping back. But, for kexec with jumping back, another usage model may be useful too. - sys_kexec_load() /* with executable/data loaded */ - sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,parameters(A)) /* execute physical mode code with parameters(A)*/ - sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,parameters(B)) /* execute physical mode code with parameters(B)*/ This way the kexec image need not to be re-loaded, and the state of kexec image can be reserved across several invoking. Another usage model may be useful is invoking the kexec image (such as firmware) from kernel space. - kmalloc the needed memory and loaded the firmware image (if needed) - sys_kexec_load() with a fake image (one segment with size 0), the entry point of the fake image is the entry point of the firmware image. - kexec_call(fake_image, ...) /* maybe change entry point if needed */ This way, some kernel code can invoke the firmware in physical mode just like invoking an ordinary function. [...] > > - /* The segment registers are funny things, they have both a > > -* visible and an invisible part. Whenever the visible part is > > -* set to a specific selector, the invisible part is loaded > > -* with from a table in memory. At no other time is the > > -* descriptor table in memory accessed. > > -* > > -* I take advantage of this here by force loading the > > -* segments, before I zap the gdt with an invalid value. > > -*/ > > - load_segments(); > > - /* The gdt & idt are now invalid. > > -* If you want to load them you must set up your own idt & gdt. > > -*/ > > - set_gdt(phys_to_virt(0),0); > > - set_idt(phys_to_virt(0),0); > > + if (image->preserve_cpu_ext) { > > + /* The segment registers are funny things, they have > > +* both a visible and an invisible part. Whenever the > > +* visible part is set to a specific selector, the > > +* invisible part is loaded with from a table in > > +* memory. At no other time is the descriptor table > > +* in memory accessed. > > +* > > +* I take advantage of this here by force loading the > > +* segments, before I zap the gdt with an invalid > > +* value. > > +*/ > > + load_segments(); > > + /* The gdt & idt are now invalid. If you want to load > > +* them you must set up your own idt & gdt. > > +*/ > > + set_gdt(phys_to_virt(0), 0); > > + set_idt(phys_to_virt(0), 0); > > + } > > We can't keep the same idt and gdt as the pages they are on will be > overwritten/reused. So explictily stomping on them sounds better > so they never work. We can restore them on kernel reentry. The original idea about this code is: If the kexec image is claimed that it need not to "perserving extensive CPU state" (such as FPU/MMX/GDT/LDT/IDT/CS/DS/ES/FS/GS/SS etc), the IDT/GDT/CS/DS/ES/FS/GS/SS are not touched in kexec image code. So the segment registers need not to be set. But this is not clear. At least more description should be provided for each preserve flag. > > /* now call it */ > > - relocate_kernel((unsigned long)image->head, (unsigned long)page_list, > > - image->start, cpu_has_pae); > > + relocate_kernel_ptr((unsigned long)image->head, > > + (unsigned long)page_list, > > + image->start, cpu_has_pae); > > Why rename relocate_kernel? > Ah. I see. You need to make it into a pointer again. The crazy don't > stop the pgd support strikes again. It used to be named rnk. You mean I should change the function pointer name to rnk to keep consistency? I find rnk in IA64 implementation. Best Regards, Huang Ying -- To unsubscribe
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
Hi, On Tue, 11 Dec 2007 12:12:59 +1030 David Newall <[EMAIL PROTECTED]> wrote: > H. Peter Anvin wrote: > > David Newall wrote: > > > > I think a single ISA bus transaction is 1 µs, so two of them back to > > back should be 2 µs, not 8 µs... > > Exactly. You think it's 2us, but the documentation doesn't say. The _p > functions are generic inasmuch as they provide an unspecified delay. Well, if the delay is so much unspecified, what about _reading_ port 0x80 ? Will the delay be shorter ? And if so, what about reading port 0x80 and writing the value back ? inb al,0x80 outb 0x80,al I've been wondering since the beginning of this thread if the problem is not just the value we put to port 0x80, not writing to the port... Just my 0.02 Eur... Paul -- Paul RollandE-Mail : rol(at)witbe.net Witbe.net SATel. +33 (0)1 47 67 77 77 Les Collines de l'Arche Fax. +33 (0)1 47 67 77 99 F-92057 Paris La DefenseRIPE : PR12-RIPE Please no HTML, I'm not a browser - Pas d'HTML, je ne suis pas un navigateur "Some people dream of success... while others wake up and work hard at it" "I worry about my child and the Internet all the time, even though she's too young to have logged on yet. Here's what I worry about. I worry that 10 or 15 years from now, she will come to me and say 'Daddy, where were you when they took freedom of the press away from the Internet?'" --Mike Godwin, Electronic Frontier Foundation -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
On 11-12-07 02:25, H. Peter Anvin wrote: David Newall wrote: Where did the 8us delay come from? The documentation and source is careful not to say how long the delay is. Would changing it to, say 1us, be technically wrong? Is code that requires 8us correct? I think a single ISA bus transaction is 1 µs, so two of them back to back should be 2 µs, not 8 µs... Sigh. And now where do these _two_ transactions come from? (and yes, see Alan's folowups, a transaction on a spec bus is 1 us). Rene. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Lnux 2.6.24-rc5
Hi, linus kernel.org web download is not available yet, isn't it? Regards dave -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu
On Dec 10, 2007 8:48 PM, Eric W. Biederman <[EMAIL PROTECTED]> wrote: > Neil Horman <[EMAIL PROTECTED]> writes: > > Almost there. > > > > > On Mon, Dec 10, 2007 at 06:08:03PM -0700, Eric W. Biederman wrote: > >> Neil Horman <[EMAIL PROTECTED]> writes: > >> > > > >> > >> Ok. This test is broken. Please remove the == 1. You are looking > >> for == (1 << 18). So just saying: "if (htcfg & (1 << 18))" should be > >> clearer. > >> > > Fixed. Thanks! > > > >> > + printk(KERN_INFO "Detected use of extended apic ids on hypertransport > > bus\n"); > >> > + if ((htcfg & (1 << 17)) == 0) { > >> > + printk(KERN_INFO "Enabling hypertransport extended apic interrupt > >> > broadcast\n"); > >> > + htcfg |= (1 << 17); > >> > + write_pci_config(num, slot, func, 0x68, htcfg); > >> > + } > >> > + } > >> > + > >> > +} > >> > >> The rest of this quirk looks fine, include the fact it is only intended > >> to be applied to PCI_VENDOR_ID_AMD PCI_DEVICE_ID_AMD_K8_NB. > >> > > Copy that. > > > >> > >> For what is below I don't like the way the infrastructure has been > >> extended as what you are doing quickly devolves into a big mess. > >> > >> Please extend struct chipset to be something like: > >> struct chipset { > >> u16 vendor; > >> u16 device; > >> u32 class, class_mask; > >> void (*f)(void); > >> }; > >> > >> And then the test for matching the chipset can be something like: > >> if ((id->vendor == PCI_ANY_ID || id->vendor == dev->vendor) && > >> (id->device == PCI_ANY_ID || id->device == dev->device) && > >> !((id->class ^ dev->class) & id->class_mask)) > >> > >> Essentially a subset of pci_match_one_device from drivers/pci/pci.h > >> > >> That way you don't need to increase the number of tables or the > >> number of passes through the pci busses, just update the early_qrk > >> table with a few more bits of information. > >> > > copy that. Fixed. Thanks! > > > >> The extended form should be much more maintainable in the long > >> run. Given that we may want this before we enable the timer > >> which is very early doing this in the pci early quirks seems > >> to make sense. > >> > >> Eric > > > > > > New patch attached, with suggestions incorporated. > > > > Thanks & regards > > Neil > > > > Signed-off-by: Neil Horman <[EMAIL PROTECTED]> > > > > > > early-quirks.c | 82 > > ++--- > > 1 file changed, 73 insertions(+), 9 deletions(-) > > > > > > > > diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c > > index 88bb83e..4b0cee1 100644 > > --- a/arch/x86/kernel/early-quirks.c > > +++ b/arch/x86/kernel/early-quirks.c > > @@ -44,6 +44,50 @@ static int __init nvidia_hpet_check(struct > > acpi_table_header > > *header) > > #endif /* CONFIG_X86_IO_APIC */ > > #endif /* CONFIG_ACPI */ > > > > +static void __init fix_hypertransport_config(int num, int slot, int func) > > +{ > > + u32 htcfg; > > + /* > > + *we found a hypertransport bus > > + *make sure that are broadcasting > > + *interrupts to all cpus on the ht bus > > + *if we're using extended apic ids > > + */ > > + htcfg = read_pci_config(num, slot, func, 0x68); > > + if (htcfg & (1 << 18)) { > > + printk(KERN_INFO "Detected use of extended apic ids on hypertransport > > bus\n"); > > + if ((htcfg & (1 << 17)) == 0) { > > + printk(KERN_INFO "Enabling hypertransport extended apic interrupt > > broadcast\n"); > > + htcfg |= (1 << 17); > > + write_pci_config(num, slot, func, 0x68, htcfg); > > + } > > + } > > + > > +} > > + > > +static void __init check_hypertransport_config() > > +{ > > + int num, slot, func; > > + u32 device, vendor; > > + func = 0; > > + for (num = 0; num < 32; num++) { > > + for (slot = 0; slot < 32; slot++) { > > + vendor = read_pci_config(num,slot,func, > > + PCI_VENDOR_ID); > > + device = read_pci_config(num,slot,func, > > + PCI_DEVICE_ID); > > + vendor &= 0x; > > + device >>= 16; > > + if ((vendor == PCI_VENDOR_ID_AMD) && > > + (device == PCI_DEVICE_ID_AMD_K8_NB)) > > + fix_hypertransport_config(num,slot,func); > > + } > > + } > > + > > + return; > > + > > +} > > We should not need check_hypertransport_config as the generic loop > now does the work for us. > > + > > static void __init nvidia_bugs(void) > > { > > #ifdef CONFIG_ACPI > > @@ -83,15 +127,25 @@ static void __init ati_bugs(void) > > #endif > > } > > > > +static void __init amd_host_bugs(void) > > +{ > > + printk(KERN_CRIT "IN AMD_HOST_BUGS\n"); > > + check_hypertransport_config(); > > +} > > Likewise
Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
On Sat, Dec 08, 2007 at 08:52:11PM +0100, Ingo Molnar wrote: > so even today's upstream kernel, which has 'ancient' SLUB code, SLAB and > SLUB have essentially the same linecount: > > $ wc -l mm/slab.c mm/slub.c > 4478 mm/slab.c > 4125 mm/slub.c > > (and while linecount != complexity, there is a strong relationship.) > > With SLAB having 10 years more test coverage and tuning. FWIW, the one thing slub does that slab doesn't that I find really nice is being enable to enable debugging at boot time rather than compile time. We don't get many people running benchmarks against the Fedora kernel, so any scalability differences between slub/slab probably won't reach us until we start shipping betas of the next RHEL based on the same kernel. Which leaves my only other gripe. It broke slabtop. There's an alternative implementation in Documentation/vm/slabinfo.c (why there not say, util-linux, home of current slabtop?) Dave -- http://www.codemonkey.org.uk -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2 V2] Kprobes: Build kretprobe examples only if arch supports kretprobes
From: Ananth N Mavinakayanahalli <[EMAIL PROTECTED]> This patch builds samples/kprobes/kretprobe_example.c only on archs that support kretprobes. Thanks to Sam Ravnborg for Kconfig suggestions. V2: Updated dependency on CONFIG_KRETPROBES Signed-off-by: Ananth N Mavinakayanahalli <[EMAIL PROTECTED]> --- samples/Kconfig |5 + samples/kprobes/Makefile |4 ++-- 2 files changed, 7 insertions(+), 2 deletions(-) Index: linux-2.6.24-rc4/samples/kprobes/Makefile === --- linux-2.6.24-rc4.orig/samples/kprobes/Makefile +++ linux-2.6.24-rc4/samples/kprobes/Makefile @@ -1,5 +1,5 @@ # builds the kprobes example kernel modules; # then to use one (as root): insmod -obj-$(CONFIG_SAMPLE_KPROBES) += kprobe_example.o jprobe_example.o \ - kretprobe_example.o +obj-$(CONFIG_SAMPLE_KPROBES) += kprobe_example.o jprobe_example.o +obj-$(CONFIG_SAMPLE_KRETPROBES) += kretprobe_example.o Index: linux-2.6.24-rc4/samples/Kconfig === --- linux-2.6.24-rc4.orig/samples/Kconfig +++ linux-2.6.24-rc4/samples/Kconfig @@ -28,5 +28,10 @@ config SAMPLE_KPROBES help This build several kprobes example modules. +config SAMPLE_KRETPROBES + tristate "Build kretprobes example -- loadable modules only" + default m + depends on SAMPLE_KPROBES && KRETPROBES + endif # SAMPLES -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Reducing the bdi proporion calculation period to speed up disk write
The patch 04fbfdc14e5f48463820d6b9807daa5e9c92c51f implemented bdi per device dirty threshold. It works well. However, the period for proportion calculation may be too large. For 8G memory, the calc_period_shift() will return 19 as the shift. When we switch writing operation between different disks, there may be potential performance issue. For example, we first write to disk A, then write to disk B. The proportion for disk B will increase slowly because the denominator is too large (It's 2^18 + (global_count & counter_mask)). The disk B will get small dirty page quota for a long time, it will get blocked frequently though the total dirty page is under the dirty page limit. Peter provided a patch to avoid this issue, this patch allow violation of bdi limits if there is a lot of room on the system. It looks like: +if (nr_reclaimable + nr_writeback < (background_thresh + dirty_thresh) / 2) + break; This patch really help to avoid congestion, but if the dirty pages exceed about 3/4 of the dirty_thresh, congestion still happens if we write to another disk. I think that we can reduce the period to speed up the proportion adjustment. diff -Nur a/page-writeback.c b/page-writeback.c --- a/page-writeback.c 2007-12-11 13:46:30.0 +0800 +++ b/page-writeback.c 2007-12-11 13:47:11.0 +0800 @@ -128,10 +128,7 @@ */ static int calc_period_shift(void) { - unsigned long dirty_total; - - dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) / 100; - return 2 + ilog2(dirty_total - 1); + return 12; } In the 8G memory system, I did some testing with iozone. I found that reducing the period help to increase the write speed when switch to a new disk. Run "./iozone -B -i 0 -i 2 -r 4k -s 1000M" twice in the disk B. Here is the result: 1. With the patch 04fbfdc14e5f48463820d6b9807daa5e9c92c51f First Second write 78M 173M rewrite 112M203M randread1710M 1697M randwrite 192M1412M 2. With Peter's patch write 134M169M rewrite 134M203M randread1717M 1705M randwrite 179M1412M 3.Adjust the shift to 12 write 260M259M rewrite 240M246M randread1712M 1700M randwrite 1409M 1409M 4.With Peter's patch and adjust the shift to 12 write 256M239M rewrite 253M253M randread1704M 1716M randwrite 1414M 1416M Run "./iozone -B -i 0 -i 2 -r 4k -s 500M" twice in the disk B. 1. With the patch 04fbfdc14e5f48463820d6b9807daa5e9c92c51f First Second write 821M725M rewrite 144M1299M randread1740M 1733M randwrite 1444M 1440M 2. With Peter's patch write 1100M 1112M rewrite 1295M 1313M randread1745M 1744M randwrite 1452M 1449M 3.Adjust the shift to 12 write 1021M 1104M rewrite 1314M 1311M randread1741M 1737M randwrite 1448M 1445M 4.With Peter's patch and adjust the shift to 12 write 1104M 1105M rewrite 1292M 1308M randread1737M 1741M randwrite 1449M 1449M -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2 V2] Kprobes: Indicate kretprobe support in arch//Kconfig
From: Ananth N Mavinakayanahalli <[EMAIL PROTECTED]> This patch adds CONFIG_HAVE_KRETPROBES to the arch//Kconfig file for relevant architectures with kprobes support. This facilitates easy handling of in-kernel modules (like samples/kprobes/kretprobe_example.c) that depend on kretprobes being present in the kernel. This patch depends on Mathieu Desnoyers' "Instrumentation menu removal" patchset (http://marc.info/?l=linux-kernel=119496432229633=2) Updated to apply on 2.6.24-rc4-mm1. Thanks to Sam Ravnborg for helping make the patch more lean. V2: Per Mathieu's suggestion, added CONFIG_KRETPROBES and fixed up dependencies. Signed-off-by: Ananth N Mavinakayanahalli <[EMAIL PROTECTED]> --- arch/Kconfig |7 +++ arch/ia64/Kconfig |1 + arch/powerpc/Kconfig |1 + arch/s390/Kconfig |1 + arch/x86/Kconfig |1 + include/asm-ia64/kprobes.h|1 - include/asm-powerpc/kprobes.h |1 - include/asm-x86/kprobes_32.h |1 - include/asm-x86/kprobes_64.h |1 - include/linux/kprobes.h |6 +++--- kernel/kprobes.c |8 +++- 11 files changed, 17 insertions(+), 12 deletions(-) Index: linux-2.6.24-rc4/arch/Kconfig === --- linux-2.6.24-rc4.orig/arch/Kconfig +++ linux-2.6.24-rc4/arch/Kconfig @@ -27,5 +27,12 @@ config KPROBES for kernel debugging, non-intrusive instrumentation and testing. If in doubt, say "N". +config KRETPROBES + def_bool y + depends on KPROBES && HAVE_KRETPROBES + config HAVE_KPROBES def_bool n + +config HAVE_KRETPROBES + def_bool n Index: linux-2.6.24-rc4/arch/ia64/Kconfig === --- linux-2.6.24-rc4.orig/arch/ia64/Kconfig +++ linux-2.6.24-rc4/arch/ia64/Kconfig @@ -17,6 +17,7 @@ config IA64 select ARCH_SUPPORTS_MSI select HAVE_OPROFILE select HAVE_KPROBES + select HAVE_KRETPROBES default y help The Itanium Processor Family is Intel's 64-bit successor to Index: linux-2.6.24-rc4/arch/powerpc/Kconfig === --- linux-2.6.24-rc4.orig/arch/powerpc/Kconfig +++ linux-2.6.24-rc4/arch/powerpc/Kconfig @@ -81,6 +81,7 @@ config PPC default y select HAVE_OPROFILE select HAVE_KPROBES + select HAVE_KRETPROBES config EARLY_PRINTK bool Index: linux-2.6.24-rc4/arch/s390/Kconfig === --- linux-2.6.24-rc4.orig/arch/s390/Kconfig +++ linux-2.6.24-rc4/arch/s390/Kconfig @@ -53,6 +53,7 @@ config S390 def_bool y select HAVE_OPROFILE select HAVE_KPROBES + select HAVE_KRETPROBES source "init/Kconfig" Index: linux-2.6.24-rc4/arch/x86/Kconfig === --- linux-2.6.24-rc4.orig/arch/x86/Kconfig +++ linux-2.6.24-rc4/arch/x86/Kconfig @@ -20,6 +20,7 @@ config X86 def_bool y select HAVE_OPROFILE select HAVE_KPROBES + select HAVE_KRETPROBES config GENERIC_TIME def_bool y Index: linux-2.6.24-rc4/include/asm-ia64/kprobes.h === --- linux-2.6.24-rc4.orig/include/asm-ia64/kprobes.h +++ linux-2.6.24-rc4/include/asm-ia64/kprobes.h @@ -82,7 +82,6 @@ struct kprobe_ctlblk { struct prev_kprobe prev_kprobe[ARCH_PREV_KPROBE_SZ]; }; -#define ARCH_SUPPORTS_KRETPROBES #define kretprobe_blacklist_size 0 #define SLOT0_OPCODE_SHIFT (37) Index: linux-2.6.24-rc4/include/asm-powerpc/kprobes.h === --- linux-2.6.24-rc4.orig/include/asm-powerpc/kprobes.h +++ linux-2.6.24-rc4/include/asm-powerpc/kprobes.h @@ -80,7 +80,6 @@ typedef unsigned int kprobe_opcode_t; #define is_trap(instr) (IS_TW(instr) || IS_TWI(instr)) #endif -#define ARCH_SUPPORTS_KRETPROBES #define flush_insn_slot(p) do { } while (0) #define kretprobe_blacklist_size 0 Index: linux-2.6.24-rc4/include/asm-x86/kprobes_32.h === --- linux-2.6.24-rc4.orig/include/asm-x86/kprobes_32.h +++ linux-2.6.24-rc4/include/asm-x86/kprobes_32.h @@ -42,7 +42,6 @@ typedef u8 kprobe_opcode_t; ? (MAX_STACK_SIZE) \ : (((unsigned long)current_thread_info()) + THREAD_SIZE - (ADDR))) -#define ARCH_SUPPORTS_KRETPROBES #define flush_insn_slot(p) do { } while (0) extern const int kretprobe_blacklist_size; Index: linux-2.6.24-rc4/include/asm-x86/kprobes_64.h === --- linux-2.6.24-rc4.orig/include/asm-x86/kprobes_64.h +++ linux-2.6.24-rc4/include/asm-x86/kprobes_64.h @@ -41,7 +41,6 @@ typedef u8 kprobe_opcode_t; ? (MAX_STACK_SIZE) \ : (((unsigned
Re: [PATCH 1/2] Kprobes: Indicate kretprobe support in arch//Kconfig - updated
On Mon, Dec 10, 2007 at 10:10:01AM -0500, Mathieu Desnoyers wrote: > * Ananth N Mavinakayanahalli ([EMAIL PROTECTED]) wrote: > > On Mon, Dec 10, 2007 at 11:13:07AM +0100, Sam Ravnborg wrote: > > > On Mon, Dec 10, 2007 at 03:22:22PM +0530, Ananth N Mavinakayanahalli > > > wrote: > > > > From: Ananth N Mavinakayanahalli <[EMAIL PROTECTED]> > > > > > > Index: linux-2.6.24-rc4/include/linux/kprobes.h > > === > > --- linux-2.6.24-rc4.orig/include/linux/kprobes.h > > +++ linux-2.6.24-rc4/include/linux/kprobes.h > > @@ -125,11 +125,11 @@ struct jprobe { > > DECLARE_PER_CPU(struct kprobe *, current_kprobe); > > DECLARE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk); > > > > -#ifdef ARCH_SUPPORTS_KRETPROBES > > +#ifdef CONFIG_HAVE_KRETPROBES > > Hi Ananth, > > I just want to point out a detail: if someone sets CONFIG_KPROBES to n, > the CONFIG_HAVE_KPROBES is still y, and so is CONFIG_HAVE_KRETPROBES. > However, I doubt that you want to activate this code in this case ? > The code paths are OK because they are nested into CONFIG_KPROBES > ifdefs (or not built due to dependency on CONFIG_KPROBES in the > Makfile), but if one wants to use CONFIG_HAVE_KRETPROBE for something > else (Makefile), then it could become a problem. > > Could we add a menu entry CONFIG_KRETPROBES that depends on > CONFIG_HAVE_KRETPROBES and CONFIG_KPROBES, and also remove the > CONFIG_HAVE_KPROBES dependency for the CONFIG_HAVE_KRETPROBE option ? > This way, we would have much more flexibility (like specifiying if we > want CONFIG_KRETPROBES to be default y or default n...) Done... Updated patch coming up. Ananth -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
Alan Cox wrote: In any case, my machine does not have an ISA bus. Why should it? It's a laptop! Yes it does. The branding spec said "No ISA bus" so it was renamed "LPC" and hidden internally, but its alive and well. Well that, plus it was serialized and uses PCI electricals and timing, hence the LPC (Low Pin Count) moniker. Its performance is pretty much exactly ISA, though, and unlike PCI it provides full support for all legacy ISA features like slave DMA. -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
Andi Kleen wrote: My machine in question, for example, needs no waiting within CMOS_READs at all. And I doubt any other chip/device needs waiting that isn't I don't know about CMOS, but there were definitely some not too ancient systems (let's say not more than 10 years) who required IO delays in the floppy driver and the 8253/8259. But on those the jumps are already far too fast. Yes, early Linux used jumps. I believe it broke a bunch of machines when the P5 came out, as the jumps were too fast. (I have to admit to being a bit fuzzy on this... my memory says it was the 486 and not the P5, but that clearly can't be the case since my first Linux box was a 486/33.) -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Lnux 2.6.24-rc5
It's been a week, and I promised to be a good boy and try to follow my release rules, so here is the next -rc. Things _have_ slowed down, although I'd obviously be lying if I said we've got all the regressions handled and under control. They are being worked on, and the list is shrinking, but at a guess, we're definitely not going to have a final 2.6.24 out before xmas unless santa puts some more elves to work on those regressions.. So any elves out there - please keep working. I'm including the shortlog since it's small enough, and quite frankly, gives about as readable explanation of the changes as can be imagined. Nothing hugely exciting here. I'd post the diffstat too, but it's not really all that interesting, and it only highlights a textually big PA-RISC revert, and the powerpc defconfig updates. And the Blackfin SPI driver. The rest is largely random noise in various subsystems (drivers/net, xfs filesystem, and arch updates are some of the areas that show more changes). Linus --- Adam Litke (1): hugetlb: handle write-protection faults in follow_hugetlb_page Adrian Bunk (3): x86: revert CONFIG_X86_HT semantics change x86: free_cache_attributes() section fix MAINTAINERS: remove the MTRR entry Al Viro (5): regression: cifs endianness bug no need to mess with KBUILD_CFLAGS on uml-i386 anymore fcrypt endianness misannotations regression: bfs endianness bug remove nonsense force-casts from ocfs2 Alexey Dobriyan (1): proc: fix proc_dir_entry refcounting Andrew Gallatin (1): [LRO]: fix lro_gen_skb() alignment Andrew Morton (7): x86: arch_register_cpu() section fix [BRIDGE]: Section fix. [IA64] increase .data.patch offset [IA64] don't assume that unwcheck.py is executable [IA64] export copy_page() to modules aoe: properly initialise the request_queue's backing_dev_info revert "dpt_i2o: convert to SCSI hotplug model" Anton Vorontsov (1): PHY: Add the phy_device_release device method. Atsushi Nemoto (1): qemu: do not enable IP7 blindly Auke Kok (1): e100: cleanup unneeded math Bartlomiej Zolnierkiewicz (1): pata_amd/pata_via: de-couple programming of PIO/MWDMA and UDMA timings Ben Gardner (1): gpio_cs5535: disable AUX on output Benjamin Herrenschmidt (6): ibm_newemac: Fix ZMII refcounting bug ibm_newemac: Workaround reset timeout when no link ibm_newemac: Cleanup/Fix RGMII MDIO support detection ibm_newemac: Cleanup/fix support for STACR register variants ibm_newemac: Update file headers copyright notices powerpc: Fix IDE legacy vs. native fixups Bernhard Walle (1): [IA64] rename _bss to __bss_start Bryan Wu (11): spi: initial BF54x SPI support spi: spi_bfin cleanups, error handling spi: spi_bfin handles spi_transfer.cs_change spi: spi_bfin uses platform device resources spi: spi_bfin: handle multiple spi_masters spi: spi_bfin: bugfix for 8..16 bit word sizes spi: spi_bfin: update handling of delay-after-deselect Blackfin SPI driver: use cpu_relax() to replace continue in while busywait Blackfin SPI driver: use void __iomem * for regs_base Blackfin SPI driver: move hard coded pin_req to board file Blackfin SPI driver: reconfigure speed_hz and bits_per_word in each spi transfer Chris Dearman (1): [MIPS] Don't byteswap writes to display when running bigendian Christian Borntraeger (2): [S390] dcssblk: prevent early access without own make_request function [S390] Fix compile error on 31bit without preemption Christoph Hellwig (1): [XFS] revert to double-buffering readdir Cornelia Huck (1): [S390] cio: Issue SenseID per path. Cyrill Gorcunov (1): [SPARC64]: check for possible NULL pointer dereference David Brownell (2): SPI: use mutex not semaphore spi: at25 driver is for EEPROM not FLASH David Chinner (2): [XFS] Fix broken inode cluster setup. [XFS] Fix xfs_ichgtime()s broken usage of I_SYNC David Howells (1): [AF_RXRPC]: Add a missing goto David S. Miller (4): [SPARC64]: Missing mdesc_release() in ldc_init(). [SYSCTL_CHECK]: Fix typo in KERN_SPARC_SCONS_PWROFF entry string. [SPARC64]: Update defconfig. [SPARC64]: Fix memory controller register access when non-SMP. David Sterba (1): bonding: Fix time comparison David Woodhouse (1): Don't claim to do IPv6 checksum offload Denis Cheng (1): mm/backing-dev.c: fix percpu_counter_destroy call bug in bdi_init Denis V. Lunev (1): [IPV4]: Remove prototype of ip_rt_advice Divy Le Ray (2): cxgb - revert file mode changes. cxgb3 - T3C support update Don Zickus (1): x86: add the word 'WARNING' in check_nmi_watchdog() output Donald Douwsma (1): [XFS] Fix dbflush panic in xfs_qm_sync. Eliezer Tamir (1): make bnx2x select
Re: [PATCH][for -mm] fix accounting in vmscan.c for memory controller
KAMEZAWA Hiroyuki wrote: > On Tue, 11 Dec 2007 10:44:36 +0530 > Balbir Singh <[EMAIL PROTECTED]> wrote: > >> Looks good to me. >> >> Acked-by: Balbir Singh <[EMAIL PROTECTED]> >> >> TODO: >> >> 1. Should we have vm_events for the memory controller as well? >>May be in the longer term >> > > ALLOC_STALL is recoreded as failcnt, I think. > I think DIRECT can be accoutned easily. Thanks for clarifying -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][for -mm] fix accounting in vmscan.c for memory controller
On Tue, 11 Dec 2007 10:44:36 +0530 Balbir Singh <[EMAIL PROTECTED]> wrote: > Looks good to me. > > Acked-by: Balbir Singh <[EMAIL PROTECTED]> > > TODO: > > 1. Should we have vm_events for the memory controller as well? >May be in the longer term > ALLOC_STALL is recoreded as failcnt, I think. I think DIRECT can be accoutned easily. But I'm not in hurry very much, because all reclaimation is DIRECT, now. After we implement background reclaim, we should consider it. Thanks, -Kame -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][for -mm] fix accounting in vmscan.c for memory controller
KAMEZAWA Hiroyuki wrote: > Without this, ALLOCSTALL and PGSCAN_DIRECT increases too much unless > there is no memory shortage. > > against 2.6.24-rc4-mm1. > > -Kame > > == > Some amount of accounting is done while page reclaiming. > > Now, there are 2 types of page reclaim (if memory controller is used) > - global: shortage of (global) pages. > - under cgroup: use up to limit. > > I think 2 accountings, ALLOCSTALL and DIRECT should be accounted only under > global lru scan. They are accounted against memory shortage at alloc_pages(). > > Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> > > mm/vmscan.c |6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > Index: linux-2.6.24-rc4-mm1/mm/vmscan.c > === > --- linux-2.6.24-rc4-mm1.orig/mm/vmscan.c > +++ linux-2.6.24-rc4-mm1/mm/vmscan.c > @@ -896,8 +896,9 @@ static unsigned long shrink_inactive_lis > if (current_is_kswapd()) { > __count_zone_vm_events(PGSCAN_KSWAPD, zone, nr_scan); > __count_vm_events(KSWAPD_STEAL, nr_freed); > - } else > + } else if (scan_global_lru(sc)) > __count_zone_vm_events(PGSCAN_DIRECT, zone, nr_scan); > + > __count_zone_vm_events(PGSTEAL, zone, nr_freed); > > if (nr_taken == 0) > @@ -1333,7 +1334,8 @@ static unsigned long do_try_to_free_page > unsigned long lru_pages = 0; > int i; > > - count_vm_event(ALLOCSTALL); > + if (scan_global_lru(sc)) > + count_vm_event(ALLOCSTALL); > /* >* mem_cgroup will not do shrink_slab. >*/ > Looks good to me. Acked-by: Balbir Singh <[EMAIL PROTECTED]> TODO: 1. Should we have vm_events for the memory controller as well? May be in the longer term -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [DOC][for -mm] update Documentation/controller/memory.txt
KAMEZAWA Hiroyuki wrote: > Balbir-san, could you review this update ? > > -- > Documentation updates for memory controller. > > Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> > > Index: linux-2.6.24-rc4-mm1/Documentation/controllers/memory.txt > === > --- linux-2.6.24-rc4-mm1.orig/Documentation/controllers/memory.txt > +++ linux-2.6.24-rc4-mm1/Documentation/controllers/memory.txt > @@ -9,8 +9,7 @@ d. Provides a double LRU: global memory > global LRU; a cgroup on hitting a limit, reclaims from the per > cgroup LRU > > -NOTE: Page Cache (unmapped) also includes Swap Cache pages as a subset > -and will not be referred to explicitly in the rest of the documentation. > +NOTE: Swap Cache (unmapped) is not accounted now. > > Benefits and Purpose of the memory controller > > @@ -144,7 +143,7 @@ list. > The memory controller uses the following hierarchy > > 1. zone->lru_lock is used for selecting pages to be isolated > -2. mem->lru_lock protects the per cgroup LRU > +2. mem->per_zone->lru_lock protects the per cgroup LRU (per zone) > 3. lock_page_cgroup() is used to protect page->page_cgroup > > 3. User Interface > @@ -193,6 +192,15 @@ this file after a write to guarantee the > The memory.failcnt field gives the number of times that the cgroup limit was > exceeded. > > +The memory.stat file gives accounting information. Now, the number of > +caches, RSS and Active pages/Inactive pages are shown. > + > +The memory.force_empty gives an interface to drop *all* charges by force. > + > +# echo -n 1 > memory.force_empty > + > +will drop all charges in cgroup. Currently, this is maintained for test. > + > 4. Testing > > Balbir posted lmbench, AIM9, LTP and vmmstress results [10] and [11]. > @@ -222,11 +230,8 @@ reclaimed. > > A cgroup can be removed by rmdir, but as discussed in sections 4.1 and 4.2, a > cgroup might have some charge associated with it, even though all > -tasks have migrated away from it. If some pages are still left, after > following > -the steps listed in sections 4.1 and 4.2, check the Swap Cache usage in > -/proc/meminfo to see if the Swap Cache usage is showing up in the > -cgroups memory.usage_in_bytes counter. A simple test of swapoff -a and > -swapon -a should free any pending Swap Cache usage. > +tasks have migrated away from it. Such charges are automatically dropped at > +rmdir() if there are no tasks. > > 4.4 Choosing what to account -- Page Cache (unmapped) vs RSS (mapped)? > > @@ -238,15 +243,11 @@ echo -n 1 > memory.control_type > 5. TODO > > 1. Add support for accounting huge pages (as a separate controller) > -2. Improve the user interface to accept/display memory limits in KB or MB > - rather than pages (since page sizes can differ across platforms/machines). > -3. Make cgroup lists per-zone > -4. Make per-cgroup scanner reclaim not-shared pages first > -5. Teach controller to account for shared-pages > -6. Start reclamation when the limit is lowered > -7. Start reclamation in the background when the limit is > +2. Make per-cgroup scanner reclaim not-shared pages first > +3. Teach controller to account for shared-pages > +4. Start reclamation when the limit is lowered > +5. Start reclamation in the background when the limit is > not yet hit but the usage is getting closer > -8. Create per zone LRU lists per cgroup > Looks very good to me! Reviewed-by: Balbir Singh <[EMAIL PROTECTED]> -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] RCU : move three variables to __read_mostly to save space
I noticed this vmlinux layout on i686 (where CONFIG_X86_L1_CACHE_SHIFT = 7) : c06cdab4 d pid_caches_lh c06cdb00 d qlowmark c06cdb04 d qhimark c06cdb08 d blimit c06cdb80 d rcu_ctrlblk c06cdc80 d rcu_bh_ctrlblk This means that qlowmark, qhimark and blimit use a whole 128 bytes cache line. Linker is not smart enough for us. Moving these three variables to read_mostly section saves 116 (128-12) bytes. # size vmlinux vmlinux.before_patch textdata bss dec hex filename 6343966 490818 630784 7465568 71ea60 vmlinux 6343966 490930 630784 7465680 71ead0 vmlinux.before_patch Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]> diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c index a66d4d1..11c815c 100644 --- a/kernel/rcupdate.c +++ b/kernel/rcupdate.c @@ -75,9 +75,9 @@ DEFINE_PER_CPU(struct rcu_data, rcu_bh_data) = { 0L }; /* Fake initialization required by compiler */ static DEFINE_PER_CPU(struct tasklet_struct, rcu_tasklet) = {NULL}; -static int blimit = 10; -static int qhimark = 1; -static int qlowmark = 100; +static int blimit __read_mostly = 10; +static int qhimark __read_mostly = 1; +static int qlowmark __read_mostly = 100; static atomic_t rcu_barrier_cpu_count; static DEFINE_MUTEX(rcu_barrier_mutex);
Re: [PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu
Neil Horman <[EMAIL PROTECTED]> writes: Almost there. > On Mon, Dec 10, 2007 at 06:08:03PM -0700, Eric W. Biederman wrote: >> Neil Horman <[EMAIL PROTECTED]> writes: >> > >> >> Ok. This test is broken. Please remove the == 1. You are looking >> for == (1 << 18). So just saying: "if (htcfg & (1 << 18))" should be >> clearer. >> > Fixed. Thanks! > >> > + printk(KERN_INFO "Detected use of extended apic ids on hypertransport > bus\n"); >> > + if ((htcfg & (1 << 17)) == 0) { >> > + printk(KERN_INFO "Enabling hypertransport extended apic interrupt >> > broadcast\n"); >> > + htcfg |= (1 << 17); >> > + write_pci_config(num, slot, func, 0x68, htcfg); >> > + } >> > + } >> > + >> > +} >> >> The rest of this quirk looks fine, include the fact it is only intended >> to be applied to PCI_VENDOR_ID_AMD PCI_DEVICE_ID_AMD_K8_NB. >> > Copy that. > >> >> For what is below I don't like the way the infrastructure has been >> extended as what you are doing quickly devolves into a big mess. >> >> Please extend struct chipset to be something like: >> struct chipset { >> u16 vendor; >> u16 device; >> u32 class, class_mask; >> void (*f)(void); >> }; >> >> And then the test for matching the chipset can be something like: >> if ((id->vendor == PCI_ANY_ID || id->vendor == dev->vendor) && >> (id->device == PCI_ANY_ID || id->device == dev->device) && >> !((id->class ^ dev->class) & id->class_mask)) >> >> Essentially a subset of pci_match_one_device from drivers/pci/pci.h >> >> That way you don't need to increase the number of tables or the >> number of passes through the pci busses, just update the early_qrk >> table with a few more bits of information. >> > copy that. Fixed. Thanks! > >> The extended form should be much more maintainable in the long >> run. Given that we may want this before we enable the timer >> which is very early doing this in the pci early quirks seems >> to make sense. >> >> Eric > > > New patch attached, with suggestions incorporated. > > Thanks & regards > Neil > > Signed-off-by: Neil Horman <[EMAIL PROTECTED]> > > > early-quirks.c | 82 ++--- > 1 file changed, 73 insertions(+), 9 deletions(-) > > > > diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c > index 88bb83e..4b0cee1 100644 > --- a/arch/x86/kernel/early-quirks.c > +++ b/arch/x86/kernel/early-quirks.c > @@ -44,6 +44,50 @@ static int __init nvidia_hpet_check(struct > acpi_table_header > *header) > #endif /* CONFIG_X86_IO_APIC */ > #endif /* CONFIG_ACPI */ > > +static void __init fix_hypertransport_config(int num, int slot, int func) > +{ > + u32 htcfg; > + /* > + *we found a hypertransport bus > + *make sure that are broadcasting > + *interrupts to all cpus on the ht bus > + *if we're using extended apic ids > + */ > + htcfg = read_pci_config(num, slot, func, 0x68); > + if (htcfg & (1 << 18)) { > + printk(KERN_INFO "Detected use of extended apic ids on hypertransport > bus\n"); > + if ((htcfg & (1 << 17)) == 0) { > + printk(KERN_INFO "Enabling hypertransport extended apic interrupt > broadcast\n"); > + htcfg |= (1 << 17); > + write_pci_config(num, slot, func, 0x68, htcfg); > + } > + } > + > +} > + > +static void __init check_hypertransport_config() > +{ > + int num, slot, func; > + u32 device, vendor; > + func = 0; > + for (num = 0; num < 32; num++) { > + for (slot = 0; slot < 32; slot++) { > + vendor = read_pci_config(num,slot,func, > + PCI_VENDOR_ID); > + device = read_pci_config(num,slot,func, > + PCI_DEVICE_ID); > + vendor &= 0x; > + device >>= 16; > + if ((vendor == PCI_VENDOR_ID_AMD) && > + (device == PCI_DEVICE_ID_AMD_K8_NB)) > + fix_hypertransport_config(num,slot,func); > + } > + } > + > + return; > + > +} We should not need check_hypertransport_config as the generic loop now does the work for us. > + > static void __init nvidia_bugs(void) > { > #ifdef CONFIG_ACPI > @@ -83,15 +127,25 @@ static void __init ati_bugs(void) > #endif > } > > +static void __init amd_host_bugs(void) > +{ > + printk(KERN_CRIT "IN AMD_HOST_BUGS\n"); > + check_hypertransport_config(); > +} Likewise this function is unneeded and the printk is likely confusing for users. > struct chipset { > u16 vendor; > + u16 device; > + u32 class; > + u32 class_mask; > void (*f)(void); > }; > > static struct chipset early_qrk[] __initdata = { > - { PCI_VENDOR_ID_NVIDIA, nvidia_bugs }, > - { PCI_VENDOR_ID_VIA,
[PATCH 6/6] pcmcia/pcnet_cs: Fix 'shadow variable' warning
Fixing: CHECK drivers/net/pcmcia/pcnet_cs.c drivers/net/pcmcia/pcnet_cs.c:523:15: warning: symbol 'hw_info' shadows an earlier one drivers/net/pcmcia/pcnet_cs.c:148:18: originally declared here Signed-off-by: Richard Knutsson <[EMAIL PROTECTED]> --- diff --git a/drivers/net/pcmcia/pcnet_cs.c b/drivers/net/pcmcia/pcnet_cs.c index db6a97d..5779344 100644 --- a/drivers/net/pcmcia/pcnet_cs.c +++ b/drivers/net/pcmcia/pcnet_cs.c @@ -520,7 +520,7 @@ static int pcnet_config(struct pcmcia_device *link) int i, last_ret, last_fn, start_pg, stop_pg, cm_offset; int has_shmem = 0; u_short buf[64]; -hw_info_t *hw_info; +hw_info_t *local_hw_info; DECLARE_MAC_BUF(mac); DEBUG(0, "pcnet_config(0x%p)\n", link); @@ -589,23 +589,23 @@ static int pcnet_config(struct pcmcia_device *link) dev->if_port = 0; } -hw_info = get_hwinfo(link); -if (hw_info == NULL) - hw_info = get_prom(link); -if (hw_info == NULL) - hw_info = get_dl10019(link); -if (hw_info == NULL) - hw_info = get_ax88190(link); -if (hw_info == NULL) - hw_info = get_hwired(link); - -if (hw_info == NULL) { +local_hw_info = get_hwinfo(link); +if (local_hw_info == NULL) + local_hw_info = get_prom(link); +if (local_hw_info == NULL) + local_hw_info = get_dl10019(link); +if (local_hw_info == NULL) + local_hw_info = get_ax88190(link); +if (local_hw_info == NULL) + local_hw_info = get_hwired(link); + +if (local_hw_info == NULL) { printk(KERN_NOTICE "pcnet_cs: unable to read hardware net" " address for io base %#3lx\n", dev->base_addr); goto failed; } -info->flags = hw_info->flags; +info->flags = local_hw_info->flags; /* Check for user overrides */ info->flags |= (delay_output) ? DELAY_OUTPUT : 0; if ((link->manf_id == MANFID_SOCKET) && -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 4/6] pcmcia/axnet_cs: Make use of 'max()' instead of handcrafted one
Use 'max(x,y)' instead of 'x < y ? y : x'. Signed-off-by: Richard Knutsson <[EMAIL PROTECTED]> --- diff --git a/drivers/net/pcmcia/axnet_cs.c b/drivers/net/pcmcia/axnet_cs.c index 8d910a3..96931cc 100644 --- a/drivers/net/pcmcia/axnet_cs.c +++ b/drivers/net/pcmcia/axnet_cs.c @@ -1091,8 +1091,8 @@ static int ei_start_xmit(struct sk_buff *skb, struct net_device *dev) ei_local->irqlock = 1; - send_length = ETH_ZLEN < length ? length : ETH_ZLEN; - + send_length = max(length, ETH_ZLEN); + /* * We have two Tx slots available for use. Find the first free * slot, and then perform some sanity checks. With two Tx bufs, -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 5/6] pcmcia/fmvj18x_cs: Fix 'shadow variable' warning
Fixing: CHECK drivers/net/pcmcia/fmvj18x_cs.c drivers/net/pcmcia/fmvj18x_cs.c:1205:6: warning: symbol 'i' shadows an earlier one drivers/net/pcmcia/fmvj18x_cs.c:1179:9: originally declared here Signed-off-by: Richard Knutsson <[EMAIL PROTECTED]> --- diff --git a/drivers/net/pcmcia/fmvj18x_cs.c b/drivers/net/pcmcia/fmvj18x_cs.c index 8c719b4..4f604ae 100644 --- a/drivers/net/pcmcia/fmvj18x_cs.c +++ b/drivers/net/pcmcia/fmvj18x_cs.c @@ -1202,8 +1202,7 @@ static void set_rx_mode(struct net_device *dev) outb(1, ioaddr + RX_MODE); /* Ignore almost all multicasts. */ } else { struct dev_mc_list *mclist; - int i; - + memset(mc_filter, 0, sizeof(mc_filter)); for (i = 0, mclist = dev->mc_list; mclist && i < dev->mc_count; i++, mclist = mclist->next) { -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/6] pcmcia/3c574_cs: Fix 'shadow variable' warning
Fixing: CHECK drivers/net/pcmcia/3c574_cs.c drivers/net/pcmcia/3c574_cs.c:695:7: warning: symbol 'i' shadows an earlier one drivers/net/pcmcia/3c574_cs.c:636:6: originally declared here Signed-off-by: Richard Knutson <[EMAIL PROTECTED]> --- diff --git a/drivers/net/pcmcia/3c574_cs.c b/drivers/net/pcmcia/3c574_cs.c index ad134a6..97b6daa 100644 --- a/drivers/net/pcmcia/3c574_cs.c +++ b/drivers/net/pcmcia/3c574_cs.c @@ -692,7 +692,7 @@ static void tc574_reset(struct net_device *dev) mdio_write(ioaddr, lp->phys, 4, lp->advertising); if (!auto_polarity) { /* works for TDK 78Q2120 series MII's */ - int i = mdio_read(ioaddr, lp->phys, 16) | 0x20; + i = mdio_read(ioaddr, lp->phys, 16) | 0x20; mdio_write(ioaddr, lp->phys, 16, i); } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/6] pcmcia/3c574_cs: Fix dubious bitfield warning
Fixing: CHECK drivers/net/pcmcia/3c574_cs.c drivers/net/pcmcia/3c574_cs.c:194:13: warning: dubious bitfield without explicit `signed' or `unsigned' drivers/net/pcmcia/3c574_cs.c:196:14: warning: dubious bitfield without explicit `signed' or `unsigned' Signed-off-by: Richard Knutsson <[EMAIL PROTECTED]> --- Is there a reason for not doing it this way? diff --git a/drivers/net/pcmcia/3c574_cs.c b/drivers/net/pcmcia/3c574_cs.c index ad134a6..97b6daa 100644 --- a/drivers/net/pcmcia/3c574_cs.c +++ b/drivers/net/pcmcia/3c574_cs.c @@ -190,10 +190,10 @@ enum Window3 {/* Window 3: MAC/config bits. */ union wn3_config { int i; struct w3_config_fields { - unsigned int ram_size:3, ram_width:1, ram_speed:2, rom_size:2; - int pad8:8; - unsigned int ram_split:2, pad18:2, xcvr:3, pad21:1, autoselect:1; - int pad24:7; + u8 ram_size:3, ram_width:1, ram_speed:2, rom_size:2; + u8 pad8; + u8 ram_split:2, pad18:2, xcvr:3, pad21:1; + u8 autoselect:1, pad24:7; } u; }; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/6] pcmcia/axnet_cs: Make functions static
Fixing: CHECK drivers/net/pcmcia/axnet_cs.c drivers/net/pcmcia/axnet_cs.c:994:5: warning: symbol 'ax_close' was not declared. Should it be static? drivers/net/pcmcia/axnet_cs.c:1017:6: warning: symbol 'ei_tx_timeout' was not declared. Should it be static? Signed-off-by: Richard Knutsson <[EMAIL PROTECTED]> --- diff --git a/drivers/net/pcmcia/axnet_cs.c b/drivers/net/pcmcia/axnet_cs.c index 8d910a3..96931cc 100644 --- a/drivers/net/pcmcia/axnet_cs.c +++ b/drivers/net/pcmcia/axnet_cs.c @@ -991,7 +991,7 @@ static int ax_open(struct net_device *dev) * * Opposite of ax_open(). Only used when "ifconfig down" is done. */ -int ax_close(struct net_device *dev) +static int ax_close(struct net_device *dev) { unsigned long flags; @@ -1014,7 +1014,7 @@ int ax_close(struct net_device *dev) * completed (or failed) - i.e. never posted a Tx related interrupt. */ -void ei_tx_timeout(struct net_device *dev) +static void ei_tx_timeout(struct net_device *dev) { long e8390_base = dev->base_addr; struct ei_device *ei_local = (struct ei_device *) netdev_priv(dev); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] A clean approach to writeout throttling
On Monday 10 December 2007 13:31, Jonathan Corbet wrote: > Hey, Daniel, > > I'm just getting around to looking at this. One thing jumped out at me: > > + if (bio->bi_throttle) { > > + struct request_queue *q = bio->bi_queue; > > + bio->bi_throttle = 0; /* or detect multiple endio and err? */ > > + atomic_add(bio->bi_throttle, >available); > > + wake_up(>throttle_wait); > > + } > > I'm feeling like I must be really dumb, but...how can that possibly > work? You're zeroing >bi_throttle before adding it back into > q->available, so the latter will never increase... Hi Jon, Don't you know? These days we optimize all our code for modern processors with tunnelling instructions and metaphysical cache. On such processors, setting a register to zero does not entirely destroy all the data that used to be in the register, so subsequent instructions can make further use of the overwritten data by reconstructing it from remnants of bits left attached to the edges of the register. Um, yeah, that's it. Actually, I fat-fingered it in the merge to -mm. Thanks for the catch, corrected patch attached. The offending line isn't even a functional part of the algorithm, it is just supposed to defend against the possibility that, somehow, ->bi_endio gets called multiple times. Probably it should really be something like: BUG_ON(bio->bi_throttle == -1); if (bio->bi_throttle) { ... bio->bi_throttle = -1; Or perhaps we should just rely on nobody ever making that mistake and let somebody else catch it if it does. Regards, Daniel --- 2.6.24-rc3-mm.clean/block/ll_rw_blk.c 2007-12-04 14:45:25.0 -0800 +++ 2.6.24-rc3-mm/block/ll_rw_blk.c 2007-12-10 04:49:56.0 -0800 @@ -3210,9 +3210,9 @@ static inline int bio_check_eod(struct b */ static inline void __generic_make_request(struct bio *bio) { - struct request_queue *q; + struct request_queue *q = bdev_get_queue(bio->bi_bdev); sector_t old_sector; - int ret, nr_sectors = bio_sectors(bio); + int nr_sectors = bio_sectors(bio); dev_t old_dev; int err = -EIO; @@ -3221,6 +3221,13 @@ static inline void __generic_make_reques if (bio_check_eod(bio, nr_sectors)) goto end_io; + if (q && q->metric && !bio->bi_queue) { + int need = bio->bi_throttle = q->metric(bio); + bio->bi_queue = q; + /* FIXME: potential race if atomic_sub is called in the middle of condition check */ + wait_event(q->throttle_wait, atomic_read(>available) >= need); + atomic_sub(need, >available); + } /* * Resolve the mapping until finished. (drivers are * still free to implement/resolve their own stacking @@ -3231,10 +3238,9 @@ static inline void __generic_make_reques */ old_sector = -1; old_dev = 0; - do { + while (1) { char b[BDEVNAME_SIZE]; - q = bdev_get_queue(bio->bi_bdev); if (!q) { printk(KERN_ERR "generic_make_request: Trying to access " @@ -3282,8 +3288,10 @@ end_io: goto end_io; } - ret = q->make_request_fn(q, bio); - } while (ret); + if (!q->make_request_fn(q, bio)) + return; + q = bdev_get_queue(bio->bi_bdev); + } } /* --- 2.6.24-rc3-mm.clean/drivers/md/dm.c 2007-12-04 14:46:04.0 -0800 +++ 2.6.24-rc3-mm/drivers/md/dm.c 2007-12-04 23:31:41.0 -0800 @@ -889,6 +889,11 @@ static int dm_any_congested(void *conges return r; } +static unsigned dm_metric(struct bio *bio) +{ + return bio->bi_vcnt; +} + /*- * An IDR is used to keep track of allocated minor numbers. *---*/ @@ -967,6 +972,7 @@ out: static struct block_device_operations dm_blk_dops; +#define DEFAULT_THROTTLE_CAPACITY 1000 /* * Allocate and initialise a blank device with a given minor. */ @@ -1009,6 +1015,11 @@ static struct mapped_device *alloc_dev(i goto bad1_free_minor; md->queue->queuedata = md; + md->queue->metric = dm_metric; + /* A dm device constructor may change the throttle capacity */ + atomic_set(>queue->available, md->queue->capacity = DEFAULT_THROTTLE_CAPACITY); + init_waitqueue_head(>queue->throttle_wait); + md->queue->backing_dev_info.congested_fn = dm_any_congested; md->queue->backing_dev_info.congested_data = md; blk_queue_make_request(md->queue, dm_request); --- 2.6.24-rc3-mm.clean/fs/bio.c 2007-12-04 14:38:47.0 -0800 +++ 2.6.24-rc3-mm/fs/bio.c 2007-12-04 23:31:41.0 -0800 @@ -1007,6 +1007,13 @@ void bio_endio(struct bio *bio, int erro else if (!test_bit(BIO_UPTODATE, >bi_flags)) error = -EIO; + if (bio->bi_throttle) { + struct request_queue *q = bio->bi_queue; + atomic_add(bio->bi_throttle, >available); + bio->bi_throttle = 0; /* or detect multiple endio and err? */ + wake_up(>throttle_wait); + } + if (bio->bi_end_io) bio->bi_end_io(bio, error); } --- 2.6.24-rc3-mm.clean/include/linux/bio.h 2007-12-04
Re: [PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu
On Mon, Dec 10, 2007 at 06:08:03PM -0700, Eric W. Biederman wrote: > Neil Horman <[EMAIL PROTECTED]> writes: > > > Ok. This test is broken. Please remove the == 1. You are looking > for == (1 << 18). So just saying: "if (htcfg & (1 << 18))" should be clearer. > Fixed. Thanks! > > + printk(KERN_INFO "Detected use of extended apic ids on hypertransport > > bus\n"); > > + if ((htcfg & (1 << 17)) == 0) { > > + printk(KERN_INFO "Enabling hypertransport extended apic interrupt > > broadcast\n"); > > + htcfg |= (1 << 17); > > + write_pci_config(num, slot, func, 0x68, htcfg); > > + } > > + } > > + > > +} > > The rest of this quirk looks fine, include the fact it is only intended > to be applied to PCI_VENDOR_ID_AMD PCI_DEVICE_ID_AMD_K8_NB. > Copy that. > > For what is below I don't like the way the infrastructure has been > extended as what you are doing quickly devolves into a big mess. > > Please extend struct chipset to be something like: > struct chipset { > u16 vendor; > u16 device; > u32 class, class_mask; > void (*f)(void); > }; > > And then the test for matching the chipset can be something like: > if ((id->vendor == PCI_ANY_ID || id->vendor == dev->vendor) && > (id->device == PCI_ANY_ID || id->device == dev->device) && > !((id->class ^ dev->class) & id->class_mask)) > > Essentially a subset of pci_match_one_device from drivers/pci/pci.h > > That way you don't need to increase the number of tables or the > number of passes through the pci busses, just update the early_qrk > table with a few more bits of information. > copy that. Fixed. Thanks! > The extended form should be much more maintainable in the long > run. Given that we may want this before we enable the timer > which is very early doing this in the pci early quirks seems > to make sense. > > Eric New patch attached, with suggestions incorporated. Thanks & regards Neil Signed-off-by: Neil Horman <[EMAIL PROTECTED]> early-quirks.c | 82 ++--- 1 file changed, 73 insertions(+), 9 deletions(-) diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c index 88bb83e..4b0cee1 100644 --- a/arch/x86/kernel/early-quirks.c +++ b/arch/x86/kernel/early-quirks.c @@ -44,6 +44,50 @@ static int __init nvidia_hpet_check(struct acpi_table_header *header) #endif /* CONFIG_X86_IO_APIC */ #endif /* CONFIG_ACPI */ +static void __init fix_hypertransport_config(int num, int slot, int func) +{ + u32 htcfg; + /* +*we found a hypertransport bus +*make sure that are broadcasting +*interrupts to all cpus on the ht bus +*if we're using extended apic ids +*/ + htcfg = read_pci_config(num, slot, func, 0x68); + if (htcfg & (1 << 18)) { + printk(KERN_INFO "Detected use of extended apic ids on hypertransport bus\n"); + if ((htcfg & (1 << 17)) == 0) { + printk(KERN_INFO "Enabling hypertransport extended apic interrupt broadcast\n"); + htcfg |= (1 << 17); + write_pci_config(num, slot, func, 0x68, htcfg); + } + } + +} + +static void __init check_hypertransport_config() +{ + int num, slot, func; + u32 device, vendor; + func = 0; + for (num = 0; num < 32; num++) { + for (slot = 0; slot < 32; slot++) { + vendor = read_pci_config(num,slot,func, + PCI_VENDOR_ID); + device = read_pci_config(num,slot,func, + PCI_DEVICE_ID); + vendor &= 0x; + device >>= 16; + if ((vendor == PCI_VENDOR_ID_AMD) && + (device == PCI_DEVICE_ID_AMD_K8_NB)) + fix_hypertransport_config(num,slot,func); + } + } + + return; + +} + static void __init nvidia_bugs(void) { #ifdef CONFIG_ACPI @@ -83,15 +127,25 @@ static void __init ati_bugs(void) #endif } +static void __init amd_host_bugs(void) +{ + printk(KERN_CRIT "IN AMD_HOST_BUGS\n"); + check_hypertransport_config(); +} + struct chipset { u16 vendor; + u16 device; + u32 class; + u32 class_mask; void (*f)(void); }; static struct chipset early_qrk[] __initdata = { - { PCI_VENDOR_ID_NVIDIA, nvidia_bugs }, - { PCI_VENDOR_ID_VIA, via_bugs }, - { PCI_VENDOR_ID_ATI, ati_bugs }, + { PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID, PCI_CLASS_BRIDGE_PCI, PCI_ANY_ID, nvidia_bugs }, + { PCI_VENDOR_ID_VIA, PCI_ANY_ID, PCI_CLASS_BRIDGE_PCI, PCI_ANY_ID, via_bugs }, + { PCI_VENDOR_ID_ATI, PCI_ANY_ID, PCI_CLASS_BRIDGE_PCI, PCI_ANY_ID, ati_bugs }, + {
[PATCH 2/4 v2] added methods for sched_class changes
Dmitry Adamushko found that the current implementation of the RT balancing code left out changes to the sched_setscheduler and rt_mutex_setprio. This patch addresses this issue by adding methods to the schedule classes to handle being switched out of (switched_from) and being switched into (switched_to) a sched_class. Also a method for changing of priorities is also added (prio_changed). This patch also removes some duplicate logic between rt_mutex_setprio and sched_setscheduler. Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]> --- include/linux/sched.h |7 +++ kernel/sched.c | 42 ++ kernel/sched_fair.c | 39 + kernel/sched_idletask.c | 31 kernel/sched_rt.c | 89 5 files changed, 186 insertions(+), 22 deletions(-) Index: linux-sched/include/linux/sched.h === --- linux-sched.orig/include/linux/sched.h 2007-12-10 20:39:14.0 -0500 +++ linux-sched/include/linux/sched.h 2007-12-10 20:39:17.0 -0500 @@ -860,6 +860,13 @@ struct sched_class { void (*join_domain)(struct rq *rq); void (*leave_domain)(struct rq *rq); + + void (*switched_from) (struct rq *this_rq, struct task_struct *task, + int running); + void (*switched_to) (struct rq *this_rq, struct task_struct *task, +int running); + void (*prio_changed) (struct rq *this_rq, struct task_struct *task, +int oldprio, int running); }; struct load_weight { Index: linux-sched/kernel/sched.c === --- linux-sched.orig/kernel/sched.c 2007-12-10 20:39:14.0 -0500 +++ linux-sched/kernel/sched.c 2007-12-10 20:39:17.0 -0500 @@ -1147,6 +1147,18 @@ static inline void __set_task_cpu(struct #endif } +static inline void check_class_changed(struct rq *rq, struct task_struct *p, + const struct sched_class *prev_class, + int oldprio, int running) +{ + if (prev_class != p->sched_class) { + if (prev_class->switched_from) + prev_class->switched_from(rq, p, running); + p->sched_class->switched_to(rq, p, running); + } else + p->sched_class->prio_changed(rq, p, oldprio, running); +} + #ifdef CONFIG_SMP /* @@ -4012,6 +4024,7 @@ void rt_mutex_setprio(struct task_struct unsigned long flags; int oldprio, on_rq, running; struct rq *rq; + const struct sched_class *prev_class = p->sched_class; BUG_ON(prio < 0 || prio > MAX_PRIO); @@ -4037,18 +4050,10 @@ void rt_mutex_setprio(struct task_struct if (on_rq) { if (running) p->sched_class->set_curr_task(rq); + enqueue_task(rq, p, 0); - /* -* Reschedule if we are currently running on this runqueue and -* our priority decreased, or if we are not currently running on -* this runqueue and our priority is higher than the current's -*/ - if (running) { - if (p->prio > oldprio) - resched_task(rq->curr); - } else { - check_preempt_curr(rq, p); - } + + check_class_changed(rq, p, prev_class, oldprio, running); } task_rq_unlock(rq, ); } @@ -4248,6 +4253,7 @@ int sched_setscheduler(struct task_struc { int retval, oldprio, oldpolicy = -1, on_rq, running; unsigned long flags; + const struct sched_class *prev_class = p->sched_class; struct rq *rq; /* may grab non-irq protected spin_locks */ @@ -4341,18 +4347,10 @@ recheck: if (on_rq) { if (running) p->sched_class->set_curr_task(rq); + activate_task(rq, p, 0); - /* -* Reschedule if we are currently running on this runqueue and -* our priority decreased, or if we are not currently running on -* this runqueue and our priority is higher than the current's -*/ - if (running) { - if (p->prio > oldprio) - resched_task(rq->curr); - } else { - check_preempt_curr(rq, p); - } + + check_class_changed(rq, p, prev_class, oldprio, running); } __task_rq_unlock(rq); spin_unlock_irqrestore(>pi_lock, flags); Index: linux-sched/kernel/sched_fair.c === --- linux-sched.orig/kernel/sched_fair.c2007-12-10 20:39:11.0
[PATCH 1/4 v2] Replace hooks with pre/post schedule and wakeup methods
To make the main sched.c code more agnostic to the schedule classes. Instead of having specific hooks in the schedule code for the RT class balancing. They are replaced with a pre_schedule, post_schedule and task_wake_up methods. These methods may be used by any of the classes but currently, only the sched_rt class implements them. Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]> --- include/linux/sched.h |3 +++ kernel/sched.c| 20 kernel/sched_rt.c | 17 +++-- 3 files changed, 26 insertions(+), 14 deletions(-) Index: linux-sched/include/linux/sched.h === --- linux-sched.orig/include/linux/sched.h 2007-12-10 20:39:11.0 -0500 +++ linux-sched/include/linux/sched.h 2007-12-10 20:39:14.0 -0500 @@ -848,6 +848,9 @@ struct sched_class { int (*move_one_task) (struct rq *this_rq, int this_cpu, struct rq *busiest, struct sched_domain *sd, enum cpu_idle_type idle); + void (*pre_schedule) (struct rq *this_rq, struct task_struct *task); + void (*post_schedule) (struct rq *this_rq); + void (*task_wake_up) (struct rq *this_rq, struct task_struct *task); #endif void (*set_curr_task) (struct rq *rq); Index: linux-sched/kernel/sched.c === --- linux-sched.orig/kernel/sched.c 2007-12-10 20:39:11.0 -0500 +++ linux-sched/kernel/sched.c 2007-12-10 20:39:14.0 -0500 @@ -1620,7 +1620,10 @@ out_activate: out_running: p->state = TASK_RUNNING; - wakeup_balance_rt(rq, p); +#ifdef CONFIG_SMP + if (p->sched_class->task_wake_up) + p->sched_class->task_wake_up(rq, p); +#endif out: task_rq_unlock(rq, ); @@ -1743,7 +1746,10 @@ void fastcall wake_up_new_task(struct ta inc_nr_running(p, rq); } check_preempt_curr(rq, p); - wakeup_balance_rt(rq, p); +#ifdef CONFIG_SMP + if (p->sched_class->task_wake_up) + p->sched_class->task_wake_up(rq, p); +#endif task_rq_unlock(rq, ); } @@ -1864,7 +1870,10 @@ static void finish_task_switch(struct rq prev_state = prev->state; finish_arch_switch(prev); finish_lock_switch(rq, prev); - schedule_tail_balance_rt(rq); +#ifdef CONFIG_SMP + if (current->sched_class->post_schedule) + current->sched_class->post_schedule(rq); +#endif fire_sched_in_preempt_notifiers(current); if (mm) @@ -3633,7 +3642,10 @@ need_resched_nonpreemptible: switch_count = >nvcsw; } - schedule_balance_rt(rq, prev); +#ifdef CONFIG_SMP + if (prev->sched_class->pre_schedule) + prev->sched_class->pre_schedule(rq, prev); +#endif if (unlikely(!rq->nr_running)) idle_balance(cpu, rq); Index: linux-sched/kernel/sched_rt.c === --- linux-sched.orig/kernel/sched_rt.c 2007-12-10 20:39:11.0 -0500 +++ linux-sched/kernel/sched_rt.c 2007-12-10 20:39:14.0 -0500 @@ -689,14 +689,14 @@ static int pull_rt_task(struct rq *this_ return ret; } -static void schedule_balance_rt(struct rq *rq, struct task_struct *prev) +static void pre_schedule_rt(struct rq *rq, struct task_struct *prev) { /* Try to pull RT tasks here if we lower this rq's prio */ if (unlikely(rt_task(prev)) && rq->rt.highest_prio > prev->prio) pull_rt_task(rq); } -static void schedule_tail_balance_rt(struct rq *rq) +static void post_schedule_rt(struct rq *rq) { /* * If we have more than one rt_task queued, then @@ -713,10 +713,9 @@ static void schedule_tail_balance_rt(str } -static void wakeup_balance_rt(struct rq *rq, struct task_struct *p) +static void task_wake_up_rt(struct rq *rq, struct task_struct *p) { - if (unlikely(rt_task(p)) && - !task_running(rq, p) && + if (!task_running(rq, p) && (p->prio >= rq->rt.highest_prio) && rq->rt.overloaded) push_rt_tasks(rq); @@ -780,11 +779,6 @@ static void leave_domain_rt(struct rq *r if (rq->rt.overloaded) rt_clear_overload(rq); } - -#else /* CONFIG_SMP */ -# define schedule_tail_balance_rt(rq) do { } while (0) -# define schedule_balance_rt(rq, prev) do { } while (0) -# define wakeup_balance_rt(rq, p) do { } while (0) #endif /* CONFIG_SMP */ static void task_tick_rt(struct rq *rq, struct task_struct *p) @@ -838,6 +832,9 @@ const struct sched_class rt_sched_class .set_cpus_allowed = set_cpus_allowed_rt, .join_domain= join_domain_rt, .leave_domain = leave_domain_rt, + .pre_schedule = pre_schedule_rt, + .post_schedule = post_schedule_rt, +
[PATCH 4/4 v2] Subject: SCHED - Clean up some old cpuset logic
From: Gregory Haskins <[EMAIL PROTECTED]> We had support for overlapping cpuset based rto logic in early prototypes that is no longer used, so clean it up. Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]> Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]> --- kernel/sched_rt.c | 33 - 1 file changed, 33 deletions(-) Index: linux-sched/kernel/sched_rt.c === --- linux-sched.orig/kernel/sched_rt.c 2007-12-10 20:39:19.0 -0500 +++ linux-sched/kernel/sched_rt.c 2007-12-10 20:39:21.0 -0500 @@ -586,38 +586,6 @@ static int pull_rt_task(struct rq *this_ continue; src_rq = cpu_rq(cpu); - if (unlikely(src_rq->rt.rt_nr_running <= 1)) { - /* -* It is possible that overlapping cpusets -* will miss clearing a non overloaded runqueue. -* Clear it now. -*/ - if (double_lock_balance(this_rq, src_rq)) { - /* unlocked our runqueue lock */ - struct task_struct *old_next = next; - - next = pick_next_task_rt(this_rq); - if (next != old_next) - ret = 1; - } - if (likely(src_rq->rt.rt_nr_running <= 1)) { - /* -* Small chance that this_rq->curr changed -* but it's really harmless here. -*/ - rt_clear_overload(this_rq); - } else { - /* -* Heh, the src_rq is now overloaded, since -* we already have the src_rq lock, go straight -* to pulling tasks from it. -*/ - goto try_pulling; - } - spin_unlock(_rq->lock); - continue; - } - /* * We can potentially drop this_rq's lock in * double_lock_balance, and another CPU could @@ -641,7 +609,6 @@ static int pull_rt_task(struct rq *this_ continue; } - try_pulling: p = pick_next_highest_task_rt(src_rq, this_cpu); /* -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/4 v2] RT balance updates against sched-devel
[Sorry if this is a repost, but I had a problem with quilt mail, and I don't know if my original post made it out. Unfortunately, I didn't save the original "prolog" file, and so this has to be rewritten from scratch, and I don't even remember the original subject :-/ ] This patch series goes against Ingo's sched-devel git tree. The first patch addresses Ingo's concerns about having hooks in the main sched.c and replaces them with generic methods that any class may use. The methods are: pre_schedule, post_schedule and task_wake_up; which is called before the schedule, after a context switch and when a task wakes up respectively. The are surrounded by ifdef CONFIG_SMP since they are currently only used by sched_rt in SMP mode. But if this appears to be applicable to other sched_classes in UP, then I can rerun this series without the ifdefs. The second patch addresses the concerns that Dmitry brought up showing that the current RT balancing neglected to handle changes in prio and classes from sched_setscheduler and rt_mutex_setprio. The added methods are: switched_to, switched_from and prio_changed; these are called in the when a task is assigned a new sched_class, after it leaves a sched_class, and when it changes its prio respectively. The last two patches are from Gregory Haskins where he cleaned up left over changes that were from previous versions of the balancing code. -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/4 v2] SCHED - Only adjust overload state when changing
From: Gregory Haskins <[EMAIL PROTECTED]> The overload set/clears were originally idempotent when this logic was first implemented. But that is no longer true due to the addition of the atomic counter and this logic was never updated to work properly with that change. So only adjust the overload state if it is actually changing to avoid getting out of sync. Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]> --- kernel/sched_rt.c |8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) Index: linux-sched/kernel/sched_rt.c === --- linux-sched.orig/kernel/sched_rt.c 2007-12-10 20:39:17.0 -0500 +++ linux-sched/kernel/sched_rt.c 2007-12-10 20:39:19.0 -0500 @@ -34,9 +34,11 @@ static inline void rt_clear_overload(str static void update_rt_migration(struct rq *rq) { if (rq->rt.rt_nr_migratory && (rq->rt.rt_nr_running > 1)) { - rt_set_overload(rq); - rq->rt.overloaded = 1; - } else { + if (!rq->rt.overloaded) { + rt_set_overload(rq); + rq->rt.overloaded = 1; + } + } else if (rq->rt.overloaded) { rt_clear_overload(rq); rq->rt.overloaded = 0; } -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][SCSI] hptiop: add more adapter models and other fixes
Matthew Wilcox wrote: >> - add more PCI device IDs >> - support for adapters based on Marvell IOP > > Are you sure it's a good idea to do this? This patch is 1200 lines long > ... the same size as the existing driver: > > $ wc drivers/scsi/hptiop.* > 947 2273 24531 drivers/scsi/hptiop.c > 256 612 6175 drivers/scsi/hptiop.h > 1203 2885 30706 total > > That suggests to me there's not much commonality between the two drivers, > and you'd be better off adding a second driver for the 4xxx cards The new adapter implementation adds to the driver about 300 lines of code (some lines in the original driver was changed slightly to accommodate the difference). It is only different from the original models on the messaging interface, and still shares same firmware command block structures and work flow. HighPoint Linux Team -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[DOC][for -mm] update Documentation/controller/memory.txt
Balbir-san, could you review this update ? -- Documentation updates for memory controller. Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> Index: linux-2.6.24-rc4-mm1/Documentation/controllers/memory.txt === --- linux-2.6.24-rc4-mm1.orig/Documentation/controllers/memory.txt +++ linux-2.6.24-rc4-mm1/Documentation/controllers/memory.txt @@ -9,8 +9,7 @@ d. Provides a double LRU: global memory global LRU; a cgroup on hitting a limit, reclaims from the per cgroup LRU -NOTE: Page Cache (unmapped) also includes Swap Cache pages as a subset -and will not be referred to explicitly in the rest of the documentation. +NOTE: Swap Cache (unmapped) is not accounted now. Benefits and Purpose of the memory controller @@ -144,7 +143,7 @@ list. The memory controller uses the following hierarchy 1. zone->lru_lock is used for selecting pages to be isolated -2. mem->lru_lock protects the per cgroup LRU +2. mem->per_zone->lru_lock protects the per cgroup LRU (per zone) 3. lock_page_cgroup() is used to protect page->page_cgroup 3. User Interface @@ -193,6 +192,15 @@ this file after a write to guarantee the The memory.failcnt field gives the number of times that the cgroup limit was exceeded. +The memory.stat file gives accounting information. Now, the number of +caches, RSS and Active pages/Inactive pages are shown. + +The memory.force_empty gives an interface to drop *all* charges by force. + +# echo -n 1 > memory.force_empty + +will drop all charges in cgroup. Currently, this is maintained for test. + 4. Testing Balbir posted lmbench, AIM9, LTP and vmmstress results [10] and [11]. @@ -222,11 +230,8 @@ reclaimed. A cgroup can be removed by rmdir, but as discussed in sections 4.1 and 4.2, a cgroup might have some charge associated with it, even though all -tasks have migrated away from it. If some pages are still left, after following -the steps listed in sections 4.1 and 4.2, check the Swap Cache usage in -/proc/meminfo to see if the Swap Cache usage is showing up in the -cgroups memory.usage_in_bytes counter. A simple test of swapoff -a and -swapon -a should free any pending Swap Cache usage. +tasks have migrated away from it. Such charges are automatically dropped at +rmdir() if there are no tasks. 4.4 Choosing what to account -- Page Cache (unmapped) vs RSS (mapped)? @@ -238,15 +243,11 @@ echo -n 1 > memory.control_type 5. TODO 1. Add support for accounting huge pages (as a separate controller) -2. Improve the user interface to accept/display memory limits in KB or MB - rather than pages (since page sizes can differ across platforms/machines). -3. Make cgroup lists per-zone -4. Make per-cgroup scanner reclaim not-shared pages first -5. Teach controller to account for shared-pages -6. Start reclamation when the limit is lowered -7. Start reclamation in the background when the limit is +2. Make per-cgroup scanner reclaim not-shared pages first +3. Teach controller to account for shared-pages +4. Start reclamation when the limit is lowered +5. Start reclamation in the background when the limit is not yet hit but the usage is getting closer -8. Create per zone LRU lists per cgroup Summary -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/4 -mm] kexec based hibernation -v7 : kexec jump
"Huang, Ying" <[EMAIL PROTECTED]> writes: > This patch implements the functionality of jumping between the kexeced > kernel and the original kernel. > > To support jumping between two kernels, before jumping to (executing) > the new kernel and jumping back to the original kernel, the devices > are put into quiescent state, and the state of devices and CPU is > saved. After jumping back from kexeced kernel and jumping to the new > kernel, the state of devices and CPU are restored accordingly. The > devices/CPU state save/restore code of software suspend is called to > implement corresponding function. > > To support jumping without reserving memory. One shadow backup page > (source page) is allocated for each page used by new (kexeced) kernel > (destination page). When do kexec_load, the image of new kernel is > loaded into source pages, and before executing, the destination pages > and the source pages are swapped, so the contents of destination pages > are backupped. Before jumping to the new (kexeced) kernel and after > jumping back to the original kernel, the destination pages and the > source pages are swapped too. > > A jump back protocol for kexec is defined and documented. It is an > extension to ordinary function calling protocol. So, the facility > provided by this patch can be used to call ordinary C function in real > mode. > > A set of flags for sys_kexec_load are added to control which state are > saved/restored before/after real mode code executing. For example, you > can specify the device state and FPU state are saved/restored > before/after real mode code executing. > > The states (exclude CPU state) save/restore code can be overridden > based on the "command" parameter of kexec jump. Because more states > need to be saved/restored by hibernating/resuming. > > Signed-off-by: Huang Ying <[EMAIL PROTECTED]> > > --- > Documentation/i386/jump_back_protocol.txt | 103 ++ > arch/powerpc/kernel/machine_kexec.c |2 > arch/ppc/kernel/machine_kexec.c |2 > arch/sh/kernel/machine_kexec.c|2 > arch/x86/kernel/machine_kexec_32.c| 88 +--- > arch/x86/kernel/machine_kexec_64.c|2 > arch/x86/kernel/relocate_kernel_32.S | 214 +++--- > include/asm-x86/kexec_32.h| 39 - > include/linux/kexec.h | 40 + > kernel/kexec.c| 188 ++ > kernel/power/Kconfig |2 > kernel/sys.c | 35 +++- > 12 files changed, 648 insertions(+), 69 deletions(-) > > --- a/arch/x86/kernel/machine_kexec_32.c > +++ b/arch/x86/kernel/machine_kexec_32.c > @@ -20,6 +20,7 @@ > #include > #include > #include > +#include > > #define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE))) > static u32 kexec_pgd[1024] PAGE_ALIGNED; > @@ -83,10 +84,14 @@ static void load_segments(void) > * reboot code buffer to allow us to avoid allocations > * later. > * > - * Currently nothing. > + * Turn off NX bit for control page. > */ > int machine_kexec_prepare(struct kimage *image) > { > + if (nx_enabled) { > + change_page_attr(image->control_code_page, 1, PAGE_KERNEL_EXEC); > + global_flush_tlb(); > + } > return 0; > } > > @@ -96,25 +101,59 @@ int machine_kexec_prepare(struct kimage > */ > void machine_kexec_cleanup(struct kimage *image) > { > + if (nx_enabled) { > + change_page_attr(image->control_code_page, 1, PAGE_KERNEL); > + global_flush_tlb(); > + } > +} > + > +void machine_kexec(struct kimage *image) > +{ > + machine_kexec_call(image, NULL, 0); > } > > /* > * Do not allocate memory (or fail in any way) in machine_kexec(). > * We are past the point of no return, committed to rebooting now. > */ > -NORET_TYPE void machine_kexec(struct kimage *image) > +int machine_kexec_vcall(struct kimage *image, unsigned long *ret, > + unsigned int argc, va_list args) > { Why do we need var arg support? Can't we do that with a shim we load from user space? > unsigned long page_list[PAGES_NR]; > void *control_page; > + asmlinkage NORET_TYPE void > + (*relocate_kernel_ptr)(unsigned long indirection_page, > +unsigned long control_page, > +unsigned long start_address, > +unsigned int has_pae) ATTRIB_NORET; > > /* Interrupts aren't acceptable while we reboot */ > local_irq_disable(); > > control_page = page_address(image->control_code_page); > - memcpy(control_page, relocate_kernel, PAGE_SIZE); > + memcpy(control_page, relocate_page, PAGE_SIZE/2); > + KCALL_MAGIC(control_page) = 0; > > + if (image->preserve_cpu) { > + unsigned int i; > + KCALL_MAGIC(control_page) = KCALL_MAGIC_NUMBER; >
Re: [PATCH 2.6.24-rc4-mm 2/2] gpiolib: add Generic IRQ support for 16-bit PCA9539 GPIO expander
On Dec 10, 2007 6:14 PM, David Brownell <[EMAIL PROTECTED]> wrote: > On Monday 10 December 2007, eric miao wrote: > > +config GPIO_PCA9539_GENERIC_IRQ > > +bool " Generic IRQ support for PCA9539" > > +depends on GPIO_PCA9539=y > > Also depends on GENERIC_HARDIRQS, right? (You should let > the Kconfig UI handle indentation, too...) > > Seems like doing this for an I2C chip ought to shake loose > some interesting review comments. :) > > > > +help > > + Say yes here to support the Generic IRQ for the PCA9539 on-chip > > + GPIO lines. > > This somewhat resembles the pcf857x chips in that it only support > pin-changed IRQs (IRQ_TYPE_EDGE_BOTH) in hardware. Some other I/O > expanders are a bit more flexible. > > - Dave > Updated as follows: >From 486724d8b2b7a668600e38807680cc3a089ad533 Mon Sep 17 00:00:00 2001 From: eric miao <[EMAIL PROTECTED]> Date: Mon, 10 Dec 2007 17:24:36 +0800 Subject: [PATCH] gpiolib: add Generic IRQ support for 16-bit PCA9539 GPIO expander This patch adds the generic IRQ support for the PCA9539 on-chip GPIOs. Note: due to the inaccessibility of the generic IRQ code within modules, this support is only available if the driver is built-in. Signed-off-by: eric miao <[EMAIL PROTECTED]> Acked-by: Ben Gardner <[EMAIL PROTECTED]> --- drivers/gpio/Kconfig | 11 +++- drivers/gpio/pca9539.c | 184 2 files changed, 194 insertions(+), 1 deletions(-) diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig index 6528fce..f897df8 100644 --- a/drivers/gpio/Kconfig +++ b/drivers/gpio/Kconfig @@ -40,7 +40,16 @@ config GPIO_PCA9539 16-bit I/O port. This driver can also be built as a module. If so, the module - will be called pca9539. + will be called pca9539. Note: the Generic IRQ support for the + chip will only be available if the driver is built-in + +config GPIO_PCA9539_GENERIC_IRQ + bool "Generic IRQ support for PCA9539" + depends on GPIO_PCA9539=y && GENERIC_HARDIRQS + help + Say yes here to support the Generic IRQ for the PCA9539 on-chip + GPIO lines. Only pin-changed IRQs (IRQ_TYPE_EDGE_BOTH) are + supported in hardware. comment "SPI GPIO expanders:" diff --git a/drivers/gpio/pca9539.c b/drivers/gpio/pca9539.c index 0a3ae6a..e736dd9 100644 --- a/drivers/gpio/pca9539.c +++ b/drivers/gpio/pca9539.c @@ -11,6 +11,9 @@ #include #include +#include +#include +#include #include #include @@ -27,9 +30,25 @@ struct pca9539_chip { unsigned gpio_start; uint16_t reg_output; uint16_t reg_direction; + uint16_t last_input; struct i2c_client *client; struct gpio_chip gpio_chip; +#ifdef CONFIG_GPIO_PCA9539_GENERIC_IRQ + /* +* Note: Generic IRQ is not accessible within module code, the IRQ +* support will thus _only_ be available if the driver is built-in +*/ + int irq;/* IRQ for the chip itself */ + int irq_start; /* starting IRQ for the on-chip GPIO lines */ + + uint16_t irq_mask; + uint16_t irq_falling_edge; + uint16_t irq_rising_edge; + + struct irq_chip irq_chip; + struct work_struct irq_work; +#endif }; static int pca9539_write_reg(struct pca9539_chip *chip, int reg, uint16_t val) @@ -152,6 +171,150 @@ static int pca9539_init_gpio(struct pca9539_chip *chip) return gpiochip_add(gc); } +#ifdef CONFIG_GPIO_PCA9539_GENERIC_IRQ +/* FIXME: change to schedule_delayed_work() here if reading out of + * registers does not reflect the actual pin levels + */ + +static void pca9539_irq_work(struct work_struct *work) +{ + struct pca9539_chip *chip; + uint16_t input, mask, rising, falling; + int ret, i; + + chip = container_of(work, struct pca9539_chip, irq_work); + + ret = pca9539_read_reg(chip, PCA9539_INPUT, ); + if (ret < 0) + return; + + mask = (input ^ chip->last_input) & chip->irq_mask; + rising = (input & mask) & chip->irq_rising_edge; + falling = (~input & mask) & chip->irq_falling_edge; + + irq_enter(); + + for (i = 0; i < NR_PCA9539_GPIOS; i++) { + if ((rising | falling) & (1u << i)) { + int irq = chip->irq_start + i; + struct irq_desc *desc; + + desc = irq_desc + irq; + desc_handle_irq(irq, desc); + } + } + + irq_exit(); + + chip->last_input = input; +} + +static void fastcall +pca9539_irq_demux(unsigned int irq, struct irq_desc *desc) +{ + struct pca9539_chip *chip = desc->handler_data; + + desc->chip->mask(chip->irq); + desc->chip->ack(chip->irq); + schedule_work(>irq_work); + desc->chip->unmask(chip->irq); +} + +static void pca9539_irq_mask(unsigned int irq) +{ + struct irq_desc *desc = irq_desc + irq; + struct pca9539_chip *chip = desc->chip_data; + +
[PATCH][for -mm] fix accounting in vmscan.c for memory controller
Without this, ALLOCSTALL and PGSCAN_DIRECT increases too much unless there is no memory shortage. against 2.6.24-rc4-mm1. -Kame == Some amount of accounting is done while page reclaiming. Now, there are 2 types of page reclaim (if memory controller is used) - global: shortage of (global) pages. - under cgroup: use up to limit. I think 2 accountings, ALLOCSTALL and DIRECT should be accounted only under global lru scan. They are accounted against memory shortage at alloc_pages(). Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> mm/vmscan.c |6 -- 1 file changed, 4 insertions(+), 2 deletions(-) Index: linux-2.6.24-rc4-mm1/mm/vmscan.c === --- linux-2.6.24-rc4-mm1.orig/mm/vmscan.c +++ linux-2.6.24-rc4-mm1/mm/vmscan.c @@ -896,8 +896,9 @@ static unsigned long shrink_inactive_lis if (current_is_kswapd()) { __count_zone_vm_events(PGSCAN_KSWAPD, zone, nr_scan); __count_vm_events(KSWAPD_STEAL, nr_freed); - } else + } else if (scan_global_lru(sc)) __count_zone_vm_events(PGSCAN_DIRECT, zone, nr_scan); + __count_zone_vm_events(PGSTEAL, zone, nr_freed); if (nr_taken == 0) @@ -1333,7 +1334,8 @@ static unsigned long do_try_to_free_page unsigned long lru_pages = 0; int i; - count_vm_event(ALLOCSTALL); + if (scan_global_lru(sc)) + count_vm_event(ALLOCSTALL); /* * mem_cgroup will not do shrink_slab. */ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 1/1] Writeback fix for concurrent large and small file writes.
From: Michael Rubin <[EMAIL PROTECTED]> Fixing a bug where writing to large files while concurrently writing to smaller ones creates a situation where writeback cannot keep up with the traffic and memory baloons until the we hit the threshold watermark. This can result in surprising latency spikes when syncing. This latency can take minutes on large memory systems. Upon request I can provide a test to reproduce this situation. The only concern I have is that this makes the wb_kupdate slightly more agressive. I am not sure it is enough to cause any problems. I think there is enough checks to throttle the background activity. Feng also the one line change that you recommended here http://marc.info/?l=linux-kernel=119629655402153=2 had no effect. Signed-off-by: Michael Rubin <[EMAIL PROTECTED]> --- Index: 2624rc3_feng/fs/fs-writeback.c === --- 2624rc3_feng.orig/fs/fs-writeback.c 2007-11-29 14:44:24.0 -0800 +++ 2624rc3_feng/fs/fs-writeback.c 2007-12-10 17:21:45.0 -0800 @@ -408,8 +408,7 @@ sync_sb_inodes(struct super_block *sb, s { const unsigned long start = jiffies;/* livelock avoidance */ - if (!wbc->for_kupdate || list_empty(>s_io)) - queue_io(sb, wbc->older_than_this); + queue_io(sb, wbc->older_than_this); while (!list_empty(>s_io)) { struct inode *inode = list_entry(sb->s_io.prev, Index: 2624rc3_feng/mm/page-writeback.c === --- 2624rc3_feng.orig/mm/page-writeback.c 2007-11-16 21:16:36.0 -0800 +++ 2624rc3_feng/mm/page-writeback.c2007-12-10 17:37:17.0 -0800 @@ -638,7 +638,7 @@ static void wb_kupdate(unsigned long arg wbc.nr_to_write = MAX_WRITEBACK_PAGES; writeback_inodes(); if (wbc.nr_to_write > 0) { - if (wbc.encountered_congestion || wbc.more_io) + if (wbc.encountered_congestion) congestion_wait(WRITE, HZ/10); else break; /* All the old data is written */ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
David Newall wrote: Exactly. You think it's 2us, but the documentation doesn't say. The _p functions are generic inasmuch as they provide an unspecified delay. Drivers which work across platforms, and which use _p, therefore have different delays on different platforms. Should the length of the delay be unimportant? I wouldn't have thought so. If it is important, does that mean that such drivers are buggy on some platforms? That the _p delay is different across platforms is actually to be expected, since it pretty much amounts to a platform delay. And yes, if it is used as a specific walltime delay that has nothing to do with the bus architecture of the system then I would classify that as a driver bug. -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
David Newall wrote: H. Peter Anvin wrote: David Newall wrote: Where did the 8us delay come from? The documentation and source is careful not to say how long the delay is. Would changing it to, say 1us, be technically wrong? Is code that requires 8us correct? I think a single ISA bus transaction is 1 µs, so two of them back to back should be 2 µs, not 8 µs... Exactly. You think it's 2us, but the documentation doesn't say. The _p functions are generic inasmuch as they provide an unspecified delay. Drivers which work across platforms, and which use _p, therefore have different delays on different platforms. Should the length of the delay be unimportant? I wouldn't have thought so. If it is important, does that mean that such drivers are buggy on some platforms? What it specifically does is it generates a delay which is proportional to the ISA/LPC clock. I really *hate* the idea that access to non-present hardware is used to generate a delay. That sucks so badly. It's worthy of a school-aged hacker, not of a world-leading operating system. It's so not best-practice that it's worst-practice. Perhaps you do, but it's the de facto standard on the platform. Every BIOS uses the same technique, because it works. *Now*, the real question is how many drivers actually need these delays. My guess is most don't at all. -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -mm 1/2] wait_task_stopped: remove unneeded delay_group_leader check
Your change looks correct to me. Thanks, Roland -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
H. Peter Anvin wrote: David Newall wrote: Where did the 8us delay come from? The documentation and source is careful not to say how long the delay is. Would changing it to, say 1us, be technically wrong? Is code that requires 8us correct? I think a single ISA bus transaction is 1 µs, so two of them back to back should be 2 µs, not 8 µs... Exactly. You think it's 2us, but the documentation doesn't say. The _p functions are generic inasmuch as they provide an unspecified delay. Drivers which work across platforms, and which use _p, therefore have different delays on different platforms. Should the length of the delay be unimportant? I wouldn't have thought so. If it is important, does that mean that such drivers are buggy on some platforms? I really *hate* the idea that access to non-present hardware is used to generate a delay. That sucks so badly. It's worthy of a school-aged hacker, not of a world-leading operating system. It's so not best-practice that it's worst-practice. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -mm 4/2] ptrace_stop: fix racy nonstop_code setting
Your change looks correct to me. Thanks, Roland -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -mm 1/2] ptrace_stop: fix the race with ptrace detach+attach
Your change looks correct to me. Thanks, Roland -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] will_become_orphaned_pgrp: we have threads
Oleg Nesterov <[EMAIL PROTECTED]> writes: > On 12/09, Eric W. Biederman wrote: >> >> Oleg below is my proof of concept patch, which really needs to be >> broken up into a whole patch series, so the changes are small >> enough we can do a thorough audit on them. Anyway take a look >> and see what you think. > > Amazing ;) > > This patch certainly needs a time for understanding, so far I have > read only the small subset, a couple of random questions. Well I think it succeeds as a proof of concept and totally fails as a production patch at this point. >> * pgrp and session fields are deprecated. >> @@ -1034,8 +1035,9 @@ struct task_struct { >> struct list_head sibling; /* linkage in my parent's children list */ >> struct task_struct *group_leader; /* threadgroup leader */ >> >> +struct pid *tid; >> /* PID/PID hash table linkage. */ >> -struct pid_link pids[PIDTYPE_MAX]; >> +struct hlist_node pids[PIDTYPE_ARRAY_MAX]; > > OK. It certainly makes sense to move PIDTYPE_PGID/SID pids from task_struct > to signal struct. > > But can't we go a bit further? With this patch pid->tasks[].first still > "points" > to leader's task_struct. Suppose we replace pid->tasks[] with pid->signals[], > so that pid->signals[].first points to signal_struct. Then we can find the > task > (group_leader) via signal->tgid. > > This means we can remove task_struct->pids, and kill transfer_pid(). We need a way to sill implement do_each_pid_task, but otherwise that should work and be a nice clean up all on it's own. >> static inline struct pid *task_tgid(struct task_struct *task) >> { >> -return task->group_leader->pids[PIDTYPE_PID].pid; >> +struct signal_struct *sig = rcu_dereference(task->signal); >> +struct pid *pid = NULL; >> +if (sig) >> +pid = sig->tgid; >> +return pid; >> } > > Hmm. This is fixable, but note that task->signal is not RCU protected, > only ->sighand. Yes. I realized that after I had sent the patch out. We do run those functions with just rcu protection sometimes so something would need to be resolved there. >> static inline int pid_alive(struct task_struct *p) >> { >> -return p->pids[PIDTYPE_PID].pid != NULL; >> +return p->signal != NULL; >> } > > (this change btw is imho good regardless, because pid_alive() currently > means "the task is not unhashed yet" anyway). Yes. >> static void __unhash_process(struct task_struct *p) >> { >> nr_threads--; >> -detach_pid(p, PIDTYPE_PID); >> if (thread_group_leader(p)) { >> detach_pid(p, PIDTYPE_PGID); >> detach_pid(p, PIDTYPE_SID); >> @@ -65,6 +64,7 @@ static void __unhash_process(struct task_struct *p) >> list_del_rcu(>tasks); >> __get_cpu_var(process_counts)--; >> } >> +detach_pid(p, PIDTYPE_PID); > > Not sure why this change is needed... To prevent the premature > detach_pid()->free_pid() ? But this doesn't looks possible, if > the task is leader, p->tid->tsk == p, and detach_pid() does This is a bit of a relic of how my patch developed. I had the "if (task->tid != tsk->signal->tgid)" check in there and was assuming the thread group id as my pid so I could clean things up properly. And it worked out nicer if the detach_pid was for PIDTYPE_PID came later as I could reuse the same logic as in de_thread. > > if (pid->tsk) // still used, don't free. > return; > >> @@ -946,6 +920,48 @@ fastcall NORET_TYPE void do_exit(long code) >> } >> >> tsk->flags |= PF_EXITING; >> +/* Transfer thread group leadership */ >> +if (thread_group_leader(tsk) && !thread_group_empty(tsk)) { > > Ah, this is racy without tasklist_lock. Suppose that the current > ->group_leader exits right now and elects us as a new leader. Hmm. I thought I was redoing that test inside of the lock. Anyway this hunk probably needs the most work as it is brand new code. >> +struct task_struct *new_leader, *t; >> +write_lock_irq(_lock); >> +for (t = next_thread(tsk); t != tsk; t = next_thread(t)) { >> +if (!(t->flags & PF_EXITING)) >> +break; >> +} >> +if (t != tsk) { >> +new_leader = t; >> + >> +new_leader->start_time = tsk->start_time; >> +task_pid(tsk)->tsk = new_leader; > > So this pid won't be freed when current does detach_pid(PIDTYPE_PID), from > now current->tid->tsk != current, so detach_pid() doesn't clear pid->tsk. > > But when it will be freed then? When new_leader does detach_pid on it. >> +transfer_pid(tsk, new_leader, PIDTYPE_PGID); >> +transfer_pid(tsk, new_leader, PIDTYPE_SID); >> +list_replace_rcu(>tasks, _leader->tasks); >> + >> +/* Update group_leader on all of the threads... */ >> +new_leader->group_leader = new_leader; >>
Re: Why does reading from /dev/urandom deplete entropy so much?
On Mon, Dec 10, 2007 at 05:35:25PM -0600, Matt Mackall wrote: > > I must have missed this. Can you please explain again? For a layman it > > looks like a paranoid application cannot read 500 Bytes from > > /dev/random without blocking if some other application has previously > > read 10 Kilobytes from /dev/urandom. > > /dev/urandom always leaves enough entropy in the input pool for > /dev/random to reseed. Thus, as long as entropy is coming in, it is > not possible for /dev/urandom readers to starve /dev/random readers. > But /dev/random readers may still block temporarily and they should > damn well expect to block if they read 500 bytes out of a 512 byte > pool. A paranoid application should only need to read ~500 bytes if it is generating a long-term RSA private key, and in that case, it would do well to use a non-blocking read, and if it can't get enough bytes, it should prompt the user to move the mouse around or bang on the keyboard. /dev/random is *not* magic where you can assume that you will always get an unlimited amount of good randomness. Applications who assume this are broken, and it has nothing to do with DOS attacks. Note that even paranoid applicatons should not be using /dev/random for session keys; again, /dev/random isn't magic, and entropy isn't unlimited. Instead, such an application should pull 16 bytes or so, and then use it to seed a cryptographic random number generator. - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ITIMER_REAL: convert to use struct pid
This looks fine to me. Thanks, Roland -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
David Newall wrote: Where did the 8us delay come from? The documentation and source is careful not to say how long the delay is. Would changing it to, say 1us, be technically wrong? Is code that requires 8us correct? I think a single ISA bus transaction is 1 µs, so two of them back to back should be 2 µs, not 8 µs... -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Updates to nfsroot documentation (take 3)
The difference between ip=off and ip=::off has been a cause of much confusion. Document how each behaves, and do not contradict ourselves by saying that "off" is the default when in fact "any" is the default and is descibed as being so lower in the file. Signed-off-by: Amos Waterland <[EMAIL PROTECTED]> Documentation/nfsroot.txt | 12 +--- net/ipv4/ipconfig.c | 20 +--- 2 files changed, 10 insertions(+), 22 deletions(-) diff --git a/Documentation/nfsroot.txt b/Documentation/nfsroot.txt index 16a7cae..0e87890 100644 --- a/Documentation/nfsroot.txt +++ b/Documentation/nfsroot.txt @@ -92,8 +92,14 @@ ip=:: autoconfiguration. The parameter can appear alone as the value to the `ip' - parameter (without all the ':' characters before) in which case auto- - configuration is used. + parameter (without all the ':' characters before). If the value is + "ip=off" or "ip=none", no autoconfiguration will take place, otherwise + autoconfiguration will take place. The most common way to use this + is "ip=dhcp". + + Note that "ip=off" is not the same thing as "ip=::off", because in + the latter autoconfiguration will take place if any of DHCP, BOOTP or RARP + are compiled in the kernel. IP address of the client. @@ -142,7 +148,7 @@ ip=:: into the kernel will be used, regardless of the value of this option. - off or none: don't use autoconfiguration (default) + off or none: don't use autoconfiguration on or any: use any protocol available in the kernel dhcp:use DHCP bootp: use BOOTP diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c index c5c107a..96400b0 100644 --- a/net/ipv4/ipconfig.c +++ b/net/ipv4/ipconfig.c @@ -1396,25 +1396,7 @@ late_initcall(ip_auto_config); /* * Decode any IP configuration options in the "ip=" or "nfsaddrs=" kernel - * command line parameter. It consists of option fields separated by colons in - * the following order: - * - * :: - * - * Any of the fields can be empty which means to use a default value: - * - address given by BOOTP or RARP - * - address of host returning BOOTP or RARP packet - * - none, or the address returned by BOOTP - *- automatically determined from , or the - * one returned by BOOTP - * - in ASCII notation, or the name returned - * by BOOTP - * - use all available devices - * : - *off|none - don't do autoconfig at all (DEFAULT) - *on|any - use any configured protocol - *dhcp|bootp|rarp - use only the specified protocol - *both - use both BOOTP and RARP (not DHCP) + * command line parameter. See Documentation/nfsroot.txt. */ static int __init ic_proto_name(char *name) { -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -mm 3/3] ptrace_check_attach: remove unneeded ->signal != NULL check
This looks fine to me. Thanks, Roland -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu
> Sorry to reply to myself, but do we have consensus on this patch? I'd like to > figure out its disposition if possible. What the patch tries to do looks like the right thing. So if we can get a version that is clean and actually works we should merge it. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -mm 2/3] kill my_ptrace_child()
This looks fine to me. Thanks, Roland -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -mm 1/3] kill PT_ATTACHED
Starting to catch up on some old patch review today. This one has my ACK. Thanks, Roland -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/3] Fix use of skb after netif_rx
From: Julia Lawall <[EMAIL PROTECTED]> Date: Sun, 9 Dec 2007 21:03:55 +0100 (CET) > From: Julia Lawall <[EMAIL PROTECTED]> > > Recently, Wang Chen submitted a patch > (d30f53aeb31d453a5230f526bea592af07944564) to move a call to netif_rx(skb) > after a subsequent reference to skb, because netif_rx may call kfree_skb on > its argument. The same problem occurs in some other drivers as well. > > This was found using the following semantic match. > (http://www.emn.fr/x-info/coccinelle/) ... > Signed-off-by: Julia Lawall <[EMAIL PROTECTED]> Also applied, thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/3] Fix use of skb after netif_rx
From: Julia Lawall <[EMAIL PROTECTED]> Date: Sun, 9 Dec 2007 21:05:30 +0100 (CET) > From: Julia Lawall <[EMAIL PROTECTED]> > > Recently, Wang Chen submitted a patch > (d30f53aeb31d453a5230f526bea592af07944564) to move a call to netif_rx(skb) > after a subsequent reference to skb, because netif_rx may call kfree_skb on > its argument. netif_rx_ni calls netif_rx, so the same problem occurs in > the files below. > > I have left the updating of dev->last_rx after the calls to netif_rx_ni > because it seems time dependent, but moved the other field updates before. > > This was found using the following semantic match. > (http://www.emn.fr/x-info/coccinelle/) ... > Signed-off-by: Julia Lawall <[EMAIL PROTECTED]> Applied. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] Fix use of skb after netif_rx
From: Julia Lawall <[EMAIL PROTECTED]> Date: Sun, 9 Dec 2007 21:02:31 +0100 (CET) > From: Julia Lawall <[EMAIL PROTECTED]> > > Recently, Wang Chen submitted a patch > (d30f53aeb31d453a5230f526bea592af07944564) to move a call to netif_rx(skb) > after a subsequent reference to skb, because netif_rx may call kfree_skb on > its argument. The same problem occurs in some other drivers as well. > > This was found using the following semantic match. > (http://www.emn.fr/x-info/coccinelle/) ... > Signed-off-by: Julia Lawall <[EMAIL PROTECTED]> Patch applied, thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu
Neil Horman <[EMAIL PROTECTED]> writes: > On Fri, Dec 07, 2007 at 09:21:44AM -0500, Neil Horman wrote: >> On Fri, Dec 07, 2007 at 01:22:04AM -0800, Yinghai Lu wrote: >> > On Dec 7, 2007 12:50 AM, Yinghai Lu <[EMAIL PROTECTED]> wrote: >> > > >> > > On Dec 6, 2007 4:33 PM, Eric W. Biederman <[EMAIL PROTECTED]> wrote: >> > ... >> > > > >> > > > My feel is that if it is for legacy interrupts only it should not be a > problem. >> > > > Let's investigate and see if we can unconditionally enable this quirk >> > > > for all opteron systems. >> > > >> > > i checked that bit >> > > >> > > > http://www.openbios.org/viewvc/trunk/LinuxBIOSv2/src/northbridge/amd/amdk8/coherent_ht.c?revision=2596=markup > >> > >> > it should be bit 18 (HTTC_APIC_EXT_ID) >> > >> > >> > YH >> >> this seems reasonable, I can reroll the patch for this. As I think about it > I'm >> also going to update the patch to make this check occur for any pci class >> 0600 >> device from vendor AMD, since its possible that more than just nvidia >> chipsets >> can be affected. >> >> I'll repost as soon as I've tested, thanks! >> Neil > > > Ok, New patch attached. It preforms the same function as previously > described, > but is more restricted in its application. As Yinghai pointed out, the > broadcast mask bit (bit 17 in the htcfg register) should only be enabled, if > the > extened apic id bit (bit 18 in the same register) is also set. So this patch > now check for that bit to be turned on first. Also, this patch now adds an > independent quirk check for all AMD hypertransport host controllers, since its > possible for this misconfiguration to be present in systems other than > nvidias. > The net effect of these changes is, that its now applicable to all AMD systems > containing hypertransport busses, and is only activated if extended apic ids > are > in use, meaning that this quirk guarantees that all processors in a system are > elligible to receive interrupts from the ioapic, even if their apicid extends > beyond the nominal 4 bit limitation. Tested successfully by me. > > Thanks & Regards > Neil > > Signed-off-by: Neil Horman <[EMAIL PROTECTED]> > > > early-quirks.c | 83 - > 1 file changed, 76 insertions(+), 7 deletions(-) > > > > diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c > index 88bb83e..d5a7b30 100644 > --- a/arch/x86/kernel/early-quirks.c > +++ b/arch/x86/kernel/early-quirks.c > @@ -44,6 +44,50 @@ static int __init nvidia_hpet_check(struct > acpi_table_header > *header) > #endif /* CONFIG_X86_IO_APIC */ > #endif /* CONFIG_ACPI */ > > +static void __init fix_hypertransport_config(int num, int slot, int func) > +{ > + u32 htcfg; > + /* > + *we found a hypertransport bus > + *make sure that are broadcasting > + *interrupts to all cpus on the ht bus > + *if we're using extended apic ids > + */ > + htcfg = read_pci_config(num, slot, func, 0x68); > + if ((htcfg & (1 << 18)) == 1) { Ok. This test is broken. Please remove the == 1. You are looking for == (1 << 18). So just saying: "if (htcfg & (1 << 18))" should be clearer. > + printk(KERN_INFO "Detected use of extended apic ids on hypertransport > bus\n"); > + if ((htcfg & (1 << 17)) == 0) { > + printk(KERN_INFO "Enabling hypertransport extended apic interrupt > broadcast\n"); > + htcfg |= (1 << 17); > + write_pci_config(num, slot, func, 0x68, htcfg); > + } > + } > + > +} The rest of this quirk looks fine, include the fact it is only intended to be applied to PCI_VENDOR_ID_AMD PCI_DEVICE_ID_AMD_K8_NB. For what is below I don't like the way the infrastructure has been extended as what you are doing quickly devolves into a big mess. Please extend struct chipset to be something like: struct chipset { u16 vendor; u16 device; u32 class, class_mask; void (*f)(void); }; And then the test for matching the chipset can be something like: if ((id->vendor == PCI_ANY_ID || id->vendor == dev->vendor) && (id->device == PCI_ANY_ID || id->device == dev->device) && !((id->class ^ dev->class) & id->class_mask)) Essentially a subset of pci_match_one_device from drivers/pci/pci.h That way you don't need to increase the number of tables or the number of passes through the pci busses, just update the early_qrk table with a few more bits of information. The extended form should be much more maintainable in the long run. Given that we may want this before we enable the timer which is very early doing this in the pci early quirks seems to make sense. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
Where did the 8us delay come from? The documentation and source is careful not to say how long the delay is. Would changing it to, say 1us, be technically wrong? Is code that requires 8us correct? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
On Tue, 11 Dec 2007 01:01:25 +0100 Guillaume Chazarain <[EMAIL PROTECTED]> wrote: > Arjan van de Ven <[EMAIL PROTECTED]> wrote: > > > the frequency of both cores is the maximum of what linux sets each > > core to; > > Do you mean that the cpufreq code can be confused about the actual > frequency of the cores? it means that cpufreq doesn't know the actual frequency (although bios sometimes tells us about the relationship, often the bios just lies through it's teeth); it only knows what it asks for, not what it gets. We know it'll get at least what it asks for, but it can get more than it asks for basically. >That sounds like a big problem. it'll get way worse going forward. (but even on todays systems, the tsc no longer represents frequency, but is some fixed clock totally unrelated to cpu frequency) -- If you want to reach me at my work email, use [EMAIL PROTECTED] For development, discussion and tips for power savings, visit http://www.lesswatts.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: help
Thanos Chatziathanassiou wrote: help I KNOW OF PLACES, ACTIONS, AND THINGS. MOST OF MY VOCABULARY DESCRIBES PLACES AND IS USED TO MOVE YOU THERE. TO MOVE TRY WORDS LIKE FOREST, BUILDING, DOWNSTREAM, ENTER, EAST, WEST NORTH, SOUTH, UP, OR DOWN. I KNOW ABOUT A FEW SPECIAL OBJECTS, LIKE A BLACK ROD HIDDEN IN THE CAVE. THESE OBJECTS CAN BE MANIPULATED USING ONE OF THE ACTION WORDS THAT I KNOW. USUALLY YOU WILL NEED TO GIVE BOTH THE OBJECT AND ACTION WORDS (IN EITHER ORDER), BUT SOMETIMES I CAN INFER THE OBJECT FROM THE VERB ALONE. THE OBJECTS HAVE SIDE EFFECTS - FOR INSTANCE, THE ROD SCARES THE BIRD. USUALLY PEOPLE HAVING TROUBLE MOVING JUST NEED TO TRY A FEW MORE WORDS. USUALLY PEOPLE TRYING TO MANIPULATE AN OBJECT ARE ATTEMPTING SOMETHING BEYOND THEIR (OR MY!) CAPABILITIES AND SHOULD TRY A COMPLETELY DIFFERENT TACK. TO SPEED THE GAME YOU CAN SOMETIMES MOVE LONG DISTANCES WITH A SINGLE WORD. FOR EXAMPLE, 'BUILDING' USUALLY GETS YOU TO THE BUILDING FROM ANYWHERE ABOVE GROUND EXCEPT WHEN LOST IN THE FOREST. ALSO, NOTE THAT CAVE PASSAGES TURN A LOT, AND THAT LEAVING A ROOM TO THE NORTH DOES NOT GUARANTEE ENTERING THE NEXT FROM THE SOUTH. GOOD LUCK! -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/4 -mm] kexec based hibernation -v7 : kexec jump
On Mon, 2007-12-10 at 17:31 -0500, Vivek Goyal wrote: > [..] > > > > -#define KEXEC_ON_CRASH 0x0001 > > -#define KEXEC_ARCH_MASK 0x > > +#define KEXEC_ON_CRASH 0x0001 > > +#define KEXEC_PRESERVE_CPU 0x0002 > > +#define KEXEC_PRESERVE_CPU_EXT 0x0004 > > +#define KEXEC_SINGLE_CPU 0x0008 > > +#define KEXEC_PRESERVE_DEVICE 0x0010 > > +#define KEXEC_PRESERVE_CONSOLE 0x0020 > > Hi, > > Why do we need so many different flags for preserving different types > of state (CPU, CPU_EXT, Device, console) ? To keep things simple, > can't we can create just one flag KEXEC_PRESERVE_CONTEXT, which will > indicate any special action required for preserving the previous kernel's > context so that one can swith back to old kernel? Yes. There are too many flags, especially when we have no users of these flags now. It is better to use one flag such as KEXEC_PRESERVE_CONTEXT now, and create the others required flags when really needed. Best Regards, Huang Ying -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/4 -mm] kexec based hibernation -v7 : kexec jump
On Mon, 2007-12-10 at 14:55 -0500, Vivek Goyal wrote: > On Fri, Dec 07, 2007 at 03:53:30PM +, Huang, Ying wrote: > > This patch implements the functionality of jumping between the kexeced > > kernel and the original kernel. > > > > Hi, > > I am just going through your patches and trying to understand it. Don't > understand many things. Asking is easy so here you go... > > > To support jumping between two kernels, before jumping to (executing) > > the new kernel and jumping back to the original kernel, the devices > > are put into quiescent state, and the state of devices and CPU is > > saved. After jumping back from kexeced kernel and jumping to the new > > kernel, the state of devices and CPU are restored accordingly. The > > devices/CPU state save/restore code of software suspend is called to > > implement corresponding function. > > > > I need jumping back to restore a already hibernated kernel image? Can > you please tell little more about jumping back and why it is needed? Now, the jumping back is used to implement "kexec based hibernation", which uses kexec/kdump to save the memory image of hibernated kernel during hibernating, and uses /dev/oldmem to restore the memory image of hibernated kernel and jump back to the hibernated kernel to continue run. The other usage model maybe include: - Dump the system memory image then continue to run, that is, get some memory snapshot of system during system running. - Cooperative multi-task of different OS. You can load another OS (B) from current OS (A), and jump between the two OSes upon needed. - Call some code (such as firmware, etc) in physical mode. > > To support jumping without reserving memory. One shadow backup page > > (source page) is allocated for each page used by new (kexeced) kernel > > (destination page). When do kexec_load, the image of new kernel is > > loaded into source pages, and before executing, the destination pages > > and the source pages are swapped, so the contents of destination pages > > are backupped. Before jumping to the new (kexeced) kernel and after > > jumping back to the original kernel, the destination pages and the > > source pages are swapped too. > > > > Ok, so due to swapping of source and destination pages first kernel's data > is still preserved. How do I get the dynamic memory required for second > kernel boot (without writing first kernel's data)? All dynamic memory required for second kernel should be "loaded" by sys_kexec_load in first kernel. For example, not only the Linux kernel should be loaded at 1M, the memory 0~16M (exclude kernel) should be "loaded" (all zero) by /sbin/kexec via sys_kexec_load too. > > A jump back protocol for kexec is defined and documented. It is an > > extension to ordinary function calling protocol. So, the facility > > provided by this patch can be used to call ordinary C function in real > > mode. > > > > A set of flags for sys_kexec_load are added to control which state are > > saved/restored before/after real mode code executing. For example, you > > can specify the device state and FPU state are saved/restored > > before/after real mode code executing. > > > > The states (exclude CPU state) save/restore code can be overridden > > based on the "command" parameter of kexec jump. Because more states > > need to be saved/restored by hibernating/resuming. > > > > Signed-off-by: Huang Ying <[EMAIL PROTECTED]> > > > > --- > > Documentation/i386/jump_back_protocol.txt | 103 ++ > > arch/powerpc/kernel/machine_kexec.c |2 > > arch/ppc/kernel/machine_kexec.c |2 > > arch/sh/kernel/machine_kexec.c|2 > > arch/x86/kernel/machine_kexec_32.c| 88 +--- > > arch/x86/kernel/machine_kexec_64.c|2 > > arch/x86/kernel/relocate_kernel_32.S | 214 > > +++--- > > include/asm-x86/kexec_32.h| 39 - > > include/linux/kexec.h | 40 + > > kernel/kexec.c| 188 ++ > > kernel/power/Kconfig |2 > > kernel/sys.c | 35 +++- > > 12 files changed, 648 insertions(+), 69 deletions(-) > > > > --- a/arch/x86/kernel/machine_kexec_32.c > > +++ b/arch/x86/kernel/machine_kexec_32.c > > @@ -20,6 +20,7 @@ > > #include > > #include > > #include > > +#include > > > > #define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE))) > > static u32 kexec_pgd[1024] PAGE_ALIGNED; > > @@ -83,10 +84,14 @@ static void load_segments(void) > > * reboot code buffer to allow us to avoid allocations > > * later. > > * > > - * Currently nothing. > > + * Turn off NX bit for control page. > > */ > > int machine_kexec_prepare(struct kimage *image) > > { > > + if (nx_enabled) { > > + change_page_attr(image->control_code_page, 1, PAGE_KERNEL_EXEC); > > + global_flush_tlb(); > > + } > > return 0; > > }
Re: [RFC,PATCH 3/3] do_wait: fix waiting for stopped group with dead leader
Oleg Nesterov <[EMAIL PROTECTED]> writes: > do_wait(WSTOPPED) assumes that p->state must be == TASK_STOPPED, this is not > true if the leader is already dead. Check SIGNAL_STOP_STOPPED instead and use > ->signal->group_exit_code. > > This patch is not complete if not buggy. At the very minimum it needs cleanup. Thinking about this set of problems. Testing SIGNAL_STOP_STOPPED seems more correct then testing TASK_STOPPED. It ensures we don't have a race, and except for ptrace the only way to stop a task triggers SIGNAL_STOP_STOPPED. We need a similar flag for thread group exit, to mark when every task in the thread group has exited. With those in place we can have race free tests of our status. /proc//status needs to be updated to use those the per signal struct status bits as well. As for the exit_code, we set tsk->exit_code = sig->group_exit_code so that doesn't seem to be a problem either. So to get a task group status looking at bits on the signal struct looks like the right approach, as this ensures we can avoid races in setting the status, and we don't need to test a dozen other fields. There is still some value in my other approach but even it will have small races if we continue look at per task status bits when what we want is a per thread group status. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] i386: fix a few paravirt-related modpost warnings
Jan Beulich wrote: > Signed-off-by: Jan Beulich <[EMAIL PROTECTED]> > Acked-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]> > arch/x86/kernel/head_32.S |2 +- > arch/x86/xen/setup.c |2 +- > arch/x86/xen/xen-head.S |2 +- > 3 files changed, 3 insertions(+), 3 deletions(-) > > --- linux-2.6.24-rc4/arch/x86/kernel/head_32.S2007-12-07 > 09:00:59.0 +0100 > +++ 2.6.24-rc4-i386-lguest-warning/arch/x86/kernel/head_32.S 2007-12-05 > 18:30:33.0 +0100 > @@ -151,7 +151,7 @@ WEAK(xen_entry) > /* Unknown implementation; there's really > nothing we can do at this point. */ > ud2a > -.data > +.section .init.data, "aw" > subarch_entries: > .long default_entry /* normal x86/PC */ > .long lguest_entry /* lguest hypervisor */ > --- linux-2.6.24-rc4/arch/x86/xen/setup.c 2007-12-07 09:01:00.0 > +0100 > +++ 2.6.24-rc4-i386-lguest-warning/arch/x86/xen/setup.c 2007-12-10 > 17:31:06.0 +0100 > @@ -59,7 +59,7 @@ static void xen_idle(void) > /* > * Set the bit indicating "nosegneg" library variants should be used. > */ > -static void fiddle_vdso(void) > +static __init void fiddle_vdso(void) > { > extern u32 VDSO_NOTE_MASK; /* See ../kernel/vsyscall-note.S. */ > extern char vsyscall_int80_start; > --- linux-2.6.24-rc4/arch/x86/xen/xen-head.S 2007-12-07 09:01:00.0 > +0100 > +++ 2.6.24-rc4-i386-lguest-warning/arch/x86/xen/xen-head.S2007-12-10 > 17:25:46.0 +0100 > @@ -7,7 +7,7 @@ > #include > #include > > -.pushsection .init.text > +.pushsection .init.text, "ax" > ENTRY(startup_xen) > movl %esi,xen_start_info > cld > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RFC: outb 0x80 in inb_p, outb_p harmful on some modern AMD64 with MCP51 laptops
Rene Herman wrote: By the way, David, it would be interesting if you could test 0xed. If your problem is some piece of hardware getting upset at LPC bus aborts it's not going to matter and we'd know an outb delay is just not an option on your system at least. You said you could quickly reproduce the problem with port 0x80? I tried 0xED for a few versions (1.31-1.37) of SYSLINUX. It broke on a lot of hardware (Phoenix BIOS uses 0xED by default, but BIOSes don't have to work on arbitrary hardware.) -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Please revert: PCI: fix IDE legacy mode resources
> The GT-64111 system controller doesn't provide any kind of mapping > functionality that would help here. So legacy port addressing can only > work by exploiting aliases due to incomplete decoding of legacy ioport > addreses by the VT82C586 - but direct addressing is impossible. Ok, that explains how the "fix" that we reverted worked. It caused crap to be added to the top bits of the address :-) So here, what you really want to do is not a call to pcibios_resource_to_bus(), but you actually want to use a different bus address in the first place, that you know the HW will decode the same way. The best way to achieve that imho, is to do a header quirk that is run just after the generic probe code, which offsets the fixed legacy resources by 0x1000 since that's really the bus address you are going to emit. Later on, your pcibios_fixup code should take that remove 0x1000 from all IO resources, since your 0xd000 mapping already maps 0x1000 as you probably already do. The trick is, you don't want to convert a "resource" into a "bus address" here, but really issue a different bus address. Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Iomega ZIP-100 drive unsupported with jmicron JMB361 chip?
(linux-ide cc'ed) trash can wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 I have tolerated this problem for a year and do not post to this list in haste. I have posted on forums and searched the community over the past year. I have looked at the list archive on gossamer-threads.com for solutions. With Fedora Core 6 unsupported (the last kernel for which my zip drive worked), it is time for my last attempt at a solution. Please CC: any response as I have not joined the list. I have compiled a kernel-debug RPM and can run this if its output would help. Thank you for any time you might devote to this problem. motherboard: MSI P965 Platinum/Intel P965 Express Chipset Based (MS-7238 series) Fedora 8 : kernel 2.6.23.1-42.fc8 Iomega Zip drive internal Model Z100ATAPI lspci 03:00.0 SATA controller: JMicron Technologies, Inc. JMB361 AHCI/IDE (rev 02) 03:00.1 IDE interface: JMicron Technologies, Inc. JMB361 AHCI/IDE (rev 02) # lsmod | grep ata pata_jmicron8257 0 ata_generic 8901 0 ata_piix 16709 0 libata 99633 4 ahci,pata_jmicron,ata_generic,ata_piix scsi_mod 119757 4 sr_mod,sg,libata,sd_mod I have recently changed the BIOS setting for the SATA#1 Controller from [IDE] to [AHCI] with no effect. I assume AHCI is correct? AHCI is better, yes. It shouldn't be relevant this this problem though. Text below attached as text.txt for readability. from dmesg: libata version 2.21 loaded. device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: [EMAIL PROTECTED] PCI: Enabling device :03:00.1 ( -> 0001) ACPI: PCI Interrupt :03:00.1[B] -> GSI 17 (level, low) -> IRQ 17 PCI: Setting latency timer of device :03:00.1 to 64 scsi0 : pata_jmicron scsi1 : pata_jmicron ata1: PATA max UDMA/100 cmd 0x0001cc00 ctl 0x0001c882 bmdma 0x0001c400 irq 17 ata2: PATA max UDMA/100 cmd 0x0001c800 ctl 0x0001c482 bmdma 0x0001c408 irq 17 ata1.00: ATAPI: LITE-ON DVDRW SOHW-1693S, KS0B, max UDMA/66 ata1.01: ATAPI: IOMEGA ZIP 100 ATAPI, 05.H, max MWDMA1, CDB intr ata1.00: configured for UDMA/66 ata1.01: configured for MWDMA1 scsi 0:0:0:0: CD-ROMLITE-ON DVDRW SOHW-1693S KS0B PQ: 0 ANSI: 5 scsi 0:0:1:0: Direct-Access IOMEGA ZIP 100 05.H PQ: 0 ANSI: 5 sd 0:0:1:0: [sda] 196608 512-byte hardware sectors (101 MB) sd 0:0:1:0: [sda] Write Protect is off sd 0:0:1:0: [sda] Mode Sense: 00 40 00 00 sd 0:0:1:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sd 0:0:1:0: [sda] 196608 512-byte hardware sectors (101 MB) sd 0:0:1:0: [sda] Write Protect is off sd 0:0:1:0: [sda] Mode Sense: 00 40 00 00 sd 0:0:1:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sda:<6>sd 0:0:1:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK sd 0:0:1:0: [sda] Sense Key : Hardware Error [current] sd 0:0:1:0: [sda] Add. Sense: Scsi parity error end_request: I/O error, dev sda, sector 0 Buffer I/O error on device sda, logical block 0 If a disk is inserted into the drive (/var/log/messages) Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] Spinning up disk.<5>sd 0:0:1:0: [sda] Spinning up diskready Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] 196608 512-byte hardware sectors (101 MB) Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] Write Protect is off Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] 196608 512-byte hardware sectors (101 MB) Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] Write Protect is off Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA Dec 10 14:22:53 localhost kernel: sda:<6>sd 0:0:1:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] Sense Key : Hardware Error [current] Dec 10 14:22:53 localhost kernel: sd 0:0:1:0: [sda] Add. Sense: Scsi parity error Dec 10 14:22:53 localhost kernel: end_request: I/O error, dev sda, sector 0 Dec 10 14:22:53 localhost kernel: printk: 42 messages suppressed. Dec 10 14:22:53 localhost kernel: Buffer I/O error on device sda, logical block 0 That is rather curious. There's no sign of any libata error handling going on.. Maybe the drive is actually returning that error code in the ATAPI CDB, or at least we think it is? You are sure that this drive still works with older kernels using drivers/ide, and that the hardware didn't break at some point, I assume? -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at
Re: [PATCH 2/3] arch/ : Platform changes for UCC TDM driver for MPC8323ERDB.Also includes related QE changes.
On Mon, 10 Dec 2007 17:39:22 +0530 (IST) Poonam_Aggrwal-b10812 <[EMAIL PROTECTED]> wrote: > > +++ b/arch/powerpc/sysdev/qe_lib/qe.c > @@ -149,22 +149,116 @@ EXPORT_SYMBOL(qe_issue_cmd); > */ > static unsigned int brg_clk = 0; > > -unsigned int get_brg_clk(void) > +u32 get_brg_clk(enum qe_clock brgclk, enum qe_clock *brg_source) > { > - struct device_node *qe; > - if (brg_clk) > - return brg_clk; > + struct device_node *qe, *brg, *clocks; > + enum qe_clock brg_src; > + u32 brg_input_freq = 0; > + u32 brg_num; > + const unsigned int *prop; > > - qe = of_find_node_by_type(NULL, "qe"); > - if (qe) { > + *brg_source = 0; > + > + brg_num = brgclk - QE_BRG1; > + brg = of_find_compatible_node(NULL, NULL, "fsl,cpm-brg"); > + if (brg) { > unsigned int size; > - const u32 *prop = of_get_property(qe, "brg-frequency", ); > - brg_clk = *prop; > - of_node_put(qe); > - }; > + prop = of_get_property(brg, > + "fsl,brg-sources", ); > + > + brg_src = *(prop + brg_num); You should probably sanity check that prop is not NULL and points to something large enough. You don't use brg after here, so the "of_node_put(brg)" could go here to save putting it in multiple places later. Also, currently there are paths through the following code that do not do the of_node_put(brg). > + if (brg_src == 0) { > + *brg_source = 0; > + if (brg_clk > 0) { > + of_node_put(brg); > + return brg_clk; > + } > + qe = of_find_node_by_type(NULL, "qe"); > + if (qe) { > + unsigned int size; > + prop = of_get_property > + (qe, "brg-frequency", ); > + of_node_put(qe); > + of_node_put(brg); > + return *prop; NULL check here (yes, I know that the old code didn't check). > + } > + } else { > + *brg_source = brg_src + QE_CLK1 - 1; > + clocks = of_find_compatible_node(NULL, NULL, > + "fsl,cpm-clocks"); > + prop = of_get_property(clocks, > + "#clock-cells", ); > + /* > + * clock-cells = 1 only supported right now. > + */ > + if (*prop != 1) Again check for NULL (and possibly size). > + return 0; > + prop = of_get_property(clocks, > + "clock-frequency", ); > + > + brg_input_freq = *(prop+(brg_src - 1)); And again. > + of_node_put(clocks); > + of_node_put(brg); > + return brg_input_freq; > + } > + } > return brg_clk; > } -- Cheers, Stephen Rothwell[EMAIL PROTECTED] http://www.canb.auug.org.au/~sfr/ pgp1EalSLFKWO.pgp Description: PGP signature
Re: Please revert: PCI: fix IDE legacy mode resources
On Mon, 2007-12-10 at 23:07 +, Alan Cox wrote: > > Forcing controllers into native mode tends to be something that really > > only works on -some- controllers. I'm happy to have a hack to try to do > > that on all of them on powermacs, because the range of controllers that > > might not be in native mode in the first place there is pretty small, > > and for CHRP briq, I do it for a specific known controller only. > > I'm thinking of doing this solely if the platform has > CONFIG_ATA_NO_LEGACY set. In other words we'd only try this stunt on a > system we *know* cannot address the low PCI space ports. Allright. I don't set CONFIG_ATA_NO_LEGACY on powerpc anyway, as I do support legacy ATA just fine on a range of machines. For example, Pegasos does the a quirk the other way around which is to put it back the VIA IDE into legacy mode as there are issues with the way that VIA chipset is configured on those machines. It's mostly a matter of making sure for me that the IRQ routing match what the platform code is set to deal with or that sort of thing as unfortunately, anything that involves legacy stuff is still pretty much full of hacks. Ben. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
Arjan van de Ven <[EMAIL PROTECTED]> wrote: > the frequency of both cores is the maximum of what linux sets each core to; Do you mean that the cpufreq code can be confused about the actual frequency of the cores? That sounds like a big problem. Thanks for any insight. -- Guillaume -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Please revert: PCI: fix IDE legacy mode resources
On Tue, Dec 11, 2007 at 07:43:03AM +1100, Benjamin Herrenschmidt wrote: > > > :00:09.1 IDE interface: VIA Technologies, Inc. > > > VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06) > > > (prog-if 8a [Master SecP PriP]) > > > Flags: bus master, fast Back2Back, medium devsel, latency 64 > > > I/O ports at 1820 [size=16] > > > > And that's lspci -v -b: > > > > > :00:09.1 IDE interface: VIA Technologies, Inc. > > > VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06) > > > (prog-if 8a [Master SecP PriP]) > > > Flags: bus master, fast Back2Back, medium devsel, latency 64 > > > I/O ports at 10001820 > > > > So the IDE controller already seems to be in native mode? > > > > No, native mode is 5 not A in the low 4 bits of progif. > > You need to be a bit careful about those VIA, I remember having issues > on Pegasos where we left it in legacy mode. It think the problem is that > even when switched, the IRQ routing might be done based on some other > setting in the chipset, possibly a strap. But that's nothing you can't > deal with an appropriate quirk in the arch code. > > Also, double check the level/edge setting of the interrupts as it can be > different between legacy and native (native is level low, legacy is > rising edge). > > I'm surprised however that one would use such a legacy southbridge on a > platform that can't issue low IO ports, that doesn't seem to make sense > to me ... there's a whole lot of things on this such as the 8259 PIC > etc.. that can only be addressed via low IOs, unless the ISA space can > be somewhat remapped ? The GT-64111 system controller doesn't provide any kind of mapping functionality that would help here. So legacy port addressing can only work by exploiting aliases due to incomplete decoding of legacy ioport addreses by the VT82C586 - but direct addressing is impossible. Ralf -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
Stefano Brivio <[EMAIL PROTECTED]> wrote: > Sorry for disappearing. Anyway, yes, those patches fixed it. Precision in > delays isn't that good when using my crappy unstable TSC (mdelay(2000) > causes delays between 2 and 2.9 seconds) but it's not depending on frequency > changes anymore. So I'd say it's fixed, but please tell me if you want me > to do any other test so as to be sure it is. Ingo, it seems you dropped http://lkml.org/lkml/2007/12/7/100 (cpu_clock() based udelay), so how udelay can be affected by your proposed changes? Thanks. -- Guillaume -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
On Tue, 11 Dec 2007 00:34:33 +0100 Stefano Brivio <[EMAIL PROTECTED]> wrote: > On Tue, 11 Dec 2007 00:04:25 +0100 > Ingo Molnar <[EMAIL PROTECTED]> wrote: > > > > * Ingo Molnar <[EMAIL PROTECTED]> wrote: > > > > > * Andrew Morton <[EMAIL PROTECTED]> wrote: > > > > > > > > what do you think? Right now i've got them queued up for > > > > > 2.6.25 in both the scheduler-devel and the x86-devel git > > > > > trees - but can submit them for 2.6.24 if it's better if we > > > > > did them there. I've got no strong opinion either way. > > > > > > > > printk_clock() doesn't seem terribly important but what's this > > > > stuff about effects on udelay/mdelay? That can be serious if > > > > they're getting shortened. > > > > > > since udelay depends on loops_per_jiffy, which is fixed up > > > time_cpufreq_notifier(), i dont see how it could be affected by > > > frequency changes. (but that's the theory - practice might be > > > different) > > > > Stefano Brivio reported udelay()/mdelay() effects in the b43 > > driver. (and it caused driver failures for him.) > > > > Stefano, could you please try to sum up your experiences with that > > issue? Is it reproducable, and the 5 patches i did fix it? (if yes, > > could you try to re-do the mdelay verifications perhaps, to make > > sure it's not some other effect interacting here. In theory > > sched-clock scaling has no effect on udelay behavior.) > > Sorry for disappearing. Anyway, yes, those patches fixed it. > Precision in delays isn't that good when using my crappy unstable TSC > (mdelay(2000) causes delays between 2 and 2.9 seconds) but it's not > depending on frequency changes anymore. So I'd say it's fixed, but > please tell me if you want me to do any other test so as to be sure > it is. > > I'm still quite concerned about this in dual/quad core scenarios; the frequency of both cores is the maximum of what linux sets each core to; this means that if you're THIS sensitive to that there still is quite a nasty issue there. I wonder if the various delay functions (maybe only in .25) should use the maximum observed loops_per_jiffie instead always (across cpus) to be super safe here. -- If you want to reach me at my work email, use [EMAIL PROTECTED] For development, discussion and tips for power savings, visit http://www.lesswatts.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]
--- David Howells <[EMAIL PROTECTED]> wrote: > Casey Schaufler <[EMAIL PROTECTED]> wrote: > > > That happens to me when interfaces are described in SELinux terms. I > > still don't care much for multiple contexts, and I don't have a good > > grasp of how you'll deal with Smack, or any LSM other than SELinux. > > Me neither. I understand SELinux somewhat, though it's got a lot of wibbly > bits, and WinNT's security system, but I have no experience of the other > stuff. > > > Just as Stephen mentions, I also don't see the generality that a change > > of this magnitude really ought to provide. > > Perhaps it should be a specific interface, solely for cachefiles's use then. That would help focus things, to be sure. I don't know if that focus will speed things up or slow them down, but I think that attempting to accomodate SELinux/NFS, with the state that effort is in, will only lead to tears. Casey Schaufler [EMAIL PROTECTED] -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]
--- David Howells <[EMAIL PROTECTED]> wrote: > Stephen Smalley <[EMAIL PROTECTED]> wrote: > > > From a config file whose pathname would be provided by libselinux (ala > > the way in which dbusd imports contexts), or directly as a context > > returned by a libselinux function. > > That sounds too SELinux specific. How do I do it so that it works for any > LSM? > > Is linking against libselinux is a viable option if it's not available under > all LSM models? Is it available under all LSM models? Perhaps Casey can > answer this one. Linking against libselinux is not now, nor will it ever be, a viable option. There's just too much sophistication contained in libselinux for us simple folk to deal with. > > > I use to do that, but someone objected... Possibly Karl MacMillan. > > > > Yes, but I think I disagreed then too. > > So, who's right? Me! (smiley inserted here, for those in need) > > It doesn't fit with how other users of security_kernel_act_as() will > > likely want to work (they will want to just set the context to a > > specified value, whether one obtained from the client or from some local > > source), nor with how type transitions normally work (exec, with the > > program type as the second type field). I think it will just cause > > confusion and subtle breakage. > > It's causing me lots of confusion as it is. I have been / am being told by > different people to do different things just in dealing with SELinux, and > various people are raising extra requirements or restrictions beyond that. > There doesn't seem to be a consensus. > > It sounds like the best option is just to have the kernel nick the userspace > daemon's security context and use that as is, and junk all the restrictions > on > what the daemon can do so that the kernel isn't too restricted. That would be consistant with the (perhaps archaic now) behavior of nfsd on Unix, which did nothing but "lend it's credential" to the underlying kernel code. I think it's a rational approach, although I expect that in may have troubles under SELinux. Casey Schaufler [EMAIL PROTECTED] -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]
Casey Schaufler <[EMAIL PROTECTED]> wrote: > That happens to me when interfaces are described in SELinux terms. I > still don't care much for multiple contexts, and I don't have a good > grasp of how you'll deal with Smack, or any LSM other than SELinux. Me neither. I understand SELinux somewhat, though it's got a lot of wibbly bits, and WinNT's security system, but I have no experience of the other stuff. > Just as Stephen mentions, I also don't see the generality that a change > of this magnitude really ought to provide. Perhaps it should be a specific interface, solely for cachefiles's use then. David -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
On Tue, 11 Dec 2007 00:04:25 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote: > > * Ingo Molnar <[EMAIL PROTECTED]> wrote: > > > * Andrew Morton <[EMAIL PROTECTED]> wrote: > > > > > > what do you think? Right now i've got them queued up for 2.6.25 in > > > > both the scheduler-devel and the x86-devel git trees - but can > > > > submit them for 2.6.24 if it's better if we did them there. I've got > > > > no strong opinion either way. > > > > > > printk_clock() doesn't seem terribly important but what's this stuff > > > about effects on udelay/mdelay? That can be serious if they're > > > getting shortened. > > > > since udelay depends on loops_per_jiffy, which is fixed up > > time_cpufreq_notifier(), i dont see how it could be affected by > > frequency changes. (but that's the theory - practice might be > > different) > > Stefano Brivio reported udelay()/mdelay() effects in the b43 driver. > (and it caused driver failures for him.) > > Stefano, could you please try to sum up your experiences with that > issue? Is it reproducable, and the 5 patches i did fix it? (if yes, > could you try to re-do the mdelay verifications perhaps, to make sure > it's not some other effect interacting here. In theory sched-clock > scaling has no effect on udelay behavior.) Sorry for disappearing. Anyway, yes, those patches fixed it. Precision in delays isn't that good when using my crappy unstable TSC (mdelay(2000) causes delays between 2 and 2.9 seconds) but it's not depending on frequency changes anymore. So I'd say it's fixed, but please tell me if you want me to do any other test so as to be sure it is. -- Ciao Stefano -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Why does reading from /dev/urandom deplete entropy so much?
On Tue, Dec 11, 2007 at 12:06:43AM +0100, Marc Haber wrote: > On Sun, Dec 09, 2007 at 10:16:05AM -0600, Matt Mackall wrote: > > On Sun, Dec 09, 2007 at 01:42:00PM +0100, Marc Haber wrote: > > > On Wed, Dec 05, 2007 at 03:26:47PM -0600, Matt Mackall wrote: > > > > The distinction between /dev/random and /dev/urandom boils down to one > > > > word: paranoia. If you are not paranoid enough to mistrust your > > > > network, then /dev/random IS NOT FOR YOU. Use /dev/urandom. > > > > > > But currently, people who use /dev/urandom to obtain low-quality > > > entropy do a DoS for the paranoid people. > > > > Not true, as I've already pointed out in this thread. > > I must have missed this. Can you please explain again? For a layman it > looks like a paranoid application cannot read 500 Bytes from > /dev/random without blocking if some other application has previously > read 10 Kilobytes from /dev/urandom. /dev/urandom always leaves enough entropy in the input pool for /dev/random to reseed. Thus, as long as entropy is coming in, it is not possible for /dev/urandom readers to starve /dev/random readers. But /dev/random readers may still block temporarily and they should damn well expect to block if they read 500 bytes out of a 512 byte pool. -- Mathematics is the supreme nostalgia of our time. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]
Stephen Smalley <[EMAIL PROTECTED]> wrote: > From a config file whose pathname would be provided by libselinux (ala > the way in which dbusd imports contexts), or directly as a context > returned by a libselinux function. That sounds too SELinux specific. How do I do it so that it works for any LSM? Is linking against libselinux is a viable option if it's not available under all LSM models? Is it available under all LSM models? Perhaps Casey can answer this one. > > I use to do that, but someone objected... Possibly Karl MacMillan. > > Yes, but I think I disagreed then too. So, who's right? > It doesn't fit with how other users of security_kernel_act_as() will > likely want to work (they will want to just set the context to a > specified value, whether one obtained from the client or from some local > source), nor with how type transitions normally work (exec, with the > program type as the second type field). I think it will just cause > confusion and subtle breakage. It's causing me lots of confusion as it is. I have been / am being told by different people to do different things just in dealing with SELinux, and various people are raising extra requirements or restrictions beyond that. There doesn't seem to be a consensus. It sounds like the best option is just to have the kernel nick the userspace daemon's security context and use that as is, and junk all the restrictions on what the daemon can do so that the kernel isn't too restricted. David -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RFC: PNP: do not stop/start devices in suspend/resume path
On Friday 07 December 2007 12:13:35 am Shaohua Li wrote: > On Thu, 2007-12-06 at 02:24 +0800, Bjorn Helgaas wrote: > > Index: linux-mm/drivers/pnp/driver.c > > === > > --- linux-mm.orig/drivers/pnp/driver.c 2007-11-30 13:58:25.0 > > -0700 > > +++ linux-mm/drivers/pnp/driver.c 2007-12-03 09:58:35.0 > > -0700 > > @@ -161,13 +161,6 @@ > > return error; > > } > > > > - if (!(pnp_drv->flags & PNP_DRIVER_RES_DO_NOT_CHANGE) && > > - pnp_can_disable(pnp_dev)) { > > - error = pnp_stop_dev(pnp_dev); > > - if (error) > > - return error; > > - } > > - > > if (pnp_dev->protocol && pnp_dev->protocol->suspend) > > pnp_dev->protocol->suspend(pnp_dev, state); > > return 0; > > @@ -177,7 +170,6 @@ > > { > > struct pnp_dev *pnp_dev = to_pnp_dev(dev); > > struct pnp_driver *pnp_drv = pnp_dev->driver; > > - int error; > > > > if (!pnp_drv) > > return 0; > > @@ -185,12 +177,6 @@ > > if (pnp_dev->protocol && pnp_dev->protocol->resume) > > pnp_dev->protocol->resume(pnp_dev); > > > > - if (!(pnp_drv->flags & PNP_DRIVER_RES_DO_NOT_CHANGE)) { > > - error = pnp_start_dev(pnp_dev); > > - if (error) > > - return error; > > - } > > - > I'd suggest keep pnp_start_dev here to prevent BIOS not or assign > different resources after a resume. The patch I currently have in -mm (http://lkml.org/lkml/2007/10/29/412) merely requests resources in pnp_start_dev() and releases them in pnp_stop_dev(). So if we remove pnp_stop_dev() but keep pnp_start_dev(), I have to fix that patch to deal with things that may already be reserved. But I don't see any mention in the spec of running _SRS in the sleep/wakup path, so I'm not convinced it's really necessary. Section 7.4 mentions _TTS, _PTS, _GTS, etc., but not _SRS. For devices, it looks like the intent is that BIOS should generate notifications that cause OSPM to re-enumerate devices that might have changed. I'm pretty sure Linux is missing some of that code, though, so I could believe that _SRS might help paper over that deficiency. What I'd really like to do is figure out how Windows uses _SRS and do the same thing. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Please revert: PCI: fix IDE legacy mode resources
> Forcing controllers into native mode tends to be something that really > only works on -some- controllers. I'm happy to have a hack to try to do > that on all of them on powermacs, because the range of controllers that > might not be in native mode in the first place there is pretty small, > and for CHRP briq, I do it for a specific known controller only. I'm thinking of doing this solely if the platform has CONFIG_ATA_NO_LEGACY set. In other words we'd only try this stunt on a system we *know* cannot address the low PCI space ports. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] 2.6.23-rc3 can't see sd partitions on Alpha
On Mon, Dec 10, 2007 at 09:08:53AM -0600, Bob Tracy wrote: > Ivan Kokshaysky wrote: > > For now I have reassigned the bug #9457 to myself and will gradually hack > > into udev... > > Thanks... Let me know if there's anything useful I can do to help. It turns out to be yet another strncpy() bug that indeed shows up only with certain src/dst alignments and breaks kobject_get_path(). Ugh... Hopefully I'll have a patch tomorrow. Ivan. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Why does reading from /dev/urandom deplete entropy so much?
On Sun, Dec 09, 2007 at 10:16:05AM -0600, Matt Mackall wrote: > On Sun, Dec 09, 2007 at 01:42:00PM +0100, Marc Haber wrote: > > On Wed, Dec 05, 2007 at 03:26:47PM -0600, Matt Mackall wrote: > > > The distinction between /dev/random and /dev/urandom boils down to one > > > word: paranoia. If you are not paranoid enough to mistrust your > > > network, then /dev/random IS NOT FOR YOU. Use /dev/urandom. > > > > But currently, people who use /dev/urandom to obtain low-quality > > entropy do a DoS for the paranoid people. > > Not true, as I've already pointed out in this thread. I must have missed this. Can you please explain again? For a layman it looks like a paranoid application cannot read 500 Bytes from /dev/random without blocking if some other application has previously read 10 Kilobytes from /dev/urandom. Greetings Marc -- - Marc Haber | "I don't trust Computers. They | Mailadresse im Header Mannheim, Germany | lose things."Winona Ryder | Fon: *49 621 72739834 Nordisch by Nature | How to make an American Quilt | Fax: *49 3221 2323190 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > * Andrew Morton <[EMAIL PROTECTED]> wrote: > > > > what do you think? Right now i've got them queued up for 2.6.25 in > > > both the scheduler-devel and the x86-devel git trees - but can > > > submit them for 2.6.24 if it's better if we did them there. I've got > > > no strong opinion either way. > > > > printk_clock() doesn't seem terribly important but what's this stuff > > about effects on udelay/mdelay? That can be serious if they're > > getting shortened. > > since udelay depends on loops_per_jiffy, which is fixed up > time_cpufreq_notifier(), i dont see how it could be affected by > frequency changes. (but that's the theory - practice might be > different) Stefano Brivio reported udelay()/mdelay() effects in the b43 driver. (and it caused driver failures for him.) Stefano, could you please try to sum up your experiences with that issue? Is it reproducable, and the 5 patches i did fix it? (if yes, could you try to re-do the mdelay verifications perhaps, to make sure it's not some other effect interacting here. In theory sched-clock scaling has no effect on udelay behavior.) Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/4, v3] Physical PCI slot objects
Hi Kenji-san, I have been thinking about this problem for quite a bit, and think that there are no good solutions... * Kenji Kaneshige <[EMAIL PROTECTED]>: > On my system, hotplug slots themselves can be added, removed > and replaced with the ohter type of I/O box. >> Are you talking about some sort of I/O cabinet/chassis that you >> can attach to the actual computer? Can the I/O expander unit be >> hotplugged? Or do you need to power your machine down to attach >> it? >> If you can hotplug it, I'm guessing that is why your firmware >> presents SxFy objects in the namespace with "weird" _SUN values, >> and it's why you have to check _STA to see if the slots are valid >> or not. That means the value returned by _SUN will change too, >> right? What will it turn into? >> > > Currently, it's not hotpluggable (will be hotpluggable in the future). > Here is a sample AML code to explain what my firmware is doing. > > Device (PCI0) { > Device (P2PA) { > Device (P2PB) { // for I/O unit (A) > Name (_ADR, ...) > Method (_STA) { ... } > } > Device (S0F0) { // for I/O unit (B) > Name (_ADR, ...) > Method (_STA) { ... } > Method (_EJx) { ... } > Method (_SUN) { ... } > } > ... > } > ... > } > > If the I/O unit (A) is connected, _STA of P2PB returns as present > and _STA of S0F0 returns as not present. > If the I/O unit (B) is connected, _STA of P2PB returns as not > present and _STA of S0F0 returns as present. If I/O unit A or B can never appear while the system is turned on (aka not hotpluggable), then it is incorrect to present them in the current namespace. >>> In addtion, I think we should not trust the _SUN value of >>> non-existing device because the ACPI spec says in "6.5.1 _INI >>> (Init)" that _INI method is run before _ADR, _CID, _HID, _SUN, and >>> _UID are run. It means _SUN could be initialized in _INI method >>> implecitely. And it also says that "If the _STA method indicates >>> that the device is not present, OSPM will not run the _INI and will >>> not examine the children of the device for _INI methods.". After all, >>> _SUN for non-existing device is not reliable because it might not >>> initialized by _INI method. >> This is true, but HP platforms provide _INI at the root >> device/host bridge level, not on SxFy objects, so it doesn't seem >> that we would need to call _STA before calling _SUN for SxFy. >> Does your firmware provide _INI on SxFy objects? > > No, it doesn't. But what I wanted to say was we should not use _SUN > value of non-existing device object. There is nothing illegal about evaluating _SUN for an object that returns 0x0 for _STA. Also, when you say "non-existing", I think of the ACPI CA exception code AE_NOT_EXIST which means "absent from the namespace", and is the reason why my code works on both HP and IBM machines. It does not mean "_STA == 0x0". >> Our firmware teams seem to think that _STA should give the status >> of the card for hotplug support and general functional state. >> They claim that it doesn't makes much sense to support _STA on >> the slot itself unless you can physically change the slot >> topology on the machine at runtime, which we can't do (although >> maybe you can). >> The section of the spec you quoted is correct as long as we are >> talking ACPI 2.0 or later. My platforms implement ACPI 1.0b for >> legacy reasons. :-/ >> In ACPI 1.0b, _EJx definition says (section 6.3.2): >> For hot removal, the device must be immediately ejected >> when the OS calls the _EJ0 control method. The _EJ0 >> control method does not return until ejection is >> complete. After calling _EJ0, the OS will call _STA to >> determine whether or not the eject succeeded. >> So your firmware implementation does not seem backward compatible >> with the 1.0b spec. The different versions of ACPI is part of the >> reason why my patch is breaking on your machine. > > I think this is the real reason. My platform implements ACPI 2.0 or > later. I didn't notice the chage to_EJx definition. Maybe we need to > check ACPI version in pci_slot driver. I did some experiments on HP low-end ia64 (ACPI 1.0b only) and our mid-range and high-end ia64 platforms (ACPI 2.0c). Checking for _STA before evaluating _SUN leads to the same result for me: we only detect populated slots. I think that the real issue is not 1.0 vs 2.0, but the semantics that our different firmware teams have placed on _STA. Again, - HP firmware thinks _STA should give status of the card - Fujitsu firmware thinks _STA should give status of the slot So we are at an impasse. :( >> But as long as we are quoting the spec... :) >> _SUN evaluates to a DWORD that is the number to be used >> in the user interface. This number is required to be >> unique among
Re: [PATCH] Fake NUMA emulation for PowerPC (Take 2)
On Sat, Dec 08, 2007 at 04:07:14AM +0530, Balbir Singh wrote: > Signed-off-by: Balbir Singh <[EMAIL PROTECTED]> Looks good to me. Sure, it could be fleshed out to something more generic and in common code, but this is small and simple and doesn't bloat the kernel much as it stands, and it has value for debugging. Acked-by: Olof Johansson <[EMAIL PROTECTED]> -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
* Andrew Morton <[EMAIL PROTECTED]> wrote: > > what do you think? Right now i've got them queued up for 2.6.25 in > > both the scheduler-devel and the x86-devel git trees - but can > > submit them for 2.6.24 if it's better if we did them there. I've got > > no strong opinion either way. > > printk_clock() doesn't seem terribly important but what's this stuff > about effects on udelay/mdelay? That can be serious if they're > getting shortened. since udelay depends on loops_per_jiffy, which is fixed up time_cpufreq_notifier(), i dont see how it could be affected by frequency changes. (but that's the theory - practice might be different) Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/4 -mm] kexec based hibernation -v7 : kexec jump
On Fri, Dec 07, 2007 at 03:53:30PM +, Huang, Ying wrote: > This patch implements the functionality of jumping between the kexeced > kernel and the original kernel. > > To support jumping between two kernels, before jumping to (executing) > the new kernel and jumping back to the original kernel, the devices > are put into quiescent state, and the state of devices and CPU is > saved. After jumping back from kexeced kernel and jumping to the new > kernel, the state of devices and CPU are restored accordingly. The > devices/CPU state save/restore code of software suspend is called to > implement corresponding function. > > To support jumping without reserving memory. One shadow backup page > (source page) is allocated for each page used by new (kexeced) kernel > (destination page). When do kexec_load, the image of new kernel is > loaded into source pages, and before executing, the destination pages > and the source pages are swapped, so the contents of destination pages > are backupped. Before jumping to the new (kexeced) kernel and after > jumping back to the original kernel, the destination pages and the > source pages are swapped too. > > A jump back protocol for kexec is defined and documented. It is an > extension to ordinary function calling protocol. So, the facility > provided by this patch can be used to call ordinary C function in real > mode. > > A set of flags for sys_kexec_load are added to control which state are > saved/restored before/after real mode code executing. For example, you > can specify the device state and FPU state are saved/restored > before/after real mode code executing. > > The states (exclude CPU state) save/restore code can be overridden > based on the "command" parameter of kexec jump. Because more states > need to be saved/restored by hibernating/resuming. > [..] > > -#define KEXEC_ON_CRASH 0x0001 > -#define KEXEC_ARCH_MASK 0x > +#define KEXEC_ON_CRASH 0x0001 > +#define KEXEC_PRESERVE_CPU 0x0002 > +#define KEXEC_PRESERVE_CPU_EXT 0x0004 > +#define KEXEC_SINGLE_CPU 0x0008 > +#define KEXEC_PRESERVE_DEVICE0x0010 > +#define KEXEC_PRESERVE_CONSOLE 0x0020 Hi, Why do we need so many different flags for preserving different types of state (CPU, CPU_EXT, Device, console) ? To keep things simple, can't we can create just one flag KEXEC_PRESERVE_CONTEXT, which will indicate any special action required for preserving the previous kernel's context so that one can swith back to old kernel? Thanks Vivek -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]
--- Stephen Smalley <[EMAIL PROTECTED]> wrote: > On Mon, 2007-12-10 at 21:08 +, David Howells wrote: > > Stephen Smalley <[EMAIL PROTECTED]> wrote: > > > > > Otherwise, only other issue I have with this interface is it won't > > > generalize to dealing with nfsd, where we want to set the acting context > > > to a context we obtain from or determine based upon the client. > > > > Are you speaking of security_kernel_act_as() and security_create_files_as() > > specifically? Or the task_struct::act_as override pointer in general? > > security_kernel_act_as() > > > I don't really know how nfsd wants to obtain and set its LSM context, so > it's > > a bit difficult for me to make something that works for nfsd as well as > > cachefiles. > > It would get a context from the client or from a local configuration > that would map security-unaware clients to a default context, and then > want to assume that context for the particular operation. No transition > involved. I would expect that the operation would be more sophisticated than that. You certainly aren't going to use what comes from the other side without any processing, and I expect you'll have some sort of operation on anything you pull from a config file before you actually apply it. > > > Why can't cachefilesd just push a context into the kernel and pass that > > > into the hook as the acting context, > > > > How does cachefilesd come up with such a context? Grab it from > > /etc/cachefilesd.conf? > > >From a config file whose pathname would be provided by libselinux (ala > the way in which dbusd imports contexts), or directly as a context > returned by a libselinux function. Has to be done that way so that it > can be set differently for different policy types (strict, targeted, > mls). Unless you've got an LSM other than SELinux, of course. If cachefilesd is going to be responsible for maintaining this magic context there needs to be an LSM interface for it, not just an SELinux interface. > Naturally, cachefiles (the kernel module) would invoke a security hook > to check whether the daemon is allowed to set the specified context. > > > I use to do that, but someone objected... Possibly Karl MacMillan. > > Yes, but I think I disagreed then too. > > > > and then nfsd can do likewise using the context provided by the client or > > > obtained locally from exports for ordinary clients? Avoids the > transition > > > SID computation altogether within the kernel and makes this more generic. > > > > I seem to remember that I was told that it should be done this way, > possibly > > by Karl MacMillan, but I don't remember exactly. > > > > Now it's configured by cachefilesd.te: > > > > type_transition cachefilesd_t kernel_t : process cachefiles_kernel_t; > > It doesn't fit with how other users of security_kernel_act_as() will > likely want to work (they will want to just set the context to a > specified value, whether one obtained from the client or from some local > source), nor with how type transitions normally work (exec, with the > program type as the second type field). I think it will just cause > confusion and subtle breakage. I think that I agree with Stephen, although I could be mirely confused. That happens to me when interfaces are described in SELinux terms. I still don't care much for multiple contexts, and I don't have a good grasp of how you'll deal with Smack, or any LSM other than SELinux. Just as Stephen mentions, I also don't see the generality that a change of this magnitude really ought to provide. Casey Schaufler [EMAIL PROTECTED] -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.24-rc4] proc: Remove/Fix proc generic d_revalidate
Quoting Andrew Morton <[EMAIL PROTECTED]>: > On Mon, 10 Dec 2007 16:32:18 +0300 "Denis V. Lunev" <[EMAIL PROTECTED]> wrote: > > > > Plese don't top-post. It makes replying to you rather awkward. > > > could you, plz, check patch sent by Eric above in this thread. > > > > I have tried it on my test node and it works for module you have > > provided. The problem exists without it. > > > > When Peter says "with your patch in place" I assume that he's referring to > Eric's latest patch, namely. Sorry, I was not clear. No, I meant Eric's original patch. Without d_revalidate() problem does not occur. Petr > > --- a/fs/proc/generic.c~proc-remove-fix-proc-generic-d_revalidate > +++ a/fs/proc/generic.c > @@ -374,16 +374,9 @@ static int proc_delete_dentry(struct den > return 1; > } > > -static int proc_revalidate_dentry(struct dentry *dentry, struct nameidata > *nd) > -{ > - d_drop(dentry); > - return 0; > -} > - > static struct dentry_operations proc_dentry_operations = > { > .d_delete = proc_delete_dentry, > - .d_revalidate = proc_revalidate_dentry, > }; > > /* > > So we still have problems, it appears. > > > > > Petr Vandrovec wrote: > > > Eric W. Biederman wrote: > > >> Ultimately to implement /proc perfectly we need an implementation > > >> of d_revalidate because files and directories can be removed behind > > >> the back of the VFS, and d_revalidate is the only way we can let > > >> the VFS know that this has happened. > > >> > > >> So until we get a proper test for keeping dentries in the dcache > > >> fix the current d_revalidate method by completely removing it. This > > >> returns us to the current status quo. > > > > > > Hello, > > >I know that I'm late to the party, but mount points is not only > > > problem with d_revalidate. With your patch in place module below gets > > > refcount incremented by two every time I do 'ls -la /proc/fs/vmblock'. > > > > > > > > > #include > > > #include > > > #include > > > > > > static int vmblockinit(void) { > > >struct proc_dir_entry *controlProcDirEntry; > > > > > >/* Create /proc/fs/vmblock */ > > >controlProcDirEntry = proc_mkdir("vmblock", proc_root_fs); > > >if (!controlProcDirEntry) { > > > printk(KERN_DEBUG "Bad...\n"); > > > return -EINVAL; > > >} > > >controlProcDirEntry->owner = THIS_MODULE; > > >return 0; > > > } > > > > > > static void vmblockexit(void) { > > >remove_proc_entry("vmblock", proc_root_fs); > > > } > > > > > > module_init(vmblockinit); > > > module_exit(vmblockexit); > > > > > > > > > (code comes from VMware's vmblock module, > > > http://sourceforge.net/project/showfiles.php?group_id=204462) > > > Thanks, > > > Petr > > > > > > > > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: syslets v7: back to basics
> I pulled from your tree to look over the patches, and noticed that it > looks like several commits were merged improperly. It looks like they > were auto merged or something from an email, and the commit message > contains the email headers, rather than just the commit message in the > body. This leads to the shortlog showing entries that start with > "Return-Path:". These are patches that guilt imported from email messages. It didn't strip the headers and I didn't care to. I'll try to in the future, it isn't a big deal. > I was hoping to find at least some initial information on the overall > design in Documentation/ but don't see any. Have you written any yet > that I could take a look at elsewhere maybe? No, but it's coming. I'd like to have some robust documentation so that Ulrich can help me understand what more he'd need to support POSIX AIO with syslets from glibc. > Some of the things I was trying to figure out is does each syslet get > its own stack, Yes. Each blocking operation has a thread that is performing the operation synchronously. The benefit is that the thread is only created if the operation blocks. If it doesn't block then it's a normal system call invocation. You don't have to manage threads and communicate the arguments and results of system calls amongst threads for the case where it never blocks. > and schedule only at a few well defined points No, every blocking point is considered a scheduling point. > , and if > so, would it then be fair to characterize them as kernel mode fibers? I'm not sure what exactly you mean by kernel mode fibers (I can guess, but I'd rather not). From the answer of to the last question, though, I'm going to guess that it might not be the most apt characterization. - z -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] A clean approach to writeout throttling
Hi, On Dec 10, 2007 11:31 PM, Jonathan Corbet <[EMAIL PROTECTED]> wrote: > I'm just getting around to looking at this. One thing jumped out at me: > > > + if (bio->bi_throttle) { > > + struct request_queue *q = bio->bi_queue; > > + bio->bi_throttle = 0; /* or detect multiple endio and err? */ > > + atomic_add(bio->bi_throttle, >available); > > + wake_up(>throttle_wait); > > + } > > I'm feeling like I must be really dumb, but...how can that possibly > work? You're zeroing >bi_throttle before adding it back into > q->available, so the latter will never increase... Heh, well, that's ok as long as bio->bi_vcnt is set to zero and I think we have some md raid drivers do just that... ;-) Pekka -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Avoid overflows in kernel/time.c
On Mon, 10 Dec 2007 10:59:20 -0800 "H. Peter Anvin" <[EMAIL PROTECTED]> wrote: > Andrew Morton wrote: > > > > My ia64 allmodconfig build has taken > > > > akpm 15700 89.6 0.0 8256 700 pts/4RN+ 03:09 10:41 bc -q > > kernel/timeconst.bc > > > > 11 minutes so far. fc6/x86_64. > > > > I just tried this on my system, using your cross-compiler chain. I got > a different error: > > /opt/crosstool/gcc-3.4.5-glibc-2.3.6/ia64-unknown-linux-gnu/lib/gcc/ia64-unknown-linux-gnu/3.4.5/../../../../ia64-unknown-linux-gnu/bin/ld: > > section .data.patch [a500 -> a507] overlaps > section .dynamic [a3c8 -> a507] > collect2: ld returned 1 exit status > make[2]: *** [arch/ia64/kernel/gate.so] Error 1 You'll need rc4-mm1's ia64-increase-datapatch-offset.patch. That's now in Tony's tree and should go into 2.6.24 IMO. > ... but the timeconst stuff worked fine. I tried it both from the > command line and using your xb script. > > This is on a fc7/x86-64 box. I also ran through all the values from 48 > to 1024 on both an fc5 and an fc7 box (no fc6 box readily available, > although bc has been at 1.06 since 2000...) > > In short, this is highly weird. Could you possibly do me a favour and > just run, at the command line: > > echo 250 | bc -q kernel/timeconst.bc That works OK. > ... and see if it reproduces the lockup (I'm assuming HZ == 250 in your > config, since that's what I get when I do "make allmodconfig" on IA64.) > > (No need to wait 11 minutes. It should run in a small fraction of a > second.) I retested 2.6.24-rc4-mm1 plus avoid-overflows-in-kernel-timec.patch and the failure has magically gone away. Ho hum. I'll reconstitute the patch and will keep an eye on it. It'd be nice to avoid the introduction of the bc dependency though. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Possibly SATA related freeze killed networking and RAID
Hello, I think, I'm experiencing the same problem: 09:16:34 : NETDEV WATCHDOG: eth0: transmit timed out 09:16:34 : eth0: Got tx_timeout. irq: 09:16:34 : eth0: Ring at 37e5 09:16:34 : eth0: Dumping tx registers 09:16:34 : 0: 00ff 0003 025003ca 09:16:34 : 20: [...] 09:16:54 : ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen 09:16:54 : ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen 09:16:54 : ata6.00: cmd 25/00:08:1e:97:48/00:00:19:00:00/e0 tag 0 cdb 0x0 data 4096 in 09:16:54 : res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) 09:16:54 : ata5.00: cmd 25/00:70:1e:97:48/00:00:19:00:00/e0 tag 0 cdb 0x0 data 57344 in 09:16:54 : res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) 09:16:54 : ata6: soft resetting port 09:16:54 : ata5: soft resetting port 09:16:54 : ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) 09:16:54 : ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300) 09:16:54 : NETDEV WATCHDOG: eth0: transmit timed out 09:16:54 : eth0: Got tx_timeout. irq: 0032 09:16:54 : eth0: Ring at 37e5 09:16:54 : eth0: Dumping tx registers A more complete log can be found at: http://www.e18.physik.tu-muenchen.de/~tnagel/misc/kernel-crash.log The setup is strikingly similar to that of noah (I'm quoting all of this by heart, if somebody is interested in more detail, just ask.): Kernel: 2.6.22 (amd64, Debian patches, tainted) Mainboard: Asus M2N-SLI Deluxe (nForce 570 SLI MCP --> MCP55, same as noah) CPU: Athlon64 Dual-Core (same as noah) RAM: 1GB HD: 22 x Samsung HD501LJ 500GB (same as noah), 1-6 connected to chipset, 7-22 connected to RocketRaid 2340. I'm using software RAID like noah, (levels 1, 5 and 6), and like with noah the problem occurred during RAID check, in my case during heavy NFS load which had been ongoing for ~4 days. This is the third time, it has happened, but only this time I could catch the logs via netconsole. The two affected drives are connected to the chipset and show no SMART errors. Unfortunately, the kernel is tainted since I'm using HighPoint's drivers for the RR2340. I don't know whether I can change this easily. Kind regards, Thiemo Nagel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5
> > map_phys_fmr > > In fact, we do use hCalls there. Our hardware doesn't actually support FMRs, > so we translate a "map FMR" into a "reallocate PMR", which doesn't work > without hCalls. What's more, the hCalls involved (e.g. H_FREE_RESOURCE) > might well return H_LONG_BUSY, so the whole operation might sleep; no way > around it. It's a big problem. If you cannot implement FMRs in such a way that you can handling having map_phys_fmr being called in a context that can't sleep, then I think the only option is to remove your FMR support. It's an optional device feature, so this should be OK (although the iSER driver currently seems to depend on a device supporting FMRs, which is probably going to be a problem with iWARP support in the future anyway). The fact that consumers can map FMRs from interrupt context, while holding locks, etc, is pretty fundamental to the use of FMRs so I don't see any way around the requirement that map_phys_fmr never sleep. - R. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [rfc] lockless get_user_pages for dio (and more)
On Mon, 2007-10-15 at 22:25 +1000, Nick Piggin wrote: > On Monday 15 October 2007 04:19, Siddha, Suresh B wrote: > > On Sun, Oct 14, 2007 at 11:01:02AM +1000, Nick Piggin wrote: > > > This is just a really quick hack, untested ATM, but one that > > > has at least a chance of working (on x86). > > > > When we fall back to slow mode, we should decrement the ref counts > > on the pages we got so far in the fast mode. > > Here is something that is actually tested and works (not > tested with hugepages yet, though). > > However it's not 100% secure at the moment. It's actually > not completely trivial; I think we need to use an extra bit > in the present pte in order to exclude "not normal" pages, > if we want fast_gup to work on small page mappings too. I > think this would be possible to do on most architectures, but > I haven't done it here obviously. > > Still, it should be enough to test the design. I've added > fast_gup and fast_gup_slow to /proc/vmstat, which count the > number of times fast_gup was called, and the number of times > it dropped into the slowpath. It would be interesting to know > how it performs compared to your granular hugepage ptl... Nick, I've played with the fast_gup patch a bit. I was able to find a problem in follow_hugetlb_page() that Adam Litke fixed. I'm haven't been brave enough to implement it on any other architectures, but I did add a default that takes mmap_sem and calls the normal get_user_pages() if the architecture doesn't define fast_gup(). I put it in linux/mm.h, for lack of a better place, but it's a little kludgy since I didn't want mm.h to have to include sched.h. This patch is against 2.6.24-rc4. It's not ready for inclusion yet, of course. I haven't done much benchmarking. The one test I was looking at didn't show much of a change. == Introduce a new "fast_gup" (for want of a better name right now) which is basically a get_user_pages with a less general API that is more suited to the common case. - task and mm are always current and current->mm - force is always 0 - pages is always non-NULL - don't pass back vmas This allows (at least on x86), an optimistic lockless pagetable walk, without taking any page table locks or even mmap_sem. Page table existence is guaranteed by turning interrupts off (combined with the fact that we're always looking up the current mm, which would need an IPI before its pagetables could be shot down from another CPU). Many other architectures could do the same thing. Those that don't IPI could potentially RCU free the page tables and do speculative references on the pages (a la lockless pagecache) to achieve a lockless fast_gup. Originally by Nick Piggin <[EMAIL PROTECTED]> --- arch/x86/lib/Makefile_64 |2 arch/x86/lib/gup_64.c| 188 +++ fs/bio.c |8 - fs/block_dev.c |5 - fs/direct-io.c | 10 -- fs/splice.c | 38 include/asm-x86/uaccess_64.h |4 include/linux/mm.h | 26 + include/linux/vmstat.h |1 mm/vmstat.c |3 10 files changed, 231 insertions(+), 54 deletions(-) diff -Nurp linux-2.6.24-rc4/arch/x86/lib/Makefile_64 linux/arch/x86/lib/Makefile_64 --- linux-2.6.24-rc4/arch/x86/lib/Makefile_64 2007-12-04 08:44:34.0 -0600 +++ linux/arch/x86/lib/Makefile_64 2007-12-10 15:01:17.0 -0600 @@ -10,4 +10,4 @@ obj-$(CONFIG_SMP) += msr-on-cpu.o lib-y := csum-partial_64.o csum-copy_64.o csum-wrappers_64.o delay_64.o \ usercopy_64.o getuser_64.o putuser_64.o \ thunk_64.o clear_page_64.o copy_page_64.o bitstr_64.o bitops_64.o -lib-y += memcpy_64.o memmove_64.o memset_64.o copy_user_64.o rwlock_64.o copy_user_nocache_64.o +lib-y += memcpy_64.o memmove_64.o memset_64.o copy_user_64.o rwlock_64.o copy_user_nocache_64.o gup_64.o diff -Nurp linux-2.6.24-rc4/arch/x86/lib/gup_64.c linux/arch/x86/lib/gup_64.c --- linux-2.6.24-rc4/arch/x86/lib/gup_64.c 1969-12-31 18:00:00.0 -0600 +++ linux/arch/x86/lib/gup_64.c 2007-12-10 15:01:17.0 -0600 @@ -0,0 +1,188 @@ +/* + * Lockless fast_gup for x86 + * + * Copyright (C) 2007 Nick Piggin + * Copyright (C) 2007 Novell Inc. + */ +#include +#include +#include +#include + +static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, +int write, struct page **pages, int *nr) +{ + pte_t *ptep; + + /* XXX: this won't work for 32-bit (must map pte) */ + ptep = (pte_t *)pmd_page_vaddr(pmd) + pte_index(addr); + do { + pte_t pte = *ptep; + unsigned long pfn; + struct page *page; + + if ((pte_val(pte) & (_PAGE_PRESENT|_PAGE_USER)) != + (_PAGE_PRESENT|_PAGE_USER)) + return 0; + + if (write && !pte_write(pte)) +
Re: [RFC] [PATCH] A clean approach to writeout throttling
Hey, Daniel, I'm just getting around to looking at this. One thing jumped out at me: > + if (bio->bi_throttle) { > + struct request_queue *q = bio->bi_queue; > + bio->bi_throttle = 0; /* or detect multiple endio and err? */ > + atomic_add(bio->bi_throttle, >available); > + wake_up(>throttle_wait); > + } I'm feeling like I must be really dumb, but...how can that possibly work? You're zeroing >bi_throttle before adding it back into q->available, so the latter will never increase... jon -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: syslets v7: back to basics
Zach Brown wrote: The following patches are a substantial refactoring of the syslet code. I'm branding them as the v7 release of the syslet infrastructure, though they represent a signifiant change in focus. My current focus is to see the most fundamental functionality brought to maturity. To me, this means getting a ABI that is used by applications through glibc on x86 and PPC64. Only once that is ready should we distract ourselves with advanced complexity. I pulled from your tree to look over the patches, and noticed that it looks like several commits were merged improperly. It looks like they were auto merged or something from an email, and the commit message contains the email headers, rather than just the commit message in the body. This leads to the shortlog showing entries that start with "Return-Path:". I was hoping to find at least some initial information on the overall design in Documentation/ but don't see any. Have you written any yet that I could take a look at elsewhere maybe? Some of the things I was trying to figure out is does each syslet get its own stack, and schedule only at a few well defined points, and if so, would it then be fair to characterize them as kernel mode fibers? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]
On Mon, 2007-12-10 at 21:08 +, David Howells wrote: > Stephen Smalley <[EMAIL PROTECTED]> wrote: > > > Otherwise, only other issue I have with this interface is it won't > > generalize to dealing with nfsd, where we want to set the acting context > > to a context we obtain from or determine based upon the client. > > Are you speaking of security_kernel_act_as() and security_create_files_as() > specifically? Or the task_struct::act_as override pointer in general? security_kernel_act_as() > I don't really know how nfsd wants to obtain and set its LSM context, so it's > a bit difficult for me to make something that works for nfsd as well as > cachefiles. It would get a context from the client or from a local configuration that would map security-unaware clients to a default context, and then want to assume that context for the particular operation. No transition involved. > > Why can't cachefilesd just push a context into the kernel and pass that > > into the hook as the acting context, > > How does cachefilesd come up with such a context? Grab it from > /etc/cachefilesd.conf? >From a config file whose pathname would be provided by libselinux (ala the way in which dbusd imports contexts), or directly as a context returned by a libselinux function. Has to be done that way so that it can be set differently for different policy types (strict, targeted, mls). Naturally, cachefiles (the kernel module) would invoke a security hook to check whether the daemon is allowed to set the specified context. > I use to do that, but someone objected... Possibly Karl MacMillan. Yes, but I think I disagreed then too. > > and then nfsd can do likewise using the context provided by the client or > > obtained locally from exports for ordinary clients? Avoids the transition > > SID computation altogether within the kernel and makes this more generic. > > I seem to remember that I was told that it should be done this way, possibly > by Karl MacMillan, but I don't remember exactly. > > Now it's configured by cachefilesd.te: > > type_transition cachefilesd_t kernel_t : process cachefiles_kernel_t; It doesn't fit with how other users of security_kernel_act_as() will likely want to work (they will want to just set the context to a specified value, whether one obtained from the client or from some local source), nor with how type transitions normally work (exec, with the program type as the second type field). I think it will just cause confusion and subtle breakage. -- Stephen Smalley National Security Agency -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[GIT PULL] FireWire update
Linus, please pull from the for-linus branch at git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6.git for-linus to receive the following FireWire subsystem update. This considerably enhances compatibility of the new firewire-ohci driver with a number of controllers. It shrinks the list of chips with trouble with isochronous reception to VIA VT6306 and some variants of VT6307. The patch is somewhat big for this late -rc phase, and it has so far only surfaced in 2.6.24-rc4-mm1 and in recent Fedora test kernels. But the author and I did a lot of tests with as much previously working chips as we could get our hands on (some more than listed below) to make sure that there is no regression. drivers/firewire/fw-ohci.c | 175 +++- 1 files changed, 155 insertions(+), 20 deletions(-) Jarod Wilson (1): firewire: OHCI 1.0 Isochronous Receive support Full log and diff: commit a186b4a6b22fdc96a1ed63da483d267b5d00839e Author: Jarod Wilson <[EMAIL PROTECTED]> Date: Mon Dec 3 13:43:12 2007 -0500 firewire: OHCI 1.0 Isochronous Receive support Third rendition of FireWire OHCI 1.0 Isochronous Receive support, using a zer-copy method similar to OHCI 1.1 which puts the IR data payload directly into the userspace buffer. The zero-copy implementation eliminates the video artifacts, audio popping, and buffer underrun problems seen with version 1 of this patch, as well as fixing a regression in OHCI 1.1 support introduced by version 2 of this patch. Successfully tested in OHCI 1.1 mode on the following chipsets: - NEC uPD72847 (rev 01), OHCI 1.1 (PCI) - Ti XIO2200(A) (rev 01), OHCI 1.1 (PCIe) - Ti TSB41AB2 (rev 01), OHCI 1.1 (PCI on SB Audigy) - Apple UniNorth 2 (rev 81), OHCI 1.1 (PowerBook G4 onboard) Successfully tested in OHCI 1.0 mode on the following chipsets: - Agere FW323 (rev 06), OHCI 1.0 (Mac Mini onboard) - Agere FW323 (rev 06), OHCI 1.0 (PCI) - Via VT6306 (rev 46), OHCI 1.0 (PCI) - NEC OrangeLink (rev 01), OHCI 1.0 (PCI) - NEC uPD72847 (rev 01), OHCI 1.1 (PCI) - Ti XIO2200(A) (rev 01), OHCI 1.1 (PCIe) The bulk of testing was done in an x86_64 system, but was also successfully sanity-tested on other systems, including a PPC(32) PowerBook G4 and an i686 EPIA M10k. Crude benchmarking (watching top during capture) puts the cpu utilization during capture on the EPIA's 1GHz Via C3 processor around 13%, which is down from 30% with the v1 code. Some implementation details: To maintain the same userspace API as dual-buffer mode, we set up two descriptors for every incoming packet. The first is an INPUT_MORE descriptor, pointing to a buffer large enough to hold just the packet's iso headers, immediately followed by an INPUT_LAST descriptor, pointing to a chunk of the userspace buffer big enough for the packet's data payload. With this setup, each incoming packet fills in these two descriptors in a manner that very closely emulates dual-buffer receive, to the point where the bulk of the handle_ir_* code is now identical between the two (and probably primed for some restructuring to share code between them). The only caveat I have at the moment is that neither of my OHCI 1.0 Via VT6307-based FireWire controllers work particularly well with this code for reasons I have yet to figure out. Signed-off-by: Jarod Wilson <[EMAIL PROTECTED]> Signed-off-by: Stefan Richter <[EMAIL PROTECTED]> diff --git a/drivers/firewire/fw-ohci.c b/drivers/firewire/fw-ohci.c index c9b9081..436a855 100644 --- a/drivers/firewire/fw-ohci.c +++ b/drivers/firewire/fw-ohci.c @@ -437,6 +437,21 @@ static void ar_context_run(struct ar_context *ctx) flush_writes(ctx->ohci); } +static struct descriptor * +find_branch_descriptor(struct descriptor *d, int z) +{ + int b, key; + + b = (le16_to_cpu(d->control) & DESCRIPTOR_BRANCH_ALWAYS) >> 2; + key = (le16_to_cpu(d->control) & DESCRIPTOR_KEY_IMMEDIATE) >> 8; + + /* figure out which descriptor the branch address goes in */ + if (z == 2 && (b == 3 || key == 2)) + return d; + else + return d + z - 1; +} + static void context_tasklet(unsigned long data) { struct context *ctx = (struct context *) data; @@ -455,7 +470,7 @@ static void context_tasklet(unsigned long data) address = le32_to_cpu(last->branch_address); z = address & 0xf; d = ctx->buffer + (address - ctx->buffer_bus) / sizeof(*d); - last = (z == 2) ? d : d + z - 1; + last = find_branch_descriptor(d, z); if (!ctx->callback(ctx, d, last)) break; @@ -566,7 +581,7 @@ static void context_append(struct context *ctx, ctx->head_descriptor = d + z + extra;