Re: [Bugme-new] [Bug 8100] New: dynticks makes ksoftirqd1 use unreasonable amount of cpu time
On Wed, 28 Feb 2007 09:34:10 -0800 [EMAIL PROTECTED] wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=8100
>
> Summary: dynticks makes ksoftirqd1 use unreasonable amount of cpu time
> Kernel Version: 2.6.21-rc2
> Status: NEW
> Severity: low
> Owner: [EMAIL PROTECTED]
> Submitter: [EMAIL PROTECTED]
>
> Most recent kernel where this bug did *NOT* occur:
> any kernel without dynticks
>
> Distribution:
> Debian etch with linux-2.6.21-rc{2,1}
>
> Hardware Environment:
> Macbook core2 with bios emulation
>
> Software Environment:
> The problem is obvious when listening to shoutcast stream with kmplayer and
> artsd via wi-fi with wpa (wpa_supplicant)
>
> Problem Description:
> ksoftirqd1 uses ~30% cpu-time (by top) no other symptoms, while
> without dynticks cpu-load in similar circumstances is negligible.
> This might be a dynticks feature rather than bug.
>
> Steps to reproduce:
> Just watch the top, if the bug is reproducible, probably just booting should
> suffice.
>
> --- You are receiving this mail because: ---
> You are on the CC list for the bug, or are watching someone who is.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc1: known regressions (v2) (part 2)
On Wednesday 28 February 2007 15:21, Mike Galbraith wrote: > (hrmph. having to copy/paste/try again. evolution seems to be broken.. > RCPT TO <[EMAIL PROTECTED]> failed: Cannot resolve your domain > {mp049} ..caused me to be unable to send despite receipts being disabled) Apologies for mangling the email address as I said :-( > On Wed, 2007-02-28 at 09:58 +1100, Con Kolivas wrote: > > On Tuesday 27 February 2007 19:54, Mike Galbraith wrote: > > > Agreed. > > > > > > I was recently looking at that spot because I found that niced tasks > > > were taking latency hits, and disabled it, which helped a bunch. > > > > Ok... as I said above to Ingo, nice means more latency too, and there is > > no doubt that if we disable nice as a working feature then the niced > > tasks will have less latency. Of course, this ends up meaning that > > un-niced tasks no longer receive their better cpu performance.. You're > > basically saying that you prefer nice not to work in the setting of HT. > > No I'm not, but let's go further in that direction just for the sake of > argument. You're then saying that you prefer realtime priorities to not > work in the HT setting, given that realtime tasks don't participate in > the 'single stream me' program. Where do I say that? I do not presume to manage realtime priorities in any way. You're turning my argument about nice levels around and somehow saying that because hyperthreading breaks the single stream me semantics by parallelising them that I would want to stop that happening. Nowhere have I argued that realtime semantics should be changed to somehow work around hyperthreading. SMT nice is about managing nice only, and not realtime priorities which are independent entities. > I'm saying only that we're defeating the purpose of HT, and overriding a > user decision every time we force a sibling idle. > > > > I also > > > can't understand why it would be OK to interleave a normal task with an > > > RT task sometimes, but not others.. 
that's meaningless to the RT task. > > > > Clearly there would be a reason that code is there... The whole problem > > with HT is that as soon as you load up a sibling, you slow down the > > logical sibling, hence why this code is there in the first place. Since I > > know you're one to test things for yourself, I will put it to you this > > way: > > > > Boot into UP. Time how long it takes to do a million of these in a real > > time task: > > asm volatile("" : : : "memory"); > > > > Then start up a SCHED_NORMAL task fully cpu bound such as "yes > > > /dev/null" and time that again. Obviously the former being a realtime > > task will take the same amount of time and the SCHED_NORMAL task will be > > starved until the realtime task finishes. > > Sure. > > > Now try the same experiment with hyperthreading enabled and an ordinary > > SMP kernel. You'll find the realtime task runs at only ~60% performance. > > So? User asked for HT. That's hardware multiplexing. It ain't free. > Buyer beware. But the buyer is not aware. You are aware because you tinker, but the vast majority of users who enable hyperthreading in their shiny pcs are not aware. The only thing they know is that if they enable hyperthreading their programs run slower in multitasking environments no matter how much they nice the other processes. Buyers do not buy hardware knowing that the internal design breaks something as fundamental as 'nice'. You seem to presume that most people who get hyperthreading are happy to compromise 'nice' in order to get their second core working and I put it to you that they do not make that decision. > > That's a > > serious performance hit for realtime tasks considering you're running a > > SCHED_NORMAL task. The SMT code that you seem to dislike fixes this > > problem. > > I don't think it does actually. Let your RT task sleep regularly, and > ever so briefly. We don't evict lower priority tasks from siblings upon > wakeup, we only prevent entry... sometimes. 
Well you know as well as I do that you're selecting out the exception rather than the rule, and statistically overall, it does work. > > The reason for interleaving is that there are a few cycles to be gained > > by using the second core for a separate SCHED_NORMAL task, and you don't > > want to disable access to the second core entirely for the duration the > > realtime task is running. Since there is no simple relationship between > > SCHED_NORMAL timeslices and realtime timeslices, we have to use some form > > of interleaving based on the expected extra cycles and HZ is the obvious > > choice. > > To me, the reason for interleaving is solely about keeping the core > busy . It has nothing to do with SCHED_POLICY_X what so ever. > > > > IMHO, SMT scheduling should be a buyer beware thing. Maximizing your > > > core utilization comes at a price, but so does disabling it, so I think > > > letting the user decide what he wants is the right thing to do. > > > > To me this is
Re: [PATCH] SLUB The unqueued slab allocator V3
From: Christoph Lameter <[EMAIL PROTECTED]>
Date: Wed, 28 Feb 2007 11:20:44 -0800 (PST)

> V2->V3
> - Debugging and diagnostic support. This is runtime enabled and not compile
>   time enabled. Runtime debugging can be controlled via kernel boot options
>   on an individual slab cache basis or globally.
> - Slab Trace support (For individual slab caches).
> - Resiliency support: If basic sanity checks are enabled (via F f.e.)
>   (boot option) then SLUB will do the best to perform diagnostics and
>   then continue (i.e. mark corrupted objects as used).
> - Fix up numerous issues including clash of SLUBs use of page
>   flags with i386 arch use for pmd and pgds (which are managed
>   as slab caches, sigh).
> - Dynamic per CPU array sizing.
> - Explain SLUB slabcache flags

V3 doesn't boot successfully on sparc64. Sorry, I don't have the
ability to track this down at the moment: it resets the machine
right as the video device is initialized, and after diffing V2 to V3
there is way too much stuff changing for me to try and "bisect"
between V2 and V3 to find the guilty sub-change.

Maybe if you managed your individual changes in GIT or similar
this could be debugged very quickly. :-)

Meanwhile I noticed that your alignment algorithm is different
from SLAB's. And I think this is important for the page table
SLABs that some platforms use.

No matter what flags are specified, SLAB gives at least the
passed in alignment specified in kmem_cache_create(). That
logic in slab is here:

	/* 3) caller mandated alignment */
	if (ralign < align) {
		ralign = align;
	}

Whereas SLUB uses the CPU cacheline size when the MUSTALIGN
flag is set. Architectures do things like:

	pgtable_cache = kmem_cache_create("pgtable_cache",
					  PAGE_SIZE, PAGE_SIZE,
					  SLAB_HWCACHE_ALIGN |
					  SLAB_MUST_HWCACHE_ALIGN,
					  zero_ctor,
					  NULL);

to get a PAGE_SIZE aligned slab. SLUB doesn't give the same
behavior SLAB does in this case.

Arguably SLAB_HWCACHE_ALIGN and SLAB_MUST_HWCACHE_ALIGN should
not be set here, but SLUB's change in semantics in this area
could cause similar grief in other areas, so an audit is probably
in order.

The above example was from sparc64, but x86 does the same thing,
as probably do other platforms which use SLAB for pagetables.
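For anyone skimming the thread, the alignment difference described in this mail can be modelled in a few lines of user-space C. This is only a sketch of the behaviour as stated above, not allocator code: the function names, the 64-byte cacheline and the flag value are all made up for illustration, and the pointer-sized baseline is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define DEMO_CACHE_LINE         64u
#define DEMO_MUST_HWCACHE_ALIGN 0x1u

/* SLAB-like rule: the caller's alignment is a lower bound, whatever the flags. */
size_t slab_style_align(size_t align, unsigned int flags)
{
	size_t ralign = sizeof(void *);	/* assumed baseline alignment */

	assert((align & (align - 1)) == 0);	/* zero or a power of two */

	if (flags & DEMO_MUST_HWCACHE_ALIGN)
		ralign = DEMO_CACHE_LINE;

	/* caller mandated alignment */
	if (ralign < align)
		ralign = align;

	return ralign;
}

/* Behaviour the mail attributes to SLUB V3: the flag forces cacheline size. */
size_t slub_v3_style_align(size_t align, unsigned int flags)
{
	assert((align & (align - 1)) == 0);

	if (flags & DEMO_MUST_HWCACHE_ALIGN)
		return DEMO_CACHE_LINE;

	return align > sizeof(void *) ? align : sizeof(void *);
}
```

With a requested alignment of 4096 and the flag set, the first function returns 4096 and the second returns 64, which is the page table cache case the mail is worried about.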
[patch 12/12] syslets: x86_64: add syslet/threadlet support
From: Ingo Molnar <[EMAIL PROTECTED]> add the arch specific bits of syslet/threadlet support to x86_64. Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- arch/x86_64/Kconfig|4 ++ arch/x86_64/ia32/ia32entry.S | 20 ++- arch/x86_64/kernel/entry.S | 72 - arch/x86_64/kernel/process.c | 11 ++ include/asm-x86_64/processor.h | 16 + include/asm-x86_64/system.h| 12 ++ include/asm-x86_64/unistd.h| 29 +++- 7 files changed, 160 insertions(+), 4 deletions(-) Index: linux/arch/x86_64/Kconfig === --- linux.orig/arch/x86_64/Kconfig +++ linux/arch/x86_64/Kconfig @@ -36,6 +36,10 @@ config ZONE_DMA32 bool default y +config ASYNC_SUPPORT + bool + default y + config LOCKDEP_SUPPORT bool default y Index: linux/arch/x86_64/ia32/ia32entry.S === --- linux.orig/arch/x86_64/ia32/ia32entry.S +++ linux/arch/x86_64/ia32/ia32entry.S @@ -368,6 +368,14 @@ quiet_ni_syscall: PTREGSCALL stub32_vfork, sys_vfork, %rdi PTREGSCALL stub32_iopl, sys_iopl, %rsi PTREGSCALL stub32_rt_sigsuspend, sys_rt_sigsuspend, %rdx + /* +* sys_async_thread() and sys_async_exec() both take 2 parameters, +* none of which is ptregs - but the syscalls rely on being able to +* modify ptregs. 
So we put ptregs into the 3rd parameter - so it's +* unused and it also does not mess up the first 2 parameters: +*/ + PTREGSCALL stub32_compat_async_exec, compat_sys_async_exec, %rdx + PTREGSCALL stub32_compat_async_thread, sys_async_thread, %rdx ENTRY(ia32_ptregs_common) popq %r11 @@ -394,6 +402,9 @@ END(ia32_ptregs_common) .section .rodata,"a" .align 8 +.globl compat_sys_call_table +compat_sys_call_table: +.globl ia32_sys_call_table ia32_sys_call_table: .quad sys_restart_syscall .quad sys_exit @@ -714,9 +725,16 @@ ia32_sys_call_table: .quad compat_sys_get_robust_list .quad sys_splice .quad sys_sync_file_range - .quad sys_tee + .quad sys_tee /* 315 */ .quad compat_sys_vmsplice .quad compat_sys_move_pages .quad sys_getcpu .quad sys_epoll_pwait + .quad stub32_compat_async_exec /* 320 */ + .quad sys_async_wait + .quad sys_umem_add + .quad stub32_compat_async_thread + .quad sys_threadlet_on + .quad sys_threadlet_off /* 325 */ +.globl ia32_syscall_end ia32_syscall_end: Index: linux/arch/x86_64/kernel/entry.S === --- linux.orig/arch/x86_64/kernel/entry.S +++ linux/arch/x86_64/kernel/entry.S @@ -410,6 +410,14 @@ END(\label) PTREGSCALL stub_rt_sigsuspend, sys_rt_sigsuspend, %rdx PTREGSCALL stub_sigaltstack, sys_sigaltstack, %rdx PTREGSCALL stub_iopl, sys_iopl, %rsi + /* +* sys_async_thread() and sys_async_exec() both take 2 parameters, +* none of which is ptregs - but the syscalls rely on being able to +* modify ptregs. So we put ptregs into the 3rd parameter - so it's +* unused and it also does not mess up the first 2 parameters: +*/ + PTREGSCALL stub_async_thread, sys_async_thread, %rdx + PTREGSCALL stub_async_exec, sys_async_exec, %rdx ENTRY(ptregscall_common) popq %r11 @@ -430,7 +438,7 @@ ENTRY(ptregscall_common) ret CFI_ENDPROC END(ptregscall_common) - + ENTRY(stub_execve) CFI_STARTPROC popq %r11 @@ -990,6 +998,68 @@ child_rip: ENDPROC(child_rip) /* + * Create an async kernel thread. 
+ * + * C extern interface: + * extern long create_async_thread(int (*fn)(void *), void * arg, unsigned long flags) + * + * asm input arguments: + * rdi: fn, rsi: arg, rdx: flags + */ +ENTRY(create_async_thread) + CFI_STARTPROC + FAKE_STACK_FRAME $async_child_rip + SAVE_ALL + + # rdi: flags, rsi: usp, rdx: will be _regs + movq %rdx,%rdi + movq $-1, %rsi + movq %rsp, %rdx + + xorl %r8d,%r8d + xorl %r9d,%r9d + + # clone now + call do_fork + movq %rax,RAX(%rsp) + xorl %edi,%edi + + /* +* It isn't worth to check for reschedule here, +* so internally to the x86_64 port you can rely on kernel_thread() +* not to reschedule the child before returning, this avoids the need +* of hacks for example to fork off the per-CPU idle tasks. + * [Hopefully no generic code relies on the reschedule -AK] +*/ + RESTORE_ALL + UNFAKE_STACK_FRAME + ret + CFI_ENDPROC +ENDPROC(async_kernel_thread) + +async_child_rip: + CFI_STARTPROC + + movq %rdi, %rax + movq %rsi, %rdi + call
solved Re: 2.6.20 SATA error
On Wed, 28 Feb 2007, Charles Shannon Hendrix wrote:

> On Wed, 28 Feb 2007 13:25:00 -0500 (EST)
> Gerhard Mack <[EMAIL PROTECTED]> wrote:
>
> > > In another thread, I think they were saying it was either a SATA chipset
> > > driver bug, or a problem in libata core.
> >
> > I also have an nforce4.
>
> On another mailing list, someone with an Intel chipset is reporting the same
> problem, and also that others without nforce chipsets are seeing it.

I was reaching inside my computer to check something and heard the thing
click and got the same error message. Turns out the adaptor that goes
between the SATA drive and the old style power connector was loose on the
drive side. Doesn't seem to me like it was very snug fitting to begin with.

I changed it to one of the proper SATA connectors coming off the power
supply and it doesn't do that anymore.

Sorry for the false alarm,

	Gerhard

--
Gerhard Mack

[EMAIL PROTECTED]

<>< As a computer I find your faith in technology amusing.
Re: fix implicit declaration in nv_backlight.
> On Wed, Feb 28, 2007 at 12:36:25PM -0500, Dave Jones wrote:
>> +#ifdef __powerpc__
>
> Is __powerpc__ defined when cross compiling? I'd rather use
> CONFIG_PMAC_BACKLIGHT instead of it.

Agree with this too.

Tony
[patch 09/12] syslets: x86, mark async unsafe syscalls
From: Ingo Molnar <[EMAIL PROTECTED]> mark clone() and fork() as not available for async execution. Both need an intact user context beneath them to work. Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]> --- arch/i386/kernel/ioport.c |6 ++ arch/i386/kernel/ldt.c |3 +++ arch/i386/kernel/process.c |6 ++ arch/i386/kernel/vm86.c|6 ++ 4 files changed, 21 insertions(+) Index: linux/arch/i386/kernel/ioport.c === --- linux.orig/arch/i386/kernel/ioport.c +++ linux/arch/i386/kernel/ioport.c @@ -62,6 +62,9 @@ asmlinkage long sys_ioperm(unsigned long struct tss_struct * tss; unsigned long *bitmap; + if (async_syscall(current)) + return -ENOSYS; + if ((from + num <= from) || (from + num > IO_BITMAP_BITS)) return -EINVAL; if (turn_on && !capable(CAP_SYS_RAWIO)) @@ -139,6 +142,9 @@ asmlinkage long sys_iopl(unsigned long u unsigned int old = (regs->eflags >> 12) & 3; struct thread_struct *t = >thread; + if (async_syscall(current)) + return -ENOSYS; + if (level > 3) return -EINVAL; /* Trying to gain more privileges? 
*/ Index: linux/arch/i386/kernel/ldt.c === --- linux.orig/arch/i386/kernel/ldt.c +++ linux/arch/i386/kernel/ldt.c @@ -233,6 +233,9 @@ asmlinkage int sys_modify_ldt(int func, { int ret = -ENOSYS; + if (async_syscall(current)) + return -ENOSYS; + switch (func) { case 0: ret = read_ldt(ptr, bytecount); Index: linux/arch/i386/kernel/process.c === --- linux.orig/arch/i386/kernel/process.c +++ linux/arch/i386/kernel/process.c @@ -750,6 +750,9 @@ struct task_struct fastcall * __switch_t asmlinkage int sys_fork(struct pt_regs regs) { + if (async_syscall(current)) + return -ENOSYS; + return do_fork(SIGCHLD, regs.esp, , 0, NULL, NULL); } @@ -759,6 +762,9 @@ asmlinkage int sys_clone(struct pt_regs unsigned long newsp; int __user *parent_tidptr, *child_tidptr; + if (async_syscall(current)) + return -ENOSYS; + clone_flags = regs.ebx; newsp = regs.ecx; parent_tidptr = (int __user *)regs.edx; Index: linux/arch/i386/kernel/vm86.c === --- linux.orig/arch/i386/kernel/vm86.c +++ linux/arch/i386/kernel/vm86.c @@ -209,6 +209,9 @@ asmlinkage int sys_vm86old(struct pt_reg struct task_struct *tsk; int tmp, ret = -EPERM; + if (async_syscall(current)) + return -ENOSYS; + tsk = current; if (tsk->thread.saved_esp0) goto out; @@ -239,6 +242,9 @@ asmlinkage int sys_vm86(struct pt_regs r int tmp, ret; struct vm86plus_struct __user *v86; + if (async_syscall(current)) + return -ENOSYS; + tsk = current; switch (regs.ebx) { case VM86_REQUEST_IRQ: - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
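The guard that this patch repeats at the top of each affected syscall can be illustrated with a small user-space model. Everything named demo_* below is hypothetical; in particular, the patch does not show how async_syscall() is implemented, so the flag in the task structure is an assumption made purely for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical task descriptor; the real check inspects the kernel's task state. */
struct demo_task {
	int in_async_context;	/* assumed flag: non-zero while running as a syslet */
};

/* Stand-in for async_syscall(current). */
int demo_async_syscall(const struct demo_task *t)
{
	assert(t != NULL);
	return t->in_async_context;
}

/* Shape of the guarded syscalls: refuse early, before touching any state. */
long demo_sys_fork(const struct demo_task *current_task)
{
	if (demo_async_syscall(current_task))
		return -ENOSYS;

	return 1234;	/* placeholder for the real work (do_fork() in the patch) */
}
```

Returning early means an async caller sees the same result as for a syscall that does not exist, and none of the register-dependent code below the check ever runs in that context.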
[patch 06/12] x86: split FPU state from task state
From: Arjan van de Ven <[EMAIL PROTECTED]> Split the FPU save area from the task struct. This allows easy migration of FPU context, and it's generally cleaner. It also allows the following two (future) optimizations: 1) allocate the right size for the actual cpu rather than 512 bytes always 2) only allocate when the application actually uses FPU, so in the first lazy FPU trap. This could save memory for non-fpu using apps. Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- arch/i386/kernel/i387.c| 96 - arch/i386/kernel/process.c | 56 +++ arch/i386/kernel/traps.c | 10 include/asm-i386/i387.h|6 +- include/asm-i386/processor.h |6 ++ include/asm-i386/thread_info.h |6 ++ kernel/fork.c |7 ++ 7 files changed, 123 insertions(+), 64 deletions(-) Index: linux/arch/i386/kernel/i387.c === --- linux.orig/arch/i386/kernel/i387.c +++ linux/arch/i386/kernel/i387.c @@ -31,9 +31,9 @@ void mxcsr_feature_mask_init(void) unsigned long mask = 0; clts(); if (cpu_has_fxsr) { - memset(>thread.i387.fxsave, 0, sizeof(struct i387_fxsave_struct)); - asm volatile("fxsave %0" : : "m" (current->thread.i387.fxsave)); - mask = current->thread.i387.fxsave.mxcsr_mask; + memset(>thread.i387->fxsave, 0, sizeof(struct i387_fxsave_struct)); + asm volatile("fxsave %0" : : "m" (current->thread.i387->fxsave)); + mask = current->thread.i387->fxsave.mxcsr_mask; if (mask == 0) mask = 0xffbf; } mxcsr_feature_mask &= mask; @@ -49,16 +49,16 @@ void mxcsr_feature_mask_init(void) void init_fpu(struct task_struct *tsk) { if (cpu_has_fxsr) { - memset(>thread.i387.fxsave, 0, sizeof(struct i387_fxsave_struct)); - tsk->thread.i387.fxsave.cwd = 0x37f; + memset(>thread.i387->fxsave, 0, sizeof(struct i387_fxsave_struct)); + tsk->thread.i387->fxsave.cwd = 0x37f; if (cpu_has_xmm) - tsk->thread.i387.fxsave.mxcsr = 0x1f80; + tsk->thread.i387->fxsave.mxcsr = 0x1f80; } else { - memset(>thread.i387.fsave, 0, sizeof(struct i387_fsave_struct)); - tsk->thread.i387.fsave.cwd = 
0x037fu; - tsk->thread.i387.fsave.swd = 0xu; - tsk->thread.i387.fsave.twd = 0xu; - tsk->thread.i387.fsave.fos = 0xu; + memset(>thread.i387->fsave, 0, sizeof(struct i387_fsave_struct)); + tsk->thread.i387->fsave.cwd = 0x037fu; + tsk->thread.i387->fsave.swd = 0xu; + tsk->thread.i387->fsave.twd = 0xu; + tsk->thread.i387->fsave.fos = 0xu; } /* only the device not available exception or ptrace can call init_fpu */ set_stopped_child_used_math(tsk); @@ -152,18 +152,18 @@ static inline unsigned long twd_fxsr_to_ unsigned short get_fpu_cwd( struct task_struct *tsk ) { if ( cpu_has_fxsr ) { - return tsk->thread.i387.fxsave.cwd; + return tsk->thread.i387->fxsave.cwd; } else { - return (unsigned short)tsk->thread.i387.fsave.cwd; + return (unsigned short)tsk->thread.i387->fsave.cwd; } } unsigned short get_fpu_swd( struct task_struct *tsk ) { if ( cpu_has_fxsr ) { - return tsk->thread.i387.fxsave.swd; + return tsk->thread.i387->fxsave.swd; } else { - return (unsigned short)tsk->thread.i387.fsave.swd; + return (unsigned short)tsk->thread.i387->fsave.swd; } } @@ -171,9 +171,9 @@ unsigned short get_fpu_swd( struct task_ unsigned short get_fpu_twd( struct task_struct *tsk ) { if ( cpu_has_fxsr ) { - return tsk->thread.i387.fxsave.twd; + return tsk->thread.i387->fxsave.twd; } else { - return (unsigned short)tsk->thread.i387.fsave.twd; + return (unsigned short)tsk->thread.i387->fsave.twd; } } #endif /* 0 */ @@ -181,7 +181,7 @@ unsigned short get_fpu_twd( struct task_ unsigned short get_fpu_mxcsr( struct task_struct *tsk ) { if ( cpu_has_xmm ) { - return tsk->thread.i387.fxsave.mxcsr; + return tsk->thread.i387->fxsave.mxcsr; } else { return 0x1f80; } @@ -192,27 +192,27 @@ unsigned short get_fpu_mxcsr( struct tas void set_fpu_cwd( struct task_struct *tsk, unsigned short cwd ) { if ( cpu_has_fxsr ) { - tsk->thread.i387.fxsave.cwd = cwd; + tsk->thread.i387->fxsave.cwd = cwd; } else { -
[patch 11/12] syslets: x86, wire up the syslet system calls
From: Ingo Molnar <[EMAIL PROTECTED]>

wire up the new syslet / async system call syscalls and make it
thus available to user-space.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
---
 arch/i386/kernel/syscall_table.S |    6 ++++++
 include/asm-i386/unistd.h        |    8 +++++++-
 2 files changed, 13 insertions(+), 1 deletion(-)

Index: linux/arch/i386/kernel/syscall_table.S
===================================================================
--- linux.orig/arch/i386/kernel/syscall_table.S
+++ linux/arch/i386/kernel/syscall_table.S
@@ -319,3 +319,9 @@ ENTRY(sys_call_table)
 	.long sys_move_pages
 	.long sys_getcpu
 	.long sys_epoll_pwait
+	.long sys_async_exec		/* 320 */
+	.long sys_async_wait
+	.long sys_umem_add
+	.long sys_async_thread
+	.long sys_threadlet_on
+	.long sys_threadlet_off		/* 325 */
Index: linux/include/asm-i386/unistd.h
===================================================================
--- linux.orig/include/asm-i386/unistd.h
+++ linux/include/asm-i386/unistd.h
@@ -327,10 +327,16 @@
 #define __NR_move_pages		317
 #define __NR_getcpu		318
 #define __NR_epoll_pwait	319
+#define __NR_async_exec		320
+#define __NR_async_wait		321
+#define __NR_umem_add		322
+#define __NR_async_thread	323
+#define __NR_threadlet_on	324
+#define __NR_threadlet_off	325
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 320
+#define NR_syscalls 326
 
 #ifndef __ASSEMBLY__
 
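The patch touches two places that have to stay in step: the table gets one entry per new number, and NR_syscalls moves to the highest number plus one (325 + 1 = 326). A toy dispatcher shows why, under the assumption that the entry path bounds-checks against that count; the three-entry table and all demo_* names are invented for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef long (*demo_syscall_fn)(long);

/* Toy handlers; each returns something recognisable. */
long demo_getcpu(long arg)     { return arg; }
long demo_async_exec(long arg) { return arg + 1; }
long demo_async_wait(long arg) { return arg + 2; }

enum {
	DEMO_NR_getcpu     = 0,
	DEMO_NR_async_exec = 1,
	DEMO_NR_async_wait = 2,
	DEMO_NR_syscalls   = 3	/* always highest number + 1 */
};

/* The slot index is the syscall number, as in syscall_table.S. */
static const demo_syscall_fn demo_call_table[DEMO_NR_syscalls] = {
	[DEMO_NR_getcpu]     = demo_getcpu,
	[DEMO_NR_async_exec] = demo_async_exec,
	[DEMO_NR_async_wait] = demo_async_wait,
};

long demo_dispatch(long nr, long arg)
{
	/* Numbers at or past the count are rejected before the table is indexed. */
	if (nr < 0 || nr >= DEMO_NR_syscalls)
		return -ENOSYS;

	assert(demo_call_table[nr] != NULL);
	return demo_call_table[nr](arg);
}
```

If the count were left at its old value, the new table entries would exist but a dispatcher of this shape would never reach them.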
[patch 10/12] syslets: x86: enable ASYNC_SUPPORT
From: Ingo Molnar <[EMAIL PROTECTED]>

enable CONFIG_ASYNC_SUPPORT on x86.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
---
 arch/i386/Kconfig |    4 ++++
 1 file changed, 4 insertions(+)

Index: linux/arch/i386/Kconfig
===================================================================
--- linux.orig/arch/i386/Kconfig
+++ linux/arch/i386/Kconfig
@@ -55,6 +55,10 @@ config ZONE_DMA
 	bool
 	default y
 
+config ASYNC_SUPPORT
+	bool
+	default y
+
 config SBUS
 	bool
 
[patch 07/12] syslets: x86, add create_async_thread() method
From: Ingo Molnar <[EMAIL PROTECTED]> add the create_async_thread() way of creating kernel threads: these threads first execute a kernel function and when they return from it they execute user-space. An architecture must implement this interface before it can turn CONFIG_ASYNC_SUPPORT on. Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]> --- arch/i386/kernel/entry.S | 25 + arch/i386/kernel/process.c | 31 +++ include/asm-i386/processor.h | 17 + include/asm-i386/unistd.h| 10 ++ 4 files changed, 83 insertions(+) Index: linux/arch/i386/kernel/entry.S === --- linux.orig/arch/i386/kernel/entry.S +++ linux/arch/i386/kernel/entry.S @@ -1034,6 +1034,31 @@ ENTRY(kernel_thread_helper) CFI_ENDPROC ENDPROC(kernel_thread_helper) +ENTRY(async_thread_helper) + CFI_STARTPROC + /* +* Allocate space on the stack for pt-regs. +* sizeof(struct pt_regs) == 64, and we've got 8 bytes on the +* kernel stack already: +*/ + subl $64-8, %esp + CFI_ADJUST_CFA_OFFSET 64-8 + movl %edx,%eax + push %edx + CFI_ADJUST_CFA_OFFSET 4 + call *%ebx + addl $4, %esp + CFI_ADJUST_CFA_OFFSET -4 + + movl %eax, PT_EAX(%esp) + + GET_THREAD_INFO(%ebp) + + jmp syscall_exit + CFI_ENDPROC +ENDPROC(async_thread_helper) + + .section .rodata,"a" #include "syscall_table.S" Index: linux/arch/i386/kernel/process.c === --- linux.orig/arch/i386/kernel/process.c +++ linux/arch/i386/kernel/process.c @@ -355,6 +355,37 @@ int kernel_thread(int (*fn)(void *), voi EXPORT_SYMBOL(kernel_thread); /* + * This gets run with %ebx containing the + * function to call, and %edx containing + * the "args". 
+ */ +extern void async_thread_helper(void); + +/* + * Create an async thread + */ +int create_async_thread(long (*fn)(void *), void * arg, unsigned long flags) +{ + struct pt_regs regs; + + memset(, 0, sizeof(regs)); + + regs.ebx = (unsigned long) fn; + regs.edx = (unsigned long) arg; + + regs.xds = __USER_DS; + regs.xes = __USER_DS; + regs.xfs = __KERNEL_PDA; + regs.orig_eax = -1; + regs.eip = (unsigned long) async_thread_helper; + regs.xcs = __KERNEL_CS | get_kernel_rpl(); + regs.eflags = X86_EFLAGS_IF | X86_EFLAGS_SF | X86_EFLAGS_PF | 0x2; + + /* Ok, create the new task.. */ + return do_fork(flags, 0, , 0, NULL, NULL); +} + +/* * Free current thread data structures etc.. */ void exit_thread(void) Index: linux/include/asm-i386/processor.h === --- linux.orig/include/asm-i386/processor.h +++ linux/include/asm-i386/processor.h @@ -472,6 +472,11 @@ extern void prepare_to_copy(struct task_ */ extern int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags); +/* + * create an async thread: + */ +extern int create_async_thread(long (*fn)(void *), void * arg, unsigned long flags); + extern unsigned long thread_saved_pc(struct task_struct *tsk); void show_trace(struct task_struct *task, struct pt_regs *regs, unsigned long *stack); @@ -504,6 +509,18 @@ unsigned long get_wchan(struct task_stru #define KSTK_EIP(task) (task_pt_regs(task)->eip) #define KSTK_ESP(task) (task_pt_regs(task)->esp) +/* + * Register access methods for async syscall support. 
+ * + * Note, task_stack_reg() must not be an lvalue, hence this macro: + */ +#define task_stack_reg(t) \ + ({ unsigned long __esp = task_pt_regs(t)->esp; __esp; }) +#define set_task_stack_reg(t, new_stack) \ + do { task_pt_regs(t)->esp = (new_stack); } while (0) +#define task_ip_reg(t) task_pt_regs(t)->eip +#define task_ret_reg(t)task_pt_regs(t)->eax + struct microcode_header { unsigned int hdrver; Index: linux/include/asm-i386/unistd.h === --- linux.orig/include/asm-i386/unistd.h +++ linux/include/asm-i386/unistd.h @@ -1,6 +1,8 @@ #ifndef _ASM_I386_UNISTD_H_ #define _ASM_I386_UNISTD_H_ +#include + /* * This file contains the system call numbers. */ @@ -330,6 +332,14 @@ #define NR_syscalls 320 +#ifndef __ASSEMBLY__ + +typedef asmlinkage long (*syscall_fn_t)(long, long, long, long, long, long); + +extern syscall_fn_t sys_call_table[NR_syscalls]; + +#endif + #define __ARCH_WANT_IPC_PARSE_VERSION #define __ARCH_WANT_OLD_READDIR #define __ARCH_WANT_OLD_STAT - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at
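The remark in this patch that task_stack_reg() "must not be an lvalue" relies on a GNU C statement expression, which yields a value and so cannot be assigned to. A reduced user-space version, assuming GCC or Clang and using a made-up two-field register struct in place of pt_regs:

```c
#include <assert.h>

/* Hypothetical cut-down register frame; the kernel's pt_regs has more fields. */
struct demo_regs {
	unsigned long esp;
	unsigned long eip;
};

/* Read-only accessor: the statement expression copies the field and yields the copy. */
#define demo_stack_reg(r) \
	({ unsigned long esp_copy_ = (r)->esp; esp_copy_; })

/* Writes go through an explicit setter instead. */
#define demo_set_stack_reg(r, new_stack) \
	do { (r)->esp = (new_stack); } while (0)

/* Plain member access, by contrast, is an lvalue and can be assigned to. */
#define demo_ip_reg(r) ((r)->eip)

unsigned long demo_roundtrip(unsigned long sp, unsigned long ip)
{
	struct demo_regs regs = { 0, 0 };

	demo_set_stack_reg(&regs, sp);
	demo_ip_reg(&regs) = ip;	/* fine: expands to a member access */
	/* demo_stack_reg(&regs) = sp;     would be rejected by the compiler */

	assert(demo_ip_reg(&regs) == ip);
	return demo_stack_reg(&regs);
}
```

Splitting the accessor into a getter and a setter forces every stack pointer update through one macro, which is presumably the reason for the note in the patch.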
[patch 08/12] syslets: x86, add move_user_context() method
From: Ingo Molnar <[EMAIL PROTECTED]> add the move_user_context() method to move the user-space context of one kernel thread to another kernel thread. User-space might notice the changed TID, but execution, stack and register contents (general purpose and FPU) are still the same. An architecture must implement this interface before it can turn CONFIG_ASYNC_SUPPORT on. Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]> --- arch/i386/kernel/process.c | 21 + include/asm-i386/system.h |7 +++ 2 files changed, 28 insertions(+) Index: linux/arch/i386/kernel/process.c === --- linux.orig/arch/i386/kernel/process.c +++ linux/arch/i386/kernel/process.c @@ -839,6 +839,27 @@ unsigned long get_wchan(struct task_stru } /* + * Move user-space context from one kernel thread to another. + * This includes registers and FPU state. Callers must make + * sure that neither task is running user context at the moment: + */ +void +move_user_context(struct task_struct *new_task, struct task_struct *old_task) +{ + struct pt_regs *old_regs = task_pt_regs(old_task); + struct pt_regs *new_regs = task_pt_regs(new_task); + union i387_union *tmp; + + *new_regs = *old_regs; + /* +* Flip around the FPU state too: +*/ + tmp = new_task->thread.i387; + new_task->thread.i387 = old_task->thread.i387; + old_task->thread.i387 = tmp; +} + +/* * sys_alloc_thread_area: get a yet unused TLS descriptor index. */ static int get_free_idx(void) Index: linux/include/asm-i386/system.h === --- linux.orig/include/asm-i386/system.h +++ linux/include/asm-i386/system.h @@ -33,6 +33,13 @@ extern struct task_struct * FASTCALL(__s "2" (prev), "d" (next)); \ } while (0) +/* + * Move user-space context from one kernel thread to another. 
+ * This includes registers and FPU state for now: + */ +extern void +move_user_context(struct task_struct *new_task, struct task_struct *old_task); + #define _set_base(addr,base) do { unsigned long __pr; \ __asm__ __volatile__ ("movw %%dx,%1\n\t" \ "rorl $16,%%edx\n\t" \
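What move_user_context() does in this patch comes down to one struct copy plus one pointer swap, which works because the FPU save area is reached through a pointer once the earlier FPU-split patch in this series is applied. A user-space model with invented demo_* types (the real code operates on task_struct, pt_regs and union i387_union):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for pt_regs, the FPU area and task_struct. */
struct demo_regs { unsigned long eip, esp, eax; };
struct demo_fpu  { unsigned short cwd; };

struct demo_task {
	struct demo_regs regs;	/* saved user registers */
	struct demo_fpu *fpu;	/* FPU area held by pointer, as after patch 06 */
};

/* As in the patch, neither task may be running user context while this runs. */
void demo_move_user_context(struct demo_task *new_task,
			    struct demo_task *old_task)
{
	struct demo_fpu *tmp;

	assert(new_task != NULL && old_task != NULL);

	/* Registers: plain struct assignment copies every field. */
	new_task->regs = old_task->regs;

	/* FPU state: swap the pointers, no save area is copied. */
	tmp = new_task->fpu;
	new_task->fpu = old_task->fpu;
	old_task->fpu = tmp;
}
```

Swapping rather than copying means each task still ends up owning exactly one save area, so nothing has to be allocated or freed on this path.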
[patch 04/12] syslets: core code
From: Ingo Molnar <[EMAIL PROTECTED]>

the core syslet / async system calls infrastructure code.

Is built only if CONFIG_ASYNC_SUPPORT is enabled.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
---
 kernel/Makefile |    1
 kernel/async.c  |  989
 2 files changed, 990 insertions(+)

Index: linux/kernel/Makefile
===================================================================
--- linux.orig/kernel/Makefile
+++ linux/kernel/Makefile
@@ -10,6 +10,7 @@ obj-y     = sched.o fork.o exec_domain.o
 	    kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
 	    hrtimer.o rwsem.o latency.o nsproxy.o srcu.o
 
+obj-$(CONFIG_ASYNC_SUPPORT) += async.o
 obj-$(CONFIG_STACKTRACE) += stacktrace.o
 obj-y += time/
 obj-$(CONFIG_DEBUG_MUTEXES) += mutex-debug.o
Index: linux/kernel/async.c
===================================================================
--- /dev/null
+++ linux/kernel/async.c
@@ -0,0 +1,989 @@
+/*
+ * kernel/async.c
+ *
+ * The syslet and threadlet subsystem - asynchronous syscall and
+ * user-space code execution support.
+ *
+ * Started by Ingo Molnar:
+ *
+ *  Copyright (C) 2007 Red Hat, Inc., Ingo Molnar <[EMAIL PROTECTED]>
+ *
+ * This file is released under the GPLv2.
+ *
+ * This code implements asynchronous syscalls via 'syslets'.
+ *
+ * Syslets consist of a set of 'syslet atoms' which are residing
+ * purely in user-space memory and have no kernel-space resource
+ * attached to them. These atoms can be linked to each other via
+ * pointers. Besides the fundamental ability to execute system
+ * calls, syslet atoms can also implement branches, loops and
+ * arithmetics.
+ *
+ * Thus syslets can be used to build small autonomous programs that
+ * the kernel can execute purely from kernel-space, without having
+ * to return to any user-space context. Syslets can be run by any
+ * unprivileged user-space application - they are executed safely
+ * by the kernel.
+ *
+ * "Threadlets" are the user-space equivalent of syslets: small
+ * functions of execution that user-space attempts/expects to execute
+ * without scheduling. If the threadlet nevertheless blocks, the kernel
+ * creates a real thread from it, and that thread is put aside sleeping.
+ * The 'head' context (the context that never blocks) returns to the
+ * original function that called the threadlet. Once the sleeping thread
+ * wakes up again (after it got for whatever it was waiting - IO, timeout,
+ * etc.) the function continues executing asynchronously, as a thread.
+ * A user-space completion ring connects these asynchronous function calls
+ * back to the head context.
+ */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+
+/*
+ * An async 'cachemiss context' is either busy, or it is ready.
+ * If it is ready, the 'head' might switch its user-space context
+ * to that ready thread anytime - so that if the ex-head blocks,
+ * one ready thread can become the next head and can continue to
+ * execute user-space code.
+ */
+static void
+__mark_async_thread_ready(struct async_thread *at, struct async_head *ah)
+{
+	list_del(&at->entry);
+	list_add_tail(&at->entry, &ah->ready_async_threads);
+	if (list_empty(&ah->busy_async_threads))
+		wake_up(&ah->wait);
+}
+
+static void
+mark_async_thread_ready(struct async_thread *at, struct async_head *ah)
+{
+	spin_lock(&ah->lock);
+	__mark_async_thread_ready(at, ah);
+	spin_unlock(&ah->lock);
+}
+
+static void
+__mark_async_thread_busy(struct async_thread *at, struct async_head *ah)
+{
+	list_del(&at->entry);
+	list_add_tail(&at->entry, &ah->busy_async_threads);
+}
+
+static void
+mark_async_thread_busy(struct async_thread *at, struct async_head *ah)
+{
+	spin_lock(&ah->lock);
+	__mark_async_thread_busy(at, ah);
+	spin_unlock(&ah->lock);
+}
+
+static void
+__async_thread_init(struct task_struct *t, struct async_thread *at,
+		    struct async_head *ah)
+{
+	INIT_LIST_HEAD(&at->entry);
+	at->exit = 0;
+	at->task = t;
+	at->ah = ah;
+
+	t->at = at;
+}
+
+static void
+async_thread_init(struct task_struct *t, struct async_thread *at,
+		  struct async_head *ah)
+{
+	spin_lock(&ah->lock);
+	__async_thread_init(t, at, ah);
+	__mark_async_thread_ready(at, ah);
+	spin_unlock(&ah->lock);
+}
+
+static void
+async_thread_exit(struct async_thread *at, struct task_struct *t)
+{
+	struct async_head *ah = at->ah;
+
+	spin_lock(&ah->lock);
+	list_del_init(&at->entry);
+	if (at->exit)
+		complete(&ah->exit_done);
+	t->at = NULL;
+	at->task = NULL;
+	spin_unlock(&ah->lock);
+}
+
+static struct async_thread *
+pick_ready_cachemiss_thread(struct async_head *ah)
+{
+	struct list_head *head = &ah->ready_async_threads;
+
+	if (list_empty(head))
+		return NULL;
+
+	return
[patch 05/12] syslets: core, documentation
From: Ingo Molnar <[EMAIL PROTECTED]>

Add Documentation/syslet-design.txt with a high-level description
of the syslet concepts.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
---
 Documentation/syslet-design.txt |  137
 1 file changed, 137 insertions(+)

Index: linux/Documentation/syslet-design.txt
===================================================================
--- /dev/null
+++ linux/Documentation/syslet-design.txt
@@ -0,0 +1,137 @@
+Syslets / asynchronous system calls
+===================================
+
+started by Ingo Molnar <[EMAIL PROTECTED]>
+
+Goal:
+-----
+
+The goal of the syslet subsystem is to allow user-space to execute
+arbitrary system calls asynchronously. It does so by allowing user-space
+to execute "syslets" which are small scriptlets that the kernel can execute
+both securely and asynchronously without having to exit to user-space.
+
+the core syslet concepts are:
+
+The Syslet Atom:
+----------------
+
+The syslet atom is a small, fixed-size (44 bytes on 32-bit) piece of
+user-space memory, which is the basic unit of execution within the syslet
+framework. A syslet represents a single system-call and its arguments.
+In addition it also has condition flags attached to it that allows the
+construction of larger programs (syslets) from these atoms.
+
+Arguments to the system call are implemented via pointers to arguments.
+This not only increases the flexibility of syslet atoms (multiple syslets
+can share the same variable for example), but is also an optimization:
+copy_uatom() will only fetch syscall parameters up until the point it
+meets the first NULL pointer. 50% of all syscalls have 2 or less
+parameters (and 90% of all syscalls have 4 or less parameters).
+
+ [ Note: since the argument array is at the end of the atom, and the
+   kernel will not touch any argument beyond the first NULL one, atoms
+   might be packed more tightly. (the only special case exception to
+   this rule would be SKIP_TO_NEXT_ON_STOP atoms, where the kernel will
+   jump a full syslet_uatom number of bytes.) ]
+
+The Syslet:
+-----------
+
+A syslet is a program, represented by a graph of syslet atoms. The
+syslet atoms are chained to each other either via the atom->next pointer,
+or via the SYSLET_SKIP_TO_NEXT_ON_STOP flag.
+
+Running Syslets:
+----------------
+
+Syslets can be run via the sys_async_exec() system call, which takes
+the first atom of the syslet as an argument. The kernel does not need
+to be told about the other atoms - it will fetch them on the fly as
+execution goes forward.
+
+A syslet might either be executed 'cached', or it might generate a
+'cachemiss'.
+
+'Cached' syslet execution means that the whole syslet was executed
+without blocking. The system-call returns the submitted atom's address
+in this case.
+
+If a syslet blocks while the kernel executes a system-call embedded in
+one of its atoms, the kernel will keep working on that syscall in
+parallel, but it immediately returns to user-space with a NULL pointer,
+so the submitting task can submit other syslets.
+
+Completion of asynchronous syslets:
+-----------------------------------
+
+Completion of asynchronous syslets is done via the 'completion ring',
+which is a ringbuffer of syslet atom pointers in user-space memory,
+provided by user-space as an argument to the sys_async_exec() syscall.
+The kernel fills in the ringbuffer starting at index 0, and user-space
+must clear out these pointers. Once the kernel reaches the end of
+the ring it wraps back to index 0. The kernel will not overwrite
+non-NULL pointers (but will return an error), and thus user-space has
+to make sure it completes all events it asked for.
+
+Waiting for completions:
+------------------------
+
+Syslet completions can be waited for via the sys_async_wait()
+system call - which takes the number of events it should wait for as
+a parameter. This system call will also return if the number of
+pending events goes down to zero.
+
+Sample Hello World syslet code:
+
+--->
+/*
+ * Set up a syslet atom:
+ */
+static void
+init_atom(struct syslet_uatom *atom, int nr,
+	  void *arg_ptr0, void *arg_ptr1, void *arg_ptr2,
+	  void *arg_ptr3, void *arg_ptr4, void *arg_ptr5,
+	  void *ret_ptr, unsigned long flags, struct syslet_uatom *next)
+{
+	atom->nr = nr;
+	atom->arg_ptr[0] = arg_ptr0;
+	atom->arg_ptr[1] = arg_ptr1;
+	atom->arg_ptr[2] = arg_ptr2;
+	atom->arg_ptr[3] = arg_ptr3;
+	atom->arg_ptr[4] = arg_ptr4;
+	atom->arg_ptr[5] = arg_ptr5;
+	atom->ret_ptr = ret_ptr;
+	atom->flags = flags;
+	atom->next = next;
+}
+
+int main(int argc, char *argv[])
+{
+	unsigned long int fd_out = 1; /* standard output */
+	char *buf = "Hello Syslet World!\n";
+	unsigned long size = strlen(buf);
+	struct syslet_uatom atom, *done;
+
+
[patch 02/12] syslets: add syslet.h include file, user API/ABI definitions
From: Ingo Molnar <[EMAIL PROTECTED]>

add include/linux/syslet.h which contains the user-space API/ABI
declarations. Add the new header to include/linux/Kbuild as well.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
---
 include/linux/Kbuild   |    1
 include/linux/syslet.h |  155 +
 2 files changed, 156 insertions(+)

Index: linux/include/linux/Kbuild
===================================================================
--- linux.orig/include/linux/Kbuild
+++ linux/include/linux/Kbuild
@@ -141,6 +141,7 @@ header-y += sockios.h
 header-y += som.h
 header-y += sound.h
 header-y += synclink.h
+header-y += syslet.h
 header-y += telephony.h
 header-y += termios.h
 header-y += ticable.h
Index: linux/include/linux/syslet.h
===================================================================
--- /dev/null
+++ linux/include/linux/syslet.h
@@ -0,0 +1,155 @@
+#ifndef _LINUX_SYSLET_H
+#define _LINUX_SYSLET_H
+/*
+ * The syslet subsystem - asynchronous syscall execution support.
+ *
+ * Started by Ingo Molnar:
+ *
+ *  Copyright (C) 2007 Red Hat, Inc., Ingo Molnar <[EMAIL PROTECTED]>
+ *
+ * User-space API/ABI definitions:
+ */
+
+#ifndef __user
+# define __user
+#endif
+
+/*
+ * This is the 'Syslet Atom' - the basic unit of execution
+ * within the syslet framework. A syslet always represents
+ * a single system-call plus its arguments, plus has conditions
+ * attached to it that allows the construction of larger
+ * programs from these atoms. User-space variables can be used
+ * (for example a loop index) via the special sys_umem*() syscalls.
+ *
+ * Arguments are implemented via pointers to arguments. This not
+ * only increases the flexibility of syslet atoms (multiple syslets
+ * can share the same variable for example), but is also an
+ * optimization: copy_uatom() will only fetch syscall parameters
+ * up until the point it meets the first NULL pointer. 50% of all
+ * syscalls have 2 or less parameters (and 90% of all syscalls have
+ * 4 or less parameters).
+ *
+ * [ Note: since the argument array is at the end of the atom, and the
+ *   kernel will not touch any argument beyond the final NULL one, atoms
+ *   might be packed more tightly. (the only special case exception to
+ *   this rule would be SKIP_TO_NEXT_ON_STOP atoms, where the kernel will
+ *   jump a full syslet_uatom number of bytes.) ]
+ */
+struct syslet_uatom {
+	u32			flags;
+	u32			nr;
+	u64			ret_ptr;
+	u64			next;
+	u64			arg_ptr[6];
+	/*
+	 * User-space can put anything in here, kernel will not
+	 * touch it:
+	 */
+	u64			private;
+};
+
+/*
+ * Flags to modify/control syslet atom behavior:
+ */
+
+/*
+ * Immediately queue this syslet asynchronously - do not even
+ * attempt to execute it synchronously in the user context:
+ */
+#define SYSLET_ASYNC				0x0001
+
+/*
+ * Never queue this syslet asynchronously - even if synchronous
+ * execution causes a context-switching:
+ */
+#define SYSLET_SYNC				0x0002
+
+/*
+ * Do not queue the syslet in the completion ring when done.
+ *
+ * ( the default is that the final atom of a syslet is queued
+ *   in the completion ring. )
+ *
+ * Some syscalls generate implicit completion events of their
+ * own.
+ */
+#define SYSLET_NO_COMPLETE			0x0004
+
+/*
+ * Execution control: conditions upon the return code
+ * of the just executed syslet atom. 'Stop' means syslet
+ * execution is stopped and the atom is put into the
+ * completion ring:
+ */
+#define SYSLET_STOP_ON_NONZERO			0x0008
+#define SYSLET_STOP_ON_ZERO			0x0010
+#define SYSLET_STOP_ON_NEGATIVE			0x0020
+#define SYSLET_STOP_ON_NON_POSITIVE		0x0040
+
+#define SYSLET_STOP_MASK				\
+	(	SYSLET_STOP_ON_NONZERO		|	\
+		SYSLET_STOP_ON_ZERO		|	\
+		SYSLET_STOP_ON_NEGATIVE		|	\
+		SYSLET_STOP_ON_NON_POSITIVE		)
+
+/*
+ * Special modifier to 'stop' handling: instead of stopping the
+ * execution of the syslet, the linearly next syslet is executed.
+ * (Normal execution flows along atom->next, and execution stops
+ *  if atom->next is NULL or a stop condition becomes true.)
+ *
+ * This is what allows true branches of execution within syslets.
+ */
+#define SYSLET_SKIP_TO_NEXT_ON_STOP		0x0080
+
+/*
+ * This is the (per-user-context) descriptor of the async completion
+ * ring. This gets passed in to sys_async_exec():
+ */
+struct async_head_user {
+	/*
+	 * Current completion ring index - managed by the kernel:
+	 */
+	u64			kernel_ring_idx;
+	/*
+	 * User-side ring index:
+	 */
+	u64
[patch 03/12] syslets: generic kernel bits
From: Ingo Molnar <[EMAIL PROTECTED]>

add the kernel generic bits - these are present even if !CONFIG_ASYNC_SUPPORT.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
---
 fs/exec.c             |    4
 include/linux/sched.h |   23 ++-
 kernel/capability.c   |    3 +++
 kernel/exit.c         |    7 +++
 kernel/fork.c         |    5 +
 kernel/sched.c        |    9 +
 kernel/sys.c          |   36
 7 files changed, 86 insertions(+), 1 deletion(-)

Index: linux/fs/exec.c
===================================================================
--- linux.orig/fs/exec.c
+++ linux/fs/exec.c
@@ -1444,6 +1444,10 @@ static int coredump_wait(int exit_code)
 		tsk->vfork_done = NULL;
 		complete(vfork_done);
 	}
+	/*
+	 * Make sure we exit our async context before waiting:
+	 */
+	async_exit(tsk);
 
 	if (core_waiters)
 		wait_for_completion(&startup_done);
Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -83,12 +83,12 @@ struct sched_param {
 #include
 #include
 #include
+#include
 #include
 
 struct exec_domain;
 struct futex_pi_state;
 
-
 /*
  * List of flags we want to share for kernel threads,
  * if only because they are not used by them anyway.
@@ -997,6 +997,12 @@ struct task_struct {
 /* journalling filesystem info */
 	void *journal_info;
 
+/* async syscall support: */
+	struct async_thread *at, *async_ready;
+	struct async_head *ah;
+	struct async_thread __at;
+	struct async_head __ah;
+
 /* VM state */
 	struct reclaim_state *reclaim_state;
 
@@ -1055,6 +1061,21 @@ struct task_struct {
 #endif
 };
 
+/*
+ * Is an async syscall being executed currently?
+ */
+#ifdef CONFIG_ASYNC_SUPPORT
+static inline int async_syscall(struct task_struct *t)
+{
+	return t->async_ready != NULL;
+}
+#else /* !CONFIG_ASYNC_SUPPORT */
+static inline int async_syscall(struct task_struct *t)
+{
+	return 0;
+}
+#endif /* !CONFIG_ASYNC_SUPPORT */
+
 static inline pid_t process_group(struct task_struct *tsk)
 {
 	return tsk->signal->pgrp;
Index: linux/kernel/capability.c
===================================================================
--- linux.orig/kernel/capability.c
+++ linux/kernel/capability.c
@@ -178,6 +178,9 @@ asmlinkage long sys_capset(cap_user_head
 	int ret;
 	pid_t pid;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	if (get_user(version, &header->version))
 		return -EFAULT;
 
Index: linux/kernel/exit.c
===================================================================
--- linux.orig/kernel/exit.c
+++ linux/kernel/exit.c
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -890,6 +891,12 @@ fastcall NORET_TYPE void do_exit(long co
 		schedule();
 	}
 
+	/*
+	 * Note: async threads have to exit their context before the MM
+	 * exit (due to the coredumping wait):
+	 */
+	async_exit(tsk);
+
 	tsk->flags |= PF_EXITING;
 
 	if (unlikely(in_atomic()))
Index: linux/kernel/fork.c
===================================================================
--- linux.orig/kernel/fork.c
+++ linux/kernel/fork.c
@@ -22,6 +22,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1056,6 +1057,7 @@ static struct task_struct *copy_process(
 
 	p->lock_depth = -1;		/* -1 = no lock */
 	do_posix_clock_monotonic_gettime(&p->start_time);
+	async_init(p);
 	p->security = NULL;
 	p->io_context = NULL;
 	p->io_wait = NULL;
@@ -1623,6 +1625,9 @@ asmlinkage long sys_unshare(unsigned lon
 	struct uts_namespace *uts, *new_uts = NULL;
 	struct ipc_namespace *ipc, *new_ipc = NULL;
 
+	if (async_syscall(current))
+		return -ENOSYS;
+
 	check_unshare_flags(&unshare_flags);
 
 	/* Return -EINVAL for all unsupported flags */
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -38,6 +38,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -3455,6 +3456,14 @@ asmlinkage void __sched schedule(void)
 	}
 	profile_hit(SCHED_PROFILING, __builtin_return_address(0));
 
+	prev = current;
+	if (unlikely(prev->async_ready)) {
+		if (prev->state && !(preempt_count() & PREEMPT_ACTIVE) &&
+			(!(prev->state & TASK_INTERRUPTIBLE) ||
+				!signal_pending(prev)))
+			__async_schedule(prev);
+	}
+
 need_resched:
 	preempt_disable();
 	prev = current;
Index: linux/kernel/sys.c
[patch 01/12] syslets: add async.h include file, kernel-side API definitions
From: Ingo Molnar <[EMAIL PROTECTED]>

add include/linux/async.h which contains the kernel-side API
declarations.

it also provides NOP stubs for the !CONFIG_ASYNC_SUPPORT case.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
---
 include/linux/async.h |   88 ++
 1 file changed, 88 insertions(+)

Index: linux/include/linux/async.h
===================================================================
--- /dev/null
+++ linux/include/linux/async.h
@@ -0,0 +1,88 @@
+#ifndef _LINUX_ASYNC_H
+#define _LINUX_ASYNC_H
+
+#include
+#include
+#include
+#include
+
+/*
+ * The syslet subsystem - asynchronous syscall execution support.
+ *
+ * Syslet-subsystem internal definitions:
+ */
+
+/*
+ * The kernel-side copy of a syslet atom - with arguments expanded:
+ */
+struct syslet_atom {
+	unsigned long				flags;
+	unsigned long				nr;
+	long __user				*ret_ptr;
+	struct syslet_uatom __user		*next;
+	unsigned long				args[6];
+	syscall_fn_t				*call_table;
+	unsigned int				nr_syscalls;
+};
+
+/*
+ * The 'async head' is the thread which has user-space context (ptregs)
+ * 'below it' - this is the one that can return to user-space:
+ */
+struct async_head {
+	spinlock_t				lock;
+	struct task_struct			*user_task;
+
+	struct list_head			ready_async_threads;
+	struct list_head			busy_async_threads;
+
+	struct mutex				completion_lock;
+	long					events_left;
+	wait_queue_head_t			wait;
+
+	struct async_head_user __user		*ahu;
+
+	unsigned long __user			*new_stackp;
+	unsigned long				new_ip;
+	unsigned long				restore_stack;
+	unsigned long				restore_ip;
+	struct completion			start_done;
+	struct completion			exit_done;
+};
+
+/*
+ * The 'async thread' is either a newly created async thread or it is
+ * an 'ex-head' - it cannot return to user-space and only has kernel
+ * context.
+ */
+struct async_thread {
+	struct task_struct			*task;
+	unsigned long				user_stack;
+	unsigned long				user_ip;
+	struct async_head			*ah;
+
+	struct list_head			entry;
+
+	unsigned int				exit;
+};
+
+/*
+ * Generic kernel API definitions:
+ */
+#ifdef CONFIG_ASYNC_SUPPORT
+extern void async_init(struct task_struct *t);
+extern void async_exit(struct task_struct *t);
+extern void __async_schedule(struct task_struct *t);
+#else /* !CONFIG_ASYNC_SUPPORT */
+static inline void async_init(struct task_struct *t)
+{
+}
+static inline void async_exit(struct task_struct *t)
+{
+}
+static inline void __async_schedule(struct task_struct *t)
+{
+}
+#endif /* !CONFIG_ASYNC_SUPPORT */
+
+#endif
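The NOP-stub arrangement at the end of async.h can be illustrated outside the kernel. What follows is only a user-space sketch: `struct task_struct` here is a one-field stand-in, the `async_ready` flag and the `setup_task()` caller are hypothetical, and the body of the enabled-case `async_init()` is invented for illustration (the real one lives in kernel/async.c).

```c
/* Hypothetical stand-in for the kernel's task_struct. */
struct task_struct {
	int async_ready;	/* nonzero once async state has been set up */
};

/* Leave CONFIG_ASYNC_SUPPORT undefined to get the NOP stub. */
#ifdef CONFIG_ASYNC_SUPPORT
static inline void async_init(struct task_struct *t)
{
	t->async_ready = 1;	/* illustrative only, not the real init */
}
#else /* !CONFIG_ASYNC_SUPPORT */
static inline void async_init(struct task_struct *t)
{
	(void)t;		/* NOP stub: the call compiles away */
}
#endif /* !CONFIG_ASYNC_SUPPORT */

/* A caller in the style of copy_process() needs no #ifdef of its own. */
static void setup_task(struct task_struct *t)
{
	t->async_ready = 0;
	async_init(t);
}
```

The point of the pattern is that call sites in generic code stay unconditional, and the config option is tested in exactly one header.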
[patch 00/12] Syslets, Threadlets, generic AIO support, v5
this is the v5 release of the syslet/threadlet subsystem:

   http://redhat.com/~mingo/syslet-patches/

this release took 4 days to get out, but there were a couple of key
changes that needed some time to settle down:

 - ported the code from v2.6.20 to current -git (v2.6.20-rc2 should be
   fine as a base)

 - 64-bit support in terms of a x86_64 port. Jens has updated the FIO
   syslet code to work on 64-bit too. (kernel/async.c was pretty 64-bit
   clean already, it needed minimal changes for basic x86_64 support.)

 - 32-bit user-space on 64-bit kernel compat support. 32-bit syslet and
   threadlet binaries work fine on 64-bit kernels.

 - various cleanups and simplifications

the v4->v5 delta is:

 17 files changed, 327 insertions(+), 271 deletions(-)

amongst the plans for v6 are cleanups/simplifications to the syslet
engine API, a number of suggestions have been made for that already.

the linecount increase in v5 is mostly due to the x86_64 port. The ABI
had to change again - see the async-test userspace code for details.

the x86_64 patch is a bit monolithic at the moment, i'll split it up
further in v6.

As always, comments, suggestions, reports are welcome!

	Ingo
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Wed, 28 Feb 2007, Ingo Molnar wrote:

> * Davide Libenzi wrote:
>
> > Did you hide all the complexity of the userspace atom decoding inside
> > another function? :)
>
> no, i made the 64-bit and 32-bit structures layout-compatible. This
> makes the 32-bit structure as large as the 64-bit ones, but that's not a
> big issue, compared to the simplifications it brings.

Do you have a new version to review?

> > > But i'm happy to change the syslet API in any sane way, and did so
> > > based on feedback from Jens who is actually using them.
> >
> > Wouldn't you agree on a simple/parallel execution engine [...]
>
> the thing is, there's almost zero overhead from having those basic
> things like conditions and the ->next link, and they make it so much
> more capable. As usual my biggest problem is that you are not trying to
> use syslets at all - you are only trying to get rid of them ;-) My
> purpose with syslets is to enable a syslet to do almost anything that
> user-space could do too, as simply as possible. Syslets could even
> allocate user-space memory and then use it (i dont think we actually
> want to do that though). That doesnt mean arbitrary complex code
> /should/ be done via syslets, or that it wont be significantly slower
> than what user-space can do, but i'd not like to artificially dumb the
> engine down. I'm totally willing to simplify/shrink the vectoring of
> arguments and just about anything else, but your proposals so far (such
> as your return-value-embedded-in-atom suggestion) all kill important
> aspects of the engine.

Ok, we're past the error code in the atom, as Linus pointed out ;)
How about this, with async_wait returning asynid's back to a userspace
ring buffer?

struct syslet_uatom {
	long *result;
	unsigned long asynid;
	unsigned long nr_sysc;
	unsigned long params[8];
};

My problem with the syslets in their current form is, do we have a real
use for them that justifies the extra complexity inside the kernel? Or
with a simple/parallel async submission, coupled with threadlets, can we
cover a pretty broad range of real life use cases?

- Davide
Re: fully honor vdso_enabled [i386, sh; x86_64?]
On Wed, 28 Feb 2007 18:11:11 +0900 Paul Mundt <[EMAIL PROTECTED]> wrote:

> On Thu, Feb 22, 2007 at 12:31:20PM -0800, John Reiser wrote:
> > This patch changes arch_setup_additional_pages() to honor vdso_enabled.
> > For i386 it also allows the option of a fixed address to avoid
> > fragmenting the address space. Compiles and runs on i386.
> > x86_64 [IA32 support] and sh maintainers also please comment.
> >
> We didn't actually have the sysctl entry wired up on SH, but once that's
> done, this patch works fine there too.
>
> Andrew, do you want a separate patch for the vdso_enabled sysctl or
> is it more convenient through my git tree?
>

If it's an sh-only thing then through your tree is fine, thanks.
[PATCH] x86_64: shut up vm86(2)
From originally rate-limited printk, to just printk, to current version.
Everybody had enough time to learn about vm86(2) absence. Also remove
possibility of dmesg spamming.

Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
---

 arch/x86_64/ia32/ia32entry.S |    4 ++--
 arch/x86_64/ia32/sys_ia32.c  |   12 ------------
 2 files changed, 2 insertions(+), 14 deletions(-)

--- a/arch/x86_64/ia32/ia32entry.S
+++ b/arch/x86_64/ia32/ia32entry.S
@@ -512,7 +512,7 @@ #endif
 	.quad stub32_iopl		/* 110 */
 	.quad sys_vhangup
 	.quad quiet_ni_syscall	/* old "idle" system call */
-	.quad sys32_vm86_warning	/* vm86old */
+	.quad quiet_ni_syscall	/* vm86old */
 	.quad compat_sys_wait4
 	.quad sys_swapoff		/* 115 */
 	.quad compat_sys_sysinfo
@@ -565,7 +565,7 @@ #endif
 	.quad sys_mremap
 	.quad sys_setresuid16
 	.quad sys_getresuid16	/* 165 */
-	.quad sys32_vm86_warning	/* vm86 */
+	.quad quiet_ni_syscall	/* vm86 */
 	.quad quiet_ni_syscall	/* query_module */
 	.quad sys_poll
 	.quad compat_sys_nfsservctl
--- a/arch/x86_64/ia32/sys_ia32.c
+++ b/arch/x86_64/ia32/sys_ia32.c
@@ -842,18 +842,6 @@ long sys32_fadvise64_64(int fd, __u32 of
 				advice);
 }
 
-long sys32_vm86_warning(void)
-{
-	struct task_struct *me = current;
-	static char lastcomm[sizeof(me->comm)];
-	if (strncmp(lastcomm, me->comm, sizeof(lastcomm))) {
-		compat_printk(KERN_INFO "%s: vm86 mode not supported on 64 bit kernel\n",
-				me->comm);
-		strncpy(lastcomm, me->comm, sizeof(lastcomm));
-	}
-	return -ENOSYS;
-}
-
 long sys32_lookup_dcookie(u32 addr_low, u32 addr_high,
 			  char __user * buf, size_t len)
 {
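One possible reading of "possibility of dmesg spamming" is that the removed function only suppresses a warning when the same command name calls twice in a row. The user-space sketch below mirrors that last-caller check; the counter `warnings_printed`, the `COMM_LEN` value and the function name `vm86_warning()` are assumptions made for the illustration, and the counter stands in for compat_printk().

```c
#include <string.h>

#define COMM_LEN 16	/* assumed size of task->comm */

static char lastcomm[COMM_LEN];
static int warnings_printed;	/* stands in for compat_printk() output */

/*
 * Same shape as the removed sys32_vm86_warning(): warn only when the
 * caller's name differs from the previous caller's name.
 */
static void vm86_warning(const char *comm)
{
	if (strncmp(lastcomm, comm, sizeof(lastcomm))) {
		warnings_printed++;
		strncpy(lastcomm, comm, sizeof(lastcomm));
	}
}
```

With this check, one program calling vm86() in a loop produces a single message, but two programs taking turns produce a message on every call, which is presumably why the patch routes both slots to quiet_ni_syscall instead.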
Re: 2.6.21-rc1: known regressions (v2) (part 1)
> Quoting Thomas Gleixner <[EMAIL PROTECTED]>:
> Subject: Re: 2.6.21-rc1: known regressions (v2) (part 1)
>
> On Wed, 2007-02-28 at 23:13 +0200, Michael S. Tsirkin wrote:
> > > Subject    : ThinkPad T60: no screen after suspend to RAM
> > > References : http://lkml.org/lkml/2007/2/22/391
> > > Submitter  : Michael S. Tsirkin <[EMAIL PROTECTED]>
> > > Handled-By : Ingo Molnar <[EMAIL PROTECTED]>
> > > Status     : unknown
> >
> > Just reproduced this in -rc2.
> > Another thing I noticed:
> > with 2.6.20, pressing Fn/F4 generates an ACPI event and triggers
> > suspend to RAM.
> >
> > On 2.6.21-rc2, after resume (when the box is accessible from network),
> > pressing Fn/F4 again does not seem to have any effect.
>
> Can you please get the dmesg output after resume via the network ?

The link above has it.

-- 
MST
Re: sata_sil problems with recent kernels
On Tue, 2007-02-27 at 13:54 -0500, Dale Blount wrote:
> On Fri, 2007-02-23 at 12:00 -0500, Dale Blount wrote:
> > Hi,
> >
> > Excuse me if this has been covered or fixed, I couldn't find anything in
> > the archives.
> >
> > I upgraded from 2.6.11.7 to 2.6.20.1 today and found all the drives
> > connected to 2 brands of sata_sil sata controllers not working. The
> > drives are also (now) of various brands, Maxtor 300GB and 500GB
> > Seagates.

For the archives, the fix is documented here:
http://article.gmane.org/gmane.linux.ide/16304

Dale
Re: [lm-sensors] Could the k8temp driver be interfering with ACPI?
Hi!

> > Well I had an idea after looking at k8temp -- why not make it default to
> > doing only reads from the sensor? You'd only get information from whatever
> > core/sensor combination that ACPI had last used, but it would be safe.
>
> ACPI is broken here, not k8temp, so let's fix ACPI instead. ACPI
> doesn't conflict with only k8temp, but with virtually all hardware
> monitoring drivers, all I2C/SMBus drivers, and probably other types of
> drivers too. We just can't restrict or blacklist all these drivers
> because ACPI misbehaves.

Oops, sorry about that but no, that will not work. There's a piece of
paper, called the ACPI specification, and we are following it. The bug
is not in our implementation.

The bug is in the ACPI specs... they do not explicitly allow you to go
out and bitbang i2c, and you do it, and you get problems.

Now, you may try to change the specs to be hwmon-friendly... good luck.
But currently hw manufacturers follow the ACPI specs, so we have to
follow them, too; bad luck for hwmon.

BIOS hiding smbus from you is a good hint you are doing something
wrong...?

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [stable] [patch 00/21] 2.6.19-stable review
On Wed, Feb 28, 2007 at 05:28:27AM -0700, Eric W. Biederman wrote:
>
> What are the rules that are supposed to govern backports to stable
> trees these days anyway?

Documentation/stable_kernel_rules.txt

thanks,

greg k-h
Re: [PATCH] fbdev driver for S3 Trio/Virge, updated
On Wed, 2007-02-28 at 16:53 +0000, James Simmons wrote:
> > On Thu, 2007-02-22 at 00:53 +0000, James Simmons wrote:
> > > > > +/* image data is MSB-first, fb structure is MSB-first too */
> > > > > +static inline u32 expand_color(u32 c)
> > > > > +{
> > > > > +	return ((c & 1) | ((c & 2) << 7) | ((c & 4) << 14) | ((c & 8) << 21)) * 0xFF;
> > > > > +}
> > > > > +
> > > > > +/* s3fb_iplan_imageblit silently assumes that almost everything is 8-pixel aligned */
> > > >
> > > > Hmn, same thing with vga16fb... Perhaps we should bring back the
> > > > fontwidth flag of 2.2 and 2.4 kernels.
> > >
> > > Ug no. It is possible to get 12,6 bit width fonts working with vga
> > > interleaved planes. I got it partially working but never got back to it.
> > > It's in my queue of things to do. Now that I finished the display class I
> > > need to get around to making drm/fbdev work together :-)
> > >
> >
> > Of course, not fontwidth exactly, but to allow the driver to specify the
> > alignment of the blit engine, in this case 8 pixels. I do believe X also
> > has similar functionality to compensate for the limitation of the
> > hardware.
>
> Isn't scan_align in pixmap for this? Or do we need more.

No, scan_align is how much to pad each line, and it's up to the engine
to discard the padding. In this case, the hardware does not allow padding
and must be given data in exact multiples. For example, vesafb can
accept 4x4 fonts padded to 8x4, but vga16fb will not be able to draw 4x4
fonts properly.

Tony
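For readers who want to see what the quoted expand_color() helper computes, here is a user-space sketch with `uint32_t` standing in for the kernel's `u32`. Each of the four low bits of the colour index moves to the bottom of its own byte (bit 0 to bit 0, bit 1 to bit 8, bit 2 to bit 16, bit 3 to bit 24), and the multiplication by 0xFF then fills every selected byte with ones.

```c
#include <stdint.h>

/*
 * User-space copy of the quoted helper: spread a 4-bit colour index
 * into four per-plane bytes, each either 0x00 or 0xFF.
 */
static inline uint32_t expand_color(uint32_t c)
{
	return ((c & 1) | ((c & 2) << 7) | ((c & 4) << 14) | ((c & 8) << 21)) * 0xFF;
}
```

For example, colour index 5 (binary 0101) expands to 0x00FF00FF, and index 0xF expands to 0xFFFFFFFF.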
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Davide Libenzi wrote: > On Wed, 28 Feb 2007, Ingo Molnar wrote: > > > > > * Davide Libenzi wrote: > > > > > My point is, the syslet infrastructure is expensive for the kernel in > > > terms of compat, [...] > > > > it is not. Today i've implemented 64-bit syslets on x86_64 and > > 32-bit-on-64-bit compat syslets. Both the 64-bit and the 32-bit syslet > > (and threadlet) binaries work just fine on a 64-bit kernel, and they > > share 99% of the infrastructure. There's only a single #ifdef > > CONFIG_COMPAT in kernel/async.c: > > > > #ifdef CONFIG_COMPAT > > > > asmlinkage struct syslet_uatom __user * > > compat_sys_async_exec(struct syslet_uatom __user *uatom, > > struct async_head_user __user *ahu) > > { > > return __sys_async_exec(uatom, ahu, _sys_call_table, > > compat_NR_syscalls); > > } > > > > #endif > > Did you hide all the complexity of the userspace atom decoding inside > another function? :) no, i made the 64-bit and 32-bit structures layout-compatible. This makes the 32-bit structure as large as the 64-bit ones, but that's not a big issue, compared to the simplifications it brings. > > But i'm happy to change the syslet API in any sane way, and did so > > based on feedback from Jens who is actually using them. > > Wouldn't you agree on a simple/parallel execution engine [...] the thing is, there's almost zero overhead from having those basic things like conditions and the ->next link, and they make it so much more capable. As usual my biggest problem is that you are not trying to use syslets at all - you are only trying to get rid of them ;-) My purpose with syslets is to enable a syslet to do almost anything that user-space could do too, as simply as possible. Syslets could even allocate user-space memory and then use it (i dont think we actually want to do that though). 
That doesnt mean arbitrary complex code /should/ be done via syslets, or that it wont be significantly slower than what user-space can do, but i'd not like to artificially dumb the engine down. I'm totally willing to simplify/shrink the vectoring of arguments and just about anything else, but your proposals so far (such as your return-value-embedded-in-atom suggestion) all kill important aspects of the engine. All the existing syslet features were purpose-driven: i actually coded up a sample syslet, trying to do something that makes sense, and added these features based on that. The engine core takes up maybe 50 lines of code. Ingo
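The "layout-compatible" point above can be illustrated with a small sketch. This is not the real struct syslet_uatom; the field names below are hypothetical. The idea is only that if every user pointer is stored in a fixed 64-bit slot, a 32-bit and a 64-bit build agree on size and offsets, at the cost of the 32-bit structure being as large as the 64-bit one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical atom: NOT the real struct syslet_uatom. */
struct demo_uatom {
    uint64_t flags;      /* plain value */
    uint64_t nr;         /* syscall number */
    uint64_t ret_ptr;    /* user pointer, stored as 64 bits on every ABI */
    uint64_t next;       /* user pointer to the next atom */
    uint64_t arg_ptr[6]; /* user pointers to the arguments */
};

/* Store a pointer in a fixed-width slot. */
static inline uint64_t demo_ptr_to_u64(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}

/* Recover a pointer from a fixed-width slot. */
static inline void *demo_u64_to_ptr(uint64_t v)
{
    assert(v <= UINTPTR_MAX); /* a 32-bit task must not see a wider value */
    return (void *)(uintptr_t)v;
}
```

Because every member is a uint64_t, the structure is 80 bytes with "next" at offset 24 whether it is compiled for a 32-bit or a 64-bit target, so one decoder can serve both.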
Re: fix implicit declaration in nv_backlight.
On Wed, Feb 28, 2007 at 10:13:24PM +0100, Michael Hanselmann wrote: > On Wed, Feb 28, 2007 at 12:36:25PM -0500, Dave Jones wrote: > > +#ifdef __powerpc__ > > Is __powerpc__ defined when cross compiling? I'd rather use > CONFIG_PMAC_BACKLIGHT instead of it. Sounds ok to me. Dave -- http://www.codemonkey.org.uk
Re: debug registers and fork
It is true that debug registers are inherited by fork and clone. I am 99% sure that this was never specifically intended, but it has been this way for a long time (since 2.4 at least). It's an implicit consequence of the do_fork implementation style, which does a blind copy of the whole task_struct and then explicitly reinitializes some individual fields. I suppose this has some benefit or other, but it is very prone to new pieces of state getting implicitly copied without the person adding that new state ever consciously deciding what its inheritance semantics should be. Alan Stern is working on a revamp of the x86 debug register support. This is a fine opportunity to clean this area up and decide positively what the semantics ought to be. When his stuff gets ported to other machines, that will be a natural way to make the analogous stuff coherent and sensible on all machines that have debug-feature CPU state. AFAIK, gdb expects this behavior but not in the positive sense. Rather, it finds the kernel's semantics here unhelpful, and has to work around them. If it has watchpoints on a thread that might fork, it has to catch the child just to clear the debug registers even if it never really wanted to be tracing that child. Otherwise, the fork/clone child that was never ptrace'd at all (and its children!) might get a spurious SIGTRAP later and dump core for no apparent reason; at least exec does clear the debug registers (flush_thread). Since the debugger interface is the only way to set the debug registers, this kernel behavior seems rather insane on the face of it. OTOH, there is always the argument to leave existing behavior as it is for compatibility's sake. (I won't be shocked to find some loony application that uses ptrace on its own threads to set debug registers with the expectation of running a SIGTRAP handler; such things have been seen out there, though we no longer allow exactly that with NPTL threads.) 
I'm pretty sure gdb won't mind if the inheritance goes away, though we should check with gdb people to be sure before changing any semantics. Personally, I don't care whether the semantics of fork when the debug registers were previously set by ptrace change. Existing applications already have to cope with the lossage to work now, and won't be able to go without those workarounds later anyway if they want to support older kernels. With Alan's stuff, particular facilities cooperate coherently on maintaining this thread state, and inheritance semantics for each particular use will be specified explicitly how that use wants it. Eventually I think all "raw" use of the debug registers (as by the current ptrace interfaces) will be obsolete anyway. It is true that %dr7 is not cleared when switching to a task where it's logically 0, but that is intentional and not a problem AFAIK. The trap handler (arch/{i386,x86_64}/kernel/traps.c:do_debug) first checks if %dr7 is logically 0 in the current task, and if so it swallows the trap and clears %dr7 in hardware. This also has been this way for a very long time. I assume that whenever it was first implemented, someone found reason to think that clearing %dr7 was more costly overall than the possibility of a spurious trap (relatively quite unlikely compared to 100% of context switches). (I have no idea what the overhead is on current or older hardware.) I have no reason to think there is anything wrong with how this behaves. Thanks, Roland - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
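The "blind copy, then reinitialize selected fields" style described in this message can be modelled in a few lines of userspace C. Everything here is a hypothetical stand-in (struct demo_task is not task_struct, and the helpers are not do_fork); it only shows how state that nobody explicitly resets ends up inherited, and what an explicit decision would look like.

```c
#include <assert.h>
#include <string.h>

#define DEMO_NR_DEBUGREGS 8

/* Hypothetical stand-in for the per-task state under discussion. */
struct demo_task {
    int pid;
    unsigned long debugreg[DEMO_NR_DEBUGREGS];
};

/* Blind copy: anything not reset below is inherited implicitly. */
static void demo_fork_blind(struct demo_task *child,
                            const struct demo_task *parent, int new_pid)
{
    assert(child != parent);
    *child = *parent;     /* copies debugreg[] along with everything else */
    child->pid = new_pid; /* only the fields someone thought of are reset */
}

/* Explicit policy: the child starts with no debug registers set. */
static void demo_fork_explicit(struct demo_task *child,
                               const struct demo_task *parent, int new_pid)
{
    demo_fork_blind(child, parent, new_pid);
    memset(child->debugreg, 0, sizeof(child->debugreg));
}
```

With the blind variant a watchpoint set in the parent's slot 7 shows up in the child as well; with the explicit variant the child starts clean and the parent is unaffected.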
Re: 2.6.21-rc1: known regressions (v2) (part 1)
On Wed, 2007-02-28 at 23:13 +0200, Michael S. Tsirkin wrote: > >Subject: ThinkPad T60: no screen after suspend to RAM > >References : http://lkml.org/lkml/2007/2/22/391 > >Submitter : Michael S. Tsirkin <[EMAIL PROTECTED]> > >Handled-By : Ingo Molnar <[EMAIL PROTECTED]> > >Status : unknown > > Just reproduced this in -rc2. > Another thing I noticed: > with 2.6.20, pressing Fn/F4 generates an ACPI event and triggers suspend to > RAM. > > On 2.6.21-rc2, after resume (when the box is accessible from network), > pressing Fn/F4 again does not seem to have any effect. Can you please get the dmesg output after resume via the network ? tglx
Re: [PATCH] pata_sil680 suspend/resume
On Mon, 26 Feb 2007, Guennadi Liakhovetski wrote: > With a post 2.6.20 kernel from powerpc.git I cannot suspend at all: > > pata_sil680 :00:0c.0: suspend > ata1: suspend failed, device 0 still active > pci_device_suspend(): ata_pci_device_suspend+0x0/0x74() returns -16 > suspend_device(): pci_device_suspend+0x0/0xac() returns -16 > Could not suspend device :00:0c.0: error -16

AFAICS, "still active" is printed from ata_host_suspend() if a device (disk) on the host to be suspended doesn't have the ATA_DFLAG_SUSPENDED flag set. This flag is only set in ata_eh_suspend(), which is only called from ata_eh_recover(), like this:

generic_error_handler()
  ata_bmdma_drive_eh()
    ata_do_eh()
      ata_eh_recover()
        ata_eh_suspend()
          dev->flags |= ATA_DFLAG_SUSPENDED;

but I don't understand why the error handler should be invoked. Should the "disk" be suspended before the host, and is that when the EH should set the flag? If my guess is right, why doesn't the disk get suspended on my machine? Shall I suspend it explicitly from userspace? I do "hdparm -Y", and it does stop spinning, but I still get the error.

Thanks
Guennadi
---
Guennadi Liakhovetski
Re: Problem with freezable workqueues
Hi! > > OK, thanks. > > > > We can (I think) do pretty much the same with some additional complications > > in worker_thread() (check !cpu_online() after try_to_freeze() and break). > > Okay, but I've just finished the patch that removes the freezability of > workqueues (appended), so can we please do this in a separate one? Hmm, nothing obviously wrong with the patch (ACK), but xfs people should ack this one, too: 'is it okay to let xfs run while suspending' is not a trivial question. > Since freezable workqueues are broken in 2.6.21-rc > (cf. http://marc.theaimsgroup.com/?l=linux-kernel=116855740612755, > http://marc.theaimsgroup.com/?l=linux-kernel=117261312523921=2) > it's better to remove them altogether for 2.6.21 and change the only user of > them (XFS) accordingly. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: fix implicit declaration in nv_backlight.
On Wed, Feb 28, 2007 at 12:36:25PM -0500, Dave Jones wrote: > +#ifdef __powerpc__ Is __powerpc__ defined when cross compiling? I'd rather use CONFIG_PMAC_BACKLIGHT instead of it. Greets, Michael
Re: 2.6.21-rc1: known regressions (v2) (part 1)
>Subject: ThinkPad T60: no screen after suspend to RAM >References : http://lkml.org/lkml/2007/2/22/391 >Submitter : Michael S. Tsirkin <[EMAIL PROTECTED]> >Handled-By : Ingo Molnar <[EMAIL PROTECTED]> >Status : unknown Just reproduced this in -rc2. Another thing I noticed: with 2.6.20, pressing Fn/F4 generates an ACPI event and triggers suspend to RAM. On 2.6.21-rc2, after resume (when the box is accessible from network), pressing Fn/F4 again does not seem to have any effect. -- MST
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Wed, 28 Feb 2007, Ingo Molnar wrote: > > * Davide Libenzi wrote: > > > My point is, the syslet infrastructure is expensive for the kernel in > > terms of compat, [...] > > it is not. Today i've implemented 64-bit syslets on x86_64 and > 32-bit-on-64-bit compat syslets. Both the 64-bit and the 32-bit syslet > (and threadlet) binaries work just fine on a 64-bit kernel, and they > share 99% of the infrastructure. There's only a single #ifdef > CONFIG_COMPAT in kernel/async.c: > > #ifdef CONFIG_COMPAT > > asmlinkage struct syslet_uatom __user * > compat_sys_async_exec(struct syslet_uatom __user *uatom, > struct async_head_user __user *ahu) > { > return __sys_async_exec(uatom, ahu, _sys_call_table, > compat_NR_syscalls); > } > > #endif Did you hide all the complexity of the userspace atom decoding inside another function? :) How much code would go away, in case we pick a simple/parallel sys_async_exec engine? Atoms decoding, special userspace variable access for loops, jumps/cond/... VM engine. > Even mixed-mode syslets should work (although i havent specifically > tested them), where the head switches between 64-bit and 32-bit mode and > submits syslets from both 64-bit and from 32-bit mode, and at the same > time there might be both 64-bit and 32-bit syslets 'in flight'. > > But i'm happy to change the syslet API in any sane way, and did so based > on feedback from Jens who is actually using them. Wouldn't you agree on a simple/parallel execution engine like me and Linus are proposing (and threadlets, of course)? - Davide - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 01/22] update ctime and mtime for mmaped write
Miklos Szeredi wrote: > What happens if the application overwrites what it had written some > time later? Nothing. The page is already read-write, the pte dirty, > so even though the file was clearly modified, there's absolutely no > way in which this can be used to force an update to the timestamp. > > Which, I realize now, actually means, that the patch is wrong. Msync > will have to write protect the page table entries, so that later > dirtyings may have an effect on the timestamp. I thought that PeterZ's changes were to write-protect the page after cleaning it so that future modifications could be detected and tracked accordingly? Does the right thing not happen already? Thanx... ps
Re: struct page field arrangement
On Wed, 28 Feb 2007, Jan Beulich wrote: > A change early last year reordered struct page so that ptl overlaps not only > private, but also mapping. Since spinlock_t can be much larger, I'm wondering > whether there's a reason to not also overlay the space index and lru take - > are these used for anything on page table pages? Overlaying lru is a problem for those architectures which use kmem_cache_alloc for their pagetables: arm26, powerpc, sparc64 and perhaps others (I just grepped quickly through include/asm*, didn't follow up those who have extern functions): since slab reuses the lru fields for its own purposes. Could perhaps be stacked somehow. Overlaying index is fairly straightforward: the index field is fair game. In my original patches I did overlay index, but Andrew was strongly averse to the way I was doing it, and scaled things back, to private alone if I remember rightly, then relaxed a little to include mapping too. Way back then I made up a patch to overlay index too (when I saw Fedora going out with CONFIG_DEBUG_SPINLOCK), but I could never get it into a form where I felt it would satisfy Andrew; and grew increasingly dissatisfied with that approach myself. I don't think further overlaying is the right answer really. But I do think it's a scandal that the size of struct page (in a DEBUG_SPINLOCK system) is governed by such a minority use of the struct page. Lacking a satisfying answer, I've just let it drift on until someone notices and complains. kmalloc a separate spinlock structure when it's too big to fit in? Not such a good idea, since then there will tend to be false sharing of cachelines between them: simpler just to disable SPLIT_PTLOCK in that case. I'm not happy with the status quo, but I don't know the right answer: perhaps allow pagetable pages to use an undebugged spinlock variant? 
Hugh
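The size problem discussed in this thread can be shown with a toy model. The structures below are hypothetical and much smaller than the real struct page and spinlock_t; they only illustrate that a lock overlaid on "private" and "mapping" costs nothing while it fits in the union, and grows every page descriptor once a debug variant of the lock outgrows the fields it overlays.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical locks: a small one, and a "debug" one with extra bookkeeping. */
typedef struct { unsigned int slock; } demo_small_lock_t;
typedef struct {
    unsigned int slock;
    unsigned int magic;
    unsigned int owner_cpu;
    void *owner;
    void *extra[2];
} demo_debug_lock_t;

/* Toy page descriptor: the lock shares storage with private + mapping. */
struct demo_page_small {
    unsigned long flags;
    union {
        struct { unsigned long private; void *mapping; } s;
        demo_small_lock_t ptl;
    } u;
    unsigned long index;
};

/* Same layout, but with the debug lock in the union. */
struct demo_page_debug {
    unsigned long flags;
    union {
        struct { unsigned long private; void *mapping; } s;
        demo_debug_lock_t ptl;
    } u;
    unsigned long index;
};

/* Sanity check: in both variants the lock starts where "private" starts. */
static void demo_check_overlay(void)
{
    assert(offsetof(struct demo_page_small, u.ptl) ==
           offsetof(struct demo_page_small, u.s.private));
    assert(offsetof(struct demo_page_debug, u.ptl) ==
           offsetof(struct demo_page_debug, u.s.private));
}
```

In the small variant the union is exactly as large as the two fields it overlays. In the debug variant the union, and therefore every descriptor, is larger, even though only page table pages ever use the lock.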
[PATCH] worker_thread: don't play with signals
worker_thread() doesn't need to "Block and flush all signals", this was already done by its caller, kthread().

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

--- 6.20-rc6-mm3/kernel/workqueue.c~signals 2007-02-20 02:21:11.0 +0300
+++ 6.20-rc6-mm3/kernel/workqueue.c 2007-02-28 23:58:11.0 +0300
@@ -290,18 +290,11 @@ static int worker_thread(void *__cwq)
 	struct cpu_workqueue_struct *cwq = __cwq;
 	DEFINE_WAIT(wait);
 	struct k_sigaction sa;
-	sigset_t blocked;
 
 	if (!cwq->wq->freezeable)
 		current->flags |= PF_NOFREEZE;
 
 	set_user_nice(current, -5);
-
-	/* Block and flush all signals */
-	sigfillset(&blocked);
-	sigprocmask(SIG_BLOCK, &blocked, NULL);
-	flush_signals(current);
-
 	/*
 	 * We inherited MPOL_INTERLEAVE from the booting kernel.
 	 * Set MPOL_DEFAULT to insure node local allocations.
Re: [patch 04/26] Xen-paravirt_ops: Add pagetable accessors to pack and unpack pagetable entries
On Wed, 2007-02-28 at 09:32 +0100, Ingo Molnar wrote: > * Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote: > > > >> +#ifdef CONFIG_PARAVIRT > > >> +/* After pte_t, etc, have been defined */ > > >> +#include > > >> +#endif > > >> > > > > > > hm - there's already a CONFIG_PARAVIRT conditional in > > > asm-i386/paravirt.h. > > > > Yes, but it happens after asm/paravirt.h has already included some > > things, and it ends up causing problems. paravirt.h still defines > > various stub functions in the !CONFIG_PARAVIRT case, so it needs to do > > the includes either way. > > hm, it then needs to be fixed first, instead of adding to the mess. Yes, originally paravirt.h didn't define anything if !CONFIG_PARAVIRT for this reason: getting it tied into the other headers correctly is a PITA. Rusty.
Re: [patch 01/22] update ctime and mtime for mmaped write
> What happens if the application overwrites what it had written some > time later? Nothing. The page is already read-write, the pte dirty, > so even though the file was clearly modified, there's absolutely no > way in which this can be used to force an update to the timestamp. Which, I realize now, actually means, that the patch is wrong. Msync will have to write protect the page table entries, so that later dirtyings may have an effect on the timestamp. Thanks, Miklos
Re: lanana: Add major/minor entries for PPC QE UART devices
On Feb 28, 2007, at 2:43 PM, Timur Tabi wrote: > What about major number 205? It also has the screwed-up /dev/ttyCPM > entries, but it has more room, and the CPM driver doesn't actually use it. > At least, I can't see where it uses it. Please, let's just leave the four we have and let the driver just allocate increasing minor numbers. If anyone has a product with more than 4 UARTs, they will have to figure out what to do with the additional minors. We are making a very complicated problem out of nothing. This hasn't caused any problems in the past, and changing the existing names and minors will cause problems for everyone today. Just leave it alone, fix up the documentation, and have the driver print some warning if it allocates more than 4 UARTs. Thanks. -- Dan
Re: [PATCH 5/5] jffs2: Allow selection of compression mode via a sysfs attribute
On Wed, 2007-02-28 at 21:39 +0200, Artem Bityutskiy wrote: > On Wed, 2007-02-28 at 19:13 +, Richard Purdie wrote: > > +/* gives us jffs2_subsys */ > > +static decl_subsys(jffs2, NULL, NULL); > > There is actually a file-system subsys - look up for fs_subsys. It is > declared at fs/namespace.c. Further down the patch you'll see: + kset_set_kset_s(&jffs2_subsys, fs_subsys); There was a reason for doing that instead of using fs_subsys in the above, although I can't remember why offhand. I did try it and it didn't work as expected... Regards, Richard
Resume from S2R fails after dpm_resume()
Gentlemen, I instrumented 2.6.21-rc1 base/power/resume.c device_resume() with TRACE_RESUME(0) as the last statement in the function. Sure enough it was the last hash value in the RTC after a hard reboot when resume failed: [ 12.028820] hash matches drivers/base/power/resume.c:104 The machine appears to be absolutely wedged after initiating resume by pressing the power button. The disk flashes for a half second or so, then that's it. It is a Dell XPS, BIOS rev A04. I'm using 'echo 1 > /sys/power/pm_trace; echo mem > /sys/power/state' to initiate the S2R sequence. Any suggestions on where to go from here? rtg -- Tim Gardner [EMAIL PROTECTED] www.tpi.com OR 503-601-0234 x102 MT 406-443-5357
Fix mv643xx_eth compilation.
Commit 908b637fe793165b6aecdc875cdca67c4959a1ad removed ETH_DMA_ALIGN but missed a usage of it in a macro, which broke the build.

Signed-off-by: Dave Jones <[EMAIL PROTECTED]>

diff --git a/drivers/net/mv643xx_eth.h b/drivers/net/mv643xx_eth.h
index 7cb0a41..7d4e90c 100644
--- a/drivers/net/mv643xx_eth.h
+++ b/drivers/net/mv643xx_eth.h
@@ -9,6 +9,8 @@
 #include
 
+#include
+
 /* Checksum offload for Tx works for most packets, but
  * fails if previous packet sent did not use hw csum
  */
@@ -47,7 +49,7 @@
 #define ETH_HW_IP_ALIGN 2 /* hw aligns IP header */
 #define ETH_WRAPPER_LEN (ETH_HW_IP_ALIGN + ETH_HLEN + \
 				ETH_VLAN_HLEN + ETH_FCS_LEN)
-#define ETH_RX_SKB_SIZE (dev->mtu + ETH_WRAPPER_LEN + ETH_DMA_ALIGN)
+#define ETH_RX_SKB_SIZE (dev->mtu + ETH_WRAPPER_LEN + dma_get_cache_alignment())
 
 #define ETH_RX_QUEUES_ENABLED (1 << 0) /* use only Q0 for receive */
 #define ETH_TX_QUEUES_ENABLED (1 << 0) /* use only Q0 for transmit */

--
http://www.codemonkey.org.uk
Re: Wanted: simple, safe x86 stack overflow detection
On Wed, Feb 28, 2007 at 09:27:09AM -0500, Chuck Ebbert wrote: > Can we just put a canary in the threadinfo and check it on every > task switch? What are the drawbacks? Likely already too late then -- if critical state is overwritten you crashed before. Also a lot of stack intensive code has relatively large unused holes, so an overflow might miss the canary completely. Anyways, if you want a crash on context switch in the non-hole case you can probably get it by just rearranging thread_info a bit, e.g. put preempt_count first. Any corruption of that will lead to schedule complaining. Don't think it is worth it though. I suppose one could have a CONFIG_DEBUG_STACK_OVERFLOW that gets the stacks from vmalloc, which would catch any overflow with its guard pages. For this you would need to change __pa() to handle that too, because there might still be some drivers that do DMA on stack addresses. Would be somewhat ugly but doable. But again I have my doubts it is worth it -- in my experience static analysis works well enough to track them down and there are not that many anyways. -Andi
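The "holes" objection is easy to demonstrate with a userspace model. The layout below is hypothetical (it is not the kernel's thread_info or stack layout): a canary sits between the stack and the memory beyond it. A contiguous overrun tramples the canary and is detected; a write that lands past the canary, as with a large mostly-untouched stack frame, corrupts memory without ever touching it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_CANARY 0x5ACCA9A7UL /* arbitrary illustrative value */

/* Hypothetical layout; the stack grows towards lower addresses. */
struct demo_region {
    unsigned char below[64];  /* memory beyond the stack: the overflow victim */
    unsigned long canary;     /* where a thread_info canary would sit */
    unsigned char stack[256]; /* the stack proper */
};

static void demo_region_init(struct demo_region *r)
{
    memset(r, 0, sizeof(*r));
    r->canary = DEMO_CANARY;
}

/* The check that would run on every task switch. */
static int demo_canary_intact(const struct demo_region *r)
{
    return r->canary == DEMO_CANARY;
}

/* Write len bytes of 0xAA starting at byte offset off within the region. */
static void demo_scribble(struct demo_region *r, size_t off, size_t len)
{
    assert(off <= sizeof(*r) && len <= sizeof(*r) - off);
    memset((unsigned char *)r + off, 0xAA, len);
}
```

A scribble covering offsets 32 to 95 runs across the canary and trips the check; a scribble over offsets 0 to 15 damages "below" while the canary still looks fine.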
Re: [linux-usb-devel] usbfs2: Why asynchronous I/O?
On Monday 26 February 2007 12:54 am, Sarah Bailey wrote: > On Sun, Feb 25, 2007 at 08:53:03AM -0800, David Brownell wrote: > > On Sunday 25 February 2007 12:57 am, Sarah Bailey wrote: > > > I haven't seen any evidence that the kernel-side aio is substantially > > > more efficient than the GNU libc implementation, > > > > Face it: spawning a new thread is fundamentally not as lightweight > > as just submitting an aiocb plus an URB. And spawning ten threads > > costs a *LOT* more than submitting ten aiocbs and URBs. (Count just > > the 4KB stacks associated with each thread, vs memory consumed by > > using AIO ... and remember to count the scheduling overheads.) > > Yes, spawning a new thread is costly. However, if someone writes their > own thread-based program and allocates the threads from a pool, that > argument is irrelevant. I don't see how that would follow from that assumption. But even if it did, the assumption isn't necessarily valid. People who can write threaded programs are the minority; people who can write correct ones are even more rare! We all hope that changes. It's been hoped for at least a decade now. Maybe in another decade or two, such skills can safely be assumed. > Even with fibrils, you have a stack and > scheduling overhead. With kernel AIO, you have also have some memory > overhead, and you also have context switch overhead when you call > kick_iocb or aio_complete. > > Can someone point me at hard evidence one way or another? (stack_size + other_thread_costs + urb_size) > (aoicb_size + urb_size) There was recent discussion on another LKML thread pointing out how an event-driven server ran at basically 100% of hardware capacity, where a thread-one ran at 60%. (That was, as I infer by skimming archives of that threadlet discussion, intended to be a fair comparison...) > > > so it seems like it would be better to leave the complexity in > > > userspace. > > > > Thing is, the kernel *already* has URBs. 
And the concept behind them > > maps one-to-one onto AIOCBs. All the kernel needs to expose is: > > mechanisms to submit (and probably cancel) async requests, then collect > > the responses when they're done. > > It seems to me that you're arguing that URBs and AIOCBs go together on > the basis that they are both asynchronous and both have some sort of > completion function. Just because two things are alike doesn't mean > that it's better to use them together. I pointed out that any other approach must accordingly add overhead. One of the basic rules of thumb in system design is to avoid such needless additions. > > You're right that associating a thread with an URB is complexity. > > That's not what I said. No ... but you *were* avoiding that consequence of what you did say, though. > > I can't much help application writers that don't bother to read the > > relevant documentation (after it's been pointed out to them). > > Where is this documentation? There's a man page on io_submit, etc., but > how would an application writer know to look for it? How did *you* know to look for it? How did *I* know to look for it? ISTR asking Google, and finding that "libaio" is how to get access to the Linux kernel AIO facility. Very quickly. I didn't even need to make the mistake of trying to use POSIX calls then finding they don't work ... > > The gap between POSIX AIO and kernel AIO has been an ongoing problem. This > > syslet/fibril/yadda-yadda stuff is just the latest spin. > > Do you think that fibrils will replace the kernel AIO? Still under discussion, but I hope not. But remember two different things are being called AIO -- while in my book, only one of them is really AIO. - The AIO VFS interface ... which is mostly ok, though the retry stuff is weird as well as misnamed, and the POSIX hookery should also be improved. (Those POSIX APIs omit key functionality, like collecting multiple results with one call, and are technically inferior. 
Usually that's so that vendors can claim conformance without kernel updates. It could also be that the functionality is "optional", and so not part of what I find in my system's libc.) - Filesystem hookery and direct-io linkage ... which has been trouble, and I suspect was never the right design. The filesystem stacks in Linux were designed around thread based synch, so trying to slide an event model around the edges was probably never a good idea. I see fibrils/threadlets/syslets/etc as a better approach to that hookery; something like EXT4 over a RAID is less likely to break if that complex code is not forced to restructure itself into an event model. But for things that are already using event models ... the current AIO is a better fit. And maybe getting all that other stuff out of the mix will finally let some of the "real I/O, not disks" AIO issues get fixed. All of the "bad" things I've heard about AIO in Linux boil down to either (a) criticisms about direct-IO and that
Re: Problem with freezable workqueues
On Wed, 2007-02-28 at 12:14 +1100, Nigel Cunningham wrote: > Controversy is no reason to give in! Nevertheless, I think you're right > - I believe the XFS guys said they fixed the issue that had caused I/O > to be submitted post-freeze. Well, we'll see if it appears again, won't > we? I get to be the guinea pig, right? :P Unfortunately I was sick for the better part of the past few days and can only test all this stuff early next week. johannes
Re: [patch 01/22] update ctime and mtime for mmaped write
> >> While these entry points do not actually modify the file itself, > >> as was pointed out, they are handy points at which the kernel gains > >> control and could actually notice that the contents of the file are > >> no longer the same as they were, ie. modified. > >> > >> From the operating system viewpoint, this is where the semantics of > >> modification to file contents via mmap differs from the semantics of > >> modification to file contents via write(2). > >> > >> It is desirable for the file times to be updated as quickly as > >> possible after the actual modification has occurred. > >> > > > > I disagree. > > > > You don't worry about the timestamp being updated _during_ a large > > write() call, even though the file is constantly being modified. > > > > > > No, but you do worry about the timestamps being updated after > every write() call, no matter how large or small. Right. All I'm saying is that just writing to a shared mapping without calling msync() is similar to a write() which hasn't yet finished. In both cases, you have a modified file, without a modified timestamp. > > You think of write() as something instantaneous, while you think of > > writing to a shared mapping, then doing msync() as something taking a > > long time. In actual fact both of these are basically equivalent > > operations, the differences being, that you can easily modify > > non-contiguous parts of a file with mmap, while you can't do that with > > write. The disadvantage from mmap comes from the cost of setting up > > the page tables and handling the faults. > > > > Think of it this way: > > > > shared mmap write + msync(MS_ASYNC) == write() > > msync(MS_ASYNC) + fsync() == msync(MS_SYNC) > > > > > > I don't believe that this is a valid characterization because the > changes to the contents of the file, made through the mmap'd region, > are immediately visible to any and all other applications accessing > the file. 
Since the contents of the file are changing, then so > should the timestamps to reflect this. Same case with a large write(). Nothing prevents you from reading a file, while a huge write is taking place to it, and yet, the modification time isn't updated. > I think that we are going to have to agree to disagree because > I don't agree either with your characterizations of the desirable > semantics associated with shared mmap or that maintaining the > correctness in the system is a waste of CPU. I didn't quite say _that_ in so many words :). I said that updating the timestamp on a per-page first dirtying base, or per-inode first dirtying base is a waste of effort. Why? What happens if the application overwrites what it had written some time later? Nothing. The page is already read-write, the pte dirty, so even though the file was clearly modified, there's absolutely no way in which this can be used to force an update to the timestamp. Is there anything special about the _first_ modification? I don't think so. From an external application's point of view it doesn't matter one whit, whether a modification was through write() or after a page-fault, or on an already present read-write page. So what exactly _are_ the semantics we are trying to achieve? > I view mmap as a way for an application to treat the contents of a > file as another segment in its address space. This allows it to > manipulate the contents of a file without incurring the overhead of > the read and write system calls and the double buffering that > naturally occurs with those system calls. I think that: > > char *p = mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); > *p = 1; > *(p + 4096) = 2; > > should have the same effect as: > > char c = 1; > pwrite(fd, , 1, 0); > c = 2; > pwrite(fd, , 1, 4096); Not necessarily. 
This is the equivalent _portable_ call sequence:

  char *p = mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
  *p = 1;
  *(p + 4096) = 2;
  msync(p, 4097, MS_ASYNC);

Yes, on linux the prior would work too, but there's really no point in
allowing applications to be lax and not do it properly.

But we've been over this.

Miklos
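For anyone following along, here is a minimal userspace sketch of the two
call sequences compared above. The helper names write_via_pwrite() and
write_via_mmap() are made up for illustration, and a 4096-byte page size is
assumed as in the thread. The sketch only shows that both sequences leave the
same bytes in the file; it says nothing about when the kernel updates the
timestamps, which is the actual point of disagreement.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_LEN 8192	/* two 4096-byte pages (assumption, as in the thread) */

/* Store one byte at offsets 0 and 4096 using pwrite(2). */
static int write_via_pwrite(int fd)
{
	char c = 1;

	if (pwrite(fd, &c, 1, 0) != 1)
		return -1;
	c = 2;
	if (pwrite(fd, &c, 1, 4096) != 1)
		return -1;
	return 0;
}

/* Store the same bytes through a shared mapping, then msync(). */
static int write_via_mmap(int fd)
{
	char *p;

	/* The file must be large enough to back the whole mapping. */
	if (ftruncate(fd, DEMO_LEN) != 0)
		return -1;

	p = mmap(NULL, DEMO_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return -1;

	*p = 1;
	*(p + 4096) = 2;

	/* The portable step: tell the kernel that the range was modified. */
	if (msync(p, DEMO_LEN, MS_ASYNC) != 0) {
		munmap(p, DEMO_LEN);
		return -1;
	}
	return munmap(p, DEMO_LEN);
}
```

Calling each helper on its own freshly created file (for example one
obtained from mkstemp()) and reading offsets 0 and 4096 back with pread()
should give the values 1 and 2 in both cases.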
Re: Problem with freezable workqueues
On 02/28, Rafael J. Wysocki wrote:
>
> Okay, but I've just finished the patch that removes the freezability of
> workqueues (appended), so can we please do this in a separate one?

Please, please, no. This patch is of course correct, but it breaks _a lot_
of patches in -mm tree.

May I ask you to send just

> ===
> --- linux-2.6.21-rc2.orig/fs/xfs/linux-2.6/xfs_buf.c
> +++ linux-2.6.21-rc2/fs/xfs/linux-2.6/xfs_buf.c
> @@ -1829,11 +1829,11 @@ xfs_buf_init(void)
>  	if (!xfs_buf_zone)
>  		goto out_free_trace_buf;
>
> -	xfslogd_workqueue = create_freezeable_workqueue("xfslogd");
> +	xfslogd_workqueue = create_workqueue("xfslogd");
>  	if (!xfslogd_workqueue)
>  		goto out_free_buf_zone;
>
> -	xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad");
> +	xfsdatad_workqueue = create_workqueue("xfsdatad");
>  	if (!xfsdatad_workqueue)
>  		goto out_destroy_xfslogd_workqueue;
>

this bit? After that, we can do the "removes the freezability of workqueues"
patch against -mm tree.

Oleg.
Re: Soft lockup on shutdown in nf_ct_iterate_cleanup()
Patrick McHardy wrote:
> Thanks, the previous approach doesn't seem to work properly without
> unpleasant event cache hacks. This patch takes a simpler approach
> and keeps the unconfirmed list iteration, but makes sure to make
> forward progress.
>
> [NETFILTER]: conntrack: fix {nf,ip}_ct_iterate_cleanup endless loops
>
> Fix {nf,ip}_ct_iterate_cleanup unconfirmed list handling:

Works great: survived three reboots without lockup or warning messages.

And it's a nice simple patch, too...
Re: [RFC][PATCH 1/3] Freezer: Fix vfork problem
On 02/28, Rafael J. Wysocki wrote:
>
> Okay, I have added a comment to freezer.h. Please have a look.
>
> -extern void thaw_some_processes(int all);
> +/*
> + * The PF_FREEZER_SKIP flag should be set by a vfork parent right before it
> + * calls wait_for_completion() and reset right after it returns from this
> + * function. Next, the parent should call try_to_freeze() to freeze itself
> + * appropriately in case the child has exited before the freezing of tasks is
> + * complete. However, we don't want kernel threads to be frozen in unexpected
> + * places, so we allow them to block freeze_processes() instead or to set
> + * PF_NOFREEZE if needed and PF_FREEZER_SKIP is only set for userland vfork
> + * parents. Fortunately, in the call_usermodehelper() case the parent won't
> + * really block freeze_processes(), since call_usermodehelper() (the child)
> + * does a little before exec/exit and it can't be frozen before waking up the
> + * parent.
> + */

I think this comment is accurate and understandable, and I am not suggesting
to change it. However, please note that PF_FREEZER_SKIP can be used not only
for vfork().

For example, it seems to me we can also use freezer_...count() to solve the
problem with coredump. We can use the same "wait_for_completion_freezable"
pattern in exit_mm() and in coredump_wait(). (I do not claim this is the best
fix, though.)

Oleg.
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Davide Libenzi wrote:

> My point is, the syslet infrastructure is expensive for the kernel in
> terms of compat, [...]

it is not. Today i've implemented 64-bit syslets on x86_64 and
32-bit-on-64-bit compat syslets. Both the 64-bit and the 32-bit syslet
(and threadlet) binaries work just fine on a 64-bit kernel, and they
share 99% of the infrastructure. There's only a single #ifdef
CONFIG_COMPAT in kernel/async.c:

 #ifdef CONFIG_COMPAT

 asmlinkage struct syslet_uatom __user *
 compat_sys_async_exec(struct syslet_uatom __user *uatom,
                       struct async_head_user __user *ahu)
 {
         return __sys_async_exec(uatom, ahu, &compat_sys_call_table,
                                 compat_NR_syscalls);
 }

 #endif

Even mixed-mode syslets should work (although i haven't specifically
tested them), where the head switches between 64-bit and 32-bit mode and
submits syslets from both 64-bit and from 32-bit mode, and at the same
time there might be both 64-bit and 32-bit syslets 'in flight'.

But i'm happy to change the syslet API in any sane way, and did so based
on feedback from Jens who is actually using them.

	Ingo
Re: [PATCH] adapt page_lock_anon_vma() to PREEMPT_RCU
On Sun, 25 Feb 2007, Oleg Nesterov wrote:
> page_lock_anon_vma() uses spin_lock() to block RCU. This doesn't work with
> PREEMPT_RCU, we have to do rcu_read_lock() explicitly. Otherwise, it is
> theoretically possible that slab returns anon_vma's memory to the system
> before we do spin_unlock(&anon_vma->lock).
>
> Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

Acked-by: Hugh Dickins <[EMAIL PROTECTED]>

Thanks for doing this, and sorry for my delay.

Hugh

>
> --- WQ/mm/rmap.c~	2007-02-18 22:56:49.0 +0300
> +++ WQ/mm/rmap.c	2007-02-25 22:43:00.0 +0300
> @@ -183,7 +183,7 @@ void __init anon_vma_init(void)
>   */
>  static struct anon_vma *page_lock_anon_vma(struct page *page)
>  {
> -	struct anon_vma *anon_vma = NULL;
> +	struct anon_vma *anon_vma;
>  	unsigned long anon_mapping;
>
>  	rcu_read_lock();
> @@ -195,9 +195,16 @@ static struct anon_vma *page_lock_anon_v
>
>  	anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
>  	spin_lock(&anon_vma->lock);
> +	return anon_vma;
>  out:
>  	rcu_read_unlock();
> -	return anon_vma;
> +	return NULL;
> +}
> +
> +static void page_unlock_anon_vma(struct anon_vma *anon_vma)
> +{
> +	spin_unlock(&anon_vma->lock);
> +	rcu_read_unlock();
>  }
>
>  /*
> @@ -333,7 +340,8 @@ static int page_referenced_anon(struct p
>  		if (!mapcount)
>  			break;
>  	}
> -	spin_unlock(&anon_vma->lock);
> +
> +	page_unlock_anon_vma(anon_vma);
>  	return referenced;
>  }
>
> @@ -809,7 +817,8 @@ static int try_to_unmap_anon(struct page
>  				!page_mapped(page))
>  			break;
>  	}
> -	spin_unlock(&anon_vma->lock);
> +
> +	page_unlock_anon_vma(anon_vma);
>  	return ret;
>  }
>
[PATCH] i386: Fix usage of -mtune when X86_GENERIC=y or CONFIG_MCORE2=y
Two fixes to arch/i386/Makefile.cpu:

1) When X86_GENERIC=y is set, use -mtune=i686 if $(CC) doesn't
   support -mtune=generic. GCC 4.1.2 and earlier don't support
   -mtune=generic. When building a generic kernel for a distro
   that runs on i586 and better, it is nice to use
   -march=i586 -mtune=i686 instead of plain -march=i586.

2) Use $(call tune) instead of hardcoded -mtune when
   CONFIG_MCORE2=y. This makes it possible to have CONFIG_MCORE2=y
   when using GCC 3.3, which uses -mcpu instead of -mtune. Also
   dropped fallback to -mtune=generic and -mtune=i686, because
   -march=i686 already implies -mtune=i686.

The patch is against 2.6.20, but Makefile.cpu hasn't changed
recently.

--- linux-2.6.20/arch/i386/Makefile.cpu.orig	2007-02-04 20:44:54.0 +0200
+++ linux-2.6.20/arch/i386/Makefile.cpu	2007-02-28 21:22:47.0 +0200
@@ -4,9 +4,9 @@
 #-mtune exists since gcc 3.4
 HAS_MTUNE := $(call cc-option-yn, -mtune=i386)
 ifeq ($(HAS_MTUNE),y)
-tune = $(call cc-option,-mtune=$(1),)
+tune = $(call cc-option,-mtune=$(1),$(2))
 else
-tune = $(call cc-option,-mcpu=$(1),)
+tune = $(call cc-option,-mcpu=$(1),$(2))
 endif
 
 align := $(cc-option-align)
@@ -32,7 +32,7 @@
 cflags-$(CONFIG_MWINCHIP3D)	+= $(call cc-option,-march=winchip2,-march=i586)
 cflags-$(CONFIG_MCYRIXIII)	+= $(call cc-option,-march=c3,-march=i486) $(align)-functions=0 $(align)-jumps=0 $(align)-loops=0
 cflags-$(CONFIG_MVIAC3_2)	+= $(call cc-option,-march=c3-2,-march=i686)
-cflags-$(CONFIG_MCORE2)	+= -march=i686 $(call cc-option,-mtune=core2,$(call cc-option,-mtune=generic,-mtune=i686))
+cflags-$(CONFIG_MCORE2)	+= -march=i686 $(call tune,core2)
 
 # AMD Elan support
 cflags-$(CONFIG_X86_ELAN)	+= -march=i486
@@ -42,5 +42,5 @@
 
 # add at the end to overwrite eventual tuning options from earlier
 # cpu entries
-cflags-$(CONFIG_X86_GENERIC)	+= $(call tune,generic)
+cflags-$(CONFIG_X86_GENERIC)	+= $(call tune,generic,$(call tune,i686))

-- 
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Re: Problem with freezable workqueues
On Wednesday, 28 February 2007 21:08, Oleg Nesterov wrote:
> On 02/28, Rafael J. Wysocki wrote:
> >
> > On Wednesday, 28 February 2007 20:32, Oleg Nesterov wrote:
> > >
> > > I am sorry, I lost track of this problem. As for 2.6.21,
> > > create_freezeable_workqueue doesn't work and conflict with suspend.
> > > Why can't we remove it from XFS as you suggested before?
> >
> > Yes, we can (preparing a patch). I was just curious. :-)
>
> OK, thanks.
>
> We can (I think) do pretty much the same with some additional complications
> in worker_thread() (check !cpu_online() after try_to_freeze() and break).

Okay, but I've just finished the patch that removes the freezability of
workqueues (appended), so can we please do this in a separate one?

Rafael

---
Since freezable workqueues are broken in 2.6.21-rc
(cf. http://marc.theaimsgroup.com/?l=linux-kernel&m=116855740612755,
http://marc.theaimsgroup.com/?l=linux-kernel&m=117261312523921&w=2)
it's better to remove them altogether for 2.6.21 and change the only user of
them (XFS) accordingly.
---
 fs/xfs/linux-2.6/xfs_buf.c |    4 ++--
 include/linux/workqueue.h  |    8 +++-
 kernel/workqueue.c         |   21 +++--
 3 files changed, 12 insertions(+), 21 deletions(-)

Index: linux-2.6.21-rc2/kernel/workqueue.c
===
--- linux-2.6.21-rc2.orig/kernel/workqueue.c
+++ linux-2.6.21-rc2/kernel/workqueue.c
@@ -59,7 +59,6 @@ struct cpu_workqueue_struct {
 
 	int run_depth;		/* Detect run_workqueue() recursion depth */
 
-	int freezeable;		/* Freeze the thread during suspend */
 } ____cacheline_aligned;
 
 /*
@@ -352,8 +351,7 @@ static int worker_thread(void *__cwq)
 	struct k_sigaction sa;
 	sigset_t blocked;
 
-	if (!cwq->freezeable)
-		current->flags |= PF_NOFREEZE;
+	current->flags |= PF_NOFREEZE;
 
 	set_user_nice(current, -5);
 
@@ -376,9 +374,6 @@ static int worker_thread(void *__cwq)
 
 	set_current_state(TASK_INTERRUPTIBLE);
 	while (!kthread_should_stop()) {
-		if (cwq->freezeable)
-			try_to_freeze();
-
 		add_wait_queue(&cwq->more_work, &wait);
 		if (list_empty(&cwq->worklist))
 			schedule();
@@ -454,8 +449,8 @@ void fastcall flush_workqueue(struct wor
 }
 EXPORT_SYMBOL_GPL(flush_workqueue);
 
-static struct task_struct *create_workqueue_thread(struct workqueue_struct *wq,
-						   int cpu, int freezeable)
+static struct task_struct
+*create_workqueue_thread(struct workqueue_struct *wq, int cpu)
 {
 	struct cpu_workqueue_struct *cwq = per_cpu_ptr(wq->cpu_wq, cpu);
 	struct task_struct *p;
@@ -465,7 +460,6 @@ static struct task_struct *create_workqu
 	cwq->thread = NULL;
 	cwq->insert_sequence = 0;
 	cwq->remove_sequence = 0;
-	cwq->freezeable = freezeable;
 	INIT_LIST_HEAD(&cwq->worklist);
 	init_waitqueue_head(&cwq->more_work);
 	init_waitqueue_head(&cwq->work_done);
@@ -480,8 +474,7 @@ static struct task_struct *create_workqu
 	return p;
 }
 
-struct workqueue_struct *__create_workqueue(const char *name,
-					    int singlethread, int freezeable)
+struct workqueue_struct *__create_workqueue(const char *name, int singlethread)
 {
 	int cpu, destroy = 0;
 	struct workqueue_struct *wq;
@@ -501,7 +494,7 @@ struct workqueue_struct *__create_workqu
 	mutex_lock(&workqueue_mutex);
 	if (singlethread) {
 		INIT_LIST_HEAD(&wq->list);
-		p = create_workqueue_thread(wq, singlethread_cpu, freezeable);
+		p = create_workqueue_thread(wq, singlethread_cpu);
 		if (!p)
 			destroy = 1;
 		else
@@ -509,7 +502,7 @@ struct workqueue_struct *__create_workqu
 	} else {
 		list_add(&wq->list, &workqueues);
 		for_each_online_cpu(cpu) {
-			p = create_workqueue_thread(wq, cpu, freezeable);
+			p = create_workqueue_thread(wq, cpu);
 			if (p) {
 				kthread_bind(p, cpu);
 				wake_up_process(p);
@@ -760,7 +753,7 @@ static int __devinit workqueue_cpu_callb
 		mutex_lock(&workqueue_mutex);
 		/* Create a new workqueue thread for it. */
 		list_for_each_entry(wq, &workqueues, list) {
-			if (!create_workqueue_thread(wq, hotcpu, 0)) {
+			if (!create_workqueue_thread(wq, hotcpu)) {
 				printk("workqueue for %i failed\n", hotcpu);
 				return NOTIFY_BAD;
 			}
Index: linux-2.6.21-rc2/include/linux/workqueue.h
===
--- linux-2.6.21-rc2.orig/include/linux/workqueue.h
+++ linux-2.6.21-rc2/include/linux/workqueue.h
@@
Kernel Oops with shm namespace cleanups
Hey. While testing 2.6.21-rc2 with libhugetlbfs, the shm-fork test case causes the kernel to oops. To reproduce: Execute 'make check' in the latest libhugetlbfs source on a 2.6.21-rc2 kernel with 100 huge pages allocated. Using fewer huge pages will likely also trigger the oops. Libhugetlbfs can be downloaded from: http://libhugetlbfs.ozlabs.org/snapshots/libhugetlbfs-dev-20070228.tar.gz I have collected the following information: bc56bba8f31bd99f350a5ebfd43d50f411b620c7 is first bad commit commit bc56bba8f31bd99f350a5ebfd43d50f411b620c7 Author: Eric W. Biederman <[EMAIL PROTECTED]> Date: Tue Feb 20 13:57:53 2007 -0800 [PATCH] shm: make sysv ipc shared memory use stacked files [ cut here ] Oops: Exception in kernel mode, sig: 5 [#1] SMP NR_CPUS=32 NUMA Modules linked in: NIP: C002EA80 LR: C00A3F70 CTR: 6400 REGS: c0077967b770 TRAP: 0700 Not tainted (2.6.20-g1df49008) MSR: 80029032 CR: 28000448 XER: TASK = c0002f6737d0[3042] 'shm-fork' THREAD: c00779678000 CPU: 1 GPR00: C0077967B9F0 C06725A0 C0002F94EC00 GPR04: 93FD1000 93FD1000 0200 93FD1000 GPR08: 0001 0001 0001 GPR12: 48000444 C058BE00 FFEE8094 GPR16: 0200 100AC5E8 100A 1008 GPR20: 93FD1000 C0077FDBD088 C0002F94EC00 GPR24: C0077FDBD088 0200 C0002F94EC00 93FD1000 GPR28: C0077967BEA0 93FD1000 C05A2F58 C0077FDBD088 NIP [C002EA80] .huge_pte_alloc+0x7c/0x1dc LR [C00A3F70] .hugetlb_fault+0x48/0x150 Call Trace: [C0077967B9F0] [C0077967BA80] 0xc0077967ba80 (unreliable) [C0077967BAA0] [C00A3F70] .hugetlb_fault+0x48/0x150 [C0077967BB50] [C0094254] .__handle_mm_fault+0xa8/0x119c [C0077967BC50] [C002A1E0] .do_page_fault+0x3a8/0x57c [C0077967BE30] [C0004AFC] handle_page_fault+0x20/0x58 Instruction dump: 7820 7fa40040 409d0010 a00302be 7889c220 480c a00302bc 78892702 7c004e30 780907e1 40820008 3961 <0b0b> e922adb8 3800 ebda0048 [ cut here ] kernel BUG at /home/aglitke/git/linux-2.6/mm/hugetlb.c:375! 
Oops: Exception in kernel mode, sig: 5 [#2] SMP NR_CPUS=32 NUMA Modules linked in: NIP: C00A3518 LR: C00A376C CTR: C006B348 REGS: c0077967ace0 TRAP: 0700 Not tainted (2.6.20-g1df49008) MSR: 80029032 CR: 42022442 XER: TASK = c0002f6737d0[3042] 'shm-fork' THREAD: c00779678000 CPU: 1 GPR00: 0018 C0077967AF60 C06725A0 C0077FDBD088 GPR04: 93FD1000 F7FD1000 C0077FFA5A83 C0077FFEF6E0 GPR08: 10013000 00FD1000 10013000 C0697EB0 GPR12: 2200 C058BE00 10013000 10013000 GPR16: 10013000 C0077967B120 GPR20: F7FD1000 C40DBDD0 C0077FDBD088 GPR24: 00EF9C340793 10013000 C0002F94EC00 C0077967AFD0 GPR28: F7FD1000 93FD1000 C05A2F58 C0002F94EC00 NIP [C00A3518] .__unmap_hugepage_range+0x68/0x264 LR [C00A376C] .unmap_hugepage_range+0x58/0xa0 Call Trace: [C0077967AF60] [0001] 0x1 (unreliable) [C0077967B020] [C00A376C] .unmap_hugepage_range+0x58/0xa0 [C0077967B0B0] [C0091464] .unmap_vmas+0x17c/0x954 [C0077967B210] [C0099488] .exit_mmap+0xa4/0x17c [C0077967B2C0] [C004CB08] .mmput+0x60/0x160 [C0077967B360] [C0052E4C] .exit_mm+0x130/0x154 [C0077967B400] [C00535D8] .do_exit+0x238/0x964 [C0077967B4C0] [C0022AC4] .die+0x150/0x154 [C0077967B550] [C0022B10] ._exception+0x48/0x138 [C0077967B660] [C0023634] .program_check_exception+0x5cc/0x5e4 [C0077967B700] [C00046F4] program_check_common+0xf4/0x100 --- Exception: 700 at .huge_pte_alloc+0x7c/0x1dc LR = .hugetlb_fault+0x48/0x150 [C0077967B9F0] [C0077967BA80] 0xc0077967ba80 (unreliable) [C0077967BAA0] [C00A3F70] .hugetlb_fault+0x48/0x150 [C0077967BB50] [C0094254] .__handle_mm_fault+0xa8/0x119c [C0077967BC50] [C002A1E0] .do_page_fault+0x3a8/0x57c [C0077967BE30] [C0004AFC] handle_page_fault+0x20/0x58 Instruction dump: fb610078 780957e3 ebe3 7c26 54001ffe 0b00 e97e8030 3921 800b 7d290036 3929 7c894838 <0b09> 800b 3921 7d290036 Fixing recursive fault but reboot is needed! BUG: soft lockup detected on CPU#0! Call Trace: [C00779AD74C0] [C000F588] .show_stack+0x68/0x1b4 (unreliable) [C00779AD7570] [C007C5E0] .softlockup_tick+0xec/0x140
Re: [patch 04/26] Xen-paravirt_ops: Add pagetable accessors to pack and unpack pagetable entries
Ingo Molnar wrote:
>> Yes, but it happens after asm/paravirt.h has already included some
>> things, and it ends up causing problems. paravirt.h still defines
>> various stub functions in the !CONFIG_PARAVIRT case, so it needs to do
>> the includes either way.
>
> hm, it then needs to be fixed first, instead of adding to the mess.

OK, I've fixed this by hoisting all the native_* implementations into
pgtable.h. In the !PARAVIRT case the normal macros directly use the
native_* functions, and in the PARAVIRT case they're used by the native
paravirt_ops.

This has the nice property of avoiding this specific problem, and also
generally removes code duplication.

    J
Re: [patch 06/26] Xen-paravirt_ops: paravirt_ops: allocate a fixmap slot
Ingo Molnar wrote:
> fair enough. Please rename it to FIX_PARAVIRT_BOOTUP - you can still
> rely on it being available later on too, but we'd like to give everyone
> the right fundamental idea about this: it's meant to be a limited,
> inflexible interface for bootstrap only.

Will do.

    J
Re: Problem with freezable workqueues
On 02/28, Rafael J. Wysocki wrote:
>
> On Wednesday, 28 February 2007 20:32, Oleg Nesterov wrote:
> >
> > I am sorry, I lost track of this problem. As for 2.6.21,
> > create_freezeable_workqueue doesn't work and conflict with suspend.
> > Why can't we remove it from XFS as you suggested before?
>
> Yes, we can (preparing a patch). I was just curious. :-)

OK, thanks.

We can (I think) do pretty much the same with some additional complications
in worker_thread() (check !cpu_online() after try_to_freeze() and break).

Oleg.
Re: [patch 01/22] update ctime and mtime for mmaped write
Miklos Szeredi wrote: While these entry points do not actually modify the file itself, as was pointed out, they are handy points at which the kernel gains control and could actually notice that the contents of the file are no longer the same as they were, ie. modified. From the operating system viewpoint, this is where the semantics of modification to file contents via mmap differs from the semantics of modification to file contents via write(2). It is desirable for the file times to be updated as quickly as possible after the actual modification has occurred. I disagree. You don't worry about the timestamp being updated _during_ a large write() call, even though the file is constantly being modified. No, but you do worry about the timestamps being updated after every write() call, no matter how large or small. You think of write() as something instantaneous, while you think of writing to a shared mapping, then doing msync() as something taking a long time. In actual fact both of these are basically equivalent operations, the differences being, that you can easily modify non-contiguous parts of a file with mmap, while you can't do that with write. The disadvantage from mmap comes from the cost of setting up the page tables and handling the faults. Think of it this way: shared mmap write + msync(MS_ASYNC) == write() msync(MS_ASYNC) + fsync() == msync(MS_SYNC) I don't believe that this is a valid characterization because the changes to the contents of the file, made through the mmap'd region, are immediately visible to any and all other applications accessing the file. Since the contents of the file are changing, then so should the timestamps to reflect this. A better design for all of this would be to update the file times and mark the inode as needing to be written out when a page fault is taken for a page which either does not exist or needs to be made writable and that page is part of an appropriate style mapping. I think this would just be a waste of CPU. 
I think that we are going to have to agree to disagree because
I don't agree either with your characterizations of the desirable
semantics associated with shared mmap or that maintaining the
correctness in the system is a waste of CPU.

I view mmap as a way for an application to treat the contents of a
file as another segment in its address space. This allows it to
manipulate the contents of a file without incurring the overhead of
the read and write system calls and the double buffering that
naturally occurs with those system calls. I think that:

    char *p = mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    *p = 1;
    *(p + 4096) = 2;

should have the same effect as:

    char c = 1;
    pwrite(fd, &c, 1, 0);
    c = 2;
    pwrite(fd, &c, 1, 4096);

Clearly, the two can't be equivalent since the operating system can
only become involved at certain times in order to update the
timestamps. That's why there are specifications about the timestamps
for things like msync. They should be as close as possible though.

However, since I seem to be the only one presenting a different
viewpoint, then I will agree to disagree and commit. I will see
if I can sell your semantics to my customer and find out if that
will satisfy them.

    Thanx...

       ps
[PATCH 1/1] - platform_kernel_launch_event is noop on generic kernel
Add a missing #define for the platform_kernel_launch_event.

Without this fix, a call to platform_kernel_launch_event() becomes
a noop on generic kernels. SN systems require this fix to successfully
kdump/kexec from certain hardware errors.

Signed-off-by: John Keller <[EMAIL PROTECTED]>
---

Index: linux-2.6/include/asm-ia64/machvec.h
===
--- linux-2.6.orig/include/asm-ia64/machvec.h	2007-02-28 08:39:45.764537727 -0600
+++ linux-2.6/include/asm-ia64/machvec.h	2007-02-28 08:40:01.254467899 -0600
@@ -168,6 +168,7 @@ extern void machvec_tlb_migrate_finish (
 # define platform_setup_msi_irq	ia64_mv.setup_msi_irq
 # define platform_teardown_msi_irq	ia64_mv.teardown_msi_irq
 # define platform_pci_fixup_bus	ia64_mv.pci_fixup_bus
+# define platform_kernel_launch_event	ia64_mv.kernel_launch_event
 # endif
 
 /* __attribute__((__aligned__(16))) is required to make size of the
Re: [PATCH 0/5] Add LZO Compression
On Wed, 2007-02-28 at 19:13 +0000, Richard Purdie wrote:
> The following patch series adds LZO compression support to the kernel
> and exposes it in a variety of places (jffs2, crypto).
>
> This is particularly useful for jffs2 where significant boot time
> speedups (~10%) and file read speed improvements (~40%) are seen when
> it's used with only a slight drop in file compression ratio.

Provided the numbers are accurate, this is very good stuff.

-- 
Best regards,
Artem Bityutskiy (Битюцкий Артём)
Re: lanana: Add major/minor entries for PPC QE UART devices
>> Another option is to use 46..49 for UARTs #0..3, and 192..195
>> for UARTs #4..7.
>>
>> Or, perhaps better, use 46..49 for #0..3, and 192..199 for #0..7,
>> handling the duplication in the driver; and deprecate the old
>> range.
>
> That sounds like more hassle than it's worth. The discontinuous
> range may be annoying, but it isn't really a huge amount of code.

Yeah. My suggestion would make it possible to get rid of that extra
code some day, though (but sure, is that worth it?)

Segher
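To give a feel for how little code the discontinuous range needs, here is a
rough sketch of the first option quoted above (46..49 for UARTs #0..3,
192..195 for UARTs #4..7). The function and macro names are hypothetical and
do not come from any driver; only the numbers are taken from the thread.

```c
#include <assert.h>

/* Minor number ranges as discussed in the thread. */
#define QE_UART_MINOR_LOW	46	/* UARTs #0..3 -> minors 46..49 */
#define QE_UART_MINOR_HIGH	192	/* UARTs #4..7 -> minors 192..195 */

/* Map a UART index (0..7) to its minor number, or -1 if out of range. */
static int qe_uart_index_to_minor(unsigned int idx)
{
	if (idx < 4)
		return QE_UART_MINOR_LOW + (int)idx;
	if (idx < 8)
		return QE_UART_MINOR_HIGH + (int)(idx - 4);
	return -1;
}

/* Inverse mapping: minor number back to UART index, or -1 if not ours. */
static int qe_uart_minor_to_index(unsigned int minor)
{
	if (minor >= QE_UART_MINOR_LOW && minor < QE_UART_MINOR_LOW + 4)
		return (int)(minor - QE_UART_MINOR_LOW);
	if (minor >= QE_UART_MINOR_HIGH && minor < QE_UART_MINOR_HIGH + 4)
		return (int)(minor - QE_UART_MINOR_HIGH) + 4;
	return -1;
}
```

With the second option (192..199 covering #0..7), both helpers would collapse
to a single addition or subtraction once the old range is dropped.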
Re: lanana: Add major/minor entries for PPC QE UART devices
Kumar Gala wrote:

>> Eh, I'm not crazy about that. That means that I have to complicate my
>> driver because someone else screwed up a long time ago.
>
> If not you someone else. The cost in the driver is small compared to
> fixing up all the distros and such. If you don't provide this change
> someone else will.

*sigh*

What about major number 205? It also has the screwed-up /dev/ttyCPM
entries, but it has more room, and the CPM driver doesn't actually use it.
At least, I can't see where it uses it.

-- 
Timur Tabi
Linux Kernel Developer @ Freescale
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Wed, 28 Feb 2007, Chris Friesen wrote:

> Davide Libenzi wrote:
>
> > struct async_syscall {
> > 	unsigned long nr_sysc;
> > 	unsigned long params[8];
> > 	long *result;
> > };
> >
> > And what would async_wait() return back? Pointers to "struct async_syscall"
> > or pointers to "result"?
>
> Either one has downsides. Pointer to struct async_syscall requires that the
> caller keep the struct around. Pointer to result requires that the caller
> always reserve a location for the result.
>
> Does the kernel care about the (possibly rare) case of callers that don't want
> to pay attention to result? If so, what about adding some kind of
> caller-specified handle to struct async_syscall, and having async_wait()
> return the handle? In the case where the caller does care about the result,
> the handle could just be the address of result.

Something like this (with async_wait() returning asynid's)?

struct async_syscall {
	long *result;
	unsigned long asynid;
	unsigned long nr_sysc;
	unsigned long params[8];
};

- Davide
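To make the handle idea concrete, here is a small userspace sketch of how a
caller might fill in asynid. Only the struct layout comes from the message
above. The helpers async_prepare() and async_result_of() are hypothetical,
nothing here calls into the kernel, and it is assumed that an unsigned long
can hold a pointer (true on Linux).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Layout as proposed in the message above. */
struct async_syscall {
	long *result;
	unsigned long asynid;
	unsigned long nr_sysc;
	unsigned long params[8];
};

/*
 * Hypothetical helper: fill in a request. If the caller cares about the
 * result, the handle is simply the address of the result slot; otherwise
 * the caller passes any cookie it likes.
 */
static void async_prepare(struct async_syscall *s, unsigned long nr,
			  long *result, unsigned long cookie)
{
	memset(s, 0, sizeof(*s));
	s->nr_sysc = nr;
	s->result = result;
	s->asynid = result ? (unsigned long)(uintptr_t)result : cookie;
}

/*
 * Hypothetical helper: turn an id handed back by a completion call into
 * the result pointer. Only meaningful for requests that were prepared
 * with a result slot.
 */
static long *async_result_of(unsigned long asynid)
{
	return (long *)(uintptr_t)asynid;
}
```

A caller that ignores results can then use small integers or any other
cookie as ids, while a caller that wants results gets the pointer back for
free, without having to keep the request struct around.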
Re: [PATCH 5/5] jffs2: Allow selection of compression mode via a sysfs attribute
Hi Richard,

On Wed, 2007-02-28 at 19:13 +0000, Richard Purdie wrote:
> +/* gives us jffs2_subsys */
> +static decl_subsys(jffs2, NULL, NULL);

There is actually a file-system subsys - look for fs_subsys. It is
declared in fs/namespace.c.

-- 
Best regards,
Artem Bityutskiy (Битюцкий Артём)
Re: Problem with freezable workqueues
On Wednesday, 28 February 2007 20:32, Oleg Nesterov wrote:
> On 02/28, Rafael J. Wysocki wrote:
> >
> > > --- workqueue.c.org	2007-02-28 18:32:48.0 +0530
> > > +++ workqueue.c	2007-02-28 18:44:23.0 +0530
> > > @@ -718,6 +718,8 @@ static void cleanup_workqueue_thread(str
> > >  		insert_wq_barrier(cwq, &barr, 1);
> > >  		cwq->should_stop = 1;
> > >  		alive = 1;
> > > +		if (frozen(cwq->thread))
> > > +			thaw(cwq->thread);
> > >  	}
> > >  	spin_unlock_irq(&cwq->lock);
> >
> > Unfortunately, the above code is mm-only. Is the analogous fix for
> > 2.6.21-rc2 viable?
>
> I am sorry, I lost track of this problem. As for 2.6.21,
> create_freezeable_workqueue doesn't work and conflict with suspend.
> Why can't we remove it from XFS as you suggested before?

Yes, we can (preparing a patch). I was just curious. :-)

Rafael
[PATCH 3/5] crypto: Add LZO compression support to the crypto interface
Add LZO1X compression support to the crypto interface, including a couple of tests. Also convert test_deflate into a more generic test_compress() and avoid duplicating the data for compression and decompression tests since this can always work both ways in the compression case. Signed-off-by: Richard Purdie <[EMAIL PROTECTED]> --- crypto/Kconfig |8 +++ crypto/Makefile |1 crypto/lzo.c| 120 crypto/tcrypt.c | 43 crypto/tcrypt.h | 75 +++ 5 files changed, 190 insertions(+), 57 deletions(-) Index: linux/crypto/Kconfig === --- linux.orig/crypto/Kconfig 2007-02-28 18:12:17.0 + +++ linux/crypto/Kconfig2007-02-28 18:12:32.0 + @@ -406,6 +406,14 @@ config CRYPTO_DEFLATE You will most probably want this if using IPSec. +config CRYPTO_LZO + tristate "LZO compression algorithm" + depends on CRYPTO + select LZO + help + Enable use of the LZO compression algorithm through the crypto + subsystem. + config CRYPTO_MICHAEL_MIC tristate "Michael MIC keyed digest algorithm" select CRYPTO_ALGAPI Index: linux/crypto/Makefile === --- linux.orig/crypto/Makefile 2007-02-28 18:12:17.0 + +++ linux/crypto/Makefile 2007-02-28 18:12:32.0 + @@ -44,6 +44,7 @@ obj-$(CONFIG_CRYPTO_TEA) += tea.o obj-$(CONFIG_CRYPTO_KHAZAD) += khazad.o obj-$(CONFIG_CRYPTO_ANUBIS) += anubis.o obj-$(CONFIG_CRYPTO_DEFLATE) += deflate.o +obj-$(CONFIG_CRYPTO_LZO) += lzo.o obj-$(CONFIG_CRYPTO_MICHAEL_MIC) += michael_mic.o obj-$(CONFIG_CRYPTO_CRC32C) += crc32c.o Index: linux/crypto/lzo.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux/crypto/lzo.c 2007-02-28 18:12:32.0 + @@ -0,0 +1,120 @@ +/* + * Cryptographic API for LZO compression. + * + * Copyright (C) 2007 Nokia Corporation. All rights reserved. + * + * Author: Richard Purdie <[EMAIL PROTECTED]> + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * version 2 as published by the Free Software Foundation. 
+ * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA + * 02110-1301 USA + * + */ + +#include +#include +#include +#include +#include +#include +#include + +struct lzo_ctx { + void *lzo_mem; +}; + +static int lzo_init(struct crypto_tfm *tfm) +{ + struct lzo_ctx *ctx = crypto_tfm_ctx(tfm); + + ctx->lzo_mem = vmalloc(LZO1X_MEM_COMPRESS); + + if (!ctx->lzo_mem) { + vfree(ctx->lzo_mem); + return -ENOMEM; + } + + return 0; +} + +static void lzo_exit(struct crypto_tfm *tfm) +{ + struct lzo_ctx *ctx = crypto_tfm_ctx(tfm); + + vfree(ctx->lzo_mem); +} + +static int lzo_compress(struct crypto_tfm *tfm, const u8 *src, + unsigned int slen, u8 *dst, unsigned int *dlen) +{ + struct lzo_ctx *ctx = crypto_tfm_ctx(tfm); + unsigned long compress_size; + int ret; + + /* Check if enough space in dst buffer for worst case expansion */ + if (*dlen < lzo1x_worst_compress(slen)) + return -EINVAL; + + ret = lzo1x_1_compress(src, slen, dst, _size, ctx->lzo_mem); + + if (ret != LZO_E_OK) + return -EINVAL; + + *dlen = compress_size; + + return 0; +} + +static int lzo_decompress(struct crypto_tfm *tfm, const u8 *src, + unsigned int slen, u8 *dst, unsigned int *dlen) +{ + int ret; + + ret = lzo1x_decompress_safe(src, slen, dst, dlen, NULL); + + if (ret != LZO_E_OK) + return -EINVAL; + + return 0; +} + +static struct crypto_alg alg = { + .cra_name = "lzo1x", + .cra_flags = CRYPTO_ALG_TYPE_COMPRESS, + .cra_ctxsize= sizeof(struct lzo_ctx), + .cra_module = THIS_MODULE, + .cra_list = LIST_HEAD_INIT(alg.cra_list), + .cra_init = lzo_init, + .cra_exit = lzo_exit, + .cra_u = { .compress = { + .coa_compress = 
lzo_compress, + .coa_decompress = lzo_decompress } } +}; + +static int __init init(void) +{ + return crypto_register_alg(&alg);
[PATCH 5/5] jffs2: Allow selection of compression mode via a sysfs attribute
Allow selection of the compression mode for jffs2 via a sysfs attribute. This establishes a sysfs presence for jffs2 through which other compression options could easily be exported too. Signed-off-by: Richard Purdie <[EMAIL PROTECTED]> --- fs/jffs2/compr.c | 131 +++ 1 file changed, 94 insertions(+), 37 deletions(-) Index: linux/fs/jffs2/compr.c === --- linux.orig/fs/jffs2/compr.c 2007-02-28 18:12:33.0 + +++ linux/fs/jffs2/compr.c 2007-02-28 18:12:33.0 + @@ -13,6 +13,7 @@ * */ +#include #include "compr.h" static DEFINE_SPINLOCK(jffs2_compressor_list_lock); @@ -298,6 +299,43 @@ int jffs2_unregister_compressor(struct j return 0; } +char *jffs2_get_compression_mode_name(void) +{ +switch (jffs2_compression_mode) { +case JFFS2_COMPR_MODE_NONE: +return "none"; +case JFFS2_COMPR_MODE_PRIORITY: +return "priority"; +case JFFS2_COMPR_MODE_SIZE: +return "size"; + case JFFS2_COMPR_MODE_FAVOURLZO: + return "favourlzo"; +} +return "unkown"; +} + +int jffs2_set_compression_mode_name(const char *name) +{ +if (!strncmp("none", name, 4)) { +jffs2_compression_mode = JFFS2_COMPR_MODE_NONE; +return 0; +} +if (!strncmp("priority", name, 8)) { +jffs2_compression_mode = JFFS2_COMPR_MODE_PRIORITY; +return 0; +} +if (!strncmp("size", name, 4)) { +jffs2_compression_mode = JFFS2_COMPR_MODE_SIZE; +return 0; +} + if (!strncmp("favourlzo", name, 9)) { + jffs2_compression_mode = JFFS2_COMPR_MODE_FAVOURLZO; + return 0; + } +return -EINVAL; +} + + #ifdef CONFIG_JFFS2_PROC #define JFFS2_STAT_BUF_SIZE 16000 @@ -347,42 +385,6 @@ char *jffs2_stats(void) return buf; } -char *jffs2_get_compression_mode_name(void) -{ -switch (jffs2_compression_mode) { -case JFFS2_COMPR_MODE_NONE: -return "none"; -case JFFS2_COMPR_MODE_PRIORITY: -return "priority"; -case JFFS2_COMPR_MODE_SIZE: -return "size"; -case JFFS2_COMPR_MODE_FAVOURLZO: -return "favourlzo"; -} -return "unkown"; -} - -int jffs2_set_compression_mode_name(const char *name) -{ -if (!strcmp("none",name)) { -jffs2_compression_mode = JFFS2_COMPR_MODE_NONE; 
-return 0; -} -if (!strcmp("priority",name)) { -jffs2_compression_mode = JFFS2_COMPR_MODE_PRIORITY; -return 0; -} -if (!strcmp("size",name)) { -jffs2_compression_mode = JFFS2_COMPR_MODE_SIZE; -return 0; -} -if (!strncmp("favourlzo", name, 9)) { -jffs2_compression_mode = JFFS2_COMPR_MODE_FAVOURLZO; -return 0; -} -return 1; -} - static int jffs2_compressor_Xable(const char *name, int disabled) { struct jffs2_compressor *this; @@ -448,8 +450,54 @@ void jffs2_free_comprbuf(unsigned char * kfree(comprbuf); } +static struct attribute jffs2_attr_mode = { + .name = "mode", + .mode = S_IRUGO | S_IWUSR, +}; + +static struct attribute *jffs2_attrs[] = { + _attr_mode, + NULL, +}; + +static ssize_t jffs2_attr_show(struct kobject *kobj, struct attribute *attr, + char *page) +{ + if (!strcmp("mode", attr->name)) + return sprintf(page, "%s\n", jffs2_get_compression_mode_name()); + return 0; +} + +static ssize_t jffs2_attr_store(struct kobject *kobj, struct attribute *attr, + const char *page, size_t count) +{ + int ret = -EINVAL; + + if (!strcmp("mode", attr->name)) { + ret = jffs2_set_compression_mode_name(page); + if (ret >= 0) + return count; + } + return ret; +} + +static struct sysfs_ops jffs2_sysfs_ops = { + .show = jffs2_attr_show, + .store = jffs2_attr_store, +}; + +static struct kobj_type jffs2_subsys_type = { + .default_attrs = jffs2_attrs, + .sysfs_ops = _sysfs_ops, +}; + +/* gives us jffs2_subsys */ +static decl_subsys(jffs2, NULL, NULL); + int __init jffs2_compressors_init(void) { + int ret; + /* Registering compressors */ #ifdef CONFIG_JFFS2_ZLIB jffs2_zlib_init(); @@ -481,12 +529,21 @@ int __init jffs2_compressors_init(void) #endif #endif #endif + /* Errors here are not fatal */ + kset_set_kset_s(_subsys, fs_subsys); + jffs2_subsys.kset.kobj.ktype = _subsys_type; + ret = subsystem_register(_subsys); + if (ret) + printk(KERN_WARNING "Error registering
[PATCH 4/5] jffs2: Add a "favourlzo" compression mode to jffs2
Add a "favourlzo" compression mode to jffs2 which tries to optimise by size but gives lzo an advantage when comparing sizes. This means the faster lzo algorithm can be preferred when there isn't much difference in compressed size (the exact threshold can be changed). Signed-off-by: Richard Purdie <[EMAIL PROTECTED]> --- fs/Kconfig |7 +++ fs/jffs2/compr.c | 51 ++- fs/jffs2/compr.h |3 +++ 3 files changed, 56 insertions(+), 5 deletions(-) Index: linux/fs/Kconfig === --- linux.orig/fs/Kconfig 2007-02-28 18:12:31.0 + +++ linux/fs/Kconfig2007-02-28 18:12:33.0 + @@ -1359,6 +1359,13 @@ config JFFS2_CMODE_SIZE Tries all compressors and chooses the one which has the smallest result. +config JFFS2_CMODE_FAVOURLZO +bool "Favour LZO" +help + Tries all compressors and chooses the one which has the smallest + result but gives some preference to LZO (which has faster + decompression) at the expense of size. + endchoice config CRAMFS Index: linux/fs/jffs2/compr.c === --- linux.orig/fs/jffs2/compr.c 2007-02-28 18:12:31.0 + +++ linux/fs/jffs2/compr.c 2007-02-28 18:13:09.0 + @@ -26,6 +26,34 @@ static int jffs2_compression_mode = JFFS /* Statistics for blocks stored without compression */ static uint32_t none_stat_compr_blocks=0,none_stat_decompr_blocks=0,none_stat_compr_size=0; + +/* + * Return 1 to use this compression + */ +static int jffs2_is_best_compression(struct jffs2_compressor *this, + struct jffs2_compressor *best, uint32_t size, uint32_t bestsize) +{ + switch (jffs2_compression_mode) { + case JFFS2_COMPR_MODE_SIZE: + if (bestsize > size) + return 1; + return 0; + case JFFS2_COMPR_MODE_FAVOURLZO: + if ((this->compr == JFFS2_COMPR_LZO) && (bestsize > size)) + return 1; + if ((best->compr != JFFS2_COMPR_LZO) && (bestsize > size)) + return 1; + if ((this->compr == JFFS2_COMPR_LZO) && (bestsize > (size * FAVOUR_LZO_PERCENT / 100))) + return 1; + if ((bestsize * FAVOUR_LZO_PERCENT / 100) > size) + return 1; + + return 0; + } + /* Shouldn't happen */ + return 0; +} + /* 
jffs2_compress: * @data: Pointer to uncompressed data * @cdata: Pointer to returned pointer to buffer for compressed data @@ -91,6 +119,7 @@ uint16_t jffs2_compress(struct jffs2_sb_ if (ret == JFFS2_COMPR_NONE) kfree(output_buf); break; case JFFS2_COMPR_MODE_SIZE: +case JFFS2_COMPR_MODE_FAVOURLZO: orig_slen = *datalen; orig_dlen = *cdatalen; spin_lock(_compressor_list_lock); @@ -99,7 +128,7 @@ uint16_t jffs2_compress(struct jffs2_sb_ if ((!this->compress)||(this->disabled)) continue; /* Allocating memory for output buffer if necessary */ -if ((this->compr_buf_sizecompr_buf)) { +if ((this->compr_buf_sizecompr_buf)) { spin_unlock(_compressor_list_lock); kfree(this->compr_buf); spin_lock(_compressor_list_lock); @@ -108,15 +137,15 @@ uint16_t jffs2_compress(struct jffs2_sb_ } if (!this->compr_buf) { spin_unlock(_compressor_list_lock); -tmp_buf = kmalloc(orig_dlen,GFP_KERNEL); +tmp_buf = kmalloc(orig_slen,GFP_KERNEL); spin_lock(_compressor_list_lock); if (!tmp_buf) { -printk(KERN_WARNING "JFFS2: No memory for compressor allocation. (%d bytes)\n",orig_dlen); +printk(KERN_WARNING "JFFS2: No memory for compressor allocation. (%d bytes)\n",orig_slen); continue; } else { this->compr_buf = tmp_buf; -this->compr_buf_size = orig_dlen; +this->compr_buf_size = orig_slen; } } this->usecount++; @@ -127,7 +156,8 @@ uint16_t jffs2_compress(struct
[PATCH 2/5] jffs2: Add LZO compression support to jffs2
Add LZO1X compression/decompression support to jffs2. LZO's interface doesn't entirely match that required by jffs2 so a buffer and memcpy is unavoidable. Signed-off-by: Richard Purdie <[EMAIL PROTECTED]> --- fs/Kconfig| 10 fs/jffs2/Makefile |1 fs/jffs2/compr.c |6 ++ fs/jffs2/compr.h |3 - fs/jffs2/compr_lzo.c | 120 ++ include/linux/jffs2.h |1 6 files changed, 140 insertions(+), 1 deletion(-) Index: linux/fs/Kconfig === --- linux.orig/fs/Kconfig 2007-02-28 18:12:17.0 + +++ linux/fs/Kconfig2007-02-28 18:13:10.0 + @@ -1310,6 +1310,16 @@ config JFFS2_ZLIB Say 'Y' if unsure. +config JFFS2_LZO + bool "JFFS2 LZO compression support" if JFFS2_COMPRESSION_OPTIONS + select LZO + depends on JFFS2_FS + default y +help + minilzo-based compression. Generally works better than Zlib. + + Say 'Y' if unsure. + config JFFS2_RTIME bool "JFFS2 RTIME compression support" if JFFS2_COMPRESSION_OPTIONS depends on JFFS2_FS Index: linux/fs/jffs2/Makefile === --- linux.orig/fs/jffs2/Makefile2007-02-28 18:12:17.0 + +++ linux/fs/jffs2/Makefile 2007-02-28 18:12:31.0 + @@ -18,4 +18,5 @@ jffs2-$(CONFIG_JFFS2_FS_POSIX_ACL)+= ac jffs2-$(CONFIG_JFFS2_RUBIN)+= compr_rubin.o jffs2-$(CONFIG_JFFS2_RTIME)+= compr_rtime.o jffs2-$(CONFIG_JFFS2_ZLIB) += compr_zlib.o +jffs2-$(CONFIG_JFFS2_LZO) += compr_lzo.o jffs2-$(CONFIG_JFFS2_SUMMARY) += summary.o Index: linux/fs/jffs2/compr.c === --- linux.orig/fs/jffs2/compr.c 2007-02-28 18:12:17.0 + +++ linux/fs/jffs2/compr.c 2007-02-28 18:13:10.0 + @@ -425,6 +425,9 @@ int __init jffs2_compressors_init(void) jffs2_rubinmips_init(); jffs2_dynrubin_init(); #endif +#ifdef CONFIG_JFFS2_LZO +jffs2_lzo_init(); +#endif /* Setting default compression mode */ #ifdef CONFIG_JFFS2_CMODE_NONE jffs2_compression_mode = JFFS2_COMPR_MODE_NONE; @@ -443,6 +446,9 @@ int __init jffs2_compressors_init(void) int jffs2_compressors_exit(void) { /* Unregistering compressors */ +#ifdef CONFIG_JFFS2_LZO +jffs2_lzo_exit(); +#endif #ifdef CONFIG_JFFS2_RUBIN jffs2_dynrubin_exit(); 
jffs2_rubinmips_exit(); Index: linux/fs/jffs2/compr_lzo.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux/fs/jffs2/compr_lzo.c 2007-02-28 18:12:31.0 + @@ -0,0 +1,120 @@ +/* + * JFFS2 LZO Compression Interface + * + * Copyright (C) 2007 Nokia Corporation. All rights reserved. + * + * Author: Richard Purdie <[EMAIL PROTECTED]> + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * version 2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA + * 02110-1301 USA + * + */ + +#include +#include +#include +#include +#include +#include +#include "compr.h" + +static void *lzo_mem; +static void *lzo_compress_buf; +static DEFINE_MUTEX(deflate_mutex); + +static void free_workspace(void) +{ + vfree(lzo_mem); + vfree(lzo_compress_buf); +} + +static int __init alloc_workspace(void) +{ + lzo_mem = vmalloc(LZO1X_MEM_COMPRESS); + lzo_compress_buf = vmalloc(lzo1x_worst_compress(PAGE_SIZE)); + + if (!lzo_mem || !lzo_compress_buf) { + printk(KERN_WARNING "Failed to allocate lzo deflate workspace\n"); + free_workspace(); + return -ENOMEM; + } + + return 0; +} + +static int jffs2_lzo_compress(unsigned char *data_in, unsigned char *cpage_out, + uint32_t *sourcelen, uint32_t *dstlen, void *model) +{ + unsigned long compress_size; + int ret; + + mutex_lock(_mutex); + ret = lzo1x_1_compress(data_in, *sourcelen, lzo_compress_buf, _size, lzo_mem); + mutex_unlock(_mutex); + + if (ret != LZO_E_OK) + return -1; + + if (compress_size > *dstlen) + return -1; + + 
memcpy(cpage_out, lzo_compress_buf, compress_size); + *dstlen = compress_size; + + return 0; +} + +static int jffs2_lzo_decompress(unsigned char
[PATCH 1/5] Add LZO compression support to the kernel
Add LZO1X compression/decompression support to the kernel. This is based on the standard userspace lzo library, particularly minilzo with the headers much trimmed down and simplified for kernel use. It's structured so that it should still diff with the userspace version for ease of future updating.

Signed-off-by: Richard Purdie <[EMAIL PROTECTED]>
---
 include/linux/lzo.h |   63 +
 lib/Kconfig         |    5
 lib/Makefile        |    1
 lib/lzo/Makefile    |    3
 lib/lzo/lzoconf.h   |  186 +
 lib/lzo/lzodefs.h   |  463 +
 lib/lzo/lzointf.c   |   37 +
 lib/lzo/minilzo.c   | 1771
 8 files changed, 2529 insertions(+)

http://folks.o-hand.com/richard/lzo/lzo_kernel.patch (since it exceeds the file size limit for LKML)

I can email inline if anyone prefers it that way.
[PATCH 0/5] Add LZO Compression
The following patch series adds LZO compression support to the kernel and exposes it in a variety of places (jffs2, crypto). This is particularly useful for jffs2, where significant boot time speedups (~10%) and file read speed improvements (~40%) are seen when it's used, with only a slight drop in file compression ratio.

It also adds a favourlzo mode to jffs2 which is similar to the existing size mode but lets lzo compression win if the lzo compressed size is "similar" to, but not, the best compression ratio. This means we can keep zlib compression where it makes a significant difference to compressed file size.

The final jffs2 patch, which starts adding sysfs support, is something I have around from testing and I'm including it for comments to see if it's desirable upstream. It could be extended further to allow greater control of jffs2 at runtime.

Richard
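The favourlzo rule described above can be sketched in plain C. This is a minimal userspace illustration that follows the four checks in the posted jffs2_is_best_compression() hunk; it is not the kernel code. The demo_* names are hypothetical, and the 80% threshold is an assumption, since the quoted hunks do not show the value of FAVOUR_LZO_PERCENT.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the jffs2 compressor ids; only the
 * LZO / not-LZO distinction matters for this sketch. */
enum demo_compr { DEMO_COMPR_ZLIB, DEMO_COMPR_LZO };

/* Assumed threshold: the series says the exact value can be changed
 * and the quoted hunks do not show it. */
#define DEMO_FAVOUR_LZO_PERCENT 80

/*
 * Return 1 if a candidate result (cand, size) should replace the best
 * result seen so far (best, bestsize), 0 otherwise.
 */
int demo_favourlzo_is_better(enum demo_compr cand, uint32_t size,
			     enum demo_compr best, uint32_t bestsize)
{
	/* An LZO candidate wins outright whenever it is strictly smaller. */
	if (cand == DEMO_COMPR_LZO && bestsize > size)
		return 1;
	/* When the current best is not LZO, a plain size comparison decides. */
	if (best != DEMO_COMPR_LZO && bestsize > size)
		return 1;
	/* An LZO candidate also wins if it is within the margin of the best. */
	if (cand == DEMO_COMPR_LZO &&
	    bestsize > (size * DEMO_FAVOUR_LZO_PERCENT / 100))
		return 1;
	/* Otherwise the candidate must beat the best by the full margin. */
	if ((bestsize * DEMO_FAVOUR_LZO_PERCENT / 100) > size)
		return 1;
	return 0;
}
```

With the assumed 80% margin, an LZO result of 1000 bytes displaces a zlib result of 900 bytes, while a zlib result has to come in under 800 bytes to displace a 1000-byte LZO result.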
Re: [RFC][PATCH 1/3] Freezer: Fix vfork problem
On Wednesday, 28 February 2007 12:00, Oleg Nesterov wrote: > On 02/28, Rafael J. Wysocki wrote: > > > > On Wednesday, 28 February 2007 02:23, Srivatsa Vaddagiri wrote: > > > On Wed, Feb 28, 2007 at 12:53:14AM +0300, Oleg Nesterov wrote: > > > > I think it is good. Srivatsa? > > > > > > Maybe additional comments on why we don't skip vfork kernel tasks may be > > > good. > > > > Which is because we don't want the kernel threads to be frozen in unexpected > > places, so we allow them to block freeze_processes() instead or to set > > PF_NOFREEZE? > > ... and because in fact it won't block freeze_processes(), > call_usermodehelper > (the child) does a minimum before exec/exit, and it can't be frozen until it > wakes > up the parent. Okay, I have added a comment to freezer.h. Please have a look. Rafael --- From: Rafael J. Wysocki <[EMAIL PROTECTED]> Currently try_to_freeze_tasks() has to wait until all of the vforked processes exit and for this reason every user can make it fail. To fix this problem we can introduce the additional process flag PF_FREEZER_SKIP to be used by tasks that do not want to be counted as freezable by the freezer and want to have TIF_FREEZE set nevertheless. Then, this flag can be set by tasks using sys_vfork() before they call wait_for_completion() and cleared after they have woken up. After clearing it, the tasks should call try_to_freeze() as soon as possible. Signed-off-by: Rafael J. 
Wysocki <[EMAIL PROTECTED]> include/linux/freezer.h | 48 ++-- include/linux/sched.h |1 + kernel/fork.c |3 +++ kernel/power/process.c | 27 --- 4 files changed, 58 insertions(+), 21 deletions(-) Index: linux-2.6.20-mm2/include/linux/sched.h === --- linux-2.6.20-mm2.orig/include/linux/sched.h +++ linux-2.6.20-mm2/include/linux/sched.h @@ -1189,6 +1189,7 @@ static inline void put_task_struct(struc #define PF_SPREAD_SLAB 0x0200 /* Spread some slab caches over cpuset */ #define PF_MEMPOLICY 0x1000 /* Non-default NUMA mempolicy */ #define PF_MUTEX_TESTER0x2000 /* Thread belongs to the rt mutex tester */ +#define PF_FREEZER_SKIP0x4000 /* Freezer should not count it as freezeable */ /* * Only the _current_ task can read/write to tsk->flags, but other Index: linux-2.6.20-mm2/include/linux/freezer.h === --- linux-2.6.20-mm2.orig/include/linux/freezer.h +++ linux-2.6.20-mm2/include/linux/freezer.h @@ -75,7 +75,49 @@ static inline int try_to_freeze(void) return 0; } -extern void thaw_some_processes(int all); +/* + * The PF_FREEZER_SKIP flag should be set by a vfork parent right before it + * calls wait_for_completion() and reset right after it returns from this + * function. Next, the parent should call try_to_freeze() to freeze itself + * appropriately in case the child has exited before the freezing of tasks is + * complete. However, we don't want kernel threads to be frozen in unexpected + * places, so we allow them to block freeze_processes() instead or to set + * PF_NOFREEZE if needed and PF_FREEZER_SKIP is only set for userland vfork + * parents. Fortunately, in the call_usermodehelper() case the parent won't + * really block freeze_processes(), since call_usermodehelper() (the child) + * does a little before exec/exit and it can't be frozen before waking up the + * parent. + */ + +/* + * If the current task is a user space one, tell the freezer not to count it as + * freezable. 
+ */ +static inline void freezer_do_not_count(void) +{ + if (current->mm) + current->flags |= PF_FREEZER_SKIP; +} + +/* + * If the current task is a user space one, tell the freezer to count it as + * freezable again and try to freeze it. + */ +static inline void freezer_count(void) +{ + if (current->mm) { + current->flags &= ~PF_FREEZER_SKIP; + try_to_freeze(); + } +} + +/* + * Check if the task should be counted as freezeable by the freezer + */ +static inline int freezer_should_skip(struct task_struct *p) +{ + return !!(p->flags & PF_FREEZER_SKIP); +} #else static inline int frozen(struct task_struct *p) { return 0; } @@ -90,5 +132,7 @@ static inline void thaw_processes(void) static inline int try_to_freeze(void) { return 0; } - +static inline void freezer_do_not_count(void) {} +static inline void freezer_count(void) {} +static inline int freezer_should_skip(struct task_struct *p) { return 0; } #endif Index: linux-2.6.20-mm2/kernel/fork.c === --- linux-2.6.20-mm2.orig/kernel/fork.c +++ linux-2.6.20-mm2/kernel/fork.c @@ -50,6 +50,7 @@ #include #include #include +#include #include #include @@ -1393,7 +1394,9 @@ long do_fork(unsigned long
Re: lanana: Add major/minor entries for PPC QE UART devices
On Feb 28, 2007, at 1:30 PM, Timur Tabi wrote:

> H. Peter Anvin wrote:
> > Kumar Gala wrote:
> > > Why don't we allocate the 2nd group of four as well, just at a new location. They'll be discontinuous, but at least we'll have support for all 8.
> >
> > Right, it means two tty driver structures, but that's not a problem.
>
> Eh, I'm not crazy about that. That means that I have to complicate my driver because someone else screwed up a long time ago.

If not you, someone else. The cost in the driver is small compared to fixing up all the distros and such. If you don't provide this change someone else will.

- k
Re: Problem with freezable workqueues
On 02/28, Rafael J. Wysocki wrote: > > > --- workqueue.c.org 2007-02-28 18:32:48.0 +0530 > > +++ workqueue.c 2007-02-28 18:44:23.0 +0530 > > @@ -718,6 +718,8 @@ static void cleanup_workqueue_thread(str > > insert_wq_barrier(cwq, , 1); > > cwq->should_stop = 1; > > alive = 1; > > + if (frozen(cwq->thread)) > > + thaw(cwq->thread); > > } > > spin_unlock_irq(>lock); > > Unfortunately, the above code is mm-only. Is the analogous fix for 2.6.21-rc2 > viable? I am sorry, I lost track of this problem. As for 2.6.21, create_freezeable_workqueue doesn't work and conflict with suspend. Why can't we remove it from XFS as you suggested before? Iirc, On 02/28, Nigel Cunningham wrote: > > On Wed, 2007-02-28 at 01:08 +0100, Rafael J. Wysocki wrote: > > On Wednesday, 28 February 2007 01:01, Johannes Berg wrote: > > > On Wed, 2007-02-28 at 00:57 +0100, Rafael J. Wysocki wrote: > > > > > > > Okay, in that case I'd suggest removing create_freezeable_workqueue() and > > > > make all workqueues nonfreezable once again for 2.6.21 (as far as I know, only > > > > the two XFS workqueues are affected). > > > > > > I think Nigel might object but I forgot what specific trouble XFS was > > > causing him. > > > > We suspected that the XFS' worker threads might commit I/O after > > freeze_processes() has returned, but that hasn't been supported by evidence, > > as far as I can recall. > > > > Also, making them freezable was controversial ... > > Controversy is no reason to give in! Nevertheless, I think you're right > - I believe the XFS guys said they fixed the issue that had caused I/O > to be submitted post-freeze. Well, we'll see if it appears again, won't > we? Oleg. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: lanana: Add major/minor entries for PPC QE UART devices
H. Peter Anvin wrote:
> Kumar Gala wrote:
> > Why don't we allocate the 2nd group of four as well, just at a new location. They'll be discontinuous, but at least we'll have support for all 8.
>
> Right, it means two tty driver structures, but that's not a problem.

Eh, I'm not crazy about that. That means that I have to complicate my driver because someone else screwed up a long time ago.

--
Timur Tabi
Linux Kernel Developer @ Freescale
Re: lanana: Add major/minor entries for PPC QE UART devices
Segher Boessenkool wrote:
> > Just allocate the four slots and we'll deal with anything above this in custom products.
>
> Another option is to use 46..49 for UARTs #0..3, and 192..195 for UARTs #4..7. Or, perhaps better, use 46..49 for #0..3, and 192..199 for #0..7, handling the duplication in the driver; and deprecate the old range.

That sounds like more hassle than it's worth. The discontinuous range may be annoying, but it isn't really a huge amount of code.

-hpa
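To give a feel for how little code the discontinuous range needs, here is a minimal sketch of the first option (46..49 for UARTs #0..3, 192..195 for UARTs #4..7). It is plain C for illustration only, not taken from any driver; the DEMO_* macros and demo_* function names are hypothetical.

```c
/* Minor ranges taken from the proposal quoted above. */
#define DEMO_MINOR_BASE_LOW   46   /* UARTs #0..3 -> minors 46..49 */
#define DEMO_MINOR_BASE_HIGH  192  /* UARTs #4..7 -> minors 192..195 */
#define DEMO_PORTS_PER_RANGE  4

/* Map a UART index (0..7) to its minor number, or -1 if out of range. */
int demo_uart_minor(int port)
{
	if (port < 0 || port >= 2 * DEMO_PORTS_PER_RANGE)
		return -1;
	if (port < DEMO_PORTS_PER_RANGE)
		return DEMO_MINOR_BASE_LOW + port;
	return DEMO_MINOR_BASE_HIGH + (port - DEMO_PORTS_PER_RANGE);
}

/* Inverse mapping: minor number back to UART index, or -1 if unknown. */
int demo_uart_port(int minor)
{
	if (minor >= DEMO_MINOR_BASE_LOW &&
	    minor < DEMO_MINOR_BASE_LOW + DEMO_PORTS_PER_RANGE)
		return minor - DEMO_MINOR_BASE_LOW;
	if (minor >= DEMO_MINOR_BASE_HIGH &&
	    minor < DEMO_MINOR_BASE_HIGH + DEMO_PORTS_PER_RANGE)
		return DEMO_PORTS_PER_RANGE + (minor - DEMO_MINOR_BASE_HIGH);
	return -1;
}
```

A real driver would presumably register two tty driver structures, one per range, as mentioned elsewhere in the thread; the sketch only covers the index arithmetic.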
Re: lanana: Add major/minor entries for PPC QE UART devices
> Just allocate the four slots and we'll deal with anything above this in custom products.

Another option is to use 46..49 for UARTs #0..3, and 192..195 for UARTs #4..7. Or, perhaps better, use 46..49 for #0..3, and 192..199 for #0..7, handling the duplication in the driver; and deprecate the old range.

Segher
[PATCH]: Fix __init declarations in Compaq SMART2 Controller driver
Fix __init declarations in Compaq SMART2 Controller driver.

Resolves MODPOST warnings similar to:

WARNING: drivers/block/cpqarray.o - Section mismatch: reference to .init.text:cpqarray_init_one from .data.rel.local between 'cpqarray_pci_driver' (at offset 0x20) and 'smart1_access'

Signed-off-by: Prarit Bhargava <[EMAIL PROTECTED]>

--- linux-2.6.18.ia64.orig/drivers/block/cpqarray.c 2007-02-14 11:36:20.0 -0500
+++ linux-2.6.18.ia64/drivers/block/cpqarray.c 2007-02-14 13:08:57.0 -0500
@@ -212,7 +212,7 @@ static struct proc_dir_entry *proc_array
  * Get us a file in /proc/array that says something about each controller.
  * Create /proc/array if it doesn't exist yet.
  */
-static void __init ida_procinit(int i)
+static void __devinit ida_procinit(int i)
 {
 	if (proc_array == NULL) {
 		proc_array = proc_mkdir("cpqarray", proc_root_driver);
@@ -390,7 +390,7 @@ static void __devexit cpqarray_remove_on
 }
 
 /* pdev is NULL for eisa */
-static int __init cpqarray_register_ctlr( int i, struct pci_dev *pdev)
+static int __devinit cpqarray_register_ctlr( int i, struct pci_dev *pdev)
 {
 	request_queue_t *q;
 	int j;
@@ -511,7 +511,7 @@ Enomem4:
 	return -1;
 }
 
-static int __init cpqarray_init_one( struct pci_dev *pdev,
+static int __devinit cpqarray_init_one( struct pci_dev *pdev,
 				const struct pci_device_id *ent)
 {
 	int i;
Re: [patch - v3] epoll ready set loops diet ...
On Wed, 28 Feb 2007, Eric Dumazet wrote:

> On Wednesday 28 February 2007 19:37, Davide Libenzi wrote:
> > + list_del(&epi->rdllink);
> > + if (!(epi->event.events & EPOLLET) && (revents & epi->event.events))
> > + list_add_tail(&epi->rdllink, &ep->rdllist);
> > + else {
>
> Is the ( ... & epi->event.events) really necessary ? (It seems already done)

Yes, look here:

	if (epi->event.events & EPOLLONESHOT)
		epi->event.events &= EP_PRIVATE_BITS;

Oneshot events should not be requeued.

> I was wrong about the size of epitem : it is now 68 bytes instead of 72.
> At least we now use/dirty one cache line instead of two per epitem.
>
> Do you have another brilliant idea to shrink 4 more bytes ? :)

I don't think we can cleanly shove more stuff out of it.

- Davide
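The interaction between the one-shot masking and the requeue test can be sketched in userspace C. This is only an illustration of the rule discussed in this message, not the eventpoll code: the DEMO_* flag values and demo_deliver_and_requeue() are hypothetical, and treating the private bits as exactly the one-shot and edge-triggered flags is an assumption of the sketch.

```c
#include <stdint.h>

/* Hypothetical flag values for this sketch; the real constants live in
 * the epoll headers and are not reproduced here. */
#define DEMO_EPOLLIN       0x001u
#define DEMO_EPOLLONESHOT  (1u << 30)
#define DEMO_EPOLLET       (1u << 31)
/* Assumption: the private bits are just the two control flags. */
#define DEMO_PRIVATE_BITS  (DEMO_EPOLLONESHOT | DEMO_EPOLLET)

/*
 * Model the step after 'revents' has been reported for an item whose
 * interest mask is *events. Returns 1 if the item should go back on the
 * ready list, 0 otherwise.
 */
int demo_deliver_and_requeue(uint32_t *events, uint32_t revents)
{
	/* One-shot: strip everything except the private bits. */
	if (*events & DEMO_EPOLLONESHOT)
		*events &= DEMO_PRIVATE_BITS;

	/*
	 * Requeue only level-triggered items that still have interest
	 * bits matching the reported events. A one-shot item has no
	 * interest bits left at this point, so the second test fails.
	 */
	return !(*events & DEMO_EPOLLET) && (revents & *events);
}
```

This is why the second half of the condition is not redundant: without it, a level-triggered one-shot item would be put back on the ready list even though its interest mask has just been cleared.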
Re: PROBLEM: null pointer dereference in cfq_dispatch_requests (2.6.21-rc2 and 2.6.20)
Chuck Ebbert wrote:
> There are two patches for raid5/6 out there that might fix this. I'll
> attach them (the second just fixes a minor bug in the first one.)

Never mind, those patches are already in 2.6.21-rc.
Re: lanana: Add major/minor entries for PPC QE UART devices
Kumar Gala wrote:
> Why don't we allocate the 2nd group of four as well, just at a new location. They'll be discontinuous, but at least we'll have support for all 8.

Right, it means two tty driver structures, but that's not a problem.

-hpa
Re: lanana: Add major/minor entries for PPC QE UART devices
Dan Malek wrote:
> Just allocate the four slots and we'll deal with anything above this in custom products.

Assuming that this is the agreed-upon standard, should I arbitrarily restrict my driver to 4 ports, or allow all 8? I assume that if a driver already claims a particular major/minor combo, then when the 2nd driver calls uart_add_one_port(), that call will fail?

--
Timur Tabi
Linux Kernel Developer @ Freescale
Re: lanana: Add major/minor entries for PPC QE UART devices
On Feb 28, 2007, at 12:20 PM, Dan Malek wrote:

> On Feb 28, 2007, at 1:00 PM, H. Peter Anvin wrote:
> > I would much rather see these devices moved to a different minor range.
>
> No. We just did that all too recently, and I don't know why the minors didn't get allocated properly. I don't want to have to update all of our embedded software distributions just because someone doesn't like minor numbers that aren't causing trouble. If we allocate unique spaces for all of the possible UART variations, there isn't going to be enough space. Just allocate the four slots and we'll deal with anything above this in custom products. Using more than four of these processor resources as UARTs isn't likely to happen because there won't be anything left for the interesting communication ports.

Why don't we allocate the 2nd group of four as well, just at a new location? They'll be discontinuous, but at least we'll have support for all 8.

- k
Re: Problem with freezable workqueues
On Wednesday, 28 February 2007 14:17, Srivatsa Vaddagiri wrote: > On Wed, Feb 28, 2007 at 12:11:03PM +0100, Rafael J. Wysocki wrote: > > > In addition to thawing worker thread before kthread_stopping it, there > > > are minor changes required in worker threads, to check for > > > is_cpu_offline(bind_cpu) when they come out of refrigerator and jump to > > > wait_to_die if so (ex: softirq.c). > > > > > > I guess you would need these changes before freezer-based hotplug is > > > merged, in which case Gautham can send those patches out first. > > > > Yes, please, if that's possible. > > After looking at the current workqueue code, the above minor change I > suggested is not required. > > So you should be able to fix your "kthread_stop on a frozen worker > thread hangs" problem by just a simple patch like this (against > 2.6.20-mm2): > > > --- workqueue.c.org 2007-02-28 18:32:48.0 +0530 > +++ workqueue.c 2007-02-28 18:44:23.0 +0530 > @@ -718,6 +718,8 @@ static void cleanup_workqueue_thread(str > insert_wq_barrier(cwq, , 1); > cwq->should_stop = 1; > alive = 1; > + if (frozen(cwq->thread)) > + thaw(cwq->thread); > } > spin_unlock_irq(>lock); > > > Can you test with this? Unfortunately, the above code is mm-only. Is the analogous fix for 2.6.21-rc2 viable? Rafael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch - v3] epoll ready set loops diet ...
On Wednesday 28 February 2007 19:37, Davide Libenzi wrote: > + list_del(>rdllink); > + if (!(epi->event.events & EPOLLET) && (revents & > epi->event.events)) > + list_add_tail(>rdllink, ); > + else { Is the ( ... & epi->event.events) really necessary? (It seems already done.) I was wrong about the size of epitem: it is now 68 bytes instead of 72. At least we now use/dirty one cache line instead of two per epitem. Do you have another brilliant idea to shrink 4 more bytes? :) It seems to me 'nwait' is only used at init time (so that ep_ptable_queue_proc() can signal that an error occurred). Maybe another mechanism could let us delete nwait from epitem? We could use a field in task_struct, for example (see the usage of total_link_count). Thank you
[PATCH]: __init to __cpuinit in mtrr code
(Resending to wider audience) __init to __cpuinit in mtrr code. Resolves warnings similar to: WARNING: vmlinux - Section mismatch: reference to .init.text:mtrr_bp_init from .text between 'identify_cpu' (at offset 0xc040b38e) and 'detect_ht' Signed-off-by: Prarit Bhargava <[EMAIL PROTECTED]> diff --git a/arch/i386/kernel/cpu/mtrr/amd.c b/arch/i386/kernel/cpu/mtrr/amd.c index 0949cdb..375752a 100644 --- a/arch/i386/kernel/cpu/mtrr/amd.c +++ b/arch/i386/kernel/cpu/mtrr/amd.c @@ -112,7 +112,7 @@ static struct mtrr_ops amd_mtrr_ops = { .have_wrcomb = positive_have_wrcomb, }; -int __init amd_init_mtrr(void) +int __cpuinit amd_init_mtrr(void) { set_mtrr_ops(_mtrr_ops); return 0; diff --git a/arch/i386/kernel/cpu/mtrr/centaur.c b/arch/i386/kernel/cpu/mtrr/centaur.c index cb9aa3a..8b61016 100644 --- a/arch/i386/kernel/cpu/mtrr/centaur.c +++ b/arch/i386/kernel/cpu/mtrr/centaur.c @@ -215,7 +215,7 @@ static struct mtrr_ops centaur_mtrr_ops = { .have_wrcomb = positive_have_wrcomb, }; -int __init centaur_init_mtrr(void) +int __cpuinit centaur_init_mtrr(void) { set_mtrr_ops(_mtrr_ops); return 0; diff --git a/arch/i386/kernel/cpu/mtrr/cyrix.c b/arch/i386/kernel/cpu/mtrr/cyrix.c index 0737a59..df38d8c 100644 --- a/arch/i386/kernel/cpu/mtrr/cyrix.c +++ b/arch/i386/kernel/cpu/mtrr/cyrix.c @@ -370,7 +370,7 @@ static struct mtrr_ops cyrix_mtrr_ops = { .have_wrcomb = positive_have_wrcomb, }; -int __init cyrix_init_mtrr(void) +int __cpuinit cyrix_init_mtrr(void) { set_mtrr_ops(_mtrr_ops); return 0; diff --git a/arch/i386/kernel/cpu/mtrr/generic.c b/arch/i386/kernel/cpu/mtrr/generic.c index f77fc53..fd97f84 100644 --- a/arch/i386/kernel/cpu/mtrr/generic.c +++ b/arch/i386/kernel/cpu/mtrr/generic.c @@ -30,14 +30,14 @@ static __initdata int mtrr_show; module_param_named(show, mtrr_show, bool, 0); /* Get the MSR pair relating to a var range */ -static void __init +static void __cpuinit get_mtrr_var_range(unsigned int index, struct mtrr_var_range *vr) { rdmsr(MTRRphysBase_MSR(index), 
vr->base_lo, vr->base_hi); rdmsr(MTRRphysMask_MSR(index), vr->mask_lo, vr->mask_hi); } -static void __init +static void __cpuinit get_fixed_ranges(mtrr_type * frs) { unsigned int *p = (unsigned int *) frs; @@ -60,7 +60,7 @@ static void __init print_fixed(unsigned base, unsigned step, const mtrr_type*typ } /* Grab all of the MTRR state for this CPU into *state */ -void __init get_mtrr_state(void) +void __cpuinit get_mtrr_state(void) { unsigned int i; struct mtrr_var_range *vrs; diff --git a/arch/i386/kernel/cpu/mtrr/main.c b/arch/i386/kernel/cpu/mtrr/main.c index 0acfb6a..cdbca55 100644 --- a/arch/i386/kernel/cpu/mtrr/main.c +++ b/arch/i386/kernel/cpu/mtrr/main.c @@ -103,7 +103,7 @@ static int have_wrcomb(void) } /* This function returns the number of variable MTRRs */ -static void __init set_num_var_ranges(void) +static void __cpuinit set_num_var_ranges(void) { unsigned long config = 0, dummy; @@ -116,7 +116,7 @@ static void __init set_num_var_ranges(void) num_var_ranges = config & 0xff; } -static void __init init_table(void) +static void __cpuinit init_table(void) { int i, max; @@ -571,7 +571,7 @@ extern void amd_init_mtrr(void); extern void cyrix_init_mtrr(void); extern void centaur_init_mtrr(void); -static void __init init_ifs(void) +static void __cpuinit init_ifs(void) { #ifndef CONFIG_X86_64 amd_init_mtrr(); @@ -639,7 +639,7 @@ static struct sysdev_driver mtrr_sysdev_driver = { * initialized (i.e. before smp_init()). * */ -void __init mtrr_bp_init(void) +void __cpuinit mtrr_bp_init(void) { init_ifs(); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Fix locking in mousedev
If a process is closing /dev/input/mice and a mouse disconnects simultaneously, the system is likely to oops. This usually happens when someone hits F1 or logs out from X, and flips a KVM while the system is reacting. I reproduced the issue by running this: while true; do cat /dev/input/mice; done This way, it oopses on the 2nd or 3rd disconnect reliably. With the patch, I can disconnect the mouse 20 times. Signed-off-by: Pete Zaitcev <[EMAIL PROTECTED]> --- Discussion One of the race scenarios is related to the list of handles. The cat calls mousedev_close -> mixdev_release, which does list_for_each to walk all handles for a given handler. Iterations are longish while it does input_close_device -> hidinput_close -> usbhid_close -> usb_kill_urb, which sleeps briefly. Into this gap goes khubd and does hid_disconnect -> hidinput_disconnect -> input_unregister_device. This corrupts the list of handles which the cat process is walking. I was unable to devise a scheme to protect the stock h_list adequately, so I implemented a private list of mousedev instances, which can be protected correctly. Dmitry, please consider getting rid of the list of handles entirely. The other major user is drivers/char/keyboard.c. Other than that, the patch is straightforward. It adds a static mutex to guard common data structures. It has to be static because instances of mousedev share common structures, such as the mousedev_table[]. This should be uncontroversial, but please let me know if I missed something obvious. 
-- Pete diff --git a/drivers/input/mousedev.c b/drivers/input/mousedev.c index 664bcc8..2425c2a 100644 --- a/drivers/input/mousedev.c +++ b/drivers/input/mousedev.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include #include @@ -64,6 +65,7 @@ struct mousedev { char name[16]; wait_queue_head_t wait; struct list_head list; + struct list_head h_node; struct input_handle handle; struct mousedev_hw_data packet; @@ -108,10 +110,13 @@ static unsigned char mousedev_imps_seq[] = { 0xf3, 200, 0xf3, 100, 0xf3, 80 }; static unsigned char mousedev_imex_seq[] = { 0xf3, 200, 0xf3, 200, 0xf3, 80 }; static struct input_handler mousedev_handler; +static LIST_HEAD(mousedev_h_list); static struct mousedev *mousedev_table[MOUSEDEV_MINORS]; static struct mousedev mousedev_mix; +static DEFINE_MUTEX(mousedev_lock); + #define fx(i) (mousedev->old_x[(mousedev->pkt_count - (i)) & 03]) #define fy(i) (mousedev->old_y[(mousedev->pkt_count - (i)) & 03]) @@ -366,11 +371,9 @@ static void mousedev_free(struct mousedev *mousedev) static void mixdev_release(void) { - struct input_handle *handle; - - list_for_each_entry(handle, _handler.h_list, h_node) { - struct mousedev *mousedev = handle->private; + struct mousedev *mousedev; + list_for_each_entry(mousedev, _h_list, h_node) { if (!mousedev->open) { if (mousedev->exist) input_close_device(>handle); @@ -386,6 +389,7 @@ static int mousedev_release(struct inode * inode, struct file * file) mousedev_fasync(-1, file, 0); + mutex_lock(_lock); list_del(>node); if (!--list->mousedev->open) { @@ -398,6 +402,7 @@ static int mousedev_release(struct inode * inode, struct file * file) mousedev_free(list->mousedev); } } + mutex_unlock(_lock); kfree(list); return 0; @@ -406,7 +411,6 @@ static int mousedev_release(struct inode * inode, struct file * file) static int mousedev_open(struct inode * inode, struct file * file) { struct mousedev_list *list; - struct input_handle *handle; struct mousedev *mousedev; int i; @@ -417,11 +421,16 @@ static 
int mousedev_open(struct inode * inode, struct file * file) #endif i = iminor(inode) - MOUSEDEV_MINOR_BASE; - if (i >= MOUSEDEV_MINORS || !mousedev_table[i]) + mutex_lock(_lock); + if (i >= MOUSEDEV_MINORS || !mousedev_table[i]) { + mutex_unlock(_lock); return -ENODEV; + } - if (!(list = kzalloc(sizeof(struct mousedev_list), GFP_KERNEL))) + if (!(list = kzalloc(sizeof(struct mousedev_list), GFP_KERNEL))) { + mutex_unlock(_lock); return -ENOMEM; + } spin_lock_init(>packet_lock); list->pos_x = xres / 2; @@ -432,16 +441,16 @@ static int mousedev_open(struct inode * inode, struct file * file) if (!list->mousedev->open++) { if (list->mousedev->minor == MOUSEDEV_MIX) { - list_for_each_entry(handle, _handler.h_list, h_node) { - mousedev = handle->private; + list_for_each_entry(mousedev, _h_list, h_node) { if (!mousedev->open && mousedev->exist)
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
Davide Libenzi wrote: struct async_syscall { unsigned long nr_sysc; unsigned long params[8]; long *result; }; And what would async_wait() return back? Pointers to "struct async_syscall" or pointers to "result"? Either one has downsides. Pointer to struct async_syscall requires that the caller keep the struct around. Pointer to result requires that the caller always reserve a location for the result. Does the kernel care about the (possibly rare) case of callers that don't want to pay attention to result? If so, what about adding some kind of caller-specified handle to struct async_syscall, and having async_wait() return the handle? In the case where the caller does care about the result, the handle could just be the address of result. Chris
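The handle idea above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the syslet API: the `handle` member and the `async_complete()` helper are hypothetical names introduced here, and `async_complete()` merely stands in for whatever the kernel would do when a submitted call finishes and async_wait() reports it.

```c
#include <assert.h>
#include <stddef.h>

/* The struct from the thread, plus a caller-chosen handle (assumption). */
struct async_syscall {
	unsigned long nr_sysc;
	unsigned long params[8];
	long *result;	/* may be NULL when the caller ignores the result */
	void *handle;	/* opaque cookie handed back on completion */
};

/*
 * Hypothetical stand-in for the completion path: store the return value
 * only if the caller reserved a location for it, then give back the handle.
 * The request is read-only here, so the caller may discard it after submit.
 */
static void *async_complete(const struct async_syscall *req, long retval)
{
	assert(req != NULL);
	if (req->result)
		*req->result = retval;
	return req->handle;
}
```

A caller that cares about the result can set `handle` to the address of its result slot, as suggested above; a caller that does not can leave `result` NULL and use any cookie it likes.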
Re: [PATCH 2.6.20] kobject net ifindex + rename
On Wed, 2007-02-28 at 10:51 -0800, Jean Tourrilhes wrote: > That's why I always specify the kernel version. I'll look into > that, I'm sure it's not the end of the world ;-) Sure, just wanted to point it out. > In which sense? Wireless interfaces are regular netdevices. Yeah but in mac80211 we have the wiphy concept since multiple virtual interfaces can be associated with one piece of hardware, and that is where QoS is done, not the netdevs. Of course, those interested can just listen to nl80211 events to figure out if someone renamed an 802.11 phy, but things like hal would probably not want to do that and would still want to know about the name change. > I'm just trying to follow the established pattern. Both > class_device_add() and class_device_del() are generating the > event. Also, I'm not sure if other subsystems would benefit from it, I > don't want to generate too many useless events. I don't think many other subsystems (can) rename things ;) johannes
Re: [patch] Add insmod option to force the use of the backup timer.
On Wed, Feb 28, 2007 at 11:23:46AM +0100, Gerd Hoffmann wrote: > The test which automatically enables the backup timer on some HP > machines doesn't trigger on other hardware which needs the backup > timer too. Did you figure out *why* that test doesn't trigger? Making that work seems a better solution to me than adding magic options that users won't know they have to use. Dave -- http://www.codemonkey.org.uk
Re: [PATCH 2.6.20] kobject net ifindex + rename
On Wed, Feb 28, 2007 at 10:16:05AM +0100, Johannes Berg wrote: > Hi, > > > Patch for 2.6.20 is attached. > > ... and in the meantime netdevices aren't class_device any more :) IOW, > your patch isn't going to work any more. That's why I always specify the kernel version. I'll look into that, I'm sure it's not the end of the world ;-) > Also, I think wireless could benefit from this as well. In which sense? Wireless interfaces are regular netdevices. > > The kobject framework is well designed, so adding these > > features is trivial change and won't run the risk of breaking anything > > (famous last words). Obviously, hotplug apps are free to ignore those > > additional features. > > Why not just add this to base kobject_rename instead? That way, > userspace is notified for all renames in sysfs. > The patch then collapses down to the change in net's sysfs code to add > the ifindex to the environment, and another change in kobject to invoke > a new event when a name changes and show the old name. I'm just trying to follow the established pattern. Both class_device_add() and class_device_del() are generating the event. Also, I'm not sure if other subsystems would benefit from it, I don't want to generate too many useless events. > johannes Thanks! Jean
Re: [PATCH] ecryptfs: check xattr operation support fix
On Wed, Feb 28, 2007 at 08:05:16PM +0300, Dmitriy Monakhov wrote: > - ecryptfs_write_inode_size_to_metadata() error code was ignored. > - i_op->setxattr() must be supported by lower fs because used below. > > Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> Acked-by: Michael Halcrow <[EMAIL PROTECTED]> > --- > fs/ecryptfs/inode.c |6 +++--- > fs/ecryptfs/mmap.c |3 ++- > 2 files changed, 5 insertions(+), 4 deletions(-) > > diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c > index 27fd14a..9ccefad 100644 > --- a/fs/ecryptfs/inode.c > +++ b/fs/ecryptfs/inode.c > @@ -168,9 +168,9 @@ static int grow_file(struct dentry *ecryptfs_dentry, > struct file *lower_file, > goto out; > } > i_size_write(inode, 0); > - ecryptfs_write_inode_size_to_metadata(lower_file, lower_inode, inode, > - ecryptfs_dentry, > - ECRYPTFS_LOWER_I_MUTEX_NOT_HELD); > + rc = ecryptfs_write_inode_size_to_metadata(lower_file, lower_inode, > + inode, ecryptfs_dentry, > + ECRYPTFS_LOWER_I_MUTEX_NOT_HELD); > ecryptfs_inode_to_private(inode)->crypt_stat.flags |= ECRYPTFS_NEW_FILE; > out: > return rc; > diff --git a/fs/ecryptfs/mmap.c b/fs/ecryptfs/mmap.c > index 1e5d2ba..416985f 100644 > --- a/fs/ecryptfs/mmap.c > +++ b/fs/ecryptfs/mmap.c > @@ -491,7 +491,8 @@ static int ecryptfs_write_inode_size_to_xattr(struct > inode *lower_inode, > goto out; > } > lower_dentry = ecryptfs_dentry_to_lower(ecryptfs_dentry); > - if (!lower_dentry->d_inode->i_op->getxattr) { > + if (!lower_dentry->d_inode->i_op->getxattr || > + !lower_dentry->d_inode->i_op->setxattr) { > printk(KERN_WARNING > "No support for setting xattr in lower filesystem\n"); > rc = -ENOSYS; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PROBLEM: null pointer dereference in cfq_dispatch_requests (2.6.21-rc2 and 2.6.20)
Dan Williams wrote: > I can reliably reproduce a null pointer dereference on 2.6.20 and > 2.6.21-rc2. I will keep digging to find the kernel version where this > last worked, but wanted to see if there were any immediate experiments I > should try. > > The failure is caused by running tiobench on a MD raid6 array with 6 out > of 8 disks available. The commands I issued to reproduce this are: > > mdadm -A /dev/md0 /dev/sd[bcdefg] > mount /dev/md0 /mnt/raid > tiobench --numruns 5 --size 2048 --dir /mnt/raid > > The filesystem is ext3. The controller is an LSI 1068. Here are the > two BUG messages first 2.6.21-rc2 followed by 2.6.20. I will reply to > this message with the config. > Kernel 2.6.20 on an i686 > > [ 177.299787] BUG: unable to handle kernel NULL pointer dereference at > virtual address 005c > [ 177.308526] printing eip: > [ 177.311287] c01de510 > [ 177.313521] *pde = 34d40001 > [ 177.316353] Oops: [#1] > [ 177.319202] SMP > [ 177.321107] Modules linked in: raid456 xor nfsd exportfs lockd nfs_acl > sunrpc autofs4 hidp l2cap bluetooth iptable_raw xt_policy xt_multiport > ipt_ULOG ipt_TTL ipt_ttl ipt_TOS ipt_tos ipt_SAME ipt_REJECT ipt_REDIRECT > ipt_recent ipt_owner ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_iprange ipt_ECN > ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype xt_tcpmss xt_pkttype xt_physdev > xt_NFQUEUE xt_MARK xt_mark xt_mac xt_limit xt_length xt_helper xt_dccp > xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state > iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack iptable_mangle nfnetlink > iptable_filter ip_tables x_tables video sbs i2c_ec dock button battery > asus_acpi ac radeon drm ipv6 lp parport_pc parport e1000 uhci_hcd floppy > mptsas mptscsih mptbase sg ehci_hcd scsi_transport_sas i2c_i801 i2c_core > pcspkr dm_snapshot dm_zero dm_mirror dm_mod ata_piix ata_generic libata > sd_mod scsi_mod ext3 jbd > [ 177.402252] CPU:2 > [ 177.402253] EIP:0060:[]Not tainted VLI > [ 177.402253] EFLAGS: 00210016 (2.6.20 #5) > [ 177.414194] EIP is 
at cfq_dispatch_insert+0xb/0x53 > [ 177.419056] eax: f7773ec0 ebx: ecx: f7773cc0 edx: > [ 177.425982] esi: f70abae0 edi: f7773cc0 ebp: esp: f34dbcbc > [ 177.432953] ds: 007b es: 007b ss: 0068 > [ 177.437127] Process tiotest (pid: 5405, ti=f34db000 task=f7efc030 > task.ti=f34db000) > [ 177.444763] Stack: 0049 f77d3b9c f7773cc0 c01de6ce c014041e > f8a26806 0082 > [ 177.453456]f7efc030 fffe22d6 0004 > f7efc030 f7773cc0 > [ 177.462121] f70abae0 f7cd5800 f70abae0 > c01d4fcc 0001 > [ 177.470798] Call Trace: > [ 177.473503] [] cfq_dispatch_requests+0x12d/0x466 > [ 177.479223] [] __lock_acquire+0x9e9/0xa72 > [ 177.484285] [] scsi_request_fn+0x286/0x336 [scsi_mod] > [ 177.490485] [] elv_next_request+0x1a2/0x1b2 > [ 177.495766] [] scsi_request_fn+0x286/0x336 [scsi_mod] > [ 177.501912] [] _spin_lock_irq+0x38/0x43 > [ 177.506840] [] scsi_request_fn+0x59/0x336 [scsi_mod] > [ 177.512981] [] blk_remove_plug+0x5a/0x66 > [ 177.517983] [] __generic_unplug_device+0x1d/0x1f > [ 177.523705] [] generic_unplug_device+0x15/0x21 > [ 177.529272] [] unplug_slaves+0x54/0x88 [raid456] > [ 177.535013] [] blk_backing_dev_unplug+0x73/0x7b > [ 177.540657] [] _spin_unlock_irqrestore+0x3e/0x4d > [ 177.546382] [] sync_page+0x0/0x3b > [ 177.550774] [] trace_hardirqs_on+0x12e/0x158 > [ 177.556108] [] sync_page+0x0/0x3b > [ 177.560471] [] block_sync_page+0x31/0x32 > [ 177.565449] [] sync_page+0x33/0x3b > [ 177.569916] [] __wait_on_bit_lock+0x2a/0x52 > [ 177.575201] [] __lock_page+0x58/0x5e > [ 177.579810] [] wake_bit_function+0x0/0x3c > [ 177.584905] [] do_generic_mapping_read+0x1db/0x44f > [ 177.590911] [] generic_file_aio_read+0x173/0x1a4 > [ 177.596617] [] file_read_actor+0x0/0xdb > [ 177.601525] [] do_sync_read+0xc7/0x10a > [ 177.606365] [] autoremove_wake_function+0x0/0x35 > [ 177.612130] [] do_sync_read+0x0/0x10a > [ 177.616867] [] vfs_read+0xa6/0x152 > [ 177.621362] [] sys_read+0x41/0x67 > [ 177.625794] [] syscall_call+0x7/0xb > [ 177.630403] === > [ 177.634031] Code: da 11 3b c0 c7 04 24 51 9d 
39 c0 e8 c9 a1 f4 ff e8 ca 6e > f2 ff ff 4f 34 83 c4 18 5b 5e 5f 5d c3 55 57 56 89 c6 53 8b 40 0c 89 d3 <8b> > 7a 5c 8b 68 04 89 d0 e8 b5 fe ff ff 8b 43 14 89 da 25 01 80 > [ 177.654378] EIP: [] cfq_dispatch_insert+0xb/0x53 SS:ESP > 0068:f34dbcbc cfq_dispatch_requests() has called cfq_dispatch_insert() with a NULL second argument (struct request *rq) There are two patches for raid5/6 out there that might fix this. I'll attach them (the second just fixes a minor bug in the first one.) From: Neil Brown <[EMAIL PROTECTED]> On Sunday February 11, [EMAIL PROTECTED] wrote: > > Greetings, > > > > I've been running md on my server for some time now and a few days ago one > > of
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Wed, 28 Feb 2007, Linus Torvalds wrote: > On Wed, 28 Feb 2007, Davide Libenzi wrote: > > > > Here we very much agree. The way I'd like it: > > > > struct async_syscall { > > unsigned long nr_sysc; > > unsigned long params[8]; > > long result; > > }; > > No, the "result" needs to go somewhere else. The caller may be totally > uninterested in keeping the system call number or parameters around until > the operation completes, but if you put them in the same structure with > the result, you obviously cannot sanely get rid of them. > > I also don't much like read-write interfaces (which the above would be: > the kernel would read most of the structure, and then write one member of > the structure). > > It's entirely possible, for example, that the operation we submit is some > legacy "aio_read()", which has some other structure layout than the new > one (but one field will be the result code). Ok, makes sense. Something like this then? struct async_syscall { unsigned long nr_sysc; unsigned long params[8]; long *result; }; And what would async_wait() return back? Pointers to "struct async_syscall" or pointers to "result"? - Davide
Re: [PATCH 2.6.20] kobject net ifindex + rename
On Wed, Feb 28, 2007 at 10:34:37AM +0100, Jarek Poplawski wrote: > On 28-02-2007 02:27, Jean Tourrilhes wrote: > > Hi all, > ... > > Patch for 2.6.20 is attached. The patch was tested on a system > > running the hotplug scripts, and on another system running udev. > > > > Have fun... > > > > Jean > > > > Signed-off-by: Jean Tourrilhes <[EMAIL PROTECTED]> > > > > - > ... > > diff -u -p linux/net/core/net-sysfs.j1.c linux/net/core/net-sysfs.c > > --- linux/net/core/net-sysfs.j1.c 2007-02-27 15:01:08.0 -0800 > > +++ linux/net/core/net-sysfs.c 2007-02-27 15:06:49.0 -0800 > > @@ -412,6 +412,17 @@ static int netdev_uevent(struct class_de > > if ((size <= 0) || (i >= num_envp)) > > return -ENOMEM; > > > > + /* pass ifindex to uevent. > > +* ifindex is useful as it won't change (interface name may change) > > +* and is what RtNetlink uses natively. */ > > + envp[i++] = buf; > > + n = snprintf(buf, size, "IFINDEX=%d", dev->ifindex) + 1; > > + buf += n; > > + size -= n; > > + > > + if ((size <= 0) || (i >= num_envp)) > > Btw.: > 1. if size == 10 and snprintf returns 9 (without NULL) >then n == 10 (with NULL), so isn't it enough (here and above): > > if ((size < 0) || (i >= num_envp)) I just cut'n'pasted the code from a few lines above. If the original code is incorrect, it needs fixing. And it will probably need fixing in a lot of places. > 2. shouldn't there be (here and above): > > envp[--i] = NULL; > No, envp is local, so who cares. Thanks. Jean
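The boundary case raised in point 1 can be checked with a small userspace sketch. It is not the kernel code: `append_ifindex()` and its `strict` switch are hypothetical names used only to compare the two checks, and unlike the patch it advances the buffer only on success so the pointer never leaves the array.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Append "IFINDEX=<n>" plus its terminating NUL to *buf.
 * strict != 0 mimics the patch's "size <= 0" test,
 * strict == 0 mimics the suggested "size < 0" test.
 * Returns 0 on success, -1 if the string is considered not to fit.
 */
static int append_ifindex(char **buf, int *size, int ifindex, int strict)
{
	int n, remaining;

	assert(*size > 0);
	/* snprintf returns the length without the NUL, hence the + 1 */
	n = snprintf(*buf, (size_t)*size, "IFINDEX=%d", ifindex) + 1;
	remaining = *size - n;

	if (strict ? (remaining <= 0) : (remaining < 0))
		return -1;

	*buf += n;
	*size = remaining;
	return 0;
}
```

With a 10-byte buffer and ifindex 1, the string and its NUL fit exactly: the strict test rejects it, the relaxed one accepts it and leaves zero bytes free. Whether the stricter test is intentional (for instance to keep room for a following variable) is not settled in this thread.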
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Wed, 28 Feb 2007, Davide Libenzi wrote: > > Here we very much agree. The way I'd like it: > > struct async_syscall { > unsigned long nr_sysc; > unsigned long params[8]; > long result; > }; No, the "result" needs to go somewhere else. The caller may be totally uninterested in keeping the system call number or parameters around until the operation completes, but if you put them in the same structure with the result, you obviously cannot sanely get rid of them. I also don't much like read-write interfaces (which the above would be: the kernel would read most of the structure, and then write one member of the structure). It's entirely possible, for example, that the operation we submit is some legacy "aio_read()", which has some other structure layout than the new one (but one field will be the result code). Linus
Re: [PATCH 2.6.20] kobject net ifindex + rename
On Wed, Feb 28, 2007 at 07:36:17AM -0800, Greg KH wrote: > On Tue, Feb 27, 2007 at 05:27:41PM -0800, Jean Tourrilhes wrote: > > diff -u -p linux/drivers/base/class.j1.c linux/drivers/base/class.c > > --- linux/drivers/base/class.j1.c 2007-02-26 18:38:10.0 -0800 > > +++ linux/drivers/base/class.c 2007-02-27 15:52:37.0 -0800 > > @@ -841,6 +841,8 @@ int class_device_rename(struct class_dev > > This function is not in the 2.6.21-rc2 kernel, so you might want to > rework this patch a bit :) It was a trial balloon to gather feedback. I will do that. > Also, it's userspace that causes the rename to happen, so it knows it > did it, why should the kernel have to emit a message to tell userspace > again what just happened? Userspace is not one big program, but a collection of programs, and one program does not know what another program does. In particular, udev does not know when people are using iproute2 to rename interfaces, and loses its marbles. We don't really want to ban iproute2 or udev ;-) > thanks, > > greg k-h Have fun... Jean
Re: Problem with freezable workqueues
On Wednesday, 28 February 2007 19:17, Gautham R Shenoy wrote: > On Wed, Feb 28, 2007 at 08:37:26AM +0530, Srivatsa Vaddagiri wrote: > > > > Hmm ..good point. So can we assume that disable/enable_nonboot_cpus() are > > called > > with processes frozen already? > > > > Gautham, you need to take this into account in your patchset! > > Yup. That would mean making the freezer reentrant since we will > be freezing twice (once for suspend and later on for hotplug). This is > ok since the api in my patches looks like > freeze_processes(int freeze_event); > > But thaw will be interesting. If we are thawing for hotplug, we gotta > only thaw processes which were frozen *only* for hotplug. > > Rafael, does that mean more status flags?! Well, I don't really think so, but we need to store some information in the freezer (eg. in a status variable). Namely, we can define a variable, say tasks_frozen, the value of which will be the bitwise or of the flags SPE_SUSPEND, SPE_HOTPLUG etc. In a fully functional system, tasks_frozen is equal to zero. If freeze_processes(SPE_SUSPEND) is run, it does tasks_frozen |= SPE_SUSPEND and analogously for SPE_HOTPLUG etc. If tasks_frozen is equal to SPE_SUSPEND|SPE_HOTPLUG, for example, and thaw_tasks(SPE_HOTPLUG) runs, it only thaws the tasks that need not stay frozen for the suspend and does tasks_frozen &= ~SPE_HOTPLUG etc. I think something like this should work. Greetings, Rafael
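The bookkeeping described above can be modelled in a few lines of userspace C. This is a sketch, not freezer code: the flag values are assumptions, the functions only track the tasks_frozen mask and do not touch any task, and it assumes that thawing for an event clears that same event's bit.

```c
#include <assert.h>

/* Flag values are assumptions; only the names come from the thread. */
#define SPE_SUSPEND	0x1u
#define SPE_HOTPLUG	0x2u

/* Zero in a fully functional system, otherwise the OR of active events. */
static unsigned int tasks_frozen;

/* Record one more reason for tasks to be frozen. */
static void freeze_processes(unsigned int freeze_event)
{
	tasks_frozen |= freeze_event;
}

/*
 * Drop one reason. Returns nonzero when no reason is left, i.e. when
 * every task may run again; tasks still held by another event stay frozen.
 */
static int thaw_tasks(unsigned int freeze_event)
{
	assert(tasks_frozen & freeze_event);	/* must have been frozen for it */
	tasks_frozen &= ~freeze_event;
	return tasks_frozen == 0;
}
```

Freezing for suspend and then for hotplug leaves both bits set; thawing for hotplug alone returns 0 because the suspend bit still holds tasks frozen, and only the later thaw for suspend brings the mask back to zero.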