Re: [Bugme-new] [Bug 8100] New: dynticks makes ksoftirqd1 use unreasonable amount of cpu time

2007-02-28 Thread Andrew Morton
On Wed, 28 Feb 2007 09:34:10 -0800
[EMAIL PROTECTED] wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=8100
> 
>Summary: dynticks makes ksoftirqd1 use unreasonable amount of cpu
> time
> Kernel Version: 2.6.21-rc2
> Status: NEW
>   Severity: low
>  Owner: [EMAIL PROTECTED]
>  Submitter: [EMAIL PROTECTED]
> 
> 
> Most recent kernel where this bug did *NOT* occur:
> any kernel without dynticks
> 
> Distribution:
> Debian etch with linux-2.6.21-rc{2,1}
> 
> Hardware Environment: 
> Macbook core2 with bios emulation
> 
> Software Environment:
> The problem is obvious when listening to a shoutcast stream with kmplayer and
> artsd via wi-fi with wpa (wpa_supplicant)
> 
> Problem Description:
> ksoftirqd1 uses ~30% cpu-time (as shown by top), with no other symptoms,
> while without dynticks the cpu-load in similar circumstances is negligible.
> This might be a dynticks feature rather than bug.
> 
> Steps to reproduce:
> Just watch top; if the bug is reproducible, just booting should probably
> suffice.
> 
> --- You are receiving this mail because: ---
> You are on the CC list for the bug, or are watching someone who is.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.21-rc1: known regressions (v2) (part 2)

2007-02-28 Thread Con Kolivas
On Wednesday 28 February 2007 15:21, Mike Galbraith wrote:
> (hrmph.  having to copy/paste/try again.  evolution seems to be broken..
> RCPT TO <[EMAIL PROTECTED]> failed: Cannot resolve your domain
> {mp049} ..caused me to be unable to send despite receipts being disabled)

Apologies for mangling the email address as I said :-(

> On Wed, 2007-02-28 at 09:58 +1100, Con Kolivas wrote:
> > On Tuesday 27 February 2007 19:54, Mike Galbraith wrote:
> > > Agreed.
> > >
> > > I was recently looking at that spot because I found that niced tasks
> > > were taking latency hits, and disabled it, which helped a bunch.
> >
> > Ok... as I said above to Ingo, nice means more latency too, and there is
> > no doubt that if we disable nice as a working feature then the niced
> > tasks will have less latency. Of course, this ends up meaning that
> > un-niced tasks no longer receive their better cpu performance..  You're
> > basically saying that you prefer nice not to work in the setting of HT.
>
> No I'm not, but let's go further in that direction just for the sake of
> argument.  You're then saying that you prefer realtime priorities to not
> work in the HT setting, given that realtime tasks don't participate in
> the 'single stream me' program.

Where do I say that? I do not presume to manage realtime priorities in any 
way. You're turning my argument about nice levels around and somehow saying 
that because hyperthreading breaks the single stream me semantics by 
parallelising them that I would want to stop that happening. Nowhere have I 
argued that realtime semantics should be changed to somehow work around 
hyperthreading. SMT nice is about managing nice only, and not realtime 
priorities which are independent entities.

> I'm saying only that we're defeating the purpose of HT, and overriding a
> user decision every time we force a sibling idle.
>
> > > I also
> > > can't understand why it would be OK to interleave a normal task with an
> > > RT task sometimes, but not others.. that's meaningless to the RT task.
> >
> > Clearly there would be a reason that code is there... The whole problem
> > with HT is that as soon as you load up a sibling, you slow down the
> > logical sibling, hence why this code is there in the first place. Since I
> > know you're one to test things for yourself, I will put it to you this
> > way:
> >
> > Boot into UP. Time how long it takes to do a million of these in a real
> > time task:
> >  asm volatile("" : : : "memory");
> >
> > Then start up a SCHED_NORMAL task fully cpu bound such as "yes >
> > /dev/null" and time that again. Obviously the former being a realtime
> > task will take the same amount of time and the SCHED_NORMAL task will be
> > starved until the realtime task finishes.
>
> Sure.
>
> > Now try the same experiment with hyperthreading enabled and an ordinary
> > SMP kernel. You'll find the realtime task runs at only ~60% performance.
>
> So?  User asked for HT.  That's hardware multiplexing. It ain't free.
> Buyer beware.

But the buyer is not aware. You are aware because you tinker, but the vast 
majority of users who enable hyperthreading in their shiny pcs are not aware. 
The only thing they know is that if they enable hyperthreading their programs 
run slower in multitasking environments no matter how much they nice the 
other processes. Buyers do not buy hardware knowing that the internal design 
breaks something as fundamental as 'nice'. You seem to presume that most 
people who get hyperthreading are happy to compromise 'nice' in order to get 
their second core working and I put it to you that they do not make that 
decision.
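Con's timing experiment can be approximated in user space roughly as follows. This is a minimal sketch under stated assumptions, not code from the thread: the helpers `try_make_fifo()` and `time_barrier_loop()` are hypothetical, it assumes Linux with POSIX `sched_setscheduler()` and `clock_gettime()`, and switching to SCHED_FIFO normally needs root (CAP_SYS_NICE), so the helper reports failure instead of aborting. `__asm__ __volatile__` is the spelling of the thread's `asm volatile` that also compiles in strict ISO C modes.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: try to switch the calling process to SCHED_FIFO.
 * Returns 0 on success, -1 if the kernel refuses (e.g. no CAP_SYS_NICE). */
static int try_make_fifo(int prio)
{
	struct sched_param sp = { .sched_priority = prio };

	return sched_setscheduler(0, SCHED_FIFO, &sp) == 0 ? 0 : -1;
}

/* Hypothetical helper: execute the empty compiler barrier from the thread
 * 'iters' times and return the elapsed wall-clock time in nanoseconds. */
static int64_t time_barrier_loop(long iters)
{
	struct timespec start, end;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (long i = 0; i < iters; i++)
		__asm__ __volatile__("" : : : "memory");
	clock_gettime(CLOCK_MONOTONIC, &end);

	return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL +
	       (end.tv_nsec - start.tv_nsec);
}
```

Timing one run on an otherwise idle box and another while `yes > /dev/null` runs on the HT sibling is the comparison Con describes; the ~60% figure is his measurement, not something this sketch guarantees.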

> >  That's a
> > serious performance hit for realtime tasks considering you're running a
> > SCHED_NORMAL task. The SMT code that you seem to dislike fixes this
> > problem.
>
> I don't think it does actually. Let your RT task sleep regularly, and
> ever so briefly.  We don't evict lower priority tasks from siblings upon
> wakeup, we only prevent entry... sometimes.

Well you know as well as I do that you're selecting out the exception rather 
than the rule, and statistically overall, it does work.

> > The reason for interleaving is that there are a few cycles to be gained
> > by using the second core for a separate SCHED_NORMAL task, and you don't
> > want to disable access to the second core entirely for the duration the
> > realtime task is running. Since there is no simple relationship between
> > SCHED_NORMAL timeslices and realtime timeslices, we have to use some form
> > of interleaving based on the expected extra cycles and HZ is the obvious
> > choice.
>
> To me, the reason for interleaving is solely about keeping the core
> busy.  It has nothing to do with SCHED_POLICY_X whatsoever.
>
> > > IMHO, SMT scheduling should be a buyer beware thing.  Maximizing your
> > > core utilization comes at a price, but so does disabling it, so I think
> > > letting the user decide what he wants is the right thing to do.
> >
> > To me this is 

Re: [PATCH] SLUB The unqueued slab allocator V3

2007-02-28 Thread David Miller
From: Christoph Lameter <[EMAIL PROTECTED]>
Date: Wed, 28 Feb 2007 11:20:44 -0800 (PST)

> V2->V3
> - Debugging and diagnostic support. This is runtime enabled and not compile
>   time enabled. Runtime debugging can be controlled via kernel boot options
>   on an individual slab cache basis or globally.
> - Slab Trace support (For individual slab caches).
> - Resiliency support: If basic sanity checks are enabled (via F f.e.)
>   (boot option) then SLUB will do the best to perform diagnostics and
>   then continue (i.e. mark corrupted objects as used).
> - Fix up numerous issues including clash of SLUBs use of page
>   flags with i386 arch use for pmd and pgds (which are managed
>   as slab caches, sigh).
> - Dynamic per CPU array sizing.
> - Explain SLUB slabcache flags

V3 doesn't boot successfully on sparc64. Sorry, I don't have the
ability to track this down at the moment: it resets the machine
right as the video device is initialized, and after diffing V2
against V3 there is way too much stuff changing for me to try and
"bisect" between V2 and V3 to find the guilty sub-change.

Maybe if you managed your individual changes in GIT or similar
this could be debugged very quickly. :-)

Meanwhile I noticed that your alignment algorithm is different
than SLAB's.  And I think this is important for the page table
SLABs that some platforms use.

No matter what flags are specified, SLAB gives at least the
passed in alignment specified in kmem_cache_create().  That
logic in slab is here:

/* 3) caller mandated alignment */
if (ralign < align) {
ralign = align;
}

Whereas SLUB uses the CPU cacheline size when the MUSTALIGN
flag is set.  Architectures do things like:

pgtable_cache = kmem_cache_create("pgtable_cache",
  PAGE_SIZE, PAGE_SIZE,
  SLAB_HWCACHE_ALIGN |
  SLAB_MUST_HWCACHE_ALIGN,
  zero_ctor,
  NULL);

to get a PAGE_SIZE aligned slab. SLUB doesn't give the same
behavior SLAB does in this case.
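To make the difference concrete, here is a small user-space model of the two alignment rules as described above. It is a hedged sketch, not kernel code: `slab_align()` follows the SLAB steps quoted earlier (cache-line alignment for SLAB_HWCACHE_ALIGN, halved for small objects, then raised to the caller's `align`) and omits the debug and ARCH_SLAB_MINALIGN handling; `slub_align_as_described()` only models "use the cache line size when MUSTALIGN is set" and is an assumption about V3, not its actual code. The `MODEL_*` flag values and the 64-byte cache line are placeholders.

```c
#include <stddef.h>

/* Placeholder flag bits for this model only, not the kernel's values. */
#define MODEL_HWCACHE_ALIGN      0x1UL
#define MODEL_MUST_HWCACHE_ALIGN 0x2UL

/* Model of SLAB's alignment steps in kmem_cache_create(). */
static size_t slab_align(size_t size, size_t align, unsigned long flags,
			 size_t cache_line)
{
	size_t ralign;

	/* 1) arch recommendation: a cache line, halved for small objects.
	 * The ralign > 1 guard only keeps this model safe for size == 0. */
	if (flags & MODEL_HWCACHE_ALIGN) {
		ralign = cache_line;
		while (ralign > 1 && size <= ralign / 2)
			ralign /= 2;
	} else {
		ralign = sizeof(void *);	/* BYTES_PER_WORD */
	}
	/* 3) caller mandated alignment: never give less than asked for */
	if (ralign < align)
		ralign = align;
	return ralign;
}

/* Model of the SLUB V3 rule as described in the mail: with MUSTALIGN set,
 * the cache line size is used and 'align' is ignored. Other cases are
 * assumed (not known) to behave like SLAB, so they defer to it. */
static size_t slub_align_as_described(size_t size, size_t align,
				      unsigned long flags, size_t cache_line)
{
	if (flags & MODEL_MUST_HWCACHE_ALIGN)
		return cache_line;
	return slab_align(size, align, flags, cache_line);
}
```

With the pgtable_cache arguments (size and align both PAGE_SIZE, 8192 on sparc64, and an assumed 64-byte line), `slab_align()` yields 8192 while the described SLUB rule yields 64, which is the mismatch reported here.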

Arguably SLAB_HWCACHE_ALIGN and SLAB_MUST_HWCACHE_ALIGN should
not be set here, but SLUB's change in semantics in this area
could cause similar grief in other areas; an audit is probably
in order.

The above example was from sparc64, but x86 does the same thing
as probably do other platforms which use SLAB for pagetables.


[patch 12/12] syslets: x86_64: add syslet/threadlet support

2007-02-28 Thread Ingo Molnar
From: Ingo Molnar <[EMAIL PROTECTED]>

add the arch specific bits of syslet/threadlet support to x86_64.
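The PTREGSCALL stubs in this patch pass the saved register frame as a third argument, after the two real arguments, so the syscall can modify the frame without disturbing its own parameters. A hedged user-space model of that calling shape; `struct model_regs`, `model_stub()` and `model_async_exec()` are hypothetical. On x86_64 the third C argument travels in %rdx, which is why the stubs name that register.

```c
/* Hypothetical saved-register frame: only the fields this model uses. */
struct model_regs {
	long ax;	/* return value slot */
	long di;	/* first syscall argument */
	long si;	/* second syscall argument */
};

/* A two-argument "syscall" that also needs to edit the saved frame. */
typedef long (*two_arg_regs_fn)(long, long, struct model_regs *);

static long model_async_exec(long a, long b, struct model_regs *regs)
{
	regs->ax = a + b;	/* e.g. rewrite what user space will see */
	return regs->ax;
}

/* Model of a PTREGSCALL stub: the first two arguments pass through
 * untouched and the frame pointer rides in the otherwise unused third slot. */
static long model_stub(two_arg_regs_fn fn, struct model_regs *regs)
{
	return fn(regs->di, regs->si, regs);
}
```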

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 arch/x86_64/Kconfig            |    4 ++
 arch/x86_64/ia32/ia32entry.S   |   20 ++-
 arch/x86_64/kernel/entry.S     |   72 -
 arch/x86_64/kernel/process.c   |   11 ++
 include/asm-x86_64/processor.h |   16 +
 include/asm-x86_64/system.h    |   12 ++
 include/asm-x86_64/unistd.h    |   29 +++-
 7 files changed, 160 insertions(+), 4 deletions(-)

Index: linux/arch/x86_64/Kconfig
===
--- linux.orig/arch/x86_64/Kconfig
+++ linux/arch/x86_64/Kconfig
@@ -36,6 +36,10 @@ config ZONE_DMA32
bool
default y
 
+config ASYNC_SUPPORT
+   bool
+   default y
+
 config LOCKDEP_SUPPORT
bool
default y
Index: linux/arch/x86_64/ia32/ia32entry.S
===
--- linux.orig/arch/x86_64/ia32/ia32entry.S
+++ linux/arch/x86_64/ia32/ia32entry.S
@@ -368,6 +368,14 @@ quiet_ni_syscall:
PTREGSCALL stub32_vfork, sys_vfork, %rdi
PTREGSCALL stub32_iopl, sys_iopl, %rsi
PTREGSCALL stub32_rt_sigsuspend, sys_rt_sigsuspend, %rdx
+   /*
+* sys_async_thread() and sys_async_exec() both take 2 parameters,
+* none of which is ptregs - but the syscalls rely on being able to
+* modify ptregs. So we put ptregs into the 3rd parameter - so it's
+* unused and it also does not mess up the first 2 parameters:
+*/
+   PTREGSCALL stub32_compat_async_exec, compat_sys_async_exec, %rdx
+   PTREGSCALL stub32_compat_async_thread, sys_async_thread, %rdx
 
 ENTRY(ia32_ptregs_common)
popq %r11
@@ -394,6 +402,9 @@ END(ia32_ptregs_common)
 
.section .rodata,"a"
.align 8
+.globl compat_sys_call_table
+compat_sys_call_table:
+.globl ia32_sys_call_table
 ia32_sys_call_table:
.quad sys_restart_syscall
.quad sys_exit
@@ -714,9 +725,16 @@ ia32_sys_call_table:
.quad compat_sys_get_robust_list
.quad sys_splice
.quad sys_sync_file_range
-   .quad sys_tee
+   .quad sys_tee   /* 315 */
.quad compat_sys_vmsplice
.quad compat_sys_move_pages
.quad sys_getcpu
.quad sys_epoll_pwait
+   .quad stub32_compat_async_exec  /* 320 */
+   .quad sys_async_wait
+   .quad sys_umem_add
+   .quad stub32_compat_async_thread
+   .quad sys_threadlet_on
+   .quad sys_threadlet_off /* 325 */
+.globl ia32_syscall_end
 ia32_syscall_end:  
Index: linux/arch/x86_64/kernel/entry.S
===
--- linux.orig/arch/x86_64/kernel/entry.S
+++ linux/arch/x86_64/kernel/entry.S
@@ -410,6 +410,14 @@ END(\label)
PTREGSCALL stub_rt_sigsuspend, sys_rt_sigsuspend, %rdx
PTREGSCALL stub_sigaltstack, sys_sigaltstack, %rdx
PTREGSCALL stub_iopl, sys_iopl, %rsi
+   /*
+* sys_async_thread() and sys_async_exec() both take 2 parameters,
+* none of which is ptregs - but the syscalls rely on being able to
+* modify ptregs. So we put ptregs into the 3rd parameter - so it's
+* unused and it also does not mess up the first 2 parameters:
+*/
+   PTREGSCALL stub_async_thread, sys_async_thread, %rdx
+   PTREGSCALL stub_async_exec, sys_async_exec, %rdx
 
 ENTRY(ptregscall_common)
popq %r11
@@ -430,7 +438,7 @@ ENTRY(ptregscall_common)
ret
CFI_ENDPROC
 END(ptregscall_common)
-   
+
 ENTRY(stub_execve)
CFI_STARTPROC
popq %r11
@@ -990,6 +998,68 @@ child_rip:
 ENDPROC(child_rip)
 
 /*
+ * Create an async kernel thread.
+ *
+ * C extern interface:
+ * extern long create_async_thread(int (*fn)(void *), void * arg, unsigned long flags)
+ *
+ * asm input arguments:
+ * rdi: fn, rsi: arg, rdx: flags
+ */
+ENTRY(create_async_thread)
+   CFI_STARTPROC
+   FAKE_STACK_FRAME $async_child_rip
+   SAVE_ALL
+
+   # rdi: flags, rsi: usp, rdx: will be _regs
+   movq %rdx,%rdi
+   movq $-1, %rsi
+   movq %rsp, %rdx
+
+   xorl %r8d,%r8d
+   xorl %r9d,%r9d
+
+   # clone now
+   call do_fork
+   movq %rax,RAX(%rsp)
+   xorl %edi,%edi
+
+   /*
+* It isn't worth to check for reschedule here,
+* so internally to the x86_64 port you can rely on kernel_thread()
+* not to reschedule the child before returning, this avoids the need
+* of hacks for example to fork off the per-CPU idle tasks.
+ * [Hopefully no generic code relies on the reschedule -AK]
+*/
+   RESTORE_ALL
+   UNFAKE_STACK_FRAME
+   ret
+   CFI_ENDPROC
+ENDPROC(create_async_thread)
+
+async_child_rip:
+   CFI_STARTPROC
+
+   movq %rdi, %rax
+   movq %rsi, %rdi
+   call 

solved Re: 2.6.20 SATA error

2007-02-28 Thread Gerhard Mack
On Wed, 28 Feb 2007, Charles Shannon Hendrix wrote:

> On Wed, 28 Feb 2007 13:25:00 -0500 (EST)
> Gerhard Mack <[EMAIL PROTECTED]> wrote:
> 
>  
> > > In another thread, I think they were saying it was either a SATA chipset
> > > driver bug, or a problem in libata core.
> > 
> > I also have an nforce4.
> 
> On another mailing list, someone with an Intel chipset is reporting the same
> problem, and also that others without nforce chipsets are seeing it.

I was reaching inside my computer to check something and heard the thing
click and got the same error message.

Turns out the adaptor that goes between SATA drive and the old style power 
connector was loose on the drive side.  Doesn't seem to me like it was 
very snug fitting to begin with.  I changed it to one of the proper SATA 
connectors coming off the power supply and it doesn't do that anymore.

Sorry for the false alarm, 

Gerhard

--
Gerhard Mack

[EMAIL PROTECTED]

<>< As a computer I find your faith in technology amusing.


Re: fix implicit declaration in nv_backlight.

2007-02-28 Thread Antonino A. Daplas


> On Wed, Feb 28, 2007 at 12:36:25PM -0500, Dave Jones wrote:
>> +#ifdef __powerpc__
>
> Is __powerpc__ defined when cross compiling? I'd rather use
> CONFIG_PMAC_BACKLIGHT instead of it.

Agree with this too.
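The point of the suggestion is that the guard should follow the kernel configuration rather than a compiler-predefined architecture macro. A minimal sketch of that pattern; `nv_pmac_backlight_compiled_in()` is a hypothetical helper, not part of nv_backlight.c, and in a real kernel build CONFIG_PMAC_BACKLIGHT would come from the generated configuration header.

```c
/* Hypothetical helper: reports whether the PowerMac backlight glue was
 * compiled in. The decision is keyed on the Kconfig symbol, so it follows
 * the kernel configuration rather than the compiler's target macros. */
static int nv_pmac_backlight_compiled_in(void)
{
#ifdef CONFIG_PMAC_BACKLIGHT
	return 1;
#else
	return 0;
#endif
}
```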

Tony





[patch 09/12] syslets: x86, mark async unsafe syscalls

2007-02-28 Thread Ingo Molnar
From: Ingo Molnar <[EMAIL PROTECTED]>

mark clone() and fork() as not available for async execution.
Both need an intact user context beneath them to work.
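Every hunk in this patch applies the same guard: refuse with -ENOSYS before touching any state when the call is being executed asynchronously. A user-space model of that pattern; `struct model_task`, its `in_async` field and `model_async_syscall()` are hypothetical stand-ins for `current` and the `async_syscall()` helper, whose definition is not part of this mail.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parts of task_struct this model needs. */
struct model_task {
	bool in_async;		/* true while running a syscall asynchronously */
	int nr_children;
};

/* Stand-in for async_syscall(current). */
static bool model_async_syscall(const struct model_task *t)
{
	return t->in_async;
}

/* Model of the patched sys_fork(): bail out before doing any work,
 * because fork needs the caller's intact user context. */
static long model_sys_fork(struct model_task *t)
{
	if (model_async_syscall(t))
		return -ENOSYS;

	t->nr_children++;	/* placeholder for do_fork() */
	return 0;
}
```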

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
---
 arch/i386/kernel/ioport.c  |    6 ++
 arch/i386/kernel/ldt.c     |    3 +++
 arch/i386/kernel/process.c |    6 ++
 arch/i386/kernel/vm86.c    |    6 ++
 4 files changed, 21 insertions(+)

Index: linux/arch/i386/kernel/ioport.c
===
--- linux.orig/arch/i386/kernel/ioport.c
+++ linux/arch/i386/kernel/ioport.c
@@ -62,6 +62,9 @@ asmlinkage long sys_ioperm(unsigned long
struct tss_struct * tss;
unsigned long *bitmap;
 
+   if (async_syscall(current))
+   return -ENOSYS;
+
if ((from + num <= from) || (from + num > IO_BITMAP_BITS))
return -EINVAL;
if (turn_on && !capable(CAP_SYS_RAWIO))
@@ -139,6 +142,9 @@ asmlinkage long sys_iopl(unsigned long u
unsigned int old = (regs->eflags >> 12) & 3;
struct thread_struct *t = &current->thread;
 
+   if (async_syscall(current))
+   return -ENOSYS;
+
if (level > 3)
return -EINVAL;
/* Trying to gain more privileges? */
Index: linux/arch/i386/kernel/ldt.c
===
--- linux.orig/arch/i386/kernel/ldt.c
+++ linux/arch/i386/kernel/ldt.c
@@ -233,6 +233,9 @@ asmlinkage int sys_modify_ldt(int func, 
 {
int ret = -ENOSYS;
 
+   if (async_syscall(current))
+   return -ENOSYS;
+
switch (func) {
case 0:
ret = read_ldt(ptr, bytecount);
Index: linux/arch/i386/kernel/process.c
===
--- linux.orig/arch/i386/kernel/process.c
+++ linux/arch/i386/kernel/process.c
@@ -750,6 +750,9 @@ struct task_struct fastcall * __switch_t
 
 asmlinkage int sys_fork(struct pt_regs regs)
 {
+   if (async_syscall(current))
+   return -ENOSYS;
+
return do_fork(SIGCHLD, regs.esp, &regs, 0, NULL, NULL);
 }
 
@@ -759,6 +762,9 @@ asmlinkage int sys_clone(struct pt_regs 
unsigned long newsp;
int __user *parent_tidptr, *child_tidptr;
 
+   if (async_syscall(current))
+   return -ENOSYS;
+
clone_flags = regs.ebx;
newsp = regs.ecx;
parent_tidptr = (int __user *)regs.edx;
Index: linux/arch/i386/kernel/vm86.c
===
--- linux.orig/arch/i386/kernel/vm86.c
+++ linux/arch/i386/kernel/vm86.c
@@ -209,6 +209,9 @@ asmlinkage int sys_vm86old(struct pt_reg
struct task_struct *tsk;
int tmp, ret = -EPERM;
 
+   if (async_syscall(current))
+   return -ENOSYS;
+
tsk = current;
if (tsk->thread.saved_esp0)
goto out;
@@ -239,6 +242,9 @@ asmlinkage int sys_vm86(struct pt_regs r
int tmp, ret;
struct vm86plus_struct __user *v86;
 
+   if (async_syscall(current))
+   return -ENOSYS;
+
tsk = current;
switch (regs.ebx) {
case VM86_REQUEST_IRQ:


[patch 06/12] x86: split FPU state from task state

2007-02-28 Thread Ingo Molnar
From: Arjan van de Ven <[EMAIL PROTECTED]>

Split the FPU save area from the task struct. This allows easy migration
of FPU context, and it's generally cleaner. It also allows the following
two (future) optimizations:

1) allocate the right size for the actual cpu rather than 512 bytes always
2) only allocate when the application actually uses the FPU, i.e. in the
   first lazy FPU trap. This could save memory for apps that never use
   the FPU.
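The split described above (a pointer in the thread struct instead of an embedded save area, optionally allocated on the first lazy FPU trap) can be modelled in a few lines. This is a hedged user-space sketch: `struct model_thread`, `union model_fpu_state` and `model_fpu_first_use()` are hypothetical names, `calloc()` stands in for whatever kernel allocator the follow-up optimization would use, and only the 0x37f control-word default is taken from the patch's `init_fpu()`.

```c
#include <stdlib.h>

/* 512 bytes, the fixed fxsave area size mentioned in point 1. */
union model_fpu_state {
	struct {
		unsigned short cwd;	/* control word */
		unsigned short swd;	/* status word */
	} fx;
	unsigned char raw[512];
};

struct model_thread {
	union model_fpu_state *i387;	/* NULL until the task first uses the FPU */
};

/* Model of the first lazy FPU trap: allocate and initialise on demand.
 * Returns 0 on success, -1 if the allocation fails. */
static int model_fpu_first_use(struct model_thread *t)
{
	if (t->i387)
		return 0;		/* already allocated */
	t->i387 = calloc(1, sizeof(*t->i387));
	if (!t->i387)
		return -1;
	t->i387->fx.cwd = 0x37f;	/* default control word, as in init_fpu() */
	return 0;
}
```

Tasks that never trap on the FPU never pay for the 512 bytes, which is the memory saving point 2 anticipates.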

Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 arch/i386/kernel/i387.c        |   96 -
 arch/i386/kernel/process.c     |   56 +++
 arch/i386/kernel/traps.c       |   10
 include/asm-i386/i387.h        |    6 +-
 include/asm-i386/processor.h   |    6 ++
 include/asm-i386/thread_info.h |    6 ++
 kernel/fork.c                  |    7 ++
 7 files changed, 123 insertions(+), 64 deletions(-)

Index: linux/arch/i386/kernel/i387.c
===
--- linux.orig/arch/i386/kernel/i387.c
+++ linux/arch/i386/kernel/i387.c
@@ -31,9 +31,9 @@ void mxcsr_feature_mask_init(void)
unsigned long mask = 0;
clts();
if (cpu_has_fxsr) {
-   memset(&current->thread.i387.fxsave, 0, sizeof(struct i387_fxsave_struct));
-   asm volatile("fxsave %0" : : "m" (current->thread.i387.fxsave));
-   mask = current->thread.i387.fxsave.mxcsr_mask;
+   memset(&current->thread.i387->fxsave, 0, sizeof(struct i387_fxsave_struct));
+   asm volatile("fxsave %0" : : "m" (current->thread.i387->fxsave));
+   mask = current->thread.i387->fxsave.mxcsr_mask;
if (mask == 0) mask = 0xffbf;
} 
mxcsr_feature_mask &= mask;
@@ -49,16 +49,16 @@ void mxcsr_feature_mask_init(void)
 void init_fpu(struct task_struct *tsk)
 {
if (cpu_has_fxsr) {
-   memset(&tsk->thread.i387.fxsave, 0, sizeof(struct i387_fxsave_struct));
-   tsk->thread.i387.fxsave.cwd = 0x37f;
+   memset(&tsk->thread.i387->fxsave, 0, sizeof(struct i387_fxsave_struct));
+   tsk->thread.i387->fxsave.cwd = 0x37f;
if (cpu_has_xmm)
-   tsk->thread.i387.fxsave.mxcsr = 0x1f80;
+   tsk->thread.i387->fxsave.mxcsr = 0x1f80;
} else {
-   memset(&tsk->thread.i387.fsave, 0, sizeof(struct i387_fsave_struct));
-   tsk->thread.i387.fsave.cwd = 0x037fu;
-   tsk->thread.i387.fsave.swd = 0xu;
-   tsk->thread.i387.fsave.twd = 0xu;
-   tsk->thread.i387.fsave.fos = 0xu;
+   memset(&tsk->thread.i387->fsave, 0, sizeof(struct i387_fsave_struct));
+   tsk->thread.i387->fsave.cwd = 0x037fu;
+   tsk->thread.i387->fsave.swd = 0xu;
+   tsk->thread.i387->fsave.twd = 0xu;
+   tsk->thread.i387->fsave.fos = 0xu;
}
/* only the device not available exception or ptrace can call init_fpu */
set_stopped_child_used_math(tsk);
@@ -152,18 +152,18 @@ static inline unsigned long twd_fxsr_to_
 unsigned short get_fpu_cwd( struct task_struct *tsk )
 {
if ( cpu_has_fxsr ) {
-   return tsk->thread.i387.fxsave.cwd;
+   return tsk->thread.i387->fxsave.cwd;
} else {
-   return (unsigned short)tsk->thread.i387.fsave.cwd;
+   return (unsigned short)tsk->thread.i387->fsave.cwd;
}
 }
 
 unsigned short get_fpu_swd( struct task_struct *tsk )
 {
if ( cpu_has_fxsr ) {
-   return tsk->thread.i387.fxsave.swd;
+   return tsk->thread.i387->fxsave.swd;
} else {
-   return (unsigned short)tsk->thread.i387.fsave.swd;
+   return (unsigned short)tsk->thread.i387->fsave.swd;
}
 }
 
@@ -171,9 +171,9 @@ unsigned short get_fpu_swd( struct task_
 unsigned short get_fpu_twd( struct task_struct *tsk )
 {
if ( cpu_has_fxsr ) {
-   return tsk->thread.i387.fxsave.twd;
+   return tsk->thread.i387->fxsave.twd;
} else {
-   return (unsigned short)tsk->thread.i387.fsave.twd;
+   return (unsigned short)tsk->thread.i387->fsave.twd;
}
 }
 #endif  /*  0  */
@@ -181,7 +181,7 @@ unsigned short get_fpu_twd( struct task_
 unsigned short get_fpu_mxcsr( struct task_struct *tsk )
 {
if ( cpu_has_xmm ) {
-   return tsk->thread.i387.fxsave.mxcsr;
+   return tsk->thread.i387->fxsave.mxcsr;
} else {
return 0x1f80;
}
@@ -192,27 +192,27 @@ unsigned short get_fpu_mxcsr( struct tas
 void set_fpu_cwd( struct task_struct *tsk, unsigned short cwd )
 {
if ( cpu_has_fxsr ) {
-   tsk->thread.i387.fxsave.cwd = cwd;
+   tsk->thread.i387->fxsave.cwd = cwd;
} else {
-   

[patch 11/12] syslets: x86, wire up the syslet system calls

2007-02-28 Thread Ingo Molnar
From: Ingo Molnar <[EMAIL PROTECTED]>

wire up the new syslet / async system calls and thus make them
available to user-space.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
---
 arch/i386/kernel/syscall_table.S |    6 ++
 include/asm-i386/unistd.h        |    8 +++-
 2 files changed, 13 insertions(+), 1 deletion(-)

Index: linux/arch/i386/kernel/syscall_table.S
===
--- linux.orig/arch/i386/kernel/syscall_table.S
+++ linux/arch/i386/kernel/syscall_table.S
@@ -319,3 +319,9 @@ ENTRY(sys_call_table)
.long sys_move_pages
.long sys_getcpu
.long sys_epoll_pwait
+   .long sys_async_exec/* 320 */
+   .long sys_async_wait
+   .long sys_umem_add
+   .long sys_async_thread
+   .long sys_threadlet_on
+   .long sys_threadlet_off /* 325 */
Index: linux/include/asm-i386/unistd.h
===
--- linux.orig/include/asm-i386/unistd.h
+++ linux/include/asm-i386/unistd.h
@@ -327,10 +327,16 @@
 #define __NR_move_pages317
 #define __NR_getcpu318
 #define __NR_epoll_pwait   319
+#define __NR_async_exec320
+#define __NR_async_wait321
+#define __NR_umem_add  322
+#define __NR_async_thread  323
+#define __NR_threadlet_on  324
+#define __NR_threadlet_off 325
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 320
+#define NR_syscalls 326
 
 #ifndef __ASSEMBLY__
 
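The table and the NR_syscalls bump have to stay in step: the new entries occupy slots 320 to 325, so the table now holds 326 entries. A user-space model of how such a table is indexed, reusing the `syscall_fn_t` shape declared in unistd.h by patch 07 of this series (minus `asmlinkage`); the `model_*` handlers and the -ENOSYS bounds check are illustrative assumptions, not the kernel's entry code.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_NR_SYSCALLS 326	/* 320 existing entries + 6 new ones */

typedef long (*syscall_fn_t)(long, long, long, long, long, long);

static long model_ni_syscall(long a, long b, long c, long d, long e, long f)
{
	(void)a; (void)b; (void)c; (void)d; (void)e; (void)f;
	return -ENOSYS;
}

/* Hypothetical stand-in for one of the new slots, e.g. threadlet_off. */
static long model_threadlet_off(long a, long b, long c, long d, long e, long f)
{
	(void)a; (void)b; (void)c; (void)d; (void)e; (void)f;
	return 0;
}

static syscall_fn_t model_table[MODEL_NR_SYSCALLS];

/* Fill every slot with the "not implemented" handler, then wire slot 325. */
static void model_table_init(void)
{
	for (size_t i = 0; i < MODEL_NR_SYSCALLS; i++)
		model_table[i] = model_ni_syscall;
	model_table[325] = model_threadlet_off;
}

/* Reject out-of-range numbers before indexing, as an entry path must. */
static long model_dispatch(unsigned long nr, long a, long b, long c)
{
	if (nr >= MODEL_NR_SYSCALLS)
		return -ENOSYS;
	return model_table[nr](a, b, c, 0, 0, 0);
}
```

If NR_syscalls had stayed at 320, the bounds check would reject every one of the new numbers, which is why the two files change together.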


[patch 10/12] syslets: x86: enable ASYNC_SUPPORT

2007-02-28 Thread Ingo Molnar
From: Ingo Molnar <[EMAIL PROTECTED]>

enable CONFIG_ASYNC_SUPPORT on x86.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
---
 arch/i386/Kconfig |    4
 1 file changed, 4 insertions(+)

Index: linux/arch/i386/Kconfig
===
--- linux.orig/arch/i386/Kconfig
+++ linux/arch/i386/Kconfig
@@ -55,6 +55,10 @@ config ZONE_DMA
bool
default y
 
+config ASYNC_SUPPORT
+   bool
+   default y
+
 config SBUS
bool
 


[patch 07/12] syslets: x86, add create_async_thread() method

2007-02-28 Thread Ingo Molnar
From: Ingo Molnar <[EMAIL PROTECTED]>

add the create_async_thread() way of creating kernel threads:
these threads first execute a kernel function and when they
return from it they execute user-space.

An architecture must implement this interface before it can turn
CONFIG_ASYNC_SUPPORT on.
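In outline, the new thread runs `fn(arg)` in kernel context, stores the return value in the saved EAX slot of its register frame, and then leaves through the normal syscall exit path. The sketch below models that flow in user space and also checks the stack arithmetic in `async_thread_helper`. The struct mirror assumes the i386 `struct pt_regs` layout of this kernel era (sixteen 32-bit slots, matching the 64 bytes stated in the helper's comment); the `model_*` names are hypothetical, and passing `fn` and `arg` as C parameters instead of in %ebx/%edx is a simplification, since a function pointer does not fit a 32-bit slot on a 64-bit build host.

```c
#include <stdint.h>

/* Assumed mirror of the i386 struct pt_regs of this era: sixteen
 * 32-bit slots, giving the 64 bytes the helper's comment relies on. */
struct model_pt_regs {
	uint32_t ebx, ecx, edx, esi, edi, ebp, eax;
	uint32_t xds, xes, xfs;
	uint32_t orig_eax, eip, xcs, eflags, esp, xss;
};

_Static_assert(sizeof(struct model_pt_regs) == 64,
	       "pt_regs mirror must be 64 bytes");

/* Example kernel-side function for the model: returns *(long *)arg. */
static long model_return_arg(void *arg)
{
	return *(const long *)arg;
}

/* Model of async_thread_helper: call fn(arg), then store the result where
 * the syscall exit path expects the return value (movl %eax, PT_EAX(%esp)).
 * The real helper then jumps to syscall_exit and returns to user space. */
static void model_async_thread_helper(struct model_pt_regs *regs,
				      long (*fn)(void *), void *arg)
{
	regs->eax = (uint32_t)fn(arg);
}

/* The helper reserves 64 - 8 bytes because 8 are already on the stack. */
static unsigned model_helper_stack_reserve(void)
{
	return (unsigned)sizeof(struct model_pt_regs) - 8;
}
```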

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
---
 arch/i386/kernel/entry.S     |   25 +
 arch/i386/kernel/process.c   |   31 +++
 include/asm-i386/processor.h |   17 +
 include/asm-i386/unistd.h    |   10 ++
 4 files changed, 83 insertions(+)

Index: linux/arch/i386/kernel/entry.S
===
--- linux.orig/arch/i386/kernel/entry.S
+++ linux/arch/i386/kernel/entry.S
@@ -1034,6 +1034,31 @@ ENTRY(kernel_thread_helper)
CFI_ENDPROC
 ENDPROC(kernel_thread_helper)
 
+ENTRY(async_thread_helper)
+   CFI_STARTPROC
+   /*
+* Allocate space on the stack for pt-regs.
+* sizeof(struct pt_regs) == 64, and we've got 8 bytes on the
+* kernel stack already:
+*/
+   subl $64-8, %esp
+   CFI_ADJUST_CFA_OFFSET 64-8
+   movl %edx,%eax
+   push %edx
+   CFI_ADJUST_CFA_OFFSET 4
+   call *%ebx
+   addl $4, %esp
+   CFI_ADJUST_CFA_OFFSET -4
+
+   movl %eax, PT_EAX(%esp)
+
+   GET_THREAD_INFO(%ebp)
+
+   jmp syscall_exit
+   CFI_ENDPROC
+ENDPROC(async_thread_helper)
+
+
 .section .rodata,"a"
 #include "syscall_table.S"
 
Index: linux/arch/i386/kernel/process.c
===
--- linux.orig/arch/i386/kernel/process.c
+++ linux/arch/i386/kernel/process.c
@@ -355,6 +355,37 @@ int kernel_thread(int (*fn)(void *), voi
 EXPORT_SYMBOL(kernel_thread);
 
 /*
+ * This gets run with %ebx containing the
+ * function to call, and %edx containing
+ * the "args".
+ */
+extern void async_thread_helper(void);
+
+/*
+ * Create an async thread
+ */
+int create_async_thread(long (*fn)(void *), void * arg, unsigned long flags)
+{
+   struct pt_regs regs;
+
+   memset(&regs, 0, sizeof(regs));
+
+   regs.ebx = (unsigned long) fn;
+   regs.edx = (unsigned long) arg;
+
+   regs.xds = __USER_DS;
+   regs.xes = __USER_DS;
+   regs.xfs = __KERNEL_PDA;
+   regs.orig_eax = -1;
+   regs.eip = (unsigned long) async_thread_helper;
+   regs.xcs = __KERNEL_CS | get_kernel_rpl();
+   regs.eflags = X86_EFLAGS_IF | X86_EFLAGS_SF | X86_EFLAGS_PF | 0x2;
+
+   /* Ok, create the new task.. */
+   return do_fork(flags, 0, &regs, 0, NULL, NULL);
+}
+
+/*
  * Free current thread data structures etc..
  */
 void exit_thread(void)
Index: linux/include/asm-i386/processor.h
===
--- linux.orig/include/asm-i386/processor.h
+++ linux/include/asm-i386/processor.h
@@ -472,6 +472,11 @@ extern void prepare_to_copy(struct task_
  */
 extern int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
+/*
+ * create an async thread:
+ */
+extern int create_async_thread(long (*fn)(void *), void * arg, unsigned long flags);
+
 extern unsigned long thread_saved_pc(struct task_struct *tsk);
 void show_trace(struct task_struct *task, struct pt_regs *regs, unsigned long *stack);
 
@@ -504,6 +509,18 @@ unsigned long get_wchan(struct task_stru
 #define KSTK_EIP(task) (task_pt_regs(task)->eip)
 #define KSTK_ESP(task) (task_pt_regs(task)->esp)
 
+/*
+ * Register access methods for async syscall support.
+ *
+ * Note, task_stack_reg() must not be an lvalue, hence this macro:
+ */
+#define task_stack_reg(t)  \
+   ({ unsigned long __esp = task_pt_regs(t)->esp; __esp; })
+#define set_task_stack_reg(t, new_stack)   \
+   do { task_pt_regs(t)->esp = (new_stack); } while (0)
+#define task_ip_reg(t) task_pt_regs(t)->eip
+#define task_ret_reg(t)task_pt_regs(t)->eax
+
 
 struct microcode_header {
unsigned int hdrver;
Index: linux/include/asm-i386/unistd.h
===
--- linux.orig/include/asm-i386/unistd.h
+++ linux/include/asm-i386/unistd.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_I386_UNISTD_H_
 #define _ASM_I386_UNISTD_H_
 
+#include 
+
 /*
  * This file contains the system call numbers.
  */
@@ -330,6 +332,14 @@
 
 #define NR_syscalls 320
 
+#ifndef __ASSEMBLY__
+
+typedef asmlinkage long (*syscall_fn_t)(long, long, long, long, long, long);
+
+extern syscall_fn_t sys_call_table[NR_syscalls];
+
+#endif
+
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
 #define __ARCH_WANT_OLD_STAT

[patch 08/12] syslets: x86, add move_user_context() method

2007-02-28 Thread Ingo Molnar
From: Ingo Molnar <[EMAIL PROTECTED]>

add the move_user_context() method to move the user-space
context of one kernel thread to another kernel thread.
User-space might notice the changed TID, but execution,
stack and register contents (general purpose and FPU) are
still the same.

An architecture must implement this interface before it can turn
CONFIG_ASYNC_SUPPORT on.
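The operation is a struct copy for the registers plus a pointer swap for the FPU area, which stays cheap because patch 06 of this series turned `thread.i387` into a pointer. A hedged user-space model; `struct model_task`, `struct model_regs`, `struct model_fpu` and their fields are illustrative, not the kernel's definitions.

```c
/* Illustrative register frame and FPU area for the model. */
struct model_regs { long ip, sp, ax; };
struct model_fpu { unsigned short cwd; };

struct model_task {
	struct model_regs regs;		/* stands in for *task_pt_regs(task) */
	struct model_fpu *i387;		/* separately allocated FPU area */
};

/* Copy the user register frame and exchange the FPU state pointers, so
 * the new task resumes exactly where the old one was in user space. */
static void model_move_user_context(struct model_task *new_task,
				    struct model_task *old_task)
{
	struct model_fpu *tmp;

	new_task->regs = old_task->regs;
	tmp = new_task->i387;
	new_task->i387 = old_task->i387;
	old_task->i387 = tmp;
}
```

Swapping rather than copying the FPU pointers leaves the old task with a valid (the new task's previous) save area instead of two tasks sharing one.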

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
---
 arch/i386/kernel/process.c |   21 +
 include/asm-i386/system.h  |    7 +++
 2 files changed, 28 insertions(+)

Index: linux/arch/i386/kernel/process.c
===
--- linux.orig/arch/i386/kernel/process.c
+++ linux/arch/i386/kernel/process.c
@@ -839,6 +839,27 @@ unsigned long get_wchan(struct task_stru
 }
 
 /*
+ * Move user-space context from one kernel thread to another.
+ * This includes registers and FPU state. Callers must make
+ * sure that neither task is running user context at the moment:
+ */
+void
+move_user_context(struct task_struct *new_task, struct task_struct *old_task)
+{
+   struct pt_regs *old_regs = task_pt_regs(old_task);
+   struct pt_regs *new_regs = task_pt_regs(new_task);
+   union i387_union *tmp;
+
+   *new_regs = *old_regs;
+   /*
+* Flip around the FPU state too:
+*/
+   tmp = new_task->thread.i387;
+   new_task->thread.i387 = old_task->thread.i387;
+   old_task->thread.i387 = tmp;
+}
+
+/*
  * sys_alloc_thread_area: get a yet unused TLS descriptor index.
  */
 static int get_free_idx(void)
Index: linux/include/asm-i386/system.h
===
--- linux.orig/include/asm-i386/system.h
+++ linux/include/asm-i386/system.h
@@ -33,6 +33,13 @@ extern struct task_struct * FASTCALL(__s
  "2" (prev), "d" (next));  \
 } while (0)
 
+/*
+ * Move user-space context from one kernel thread to another.
+ * This includes registers and FPU state for now:
+ */
+extern void
+move_user_context(struct task_struct *new_task, struct task_struct *old_task);
+
 #define _set_base(addr,base) do { unsigned long __pr; \
 __asm__ __volatile__ ("movw %%dx,%1\n\t" \
"rorl $16,%%edx\n\t" \


[patch 04/12] syslets: core code

2007-02-28 Thread Ingo Molnar
From: Ingo Molnar <[EMAIL PROTECTED]>

the core syslet / async system call infrastructure code.

It is built only if CONFIG_ASYNC_SUPPORT is enabled.
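The kernel/async.c header comment in this patch describes syslets as chains of user-space "atoms" that the kernel walks, executing a system call per atom and following pointers to the next one. The following is only a toy user-space interpreter to illustrate that idea: `struct toy_atom`, its fields and the branch-on-nonzero rule are hypothetical and do not reflect the real syslet ABI, and ordinary C functions stand in for system calls.

```c
#include <stddef.h>

typedef long (*toy_call_t)(long arg);

/* Hypothetical atom: one call, where to store its result, and two links. */
struct toy_atom {
	toy_call_t call;
	long arg;
	long *ret_ptr;			/* where the result is written, may be NULL */
	struct toy_atom *next;		/* followed when the call returns 0 */
	struct toy_atom *next_on_err;	/* followed otherwise (a branch) */
};

/* Walk the chain without "returning to user space" between atoms.
 * A step limit stops runaway loops. Returns the number of atoms run. */
static int toy_run_syslet(struct toy_atom *atom, int max_steps)
{
	int steps = 0;

	while (atom && steps < max_steps) {
		long ret = atom->call(atom->arg);

		if (atom->ret_ptr)
			*atom->ret_ptr = ret;
		atom = (ret == 0) ? atom->next : atom->next_on_err;
		steps++;
	}
	return steps;
}

/* Example "syscall": decrements a counter, failing once it reaches 0. */
static long toy_counter = 3;
static long toy_decrement(long unused)
{
	(void)unused;
	if (toy_counter == 0)
		return -1;
	toy_counter--;
	return 0;
}
```

Pointing an atom's `next` at itself gives a loop that runs until the call fails: with the counter at 3, four atoms execute and the last stored result is -1.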

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
---
 kernel/Makefile |    1
 kernel/async.c  |  989 
 2 files changed, 990 insertions(+)

Index: linux/kernel/Makefile
===
--- linux.orig/kernel/Makefile
+++ linux/kernel/Makefile
@@ -10,6 +10,7 @@ obj-y = sched.o fork.o exec_domain.o
kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
hrtimer.o rwsem.o latency.o nsproxy.o srcu.o
 
+obj-$(CONFIG_ASYNC_SUPPORT) += async.o
 obj-$(CONFIG_STACKTRACE) += stacktrace.o
 obj-y += time/
 obj-$(CONFIG_DEBUG_MUTEXES) += mutex-debug.o
Index: linux/kernel/async.c
===
--- /dev/null
+++ linux/kernel/async.c
@@ -0,0 +1,989 @@
+/*
+ * kernel/async.c
+ *
+ * The syslet and threadlet subsystem - asynchronous syscall and
+ * user-space code execution support.
+ *
+ * Started by Ingo Molnar:
+ *
+ *  Copyright (C) 2007 Red Hat, Inc., Ingo Molnar <[EMAIL PROTECTED]>
+ *
+ * This file is released under the GPLv2.
+ *
+ * This code implements asynchronous syscalls via 'syslets'.
+ *
+ * Syslets consist of a set of 'syslet atoms' which reside
+ * purely in user-space memory and have no kernel-space resource
+ * attached to them. These atoms can be linked to each other via
+ * pointers. Besides the fundamental ability to execute system
+ * calls, syslet atoms can also implement branches, loops and
+ * arithmetic.
+ *
+ * Thus syslets can be used to build small autonomous programs that
+ * the kernel can execute purely from kernel-space, without having
+ * to return to any user-space context. Syslets can be run by any
+ * unprivileged user-space application - they are executed safely
+ * by the kernel.
+ *
+ * "Threadlets" are the user-space equivalent of syslets: small
+ * functions of execution that user-space attempts/expects to execute
+ * without scheduling. If the threadlet nevertheless blocks, the kernel
+ * creates a real thread from it, and that thread is put aside sleeping.
+ * The 'head' context (the context that never blocks) returns to the
+ * original function that called the threadlet. Once the sleeping thread
+ * wakes up again (after whatever it was waiting for, such as IO or a
+ * timeout, has completed), the function continues executing
+ * asynchronously, as a thread.
+ * A user-space completion ring connects these asynchronous function calls
+ * back to the head context.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+/*
+ * An async 'cachemiss context' is either busy, or it is ready.
+ * If it is ready, the 'head' might switch its user-space context
+ * to that ready thread anytime - so that if the ex-head blocks,
+ * one ready thread can become the next head and can continue to
+ * execute user-space code.
+ */
+static void
+__mark_async_thread_ready(struct async_thread *at, struct async_head *ah)
+{
+   list_del(&at->entry);
+   list_add_tail(&at->entry, &ah->ready_async_threads);
+   if (list_empty(&ah->busy_async_threads))
+   wake_up(&ah->wait);
+}
+
+static void
+mark_async_thread_ready(struct async_thread *at, struct async_head *ah)
+{
+   spin_lock(&ah->lock);
+   __mark_async_thread_ready(at, ah);
+   spin_unlock(&ah->lock);
+}
+
+static void
+__mark_async_thread_busy(struct async_thread *at, struct async_head *ah)
+{
+   list_del(&at->entry);
+   list_add_tail(&at->entry, &ah->busy_async_threads);
+}
+
+static void
+mark_async_thread_busy(struct async_thread *at, struct async_head *ah)
+{
+   spin_lock(&ah->lock);
+   __mark_async_thread_busy(at, ah);
+   spin_unlock(&ah->lock);
+}
+
+static void
+__async_thread_init(struct task_struct *t, struct async_thread *at,
+   struct async_head *ah)
+{
+   INIT_LIST_HEAD(&at->entry);
+   at->exit = 0;
+   at->task = t;
+   at->ah = ah;
+
+   t->at = at;
+}
+
+static void
+async_thread_init(struct task_struct *t, struct async_thread *at,
+ struct async_head *ah)
+{
+   spin_lock(&ah->lock);
+   __async_thread_init(t, at, ah);
+   __mark_async_thread_ready(at, ah);
+   spin_unlock(&ah->lock);
+}
+
+static void
+async_thread_exit(struct async_thread *at, struct task_struct *t)
+{
+   struct async_head *ah = at->ah;
+
+   spin_lock(&ah->lock);
+   list_del_init(&at->entry);
+   if (at->exit)
+   complete(&ah->exit_done);
+   t->at = NULL;
+   at->task = NULL;
+   spin_unlock(&ah->lock);
+}
+
+static struct async_thread *
+pick_ready_cachemiss_thread(struct async_head *ah)
+{
+   struct list_head *head = &ah->ready_async_threads;
+
+   if (list_empty(head))
+   return NULL;
+
+   return 

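The ready/busy bookkeeping in the async.c excerpt above can be illustrated without any kernel headers. This is a single-threaded sketch built on simplified, hypothetical types: the spinlock is left out, wake_up() is replaced by a counter, and the list helpers only imitate list_head.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly-linked list, imitating the kernel's list_head. */
struct node {
	struct node *prev, *next;
};

static void list_init(struct node *h) { h->prev = h->next = h; }
static int list_is_empty(const struct node *h) { return h->next == h; }

static void list_unlink(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);		/* like list_del_init(): safe to unlink again */
}

static void list_append(struct node *n, struct node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Hypothetical, simplified async_head / async_thread (no locking). */
struct model_head {
	struct node ready, busy;
	int wakeups;
};

struct model_thread {
	struct node entry;	/* first member, so the cast below is valid */
	int id;
};

static void head_init(struct model_head *ah)
{
	list_init(&ah->ready);
	list_init(&ah->busy);
	ah->wakeups = 0;
}

/* Like __mark_async_thread_ready(): wake waiters once nothing is busy. */
static void mark_ready(struct model_thread *at, struct model_head *ah)
{
	list_unlink(&at->entry);
	list_append(&at->entry, &ah->ready);
	if (list_is_empty(&ah->busy))
		ah->wakeups++;	/* stands in for wake_up(&ah->wait) */
}

static void mark_busy(struct model_thread *at, struct model_head *ah)
{
	list_unlink(&at->entry);
	list_append(&at->entry, &ah->busy);
}

/* Like pick_ready_cachemiss_thread(): first ready thread, or NULL. */
static struct model_thread *pick_ready(struct model_head *ah)
{
	if (list_is_empty(&ah->ready))
		return NULL;
	return (struct model_thread *)ah->ready.next;
}
```

Because threads are appended at the tail, pick_ready() returns the thread that became ready first, matching the list_add_tail() calls in the patch. As with __async_thread_init(), each thread's entry must be initialized before its first move.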
[patch 05/12] syslets: core, documentation

2007-02-28 Thread Ingo Molnar
From: Ingo Molnar <[EMAIL PROTECTED]>

Add Documentation/syslet-design.txt with a high-level description
of the syslet concepts.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
---
 Documentation/syslet-design.txt |  137 
 1 file changed, 137 insertions(+)

Index: linux/Documentation/syslet-design.txt
===
--- /dev/null
+++ linux/Documentation/syslet-design.txt
@@ -0,0 +1,137 @@
+Syslets / asynchronous system calls
+===
+
+started by Ingo Molnar <[EMAIL PROTECTED]>
+
+Goal:
+-
+
+The goal of the syslet subsystem is to allow user-space to execute
+arbitrary system calls asynchronously. It does so by allowing user-space
+to execute "syslets" which are small scriptlets that the kernel can execute
+both securely and asynchronously without having to exit to user-space.
+
+the core syslet concepts are:
+
+The Syslet Atom:
+
+
+The syslet atom is a small, fixed-size (80 bytes, on both 32-bit and
+64-bit) piece of user-space memory, which is the basic unit of execution
+within the syslet framework. A syslet atom represents a single system call
+and its arguments. In addition, it has condition flags attached to it that
+allow the construction of larger programs (syslets) from these atoms.
+
+Arguments to the system call are implemented via pointers to arguments.
+This not only increases the flexibility of syslet atoms (multiple syslets
+can share the same variable for example), but is also an optimization:
+copy_uatom() will only fetch syscall parameters up until the point it
+meets the first NULL pointer. 50% of all syscalls have 2 or less
+parameters (and 90% of all syscalls have 4 or less parameters).
+
+ [ Note: since the argument array is at the end of the atom, and the
+   kernel will not touch any argument beyond the first NULL one, atoms
+   might be packed more tightly. (the only special case exception to
+   this rule would be SKIP_TO_NEXT_ON_STOP atoms, where the kernel will
+   jump a full syslet_uatom number of bytes.) ]
+
+The Syslet:
+---
+
+A syslet is a program, represented by a graph of syslet atoms. The
+syslet atoms are chained to each other either via the atom->next pointer,
+or via the SYSLET_SKIP_TO_NEXT_ON_STOP flag.
+
+Running Syslets:
+
+
+Syslets can be run via the sys_async_exec() system call, which takes
+the first atom of the syslet as an argument. The kernel does not need
+to be told about the other atoms - it will fetch them on the fly as
+execution goes forward.
+
+A syslet might either be executed 'cached', or it might generate a
+'cachemiss'.
+
+'Cached' syslet execution means that the whole syslet was executed
+without blocking. The system-call returns the submitted atom's address
+in this case.
+
+If a syslet blocks while the kernel executes a system-call embedded in
+one of its atoms, the kernel will keep working on that syscall in
+parallel, but it immediately returns to user-space with a NULL pointer,
+so the submitting task can submit other syslets.
+
+Completion of asynchronous syslets:
+---
+
+Completion of asynchronous syslets is done via the 'completion ring',
+which is a ringbuffer of syslet atom pointers in user-space memory,
+provided by user-space as an argument to the sys_async_exec() syscall.
+The kernel fills in the ringbuffer starting at index 0, and user-space
+must clear out these pointers. Once the kernel reaches the end of
+the ring it wraps back to index 0. The kernel will not overwrite
+non-NULL pointers (but will return an error), and thus user-space has
+to make sure it completes all events it asked for.
+
+Waiting for completions:
+
+
+Syslet completions can be waited for via the sys_async_wait()
+system call - which takes the number of events it should wait for as
+a parameter. This system call will also return if the number of
+pending events goes down to zero.
+
+Sample Hello World syslet code:
+
+--->
+/*
+ * Set up a syslet atom:
+ */
+static void
+init_atom(struct syslet_uatom *atom, int nr,
+ void *arg_ptr0, void *arg_ptr1, void *arg_ptr2,
+ void *arg_ptr3, void *arg_ptr4, void *arg_ptr5,
+ void *ret_ptr, unsigned long flags, struct syslet_uatom *next)
+{
+   atom->nr = nr;
+   atom->arg_ptr[0] = arg_ptr0;
+   atom->arg_ptr[1] = arg_ptr1;
+   atom->arg_ptr[2] = arg_ptr2;
+   atom->arg_ptr[3] = arg_ptr3;
+   atom->arg_ptr[4] = arg_ptr4;
+   atom->arg_ptr[5] = arg_ptr5;
+   atom->ret_ptr = ret_ptr;
+   atom->flags = flags;
+   atom->next = next;
+}
+
+int main(int argc, char *argv[])
+{
+   unsigned long int fd_out = 1; /* standard output */
+   char *buf = "Hello Syslet World!\n";
+   unsigned long size = strlen(buf);
+   struct syslet_uatom atom, *done;
+
+   

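The completion-ring rules in the syslet design text (fill from index 0, wrap around, never overwrite a non-NULL entry, user-space clears what it consumes) can be modelled as follows. This is a sketch under stated assumptions: the ring size, the struct layout and the -1 error value are hypothetical, and in the real design the ring lives in user-space memory that the kernel writes to.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 4	/* hypothetical; user-space chooses the real size */

struct ring {
	void *slot[RING_SIZE];
	unsigned long kernel_idx;	/* next slot the "kernel" fills */
	unsigned long user_idx;		/* next slot user-space consumes */
};

/*
 * Producer side: refuses to overwrite a slot that user-space has not
 * cleared yet. The document only says "an error" is returned; -1 is an
 * assumption made for this sketch.
 */
static int ring_push(struct ring *r, void *done_atom)
{
	void **s = &r->slot[r->kernel_idx % RING_SIZE];

	if (*s != NULL)
		return -1;
	*s = done_atom;
	r->kernel_idx++;
	return 0;
}

/* Consumer side: take the next completion and clear its slot to NULL. */
static void *ring_pop(struct ring *r)
{
	void **s = &r->slot[r->user_idx % RING_SIZE];
	void *done = *s;

	if (done == NULL)
		return NULL;	/* nothing has completed yet */
	*s = NULL;
	r->user_idx++;
	return done;
}
```

Once user-space clears slot 0, the next completion wraps into it; until then the producer reports an error instead of overwriting an unconsumed pointer.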
[patch 02/12] syslets: add syslet.h include file, user API/ABI definitions

2007-02-28 Thread Ingo Molnar
From: Ingo Molnar <[EMAIL PROTECTED]>

add include/linux/syslet.h which contains the user-space API/ABI
declarations. Add the new header to include/linux/Kbuild as well.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
---
 include/linux/Kbuild   |1 
 include/linux/syslet.h |  155 +
 2 files changed, 156 insertions(+)

Index: linux/include/linux/Kbuild
===
--- linux.orig/include/linux/Kbuild
+++ linux/include/linux/Kbuild
@@ -141,6 +141,7 @@ header-y += sockios.h
 header-y += som.h
 header-y += sound.h
 header-y += synclink.h
+header-y += syslet.h
 header-y += telephony.h
 header-y += termios.h
 header-y += ticable.h
Index: linux/include/linux/syslet.h
===
--- /dev/null
+++ linux/include/linux/syslet.h
@@ -0,0 +1,155 @@
+#ifndef _LINUX_SYSLET_H
+#define _LINUX_SYSLET_H
+/*
+ * The syslet subsystem - asynchronous syscall execution support.
+ *
+ * Started by Ingo Molnar:
+ *
+ *  Copyright (C) 2007 Red Hat, Inc., Ingo Molnar <[EMAIL PROTECTED]>
+ *
+ * User-space API/ABI definitions:
+ */
+
+#ifndef __user
+# define __user
+#endif
+
+/*
+ * This is the 'Syslet Atom' - the basic unit of execution
+ * within the syslet framework. A syslet atom always represents
+ * a single system-call plus its arguments, and has conditions
+ * attached to it that allow the construction of larger
+ * programs from these atoms. User-space variables can be used
+ * (for example a loop index) via the special sys_umem*() syscalls.
+ *
+ * Arguments are implemented via pointers to arguments. This not
+ * only increases the flexibility of syslet atoms (multiple syslets
+ * can share the same variable for example), but is also an
+ * optimization: copy_uatom() will only fetch syscall parameters
+ * up until the point it meets the first NULL pointer. 50% of all
+ * syscalls have 2 or less parameters (and 90% of all syscalls have
+ * 4 or less parameters).
+ *
+ * [ Note: since the argument array is at the end of the atom, and the
+ *   kernel will not touch any argument beyond the first NULL one, atoms
+ *   might be packed more tightly. (the only special case exception to
+ *   this rule would be SKIP_TO_NEXT_ON_STOP atoms, where the kernel will
+ *   jump a full syslet_uatom number of bytes.) ]
+ */
+struct syslet_uatom {
+   u32 flags;
+   u32 nr;
+   u64 ret_ptr;
+   u64 next;
+   u64 arg_ptr[6];
+   /*
+* User-space can put anything in here, kernel will not
+* touch it:
+*/
+   u64 private;
+};
+
+/*
+ * Flags to modify/control syslet atom behavior:
+ */
+
+/*
+ * Immediately queue this syslet asynchronously - do not even
+ * attempt to execute it synchronously in the user context:
+ */
+#define SYSLET_ASYNC   0x0001
+
+/*
+ * Never queue this syslet asynchronously - even if synchronous
+ * execution causes a context switch:
+ */
+#define SYSLET_SYNC    0x0002
+
+/*
+ * Do not queue the syslet in the completion ring when done.
+ *
+ * ( the default is that the final atom of a syslet is queued
+ *   in the completion ring. )
+ *
+ * Some syscalls generate implicit completion events of their
+ * own.
+ */
+#define SYSLET_NO_COMPLETE 0x0004
+
+/*
+ * Execution control: conditions upon the return code
+ * of the just executed syslet atom. 'Stop' means syslet
+ * execution is stopped and the atom is put into the
+ * completion ring:
+ */
+#define SYSLET_STOP_ON_NONZERO 0x0008
+#define SYSLET_STOP_ON_ZERO 0x0010
+#define SYSLET_STOP_ON_NEGATIVE 0x0020
+#define SYSLET_STOP_ON_NON_POSITIVE 0x0040
+
+#define SYSLET_STOP_MASK   \
+   (   SYSLET_STOP_ON_NONZERO  |   \
+   SYSLET_STOP_ON_ZERO |   \
+   SYSLET_STOP_ON_NEGATIVE |   \
+   SYSLET_STOP_ON_NON_POSITIVE )
+
+/*
+ * Special modifier to 'stop' handling: instead of stopping the
+ * execution of the syslet, the linearly next syslet is executed.
+ * (Normal execution flows along atom->next, and execution stops
+ *  if atom->next is NULL or a stop condition becomes true.)
+ *
+ * This is what allows true branches of execution within syslets.
+ */
+#define SYSLET_SKIP_TO_NEXT_ON_STOP 0x0080
+
+/*
+ * This is the (per-user-context) descriptor of the async completion
+ * ring. This gets passed in to sys_async_exec():
+ */
+struct async_head_user {
+   /*
+* Current completion ring index - managed by the kernel:
+*/
+   u64 kernel_ring_idx;
+   /*
+* User-side ring index:
+*/
+   u64  

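The atom layout and the stop flags from syslet.h can be exercised from plain user-space C. This is a minimal sketch: it assumes <stdint.h> types in place of u32/u64, copies the flag values from the header, and assumes that several STOP flags combine with a logical OR. The helpers uatom_nr_args() and uatom_should_stop() are hypothetical and not part of the patch. The static assertion records that every field is naturally aligned, so the structure has no padding and is 80 bytes on both 32-bit and 64-bit builds.

```c
#include <assert.h>
#include <stdint.h>

/* User-space mirror of struct syslet_uatom, using <stdint.h> types. */
struct uatom {
	uint32_t flags;
	uint32_t nr;
	uint64_t ret_ptr;
	uint64_t next;
	uint64_t arg_ptr[6];
	uint64_t private;	/* left untouched by the kernel */
};

_Static_assert(sizeof(struct uatom) == 80, "no padding on 32 or 64 bit");

/* Flag values copied from syslet.h. */
#define SYSLET_STOP_ON_NONZERO      0x0008
#define SYSLET_STOP_ON_ZERO         0x0010
#define SYSLET_STOP_ON_NEGATIVE     0x0020
#define SYSLET_STOP_ON_NON_POSITIVE 0x0040

/* Count leading non-NULL argument pointers: a fetch that stops at the
 * first NULL, as copy_uatom() is described, reads no further. */
static int uatom_nr_args(const struct uatom *a)
{
	int n = 0;

	while (n < 6 && a->arg_ptr[n] != 0)
		n++;
	return n;
}

/* Does return value 'ret' trigger any stop condition in 'flags'? */
static int uatom_should_stop(uint32_t flags, long ret)
{
	if ((flags & SYSLET_STOP_ON_NONZERO) && ret != 0)
		return 1;
	if ((flags & SYSLET_STOP_ON_ZERO) && ret == 0)
		return 1;
	if ((flags & SYSLET_STOP_ON_NEGATIVE) && ret < 0)
		return 1;
	if ((flags & SYSLET_STOP_ON_NON_POSITIVE) && ret <= 0)
		return 1;
	return 0;
}
```

For example, an atom with SYSLET_STOP_ON_NEGATIVE stops the syslet when its system call returns an error code, while a zero return lets execution continue along atom->next.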
[patch 03/12] syslets: generic kernel bits

2007-02-28 Thread Ingo Molnar
From: Ingo Molnar <[EMAIL PROTECTED]>

add the kernel generic bits - these are present even if !CONFIG_ASYNC_SUPPORT.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
---
 fs/exec.c |4 
 include/linux/sched.h |   23 ++-
 kernel/capability.c   |3 +++
 kernel/exit.c |7 +++
 kernel/fork.c |5 +
 kernel/sched.c|9 +
 kernel/sys.c  |   36 
 7 files changed, 86 insertions(+), 1 deletion(-)

Index: linux/fs/exec.c
===
--- linux.orig/fs/exec.c
+++ linux/fs/exec.c
@@ -1444,6 +1444,10 @@ static int coredump_wait(int exit_code)
tsk->vfork_done = NULL;
complete(vfork_done);
}
+   /*
+* Make sure we exit our async context before waiting:
+*/
+   async_exit(tsk);
 
if (core_waiters)
wait_for_completion(&startup_done);
Index: linux/include/linux/sched.h
===
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -83,12 +83,12 @@ struct sched_param {
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
 struct exec_domain;
 struct futex_pi_state;
-
 /*
  * List of flags we want to share for kernel threads,
  * if only because they are not used by them anyway.
@@ -997,6 +997,12 @@ struct task_struct {
 /* journalling filesystem info */
void *journal_info;
 
+/* async syscall support: */
+   struct async_thread *at, *async_ready;
+   struct async_head *ah;
+   struct async_thread __at;
+   struct async_head __ah;
+
 /* VM state */
struct reclaim_state *reclaim_state;
 
@@ -1055,6 +1061,21 @@ struct task_struct {
 #endif
 };
 
+/*
+ * Is an async syscall being executed currently?
+ */
+#ifdef CONFIG_ASYNC_SUPPORT
+static inline int async_syscall(struct task_struct *t)
+{
+   return t->async_ready != NULL;
+}
+#else /* !CONFIG_ASYNC_SUPPORT */
+static inline int async_syscall(struct task_struct *t)
+{
+   return 0;
+}
+#endif /* !CONFIG_ASYNC_SUPPORT */
+
 static inline pid_t process_group(struct task_struct *tsk)
 {
return tsk->signal->pgrp;
Index: linux/kernel/capability.c
===
--- linux.orig/kernel/capability.c
+++ linux/kernel/capability.c
@@ -178,6 +178,9 @@ asmlinkage long sys_capset(cap_user_head
  int ret;
  pid_t pid;
 
+ if (async_syscall(current))
+ return -ENOSYS;
+
  if (get_user(version, &header->version))
 return -EFAULT; 
 
Index: linux/kernel/exit.c
===
--- linux.orig/kernel/exit.c
+++ linux/kernel/exit.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -890,6 +891,12 @@ fastcall NORET_TYPE void do_exit(long co
schedule();
}
 
+   /*
+* Note: async threads have to exit their context before the MM
+* exit (due to the coredumping wait):
+*/
+   async_exit(tsk);
+
tsk->flags |= PF_EXITING;
 
if (unlikely(in_atomic()))
Index: linux/kernel/fork.c
===
--- linux.orig/kernel/fork.c
+++ linux/kernel/fork.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1056,6 +1057,7 @@ static struct task_struct *copy_process(
 
p->lock_depth = -1; /* -1 = no lock */
do_posix_clock_monotonic_gettime(&p->start_time);
+   async_init(p);
p->security = NULL;
p->io_context = NULL;
p->io_wait = NULL;
@@ -1623,6 +1625,9 @@ asmlinkage long sys_unshare(unsigned lon
struct uts_namespace *uts, *new_uts = NULL;
struct ipc_namespace *ipc, *new_ipc = NULL;
 
+   if (async_syscall(current))
+   return -ENOSYS;
+
check_unshare_flags(&unshare_flags);
 
/* Return -EINVAL for all unsupported flags */
Index: linux/kernel/sched.c
===
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -3455,6 +3456,14 @@ asmlinkage void __sched schedule(void)
}
profile_hit(SCHED_PROFILING, __builtin_return_address(0));
 
+   prev = current;
+   if (unlikely(prev->async_ready)) {
+   if (prev->state && !(preempt_count() & PREEMPT_ACTIVE) &&
+   (!(prev->state & TASK_INTERRUPTIBLE) ||
+   !signal_pending(prev)))
+   __async_schedule(prev);
+   }
+
 need_resched:
preempt_disable();
prev = current;
Index: linux/kernel/sys.c

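The condition added to schedule() decides when a task that is executing an async syscall hands its user-space context over to a ready async thread. It is restated below as a pure function so the individual cases can be checked in isolation. The M_* constants are model values for illustration, not the kernel's definitions, and would_async_schedule() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Model values for illustration only; the kernel defines its own. */
#define M_TASK_RUNNING         0
#define M_TASK_INTERRUPTIBLE   1
#define M_TASK_UNINTERRUPTIBLE 2
#define M_PREEMPT_ACTIVE       0x10000000

/*
 * Mirrors the test in the schedule() hunk: hand off only if an async
 * syscall is active (async_ready set), the task is about to sleep (not
 * runnable), this is not a preemption, and it is not an interruptible
 * sleep that a pending signal would cut short anyway.
 */
static bool would_async_schedule(bool async_ready, long state,
				 int preempt_count, bool signal_pending)
{
	if (!async_ready)
		return false;
	return state != M_TASK_RUNNING &&
	       !(preempt_count & M_PREEMPT_ACTIVE) &&
	       (!(state & M_TASK_INTERRUPTIBLE) || !signal_pending);
}
```

The signal clause only matters for interruptible sleeps: an uninterruptible sleep triggers the hand-off even with a signal pending, while an interruptible one with a pending signal does not.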
[patch 01/12] syslets: add async.h include file, kernel-side API definitions

2007-02-28 Thread Ingo Molnar
From: Ingo Molnar <[EMAIL PROTECTED]>

add include/linux/async.h which contains the kernel-side API
declarations.

it also provides NOP stubs for the !CONFIG_ASYNC_SUPPORT case.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
---
 include/linux/async.h |   88 ++
 1 file changed, 88 insertions(+)

Index: linux/include/linux/async.h
===
--- /dev/null
+++ linux/include/linux/async.h
@@ -0,0 +1,88 @@
+#ifndef _LINUX_ASYNC_H
+#define _LINUX_ASYNC_H
+
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * The syslet subsystem - asynchronous syscall execution support.
+ *
+ * Syslet-subsystem internal definitions:
+ */
+
+/*
+ * The kernel-side copy of a syslet atom - with arguments expanded:
+ */
+struct syslet_atom {
+   unsigned long   flags;
+   unsigned long   nr;
+   long __user *ret_ptr;
+   struct syslet_uatom __user  *next;
+   unsigned long   args[6];
+   syscall_fn_t    *call_table;
+   unsigned int    nr_syscalls;
+};
+
+/*
+ * The 'async head' is the thread which has user-space context (ptregs)
+ * 'below it' - this is the one that can return to user-space:
+ */
+struct async_head {
+   spinlock_t  lock;
+   struct task_struct  *user_task;
+
+   struct list_head    ready_async_threads;
+   struct list_head    busy_async_threads;
+
+   struct mutex        completion_lock;
+   long                events_left;
+   wait_queue_head_t   wait;
+
+   struct async_head_user  __user  *ahu;
+
+   unsigned long   __user  *new_stackp;
+   unsigned long   new_ip;
+   unsigned long   restore_stack;
+   unsigned long   restore_ip;
+   struct completion   start_done;
+   struct completion   exit_done;
+};
+
+/*
+ * The 'async thread' is either a newly created async thread or it is
+ * an 'ex-head' - it cannot return to user-space and only has kernel
+ * context.
+ */
+struct async_thread {
+   struct task_struct  *task;
+   unsigned long   user_stack;
+   unsigned long   user_ip;
+   struct async_head   *ah;
+
+   struct list_head    entry;
+
+   unsigned int        exit;
+};
+
+/*
+ * Generic kernel API definitions:
+ */
+#ifdef CONFIG_ASYNC_SUPPORT
+extern void async_init(struct task_struct *t);
+extern void async_exit(struct task_struct *t);
+extern void __async_schedule(struct task_struct *t);
+#else /* !CONFIG_ASYNC_SUPPORT */
+static inline void async_init(struct task_struct *t)
+{
+}
+static inline void async_exit(struct task_struct *t)
+{
+}
+static inline void __async_schedule(struct task_struct *t)
+{
+}
+#endif /* !CONFIG_ASYNC_SUPPORT */
+
+#endif

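struct syslet_atom carries its own call_table and nr_syscalls, which lets one execution engine serve both native and 32-bit compat callers: the entry point only has to pass in the right table and count. A minimal user-space sketch of that idea follows. The handler type, the toy tables and the -ENOSYS result for an out-of-range number are assumptions for illustration, not code from the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical six-argument handler type, standing in for syscall_fn_t. */
typedef long (*syscall_fn)(unsigned long, unsigned long, unsigned long,
			   unsigned long, unsigned long, unsigned long);

/* Cut-down model of the kernel-side struct syslet_atom. */
struct katom {
	unsigned long nr;
	unsigned long args[6];
	const syscall_fn *call_table;
	unsigned int nr_syscalls;
};

/* Bounds-check nr against whichever table was chosen for this caller. */
static long katom_exec(const struct katom *a)
{
	if (a->nr >= a->nr_syscalls)
		return -ENOSYS;
	return a->call_table[a->nr](a->args[0], a->args[1], a->args[2],
				    a->args[3], a->args[4], a->args[5]);
}

/* Two toy "system calls" used to populate the example tables. */
static long toy_add(unsigned long x, unsigned long y, unsigned long u2,
		    unsigned long u3, unsigned long u4, unsigned long u5)
{
	(void)u2; (void)u3; (void)u4; (void)u5;
	return (long)(x + y);
}

static long toy_neg(unsigned long x, unsigned long u1, unsigned long u2,
		    unsigned long u3, unsigned long u4, unsigned long u5)
{
	(void)u1; (void)u2; (void)u3; (void)u4; (void)u5;
	return -(long)x;
}

static const syscall_fn native_table[] = { toy_add, toy_neg };
static const syscall_fn compat_table[] = { toy_neg };
```

The same atom contents dispatch differently depending only on the table handed in, so a compat entry point needs no separate decoding logic of its own.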

[patch 00/12] Syslets, Threadlets, generic AIO support, v5

2007-02-28 Thread Ingo Molnar

this is the v5 release of the syslet/threadlet subsystem:

   http://redhat.com/~mingo/syslet-patches/

this release took 4 days to get out, but there were a couple of key 
changes that needed some time to settle down:

 - ported the code from v2.6.20 to current -git (v2.6.21-rc2 should be 
   fine as a base)

 - 64-bit support in terms of an x86_64 port. Jens has updated the FIO
   syslet code to work on 64-bit too. (kernel/async.c was pretty 64-bit
   clean already, it needed minimal changes for basic x86_64 support.)

 - 32-bit user-space on 64-bit kernel compat support. 32-bit syslet and
   threadlet binaries work fine on 64-bit kernels.

 - various cleanups and simplifications

the v4->v5 delta is:

 17 files changed, 327 insertions(+), 271 deletions(-)

amongst the plans for v6 are cleanups/simplifications to the syslet 
engine API; a number of suggestions have been made for that already.

the linecount increase in v5 is mostly due to the x86_64 port. The ABI 
had to change again - see the async-test userspace code for details.

the x86_64 patch is a bit monolithic at the moment, i'll split it up 
further in v6.

As always, comments, suggestions, reports are welcome!

Ingo


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-02-28 Thread Davide Libenzi
On Wed, 28 Feb 2007, Ingo Molnar wrote:

> * Davide Libenzi  wrote:
> 
> > Did you hide all the complexity of the userspace atom decoding inside 
> > another function? :)
> 
> no, i made the 64-bit and 32-bit structures layout-compatible. This 
> makes the 32-bit structure as large as the 64-bit ones, but that's not a 
> big issue, compared to the simplifications it brings.

Do you have a new version to review?



> > > But i'm happy to change the syslet API in any sane way, and did so 
> > > based on feedback from Jens who is actually using them.
> > 
> > Wouldn't you agree on a simple/parallel execution engine [...]
> 
> the thing is, there's almost zero overhead from having those basic 
> things like conditions and the ->next link, and they make it so much 
> more capable. As usual my biggest problem is that you are not trying to 
> use syslets at all - you are only trying to get rid of them ;-) My 
> purpose with syslets is to enable a syslet to do almost anything that 
> user-space could do too, as simply as possible. Syslets could even 
> allocate user-space memory and then use it (i dont think we actually 
> want to do that though). That doesnt mean arbitrary complex code 
> /should/ be done via syslets, or that it wont be significantly slower 
> than what user-space can do, but i'd not like to artificially dumb the 
> engine down. I'm totally willing to simplify/shrink the vectoring of 
> arguments and just about anything else, but your proposals so far (such 
> as your return-value-embedded-in-atom suggestion) all kill important 
> aspects of the engine.

Ok, we're past the error code in the atom, as Linus pointed out ;)
How about this, with async_wait returning asynids back to a userspace 
ring buffer?

struct syslet_uatom {
long *result;
unsigned long asynid;
unsigned long nr_sysc;
unsigned long params[8];
};

My problem with the syslets in their current form is: do we have a real 
use for them that justifies the extra complexity inside the kernel? Or, with 
a simple/parallel async submission coupled with threadlets, can we cover 
a pretty broad range of real-life use cases?



- Davide




Re: fully honor vdso_enabled [i386, sh; x86_64?]

2007-02-28 Thread Andrew Morton
On Wed, 28 Feb 2007 18:11:11 +0900
Paul Mundt <[EMAIL PROTECTED]> wrote:

> On Thu, Feb 22, 2007 at 12:31:20PM -0800, John Reiser wrote:
> > This patch changes arch_setup_additional_pages() to honor vdso_enabled.
> > For i386 it also allows the option of a fixed address to avoid
> > fragmenting the address space.  Compiles and runs on i386.
> > x86_64 [IA32 support] and sh maintainers also please comment.
> > 
> We didn't actually have the sysctl entry wired up on SH, but once that's
> done, this patch works fine there too.
> 
> Andrew, do you want a separate patch for the vdso_enabled sysctl or
> is it more convenient through my git tree?
> 

If it's an sh-only thing then through your tree is fine, thanks.


[PATCH] x86_64: shut up vm86(2)

2007-02-28 Thread Alexey Dobriyan
From an originally rate-limited printk, to a plain printk, to the current
version: everybody has had enough time to learn about the absence of vm86(2).
This also removes the possibility of dmesg spamming.

Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
---

 arch/x86_64/ia32/ia32entry.S |4 ++--
 arch/x86_64/ia32/sys_ia32.c  |   12 
 2 files changed, 2 insertions(+), 14 deletions(-)

--- a/arch/x86_64/ia32/ia32entry.S
+++ b/arch/x86_64/ia32/ia32entry.S
@@ -512,7 +512,7 @@ #endif
.quad stub32_iopl   /* 110 */
.quad sys_vhangup
.quad quiet_ni_syscall  /* old "idle" system call */
-   .quad sys32_vm86_warning/* vm86old */ 
+   .quad quiet_ni_syscall  /* vm86old */ 
.quad compat_sys_wait4
.quad sys_swapoff   /* 115 */
.quad compat_sys_sysinfo
@@ -565,7 +565,7 @@ #endif
.quad sys_mremap
.quad sys_setresuid16
.quad sys_getresuid16   /* 165 */
-   .quad sys32_vm86_warning/* vm86 */ 
+   .quad quiet_ni_syscall  /* vm86 */ 
.quad quiet_ni_syscall  /* query_module */
.quad sys_poll
.quad compat_sys_nfsservctl
--- a/arch/x86_64/ia32/sys_ia32.c
+++ b/arch/x86_64/ia32/sys_ia32.c
@@ -842,18 +842,6 @@ long sys32_fadvise64_64(int fd, __u32 of
   advice); 
 } 
 
-long sys32_vm86_warning(void)
-{ 
-   struct task_struct *me = current;
-   static char lastcomm[sizeof(me->comm)];
-   if (strncmp(lastcomm, me->comm, sizeof(lastcomm))) {
-   compat_printk(KERN_INFO "%s: vm86 mode not supported on 64 bit kernel\n",
-  me->comm);
-   strncpy(lastcomm, me->comm, sizeof(lastcomm));
-   } 
-   return -ENOSYS;
-} 
-
 long sys32_lookup_dcookie(u32 addr_low, u32 addr_high,
  char __user * buf, size_t len)
 {



Re: 2.6.21-rc1: known regressions (v2) (part 1)

2007-02-28 Thread Michael S. Tsirkin
>Quoting Thomas Gleixner <[EMAIL PROTECTED]>:
>Subject: Re: 2.6.21-rc1: known regressions (v2) (part 1)
>
>On Wed, 2007-02-28 at 23:13 +0200, Michael S. Tsirkin wrote:
>> >Subject: ThinkPad T60: no screen after suspend to RAM
>> >References : http://lkml.org/lkml/2007/2/22/391
>> >Submitter  : Michael S. Tsirkin <[EMAIL PROTECTED]>
>> >Handled-By : Ingo Molnar <[EMAIL PROTECTED]>
>> >Status : unknown
>> 
>> Just reproduced this in -rc2.
>> Another thing I noticed:
>> with 2.6.20, pressing Fn/F4 generates an ACPI event and triggers suspend to RAM.
>> 
>> On 2.6.21-rc2, after resume (when the box is accessible from network),
>> pressing Fn/F4 again does not seem to have any effect.
>
>Can you please get the dmesg output after resume via the network ?

The link above has it.

-- 
MST


Re: sata_sil problems with recent kernels

2007-02-28 Thread Dale Blount
On Tue, 2007-02-27 at 13:54 -0500, Dale Blount wrote:
> On Fri, 2007-02-23 at 12:00 -0500, Dale Blount wrote:
> > Hi,
> > 
> > Excuse me if this has been covered or fixed, I couldn't find anything in
> > the archives.
> > 
> > I upgraded from 2.6.11.7 to 2.6.20.1 today and found all the drives
> > connected to 2 brands of sata_sil sata controllers not working.  The
> > drives are also (now) of various brands, Maxtor 300GB and 500GB
> > Seagates.

For the archives, the fix is documented here:

http://article.gmane.org/gmane.linux.ide/16304

Dale



Re: [lm-sensors] Could the k8temp driver be interfering with ACPI?

2007-02-28 Thread Pavel Machek
Hi!

> > Well I had an idea after looking at k8temp -- why not make it default to
> > doing only reads from the sensor?  You'd only get information from whatever
> > core/sensor combination that ACPI had last used, but it would be safe.
> 
> ACPI is broken here, not k8temp, so let's fix ACPI instead. ACPI
> doesn't conflict with only k8temp, but with virtually all hardware
> monitoring drivers, all I2C/SMBus drivers, and probably other types of
> drivers too. We just can't restrict or blacklist all these drivers
> because ACPI misbehaves.

Oops, sorry about that but no, that will not work.

There's a piece of paper, called the ACPI specification, and we are
following it.

Bug is not in our implementation.

Bug is in the ACPI specs... it does not explicitly allow you to go
out and bitbang i2c, and you do it, and you get problems.

Now, you may try to change specs to be hwmon-friendly... good luck.

But currently hw manufacturers follow ACPI specs, so we have to follow
it, too; bad luck for hwmon. The BIOS hiding SMBus from you is a good
hint that you are doing something wrong...?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [stable] [patch 00/21] 2.6.19-stable review

2007-02-28 Thread Greg KH
On Wed, Feb 28, 2007 at 05:28:27AM -0700, Eric W. Biederman wrote:
> 
> What are the rules that are supposed to govern backports to stable
> trees these days anyway?

Documentation/stable_kernel_rules.txt

thanks,

greg k-h


Re: [PATCH] fbdev driver for S3 Trio/Virge, updated

2007-02-28 Thread Antonino A. Daplas
On Wed, 2007-02-28 at 16:53 +, James Simmons wrote:
> > On Thu, 2007-02-22 at 00:53 +, James Simmons wrote:
> > > > > +/* image data is MSB-first, fb structure is MSB-first too */
> > > > > +static inline u32 expand_color(u32 c)
> > > > > +{
> > > > > + return ((c & 1) | ((c & 2) << 7) | ((c & 4) << 14) | ((c & 8) << 21)) * 0xFF;
> > > > > +}
> > > > > +
> > > > > +/* s3fb_iplan_imageblit silently assumes that almost everything is 8-pixel aligned */
> > > > 
> > > > Hmn, same thing with vga16fb... Perhaps we should bring back the
> > > > fontwidth flag of 2.2 and 2.4 kernels.
> > > 
> > > Ugh, no. It is possible to get 12- and 6-bit-wide fonts working with vga
> > > interleaved planes. I got it partially working but never got back to it.
> > > It's in my queue of things to do. Now that I finished the display class I
> > > need to get around to making drm/fbdev work together :-)
> > > 
> > 
> > Of course, not fontwidth exactly, but to allow the driver to specify the
> > alignment of the blit engine, in this case 8 pixels. I do believe X also
> > has similar functionality to compensate for the limitation of the
> > hardware.
> 
> Isn't scan_align in pixmap for this? Or do we need more.

No, scan_align is how much to pad each line, and it's up to the engine
to discard the padding.  In this case, the hardware does not allow
padding and must be given data in exact multiples. For example, vesafb
can accept 4x4 fonts padded to 8x4, but vga16fb will not be able to draw
4x4 fonts properly.
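A minimal sketch of the two ideas in this exchange, in plain C rather than driver code: the quoted expand_color() bit trick, plus the distinction between scan_align padding (extra bits per line that the engine discards) and a hard blit alignment (widths that must be exact multiples). padded_pitch() and engine_accepts_width() are hypothetical helpers for illustration, not fbdev functions, and scan_align is assumed to be a power of two.

```c
#include <stdint.h>

/* The quoted expand_color(): spread the low four bits of c into bits
 * 0, 8, 16 and 24, then multiply by 0xFF so every set bit becomes a
 * full 0xFF byte (one byte per plane). */
static inline uint32_t expand_color(uint32_t c)
{
	return ((c & 1) | ((c & 2) << 7) | ((c & 4) << 14) | ((c & 8) << 21)) * 0xFF;
}

/* Hypothetical helper: bytes per scanline of a 1bpp glyph that is w
 * pixels wide, rounded up to scan_align bytes (assumed power of two).
 * The extra bits are padding that the blit path must discard. */
static unsigned int padded_pitch(unsigned int w, unsigned int scan_align)
{
	unsigned int pitch = (w + 7) / 8;

	return (pitch + scan_align - 1) & ~(scan_align - 1);
}

/* Hypothetical check: an engine that cannot discard padding can only
 * draw glyphs whose width is an exact multiple of its blit alignment. */
static int engine_accepts_width(unsigned int w, unsigned int blit_align)
{
	return (w % blit_align) == 0;
}
```

For a 4-pixel-wide glyph, padded_pitch(4, 1) is 1 byte with 4 padding bits that a padding-aware path can drop, while engine_accepts_width(4, 8) rejects it, which is the situation described for vga16fb.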

Tony



Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-02-28 Thread Ingo Molnar

* Davide Libenzi  wrote:

> On Wed, 28 Feb 2007, Ingo Molnar wrote:
> 
> > 
> > * Davide Libenzi  wrote:
> > 
> > > My point is, the syslet infrastructure is expensive for the kernel in 
> > > terms of compat, [...]
> > 
> > it is not. Today i've implemented 64-bit syslets on x86_64 and 
> > 32-bit-on-64-bit compat syslets. Both the 64-bit and the 32-bit syslet 
> > (and threadlet) binaries work just fine on a 64-bit kernel, and they 
> > share 99% of the infrastructure. There's only a single #ifdef 
> > CONFIG_COMPAT in kernel/async.c:
> > 
> > #ifdef CONFIG_COMPAT
> > 
> > asmlinkage struct syslet_uatom __user *
> > compat_sys_async_exec(struct syslet_uatom __user *uatom,
> >   struct async_head_user __user *ahu)
> > {
> > return __sys_async_exec(uatom, ahu, _sys_call_table,
> > compat_NR_syscalls);
> > }
> > 
> > #endif
> 
> Did you hide all the complexity of the userspace atom decoding inside 
> another function? :)

no, i made the 64-bit and 32-bit structures layout-compatible. This 
makes the 32-bit structure as large as the 64-bit ones, but that's not a 
big issue, compared to the simplifications it brings.
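The layout-compatible approach can be sketched with a hypothetical structure (the field names below are invented and are not the real syslet ABI): every user pointer is carried in a fixed 64-bit field, so 32-bit and 64-bit builds agree on size and offsets, and one decoder serves both. The cost mentioned above is visible too: each pointer takes 8 bytes even in a 32-bit build.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical atom layout, NOT the real struct syslet_uatom. */
struct demo_uatom {
	uint64_t flags;
	uint64_t nr;          /* syscall number */
	uint64_t ret_ptr;     /* user pointer, widened to 64 bits */
	uint64_t next;        /* user pointer to the next atom */
	uint64_t arg_ptr[6];  /* user pointers to the arguments */
};

/* Both hold on 32-bit and 64-bit builds alike: no padding, no
 * pointer-sized fields, hence no compat translation needed. */
_Static_assert(sizeof(struct demo_uatom) == 80, "fixed size");
_Static_assert(offsetof(struct demo_uatom, next) == 24, "fixed offset");

/* User space stores a pointer by widening it through uintptr_t. */
static inline uint64_t ptr_to_u64(const void *p)
{
	return (uint64_t)(uintptr_t)p;
}

/* The reader narrows it back; on 32-bit the upper half is zero. */
static inline void *u64_to_ptr(uint64_t v)
{
	return (void *)(uintptr_t)v;
}
```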

> > But i'm happy to change the syslet API in any sane way, and did so 
> > based on feedback from Jens who is actually using them.
> 
> Wouldn't you agree on a simple/parallel execution engine [...]

the thing is, there's almost zero overhead from having those basic 
things like conditions and the ->next link, and they make it so much 
more capable. As usual my biggest problem is that you are not trying to 
use syslets at all - you are only trying to get rid of them ;-) My 
purpose with syslets is to enable a syslet to do almost anything that 
user-space could do too, as simply as possible. Syslets could even 
allocate user-space memory and then use it (i don't think we actually 
want to do that though). That doesn't mean arbitrarily complex code 
/should/ be done via syslets, or that it won't be significantly slower 
than what user-space can do, but i'd not like to artificially dumb the 
engine down. I'm totally willing to simplify/shrink the vectoring of 
arguments and just about anything else, but your proposals so far (such 
as your return-value-embedded-in-atom suggestion) all kill important 
aspects of the engine.

All the existing syslet features were purpose-driven: i actually coded 
up a sample syslet, trying to do something that makes sense, and added 
these features based on that. The engine core takes up maybe 50 lines of 
code.

Ingo


Re: fix implicit declaration in nv_backlight.

2007-02-28 Thread Dave Jones
On Wed, Feb 28, 2007 at 10:13:24PM +0100, Michael Hanselmann wrote:
 > On Wed, Feb 28, 2007 at 12:36:25PM -0500, Dave Jones wrote:
 > > +#ifdef __powerpc__
 > 
 > Is __powerpc__ defined when cross compiling? I'd rather use
 > CONFIG_PMAC_BACKLIGHT instead of it.

Sounds ok to me.

Dave

-- 
http://www.codemonkey.org.uk


Re: debug registers and fork

2007-02-28 Thread Roland McGrath
It is true that debug registers are inherited by fork and clone.
I am 99% sure that this was never specifically intended, but it
has been this way for a long time (since 2.4 at least).  It's an
implicit consequence of the do_fork implementation style, which
does a blind copy of the whole task_struct and then explicitly
reinitializes some individual fields.  I suppose this has some
benefit or other, but it is very prone to new pieces of state
getting implicitly copied without the person adding that new state
ever consciously deciding what its inheritance semantics should be.
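A toy model of the copy-then-reinitialize style described above, using a hypothetical demo_task rather than the real task_struct: any field that is not explicitly reset after the bulk copy, here the debug register array, is inherited by default.

```c
#include <string.h>

/* Hypothetical, heavily simplified task structure. */
struct demo_task {
	int pid;
	unsigned long debugreg[8];  /* debug register state, like %dr0-%dr7 */
	unsigned long utime;        /* a field the copy explicitly resets */
};

/* do_fork-style copy: duplicate everything, then reinitialize only the
 * fields someone remembered to list.  debugreg is not listed, so the
 * child silently inherits the parent's watchpoints. */
static void demo_copy_task(struct demo_task *child,
			   const struct demo_task *parent, int new_pid)
{
	memcpy(child, parent, sizeof(*child));
	child->pid = new_pid;
	child->utime = 0;
	/* Deciding the semantics positively would mean adding e.g.:
	 * memset(child->debugreg, 0, sizeof(child->debugreg)); */
}
```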

Alan Stern is working on a revamp of the x86 debug register
support.  This is a fine opportunity to clean this area up and
decide positively what the semantics ought to be.  When his stuff
gets ported to other machines, that will be a natural way to make
the analogous stuff coherent and sensible on all machines that
have debug-feature CPU state.

AFAIK, gdb expects this behavior but not in the positive sense.
Rather, it finds the kernel's semantics here unhelpful, and has to
work around them.  If it has watchpoints on a thread that might
fork, it has to catch the child just to clear the debug registers
even if it never really wanted to be tracing that child.
Otherwise, the fork/clone child that was never ptrace'd at all
(and its children!)  might get a spurious SIGTRAP later and dump
core for no apparent reason; at least exec does clear the debug
registers (flush_thread).  Since the debugger interface is the
only way to set the debug registers, this kernel behavior seems
rather insane on the face of it.  OTOH, there is always the
argument to leave existing behavior as it is for compatibility's
sake.  (I won't be shocked to find some loony application that
uses ptrace on its own threads to set debug registers with the
expectation of running a SIGTRAP handler; such things have been
seen out there, though we no longer allow exactly that with NPTL
threads.)  I'm pretty sure gdb won't mind if the inheritance goes
away, though we should check with gdb people to be sure before
changing any semantics.

Personally, I don't care whether the semantics of fork when the
debug registers were previously set by ptrace change.  Existing
applications already have to cope with the lossage to work now,
and won't be able to go without those workarounds later anyway if
they want to support older kernels.  With Alan's stuff, particular
facilities cooperate coherently on maintaining this thread state,
and inheritance semantics for each particular use will be
specified explicitly how that use wants it.  Eventually I think
all "raw" use of the debug registers (as by the current ptrace
interfaces) will be obsolete anyway.

It is true that %dr7 is not cleared when switching to a task where
it's logically 0, but that is intentional and not a problem AFAIK.
The trap handler (arch/{i386,x86_64}/kernel/traps.c:do_debug)
first checks if %dr7 is logically 0 in the current task, and if so
it swallows the trap and clears %dr7 in hardware.  This also has
been this way for a very long time.  I assume that whenever it was
first implemented, someone found reason to think that clearing
%dr7 was more costly overall than the possibility of a spurious
trap (relatively quite unlikely compared to 100% of context switches).
(I have no idea what the overhead is on current or older hardware.)
I have no reason to think there is anything wrong with how this behaves.
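The lazy %dr7 handling can be modeled in a few lines. This is a simplified, hypothetical sketch of the behavior described above, not the arch/i386 or x86_64 code: the switch path only loads %dr7 for tasks that use it, and the trap handler treats a trap in a task whose logical %dr7 is 0 as spurious.

```c
#include <stdbool.h>

/* Hypothetical model: a logical %dr7 per task, one hardware %dr7. */
struct demo_task { unsigned long dr7; };
static unsigned long hw_dr7;

/* Context switch: load %dr7 only if the next task uses it; otherwise
 * leave the stale value in hardware to avoid the extra write. */
static void demo_switch_to(const struct demo_task *next)
{
	if (next->dr7)
		hw_dr7 = next->dr7;
}

/* Trap handler: if the current task's logical %dr7 is 0, the trap is
 * spurious, so clear the hardware register and swallow it.
 * Returns true if the trap should be delivered as SIGTRAP. */
static bool demo_do_debug(const struct demo_task *current)
{
	if (current->dr7 == 0) {
		hw_dr7 = 0;
		return false;
	}
	return true;
}
```

The trade-off is the one described in the message: one possible spurious trap per stale switch, against an extra register write on every context switch.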


Thanks,
Roland


Re: 2.6.21-rc1: known regressions (v2) (part 1)

2007-02-28 Thread Thomas Gleixner
On Wed, 2007-02-28 at 23:13 +0200, Michael S. Tsirkin wrote:
> >Subject: ThinkPad T60: no screen after suspend to RAM
> >References : http://lkml.org/lkml/2007/2/22/391
> >Submitter  : Michael S. Tsirkin <[EMAIL PROTECTED]>
> >Handled-By : Ingo Molnar <[EMAIL PROTECTED]>
> >Status : unknown
> 
> Just reproduced this in -rc2.
> Another thing I noticed:
> with 2.6.20, pressing Fn/F4 generates an ACPI event and triggers suspend to RAM.
> 
> On 2.6.21-rc2, after resume (when the box is accessible from network),
> pressing Fn/F4 again does not seem to have any effect.

Can you please get the dmesg output after resume via the network ?

tglx




Re: [PATCH] pata_sil680 suspend/resume

2007-02-28 Thread Guennadi Liakhovetski
On Mon, 26 Feb 2007, Guennadi Liakhovetski wrote:

> With a post 2.6.20 kernel from powerpc.git I cannot suspend at all:
> 
> pata_sil680 :00:0c.0: suspend
> ata1: suspend failed, device 0 still active
> pci_device_suspend(): ata_pci_device_suspend+0x0/0x74() returns -16
> suspend_device(): pci_device_suspend+0x0/0xac() returns -16
> Could not suspend device :00:0c.0: error -16

AFAICS, "still active" is printed from ata_host_suspend() if a device 
(disk) on the host to be suspended doesn't have ATA_DFLAG_SUSPENDED flag 
set. This flag is only set in ata_eh_suspend(), which is only called from 
ata_eh_recover(), like this:

generic_error_handler()
  ata_bmdma_drive_eh()
    ata_do_eh()
      ata_eh_recover()
        ata_eh_suspend()
          dev->flags |= ATA_DFLAG_SUSPENDED;

but I don't understand why the error handler should be invoked. Should the 
"disk" be suspended before the host, and is that when the EH should set the 
flag? If my guess is right, why doesn't the disk get suspended on my 
machine? Shall I suspend it explicitly from userspace? I do "hdparm -Y", 
and it does stop spinning, but I still get the error.
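A simplified model of the check described above, with hypothetical names and flag values (not libata's): the host suspend fails while any device lacks the SUSPENDED flag, and the error returned is -EBUSY, which is -16 on Linux and matches the log.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical flag bit for this sketch; not libata's real value. */
#define DEMO_DFLAG_SUSPENDED (1u << 1)

struct demo_dev { int devno; unsigned int flags; };

/* Model of the host-level check: any device still lacking the
 * SUSPENDED flag makes the whole host suspend fail with -EBUSY. */
static int demo_host_suspend(const struct demo_dev *devs, int n)
{
	for (int i = 0; i < n; i++) {
		if (!(devs[i].flags & DEMO_DFLAG_SUSPENDED)) {
			fprintf(stderr, "suspend failed, device %d still active\n",
				devs[i].devno);
			return -EBUSY;
		}
	}
	return 0;
}
```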

Thanks
Guennadi
---
Guennadi Liakhovetski


Re: Problem with freezable workqueues

2007-02-28 Thread Pavel Machek
Hi!

> > OK, thanks.
> > 
> > We can (I think) do pretty much the same with some additional complications
> > in worker_thread() (check !cpu_online() after try_to_freeze() and break).
> 
> Okay, but I've just finished the patch that removes the freezability of
> workqueues (appended), so can we please do this in a separate one?

Hmm, nothing obviously wrong with the patch (ACK), but xfs people
should ack this one, too: 'is it okay to let xfs run while suspending'
is not a trivial question.

> Since freezable workqueues are broken in 2.6.21-rc
> (cf. http://marc.theaimsgroup.com/?l=linux-kernel&m=116855740612755,
> http://marc.theaimsgroup.com/?l=linux-kernel&m=117261312523921&w=2)
> it's better to remove them altogether for 2.6.21 and change the only user of
> them (XFS) accordingly.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: fix implicit declaration in nv_backlight.

2007-02-28 Thread Michael Hanselmann
On Wed, Feb 28, 2007 at 12:36:25PM -0500, Dave Jones wrote:
> +#ifdef __powerpc__

Is __powerpc__ defined when cross compiling? I'd rather use
CONFIG_PMAC_BACKLIGHT instead of it.

Greets,
Michael


Re: 2.6.21-rc1: known regressions (v2) (part 1)

2007-02-28 Thread Michael S. Tsirkin
>Subject: ThinkPad T60: no screen after suspend to RAM
>References : http://lkml.org/lkml/2007/2/22/391
>Submitter  : Michael S. Tsirkin <[EMAIL PROTECTED]>
>Handled-By : Ingo Molnar <[EMAIL PROTECTED]>
>Status : unknown

Just reproduced this in -rc2.
Another thing I noticed:
with 2.6.20, pressing Fn/F4 generates an ACPI event and triggers suspend to RAM.

On 2.6.21-rc2, after resume (when the box is accessible from network),
pressing Fn/F4 again does not seem to have any effect.


-- 
MST


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-02-28 Thread Davide Libenzi
On Wed, 28 Feb 2007, Ingo Molnar wrote:

> 
> * Davide Libenzi  wrote:
> 
> > My point is, the syslet infrastructure is expensive for the kernel in 
> > terms of compat, [...]
> 
> it is not. Today i've implemented 64-bit syslets on x86_64 and 
> 32-bit-on-64-bit compat syslets. Both the 64-bit and the 32-bit syslet 
> (and threadlet) binaries work just fine on a 64-bit kernel, and they 
> share 99% of the infrastructure. There's only a single #ifdef 
> CONFIG_COMPAT in kernel/async.c:
> 
> #ifdef CONFIG_COMPAT
> 
> asmlinkage struct syslet_uatom __user *
> compat_sys_async_exec(struct syslet_uatom __user *uatom,
>   struct async_head_user __user *ahu)
> {
> return __sys_async_exec(uatom, ahu, _sys_call_table,
> compat_NR_syscalls);
> }
> 
> #endif

Did you hide all the complexity of the userspace atom decoding inside 
another function? :)
How much code would go away, in case we pick a simple/parallel 
sys_async_exec engine? Atoms decoding, special userspace variable access 
for loops, jumps/cond/... VM engine.



> Even mixed-mode syslets should work (although i havent specifically 
> tested them), where the head switches between 64-bit and 32-bit mode and 
> submits syslets from both 64-bit and from 32-bit mode, and at the same 
> time there might be both 64-bit and 32-bit syslets 'in flight'.
> 
> But i'm happy to change the syslet API in any sane way, and did so based 
> on feedback from Jens who is actually using them.

Wouldn't you agree on a simple/parallel execution engine like me and Linus 
are proposing (and threadlets, of course)?



- Davide




Re: [patch 01/22] update ctime and mtime for mmaped write

2007-02-28 Thread Peter Staubach

Miklos Szeredi wrote:

> What happens if the application overwrites what it had written some
> time later?  Nothing.  The page is already read-write, the pte dirty,
> so even though the file was clearly modified, there's absolutely no
> way in which this can be used to force an update to the timestamp.
>
> Which, I realize now, actually means, that the patch is wrong.  Msync
> will have to write protect the page table entries, so that later
> dirtyings may have an effect on the timestamp.


I thought that PeterZ's changes were to write-protect the page after
cleaning it so that future modifications could be detected and tracked
accordingly?  Does the right thing not happen already?

   Thanx...

  ps


Re: struct page field arrangement

2007-02-28 Thread Hugh Dickins
On Wed, 28 Feb 2007, Jan Beulich wrote:

> A change early last year reordered struct page so that ptl overlaps not only
> private, but also mapping. Since spinlock_t can be much larger, I'm wondering
> whether there's a reason not to also overlay the space that index and lru take -
> are these used for anything on page table pages?

Overlaying lru is a problem for those architectures which use
kmem_cache_alloc for their pagetables: arm26, powerpc, sparc64 and
perhaps others (I just grepped quickly through include/asm*, didn't
follow up those who have extern functions): since slab reuses the
lru fields for its own purposes.  Could perhaps be stacked somehow.

Overlaying index is fairly straightforward: the index field is fair
game.  In my original patches I did overlay index, but Andrew was
strongly averse to the way I was doing it, and scaled things back,
to private alone if I remember rightly, then relaxed a little to
include mapping too.  Way back then I made up a patch to overlay
index too (when I saw Fedora going out with CONFIG_DEBUG_SPINLOCK),
but I could never get it into a form where I felt it would satisfy
Andrew; and grew increasingly dissatisfied with that approach myself.

I don't think further overlaying is the right answer really.
But I do think it's a scandal that the size of struct page (in a
DEBUG_SPINLOCK system) is governed by such a minority use of the
struct page.  Lacking a satisfying answer, I've just let it drift
on until someone notices and complains.
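To see how a minority use can govern the size of the whole structure, here is a sketch with two hypothetical page-like descriptors, identical except for the size of the lock that overlays private and mapping. The lock layouts are illustrative stand-ins, not the real spinlock_t or struct page.

```c
/* Hypothetical stand-ins: a plain lock fits in one word; a lock carrying
 * debug fields (as with CONFIG_DEBUG_SPINLOCK) is larger. */
typedef struct { unsigned int raw; } demo_lock;
typedef struct { unsigned int raw, magic, owner_cpu; void *owner; } demo_debug_lock;

/* Page-like descriptor: the page-table lock overlays private and
 * mapping, which a page-table page does not otherwise use. */
struct demo_page {
	unsigned long flags;
	int count;
	union {
		struct {
			unsigned long private;
			void *mapping;
		};
		demo_lock ptl;
	};
	unsigned long index;
	void *lru[2];
};

/* Same layout, but with the debug-sized lock in the union. */
struct demo_page_debug {
	unsigned long flags;
	int count;
	union {
		struct {
			unsigned long private;
			void *mapping;
		};
		demo_debug_lock ptl;
	};
	unsigned long index;
	void *lru[2];
};
```

On both 32-bit and 64-bit builds the debug variant comes out larger, even though only page-table pages ever use the lock.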

kmalloc a separate spinlock structure when it's too big to fit in?
Not such a good idea, since then there will tend to be false sharing
of cachelines between them: simpler just to disable SPLIT_PTLOCK in
that case.

I'm not happy with the status quo, but I don't know the right answer:
perhaps allow pagetable pages to use an undebugged spinlock variant?

Hugh


[PATCH] worker_thread: don't play with signals

2007-02-28 Thread Oleg Nesterov
worker_thread() doesn't need to "Block and flush all signals", this was already
done by its caller, kthread().

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

--- 6.20-rc6-mm3/kernel/workqueue.c~signals 2007-02-20 02:21:11.0 +0300
+++ 6.20-rc6-mm3/kernel/workqueue.c 2007-02-28 23:58:11.0 +0300
@@ -290,18 +290,11 @@ static int worker_thread(void *__cwq)
struct cpu_workqueue_struct *cwq = __cwq;
DEFINE_WAIT(wait);
struct k_sigaction sa;
-   sigset_t blocked;
 
if (!cwq->wq->freezeable)
current->flags |= PF_NOFREEZE;
 
set_user_nice(current, -5);
-
-   /* Block and flush all signals */
-   sigfillset(&blocked);
-   sigprocmask(SIG_BLOCK, &blocked, NULL);
-   flush_signals(current);
-
/*
 * We inherited MPOL_INTERLEAVE from the booting kernel.
 * Set MPOL_DEFAULT to insure node local allocations.
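A userspace analogy of why the removed lines are redundant, using POSIX threads rather than kernel code (demo_kthread() and demo_worker() are hypothetical stand-ins for kthread() and worker_thread()): the wrapper blocks all signals before calling the thread function, so the function starts with a full mask and has nothing left to block. There is no userspace counterpart to flush_signals(), so the sketch covers only the blocking.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

struct demo_create {
	void *(*threadfn)(void *);
	void *data;
};

/* Plays the role of kthread(): block everything, then run threadfn. */
static void *demo_kthread(void *arg)
{
	struct demo_create *c = arg;
	sigset_t blocked;

	sigfillset(&blocked);
	pthread_sigmask(SIG_BLOCK, &blocked, NULL);
	return c->threadfn(c->data);
}

/* Plays the role of worker_thread(): it only inspects its mask.
 * Returns 1 (as a pointer) if SIGUSR1 is already blocked. */
static void *demo_worker(void *data)
{
	sigset_t cur;

	(void)data;
	pthread_sigmask(SIG_BLOCK, NULL, &cur);  /* query, change nothing */
	return (void *)(long)sigismember(&cur, SIGUSR1);
}
```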



Re: [patch 04/26] Xen-paravirt_ops: Add pagetable accessors to pack and unpack pagetable entries

2007-02-28 Thread Rusty Russell
On Wed, 2007-02-28 at 09:32 +0100, Ingo Molnar wrote:
> * Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:
> 
> > >> +#ifdef CONFIG_PARAVIRT
> > >> +/* After pte_t, etc, have been defined */
> > >> +#include 
> > >> +#endif
> > >> 
> > >
> > > hm - there's already a CONFIG_PARAVIRT conditional in 
> > > asm-i386/paravirt.h.
> > 
> > Yes, but it happens after asm/paravirt.h has already included some 
> > things, and it ends up causing problems.  paravirt.h still defines 
> > various stub functions in the !CONFIG_PARAVIRT case, so it needs to do 
> > the includes either way.
> 
> hm, it then needs to be fixed first, instead of adding to the mess.

Yes, originally paravirt.h didn't define anything if !CONFIG_PARAVIRT
for this reason: getting it tied into the other headers correctly is a
PITA.

Rusty.




Re: [patch 01/22] update ctime and mtime for mmaped write

2007-02-28 Thread Miklos Szeredi
> What happens if the application overwrites what it had written some
> time later?  Nothing.  The page is already read-write, the pte dirty,
> so even though the file was clearly modified, there's absolutely no
> way in which this can be used to force an update to the timestamp.

Which, I realize now, actually means, that the patch is wrong.  Msync
will have to write protect the page table entries, so that later
dirtyings may have an effect on the timestamp.
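A hypothetical model of the dirty-tracking argument (not the actual mm code): a store only reaches the kernel if the pte is write-protected, so once the pte is writable and dirty, further stores are invisible until msync() cleans the page and write-protects it again. For simplicity the model updates the timestamp directly at fault time.

```c
#include <stdbool.h>

/* Hypothetical model of one shared file page and its pte. */
struct demo_pte { bool writable; bool dirty; };
struct demo_file { long mtime; };

/* A store through the mapping.  Only a store to a write-protected pte
 * faults into the kernel, and only then can the timestamp change. */
static void demo_store(struct demo_pte *pte, struct demo_file *f, long now)
{
	if (!pte->writable) {        /* write fault */
		f->mtime = now;
		pte->writable = true;
	}
	pte->dirty = true;           /* no fault: hardware just sets dirty */
}

/* msync as proposed: after writeback, clean the pte AND write-protect
 * it, so the next store faults again. */
static void demo_msync(struct demo_pte *pte)
{
	if (pte->dirty) {
		/* ... write the page back ... */
		pte->dirty = false;
		pte->writable = false;
	}
}
```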

Thanks,
Miklos


Re: lanana: Add major/minor entries for PPC QE UART devices

2007-02-28 Thread Dan Malek


On Feb 28, 2007, at 2:43 PM, Timur Tabi wrote:

> What about major number 205?  It also has the screwed-up /dev/ttyCPM
> entries, but it has more room, and the CPM driver doesn't
> actually use it.  At least, I can't see where it uses it.


Please, let's just leave the four we have and let
the driver just allocate increasing minor numbers.
If anyone has a product with more than 4 UARTs,
they will have to figure out what to do with the
additional minors.

We are making a very complicated problem
out of nothing.  This hasn't caused any problems
in the past, and changing the existing names and
minors will cause problems for everyone today.

Just leave it alone, fix up the documentation,
and have the driver print some warning if it
allocates more than 4 UARTs.
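A sketch of the policy suggested here, with hypothetical constants (the first minor and the reserved count are placeholders, not values from the LANANA list): minors are handed out in increasing order, and allocation past the reserved range continues but prints a warning.

```c
#include <stdio.h>

/* Hypothetical placeholders, not real device-list values. */
#define DEMO_NR_RESERVED_MINORS 4
#define DEMO_FIRST_MINOR        64

static int demo_next_port;

/* Hand out increasing minors; warn once we go past the reserved four. */
static int demo_alloc_uart_minor(void)
{
	int port = demo_next_port++;

	if (port >= DEMO_NR_RESERVED_MINORS)
		fprintf(stderr,
			"warning: UART %d exceeds the %d reserved minors\n",
			port, DEMO_NR_RESERVED_MINORS);
	return DEMO_FIRST_MINOR + port;
}
```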

Thanks.

-- Dan



Re: [PATCH 5/5] jffs2: Allow selection of compression mode via a sysfs attribute

2007-02-28 Thread Richard Purdie
On Wed, 2007-02-28 at 21:39 +0200, Artem Bityutskiy wrote:
> On Wed, 2007-02-28 at 19:13 +, Richard Purdie wrote:
> > +/* gives us jffs2_subsys */
> > +static decl_subsys(jffs2, NULL, NULL);
> 
> There is actually a file-system subsys - look up for fs_subsys. It is
> declared at fs/namespace.c.

Further down the patch you'll see:

+   kset_set_kset_s(&jffs2_subsys, fs_subsys);

There was a reason for doing that instead of using fs_subsys in the above,
although I can't remember why offhand. I did try it and it didn't work
as expected...

Regards,

Richard



Resume from S2R fails after dpm_resume()

2007-02-28 Thread Tim Gardner
Gentlemen,

I instrumented 2.6.21-rc1 base/power/resume.c device_resume() with
TRACE_RESUME(0) as the last statement in the function. Sure enough it
was the last hash value in the RTC after a hard reboot when resume failed:

[   12.028820]   hash matches drivers/base/power/resume.c:104

The machine appears to be absolutely wedged after initiating resume by
pressing the power button. The disk flashes for half a second or so,
then that's it.

It is a Dell XPS, BIOS rev A04. I'm using 'echo 1 > /sys/power/pm_trace;
echo mem > /sys/power/state' to initiate the S2R sequence.

Any suggestions on where to go from here?

rtg
-- 
Tim Gardner [EMAIL PROTECTED] www.tpi.com
OR 503-601-0234 x102 MT 406-443-5357


Fix mv643xx_eth compilation.

2007-02-28 Thread Dave Jones
Commit 908b637fe793165b6aecdc875cdca67c4959a1ad removed ETH_DMA_ALIGN
but missed a usage of it in a macro, which broke the build.

Signed-off-by: Dave Jones <[EMAIL PROTECTED]>

diff --git a/drivers/net/mv643xx_eth.h b/drivers/net/mv643xx_eth.h
index 7cb0a41..7d4e90c 100644
--- a/drivers/net/mv643xx_eth.h
+++ b/drivers/net/mv643xx_eth.h
@@ -9,6 +9,8 @@
 
 #include 
 
+#include 
+
 /* Checksum offload for Tx works for most packets, but
  * fails if previous packet sent did not use hw csum
  */
@@ -47,7 +49,7 @@
 #define ETH_HW_IP_ALIGN        2       /* hw aligns IP header */
 #define ETH_WRAPPER_LEN        (ETH_HW_IP_ALIGN + ETH_HLEN + \
                                 ETH_VLAN_HLEN + ETH_FCS_LEN)
-#define ETH_RX_SKB_SIZE        (dev->mtu + ETH_WRAPPER_LEN + ETH_DMA_ALIGN)
+#define ETH_RX_SKB_SIZE        (dev->mtu + ETH_WRAPPER_LEN + dma_get_cache_alignment())
 
 #define ETH_RX_QUEUES_ENABLED  (1 << 0)/* use only Q0 for receive */
 #define ETH_TX_QUEUES_ENABLED  (1 << 0)/* use only Q0 for transmit */
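Worked arithmetic for the macro, as a small sketch. ETH_HLEN is 14, and the VLAN tag and FCS are 4 bytes each; the 32-byte cache alignment used below is an assumption, since dma_get_cache_alignment() returns a platform-specific value at run time. demo_rx_skb_size() is a hypothetical stand-in for the macro.

```c
#define ETH_HW_IP_ALIGN 2   /* hw aligns IP header */
#define ETH_HLEN        14  /* dest MAC + source MAC + ethertype */
#define ETH_VLAN_HLEN   4   /* 802.1Q tag */
#define ETH_FCS_LEN     4   /* frame check sequence */
#define ETH_WRAPPER_LEN (ETH_HW_IP_ALIGN + ETH_HLEN + ETH_VLAN_HLEN + ETH_FCS_LEN)

/* Hypothetical stand-in for ETH_RX_SKB_SIZE: cache_align plays the
 * role of dma_get_cache_alignment(), known only at run time. */
static int demo_rx_skb_size(int mtu, int cache_align)
{
	return mtu + ETH_WRAPPER_LEN + cache_align;
}
```

With a 1500-byte MTU and 32-byte alignment this gives 1500 + 24 + 32 = 1556 bytes. Because the alignment now comes from a function call, ETH_RX_SKB_SIZE is no longer a compile-time constant.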

-- 
http://www.codemonkey.org.uk


Re: Wanted: simple, safe x86 stack overflow detection

2007-02-28 Thread Andi Kleen
On Wed, Feb 28, 2007 at 09:27:09AM -0500, Chuck Ebbert wrote:
> Can we just put a canary in the threadinfo and check it on every
> task switch? What are the drawbacks?

Likely already too late then -- if critical state is overwritten
you have already crashed. Also, a lot of stack-intensive code has
relatively large unused holes, so it might miss the canary completely.
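A hypothetical model of both objections to a context-switch canary, using an array in place of a real stack: an overflow that writes the canary word is caught (though only after the damage is done), while a frame with a large untouched hole can write below the canary without disturbing it, so the check passes.

```c
#include <stdbool.h>

/* Hypothetical layout in one array: mem[0..DEMO_GUARD-1] is whatever
 * lies below the stack, mem[DEMO_GUARD] holds the canary, and the
 * downward-growing stack sits above it.  The magic value is arbitrary. */
#define DEMO_GUARD  4
#define DEMO_WORDS  64
#define DEMO_CANARY 0x57ac6e9dUL

static unsigned long mem[DEMO_WORDS];

static void demo_init(void)
{
	for (int i = 0; i < DEMO_WORDS; i++)
		mem[i] = 0;
	mem[DEMO_GUARD] = DEMO_CANARY;
}

/* A deep frame writes only the words it touches; large local arrays
 * often leave holes that are never written. */
static void demo_frame_write(int idx, unsigned long val)
{
	mem[idx] = val;
}

/* The check proposed for every task switch. */
static bool demo_canary_intact(void)
{
	return mem[DEMO_GUARD] == DEMO_CANARY;
}

/* Was anything below the canary overwritten? (Invisible to the check.) */
static bool demo_below_dirty(void)
{
	for (int i = 0; i < DEMO_GUARD; i++)
		if (mem[i] != 0)
			return true;
	return false;
}
```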

Anyways if you want a crash on context switch in the non
hole case you can probably get it by just rearranging thread_info a bit.
e.g. put preempt_count first. Any corruption of that will lead
to schedule complaining.

Don't think it is worth it though.

I suppose one could have a CONFIG_DEBUG_STACK_OVERFLOW that gets
the stacks from vmalloc, which would catch any overflow with its
guard pages. For this you would need to change __pa() to handle
that too, because there might still be some drivers that do
DMA on stack addresses.  Would be somewhat ugly but doable.

But I have my doubts it is worth it again -- in my experience static
analysis works well enough to trace them down and 
there are not that many anyways.

-Andi


Re: [linux-usb-devel] usbfs2: Why asynchronous I/O?

2007-02-28 Thread David Brownell
On Monday 26 February 2007 12:54 am, Sarah Bailey wrote:
> On Sun, Feb 25, 2007 at 08:53:03AM -0800, David Brownell wrote:
> > On Sunday 25 February 2007 12:57 am, Sarah Bailey wrote:
> > > I haven't seen any evidence that the kernel-side aio is substantially
> > > more efficient than the GNU libc implementation,
> > 
> > Face it:  spawning a new thread is fundamentally not as lightweight
> > as just submitting an aiocb plus an URB.  And spawning ten threads
> > costs a *LOT* more than submitting ten aiocbs and URBs.  (Count just
> > the 4KB stacks associated with each thread, vs memory consumed by
> > using AIO ... and remember to count the scheduling overheads.)
> 
> Yes, spawning a new thread is costly.  However, if someone writes their
> own thread-based program and allocates the threads from a pool, that
> argument is irrelevant. 

I don't see how that would follow from that assumption.  But even if
it did, the assumption isn't necessarily valid.  People who can write
threaded programs are the minority; people who can write correct ones
are even more rare!

We all hope that changes.  It's been hoped for at least a decade now.
Maybe in another decade or two, such skills can safely be assumed.


> Even with fibrils, you have a stack and 
> scheduling overhead.  With kernel AIO, you have also have some memory
> overhead, and you also have context switch overhead when you call
> kick_iocb or aio_complete.
> 
> Can someone point me at hard evidence one way or another?

(stack_size + other_thread_costs + urb_size) > (aiocb_size + urb_size)

There was recent discussion on another LKML thread pointing out how an
event-driven server ran at basically 100% of hardware capacity, where
a threaded one ran at 60%.  (That was, as I infer by skimming archives of
that threadlet discussion, intended to be a fair comparison...)


> > > so it seems like it would be better to leave the complexity in
> > > userspace. 
> > 
> > Thing is, the kernel *already* has URBs.  And the concept behind them
> > maps one-to-one onto AIOCBs.  All the kernel needs to expose is:
> > mechanisms to submit (and probably cancel) async requests, then collect
> > the responses when they're done.
> 
> It seems to me that you're arguing that URBs and AIOCBs go together on
> the basis that they are both asynchronous and both have some sort of
> completion function.  Just because two things are alike doesn't mean
> that it's better to use them together.

I pointed out that any other approach must accordingly add overhead.
One of the basic rules of thumb in system design is to avoid such
needless additions.


> > You're right that associating a thread with an URB is complexity.
> 
> That's not what I said.

No ... but you *were* avoiding that consequence of what you did say, though.


> > I can't much help application writers that don't bother to read the
> > relevant documentation (after it's been pointed out to them).
> 
> Where is this documentation?  There's a man page on io_submit, etc., but
> how would an application writer know to look for it?

How did *you* know to look for it?  How did *I* know to look for it?

ISTR asking Google, and finding that "libaio" is how to get access
to the Linux kernel AIO facility.  Very quickly.  I didn't even need
to make the mistake of trying to use POSIX calls then finding they
don't work ...


> > The gap between POSIX AIO and kernel AIO has been an ongoing problem.  This
> > syslet/fibril/yadda-yadda stuff is just the latest spin.
> 
> Do you think that fibrils will replace the kernel AIO?

Still under discussion, but I hope not.  But remember two different things
are being called AIO -- while in my book, only one of them is really AIO.

 - The AIO VFS interface ... which is mostly ok, though the retry stuff
   is weird as well as misnamed, and the POSIX hookery should also be
   improved.  (Those POSIX APIs omit key functionality, like collecting
   multiple results with one call, and are technically inferior.  Usually
   that's so that vendors can claim conformance without kernel updates.
   It could also be that the functionality is "optional", and so not part
   of what I find in my system's libc.)

 - Filesystem hookery and direct-io linkage ... which has been trouble,
   and I suspect was never the right design.  The filesystem stacks in
   Linux were designed around thread based synch, so trying to slide
   an event model around the edges was probably never a good idea.

I see fibrils/threadlets/syslets/etc as a better approach to that hookery;
something like EXT4 over a RAID is less likely to break if that complex
code is not forced to restructure itself into an event model.

But for things that are already using event models ... the current AIO
is a better fit.  And maybe getting all that other stuff out of the mix
will finally let some of the "real I/O, not disks" AIO issues get fixed.

All of the "bad" things I've heard about AIO in Linux boil down to either
(a) criticisms about direct-IO and that 

Re: Problem with freezable workqueues

2007-02-28 Thread Johannes Berg
On Wed, 2007-02-28 at 12:14 +1100, Nigel Cunningham wrote:

> Controversy is no reason to give in! Nevertheless, I think you're right
> - I believe the XFS guys said they fixed the issue that had caused I/O
> to be submitted post-freeze. Well, we'll see if it appears again, won't
> we?

I get to be the guinea pig, right? :P Unfortunately I was sick for the
better part of the past few days and can only test all this stuff early
next week.

johannes




Re: [patch 01/22] update ctime and mtime for mmaped write

2007-02-28 Thread Miklos Szeredi
> >> While these entry points do not actually modify the file itself,
> >> as was pointed out, they are handy points at which the kernel gains
> >> control and could actually notice that the contents of the file are
> >> no longer the same as they were, ie. modified.
> >>
> >> From the operating system viewpoint, this is where the semantics of
> >> modification to file contents via mmap differs from the semantics of
> >> modification to file contents via write(2).
> >>
> >> It is desirable for the file times to be updated as quickly as
> >> possible after the actual modification has occurred.
> >> 
> >
> > I disagree.
> >
> > You don't worry about the timestamp being updated _during_ a large
> > write() call, even though the file is constantly being modified.
> >
> >   
> 
> No, but you do worry about the timestamps being updated after
> every write() call, no matter how large or small.

Right.  All I'm saying is that just writing to a shared mapping
without calling msync() is similar to a write() which hasn't yet
finished.  In both cases, you have a modified file, without a modified
timestamp.

> > You think of write() as something instantaneous, while you think of
> > writing to a shared mapping, then doing msync() as something taking a
> > long time.  In actual fact both of these are basically equivalent
> > operations, the differences being, that you can easily modify
> > non-contiguous parts of a file with mmap, while you can't do that with
> > write.  The disadvantage from mmap comes from the cost of setting up
> > the page tables and handling the faults.
> >
> > Think of it this way:
> >
> >   shared mmap write + msync(MS_ASYNC)  ==  write()
> >   msync(MS_ASYNC) + fsync()  ==  msync(MS_SYNC)
> >
> >   
> 
> I don't believe that this is a valid characterization because the
> changes to the contents of the file, made through the mmap'd region,
> are immediately visible to any and all other applications accessing
> the file.  Since the contents of the file are changing, then so
> should the timestamps to reflect this.

Same case with a large write().  Nothing prevents you from reading a
file, while a huge write is taking place to it, and yet, the
modification time isn't updated.

> I think that we are going to have to agree to disagree because
> I don't agree either with your characterizations of the desirable
> semantics associated with shared mmap or that maintaining the
> correctness in the system is a waste of CPU.

I didn't quite say _that_ in so many words :).  I said that updating
the timestamp on a per-page first-dirtying basis, or on a per-inode
first-dirtying basis, is a waste of effort.  Why?

What happens if the application overwrites what it had written some
time later?  Nothing.  The page is already read-write, the pte dirty,
so even though the file was clearly modified, there's absolutely no
way in which this can be used to force an update to the timestamp.
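The observation that only the first store to a clean page traps can be reproduced from userspace with a rough analogy. The sketch maps pages read-only, counts the SIGSEGV raised by the first store to each page, and makes that page writable inside the handler. It is an illustration only. It assumes Linux, where calling mprotect() from a SIGSEGV handler works in practice even though POSIX does not list it as async-signal-safe. tracked_map() and write_faults are hypothetical names, and the kernel's own write-fault path is not shown.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *tracked;                     /* start of the watched region */
static size_t tracked_len;                /* length of the region in bytes */
static size_t page_size;
static volatile sig_atomic_t write_faults;

/* Runs only for the first store to a read-only page: count the fault, then
 * make that page writable so later stores to it proceed without trapping. */
static void on_write_fault(int sig, siginfo_t *si, void *uctx)
{
    uintptr_t addr = (uintptr_t)si->si_addr;
    uintptr_t base = (uintptr_t)tracked;

    (void)sig;
    (void)uctx;
    if (tracked == NULL || addr < base || addr - base >= tracked_len) {
        signal(SIGSEGV, SIG_DFL);         /* not ours: retry and crash normally */
        return;
    }
    write_faults++;
    mprotect((void *)(base + (addr - base) / page_size * page_size),
             page_size, PROT_READ | PROT_WRITE);
}

/* Maps `npages` read-only anonymous pages and installs the handler.
 * Returns NULL on failure. */
static volatile char *tracked_map(size_t npages)
{
    struct sigaction sa;
    void *mem;

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    mem = mmap(NULL, npages * page_size, PROT_READ,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;
    tracked = mem;
    tracked_len = npages * page_size;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_write_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) < 0)
        return NULL;
    return tracked;
}
```

Storing to the same page three times produces a single fault. The later overwrites never give the handler a chance to react, which is the point being made about per-page first-dirtying updates.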

Is there anything special about the _first_ modification?  I don't
think so.  From an external application's point of view it doesn't
matter one whit, whether a modification was through write() or after a
page-fault, or on an already present read-write page.

So what exactly _are_ the semantics we are trying to achieve?

> I view mmap as a way for an application to treat the contents of a
> file as another segment in its address space.  This allows it to
> manipulate the contents of a file without incurring the overhead of
> the read and write system calls and the double buffering that
> naturally occurs with those system calls.  I think that:
> 
> char *p = mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
> *p = 1;
> *(p + 4096) = 2;
> 
> should have the same effect as:
> 
> char c = 1;
> pwrite(fd, &c, 1, 0);
> c = 2;
> pwrite(fd, &c, 1, 4096);

Not necessarily.  This is the equivalent _portable_ call sequence:

 char *p = mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
 *p = 1;
 *(p + 4096) = 2;
 msync(p, 4097, MS_ASYNC);

Yes, on linux the prior would work too, but there's really no point in
allowing applications to be lax and not do it properly.  But we've
been over this.
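A small program makes the data side of this comparison concrete. It applies the portable mmap()+msync(MS_ASYNC) sequence to one temporary file and the equivalent pwrite() calls to another, then checks that the bytes read back with pread() match. It deliberately says nothing about st_mtime/st_ctime, which is exactly what is under dispute. The helper names are assumptions, not part of either message, and so is the ftruncate() call: the file must be at least 8192 bytes long, or the stores through the mapping raise SIGBUS.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define FILE_LEN 8192   /* two 4 KiB pages, as in the example above */

/* Portable sequence: store through a shared mapping, then msync(MS_ASYNC). */
static int store_via_mmap(int fd)
{
    char *p;
    int rc;

    if (ftruncate(fd, FILE_LEN) < 0)      /* back the whole mapping with file data */
        return -1;
    p = mmap(NULL, FILE_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    p[0] = 1;
    p[4096] = 2;
    rc = msync(p, 4097, MS_ASYNC);        /* covers bytes 0..4096 */
    munmap(p, FILE_LEN);
    return rc;
}

/* The same two one-byte updates made with pwrite(). */
static int store_via_pwrite(int fd)
{
    char c = 1;

    if (ftruncate(fd, FILE_LEN) < 0)
        return -1;
    if (pwrite(fd, &c, 1, 0) != 1)
        return -1;
    c = 2;
    if (pwrite(fd, &c, 1, 4096) != 1)
        return -1;
    return 0;
}

/* Returns 1 if the first FILE_LEN bytes of both files are identical. */
static int same_contents(int fd_a, int fd_b)
{
    static char a[FILE_LEN], b[FILE_LEN];

    if (pread(fd_a, a, FILE_LEN, 0) != FILE_LEN ||
        pread(fd_b, b, FILE_LEN, 0) != FILE_LEN)
        return 0;
    return memcmp(a, b, FILE_LEN) == 0;
}
```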

Miklos


Re: Problem with freezable workqueues

2007-02-28 Thread Oleg Nesterov
On 02/28, Rafael J. Wysocki wrote:
> 
> Okay, but I've just finished the patch that removes the freezability of
> workqueues (appended), so can we please do this in a separate one?

Please, please, no. This patch is of course correct, but it breaks _a lot_
of patches in the -mm tree.

May I ask you to send just

> ===
> --- linux-2.6.21-rc2.orig/fs/xfs/linux-2.6/xfs_buf.c
> +++ linux-2.6.21-rc2/fs/xfs/linux-2.6/xfs_buf.c
> @@ -1829,11 +1829,11 @@ xfs_buf_init(void)
>   if (!xfs_buf_zone)
>   goto out_free_trace_buf;
>  
> - xfslogd_workqueue = create_freezeable_workqueue("xfslogd");
> + xfslogd_workqueue = create_workqueue("xfslogd");
>   if (!xfslogd_workqueue)
>   goto out_free_buf_zone;
>  
> - xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad");
> + xfsdatad_workqueue = create_workqueue("xfsdatad");
>   if (!xfsdatad_workqueue)
>   goto out_destroy_xfslogd_workqueue;
>  
> 

this bit?

After that, we can do the "removes the freezability of workqueues" patch
against the -mm tree.

Oleg.



Re: Soft lockup on shutdown in nf_ct_iterate_cleanup()

2007-02-28 Thread Chuck Ebbert
Patrick McHardy wrote:
> Thanks, the previous approach doesn't seem to work properly without
> unpleasant event cache hacks. This patch takes a simpler approach
> and keeps the unconfirmed list iteration, but makes sure to make
> forward progress.
> 
> [NETFILTER]: conntrack: fix {nf,ip}_ct_iterate_cleanup endless loops
> 
> Fix {nf,ip}_ct_iterate_cleanup unconfirmed list handling:
> 

Works great: survived three reboots without lockup or warning messages.
And it's a nice simple patch, too...



Re: [RFC][PATCH 1/3] Freezer: Fix vfork problem

2007-02-28 Thread Oleg Nesterov
On 02/28, Rafael J. Wysocki wrote:
>
> Okay, I have added a comment to freezer.h.  Please have a look.
>
>
> -extern void thaw_some_processes(int all);
> +/*
> + * The PF_FREEZER_SKIP flag should be set by a vfork parent right before it
> + * calls wait_for_completion() and reset right after it returns from this
> + * function.  Next, the parent should call try_to_freeze() to freeze itself
> + * appropriately in case the child has exited before the freezing of tasks is
> + * complete.  However, we don't want kernel threads to be frozen in unexpected
> + * places, so we allow them to block freeze_processes() instead or to set
> + * PF_NOFREEZE if needed and PF_FREEZER_SKIP is only set for userland vfork
> + * parents.  Fortunately, in the call_usermodehelper() case the parent won't
> + * really block freeze_processes(), since call_usermodehelper() (the child)
> + * does a little before exec/exit and it can't be frozen before waking up the
> + * parent.
> + */

I think this comment is accurate and understandable, and I am not suggesting
to change it.

However, please note that PF_FREEZER_SKIP can be used not only for vfork().
For example, it seems to me we can also use freezer_...count() to solve the
problem with coredump. We can use the same "wait_for_completion_freezable"
pattern in exit_mm() and in coredump_wait(). (I do not claim this is the best
fix, though.)

Oleg.



Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-02-28 Thread Ingo Molnar

* Davide Libenzi  wrote:

> My point is, the syslet infrastructure is expensive for the kernel in 
> terms of compat, [...]

It is not. Today I've implemented 64-bit syslets on x86_64 and 
32-bit-on-64-bit compat syslets. Both the 64-bit and the 32-bit syslet 
(and threadlet) binaries work just fine on a 64-bit kernel, and they 
share 99% of the infrastructure. There's only a single #ifdef 
CONFIG_COMPAT in kernel/async.c:

#ifdef CONFIG_COMPAT

asmlinkage struct syslet_uatom __user *
compat_sys_async_exec(struct syslet_uatom __user *uatom,
  struct async_head_user __user *ahu)
{
return __sys_async_exec(uatom, ahu, _sys_call_table,
compat_NR_syscalls);
}

#endif

Even mixed-mode syslets should work (although I haven't specifically 
tested them), where the head switches between 64-bit and 32-bit mode and 
submits syslets from both 64-bit and from 32-bit mode, and at the same 
time there might be both 64-bit and 32-bit syslets 'in flight'.
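Little code is compat-specific because the common routine takes the system call table and its length as parameters, as the compat_sys_async_exec() wrapper quoted above shows. The userspace sketch below illustrates that design. call_fn, the two tables and do_exec() are hypothetical stand-ins, not the syslet code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical two-argument "system call" signature for this sketch. */
typedef long (*call_fn)(unsigned long a0, unsigned long a1);

static long native_add(unsigned long a0, unsigned long a1)
{
    return (long)(a0 + a1);
}

/* A compat entry only sees 32-bit arguments, so it works in 32-bit arithmetic. */
static long compat_add(unsigned long a0, unsigned long a1)
{
    return (long)(uint32_t)((uint32_t)a0 + (uint32_t)a1);
}

static const call_fn native_table[] = { native_add };
static const call_fn compat_table[] = { compat_add };

#define NR_NATIVE (sizeof(native_table) / sizeof(native_table[0]))
#define NR_COMPAT (sizeof(compat_table) / sizeof(compat_table[0]))

/* Shared core: all the logic lives here; only the table and its bound differ. */
static long do_exec(unsigned long nr, unsigned long a0, unsigned long a1,
                    const call_fn *table, unsigned long nr_calls)
{
    if (nr >= nr_calls)
        return -ENOSYS;
    return table[nr](a0, a1);
}

/* Thin per-ABI entry points, analogous to the native and compat syscalls. */
static long native_exec(unsigned long nr, unsigned long a0, unsigned long a1)
{
    return do_exec(nr, a0, a1, native_table, NR_NATIVE);
}

static long compat_exec(unsigned long nr, unsigned long a0, unsigned long a1)
{
    return do_exec(nr, a0, a1, compat_table, NR_COMPAT);
}
```

Adding another ABI means adding a table and a short entry point. The dispatch and the bounds check stay shared.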

But i'm happy to change the syslet API in any sane way, and did so based 
on feedback from Jens who is actually using them.

Ingo


Re: [PATCH] adapt page_lock_anon_vma() to PREEMPT_RCU

2007-02-28 Thread Hugh Dickins
On Sun, 25 Feb 2007, Oleg Nesterov wrote:

> page_lock_anon_vma() uses spin_lock() to block RCU. This doesn't work with
> PREEMPT_RCU, we have to do rcu_read_lock() explicitly. Otherwise, it is
> theoretically possible that slab returns anon_vma's memory to the system
> before we do spin_unlock(&anon_vma->lock).
> 
> Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

Acked-by: Hugh Dickins <[EMAIL PROTECTED]>

Thanks for doing this, and sorry for my delay.

Hugh

> 
> --- WQ/mm/rmap.c~ 2007-02-18 22:56:49.0 +0300
> +++ WQ/mm/rmap.c  2007-02-25 22:43:00.0 +0300
> @@ -183,7 +183,7 @@ void __init anon_vma_init(void)
>   */
>  static struct anon_vma *page_lock_anon_vma(struct page *page)
>  {
> - struct anon_vma *anon_vma = NULL;
> + struct anon_vma *anon_vma;
>   unsigned long anon_mapping;
>  
>   rcu_read_lock();
> @@ -195,9 +195,16 @@ static struct anon_vma *page_lock_anon_v
>  
>   anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
>   spin_lock(&anon_vma->lock);
> + return anon_vma;
>  out:
>   rcu_read_unlock();
> - return anon_vma;
> + return NULL;
> +}
> +
> +static void page_unlock_anon_vma(struct anon_vma *anon_vma)
> +{
> + spin_unlock(&anon_vma->lock);
> + rcu_read_unlock();
>  }
>  
>  /*
> @@ -333,7 +340,8 @@ static int page_referenced_anon(struct p
>   if (!mapcount)
>   break;
>   }
> - spin_unlock(&anon_vma->lock);
> +
> + page_unlock_anon_vma(anon_vma);
>   return referenced;
>  }
>  
> @@ -809,7 +817,8 @@ static int try_to_unmap_anon(struct page
>   !page_mapped(page))
>   break;
>   }
> - spin_unlock(&anon_vma->lock);
> +
> + page_unlock_anon_vma(anon_vma);
>   return ret;
>  }
>  
> 


[PATCH] i386: Fix usage of -mtune when X86_GENERIC=y or CONFIG_MCORE2=y

2007-02-28 Thread Lasse Collin
Two fixes to arch/i386/Makefile.cpu:

1) When X86_GENERIC=y is set, use -mtune=i686 if $(CC) doesn't
   support -mtune=generic. GCC 4.1.2 and earlier don't support
   -mtune=generic. When building a generic kernel for a distro
   that runs on i586 and better, it is nice to use
   -march=i586 -mtune=i686 instead of plain -march=i586.

2) Use $(call tune) instead of hardcoded -mtune when CONFIG_MCORE2=y.
   This makes it possible to have CONFIG_MCORE2=y when using GCC 3.3,
   which uses -mcpu instead of -mtune. Also dropped fallback to
   -mtune=generic and -mtune=i686, because -march=i686 already
   implies -mtune=i686.
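The fallback chain in fix 1 is easier to follow when modeled outside Make. The C code below is only a toy model of $(call cc-option,...) and the patched tune macro. It assumes each compiler can be described by a hand-written list of accepted flags, whereas the real cc-option test-compiles with $(CC). The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy model of Kbuild's cc-option: `supported` is a NULL-terminated list of
 * flags the compiler accepts; return `flag` if listed, else `fallback`. */
static const char *cc_option(const char *const *supported, const char *flag,
                             const char *fallback)
{
    for (size_t i = 0; supported[i] != NULL; i++)
        if (strcmp(supported[i], flag) == 0)
            return flag;
    return fallback;
}

/* Model of the patched `tune` macro: -mtune=<cpu> when the compiler knows
 * -mtune (HAS_MTUNE), -mcpu=<cpu> otherwise, and `fallback` if the chosen
 * flag is rejected. The flag text is built in the caller's buffer. */
static const char *tune(const char *const *supported, const char *cpu,
                        const char *fallback, char *buf, size_t len)
{
    int has_mtune = cc_option(supported, "-mtune=i386", NULL) != NULL;

    snprintf(buf, len, "%s%s", has_mtune ? "-mtune=" : "-mcpu=", cpu);
    return cc_option(supported, buf, fallback);
}

/* cflags-$(CONFIG_X86_GENERIC) += $(call tune,generic,$(call tune,i686)) */
static const char *generic_tune(const char *const *supported, char *b1,
                                char *b2, size_t len)
{
    const char *i686 = tune(supported, "i686", "", b1, len);
    return tune(supported, "generic", i686, b2, len);
}
```

In this model, a compiler that accepts only -mcpu=... ends up with -mcpu=i686. One that knows -mtune=i686 but not -mtune=generic gets -mtune=i686, and one that accepts -mtune=generic gets that flag.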

The patch is against 2.6.20, but Makefile.cpu hasn't changed recently.

--- linux-2.6.20/arch/i386/Makefile.cpu.orig 2007-02-04 20:44:54.0 +0200
+++ linux-2.6.20/arch/i386/Makefile.cpu 2007-02-28 21:22:47.0 +0200
@@ -4,9 +4,9 @@
 #-mtune exists since gcc 3.4
 HAS_MTUNE  := $(call cc-option-yn, -mtune=i386)
 ifeq ($(HAS_MTUNE),y)
-tune   = $(call cc-option,-mtune=$(1),)
+tune   = $(call cc-option,-mtune=$(1),$(2))
 else
-tune   = $(call cc-option,-mcpu=$(1),)
+tune   = $(call cc-option,-mcpu=$(1),$(2))
 endif
 
 align := $(cc-option-align)
@@ -32,7 +32,7 @@
 cflags-$(CONFIG_MWINCHIP3D)+= $(call cc-option,-march=winchip2,-march=i586)
 cflags-$(CONFIG_MCYRIXIII) += $(call cc-option,-march=c3,-march=i486) $(align)-functions=0 $(align)-jumps=0 $(align)-loops=0
 cflags-$(CONFIG_MVIAC3_2)  += $(call cc-option,-march=c3-2,-march=i686)
-cflags-$(CONFIG_MCORE2)+= -march=i686 $(call cc-option,-mtune=core2,$(call cc-option,-mtune=generic,-mtune=i686))
+cflags-$(CONFIG_MCORE2)+= -march=i686 $(call tune,core2)
 
 # AMD Elan support
 cflags-$(CONFIG_X86_ELAN)  += -march=i486
@@ -42,5 +42,5 @@
 
 # add at the end to overwrite eventual tuning options from earlier
 # cpu entries
-cflags-$(CONFIG_X86_GENERIC)   += $(call tune,generic)
+cflags-$(CONFIG_X86_GENERIC)   += $(call tune,generic,$(call tune,i686))
 

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode


Re: Problem with freezable workqueues

2007-02-28 Thread Rafael J. Wysocki
On Wednesday, 28 February 2007 21:08, Oleg Nesterov wrote:
> On 02/28, Rafael J. Wysocki wrote:
> >
> > On Wednesday, 28 February 2007 20:32, Oleg Nesterov wrote:
> > > 
> > > I am sorry, I lost track of this problem. As for 2.6.21,
> > > create_freezeable_workqueue doesn't work and conflicts with suspend.
> > > Why can't we remove it from XFS as you suggested before?
> > 
> > Yes, we can (preparing a patch).  I was just curious. :-)
> 
> OK, thanks.
> 
> We can (I think) do pretty much the same with some additional complications
> in worker_thread() (check !cpu_online() after try_to_freeze() and break).

Okay, but I've just finished the patch that removes the freezability of
workqueues (appended), so can we please do this in a separate one?

Rafael

---
Since freezable workqueues are broken in 2.6.21-rc
(cf. http://marc.theaimsgroup.com/?l=linux-kernel=116855740612755,
http://marc.theaimsgroup.com/?l=linux-kernel=117261312523921=2)
it's better to remove them altogether for 2.6.21 and change the only user of
them (XFS) accordingly.

---
 fs/xfs/linux-2.6/xfs_buf.c |4 ++--
 include/linux/workqueue.h  |8 +++-
 kernel/workqueue.c |   21 +++--
 3 files changed, 12 insertions(+), 21 deletions(-)

Index: linux-2.6.21-rc2/kernel/workqueue.c
===
--- linux-2.6.21-rc2.orig/kernel/workqueue.c
+++ linux-2.6.21-rc2/kernel/workqueue.c
@@ -59,7 +59,6 @@ struct cpu_workqueue_struct {
 
int run_depth;  /* Detect run_workqueue() recursion depth */
 
-   int freezeable; /* Freeze the thread during suspend */
 } cacheline_aligned;
 
 /*
@@ -352,8 +351,7 @@ static int worker_thread(void *__cwq)
struct k_sigaction sa;
sigset_t blocked;
 
-   if (!cwq->freezeable)
-   current->flags |= PF_NOFREEZE;
+   current->flags |= PF_NOFREEZE;
 
set_user_nice(current, -5);
 
@@ -376,9 +374,6 @@ static int worker_thread(void *__cwq)
 
set_current_state(TASK_INTERRUPTIBLE);
while (!kthread_should_stop()) {
-   if (cwq->freezeable)
-   try_to_freeze();
-
add_wait_queue(&cwq->more_work, &wait);
if (list_empty(&cwq->worklist))
schedule();
@@ -454,8 +449,8 @@ void fastcall flush_workqueue(struct wor
 }
 EXPORT_SYMBOL_GPL(flush_workqueue);
 
-static struct task_struct *create_workqueue_thread(struct workqueue_struct *wq,
-  int cpu, int freezeable)
+static struct task_struct
+*create_workqueue_thread(struct workqueue_struct *wq, int cpu)
 {
struct cpu_workqueue_struct *cwq = per_cpu_ptr(wq->cpu_wq, cpu);
struct task_struct *p;
@@ -465,7 +460,6 @@ static struct task_struct *create_workqu
cwq->thread = NULL;
cwq->insert_sequence = 0;
cwq->remove_sequence = 0;
-   cwq->freezeable = freezeable;
INIT_LIST_HEAD(&cwq->worklist);
init_waitqueue_head(&cwq->more_work);
init_waitqueue_head(&cwq->work_done);
@@ -480,8 +474,7 @@ static struct task_struct *create_workqu
return p;
 }
 
-struct workqueue_struct *__create_workqueue(const char *name,
-   int singlethread, int freezeable)
+struct workqueue_struct *__create_workqueue(const char *name, int singlethread)
 {
int cpu, destroy = 0;
struct workqueue_struct *wq;
@@ -501,7 +494,7 @@ struct workqueue_struct *__create_workqu
mutex_lock(&workqueue_mutex);
if (singlethread) {
INIT_LIST_HEAD(&wq->list);
-   p = create_workqueue_thread(wq, singlethread_cpu, freezeable);
+   p = create_workqueue_thread(wq, singlethread_cpu);
if (!p)
destroy = 1;
else
@@ -509,7 +502,7 @@ struct workqueue_struct *__create_workqu
} else {
list_add(&wq->list, &workqueues);
for_each_online_cpu(cpu) {
-   p = create_workqueue_thread(wq, cpu, freezeable);
+   p = create_workqueue_thread(wq, cpu);
if (p) {
kthread_bind(p, cpu);
wake_up_process(p);
@@ -760,7 +753,7 @@ static int __devinit workqueue_cpu_callb
mutex_lock(&workqueue_mutex);
/* Create a new workqueue thread for it. */
list_for_each_entry(wq, &workqueues, list) {
-   if (!create_workqueue_thread(wq, hotcpu, 0)) {
+   if (!create_workqueue_thread(wq, hotcpu)) {
printk("workqueue for %i failed\n", hotcpu);
return NOTIFY_BAD;
}
Index: linux-2.6.21-rc2/include/linux/workqueue.h
===
--- linux-2.6.21-rc2.orig/include/linux/workqueue.h
+++ linux-2.6.21-rc2/include/linux/workqueue.h
@@ 

Kernel Oops with shm namespace cleanups

2007-02-28 Thread Adam Litke
Hey.  While testing 2.6.21-rc2 with libhugetlbfs, the shm-fork test case
causes the kernel to oops.  To reproduce:  Execute 'make check' in the
latest libhugetlbfs source on a 2.6.21-rc2 kernel with 100 huge pages
allocated.  Using fewer huge pages will likely also trigger the oops.
Libhugetlbfs can be downloaded from:
http://libhugetlbfs.ozlabs.org/snapshots/libhugetlbfs-dev-20070228.tar.gz

I have collected the following information:

bc56bba8f31bd99f350a5ebfd43d50f411b620c7 is first bad commit
commit bc56bba8f31bd99f350a5ebfd43d50f411b620c7
Author: Eric W. Biederman <[EMAIL PROTECTED]>
Date:   Tue Feb 20 13:57:53 2007 -0800

[PATCH] shm: make sysv ipc shared memory use stacked files

 [ cut here ]
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=32 NUMA 
Modules linked in:
NIP: C002EA80 LR: C00A3F70 CTR: 6400
REGS: c0077967b770 TRAP: 0700   Not tainted  (2.6.20-g1df49008)
MSR: 80029032   CR: 28000448  XER: 
TASK = c0002f6737d0[3042] 'shm-fork' THREAD: c00779678000 CPU: 1
GPR00:  C0077967B9F0 C06725A0 C0002F94EC00 
GPR04: 93FD1000 93FD1000 0200 93FD1000 
GPR08: 0001  0001 0001 
GPR12: 48000444 C058BE00 FFEE8094  
GPR16: 0200 100AC5E8 100A 1008 
GPR20:  93FD1000 C0077FDBD088 C0002F94EC00 
GPR24: C0077FDBD088 0200 C0002F94EC00 93FD1000 
GPR28: C0077967BEA0 93FD1000 C05A2F58 C0077FDBD088 
NIP [C002EA80] .huge_pte_alloc+0x7c/0x1dc
LR [C00A3F70] .hugetlb_fault+0x48/0x150
Call Trace:
[C0077967B9F0] [C0077967BA80] 0xc0077967ba80 (unreliable)
[C0077967BAA0] [C00A3F70] .hugetlb_fault+0x48/0x150
[C0077967BB50] [C0094254] .__handle_mm_fault+0xa8/0x119c
[C0077967BC50] [C002A1E0] .do_page_fault+0x3a8/0x57c
[C0077967BE30] [C0004AFC] handle_page_fault+0x20/0x58
Instruction dump:
7820 7fa40040 409d0010 a00302be 7889c220 480c a00302bc 78892702 
7c004e30 780907e1 40820008 3961 <0b0b> e922adb8 3800 ebda0048 
[ cut here ]
kernel BUG at /home/aglitke/git/linux-2.6/mm/hugetlb.c:375!
Oops: Exception in kernel mode, sig: 5 [#2]
SMP NR_CPUS=32 NUMA 
Modules linked in:
NIP: C00A3518 LR: C00A376C CTR: C006B348
REGS: c0077967ace0 TRAP: 0700   Not tainted  (2.6.20-g1df49008)
MSR: 80029032   CR: 42022442  XER: 
TASK = c0002f6737d0[3042] 'shm-fork' THREAD: c00779678000 CPU: 1
GPR00: 0018 C0077967AF60 C06725A0 C0077FDBD088 
GPR04: 93FD1000 F7FD1000 C0077FFA5A83 C0077FFEF6E0 
GPR08: 10013000 00FD1000 10013000 C0697EB0 
GPR12: 2200 C058BE00 10013000 10013000 
GPR16: 10013000   C0077967B120 
GPR20: F7FD1000  C40DBDD0 C0077FDBD088 
GPR24: 00EF9C340793 10013000 C0002F94EC00 C0077967AFD0 
GPR28: F7FD1000 93FD1000 C05A2F58 C0002F94EC00 
NIP [C00A3518] .__unmap_hugepage_range+0x68/0x264
LR [C00A376C] .unmap_hugepage_range+0x58/0xa0
Call Trace:
[C0077967AF60] [0001] 0x1 (unreliable)
[C0077967B020] [C00A376C] .unmap_hugepage_range+0x58/0xa0
[C0077967B0B0] [C0091464] .unmap_vmas+0x17c/0x954
[C0077967B210] [C0099488] .exit_mmap+0xa4/0x17c
[C0077967B2C0] [C004CB08] .mmput+0x60/0x160
[C0077967B360] [C0052E4C] .exit_mm+0x130/0x154
[C0077967B400] [C00535D8] .do_exit+0x238/0x964
[C0077967B4C0] [C0022AC4] .die+0x150/0x154
[C0077967B550] [C0022B10] ._exception+0x48/0x138
[C0077967B660] [C0023634] .program_check_exception+0x5cc/0x5e4
[C0077967B700] [C00046F4] program_check_common+0xf4/0x100
--- Exception: 700 at .huge_pte_alloc+0x7c/0x1dc
LR = .hugetlb_fault+0x48/0x150
[C0077967B9F0] [C0077967BA80] 0xc0077967ba80 (unreliable)
[C0077967BAA0] [C00A3F70] .hugetlb_fault+0x48/0x150
[C0077967BB50] [C0094254] .__handle_mm_fault+0xa8/0x119c
[C0077967BC50] [C002A1E0] .do_page_fault+0x3a8/0x57c
[C0077967BE30] [C0004AFC] handle_page_fault+0x20/0x58
Instruction dump:
fb610078 780957e3 ebe3 7c26 54001ffe 0b00 e97e8030 3921 
800b 7d290036 3929 7c894838 <0b09> 800b 3921 7d290036 
Fixing recursive fault but reboot is needed!
BUG: soft lockup detected on CPU#0!
Call Trace:
[C00779AD74C0] [C000F588] .show_stack+0x68/0x1b4 (unreliable)
[C00779AD7570] [C007C5E0] .softlockup_tick+0xec/0x140

Re: [patch 04/26] Xen-paravirt_ops: Add pagetable accessors to pack and unpack pagetable entries

2007-02-28 Thread Jeremy Fitzhardinge
Ingo Molnar wrote:
>> Yes, but it happens after asm/paravirt.h has already included some 
>> things, and it ends up causing problems.  paravirt.h still defines 
>> various stub functions in the !CONFIG_PARAVIRT case, so it needs to do 
>> the includes either way.
>> 
>
> hm, it then needs to be fixed first, instead of adding to the mess.
>   

OK, I've fixed this by hoisting all the native_* implementations into
pgtable.h.  In the !PARAVIRT case the normal macros directly use the
native_* functions, and in the PARAVIRT case they're used by the native
paravirt_ops.  This has the nice property of avoiding this specific
problem, and also generally removes code duplication.
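The arrangement described here can be sketched as follows: one set of native_* helpers, which the !PARAVIRT macros call directly and the native paravirt backend reaches through its ops table. The types and the pv_pte_ops structure are simplified, hypothetical stand-ins, not the actual pgtable.h or paravirt_ops definitions. Compiling with -DCONFIG_PARAVIRT selects the indirect path.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pteval_t;                /* simplified; real widths vary by config */
typedef struct { pteval_t pte; } pte_t;

/* Native implementations live in one header and serve both configurations. */
static inline pteval_t native_pte_val(pte_t pte) { return pte.pte; }
static inline pte_t native_make_pte(pteval_t val) { pte_t pte = { val }; return pte; }

#ifdef CONFIG_PARAVIRT
/* Hypothetical, cut-down ops table; the native backend points at native_*. */
struct pv_pte_ops {
    pteval_t (*val)(pte_t);
    pte_t (*make)(pteval_t);
};
static struct pv_pte_ops pv_ops = { native_pte_val, native_make_pte };
#define pte_val(x)  (pv_ops.val(x))
#define make_pte(v) (pv_ops.make(v))
#else
/* Without paravirt, the accessors call the native functions directly. */
#define pte_val(x)  native_pte_val(x)
#define make_pte(v) native_make_pte(v)
#endif
```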

J


Re: [patch 06/26] Xen-paravirt_ops: paravirt_ops: allocate a fixmap slot

2007-02-28 Thread Jeremy Fitzhardinge
Ingo Molnar wrote:
> fair enough. Please rename it to FIX_PARAVIRT_BOOTUP - you can still 
> rely on it being available later on too, but we'd like to give everyone 
> the right fundamental idea about this: it's meant to be a limited, 
> inflexible interface for bootstrap only.
>   

Will do.

J


Re: Problem with freezable workqueues

2007-02-28 Thread Oleg Nesterov
On 02/28, Rafael J. Wysocki wrote:
>
> On Wednesday, 28 February 2007 20:32, Oleg Nesterov wrote:
> > 
> > I am sorry, I lost track of this problem. As for 2.6.21,
> > create_freezeable_workqueue doesn't work and conflicts with suspend.
> > Why can't we remove it from XFS as you suggested before?
> 
> Yes, we can (preparing a patch).  I was just curious. :-)

OK, thanks.

We can (I think) do pretty much the same with some additional complications
in worker_thread() (check !cpu_online() after try_to_freeze() and break).

Oleg.



Re: [patch 01/22] update ctime and mtime for mmaped write

2007-02-28 Thread Peter Staubach

Miklos Szeredi wrote:

>> While these entry points do not actually modify the file itself,
>> as was pointed out, they are handy points at which the kernel gains
>> control and could actually notice that the contents of the file are
>> no longer the same as they were, ie. modified.
>>
>> From the operating system viewpoint, this is where the semantics of
>> modification to file contents via mmap differs from the semantics of
>> modification to file contents via write(2).
>>
>> It is desirable for the file times to be updated as quickly as
>> possible after the actual modification has occurred.
>
> I disagree.
>
> You don't worry about the timestamp being updated _during_ a large
> write() call, even though the file is constantly being modified.

No, but you do worry about the timestamps being updated after
every write() call, no matter how large or small.

> You think of write() as something instantaneous, while you think of
> writing to a shared mapping, then doing msync() as something taking a
> long time.  In actual fact both of these are basically equivalent
> operations, the differences being, that you can easily modify
> non-contiguous parts of a file with mmap, while you can't do that with
> write.  The disadvantage from mmap comes from the cost of setting up
> the page tables and handling the faults.
>
> Think of it this way:
>
>   shared mmap write + msync(MS_ASYNC)  ==  write()
>   msync(MS_ASYNC) + fsync()  ==  msync(MS_SYNC)

I don't believe that this is a valid characterization because the
changes to the contents of the file, made through the mmap'd region,
are immediately visible to any and all other applications accessing
the file.  Since the contents of the file are changing, then so
should the timestamps to reflect this.

>> A better design for all of this would be to update the file times
>> and mark the inode as needing to be written out when a page fault
>> is taken for a page which either does not exist or needs to be made
>> writable and that page is part of an appropriate style mapping.
>
> I think this would just be a waste of CPU.


I think that we are going to have to agree to disagree because
I don't agree either with your characterizations of the desirable
semantics associated with shared mmap or that maintaining the
correctness in the system is a waste of CPU.

I view mmap as a way for an application to treat the contents of
a file as another segment in its address space.  This allows it to
manipulate the contents of a file without incurring the overhead
of the read and write system calls and the double buffering that
naturally occurs with those system calls.  I think that:

   char *p = mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
   *p = 1;
   *(p + 4096) = 2;

should have the same effect as:

   char c = 1;
   pwrite(fd, &c, 1, 0);
   c = 2;
   pwrite(fd, &c, 1, 4096);

Clearly, the two can't be equivalent since the operating system
can only become involved at certain times in order to update the
timestamps.  That's why there are specifications about the
timestamps for things like msync.  They should be as close as
possible though.

However, since I seem to be the only one presenting a different
viewpoint, then I will agree to disagree and commit.  I will see
if I can sell your semantics to my customer and find out if that
will satisfy them.

   Thanx...

  ps


[PATCH 1/1] - platform_kernel_launch_event is noop on generic kernel

2007-02-28 Thread John Keller
Add a missing #define for the platform_kernel_launch_event.
Without this fix, a call to platform_kernel_launch_event()
becomes a noop on generic kernels. SN systems require this
fix to successfully kdump/kexec from certain hardware errors.

Signed-off-by: John Keller <[EMAIL PROTECTED]>
---

Index: linux-2.6/include/asm-ia64/machvec.h
===
--- linux-2.6.orig/include/asm-ia64/machvec.h   2007-02-28 08:39:45.764537727 -0600
+++ linux-2.6/include/asm-ia64/machvec.h 2007-02-28 08:40:01.254467899 -0600
@@ -168,6 +168,7 @@ extern void machvec_tlb_migrate_finish (
 #  define platform_setup_msi_irq   ia64_mv.setup_msi_irq
 #  define platform_teardown_msi_irqia64_mv.teardown_msi_irq
 #  define platform_pci_fixup_bus   ia64_mv.pci_fixup_bus
+#  define platform_kernel_launch_event ia64_mv.kernel_launch_event
 # endif
 
 /* __attribute__((__aligned__(16))) is required to make size of the
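Here is a simplified model of the failure mode. On a generic kernel, each platform_* name must be routed through the runtime machine vector, and a name with no such #define falls back to a default. The structure, the default-to-machvec_noop fallback and the sn_kernel_launch_event() stand-in below are assumptions modeled loosely on the patch description, not copied from machvec.h.

```c
#include <assert.h>

struct machine_vector {
    void (*kernel_launch_event)(void);
};

static int launch_events;                 /* counts calls that reached the platform hook */

void machvec_noop(void) { }               /* default used when nothing else is defined */
static void sn_kernel_launch_event(void) { launch_events++; }

/* In a generic kernel the vector is filled in at boot for the detected platform. */
static struct machine_vector ia64_mv = { sn_kernel_launch_event };

/* The fix: route the generic name through the runtime vector.
 * Delete this line to reproduce the bug. */
#define platform_kernel_launch_event ia64_mv.kernel_launch_event

/* Fallback applied when no definition exists for the name. */
#ifndef platform_kernel_launch_event
# define platform_kernel_launch_event machvec_noop
#endif
```

Removing the #define line reproduces the reported behavior: the call still compiles, but it reaches machvec_noop() instead of the platform hook.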


Re: [PATCH 0/5] Add LZO Compression

2007-02-28 Thread Artem Bityutskiy
On Wed, 2007-02-28 at 19:13 +, Richard Purdie wrote:
> The following patch series adds LZO compression support to the kernel
> and exposes it in a variety of places (jffs2, crypto).
> 
> This is particularly useful for jffs2 where significant boot time
> speedups (~10%) and file read speed improvements (~40%) are seen when
> it's used with only a slight drop in file compression ratio.

Provided the figures are accurate, this is very good stuff.

-- 
Best regards,
Artem Bityutskiy (Битюцкий Артём)



Re: lanana: Add major/minor entries for PPC QE UART devices

2007-02-28 Thread Segher Boessenkool

>> Another option is to use 46..49 for UARTs #0..3,
>> and 192..195 for UARTs #4..7.
>> Or, perhaps better, use 46..49 for #0..3, and
>> 192..199 for #0..7, handling the duplication in
>> the driver; and deprecate the old range.
>
> That sounds like more hassle than it's worth.  The discontinuous range
> may be annoying, but it isn't really a huge amount of code.

Yeah.  My suggestion would allow us to get rid of that
extra code some day, though (but sure, is that worth
it?)


Segher
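To gauge how much code the discontinuous range costs, here is a sketch of the index-to-minor mapping for the first option (46..49 for UARTs #0..3, 192..195 for #4..7), plus its inverse. The function names are hypothetical. The major number is left out because the thread has not settled it.

```c
#include <assert.h>

/* UARTs #0..3 use minors 46..49, UARTs #4..7 use minors 192..195.
 * Returns -1 for an index that has no minor. */
static int qe_uart_minor(unsigned int idx)
{
    if (idx < 4)
        return 46 + (int)idx;
    if (idx < 8)
        return 192 + (int)(idx - 4);
    return -1;
}

/* Inverse mapping, e.g. to find the port behind an opened minor. */
static int qe_uart_index(unsigned int minor)
{
    if (minor >= 46 && minor <= 49)
        return (int)(minor - 46);
    if (minor >= 192 && minor <= 195)
        return (int)(minor - 192) + 4;
    return -1;
}
```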



Re: lanana: Add major/minor entries for PPC QE UART devices

2007-02-28 Thread Timur Tabi

Kumar Gala wrote:

Eh, I'm not crazy about that.  That means that I have to complicate my 
driver because someone else screwed up a long time ago.


If not you, someone else.  The cost in the driver is small compared to
fixing up all the distros and such.  If you don't provide this change,
someone else will.


*sigh*

What about major number 205?  It also has the screwed-up /dev/ttyCPM entries, but it has 
more room, and the CPM driver doesn't actually use it.  At least, I can't see where it 
uses it.


--
Timur Tabi
Linux Kernel Developer @ Freescale


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-02-28 Thread Davide Libenzi
On Wed, 28 Feb 2007, Chris Friesen wrote:

> Davide Libenzi wrote:
> 
> > struct async_syscall {
> > 	unsigned long nr_sysc;
> > 	unsigned long params[8];
> > 	long *result;
> > };
> > 
> > And what would async_wait() return? Pointers to "struct async_syscall"
> > or pointers to "result"?
> 
> Either one has downsides.  Pointer to struct async_syscall requires that the
> caller keep the struct around.  Pointer to result requires that the caller
> always reserve a location for the result.
> 
> Does the kernel care about the (possibly rare) case of callers that don't want
> to pay attention to result?  If so, what about adding some kind of
> caller-specified handle to struct async_syscall, and having async_wait()
> return the handle?  In the case where the caller does care about the result,
> the handle could just be the address of result.

Something like this (with async_wait() returning asynids)?

struct async_syscall {
	long *result;
	unsigned long asynid;
	unsigned long nr_sysc;
	unsigned long params[8];
};
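To make the handle idea concrete, here is a minimal userspace sketch of how a caller might use asynid. Everything except the struct layout is hypothetical: fake_complete() and fake_async_wait() simulate the kernel side with a small in-process queue and are not part of the syslet patches. A caller that wants results can pass the address of its result slot as the asynid; one that does not can leave result NULL and use any unique cookie.

```c
#include <assert.h>
#include <stddef.h>

/* Layout as proposed above; async_wait() would hand back asynid values. */
struct async_syscall {
	long *result;          /* may be NULL if the caller ignores the result */
	unsigned long asynid;  /* caller-chosen completion handle */
	unsigned long nr_sysc;
	unsigned long params[8];
};

/*
 * Hypothetical stand-in for the kernel side: "completing" a call stores
 * its return value (if wanted) and queues its asynid.
 */
#define MAX_DONE 16
static unsigned long done_ids[MAX_DONE];
static size_t n_done;

static void fake_complete(struct async_syscall *as, long ret)
{
	assert(n_done < MAX_DONE);
	if (as->result)
		*as->result = ret;
	done_ids[n_done++] = as->asynid;
}

/* Hypothetical async_wait(): copy out up to max completed handles. */
static size_t fake_async_wait(unsigned long *ids, size_t max)
{
	size_t i, n = n_done < max ? n_done : max;

	for (i = 0; i < n; i++)
		ids[i] = done_ids[i];
	/* keep completions that did not fit for the next call */
	for (i = n; i < n_done; i++)
		done_ids[i - n] = done_ids[i];
	n_done -= n;
	return n;
}
```

With this shape the kernel never needs to know whether a handle is a pointer; interpreting it is entirely up to the caller.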



- Davide




Re: [PATCH 5/5] jffs2: Allow selection of compression mode via a sysfs attribute

2007-02-28 Thread Artem Bityutskiy
Hi Richard,

On Wed, 2007-02-28 at 19:13 +, Richard Purdie wrote:
> +/* gives us jffs2_subsys */
> +static decl_subsys(jffs2, NULL, NULL);

There is actually a file-system subsys already: look for fs_subsys. It is
declared in fs/namespace.c.

-- 
Best regards,
Artem Bityutskiy (Битюцкий Артём)



Re: Problem with freezable workqueues

2007-02-28 Thread Rafael J. Wysocki
On Wednesday, 28 February 2007 20:32, Oleg Nesterov wrote:
> On 02/28, Rafael J. Wysocki wrote:
> >
> > > --- workqueue.c.org   2007-02-28 18:32:48.0 +0530
> > > +++ workqueue.c   2007-02-28 18:44:23.0 +0530
> > > @@ -718,6 +718,8 @@ static void cleanup_workqueue_thread(str
> > >   insert_wq_barrier(cwq, , 1);
> > >   cwq->should_stop = 1;
> > >   alive = 1;
> > > + if (frozen(cwq->thread))
> > > + thaw(cwq->thread);
> > >   }
> > >   spin_unlock_irq(>lock);
> >
> > Unfortunately, the above code is mm-only.  Is the analogous fix for 2.6.21-rc2
> > viable?
> 
> I am sorry, I lost track of this problem. As for 2.6.21, create_freezeable_workqueue
> doesn't work and conflicts with suspend. Why can't we remove it from XFS as you
> suggested before?

Yes, we can (preparing a patch).  I was just curious. :-)

Rafael


[PATCH 3/5] crypto: Add LZO compression support to the crypto interface

2007-02-28 Thread Richard Purdie
Add LZO1X compression support to the crypto interface, including
a couple of tests.

Also convert test_deflate into a more generic test_compress() and
avoid duplicating the data for compression and decompression tests
since this can always work both ways in the compression case.
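The point of a generic test_compress() is that one set of plaintext vectors can exercise both directions: compress, decompress, and compare against the input. A minimal userspace sketch of that round trip follows; the toy run-length codec and every name here are hypothetical stand-ins, not tcrypt.c or the LZO API, and the function-pointer signature only loosely follows the coa_compress/coa_decompress hooks in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Loosely modelled on the coa_compress/coa_decompress signatures. */
typedef int (*codec_fn)(const unsigned char *src, unsigned int slen,
			unsigned char *dst, unsigned int *dlen);

/* Toy RLE (hypothetical stand-in): output is (count, byte) pairs. */
static int rle_compress(const unsigned char *src, unsigned int slen,
			unsigned char *dst, unsigned int *dlen)
{
	unsigned int i = 0, o = 0;

	while (i < slen) {
		unsigned int run = 1;

		while (i + run < slen && run < 255 && src[i + run] == src[i])
			run++;
		if (o + 2 > *dlen)
			return -1;	/* output buffer too small */
		dst[o++] = (unsigned char)run;
		dst[o++] = src[i];
		i += run;
	}
	*dlen = o;
	return 0;
}

static int rle_decompress(const unsigned char *src, unsigned int slen,
			  unsigned char *dst, unsigned int *dlen)
{
	unsigned int i, o = 0;

	if (slen % 2)
		return -1;	/* malformed input */
	for (i = 0; i < slen; i += 2) {
		if (o + src[i] > *dlen)
			return -1;	/* would overflow dst */
		memset(dst + o, src[i + 1], src[i]);
		o += src[i];
	}
	*dlen = o;
	return 0;
}

/* Round-trip one plaintext vector; returns 0 on success. */
static int test_compress_one(codec_fn comp, codec_fn decomp,
			     const unsigned char *in, unsigned int len)
{
	unsigned char cbuf[512], dbuf[256];
	unsigned int clen = sizeof(cbuf), dlen = sizeof(dbuf);

	assert(comp && decomp);
	if (comp(in, len, cbuf, &clen))
		return -1;
	if (decomp(cbuf, clen, dbuf, &dlen))
		return -1;
	return (dlen == len && memcmp(dbuf, in, len) == 0) ? 0 : -1;
}
```

Because decompression is checked against the original input, a separate set of "expected compressed bytes" is not needed.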

Signed-off-by: Richard Purdie <[EMAIL PROTECTED]>

---
 crypto/Kconfig  |8 +++
 crypto/Makefile |1 
 crypto/lzo.c|  120 
 crypto/tcrypt.c |   43 
 crypto/tcrypt.h |   75 +++
 5 files changed, 190 insertions(+), 57 deletions(-)

Index: linux/crypto/Kconfig
===
--- linux.orig/crypto/Kconfig   2007-02-28 18:12:17.0 +
+++ linux/crypto/Kconfig2007-02-28 18:12:32.0 +
@@ -406,6 +406,14 @@ config CRYPTO_DEFLATE
  
  You will most probably want this if using IPSec.
 
+config CRYPTO_LZO
+   tristate "LZO compression algorithm"
+   depends on CRYPTO
+   select LZO
+   help
+ Enable use of the LZO compression algorithm through the crypto
+ subsystem.
+
 config CRYPTO_MICHAEL_MIC
tristate "Michael MIC keyed digest algorithm"
select CRYPTO_ALGAPI
Index: linux/crypto/Makefile
===
--- linux.orig/crypto/Makefile  2007-02-28 18:12:17.0 +
+++ linux/crypto/Makefile   2007-02-28 18:12:32.0 +
@@ -44,6 +44,7 @@ obj-$(CONFIG_CRYPTO_TEA) += tea.o
 obj-$(CONFIG_CRYPTO_KHAZAD) += khazad.o
 obj-$(CONFIG_CRYPTO_ANUBIS) += anubis.o
 obj-$(CONFIG_CRYPTO_DEFLATE) += deflate.o
+obj-$(CONFIG_CRYPTO_LZO) += lzo.o
 obj-$(CONFIG_CRYPTO_MICHAEL_MIC) += michael_mic.o
 obj-$(CONFIG_CRYPTO_CRC32C) += crc32c.o
 
Index: linux/crypto/lzo.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux/crypto/lzo.c  2007-02-28 18:12:32.0 +
@@ -0,0 +1,120 @@
+/*
+ * Cryptographic API for LZO compression.
+ *
+ * Copyright (C) 2007 Nokia Corporation. All rights reserved.
+ *
+ * Author: Richard Purdie <[EMAIL PROTECTED]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
+ * 02110-1301 USA
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct lzo_ctx {
+   void *lzo_mem;
+};
+
+static int lzo_init(struct crypto_tfm *tfm)
+{
+   struct lzo_ctx *ctx = crypto_tfm_ctx(tfm);
+
+   ctx->lzo_mem = vmalloc(LZO1X_MEM_COMPRESS);
+
+   if (!ctx->lzo_mem) {
+   vfree(ctx->lzo_mem);
+   return -ENOMEM;
+   }
+
+   return 0;
+}
+
+static void lzo_exit(struct crypto_tfm *tfm)
+{
+   struct lzo_ctx *ctx = crypto_tfm_ctx(tfm);
+
+   vfree(ctx->lzo_mem);
+}
+
+static int lzo_compress(struct crypto_tfm *tfm, const u8 *src,
+   unsigned int slen, u8 *dst, unsigned int *dlen)
+{
+   struct lzo_ctx *ctx = crypto_tfm_ctx(tfm);
+   unsigned long compress_size;
+   int ret;
+
+   /* Check if enough space in dst buffer for worst case expansion */
+   if (*dlen < lzo1x_worst_compress(slen))
+   return -EINVAL;
+
+   ret = lzo1x_1_compress(src, slen, dst, &compress_size, ctx->lzo_mem);
+
+   if (ret != LZO_E_OK)
+   return -EINVAL;
+
+   *dlen = compress_size;
+
+   return 0;
+}
+
+static int lzo_decompress(struct crypto_tfm *tfm, const u8 *src,
+ unsigned int slen, u8 *dst, unsigned int *dlen)
+{
+   unsigned long tmp_len = *dlen; /* lzo length type may be wider than uint */
+   int ret;
+
+   ret = lzo1x_decompress_safe(src, slen, dst, &tmp_len, NULL);
+   if (ret != LZO_E_OK)
+   return -EINVAL;
+   *dlen = tmp_len;
+   return 0;
+}
+
+static struct crypto_alg alg = {
+   .cra_name   = "lzo1x",
+   .cra_flags  = CRYPTO_ALG_TYPE_COMPRESS,
+   .cra_ctxsize= sizeof(struct lzo_ctx),
+   .cra_module = THIS_MODULE,
+   .cra_list   = LIST_HEAD_INIT(alg.cra_list),
+   .cra_init   = lzo_init,
+   .cra_exit   = lzo_exit,
+   .cra_u  = { .compress = {
+   .coa_compress   = lzo_compress,
+   .coa_decompress = lzo_decompress } }
+};
+
+static int __init init(void)
+{
+   return crypto_register_alg(&alg);

[PATCH 5/5] jffs2: Allow selection of compression mode via a sysfs attribute

2007-02-28 Thread Richard Purdie
Allow selection of the compression mode for jffs2 via a sysfs 
attribute. This establishes a sysfs presence for jffs2 through
which other compression options could easily be exported too.
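Writes to a sysfs attribute usually arrive with a trailing newline (for example from `echo favourlzo > mode`), which is presumably why the patch matches names with strncmp() and a fixed length. A side effect is that any string that merely starts with a mode name, such as "sizeable", is accepted too. A minimal userspace sketch of an exact match that still tolerates one trailing newline; the enum and function names are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical mirror of the JFFS2_COMPR_MODE_* constants. */
enum cmode { CMODE_NONE, CMODE_PRIORITY, CMODE_SIZE, CMODE_FAVOURLZO };

static const struct {
	const char *name;
	enum cmode mode;
} cmodes[] = {
	{ "none", CMODE_NONE },
	{ "priority", CMODE_PRIORITY },
	{ "size", CMODE_SIZE },
	{ "favourlzo", CMODE_FAVOURLZO },
};

/*
 * Parse a sysfs-style buffer of length count: accept an exact mode name,
 * optionally followed by a single '\n'; return -EINVAL otherwise.
 */
static int parse_cmode(const char *buf, size_t count, enum cmode *out)
{
	size_t i;

	assert(buf && out);
	if (count && buf[count - 1] == '\n')
		count--;
	for (i = 0; i < sizeof(cmodes) / sizeof(cmodes[0]); i++) {
		if (strlen(cmodes[i].name) == count &&
		    !memcmp(cmodes[i].name, buf, count)) {
			*out = cmodes[i].mode;
			return 0;
		}
	}
	return -EINVAL;
}
```

A single name table like this could also drive the show side, so the two directions cannot drift apart.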

Signed-off-by: Richard Purdie <[EMAIL PROTECTED]>

---
 fs/jffs2/compr.c |  131 +++
 1 file changed, 94 insertions(+), 37 deletions(-)

Index: linux/fs/jffs2/compr.c
===
--- linux.orig/fs/jffs2/compr.c 2007-02-28 18:12:33.0 +
+++ linux/fs/jffs2/compr.c  2007-02-28 18:12:33.0 +
@@ -13,6 +13,7 @@
  *
  */
 
+#include 
 #include "compr.h"
 
 static DEFINE_SPINLOCK(jffs2_compressor_list_lock);
@@ -298,6 +299,43 @@ int jffs2_unregister_compressor(struct j
 return 0;
 }
 
+char *jffs2_get_compression_mode_name(void)
+{
+switch (jffs2_compression_mode) {
+case JFFS2_COMPR_MODE_NONE:
+return "none";
+case JFFS2_COMPR_MODE_PRIORITY:
+return "priority";
+case JFFS2_COMPR_MODE_SIZE:
+return "size";
+   case JFFS2_COMPR_MODE_FAVOURLZO:
+   return "favourlzo";
+}
+return "unknown";
+}
+
+int jffs2_set_compression_mode_name(const char *name)
+{
+if (!strncmp("none", name, 4)) {
+jffs2_compression_mode = JFFS2_COMPR_MODE_NONE;
+return 0;
+}
+if (!strncmp("priority", name, 8)) {
+jffs2_compression_mode = JFFS2_COMPR_MODE_PRIORITY;
+return 0;
+}
+if (!strncmp("size", name, 4)) {
+jffs2_compression_mode = JFFS2_COMPR_MODE_SIZE;
+return 0;
+}
+   if (!strncmp("favourlzo", name, 9)) {
+   jffs2_compression_mode = JFFS2_COMPR_MODE_FAVOURLZO;
+   return 0;
+   }
+return -EINVAL;
+}
+
+
 #ifdef CONFIG_JFFS2_PROC
 
 #define JFFS2_STAT_BUF_SIZE 16000
@@ -347,42 +385,6 @@ char *jffs2_stats(void)
 return buf;
 }
 
-char *jffs2_get_compression_mode_name(void)
-{
-switch (jffs2_compression_mode) {
-case JFFS2_COMPR_MODE_NONE:
-return "none";
-case JFFS2_COMPR_MODE_PRIORITY:
-return "priority";
-case JFFS2_COMPR_MODE_SIZE:
-return "size";
-case JFFS2_COMPR_MODE_FAVOURLZO:
-return "favourlzo";
-}
-return "unkown";
-}
-
-int jffs2_set_compression_mode_name(const char *name)
-{
-if (!strcmp("none",name)) {
-jffs2_compression_mode = JFFS2_COMPR_MODE_NONE;
-return 0;
-}
-if (!strcmp("priority",name)) {
-jffs2_compression_mode = JFFS2_COMPR_MODE_PRIORITY;
-return 0;
-}
-if (!strcmp("size",name)) {
-jffs2_compression_mode = JFFS2_COMPR_MODE_SIZE;
-return 0;
-}
-if (!strncmp("favourlzo", name, 9)) {
-jffs2_compression_mode = JFFS2_COMPR_MODE_FAVOURLZO;
-return 0;
-}
-return 1;
-}
-
 static int jffs2_compressor_Xable(const char *name, int disabled)
 {
 struct jffs2_compressor *this;
@@ -448,8 +450,54 @@ void jffs2_free_comprbuf(unsigned char *
 kfree(comprbuf);
 }
 
+static struct attribute jffs2_attr_mode = {
+   .name = "mode",
+   .mode = S_IRUGO | S_IWUSR,
+};
+
+static struct attribute *jffs2_attrs[] = {
+   &jffs2_attr_mode,
+   NULL,
+};
+
+static ssize_t jffs2_attr_show(struct kobject *kobj, struct attribute *attr,
+   char *page)
+{
+   if (!strcmp("mode", attr->name))
+   return sprintf(page, "%s\n", jffs2_get_compression_mode_name());
+   return 0;
+}
+
+static ssize_t jffs2_attr_store(struct kobject *kobj, struct attribute *attr,
+   const char *page, size_t count)
+{
+   int ret = -EINVAL;
+
+   if (!strcmp("mode", attr->name)) {
+   ret = jffs2_set_compression_mode_name(page);
+   if (ret >= 0)
+   return count;
+   }
+   return ret;
+}
+
+static struct sysfs_ops jffs2_sysfs_ops = {
+   .show   =   jffs2_attr_show,
+   .store  =   jffs2_attr_store,
+};
+
+static struct kobj_type jffs2_subsys_type = {
+   .default_attrs  = jffs2_attrs,
+   .sysfs_ops  = &jffs2_sysfs_ops,
+};
+
+/* gives us jffs2_subsys */
+static decl_subsys(jffs2, NULL, NULL);
+
 int __init jffs2_compressors_init(void)
 {
+   int ret;
+
 /* Registering compressors */
 #ifdef CONFIG_JFFS2_ZLIB
 jffs2_zlib_init();
@@ -481,12 +529,21 @@ int __init jffs2_compressors_init(void)
 #endif
 #endif
 #endif
+   /* Errors here are not fatal */
+   kset_set_kset_s(&jffs2_subsys, fs_subsys);
+   jffs2_subsys.kset.kobj.ktype = &jffs2_subsys_type;
+   ret = subsystem_register(&jffs2_subsys);
+   if (ret)
+   printk(KERN_WARNING "Error registering 

[PATCH 4/5] jffs2: Add a "favourlzo" compression mode to jffs2

2007-02-28 Thread Richard Purdie
Add a "favourlzo" compression mode to jffs2 which tries to 
optimise by size but gives lzo an advantage when comparing sizes.
This means the faster lzo algorithm can be preferred when there
isn't much difference in compressed size (the exact threshold can
be changed).
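A standalone model of the FAVOURLZO decision made by jffs2_is_best_compression() in the patch, handy for seeing how the threshold behaves. FAVOUR_LZO_PERCENT is defined in compr.h, which is not shown here; the value 80 below is an assumption for illustration only, and the compressor IDs are hypothetical simplifications.

```c
#include <assert.h>
#include <stdint.h>

#define FAVOUR_LZO_PERCENT 80	/* assumed value; the real one is in compr.h */

enum compr_id { COMPR_ZLIB, COMPR_LZO, COMPR_RTIME };	/* hypothetical IDs */

/* Return 1 if candidate `this` (size bytes) should replace `best` (bestsize). */
static int favourlzo_is_better(enum compr_id this, uint32_t size,
			       enum compr_id best, uint32_t bestsize)
{
	/* Plain size comparison if LZO is the candidate or not the incumbent. */
	if (this == COMPR_LZO && bestsize > size)
		return 1;
	if (best != COMPR_LZO && bestsize > size)
		return 1;
	/* An LZO candidate also wins if it is within the allowance. */
	if (this == COMPR_LZO && bestsize > size * FAVOUR_LZO_PERCENT / 100)
		return 1;
	/* In practice: a non-LZO candidate must beat an LZO incumbent by the margin. */
	if (bestsize * FAVOUR_LZO_PERCENT / 100 > size)
		return 1;
	return 0;
}

/* Worked examples with the assumed 80% allowance. */
static void favourlzo_examples(void)
{
	/* zlib 100 bytes vs LZO 110: LZO is less than 25% larger, so it wins. */
	assert(favourlzo_is_better(COMPR_LZO, 110, COMPR_ZLIB, 100));
	/* zlib 100 vs LZO 130: too far behind, zlib stays. */
	assert(!favourlzo_is_better(COMPR_LZO, 130, COMPR_ZLIB, 100));
	/* LZO 110 vs zlib: zlib must get below 110 * 80 / 100 = 88 to win. */
	assert(!favourlzo_is_better(COMPR_ZLIB, 100, COMPR_LZO, 110));
	assert(favourlzo_is_better(COMPR_ZLIB, 80, COMPR_LZO, 110));
}
```

With an 80% setting, LZO is kept whenever it is less than 25% larger than the best alternative.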

Signed-off-by: Richard Purdie <[EMAIL PROTECTED]>

---
 fs/Kconfig   |7 +++
 fs/jffs2/compr.c |   51 ++-
 fs/jffs2/compr.h |3 +++
 3 files changed, 56 insertions(+), 5 deletions(-)

Index: linux/fs/Kconfig
===
--- linux.orig/fs/Kconfig   2007-02-28 18:12:31.0 +
+++ linux/fs/Kconfig2007-02-28 18:12:33.0 +
@@ -1359,6 +1359,13 @@ config JFFS2_CMODE_SIZE
   Tries all compressors and chooses the one which has the smallest
   result.
 
+config JFFS2_CMODE_FAVOURLZO
+bool "Favour LZO"
+help
+  Tries all compressors and chooses the one which has the smallest
+  result but gives some preference to LZO (which has faster
+  decompression) at the expense of size.
+
 endchoice
 
 config CRAMFS
Index: linux/fs/jffs2/compr.c
===
--- linux.orig/fs/jffs2/compr.c 2007-02-28 18:12:31.0 +
+++ linux/fs/jffs2/compr.c  2007-02-28 18:13:09.0 +
@@ -26,6 +26,34 @@ static int jffs2_compression_mode = JFFS
 /* Statistics for blocks stored without compression */
 static uint32_t none_stat_compr_blocks=0,none_stat_decompr_blocks=0,none_stat_compr_size=0;
 
+
+/*
+ * Return 1 to use this compression
+ */
+static int jffs2_is_best_compression(struct jffs2_compressor *this,
+   struct jffs2_compressor *best, uint32_t size, uint32_t bestsize)
+{
+   switch (jffs2_compression_mode) {
+   case JFFS2_COMPR_MODE_SIZE:
+   if (bestsize > size)
+   return 1;
+   return 0;
+   case JFFS2_COMPR_MODE_FAVOURLZO:
+   if ((this->compr == JFFS2_COMPR_LZO) && (bestsize > size))
+   return 1;
+   if ((best->compr != JFFS2_COMPR_LZO) && (bestsize > size))
+   return 1;
+   if ((this->compr == JFFS2_COMPR_LZO) && (bestsize > (size * FAVOUR_LZO_PERCENT / 100)))
+   return 1;
+   return 1;
+   if ((bestsize * FAVOUR_LZO_PERCENT / 100) > size)
+   return 1;
+
+   return 0;
+   }
+   /* Shouldn't happen */
+   return 0;
+}
+
 /* jffs2_compress:
  * @data: Pointer to uncompressed data
  * @cdata: Pointer to returned pointer to buffer for compressed data
@@ -91,6 +119,7 @@ uint16_t jffs2_compress(struct jffs2_sb_
 if (ret == JFFS2_COMPR_NONE) kfree(output_buf);
 break;
 case JFFS2_COMPR_MODE_SIZE:
+case JFFS2_COMPR_MODE_FAVOURLZO:
 orig_slen = *datalen;
 orig_dlen = *cdatalen;
 spin_lock(&jffs2_compressor_list_lock);
@@ -99,7 +128,7 @@ uint16_t jffs2_compress(struct jffs2_sb_
 if ((!this->compress)||(this->disabled))
 continue;
 /* Allocating memory for output buffer if necessary */
-if 
((this->compr_buf_sizecompr_buf)) {
+if 
((this->compr_buf_sizecompr_buf)) {
 spin_unlock(&jffs2_compressor_list_lock);
 kfree(this->compr_buf);
 spin_lock(&jffs2_compressor_list_lock);
@@ -108,15 +137,15 @@ uint16_t jffs2_compress(struct jffs2_sb_
 }
 if (!this->compr_buf) {
 spin_unlock(&jffs2_compressor_list_lock);
-tmp_buf = kmalloc(orig_dlen,GFP_KERNEL);
+tmp_buf = kmalloc(orig_slen,GFP_KERNEL);
 spin_lock(&jffs2_compressor_list_lock);
 if (!tmp_buf) {
-printk(KERN_WARNING "JFFS2: No memory for compressor allocation. (%d bytes)\n",orig_dlen);
+printk(KERN_WARNING "JFFS2: No memory for compressor allocation. (%d bytes)\n",orig_slen);
 continue;
 }
 else {
 this->compr_buf = tmp_buf;
-this->compr_buf_size = orig_dlen;
+this->compr_buf_size = orig_slen;
 }
 }
 this->usecount++;
@@ -127,7 +156,8 @@ uint16_t jffs2_compress(struct 

[PATCH 2/5] jffs2: Add LZO compression support to jffs2

2007-02-28 Thread Richard Purdie
Add LZO1X compression/decompression support to jffs2.

LZO's interface doesn't entirely match the one jffs2 requires, so an
intermediate buffer and a memcpy are unavoidable.
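LZO needs room for its worst-case expansion, while jffs2 hands the compressor an output buffer (*dstlen) that may be smaller than that. A minimal single-threaded sketch of the bounce-buffer pattern follows (the patch additionally serializes access to its shared buffer with a mutex). It assumes the worst-case bound used by the userspace LZO library, input + input/16 + 64 + 3; toy_compress() is a hypothetical stand-in that never shrinks its input, so the fit check is easy to see.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed worst-case LZO1X output size for x input bytes. */
#define WORST_COMPRESS(x) ((x) + ((x) / 16) + 64 + 3)
#define PAGE_SZ 4096

/* Scratch buffer sized for the worst case of one page, like compr_lzo.c. */
static unsigned char bounce[WORST_COMPRESS(PAGE_SZ)];

/* Hypothetical compressor: 4-byte length header plus a raw copy. */
static int toy_compress(const unsigned char *in, uint32_t inlen,
			unsigned char *out, uint32_t *outlen)
{
	memcpy(out, &inlen, sizeof(inlen));
	memcpy(out + sizeof(inlen), in, inlen);
	*outlen = inlen + sizeof(inlen);
	return 0;
}

/*
 * Compress into the worst-case scratch buffer, then copy out only if the
 * result fits the caller's buffer (same shape as jffs2_lzo_compress()).
 */
static int bounce_compress(const unsigned char *in, uint32_t srclen,
			   unsigned char *out, uint32_t *dstlen)
{
	uint32_t clen;

	assert(srclen <= PAGE_SZ);
	if (toy_compress(in, srclen, bounce, &clen))
		return -1;
	if (clen > *dstlen)
		return -1;	/* result does not fit; caller must cope */
	memcpy(out, bounce, clen);
	*dstlen = clen;
	return 0;
}
```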

Signed-off-by: Richard Purdie <[EMAIL PROTECTED]>

---
 fs/Kconfig|   10 
 fs/jffs2/Makefile |1 
 fs/jffs2/compr.c  |6 ++
 fs/jffs2/compr.h  |3 -
 fs/jffs2/compr_lzo.c  |  120 ++
 include/linux/jffs2.h |1 
 6 files changed, 140 insertions(+), 1 deletion(-)

Index: linux/fs/Kconfig
===
--- linux.orig/fs/Kconfig   2007-02-28 18:12:17.0 +
+++ linux/fs/Kconfig2007-02-28 18:13:10.0 +
@@ -1310,6 +1310,16 @@ config JFFS2_ZLIB
 
   Say 'Y' if unsure.
 
+config JFFS2_LZO
+   bool "JFFS2 LZO compression support" if JFFS2_COMPRESSION_OPTIONS
+   select LZO
+   depends on JFFS2_FS
+   default y
+help
+  minilzo-based compression. Generally decompresses faster than zlib, at the cost of a slightly lower compression ratio.
+
+  Say 'Y' if unsure.
+
 config JFFS2_RTIME
bool "JFFS2 RTIME compression support" if JFFS2_COMPRESSION_OPTIONS
depends on JFFS2_FS
Index: linux/fs/jffs2/Makefile
===
--- linux.orig/fs/jffs2/Makefile2007-02-28 18:12:17.0 +
+++ linux/fs/jffs2/Makefile 2007-02-28 18:12:31.0 +
@@ -18,4 +18,5 @@ jffs2-$(CONFIG_JFFS2_FS_POSIX_ACL)+= ac
 jffs2-$(CONFIG_JFFS2_RUBIN)+= compr_rubin.o
 jffs2-$(CONFIG_JFFS2_RTIME)+= compr_rtime.o
 jffs2-$(CONFIG_JFFS2_ZLIB) += compr_zlib.o
+jffs2-$(CONFIG_JFFS2_LZO)  += compr_lzo.o
 jffs2-$(CONFIG_JFFS2_SUMMARY)   += summary.o
Index: linux/fs/jffs2/compr.c
===
--- linux.orig/fs/jffs2/compr.c 2007-02-28 18:12:17.0 +
+++ linux/fs/jffs2/compr.c  2007-02-28 18:13:10.0 +
@@ -425,6 +425,9 @@ int __init jffs2_compressors_init(void)
 jffs2_rubinmips_init();
 jffs2_dynrubin_init();
 #endif
+#ifdef CONFIG_JFFS2_LZO
+jffs2_lzo_init();
+#endif
 /* Setting default compression mode */
 #ifdef CONFIG_JFFS2_CMODE_NONE
 jffs2_compression_mode = JFFS2_COMPR_MODE_NONE;
@@ -443,6 +446,9 @@ int __init jffs2_compressors_init(void)
 int jffs2_compressors_exit(void)
 {
 /* Unregistering compressors */
+#ifdef CONFIG_JFFS2_LZO
+jffs2_lzo_exit();
+#endif
 #ifdef CONFIG_JFFS2_RUBIN
 jffs2_dynrubin_exit();
 jffs2_rubinmips_exit();
Index: linux/fs/jffs2/compr_lzo.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux/fs/jffs2/compr_lzo.c  2007-02-28 18:12:31.0 +
@@ -0,0 +1,120 @@
+/*
+ * JFFS2 LZO Compression Interface
+ *
+ * Copyright (C) 2007 Nokia Corporation. All rights reserved.
+ *
+ * Author: Richard Purdie <[EMAIL PROTECTED]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
+ * 02110-1301 USA
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "compr.h"
+
+static void *lzo_mem;
+static void *lzo_compress_buf;
+static DEFINE_MUTEX(deflate_mutex);
+
+static void free_workspace(void)
+{
+   vfree(lzo_mem);
+   vfree(lzo_compress_buf);
+}
+
+static int __init alloc_workspace(void)
+{
+   lzo_mem = vmalloc(LZO1X_MEM_COMPRESS);
+   lzo_compress_buf = vmalloc(lzo1x_worst_compress(PAGE_SIZE));
+
+   if (!lzo_mem || !lzo_compress_buf) {
+   printk(KERN_WARNING "Failed to allocate lzo deflate workspace\n");
+   free_workspace();
+   return -ENOMEM;
+   }
+
+   return 0;
+}
+
+static int jffs2_lzo_compress(unsigned char *data_in, unsigned char *cpage_out,
+ uint32_t *sourcelen, uint32_t *dstlen, void *model)
+{
+   unsigned long compress_size;
+   int ret;
+
+   mutex_lock(&deflate_mutex);
+   ret = lzo1x_1_compress(data_in, *sourcelen, lzo_compress_buf, &compress_size, lzo_mem);
+   mutex_unlock(&deflate_mutex);
+
+   if (ret != LZO_E_OK)
+   return -1;
+
+   if (compress_size > *dstlen)
+   return -1;
+
+   memcpy(cpage_out, lzo_compress_buf, compress_size);
+   *dstlen = compress_size;
+
+   return 0;
+}
+
+static int jffs2_lzo_decompress(unsigned char 

[PATCH 1/5] Add LZO compression support to the kernel

2007-02-28 Thread Richard Purdie
Add LZO1X compression/decompression support to the kernel.

This is based on the standard userspace lzo library, in particular
minilzo, with the headers much trimmed down and simplified for kernel
use. It's structured so that it should still diff cleanly against the
userspace version, for ease of future updating.

Signed-off-by: Richard Purdie <[EMAIL PROTECTED]>

---
 include/linux/lzo.h |   63 +
 lib/Kconfig |5 
 lib/Makefile|1 
 lib/lzo/Makefile|3 
 lib/lzo/lzoconf.h   |  186 +
 lib/lzo/lzodefs.h   |  463 +
 lib/lzo/lzointf.c   |   37 +
 lib/lzo/minilzo.c   | 1771

 8 files changed, 2529 insertions(+)

http://folks.o-hand.com/richard/lzo/lzo_kernel.patch
(since it exceeds the file size limit for LKML)

I can email inline if anyone prefers it that way.



[PATCH 0/5] Add LZO Compression

2007-02-28 Thread Richard Purdie
The following patch series adds LZO compression support to the kernel
and exposes it in a variety of places (jffs2, crypto).

This is particularly useful for jffs2, where significant boot time
speedups (~10%) and file read speed improvements (~40%) are seen when
it's used, with only a slight drop in file compression ratio.

It also adds a favourlzo mode to jffs2, which is similar to the existing
size mode but lets lzo compression win when the lzo compressed size is
"similar" to, though not as small as, the best result. This means we can
keep zlib compression where it makes a significant difference to
compressed file size.

The final jffs2 patch, which starts adding sysfs support, is something I
have around from testing; I'm including it for comments to see if it's
desirable upstream. It could be extended further to allow greater
control of jffs2 at runtime.

Richard







Re: [RFC][PATCH 1/3] Freezer: Fix vfork problem

2007-02-28 Thread Rafael J. Wysocki
On Wednesday, 28 February 2007 12:00, Oleg Nesterov wrote:
> On 02/28, Rafael J. Wysocki wrote:
> >
> > On Wednesday, 28 February 2007 02:23, Srivatsa Vaddagiri wrote:
> > > On Wed, Feb 28, 2007 at 12:53:14AM +0300, Oleg Nesterov wrote:
> > > > I think it is good. Srivatsa?
> > > 
> > > Maybe additional comments on why we don't skip vfork kernel tasks may be 
> > > good. 
> > 
> > Which is because we don't want the kernel threads to be frozen in unexpected
> > places, so we allow them to block freeze_processes() instead or to set
> > PF_NOFREEZE?
> 
> ... and because in fact it won't block freeze_processes(), call_usermodehelper
> (the child) does a minimum before exec/exit, and it can't be frozen until it wakes
> up the parent.

Okay, I have added a comment to freezer.h.  Please have a look.

Rafael

---
From: Rafael J. Wysocki <[EMAIL PROTECTED]>

Currently try_to_freeze_tasks() has to wait until all of the vforked processes
exit, and for this reason any user can make it fail.  To fix this problem
we can introduce the additional process flag PF_FREEZER_SKIP to be used by tasks
that do not want to be counted as freezable by the freezer but still want to
have TIF_FREEZE set.  This flag can then be set by tasks using sys_vfork()
before they call wait_for_completion() and cleared after they have woken up.
After clearing it, the tasks should call try_to_freeze() as soon as
possible.
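A userspace model of the counting rule, to make the intent concrete: a task with the skip flag set is not counted as still needing to be frozen, and only tasks with an mm (userland tasks) ever set it. The struct, flag value and helpers here are hypothetical simulations, not the kernel's task_struct or the process.c changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_FREEZER_SKIP 0x1u	/* stand-in for PF_FREEZER_SKIP */

struct sim_task {
	bool has_mm;		/* user task (current->mm != NULL) */
	bool frozen;
	unsigned int flags;
};

/* Model of freezer_do_not_count(): only user tasks opt out. */
static void sim_do_not_count(struct sim_task *t)
{
	if (t->has_mm)
		t->flags |= SIM_FREEZER_SKIP;
}

/* Model of freezer_count(); the real helper then calls try_to_freeze(). */
static void sim_count(struct sim_task *t)
{
	if (t->has_mm)
		t->flags &= ~SIM_FREEZER_SKIP;
}

/* How many tasks the freezer would still be waiting for. */
static size_t sim_todo(const struct sim_task *tasks, size_t n)
{
	size_t i, todo = 0;

	assert(tasks || n == 0);
	for (i = 0; i < n; i++)
		if (!tasks[i].frozen && !(tasks[i].flags & SIM_FREEZER_SKIP))
			todo++;
	return todo;
}
```

In this model a vfork parent blocked in wait_for_completion() stops holding up the freezer, while a kernel thread in the same position still counts.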

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
 include/linux/freezer.h |   48 ++--
 include/linux/sched.h   |1 +
 kernel/fork.c   |3 +++
 kernel/power/process.c  |   27 ---
 4 files changed, 58 insertions(+), 21 deletions(-)

Index: linux-2.6.20-mm2/include/linux/sched.h
===
--- linux-2.6.20-mm2.orig/include/linux/sched.h
+++ linux-2.6.20-mm2/include/linux/sched.h
@@ -1189,6 +1189,7 @@ static inline void put_task_struct(struc
 #define PF_SPREAD_SLAB 0x02000000  /* Spread some slab caches over cpuset */
 #define PF_MEMPOLICY   0x10000000  /* Non-default NUMA mempolicy */
 #define PF_MUTEX_TESTER 0x20000000  /* Thread belongs to the rt mutex tester */
+#define PF_FREEZER_SKIP 0x40000000  /* Freezer should not count it as freezeable */
 
 /*
  * Only the _current_ task can read/write to tsk->flags, but other
Index: linux-2.6.20-mm2/include/linux/freezer.h
===
--- linux-2.6.20-mm2.orig/include/linux/freezer.h
+++ linux-2.6.20-mm2/include/linux/freezer.h
@@ -75,7 +75,49 @@ static inline int try_to_freeze(void)
return 0;
 }
 
-extern void thaw_some_processes(int all);
+/*
+ * The PF_FREEZER_SKIP flag should be set by a vfork parent right before it
+ * calls wait_for_completion() and reset right after it returns from this
+ * function.  Next, the parent should call try_to_freeze() to freeze itself
+ * appropriately in case the child has exited before the freezing of tasks is
+ * complete.  However, we don't want kernel threads to be frozen in unexpected
+ * places, so they are allowed to block freeze_processes() or to set
+ * PF_NOFREEZE if needed, and PF_FREEZER_SKIP is only set for userland vfork
+ * parents.  Fortunately, in the call_usermodehelper() case the parent won't
+ * really block freeze_processes(), since call_usermodehelper() (the child)
+ * does very little before exec/exit and it can't be frozen before waking up
+ * the parent.
+ */
+
+/*
+ * If the current task is a user space one, tell the freezer not to count it as
+ * freezable.
+ */
+static inline void freezer_do_not_count(void)
+{
+   if (current->mm)
+   current->flags |= PF_FREEZER_SKIP;
+}
+
+/*
+ * If the current task is a user space one, tell the freezer to count it as
+ * freezable again and try to freeze it.
+ */
+static inline void freezer_count(void)
+{
+   if (current->mm) {
+   current->flags &= ~PF_FREEZER_SKIP;
+   try_to_freeze();
+   }
+}
+
+/*
+ * Check if the task should be counted as freezeable by the freezer
+ */
+static inline int freezer_should_skip(struct task_struct *p)
+{
+   return !!(p->flags & PF_FREEZER_SKIP);
+}
 
 #else
 static inline int frozen(struct task_struct *p) { return 0; }
@@ -90,5 +132,7 @@ static inline void thaw_processes(void) 
 
 static inline int try_to_freeze(void) { return 0; }
 
-
+static inline void freezer_do_not_count(void) {}
+static inline void freezer_count(void) {}
+static inline int freezer_should_skip(struct task_struct *p) { return 0; }
 #endif
Index: linux-2.6.20-mm2/kernel/fork.c
===
--- linux-2.6.20-mm2.orig/kernel/fork.c
+++ linux-2.6.20-mm2/kernel/fork.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1393,7 +1394,9 @@ long do_fork(unsigned long 

Re: lanana: Add major/minor entries for PPC QE UART devices

2007-02-28 Thread Kumar Gala


On Feb 28, 2007, at 1:30 PM, Timur Tabi wrote:


H. Peter Anvin wrote:

Kumar Gala wrote:


Why don't we allocate the 2nd group of four as well, just at a  
new location.  They'll be discontinuous, but at least we'll have  
support for all 8.



Right, it means two tty driver structures, but that's not a problem.


Eh, I'm not crazy about that.  That means that I have to complicate  
my driver because someone else screwed up a long time ago.


If not you, someone else.  The cost in the driver is small compared to
fixing up all the distros and such.  If you don't provide this
change, someone else will.


- k


Re: Problem with freezable workqueues

2007-02-28 Thread Oleg Nesterov
On 02/28, Rafael J. Wysocki wrote:
>
> > --- workqueue.c.org 2007-02-28 18:32:48.0 +0530
> > +++ workqueue.c 2007-02-28 18:44:23.0 +0530
> > @@ -718,6 +718,8 @@ static void cleanup_workqueue_thread(str
> > insert_wq_barrier(cwq, , 1);
> > cwq->should_stop = 1;
> > alive = 1;
> > +   if (frozen(cwq->thread))
> > +   thaw(cwq->thread);
> > }
> > spin_unlock_irq(>lock);
>
> Unfortunately, the above code is mm-only.  Is the analogous fix for 2.6.21-rc2
> viable?

I am sorry, I lost track of this problem. As for 2.6.21, create_freezeable_workqueue
doesn't work and conflicts with suspend. Why can't we remove it from XFS as you
suggested before?

IIRC,
On 02/28, Nigel Cunningham wrote:
>
> On Wed, 2007-02-28 at 01:08 +0100, Rafael J. Wysocki wrote:
> > On Wednesday, 28 February 2007 01:01, Johannes Berg wrote:
> > > On Wed, 2007-02-28 at 00:57 +0100, Rafael J. Wysocki wrote:
> > >
> > > > Okay, in that case I'd suggest removing create_freezeable_workqueue() and
> > > > making all workqueues nonfreezable once again for 2.6.21 (as far as I know, only
> > > > the two XFS workqueues are affected).
> > >
> > > I think Nigel might object but I forgot what specific trouble XFS was
> > > causing him.
> >
> > We suspected that the XFS worker threads might commit I/O after
> > freeze_processes() has returned, but that hasn't been supported by evidence,
> > as far as I can recall.
> >
> > Also, making them freezable was controversial ...
>
> Controversy is no reason to give in! Nevertheless, I think you're right
> - I believe the XFS guys said they fixed the issue that had caused I/O
> to be submitted post-freeze. Well, we'll see if it appears again, won't
> we?

Oleg.



Re: lanana: Add major/minor entries for PPC QE UART devices

2007-02-28 Thread Timur Tabi

H. Peter Anvin wrote:

Kumar Gala wrote:


Why don't we allocate the 2nd group of four as well, just at a new 
location.  They'll be discontinuous, but at least we'll have support 
for all 8.




Right, it means two tty driver structures, but that's not a problem.


Eh, I'm not crazy about that.  That means that I have to complicate my driver because 
someone else screwed up a long time ago.


--
Timur Tabi
Linux Kernel Developer @ Freescale


Re: lanana: Add major/minor entries for PPC QE UART devices

2007-02-28 Thread H. Peter Anvin

Segher Boessenkool wrote:

Just allocate the four slots and we'll deal with
anything above this in custom products.


Another option is to use 46..49 for UARTs #0..3,
and 192..195 for UARTs #4..7.

Or, perhaps better, use 46..49 for #0..3, and
192..199 for #0..7, handling the duplication in
the driver; and deprecate the old range.


That sounds like more hassle than it's worth.  The discontinuous range 
may be annoying, but it isn't really a huge amount of code.
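For scale, a sketch of the extra mapping a driver would need for the discontinuous scheme (minors 46..49 for UARTs 0..3 and 192..195 for UARTs 4..7). The function names are hypothetical, the ranges are only the ones proposed in this thread, and in a real tty driver the two ranges would presumably be registered as two tty_driver structures.

```c
#include <assert.h>

/* Minor-number ranges as proposed in the thread (not an allocation). */
#define QE_UART_MINOR_LO	46	/* UARTs 0..3 */
#define QE_UART_MINOR_HI	192	/* UARTs 4..7 */
#define QE_UARTS_PER_RANGE	4

/* Map a minor number to a UART index 0..7, or -1 if it is not ours. */
static int qe_uart_index(unsigned int minor)
{
	if (minor >= QE_UART_MINOR_LO &&
	    minor < QE_UART_MINOR_LO + QE_UARTS_PER_RANGE)
		return minor - QE_UART_MINOR_LO;
	if (minor >= QE_UART_MINOR_HI &&
	    minor < QE_UART_MINOR_HI + QE_UARTS_PER_RANGE)
		return QE_UARTS_PER_RANGE + (minor - QE_UART_MINOR_HI);
	return -1;
}

/* Inverse mapping, e.g. when registering device nodes. */
static unsigned int qe_uart_minor(int index)
{
	assert(index >= 0 && index < 2 * QE_UARTS_PER_RANGE);
	return index < QE_UARTS_PER_RANGE ?
		QE_UART_MINOR_LO + index :
		QE_UART_MINOR_HI + (index - QE_UARTS_PER_RANGE);
}
```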


-hpa



Re: lanana: Add major/minor entries for PPC QE UART devices

2007-02-28 Thread Segher Boessenkool

Just allocate the four slots and we'll deal with
anything above this in custom products.


Another option is to use 46..49 for UARTs #0..3,
and 192..195 for UARTs #4..7.

Or, perhaps better, use 46..49 for #0..3, and
192..199 for #0..7, handling the duplication in
the driver; and deprecate the old range.


Segher
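Segher's second option implies a reverse lookup when a device node is opened, with two minors mapping to the same port. Below is a minimal sketch of that lookup. `qe_uart_port_from_minor()` is a hypothetical helper, and 192..199 is the proposed range, not an allocation.

```c
#include <stdbool.h>

/*
 * Sketch of the alternative: minors 192..199 address ports #0..7, and the
 * old minors 46..49 remain deprecated aliases for ports #0..3. Either minor
 * is mapped back to a single port index.
 */
static int qe_uart_port_from_minor(unsigned int minor, bool *legacy)
{
	if (minor >= 46 && minor <= 49) {
		*legacy = true;		/* old range: a driver might warn once here */
		return (int)(minor - 46);
	}
	if (minor >= 192 && minor <= 199) {
		*legacy = false;
		return (int)(minor - 192);
	}
	return -1;			/* not a QE UART minor */
}
```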



[PATCH]: Fix __init declarations in Compaq SMART2 Controller driver

2007-02-28 Thread Prarit Bhargava
Fix __init declarations in Compaq SMART2 Controller driver.

Resolves MODPOST warnings similar to:

WARNING: drivers/block/cpqarray.o - Section mismatch: reference to
.init.text:cpqarray_init_one from .data.rel.local between 'cpqarray_pci_driver'
(at offset 0x20) and 'smart1_access'

Signed-off-by: Prarit Bhargava <[EMAIL PROTECTED]>

--- linux-2.6.18.ia64.orig/drivers/block/cpqarray.c 2007-02-14 11:36:20.0 -0500
+++ linux-2.6.18.ia64/drivers/block/cpqarray.c  2007-02-14 13:08:57.0 -0500
@@ -212,7 +212,7 @@ static struct proc_dir_entry *proc_array
  * Get us a file in /proc/array that says something about each controller.
  * Create /proc/array if it doesn't exist yet.
  */
-static void __init ida_procinit(int i)
+static void __devinit ida_procinit(int i)
 {
if (proc_array == NULL) {
proc_array = proc_mkdir("cpqarray", proc_root_driver);
@@ -390,7 +390,7 @@ static void __devexit cpqarray_remove_on
 }
 
 /* pdev is NULL for eisa */
-static int __init cpqarray_register_ctlr( int i, struct pci_dev *pdev)
+static int __devinit cpqarray_register_ctlr( int i, struct pci_dev *pdev)
 {
request_queue_t *q;
int j;
@@ -511,7 +511,7 @@ Enomem4:
return -1;
 }
 
-static int __init cpqarray_init_one( struct pci_dev *pdev,
+static int __devinit cpqarray_init_one( struct pci_dev *pdev,
const struct pci_device_id *ent)
 {
int i;


Re: [patch - v3] epoll ready set loops diet ...

2007-02-28 Thread Davide Libenzi
On Wed, 28 Feb 2007, Eric Dumazet wrote:

> On Wednesday 28 February 2007 19:37, Davide Libenzi wrote:
> 
> > +   list_del(&epi->rdllink);
> > +   if (!(epi->event.events & EPOLLET) && (revents & epi->event.events))
> > +   list_add_tail(&epi->rdllink, );
> > +   else {
> 
> Is the ( ... & epi->event.events) really necessary ? (It seems already done)

Yes, look here:

if (epi->event.events & EPOLLONESHOT)
 epi->event.events &= EP_PRIVATE_BITS;

Oneshot events should not be requeued.
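Davide's answer can be checked with a small user-space model of the requeue test. This is a sketch, not eventpoll.c. It assumes `EP_PRIVATE_BITS` is `EPOLLONESHOT | EPOLLET`, as in the kernel source of this period. `disarm_if_oneshot()` and `should_requeue()` are hypothetical names for the two steps quoted above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/epoll.h>

/* Assumption: mirrors the EP_PRIVATE_BITS definition in fs/eventpoll.c. */
#define EP_PRIVATE_BITS ((uint32_t)(EPOLLONESHOT | EPOLLET))

/* After an event is delivered, a oneshot item keeps only its private bits. */
static uint32_t disarm_if_oneshot(uint32_t events)
{
	if (events & EPOLLONESHOT)
		events &= EP_PRIVATE_BITS;
	return events;
}

/* Requeue condition from the v3 patch: not edge-triggered, still of interest. */
static bool should_requeue(uint32_t events, uint32_t revents)
{
	return !(events & EPOLLET) && (revents & events);
}
```

A disarmed oneshot item has no EPOLLET bit, so the first test alone would requeue it. The `(revents & events)` term is what keeps it off the ready list.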


> I was wrong about the size of epitem : it is now 68 bytes instead of 72.
> At least we now use/dirty one cache line instead of two per epitem.
> 
> Do you have another brilliant idea to shrink 4 more bytes ? :)

I don't think we can cleanly shove more stuff out of it.



- Davide




Re: PROBLEM: null pointer dereference in cfq_dispatch_requests (2.6.21-rc2 and 2.6.20)

2007-02-28 Thread Chuck Ebbert
Chuck Ebbert wrote:
> There are two patches for raid5/6 out there that might fix this. I'll
> attach them (the second just fixes a minor bug in the first one.)

Never mind, those patches are already in 2.6.21-rc.



Re: lanana: Add major/minor entries for PPC QE UART devices

2007-02-28 Thread H. Peter Anvin

Kumar Gala wrote:


Why don't we allocate the 2nd group of four as well, just at a new 
location.  They'll be discontinuous, but at least we'll have support for 
all 8.




Right, it means two tty driver structures, but that's not a problem.

-hpa


Re: lanana: Add major/minor entries for PPC QE UART devices

2007-02-28 Thread Timur Tabi

Dan Malek wrote:


Just allocate the four slots and we'll deal with
anything above this in custom products. 


Assuming that this is the agreed-upon standard, should I arbitrarily restrict my driver to 
4 ports, or allow all 8?


I assume that if a driver already claims a particular major/minor combo, then when the 2nd 
driver calls uart_add_one_port(), that call will fail?


--
Timur Tabi
Linux Kernel Developer @ Freescale


Re: lanana: Add major/minor entries for PPC QE UART devices

2007-02-28 Thread Kumar Gala


On Feb 28, 2007, at 12:20 PM, Dan Malek wrote:



On Feb 28, 2007, at 1:00 PM, H. Peter Anvin wrote:


I would much rather see these devices moved to a different minor
range.


No.  We just did that all too recently, and
I don't know why the minors didn't get
allocated properly.  I don't want to have to
update all of our embedded software distributions
just because someone doesn't like minor
numbers that aren't causing trouble.
If we allocate unique spaces for all of the
possible UART variations, there isn't going
to be enough space.

Just allocate the four slots and we'll deal with
anything above this in custom products.  Using
more than four of these processor resources
as UARTs isn't likely to happen because there
won't be anything left for the interesting
communication ports.


Why don't we allocate the 2nd group of four as well, just at a new  
location.  They'll be discontinuous, but at least we'll have support  
for all 8.


- k 


Re: Problem with freezable workqueues

2007-02-28 Thread Rafael J. Wysocki
On Wednesday, 28 February 2007 14:17, Srivatsa Vaddagiri wrote:
> On Wed, Feb 28, 2007 at 12:11:03PM +0100, Rafael J. Wysocki wrote:
> > > In addition to thawing worker thread before kthread_stopping it, there
> > > are minor changes required in worker threads, to check for
> > > is_cpu_offline(bind_cpu) when they come out of refrigerator and jump to
> > > wait_to_die if so (ex: softirq.c).
> > > 
> > > I guess you would need these changes before freezer-based hotplug is
> > > merged, in which case Gautham can send those patches out first.
> > 
> > Yes, please, if that's possible.
> 
> After looking at the current workqueue code, the above minor change I
> suggested is not required.
> 
> So you should be able to fix your "kthread_stop on a frozen worker
> thread hangs" problem by just a simple patch like this (against
> 2.6.20-mm2):
> 
> 
> --- workqueue.c.org   2007-02-28 18:32:48.0 +0530
> +++ workqueue.c   2007-02-28 18:44:23.0 +0530
> @@ -718,6 +718,8 @@ static void cleanup_workqueue_thread(str
>   insert_wq_barrier(cwq, , 1);
>   cwq->should_stop = 1;
>   alive = 1;
> + if (frozen(cwq->thread))
> + thaw(cwq->thread);
>   }
>   spin_unlock_irq(>lock);
>  
> 
> Can you test with this?

Unfortunately, the above code is mm-only.  Is the analogous fix for 2.6.21-rc2
viable?

Rafael
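The hang addressed by Srivatsa's patch can be modelled in user space with POSIX threads. In this hypothetical sketch, `frozen` stands in for the refrigerator and `pthread_join()` stands in for `kthread_stop()`. It is an analogy, not the workqueue code. The key line is the thaw before waiting for the thread to exit.

```c
#include <pthread.h>
#include <stdbool.h>

struct worker {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool frozen;		/* stands in for the refrigerator */
	bool should_stop;	/* stands in for kthread_should_stop() */
	pthread_t thread;
};

static void *worker_fn(void *arg)
{
	struct worker *w = arg;

	pthread_mutex_lock(&w->lock);
	for (;;) {
		while (w->frozen)	/* frozen: sleep until thawed */
			pthread_cond_wait(&w->cond, &w->lock);
		if (w->should_stop)
			break;
		pthread_cond_wait(&w->cond, &w->lock);	/* idle until woken */
	}
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

static int start_worker(struct worker *w, bool frozen)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->cond, NULL);
	w->frozen = frozen;
	w->should_stop = false;
	return pthread_create(&w->thread, NULL, worker_fn, w);
}

/* Analogue of cleanup_workqueue_thread() with the proposed thaw. */
static void stop_worker(struct worker *w)
{
	pthread_mutex_lock(&w->lock);
	w->should_stop = true;
	if (w->frozen)
		w->frozen = false;	/* thaw(); without it the join never returns */
	pthread_cond_broadcast(&w->cond);
	pthread_mutex_unlock(&w->lock);
	pthread_join(w->thread, NULL);	/* plays the role of kthread_stop() */
	pthread_cond_destroy(&w->cond);
	pthread_mutex_destroy(&w->lock);
}
```

Suppose the `frozen = false` line is removed. A worker started frozen then never leaves its inner loop, and `stop_worker()` blocks forever. This mirrors the reported hang when `kthread_stop()` is called on a frozen thread.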


Re: [patch - v3] epoll ready set loops diet ...

2007-02-28 Thread Eric Dumazet
On Wednesday 28 February 2007 19:37, Davide Libenzi wrote:

> + list_del(&epi->rdllink);
> + if (!(epi->event.events & EPOLLET) && (revents & epi->event.events))
> + list_add_tail(&epi->rdllink, );
> + else {

Is the ( ... & epi->event.events) really necessary ? (It seems already done)

I was wrong about the size of epitem : it is now 68 bytes instead of 72.
At least we now use/dirty one cache line instead of two per epitem.

Do you have another brilliant idea to shrink 4 more bytes ? :)

It seems to me 'nwait' is only used at init time (so that 
ep_ptable_queue_proc() can signal that an error occurred).

Maybe another mechanism could let us delete nwait from epitem ?

We could use a field in task_struct, for example (see how total_link_count
is used).

Thank you


[PATCH]: __init to __cpuinit in mtrr code

2007-02-28 Thread Prarit Bhargava
(Resending to wider audience)

__init to __cpuinit in mtrr code.

Resolves warnings similar to:

WARNING: vmlinux - Section mismatch: reference to .init.text:mtrr_bp_init from 
.text between 'identify_cpu' (at offset 0xc040b38e) and 'detect_ht'

Signed-off-by: Prarit Bhargava <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/cpu/mtrr/amd.c b/arch/i386/kernel/cpu/mtrr/amd.c
index 0949cdb..375752a 100644
--- a/arch/i386/kernel/cpu/mtrr/amd.c
+++ b/arch/i386/kernel/cpu/mtrr/amd.c
@@ -112,7 +112,7 @@ static struct mtrr_ops amd_mtrr_ops = {
.have_wrcomb   = positive_have_wrcomb,
 };
 
-int __init amd_init_mtrr(void)
+int __cpuinit amd_init_mtrr(void)
 {
set_mtrr_ops(&amd_mtrr_ops);
return 0;
diff --git a/arch/i386/kernel/cpu/mtrr/centaur.c b/arch/i386/kernel/cpu/mtrr/centaur.c
index cb9aa3a..8b61016 100644
--- a/arch/i386/kernel/cpu/mtrr/centaur.c
+++ b/arch/i386/kernel/cpu/mtrr/centaur.c
@@ -215,7 +215,7 @@ static struct mtrr_ops centaur_mtrr_ops = {
.have_wrcomb   = positive_have_wrcomb,
 };
 
-int __init centaur_init_mtrr(void)
+int __cpuinit centaur_init_mtrr(void)
 {
set_mtrr_ops(&centaur_mtrr_ops);
return 0;
diff --git a/arch/i386/kernel/cpu/mtrr/cyrix.c b/arch/i386/kernel/cpu/mtrr/cyrix.c
index 0737a59..df38d8c 100644
--- a/arch/i386/kernel/cpu/mtrr/cyrix.c
+++ b/arch/i386/kernel/cpu/mtrr/cyrix.c
@@ -370,7 +370,7 @@ static struct mtrr_ops cyrix_mtrr_ops = {
.have_wrcomb   = positive_have_wrcomb,
 };
 
-int __init cyrix_init_mtrr(void)
+int __cpuinit cyrix_init_mtrr(void)
 {
set_mtrr_ops(&cyrix_mtrr_ops);
return 0;
diff --git a/arch/i386/kernel/cpu/mtrr/generic.c b/arch/i386/kernel/cpu/mtrr/generic.c
index f77fc53..fd97f84 100644
--- a/arch/i386/kernel/cpu/mtrr/generic.c
+++ b/arch/i386/kernel/cpu/mtrr/generic.c
@@ -30,14 +30,14 @@ static __initdata int mtrr_show;
 module_param_named(show, mtrr_show, bool, 0);
 
 /*  Get the MSR pair relating to a var range  */
-static void __init
+static void __cpuinit
 get_mtrr_var_range(unsigned int index, struct mtrr_var_range *vr)
 {
rdmsr(MTRRphysBase_MSR(index), vr->base_lo, vr->base_hi);
rdmsr(MTRRphysMask_MSR(index), vr->mask_lo, vr->mask_hi);
 }
 
-static void __init
+static void __cpuinit
 get_fixed_ranges(mtrr_type * frs)
 {
unsigned int *p = (unsigned int *) frs;
@@ -60,7 +60,7 @@ static void __init print_fixed(unsigned base, unsigned step, const mtrr_type*typ
 }
 
 /*  Grab all of the MTRR state for this CPU into *state  */
-void __init get_mtrr_state(void)
+void __cpuinit get_mtrr_state(void)
 {
unsigned int i;
struct mtrr_var_range *vrs;
diff --git a/arch/i386/kernel/cpu/mtrr/main.c b/arch/i386/kernel/cpu/mtrr/main.c
index 0acfb6a..cdbca55 100644
--- a/arch/i386/kernel/cpu/mtrr/main.c
+++ b/arch/i386/kernel/cpu/mtrr/main.c
@@ -103,7 +103,7 @@ static int have_wrcomb(void)
 }
 
 /*  This function returns the number of variable MTRRs  */
-static void __init set_num_var_ranges(void)
+static void __cpuinit set_num_var_ranges(void)
 {
unsigned long config = 0, dummy;
 
@@ -116,7 +116,7 @@ static void __init set_num_var_ranges(void)
num_var_ranges = config & 0xff;
 }
 
-static void __init init_table(void)
+static void __cpuinit init_table(void)
 {
int i, max;
 
@@ -571,7 +571,7 @@ extern void amd_init_mtrr(void);
 extern void cyrix_init_mtrr(void);
 extern void centaur_init_mtrr(void);
 
-static void __init init_ifs(void)
+static void __cpuinit init_ifs(void)
 {
 #ifndef CONFIG_X86_64
amd_init_mtrr();
@@ -639,7 +639,7 @@ static struct sysdev_driver mtrr_sysdev_driver = {
  * initialized (i.e. before smp_init()).
  * 
  */
-void __init mtrr_bp_init(void)
+void __cpuinit mtrr_bp_init(void)
 {
init_ifs();
 


Fix locking in mousedev

2007-02-28 Thread Pete Zaitcev
If a process is closing /dev/input/mice and a mouse disconnects
simultaneously, the system is likely to oops. This usually happens when
someone hits F1 or logs out of X and flips a KVM switch while the system
is still reacting.

I reproduced the issue by running this:
  while true; do cat /dev/input/mice; done
This way, it oopses on 2nd or 3rd disconnect reliably. With the patch,
I can disconnect the mouse 20 times.

Signed-off-by: Pete Zaitcev <[EMAIL PROTECTED]>

---

Discussion

One of the race scenarios is related to the list of handles. The cat
process calls mousedev_close -> mixdev_release, which uses list_for_each
to walk all handles for a given handler. Each iteration takes a while,
because it calls input_close_device -> hidinput_close -> usbhid_close ->
usb_kill_urb, which sleeps briefly. Into this gap comes khubd, which calls
hid_disconnect -> hidinput_disconnect -> input_unregister_device. This
corrupts the list of handles that the cat process is walking.

I was unable to devise a scheme to protect the stock h_list adequately,
so I implemented a private list of mousedev instances, which can be
protected correctly.

Dmitry, please consider getting rid of the list of handles entirely.
The other major user is drivers/char/keyboard.c.

Other than that, the patch is straightforward. It adds a static mutex
to guard common data structures. It has to be static because instances
of mousedev share common structures, such as the mousedev_table[].

This should be uncontroversial, but please let me know if I missed
something obvious.

-- Pete

diff --git a/drivers/input/mousedev.c b/drivers/input/mousedev.c
index 664bcc8..2425c2a 100644
--- a/drivers/input/mousedev.c
+++ b/drivers/input/mousedev.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include <linux/mutex.h>
 #include 
 #include 
 #include 
@@ -64,6 +65,7 @@ struct mousedev {
char name[16];
wait_queue_head_t wait;
struct list_head list;
+   struct list_head h_node;
struct input_handle handle;
 
struct mousedev_hw_data packet;
@@ -108,10 +110,13 @@ static unsigned char mousedev_imps_seq[] = { 0xf3, 200, 0xf3, 100, 0xf3, 80 };
 static unsigned char mousedev_imex_seq[] = { 0xf3, 200, 0xf3, 200, 0xf3, 80 };
 
 static struct input_handler mousedev_handler;
+static LIST_HEAD(mousedev_h_list);
 
 static struct mousedev *mousedev_table[MOUSEDEV_MINORS];
 static struct mousedev mousedev_mix;
 
+static DEFINE_MUTEX(mousedev_lock);
+
 #define fx(i)  (mousedev->old_x[(mousedev->pkt_count - (i)) & 03])
 #define fy(i)  (mousedev->old_y[(mousedev->pkt_count - (i)) & 03])
 
@@ -366,11 +371,9 @@ static void mousedev_free(struct mousedev *mousedev)
 
 static void mixdev_release(void)
 {
-   struct input_handle *handle;
-
-   list_for_each_entry(handle, &mousedev_handler.h_list, h_node) {
-   struct mousedev *mousedev = handle->private;
+   struct mousedev *mousedev;
 
+   list_for_each_entry(mousedev, &mousedev_h_list, h_node) {
if (!mousedev->open) {
if (mousedev->exist)
input_close_device(&mousedev->handle);
@@ -386,6 +389,7 @@ static int mousedev_release(struct inode * inode, struct file * file)
 
mousedev_fasync(-1, file, 0);
 
+   mutex_lock(&mousedev_lock);
list_del(&list->node);
 
if (!--list->mousedev->open) {
@@ -398,6 +402,7 @@ static int mousedev_release(struct inode * inode, struct file * file)
mousedev_free(list->mousedev);
}
}
+   mutex_unlock(&mousedev_lock);
 
kfree(list);
return 0;
@@ -406,7 +411,6 @@ static int mousedev_release(struct inode * inode, struct file * file)
 static int mousedev_open(struct inode * inode, struct file * file)
 {
struct mousedev_list *list;
-   struct input_handle *handle;
struct mousedev *mousedev;
int i;
 
@@ -417,11 +421,16 @@ static int mousedev_open(struct inode * inode, struct file * file)
 #endif
i = iminor(inode) - MOUSEDEV_MINOR_BASE;
 
-   if (i >= MOUSEDEV_MINORS || !mousedev_table[i])
+   mutex_lock(&mousedev_lock);
+   if (i >= MOUSEDEV_MINORS || !mousedev_table[i]) {
+   mutex_unlock(&mousedev_lock);
return -ENODEV;
+   }
 
-   if (!(list = kzalloc(sizeof(struct mousedev_list), GFP_KERNEL)))
+   if (!(list = kzalloc(sizeof(struct mousedev_list), GFP_KERNEL))) {
+   mutex_unlock(&mousedev_lock);
return -ENOMEM;
+   }
 
spin_lock_init(&list->packet_lock);
list->pos_x = xres / 2;
@@ -432,16 +441,16 @@ static int mousedev_open(struct inode * inode, struct file * file)
 
if (!list->mousedev->open++) {
if (list->mousedev->minor == MOUSEDEV_MIX) {
-   list_for_each_entry(handle, &mousedev_handler.h_list, h_node) {
-   mousedev = handle->private;
+   list_for_each_entry(mousedev, &mousedev_h_list, h_node) {
if (!mousedev->open && mousedev->exist)

Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-02-28 Thread Chris Friesen

Davide Libenzi wrote:


struct async_syscall {
unsigned long nr_sysc;
unsigned long params[8];
long *result;
};

And what would async_wait() return back? Pointers to "struct async_syscall"
or pointers to "result"?


Either one has downsides.  Pointer to struct async_syscall requires that 
the caller keep the struct around.  Pointer to result requires that the 
caller always reserve a location for the result.


Does the kernel care about the (possibly rare) case of callers that 
don't want to pay attention to result?  If so, what about adding some 
kind of caller-specified handle to struct async_syscall, and having 
async_wait() return the handle?  In the case where the caller does care 
about the result, the handle could just be the address of result.


Chris
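Chris's suggestion can be sketched as below. The `cookie` field, `submit()`, `complete()` and the simulated `async_wait_sim()` are all hypothetical; none of these names come from the syslet patches. In line with Linus's objection to read-write interfaces, the submission is only read and copied, so the caller may free it immediately. The result is written through the optional pointer, and the caller identifies completions by cookie.

```c
#include <stddef.h>
#include <stdint.h>

/* Shape from the thread, plus a hypothetical caller-chosen cookie. */
struct async_syscall {
	unsigned long nr_sysc;
	unsigned long params[8];
	long *result;		/* NULL: caller does not care about the result */
	uintptr_t cookie;	/* hypothetical: echoed back by async_wait() */
};

/* What the "kernel" keeps after submission. */
struct pending {
	long *result;
	uintptr_t cookie;
	int used;
};

#define MAX_PENDING 8
static struct pending pending[MAX_PENDING];
static uintptr_t done[MAX_PENDING];	/* completed cookies, FIFO; no overflow check */
static size_t done_head, done_tail;

/* Copy out what completion needs; the caller may free *req afterwards. */
static int submit(const struct async_syscall *req)
{
	for (int slot = 0; slot < MAX_PENDING; slot++) {
		if (!pending[slot].used) {
			pending[slot].result = req->result;
			pending[slot].cookie = req->cookie;
			pending[slot].used = 1;
			return slot;
		}
	}
	return -1;
}

static void complete(int slot, long ret)
{
	if (pending[slot].result)
		*pending[slot].result = ret;	/* the only write to caller memory */
	done[done_tail++ % MAX_PENDING] = pending[slot].cookie;
	pending[slot].used = 0;
}

/* Simulated async_wait(): hand back up to max completed cookies. */
static size_t async_wait_sim(uintptr_t *out, size_t max)
{
	size_t n = 0;

	while (n < max && done_head != done_tail)
		out[n++] = done[done_head++ % MAX_PENDING];
	return n;
}
```

A caller that does care about the result can simply use the address of its result variable as the cookie, as Chris notes.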





Re: [PATCH 2.6.20] kobject net ifindex + rename

2007-02-28 Thread Johannes Berg
On Wed, 2007-02-28 at 10:51 -0800, Jean Tourrilhes wrote:

>   That's why I always specify the kernel version. I'll look into
> that, I'm sure it's not the end of the world ;-)

Sure, just wanted to point it out.

>   In which sense ? Wireless interfaces are regular netdevices.

Yeah, but in mac80211 we have the wiphy concept, since multiple virtual
interfaces can be associated with one piece of hardware, and that is where
QoS is done, not the netdevs. Of course, those interested can just listen to
nl80211 events to figure out if someone renamed an 802.11 phy, but things
like hal would probably rather not, and would still want to know about the
name change.

>   I'm just trying to follow the established pattern. Both
> class_device_add() and class_device_del() are generating the
> event. Also, I'm not sure if other subsystems would benefit from it, I
> don't want to generate too many useless events.

I don't think many other subsystems (can) rename things ;)

johannes




Re: [patch] Add insmod option to force the use of the backup timer.

2007-02-28 Thread Dave Jones
On Wed, Feb 28, 2007 at 11:23:46AM +0100, Gerd Hoffmann wrote:
 > The test which automatically enables the backup timer on some HP
 > machines doesn't trigger on other hardware which needs the backup
 > timer too.

Did you figure out *why* that test doesn't trigger?
Making that work seems a better solution to me than adding magic
options that users won't know they have to use.

Dave

-- 
http://www.codemonkey.org.uk


Re: [PATCH 2.6.20] kobject net ifindex + rename

2007-02-28 Thread Jean Tourrilhes
On Wed, Feb 28, 2007 at 10:16:05AM +0100, Johannes Berg wrote:
> Hi,
> 
> > Patch for 2.6.20 is attached.
> 
> ... and in the meantime netdevices aren't class_device any more :) IOW,
> your patch isn't going to work any more.

That's why I always specify the kernel version. I'll look into
that, I'm sure it's not the end of the world ;-)

> Also, I think wireless could benefit from this as well.

In which sense ? Wireless interfaces are regular netdevices.

> > The kobject framework is well designed, so adding these
> > features is trivial change and won't run the risk of breaking anything
> > (famous last words). Obviously, hotplug apps are free to ignore those
> > additional features.
> 
> Why not just add this to base kobject_rename instead? That way,
> userspace is notified for all renames in sysfs.
> The patch then collapses down to the change in net's sysfs code to add
> the ifindex to the environment, and another change in kobject to invoke
> a new event when a name changes and show the old name.

I'm just trying to follow the established pattern. Both
class_device_add() and class_device_del() are generating the
event. Also, I'm not sure if other subsystems would benefit from it, I
don't want to generate too many useless events.

> johannes

Thanks !

Jean



Re: [PATCH] ecryptfs: check xattr operation support fix

2007-02-28 Thread Michael Halcrow
On Wed, Feb 28, 2007 at 08:05:16PM +0300, Dmitriy Monakhov wrote:
>   - ecryptfs_write_inode_size_to_metadata() error code was ignored.
>   - i_op->setxattr() must be supported by the lower fs because it is used below.
> 
> Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>

Acked-by: Michael Halcrow <[EMAIL PROTECTED]>

> ---
>  fs/ecryptfs/inode.c |6 +++---
>  fs/ecryptfs/mmap.c  |3 ++-
>  2 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> index 27fd14a..9ccefad 100644
> --- a/fs/ecryptfs/inode.c
> +++ b/fs/ecryptfs/inode.c
> @@ -168,9 +168,9 @@ static int grow_file(struct dentry *ecryptfs_dentry, struct file *lower_file,
>   goto out;
>   }
>   i_size_write(inode, 0);
> - ecryptfs_write_inode_size_to_metadata(lower_file, lower_inode, inode,
> -   ecryptfs_dentry,
> -   ECRYPTFS_LOWER_I_MUTEX_NOT_HELD);
> + rc = ecryptfs_write_inode_size_to_metadata(lower_file, lower_inode,
> + inode, ecryptfs_dentry,
> + ECRYPTFS_LOWER_I_MUTEX_NOT_HELD);
>   ecryptfs_inode_to_private(inode)->crypt_stat.flags |= ECRYPTFS_NEW_FILE;
>  out:
>   return rc;
> diff --git a/fs/ecryptfs/mmap.c b/fs/ecryptfs/mmap.c
> index 1e5d2ba..416985f 100644
> --- a/fs/ecryptfs/mmap.c
> +++ b/fs/ecryptfs/mmap.c
> @@ -491,7 +491,8 @@ static int ecryptfs_write_inode_size_to_xattr(struct inode *lower_inode,
>   goto out;
>   }
>   lower_dentry = ecryptfs_dentry_to_lower(ecryptfs_dentry);
> - if (!lower_dentry->d_inode->i_op->getxattr) {
> + if (!lower_dentry->d_inode->i_op->getxattr ||
> + !lower_dentry->d_inode->i_op->setxattr) {
>   printk(KERN_WARNING
>  "No support for setting xattr in lower filesystem\n");
>   rc = -ENOSYS;
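Both eCryptfs fixes follow one pattern. First, check every optional operation a code path is about to call. Second, propagate the resulting error to the caller instead of dropping it. Below is a user-space sketch of that pattern. `struct lower_xattr_ops`, the in-memory hooks and the `user.example_size` attribute name are all hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the lower inode's i_op xattr hooks. */
struct lower_xattr_ops {
	int (*getxattr)(const char *name, void *value, size_t size);
	int (*setxattr)(const char *name, const void *value, size_t size);
};

/* Toy in-memory "lower filesystem" with a single attribute. */
static unsigned long long stored_size;

static int mem_getxattr(const char *name, void *value, size_t size)
{
	(void)name;
	if (size < sizeof(stored_size))
		return -ERANGE;
	memcpy(value, &stored_size, sizeof(stored_size));
	return (int)sizeof(stored_size);
}

static int mem_setxattr(const char *name, const void *value, size_t size)
{
	(void)name;
	if (size != sizeof(stored_size))
		return -EINVAL;
	memcpy(&stored_size, value, sizeof(stored_size));
	return 0;
}

/* Both hooks are called on this path, so both must be present. */
static int write_size_to_xattr(const struct lower_xattr_ops *op,
			       unsigned long long size)
{
	unsigned long long old;

	if (!op->getxattr || !op->setxattr) {
		fprintf(stderr, "No support for setting xattr in lower filesystem\n");
		return -ENOSYS;
	}
	(void)op->getxattr("user.example_size", &old, sizeof(old));
	return op->setxattr("user.example_size", &size, sizeof(size));
}

/* Caller side: the error is returned, not dropped. */
static int grow_file_sketch(const struct lower_xattr_ops *op)
{
	int rc = write_size_to_xattr(op, 0);

	return rc;
}
```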


Re: PROBLEM: null pointer dereference in cfq_dispatch_requests (2.6.21-rc2 and 2.6.20)

2007-02-28 Thread Chuck Ebbert
Dan Williams wrote:
> I can reliably reproduce a null pointer dereference on 2.6.20 and
> 2.6.21-rc2.  I will keep digging to find the kernel version where this
> last worked, but wanted to see if there were any immediate experiments I
> should try.
> 
> The failure is caused by running tiobench on a MD raid6 array with 6 out
> of 8 disks available.  The commands I issued to reproduce this are:
> 
>   mdadm -A /dev/md0 /dev/sd[bcdefg]
>   mount /dev/md0 /mnt/raid
>   tiobench --numruns 5 --size 2048 --dir /mnt/raid
> 
> The filesystem is ext3.  The controller is an LSI 1068.  Here are the
> two BUG messages first 2.6.21-rc2 followed by 2.6.20.  I will reply to
> this message with the config.
> Kernel 2.6.20 on an i686
> 
> [  177.299787] BUG: unable to handle kernel NULL pointer dereference at virtual address 005c
> [  177.308526]  printing eip:
> [  177.311287] c01de510
> [  177.313521] *pde = 34d40001
> [  177.316353] Oops:  [#1]
> [  177.319202] SMP 
> [  177.321107] Modules linked in: raid456 xor nfsd exportfs lockd nfs_acl 
> sunrpc autofs4 hidp l2cap bluetooth iptable_raw xt_policy xt_multiport 
> ipt_ULOG ipt_TTL ipt_ttl ipt_TOS ipt_tos ipt_SAME ipt_REJECT ipt_REDIRECT 
> ipt_recent ipt_owner ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_iprange ipt_ECN 
> ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype xt_tcpmss xt_pkttype xt_physdev 
> xt_NFQUEUE xt_MARK xt_mark xt_mac xt_limit xt_length xt_helper xt_dccp 
> xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state 
> iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack iptable_mangle nfnetlink 
> iptable_filter ip_tables x_tables video sbs i2c_ec dock button battery 
> asus_acpi ac radeon drm ipv6 lp parport_pc parport e1000 uhci_hcd floppy 
> mptsas mptscsih mptbase sg ehci_hcd scsi_transport_sas i2c_i801 i2c_core 
> pcspkr dm_snapshot dm_zero dm_mirror dm_mod ata_piix ata_generic libata 
> sd_mod scsi_mod ext3 jbd
> [  177.402252] CPU:2
> [  177.402253] EIP:0060:[]Not tainted VLI
> [  177.402253] EFLAGS: 00210016   (2.6.20 #5)
> [  177.414194] EIP is at cfq_dispatch_insert+0xb/0x53
> [  177.419056] eax: f7773ec0   ebx:    ecx: f7773cc0   edx: 
> [  177.425982] esi: f70abae0   edi: f7773cc0   ebp:    esp: f34dbcbc
> [  177.432953] ds: 007b   es: 007b   ss: 0068
> [  177.437127] Process tiotest (pid: 5405, ti=f34db000 task=f7efc030 task.ti=f34db000)
> [  177.444763] Stack: 0049 f77d3b9c f7773cc0  c01de6ce c014041e 
> f8a26806 0082 
> [  177.453456]f7efc030 fffe22d6    0004 
> f7efc030 f7773cc0 
> [  177.462121]   f70abae0 f7cd5800 f70abae0 
> c01d4fcc 0001 
> [  177.470798] Call Trace:
> [  177.473503]  [] cfq_dispatch_requests+0x12d/0x466
> [  177.479223]  [] __lock_acquire+0x9e9/0xa72
> [  177.484285]  [] scsi_request_fn+0x286/0x336 [scsi_mod]
> [  177.490485]  [] elv_next_request+0x1a2/0x1b2
> [  177.495766]  [] scsi_request_fn+0x286/0x336 [scsi_mod]
> [  177.501912]  [] _spin_lock_irq+0x38/0x43
> [  177.506840]  [] scsi_request_fn+0x59/0x336 [scsi_mod]
> [  177.512981]  [] blk_remove_plug+0x5a/0x66
> [  177.517983]  [] __generic_unplug_device+0x1d/0x1f
> [  177.523705]  [] generic_unplug_device+0x15/0x21
> [  177.529272]  [] unplug_slaves+0x54/0x88 [raid456]
> [  177.535013]  [] blk_backing_dev_unplug+0x73/0x7b
> [  177.540657]  [] _spin_unlock_irqrestore+0x3e/0x4d
> [  177.546382]  [] sync_page+0x0/0x3b
> [  177.550774]  [] trace_hardirqs_on+0x12e/0x158
> [  177.556108]  [] sync_page+0x0/0x3b
> [  177.560471]  [] block_sync_page+0x31/0x32
> [  177.565449]  [] sync_page+0x33/0x3b
> [  177.569916]  [] __wait_on_bit_lock+0x2a/0x52
> [  177.575201]  [] __lock_page+0x58/0x5e
> [  177.579810]  [] wake_bit_function+0x0/0x3c
> [  177.584905]  [] do_generic_mapping_read+0x1db/0x44f
> [  177.590911]  [] generic_file_aio_read+0x173/0x1a4
> [  177.596617]  [] file_read_actor+0x0/0xdb
> [  177.601525]  [] do_sync_read+0xc7/0x10a
> [  177.606365]  [] autoremove_wake_function+0x0/0x35
> [  177.612130]  [] do_sync_read+0x0/0x10a
> [  177.616867]  [] vfs_read+0xa6/0x152
> [  177.621362]  [] sys_read+0x41/0x67
> [  177.625794]  [] syscall_call+0x7/0xb
> [  177.630403]  ===
> [  177.634031] Code: da 11 3b c0 c7 04 24 51 9d 39 c0 e8 c9 a1 f4 ff e8 ca 6e f2 ff ff 4f 34 83 c4 18 5b 5e 5f 5d c3 55 57 56 89 c6 53 8b 40 0c 89 d3 <8b> 7a 5c 8b 68 04 89 d0 e8 b5 fe ff ff 8b 43 14 89 da 25 01 80
> [  177.654378] EIP: [] cfq_dispatch_insert+0xb/0x53 SS:ESP 0068:f34dbcbc

cfq_dispatch_requests() has called cfq_dispatch_insert() with a NULL
second argument (struct request *rq).

There are two patches for raid5/6 out there that might fix this. I'll
attach them (the second just fixes a minor bug in the first one.)

From: Neil Brown <[EMAIL PROTECTED]>

On Sunday February 11, [EMAIL PROTECTED] wrote:
> > Greetings,
> > 
> > I've been running md on my server for some time now and a few days ago one 
> > of

Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-02-28 Thread Davide Libenzi
On Wed, 28 Feb 2007, Linus Torvalds wrote:

> On Wed, 28 Feb 2007, Davide Libenzi wrote:
> > 
> > Here we very much agree. The way I'd like it:
> > 
> > struct async_syscall {
> > unsigned long nr_sysc;
> > unsigned long params[8];
> > long result;
> > };
> 
> No, the "result" needs to go somewhere else. The caller may be totally 
> uninterested in keeping the system call number or parameters around until 
> the operation completes, but if you put them in the same structure with 
> the result, you obviously cannot sanely get rid of them.
> 
> I also don't much like read-write interfaces (which the above would be: 
> the kernel would read most of the structure, and then write one member of 
> the structure). 
> 
> It's entirely possible, for example, that the operation we submit is some 
> legacy "aio_read()", which has some other structure layout than the new 
> one (but one field will be the result code).

Ok, makes sense. Something like this then?

struct async_syscall {
unsigned long nr_sysc;
unsigned long params[8];
long *result;
};

And what would async_wait() return back? Pointers to "struct async_syscall"
or pointers to "result"?



- Davide




Re: [PATCH 2.6.20] kobject net ifindex + rename

2007-02-28 Thread Jean Tourrilhes
On Wed, Feb 28, 2007 at 10:34:37AM +0100, Jarek Poplawski wrote:
> On 28-02-2007 02:27, Jean Tourrilhes wrote:
> > Hi all,
> ...
> > Patch for 2.6.20 is attached. The patch was tested on a system
> > running the hotplug scripts, and on another system running udev.
> > 
> > Have fun...
> > 
> > Jean
> > 
> > Signed-off-by: Jean Tourrilhes <[EMAIL PROTECTED]>
> > 
> > -
> ...
> > diff -u -p linux/net/core/net-sysfs.j1.c linux/net/core/net-sysfs.c
> > --- linux/net/core/net-sysfs.j1.c   2007-02-27 15:01:08.0 -0800
> > +++ linux/net/core/net-sysfs.c  2007-02-27 15:06:49.0 -0800
> > @@ -412,6 +412,17 @@ static int netdev_uevent(struct class_de
> > if ((size <= 0) || (i >= num_envp))
> > return -ENOMEM;
> >  
> > +   /* pass ifindex to uevent.
> > +* ifindex is useful as it won't change (interface name may change)
> > +* and is what RtNetlink uses natively. */
> > +   envp[i++] = buf;
> > +   n = snprintf(buf, size, "IFINDEX=%d", dev->ifindex) + 1;
> > +   buf += n;
> > +   size -= n;
> > +
> > +   if ((size <= 0) || (i >= num_envp))
> 
> Btw.:
> 1. if size == 10 and snprintf returns 9 (without NULL)
>then n == 10 (with NULL), so isn't it enough (here and above):
>  
>   if ((size < 0) || (i >= num_envp))

I just cut'n'pasted the code a few lines above. If the original
code is incorrect, it needs fixing. And it will need fixing in probably
a lot of places.

> 2. shouldn't there be (here and above):
>  
>   envp[--i] = NULL;
> 

No, envp is local, so who cares.
Thanks.

Jean


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-02-28 Thread Linus Torvalds


On Wed, 28 Feb 2007, Davide Libenzi wrote:
> 
> Here we very much agree. The way I'd like it:
> 
> struct async_syscall {
>   unsigned long nr_sysc;
>   unsigned long params[8];
>   long result;
> };

No, the "result" needs to go somewhere else. The caller may be totally 
uninterested in keeping the system call number or parameters around until 
the operation completes, but if you put them in the same structure with 
the result, you obviously cannot sanely get rid of them.

I also don't much like read-write interfaces (which the above would be: 
the kernel would read most of the structure, and then write one member of 
the structure). 

It's entirely possible, for example, that the operation we submit is some 
legacy "aio_read()", which has some other structure layout than the new 
one (but one field will be the result code).

Linus


Re: [PATCH 2.6.20] kobject net ifindex + rename

2007-02-28 Thread Jean Tourrilhes
On Wed, Feb 28, 2007 at 07:36:17AM -0800, Greg KH wrote:
> On Tue, Feb 27, 2007 at 05:27:41PM -0800, Jean Tourrilhes wrote:
> > diff -u -p linux/drivers/base/class.j1.c linux/drivers/base/class.c
> > --- linux/drivers/base/class.j1.c   2007-02-26 18:38:10.0 -0800
> > +++ linux/drivers/base/class.c  2007-02-27 15:52:37.0 -0800
> > @@ -841,6 +841,8 @@ int class_device_rename(struct class_dev
> 
> This function is not in the 2.6.21-rc2 kernel, so you might want to
> rework this patch a bit :)

It was a trial balloon to gather feedback. I will do so.

> Also, it's userspace that causes the rename to happen, so it knows it
> did it, why should the kernel have to emit a message to tell userspace
> again what just happened?

Userspace is not one big program, but a collection of programs,
and one program does not know what another program does.
In particular, udev does not know when people are using
iproute2 to rename interfaces, and it loses its marbles. We don't really
want to ban iproute2 or udev ;-)

> thanks,
> 
> greg k-h

Have fun...

Jean


Re: Problem with freezable workqueues

2007-02-28 Thread Rafael J. Wysocki
On Wednesday, 28 February 2007 19:17, Gautham R Shenoy wrote:
> On Wed, Feb 28, 2007 at 08:37:26AM +0530, Srivatsa Vaddagiri wrote:
> > 
> > Hmm ..good point. So can we assume that disable/enable_nonboot_cpus() are
> > called with processes frozen already?
> > 
> > Gautham, you need to take this into account in your patchset!
> 
> Yup. That would mean making the freezer reentrant since we will
> be freezing twice (once for suspend and later on for hotplug). This is
> ok since the api in my patches looks like
> freeze_processes(int freeze_event);
> 
> But thaw will be interesting. If we are thawing for hotplug, we gotta
> only thaw processes which were frozen *only* for hotplug.
> 
> Rafael, does that mean more status flags?!

Well, I don't really think so, but we need to store some information in the
freezer (e.g. in a status variable).  Namely, we can define a variable, say
tasks_frozen, whose value will be the bitwise OR of the flags
SPE_SUSPEND, SPE_HOTPLUG etc.  In a fully functional system, tasks_frozen
is equal to zero.  If freeze_processes(SPE_SUSPEND) is run, it does
tasks_frozen |= SPE_SUSPEND, and analogously for SPE_HOTPLUG etc.
If tasks_frozen is equal to SPE_SUSPEND|SPE_HOTPLUG, for example, and
thaw_tasks(SPE_HOTPLUG) runs, it only thaws the tasks that need not stay frozen
for the suspend and does tasks_frozen &= ~SPE_HOTPLUG etc.

I think something like this should work.

Greetings,
Rafael 

