Re: [patch 1/2] x86,fpu: split FPU state from task struct
On Sun, Feb 24, 2008 at 08:22:02AM +0100, Ingo Molnar wrote: > > * Suresh Siddha <[EMAIL PROTECTED]> wrote: > > > Split the FPU save area from the task struct. This allows easy > > migration of FPU context, and it's generally cleaner. It also allows > > the following two optimizations: > > > > 1) only allocate when the application actually uses FPU, so in the > > first lazy FPU trap. This could save memory for non-fpu using apps. > > Next patch does this lazy allocation. > > > > 2) allocate the right size for the actual cpu rather than 512 bytes > > always. Patches enabling xsave/xrstor support (coming shortly) will > > take advantage of this. > > i like the concept. Please clean up the issues found by Christoph and > please also base it against x86.git#testing [this is clear 2.6.26 > material and there are already some changes in this area]: > >http://people.redhat.com/mingo/x86.git/README Sure. Will do that in a day. thanks, suresh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/2] x86,fpu: split FPU state from task struct
On Sun, Feb 24, 2008 at 08:27:30AM +0100, Roger While wrote: > > On Sat, Feb 23, 2008 at 06:34:38PM -0800, Suresh Siddha wrote: > > Split the FPU save area from the task struct. This allows easy migration > > of FPU context, and it's generally cleaner. It also allows the following > > two optimizations: > > > > 1) only allocate when the application actually uses FPU, so in the first > > lazy FPU trap. This could save memory for non-fpu using apps. Next patch > > does this lazy allocation. > > > > 2) allocate the right size for the actual cpu rather than 512 bytes > always. > > Patches enabling xsave/xrstor support (coming shortly) will take advantage > > of this. > > > if (next_p->fpu_counter>5) > > - prefetch(>i387.fxsave); > > + prefetch(FXSAVE(next_p)); > > Shouldn't that be prefetch(FXSAVE(next)); ? No. 'next_p' which is the task_struct is what FXSAVE macro takes. thanks, suresh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/2] x86,fpu: split FPU state from task struct
On Sun, Feb 24, 2008 at 08:27:30AM +0100, Roger While wrote: On Sat, Feb 23, 2008 at 06:34:38PM -0800, Suresh Siddha wrote: Split the FPU save area from the task struct. This allows easy migration of FPU context, and it's generally cleaner. It also allows the following two optimizations: 1) only allocate when the application actually uses FPU, so in the first lazy FPU trap. This could save memory for non-fpu using apps. Next patch does this lazy allocation. 2) allocate the right size for the actual cpu rather than 512 bytes always. Patches enabling xsave/xrstor support (coming shortly) will take advantage of this. if (next_p-fpu_counter5) - prefetch(next-i387.fxsave); + prefetch(FXSAVE(next_p)); Shouldn't that be prefetch(FXSAVE(next)); ? No. 'next_p' which is the task_struct is what FXSAVE macro takes. thanks, suresh -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/2] x86,fpu: split FPU state from task struct
On Sun, Feb 24, 2008 at 08:22:02AM +0100, Ingo Molnar wrote: * Suresh Siddha [EMAIL PROTECTED] wrote: Split the FPU save area from the task struct. This allows easy migration of FPU context, and it's generally cleaner. It also allows the following two optimizations: 1) only allocate when the application actually uses FPU, so in the first lazy FPU trap. This could save memory for non-fpu using apps. Next patch does this lazy allocation. 2) allocate the right size for the actual cpu rather than 512 bytes always. Patches enabling xsave/xrstor support (coming shortly) will take advantage of this. i like the concept. Please clean up the issues found by Christoph and please also base it against x86.git#testing [this is clear 2.6.26 material and there are already some changes in this area]: http://people.redhat.com/mingo/x86.git/README Sure. Will do that in a day. thanks, suresh -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/2] x86,fpu: split FPU state from task struct
On Sat, Feb 23, 2008 at 06:34:38PM -0800, Suresh Siddha wrote: > Split the FPU save area from the task struct. This allows easy migration > of FPU context, and it's generally cleaner. It also allows the following > two optimizations: > > 1) only allocate when the application actually uses FPU, so in the first > lazy FPU trap. This could save memory for non-fpu using apps. Next patch > does this lazy allocation. > > 2) allocate the right size for the actual cpu rather than 512 bytes always. > Patches enabling xsave/xrstor support (coming shortly) will take advantage > of this. >if (next_p->fpu_counter>5) > - prefetch(>i387.fxsave); > + prefetch(FXSAVE(next_p)); Shouldn't that be prefetch(FXSAVE(next)); ? Roger -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/2] x86,fpu: split FPU state from task struct
* Suresh Siddha <[EMAIL PROTECTED]> wrote: > Split the FPU save area from the task struct. This allows easy > migration of FPU context, and it's generally cleaner. It also allows > the following two optimizations: > > 1) only allocate when the application actually uses FPU, so in the > first lazy FPU trap. This could save memory for non-fpu using apps. > Next patch does this lazy allocation. > > 2) allocate the right size for the actual cpu rather than 512 bytes > always. Patches enabling xsave/xrstor support (coming shortly) will > take advantage of this. i like the concept. Please clean up the issues found by Christoph and please also base it against x86.git#testing [this is clear 2.6.26 material and there are already some changes in this area]: http://people.redhat.com/mingo/x86.git/README Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/2] x86,fpu: split FPU state from task struct
On Sat, Feb 23, 2008 at 06:34:38PM -0800, Suresh Siddha wrote: > Split the FPU save area from the task struct. This allows easy migration > of FPU context, and it's generally cleaner. It also allows the following > two optimizations: > > 1) only allocate when the application actually uses FPU, so in the first > lazy FPU trap. This could save memory for non-fpu using apps. Next patch > does this lazy allocation. > > 2) allocate the right size for the actual cpu rather than 512 bytes always. > Patches enabling xsave/xrstor support (coming shortly) will take advantage > of this. This sounds like a wonderful idea. But I'm a little unhappy with some of the rather cosmetic things in this patch: > if (next_p->fpu_counter>5) > - prefetch(>i387.fxsave); > + prefetch(FXSAVE(next_p)); These macros are rather ugly. If you really want them please a) make them inlines and lowercase with a descriptive name b) introduce them in a separate patch before the first real path in the series. > +++ linux-2.6-x86/kernel/fork.c 2008-02-23 15:08:53.0 -0800 > @@ -87,6 +87,7 @@ > #ifndef __HAVE_ARCH_TASK_STRUCT_ALLOCATOR > # define alloc_task_struct() kmem_cache_alloc(task_struct_cachep, GFP_KERNEL) > # define free_task_struct(tsk) kmem_cache_free(task_struct_cachep, > (tsk)) > +# define memcpy_task_struct(dst, src) do { *dst = *src; } while (0) > static struct kmem_cache *task_struct_cachep; > #endif > > @@ -142,6 +143,8 @@ > task_struct_cachep = > kmem_cache_create("task_struct", sizeof(struct task_struct), > ARCH_MIN_TASKALIGN, SLAB_PANIC, NULL); > +#else > + task_struct_slab_init(); > #endif > > /* > @@ -181,7 +184,8 @@ > return NULL; > } > > - *tsk = *orig; > + memcpy_task_struct(tsk, orig); I think this is a bad name for this helper, arch_dup_task_struct would be more descriptive. But we actually have an arch hook for this kind of thing called setup_thread_stack which is used by ia64 and m68k just a few lines later, so it'd be better to look into having a single hook. (And possibly rename it to arch_dup_task_struct because the name is a lot more descriptive) setup_thread_stack > + memset(FSAVE(tsk), 0, math_cntxt_size); > + FSAVE(tsk)->cwd = 0x037fu; > + FSAVE(tsk)->swd = 0xu; > + FSAVE(tsk)->twd = 0xu; > + FSAVE(tsk)->fos = 0xu; Also if you reference the save area so often it'd be better to just have a local variable for it. Much better readable. > +struct task_struct * alloc_task_struct(void) this should be struct task_struct *alloc_task_struct(void) > +void free_task_struct(struct task_struct *tsk) > +{ > + kmem_cache_free(task_cntxt_cachep, tsk->thread.cntxt); > + tsk->thread.cntxt=NULL; missing spaces around the '=' > -#define I387 (current->thread.i387) > -#define FPU_info (I387.soft.info) > +#define I387 (current->thread.cntxt) > +#define FPU_info (I387->soft.info) > +#define SOFT(t) (&(t->thread.cntxt->soft)) This is quite butt ugly. But then again it's fpemu, so there's probably no point touching it until a bored janitor comes around. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 1/2] x86,fpu: split FPU state from task struct
Split the FPU save area from the task struct. This allows easy migration of FPU context, and it's generally cleaner. It also allows the following two optimizations: 1) only allocate when the application actually uses FPU, so in the first lazy FPU trap. This could save memory for non-fpu using apps. Next patch does this lazy allocation. 2) allocate the right size for the actual cpu rather than 512 bytes always. Patches enabling xsave/xrstor support (coming shortly) will take advantage of this. Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]> Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]> --- Index: linux-2.6-x86/arch/x86/kernel/process_64.c === --- linux-2.6-x86.orig/arch/x86/kernel/process_64.c 2008-02-22 18:33:17.0 -0800 +++ linux-2.6-x86/arch/x86/kernel/process_64.c 2008-02-23 15:04:30.0 -0800 @@ -630,7 +630,7 @@ /* we're going to use this soon, after a few expensive things */ if (next_p->fpu_counter>5) - prefetch(>i387.fxsave); + prefetch(FXSAVE(next_p)); /* * Reload esp0, LDT and the page table pointer: Index: linux-2.6-x86/arch/x86/kernel/traps_64.c === --- linux-2.6-x86.orig/arch/x86/kernel/traps_64.c 2008-02-22 18:33:17.0 -0800 +++ linux-2.6-x86/arch/x86/kernel/traps_64.c2008-02-23 15:04:30.0 -0800 @@ -1118,7 +1118,7 @@ if (!used_math()) init_fpu(me); - restore_fpu_checking(>thread.i387.fxsave); + restore_fpu_checking(FXSAVE(me)); task_thread_info(me)->status |= TS_USEDFPU; me->fpu_counter++; } @@ -1154,6 +1154,10 @@ #endif /* +* initialize the per thread FP context: +*/ +init_thread_context(); + /* * Should be a barrier for any external CPU state. */ cpu_init(); Index: linux-2.6-x86/kernel/fork.c === --- linux-2.6-x86.orig/kernel/fork.c2008-02-23 15:03:35.0 -0800 +++ linux-2.6-x86/kernel/fork.c 2008-02-23 15:08:53.0 -0800 @@ -87,6 +87,7 @@ #ifndef __HAVE_ARCH_TASK_STRUCT_ALLOCATOR # define alloc_task_struct() kmem_cache_alloc(task_struct_cachep, GFP_KERNEL) # define free_task_struct(tsk) kmem_cache_free(task_struct_cachep, (tsk)) +# define memcpy_task_struct(dst, src) do { *dst = *src; } while (0) static struct kmem_cache *task_struct_cachep; #endif @@ -142,6 +143,8 @@ task_struct_cachep = kmem_cache_create("task_struct", sizeof(struct task_struct), ARCH_MIN_TASKALIGN, SLAB_PANIC, NULL); +#else + task_struct_slab_init(); #endif /* @@ -181,7 +184,8 @@ return NULL; } - *tsk = *orig; + memcpy_task_struct(tsk, orig); + tsk->stack = ti; err = prop_local_init_single(>dirties); Index: linux-2.6-x86/arch/x86/kernel/i387.c === --- linux-2.6-x86.orig/arch/x86/kernel/i387.c 2008-02-23 15:03:35.0 -0800 +++ linux-2.6-x86/arch/x86/kernel/i387.c2008-02-23 15:08:53.0 -0800 @@ -9,6 +9,7 @@ #include #include #include +#include #include #include #include @@ -40,16 +41,17 @@ #endif static unsigned int mxcsr_feature_mask __read_mostly = 0xu; +unsigned int math_cntxt_size; +static struct i387_fxsave_struct fx_scratch __cpuinitdata; -void mxcsr_feature_mask_init(void) +void __cpuinit mxcsr_feature_mask_init(void) { unsigned long mask = 0; clts(); if (cpu_has_fxsr) { - memset(>thread.i387.fxsave, 0, - sizeof(struct i387_fxsave_struct)); - asm volatile("fxsave %0" : : "m" (current->thread.i387.fxsave)); - mask = current->thread.i387.fxsave.mxcsr_mask; + memset(_scratch, 0, sizeof(struct i387_fxsave_struct)); + asm volatile("fxsave %0" : : "m" (fx_scratch)); + mask = fx_scratch.mxcsr_mask; if (mask == 0) mask = 0xffbf; } @@ -57,6 +59,17 @@ stts(); } +void __init init_thread_context(void) +{ + if (cpu_has_fxsr) + math_cntxt_size = sizeof(struct i387_fxsave_struct); +#ifdef CONFIG_X86_32 + else + math_cntxt_size = sizeof(struct i387_fsave_struct); +#endif + init_task.thread.cntxt = alloc_bootmem(math_cntxt_size); +} + #ifdef CONFIG_X86_64 /* * Called at bootup to set up the initial FPU state that is later cloned @@ -65,10 +78,7 @@ void __cpuinit fpu_init(void) { unsigned long oldcr0 = read_cr0(); - extern void __bad_fxsave_alignment(void); - if (offsetof(struct task_struct, thread.i387.fxsave) & 15) - __bad_fxsave_alignment();
[patch 1/2] x86,fpu: split FPU state from task struct
Split the FPU save area from the task struct. This allows easy migration of FPU context, and it's generally cleaner. It also allows the following two optimizations: 1) only allocate when the application actually uses FPU, so in the first lazy FPU trap. This could save memory for non-fpu using apps. Next patch does this lazy allocation. 2) allocate the right size for the actual cpu rather than 512 bytes always. Patches enabling xsave/xrstor support (coming shortly) will take advantage of this. Signed-off-by: Suresh Siddha [EMAIL PROTECTED] Signed-off-by: Arjan van de Ven [EMAIL PROTECTED] --- Index: linux-2.6-x86/arch/x86/kernel/process_64.c === --- linux-2.6-x86.orig/arch/x86/kernel/process_64.c 2008-02-22 18:33:17.0 -0800 +++ linux-2.6-x86/arch/x86/kernel/process_64.c 2008-02-23 15:04:30.0 -0800 @@ -630,7 +630,7 @@ /* we're going to use this soon, after a few expensive things */ if (next_p-fpu_counter5) - prefetch(next-i387.fxsave); + prefetch(FXSAVE(next_p)); /* * Reload esp0, LDT and the page table pointer: Index: linux-2.6-x86/arch/x86/kernel/traps_64.c === --- linux-2.6-x86.orig/arch/x86/kernel/traps_64.c 2008-02-22 18:33:17.0 -0800 +++ linux-2.6-x86/arch/x86/kernel/traps_64.c2008-02-23 15:04:30.0 -0800 @@ -1118,7 +1118,7 @@ if (!used_math()) init_fpu(me); - restore_fpu_checking(me-thread.i387.fxsave); + restore_fpu_checking(FXSAVE(me)); task_thread_info(me)-status |= TS_USEDFPU; me-fpu_counter++; } @@ -1154,6 +1154,10 @@ #endif /* +* initialize the per thread FP context: +*/ +init_thread_context(); + /* * Should be a barrier for any external CPU state. */ cpu_init(); Index: linux-2.6-x86/kernel/fork.c === --- linux-2.6-x86.orig/kernel/fork.c2008-02-23 15:03:35.0 -0800 +++ linux-2.6-x86/kernel/fork.c 2008-02-23 15:08:53.0 -0800 @@ -87,6 +87,7 @@ #ifndef __HAVE_ARCH_TASK_STRUCT_ALLOCATOR # define alloc_task_struct() kmem_cache_alloc(task_struct_cachep, GFP_KERNEL) # define free_task_struct(tsk) kmem_cache_free(task_struct_cachep, (tsk)) +# define memcpy_task_struct(dst, src) do { *dst = *src; } while (0) static struct kmem_cache *task_struct_cachep; #endif @@ -142,6 +143,8 @@ task_struct_cachep = kmem_cache_create(task_struct, sizeof(struct task_struct), ARCH_MIN_TASKALIGN, SLAB_PANIC, NULL); +#else + task_struct_slab_init(); #endif /* @@ -181,7 +184,8 @@ return NULL; } - *tsk = *orig; + memcpy_task_struct(tsk, orig); + tsk-stack = ti; err = prop_local_init_single(tsk-dirties); Index: linux-2.6-x86/arch/x86/kernel/i387.c === --- linux-2.6-x86.orig/arch/x86/kernel/i387.c 2008-02-23 15:03:35.0 -0800 +++ linux-2.6-x86/arch/x86/kernel/i387.c2008-02-23 15:08:53.0 -0800 @@ -9,6 +9,7 @@ #include linux/sched.h #include linux/module.h #include linux/regset.h +#include linux/bootmem.h #include asm/processor.h #include asm/i387.h #include asm/math_emu.h @@ -40,16 +41,17 @@ #endif static unsigned int mxcsr_feature_mask __read_mostly = 0xu; +unsigned int math_cntxt_size; +static struct i387_fxsave_struct fx_scratch __cpuinitdata; -void mxcsr_feature_mask_init(void) +void __cpuinit mxcsr_feature_mask_init(void) { unsigned long mask = 0; clts(); if (cpu_has_fxsr) { - memset(current-thread.i387.fxsave, 0, - sizeof(struct i387_fxsave_struct)); - asm volatile(fxsave %0 : : m (current-thread.i387.fxsave)); - mask = current-thread.i387.fxsave.mxcsr_mask; + memset(fx_scratch, 0, sizeof(struct i387_fxsave_struct)); + asm volatile(fxsave %0 : : m (fx_scratch)); + mask = fx_scratch.mxcsr_mask; if (mask == 0) mask = 0xffbf; } @@ -57,6 +59,17 @@ stts(); } +void __init init_thread_context(void) +{ + if (cpu_has_fxsr) + math_cntxt_size = sizeof(struct i387_fxsave_struct); +#ifdef CONFIG_X86_32 + else + math_cntxt_size = sizeof(struct i387_fsave_struct); +#endif + init_task.thread.cntxt = alloc_bootmem(math_cntxt_size); +} + #ifdef CONFIG_X86_64 /* * Called at bootup to set up the initial FPU state that is later cloned @@ -65,10 +78,7 @@ void __cpuinit fpu_init(void) { unsigned long oldcr0 = read_cr0(); - extern void __bad_fxsave_alignment(void); - if (offsetof(struct
Re: [patch 1/2] x86,fpu: split FPU state from task struct
On Sat, Feb 23, 2008 at 06:34:38PM -0800, Suresh Siddha wrote: Split the FPU save area from the task struct. This allows easy migration of FPU context, and it's generally cleaner. It also allows the following two optimizations: 1) only allocate when the application actually uses FPU, so in the first lazy FPU trap. This could save memory for non-fpu using apps. Next patch does this lazy allocation. 2) allocate the right size for the actual cpu rather than 512 bytes always. Patches enabling xsave/xrstor support (coming shortly) will take advantage of this. This sounds like a wonderful idea. But I'm a little unhappy with some of the rather cosmetic things in this patch: if (next_p-fpu_counter5) - prefetch(next-i387.fxsave); + prefetch(FXSAVE(next_p)); These macros are rather ugly. If you really want them please a) make them inlines and lowercase with a descriptive name b) introduce them in a separate patch before the first real path in the series. +++ linux-2.6-x86/kernel/fork.c 2008-02-23 15:08:53.0 -0800 @@ -87,6 +87,7 @@ #ifndef __HAVE_ARCH_TASK_STRUCT_ALLOCATOR # define alloc_task_struct() kmem_cache_alloc(task_struct_cachep, GFP_KERNEL) # define free_task_struct(tsk) kmem_cache_free(task_struct_cachep, (tsk)) +# define memcpy_task_struct(dst, src) do { *dst = *src; } while (0) static struct kmem_cache *task_struct_cachep; #endif @@ -142,6 +143,8 @@ task_struct_cachep = kmem_cache_create(task_struct, sizeof(struct task_struct), ARCH_MIN_TASKALIGN, SLAB_PANIC, NULL); +#else + task_struct_slab_init(); #endif /* @@ -181,7 +184,8 @@ return NULL; } - *tsk = *orig; + memcpy_task_struct(tsk, orig); I think this is a bad name for this helper, arch_dup_task_struct would be more descriptive. But we actually have an arch hook for this kind of thing called setup_thread_stack which is used by ia64 and m68k just a few lines later, so it'd be better to look into having a single hook. (And possibly rename it to arch_dup_task_struct because the name is a lot more descriptive) setup_thread_stack + memset(FSAVE(tsk), 0, math_cntxt_size); + FSAVE(tsk)-cwd = 0x037fu; + FSAVE(tsk)-swd = 0xu; + FSAVE(tsk)-twd = 0xu; + FSAVE(tsk)-fos = 0xu; Also if you reference the save area so often it'd be better to just have a local variable for it. Much better readable. +struct task_struct * alloc_task_struct(void) this should be struct task_struct *alloc_task_struct(void) +void free_task_struct(struct task_struct *tsk) +{ + kmem_cache_free(task_cntxt_cachep, tsk-thread.cntxt); + tsk-thread.cntxt=NULL; missing spaces around the '=' -#define I387 (current-thread.i387) -#define FPU_info (I387.soft.info) +#define I387 (current-thread.cntxt) +#define FPU_info (I387-soft.info) +#define SOFT(t) ((t-thread.cntxt-soft)) This is quite butt ugly. But then again it's fpemu, so there's probably no point touching it until a bored janitor comes around. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/2] x86,fpu: split FPU state from task struct
* Suresh Siddha [EMAIL PROTECTED] wrote: Split the FPU save area from the task struct. This allows easy migration of FPU context, and it's generally cleaner. It also allows the following two optimizations: 1) only allocate when the application actually uses FPU, so in the first lazy FPU trap. This could save memory for non-fpu using apps. Next patch does this lazy allocation. 2) allocate the right size for the actual cpu rather than 512 bytes always. Patches enabling xsave/xrstor support (coming shortly) will take advantage of this. i like the concept. Please clean up the issues found by Christoph and please also base it against x86.git#testing [this is clear 2.6.26 material and there are already some changes in this area]: http://people.redhat.com/mingo/x86.git/README Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/2] x86,fpu: split FPU state from task struct
On Sat, Feb 23, 2008 at 06:34:38PM -0800, Suresh Siddha wrote: Split the FPU save area from the task struct. This allows easy migration of FPU context, and it's generally cleaner. It also allows the following two optimizations: 1) only allocate when the application actually uses FPU, so in the first lazy FPU trap. This could save memory for non-fpu using apps. Next patch does this lazy allocation. 2) allocate the right size for the actual cpu rather than 512 bytes always. Patches enabling xsave/xrstor support (coming shortly) will take advantage of this. if (next_p-fpu_counter5) - prefetch(next-i387.fxsave); + prefetch(FXSAVE(next_p)); Shouldn't that be prefetch(FXSAVE(next)); ? Roger -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/