Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
On Thursday 29 November 2007 10:36:06 Christoph Lameter wrote:
> The code becomes much simpler if gs would point to the beginning of the
> per cpu area and if the __per_cpu_offset[i] would do the same. No weird
> __per_cpu_start offsetting anymore.

It is a little weird, but it gave flexibility for most archs. ISTR I had
issues relocating the percpu area to 0, but I look forward to your code!

> The generic write/readpercpu functionality introduced by the cpu_alloc
> patchset works best with offsets relative to an arch dependent
> register. All per cpu data (pda, percpu and allocpercpu) is handled as an
> offset relative to the start of the per cpu data.

Hmm, did someone cc me on the patchset and I missed it?

> If the current offset by __per_cpu_start is kept then a per cpu allocator
> may have to dish out addresses that go beyond __per_cpu_end.

Of course; you just need congruence in your allocation across CPUs. It's
possible, but no worse than the requirements on other schemes where you can
reach a variable with a single addition for the CPU.

> I think dealing with a per cpu variable as if it would be an offset
> relative to a base is natural for the typical addressing of cpus based on
> an offset relative to some register.

We've had practical problems getting the compiler to eke out the potential
benefit. That's why we settled for an offset between where the compiler
expected and where the variable actually was.

Cheers,
Rusty.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
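The scheme Rusty describes ("an offset between where the compiler expected
and where the variable actually was") can be modelled in userspace. This is
only a sketch under stated assumptions: the struct, the array names and
NR_CPUS are made up for illustration, and uintptr_t arithmetic stands in for
what the kernel does with linker symbols and RELOC_HIDE. It is not the
kernel's implementation.

```c
#include <stdint.h>
#include <string.h>

#define NR_CPUS 4   /* hypothetical CPU count for this sketch */

/* Stand-in for the linked per-cpu template that the kernel brackets with
 * __per_cpu_start and __per_cpu_end. */
struct percpu_template {
    long pad;       /* keeps 'counter' away from offset 0 */
    int counter;
};

static struct percpu_template template_area = { 0, 55 };
static struct percpu_template cpu_area[NR_CPUS];

/* Delta between each CPU's copy and the template (cf. __per_cpu_offset[]). */
static uintptr_t per_cpu_offset[NR_CPUS];

void setup_areas(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        memcpy(&cpu_area[cpu], &template_area, sizeof(template_area));
        per_cpu_offset[cpu] =
            (uintptr_t)&cpu_area[cpu] - (uintptr_t)&template_area;
    }
}

/* One addition: the address the compiler expected plus this CPU's delta. */
int *per_cpu_counter(int cpu)
{
    return (int *)((uintptr_t)&template_area.counter + per_cpu_offset[cpu]);
}
```

After setup_areas(), per_cpu_counter(cpu) points into that CPU's private
copy. The delta is only meaningful if every copy has the same layout as the
template, which is the "congruence" requirement mentioned above.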
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
Second portion. Add a new seg_offset macro to calculate the offset. This can be avoided if the linker relocates the per cpu area to zero. Includes a patch to read trickle count via both methods to verify that it actually works. Both patches on top of the per cpu cleanup patches that I sent today too. x86_64: Make the x86_32 percpu operations usable on x86_64 Calculate the offset relative to gs in order to be able to address per cpu data using the x86_64 per cpu macros. The subtraction of __per_cpu_start will make the offset based from the beginning of the per cpu area. That is where gs points to. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- drivers/char/random.c|2 +- include/asm-x86/percpu.h | 29 ++--- init/main.c |5 + 3 files changed, 24 insertions(+), 12 deletions(-) Index: linux-2.6.24-rc3-mm2/include/asm-x86/percpu.h === --- linux-2.6.24-rc3-mm2.orig/include/asm-x86/percpu.h 2007-11-28 17:50:01.861182410 -0800 +++ linux-2.6.24-rc3-mm2/include/asm-x86/percpu.h 2007-11-28 21:22:50.845872906 -0800 @@ -16,7 +16,13 @@ #define __my_cpu_offset read_pda(data_offset) #define per_cpu_offset(x) (__per_cpu_offset(x)) +#define __percpu_seg "%%gs:" +/* Calculate the offset to use with the segment register */ +#define seg_offset(name) (*SHIFT_PTR(_cpu_var(name), - (unsigned long)__per_cpu_start)) +#else +#define __percpu_seg "" +#define seg_offset(name) per_cpu_var(name) #endif #include @@ -64,16 +70,11 @@ DECLARE_PER_CPU(struct x8664_pda, pda); *PER_CPU(cpu_gdt_descr, %ebx) */ #ifdef CONFIG_SMP - #define __my_cpu_offset x86_read_percpu(this_cpu_off) - /* fs segment starts at (positive) offset == __per_cpu_offset[cpu] */ #define __percpu_seg "%%fs:" - #else /* !SMP */ - #define __percpu_seg "" - #endif /* SMP */ #include @@ -81,6 +82,13 @@ DECLARE_PER_CPU(struct x8664_pda, pda); /* We can use this directly for local CPU (faster). 
*/ DECLARE_PER_CPU(unsigned long, this_cpu_off); +#define seg_offset(name) per_cpu_var(name) + +#endif /* __ASSEMBLY__ */ +#endif /* !CONFIG_X86_64 */ + +#ifndef __ASSEMBLY__ + /* For arch-specific code, we can use direct single-insn ops (they * don't give an lvalue though). */ extern void __bad_percpu_size(void); @@ -132,11 +140,10 @@ extern void __bad_percpu_size(void); } \ ret__; }) -#define x86_read_percpu(var) percpu_from_op("mov", per_cpu__##var) -#define x86_write_percpu(var,val) percpu_to_op("mov", per_cpu__##var, val) -#define x86_add_percpu(var,val) percpu_to_op("add", per_cpu__##var, val) -#define x86_sub_percpu(var,val) percpu_to_op("sub", per_cpu__##var, val) -#define x86_or_percpu(var,val) percpu_to_op("or", per_cpu__##var, val) +#define x86_read_percpu(var) percpu_from_op("mov", seg_offset(var)) +#define x86_write_percpu(var,val) percpu_to_op("mov", seg_offset(var), val) +#define x86_add_percpu(var,val) percpu_to_op("add", seg_offset(var), val) +#define x86_sub_percpu(var,val) percpu_to_op("sub", seg_offset(var), val) +#define x86_or_percpu(var,val) percpu_to_op("or", seg_offset(var), val) #endif /* !__ASSEMBLY__ */ -#endif /* !CONFIG_X86_64 */ #endif /* _ASM_X86_PERCPU_H_ */ Index: linux-2.6.24-rc3-mm2/drivers/char/random.c === --- linux-2.6.24-rc3-mm2.orig/drivers/char/random.c 2007-11-28 21:20:58.225804398 -0800 +++ linux-2.6.24-rc3-mm2/drivers/char/random.c 2007-11-28 21:28:38.967363573 -0800 @@ -272,7 +272,7 @@ static int random_write_wakeup_thresh = static int trickle_thresh __read_mostly = INPUT_POOL_WORDS * 28; -static DEFINE_PER_CPU(int, trickle_count) = 0; +DEFINE_PER_CPU(int, trickle_count) = 55; /* * A pool of size .poolwords is stirred with a primitive polynomial Index: linux-2.6.24-rc3-mm2/init/main.c === --- linux-2.6.24-rc3-mm2.orig/init/main.c 2007-11-28 21:10:54.245804225 -0800 +++ linux-2.6.24-rc3-mm2/init/main.c2007-11-28 21:22:17.769053628 -0800 @@ -504,6 +504,8 @@ void __init __attribute__((weak)) smp_se { } 
+DECLARE_PER_CPU(int, trickle_count); + asmlinkage void __init start_kernel(void) { char * command_line; @@ -645,6 +647,9 @@ asmlinkage void __init start_kernel(void acpi_early_init(); /* before LAPIC and SMP init */ + printk("Reading trickle cound =%lu. Is %lu\n", + x86_read_percpu(trickle_count), + __raw_get_cpu_var(trickle_count)); /* Do the rest non-__init'ed, we're now alive */ rest_init(); } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
Here is the first of two patches for x86_64 that move the pda into the per cpu area and then make the x86 percpu macros work for x86_64. This needs to be generalized for other arches. The __per_cpu_start offsets can be taken care of by the linker. We can also tell the linker to completely relocate the percpu area to 0. X86_64: Declare pda as per cpu data thereby moving it into the cpu area Declare the pda as a per cpu variable. This will have the effect of moving the pda data into the cpu area managed by cpu alloc. The boot_pdas are only needed in head64.c so move the declaration over there and make it static. Remove the code that allocates special pda data structures. The pda is moved to the beginning of the per cpu area. gs is pointing to the pda. And therefore gs: is now pointing to the per cpu area of the current processor. A per cpu variable can then be reached at %gs:[_cpu_ - __per_cpu_start] Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- arch/x86/kernel/head64.c |6 ++ arch/x86/kernel/setup64.c | 13 ++--- arch/x86/kernel/smpboot_64.c | 16 include/asm-generic/vmlinux.lds.h |1 + include/asm-x86/pda.h |1 - include/linux/percpu.h|4 6 files changed, 21 insertions(+), 20 deletions(-) Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/setup64.c === --- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/setup64.c 2007-11-28 20:59:13.124188194 -0800 +++ linux-2.6.24-rc3-mm2/arch/x86/kernel/setup64.c 2007-11-28 21:08:50.473347382 -0800 @@ -30,7 +30,9 @@ cpumask_t cpu_initialized __cpuinitdata struct x8664_pda *_cpu_pda[NR_CPUS] __read_mostly; EXPORT_SYMBOL(_cpu_pda); -struct x8664_pda boot_cpu_pda[NR_CPUS] __cacheline_aligned; + +DEFINE_PER_CPU_FIRST(struct x8664_pda, pda); +EXPORT_PER_CPU_SYMBOL(pda); struct desc_ptr idt_descr = { 256 * 16 - 1, (unsigned long) idt_table }; @@ -109,10 +111,15 @@ void __init setup_per_cpu_areas(void) } if (!ptr) panic("Cannot allocate cpu data for CPU %d\n", i); - cpu_pda(i)->data_offset = ptr - __per_cpu_start; memcpy(ptr, __per_cpu_start, 
__per_cpu_end - __per_cpu_start); + /* Relocate the pda */ + memcpy(ptr, cpu_pda(i), sizeof(struct x8664_pda)); + cpu_pda(i) = (struct x8664_pda *)ptr; + cpu_pda(i)->data_offset = ptr - __per_cpu_start; } -} + /* Fix up pda for this processor */ + pda_init(0); +} void pda_init(int cpu) { Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/smpboot_64.c === --- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/smpboot_64.c 2007-11-28 20:59:13.136188167 -0800 +++ linux-2.6.24-rc3-mm2/arch/x86/kernel/smpboot_64.c 2007-11-28 20:59:35.399937395 -0800 @@ -556,22 +556,6 @@ static int __cpuinit do_boot_cpu(int cpu return -1; } - /* Allocate node local memory for AP pdas */ - if (cpu_pda(cpu) == _cpu_pda[cpu]) { - struct x8664_pda *newpda, *pda; - int node = cpu_to_node(cpu); - pda = cpu_pda(cpu); - newpda = kmalloc_node(sizeof (struct x8664_pda), GFP_ATOMIC, - node); - if (newpda) { - memcpy(newpda, pda, sizeof (struct x8664_pda)); - cpu_pda(cpu) = newpda; - } else - printk(KERN_ERR - "Could not allocate node local PDA for CPU %d on node %d\n", - cpu, node); - } - alternatives_smp_switch(1); c_idle.idle = get_idle_for_cpu(cpu); Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/head64.c === --- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/head64.c 2007-11-28 20:59:13.152187359 -0800 +++ linux-2.6.24-rc3-mm2/arch/x86/kernel/head64.c 2007-11-28 20:59:35.403937534 -0800 @@ -22,6 +22,12 @@ #include #include +/* + * Only used before the per cpu areas are setup. 
The use for the non possible + * cpus continues after boot + */ +static struct x8664_pda boot_cpu_pda[NR_CPUS] __cacheline_aligned; + static void __init zap_identity_mappings(void) { pgd_t *pgd = pgd_offset_k(0UL); Index: linux-2.6.24-rc3-mm2/include/asm-x86/pda.h === --- linux-2.6.24-rc3-mm2.orig/include/asm-x86/pda.h 2007-11-28 20:59:13.164187921 -0800 +++ linux-2.6.24-rc3-mm2/include/asm-x86/pda.h 2007-11-28 20:59:35.403937534 -0800 @@ -39,7 +39,6 @@ struct x8664_pda { } cacheline_aligned_in_smp; extern struct x8664_pda *_cpu_pda[]; -extern struct x8664_pda boot_cpu_pda[]; extern void pda_init(int); #define cpu_pda(i)
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
Christoph Lameter wrote:
> x86_64 can use a 32 bit offset instead of a 64 bit address because it uses
> the small model. A load of a 64 bit address would require much more
> expensive instructions. A load of a 64 bit address is currently avoided
> through the use of the pda that contains the full 64 bit address in the
> data_offset field. Operations on per cpu data on x86_64 must therefore
> first load data_offset via gs and then add the per cpu address to this
> offset. Then the per cpu operation is performed on that address.

Hm. Certainly a non-one-instruction access would be considerably less
useful than one that is, because of preemption issues. (In general you
need to pin yourself to a cpu if you're using percpu data, but sometimes
it doesn't matter. In particular, the reason I'm interested in this at
all is because Xen puts its interrupt mask flag in per-cpu data, and a
single instruction means that masking interrupts [=disable preemption]
can be done in one instruction with no scope for preemption in the
middle doing something unexpected.)

> In order to avoid this situation through one instruction we need a small
> 32 bit offset relative to gs. Otherwise we cannot get away from the PDA
> and the use of data_offset.

Hm, yes, I see. Dratted large address space. What's wrong with 4G
anyway? ;)

Anyway, I can see the problem with my thinking about this so far.

    J
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:

> > percpu references are quite frequent already (vm statistics) and will be
> > more frequent after we have converted the per cpu arrays to per cpu
> > allocations.
>
> Well, I think the point is moot, because x86 will always use 32-bit
> offsets. Each reference will only be 1 byte bigger than a normal
> variable reference.

Just because i386 is not able to use it does not mean that other arches are
not. F.e. IA64 can embed offsets in the actual instruction (but of course
not 64bit).

x86_64 can use a 32 bit offset instead of a 64 bit address because it uses
the small model. A load of a 64 bit address would require much more
expensive instructions. A load of a 64 bit address is currently avoided
through the use of the pda that contains the full 64 bit address in the
data_offset field. Operations on per cpu data on x86_64 must therefore
first load data_offset via gs and then add the per cpu address to this
offset. Then the per cpu operation is performed on that address.

In order to avoid this situation through one instruction we need a small
32 bit offset relative to gs. Otherwise we cannot get away from the PDA
and the use of data_offset.
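The two access paths compared in this message can be put side by side in a
small userspace model. Everything here is an assumption made for
illustration: struct mini_pda is a cut-down stand-in for the real pda,
gs_base is an ordinary pointer standing in for the %gs segment base, and
trickle_count is borrowed from the test patch elsewhere in this thread. The
sketch shows only that both paths reach the same object; it says nothing
about the instructions a compiler would emit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical miniature of the x86_64 pda; only the field discussed here. */
struct mini_pda {
    uintptr_t data_offset;   /* full-width delta: this CPU's area - template */
};

/* Template image: the pda sits first, followed by an ordinary variable. */
struct area {
    struct mini_pda pda;
    int trickle_count;
};

static struct area template_area = { { 0 }, 55 };
static struct area this_cpu_area;   /* the copy owned by "this" CPU */
static unsigned char *gs_base;      /* stands in for the %gs base */

void setup_this_cpu(void)
{
    memcpy(&this_cpu_area, &template_area, sizeof(template_area));
    this_cpu_area.pda.data_offset =
        (uintptr_t)&this_cpu_area - (uintptr_t)&template_area;
    gs_base = (unsigned char *)&this_cpu_area;
}

/* Two steps: fetch data_offset through the base, then add the variable's
 * link-time address to it. */
int *two_step(void)
{
    uintptr_t off = ((struct mini_pda *)gs_base)->data_offset;
    return (int *)((uintptr_t)&template_area.trickle_count + off);
}

/* One step: base plus a small offset measured from the start of the area. */
int *one_step(void)
{
    uint32_t small = (uint32_t)offsetof(struct area, trickle_count);
    return (int *)(gs_base + small);
}
```

In this model the 32-bit value in one_step() plays the role of a variable's
address minus __per_cpu_start, so no intermediate load of data_offset is
needed once the base points at the start of the area.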
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
Christoph Lameter wrote:
> The percpu areas need to be allocated in a NUMA aware fashion. Otherwise
> you use distant memory for the most performance sensitive areas. The NUMA
> subsystem must be so far up that these allocations can be performed in the
> right way. And this means at least you need to know on which node each
> processor is located. That is what the PDA is currently used for and i386
> has no other way of doing that. I think we could use an array [NR_CPUS]
> for this one but we want to avoid these arrays because NR_CPUS may get
> very big.

Oh, you mean there needs to be some percpu data mechanism operating in
order to do numa-aware allocations, which would be necessary to allocate
the percpu memory itself? I can see how that would be awkward.

    J
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:

> Don't think it matters either way. Before percpu is allocated, NUMA
> issues don't matter. Once they are - by whatever mechanism - you can
> set the segment bases up appropriately. The fact that you chose to put
> percpu data at address X doesn't affect the percpu mechanism one way or
> the other.

The percpu areas need to be allocated in a NUMA aware fashion. Otherwise
you use distant memory for the most performance sensitive areas. The NUMA
subsystem must be so far up that these allocations can be performed in the
right way. And this means at least you need to know on which node each
processor is located. That is what the PDA is currently used for and i386
has no other way of doing that. I think we could use an array [NR_CPUS]
for this one but we want to avoid these arrays because NR_CPUS may get
very big.
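The ordering problem described here (the cpu-to-node mapping has to be known
before the per cpu areas can be placed) can be illustrated with a toy model.
This is a sketch only: the two-node topology, the static arenas and the bump
allocator are invented for the example and stand in for the kernel's real
node-local allocators. The cpu_to_node_map[] table is exactly the kind of
[NR_CPUS] array the message would prefer to avoid; it is used here purely to
keep the example small.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS   4     /* hypothetical topology: 4 CPUs on 2 nodes */
#define NR_NODES  2
#define AREA_SIZE 128   /* hypothetical size of one per cpu area */

/* Must be known before any per cpu area can be placed correctly. */
static const int cpu_to_node_map[NR_CPUS] = { 0, 0, 1, 1 };

/* One arena per node stands in for node-local memory. */
static unsigned char node_mem[NR_NODES][NR_CPUS * AREA_SIZE];
static size_t node_used[NR_NODES];
static unsigned char *cpu_area[NR_CPUS];

/* Trivial bump allocator; returns NULL when the node's arena is full. */
unsigned char *alloc_on_node(int node, size_t size)
{
    if (node_used[node] + size > sizeof(node_mem[node]))
        return NULL;
    unsigned char *p = node_mem[node] + node_used[node];
    node_used[node] += size;
    return p;
}

/* Place every CPU's area in the arena of the node the CPU sits on. */
int setup_areas_numa(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        cpu_area[cpu] = alloc_on_node(cpu_to_node_map[cpu], AREA_SIZE);
        if (!cpu_area[cpu])
            return -1;
    }
    return 0;
}

/* Report which node's arena backs a CPU's area, or -1 if none does. */
int area_node(int cpu)
{
    uintptr_t p = (uintptr_t)cpu_area[cpu];
    for (int node = 0; node < NR_NODES; node++) {
        uintptr_t lo = (uintptr_t)node_mem[node];
        if (p >= lo && p < lo + sizeof(node_mem[node]))
            return node;
    }
    return -1;
}
```

The point of the model is the dependency: setup_areas_numa() cannot run
before cpu_to_node_map[] is filled in, which is why that mapping has to live
somewhere other than the per cpu areas it helps to place.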
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
Christoph Lameter wrote:
> On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:
>
>> I don't see the problem. The way i386 does it inherently supports
>> per-cpu data very early on (it uses the prototype percpu section until
>> the real percpu values are set up).
>
> Ok so we could do that for x86_64 as well? There is more complicated
> bootstrap since i386 does not support NUMA aware placement of per cpu
> areas.

Don't think it matters either way. Before percpu is allocated, NUMA
issues don't matter. Once they are - by whatever mechanism - you can
set the segment bases up appropriately. The fact that you chose to put
percpu data at address X doesn't affect the percpu mechanism one way or
the other.

> percpu references are quite frequent already (vm statistics) and will be
> more frequent after we have converted the per cpu arrays to per cpu
> allocations.

Well, I think the point is moot, because x86 will always use 32-bit
offsets. Each reference will only be 1 byte bigger than a normal
variable reference.

    J
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:

> I don't see the problem. The way i386 does it inherently supports
> per-cpu data very early on (it uses the prototype percpu section until
> the real percpu values are set up).

Ok so we could do that for x86_64 as well? There is more complicated
bootstrap since i386 does not support NUMA aware placement of per cpu
areas.

> > The i386 way of referring to per cpu data is not optimal because it is
> > always offset by __per_cpu_start. per cpu data offsets need to be relative
> > to the beginning of the per cpu area. per cpu data is less than 64k so 2
> > byte offsets would be enough.
>
> I don't see that's terribly important. percpu references aren't all
> that common overall, and - at least on x86 - using a 16-bit offset
> (assuming it's possible) would require a prefix anyway, so it would only
> save 1 byte per reference. But I can't convince gas to generate a
> 16-bit offset anyway.

percpu references are quite frequent already (vm statistics) and will be
more frequent after we have converted the per cpu arrays to per cpu
allocations.

> > That way the __per_cpu_offset array and the registers that are used on
> > various platforms are pointing to the actual data and can be loaded
> > directly into a register and then a load with a small offset to that
> > register can be performed. On x86_64 this is gs, on i386 fs, on sparc g5,
> > on ia64 a fixed address stands in for the register.
>
> The asm used to generate these references is inherently arch-specific
> anyway, so the type and size of offset needed from the per-cpu base
> register to the data itself can be arch-dependent without loss of
> generality.

Well yes that is already the case and made explicit by the percpu cleanup
done so far. The offset of a base is used by multiple architectures.
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
> * drop support for stack-protector (does it really help? do people
> use it?)

AFAIK we only ever had a single classical stack buffer overflow in the
kernel. It certainly doesn't seem to be a common security problem it is
solving.

-Andi
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
Christoph Lameter wrote:
> On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:
>
>> Yes, I would like to convert x86_64 to match i386's percpu, and drop the
>> pda altogether. The only thing preventing this is the stack canary, and
>> I'm wondering how much value there is in keeping it, given the
>> disadvantages of having this divergence between 32 and 64 bit.
>
> I think most of the PDA could be gotten rid of. The problems are
>
> 1. The stack canary

Yes, this is a biggie. It needs one of:

    * fix gcc
    * post-process the .s file
    * drop support for stack-protector (does it really help? do people
      use it?)

> 2. The PDA is used to store per cpu data before the per cpu areas
>    are setup.

I don't see the problem. The way i386 does it inherently supports
per-cpu data very early on (it uses the prototype percpu section until
the real percpu values are set up).

> The i386 way of referring to per cpu data is not optimal because it is
> always offset by __per_cpu_start. per cpu data offsets need to be relative
> to the beginning of the per cpu area. per cpu data is less than 64k so 2
> byte offsets would be enough.

I don't see that's terribly important. percpu references aren't all
that common overall, and - at least on x86 - using a 16-bit offset
(assuming it's possible) would require a prefix anyway, so it would only
save 1 byte per reference. But I can't convince gas to generate a
16-bit offset anyway.

> That way the __per_cpu_offset array and the registers that are used on
> various platforms are pointing to the actual data and can be loaded
> directly into a register and then a load with a small offset to that
> register can be performed. On x86_64 this is gs, on i386 fs, on sparc g5,
> on ia64 a fixed address stands in for the register.

The asm used to generate these references is inherently arch-specific
anyway, so the type and size of offset needed from the per-cpu base
register to the data itself can be arch-dependent without loss of
generality. I definitely see that small offsets might be useful for
other architectures, but for x86 it doesn't help and makes things more
complex. The only difference between 32- and 64-bit is whether we
generate an offset from %fs, %gs or nothing (for the UP case).

> In loops over all per
> cpu variables this will also simplify the code.

Why's that?

> And ultimately we can get rid of the ugly RELOC_HIDE macro. It simply
> becomes the adding of the base address in a register to a per cpu offset.

I was never quite sure what that was for.

    J
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
On Thu, 29 Nov 2007, Andi Kleen wrote:

> On Wed, Nov 28, 2007 at 04:11:37PM -0800, Christoph Lameter wrote:
> > 1. The stack canary
>
> You would need to change gcc with a new option and only allow the stack
> checking when the compiler supports the new option. However the problem
> is still how to get a reasonable fixed offset. Or perhaps just change
> gcc to use a linker symbol relative to %gs that could be set to anything?

I still think we should leave the canary as is.
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
On Wed, Nov 28, 2007 at 04:11:37PM -0800, Christoph Lameter wrote:
> 1. The stack canary

You would need to change gcc with a new option and only allow the stack
checking when the compiler supports the new option. However the problem
is still how to get a reasonable fixed offset. Or perhaps just change
gcc to use a linker symbol relative to %gs that could be set to anything?

-Andi
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:

> Yes, I would like to convert x86_64 to match i386's percpu, and drop the
> pda altogether. The only thing preventing this is the stack canary, and
> I'm wondering how much value there is in keeping it, given the
> disadvantages of having this divergence between 32 and 64 bit.

I think most of the PDA could be gotten rid of. The problems are

1. The stack canary

2. The PDA is used to store per cpu data before the per cpu areas
   are setup.

The i386 way of referring to per cpu data is not optimal because it is
always offset by __per_cpu_start. per cpu data offsets need to be relative
to the beginning of the per cpu area. per cpu data is less than 64k so 2
byte offsets would be enough.

That way the __per_cpu_offset array and the registers that are used on
various platforms are pointing to the actual data and can be loaded
directly into a register and then a load with a small offset to that
register can be performed. On x86_64 this is gs, on i386 fs, on sparc g5,
on ia64 a fixed address stands in for the register.

In loops over all per cpu variables this will also simplify the code.

And ultimately we can get rid of the ugly RELOC_HIDE macro. It simply
becomes the adding of the base address in a register to a per cpu offset.
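As an illustration of "adding of the base address in a register to a per cpu
offset", a loop over all CPUs can be written with nothing more than a base
pointer per CPU and an offset measured from the start of the area. This is a
userspace sketch with made-up names (struct cpu_area, cpu_base[], NR_CPUS);
it does not reproduce the kernel's macros.

```c
#include <stddef.h>

#define NR_CPUS 4   /* hypothetical CPU count for this sketch */

/* Hypothetical layout of one per cpu area. */
struct cpu_area {
    long pad;
    long events;
};

static struct cpu_area areas[NR_CPUS];
static unsigned char *cpu_base[NR_CPUS];   /* start of each CPU's area */

/* Offset of the variable relative to the beginning of the area. */
#define EVENTS_OFF offsetof(struct cpu_area, events)

void init_bases(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        cpu_base[cpu] = (unsigned char *)&areas[cpu];
}

/* Base plus offset; no template address is involved. */
long *events_on(int cpu)
{
    return (long *)(cpu_base[cpu] + EVENTS_OFF);
}

/* Sum one per cpu variable across all CPUs. */
long sum_events(void)
{
    long total = 0;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        total += *events_on(cpu);
    return total;
}
```

Because the offset is relative to the start of the area, the same small
constant works for every CPU and the loop body is a single base-plus-offset
load.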
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
Rusty Russell wrote:
> On Thursday 29 November 2007 05:51:29 Christoph Lameter wrote:
>> On Wed, 28 Nov 2007, Rusty Russell wrote:
>>> On Wednesday 28 November 2007 05:14:47 Christoph Lameter wrote:
>>>> On Tue, 27 Nov 2007, Rusty Russell wrote:
>>>>> Have you considered moving x86-64's setup_per_cpu_areas into generic
>>>>> code? It's a bit messier because some archs might not have set up
>>>>> NUMA stuff yet, but it's logically generic...
>>>>
>>>> Yes that will happen later. This is just the early cleanup work. I
>>>> plan to generally bring the two x86 arches in line. The pda will be
>>>> folded into the per cpu area and after that it's easy to do.
>>>
>>> Unfortunately, we tried to get rid of the x86-64 pda (like i386) but you
>>> lose the ability to use the stack protection config option. That's
>>> because it assumes that gs:0x68 (or something) is the stack canary; we
>>> need a YA gcc change to make this gs:__builtin_stack_canary_off (where
>>> gcc can emit __builtin_stack_canary_off as a weak absolute symbol, so we
>>> can override it for the kernel).
>>
>> This works if you rebase the per cpu area at zero. gs:0x68 is still the
>> stack canary.
>>
>> The i386 method does not work because the segment register does not
>> directly point to the pda.
>
> But the PDA itself is silly (Jeremy ported it to i386 and I balked). We have
> a generic one: it's called the per-cpu data. Having a completely separate
> per-cpu structure for x86-64 is a mistake.

Yes, I would like to convert x86_64 to match i386's percpu, and drop the
pda altogether. The only thing preventing this is the stack canary, and
I'm wondering how much value there is in keeping it, given the
disadvantages of having this divergence between 32 and 64 bit.

    J
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
On Thu, 29 Nov 2007, Rusty Russell wrote:

> But the PDA itself is silly (Jeremy ported it to i386 and I balked). We have
> a generic one: it's called the per-cpu data. Having a completely separate
> per-cpu structure for x86-64 is a mistake.

Yes ultimately the pda can be dissolved. However, the stack canary
probably has to be kept for backward compatibility.

> Setting up gs as the per-cpu offset has lovely properties and avoids YA
> arch-specific concept; see the i386 code. Introducing a generic
> read_percpu()/write_percpu() would even make it optimal.

The code becomes much simpler if gs would point to the beginning of the
per cpu area and if the __per_cpu_offset[i] would do the same. No weird
__per_cpu_start offsetting anymore. The offsets are smaller if they are
relative to the per cpu areas which will make more compact instructions
possible.

The generic write/readpercpu functionality introduced by the cpu_alloc
patchset works best with offsets relative to an arch dependent
register. All per cpu data (pda, percpu and allocpercpu) is handled as an
offset relative to the start of the per cpu data.

If the current offset by __per_cpu_start is kept then a per cpu allocator
may have to dish out addresses that go beyond __per_cpu_end.

I think dealing with a per cpu variable as if it would be an offset
relative to a base is natural for the typical addressing of cpus based on
an offset relative to some register.
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
On Thursday 29 November 2007 05:51:29 Christoph Lameter wrote:
> On Wed, 28 Nov 2007, Rusty Russell wrote:
> > On Wednesday 28 November 2007 05:14:47 Christoph Lameter wrote:
> > > On Tue, 27 Nov 2007, Rusty Russell wrote:
> > > > Have you considered moving x86-64's setup_per_cpu_areas into generic
> > > > code? It's a bit messier because some archs might not have set up
> > > > NUMA stuff yet, but it's logically generic...
> > >
> > > Yes that will happen later. This is just the early cleanup work. I
> > > plan to generally bring the two x86 arches in line. The pda will be
> > > folded into the per cpu area and after that it's easy to do.
> >
> > Unfortunately, we tried to get rid of the x86-64 pda (like i386) but you
> > lose the ability to use the stack protection config option. That's
> > because it assumes that gs:0x68 (or something) is the stack canary; we
> > need a YA gcc change to make this gs:__builtin_stack_canary_off (where
> > gcc can emit __builtin_stack_canary_off as a weak absolute symbol, so we
> > can override it for the kernel).
>
> This works if you rebase the per cpu area at zero. gs:0x68 is still the
> stack canary.
>
> The i386 method does not work because the segment register does not
> directly point to the pda.

But the PDA itself is silly (Jeremy ported it to i386 and I balked). We have
a generic one: it's called the per-cpu data. Having a completely separate
per-cpu structure for x86-64 is a mistake.

Setting up gs as the per-cpu offset has lovely properties and avoids YA
arch-specific concept; see the i386 code. Introducing a generic
read_percpu()/write_percpu() would even make it optimal.

Cheers,
Rusty.
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
On Wed, 28 Nov 2007, Rusty Russell wrote:

> On Wednesday 28 November 2007 05:14:47 Christoph Lameter wrote:
> > On Tue, 27 Nov 2007, Rusty Russell wrote:
> > > Have you considered moving x86-64's setup_per_cpu_areas into generic code? It's a bit messier because some archs might not have set up NUMA stuff yet, but it's logically generic...
> >
> > Yes that will happen later. This is just the early cleanup work. I plan to generally bring the two x86 arches in line. The pda will be folded into the per cpu area and after that it's easy to do.
>
> Unfortunately, we tried to get rid of the x86-64 pda (like i386) but you lose the ability to use the stack protection config option. That's because it assumes that gs:0x68 (or something) is the stack canary; we need a YA gcc change to make this gs:__builtin_stack_canary_off (where gcc can emit __builtin_stack_canary_off as a weak absolute symbol, so we can override it for the kernel).

This works if you rebase the per cpu area at zero. gs:0x68 is still the stack canary.

The i386 method does not work because the segment register does not directly point to the pda.
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
Rusty Russell wrote:
> On Thursday 29 November 2007 05:51:29 Christoph Lameter wrote:
> > On Wed, 28 Nov 2007, Rusty Russell wrote:
> > > On Wednesday 28 November 2007 05:14:47 Christoph Lameter wrote:
> > > > On Tue, 27 Nov 2007, Rusty Russell wrote:
> > > > > Have you considered moving x86-64's setup_per_cpu_areas into generic code? It's a bit messier because some archs might not have set up NUMA stuff yet, but it's logically generic...
> > > >
> > > > Yes that will happen later. This is just the early cleanup work. I plan to generally bring the two x86 arches in line. The pda will be folded into the per cpu area and after that it's easy to do.
> > >
> > > Unfortunately, we tried to get rid of the x86-64 pda (like i386) but you lose the ability to use the stack protection config option. That's because it assumes that gs:0x68 (or something) is the stack canary; we need a YA gcc change to make this gs:__builtin_stack_canary_off (where gcc can emit __builtin_stack_canary_off as a weak absolute symbol, so we can override it for the kernel).
> >
> > This works if you rebase the per cpu area at zero. gs:0x68 is still the stack canary.
> >
> > The i386 method does not work because the segment register does not directly point to the pda.
>
> But the PDA itself is silly (Jeremy ported it to i386 and I balked). We have a generic one: it's called the per-cpu data. Having a completely separate per-cpu structure for x86-64 is a mistake.

Yes, I would like to convert x86_64 to match i386's percpu, and drop the pda altogether. The only thing preventing this is the stack canary, and I'm wondering how much value there is in keeping it, given the disadvantages of having this divergence between 32 and 64 bit.

    J
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:

> Yes, I would like to convert x86_64 to match i386's percpu, and drop the pda altogether. The only thing preventing this is the stack canary, and I'm wondering how much value there is in keeping it, given the disadvantages of having this divergence between 32 and 64 bit.

I think most of the PDA could be gotten rid of. The problems are

1. The stack canary

2. The PDA is used to store per cpu data before the per cpu areas are set up.

The i386 way of referring to per cpu data is not optimal because it is always offset by __per_cpu_start. per cpu data offsets need to be relative to the beginning of the per cpu area. per cpu data is less than 64k so 2 byte offsets would be enough.

That way the __per_cpu_offset array and the registers that are used on various platforms are pointing to the actual data and can be loaded directly into a register and then a load with a small offset to that register can be performed. On x86_64 this is gs, on i386 fs, on sparc g5, on ia64 a fixed address stands in for the register.

In loops over all per cpu variables this will also simplify the code.

And ultimately we can get rid of the ugly RELOC_HIDE macro. It simply becomes the adding of the base address in a register to a per cpu offset.
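The two addressing styles contrasted in this message can be put side by side in a small user-space sketch. This is an illustration only, not kernel code: percpu_section, percpu_area, NCPUS, AREA_SIZE and every function name below are hypothetical stand-ins, and plain arrays take the place of the linked per cpu section and the per-CPU copies. The integer casts play roughly the role the thread assigns to RELOC_HIDE, keeping the cross-object arithmetic out of the compiler's pointer reasoning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NCPUS 4         /* hypothetical CPU count */
#define AREA_SIZE 256   /* hypothetical size of one per cpu area */

/* Stand-in for the linked per cpu section (what __per_cpu_start marks). */
static unsigned char percpu_section[AREA_SIZE];
/* Stand-in for the per cpu areas, one private copy per CPU. */
static unsigned char percpu_area[NCPUS][AREA_SIZE];

/* Link-time address of a hypothetical variable placed off bytes into the section. */
void *link_addr(size_t off)
{
	assert(off < AREA_SIZE);
	return percpu_section + off;
}

/* Existing style: add (area - section start) to the link-time address. */
void *ptr_link_relative(void *link, int cpu)
{
	assert(cpu >= 0 && cpu < NCPUS);
	uintptr_t delta = (uintptr_t)percpu_area[cpu] - (uintptr_t)percpu_section;
	return (void *)((uintptr_t)link + delta);
}

/* Proposed style: the variable is identified by its offset from the area start. */
size_t zero_based_offset(void *link)
{
	return (size_t)((uintptr_t)link - (uintptr_t)percpu_section);
}

/* Proposed style: base (what gs or __per_cpu_offset[cpu] would hold) plus offset. */
void *ptr_zero_based(size_t off, int cpu)
{
	assert(cpu >= 0 && cpu < NCPUS && off < AREA_SIZE);
	return percpu_area[cpu] + off;
}
```

Both routes reach the same byte. The difference is what the instruction stream has to carry: a full link-time address in the first case, a small offset bounded by the area size in the second.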
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
On Thu, 29 Nov 2007, Andi Kleen wrote:

> On Wed, Nov 28, 2007 at 04:11:37PM -0800, Christoph Lameter wrote:
> > 1. The stack canary
>
> You would need to change gcc with a new option and only allow the stack checking when the compiler supports the new option. However the problem is still how to get a reasonable fixed offset.
>
> Or perhaps just change gcc to use a linker symbol relative to %gs that could be set to anything?

I still think we should leave the canary as is.
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
Christoph Lameter wrote:
> On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:
> > Yes, I would like to convert x86_64 to match i386's percpu, and drop the pda altogether. The only thing preventing this is the stack canary, and I'm wondering how much value there is in keeping it, given the disadvantages of having this divergence between 32 and 64 bit.
>
> I think most of the PDA could be gotten rid of. The problems are
>
> 1. The stack canary

Yes, this is a biggie. It needs one of:

  * fix gcc
  * post-process the .s file
  * drop support for stack-protector (does it really help? do people use it?)

> 2. The PDA is used to store per cpu data before the per cpu areas are set up.

I don't see the problem. The way i386 does it inherently supports per-cpu data very early on (it uses the prototype percpu section until the real percpu values are set up).

> The i386 way of referring to per cpu data is not optimal because it is always offset by __per_cpu_start. per cpu data offsets need to be relative to the beginning of the per cpu area. per cpu data is less than 64k so 2 byte offsets would be enough.

I don't see that's terribly important. percpu references aren't all that common overall, and - at least on x86 - using a 16-bit offset (assuming it's possible) would require a prefix anyway, so it would only save 1 byte per reference. But I can't convince gas to generate a 16-bit offset anyway.

> That way the __per_cpu_offset array and the registers that are used on various platforms are pointing to the actual data and can be loaded directly into a register and then a load with a small offset to that register can be performed. On x86_64 this is gs, on i386 fs, on sparc g5, on ia64 a fixed address stands in for the register.

The asm used to generate these references is inherently arch-specific anyway, so the type and size of offset needed from the per-cpu base register to the data itself can be arch-dependent without loss of generality.

I definitely see that small offsets might be useful for other architectures, but for x86 it doesn't help and makes things more complex. The only difference between 32- and 64-bit is whether we generate an offset from %fs, %gs or nothing (for the UP case).

> In loops over all per cpu variables this will also simplify the code.

Why's that?

> And ultimately we can get rid of the ugly RELOC_HIDE macro. It simply becomes the adding of the base address in a register to a per cpu offset.

I was never quite sure what that was for.

    J
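The early-boot behaviour described above for i386 (per-cpu accesses land in the prototype section until the real areas exist) can be modelled in a few lines of user-space C. This is a sketch under stated assumptions, not the i386 implementation: prototype, areas, base, NCPUS, AREA_SIZE and the function names are all hypothetical, and a NULL base entry stands in for "segment base not yet switched".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NCPUS 4        /* hypothetical CPU count */
#define AREA_SIZE 64   /* hypothetical size of one per cpu area */

/* Stand-in for the prototype per cpu section linked into the image. */
static unsigned char prototype[AREA_SIZE];
/* Stand-in for the per cpu areas allocated later during boot. */
static unsigned char areas[NCPUS][AREA_SIZE];
/* Base each CPU's segment register would hold; NULL means "not set up yet". */
static unsigned char *base[NCPUS];

/* Early in boot every CPU falls back to the prototype section. */
unsigned char *percpu_base(int cpu)
{
	assert(cpu >= 0 && cpu < NCPUS);
	return base[cpu] ? base[cpu] : prototype;
}

void percpu_write_int(int cpu, size_t off, int val)
{
	assert(off + sizeof(val) <= AREA_SIZE);
	memcpy(percpu_base(cpu) + off, &val, sizeof(val));
}

int percpu_read_int(int cpu, size_t off)
{
	int val;

	assert(off + sizeof(val) <= AREA_SIZE);
	memcpy(&val, percpu_base(cpu) + off, sizeof(val));
	return val;
}

/* Copy the prototype into each real area, then switch the bases over. */
void setup_per_cpu_areas_sketch(void)
{
	for (int cpu = 0; cpu < NCPUS; cpu++) {
		memcpy(areas[cpu], prototype, AREA_SIZE);
		base[cpu] = areas[cpu];
	}
}
```

Values written before the switch survive it because the prototype is copied, and after the switch each CPU sees only its own copy. Where the real areas are placed (for example on the CPU's own NUMA node) does not change this access path.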
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
On Wed, Nov 28, 2007 at 04:11:37PM -0800, Christoph Lameter wrote:
> 1. The stack canary

You would need to change gcc with a new option and only allow the stack checking when the compiler supports the new option. However the problem is still how to get a reasonable fixed offset.

Or perhaps just change gcc to use a linker symbol relative to %gs that could be set to anything?

-Andi
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
> * drop support for stack-protector (does it really help? do people use it?)

AFAIK we only ever had a single classical stack buffer overflow in the kernel. It certainly doesn't seem to be a common security problem it is solving.

-Andi
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
Christoph Lameter wrote:
> On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:
> > I don't see the problem. The way i386 does it inherently supports per-cpu data very early on (it uses the prototype percpu section until the real percpu values are set up).
>
> Ok so we could do that for x86_64 as well? There is more complicated bootstrap since i386 does not support NUMA aware placement of per cpu areas.

Don't think it matters either way. Before percpu is allocated, NUMA issues don't matter. Once they are - by whatever mechanism - you can set the segment bases up appropriately. The fact that you chose to put percpu data at address X doesn't affect the percpu mechanism one way or the other.

> percpu references are quite frequent already (vm statistics) and will be more frequent after we have converted the per cpu arrays to per cpu allocations.

Well, I think the point is moot, because x86 will always use 32-bit offsets. Each reference will only be 1 byte bigger than a normal variable reference.

    J
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:

> Don't think it matters either way. Before percpu is allocated, NUMA issues don't matter. Once they are - by whatever mechanism - you can set the segment bases up appropriately. The fact that you chose to put percpu data at address X doesn't affect the percpu mechanism one way or the other.

The percpu areas need to be allocated in a NUMA aware fashion. Otherwise you use distant memory for the most performance sensitive areas. The NUMA subsystem must be so far up that these allocations can be performed in the right way. And this means at least you need to know on which node each processor is located. That is what the PDA is currently used for and i386 has no other way of doing that. I think we could use an array [NR_CPUS] for this one but we want to avoid these arrays because NR_CPUS may get very big.
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:

> I don't see the problem. The way i386 does it inherently supports per-cpu data very early on (it uses the prototype percpu section until the real percpu values are set up).

Ok so we could do that for x86_64 as well? There is more complicated bootstrap since i386 does not support NUMA aware placement of per cpu areas.

> > The i386 way of referring to per cpu data is not optimal because it is always offset by __per_cpu_start. per cpu data offsets need to be relative to the beginning of the per cpu area. per cpu data is less than 64k so 2 byte offsets would be enough.
>
> I don't see that's terribly important. percpu references aren't all that common overall, and - at least on x86 - using a 16-bit offset (assuming it's possible) would require a prefix anyway, so it would only save 1 byte per reference. But I can't convince gas to generate a 16-bit offset anyway.

percpu references are quite frequent already (vm statistics) and will be more frequent after we have converted the per cpu arrays to per cpu allocations.

> > That way the __per_cpu_offset array and the registers that are used on various platforms are pointing to the actual data and can be loaded directly into a register and then a load with a small offset to that register can be performed. On x86_64 this is gs, on i386 fs, on sparc g5, on ia64 a fixed address stands in for the register.
>
> The asm used to generate these references is inherently arch-specific anyway, so the type and size of offset needed from the per-cpu base register to the data itself can be arch-dependent without loss of generality.

Well yes that is already the case and made explicit by the percpu cleanup done so far. The offset of a base is used by multiple architectures.
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
Christoph Lameter wrote:
> The percpu areas need to be allocated in a NUMA aware fashion. Otherwise you use distant memory for the most performance sensitive areas. The NUMA subsystem must be so far up that these allocations can be performed in the right way. And this means at least you need to know on which node each processor is located. That is what the PDA is currently used for and i386 has no other way of doing that. I think we could use an array [NR_CPUS] for this one but we want to avoid these arrays because NR_CPUS may get very big.

Oh, you mean there needs to be some percpu data mechanism operating in order to do numa-aware allocations, which would be necessary to allocate the percpu memory itself? I can see how that would be awkward.

    J
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:

> > percpu references are quite frequent already (vm statistics) and will be more frequent after we have converted the per cpu arrays to per cpu allocations.
>
> Well, I think the point is moot, because x86 will always use 32-bit offsets. Each reference will only be 1 byte bigger than a normal variable reference.

Just because i386 is not able to use it does not mean that other arches are not. For example, IA64 can embed offsets in the actual instruction (but of course not 64 bit).

x86_64 can use a 32 bit offset instead of a 64 bit address because it uses the small model. A load of a 64 bit address would require much more expensive instructions.

A load of a 64 bit address is currently avoided through the use of the pda that contains the full 64 bit address in the data_offset field. Operations on per cpu data on x86_64 must therefore first load data_offset via gs and then add the per cpu address to this offset. Then the per cpu operation is performed on that address.

In order to avoid this situation through one instruction we need a small 32 bit offset relative to gs. Otherwise we cannot get away from the PDA and the use of data_offset.
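The arithmetic behind this argument can be checked with plain 64 bit integers. The sketch below is a worked example under stated assumptions, not kernel code: PER_CPU_START and CPU1_AREA_BASE are made-up addresses that merely resemble an x86_64 layout, and the function names are hypothetical. It shows that the data_offset delta needs a full 64 bit value, while the offset from the start of the per cpu area fits a sign-extended 32 bit displacement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical addresses, chosen only to look like an x86_64 layout. */
#define PER_CPU_START  UINT64_C(0xffffffff80600000)  /* link-time start of the per cpu section */
#define CPU1_AREA_BASE UINT64_C(0xffff810012340000)  /* where CPU 1's copy was allocated */

/* True if v can be encoded as a sign-extended 32 bit displacement. */
bool fits_disp32(uint64_t v)
{
	int64_t s = (int64_t)v;

	return s >= INT32_MIN && s <= INT32_MAX;
}

/* Value the PDA's data_offset field would hold for an area at area_base. */
uint64_t data_offset_for(uint64_t area_base)
{
	return area_base - PER_CPU_START;	/* wraps modulo 2^64, like address math */
}

/* Current scheme: load data_offset through gs, then add the link address. */
uint64_t addr_via_data_offset(uint64_t link_addr, uint64_t data_offset)
{
	return link_addr + data_offset;
}

/* Proposed scheme: gs holds the area start, the instruction carries the offset. */
uint64_t addr_via_gs_offset(uint64_t gs_base, uint64_t link_addr)
{
	uint64_t off = link_addr - PER_CPU_START;

	assert(fits_disp32(off));
	return gs_base + off;
}
```

Both functions compute the same final address for a variable 0x40 bytes into the section. Only the second form leaves nothing but a small constant to encode in the instruction, which is what would allow a single gs-relative access without going through data_offset.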
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
Christoph Lameter wrote:
> x86_64 can use a 32 bit offset instead of a 64 bit address because it uses the small model. A load of a 64 bit address would require much more expensive instructions.
>
> A load of a 64 bit address is currently avoided through the use of the pda that contains the full 64 bit address in the data_offset field. Operations on per cpu data on x86_64 must therefore first load data_offset via gs and then add the per cpu address to this offset. Then the per cpu operation is performed on that address.

Hm. Certainly a non-one-instruction access would be considerably less useful than one that is, because of preemption issues. (In general you need to pin yourself to a cpu if you're using percpu data, but sometimes it doesn't matter. In particular, the reason I'm interested in this at all is because Xen puts its interrupt mask flag in per-cpu data, and a single instruction means that masking interrupts [=disable preemption] can be done in one instruction with no scope for preemption in the middle doing something unexpected.)

> In order to avoid this situation through one instruction we need a small 32 bit offset relative to gs. Otherwise we cannot get away from the PDA and the use of data_offset.

Hm, yes, I see. Dratted large address space. What's wrong with 4G anyway? ;)

Anyway, I can see the problem with my thinking about this so far.

    J
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
Second portion. Add a new seg_offset macro to calculate the offset. This can be avoided if the linker relocates the per cpu area to zero.

Includes a patch to read trickle count via both methods to verify that it actually works. Both patches on top of the per cpu cleanup patches that I sent today too.


x86_64: Make the x86_32 percpu operations usable on x86_64

Calculate the offset relative to gs in order to be able to address per cpu data using the x86_64 per cpu macros.

The subtraction of __per_cpu_start will make the offset based from the beginning of the per cpu area. That is where gs points to.

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

---
 drivers/char/random.c    |    2 +-
 include/asm-x86/percpu.h |   29 ++---
 init/main.c              |    5 +
 3 files changed, 24 insertions(+), 12 deletions(-)

Index: linux-2.6.24-rc3-mm2/include/asm-x86/percpu.h
===================================================================
--- linux-2.6.24-rc3-mm2.orig/include/asm-x86/percpu.h	2007-11-28 17:50:01.861182410 -0800
+++ linux-2.6.24-rc3-mm2/include/asm-x86/percpu.h	2007-11-28 21:22:50.845872906 -0800
@@ -16,7 +16,13 @@
 #define __my_cpu_offset read_pda(data_offset)
 #define per_cpu_offset(x) (__per_cpu_offset(x))
+#define __percpu_seg "%%gs:"
+/* Calculate the offset to use with the segment register */
+#define seg_offset(name) (*SHIFT_PTR(&per_cpu_var(name), - (unsigned long)__per_cpu_start))
+#else
+#define __percpu_seg ""
+#define seg_offset(name) per_cpu_var(name)
 #endif
 #include <asm-generic/percpu.h>
@@ -64,16 +70,11 @@ DECLARE_PER_CPU(struct x8664_pda, pda);
  *    PER_CPU(cpu_gdt_descr, %ebx)
  */
 #ifdef CONFIG_SMP
-
 #define __my_cpu_offset x86_read_percpu(this_cpu_off)
-
 /* fs segment starts at (positive) offset == __per_cpu_offset[cpu] */
 #define __percpu_seg "%%fs:"
-
 #else  /* !SMP */
-
 #define __percpu_seg ""
-
 #endif	/* SMP */
 #include <asm-generic/percpu.h>
@@ -81,6 +82,13 @@ DECLARE_PER_CPU(struct x8664_pda, pda);
 /* We can use this directly for local CPU (faster). */
 DECLARE_PER_CPU(unsigned long, this_cpu_off);
 
+#define seg_offset(name) per_cpu_var(name)
+
+#endif /* __ASSEMBLY__ */
+#endif /* !CONFIG_X86_64 */
+
+#ifndef __ASSEMBLY__
+
 /* For arch-specific code, we can use direct single-insn ops (they
  * don't give an lvalue though). */
 extern void __bad_percpu_size(void);
@@ -132,11 +140,10 @@ extern void __bad_percpu_size(void);
 	}					\
 	ret__; })
 
-#define x86_read_percpu(var) percpu_from_op("mov", per_cpu__##var)
-#define x86_write_percpu(var,val) percpu_to_op("mov", per_cpu__##var, val)
-#define x86_add_percpu(var,val) percpu_to_op("add", per_cpu__##var, val)
-#define x86_sub_percpu(var,val) percpu_to_op("sub", per_cpu__##var, val)
-#define x86_or_percpu(var,val) percpu_to_op("or", per_cpu__##var, val)
+#define x86_read_percpu(var) percpu_from_op("mov", seg_offset(var))
+#define x86_write_percpu(var,val) percpu_to_op("mov", seg_offset(var), val)
+#define x86_add_percpu(var,val) percpu_to_op("add", seg_offset(var), val)
+#define x86_sub_percpu(var,val) percpu_to_op("sub", seg_offset(var), val)
+#define x86_or_percpu(var,val) percpu_to_op("or", seg_offset(var), val)
 #endif /* !__ASSEMBLY__ */
-#endif /* !CONFIG_X86_64 */
 #endif /* _ASM_X86_PERCPU_H_ */
Index: linux-2.6.24-rc3-mm2/drivers/char/random.c
===================================================================
--- linux-2.6.24-rc3-mm2.orig/drivers/char/random.c	2007-11-28 21:20:58.225804398 -0800
+++ linux-2.6.24-rc3-mm2/drivers/char/random.c	2007-11-28 21:28:38.967363573 -0800
@@ -272,7 +272,7 @@ static int random_write_wakeup_thresh =
 
 static int trickle_thresh __read_mostly = INPUT_POOL_WORDS * 28;
 
-static DEFINE_PER_CPU(int, trickle_count) = 0;
+DEFINE_PER_CPU(int, trickle_count) = 55;
 
 /*
  * A pool of size .poolwords is stirred with a primitive polynomial
Index: linux-2.6.24-rc3-mm2/init/main.c
===================================================================
--- linux-2.6.24-rc3-mm2.orig/init/main.c	2007-11-28 21:10:54.245804225 -0800
+++ linux-2.6.24-rc3-mm2/init/main.c	2007-11-28 21:22:17.769053628 -0800
@@ -504,6 +504,8 @@ void __init __attribute__((weak)) smp_se
 {
 }
 
+DECLARE_PER_CPU(int, trickle_count);
+
 asmlinkage void __init start_kernel(void)
 {
 	char * command_line;
@@ -645,6 +647,9 @@ asmlinkage void __init start_kernel(void
 
 	acpi_early_init(); /* before LAPIC and SMP init */
 
+	printk("Reading trickle cound =%lu. Is %lu\n",
+		x86_read_percpu(trickle_count),
+		__raw_get_cpu_var(trickle_count));
 	/* Do the rest non-__init'ed, we're now alive */
 	rest_init();
 }
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
Here is the first of two patches for x86_64 that move the pda into the per cpu area and then make the x86 percpu macros work for x86_64.

This needs to be generalized for other arches. The __per_cpu_start offsets can be taken care of by the linker. We can also tell the linker to completely relocate the percpu area to 0.


X86_64: Declare pda as per cpu data thereby moving it into the cpu area

Declare the pda as a per cpu variable. This will have the effect of moving the pda data into the cpu area managed by cpu alloc.

The boot_pdas are only needed in head64.c so move the declaration over there and make it static.

Remove the code that allocates special pda data structures.

The pda is moved to the beginning of the per cpu area. gs is pointing to the pda. And therefore gs: is now pointing to the per cpu area of the current processor. A per cpu variable can then be reached at

	%gs:[per_cpu_ - __per_cpu_start]

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

---
 arch/x86/kernel/head64.c          |    6 ++
 arch/x86/kernel/setup64.c         |   13 ++---
 arch/x86/kernel/smpboot_64.c      |   16
 include/asm-generic/vmlinux.lds.h |    1 +
 include/asm-x86/pda.h             |    1 -
 include/linux/percpu.h            |    4
 6 files changed, 21 insertions(+), 20 deletions(-)

Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/setup64.c
===================================================================
--- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/setup64.c	2007-11-28 20:59:13.124188194 -0800
+++ linux-2.6.24-rc3-mm2/arch/x86/kernel/setup64.c	2007-11-28 21:08:50.473347382 -0800
@@ -30,7 +30,9 @@ cpumask_t cpu_initialized __cpuinitdata
 struct x8664_pda *_cpu_pda[NR_CPUS] __read_mostly;
 EXPORT_SYMBOL(_cpu_pda);
-struct x8664_pda boot_cpu_pda[NR_CPUS] __cacheline_aligned;
+
+DEFINE_PER_CPU_FIRST(struct x8664_pda, pda);
+EXPORT_PER_CPU_SYMBOL(pda);
 
 struct desc_ptr idt_descr = { 256 * 16 - 1, (unsigned long) idt_table };
 
@@ -109,10 +111,15 @@ void __init setup_per_cpu_areas(void)
 		}
 		if (!ptr)
 			panic("Cannot allocate cpu data for CPU %d\n", i);
-		cpu_pda(i)->data_offset = ptr - __per_cpu_start;
 		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
+		/* Relocate the pda */
+		memcpy(ptr, cpu_pda(i), sizeof(struct x8664_pda));
+		cpu_pda(i) = (struct x8664_pda *)ptr;
+		cpu_pda(i)->data_offset = ptr - __per_cpu_start;
 	}
-}
+	/* Fix up pda for this processor */
+	pda_init(0);
+}
 
 void pda_init(int cpu)
 {
Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/smpboot_64.c
===================================================================
--- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/smpboot_64.c	2007-11-28 20:59:13.136188167 -0800
+++ linux-2.6.24-rc3-mm2/arch/x86/kernel/smpboot_64.c	2007-11-28 20:59:35.399937395 -0800
@@ -556,22 +556,6 @@ static int __cpuinit do_boot_cpu(int cpu
 		return -1;
 	}
 
-	/* Allocate node local memory for AP pdas */
-	if (cpu_pda(cpu) == &boot_cpu_pda[cpu]) {
-		struct x8664_pda *newpda, *pda;
-		int node = cpu_to_node(cpu);
-		pda = cpu_pda(cpu);
-		newpda = kmalloc_node(sizeof (struct x8664_pda), GFP_ATOMIC,
-				      node);
-		if (newpda) {
-			memcpy(newpda, pda, sizeof (struct x8664_pda));
-			cpu_pda(cpu) = newpda;
-		} else
-			printk(KERN_ERR
-		"Could not allocate node local PDA for CPU %d on node %d\n",
-				cpu, node);
-	}
-
 	alternatives_smp_switch(1);
 
 	c_idle.idle = get_idle_for_cpu(cpu);
Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/head64.c
===================================================================
--- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/head64.c	2007-11-28 20:59:13.152187359 -0800
+++ linux-2.6.24-rc3-mm2/arch/x86/kernel/head64.c	2007-11-28 20:59:35.403937534 -0800
@@ -22,6 +22,12 @@
 #include <asm/sections.h>
 #include <asm/kdebug.h>
 
+/*
+ * Only used before the per cpu areas are setup. The use for the non possible
+ * cpus continues after boot
+ */
+static struct x8664_pda boot_cpu_pda[NR_CPUS] __cacheline_aligned;
+
 static void __init zap_identity_mappings(void)
 {
 	pgd_t *pgd = pgd_offset_k(0UL);
Index: linux-2.6.24-rc3-mm2/include/asm-x86/pda.h
===================================================================
--- linux-2.6.24-rc3-mm2.orig/include/asm-x86/pda.h	2007-11-28 20:59:13.164187921 -0800
+++ linux-2.6.24-rc3-mm2/include/asm-x86/pda.h	2007-11-28 20:59:35.403937534 -0800
@@ -39,7 +39,6 @@ struct x8664_pda {
 } cacheline_aligned_in_smp;
 
 extern struct x8664_pda *_cpu_pda[];
-extern struct x8664_pda boot_cpu_pda[];
 
 extern void pda_init(int);
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
On Wednesday 28 November 2007 05:14:47 Christoph Lameter wrote:
> On Tue, 27 Nov 2007, Rusty Russell wrote:
> > Have you considered moving x86-64's setup_per_cpu_areas into generic
> > code? It's a bit messier because some archs might not have set up NUMA
> > stuff yet, but it's logically generic...
>
> Yes that will happen later. This is just the early cleanup work. I
> plan to generally bring the two x86 arches in line. The pda will be
> folded into the per cpu area and after that it's easy to do.

Unfortunately, we tried to get rid of the x86-64 pda (like i386) but you lose
the ability to use the stack protection config option. That's because it
assumes that gs:0x68 (or something) is the stack canary; we need a YA gcc
change to make this gs:__builtin_stack_canary_off (where gcc can emit
__builtin_stack_canary_off as a weak absolute symbol, so we can override it
for the kernel).

Cheers,
Rusty.
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
Christoph Lameter wrote:
> On Tue, 27 Nov 2007, Randy Dunlap wrote:
>
> > > +config ARCH_SETS_UP_PER_CPU_AREA
> > > +	bool
> > > +	default y
> >
> > def_bool y
> > is the preferred form for those 2-liners above...
> >
> > > +
> > >  config ARCH_NO_VIRT_TO_BUS
> > >  	def_bool y
>
> Ok. Changed. x86 should use
>
> config ARCH_SETS_UP_PER_CPU_AREA
> 	def_bool X86_64
>
> ?

Yes, you can do
	def_bool <config symbol>
as well to make the new symbol be variable instead of constant.

--
~Randy
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
On Tue, 27 Nov 2007, Randy Dunlap wrote:

> > +config ARCH_SETS_UP_PER_CPU_AREA
> > +	bool
> > +	default y
>
> def_bool y
> is the preferred form for those 2-liners above...
>
> > +
> >  config ARCH_NO_VIRT_TO_BUS
> >  	def_bool y

Ok. Changed. x86 should use

config ARCH_SETS_UP_PER_CPU_AREA
	def_bool X86_64

?
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
On Mon, 26 Nov 2007 16:14:12 -0800 Christoph Lameter wrote:

> The use of __GENERIC_PER_CPU is a bit problematic since arches
> may want to run their own percpu setup while using the generic
> percpu definitions. Replace it with a Kconfig variable.
>
> Cc: Rusty Russell <[EMAIL PROTECTED]>
> Cc: Andi Kleen <[EMAIL PROTECTED]>
> Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
> ---
>
> Index: linux-2.6/arch/ia64/Kconfig
> ===================================================================
> --- linux-2.6.orig/arch/ia64/Kconfig	2007-11-26 15:38:56.415112360 -0800
> +++ linux-2.6/arch/ia64/Kconfig	2007-11-26 15:40:10.425862722 -0800
> @@ -75,6 +75,10 @@ config GENERIC_TIME_VSYSCALL
>  	bool
>  	default y
>
> +config ARCH_SETS_UP_PER_CPU_AREA
> +	bool
> +	default y
> +
>  config DMI
>  	bool
>  	default y
> Index: linux-2.6/arch/sparc64/Kconfig
> ===================================================================
> --- linux-2.6.orig/arch/sparc64/Kconfig	2007-11-26 15:38:56.447111936 -0800
> +++ linux-2.6/arch/sparc64/Kconfig	2007-11-26 15:40:10.425862722 -0800
> @@ -66,6 +66,10 @@ config AUDIT_ARCH
>  	bool
>  	default y
>
> +config ARCH_SETS_UP_PER_CPU_AREA
> +	bool
> +	default y

def_bool y
is the preferred form for those 2-liners above...

> +
>  config ARCH_NO_VIRT_TO_BUS
>  	def_bool y

---
~Randy
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
On Tue, 27 Nov 2007, Rusty Russell wrote:

> Have you considered moving x86-64's setup_per_cpu_areas into generic code?
> It's a bit messier because some archs might not have set up NUMA stuff yet,
> but it's logically generic...

Yes that will happen later. This is just the early cleanup work. I plan to
generally bring the two x86 arches in line. The pda will be folded into the
per cpu area and after that it's easy to do.
Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
On Tuesday 27 November 2007 11:14:12 Christoph Lameter wrote:
> The use of __GENERIC_PER_CPU is a bit problematic since arches
> may want to run their own percpu setup while using the generic
> percpu definitions. Replace it with a Kconfig variable.

Thanks for this Christoph! These patches are great: the early experiments
are obviously over, and so this consolidation is overdue.

Have you considered moving x86-64's setup_per_cpu_areas into generic code?
It's a bit messier because some archs might not have set up NUMA stuff yet,
but it's logically generic...

Thanks!
Rusty.
[patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup
The use of __GENERIC_PER_CPU is a bit problematic since arches
may want to run their own percpu setup while using the generic
percpu definitions. Replace it with a Kconfig variable.

Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 arch/ia64/Kconfig            |    4 ++++
 arch/powerpc/Kconfig         |    4 ++++
 arch/sparc64/Kconfig         |    4 ++++
 arch/x86/Kconfig             |    6 +++---
 include/asm-generic/percpu.h |    1 -
 include/asm-s390/percpu.h    |    2 --
 include/asm-x86/percpu_32.h  |    2 --
 init/main.c                  |    4 ++--
 8 files changed, 17 insertions(+), 10 deletions(-)

Index: linux-2.6/init/main.c
===================================================================
--- linux-2.6.orig/init/main.c	2007-11-26 15:38:56.407111768 -0800
+++ linux-2.6/init/main.c	2007-11-26 15:40:10.425862722 -0800
@@ -363,7 +363,7 @@ static inline void smp_prepare_cpus(unsi
 
 #else
 
-#ifdef __GENERIC_PER_CPU
+#ifndef CONFIG_ARCH_SETS_UP_PER_CPU_AREA
 unsigned long __per_cpu_offset[NR_CPUS] __read_mostly;
 
 EXPORT_SYMBOL(__per_cpu_offset);
@@ -384,7 +384,7 @@ static void __init setup_per_cpu_areas(v
 		ptr += size;
 	}
 }
-#endif /* !__GENERIC_PER_CPU */
+#endif /* CONFIG_ARCH_SETS_UP_PER_CPU_AREA */
 
 /* Called by boot processor to activate the rest. */
 static void __init smp_init(void)
Index: linux-2.6/arch/ia64/Kconfig
===================================================================
--- linux-2.6.orig/arch/ia64/Kconfig	2007-11-26 15:38:56.415112360 -0800
+++ linux-2.6/arch/ia64/Kconfig	2007-11-26 15:40:10.425862722 -0800
@@ -75,6 +75,10 @@ config GENERIC_TIME_VSYSCALL
 	bool
 	default y
 
+config ARCH_SETS_UP_PER_CPU_AREA
+	bool
+	default y
+
 config DMI
 	bool
 	default y
Index: linux-2.6/arch/powerpc/Kconfig
===================================================================
--- linux-2.6.orig/arch/powerpc/Kconfig	2007-11-26 15:38:56.427111914 -0800
+++ linux-2.6/arch/powerpc/Kconfig	2007-11-26 15:40:10.425862722 -0800
@@ -42,6 +42,10 @@ config GENERIC_HARDIRQS
 	bool
 	default y
 
+config ARCH_SETS_UP_PER_CPU_AREA
+	bool
+	default PPC64
+
 config IRQ_PER_CPU
 	bool
 	default y
Index: linux-2.6/arch/sparc64/Kconfig
===================================================================
--- linux-2.6.orig/arch/sparc64/Kconfig	2007-11-26 15:38:56.447111936 -0800
+++ linux-2.6/arch/sparc64/Kconfig	2007-11-26 15:40:10.425862722 -0800
@@ -66,6 +66,10 @@ config AUDIT_ARCH
 	bool
 	default y
 
+config ARCH_SETS_UP_PER_CPU_AREA
+	bool
+	default y
+
 config ARCH_NO_VIRT_TO_BUS
 	def_bool y
 
Index: linux-2.6/arch/x86/Kconfig
===================================================================
--- linux-2.6.orig/arch/x86/Kconfig	2007-11-26 15:38:58.234361975 -0800
+++ linux-2.6/arch/x86/Kconfig	2007-11-26 15:40:52.465611449 -0800
@@ -112,9 +112,9 @@ config GENERIC_TIME_VSYSCALL
 	bool
 	default X86_64
 
-
-
-
+config ARCH_SETS_UP_PER_CPU_AREA
+	bool
+	default X86_64
 
 config ZONE_DMA32
 	bool
Index: linux-2.6/include/asm-generic/percpu.h
===================================================================
--- linux-2.6.orig/include/asm-generic/percpu.h	2007-11-26 15:40:04.469611815 -0800
+++ linux-2.6/include/asm-generic/percpu.h	2007-11-26 15:40:10.437861790 -0800
@@ -3,7 +3,6 @@
 #include <linux/compiler.h>
 #include <linux/threads.h>
 
-#define __GENERIC_PER_CPU
 #ifdef CONFIG_SMP
 
 extern unsigned long __per_cpu_offset[NR_CPUS];
Index: linux-2.6/include/asm-x86/percpu_32.h
===================================================================
--- linux-2.6.orig/include/asm-x86/percpu_32.h	2007-11-26 15:40:04.469611815 -0800
+++ linux-2.6/include/asm-x86/percpu_32.h	2007-11-26 15:40:10.441861845 -0800
@@ -41,8 +41,6 @@
  *    PER_CPU(cpu_gdt_descr, %ebx)
  */
 #ifdef CONFIG_SMP
-/* Same as generic implementation except for optimized local access. */
-#define __GENERIC_PER_CPU
 
 /* This is used for other cpus to find our section. */
 extern unsigned long __per_cpu_offset[];
Index: linux-2.6/include/asm-s390/percpu.h
===================================================================
--- linux-2.6.orig/include/asm-s390/percpu.h	2007-11-26 15:40:04.469611815 -0800
+++ linux-2.6/include/asm-s390/percpu.h	2007-11-26 15:40:10.441861845 -0800
@@ -4,8 +4,6 @@
 #include <linux/compiler.h>
 #include <asm/lowcore.h>
 
-#define __GENERIC_PER_CPU
-
 /*
  * s390 uses its own implementation for per cpu data, the offset of
  * the cpu local data area is cached in the cpu's lowcore memory.
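As an illustration of what the generic path guarded by #ifndef CONFIG_ARCH_SETS_UP_PER_CPU_AREA does, the following is a rough user-space sketch, not the init/main.c code. The names `percpu_template`, `percpu_section`, `per_cpu_offset` and `per_cpu_area()` are made up for the example, `malloc()` stands in for the bootmem allocator, and the cache-line alignment of each copy is left out. The idea is one copy of the per-cpu section per CPU, plus an offset table so any CPU's copy is reachable with a single addition.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NR_CPUS 4

/* Hypothetical stand-in for the linker section between __per_cpu_start
 * and __per_cpu_end: the template each CPU's copy is initialised from. */
struct percpu_template {
	int counter;
	long stamp;
};
struct percpu_template percpu_section = { .counter = 7, .stamp = 100 };

/* Mirrors __per_cpu_offset[]: distance from the template to CPU i's copy. */
uintptr_t per_cpu_offset[NR_CPUS];

/* Generic setup: one allocation, one copy of the template per CPU. */
int setup_per_cpu_areas(void)
{
	size_t size = sizeof(percpu_section);
	char *ptr = malloc(size * NR_CPUS);

	if (!ptr)
		return -1;
	for (int i = 0; i < NR_CPUS; i++, ptr += size) {
		/* Integer arithmetic avoids subtracting unrelated pointers. */
		per_cpu_offset[i] = (uintptr_t)ptr - (uintptr_t)&percpu_section;
		memcpy(ptr, &percpu_section, size);
	}
	return 0;
}

/* Reach CPU cpu's copy with a single addition to the template address. */
struct percpu_template *per_cpu_area(int cpu)
{
	return (struct percpu_template *)
		((uintptr_t)&percpu_section + per_cpu_offset[cpu]);
}
```

An architecture that sets ARCH_SETS_UP_PER_CPU_AREA would supply its own equivalent of `setup_per_cpu_areas()` (for example to allocate node-local memory) while still using the same offset-based accessors.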