Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-29 Thread Rusty Russell
On Thursday 29 November 2007 10:36:06 Christoph Lameter wrote:
> The code becomes much simpler if gs would point to the beginning of the
> per cpu area and if the __per_cpu_offset[i] would do the same. No weird
> __per_cpu_start offsetting anymore.

It is a little weird, but it gave flexibility for most archs.

ISTR I had issues relocating the percpu area to 0, but I look forward to your 
code!

> The generic write/readpercpu functionality introduced by the cpu_alloc
> patchset works best with offsets relative to an arch dependent
> register. All per cpu data (pda, percpu and allocpercpu) is handled as an
> offset relative to the start of the per cpu data.

Hmm, did someone cc me on the patchset and I missed it?

> If the current offset by __per_cpu_start is kept then a per cpu allocator
> may have to dish out addresses that go beyond __per_cpu_end.

Of course; you just need congruence in your allocation across CPUs.  It's 
possible, but no worse than the requirements on other schemes where you can 
reach a variable with a single addition for the CPU.
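
To make the congruence requirement concrete, here is a minimal userspace sketch, not kernel code. It assumes a hypothetical bump allocator, percpu_alloc(), that reserves the same offset in every CPU's area, so that percpu_ptr() can reach any CPU's instance with one addition. NR_CPUS, AREA_SIZE and both function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS   4    /* hypothetical CPU count for this sketch */
#define AREA_SIZE 256  /* hypothetical size of one per-CPU area */

/* One area per CPU. A kernel would allocate these separately; a 2-D array
 * keeps the sketch short and leaves every row suitably aligned. */
static _Alignas(max_align_t) unsigned char cpu_area[NR_CPUS][AREA_SIZE];

/* A single cursor shared by all CPUs: this is what makes the allocation
 * congruent, because an offset handed out once is valid in every area. */
static size_t next_free;

/* Reserve size bytes at the same offset in every CPU's area.
 * align must be a power of two. Returns (size_t)-1 when the areas are full. */
size_t percpu_alloc(size_t size, size_t align)
{
    size_t off = (next_free + align - 1) & ~(align - 1);

    if (off > AREA_SIZE || size > AREA_SIZE - off)
        return (size_t)-1;
    next_free = off + size;
    return off;
}

/* Reach a CPU's instance with a single addition: area base + offset. */
void *percpu_ptr(unsigned int cpu, size_t off)
{
    assert(cpu < NR_CPUS && off < AREA_SIZE);
    return cpu_area[cpu] + off;
}
```

Because the cursor is shared, an allocator of this shape never has to track per-CPU free lists; the cost is that space freed on one CPU cannot be reused unless it is free on all of them.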

> I think dealing with a per cpu variable as if it would be an offset
> relative to a base is natural for the typical addressing of cpus based on
> an offset relative to some register.

We've had practical problems getting the compiler to eke out the potential 
benefit.  That's why we settled for an offset between where the compiler 
expected and where the variable actually was.
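
The "offset between where the compiler expected and where the variable actually was" can be sketched in userspace as follows. This is an illustration only: the prototype structure stands in for the linked per-CPU section, the delta table is modelled on __per_cpu_offset, and the struct and function names are made up. The integer pointer arithmetic assumes a flat address space.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_CPUS 4  /* hypothetical CPU count */

/* Stand-in for the linked per-CPU section: the addresses the compiler
 * believes the variables live at. */
struct percpu_section {
    int  trickle_count;
    long vm_events;
};
static struct percpu_section prototype = { 55, 0 };

/* The real per-CPU copies, and each copy's distance from the prototype
 * (modelled on __per_cpu_offset[]). */
static struct percpu_section copy[NR_CPUS];
static uintptr_t per_cpu_delta[NR_CPUS];

void setup_per_cpu_areas(void)
{
    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
        memcpy(&copy[cpu], &prototype, sizeof(prototype));
        /* Unsigned wrap-around makes a "negative" delta work too. */
        per_cpu_delta[cpu] = (uintptr_t)&copy[cpu] - (uintptr_t)&prototype;
    }
}

/* Translate the compiler-visible address into the given CPU's copy. */
void *per_cpu_addr(void *proto_addr, unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    return (void *)((uintptr_t)proto_addr + per_cpu_delta[cpu]);
}
```

Nothing in this scheme requires the per-CPU section to start at any particular address, which is the flexibility referred to above.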

Cheers,
Rusty.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Christoph Lameter
Second portion. Add a new seg_offset macro to calculate the offset. This 
can be avoided if the linker relocates the per cpu area to zero. Includes 
a patch to read trickle count via both methods to verify that it actually 
works. Both patches on top of the per cpu cleanup patches that I sent 
today too.


x86_64: Make the x86_32 percpu operations usable on x86_64

Calculate the offset relative to gs in order to be able to address
per cpu data using the x86_64 per cpu macros.

The subtraction of __per_cpu_start will make the offset based
from the beginning of the per cpu area. That is where gs points to.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 drivers/char/random.c|2 +-
 include/asm-x86/percpu.h |   29 ++---
 init/main.c  |5 +
 3 files changed, 24 insertions(+), 12 deletions(-)

Index: linux-2.6.24-rc3-mm2/include/asm-x86/percpu.h
===
--- linux-2.6.24-rc3-mm2.orig/include/asm-x86/percpu.h  2007-11-28 17:50:01.861182410 -0800
+++ linux-2.6.24-rc3-mm2/include/asm-x86/percpu.h   2007-11-28 21:22:50.845872906 -0800
@@ -16,7 +16,13 @@
 #define __my_cpu_offset read_pda(data_offset)
 
 #define per_cpu_offset(x) (__per_cpu_offset(x))
+#define __percpu_seg "%%gs:"
+/* Calculate the offset to use with the segment register */
+#define seg_offset(name)   (*SHIFT_PTR(_cpu_var(name), - (unsigned long)__per_cpu_start))
 
+#else
+#define __percpu_seg ""
+#define seg_offset(name)   per_cpu_var(name)
 #endif
 #include 
 
@@ -64,16 +70,11 @@ DECLARE_PER_CPU(struct x8664_pda, pda);
  *PER_CPU(cpu_gdt_descr, %ebx)
  */
 #ifdef CONFIG_SMP
-
 #define __my_cpu_offset x86_read_percpu(this_cpu_off)
-
 /* fs segment starts at (positive) offset == __per_cpu_offset[cpu] */
 #define __percpu_seg "%%fs:"
-
 #else  /* !SMP */
-
 #define __percpu_seg ""
-
 #endif /* SMP */
 
 #include 
@@ -81,6 +82,13 @@ DECLARE_PER_CPU(struct x8664_pda, pda);
 /* We can use this directly for local CPU (faster). */
 DECLARE_PER_CPU(unsigned long, this_cpu_off);
 
+#define seg_offset(name)   per_cpu_var(name)
+
+#endif /* __ASSEMBLY__ */
+#endif /* !CONFIG_X86_64 */
+
+#ifndef __ASSEMBLY__
+
 /* For arch-specific code, we can use direct single-insn ops (they
  * don't give an lvalue though). */
 extern void __bad_percpu_size(void);
@@ -132,11 +140,10 @@ extern void __bad_percpu_size(void);
}   \
ret__; })
 
-#define x86_read_percpu(var) percpu_from_op("mov", per_cpu__##var)
-#define x86_write_percpu(var,val) percpu_to_op("mov", per_cpu__##var, val)
-#define x86_add_percpu(var,val) percpu_to_op("add", per_cpu__##var, val)
-#define x86_sub_percpu(var,val) percpu_to_op("sub", per_cpu__##var, val)
-#define x86_or_percpu(var,val) percpu_to_op("or", per_cpu__##var, val)
+#define x86_read_percpu(var) percpu_from_op("mov", seg_offset(var))
+#define x86_write_percpu(var,val) percpu_to_op("mov", seg_offset(var), val)
+#define x86_add_percpu(var,val) percpu_to_op("add", seg_offset(var), val)
+#define x86_sub_percpu(var,val) percpu_to_op("sub", seg_offset(var), val)
+#define x86_or_percpu(var,val) percpu_to_op("or", seg_offset(var), val)
 #endif /* !__ASSEMBLY__ */
-#endif /* !CONFIG_X86_64 */
 #endif /* _ASM_X86_PERCPU_H_ */
Index: linux-2.6.24-rc3-mm2/drivers/char/random.c
===
--- linux-2.6.24-rc3-mm2.orig/drivers/char/random.c 2007-11-28 21:20:58.225804398 -0800
+++ linux-2.6.24-rc3-mm2/drivers/char/random.c  2007-11-28 21:28:38.967363573 -0800
@@ -272,7 +272,7 @@ static int random_write_wakeup_thresh = 
 
 static int trickle_thresh __read_mostly = INPUT_POOL_WORDS * 28;
 
-static DEFINE_PER_CPU(int, trickle_count) = 0;
+DEFINE_PER_CPU(int, trickle_count) = 55;
 
 /*
  * A pool of size .poolwords is stirred with a primitive polynomial
Index: linux-2.6.24-rc3-mm2/init/main.c
===
--- linux-2.6.24-rc3-mm2.orig/init/main.c   2007-11-28 21:10:54.245804225 -0800
+++ linux-2.6.24-rc3-mm2/init/main.c    2007-11-28 21:22:17.769053628 -0800
@@ -504,6 +504,8 @@ void __init __attribute__((weak)) smp_se
 {
 }
 
+DECLARE_PER_CPU(int, trickle_count);
+
 asmlinkage void __init start_kernel(void)
 {
char * command_line;
@@ -645,6 +647,9 @@ asmlinkage void __init start_kernel(void
 
acpi_early_init(); /* before LAPIC and SMP init */
 
+   printk("Reading trickle count =%d. Is %d\n",
+   x86_read_percpu(trickle_count),
+   __raw_get_cpu_var(trickle_count));
/* Do the rest non-__init'ed, we're now alive */
rest_init();
 }



Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Christoph Lameter
Here is the first of two patches for x86_64 that move the pda into the per 
cpu area and then make the x86 percpu macros work for x86_64. This needs 
to be generalized for other arches. The __per_cpu_start offsets can be 
taken care of by the linker. We can also tell the linker to completely 
relocate the percpu area to 0.



X86_64: Declare pda as per cpu data thereby moving it into the cpu area

Declare the pda as a per cpu variable. This will have the effect of moving
the pda data into the cpu area managed by cpu alloc.

The boot_pdas are only needed in head64.c so move the declaration
over there and make it static.

Remove the code that allocates special pda data structures.

The pda is moved to the beginning of the per cpu area. gs is pointing to the
pda. And therefore gs: is now pointing to the per cpu area of the current
processor. A per cpu variable can then be reached at

%gs:[_cpu_ - __per_cpu_start]

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 arch/x86/kernel/head64.c  |6 ++
 arch/x86/kernel/setup64.c |   13 ++---
 arch/x86/kernel/smpboot_64.c  |   16 
 include/asm-generic/vmlinux.lds.h |1 +
 include/asm-x86/pda.h |1 -
 include/linux/percpu.h|4 
 6 files changed, 21 insertions(+), 20 deletions(-)

Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/setup64.c
===
--- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/setup64.c 2007-11-28 20:59:13.124188194 -0800
+++ linux-2.6.24-rc3-mm2/arch/x86/kernel/setup64.c  2007-11-28 21:08:50.473347382 -0800
@@ -30,7 +30,9 @@ cpumask_t cpu_initialized __cpuinitdata 
 
 struct x8664_pda *_cpu_pda[NR_CPUS] __read_mostly;
 EXPORT_SYMBOL(_cpu_pda);
-struct x8664_pda boot_cpu_pda[NR_CPUS] __cacheline_aligned;
+
+DEFINE_PER_CPU_FIRST(struct x8664_pda, pda);
+EXPORT_PER_CPU_SYMBOL(pda);
 
 struct desc_ptr idt_descr = { 256 * 16 - 1, (unsigned long) idt_table };
 
@@ -109,10 +111,15 @@ void __init setup_per_cpu_areas(void)
}
if (!ptr)
panic("Cannot allocate cpu data for CPU %d\n", i);
-   cpu_pda(i)->data_offset = ptr - __per_cpu_start;
memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
+   /* Relocate the pda */
+   memcpy(ptr, cpu_pda(i), sizeof(struct x8664_pda));
+   cpu_pda(i) = (struct x8664_pda *)ptr;
+   cpu_pda(i)->data_offset = ptr - __per_cpu_start;
}
-} 
+   /* Fix up pda for this processor  */
+   pda_init(0);
+}
 
 void pda_init(int cpu)
 { 
Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/smpboot_64.c
===
--- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/smpboot_64.c  2007-11-28 20:59:13.136188167 -0800
+++ linux-2.6.24-rc3-mm2/arch/x86/kernel/smpboot_64.c   2007-11-28 20:59:35.399937395 -0800
@@ -556,22 +556,6 @@ static int __cpuinit do_boot_cpu(int cpu
return -1;
}
 
-   /* Allocate node local memory for AP pdas */
-   if (cpu_pda(cpu) == _cpu_pda[cpu]) {
-   struct x8664_pda *newpda, *pda;
-   int node = cpu_to_node(cpu);
-   pda = cpu_pda(cpu);
-   newpda = kmalloc_node(sizeof (struct x8664_pda), GFP_ATOMIC,
- node);
-   if (newpda) {
-   memcpy(newpda, pda, sizeof (struct x8664_pda));
-   cpu_pda(cpu) = newpda;
-   } else
-   printk(KERN_ERR
-   "Could not allocate node local PDA for CPU %d on node %d\n",
-   cpu, node);
-   }
-
alternatives_smp_switch(1);
 
c_idle.idle = get_idle_for_cpu(cpu);
Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/head64.c
===
--- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/head64.c  2007-11-28 20:59:13.152187359 -0800
+++ linux-2.6.24-rc3-mm2/arch/x86/kernel/head64.c   2007-11-28 20:59:35.403937534 -0800
@@ -22,6 +22,12 @@
 #include 
 #include 
 
+/*
+ * Only used before the per cpu areas are setup. The use for the non possible
+ * cpus continues after boot
+ */
+static struct x8664_pda boot_cpu_pda[NR_CPUS] __cacheline_aligned;
+
 static void __init zap_identity_mappings(void)
 {
pgd_t *pgd = pgd_offset_k(0UL);
Index: linux-2.6.24-rc3-mm2/include/asm-x86/pda.h
===
--- linux-2.6.24-rc3-mm2.orig/include/asm-x86/pda.h 2007-11-28 20:59:13.164187921 -0800
+++ linux-2.6.24-rc3-mm2/include/asm-x86/pda.h  2007-11-28 20:59:35.403937534 -0800
@@ -39,7 +39,6 @@ struct x8664_pda {
 } cacheline_aligned_in_smp;
 
 extern struct x8664_pda *_cpu_pda[];
-extern struct x8664_pda boot_cpu_pda[];
 extern void pda_init(int);
 
 #define cpu_pda(i) 

Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Jeremy Fitzhardinge
Christoph Lameter wrote:
> x86_64 can use a 32 bit offset instead of a 64 bit address because it uses
> the small model. A load of a 64 bit address would require much more 
> expensive instructions. A load of a 64 bit address is currently avoided 
> through the use of the pda that contains the full 64 bit address in the
> data_offset field. Operations on per cpu data on x86_64 must therefore 
> first load data_offset via gs and then add the per cpu address to this
> offset. Then the per cpu operation is performed on that address.
>   

Hm.  Certainly a non-one-instruction access would be considerably less
useful than one that is, because of preemption issues.

(In general you need to pin yourself to a cpu if you're using percpu
data, but sometimes it doesn't matter.  In particular, the reason I'm
interested in this at all is because Xen puts its interrupt mask flag in
per-cpu data, and a single instruction means that masking interrupts
[=disable preemption] can be done in one instruction with no scope for
preemption in the middle doing something unexpected.)
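
The preemption window described here can be modelled deterministically in userspace. The sketch below is an assumption-laden illustration, not Xen or kernel code: current_cpu, irq_masked and the function names are hypothetical, and the "migration" is just a callback fired between the two steps of the non-atomic variant.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 2  /* hypothetical CPU count */

static unsigned int current_cpu;      /* CPU the task is running on */
static int irq_masked[NR_CPUS];       /* per-CPU interrupt mask flag */

/* Called at the point where the scheduler could preempt the task. */
typedef void (*preempt_point_fn)(void);

/* Two-step access: find our CPU, then touch its flag. A migration
 * between the two steps leaves the flag set on the wrong CPU. */
void mask_irqs_two_step(preempt_point_fn preempt_point)
{
    unsigned int cpu = current_cpu;   /* step 1: which CPU am I on? */

    if (preempt_point != NULL)
        preempt_point();              /* the task may be moved here */
    assert(cpu < NR_CPUS);
    irq_masked[cpu] = 1;              /* step 2: may hit the old CPU */
}

/* Models one segment-relative store: CPU lookup and write cannot be
 * separated, so there is no window for a migration. */
void mask_irqs_one_step(void)
{
    irq_masked[current_cpu] = 1;
}

/* Simulated migration from CPU 0 to CPU 1. */
void migrate_to_cpu1(void)
{
    current_cpu = 1;
}
```

Calling mask_irqs_two_step(migrate_to_cpu1) while on CPU 0 leaves the task on CPU 1 with CPU 0's flag set and CPU 1's flag clear, which is the failure a single-instruction access avoids.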

> In order to avoid this situation through one instruction we need a small 
> 32 bit offset relative to gs. Otherwise we cannot get away from the PDA 
> and the use of data_offset.
>   

Hm, yes, I see.  Dratted large address space.  What's wrong with 4G
anyway? ;)

Anyway, I can see the problem with my thinking about this so far.

J


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Christoph Lameter
On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:

> > percpu references are quite frequent already (vm statistics) and will be 
> > more frequent after we have converted the per cpu arrays to per cpu 
> > allocations.
> >   
> 
> Well, I think the point is moot, because x86 will always use 32-bit
> offsets.  Each reference will only be 1 byte bigger than a normal
> variable reference.

Just because i386 is not able to use it does not mean that other arches 
are not. E.g. IA64 can embed offsets in the actual instruction (but of
course not 64bit).

x86_64 can use a 32 bit offset instead of a 64 bit address because it uses
the small model. A load of a 64 bit address would require much more 
expensive instructions. A load of a 64 bit address is currently avoided 
through the use of the pda that contains the full 64 bit address in the
data_offset field. Operations on per cpu data on x86_64 must therefore 
first load data_offset via gs and then add the per cpu address to this
offset. Then the per cpu operation is performed on that address.

In order to avoid this situation through one instruction we need a small 
32 bit offset relative to gs. Otherwise we cannot get away from the PDA 
and the use of data_offset.
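
As a rough userspace model of the two addressing schemes contrasted above, the sketch below computes the same per-CPU address both ways: first via a 64-bit data_offset added to the variable's link-time address, then via a small 32-bit offset applied directly to the per-CPU base. The structure layout and all function names are assumptions made for illustration; real code would use gs-relative instructions rather than pointer arithmetic.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 2  /* hypothetical CPU count */

/* Stand-in for the linked per-CPU section; its start plays the role of
 * __per_cpu_start and each member that of one per-CPU variable. */
struct percpu_section {
    long pad[4];
    int  trickle_count;
};
static struct percpu_section section_template;

/* Each CPU's private copy; its address is what gs would point to. */
static struct percpu_section area[NR_CPUS];

/* Current scheme: fetch the 64-bit data_offset (kept in the PDA), then
 * add the variable's link-time address. Two dependent steps. */
void *addr_via_data_offset(unsigned int cpu, void *link_addr)
{
    assert(cpu < NR_CPUS);
    uintptr_t data_offset = (uintptr_t)&area[cpu] - (uintptr_t)&section_template;
    return (void *)(data_offset + (uintptr_t)link_addr);
}

/* Proposed scheme: a small offset from the start of the section... */
int32_t seg_offset(void *link_addr)
{
    return (int32_t)((uintptr_t)link_addr - (uintptr_t)&section_template);
}

/* ...applied directly to the per-CPU base, as a gs-relative operand would be. */
void *addr_via_seg_offset(unsigned int cpu, int32_t offset)
{
    assert(cpu < NR_CPUS);
    return (unsigned char *)&area[cpu] + offset;
}
```

Both routes land on the same object; the difference is that the second needs only a constant that fits in an instruction's displacement field.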

 


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Jeremy Fitzhardinge
Christoph Lameter wrote:
> The percpu areas need to be allocated in a NUMA aware fashion. Otherwise 
> you use distant memory for the most performance sensitive areas. The NUMA 
> subsystem must be so far up that these allocations can be performed in the 
> right way. And this means at least you need to know on which node each 
> processor is located. That is what the PDA is currently used for and i386 
> has no other way of doing that. I think we could use an array [NR_CPUS] 
> for this one but we want to avoid these arrays because NR_CPUS may get 
> very big.
>   

Oh, you mean there needs to be some percpu data mechanism operating in
order to do numa-aware allocations, which would be necessary to allocate
the percpu memory itself?

I can see how that would be awkward.

J



Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Christoph Lameter
On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:

> Don't think it matters either way.  Before percpu is allocated, NUMA
> issues don't matter.  Once they are - by whatever mechanism - you can
> set the segment bases up appropriately.  The fact that you chose to put
> percpu data at address X doesn't affect the percpu mechanism one way or
> the other.

The percpu areas need to be allocated in a NUMA aware fashion. Otherwise 
you use distant memory for the most performance sensitive areas. The NUMA 
subsystem must be so far up that these allocations can be performed in the 
right way. And this means at least you need to know on which node each 
processor is located. That is what the PDA is currently used for and i386 
has no other way of doing that. I think we could use an array [NR_CPUS] 
for this one but we want to avoid these arrays because NR_CPUS may get 
very big.
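
A small userspace sketch of the bootstrap ordering described above: a CPU-to-node lookup has to exist before the per-CPU areas can be placed in node-local memory. The topology table, the pool sizes and every function name here are hypothetical; the node pools are plain arrays standing in for real NUMA memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS    4     /* hypothetical CPU count */
#define NR_NODES   2     /* hypothetical NUMA node count */
#define NODE_BYTES 1024  /* hypothetical memory per node */

/* The [NR_CPUS] lookup array mentioned above; the topology is made up. */
static const int cpu_to_node_map[NR_CPUS] = { 0, 0, 1, 1 };

/* Each node's local memory, modelled as a simple bump-allocated pool. */
static _Alignas(max_align_t) unsigned char node_mem[NR_NODES][NODE_BYTES];
static size_t node_used[NR_NODES];

static void *percpu_base[NR_CPUS];

/* Hand out size bytes from the given node's pool, or NULL if it is full. */
void *alloc_on_node(int node, size_t size)
{
    assert(node >= 0 && node < NR_NODES);
    if (size > NODE_BYTES - node_used[node])
        return NULL;
    void *p = node_mem[node] + node_used[node];
    node_used[node] += size;
    return p;
}

/* Place every CPU's area in memory that belongs to the CPU's own node.
 * Returns 0 on success, -1 if a node ran out of memory. */
int setup_per_cpu_areas(size_t area_size)
{
    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
        percpu_base[cpu] = alloc_on_node(cpu_to_node_map[cpu], area_size);
        if (percpu_base[cpu] == NULL)
            return -1;
    }
    return 0;
}

/* Report which node's pool an address belongs to, or -1 if none. */
int node_of(const void *p)
{
    uintptr_t a = (uintptr_t)p;

    for (int node = 0; node < NR_NODES; node++) {
        uintptr_t lo = (uintptr_t)node_mem[node];
        if (a >= lo && a < lo + NODE_BYTES)
            return node;
    }
    return -1;
}
```

The lookup table is the piece that cannot itself live in per-CPU data, since it is consulted before any per-CPU area exists.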



Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Jeremy Fitzhardinge
Christoph Lameter wrote:
> On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:
>
>   
>> I don't see the problem.  The way i386 does it inherently supports
>> per-cpu data very early on (it uses the prototype percpu section until
>> the real percpu values are set up).
>> 
>
> Ok so we could do that for x86_64 as well? There is more complicated 
> bootstrap since i386 does not support NUMA aware placement of per cpu 
> areas.
>   

Don't think it matters either way.  Before percpu is allocated, NUMA
issues don't matter.  Once they are - by whatever mechanism - you can
set the segment bases up appropriately.  The fact that you chose to put
percpu data at address X doesn't affect the percpu mechanism one way or
the other.

> percpu references are quite frequent already (vm statistics) and will be 
> more frequent after we have converted the per cpu arrays to per cpu 
> allocations.
>   

Well, I think the point is moot, because x86 will always use 32-bit
offsets.  Each reference will only be 1 byte bigger than a normal
variable reference.

J


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Christoph Lameter
On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:

> I don't see the problem.  The way i386 does it inherently supports
> per-cpu data very early on (it uses the prototype percpu section until
> the real percpu values are set up).

Ok so we could do that for x86_64 as well? There is more complicated 
bootstrap since i386 does not support NUMA aware placement of per cpu 
areas.

> > The i386 way of referring to per cpu data is not optimal because it is 
> > always offset by __per_cpu_start. per cpu data offsets need to be relative 
> > to the beginning of the per cpu area. per cpu data is less than 64k so 2 
> > byte offsets would be enough.
> >   
> 
> I don't see that's terribly important.  percpu references aren't all
> that common overall, and - at least on x86 - using a 16-bit offset
> (assuming it's possible) would require a prefix anyway, so it would only
> save 1 byte per reference.  But I can't convince gas to generate a
> 16-bit offset anyway.

percpu references are quite frequent already (vm statistics) and will be 
more frequent after we have converted the per cpu arrays to per cpu 
allocations.


> > That way the __per_cpu_offset array and the registers that are used on 
> > various platforms are pointing to the actual data and can be loaded
> > directly into a register and then a load with a small offset to that 
> > register can be performed. On x86_64 this is gs, on i386 fs, on sparc g5, 
> > on ia64 a fixed address stands in for the register.
> 
> The asm used to generate these references is inherently arch-specific
> anyway, so the type and size of offset needed from the per-cpu base
> register to the data itself can be arch-dependent without loss of
> generality.  

Well yes that is already the case and made explicit by the percpu cleanup 
done so far. The offset of a base is used by multiple architectures.




Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Andi Kleen

> * drop support for stack-protector (does it really help? do people
>   use it?)

AFAIK we only ever had a single classical stack buffer overflow in the kernel.
It certainly doesn't seem to be a common security problem it is solving.

-Andi


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Jeremy Fitzhardinge
Christoph Lameter wrote:
> On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:
>
>> Yes, I would like to convert x86_64 to match i386's percpu, and drop the
>> pda altogether.  The only thing preventing this is the stack canary, and
>> I'm wondering how much value there is in keeping it, given the
>> disadvantages of having this divergence between 32 and 64 bit.
>> 
>
> I think most of the PDA could be gotten rid of. The problems are
>
> 1. The stack canary
>   

Yes, this is a biggie.  It needs one of:

* fix gcc
* post-process the .s file
* drop support for stack-protector (does it really help? do people
  use it?)


> 2. The PDA is used to store per cpu data before the per cpu areas
>are setup.
>   

I don't see the problem.  The way i386 does it inherently supports
per-cpu data very early on (it uses the prototype percpu section until
the real percpu values are set up).

> The i386 way of referring to per cpu data is not optimal because it is 
> always offset by __per_cpu_start. per cpu data offsets need to be relative 
> to the beginning of the per cpu area. per cpu data is less than 64k so 2 
> byte offsets would be enough.
>   

I don't see that's terribly important.  percpu references aren't all
that common overall, and - at least on x86 - using a 16-bit offset
(assuming it's possible) would require a prefix anyway, so it would only
save 1 byte per reference.  But I can't convince gas to generate a
16-bit offset anyway.
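
The "1 byte per reference" figure can be checked with simple byte counting. The sketch below assumes a 32-bit-mode, segment-relative mov whose memory operand is a bare displacement encoded through a ModRM byte; the struct and function names are invented for the illustration, and other instruction forms will have different totals but the same one-byte difference.

```c
#include <assert.h>

/* Byte counts for one segment-relative "mov mem, reg" in 32-bit mode. */
struct encoding {
    unsigned int seg_prefix;       /* 0x64 (fs) or 0x65 (gs): 1 byte */
    unsigned int addr_size_prefix; /* 0x67, selects 16-bit addressing: 0 or 1 */
    unsigned int opcode;           /* 1 byte */
    unsigned int modrm;            /* 1 byte */
    unsigned int displacement;     /* 4 bytes normally, 2 with the prefix */
};

/* Total encoded length of the instruction in bytes. */
unsigned int encoded_len(struct encoding e)
{
    assert(e.displacement == 2 || e.displacement == 4);
    return e.seg_prefix + e.addr_size_prefix + e.opcode + e.modrm + e.displacement;
}

/* Default form: 32-bit displacement, no address-size prefix. */
struct encoding with_disp32(void)
{
    struct encoding e = { 1, 0, 1, 1, 4 };
    return e;
}

/* 16-bit form: two displacement bytes saved, one prefix byte added. */
struct encoding with_disp16(void)
{
    struct encoding e = { 1, 1, 1, 1, 2 };
    return e;
}
```

Under these assumptions the 32-bit form is 7 bytes and the 16-bit form is 6, so the net saving is the single byte mentioned above.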

> That way the __per_cpu_offset array and the registers that are used on 
> various platforms are pointing to the actual data and can be loaded
> directly into a register and then a load with a small offset to that 
> register can be performed. On x86_64 this is gs, on i386 fs, on sparc g5, 
> on ia64 a fixed address stands in for the register.

The asm used to generate these references is inherently arch-specific
anyway, so the type and size of offset needed from the per-cpu base
register to the data itself can be arch-dependent without loss of
generality.  

I definitely see that small offsets might be useful for other
architectures, but for x86 it doesn't help and makes things more
complex.  The only difference between 32- and 64-bit is whether we
generate an offset from %fs, %gs or nothing (for the UP case).


>  In loops over all per 
> cpu variables this will also simplify the code.
>   

Why's that?

> And ultimately we can get rid of the ugly RELOC_HIDE macro. It simply 
> becomes the adding of the base address in a register to a per cpu offset.
>   

I was never quite sure what that was for.

J


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Christoph Lameter
On Thu, 29 Nov 2007, Andi Kleen wrote:

> On Wed, Nov 28, 2007 at 04:11:37PM -0800, Christoph Lameter wrote:
> > 1. The stack canary
> 
> You would need to change gcc with a new option and only allow the stack
> checking when the compiler supports the new option. However the problem
> is still how to get a reasonable fixed offset. Or perhaps just change
> gcc to use a linker symbol relative to %gs that could be set to anything?

I still think we should leave the canary as is.


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Andi Kleen
On Wed, Nov 28, 2007 at 04:11:37PM -0800, Christoph Lameter wrote:
> 1. The stack canary

You would need to change gcc with a new option and only allow the stack
checking when the compiler supports the new option. However the problem
is still how to get a reasonable fixed offset. Or perhaps just change
gcc to use a linker symbol relative to %gs that could be set to anything?

-Andi


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Christoph Lameter
On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:

> Yes, I would like to convert x86_64 to match i386's percpu, and drop the
> pda altogether.  The only thing preventing this is the stack canary, and
> I'm wondering how much value there is in keeping it, given the
> disadvantages of having this divergence between 32 and 64 bit.

I think most of the PDA could be gotten rid of. The problems are

1. The stack canary

2. The PDA is used to store per cpu data before the per cpu areas
   are setup.

The i386 way of referring to per cpu data is not optimal because it is 
always offset by __per_cpu_start. per cpu data offsets need to be relative 
to the beginning of the per cpu area. per cpu data is less than 64k so 2 
byte offsets would be enough.

That way the __per_cpu_offset array and the registers that are used on 
various platforms are pointing to the actual data and can be loaded
directly into a register and then a load with a small offset to that 
register can be performed. On x86_64 this is gs, on i386 fs, on sparc g5, 
on ia64 a fixed address stands in for the register. In loops over all per 
cpu variables this will also simplify the code.

And ultimately we can get rid of the ugly RELOC_HIDE macro. It simply 
becomes the adding of the base address in a register to a per cpu offset.
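
A minimal userspace sketch of what "base address plus per cpu offset" looks like in a loop over all CPUs. It assumes a hypothetical table of area start addresses (modelled on __per_cpu_offset, but pointing at the start of each area) and identifies a variable purely by its offset within the area; the struct and function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4  /* hypothetical CPU count */

/* Layout of one per-CPU area; variables are identified by their offset
 * from the start of the area. */
struct percpu_area {
    long pad;
    long vm_events;
};

static struct percpu_area area[NR_CPUS];

/* Modelled on __per_cpu_offset[]: here each entry points straight at the
 * start of that CPU's area rather than holding a delta from a link address. */
static unsigned char *per_cpu_base[NR_CPUS];

void setup_bases(void)
{
    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
        per_cpu_base[cpu] = (unsigned char *)&area[cpu];
}

/* Address of a variable = base of the CPU's area + the variable's offset. */
long *per_cpu_long(unsigned int cpu, size_t offset)
{
    assert(cpu < NR_CPUS && offset + sizeof(long) <= sizeof(struct percpu_area));
    return (long *)(per_cpu_base[cpu] + offset);
}

/* Fold one counter across all CPUs, as a statistics reader would. */
long sum_over_cpus(size_t offset)
{
    long total = 0;

    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
        total += *per_cpu_long(cpu, offset);
    return total;
}
```

With this shape the loop body is a plain add and load; no pointer-hiding macro is needed because the offset is never presented to the compiler as the address of a real object.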








Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Jeremy Fitzhardinge
Rusty Russell wrote:
> On Thursday 29 November 2007 05:51:29 Christoph Lameter wrote:
>   
>> On Wed, 28 Nov 2007, Rusty Russell wrote:
>> 
>>> On Wednesday 28 November 2007 05:14:47 Christoph Lameter wrote:
>>>   
>>>> On Tue, 27 Nov 2007, Rusty Russell wrote:
>>>>
>>>>> Have you considered moving x86-64's setup_per_cpu_areas into generic
>>>>> code? It's a bit messier because some archs might not have set up
>>>>> NUMA stuff yet, but it's logically generic...
>>>>>
>>>> Yes that will happen later. This is just the early cleanup work. I
>>>> plan to generally bring the two x86 arches in line. The pda will be
>>>> folded into the per cpu area and after that it's easy to do.
>>>>
>>> Unfortunately, we tried to get rid of the x86-64 pda (like i386) but you
>>> lose the ability to use the stack protection config option.  That's
>>> because it assumes that gs:0x68 (or something) is the stack canary; we
>>> need a YA gcc change to make this gs:__builtin_stack_canary_off (where
>>> gcc can emit __builtin_stack_canary_off as a weak absolute symbol, so we
>>> can override it for the kernel).
>>>   
>> This works if you rebase the per cpu area at zero. gs:0x68 is still the
>> stack canary.
>>
>> The i386 method does not work because the segment register does not
>> directly point to the pda.
>> 
>
> But the PDA itself is silly (Jeremy ported it to i386 and I balked).  We have 
> a generic one: it's called the per-cpu data.  Having a completely separate 
> per-cpu structure for x86-64 is a mistake.
>   

Yes, I would like to convert x86_64 to match i386's percpu, and drop the
pda altogether.  The only thing preventing this is the stack canary, and
I'm wondering how much value there is in keeping it, given the
disadvantages of having this divergence between 32 and 64 bit.
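
The constraint behind the canary problem, as discussed in the quoted text, is that the compiler reads the canary at a fixed offset from the segment base. The sketch below illustrates why placing the PDA first in the per-CPU area keeps that offset valid. The layout is hypothetical: struct pda_sketch, CANARY_OFFSET and read_canary() are invented for the example and do not reproduce the real x8664_pda or the offset gcc actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical PDA layout; the real structure and the canary's real
 * offset differ. Only the fixed position matters for the sketch. */
struct pda_sketch {
    unsigned long fields_before[5];
    unsigned long stack_canary;
};

/* Offset the compiler is assumed to hard-code relative to gs. */
#define CANARY_OFFSET offsetof(struct pda_sketch, stack_canary)

/* Per-CPU area with the PDA placed first, so that an offset into the PDA
 * is also an offset from the start of the area. */
struct percpu_area_sketch {
    struct pda_sketch pda;
    int other_percpu_data[8];
};

_Static_assert(offsetof(struct percpu_area_sketch, pda) == 0,
               "pda must be the first object in the per-CPU area");

/* What a compiler-generated canary check would read, given the gs base. */
unsigned long read_canary(const void *gs_base)
{
    unsigned long canary;

    assert(gs_base != NULL);
    memcpy(&canary, (const unsigned char *)gs_base + CANARY_OFFSET, sizeof(canary));
    return canary;
}
```

If gs pointed anywhere other than the start of the area, or the PDA were not the first object in it, the same read would return unrelated per-CPU data.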

J


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Christoph Lameter
On Thu, 29 Nov 2007, Rusty Russell wrote:

> But the PDA itself is silly (Jeremy ported it to i386 and I balked).  We have 
> a generic one: it's called the per-cpu data.  Having a completely separate 
> per-cpu structure for x86-64 is a mistake.

Yes ultimately the pda can be dissolved. However, the stack canary 
probably has to be kept for backward compatibility.
 
> Setting up gs as the per-cpu offset has lovely properties and avoids YA 
> arch-specific concept; see the i386 code.  Introducing a generic 
> read_percpu()/write_percpu() would even make it optimal.

The code becomes much simpler if gs would point to the beginning of the 
per cpu area and if the __per_cpu_offset[i] would do the same. No weird 
__per_cpu_start offsetting anymore.

The offsets are smaller if they are relative to the per cpu areas which 
will make more compact instructions possible.

The generic write/readpercpu functionality introduced by the cpu_alloc 
patchset works best with offsets relative to an arch dependent 
register. All per cpu data (pda, percpu and allocpercpu) is handled as an 
offset relative to the start of the per cpu data.

If the current offset by __per_cpu_start is kept then a per cpu allocator 
may have to dish out addresses that go beyond __per_cpu_end.

I think dealing with a per cpu variable as if it would be an offset 
relative to a base is natural for the typical addressing of cpus based on 
an offset relative to some register.



Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Rusty Russell
On Thursday 29 November 2007 05:51:29 Christoph Lameter wrote:
> On Wed, 28 Nov 2007, Rusty Russell wrote:
> > On Wednesday 28 November 2007 05:14:47 Christoph Lameter wrote:
> > > On Tue, 27 Nov 2007, Rusty Russell wrote:
> > > > Have you considered moving x86-64's setup_per_cpu_areas into generic
> > > > code? It's a bit messier because some archs might not have set up
> > > > NUMA stuff yet, but it's logically generic...
> > >
> > > Yes that will happen later. This is just the early cleanup work. I
> > > plan to generally bring the two x86 arches in line. The pda will be
> > > folded into the per cpu area and after that its easy to do.
> >
> > Unfortunately, we tried to get rid of the x86-64 pda (like i386) but you
> > lose the ability to use the stack protection config option.  That's
> > because it assumes that gs:0x68 (or something) is the stack canary; we
> > need a YA gcc change to make this gs:__builtin_stack_canary_off (where
> > gcc can emit __builtin_stack_canary_off as a weak absolute symbol, so we
> > can override it for the kernel.
>
> This works if you rebase the per cpu area at zero. gs:0x68 is still the
> stack canary.
>
> The i386 method does not work because the segment register does not
> directly point to the pda.

But the PDA itself is silly (Jeremy ported it to i386 and I balked).  We have 
a generic one: it's called the per-cpu data.  Having a completely separate 
per-cpu structure for x86-64 is a mistake.

Setting up gs as the per-cpu offset has lovely properties and avoids YA 
arch-specific concept; see the i386 code.  Introducing a generic 
read_percpu()/write_percpu() would even make it optimal.

Cheers,
Rusty.


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Christoph Lameter
On Wed, 28 Nov 2007, Rusty Russell wrote:

> On Wednesday 28 November 2007 05:14:47 Christoph Lameter wrote:
> > On Tue, 27 Nov 2007, Rusty Russell wrote:
> > > Have you considered moving x86-64's setup_per_cpu_areas into generic
> > > code? It's a bit messier because some archs might not have set up NUMA
> > > stuff yet, but it's logically generic...
> >
> > Yes that will happen later. This is just the early cleanup work. I
> > plan to generally bring the two x86 arches in line. The pda will be
> > folded into the per cpu area and after that its easy to do.
> 
> Unfortunately, we tried to get rid of the x86-64 pda (like i386) but you lose 
> the ability to use the stack protection config option.  That's because it 
> assumes that gs:0x68 (or something) is the stack canary; we need a YA gcc 
> change to make this gs:__builtin_stack_canary_off (where gcc can emit 
> __builtin_stack_canary_off as a weak absolute symbol, so we can override it 
> for the kernel.

This works if you rebase the per cpu area at zero. gs:0x68 is still the 
stack canary.

The i386 method does not work because the segment register does not 
directly point to the pda.




Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Christoph Lameter
On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:

> Yes, I would like to convert x86_64 to match i386's percpu, and drop the
> pda altogether.  The only thing preventing this is the stack canary, and
> I'm wondering how much value there is in keeping it, given the
> disadvantages of having this divergence between 32 and 64 bit.

I think most of the PDA could be gotten rid of. The problems are

1. The stack canary

2. The PDA is used to store per cpu data before the per cpu areas
   are setup.

The i386 way of referring to per cpu data is not optimal because it is 
always offset by __per_cpu_start. per cpu data offsets need to be relative 
to the beginning of the per cpu area. per cpu data is less than 64k so 2 
byte offsets would be enough.

That way the __per_cpu_offset array and the registers that are used on 
various platforms are pointing to the actual data and can be loaded
directly into a register and then a load with a small offset to that 
register can be performed. On x86_64 this is gs, on i386 fs, on sparc g5, 
on ia64 a fixed address stands in for the register. In loops over all per 
cpu variables this will also simplify the code.

And ultimately we can get rid of the ugly RELOC_HIDE macro. It simply 
becomes the adding of the base address in a register to a per cpu offset.
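
A minimal sketch of the two addressing schemes being compared, written as a
user-space simulation rather than kernel code. The array names, sizes and the
0x68 offset are hypothetical stand-ins, not kernel symbols. The integer casts
are there for the same reason RELOC_HIDE exists: to keep the compiler from
reasoning about arithmetic that moves a pointer from one object into another.

```c
#include <stdint.h>

#define DEMO_NR_CPUS   4    /* hypothetical cpu count */
#define DEMO_AREA_SIZE 256  /* hypothetical size of one per cpu area */

/* Stand-in for the prototype section that starts at __per_cpu_start. */
static unsigned char demo_prototype[DEMO_AREA_SIZE];
/* One private copy of that section for every cpu. */
static unsigned char demo_area[DEMO_NR_CPUS][DEMO_AREA_SIZE];

/*
 * Current scheme: a per cpu variable is named by its link-time address
 * inside the prototype section, and the per cpu offset is the distance
 * from the prototype to the cpu's copy.
 */
static void *demo_addr_shifted(const void *link_addr, int cpu)
{
    uintptr_t delta = (uintptr_t)demo_area[cpu] - (uintptr_t)demo_prototype;

    return (void *)((uintptr_t)link_addr + delta);
}

/*
 * Proposed scheme: a per cpu variable is named by its offset from the
 * start of the area, and the per cpu base points at the area itself.
 */
static void *demo_addr_zero_based(uintptr_t offset, int cpu)
{
    return demo_area[cpu] + offset;
}
```

Both functions resolve to the same byte for the same variable; the difference
is only in what the "name" of the variable is (a full link-time address versus
a small offset) and therefore in what has to be encoded in the instruction.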








Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Christoph Lameter
On Thu, 29 Nov 2007, Andi Kleen wrote:

> On Wed, Nov 28, 2007 at 04:11:37PM -0800, Christoph Lameter wrote:
> > 1. The stack canary
> 
> You would need to change gcc with a new option and only allow the stack
> checking when the compiler supports the new option. However the problem
> is still how to get a reasonable fixed offset. Or perhaps just change
> gcc to use a linker symbol relative to %gs that could be set to anything?

I still think we should leave the canary as is.


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Jeremy Fitzhardinge
Christoph Lameter wrote:
> On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:
>
>> Yes, I would like to convert x86_64 to match i386's percpu, and drop the
>> pda altogether.  The only thing preventing this is the stack canary, and
>> I'm wondering how much value there is in keeping it, given the
>> disadvantages of having this divergence between 32 and 64 bit.
>
> I think most of the PDA could be gotten rid of. The problems are
>
> 1. The stack canary

Yes, this is a biggie.  It needs one of:

* fix gcc
* post-process the .s file
* drop support for stack-protector (does it really help? do people
  use it?)


> 2. The PDA is used to store per cpu data before the per cpu areas
>    are setup.

I don't see the problem.  The way i386 does it inherently supports
per-cpu data very early on (it uses the prototype percpu section until
the real percpu values are set up).
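
A rough user-space sketch of that bootstrap, with hypothetical names
throughout (none of these are kernel symbols): every cpu's offset starts out
as zero, so early accesses land in the prototype section, and setup later
copies the prototype into each real area and switches the offsets over.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NR_CPUS 2    /* hypothetical cpu count */
#define DEMO_SLOTS   16   /* hypothetical number of int-sized variables */

/* Stand-in for the prototype percpu section in the kernel image. */
static int demo_prototype[DEMO_SLOTS];
/* Real per-cpu areas, unused until demo_setup_areas() runs. */
static int demo_area[DEMO_NR_CPUS][DEMO_SLOTS];
/* Zero-initialised: before setup every cpu resolves to the prototype. */
static uintptr_t demo_offset[DEMO_NR_CPUS];

/* Resolve a variable, named by its slot in the prototype, for one cpu. */
static int *demo_per_cpu_int(size_t slot, int cpu)
{
    return (int *)((uintptr_t)&demo_prototype[slot] + demo_offset[cpu]);
}

/* Copy the prototype (including early writes) and switch the offsets. */
static void demo_setup_areas(void)
{
    for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
        memcpy(demo_area[cpu], demo_prototype, sizeof(demo_prototype));
        demo_offset[cpu] = (uintptr_t)demo_area[cpu] -
                           (uintptr_t)demo_prototype;
    }
}
```

A value written through demo_per_cpu_int() before demo_setup_areas() ends up
in the prototype and is carried into every cpu's copy; after setup the same
accessor reaches the cpu's own area without the caller changing anything.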

> The i386 way of referring to per cpu data is not optimal because it is
> always offset by __per_cpu_start. per cpu data offsets need to be relative
> to the beginning of the per cpu area. per cpu data is less than 64k so 2
> byte offsets would be enough.

I don't see that's terribly important.  percpu references aren't all
that common overall, and - at least on x86 - using a 16-bit offset
(assuming its possible) would require a prefix anyway, so it would only
save 1 byte per reference.  But I can't convince gas to generate a
16-bit offset anyway.
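
The arithmetic behind that one-byte figure can be sketched as follows, for
the short accumulator form of mov in 32-bit mode (a one-byte opcode followed
by a bare displacement, no ModRM byte). The helper name is hypothetical; the
byte counts assume the usual one-byte segment-override prefix and the one-byte
0x67 address-size prefix.

```c
/*
 * Length in bytes of "mov <seg>:disp, %eax" in 32-bit mode, using the
 * one-byte opcode form that takes a bare displacement.
 */
static int demo_mov_len(int has_seg_prefix, int disp_is_16bit)
{
    int len = 1;                    /* opcode */

    if (has_seg_prefix)
        len += 1;                   /* %fs or %gs override prefix */
    if (disp_is_16bit)
        len += 1 + 2;               /* address-size prefix + disp16 */
    else
        len += 4;                   /* disp32 */
    return len;
}
```

Under these assumptions a plain variable reference is 5 bytes, an
%fs-relative one with a 32-bit displacement is 6, and forcing a 16-bit
displacement gets back to 5: the extra prefix eats one of the two bytes saved.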

> That way the __per_cpu_offset array and the registers that are used on
> various platforms are pointing to the actual data and can be loaded
> directly into a register and then a load with a small offset to that
> register can be performed. On x86_64 this is gs, on i386 fs, on sparc g5,
> on ia64 a fixed address stands in for the register.

The asm used to generate these references is inherently arch-specific
anyway, so the type and size of offset needed from the per-cpu base
register to the data itself can be arch-dependent without loss of
generality.  

I definitely see that small offsets might be useful for other
architectures, but for x86 it doesn't help and makes things more
complex.  The only difference between 32- and 64-bit is whether we
generate an offset from %fs, %gs or nothing (for the UP case).


> In loops over all per cpu variables this will also simplify the code.

Why's that?

> And ultimately we can get rid of the ugly RELOC_HIDE macro. It simply
> becomes the adding of the base address in a register to a per cpu offset.

I was never quite sure what that was for.

J


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Andi Kleen
On Wed, Nov 28, 2007 at 04:11:37PM -0800, Christoph Lameter wrote:
> 1. The stack canary

You would need to change gcc with a new option and only allow the stack
checking when the compiler supports the new option. However the problem
is still how to get a reasonable fixed offset. Or perhaps just change
gcc to use a linker symbol relative to %gs that could be set to anything?

-Andi


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Andi Kleen

> * drop support for stack-protector (does it really help? do people
>   use it?)

AFAIK we only ever had a single classical stack buffer overflow in the kernel.
It certainly doesn't seem to be a common security problem it is solving.

-Andi


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Jeremy Fitzhardinge
Christoph Lameter wrote:
> On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:
>
>> I don't see the problem.  The way i386 does it inherently supports
>> per-cpu data very early on (it uses the prototype percpu section until
>> the real percpu values are set up).
>
> Ok so we could do that for x86_64 as well? There is more complicated
> bootstrap since i386 does not support NUMA aware placement of per cpu
> areas.

Don't think it matters either way.  Before percpu is allocated, NUMA
issues don't matter.  Once they are - by whatever mechanism - you can
set the segment bases up appropriately.  The fact that you chose to put
percpu data at address X doesn't affect the percpu mechanism one way or
the other.

> percpu references are quite frequent already (vm statistics) and will be
> more frequent after we have converted the per cpu arrays to per cpu
> allocations.

Well, I think the point is moot, because x86 will always use 32-bit
offsets.  Each reference will only be 1 byte bigger than a normal
variable reference.

J


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Christoph Lameter
On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:

> Don't think it matters either way.  Before percpu is allocated, NUMA
> issues don't matter.  Once they are - by whatever mechanism - you can
> set the segment bases up appropriately.  The fact that you chose to put
> percpu data at address X doesn't affect the percpu mechanism one way or
> the other.

The percpu areas need to be allocated in a NUMA aware fashion. Otherwise 
you use distant memory for the most performance sensitive areas. The NUMA 
subsystem must be so far up that these allocations can be performed in the 
right way. And this means at least you need to know on which node each 
processor is located. That is what the PDA is currently used for and i386 
has no other way of doing that. I think we could use an array [NR_CPUS] 
for this one but we want to avoid these arrays because NR_CPUS may get 
very big.
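
To make the ordering problem concrete, here is a small user-space sketch of
the [NR_CPUS] array variant. All names and sizes are hypothetical, and the
static pools merely stand in for node-local memory. The point is that the
cpu-to-node table has to be usable before any per cpu area exists, because it
decides where each area is placed.

```c
#include <stddef.h>

#define DEMO_NR_CPUS   4    /* hypothetical cpu count */
#define DEMO_NR_NODES  2    /* hypothetical node count */
#define DEMO_AREA_SIZE 128  /* hypothetical size of one per cpu area */

/* Must be filled in before the per cpu areas exist. */
static const int demo_cpu_to_node[DEMO_NR_CPUS] = { 0, 0, 1, 1 };

/* Stand-in for node-local memory: one static pool per node. */
static unsigned char demo_node_pool[DEMO_NR_NODES]
                                   [DEMO_NR_CPUS * DEMO_AREA_SIZE];
static size_t demo_node_used[DEMO_NR_NODES];

/* Resulting placement: where each cpu's area ended up. */
static unsigned char *demo_area_of[DEMO_NR_CPUS];

/* Carve each cpu's area out of the pool of the node the cpu sits on. */
static void demo_place_areas(void)
{
    for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
        int node = demo_cpu_to_node[cpu];

        demo_area_of[cpu] = demo_node_pool[node] + demo_node_used[node];
        demo_node_used[node] += DEMO_AREA_SIZE;
    }
}
```

The table here is a plain array indexed by cpu number, which is exactly the
kind of NR_CPUS-sized structure the paragraph above would like to avoid.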



Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Christoph Lameter
On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:

> I don't see the problem.  The way i386 does it inherently supports
> per-cpu data very early on (it uses the prototype percpu section until
> the real percpu values are set up).

Ok so we could do that for x86_64 as well? There is more complicated 
bootstrap since i386 does not support NUMA aware placement of per cpu 
areas.

> > The i386 way of referring to per cpu data is not optimal because it is
> > always offset by __per_cpu_start. per cpu data offsets need to be relative
> > to the beginning of the per cpu area. per cpu data is less than 64k so 2
> > byte offsets would be enough.
>
> I don't see that's terribly important.  percpu references aren't all
> that common overall, and - at least on x86 - using a 16-bit offset
> (assuming its possible) would require a prefix anyway, so it would only
> save 1 byte per reference.  But I can't convince gas to generate a
> 16-bit offset anyway.

percpu references are quite frequent already (vm statistics) and will be 
more frequent after we have converted the per cpu arrays to per cpu 
allocations.


> > That way the __per_cpu_offset array and the registers that are used on
> > various platforms are pointing to the actual data and can be loaded
> > directly into a register and then a load with a small offset to that
> > register can be performed. On x86_64 this is gs, on i386 fs, on sparc g5,
> > on ia64 a fixed address stands in for the register.
>
> The asm used to generate these references is inherently arch-specific
> anyway, so the type and size of offset needed from the per-cpu base
> register to the data itself can be arch-dependent without loss of
> generality.

Well yes that is already the case and made explicit by the percpu cleanup 
done so far. The offset of a base is used by multiple architectures.




Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Jeremy Fitzhardinge
Christoph Lameter wrote:
> The percpu areas need to be allocated in a NUMA aware fashion. Otherwise
> you use distant memory for the most performance sensitive areas. The NUMA
> subsystem must be so far up that these allocations can be performed in the
> right way. And this means at least you need to know on which node each
> processor is located. That is what the PDA is currently used for and i386
> has no other way of doing that. I think we could use an array [NR_CPUS]
> for this one but we want to avoid these arrays because NR_CPUS may get
> very big.

Oh, you mean there needs to be some percpu data mechanism operating in
order to do numa-aware allocations, which would be necessary to allocate
the percpu memory itself?

I can see how that would be awkward.

J



Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Christoph Lameter
On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:

> > percpu references are quite frequent already (vm statistics) and will be
> > more frequent after we have converted the per cpu arrays to per cpu
> > allocations.
>
> Well, I think the point is moot, because x86 will always use 32-bit
> offsets.  Each reference will only be 1 byte bigger than a normal
> variable reference.

Just because i386 is not able to use it does not mean that other arches 
are not. F.e. IA64 can embed offsets in the actual instruction (but of 
course not 64bit).

x86_64 can use a 32 bit offset instead of a 64 bit address because it uses 
the small model. A load of a 64 bit address would require much more 
expensive instructions. A load of a 64 bit address is currently avoided 
through the use of the pda that contains the full 64 bit address in the
data_offset field. Operations on per cpu data on x86_64 must therefore 
first load data_offset via gs and then add the per cpu address to this
offset. Then the per cpu operation is performed on that address.

In order to avoid this situation through one instruction we need a small 
32 bit offset relative to gs. Otherwise we cannot get away from the PDA 
and the use of data_offset.
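
The difference between the two access paths can be sketched in plain C as a
user-space simulation. The struct, the variables and the slot numbering are
hypothetical; only the data_offset field name is taken from the description
above, and demo_gs_base merely stands in for wherever gs would point.

```c
#include <stdint.h>

#define DEMO_SLOTS 32   /* hypothetical number of int slots per area */

/* Stand-in for the pda: holds the full 64 bit offset for this cpu. */
struct demo_pda {
    uintptr_t data_offset;
};

static int demo_prototype[DEMO_SLOTS];  /* link-time home of the variables */
static int demo_area[DEMO_SLOTS];       /* this cpu's per cpu area */
static struct demo_pda demo_cpu_pda;    /* what gs points at today */
static int *demo_gs_base = demo_area;   /* what gs would point at instead */

/* Today: load data_offset through gs, add the variable's address, access. */
static int demo_read_two_step(int slot)
{
    uintptr_t off = demo_cpu_pda.data_offset;              /* first load */
    int *p = (int *)((uintptr_t)&demo_prototype[slot] + off);

    return *p;                                             /* second load */
}

/* Proposed: the variable is a small offset from the gs base; one access. */
static int demo_read_one_step(int slot)
{
    return demo_gs_base[slot];
}
```

The two-step version has a dependent load in front of every access, and the
address it computes is only valid for the cpu whose data_offset was read.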

 


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Jeremy Fitzhardinge
Christoph Lameter wrote:
> x86_64 can use a 32 bit offset instead of a 64 bit address because it uses
> the small model. A load of a 64 bit address would require much more
> expensive instructions. A load of a 64 bit address is currently avoided
> through the use of the pda that contains the full 64 bit address in the
> data_offset field. Operations on per cpu data on x86_64 must therefore
> first load data_offset via gs and then add the per cpu address to this
> offset. Then the per cpu operation is performed on that address.

Hm.  Certainly a non-one-instruction access would be considerably less
useful than one that is, because of preemption issues.

(In general you need to pin yourself to a cpu if you're using percpu
data, but sometimes it doesn't matter.  In particular, the reason I'm
interested in this at all is because Xen puts its interrupt mask flag in
per-cpu data, and a single instruction means that masking interrupts
[=disable preemption] can be done in one instruction with no scope for
preemption in the middle doing something unexpected.)
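
A deterministic sketch of the hazard, with the preemption written out as an
explicit step. Everything here is hypothetical (two simulated cpus, a
made-up mask array); it only illustrates why a two-step access can touch the
wrong cpu's data when the task moves between the steps.

```c
#define DEMO_NR_CPUS 2   /* hypothetical cpu count */

static int demo_irq_mask[DEMO_NR_CPUS];  /* hypothetical per cpu flag */
static int demo_current_cpu;             /* cpu the task is running on */

/* Step 1 of a two-step access: work out where this cpu's flag lives. */
static int *demo_locate_mask(void)
{
    return &demo_irq_mask[demo_current_cpu];
}

/* Stand-in for being preempted and rescheduled on another cpu. */
static void demo_migrate(int cpu)
{
    demo_current_cpu = cpu;
}

/*
 * Two-step masking with a migration in between: the store goes to the
 * flag of the cpu the task started on, not the one it is running on.
 * Returns the mask value seen by the cpu the task ends up on.
 */
static int demo_mask_with_migration(int new_cpu)
{
    int *flag = demo_locate_mask();   /* step 1 */

    demo_migrate(new_cpu);            /* preempted here */
    *flag = 1;                        /* step 2 hits the old cpu's flag */
    return demo_irq_mask[demo_current_cpu];
}
```

With a single segment-relative instruction there is no gap between locating
the flag and writing it, so this interleaving cannot occur.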

> In order to avoid this situation through one instruction we need a small
> 32 bit offset relative to gs. Otherwise we cannot get away from the PDA
> and the use of data_offset.

Hm, yes, I see.  Dratted large address space.  What's wrong with 4G
anyway? ;)

Anyway, I can see the problem with my thinking about this so far.

J


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Christoph Lameter
Second portion. Add a new seg_offset macro to calculate the offset. This 
can be avoided if the linker relocates the per cpu area to zero. Includes 
a patch to read trickle count via both methods to verify that it actually 
works. Both patches on top of the per cpu cleanup patches that I sent 
today too.


x86_64: Make the x86_32 percpu operations usable on x86_64

Calculate the offset relative to gs in order to be able to address
per cpu data using the x86_64 per cpu macros.

The subtraction of __per_cpu_start will make the offset based
from the beginning of the per cpu area. That is where gs points to.

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

---
 drivers/char/random.c|2 +-
 include/asm-x86/percpu.h |   29 ++---
 init/main.c  |5 +
 3 files changed, 24 insertions(+), 12 deletions(-)

Index: linux-2.6.24-rc3-mm2/include/asm-x86/percpu.h
===
--- linux-2.6.24-rc3-mm2.orig/include/asm-x86/percpu.h  2007-11-28 17:50:01.861182410 -0800
+++ linux-2.6.24-rc3-mm2/include/asm-x86/percpu.h   2007-11-28 21:22:50.845872906 -0800
@@ -16,7 +16,13 @@
 #define __my_cpu_offset read_pda(data_offset)
 
 #define per_cpu_offset(x) (__per_cpu_offset(x))
+#define __percpu_seg %%gs:
+/* Calculate the offset to use with the segment register */
+#define seg_offset(name)   (*SHIFT_PTR(per_cpu_var(name), - (unsigned long)__per_cpu_start))
 
+#else
+#define __percpu_seg 
+#define seg_offset(name)   per_cpu_var(name)
 #endif
 #include <asm-generic/percpu.h>
 
@@ -64,16 +70,11 @@ DECLARE_PER_CPU(struct x8664_pda, pda);
  *PER_CPU(cpu_gdt_descr, %ebx)
  */
 #ifdef CONFIG_SMP
-
 #define __my_cpu_offset x86_read_percpu(this_cpu_off)
-
 /* fs segment starts at (positive) offset == __per_cpu_offset[cpu] */
 #define __percpu_seg %%fs:
-
 #else  /* !SMP */
-
 #define __percpu_seg 
-
 #endif /* SMP */
 
 #include <asm-generic/percpu.h>
@@ -81,6 +82,13 @@ DECLARE_PER_CPU(struct x8664_pda, pda);
 /* We can use this directly for local CPU (faster). */
 DECLARE_PER_CPU(unsigned long, this_cpu_off);
 
+#define seg_offset(name)   per_cpu_var(name)
+
+#endif /* __ASSEMBLY__ */
+#endif /* !CONFIG_X86_64 */
+
+#ifndef __ASSEMBLY__
+
 /* For arch-specific code, we can use direct single-insn ops (they
  * don't give an lvalue though). */
 extern void __bad_percpu_size(void);
@@ -132,11 +140,10 @@ extern void __bad_percpu_size(void);
}   \
ret__; })
 
-#define x86_read_percpu(var) percpu_from_op(mov, per_cpu__##var)
-#define x86_write_percpu(var,val) percpu_to_op(mov, per_cpu__##var, val)
-#define x86_add_percpu(var,val) percpu_to_op(add, per_cpu__##var, val)
-#define x86_sub_percpu(var,val) percpu_to_op(sub, per_cpu__##var, val)
-#define x86_or_percpu(var,val) percpu_to_op(or, per_cpu__##var, val)
+#define x86_read_percpu(var) percpu_from_op(mov, seg_offset(var))
+#define x86_write_percpu(var,val) percpu_to_op(mov, seg_offset(var), val)
+#define x86_add_percpu(var,val) percpu_to_op(add, seg_offset(var), val)
+#define x86_sub_percpu(var,val) percpu_to_op(sub, seg_offset(var), val)
+#define x86_or_percpu(var,val) percpu_to_op(or, seg_offset(var), val)
 #endif /* !__ASSEMBLY__ */
-#endif /* !CONFIG_X86_64 */
 #endif /* _ASM_X86_PERCPU_H_ */
Index: linux-2.6.24-rc3-mm2/drivers/char/random.c
===
--- linux-2.6.24-rc3-mm2.orig/drivers/char/random.c 2007-11-28 21:20:58.225804398 -0800
+++ linux-2.6.24-rc3-mm2/drivers/char/random.c  2007-11-28 21:28:38.967363573 -0800
@@ -272,7 +272,7 @@ static int random_write_wakeup_thresh = 
 
 static int trickle_thresh __read_mostly = INPUT_POOL_WORDS * 28;
 
-static DEFINE_PER_CPU(int, trickle_count) = 0;
+DEFINE_PER_CPU(int, trickle_count) = 55;
 
 /*
  * A pool of size .poolwords is stirred with a primitive polynomial
Index: linux-2.6.24-rc3-mm2/init/main.c
===
--- linux-2.6.24-rc3-mm2.orig/init/main.c   2007-11-28 21:10:54.245804225 -0800
+++ linux-2.6.24-rc3-mm2/init/main.c2007-11-28 21:22:17.769053628 -0800
@@ -504,6 +504,8 @@ void __init __attribute__((weak)) smp_se
 {
 }
 
+DECLARE_PER_CPU(int, trickle_count);
+
 asmlinkage void __init start_kernel(void)
 {
char * command_line;
@@ -645,6 +647,9 @@ asmlinkage void __init start_kernel(void
 
acpi_early_init(); /* before LAPIC and SMP init */
 
+   printk("Reading trickle cound =%lu. Is %lu\n",
+   x86_read_percpu(trickle_count),
+   __raw_get_cpu_var(trickle_count));
/* Do the rest non-__init'ed, we're now alive */
rest_init();
 }



Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Christoph Lameter
Here is the first of two patches for x86_64 that move the pda into the per 
cpu area and then make the x86 percpu macros work for x86_64. This needs 
to be generalized for other arches. The __per_cpu_start offsets can be 
taken care of by the linker. We can also tell the linker to completely 
relocate the percpu area to 0.



X86_64: Declare pda as per cpu data thereby moving it into the cpu area

Declare the pda as a per cpu variable. This will have the effect of moving
the pda data into the cpu area managed by cpu alloc.

The boot_pdas are only needed in head64.c so move the declaration
over there and make it static.

Remove the code that allocates special pda data structures.

The pda is moved to the beginning of the per cpu area. gs is pointing to the
pda. And therefore gs: is now pointing to the per cpu area of the current
processor. A per cpu variable can then be reached at

%gs:[per_cpu_ - __per_cpu_start]
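
As a rough user-space illustration of the addressing described above (not
the kernel implementation): a variable's offset from the start of the per
cpu template is added to a per-CPU base pointer, which stands in for %gs.
The names cpu_base, seg_offset_sketch and struct percpu_template are
assumptions made up for this sketch; offsetof() plays the role of the
"symbol address minus __per_cpu_start" computation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count */

/* Stand-in for the per cpu template section laid out by the linker. */
struct percpu_template {
	long pda_placeholder;	/* first object, as with DEFINE_PER_CPU_FIRST */
	int trickle_count;
};

static struct percpu_template per_cpu_template = { 0, 55 };

/* One private copy of the template per CPU. */
static struct percpu_template cpu_area[NR_CPUS_SKETCH];

/* Plays the role of the %gs base: start of the given CPU's area. */
static char *cpu_base(int cpu)
{
	return (char *)&cpu_area[cpu];
}

/* Copy the template into every CPU's area. */
static void setup_areas(void)
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		memcpy(&cpu_area[cpu], &per_cpu_template,
		       sizeof(per_cpu_template));
}

/* Offset of a variable relative to the start of the per cpu area. */
#define seg_offset_sketch(field) offsetof(struct percpu_template, field)

/* Read: base + offset, the single addition the thread talks about. */
static int read_trickle_count(int cpu)
{
	int v;

	memcpy(&v, cpu_base(cpu) + seg_offset_sketch(trickle_count), sizeof(v));
	return v;
}

/* Write through the same base + offset computation. */
static void write_trickle_count(int cpu, int val)
{
	memcpy(cpu_base(cpu) + seg_offset_sketch(trickle_count), &val,
	       sizeof(val));
}
```

Because the placeholder sits first in the template, its offset is 0, so
the base pointer addresses it directly; every other variable is one
constant offset away from the same base.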

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 arch/x86/kernel/head64.c  |6 ++
 arch/x86/kernel/setup64.c |   13 ++---
 arch/x86/kernel/smpboot_64.c  |   16 
 include/asm-generic/vmlinux.lds.h |1 +
 include/asm-x86/pda.h |1 -
 include/linux/percpu.h|4 
 6 files changed, 21 insertions(+), 20 deletions(-)

Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/setup64.c
===
--- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/setup64.c 2007-11-28 20:59:13.124188194 -0800
+++ linux-2.6.24-rc3-mm2/arch/x86/kernel/setup64.c  2007-11-28 21:08:50.473347382 -0800
@@ -30,7 +30,9 @@ cpumask_t cpu_initialized __cpuinitdata 
 
 struct x8664_pda *_cpu_pda[NR_CPUS] __read_mostly;
 EXPORT_SYMBOL(_cpu_pda);
-struct x8664_pda boot_cpu_pda[NR_CPUS] __cacheline_aligned;
+
+DEFINE_PER_CPU_FIRST(struct x8664_pda, pda);
+EXPORT_PER_CPU_SYMBOL(pda);
 
 struct desc_ptr idt_descr = { 256 * 16 - 1, (unsigned long) idt_table };
 
@@ -109,10 +111,15 @@ void __init setup_per_cpu_areas(void)
}
if (!ptr)
panic("Cannot allocate cpu data for CPU %d\n", i);
-   cpu_pda(i)->data_offset = ptr - __per_cpu_start;
memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
+   /* Relocate the pda */
+   memcpy(ptr, cpu_pda(i), sizeof(struct x8664_pda));
+   cpu_pda(i) = (struct x8664_pda *)ptr;
+   cpu_pda(i)->data_offset = ptr - __per_cpu_start;
}
-} 
+   /* Fix up pda for this processor  */
+   pda_init(0);
+}
 
 void pda_init(int cpu)
 { 
Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/smpboot_64.c
===
--- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/smpboot_64.c  2007-11-28 20:59:13.136188167 -0800
+++ linux-2.6.24-rc3-mm2/arch/x86/kernel/smpboot_64.c   2007-11-28 20:59:35.399937395 -0800
@@ -556,22 +556,6 @@ static int __cpuinit do_boot_cpu(int cpu
return -1;
}
 
-   /* Allocate node local memory for AP pdas */
-   if (cpu_pda(cpu) == &boot_cpu_pda[cpu]) {
-   struct x8664_pda *newpda, *pda;
-   int node = cpu_to_node(cpu);
-   pda = cpu_pda(cpu);
-   newpda = kmalloc_node(sizeof (struct x8664_pda), GFP_ATOMIC,
- node);
-   if (newpda) {
-   memcpy(newpda, pda, sizeof (struct x8664_pda));
-   cpu_pda(cpu) = newpda;
-   } else
-   printk(KERN_ERR
-   "Could not allocate node local PDA for CPU %d on node %d\n",
-   cpu, node);
-   }
-
alternatives_smp_switch(1);
 
c_idle.idle = get_idle_for_cpu(cpu);
Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/head64.c
===
--- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/head64.c  2007-11-28 20:59:13.152187359 -0800
+++ linux-2.6.24-rc3-mm2/arch/x86/kernel/head64.c   2007-11-28 20:59:35.403937534 -0800
@@ -22,6 +22,12 @@
 #include <asm/sections.h>
 #include <asm/kdebug.h>
 
+/*
+ * Only used before the per cpu areas are setup. The use for the non possible
+ * cpus continues after boot
+ */
+static struct x8664_pda boot_cpu_pda[NR_CPUS] __cacheline_aligned;
+
 static void __init zap_identity_mappings(void)
 {
pgd_t *pgd = pgd_offset_k(0UL);
Index: linux-2.6.24-rc3-mm2/include/asm-x86/pda.h
===
--- linux-2.6.24-rc3-mm2.orig/include/asm-x86/pda.h 2007-11-28 20:59:13.164187921 -0800
+++ linux-2.6.24-rc3-mm2/include/asm-x86/pda.h  2007-11-28 20:59:35.403937534 -0800
@@ -39,7 +39,6 @@ struct x8664_pda {
 } cacheline_aligned_in_smp;
 
 extern struct x8664_pda *_cpu_pda[];
-extern struct x8664_pda boot_cpu_pda[];
 extern void pda_init(int);
 
 

Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-27 Thread Rusty Russell
On Wednesday 28 November 2007 05:14:47 Christoph Lameter wrote:
> On Tue, 27 Nov 2007, Rusty Russell wrote:
> > Have you considered moving x86-64's setup_per_cpu_areas into generic
> > code? It's a bit messier because some archs might not have set up NUMA
> > stuff yet, but it's logically generic...
>
> Yes that will happen later. This is just the early cleanup work. I
> plan to generally bring the two x86 arches in line. The pda will be
> folded into the per cpu area and after that its easy to do.

Unfortunately, we tried to get rid of the x86-64 pda (like i386) but you lose 
the ability to use the stack protection config option.  That's because it 
assumes that gs:0x68 (or something) is the stack canary; we need a YA gcc 
change to make this gs:__builtin_stack_canary_off (where gcc can emit 
__builtin_stack_canary_off as a weak absolute symbol, so we can override it 
for the kernel).
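
To make the constraint concrete, here is a small user-space sketch of a
structure whose canary field has to stay at the offset the compiler
hard-codes. The structure pda_sketch, the helper read_canary and the
constant CANARY_OFFSET_SKETCH are all hypothetical; the value 40 is only a
placeholder for whatever offset a given gcc actually emits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder for the offset baked into compiler-generated code. */
#define CANARY_OFFSET_SKETCH 40

/* Hypothetical per-CPU structure addressed through the segment base. */
struct pda_sketch {
	char other_fields[CANARY_OFFSET_SKETCH];	/* whatever comes first */
	unsigned long stack_canary;	/* must not move from the fixed offset */
};

/* The build breaks if someone reorders the structure. */
_Static_assert(offsetof(struct pda_sketch, stack_canary) ==
	       CANARY_OFFSET_SKETCH,
	       "canary moved away from the offset the compiler expects");

/* Read the canary the way generated code would: base plus a constant. */
static unsigned long read_canary(const void *segment_base)
{
	unsigned long v;

	memcpy(&v, (const char *)segment_base + CANARY_OFFSET_SKETCH,
	       sizeof(v));
	return v;
}
```

With a fixed constant, anything placed at the segment base has to keep
this layout; an overridable symbol for the offset would lift that
restriction.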

Cheers,
Rusty.


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-27 Thread Randy Dunlap

Christoph Lameter wrote:
> On Tue, 27 Nov 2007, Randy Dunlap wrote:
>
> > > +config ARCH_SETS_UP_PER_CPU_AREA
> > > +   bool
> > > +   default y
> >
> >   def_bool y
> >   is the preferred form for those 2-liners above...
> >
> > > +
> > >  config ARCH_NO_VIRT_TO_BUS
> > > def_bool y
>
> Ok. Changed.
>
> x86 should use
>
> config ARCH_SETS_UP_PER_CPU_AREA
> def_bool X86_64
>
> ?

Yes, you can do
def_bool <config symbol>
as well to make the new symbol be variable instead of constant.


--
~Randy


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-27 Thread Christoph Lameter
On Tue, 27 Nov 2007, Randy Dunlap wrote:

> > +config ARCH_SETS_UP_PER_CPU_AREA
> > +   bool
> > +   default y
> 
>   def_bool y
>   is the preferred form for those 2-liners above...
> 
> 
> > +
> >  config ARCH_NO_VIRT_TO_BUS
> > def_bool y
> >  

Ok. Changed.

x86 should use

config ARCH_SETS_UP_PER_CPU_AREA
def_bool X86_64

?



Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-27 Thread Randy Dunlap
On Mon, 26 Nov 2007 16:14:12 -0800 Christoph Lameter wrote:

> The use of the __GENERIC_PERCPU is a bit problematic since arches
> may want to run their own percpu setup while using the generic
> percpu definitions. Replace it through a kconfig variable.
> 
> Cc: Rusty Russell <[EMAIL PROTECTED]>
> Cc: Andi Kleen <[EMAIL PROTECTED]>
> Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
> ---
> 
> Index: linux-2.6/arch/ia64/Kconfig
> ===
> --- linux-2.6.orig/arch/ia64/Kconfig  2007-11-26 15:38:56.415112360 -0800
> +++ linux-2.6/arch/ia64/Kconfig   2007-11-26 15:40:10.425862722 -0800
> @@ -75,6 +75,10 @@ config GENERIC_TIME_VSYSCALL
>   bool
>   default y
>  
> +config ARCH_SETS_UP_PER_CPU_AREA
> + bool
> + default y
> +
>  config DMI
>   bool
>   default y

> Index: linux-2.6/arch/sparc64/Kconfig
> ===
> --- linux-2.6.orig/arch/sparc64/Kconfig   2007-11-26 15:38:56.447111936 -0800
> +++ linux-2.6/arch/sparc64/Kconfig 2007-11-26 15:40:10.425862722 -0800
> @@ -66,6 +66,10 @@ config AUDIT_ARCH
>   bool
>   default y
>  
> +config ARCH_SETS_UP_PER_CPU_AREA
> + bool
> + default y

def_bool y
  is the preferred form for those 2-liners above...


> +
>  config ARCH_NO_VIRT_TO_BUS
>   def_bool y
>  


---
~Randy


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-27 Thread Christoph Lameter
On Tue, 27 Nov 2007, Rusty Russell wrote:

> Have you considered moving x86-64's setup_per_cpu_areas into generic code?  
> It's a bit messier because some archs might not have set up NUMA stuff yet, 
> but it's logically generic...

Yes that will happen later. This is just the early cleanup work. I 
plan to generally bring the two x86 arches in line. The pda will be 
folded into the per cpu area and after that its easy to do.



Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-26 Thread Rusty Russell
On Tuesday 27 November 2007 11:14:12 Christoph Lameter wrote:
> The use of the __GENERIC_PERCPU is a bit problematic since arches
> may want to run their own percpu setup while using the generic
> percpu definitions. Replace it through a kconfig variable.

Thanks for this Christoph!

These patches are great: the early experiments are obviously over, and so this 
consolidation is overdue.

Have you considered moving x86-64's setup_per_cpu_areas into generic code?  
It's a bit messier because some archs might not have set up NUMA stuff yet, 
but it's logically generic...

Thanks!
Rusty.


[patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-26 Thread Christoph Lameter
The use of __GENERIC_PER_CPU is a bit problematic since arches
may want to run their own percpu setup while using the generic
percpu definitions. Replace it with a Kconfig variable.
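
A minimal user-space sketch of the idea, assuming nothing beyond what the
patch shows: the generic setup code is compiled only when the arch has not
set CONFIG_ARCH_SETS_UP_PER_CPU_AREA. The generic path here mirrors the
shape of setup_per_cpu_areas() in init/main.c (one allocation, one copy of
the template per CPU, one recorded offset per CPU), but malloc(), the
template array and every *_sketch name are stand-ins invented for
illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count */

/* Uncomment to model an arch that brings its own setup routine. */
/* #define CONFIG_ARCH_SETS_UP_PER_CPU_AREA 1 */

/* Stand-in for the data between __per_cpu_start and __per_cpu_end. */
static char percpu_template[16] = "boot-time data";

#ifndef CONFIG_ARCH_SETS_UP_PER_CPU_AREA
/* Distance from the template to each CPU's private copy. */
static uintptr_t per_cpu_offset_sketch[NR_CPUS_SKETCH];

/* Generic path: returns the backing block, or NULL on failure. */
static char *setup_per_cpu_areas_sketch(void)
{
	size_t size = sizeof(percpu_template);
	char *block = malloc(size * NR_CPUS_SKETCH);
	char *ptr = block;

	if (!block)
		return NULL;
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		/* Integer arithmetic avoids subtracting unrelated pointers. */
		per_cpu_offset_sketch[cpu] =
			(uintptr_t)ptr - (uintptr_t)percpu_template;
		memcpy(ptr, percpu_template, size);
		ptr += size;
	}
	return block;
}

/* Template address plus the CPU's offset gives that CPU's copy. */
static char *per_cpu_ptr_sketch(char *template_addr, int cpu)
{
	return (char *)((uintptr_t)template_addr + per_cpu_offset_sketch[cpu]);
}
#endif /* !CONFIG_ARCH_SETS_UP_PER_CPU_AREA */
```

An arch that defines the symbol drops the generic routine and offset
table entirely and has to supply its own, while the accessor definitions
in the percpu headers can still be shared.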

Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 arch/ia64/Kconfig|4 
 arch/powerpc/Kconfig |4 
 arch/sparc64/Kconfig |4 
 arch/x86/Kconfig |6 +++---
 include/asm-generic/percpu.h |1 -
 include/asm-s390/percpu.h|2 --
 include/asm-x86/percpu_32.h  |2 --
 init/main.c  |4 ++--
 8 files changed, 17 insertions(+), 10 deletions(-)

Index: linux-2.6/init/main.c
===
--- linux-2.6.orig/init/main.c  2007-11-26 15:38:56.407111768 -0800
+++ linux-2.6/init/main.c   2007-11-26 15:40:10.425862722 -0800
@@ -363,7 +363,7 @@ static inline void smp_prepare_cpus(unsi
 
 #else
 
-#ifdef __GENERIC_PER_CPU
+#ifndef CONFIG_ARCH_SETS_UP_PER_CPU_AREA
 unsigned long __per_cpu_offset[NR_CPUS] __read_mostly;
 
 EXPORT_SYMBOL(__per_cpu_offset);
@@ -384,7 +384,7 @@ static void __init setup_per_cpu_areas(v
ptr += size;
}
 }
-#endif /* !__GENERIC_PER_CPU */
+#endif /* CONFIG_ARCH_SETS_UP_CPU_AREA */
 
 /* Called by boot processor to activate the rest. */
 static void __init smp_init(void)
Index: linux-2.6/arch/ia64/Kconfig
===
--- linux-2.6.orig/arch/ia64/Kconfig2007-11-26 15:38:56.415112360 -0800
+++ linux-2.6/arch/ia64/Kconfig 2007-11-26 15:40:10.425862722 -0800
@@ -75,6 +75,10 @@ config GENERIC_TIME_VSYSCALL
bool
default y
 
+config ARCH_SETS_UP_PER_CPU_AREA
+   bool
+   default y
+
 config DMI
bool
default y
Index: linux-2.6/arch/powerpc/Kconfig
===
--- linux-2.6.orig/arch/powerpc/Kconfig 2007-11-26 15:38:56.427111914 -0800
+++ linux-2.6/arch/powerpc/Kconfig  2007-11-26 15:40:10.425862722 -0800
@@ -42,6 +42,10 @@ config GENERIC_HARDIRQS
bool
default y
 
+config ARCH_SETS_UP_PER_CPU_AREA
+   bool
+   default PPC64
+
 config IRQ_PER_CPU
bool
default y
Index: linux-2.6/arch/sparc64/Kconfig
===
--- linux-2.6.orig/arch/sparc64/Kconfig 2007-11-26 15:38:56.447111936 -0800
+++ linux-2.6/arch/sparc64/Kconfig  2007-11-26 15:40:10.425862722 -0800
@@ -66,6 +66,10 @@ config AUDIT_ARCH
bool
default y
 
+config ARCH_SETS_UP_PER_CPU_AREA
+   bool
+   default y
+
 config ARCH_NO_VIRT_TO_BUS
def_bool y
 
Index: linux-2.6/arch/x86/Kconfig
===
--- linux-2.6.orig/arch/x86/Kconfig 2007-11-26 15:38:58.234361975 -0800
+++ linux-2.6/arch/x86/Kconfig  2007-11-26 15:40:52.465611449 -0800
@@ -112,9 +112,9 @@ config GENERIC_TIME_VSYSCALL
bool
default X86_64
 
-
-
-
+config ARCH_SETS_UP_PER_CPU_AREA
+   bool
+   default X86_64
 
 config ZONE_DMA32
bool
Index: linux-2.6/include/asm-generic/percpu.h
===
--- linux-2.6.orig/include/asm-generic/percpu.h 2007-11-26 15:40:04.469611815 -0800
+++ linux-2.6/include/asm-generic/percpu.h  2007-11-26 15:40:10.437861790 -0800
@@ -3,7 +3,6 @@
 #include <linux/compiler.h>
 #include <linux/threads.h>
 
-#define __GENERIC_PER_CPU
 #ifdef CONFIG_SMP
 
 extern unsigned long __per_cpu_offset[NR_CPUS];
Index: linux-2.6/include/asm-x86/percpu_32.h
===
--- linux-2.6.orig/include/asm-x86/percpu_32.h  2007-11-26 15:40:04.469611815 -0800
+++ linux-2.6/include/asm-x86/percpu_32.h   2007-11-26 15:40:10.441861845 -0800
@@ -41,8 +41,6 @@
  *PER_CPU(cpu_gdt_descr, %ebx)
  */
 #ifdef CONFIG_SMP
-/* Same as generic implementation except for optimized local access. */
-#define __GENERIC_PER_CPU
 
 /* This is used for other cpus to find our section. */
 extern unsigned long __per_cpu_offset[];
Index: linux-2.6/include/asm-s390/percpu.h
===
--- linux-2.6.orig/include/asm-s390/percpu.h 2007-11-26 15:40:04.469611815 -0800
+++ linux-2.6/include/asm-s390/percpu.h 2007-11-26 15:40:10.441861845 -0800
@@ -4,8 +4,6 @@
 #include <linux/compiler.h>
 #include <asm/lowcore.h>
 
-#define __GENERIC_PER_CPU
-
 /*
  * s390 uses its own implementation for per cpu data, the offset of
  * the cpu local data area is cached in the cpu's lowcore memory.


