Re: [patch 07/20] Allow paravirt backend to choose kernel PMD sharing

2007-04-04 Thread Christoph Lameter
Acked-by: Christoph Lameter <[EMAIL PROTECTED]>

for all thats worth since I am not a i386 specialist.

How much of the issues with page struct sharing between slab and arch code 
does this address?

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 07/20] Allow paravirt backend to choose kernel PMD sharing

2007-04-04 Thread Jeremy Fitzhardinge
Christoph Lameter wrote:
> Acked-by: Christoph Lameter <[EMAIL PROTECTED]>
>
> for all thats worth since I am not a i386 specialist.
>
> How much of the issues with page struct sharing between slab and arch code 
> does this address?
>   

I haven't been following that thread as closely as I should be, so I
don't have an answer.  I guess the interesting thing in this patch is
that it only uses the pmd cache for usermode pmds (which are
pre-zeroed), and normal page allocations for kernel pmds.  Also, if the
kernel pmds are unshared, the pgds are page-sized, so its not really
making good use of the pgd cache.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 07/20] Allow paravirt backend to choose kernel PMD sharing

2007-04-04 Thread Chris Wright
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> Acked-by: Christoph Lameter <[EMAIL PROTECTED]>
> 
> for all thats worth since I am not a i386 specialist.
> 
> How much of the issues with page struct sharing between slab and arch code 
> does this address?

I think the answer is 'none yet.'  It uses page sized slab and still
needs pgd_list, for example.  But the mm_list chaining should work too,
so it shouldn't make things any worse.

thanks,
-chris
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 07/20] Allow paravirt backend to choose kernel PMD sharing

2007-04-06 Thread Andrew Morton
On Wed, 04 Apr 2007 12:11:58 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> 
wrote:

> Normally when running in PAE mode, the 4th PMD maps the kernel address
> space, which can be shared among all processes (since they all need
> the same kernel mappings).
> 
> Xen, however, does not allow guests to have the kernel pmd shared
> between page tables, so parameterize pgtable.c to allow both modes of
> operation.
> 
> There are several side-effects of this.  One is that vmalloc will
> update the kernel address space mappings, and those updates need to be
> propagated into all processes if the kernel mappings are not
> intrinsically shared.  In the non-PAE case, this is done by
> maintaining a pgd_list of all processes; this list is used when all
> process pagetables must be updated.  pgd_list is threaded via
> otherwise unused entries in the page structure for the pgd, which
> means that the pgd must be page-sized for this to work.
> 
> Normally the PAE pgd is only 4x64 byte entries large, but Xen requires
> the PAE pgd to page aligned anyway, so this patch forces the pgd to be
> page aligned+sized when the kernel pmd is unshared, to accomodate both
> these requirements.
> 
> Also, since there may be several distinct kernel pmds (if the
> user/kernel split is below 3G), there's no point in allocating them
> from a slab cache; they're just allocated with get_free_page and
> initialized appropriately.  (Of course the could be cached if there is
> just a single kernel pmd - which is the default with a 3G user/kernel
> split - but it doesn't seem worthwhile to add yet another case into
> this code).

All this paravirt stuff isn't making the kernel any prettier, is it?

> ...
>  
> -#ifndef CONFIG_X86_PAE
> -void vmalloc_sync_all(void)
> +void _vmalloc_sync_all(void)
>  {
>   /*
>* Note that races in the updates of insync and start aren't
> @@ -600,6 +599,8 @@ void vmalloc_sync_all(void)
>   static DECLARE_BITMAP(insync, PTRS_PER_PGD);
>   static unsigned long start = TASK_SIZE;
>   unsigned long address;
> +
> + BUG_ON(SHARED_KERNEL_PMD);
>  
>   BUILD_BUG_ON(TASK_SIZE & ~PGDIR_MASK);
>   for (address = start; address >= TASK_SIZE; address += PGDIR_SIZE) {
> @@ -623,4 +624,3 @@ void vmalloc_sync_all(void)
>   start = address + PGDIR_SIZE;
>   }
>  }

This is a functional change for non-paravirt kernels.  Non-PAE kernels now
get a vmalloc_sync_all().  How come?

We normally use double-underscore for things like this.

Your change clashes pretty fundamantally with
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/broken-out/move-die-notifier-handling-to-common-code-fix-vmalloc_sync_all.patch,
and
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/broken-out/move-die-notifier-handling-to-common-code.patch
_does_ make the kernel prettier.

But I'm a bit reluctant to rework
move-die-notifier-handling-to-common-code-fix-vmalloc_sync_all.patch
(somehow) until I understand why your patch is a) futzing with non-PAE,
non-paravirt code and b) overengineered.

Why didn't you just stick a

if (SHARED_KERNEL_PMD)
return;

into vmalloc_sync_all()?


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 07/20] Allow paravirt backend to choose kernel PMD sharing

2007-04-06 Thread Jeremy Fitzhardinge
Andrew Morton wrote:
> All this paravirt stuff isn't making the kernel any prettier, is it?
>   

You're too kind.  wli's comment on the first version of this patch was
something along the lines of "this patch causes a surprising amount of
damage for what little it achieves".

>> ...
>>  
>> -#ifndef CONFIG_X86_PAE
>> -void vmalloc_sync_all(void)
>> +void _vmalloc_sync_all(void)
>>  {
>>  /*
>>   * Note that races in the updates of insync and start aren't
>> @@ -600,6 +599,8 @@ void vmalloc_sync_all(void)
>>  static DECLARE_BITMAP(insync, PTRS_PER_PGD);
>>  static unsigned long start = TASK_SIZE;
>>  unsigned long address;
>> +
>> +BUG_ON(SHARED_KERNEL_PMD);
>>  
>>  BUILD_BUG_ON(TASK_SIZE & ~PGDIR_MASK);
>>  for (address = start; address >= TASK_SIZE; address += PGDIR_SIZE) {
>> @@ -623,4 +624,3 @@ void vmalloc_sync_all(void)
>>  start = address + PGDIR_SIZE;
>>  }
>>  }
>> 
>
> This is a functional change for non-paravirt kernels.  Non-PAE kernels now
> get a vmalloc_sync_all().  How come?
>   

You mean PAE kernels get a vmalloc_sync_all?

When we're in PAE mode, but SHARED_KERNEL_PMD is false (which is true
for Xen, but not for native execution), then the kernel mappings are not
implicitly shared between processes.  This means that the vmalloc
mappings are not shared, and so need to be explicitly synchronized
between pagetables, like in the !PAE case.

> We normally use double-underscore for things like this.
>   

OK.

> Your change clashes pretty fundamantally with
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/broken-out/move-die-notifier-handling-to-common-code-fix-vmalloc_sync_all.patch,
> and
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/broken-out/move-die-notifier-handling-to-common-code.patch
> _does_ make the kernel prettier.
>   

Hm, it doesn't look like a deep clash.  Dropping the inline function and
just putting the "if (SHARED_KERNEL_PMD) return;" at the start of the
real vmalloc_sync_all() would work fine.

And I like vmalloc_sync_all() being a non-arch-specific interface; it
cleans up another of the xen patches.

> But I'm a bit reluctant to rework
> move-die-notifier-handling-to-common-code-fix-vmalloc_sync_all.patch
> (somehow) until I understand why your patch is a) futzing with non-PAE,
> non-paravirt code

There should be no functional difference for non-paravirt code, PAE or
non-PAE.

>  and b) overengineered.
>   

Overall, or just this bit?

> Why didn't you just stick a
>
>   if (SHARED_KERNEL_PMD)
>   return;
>
> into vmalloc_sync_all()?
>   

That would work, but when building !PARAVIRT && PAE, SHARED_KERNEL_PMD
is just constant 1, so it would end up making a pointless function
call.  With the wrapper, the call disappears entirely.  It probably
doesn't matter, but I didn't want anyone to complain about making the
!PARAVIRT generated code worse (hi, Ingo!).

However, if you're making vmalloc_sync_all a weak function anyway, then
there's no difference with the paravirt patches in place.  The

if (SHARED_KERNEL_PMD)
return;

will evaluate to

if (1)
return;

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 07/20] Allow paravirt backend to choose kernel PMD sharing

2007-04-06 Thread Andrew Morton
On Fri, 06 Apr 2007 17:02:58 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> 
wrote:

> Andrew Morton wrote:
> > All this paravirt stuff isn't making the kernel any prettier, is it?
> >   
> 
> You're too kind.  wli's comment on the first version of this patch was
> something along the lines of "this patch causes a surprising amount of
> damage for what little it achieves".

Damn, I wish I'd said that.

> >> ...
> >>  
> >> -#ifndef CONFIG_X86_PAE
> >> -void vmalloc_sync_all(void)
> >> +void _vmalloc_sync_all(void)
> >>  {
> >>/*
> >> * Note that races in the updates of insync and start aren't
> >> @@ -600,6 +599,8 @@ void vmalloc_sync_all(void)
> >>static DECLARE_BITMAP(insync, PTRS_PER_PGD);
> >>static unsigned long start = TASK_SIZE;
> >>unsigned long address;
> >> +
> >> +  BUG_ON(SHARED_KERNEL_PMD);
> >>  
> >>BUILD_BUG_ON(TASK_SIZE & ~PGDIR_MASK);
> >>for (address = start; address >= TASK_SIZE; address += PGDIR_SIZE) {
> >> @@ -623,4 +624,3 @@ void vmalloc_sync_all(void)
> >>start = address + PGDIR_SIZE;
> >>}
> >>  }
> >> 
> >
> > This is a functional change for non-paravirt kernels.  Non-PAE kernels now
> > get a vmalloc_sync_all().  How come?
> >   
> 
> You mean PAE kernels get a vmalloc_sync_all?

err, yes.

> When we're in PAE mode, but SHARED_KERNEL_PMD is false (which is true
> for Xen, but not for native execution), then the kernel mappings are not
> implicitly shared between processes.  This means that the vmalloc
> mappings are not shared, and so need to be explicitly synchronized
> between pagetables, like in the !PAE case.

head spins.

> > Your change clashes pretty fundamantally with
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/broken-out/move-die-notifier-handling-to-common-code-fix-vmalloc_sync_all.patch,
> > and
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/broken-out/move-die-notifier-handling-to-common-code.patch
> > _does_ make the kernel prettier.
> >   
> 
> Hm, it doesn't look like a deep clash.  Dropping the inline function and
> just putting the "if (SHARED_KERNEL_PMD) return;" at the start of the
> real vmalloc_sync_all() would work fine.

Something like that.  I don't want to redo my patch if we're going to change
your patch ;)

> And I like vmalloc_sync_all() being a non-arch-specific interface; it
> cleans up another of the xen patches.

OK.

> > But I'm a bit reluctant to rework
> > move-die-notifier-handling-to-common-code-fix-vmalloc_sync_all.patch
> > (somehow) until I understand why your patch is a) futzing with non-PAE,
> > non-paravirt code
> 
> There should be no functional difference for non-paravirt code, PAE or
> non-PAE.
> 
> >  and b) overengineered.
> >   
> 
> Overall, or just this bit?

this bit.

> > Why didn't you just stick a
> >
> > if (SHARED_KERNEL_PMD)
> > return;
> >
> > into vmalloc_sync_all()?
> >   
> 
> That would work, but when building !PARAVIRT && PAE, SHARED_KERNEL_PMD
> is just constant 1, so it would end up making a pointless function
> call.  With the wrapper, the call disappears entirely.  It probably
> doesn't matter, but I didn't want anyone to complain about making the
> !PARAVIRT generated code worse (hi, Ingo!).

vmalloc_sync_all() is a) tremendously slow and b) only called by
register_die_notifier().  We can afford to add a few cycles to it.

> However, if you're making vmalloc_sync_all a weak function anyway, then
> there's no difference with the paravirt patches in place.  The
> 
>   if (SHARED_KERNEL_PMD)
>   return;
> 
> will evaluate to
> 
>   if (1)
>   return;
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 07/20] Allow paravirt backend to choose kernel PMD sharing

2007-04-06 Thread Jeremy Fitzhardinge
Andrew Morton wrote:
> Something like that.  I don't want to redo my patch if we're going to change
> your patch ;)
>   

OK.  I won't specifically redo it on top of your patches, but I'll
rework it to remove the inline function and add the if() statement.  Do
you want an incremental update or a complete replacement?

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 07/20] Allow paravirt backend to choose kernel PMD sharing

2007-04-06 Thread Andrew Morton
On Fri, 06 Apr 2007 17:40:13 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> 
wrote:

> Andrew Morton wrote:
> > Something like that.  I don't want to redo my patch if we're going to change
> > your patch ;)
> >   
> 
> OK.  I won't specifically redo it on top of your patches, but I'll
> rework it to remove the inline function and add the if() statement.  Do
> you want an incremental update or a complete replacement?
> 

Thanks.  A replacement would suit.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch 07/20] Allow paravirt backend to choose kernel PMD sharing

2007-04-06 Thread Jeremy Fitzhardinge
Andrew Morton wrote:
> Thanks.  A replacement would suit.
>   
Subject: Allow paravirt backend to choose kernel PMD sharing

Normally when running in PAE mode, the 4th PMD maps the kernel address
space, which can be shared among all processes (since they all need
the same kernel mappings).

Xen, however, does not allow guests to have the kernel pmd shared
between page tables, so parameterize pgtable.c to allow both modes of
operation.

There are several side-effects of this.  One is that vmalloc will
update the kernel address space mappings, and those updates need to be
propagated into all processes if the kernel mappings are not
intrinsically shared.  In the non-PAE case, this is done by
maintaining a pgd_list of all processes; this list is used when all
process pagetables must be updated.  pgd_list is threaded via
otherwise unused entries in the page structure for the pgd, which
means that the pgd must be page-sized for this to work.

Normally the PAE pgd is only 4x64 byte entries large, but Xen requires
the PAE pgd to page aligned anyway, so this patch forces the pgd to be
page aligned+sized when the kernel pmd is unshared, to accomodate both
these requirements.

Also, since there may be several distinct kernel pmds (if the
user/kernel split is below 3G), there's no point in allocating them
from a slab cache; they're just allocated with get_free_page and
initialized appropriately.  (Of course the could be cached if there is
just a single kernel pmd - which is the default with a 3G user/kernel
split - but it doesn't seem worthwhile to add yet another case into
this code).

[ Many thanks to wli for review comments. ]

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Signed-off-by: William Lee Irwin III <[EMAIL PROTECTED]>
Cc: Zachary Amsden <[EMAIL PROTECTED]>
Cc: Christoph Lameter <[EMAIL PROTECTED]>
Acked-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 arch/i386/kernel/paravirt.c|1 
 arch/i386/mm/fault.c   |5 +
 arch/i386/mm/init.c|   18 +-
 arch/i386/mm/pageattr.c|2 
 arch/i386/mm/pgtable.c |   84 ++--
 include/asm-i386/paravirt.h|1 
 include/asm-i386/pgtable-2level-defs.h |2 
 include/asm-i386/pgtable-2level.h  |2 
 include/asm-i386/pgtable-3level-defs.h |6 ++
 include/asm-i386/pgtable-3level.h  |2 
 include/asm-i386/pgtable.h |2 
 11 files changed, 100 insertions(+), 25 deletions(-)

===
--- a/arch/i386/kernel/paravirt.c
+++ b/arch/i386/kernel/paravirt.c
@@ -132,6 +132,7 @@ struct paravirt_ops paravirt_ops = {
.name = "bare hardware",
.paravirt_enabled = 0,
.kernel_rpl = 0,
+   .shared_kernel_pmd = 1, /* Only used when CONFIG_X86_PAE is set */
 
.patch = native_patch,
.banner = default_banner,
===
--- a/arch/i386/mm/fault.c
+++ b/arch/i386/mm/fault.c
@@ -603,7 +603,6 @@ do_sigbus:
force_sig_info_fault(SIGBUS, BUS_ADRERR, address, tsk);
 }
 
-#ifndef CONFIG_X86_PAE
 void vmalloc_sync_all(void)
 {
/*
@@ -615,6 +614,9 @@ void vmalloc_sync_all(void)
static DECLARE_BITMAP(insync, PTRS_PER_PGD);
static unsigned long start = TASK_SIZE;
unsigned long address;
+
+   if (SHARED_KERNEL_PMD)
+   return;
 
BUILD_BUG_ON(TASK_SIZE & ~PGDIR_MASK);
for (address = start; address >= TASK_SIZE; address += PGDIR_SIZE) {
@@ -638,4 +640,3 @@ void vmalloc_sync_all(void)
start = address + PGDIR_SIZE;
}
 }
-#endif
===
--- a/arch/i386/mm/init.c
+++ b/arch/i386/mm/init.c
@@ -757,6 +757,8 @@ struct kmem_cache *pmd_cache;
 
 void __init pgtable_cache_init(void)
 {
+   size_t pgd_size = PTRS_PER_PGD*sizeof(pgd_t);
+
if (PTRS_PER_PMD > 1) {
pmd_cache = kmem_cache_create("pmd",
PTRS_PER_PMD*sizeof(pmd_t),
@@ -766,13 +768,23 @@ void __init pgtable_cache_init(void)
NULL);
if (!pmd_cache)
panic("pgtable_cache_init(): cannot create pmd cache");
+
+   if (!SHARED_KERNEL_PMD) {
+   /* If we're in PAE mode and have a non-shared
+  kernel pmd, then the pgd size must be a
+  page size.  This is because the pgd_list
+  links through the page structure, so there
+  can only be one pgd per page for this to
+  work. */
+   pgd_size = PAGE_SIZE;
+   }
}
pgd_cache = kmem_cache_create("pgd",
-   PTRS_PER_PGD*sizeof(pgd_t),
-   PTRS_PER_PGD*siz

Re: [patch 07/20] Allow paravirt backend to choose kernel PMD sharing

2007-04-08 Thread William Lee Irwin III
On Fri, 06 Apr 2007 17:02:58 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> 
wrote:
>> You're too kind.  wli's comment on the first version of this patch was
>> something along the lines of "this patch causes a surprising amount of
>> damage for what little it achieves".

On Fri, Apr 06, 2007 at 05:28:44PM -0700, Andrew Morton wrote:
> Damn, I wish I'd said that.

ISTR it went:

On Fri, Feb 16, 2007 at 02:21:07PM -0800, William Lee Irwin III wrote:
> The amount of violence this patch manages to commit is phenomenal for
> what little it actually does. There are also a number of oddities

Cheers.


-- wli
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization