RE: [PATCH 1/1] Implement shared page tables

2005-09-02 Thread Dave McCracken

--On Thursday, September 01, 2005 18:58:23 -0700 "Chen, Kenneth W"
<[EMAIL PROTECTED]> wrote:

>> +	prio_tree_iter_init(&iter, &mapping->i_mmap,
>> +			vma->vm_start, vma->vm_end);
> 
> 
> I think this is a bug.  The radix priority tree for address_space->
> i_mmap is keyed on vma->vm_pgoff.  Your patch uses the vma virtual
> address to find a shareable range, which will always fail to find a match
> even though there is one.
>
> Do you really have to iterate through all the vmas?  Can't you just break
> out of the while loop on the first successful match and populate the pmd?
> I would think you will find them to be the same pte page.  Or did I miss
> something?

Man, I completely spaced on that search code.  I was sure I'd tested that it
was finding matches.  I'll fix all that up in my next release.

Dave McCracken

RE: [PATCH 1/1] Implement shared page tables

2005-09-01 Thread Chen, Kenneth W
Dave McCracken wrote on Tuesday, August 30, 2005 3:13 PM
> This patch implements page table sharing for all shared memory regions that
> span an entire page table page.  It supports sharing at multiple page
> levels, depending on the architecture.

In function pt_share_pte():

> + while ((svma = next_shareable_vma(vma, svma, &iter))) {
> + spgd = pgd_offset(svma->vm_mm, address);
> + if (pgd_none(*spgd))
> + continue;
> +
> + spud = pud_offset(spgd, address);
> + if (pud_none(*spud))
> + continue;
> +
> + spmd = pmd_offset(spud, address);
> + if (pmd_none(*spmd))
> + continue;

> + page = pmd_page(*spmd);
> + pt_increment_share(page);
> + pmd_populate(vma->vm_mm, pmd, page);
> + }


Do you really have to iterate through all the vmas?  Can't you just break
out of the while loop on the first successful match and populate the pmd?
I would think you will find them to be the same pte page.  Or did I miss
something?


--- ./mm/ptshare.c.orig 2005-09-01 21:16:35.311915518 -0700
+++ ./mm/ptshare.c  2005-09-01 21:18:24.629296992 -0700
@@ -200,6 +200,7 @@ pt_share_pte(struct vm_area_struct *vma,
 			page = pmd_page(*spmd);
 			pt_increment_share(page);
 			pmd_populate(vma->vm_mm, pmd, page);
+			break;
 		}
 	}
 	pte = pte_alloc_map(vma->vm_mm, pmd, address);

RE: [PATCH 1/1] Implement shared page tables

2005-09-01 Thread Chen, Kenneth W
Dave McCracken wrote on Tuesday, August 30, 2005 3:13 PM
> This patch implements page table sharing for all shared memory regions that
> span an entire page table page.  It supports sharing at multiple page
> levels, depending on the architecture.
> 
> 
> This version of the patch supports i386 and x86_64.  I have additional
> patches to support ppc64, but they are not quite ready for public
> consumption.
> 
>  
> + prio_tree_iter_init(&iter, &mapping->i_mmap,
> + vma->vm_start, vma->vm_end);


I think this is a bug.  The radix priority tree for address_space->
i_mmap is keyed on vma->vm_pgoff.  Your patch uses the vma virtual
address to find a shareable range, which will always fail to find a match
even though there is one.  The following is a quick hack I did to
make it work.

- Ken

--- linux-2.6.13/mm/ptshare.c.orig  2005-09-01 18:58:12.299321918 -0700
+++ linux-2.6.13/mm/ptshare.c   2005-09-01 18:58:39.846196580 -0700
@@ -26,6 +26,11 @@
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 
+#define RADIX_INDEX(vma)  ((vma)->vm_pgoff)
+#define VMA_SIZE(vma)	(((vma)->vm_end - (vma)->vm_start) >> PAGE_SHIFT)
+/* avoid overflow */
+#define HEAP_INDEX(vma)  ((vma)->vm_pgoff + (VMA_SIZE(vma) - 1))
+
 #undef PT_DEBUG
 
 #ifndef __PAGETABLE_PMD_FOLDED
@@ -173,7 +178,7 @@ pt_share_pte(struct vm_area_struct *vma,
   address);
 #endif
 	prio_tree_iter_init(&iter, &mapping->i_mmap,
-   vma->vm_start, vma->vm_end);
+   RADIX_INDEX(vma), HEAP_INDEX(vma));
 
 	while ((svma = next_shareable_vma(vma, svma, &iter))) {
 		spgd = pgd_offset(svma->vm_mm, address);
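
For illustration only (the numbers are made up, not from the patch): with
4 KB pages, a vma mapping the first 2 MB of a file at virtual address
0x40000000 keys into the tree like this:

	/* hypothetical vma: file pages 0..511 mapped at 0x40000000 */
	unsigned long vm_start = 0x40000000, vm_end = 0x40200000, vm_pgoff = 0;
	unsigned long pages = (vm_end - vm_start) >> 12;	/* PAGE_SHIFT for 4 KB pages: 512 */
	unsigned long radix = vm_pgoff;				/* RADIX_INDEX(vma) = 0   */
	unsigned long heap  = vm_pgoff + pages - 1;		/* HEAP_INDEX(vma)  = 511 */

The tree stores each vma as the interval [RADIX_INDEX, HEAP_INDEX] in
file-page units, so a query with [vm_start, vm_end) = [0x40000000,
0x40200000) can never overlap it, which is why the original lookup always
came back empty.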


Re: [PATCH 1/1] Implement shared page tables

2005-08-31 Thread Dave McCracken

--On Wednesday, August 31, 2005 12:44:24 +0100 Hugh Dickins
<[EMAIL PROTECTED]> wrote:

> So you don't have Nick's test at the start of copy_page_range():
> 	if (!(vma->vm_flags & (VM_HUGETLB|VM_NONLINEAR|VM_RESERVED))) {
> 		if (!vma->anon_vma)
> 			return 0;
> 	}
> Experimental, yes, but Linus likes it enough to have fast-tracked it into
> his tree for 2.6.14.  My guess is that that patch (if its downsides prove
> manageable) takes away a lot of the point of shared page tables -
> I wonder how much of your "3% improvement".

Very little, actually.  The benchmark does not create new processes as part
of the run; the improvement comes from sharing of existing areas.

> I was going to say, doesn't randomize_va_space take away the rest of
> the point?  But no, it appears "randomize_va_space", as it currently
> appears in mainline anyway, is somewhat an exaggeration: it just shifts
> the stack a little, with no effect on the rest of the va space.
> But if it is to do more later, it may conflict with your interest.

I've been considering a future enhancement to my patch where it could share
page tables of any areas that share alignment, not just the same virtual
address.  That might allow sharing with randomization if the randomization
aligns things properly.
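
A minimal sketch of what such a check might look like (pt_alignment_compatible
is a hypothetical name, not part of the posted patch): two vmas mapping the
same file could share a pte page only if their virtual bases agree modulo
PMD_SIZE once the file offset is folded in, so the same file pages land at
the same slots within the page table page:

	/*
	 * Hypothetical helper, assuming both vmas map the same file: a file
	 * page lands at the same offset within a pmd-sized region in both
	 * mappings iff the two adjusted bases are congruent modulo PMD_SIZE.
	 */
	static inline int pt_alignment_compatible(struct vm_area_struct *a,
						  struct vm_area_struct *b)
	{
		unsigned long a_base = a->vm_start - (a->vm_pgoff << PAGE_SHIFT);
		unsigned long b_base = b->vm_start - (b->vm_pgoff << PAGE_SHIFT);

		return ((a_base ^ b_base) & (PMD_SIZE - 1)) == 0;
	}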

> The pud sharing and pmd sharing: perhaps they complicate the patch for
> negligible benefit?

The pmd sharing is necessary for ppc64 since it has to share at segment
size, plus it will be useful for very large regions.  I did pud for
completeness but you may be right that it's not useful.  It's all
configurable in any event.

>> +if ((vma->vm_start <= base) &&
>> +(vma->vm_end >= end))
>> +return 1;
>> 
> New Adventures in Coding Style ;)

New Adventures in Typos, actually :)  I'll fix.

> But most seriously: search the patch for the string "lock" and I find
> no change whatever to locking.  You're introducing page tables shared
> between different mms yet relying on the old mm->page_table_lock?
> You're searching a prio_tree for suitable matches to share, but
> taking no lock on that?  You're counting shares in an atomic,
> but not detecting when the count falls to 0 atomically?
> 
> And allied with that point on locking mms: there's no change to rmap.c,
> so how is its TLB flushing and cache flushing now supposed to work?
> page_referenced_one and try_to_unmap_one will visit all the vmas
> sharing the page table, yes, but (usually) only the first will
> satisfy the conditions and get flushed.

I'll go over the locking again.

> I'm not sure if it's worth pursuing shared page tables again or not.

The immediate clear benefits I see are a reduction in the number of page
table pages and a reduction in minor faults.  Keep in mind that faulting a
page into a shared page table makes it available to all other processes
sharing that area, eliminating the need for them to also take faults on it.

> You certainly need to sort the locking out to do so.  Wait a couple
> of weeks and I should have sent all the per-page-table-page locking
> in to -mm (to replace the pte xchging currently there): that should
> give what you need for locking pts independent of the mm.

I'll look things over in more detail.  I thought I had the locking issues
settled, but you raised some points I should revisit.

Dave McCracken

Re: [PATCH 1/1] Implement shared page tables

2005-08-31 Thread Martin J. Bligh
>> They're incompatible, but you could be left to choose one or the other
>> via config option.
> 
> Wouldn't need config option: there's /proc/sys/kernel/randomize_va_space
> for the whole running system, compatibility check on the ELFs run, and
> the infinite stack rlimit: enough ways to suppress randomization if it
> doesn't suit you.

Even better - much easier to deal with distro stuff if we can do it at
runtime.
 
>> 3% on "a certain industry-standard database benchmark" (cough) is huge,
>> and we expect the benefit for PPC64 will be larger as we can share the
>> underlying hardware PTEs without TLB flushing as well.
> 
> Okay - and you're implying that 3% comes from _using_ the shared page
> tables, rather than from avoiding the fork/exit overhead of setting
> them up and tearing them down.  And it can't use huge TLB pages
> because...  fragmentation?

Yes - as I understand it, that was a straight measurement with and without
the patch, and the shmem segment was already using hugetlb in both cases.
Yes, I find it a bit odd as well - they are still trying to get some
detailed profiling to explain it.

M.

Re: [PATCH 1/1] Implement shared page tables

2005-08-31 Thread Hugh Dickins
On Wed, 31 Aug 2005, Martin J. Bligh wrote:
> --Hugh Dickins <[EMAIL PROTECTED]> wrote (on Wednesday, August 31, 2005 
> 14:42:38 +0100):
> > 
> > Which is indeed a further disincentive against shared page tables.
> 
> Or shared pagetables a disincentive to randomizing the mmap space ;-)

Fair point!

> They're incompatible, but you could be left to choose one or the other
> via config option.

Wouldn't need config option: there's /proc/sys/kernel/randomize_va_space
for the whole running system, compatibility check on the ELFs run, and
the infinite stack rlimit: enough ways to suppress randomization if it
doesn't suit you.

> 3% on "a certain industry-standard database benchmark" (cough) is huge,
> and we expect the benefit for PPC64 will be larger as we can share the
> underlying hardware PTEs without TLB flushing as well.

Okay - and you're implying that 3% comes from _using_ the shared page
tables, rather than from avoiding the fork/exit overhead of setting
them up and tearing them down.  And it can't use huge TLB pages
because...  fragmentation?

Hugh

Re: [PATCH 1/1] Implement shared page tables

2005-08-31 Thread Arjan van de Ven

> > Which is indeed a further disincentive against shared page tables.
> 
> Or shared pagetables a disincentive to randomizing the mmap space ;-)
> They're incompatible, but you could be left to choose one or the other
> via config option.
> 
> 3% on "a certain industry-standard database benchmark" (cough) is huge,
> and we expect the benefit for PPC64 will be larger as we can share the
> underlying hardware PTEs without TLB flushing as well.
> 

Surely the benchmark people know that the database in question always
mmaps the shared area at the address where the first process mapped it?
(If not, they could make it so ;)



Re: [PATCH 1/1] Implement shared page tables

2005-08-31 Thread Martin J. Bligh
--Hugh Dickins <[EMAIL PROTECTED]> wrote (on Wednesday, August 31, 2005 
14:42:38 +0100):

> On Wed, 31 Aug 2005, Arjan van de Ven wrote:
>> On Wed, 2005-08-31 at 12:44 +0100, Hugh Dickins wrote:
>> > I was going to say, doesn't randomize_va_space take away the rest of
>> > the point?  But no, it appears "randomize_va_space", as it currently
>> > appears in mainline anyway, is somewhat an exaggeration: it just shifts
>> > the stack a little, with no effect on the rest of the va space.
>> 
>> it also randomizes mmaps
> 
> Ah, via PF_RANDOMIZE, yes, thanks: so long as certain conditions are
> fulfilled - and my RLIM_INFINITY RLIMIT_STACK has been preventing it.
> 
> And mmaps include shmats: so unless the process specifies non-NULL
> shmaddr to attach at, it'll choose a randomized address for that too
> (subject to those various conditions).
> 
> Which is indeed a further disincentive against shared page tables.

Or shared pagetables a disincentive to randomizing the mmap space ;-)
They're incompatible, but you could be left to choose one or the other
via config option.

3% on "a certain industry-standard database benchmark" (cough) is huge,
and we expect the benefit for PPC64 will be larger as we can share the
underlying hardware PTEs without TLB flushing as well.

M.

Re: [PATCH 1/1] Implement shared page tables

2005-08-31 Thread Hugh Dickins
On Wed, 31 Aug 2005, Arjan van de Ven wrote:
> On Wed, 2005-08-31 at 12:44 +0100, Hugh Dickins wrote:
> > I was going to say, doesn't randomize_va_space take away the rest of
> > the point?  But no, it appears "randomize_va_space", as it currently
> > appears in mainline anyway, is somewhat an exaggeration: it just shifts
> > the stack a little, with no effect on the rest of the va space.
> 
> it also randomizes mmaps

Ah, via PF_RANDOMIZE, yes, thanks: so long as certain conditions are
fulfilled - and my RLIM_INFINITY RLIMIT_STACK has been preventing it.

And mmaps include shmats: so unless the process specifies non-NULL
shmaddr to attach at, it'll choose a randomized address for that too
(subject to those various conditions).

Which is indeed a further disincentive against shared page tables.

Hugh

Re: [PATCH 1/1] Implement shared page tables

2005-08-31 Thread Arjan van de Ven
On Wed, 2005-08-31 at 12:44 +0100, Hugh Dickins wrote:
> I was going to say, doesn't randomize_va_space take away the rest of
> the point?  But no, it appears "randomize_va_space", as it currently
> appears in mainline anyway, is somewhat an exaggeration: it just shifts
> the stack a little, with no effect on the rest of the va space.

it also randomizes mmaps


Re: [PATCH 1/1] Implement shared page tables

2005-08-31 Thread Hugh Dickins
On Tue, 30 Aug 2005, Dave McCracken wrote:
> 
> This patch implements page table sharing for all shared memory regions that
> span an entire page table page.  It supports sharing at multiple page
> levels, depending on the architecture.
> 
> Performance testing has shown no degradation with this patch for tests with
> small processes.  Preliminary tests with large benchmarks have shown as
> much as 3% improvement in overall results.

Hmm.  A few points.

> The patch is against 2.6.13.

So you don't have Nick's test at the start of copy_page_range():
	if (!(vma->vm_flags & (VM_HUGETLB|VM_NONLINEAR|VM_RESERVED))) {
		if (!vma->anon_vma)
			return 0;
	}
Experimental, yes, but Linus likes it enough to have fast-tracked it into
his tree for 2.6.14.  My guess is that that patch (if its downsides prove
manageable) takes away a lot of the point of shared page tables -
I wonder how much of your "3% improvement".

I was going to say, doesn't randomize_va_space take away the rest of
the point?  But no, it appears "randomize_va_space", as it currently
appears in mainline anyway, is somewhat an exaggeration: it just shifts
the stack a little, with no effect on the rest of the va space.
But if it is to do more later, it may conflict with your interest.

The pud sharing and pmd sharing: perhaps they complicate the patch for
negligible benefit?

> + if ((vma->vm_start <= base) &&
> + (vma->vm_end >= end))
> + return 1;
> 
New Adventures in Coding Style ;)

But most seriously: search the patch for the string "lock" and I find
no change whatever to locking.  You're introducing page tables shared
between different mms yet relying on the old mm->page_table_lock?
You're searching a prio_tree for suitable matches to share, but
taking no lock on that?  You're counting shares in an atomic,
but not detecting when the count falls to 0 atomically?
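
The atomic drop-to-zero test in question is the usual atomic_dec_and_test()
pattern; a sketch only, where pt_share_count and pt_free_shared are
hypothetical names rather than anything in the posted patch:

	/*
	 * Sketch: the decrement and the zero test must be one atomic
	 * operation, otherwise the decrement and a separate read of the
	 * count can race and two sharers can both see it hit zero.
	 */
	static void pt_decrement_share(struct page *page)
	{
		if (atomic_dec_and_test(&page->pt_share_count))	/* hypothetical field */
			pt_free_shared(page);			/* hypothetical teardown */
	}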

And allied with that point on locking mms: there's no change to rmap.c,
so how is its TLB flushing and cache flushing now supposed to work?
page_referenced_one and try_to_unmap_one will visit all the vmas
sharing the page table, yes, but (usually) only the first will
satisfy the conditions and get flushed.

I'm not sure if it's worth pursuing shared page tables again or not.

You certainly need to sort the locking out to do so.  Wait a couple
of weeks and I should have sent all the per-page-table-page locking
in to -mm (to replace the pte xchging currently there): that should
give what you need for locking pts independent of the mm.

Hugh