RE: [PATCH 1/1] Implement shared page tables
--On Thursday, September 01, 2005 18:58:23 -0700 "Chen, Kenneth W" <[EMAIL PROTECTED]> wrote:

>> +	prio_tree_iter_init(&iter, &mapping->i_mmap,
>> +			    vma->vm_start, vma->vm_end);
>
> I think this is a bug.  The radix priority tree for address_space->
> i_mmap is keyed on vma->vm_pgoff.  Your patch uses the vma virtual
> address to find a shareable range, which will always fail a match
> even though there is one.
>
> Do you really have to iterate through all the vma?  Can't you just break
> out of the while loop on first successful match and populate the pmd?
> I would think you will find them to be the same pte page.  Or did I miss
> something?

Man, I spaced that whole search code.  I was sure I'd tested to make sure
it was finding matches.  I'll fix all that up in my next release.

Dave McCracken

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 1/1] Implement shared page tables
Dave McCracken wrote on Tuesday, August 30, 2005 3:13 PM
> This patch implements page table sharing for all shared memory regions that
> span an entire page table page.  It supports sharing at multiple page
> levels, depending on the architecture.

In function pt_share_pte():

> +	while ((svma = next_shareable_vma(vma, svma, &iter))) {
> +		spgd = pgd_offset(svma->vm_mm, address);
> +		if (pgd_none(*spgd))
> +			continue;
> +
> +		spud = pud_offset(spgd, address);
> +		if (pud_none(*spud))
> +			continue;
> +
> +		spmd = pmd_offset(spud, address);
> +		if (pmd_none(*spmd))
> +			continue;
> +		page = pmd_page(*spmd);
> +		pt_increment_share(page);
> +		pmd_populate(vma->vm_mm, pmd, page);
> +	}

Do you really have to iterate through all the vma?  Can't you just break
out of the while loop on first successful match and populate the pmd?
I would think you will find them to be the same pte page.  Or did I miss
something?

--- ./mm/ptshare.c.orig	2005-09-01 21:16:35.311915518 -0700
+++ ./mm/ptshare.c	2005-09-01 21:18:24.629296992 -0700
@@ -200,6 +200,7 @@ pt_share_pte(struct vm_area_struct *vma,
 			page = pmd_page(*spmd);
 			pt_increment_share(page);
 			pmd_populate(vma->vm_mm, pmd, page);
+			break;
 		}
 	}
 	pte = pte_alloc_map(vma->vm_mm, pmd, address);
RE: [PATCH 1/1] Implement shared page tables
Dave McCracken wrote on Tuesday, August 30, 2005 3:13 PM
> This patch implements page table sharing for all shared memory regions that
> span an entire page table page.  It supports sharing at multiple page
> levels, depending on the architecture.
>
> This version of the patch supports i386 and x86_64.  I have additional
> patches to support ppc64, but they are not quite ready for public
> consumption.

> +	prio_tree_iter_init(&iter, &mapping->i_mmap,
> +			    vma->vm_start, vma->vm_end);

I think this is a bug.  The radix priority tree for address_space->
i_mmap is keyed on vma->vm_pgoff.  Your patch uses the vma virtual
address to find a shareable range, which will always fail a match
even though there is one.

The following is a quick hack I did to make it work.

- Ken

--- linux-2.6.13/mm/ptshare.c.orig	2005-09-01 18:58:12.299321918 -0700
+++ linux-2.6.13/mm/ptshare.c	2005-09-01 18:58:39.846196580 -0700
@@ -26,6 +26,11 @@
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 
+#define RADIX_INDEX(vma)	((vma)->vm_pgoff)
+#define VMA_SIZE(vma)	(((vma)->vm_end - (vma)->vm_start) >> PAGE_SHIFT)
+/* avoid overflow */
+#define HEAP_INDEX(vma)	((vma)->vm_pgoff + (VMA_SIZE(vma) - 1))
+
 #undef PT_DEBUG
 
 #ifndef __PAGETABLE_PMD_FOLDED
@@ -173,7 +178,7 @@ pt_share_pte(struct vm_area_struct *vma,
 			address);
 #endif
 	prio_tree_iter_init(&iter, &mapping->i_mmap,
-			    vma->vm_start, vma->vm_end);
+			    RADIX_INDEX(vma), HEAP_INDEX(vma));
 
 	while ((svma = next_shareable_vma(vma, svma, &iter))) {
 		spgd = pgd_offset(svma->vm_mm, address);
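[Editor's note] Ken's point about the prio_tree keying can be illustrated outside the kernel. The sketch below is a hypothetical userspace model, not kernel code: the struct and function names merely mimic vm_area_struct and the RADIX_INDEX/HEAP_INDEX macros. Two processes mapping the same file range at different virtual addresses get identical file-relative keys, which is why a lookup keyed on vm_pgoff finds matches that a lookup keyed on vm_start/vm_end never can.

```c
#include <assert.h>

/* Hypothetical userspace model of the three vm_area_struct fields that
 * matter for the i_mmap prio_tree.  PAGE_SHIFT is 12 on i386/x86_64. */
#define PAGE_SHIFT 12

struct vma_model {
	unsigned long vm_start;  /* virtual start address: per-process */
	unsigned long vm_end;    /* virtual end address: per-process */
	unsigned long vm_pgoff;  /* file offset in pages: per-file */
};

/* The keys Ken's RADIX_INDEX()/HEAP_INDEX() macros compute. */
static unsigned long radix_index(const struct vma_model *v)
{
	return v->vm_pgoff;
}

static unsigned long heap_index(const struct vma_model *v)
{
	unsigned long npages = (v->vm_end - v->vm_start) >> PAGE_SHIFT;

	return v->vm_pgoff + (npages - 1);  /* last page index; avoids overflow */
}
```

Two vmas covering the same 4 MB of the same file at different virtual addresses yield the same (radix, heap) key pair, so the tree search succeeds even though vm_start differs in every process.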
Re: [PATCH 1/1] Implement shared page tables
--On Wednesday, August 31, 2005 12:44:24 +0100 Hugh Dickins <[EMAIL PROTECTED]> wrote:

> So you don't have Nick's test at the start of copy_page_range():
> 	if (!(vma->vm_flags & (VM_HUGETLB|VM_NONLINEAR|VM_RESERVED))) {
> 		if (!vma->anon_vma)
> 			return 0;
> 	}
> Experimental, yes, but Linus likes it enough to have fast-tracked it into
> his tree for 2.6.14.  My guess is that that patch (if its downsides prove
> manageable) takes away a lot of the point of shared page tables -
> I wonder how much of your "3% improvement".

Very little, actually.  The test does not create new processes as part of
the run.  The improvement is due to sharing of existing areas.

> I was going to say, doesn't randomize_va_space take away the rest of
> the point?  But no, it appears "randomize_va_space", as it currently
> appears in mainline anyway, is somewhat an exaggeration: it just shifts
> the stack a little, with no effect on the rest of the va space.
> But if it is to do more later, it may conflict with your interest.

I've been considering a future enhancement to my patch where it could share
page tables of any areas that share alignment, not just the same virtual
address.  That might allow sharing with randomization if the randomization
aligns things properly.

> The pud sharing and pmd sharing: perhaps they complicate the patch for
> negligible benefit?

The pmd sharing is necessary for ppc64 since it has to share at segment
size, plus it will be useful for very large regions.  I did pud for
completeness, but you may be right that it's not useful.  It's all
configurable in any event.

>> +	if ((vma->vm_start <= base) &&
>> +	    (vma->vm_end >= end))
>> +		return 1;
>
> New Adventures in Coding Style ;)

New Adventures in Typos, actually :)  I'll fix.

> But most seriously: search the patch for the string "lock" and I find
> no change whatever to locking.  You're introducing page tables shared
> between different mms yet relying on the old mm->page_table_lock?
> You're searching a prio_tree for suitable matches to share, but
> taking no lock on that?  You're counting shares in an atomic,
> but not detecting when the count falls to 0 atomically?
>
> And allied with that point on locking mms: there's no change to rmap.c,
> so how is its TLB flushing and cache flushing now supposed to work?
> page_referenced_one and try_to_unmap_one will visit all the vmas
> sharing the page table, yes, but (usually) only the first will
> satisfy the conditions and get flushed.

I'll go over the locking again.

> I'm not sure if it's worth pursuing shared page tables again or not.

The immediate clear benefits I see are a reduction in the number of page
table pages and a reduction in minor faults.  Keep in mind that faulting a
page into a shared page table makes it available to all other processes
sharing that area, eliminating the need for them to also take faults on it.

> You certainly need to sort the locking out to do so.  Wait a couple
> of weeks and I should have sent all the per-page-table-page locking
> in to -mm (to replace the pte xchging currently there): that should
> give what you need for locking pts independent of the mm.

I'll look things over in more detail.  I thought I had the locking issues
settled, but you raised some points I should revisit.

Dave McCracken
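[Editor's note] The "spans an entire page table page" condition being discussed reduces to a simple range check. Here is a hedged userspace sketch: covers_pte_page is a made-up name, the 2 MB-per-pte-page geometry assumes x86_64 (or i386 PAE), and the comparison is the corrected form of the vm_start/vm_end test quoted above.

```c
#include <assert.h>

/* On x86_64 (and i386 with PAE) one pte page maps 2 MB of virtual space. */
#define PMD_SHIFT 21
#define PMD_SIZE  (1UL << PMD_SHIFT)
#define PMD_MASK  (~(PMD_SIZE - 1))

/* Hypothetical helper: may the pte page covering `address` be shared?
 * Only if the vma covers the entire 2 MB region that pte page maps;
 * otherwise a sharer would gain ptes outside its own mapping. */
static int covers_pte_page(unsigned long vm_start, unsigned long vm_end,
			   unsigned long address)
{
	unsigned long base = address & PMD_MASK;  /* region start */
	unsigned long end = base + PMD_SIZE;      /* region end */

	return vm_start <= base && vm_end >= end;
}
```

A vma from 0 to 4 MB passes for any address inside it; shift its start up by even one page and the first 2 MB region is no longer fully covered, so its pte page cannot be shared.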
Re: [PATCH 1/1] Implement shared page tables
>> They're incompatible, but you could be left to choose one or the other
>> via config option.
>
> Wouldn't need config option: there's /proc/sys/kernel/randomize_va_space
> for the whole running system, compatibility check on the ELFs run, and
> the infinite stack rlimit: enough ways to suppress randomization if it
> doesn't suit you.

Even better - much easier to deal with distro stuff if we can do it at
runtime.

>> 3% on "a certain industry-standard database benchmark" (cough) is huge,
>> and we expect the benefit for PPC64 will be larger as we can share the
>> underlying hardware PTEs without TLB flushing as well.
>
> Okay - and you're implying that 3% comes from _using_ the shared page
> tables, rather than from avoiding the fork/exit overhead of setting
> them up and tearing them down.  And it can't use huge TLB pages
> because... fragmentation?

Yes - as I understand it, that was a straight measurement with/without the
patch, and the shmem segment was already using hugetlb (in both cases).
Yes, I find that a bit odd as well - they are still trying to get some
detailed profiling to explain.

M.
Re: [PATCH 1/1] Implement shared page tables
On Wed, 31 Aug 2005, Martin J. Bligh wrote:
> --Hugh Dickins <[EMAIL PROTECTED]> wrote (on Wednesday, August 31, 2005
> 14:42:38 +0100):
>>
>> Which is indeed a further disincentive against shared page tables.
>
> Or shared pagetables a disincentive to randomizing the mmap space ;-)

Fair point!

> They're incompatible, but you could be left to choose one or the other
> via config option.

Wouldn't need config option: there's /proc/sys/kernel/randomize_va_space
for the whole running system, compatibility check on the ELFs run, and
the infinite stack rlimit: enough ways to suppress randomization if it
doesn't suit you.

> 3% on "a certain industry-standard database benchmark" (cough) is huge,
> and we expect the benefit for PPC64 will be larger as we can share the
> underlying hardware PTEs without TLB flushing as well.

Okay - and you're implying that 3% comes from _using_ the shared page
tables, rather than from avoiding the fork/exit overhead of setting
them up and tearing them down.  And it can't use huge TLB pages
because... fragmentation?

Hugh
Re: [PATCH 1/1] Implement shared page tables
>> Which is indeed a further disincentive against shared page tables.
>
> Or shared pagetables a disincentive to randomizing the mmap space ;-)
> They're incompatible, but you could be left to choose one or the other
> via config option.
>
> 3% on "a certain industry-standard database benchmark" (cough) is huge,
> and we expect the benefit for PPC64 will be larger as we can share the
> underlying hardware PTEs without TLB flushing as well.

Surely the benchmark people know that the database in question always
mmaps the shared area at the address where the first one started it?
(if not, could make it so ;)
Re: [PATCH 1/1] Implement shared page tables
--Hugh Dickins <[EMAIL PROTECTED]> wrote (on Wednesday, August 31, 2005 14:42:38 +0100):

> On Wed, 31 Aug 2005, Arjan van de Ven wrote:
>> On Wed, 2005-08-31 at 12:44 +0100, Hugh Dickins wrote:
>>> I was going to say, doesn't randomize_va_space take away the rest of
>>> the point?  But no, it appears "randomize_va_space", as it currently
>>> appears in mainline anyway, is somewhat an exaggeration: it just shifts
>>> the stack a little, with no effect on the rest of the va space.
>>
>> it also randomizes mmaps
>
> Ah, via PF_RANDOMIZE, yes, thanks: so long as certain conditions are
> fulfilled - and my RLIM_INFINITY RLIMIT_STACK has been preventing it.
>
> And mmaps include shmats: so unless the process specifies non-NULL
> shmaddr to attach at, it'll choose a randomized address for that too
> (subject to those various conditions).
>
> Which is indeed a further disincentive against shared page tables.

Or shared pagetables a disincentive to randomizing the mmap space ;-)

They're incompatible, but you could be left to choose one or the other
via config option.

3% on "a certain industry-standard database benchmark" (cough) is huge,
and we expect the benefit for PPC64 will be larger as we can share the
underlying hardware PTEs without TLB flushing as well.

M.
Re: [PATCH 1/1] Implement shared page tables
On Wed, 31 Aug 2005, Arjan van de Ven wrote:
> On Wed, 2005-08-31 at 12:44 +0100, Hugh Dickins wrote:
>> I was going to say, doesn't randomize_va_space take away the rest of
>> the point?  But no, it appears "randomize_va_space", as it currently
>> appears in mainline anyway, is somewhat an exaggeration: it just shifts
>> the stack a little, with no effect on the rest of the va space.
>
> it also randomizes mmaps

Ah, via PF_RANDOMIZE, yes, thanks: so long as certain conditions are
fulfilled - and my RLIM_INFINITY RLIMIT_STACK has been preventing it.

And mmaps include shmats: so unless the process specifies non-NULL
shmaddr to attach at, it'll choose a randomized address for that too
(subject to those various conditions).

Which is indeed a further disincentive against shared page tables.

Hugh
Re: [PATCH 1/1] Implement shared page tables
On Wed, 2005-08-31 at 12:44 +0100, Hugh Dickins wrote:
> I was going to say, doesn't randomize_va_space take away the rest of
> the point?  But no, it appears "randomize_va_space", as it currently
> appears in mainline anyway, is somewhat an exaggeration: it just shifts
> the stack a little, with no effect on the rest of the va space.

it also randomizes mmaps
Re: [PATCH 1/1] Implement shared page tables
On Tue, 30 Aug 2005, Dave McCracken wrote:
>
> This patch implements page table sharing for all shared memory regions that
> span an entire page table page.  It supports sharing at multiple page
> levels, depending on the architecture.
>
> Performance testing has shown no degradation with this patch for tests with
> small processes.  Preliminary tests with large benchmarks have shown as
> much as 3% improvement in overall results.

Hmm.  A few points.

> The patch is against 2.6.13.

So you don't have Nick's test at the start of copy_page_range():
	if (!(vma->vm_flags & (VM_HUGETLB|VM_NONLINEAR|VM_RESERVED))) {
		if (!vma->anon_vma)
			return 0;
	}
Experimental, yes, but Linus likes it enough to have fast-tracked it into
his tree for 2.6.14.  My guess is that that patch (if its downsides prove
manageable) takes away a lot of the point of shared page tables -
I wonder how much of your "3% improvement".

I was going to say, doesn't randomize_va_space take away the rest of
the point?  But no, it appears "randomize_va_space", as it currently
appears in mainline anyway, is somewhat an exaggeration: it just shifts
the stack a little, with no effect on the rest of the va space.
But if it is to do more later, it may conflict with your interest.

The pud sharing and pmd sharing: perhaps they complicate the patch for
negligible benefit?

> +	if ((vma->vm_start <= base) &&
> +	    (vma->vm_end >= end))
> +		return 1;

New Adventures in Coding Style ;)

But most seriously: search the patch for the string "lock" and I find
no change whatever to locking.  You're introducing page tables shared
between different mms yet relying on the old mm->page_table_lock?
You're searching a prio_tree for suitable matches to share, but
taking no lock on that?  You're counting shares in an atomic,
but not detecting when the count falls to 0 atomically?

And allied with that point on locking mms: there's no change to rmap.c,
so how is its TLB flushing and cache flushing now supposed to work?
page_referenced_one and try_to_unmap_one will visit all the vmas
sharing the page table, yes, but (usually) only the first will
satisfy the conditions and get flushed.

I'm not sure if it's worth pursuing shared page tables again or not.
You certainly need to sort the locking out to do so.  Wait a couple
of weeks and I should have sent all the per-page-table-page locking
in to -mm (to replace the pte xchging currently there): that should
give what you need for locking pts independent of the mm.

Hugh
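[Editor's note] Hugh's "count falls to 0 atomically" point is the classic decrement-and-test race. A minimal userspace illustration in C11 (the names are hypothetical; the kernel's own idiom is atomic_dec_and_test): because the fetch-and-sub returns the value before the subtraction, exactly one caller learns that it took the count to zero, so the "last sharer frees the pte page" decision needs no extra lock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of a pte page's share count. */
struct pt_page_model {
	atomic_int share_count;
};

/* Drop one share; return 1 iff this caller took the count to zero.
 * atomic_fetch_sub returns the value *before* the subtraction, so the
 * decrement and the zero-test are one indivisible step: two racing
 * callers can never both observe the count hitting zero. */
static int pt_decrement_share_model(struct pt_page_model *p)
{
	return atomic_fetch_sub(&p->share_count, 1) == 1;
}
```

Testing the unshared path separately (a non-atomic read followed by a decrement) would let two CPUs both read 1, both decrement, and either double-free or leak the page, which is exactly the hazard raised above.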