Re: [PATCH 04/19] mm, thp: Preserve pgprot across huge page split

2012-08-09 Thread Andrea Arcangeli
On Tue, Jul 31, 2012 at 09:12:08PM +0200, Peter Zijlstra wrote:
> If we marked a THP with our special PROT_NONE protections, ensure we
> don't lose them over a split.
> 
> Collapse seems to always allocate a new (huge) page, which should
> already end up on the new target node, so losing protections there
> isn't a problem.

This looks like an optimization too, as it reduces a few branches.

If you hadn't introduced an unnecessary goto, the actual change would
have been more readable and the patch much smaller. (You could have
cleaned it up with a later patch if you disliked the coding style
that tried to avoid unnecessary gotos.)
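
To make the style point concrete, a sketch of the two shapes; the
spin_unlock() call and the unlock: label are inferred from the hunk
below, not visible in the archived portion of the diff:

	/* old shape -- all the work nested under the check, no goto: */
	if (pmd) {
		/* ... build the ptes, repopulate the pmd ... */
		ret = 1;
	}
	spin_unlock(&mm->page_table_lock);

	/* new shape -- early exit, one less indent level: */
	if (!pmd)
		goto unlock;
	/* ... build the ptes, repopulate the pmd ... */
	ret = 1;
unlock:
	spin_unlock(&mm->page_table_lock);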

The s/barrier/ACCESS_ONCE/ change I'll merge into my tree as a separate
commit, as it's not related to sched-numa.
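
For readers who don't know the idiom, a minimal self-contained sketch of
what that substitution changes, assuming the kernel's classic macro
definitions; pmd_t is reduced to a plain integer stand-in here:

#include <stdint.h>

/* include/linux/compiler.h's classic definition */
#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
/* full compiler barrier, as in the kernel's barrier() */
#define barrier() __asm__ __volatile__("" : : : "memory")

typedef uint64_t pmd_t;	/* stand-in for the real type */

static pmd_t reread_pmd(pmd_t *pmd)
{
	pmd_t a, b;

	/* barrier() variant: discards every compiler-cached value so
	 * the following plain load is re-issued -- a blunt tool. */
	barrier();
	a = *pmd;

	/* ACCESS_ONCE() variant: one volatile load of just *pmd,
	 * leaving all other cached values alone -- narrower intent,
	 * same guarantee for this single read. */
	b = ACCESS_ONCE(*pmd);

	return a | b;	/* keep both loads live for the illustration */
}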

> 
> Cc: Rik van Riel 
> Cc: Paul Turner 
> Signed-off-by: Peter Zijlstra 
> ---
>  arch/x86/include/asm/pgtable.h |1 
>  mm/huge_memory.c   |  104 +++--
>  2 files changed, 50 insertions(+), 55 deletions(-)
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -350,6 +350,7 @@ static inline pgprot_t pgprot_modify(pgp
>  }
>  
>  #define pte_pgprot(x) __pgprot(pte_flags(x) & PTE_FLAGS_MASK)
> +#define pmd_pgprot(x) __pgprot(pmd_val(x) & ~_HPAGE_CHG_MASK)
>  
>  #define canon_pgprot(p) __pgprot(massage_pgprot(p))
>  
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1353,64 +1353,60 @@ static int __split_huge_page_map(struct 
>   int ret = 0, i;
>   pgtable_t pgtable;
>   unsigned long haddr;
> + pgprot_t prot;
>  
>   spin_lock(&mm->page_table_lock);
>   pmd = page_check_address_pmd(page, mm, address,
>PAGE_CHECK_ADDRESS_PMD_SPLITTING_FLAG);
> - if (pmd) {
> - pgtable = get_pmd_huge_pte(mm);
> - pmd_populate(mm, &_pmd, pgtable);
> -
> - for (i = 0, haddr = address; i < HPAGE_PMD_NR;
> -  i++, haddr += PAGE_SIZE) {
> - pte_t *pte, entry;
> - BUG_ON(PageCompound(page+i));
> - entry = mk_pte(page + i, vma->vm_page_prot);
> - entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> - if (!pmd_write(*pmd))
> - entry = pte_wrprotect(entry);
> - else
> - BUG_ON(page_mapcount(page) != 1);
> - if (!pmd_young(*pmd))
> - entry = pte_mkold(entry);
> - pte = pte_offset_map(&_pmd, haddr);
> - BUG_ON(!pte_none(*pte));
> - set_pte_at(mm, haddr, pte, entry);
> - pte_unmap(pte);
> - }
> + if (!pmd)
> + goto unlock;
>  
> - smp_wmb(); /* make pte visible before pmd */
> - /*
> -  * Up to this point the pmd is present and huge and
> -  * userland has the whole access to the hugepage
> -  * during the split (which happens in place). If we
> -  * overwrite the pmd with the not-huge version
> -  * pointing to the pte here (which of course we could
> -  * if all CPUs were bug free), userland could trigger
> -  * a small page size TLB miss on the small sized TLB
> -  * while the hugepage TLB entry is still established
> -  * in the huge TLB. Some CPU doesn't like that. See
> -  * http://support.amd.com/us/Processor_TechDocs/41322.pdf,
> -  * Erratum 383 on page 93. Intel should be safe but is
> -  * also warns that it's only safe if the permission
> -  * and cache attributes of the two entries loaded in
> -  * the two TLB is identical (which should be the case
> -  * here). But it is generally safer to never allow
> -  * small and huge TLB entries for the same virtual
> -  * address to be loaded simultaneously. So instead of
> -  * doing "pmd_populate(); flush_tlb_range();" we first
> -  * mark the current pmd notpresent (atomically because
> -  * here the pmd_trans_huge and pmd_trans_splitting
> -  * must remain set at all times on the pmd until the
> -  * split is complete for this pmd), then we flush the
> -  * SMP TLB and finally we write the non-huge version
> -  * of the pmd entry with pmd_populate.
> -  */
> - set_pmd_at(mm, address, pmd, pmd_mknotpresent(*pmd));
> - flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
> - pmd_populate(mm, pmd, pgtable);
> - ret = 1;
> + prot = pmd_pgprot(*pmd);
> + pgtable = get_pmd_huge_pte(mm);
> + pmd_populate(mm, &_pmd, pgtable);
> +
> + for (i = 0, haddr = address; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
> + pte_t *pte, entry;

Re: [PATCH 04/19] mm, thp: Preserve pgprot across huge page split

2012-07-31 Thread Rik van Riel

On 07/31/2012 03:12 PM, Peter Zijlstra wrote:

Acked-by: Rik van Riel 

--
All rights reversed


[PATCH 04/19] mm, thp: Preserve pgprot across huge page split

2012-07-31 Thread Peter Zijlstra
If we marked a THP with our special PROT_NONE protections, ensure we
don't lose them over a split.

Collapse seems to always allocate a new (huge) page, which should
already end up on the new target node, so losing protections there
isn't a problem.
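
The heart of the change, reduced to a sketch mirroring the hunk below;
pmd_pgprot() is the new x86 helper this patch adds:

	/* before: every small pte was rebuilt from the vma default, so
	 * special bits on the huge pmd (e.g. PROT_NONE) were dropped */
	entry = mk_pte(page + i, vma->vm_page_prot);

	/* after: lift the protection straight off the pmd -- everything
	 * outside _HPAGE_CHG_MASK -- and reuse it for each small pte */
	prot = pmd_pgprot(*pmd); /* __pgprot(pmd_val(*pmd) & ~_HPAGE_CHG_MASK) */
	entry = mk_pte(page + i, prot);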

Cc: Rik van Riel 
Cc: Paul Turner 
Signed-off-by: Peter Zijlstra 
---
 arch/x86/include/asm/pgtable.h |1 
 mm/huge_memory.c   |  104 +++--
 2 files changed, 50 insertions(+), 55 deletions(-)
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -350,6 +350,7 @@ static inline pgprot_t pgprot_modify(pgp
 }
 
 #define pte_pgprot(x) __pgprot(pte_flags(x) & PTE_FLAGS_MASK)
+#define pmd_pgprot(x) __pgprot(pmd_val(x) & ~_HPAGE_CHG_MASK)
 
 #define canon_pgprot(p) __pgprot(massage_pgprot(p))
 
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1353,64 +1353,60 @@ static int __split_huge_page_map(struct 
int ret = 0, i;
pgtable_t pgtable;
unsigned long haddr;
+   pgprot_t prot;
 
spin_lock(&mm->page_table_lock);
pmd = page_check_address_pmd(page, mm, address,
 PAGE_CHECK_ADDRESS_PMD_SPLITTING_FLAG);
-   if (pmd) {
-   pgtable = get_pmd_huge_pte(mm);
-   pmd_populate(mm, &_pmd, pgtable);
-
-   for (i = 0, haddr = address; i < HPAGE_PMD_NR;
-i++, haddr += PAGE_SIZE) {
-   pte_t *pte, entry;
-   BUG_ON(PageCompound(page+i));
-   entry = mk_pte(page + i, vma->vm_page_prot);
-   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
-   if (!pmd_write(*pmd))
-   entry = pte_wrprotect(entry);
-   else
-   BUG_ON(page_mapcount(page) != 1);
-   if (!pmd_young(*pmd))
-   entry = pte_mkold(entry);
-   pte = pte_offset_map(&_pmd, haddr);
-   BUG_ON(!pte_none(*pte));
-   set_pte_at(mm, haddr, pte, entry);
-   pte_unmap(pte);
-   }
+   if (!pmd)
+   goto unlock;
 
-   smp_wmb(); /* make pte visible before pmd */
-   /*
-* Up to this point the pmd is present and huge and
-* userland has the whole access to the hugepage
-* during the split (which happens in place). If we
-* overwrite the pmd with the not-huge version
-* pointing to the pte here (which of course we could
-* if all CPUs were bug free), userland could trigger
-* a small page size TLB miss on the small sized TLB
-* while the hugepage TLB entry is still established
-* in the huge TLB. Some CPU doesn't like that. See
-* http://support.amd.com/us/Processor_TechDocs/41322.pdf,
-* Erratum 383 on page 93. Intel should be safe but is
-* also warns that it's only safe if the permission
-* and cache attributes of the two entries loaded in
-* the two TLB is identical (which should be the case
-* here). But it is generally safer to never allow
-* small and huge TLB entries for the same virtual
-* address to be loaded simultaneously. So instead of
-* doing "pmd_populate(); flush_tlb_range();" we first
-* mark the current pmd notpresent (atomically because
-* here the pmd_trans_huge and pmd_trans_splitting
-* must remain set at all times on the pmd until the
-* split is complete for this pmd), then we flush the
-* SMP TLB and finally we write the non-huge version
-* of the pmd entry with pmd_populate.
-*/
-   set_pmd_at(mm, address, pmd, pmd_mknotpresent(*pmd));
-   flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
-   pmd_populate(mm, pmd, pgtable);
-   ret = 1;
+   prot = pmd_pgprot(*pmd);
+   pgtable = get_pmd_huge_pte(mm);
+   pmd_populate(mm, &_pmd, pgtable);
+
+   for (i = 0, haddr = address; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
+   pte_t *pte, entry;
+
+   BUG_ON(PageCompound(page+i));
+   entry = mk_pte(page + i, prot);
+   entry = pte_mkdirty(entry);
+   if (!pmd_young(*pmd))
+   entry = pte_mkold(entry);
+   pte = pte_offset_map(&_pmd, haddr);
+   BUG_ON(!pte_none(*pte));
+   set_pte_at(mm, haddr, pte, entry);
+   pte_unmap(pte);
}
+
+   smp_wmb(); /* make ptes visible before pmd, see __pte_alloc */
+   /*
+* Up to this point the pmd is present and hug
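
The comment's full text survives in the removed lines of the hunk above;
the ordering it argues for reduces to three steps, sketched here using
only the calls visible in the diff:

	/*
	 * NOT "pmd_populate(); flush_tlb_range();" -- that would let a
	 * small TLB entry be loaded while the huge one is still live,
	 * the combination AMD Erratum 383 warns about.
	 */

	/* 1. atomically mark the huge pmd not present (pmd_trans_huge
	 *    and pmd_trans_splitting must stay set throughout) */
	set_pmd_at(mm, address, pmd, pmd_mknotpresent(*pmd));

	/* 2. flush the huge TLB entry on all CPUs */
	flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);

	/* 3. only now install the non-huge pmd pointing at the ptes */
	pmd_populate(mm, pmd, pgtable);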