Re: [PATCH 1/2] arm64: hugetlb: remove the wrong pmd check in find_num_contig()

2016-11-07 Thread Huang Shijie
On Fri, Nov 04, 2016 at 09:48:14AM -0600, Catalin Marinas wrote:
> On Fri, Nov 04, 2016 at 10:52:17AM +0800, Huang Shijie wrote:
> > On Thu, Nov 03, 2016 at 06:16:16PM -0600, Catalin Marinas wrote:
> > > On Thu, Nov 03, 2016 at 10:27:38AM +0800, Huang Shijie wrote:
> > > > diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> > > > index 2e49bd2..4811ef1 100644
> > > > --- a/arch/arm64/mm/hugetlbpage.c
> > > > +++ b/arch/arm64/mm/hugetlbpage.c
> > > > @@ -61,10 +61,6 @@ static int find_num_contig(struct mm_struct *mm, 
> > > > unsigned long addr,
> > > > return 1;
> > > > }
> > > > pmd = pmd_offset(pud, addr);
> > > > -   if (!pmd_present(*pmd)) {
> > > > -   VM_BUG_ON(!pmd_present(*pmd));
> > > > -   return 1;
> > > > -   }
> > > > if ((pte_t *)pmd == ptep) {
> > > > *pgsize = PMD_SIZE;
> > > > return CONT_PMDS;
> > > 
> > > BTW, for the !pud_present() and !pgd_present() cases, shouldn't
> > > find_num_contig() actually return 0? These are more likely real bugs, so
> > > no point in setting the huge pte.
> > 
> > The kernel will not call the find_num_contig() if the PGD/PUD are empty.
> > Please see the code in the hugetlb_fault().
> > 
> >--
> > ptep = huge_pte_offset(mm, address);
> > if (ptep) {
> > ...
> > } else {
> > ptep = huge_pte_alloc(mm, address, huge_page_size(h));
> > if (!ptep)
> > return VM_FAULT_OOM;
> > }
> >--
> 
> Exactly. So what is the reason for returning 1 if !pgd_present()?

I think the author was too cautious in returning 1 if !pgd_present(). :)

> Would removing the checks entirely or adding BUG() be a better option?

I will remove the checks in the next version.

Thanks
Huang Shijie


Re: [PATCH 1/2] arm64: hugetlb: remove the wrong pmd check in find_num_contig()

2016-11-04 Thread Catalin Marinas
On Fri, Nov 04, 2016 at 10:52:17AM +0800, Huang Shijie wrote:
> On Thu, Nov 03, 2016 at 06:16:16PM -0600, Catalin Marinas wrote:
> > On Thu, Nov 03, 2016 at 10:27:38AM +0800, Huang Shijie wrote:
> > > diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> > > index 2e49bd2..4811ef1 100644
> > > --- a/arch/arm64/mm/hugetlbpage.c
> > > +++ b/arch/arm64/mm/hugetlbpage.c
> > > @@ -61,10 +61,6 @@ static int find_num_contig(struct mm_struct *mm, 
> > > unsigned long addr,
> > >   return 1;
> > >   }
> > >   pmd = pmd_offset(pud, addr);
> > > - if (!pmd_present(*pmd)) {
> > > - VM_BUG_ON(!pmd_present(*pmd));
> > > - return 1;
> > > - }
> > >   if ((pte_t *)pmd == ptep) {
> > >   *pgsize = PMD_SIZE;
> > >   return CONT_PMDS;
> > 
> > BTW, for the !pud_present() and !pgd_present() cases, shouldn't
> > find_num_contig() actually return 0? These are more likely real bugs, so
> > no point in setting the huge pte.
> 
> The kernel will not call the find_num_contig() if the PGD/PUD are empty.
> Please see the code in the hugetlb_fault().
> 
>--
>   ptep = huge_pte_offset(mm, address);
>   if (ptep) {
>   ...
>   } else {
>   ptep = huge_pte_alloc(mm, address, huge_page_size(h));
>   if (!ptep)
>   return VM_FAULT_OOM;
>   }
>--

Exactly. So what is the reason for returning 1 if !pgd_present()? Would
removing the checks entirely or adding BUG() be a better option?

-- 
Catalin


Re: [PATCH 1/2] arm64: hugetlb: remove the wrong pmd check in find_num_contig()

2016-11-03 Thread Huang Shijie
On Thu, Nov 03, 2016 at 06:16:16PM -0600, Catalin Marinas wrote:
> On Thu, Nov 03, 2016 at 10:27:38AM +0800, Huang Shijie wrote:
> > diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> > index 2e49bd2..4811ef1 100644
> > --- a/arch/arm64/mm/hugetlbpage.c
> > +++ b/arch/arm64/mm/hugetlbpage.c
> > @@ -61,10 +61,6 @@ static int find_num_contig(struct mm_struct *mm, 
> > unsigned long addr,
> > return 1;
> > }
> > pmd = pmd_offset(pud, addr);
> > -   if (!pmd_present(*pmd)) {
> > -   VM_BUG_ON(!pmd_present(*pmd));
> > -   return 1;
> > -   }
> > if ((pte_t *)pmd == ptep) {
> > *pgsize = PMD_SIZE;
> > return CONT_PMDS;
> 
> BTW, for the !pud_present() and !pgd_present() cases, shouldn't
> find_num_contig() actually return 0? These are more likely real bugs, so
> no point in setting the huge pte.

The kernel will not call the find_num_contig() if the PGD/PUD are empty.
Please see the code in the hugetlb_fault().

   --
	ptep = huge_pte_offset(mm, address);
	if (ptep) {
		...
	} else {
		ptep = huge_pte_alloc(mm, address, huge_page_size(h));
		if (!ptep)
			return VM_FAULT_OOM;
	}
   --

Thanks
Huang Shijie


Re: [PATCH 1/2] arm64: hugetlb: remove the wrong pmd check in find_num_contig()

2016-11-03 Thread Catalin Marinas
On Thu, Nov 03, 2016 at 10:27:38AM +0800, Huang Shijie wrote:
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index 2e49bd2..4811ef1 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -61,10 +61,6 @@ static int find_num_contig(struct mm_struct *mm, unsigned 
> long addr,
>   return 1;
>   }
>   pmd = pmd_offset(pud, addr);
> - if (!pmd_present(*pmd)) {
> - VM_BUG_ON(!pmd_present(*pmd));
> - return 1;
> - }
>   if ((pte_t *)pmd == ptep) {
>   *pgsize = PMD_SIZE;
>   return CONT_PMDS;

BTW, for the !pud_present() and !pgd_present() cases, shouldn't
find_num_contig() actually return 0? These are more likely real bugs, so
no point in setting the huge pte.

-- 
Catalin


[PATCH 1/2] arm64: hugetlb: remove the wrong pmd check in find_num_contig()

2016-11-02 Thread Huang Shijie
find_num_contig() returns 1 when the pmd is not present.
This causes a kernel dead loop in the following scenario:

   1.) pmd entry is not present.

   2.) the page fault occurs:
   ... hugetlb_fault() --> hugetlb_no_page() --> set_huge_pte_at()

   3.) set_huge_pte_at() will only set the first PMD entry, since
   find_num_contig() just returns 1 in this case. So the PMD entries
   are all empty except the first one.

   4.) When the kernel accesses the address mapped by the second PMD entry,
   a new page fault occurs:
   ... hugetlb_fault() --> huge_ptep_set_access_flags()

   The second PMD entry is still empty now.

   5.) When the kernel returns, the access will cause a page fault again.
   The kernel then repeats step "4.)" above, and from here on it is
   stuck in a dead loop.

The dead loop was caught with 32M hugetlb pages (2M PMD + Contiguous bit).

This patch removes the wrong pmd check and fixes the dead loop.

Acked-by: Steve Capper 
Signed-off-by: Huang Shijie 
---
 arch/arm64/mm/hugetlbpage.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 2e49bd2..4811ef1 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -61,10 +61,6 @@ static int find_num_contig(struct mm_struct *mm, unsigned long addr,
 		return 1;
 	}
 	pmd = pmd_offset(pud, addr);
-	if (!pmd_present(*pmd)) {
-		VM_BUG_ON(!pmd_present(*pmd));
-		return 1;
-	}
 	if ((pte_t *)pmd == ptep) {
 		*pgsize = PMD_SIZE;
 		return CONT_PMDS;
-- 
2.5.5
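
For readers following the five-step scenario in the commit message, here is a
rough user-space sketch of it. It is not kernel code. Every sim_* name is
hypothetical, the range is modelled as 16 entries (assuming the 4K granule,
where 16 contiguous 2M PMDs make up a 32M huge page), and the fault path is
reduced to two cases: an entirely empty range goes through the
set_huge_pte_at() model, anything else takes the access-flags path, which
leaves empty entries untouched.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4K granule, 16 contiguous 2M PMD entries per 32M huge page. */
#define SIM_CONT_PMDS 16

/* Toy model of one contiguous PMD range; true means the entry is populated. */
struct sim_pmd_range {
	bool present[SIM_CONT_PMDS];
};

/*
 * Model of find_num_contig() for a PMD-level huge pte.
 * With check_pmd_present set it mimics the pre-patch behaviour:
 * an empty first entry makes it report a single entry.
 */
static int sim_find_num_contig(const struct sim_pmd_range *r,
			       bool check_pmd_present)
{
	if (check_pmd_present && !r->present[0])
		return 1;
	return SIM_CONT_PMDS;
}

/* Model of set_huge_pte_at(): populate as many entries as reported. */
static void sim_set_huge_pte_at(struct sim_pmd_range *r,
				bool check_pmd_present)
{
	int i, ncontig = sim_find_num_contig(r, check_pmd_present);

	for (i = 0; i < ncontig; i++)
		r->present[i] = true;
}

/*
 * Model of an access through entry idx. Returns the number of faults
 * taken before the access succeeds, or -1 if max_faults was reached,
 * which stands in for the dead loop.
 */
static int sim_access(struct sim_pmd_range *r, int idx,
		      bool check_pmd_present, int max_faults)
{
	int faults = 0;

	assert(idx >= 0 && idx < SIM_CONT_PMDS);

	while (!r->present[idx]) {
		if (faults == max_faults)
			return -1;
		faults++;
		/* Empty range: the hugetlb_no_page() --> set_huge_pte_at() path. */
		if (!r->present[0])
			sim_set_huge_pte_at(r, check_pmd_present);
		/* Otherwise: access-flags path, empty entries stay empty. */
	}
	return faults;
}
```

With check_pmd_present set, a first access through entry 0 succeeds after one
fault, but a later access through entry 1 never succeeds and hits the fault
limit. With the check dropped, the first fault populates all 16 entries.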