> On Jul 31, 2019, at 8:18 AM, Oleg Nesterov <[email protected]> wrote:
> 
> On 07/30, Song Liu wrote:
>> 
>> 
>>> On Jul 30, 2019, at 9:11 AM, Oleg Nesterov <[email protected]> wrote:
>>> 
>>> So after the next patch we have a single user of FOLL_SPLIT_PMD (uprobes)
>>> and a single user of FOLL_SPLIT: arch/s390/mm/gmap.c:thp_split_mm().
>>> 
>>> Hmm.
>> 
>> I think this is what we want. :)
> 
> We? I don't ;)
> 
>> FOLL_SPLIT is the fallback solution for users who cannot handle THP.
> 
> and again, we have a single user: thp_split_mm(). I do not know if it
> can use FOLL_SPLIT_PMD or not, may be you can take a look...

I haven't played with s390, so it's going to take me some time to ramp up.
I will add it to my to-do list.

> 
>> With
>> more THP aware code, there will be fewer users of FOLL_SPLIT.
> 
> Fewer than 1? Good ;)

Yes! It will be great if thp_split_mm() can use FOLL_SPLIT_PMD 
instead. 

> 
>>>> @@ -399,7 +399,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
>>>>            spin_unlock(ptl);
>>>>            return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
>>>>    }
>>>> -  if (flags & FOLL_SPLIT) {
>>>> +  if (flags & (FOLL_SPLIT | FOLL_SPLIT_PMD)) {
>>>>            int ret;
>>>>            page = pmd_page(*pmd);
>>>>            if (is_huge_zero_page(page)) {
>>>> @@ -408,7 +408,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
>>>>                    split_huge_pmd(vma, pmd, address);
>>>>                    if (pmd_trans_unstable(pmd))
>>>>                            ret = -EBUSY;
>>>> -          } else {
>>>> +          } else if (flags & FOLL_SPLIT) {
>>>>                    if (unlikely(!try_get_page(page))) {
>>>>                            spin_unlock(ptl);
>>>>                            return ERR_PTR(-ENOMEM);
>>>> @@ -420,6 +420,10 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
>>>>                    put_page(page);
>>>>                    if (pmd_none(*pmd))
>>>>                            return no_page_table(vma, flags);
>>>> +          } else {  /* flags & FOLL_SPLIT_PMD */
>>>> +                  spin_unlock(ptl);
>>>> +                  split_huge_pmd(vma, pmd, address);
>>>> +                  ret = pte_alloc(mm, pmd);
>>> 
>>> I fail to understand why this differs from the is_huge_zero_page() case 
>>> above.
>> 
>> split_huge_pmd() handles is_huge_zero_page() differently. In this case, we
>> cannot use the pmd_trans_unstable() check.
> 
> Please correct me, but iiuc the problem is not that split_huge_pmd() handles
> is_huge_zero_page() differently, the problem is that __split_huge_pmd_locked()
> handles the !vma_is_anonymous(vma) differently and returns with pmd_none() = T
> after pmdp_huge_clear_flush_notify(). This means that pmd_trans_unstable()
> will fail.

Agreed. 
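To spell out how I read the difference (a pseudocode sketch of the control
flow, not actual kernel code -- the comments reflect my understanding of
__split_huge_pmd_locked(), please correct me if I got it wrong):

```
	/* Pseudocode sketch; not kernel code. */
	split_huge_pmd(vma, pmd, address);
	if (vma_is_anonymous(vma) || is_huge_zero_page(page)) {
		/*
		 * The huge pmd was rewritten in place into a pte table,
		 * so pmd_trans_unstable() is a valid success check.
		 */
		if (pmd_trans_unstable(pmd))
			ret = -EBUSY;
	} else {
		/*
		 * File-backed (the uprobes / FOLL_SPLIT_PMD case):
		 * __split_huge_pmd_locked() only cleared the pmd via
		 * pmdp_huge_clear_flush_notify(), leaving pmd_none() == T,
		 * so pmd_trans_unstable() would report failure even though
		 * the split succeeded. The caller has to repopulate the
		 * pmd with pte_alloc() instead.
		 */
		ret = pte_alloc(mm, pmd);
	}
```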

> 
> Now, I don't understand why do we need pmd_trans_unstable() after
> split_huge_pmd(huge-zero-pmd), but whatever reason we have, why can't we
> unify both cases?
> 
> IOW, could you explain why the path below is wrong?

I _think_ the patch below works (I haven't fully tested it yet), but I am
not sure it is the best approach. By keeping the two cases separate, we
don't duplicate much code, and it stays clear that they are handled
differently. Therefore, I would prefer to keep them separate for now.

Thanks,
Song

> 
> 
> --- x/mm/gup.c
> +++ x/mm/gup.c
> @@ -399,14 +399,16 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
>               spin_unlock(ptl);
>               return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
>       }
> -     if (flags & FOLL_SPLIT) {
> +     if (flags & (FOLL_SPLIT | FOLL_SPLIT_PMD)) {
>               int ret;
>               page = pmd_page(*pmd);
> -             if (is_huge_zero_page(page)) {
> +             if ((flags & FOLL_SPLIT_PMD) || is_huge_zero_page(page)) {
>                       spin_unlock(ptl);
> -                     ret = 0;
>                       split_huge_pmd(vma, pmd, address);
> -                     if (pmd_trans_unstable(pmd))
> +                     ret = 0;
> +                     if (pte_alloc(mm, pmd))
> +                             ret = -ENOMEM;
> +                     else if (pmd_trans_unstable(pmd))
>                               ret = -EBUSY;
>               } else {
>                       if (unlikely(!try_get_page(page))) {
> 
