Re: [PATCH v3 03/10] mm/gup: fail get_user_pages for LONGTERM dev coherent type

2022-01-20 Thread Joao Martins
On 1/10/22 22:31, Alex Sierra wrote:
> Avoid long term pinning for Coherent device type pages. This could
> interfere with their own device memory manager. For now, we are just
> returning error for PIN_LONGTERM Coherent device type pages. Eventually,
> these type of pages will get migrated to system memory, once the device
> migration pages support is added.
> 
> Signed-off-by: Alex Sierra 
> ---
>  mm/gup.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 886d6148d3d0..9c8a075d862d 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1720,6 +1720,12 @@ static long check_and_migrate_movable_pages(unsigned 
> long nr_pages,
>* If we get a movable page, since we are going to be pinning
>* these entries, try to move them out if possible.
>*/
> + if (is_device_page(head)) {
> + WARN_ON_ONCE(is_device_private_page(head));
> + ret = -EFAULT;
> + goto unpin_pages;
> + }
> +

Wouldn't be more efficient for you failing earlier instead of after all the 
pages are pinned?

Filesystem DAX suffers from a somewhat similar issue[0] -- albeit it's more 
related to
blocking FOLL_LONGTERM in gup-fast while gup-slow can still do it. Coherent 
devmap appears
to want to block it in all gup.

On another thread Jason was suggesting about having different pgmap::flags to 
capture
these special cases[1] instead of selecting what different pgmap types can do 
in various
different places.

[0] 
https://lore.kernel.org/linux-mm/6a18179e-65f7-367d-89a9-d5162f10f...@oracle.com/
[1] https://lore.kernel.org/linux-mm/20211019160136.gh3686...@ziepe.ca/


Re: [PATCH v1 2/2] mm: remove extra ZONE_DEVICE struct page refcount

2021-10-20 Thread Joao Martins
On 10/20/21 18:12, Dan Williams wrote:
> On Wed, Oct 20, 2021 at 10:09 AM Joao Martins  
> wrote:
>> On 10/19/21 20:21, Dan Williams wrote:
>>> On Tue, Oct 19, 2021 at 9:02 AM Jason Gunthorpe  wrote:
>>>> On Tue, Oct 19, 2021 at 04:13:34PM +0100, Joao Martins wrote:
>>>>> On 10/19/21 00:06, Jason Gunthorpe wrote:
>>>>>> On Mon, Oct 18, 2021 at 12:37:30PM -0700, Dan Williams wrote:
>>>>> Whats the benefit between preventing longterm at start
>>>>> versus only after mounting the filesystem? Or is the intended future 
>>>>> purpose
>>>>> to pass more context into an holder potential future callback e.g. nack 
>>>>> longterm
>>>>> pins on a page basis?
>>>>
>>>> I understood Dan's remark that the device-dax path allows
>>>> FOLL_LONGTERM and the FSDAX path does not ?
>>>>
>>>> Which, IIRC, today is signaled basd on vma properties and in all cases
>>>> fast-gup is denied.
>>>
>>> Yeah, I forgot that 7af75561e171 eliminated any possibility of
>>> longterm-gup-fast for device-dax, let's not disturb that status quo.
>>>
>> I am slightly confused by this comment -- the status quo is what we are
>> questioning here -- And we talked about changing that in the past too
>> (thread below), that longterm-gup-fast was an oversight that that commit
>> was only applicable to fsdax. [Maybe this is just my english confusion]
> 
> No, you have it correct. However that "regression" has gone unnoticed,
> so unless there is data showing that gup-fast on device-dax is
> critical for longterm page pinning workflows I'm ok for longterm to
> continue to trigger gup-slow.
> 
To be fair, it's not surprising that nobody noticed -- my general intent
was just to special-case less for device-dax. Not many places use
pin_user_pages_fast(FOLL_LONGTERM). This is only exposed on those
few cases that do try to use it (e.g. RDMA/IB, vDPA), and specifically
when the page fault occurs (that requires fallback-ing to gup-slow) at a
higher level than the amount you're pinning e.g. pinning in 2M extents on a
device-dax of 1G pagesize. Pin size is limited to a 2M extent at a time by the
users I mentioned above -- regardless of the total size of the extent you will
be pinning (i.e. 512 struct pages pointers fit one page). But even with all 
this,
this [FOLL_LONGTERM on pin-fast] would still go unnoticed because gup-fast
on devmap is just as slow as gup :)



Re: [PATCH v1 2/2] mm: remove extra ZONE_DEVICE struct page refcount

2021-10-20 Thread Joao Martins
On 10/19/21 20:21, Dan Williams wrote:
> On Tue, Oct 19, 2021 at 9:02 AM Jason Gunthorpe  wrote:
>>
>> On Tue, Oct 19, 2021 at 04:13:34PM +0100, Joao Martins wrote:
>>> On 10/19/21 00:06, Jason Gunthorpe wrote:
>>>> On Mon, Oct 18, 2021 at 12:37:30PM -0700, Dan Williams wrote:
>>>>
>>>>>> device-dax uses PUD, along with TTM, they are the only places. I'm not
>>>>>> sure TTM is a real place though.
>>>>>
>>>>> I was setting device-dax aside because it can use Joao's changes to
>>>>> get compound-page support.
>>>>
>>>> Ideally, but that ideas in that patch series have been floating around
>>>> for a long time now..
>>>>
>>> The current status of the series misses a Rb on patches 6,7,10,12-14.
>>> Well, patch 8 too should now drop its tag, considering the latest
>>> discussion.
>>>
>>> If it helps moving things forward I could split my series further into:
>>>
>>> 1) the compound page introduction (patches 1-7) of my aforementioned series
>>> 2) vmemmap deduplication for memory gains (patches 9-14)
>>> 3) gup improvements (patch 8 and gup-slow improvements)
>>
>> I would split it, yes..
>>
>> I think we can see a general consensus that making compound_head/etc
>> work consistently with how THP uses it will provide value and
>> opportunity for optimization going forward.
>>

I'll go do that. Meanwhile, I'll wait a couple days for Dan to review the
dax subsystem patches (6 & 7), or otherwise send them over.

>>> Whats the benefit between preventing longterm at start
>>> versus only after mounting the filesystem? Or is the intended future purpose
>>> to pass more context into an holder potential future callback e.g. nack 
>>> longterm
>>> pins on a page basis?
>>
>> I understood Dan's remark that the device-dax path allows
>> FOLL_LONGTERM and the FSDAX path does not ?
>>
>> Which, IIRC, today is signaled basd on vma properties and in all cases
>> fast-gup is denied.
> 
> Yeah, I forgot that 7af75561e171 eliminated any possibility of
> longterm-gup-fast for device-dax, let's not disturb that status quo.
> 
I am slightly confused by this comment -- the status quo is what we are
questioning here -- And we talked about changing that in the past too
(thread below), that longterm-gup-fast was an oversight that that commit
was only applicable to fsdax. [Maybe this is just my english confusion]

>>> Maybe we can start by at least not add any flags and just prevent
>>> FOLL_LONGTERM on fsdax -- which I guess was the original purpose of
>>> commit 7af75561e171 ("mm/gup: add FOLL_LONGTERM capability to GUP fast").
>>> This patch (which I can formally send) has a sketch of that (below scissors 
>>> mark):
>>>
>>> https://lore.kernel.org/linux-mm/6a18179e-65f7-367d-89a9-d5162f10f...@oracle.com/
>>
>> Yes, basically, whatever test we want for 'deny fast gup foll
>> longterm' is fine.
>>
>> Personally I'd like to see us move toward a set of flag specifying
>> each special behavior and not a collection of types that imply special
>> behaviors.
>>
>> Eg we have at least:
>>  - Block gup fast on foll_longterm
>>  - Capture the refcount ==1 and use the pgmap free hook
>>(confusingly called page_is_devmap_managed())
>>  - Always use a swap entry
>>  - page->index/mapping are used in the usual file based way?
>>
>> Probably more things..
> 
> Yes, agree with the principle of reducing type-implied special casing.
> 
OK.

Moving from implicit devmap types to pgmap::flags is rather simple fixup.
And I suppose (respectivally) PGMAP_NO_PINF_LONGTERM, PGMAP_MANAGED_FREE_PAGE,
PGMAP_USE_SWAP_ENTRY, etc, etc.



Re: [PATCH v1 2/2] mm: remove extra ZONE_DEVICE struct page refcount

2021-10-19 Thread Joao Martins
On 10/19/21 00:06, Jason Gunthorpe wrote:
> On Mon, Oct 18, 2021 at 12:37:30PM -0700, Dan Williams wrote:
> 
>>> device-dax uses PUD, along with TTM, they are the only places. I'm not
>>> sure TTM is a real place though.
>>
>> I was setting device-dax aside because it can use Joao's changes to
>> get compound-page support.
> 
> Ideally, but that ideas in that patch series have been floating around
> for a long time now..
>  
The current status of the series misses a Rb on patches 6,7,10,12-14.
Well, patch 8 too should now drop its tag, considering the latest
discussion.

If it helps moving things forward I could split my series further into:

1) the compound page introduction (patches 1-7) of my aforementioned series
2) vmemmap deduplication for memory gains (patches 9-14)
3) gup improvements (patch 8 and gup-slow improvements)

The reason being that item 1) is the the main dependency listed below.
And allows 2) and 3) to be parallelized. FWIW, it is almost fully reviewed
by Dan (as of v3->v4). For (1) patches 6 & 7 are on changes to
device-dax subsystem (drivers/dax/*) which still needs his Ack.

>>> Here I imagine the thing that creates the pgmap would specify the
>>> policy it wants. In most cases the policy is tightly coupled to what
>>> the free function in the the provided dev_pagemap_ops does..
>>
>> The thing that creates the pgmap is the device-driver, and
>> device-driver does not implement truncate or reclaim. It's not until
>> the FS mounts that the pgmap needs to start enforcing pin lifetime
>> guarantees.
> 
> I am explaining this wrong, the immediate need is really 'should
> foll_longterm fail fast-gup to the slow path' and something like the
> nvdimm driver can just set that to 1 and rely on VMA flags to control
> what the slow path does - as is today.
> 
> It is not as elegant as more flags in the pgmap, but it would get the
> job done with minimal fuss.
> 
> Might be nice to either rely fully on VMA flags or fully on pgmap
> holder flags for FOLL_LONGTERM?
>

Whats the benefit between preventing longterm at start
versus only after mounting the filesystem? Or is the intended future purpose
to pass more context into an holder potential future callback e.g. nack longterm
pins on a page basis?

Maybe we can start by at least not add any flags and just prevent
FOLL_LONGTERM on fsdax -- which I guess was the original purpose of
commit 7af75561e171 ("mm/gup: add FOLL_LONGTERM capability to GUP fast").
This patch (which I can formally send) has a sketch of that (below scissors 
mark):

https://lore.kernel.org/linux-mm/6a18179e-65f7-367d-89a9-d5162f10f...@oracle.com/

It uses pgmap->type rather than adding further fields into pgmap, given this
restriction applies only to fsdax.

... and then we could improve devmap_longterm_available(pgmap) to look at the
holder::flags or pgmap::flags should we decide that an explicit flags is 
required
from holder/pgmap .. as a further improvement?

Joao