Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount

2021-08-20 Thread Christoph Hellwig
On Thu, Aug 19, 2021 at 03:59:56PM -0400, Felix Kuehling wrote:
> I got lost trying to understand how DAX counts page references and how
> the PTE_SPECIAL option affects that. Theodore, can you help with this?
> Is there an easy way to test without CONFIG_ARCH_HAS_PTE_SPECIAL on x86,
> or do we need to test on a CPU architecture that doesn't support this
> feature?

I think the right answer is to simplify disallow ZONE_DEVICE pages
if ARCH_HAS_PTE_SPECIAL is not supported.  ARCH_HAS_PTE_SPECIAL is
supported by all modern architecture ports than can make use of
ZONE_DEVICE / dev_pagemap, so we can avoid this pocket of barely
testable code entirely:


diff --git a/mm/Kconfig b/mm/Kconfig
index 40a9bfcd5062e1..2823bbfd1c8c70 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -775,6 +775,7 @@ config ZONE_DMA32
 
 config ZONE_DEVICE
bool "Device memory (pmem, HMM, etc...) hotplug support"
+   depends on ARCH_HAS_PTE_SPECIAL
depends on MEMORY_HOTPLUG
depends on MEMORY_HOTREMOVE
depends on SPARSEMEM_VMEMMAP


Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount

2021-08-20 Thread Christoph Hellwig
On Wed, Aug 18, 2021 at 12:28:30PM -0700, Ralph Campbell wrote:
> Did you test on a system without CONFIG_ARCH_HAS_PTE_SPECIAL defined?
> In that case, mmap() of a DAX device will call insert_page() which calls
> get_page() which would trigger VM_BUG_ON_PAGE().

__vm_insert_mixed still ends up calling insert_pfn for the
!CASE_ARCH_HAS_PTE_SPECIAL if pfn_t_devmap() is true, which it should
be for DAX.  (and as said in my other mail, I suspect we should disallow
that case anyway, as no one can test it in practice).


Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount

2021-08-20 Thread Jerome Glisse
On Thu, Aug 19, 2021 at 11:00 AM Sierra Guiza, Alejandro (Alex)
 wrote:
>
>
> On 8/18/2021 2:28 PM, Ralph Campbell wrote:
> > On 8/17/21 5:35 PM, Felix Kuehling wrote:
> >> Am 2021-08-17 um 8:01 p.m. schrieb Ralph Campbell:
> >>> On 8/12/21 11:31 PM, Alex Sierra wrote:
>  From: Ralph Campbell 
> 
>  ZONE_DEVICE struct pages have an extra reference count that
>  complicates the
>  code for put_page() and several places in the kernel that need to
>  check the
>  reference count to see that a page is not being used (gup, compaction,
>  migration, etc.). Clean up the code so the reference count doesn't
>  need to
>  be treated specially for ZONE_DEVICE.
> 
>  v2:
>  AS: merged this patch in linux 5.11 version
> 
>  v5:
>  AS: add condition at try_grab_page to check for the zone device type,
>  while
>  page ref counter is checked less/equal to zero. In case of device
>  zone, pages
>  ref counter are initialized to zero.
> 
>  Signed-off-by: Ralph Campbell 
>  Signed-off-by: Alex Sierra 
>  ---
> arch/powerpc/kvm/book3s_hv_uvmem.c |  2 +-
> drivers/gpu/drm/nouveau/nouveau_dmem.c |  2 +-
> fs/dax.c   |  4 +-
> include/linux/dax.h|  2 +-
> include/linux/memremap.h   |  7 +--
> include/linux/mm.h | 13 +
> lib/test_hmm.c |  2 +-
> mm/internal.h  |  8 +++
> mm/memremap.c  | 68
>  +++---
> mm/migrate.c   |  5 --
> mm/page_alloc.c|  3 ++
> mm/swap.c  | 45 ++---
> 12 files changed, 46 insertions(+), 115 deletions(-)
> 
> >>> I haven't seen a response to the issues I raised back at v3 of this
> >>> series.
> >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flinux-mm%2F4f6dd918-d79b-1aa7-3a4c-caa67ddc29bc%40nvidia.com%2F&data=04%7C01%7Calex.sierra%40amd.com%7Cd2bd2d4fbf764528540908d9627e5dcd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637649117156919772%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=P7FxYm%2BkJaCkMFa3OHtuKrPOn7SvytFRmYQdIzq7rN4%3D&reserved=0
> >>>
> >>>
> >>>
> >>> Did I miss something?
> >> I think part of the response was that we did more testing. Alex added
> >> support for DEVICE_GENERIC pages to test_hmm and he ran DAX tests
> >> recommended by Theodore Tso. In that testing he ran into a WARN_ON_ONCE
> >> about a zero page refcount in try_get_page. The fix is in the latest
> >> version of patch 2. But it's already obsolete because John Hubbard is
> >> about to remove that function altogether.
> >>
> >> I think the issues you raised were more uncertainty than known bugs. It
> >> seems the fact that you can have DAX pages with 0 refcount is a feature
> >> more than a bug.
> >>
> >> Regards,
> >>Felix
> >
> > Did you test on a system without CONFIG_ARCH_HAS_PTE_SPECIAL defined?
> > In that case, mmap() of a DAX device will call insert_page() which calls
> > get_page() which would trigger VM_BUG_ON_PAGE().
> >
> > I can believe it is OK for PTE_SPECIAL page table entries to have no
> > struct page or that MEMORY_DEVICE_GENERIC struct pages be mapped with
> > a zero reference count using insert_pfn().
> Hi Ralph,
> We have tried the DAX tests with and without CONFIG_ARCH_HAS_PTE_SPECIAL
> defined.
> Apparently none of the tests touches that condition for a DAX device. Of
> course,
> that doesn't mean it could happen.
>
> Regards,
> Alex S.
>
> >
> >
> > I find it hard to believe that other MM developers don't see an issue
> > with a struct page with refcount == 0 and mapcount == 1.
> >
> > I don't see where init_page_count() is being called for the
> > MEMORY_DEVICE_GENERIC or MEMORY_DEVICE_PRIVATE struct pages the AMD
> > driver allocates and passes to migrate_vma_setup().
> > Looks like svm_migrate_get_vram_page() needs to call init_page_count()
> > instead of get_page(). (I'm looking at branch
> > origin/alexsierrag/device_generic
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FRadeonOpenCompute%2FROCK-Kernel-Driver.git&data=04%7C01%7Calex.sierra%40amd.com%7Cd2bd2d4fbf764528540908d9627e5dcd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637649117156919772%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=IXe8HP2s8x5OdJdERBkGOYJCQk3iqCu5AYkwpDL8zec%3D&reserved=0)
> Yes, you're right. My bad. Thanks for catching this up. I didn't realize
> I was missing
> to define CONFIG_DEBUG_VM on my build. Therefore this BUG was never caught.
> It worked after I replaced get_pages by init_page_count at
> svm_migrate_get_vram_page. However, I don't think this is the

Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount

2021-08-19 Thread Jerome Glisse
Note that you do not want GUP to succeed on device page, i do not see
where that is handled in the new code.

On Sun, Aug 15, 2021 at 1:40 PM John Hubbard  wrote:
>
> On 8/15/21 8:37 AM, Christoph Hellwig wrote:
> >> diff --git a/include/linux/mm.h b/include/linux/mm.h
> >> index 8ae31622deef..d48a1f0889d1 100644
> >> --- a/include/linux/mm.h
> >> +++ b/include/linux/mm.h
> >> @@ -1218,7 +1218,7 @@ __maybe_unused struct page 
> >> *try_grab_compound_head(struct page *page, int refs,
> >>   static inline __must_check bool try_get_page(struct page *page)
> >>   {
> >>  page = compound_head(page);
> >> -if (WARN_ON_ONCE(page_ref_count(page) <= 0))
> >> +if (WARN_ON_ONCE(page_ref_count(page) < 
> >> (int)!is_zone_device_page(page)))
> >
> > Please avoid the overly long line.  In fact I'd be tempted to just not
> > bother here and keep the old, more lose check.  Especially given that
> > John has a patch ready that removes try_get_page entirely.
> >
>
> Yes. Andrew has accepted it into mmotm.
>
> Ralph's patch here was written well before my cleanup that removed
> try_grab_page() [1]. But now that we're here, if you drop this hunk then
> it will make merging easier, I think.
>
>
> [1] https://lore.kernel.org/r/20210813044133.1536842-4-jhubb...@nvidia.com
>
> thanks,
> --
> John Hubbard
> NVIDIA
>


Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount

2021-08-19 Thread Felix Kuehling
Am 2021-08-19 um 2:00 p.m. schrieb Sierra Guiza, Alejandro (Alex):
>
> On 8/18/2021 2:28 PM, Ralph Campbell wrote:
>> On 8/17/21 5:35 PM, Felix Kuehling wrote:
>>> Am 2021-08-17 um 8:01 p.m. schrieb Ralph Campbell:
 On 8/12/21 11:31 PM, Alex Sierra wrote:
> From: Ralph Campbell 
>
> ZONE_DEVICE struct pages have an extra reference count that
> complicates the
> code for put_page() and several places in the kernel that need to
> check the
> reference count to see that a page is not being used (gup,
> compaction,
> migration, etc.). Clean up the code so the reference count doesn't
> need to
> be treated specially for ZONE_DEVICE.
>
> v2:
> AS: merged this patch in linux 5.11 version
>
> v5:
> AS: add condition at try_grab_page to check for the zone device type,
> while
> page ref counter is checked less/equal to zero. In case of device
> zone, pages
> ref counter are initialized to zero.
>
> Signed-off-by: Ralph Campbell 
> Signed-off-by: Alex Sierra 
> ---
>    arch/powerpc/kvm/book3s_hv_uvmem.c |  2 +-
>    drivers/gpu/drm/nouveau/nouveau_dmem.c |  2 +-
>    fs/dax.c   |  4 +-
>    include/linux/dax.h    |  2 +-
>    include/linux/memremap.h   |  7 +--
>    include/linux/mm.h | 13 +
>    lib/test_hmm.c |  2 +-
>    mm/internal.h  |  8 +++
>    mm/memremap.c  | 68
> +++---
>    mm/migrate.c   |  5 --
>    mm/page_alloc.c    |  3 ++
>    mm/swap.c  | 45 ++---
>    12 files changed, 46 insertions(+), 115 deletions(-)
>
 I haven't seen a response to the issues I raised back at v3 of this
 series.
 https://lore.kernel.org/linux-mm/4f6dd918-d79b-1aa7-3a4c-caa67ddc2...@nvidia.com/



 Did I miss something?
>>> I think part of the response was that we did more testing. Alex added
>>> support for DEVICE_GENERIC pages to test_hmm and he ran DAX tests
>>> recommended by Theodore Tso. In that testing he ran into a WARN_ON_ONCE
>>> about a zero page refcount in try_get_page. The fix is in the latest
>>> version of patch 2. But it's already obsolete because John Hubbard is
>>> about to remove that function altogether.
>>>
>>> I think the issues you raised were more uncertainty than known bugs. It
>>> seems the fact that you can have DAX pages with 0 refcount is a feature
>>> more than a bug.
>>>
>>> Regards,
>>>    Felix
>>
>> Did you test on a system without CONFIG_ARCH_HAS_PTE_SPECIAL defined?
>> In that case, mmap() of a DAX device will call insert_page() which calls
>> get_page() which would trigger VM_BUG_ON_PAGE().
>>
>> I can believe it is OK for PTE_SPECIAL page table entries to have no
>> struct page or that MEMORY_DEVICE_GENERIC struct pages be mapped with
>> a zero reference count using insert_pfn().
> Hi Ralph,
> We have tried the DAX tests with and without
> CONFIG_ARCH_HAS_PTE_SPECIAL defined.
> Apparently none of the tests touches that condition for a DAX device.
> Of course,
> that doesn't mean it could happen.
>
> Regards,
> Alex S.
>
>>
>>
>> I find it hard to believe that other MM developers don't see an issue
>> with a struct page with refcount == 0 and mapcount == 1.
>>
>> I don't see where init_page_count() is being called for the
>> MEMORY_DEVICE_GENERIC or MEMORY_DEVICE_PRIVATE struct pages the AMD
>> driver allocates and passes to migrate_vma_setup().
>> Looks like svm_migrate_get_vram_page() needs to call init_page_count()
>> instead of get_page(). (I'm looking at branch
>> origin/alexsierrag/device_generic
>> https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver.git
> Yes, you're right. My bad. Thanks for catching this up. I didn't
> realize I was missing
> to define CONFIG_DEBUG_VM on my build. Therefore this BUG was never
> caught.
> It worked after I replaced get_pages by init_page_count at
> svm_migrate_get_vram_page. However, I don't think this is the best way
> to fix it.
> Ideally, get_pages call should work for device pages with ref count
> equal to 0
> too. Otherwise, we could overwrite refcounter if someone else is
> grabbing the page
> concurrently.

I think using init_page_count in svm_migrate_get_vram_page is the right
answer. This is where the page first gets allocated and initialized
(data migrated into it). I think nobody should have or try to take a
reference to the page before that. We should probably also add a
VM_BUG_ON_PAGE(page_ref_count(page) != 0) before calling init_page_count
to make sure of that.


> I was thinking to add a special condition in get_pages for dev pages.
> This could
> also fix the insert_page -> get_page call from a DAX device.

[+Theodore]

I got lost trying to understand how DAX counts page references an

Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount

2021-08-19 Thread Sierra Guiza, Alejandro (Alex)



On 8/18/2021 2:28 PM, Ralph Campbell wrote:

On 8/17/21 5:35 PM, Felix Kuehling wrote:

Am 2021-08-17 um 8:01 p.m. schrieb Ralph Campbell:

On 8/12/21 11:31 PM, Alex Sierra wrote:

From: Ralph Campbell 

ZONE_DEVICE struct pages have an extra reference count that
complicates the
code for put_page() and several places in the kernel that need to
check the
reference count to see that a page is not being used (gup, compaction,
migration, etc.). Clean up the code so the reference count doesn't
need to
be treated specially for ZONE_DEVICE.

v2:
AS: merged this patch in linux 5.11 version

v5:
AS: add condition at try_grab_page to check for the zone device type,
while
page ref counter is checked less/equal to zero. In case of device
zone, pages
ref counter are initialized to zero.

Signed-off-by: Ralph Campbell 
Signed-off-by: Alex Sierra 
---
   arch/powerpc/kvm/book3s_hv_uvmem.c |  2 +-
   drivers/gpu/drm/nouveau/nouveau_dmem.c |  2 +-
   fs/dax.c   |  4 +-
   include/linux/dax.h    |  2 +-
   include/linux/memremap.h   |  7 +--
   include/linux/mm.h | 13 +
   lib/test_hmm.c |  2 +-
   mm/internal.h  |  8 +++
   mm/memremap.c  | 68 
+++---

   mm/migrate.c   |  5 --
   mm/page_alloc.c    |  3 ++
   mm/swap.c  | 45 ++---
   12 files changed, 46 insertions(+), 115 deletions(-)


I haven't seen a response to the issues I raised back at v3 of this
series.
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flinux-mm%2F4f6dd918-d79b-1aa7-3a4c-caa67ddc29bc%40nvidia.com%2F&data=04%7C01%7Calex.sierra%40amd.com%7Cd2bd2d4fbf764528540908d9627e5dcd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637649117156919772%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=P7FxYm%2BkJaCkMFa3OHtuKrPOn7SvytFRmYQdIzq7rN4%3D&reserved=0 




Did I miss something?

I think part of the response was that we did more testing. Alex added
support for DEVICE_GENERIC pages to test_hmm and he ran DAX tests
recommended by Theodore Tso. In that testing he ran into a WARN_ON_ONCE
about a zero page refcount in try_get_page. The fix is in the latest
version of patch 2. But it's already obsolete because John Hubbard is
about to remove that function altogether.

I think the issues you raised were more uncertainty than known bugs. It
seems the fact that you can have DAX pages with 0 refcount is a feature
more than a bug.

Regards,
   Felix


Did you test on a system without CONFIG_ARCH_HAS_PTE_SPECIAL defined?
In that case, mmap() of a DAX device will call insert_page() which calls
get_page() which would trigger VM_BUG_ON_PAGE().

I can believe it is OK for PTE_SPECIAL page table entries to have no
struct page or that MEMORY_DEVICE_GENERIC struct pages be mapped with
a zero reference count using insert_pfn().

Hi Ralph,
We have tried the DAX tests with and without CONFIG_ARCH_HAS_PTE_SPECIAL 
defined.
Apparently none of the tests touches that condition for a DAX device. Of 
course,

that doesn't mean it could happen.

Regards,
Alex S.




I find it hard to believe that other MM developers don't see an issue
with a struct page with refcount == 0 and mapcount == 1.

I don't see where init_page_count() is being called for the
MEMORY_DEVICE_GENERIC or MEMORY_DEVICE_PRIVATE struct pages the AMD
driver allocates and passes to migrate_vma_setup().
Looks like svm_migrate_get_vram_page() needs to call init_page_count()
instead of get_page(). (I'm looking at branch 
origin/alexsierrag/device_generic

https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FRadeonOpenCompute%2FROCK-Kernel-Driver.git&data=04%7C01%7Calex.sierra%40amd.com%7Cd2bd2d4fbf764528540908d9627e5dcd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637649117156919772%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=IXe8HP2s8x5OdJdERBkGOYJCQk3iqCu5AYkwpDL8zec%3D&reserved=0)
Yes, you're right. My bad. Thanks for catching this up. I didn't realize 
I was missing

to define CONFIG_DEBUG_VM on my build. Therefore this BUG was never caught.
It worked after I replaced get_pages by init_page_count at
svm_migrate_get_vram_page. However, I don't think this is the best way 
to fix it.
Ideally, get_pages call should work for device pages with ref count 
equal to 0
too. Otherwise, we could overwrite refcounter if someone else is 
grabbing the page

concurrently.
I was thinking to add a special condition in get_pages for dev pages. 
This could

also fix the insert_page -> get_page call from a DAX device.

Regards,
Alex S.



Also, what about the other places where is_device_private_page() is 
called?

Don't they need to be updated to call is_device_page() instead?
One of my goals for this patch was 

Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount

2021-08-18 Thread Ralph Campbell

On 8/17/21 5:35 PM, Felix Kuehling wrote:

Am 2021-08-17 um 8:01 p.m. schrieb Ralph Campbell:

On 8/12/21 11:31 PM, Alex Sierra wrote:

From: Ralph Campbell 

ZONE_DEVICE struct pages have an extra reference count that
complicates the
code for put_page() and several places in the kernel that need to
check the
reference count to see that a page is not being used (gup, compaction,
migration, etc.). Clean up the code so the reference count doesn't
need to
be treated specially for ZONE_DEVICE.

v2:
AS: merged this patch in linux 5.11 version

v5:
AS: add condition at try_grab_page to check for the zone device type,
while
page ref counter is checked less/equal to zero. In case of device
zone, pages
ref counter are initialized to zero.

Signed-off-by: Ralph Campbell 
Signed-off-by: Alex Sierra 
---
   arch/powerpc/kvm/book3s_hv_uvmem.c |  2 +-
   drivers/gpu/drm/nouveau/nouveau_dmem.c |  2 +-
   fs/dax.c   |  4 +-
   include/linux/dax.h    |  2 +-
   include/linux/memremap.h   |  7 +--
   include/linux/mm.h | 13 +
   lib/test_hmm.c |  2 +-
   mm/internal.h  |  8 +++
   mm/memremap.c  | 68 +++---
   mm/migrate.c   |  5 --
   mm/page_alloc.c    |  3 ++
   mm/swap.c  | 45 ++---
   12 files changed, 46 insertions(+), 115 deletions(-)


I haven't seen a response to the issues I raised back at v3 of this
series.
https://lore.kernel.org/linux-mm/4f6dd918-d79b-1aa7-3a4c-caa67ddc2...@nvidia.com/


Did I miss something?

I think part of the response was that we did more testing. Alex added
support for DEVICE_GENERIC pages to test_hmm and he ran DAX tests
recommended by Theodore Tso. In that testing he ran into a WARN_ON_ONCE
about a zero page refcount in try_get_page. The fix is in the latest
version of patch 2. But it's already obsolete because John Hubbard is
about to remove that function altogether.

I think the issues you raised were more uncertainty than known bugs. It
seems the fact that you can have DAX pages with 0 refcount is a feature
more than a bug.

Regards,
   Felix


Did you test on a system without CONFIG_ARCH_HAS_PTE_SPECIAL defined?
In that case, mmap() of a DAX device will call insert_page() which calls
get_page() which would trigger VM_BUG_ON_PAGE().

I can believe it is OK for PTE_SPECIAL page table entries to have no
struct page or that MEMORY_DEVICE_GENERIC struct pages be mapped with
a zero reference count using insert_pfn().

I find it hard to believe that other MM developers don't see an issue
with a struct page with refcount == 0 and mapcount == 1.

I don't see where init_page_count() is being called for the
MEMORY_DEVICE_GENERIC or MEMORY_DEVICE_PRIVATE struct pages the AMD
driver allocates and passes to migrate_vma_setup().
Looks like svm_migrate_get_vram_page() needs to call init_page_count()
instead of get_page(). (I'm looking at branch origin/alexsierrag/device_generic
https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver.git)

Also, what about the other places where is_device_private_page() is called?
Don't they need to be updated to call is_device_page() instead?
One of my goals for this patch was to remove special casing reference counts
for ZONE_DEVICE pages in rmap.c, etc.

I still think this patch needs an ACK from a FS/DAX maintainer.



Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount

2021-08-17 Thread Felix Kuehling


Am 2021-08-17 um 8:01 p.m. schrieb Ralph Campbell:
> On 8/12/21 11:31 PM, Alex Sierra wrote:
>> From: Ralph Campbell 
>>
>> ZONE_DEVICE struct pages have an extra reference count that
>> complicates the
>> code for put_page() and several places in the kernel that need to
>> check the
>> reference count to see that a page is not being used (gup, compaction,
>> migration, etc.). Clean up the code so the reference count doesn't
>> need to
>> be treated specially for ZONE_DEVICE.
>>
>> v2:
>> AS: merged this patch in linux 5.11 version
>>
>> v5:
>> AS: add condition at try_grab_page to check for the zone device type,
>> while
>> page ref counter is checked less/equal to zero. In case of device
>> zone, pages
>> ref counter are initialized to zero.
>>
>> Signed-off-by: Ralph Campbell 
>> Signed-off-by: Alex Sierra 
>> ---
>>   arch/powerpc/kvm/book3s_hv_uvmem.c |  2 +-
>>   drivers/gpu/drm/nouveau/nouveau_dmem.c |  2 +-
>>   fs/dax.c   |  4 +-
>>   include/linux/dax.h    |  2 +-
>>   include/linux/memremap.h   |  7 +--
>>   include/linux/mm.h | 13 +
>>   lib/test_hmm.c |  2 +-
>>   mm/internal.h  |  8 +++
>>   mm/memremap.c  | 68 +++---
>>   mm/migrate.c   |  5 --
>>   mm/page_alloc.c    |  3 ++
>>   mm/swap.c  | 45 ++---
>>   12 files changed, 46 insertions(+), 115 deletions(-)
>>
> I haven't seen a response to the issues I raised back at v3 of this
> series.
> https://lore.kernel.org/linux-mm/4f6dd918-d79b-1aa7-3a4c-caa67ddc2...@nvidia.com/
>
>
> Did I miss something?

I think part of the response was that we did more testing. Alex added
support for DEVICE_GENERIC pages to test_hmm and he ran DAX tests
recommended by Theodore Tso. In that testing he ran into a WARN_ON_ONCE
about a zero page refcount in try_get_page. The fix is in the latest
version of patch 2. But it's already obsolete because John Hubbard is
about to remove that function altogether.

I think the issues you raised were more uncertainty than known bugs. It
seems the fact that you can have DAX pages with 0 refcount is a feature
more than a bug.

Regards,
  Felix




Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount

2021-08-17 Thread Ralph Campbell

On 8/12/21 11:31 PM, Alex Sierra wrote:

From: Ralph Campbell 

ZONE_DEVICE struct pages have an extra reference count that complicates the
code for put_page() and several places in the kernel that need to check the
reference count to see that a page is not being used (gup, compaction,
migration, etc.). Clean up the code so the reference count doesn't need to
be treated specially for ZONE_DEVICE.

v2:
AS: merged this patch in linux 5.11 version

v5:
AS: add condition at try_grab_page to check for the zone device type, while
page ref counter is checked less/equal to zero. In case of device zone, pages
ref counter are initialized to zero.

Signed-off-by: Ralph Campbell 
Signed-off-by: Alex Sierra 
---
  arch/powerpc/kvm/book3s_hv_uvmem.c |  2 +-
  drivers/gpu/drm/nouveau/nouveau_dmem.c |  2 +-
  fs/dax.c   |  4 +-
  include/linux/dax.h|  2 +-
  include/linux/memremap.h   |  7 +--
  include/linux/mm.h | 13 +
  lib/test_hmm.c |  2 +-
  mm/internal.h  |  8 +++
  mm/memremap.c  | 68 +++---
  mm/migrate.c   |  5 --
  mm/page_alloc.c|  3 ++
  mm/swap.c  | 45 ++---
  12 files changed, 46 insertions(+), 115 deletions(-)


I haven't seen a response to the issues I raised back at v3 of this series.
https://lore.kernel.org/linux-mm/4f6dd918-d79b-1aa7-3a4c-caa67ddc2...@nvidia.com/

Did I miss something?



Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount

2021-08-16 Thread Felix Kuehling
Am 2021-08-15 um 4:40 p.m. schrieb John Hubbard:
> On 8/15/21 8:37 AM, Christoph Hellwig wrote:
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index 8ae31622deef..d48a1f0889d1 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -1218,7 +1218,7 @@ __maybe_unused struct page
>>> *try_grab_compound_head(struct page *page, int refs,
>>>   static inline __must_check bool try_get_page(struct page *page)
>>>   {
>>>   page = compound_head(page);
>>> -    if (WARN_ON_ONCE(page_ref_count(page) <= 0))
>>> +    if (WARN_ON_ONCE(page_ref_count(page) <
>>> (int)!is_zone_device_page(page)))
>>
>> Please avoid the overly long line.  In fact I'd be tempted to just not
>> bother here and keep the old, more lose check.  Especially given that
>> John has a patch ready that removes try_get_page entirely.
>>
>
> Yes. Andrew has accepted it into mmotm.
>
> Ralph's patch here was written well before my cleanup that removed
> try_grab_page() [1]. But now that we're here, if you drop this hunk then
> it will make merging easier, I think.
>
>
> [1]
> https://lore.kernel.org/r/20210813044133.1536842-4-jhubb...@nvidia.com

Hi John,

Thanks for the pointer. We'll drop this hunk and add a statement to our
patch description to highlight the dependency on your patch.

Regards,
  Felix


>
> thanks,
> -- 
> John Hubbard
> NVIDIA
>


Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount

2021-08-16 Thread Christoph Hellwig
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 8ae31622deef..d48a1f0889d1 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1218,7 +1218,7 @@ __maybe_unused struct page 
> *try_grab_compound_head(struct page *page, int refs,
>  static inline __must_check bool try_get_page(struct page *page)
>  {
>   page = compound_head(page);
> - if (WARN_ON_ONCE(page_ref_count(page) <= 0))
> + if (WARN_ON_ONCE(page_ref_count(page) < 
> (int)!is_zone_device_page(page)))

Please avoid the overly long line.  In fact I'd be tempted to just not
bother here and keep the old, more lose check.  Especially given that
John has a patch ready that removes try_get_page entirely.


Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount

2021-08-16 Thread John Hubbard

On 8/15/21 8:37 AM, Christoph Hellwig wrote:

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8ae31622deef..d48a1f0889d1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1218,7 +1218,7 @@ __maybe_unused struct page *try_grab_compound_head(struct 
page *page, int refs,
  static inline __must_check bool try_get_page(struct page *page)
  {
page = compound_head(page);
-   if (WARN_ON_ONCE(page_ref_count(page) <= 0))
+   if (WARN_ON_ONCE(page_ref_count(page) < 
(int)!is_zone_device_page(page)))


Please avoid the overly long line.  In fact I'd be tempted to just not
bother here and keep the old, more lose check.  Especially given that
John has a patch ready that removes try_get_page entirely.



Yes. Andrew has accepted it into mmotm.

Ralph's patch here was written well before my cleanup that removed
try_grab_page() [1]. But now that we're here, if you drop this hunk then
it will make merging easier, I think.


[1] https://lore.kernel.org/r/20210813044133.1536842-4-jhubb...@nvidia.com

thanks,
--
John Hubbard
NVIDIA


[PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount

2021-08-12 Thread Alex Sierra
From: Ralph Campbell 

ZONE_DEVICE struct pages have an extra reference count that complicates the
code for put_page() and several places in the kernel that need to check the
reference count to see that a page is not being used (gup, compaction,
migration, etc.). Clean up the code so the reference count doesn't need to
be treated specially for ZONE_DEVICE.

v2:
AS: merged this patch in linux 5.11 version

v5:
AS: add condition at try_grab_page to check for the zone device type, while
page ref counter is checked less/equal to zero. In case of device zone, pages
ref counter are initialized to zero.

Signed-off-by: Ralph Campbell 
Signed-off-by: Alex Sierra 
---
 arch/powerpc/kvm/book3s_hv_uvmem.c |  2 +-
 drivers/gpu/drm/nouveau/nouveau_dmem.c |  2 +-
 fs/dax.c   |  4 +-
 include/linux/dax.h|  2 +-
 include/linux/memremap.h   |  7 +--
 include/linux/mm.h | 13 +
 lib/test_hmm.c |  2 +-
 mm/internal.h  |  8 +++
 mm/memremap.c  | 68 +++---
 mm/migrate.c   |  5 --
 mm/page_alloc.c|  3 ++
 mm/swap.c  | 45 ++---
 12 files changed, 46 insertions(+), 115 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 84e5a2dc8be5..acee67710620 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -711,7 +711,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
 
dpage = pfn_to_page(uvmem_pfn);
dpage->zone_device_data = pvt;
-   get_page(dpage);
+   init_page_count(dpage);
lock_page(dpage);
return dpage;
 out_clear:
diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c 
b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 92987daa5e17..8bc7120e1216 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -324,7 +324,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm)
return NULL;
}
 
-   get_page(page);
+   init_page_count(page);
lock_page(page);
return page;
 }
diff --git a/fs/dax.c b/fs/dax.c
index c387d09e3e5a..1166630b7190 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -571,14 +571,14 @@ static void *grab_mapping_entry(struct xa_state *xas,
 
 /**
  * dax_layout_busy_page_range - find first pinned page in @mapping
- * @mapping: address space to scan for a page with ref count > 1
+ * @mapping: address space to scan for a page with ref count > 0
  * @start: Starting offset. Page containing 'start' is included.
  * @end: End offset. Page containing 'end' is included. If 'end' is LLONG_MAX,
  *   pages from 'start' till the end of file are included.
  *
  * DAX requires ZONE_DEVICE mapped pages. These pages are never
  * 'onlined' to the page allocator so they are considered idle when
- * page->count == 1. A filesystem uses this interface to determine if
+ * page->count == 0. A filesystem uses this interface to determine if
  * any page in the mapping is busy, i.e. for DMA, or other
  * get_user_pages() usages.
  *
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 8b5da1d60dbc..05fc982ce153 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -245,7 +245,7 @@ static inline bool dax_mapping(struct address_space 
*mapping)
 
 static inline bool dax_page_unused(struct page *page)
 {
-   return page_ref_count(page) == 1;
+   return page_ref_count(page) == 0;
 }
 
 #define dax_wait_page(_inode, _page, _wait_cb) \
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 45a79da89c5f..77ff5fd0685f 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -66,9 +66,10 @@ enum memory_type {
 
 struct dev_pagemap_ops {
/*
-* Called once the page refcount reaches 1.  (ZONE_DEVICE pages never
-* reach 0 refcount unless there is a refcount bug. This allows the
-* device driver to implement its own memory management.)
+* Called once the page refcount reaches 0. The reference count
+* should be reset to one with init_page_count(page) before reusing
+* the page. This allows the device driver to implement its own
+* memory management.
 */
void (*page_free)(struct page *page);
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8ae31622deef..d48a1f0889d1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1218,7 +1218,7 @@ __maybe_unused struct page *try_grab_compound_head(struct 
page *page, int refs,
 static inline __must_check bool try_get_page(struct page *page)
 {
page = compound_head(page);
-   if (WARN_ON_ONCE(page_ref_count(page) <= 0))
+   if (WARN_ON_ONCE(page_ref_count(page) < 
(int)!is_zone_device_page(page)))