Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount
On Thu, Aug 19, 2021 at 03:59:56PM -0400, Felix Kuehling wrote: > I got lost trying to understand how DAX counts page references and how > the PTE_SPECIAL option affects that. Theodore, can you help with this? > Is there an easy way to test without CONFIG_ARCH_HAS_PTE_SPECIAL on x86, > or do we need to test on a CPU architecture that doesn't support this > feature? I think the right answer is to simplify disallow ZONE_DEVICE pages if ARCH_HAS_PTE_SPECIAL is not supported. ARCH_HAS_PTE_SPECIAL is supported by all modern architecture ports than can make use of ZONE_DEVICE / dev_pagemap, so we can avoid this pocket of barely testable code entirely: diff --git a/mm/Kconfig b/mm/Kconfig index 40a9bfcd5062e1..2823bbfd1c8c70 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -775,6 +775,7 @@ config ZONE_DMA32 config ZONE_DEVICE bool "Device memory (pmem, HMM, etc...) hotplug support" + depends on ARCH_HAS_PTE_SPECIAL depends on MEMORY_HOTPLUG depends on MEMORY_HOTREMOVE depends on SPARSEMEM_VMEMMAP
Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount
On Wed, Aug 18, 2021 at 12:28:30PM -0700, Ralph Campbell wrote: > Did you test on a system without CONFIG_ARCH_HAS_PTE_SPECIAL defined? > In that case, mmap() of a DAX device will call insert_page() which calls > get_page() which would trigger VM_BUG_ON_PAGE(). __vm_insert_mixed still ends up calling insert_pfn for the !CASE_ARCH_HAS_PTE_SPECIAL if pfn_t_devmap() is true, which it should be for DAX. (and as said in my other mail, I suspect we should disallow that case anyway, as no one can test it in practice).
Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount
On Thu, Aug 19, 2021 at 11:00 AM Sierra Guiza, Alejandro (Alex) wrote: > > > On 8/18/2021 2:28 PM, Ralph Campbell wrote: > > On 8/17/21 5:35 PM, Felix Kuehling wrote: > >> Am 2021-08-17 um 8:01 p.m. schrieb Ralph Campbell: > >>> On 8/12/21 11:31 PM, Alex Sierra wrote: > From: Ralph Campbell > > ZONE_DEVICE struct pages have an extra reference count that > complicates the > code for put_page() and several places in the kernel that need to > check the > reference count to see that a page is not being used (gup, compaction, > migration, etc.). Clean up the code so the reference count doesn't > need to > be treated specially for ZONE_DEVICE. > > v2: > AS: merged this patch in linux 5.11 version > > v5: > AS: add condition at try_grab_page to check for the zone device type, > while > page ref counter is checked less/equal to zero. In case of device > zone, pages > ref counter are initialized to zero. > > Signed-off-by: Ralph Campbell > Signed-off-by: Alex Sierra > --- > arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- > drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- > fs/dax.c | 4 +- > include/linux/dax.h| 2 +- > include/linux/memremap.h | 7 +-- > include/linux/mm.h | 13 + > lib/test_hmm.c | 2 +- > mm/internal.h | 8 +++ > mm/memremap.c | 68 > +++--- > mm/migrate.c | 5 -- > mm/page_alloc.c| 3 ++ > mm/swap.c | 45 ++--- > 12 files changed, 46 insertions(+), 115 deletions(-) > > >>> I haven't seen a response to the issues I raised back at v3 of this > >>> series. > >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flinux-mm%2F4f6dd918-d79b-1aa7-3a4c-caa67ddc29bc%40nvidia.com%2F&data=04%7C01%7Calex.sierra%40amd.com%7Cd2bd2d4fbf764528540908d9627e5dcd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637649117156919772%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=P7FxYm%2BkJaCkMFa3OHtuKrPOn7SvytFRmYQdIzq7rN4%3D&reserved=0 > >>> > >>> > >>> > >>> Did I miss something? > >> I think part of the response was that we did more testing. Alex added > >> support for DEVICE_GENERIC pages to test_hmm and he ran DAX tests > >> recommended by Theodore Tso. In that testing he ran into a WARN_ON_ONCE > >> about a zero page refcount in try_get_page. The fix is in the latest > >> version of patch 2. But it's already obsolete because John Hubbard is > >> about to remove that function altogether. > >> > >> I think the issues you raised were more uncertainty than known bugs. It > >> seems the fact that you can have DAX pages with 0 refcount is a feature > >> more than a bug. > >> > >> Regards, > >>Felix > > > > Did you test on a system without CONFIG_ARCH_HAS_PTE_SPECIAL defined? > > In that case, mmap() of a DAX device will call insert_page() which calls > > get_page() which would trigger VM_BUG_ON_PAGE(). > > > > I can believe it is OK for PTE_SPECIAL page table entries to have no > > struct page or that MEMORY_DEVICE_GENERIC struct pages be mapped with > > a zero reference count using insert_pfn(). > Hi Ralph, > We have tried the DAX tests with and without CONFIG_ARCH_HAS_PTE_SPECIAL > defined. > Apparently none of the tests touches that condition for a DAX device. Of > course, > that doesn't mean it could happen. > > Regards, > Alex S. > > > > > > > I find it hard to believe that other MM developers don't see an issue > > with a struct page with refcount == 0 and mapcount == 1. > > > > I don't see where init_page_count() is being called for the > > MEMORY_DEVICE_GENERIC or MEMORY_DEVICE_PRIVATE struct pages the AMD > > driver allocates and passes to migrate_vma_setup(). > > Looks like svm_migrate_get_vram_page() needs to call init_page_count() > > instead of get_page(). (I'm looking at branch > > origin/alexsierrag/device_generic > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FRadeonOpenCompute%2FROCK-Kernel-Driver.git&data=04%7C01%7Calex.sierra%40amd.com%7Cd2bd2d4fbf764528540908d9627e5dcd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637649117156919772%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=IXe8HP2s8x5OdJdERBkGOYJCQk3iqCu5AYkwpDL8zec%3D&reserved=0) > Yes, you're right. My bad. Thanks for catching this up. I didn't realize > I was missing > to define CONFIG_DEBUG_VM on my build. Therefore this BUG was never caught. > It worked after I replaced get_pages by init_page_count at > svm_migrate_get_vram_page. However, I don't think this is the
Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount
Note that you do not want GUP to succeed on device page, i do not see where that is handled in the new code. On Sun, Aug 15, 2021 at 1:40 PM John Hubbard wrote: > > On 8/15/21 8:37 AM, Christoph Hellwig wrote: > >> diff --git a/include/linux/mm.h b/include/linux/mm.h > >> index 8ae31622deef..d48a1f0889d1 100644 > >> --- a/include/linux/mm.h > >> +++ b/include/linux/mm.h > >> @@ -1218,7 +1218,7 @@ __maybe_unused struct page > >> *try_grab_compound_head(struct page *page, int refs, > >> static inline __must_check bool try_get_page(struct page *page) > >> { > >> page = compound_head(page); > >> -if (WARN_ON_ONCE(page_ref_count(page) <= 0)) > >> +if (WARN_ON_ONCE(page_ref_count(page) < > >> (int)!is_zone_device_page(page))) > > > > Please avoid the overly long line. In fact I'd be tempted to just not > > bother here and keep the old, more lose check. Especially given that > > John has a patch ready that removes try_get_page entirely. > > > > Yes. Andrew has accepted it into mmotm. > > Ralph's patch here was written well before my cleanup that removed > try_grab_page() [1]. But now that we're here, if you drop this hunk then > it will make merging easier, I think. > > > [1] https://lore.kernel.org/r/20210813044133.1536842-4-jhubb...@nvidia.com > > thanks, > -- > John Hubbard > NVIDIA >
Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount
Am 2021-08-19 um 2:00 p.m. schrieb Sierra Guiza, Alejandro (Alex): > > On 8/18/2021 2:28 PM, Ralph Campbell wrote: >> On 8/17/21 5:35 PM, Felix Kuehling wrote: >>> Am 2021-08-17 um 8:01 p.m. schrieb Ralph Campbell: On 8/12/21 11:31 PM, Alex Sierra wrote: > From: Ralph Campbell > > ZONE_DEVICE struct pages have an extra reference count that > complicates the > code for put_page() and several places in the kernel that need to > check the > reference count to see that a page is not being used (gup, > compaction, > migration, etc.). Clean up the code so the reference count doesn't > need to > be treated specially for ZONE_DEVICE. > > v2: > AS: merged this patch in linux 5.11 version > > v5: > AS: add condition at try_grab_page to check for the zone device type, > while > page ref counter is checked less/equal to zero. In case of device > zone, pages > ref counter are initialized to zero. > > Signed-off-by: Ralph Campbell > Signed-off-by: Alex Sierra > --- > arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- > drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- > fs/dax.c | 4 +- > include/linux/dax.h | 2 +- > include/linux/memremap.h | 7 +-- > include/linux/mm.h | 13 + > lib/test_hmm.c | 2 +- > mm/internal.h | 8 +++ > mm/memremap.c | 68 > +++--- > mm/migrate.c | 5 -- > mm/page_alloc.c | 3 ++ > mm/swap.c | 45 ++--- > 12 files changed, 46 insertions(+), 115 deletions(-) > I haven't seen a response to the issues I raised back at v3 of this series. https://lore.kernel.org/linux-mm/4f6dd918-d79b-1aa7-3a4c-caa67ddc2...@nvidia.com/ Did I miss something? >>> I think part of the response was that we did more testing. Alex added >>> support for DEVICE_GENERIC pages to test_hmm and he ran DAX tests >>> recommended by Theodore Tso. In that testing he ran into a WARN_ON_ONCE >>> about a zero page refcount in try_get_page. The fix is in the latest >>> version of patch 2. But it's already obsolete because John Hubbard is >>> about to remove that function altogether. >>> >>> I think the issues you raised were more uncertainty than known bugs. It >>> seems the fact that you can have DAX pages with 0 refcount is a feature >>> more than a bug. >>> >>> Regards, >>> Felix >> >> Did you test on a system without CONFIG_ARCH_HAS_PTE_SPECIAL defined? >> In that case, mmap() of a DAX device will call insert_page() which calls >> get_page() which would trigger VM_BUG_ON_PAGE(). >> >> I can believe it is OK for PTE_SPECIAL page table entries to have no >> struct page or that MEMORY_DEVICE_GENERIC struct pages be mapped with >> a zero reference count using insert_pfn(). > Hi Ralph, > We have tried the DAX tests with and without > CONFIG_ARCH_HAS_PTE_SPECIAL defined. > Apparently none of the tests touches that condition for a DAX device. > Of course, > that doesn't mean it could happen. > > Regards, > Alex S. > >> >> >> I find it hard to believe that other MM developers don't see an issue >> with a struct page with refcount == 0 and mapcount == 1. >> >> I don't see where init_page_count() is being called for the >> MEMORY_DEVICE_GENERIC or MEMORY_DEVICE_PRIVATE struct pages the AMD >> driver allocates and passes to migrate_vma_setup(). >> Looks like svm_migrate_get_vram_page() needs to call init_page_count() >> instead of get_page(). (I'm looking at branch >> origin/alexsierrag/device_generic >> https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver.git > Yes, you're right. My bad. Thanks for catching this up. I didn't > realize I was missing > to define CONFIG_DEBUG_VM on my build. Therefore this BUG was never > caught. > It worked after I replaced get_pages by init_page_count at > svm_migrate_get_vram_page. However, I don't think this is the best way > to fix it. > Ideally, get_pages call should work for device pages with ref count > equal to 0 > too. Otherwise, we could overwrite refcounter if someone else is > grabbing the page > concurrently. I think using init_page_count in svm_migrate_get_vram_page is the right answer. This is where the page first gets allocated and initialized (data migrated into it). I think nobody should have or try to take a reference to the page before that. We should probably also add a VM_BUG_ON_PAGE(page_ref_count(page) != 0) before calling init_page_count to make sure of that. > I was thinking to add a special condition in get_pages for dev pages. > This could > also fix the insert_page -> get_page call from a DAX device. [+Theodore] I got lost trying to understand how DAX counts page references an
Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount
On 8/18/2021 2:28 PM, Ralph Campbell wrote: On 8/17/21 5:35 PM, Felix Kuehling wrote: Am 2021-08-17 um 8:01 p.m. schrieb Ralph Campbell: On 8/12/21 11:31 PM, Alex Sierra wrote: From: Ralph Campbell ZONE_DEVICE struct pages have an extra reference count that complicates the code for put_page() and several places in the kernel that need to check the reference count to see that a page is not being used (gup, compaction, migration, etc.). Clean up the code so the reference count doesn't need to be treated specially for ZONE_DEVICE. v2: AS: merged this patch in linux 5.11 version v5: AS: add condition at try_grab_page to check for the zone device type, while page ref counter is checked less/equal to zero. In case of device zone, pages ref counter are initialized to zero. Signed-off-by: Ralph Campbell Signed-off-by: Alex Sierra --- arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- fs/dax.c | 4 +- include/linux/dax.h | 2 +- include/linux/memremap.h | 7 +-- include/linux/mm.h | 13 + lib/test_hmm.c | 2 +- mm/internal.h | 8 +++ mm/memremap.c | 68 +++--- mm/migrate.c | 5 -- mm/page_alloc.c | 3 ++ mm/swap.c | 45 ++--- 12 files changed, 46 insertions(+), 115 deletions(-) I haven't seen a response to the issues I raised back at v3 of this series. https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flinux-mm%2F4f6dd918-d79b-1aa7-3a4c-caa67ddc29bc%40nvidia.com%2F&data=04%7C01%7Calex.sierra%40amd.com%7Cd2bd2d4fbf764528540908d9627e5dcd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637649117156919772%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=P7FxYm%2BkJaCkMFa3OHtuKrPOn7SvytFRmYQdIzq7rN4%3D&reserved=0 Did I miss something? I think part of the response was that we did more testing. Alex added support for DEVICE_GENERIC pages to test_hmm and he ran DAX tests recommended by Theodore Tso. In that testing he ran into a WARN_ON_ONCE about a zero page refcount in try_get_page. The fix is in the latest version of patch 2. But it's already obsolete because John Hubbard is about to remove that function altogether. I think the issues you raised were more uncertainty than known bugs. It seems the fact that you can have DAX pages with 0 refcount is a feature more than a bug. Regards, Felix Did you test on a system without CONFIG_ARCH_HAS_PTE_SPECIAL defined? In that case, mmap() of a DAX device will call insert_page() which calls get_page() which would trigger VM_BUG_ON_PAGE(). I can believe it is OK for PTE_SPECIAL page table entries to have no struct page or that MEMORY_DEVICE_GENERIC struct pages be mapped with a zero reference count using insert_pfn(). Hi Ralph, We have tried the DAX tests with and without CONFIG_ARCH_HAS_PTE_SPECIAL defined. Apparently none of the tests touches that condition for a DAX device. Of course, that doesn't mean it could happen. Regards, Alex S. I find it hard to believe that other MM developers don't see an issue with a struct page with refcount == 0 and mapcount == 1. I don't see where init_page_count() is being called for the MEMORY_DEVICE_GENERIC or MEMORY_DEVICE_PRIVATE struct pages the AMD driver allocates and passes to migrate_vma_setup(). Looks like svm_migrate_get_vram_page() needs to call init_page_count() instead of get_page(). (I'm looking at branch origin/alexsierrag/device_generic https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FRadeonOpenCompute%2FROCK-Kernel-Driver.git&data=04%7C01%7Calex.sierra%40amd.com%7Cd2bd2d4fbf764528540908d9627e5dcd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637649117156919772%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=IXe8HP2s8x5OdJdERBkGOYJCQk3iqCu5AYkwpDL8zec%3D&reserved=0) Yes, you're right. My bad. Thanks for catching this up. I didn't realize I was missing to define CONFIG_DEBUG_VM on my build. Therefore this BUG was never caught. It worked after I replaced get_pages by init_page_count at svm_migrate_get_vram_page. However, I don't think this is the best way to fix it. Ideally, get_pages call should work for device pages with ref count equal to 0 too. Otherwise, we could overwrite refcounter if someone else is grabbing the page concurrently. I was thinking to add a special condition in get_pages for dev pages. This could also fix the insert_page -> get_page call from a DAX device. Regards, Alex S. Also, what about the other places where is_device_private_page() is called? Don't they need to be updated to call is_device_page() instead? One of my goals for this patch was
Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount
On 8/17/21 5:35 PM, Felix Kuehling wrote: Am 2021-08-17 um 8:01 p.m. schrieb Ralph Campbell: On 8/12/21 11:31 PM, Alex Sierra wrote: From: Ralph Campbell ZONE_DEVICE struct pages have an extra reference count that complicates the code for put_page() and several places in the kernel that need to check the reference count to see that a page is not being used (gup, compaction, migration, etc.). Clean up the code so the reference count doesn't need to be treated specially for ZONE_DEVICE. v2: AS: merged this patch in linux 5.11 version v5: AS: add condition at try_grab_page to check for the zone device type, while page ref counter is checked less/equal to zero. In case of device zone, pages ref counter are initialized to zero. Signed-off-by: Ralph Campbell Signed-off-by: Alex Sierra --- arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- fs/dax.c | 4 +- include/linux/dax.h | 2 +- include/linux/memremap.h | 7 +-- include/linux/mm.h | 13 + lib/test_hmm.c | 2 +- mm/internal.h | 8 +++ mm/memremap.c | 68 +++--- mm/migrate.c | 5 -- mm/page_alloc.c | 3 ++ mm/swap.c | 45 ++--- 12 files changed, 46 insertions(+), 115 deletions(-) I haven't seen a response to the issues I raised back at v3 of this series. https://lore.kernel.org/linux-mm/4f6dd918-d79b-1aa7-3a4c-caa67ddc2...@nvidia.com/ Did I miss something? I think part of the response was that we did more testing. Alex added support for DEVICE_GENERIC pages to test_hmm and he ran DAX tests recommended by Theodore Tso. In that testing he ran into a WARN_ON_ONCE about a zero page refcount in try_get_page. The fix is in the latest version of patch 2. But it's already obsolete because John Hubbard is about to remove that function altogether. I think the issues you raised were more uncertainty than known bugs. It seems the fact that you can have DAX pages with 0 refcount is a feature more than a bug. Regards, Felix Did you test on a system without CONFIG_ARCH_HAS_PTE_SPECIAL defined? In that case, mmap() of a DAX device will call insert_page() which calls get_page() which would trigger VM_BUG_ON_PAGE(). I can believe it is OK for PTE_SPECIAL page table entries to have no struct page or that MEMORY_DEVICE_GENERIC struct pages be mapped with a zero reference count using insert_pfn(). I find it hard to believe that other MM developers don't see an issue with a struct page with refcount == 0 and mapcount == 1. I don't see where init_page_count() is being called for the MEMORY_DEVICE_GENERIC or MEMORY_DEVICE_PRIVATE struct pages the AMD driver allocates and passes to migrate_vma_setup(). Looks like svm_migrate_get_vram_page() needs to call init_page_count() instead of get_page(). (I'm looking at branch origin/alexsierrag/device_generic https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver.git) Also, what about the other places where is_device_private_page() is called? Don't they need to be updated to call is_device_page() instead? One of my goals for this patch was to remove special casing reference counts for ZONE_DEVICE pages in rmap.c, etc. I still think this patch needs an ACK from a FS/DAX maintainer.
Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount
Am 2021-08-17 um 8:01 p.m. schrieb Ralph Campbell: > On 8/12/21 11:31 PM, Alex Sierra wrote: >> From: Ralph Campbell >> >> ZONE_DEVICE struct pages have an extra reference count that >> complicates the >> code for put_page() and several places in the kernel that need to >> check the >> reference count to see that a page is not being used (gup, compaction, >> migration, etc.). Clean up the code so the reference count doesn't >> need to >> be treated specially for ZONE_DEVICE. >> >> v2: >> AS: merged this patch in linux 5.11 version >> >> v5: >> AS: add condition at try_grab_page to check for the zone device type, >> while >> page ref counter is checked less/equal to zero. In case of device >> zone, pages >> ref counter are initialized to zero. >> >> Signed-off-by: Ralph Campbell >> Signed-off-by: Alex Sierra >> --- >> arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- >> drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- >> fs/dax.c | 4 +- >> include/linux/dax.h | 2 +- >> include/linux/memremap.h | 7 +-- >> include/linux/mm.h | 13 + >> lib/test_hmm.c | 2 +- >> mm/internal.h | 8 +++ >> mm/memremap.c | 68 +++--- >> mm/migrate.c | 5 -- >> mm/page_alloc.c | 3 ++ >> mm/swap.c | 45 ++--- >> 12 files changed, 46 insertions(+), 115 deletions(-) >> > I haven't seen a response to the issues I raised back at v3 of this > series. > https://lore.kernel.org/linux-mm/4f6dd918-d79b-1aa7-3a4c-caa67ddc2...@nvidia.com/ > > > Did I miss something? I think part of the response was that we did more testing. Alex added support for DEVICE_GENERIC pages to test_hmm and he ran DAX tests recommended by Theodore Tso. In that testing he ran into a WARN_ON_ONCE about a zero page refcount in try_get_page. The fix is in the latest version of patch 2. But it's already obsolete because John Hubbard is about to remove that function altogether. I think the issues you raised were more uncertainty than known bugs. It seems the fact that you can have DAX pages with 0 refcount is a feature more than a bug. Regards, Felix
Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount
On 8/12/21 11:31 PM, Alex Sierra wrote: From: Ralph Campbell ZONE_DEVICE struct pages have an extra reference count that complicates the code for put_page() and several places in the kernel that need to check the reference count to see that a page is not being used (gup, compaction, migration, etc.). Clean up the code so the reference count doesn't need to be treated specially for ZONE_DEVICE. v2: AS: merged this patch in linux 5.11 version v5: AS: add condition at try_grab_page to check for the zone device type, while page ref counter is checked less/equal to zero. In case of device zone, pages ref counter are initialized to zero. Signed-off-by: Ralph Campbell Signed-off-by: Alex Sierra --- arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- fs/dax.c | 4 +- include/linux/dax.h| 2 +- include/linux/memremap.h | 7 +-- include/linux/mm.h | 13 + lib/test_hmm.c | 2 +- mm/internal.h | 8 +++ mm/memremap.c | 68 +++--- mm/migrate.c | 5 -- mm/page_alloc.c| 3 ++ mm/swap.c | 45 ++--- 12 files changed, 46 insertions(+), 115 deletions(-) I haven't seen a response to the issues I raised back at v3 of this series. https://lore.kernel.org/linux-mm/4f6dd918-d79b-1aa7-3a4c-caa67ddc2...@nvidia.com/ Did I miss something?
Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount
Am 2021-08-15 um 4:40 p.m. schrieb John Hubbard: > On 8/15/21 8:37 AM, Christoph Hellwig wrote: >>> diff --git a/include/linux/mm.h b/include/linux/mm.h >>> index 8ae31622deef..d48a1f0889d1 100644 >>> --- a/include/linux/mm.h >>> +++ b/include/linux/mm.h >>> @@ -1218,7 +1218,7 @@ __maybe_unused struct page >>> *try_grab_compound_head(struct page *page, int refs, >>> static inline __must_check bool try_get_page(struct page *page) >>> { >>> page = compound_head(page); >>> - if (WARN_ON_ONCE(page_ref_count(page) <= 0)) >>> + if (WARN_ON_ONCE(page_ref_count(page) < >>> (int)!is_zone_device_page(page))) >> >> Please avoid the overly long line. In fact I'd be tempted to just not >> bother here and keep the old, more lose check. Especially given that >> John has a patch ready that removes try_get_page entirely. >> > > Yes. Andrew has accepted it into mmotm. > > Ralph's patch here was written well before my cleanup that removed > try_grab_page() [1]. But now that we're here, if you drop this hunk then > it will make merging easier, I think. > > > [1] > https://lore.kernel.org/r/20210813044133.1536842-4-jhubb...@nvidia.com Hi John, Thanks for the pointer. We'll drop this hunk and add a statement to our patch description to highlight the dependency on your patch. Regards, Felix > > thanks, > -- > John Hubbard > NVIDIA >
Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount
> diff --git a/include/linux/mm.h b/include/linux/mm.h > index 8ae31622deef..d48a1f0889d1 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1218,7 +1218,7 @@ __maybe_unused struct page > *try_grab_compound_head(struct page *page, int refs, > static inline __must_check bool try_get_page(struct page *page) > { > page = compound_head(page); > - if (WARN_ON_ONCE(page_ref_count(page) <= 0)) > + if (WARN_ON_ONCE(page_ref_count(page) < > (int)!is_zone_device_page(page))) Please avoid the overly long line. In fact I'd be tempted to just not bother here and keep the old, more lose check. Especially given that John has a patch ready that removes try_get_page entirely.
Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount
On 8/15/21 8:37 AM, Christoph Hellwig wrote: diff --git a/include/linux/mm.h b/include/linux/mm.h index 8ae31622deef..d48a1f0889d1 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1218,7 +1218,7 @@ __maybe_unused struct page *try_grab_compound_head(struct page *page, int refs, static inline __must_check bool try_get_page(struct page *page) { page = compound_head(page); - if (WARN_ON_ONCE(page_ref_count(page) <= 0)) + if (WARN_ON_ONCE(page_ref_count(page) < (int)!is_zone_device_page(page))) Please avoid the overly long line. In fact I'd be tempted to just not bother here and keep the old, more lose check. Especially given that John has a patch ready that removes try_get_page entirely. Yes. Andrew has accepted it into mmotm. Ralph's patch here was written well before my cleanup that removed try_grab_page() [1]. But now that we're here, if you drop this hunk then it will make merging easier, I think. [1] https://lore.kernel.org/r/20210813044133.1536842-4-jhubb...@nvidia.com thanks, -- John Hubbard NVIDIA
[PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount
From: Ralph Campbell ZONE_DEVICE struct pages have an extra reference count that complicates the code for put_page() and several places in the kernel that need to check the reference count to see that a page is not being used (gup, compaction, migration, etc.). Clean up the code so the reference count doesn't need to be treated specially for ZONE_DEVICE. v2: AS: merged this patch in linux 5.11 version v5: AS: add condition at try_grab_page to check for the zone device type, while page ref counter is checked less/equal to zero. In case of device zone, pages ref counter are initialized to zero. Signed-off-by: Ralph Campbell Signed-off-by: Alex Sierra --- arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- fs/dax.c | 4 +- include/linux/dax.h| 2 +- include/linux/memremap.h | 7 +-- include/linux/mm.h | 13 + lib/test_hmm.c | 2 +- mm/internal.h | 8 +++ mm/memremap.c | 68 +++--- mm/migrate.c | 5 -- mm/page_alloc.c| 3 ++ mm/swap.c | 45 ++--- 12 files changed, 46 insertions(+), 115 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c index 84e5a2dc8be5..acee67710620 100644 --- a/arch/powerpc/kvm/book3s_hv_uvmem.c +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c @@ -711,7 +711,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm) dpage = pfn_to_page(uvmem_pfn); dpage->zone_device_data = pvt; - get_page(dpage); + init_page_count(dpage); lock_page(dpage); return dpage; out_clear: diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c index 92987daa5e17..8bc7120e1216 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c @@ -324,7 +324,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm) return NULL; } - get_page(page); + init_page_count(page); lock_page(page); return page; } diff --git a/fs/dax.c b/fs/dax.c index c387d09e3e5a..1166630b7190 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -571,14 +571,14 @@ static void *grab_mapping_entry(struct xa_state *xas, /** * dax_layout_busy_page_range - find first pinned page in @mapping - * @mapping: address space to scan for a page with ref count > 1 + * @mapping: address space to scan for a page with ref count > 0 * @start: Starting offset. Page containing 'start' is included. * @end: End offset. Page containing 'end' is included. If 'end' is LLONG_MAX, * pages from 'start' till the end of file are included. * * DAX requires ZONE_DEVICE mapped pages. These pages are never * 'onlined' to the page allocator so they are considered idle when - * page->count == 1. A filesystem uses this interface to determine if + * page->count == 0. A filesystem uses this interface to determine if * any page in the mapping is busy, i.e. for DMA, or other * get_user_pages() usages. * diff --git a/include/linux/dax.h b/include/linux/dax.h index 8b5da1d60dbc..05fc982ce153 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -245,7 +245,7 @@ static inline bool dax_mapping(struct address_space *mapping) static inline bool dax_page_unused(struct page *page) { - return page_ref_count(page) == 1; + return page_ref_count(page) == 0; } #define dax_wait_page(_inode, _page, _wait_cb) \ diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 45a79da89c5f..77ff5fd0685f 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -66,9 +66,10 @@ enum memory_type { struct dev_pagemap_ops { /* -* Called once the page refcount reaches 1. (ZONE_DEVICE pages never -* reach 0 refcount unless there is a refcount bug. This allows the -* device driver to implement its own memory management.) +* Called once the page refcount reaches 0. The reference count +* should be reset to one with init_page_count(page) before reusing +* the page. This allows the device driver to implement its own +* memory management. */ void (*page_free)(struct page *page); diff --git a/include/linux/mm.h b/include/linux/mm.h index 8ae31622deef..d48a1f0889d1 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1218,7 +1218,7 @@ __maybe_unused struct page *try_grab_compound_head(struct page *page, int refs, static inline __must_check bool try_get_page(struct page *page) { page = compound_head(page); - if (WARN_ON_ONCE(page_ref_count(page) <= 0)) + if (WARN_ON_ONCE(page_ref_count(page) < (int)!is_zone_device_page(page)))