Re: [Nouveau] [PATCH 2/7] mm: Free device private pages have zero refcount

2022-09-29 Thread Dan Williams
Alistair Popple wrote:
> 
> Dan Williams  writes:
> 
> > Alistair Popple wrote:
> >>
> >> Jason Gunthorpe  writes:
> >>
> >> > On Mon, Sep 26, 2022 at 04:03:06PM +1000, Alistair Popple wrote:
> >> >> Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page
> >> >> refcount") device private pages have no longer had an extra reference
> >> >> count when the page is in use. However before handing them back to the
> >> >> owning device driver we add an extra reference count such that free
> >> >> pages have a reference count of one.
> >> >>
> >> >> This makes it difficult to tell if a page is free or not because both
> >> >> free and in use pages will have a non-zero refcount. Instead we should
> >> >> return pages to the drivers page allocator with a zero reference count.
> >> >> Kernel code can then safely use kernel functions such as
> >> >> get_page_unless_zero().
> >> >>
> >> >> Signed-off-by: Alistair Popple 
> >> >> ---
> >> >>  arch/powerpc/kvm/book3s_hv_uvmem.c   | 1 +
> >> >>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 1 +
> >> >>  drivers/gpu/drm/nouveau/nouveau_dmem.c   | 1 +
> >> >>  lib/test_hmm.c   | 1 +
> >> >>  mm/memremap.c| 5 -
> >> >>  mm/page_alloc.c  | 6 ++
> >> >>  6 files changed, 10 insertions(+), 5 deletions(-)
> >> >
> >> > I think this is a great idea, but I'm surprised no dax stuff is
> >> > touched here?
> >>
> >> free_zone_device_page() shouldn't be called for pgmap->type ==
> >> MEMORY_DEVICE_FS_DAX so I don't think we should have to worry about DAX
> >> there. Except that the folio code looks like it might have introduced a
> >> bug. AFAICT put_page() always calls
> >> put_devmap_managed_page(&folio->page) but folio_put() does not (although
> >> folios_put() does!). So it seems folio_put() won't end up calling
> >> __put_devmap_managed_page_refs() as I think it should.
> >>
> >> I think you're right about the change to __init_zone_device_page() - I
> >> should limit it to DEVICE_PRIVATE/COHERENT pages only. But I need to
> >> look at Dan's patch series more closely as I suspect it might be better
> >> to rebase this patch on top of that.
> >
> > Apologies for the delay I was travelling the past few days. Yes, I think
> > this patch slots in nicely to avoid the introduction of an init_mode
> > [1]:
> >
> > https://lore.kernel.org/nvdimm/166329940343.2786261.6047770378829215962.st...@dwillia2-xfh.jf.intel.com/
> >
> > Mind if I steal it into my series?
> 
> No problem, although I notice Andrew has already merged it into
> mm-unstable. If you end up rebasing your series on top of mine I think
> all that's needed is a patch somewhere in your series to drop the
> various `if (pgmap->type == MEMORY_DEVICE_*)` I added to (hopefully)
> avoid breaking DAX. Assuming DAX takes a pagemap reference on struct
> page allocation something like below.

Yeah, I'll go that route and rebase on top of -mm.

Thanks again.


Re: [Nouveau] [PATCH 2/7] mm: Free device private pages have zero refcount

2022-09-29 Thread Dan Williams
Alistair Popple wrote:
> 
> Jason Gunthorpe  writes:
> 
> > On Mon, Sep 26, 2022 at 04:03:06PM +1000, Alistair Popple wrote:
> >> Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page
> >> refcount") device private pages have no longer had an extra reference
> >> count when the page is in use. However before handing them back to the
> >> owning device driver we add an extra reference count such that free
> >> pages have a reference count of one.
> >>
> >> This makes it difficult to tell if a page is free or not because both
> >> free and in use pages will have a non-zero refcount. Instead we should
> >> return pages to the drivers page allocator with a zero reference count.
> >> Kernel code can then safely use kernel functions such as
> >> get_page_unless_zero().
> >>
> >> Signed-off-by: Alistair Popple 
> >> ---
> >>  arch/powerpc/kvm/book3s_hv_uvmem.c   | 1 +
> >>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 1 +
> >>  drivers/gpu/drm/nouveau/nouveau_dmem.c   | 1 +
> >>  lib/test_hmm.c   | 1 +
> >>  mm/memremap.c| 5 -
> >>  mm/page_alloc.c  | 6 ++
> >>  6 files changed, 10 insertions(+), 5 deletions(-)
> >
> > I think this is a great idea, but I'm surprised no dax stuff is
> > touched here?
> 
> free_zone_device_page() shouldn't be called for pgmap->type ==
> MEMORY_DEVICE_FS_DAX so I don't think we should have to worry about DAX
> there. Except that the folio code looks like it might have introduced a
> bug. AFAICT put_page() always calls
> put_devmap_managed_page(&folio->page) but folio_put() does not (although
> folios_put() does!). So it seems folio_put() won't end up calling
> __put_devmap_managed_page_refs() as I think it should.
> 
> I think you're right about the change to __init_zone_device_page() - I
> should limit it to DEVICE_PRIVATE/COHERENT pages only. But I need to
> look at Dan's patch series more closely as I suspect it might be better
> to rebase this patch on top of that.

Apologies for the delay I was travelling the past few days. Yes, I think
this patch slots in nicely to avoid the introduction of an init_mode
[1]:

https://lore.kernel.org/nvdimm/166329940343.2786261.6047770378829215962.st...@dwillia2-xfh.jf.intel.com/

Mind if I steal it into my series?


Re: [Nouveau] [PATCH v2 16/28] resource: Introduce alloc_free_mem_region()

2022-07-21 Thread Dan Williams
[ add dri-devel and nouveau ]

Dan Williams wrote:
> The core of devm_request_free_mem_region() is a helper that searches for
> free space in iomem_resource and performs __request_region_locked() on
> the result of that search. The policy choices of the implementation
> conform to what CONFIG_DEVICE_PRIVATE users want which is memory that is
> immediately marked busy, and a preference to search for the first-fit
> free range in descending order from the top of the physical address
> space.
> 
> CXL has a need for a similar allocator, but with the following tweaks:
> 
> 1/ Search for free space in ascending order
> 
> 2/ Search for free space relative to a given CXL window
> 
> 3/ 'insert' rather than 'request' the new resource given downstream
>drivers from the CXL Region driver (like the pmem or dax drivers) are
>responsible for request_mem_region() when they activate the memory
>range.
> 
> Rework __request_free_mem_region() into get_free_mem_region() which
> takes a set of GFR_* (Get Free Region) flags to control the allocation
> policy (ascending vs descending), and "busy" policy (insert_resource()
> vs request_region()).
> 
> As part of the consolidation of the legacy GFR_REQUEST_REGION case with
> the new default of just inserting a new resource into the free space
> some minor cleanups like not checking for NULL before calling
> devres_free() (which does its own check) is included.
> 
> Suggested-by: Jason Gunthorpe 
> Link: https://lore.kernel.org/linux-cxl/20220420143406.gy2120...@nvidia.com/
> Cc: Matthew Wilcox 
> Cc: Christoph Hellwig 

Jason, Christoph, anyone else that depends on CONFIG_DEVICE_PRIVATE,

Just a heads up that with Jonathan's review I am going to proceed with
pushing this change to linux-next. Please holler if
CONFIG_DEVICE_PRIVATE starts misbehaving, or if you have other feedback,
and I will get it fixed up.

> Signed-off-by: Dan Williams 
> ---
>  include/linux/ioport.h |2 +
>  kernel/resource.c  |  178 
> +++-
>  mm/Kconfig |5 +
>  3 files changed, 150 insertions(+), 35 deletions(-)
> 
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index 79d1ad6d6275..616b683563a9 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -330,6 +330,8 @@ struct resource *devm_request_free_mem_region(struct 
> device *dev,
>   struct resource *base, unsigned long size);
>  struct resource *request_free_mem_region(struct resource *base,
>   unsigned long size, const char *name);
> +struct resource *alloc_free_mem_region(struct resource *base,
> + unsigned long size, unsigned long align, const char *name);
>  
>  static inline void irqresource_disabled(struct resource *res, u32 irq)
>  {
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 53a534db350e..4c5e80b92f2f 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -489,8 +489,9 @@ int __weak page_is_ram(unsigned long pfn)
>  }
>  EXPORT_SYMBOL_GPL(page_is_ram);
>  
> -static int __region_intersects(resource_size_t start, size_t size,
> - unsigned long flags, unsigned long desc)
> +static int __region_intersects(struct resource *parent, resource_size_t 
> start,
> +size_t size, unsigned long flags,
> +unsigned long desc)
>  {
>   struct resource res;
>   int type = 0; int other = 0;
> @@ -499,7 +500,7 @@ static int __region_intersects(resource_size_t start, 
> size_t size,
>   res.start = start;
>   res.end = start + size - 1;
>  
> - for (p = iomem_resource.child; p ; p = p->sibling) {
> + for (p = parent->child; p ; p = p->sibling) {
>   bool is_type = (((p->flags & flags) == flags) &&
>   ((desc == IORES_DESC_NONE) ||
>(desc == p->desc)));
> @@ -543,7 +544,7 @@ int region_intersects(resource_size_t start, size_t size, 
> unsigned long flags,
>   int ret;
>  
>   read_lock(&resource_lock);
> - ret = __region_intersects(start, size, flags, desc);
> + ret = __region_intersects(&iomem_resource, start, size, flags, desc);
>   read_unlock(&resource_lock);
>  
>   return ret;
> @@ -1780,62 +1781,139 @@ void resource_list_free(struct list_head *head)
>  }
>  EXPORT_SYMBOL(resource_list_free);
>  
> -#ifdef CONFIG_DEVICE_PRIVATE
> -static struct resource *__request_free_mem_region(struct device *dev,
> - struct resource *base, unsigned long size, const char *name)
> +#ifdef CONFIG_GET_FREE_REGION
> +#defi

Re: [Nouveau] [PATCH 7/8] mm: remove the extra ZONE_DEVICE struct page refcount

2022-02-08 Thread Dan Williams
On Sun, Feb 6, 2022 at 10:33 PM Christoph Hellwig  wrote:
[..]
> @@ -500,28 +482,27 @@ void free_devmap_managed_page(struct page *page)
>  */
> page->mapping = NULL;
> page->pgmap->ops->page_free(page);
> +
> +   /*
> +* Reset the page count to 1 to prepare for handing out the page 
> again.
> +*/
> +   set_page_count(page, 1);

Interesting. I had expected that to really fix the refcount problem
that fs/dax.c would need to start taking real page references as pages
were added to a mapping, just like page cache.

This looks ok to me, and passes my tests. So given I'm still working
my way back to fixing the references properly I'm ok for this hack to
replace the more broken hack that is there presently.

Reviewed-by: Dan Williams 


Re: [Nouveau] [PATCH 6/8] mm: don't include in

2022-02-08 Thread Dan Williams
On Mon, Feb 7, 2022 at 3:49 PM Dan Williams  wrote:
>
> On Sun, Feb 6, 2022 at 10:33 PM Christoph Hellwig  wrote:
> >
> > Move the check for the actual pgmap types that need the free at refcount
> > one behavior into the out of line helper, and thus avoid the need to
> > pull memremap.h into mm.h.
>
> Looks good to me assuming the compile bots agree.
>
> Reviewed-by: Dan Williams 

Yeah, same as Logan:

mm/memcontrol.c: In function ‘get_mctgt_type’:
mm/memcontrol.c:5724:29: error: implicit declaration of function
‘is_device_private_page’; did you mean
‘is_device_private_entry’? [-Werror=implicit-function-declaration]
 5724 | if (is_device_private_page(page))
  | ^~
  | is_device_private_entry

...needs:

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d1e97a54ae53..0ac7515c85f9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -62,6 +62,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 #include 
 #include 


Re: [Nouveau] [PATCH 6/8] mm: don't include in

2022-02-07 Thread Dan Williams
On Sun, Feb 6, 2022 at 10:33 PM Christoph Hellwig  wrote:
>
> Move the check for the actual pgmap types that need the free at refcount
> one behavior into the out of line helper, and thus avoid the need to
> pull memremap.h into mm.h.

Looks good to me assuming the compile bots agree.

Reviewed-by: Dan Williams 

>
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/arm64/mm/mmu.c|  1 +
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h  |  1 +
>  drivers/gpu/drm/drm_cache.c|  2 +-
>  drivers/gpu/drm/nouveau/nouveau_dmem.c |  1 +
>  drivers/gpu/drm/nouveau/nouveau_svm.c  |  1 +
>  drivers/infiniband/core/rw.c   |  1 +
>  drivers/nvdimm/pmem.h  |  1 +
>  drivers/nvme/host/pci.c|  1 +
>  drivers/nvme/target/io-cmd-bdev.c  |  1 +
>  fs/fuse/virtio_fs.c|  1 +
>  include/linux/memremap.h   | 18 ++
>  include/linux/mm.h | 20 
>  lib/test_hmm.c |  1 +
>  mm/memremap.c  |  6 +-
>  14 files changed, 34 insertions(+), 22 deletions(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index acfae9b41cc8c9..580abae6c0b93f 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -17,6 +17,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index ea68f3b3a4e9cb..6d643b4b791d87 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -25,6 +25,7 @@
>
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
> index f19d9acbe95936..50b8a088f763a6 100644
> --- a/drivers/gpu/drm/drm_cache.c
> +++ b/drivers/gpu/drm/drm_cache.c
> @@ -27,11 +27,11 @@
>  /*
>   * Authors: Thomas Hellström 
>   */
> -
>  #include 
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>
>  #include 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c 
> b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> index e886a3b9e08c7d..a5cdfbe32b5e54 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> @@ -39,6 +39,7 @@
>
>  #include 
>  #include 
> +#include 
>  #include 
>
>  /*
> diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c 
> b/drivers/gpu/drm/nouveau/nouveau_svm.c
> index 266809e511e2c1..090b9b47708cca 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_svm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
> @@ -35,6 +35,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>
>  struct nouveau_svm {
> diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
> index 5a3bd41b331c93..4d98f931a13ddd 100644
> --- a/drivers/infiniband/core/rw.c
> +++ b/drivers/infiniband/core/rw.c
> @@ -2,6 +2,7 @@
>  /*
>   * Copyright (c) 2016 HGST, a Western Digital Company.
>   */
> +#include 
>  #include 
>  #include 
>  #include 
> diff --git a/drivers/nvdimm/pmem.h b/drivers/nvdimm/pmem.h
> index 59cfe13ea8a85c..1f51a23614299b 100644
> --- a/drivers/nvdimm/pmem.h
> +++ b/drivers/nvdimm/pmem.h
> @@ -3,6 +3,7 @@
>  #define __NVDIMM_PMEM_H__
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 6a99ed68091589..ab15bc72710dbe 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -15,6 +15,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> diff --git a/drivers/nvme/target/io-cmd-bdev.c 
> b/drivers/nvme/target/io-cmd-bdev.c
> index 70ca9dfc1771a9..a141446db1bea3 100644
> --- a/drivers/nvme/target/io-cmd-bdev.c
> +++ b/drivers/nvme/target/io-cmd-bdev.c
> @@ -6,6 +6,7 @@
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>  #include 
>  #include 
> +#include 
>  #include 
>  #include "nvmet.h"
>
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index 9d737904d07c0b..86b7dbb6a0d43e 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
> @@ -8,6 +8,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index 1fafcc38acbad6..514ab46f597e5c 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -1,6 +1,8

Re: [Nouveau] [PATCH 5/8] mm: simplify freeing of devmap managed pages

2022-02-07 Thread Dan Williams
On Sun, Feb 6, 2022 at 10:33 PM Christoph Hellwig  wrote:
>
> Make put_devmap_managed_page return if it took charge of the page
> or not and remove the separate page_is_devmap_managed helper.

Looks good to me:

Reviewed-by: Dan Williams 


Re: [Nouveau] [PATCH 4/8] mm: move free_devmap_managed_page to memremap.c

2022-02-07 Thread Dan Williams
On Sun, Feb 6, 2022 at 10:33 PM Christoph Hellwig  wrote:
>
> free_devmap_managed_page has nothing to do with the code in swap.c,
> move it to live with the rest of the code for devmap handling.
>

Looks good.

Reviewed-by: Dan Williams 


Re: [Nouveau] [PATCH 2/8] mm: remove the __KERNEL__ guard from

2022-02-07 Thread Dan Williams
On Sun, Feb 6, 2022 at 10:33 PM Christoph Hellwig  wrote:
>
> __KERNEL__ ifdefs don't make sense outside of include/uapi/.

Yes.

Reviewed-by: Dan Williams 


Re: [Nouveau] [PATCH 1/8] mm: remove a pointless CONFIG_ZONE_DEVICE check in memremap_pages

2022-02-07 Thread Dan Williams
On Sun, Feb 6, 2022 at 10:33 PM Christoph Hellwig  wrote:
>
> memremap.c is only built when CONFIG_ZONE_DEVICE is set, so remove
> the superflous extra check.

Looks good to me.

Reviewed-by: Dan Williams 


Re: [Nouveau] [RFC] treewide: cleanup unreachable breaks

2020-10-17 Thread Dan Williams
On Sat, Oct 17, 2020 at 9:10 AM  wrote:
>
> From: Tom Rix 
>
> This is a upcoming change to clean up a new warning treewide.
> I am wondering if the change could be one mega patch (see below) or
> normal patch per file about 100 patches or somewhere half way by collecting
> early acks.
>
> clang has a number of useful, new warnings see
> https://clang.llvm.org/docs/DiagnosticsReference.html
>
> This change cleans up -Wunreachable-code-break
> https://clang.llvm.org/docs/DiagnosticsReference.html#wunreachable-code-break
> for 266 of 485 warnings in this week's linux-next, allyesconfig on x86_64.
>
> The method of fixing was to look for warnings where the preceding statement
> was a simple statement and by inspection made the subsequent break unneeded.
> In order of frequency these look like
>
> return and break
>
> switch (c->x86_vendor) {
> case X86_VENDOR_INTEL:
> intel_p5_mcheck_init(c);
> return 1;
> -   break;
>
> goto and break
>
> default:
> operation = 0; /* make gcc happy */
> goto fail_response;
> -   break;
>
> break and break
> case COLOR_SPACE_SRGB:
> /* by pass */
> REG_SET(OUTPUT_CSC_CONTROL, 0,
> OUTPUT_CSC_GRPH_MODE, 0);
> break;
> -   break;
>
> The exception to the simple statement, is a switch case with a block
> and the end of block is a return
>
> struct obj_buffer *buff = r->ptr;
> return scnprintf(str, PRIV_STR_SIZE,
> "size=%u\naddr=0x%X\n", buff->size,
> buff->addr);
> }
> -   break;
>
> Not considered obvious and excluded, breaks after
> multi level switches
> complicated if-else if-else blocks
> panic() or similar calls
>
> And there is an odd addition of a 'fallthrough' in drivers/tty/nozomi.c
[..]
> diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
> index 5a7c80053c62..2f250874b1a4 100644
> --- a/drivers/nvdimm/claim.c
> +++ b/drivers/nvdimm/claim.c
> @@ -200,11 +200,10 @@ ssize_t nd_namespace_store(struct device *dev,
> }
> break;
> default:
> len = -EBUSY;
> goto out_attach;
> -   break;
> }

Acked-by: Dan Williams 
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH v2] mm/hmm: make device private reference counts zero based

2020-10-12 Thread Dan Williams
On Mon, Oct 12, 2020 at 10:46 AM Ralph Campbell  wrote:
>
> ZONE_DEVICE struct pages have an extra reference count that complicates the
> code for put_page() and several places in the kernel that need to check the
> reference count to see that a page is not being used (gup, compaction,
> migration, etc.). Clean up the code so the reference count doesn't need to
> be treated specially for device private pages, leaving DAX as still being
> a special case.

Please no half-step to removing the special casing...

[..]
> +void free_zone_device_page(struct page *page)
> +{
> +   if (!is_device_private_page(page))
> return;

That seems too subtle to be acceptable to me. All ZONE_DEVICE pages
need to have the same relationship with respect to idle-ness and the
page reference count.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [RFC PATCH v3 0/2] mm: remove extra ZONE_DEVICE struct page refcount

2020-10-05 Thread Dan Williams
On Thu, Oct 1, 2020 at 11:17 AM Ralph Campbell  wrote:
>
> This is still an RFC because after looking at the pmem/dax code some
> more, I realized that the ZONE_DEVICE struct pages are being inserted
> into the process' page tables with vmf_insert_mixed() and a zero
> refcount on the ZONE_DEVICE struct page. This is sort of OK because
> insert_pfn() increments the reference count on the pgmap which is what
> prevents memunmap_pages() from freeing the struct pages and it doesn't
> check for a non-zero struct page reference count.
> But, any calls to get_page() will hit the VM_BUG_ON_PAGE() that
> checks for a reference count == 0.
>
> // mmap() an ext4 file that is mounted -o dax.
> ext4_dax_fault()
>   ext4_dax_huge_fault()
> dax_iomap_fault(&ext4_iomap_ops)
>   dax_iomap_pte_fault()
> ops->iomap_begin() // ext4_iomap_begin()
>   ext4_map_blocks()
>   ext4_set_iomap()
> dax_iomap_pfn()
> dax_insert_entry()
> vmf_insert_mixed(pfn)
>   __vm_insert_mixed()
> if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) &&
> !pfn_t_devmap(pfn) && pfn_t_valid(pfn))
>   insert_page()
> get_page(page) // XXX would trigger VM_BUG_ON_PAGE()
> page_add_file_rmap()
> set_pte_at()
> else
>   insert_pfn()
> pte_mkdevmap()
> set_pte_at()
>
> Should pmem set the page reference count to one before inserting the
> pfn into the page tables (and decrement when removing devmap PTEs)?
> What about MEMORY_DEVICE_GENERIC and MEMORY_DEVICE_PCI_P2PDMA use cases?
> Where should they icrement/decrement the page reference count?
> I don't know enough about how these are used to really know what to
> do at this point. If people want me to continue to work on this series,
> I will need some guidance.

fs/dax could take the reference when inserting, but that would mean
that ext4 and xfs would need to go back to checking for 1 to be page
idle. I think that's ok because the filesystem is actually not
checking for page-idle it's checking for "get_user_pages()" idle.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH 1/2] ext4/xfs: add page refcount helper

2020-09-25 Thread Dan Williams
On Fri, Sep 25, 2020 at 1:45 PM Ralph Campbell  wrote:
>
> There are several places where ZONE_DEVICE struct pages assume a reference
> count == 1 means the page is idle and free. Instead of open coding this,
> add a helper function to hide this detail.
>
> Signed-off-by: Ralph Campbell 
> ---
>  fs/dax.c| 8 
>  fs/ext4/inode.c | 2 +-
>  fs/xfs/xfs_file.c   | 2 +-
>  include/linux/dax.h | 5 +
>  4 files changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 994ab66a9907..8eddbcc0e149 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -358,7 +358,7 @@ static void dax_disassociate_entry(void *entry, struct 
> address_space *mapping,
> for_each_mapped_pfn(entry, pfn) {
> struct page *page = pfn_to_page(pfn);
>
> -   WARN_ON_ONCE(trunc && page_ref_count(page) > 1);
> +   WARN_ON_ONCE(trunc && !dax_layout_is_idle_page(page));
> WARN_ON_ONCE(page->mapping && page->mapping != mapping);
> page->mapping = NULL;
> page->index = 0;
> @@ -372,7 +372,7 @@ static struct page *dax_busy_page(void *entry)
> for_each_mapped_pfn(entry, pfn) {
> struct page *page = pfn_to_page(pfn);
>
> -   if (page_ref_count(page) > 1)
> +   if (!dax_layout_is_idle_page(page))
> return page;
> }
> return NULL;
> @@ -560,11 +560,11 @@ static void *grab_mapping_entry(struct xa_state *xas,
>
>  /**
>   * dax_layout_busy_page - find first pinned page in @mapping
> - * @mapping: address space to scan for a page with ref count > 1
> + * @mapping: address space to scan for a page with ref count > 0
>   *
>   * DAX requires ZONE_DEVICE mapped pages. These pages are never
>   * 'onlined' to the page allocator so they are considered idle when
> - * page->count == 1. A filesystem uses this interface to determine if
> + * page->count == 0. A filesystem uses this interface to determine if
>   * any page in the mapping is busy, i.e. for DMA, or other
>   * get_user_pages() usages.
>   *
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index bf596467c234..d9f8ad55523a 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3927,7 +3927,7 @@ int ext4_break_layouts(struct inode *inode)
> return 0;
>
> error = ___wait_var_event(&page->_refcount,
> -   atomic_read(&page->_refcount) == 1,
> +   dax_layout_is_idle_page(page),
> TASK_INTERRUPTIBLE, 0, 0,
> ext4_wait_dax_page(ei));
> } while (error == 0);
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index a29f78a663ca..29ab96541bc1 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -750,7 +750,7 @@ xfs_break_dax_layouts(
>
> *retry = true;
> return ___wait_var_event(&page->_refcount,
> -   atomic_read(&page->_refcount) == 1, 
> TASK_INTERRUPTIBLE,
> +   dax_layout_is_idle_page(page), TASK_INTERRUPTIBLE,
> 0, 0, xfs_wait_dax_page(inode));
>  }
>
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 43b39ab9de1a..3f78ed78d1d6 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -238,4 +238,9 @@ static inline bool dax_mapping(struct address_space 
> *mapping)
> return mapping->host && IS_DAX(mapping->host);
>  }
>
> +static inline bool dax_layout_is_idle_page(struct page *page)
> +{
> +   return page_ref_count(page) <= 1;

Why convert the check from "== 1" to "<= 1" and then back to the ==
operator in the next patch? A refcount < 1 in this path before your
other change is a bug.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH] mm: remove extra ZONE_DEVICE struct page refcount

2020-09-17 Thread Dan Williams
On Wed, Sep 16, 2020 at 5:29 PM Ralph Campbell  wrote:
>
>
> On 9/15/20 10:36 PM, Christoph Hellwig wrote:
> > On Tue, Sep 15, 2020 at 09:39:47AM -0700, Ralph Campbell wrote:
> >>> I don't think any of the three ->page_free instances even cares about
> >>> the page refcount.
> >>>
> >> Not true. The page_free() callback records the page is free by setting
> >> a bit or putting the page on a free list but when it allocates a free
> >> device private struct page to be used with migrate_vma_setup(), it needs to
> >> increment the refcount.
> >>
> >> For the ZONE_DEVICE MEMORY_DEVICE_GENERIC and MEMORY_DEVICE_PCI_P2PDMA
> >> struct pages, I think you are correct because they don't define page_free()
> >> and from what I can see, don't decrement the page refcount to zero.
> >
> > Umm, the whole point of ZONE_DEVICE is to have a struct page for
> > something that is not system memory.  For both the ppc kvm case (magic
> > hypervisor pool) and Noveau (device internal) memory that clear is the
> > case.  But looks like test_hmm uses normal pages to fake this up, so
> > I was wrong about the third caller.  But I think we can just call
> > set_page_count just before freeing the page there with a comment
> > explaining what is goin on.
>
> Dan Williams thought that having the ZONE_DEVICE struct pages
> be on a free list with a refcount of one was a bit strange and
> that the driver should handle the zero to one transition.
> But, that would mean a bit more invasive change to the 3 drivers
> to set the reference count to zero after calling memremap_pages()
> and setting the reference count to one when allocating a struct
> page. What you are suggesting is what I also proposed in v1.

IIUC, isn't what Christoph recommending is that drivers handle
set_page_count() directly rather than the core since some are prepared
for it to be zero on entry?
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH] mm: remove extra ZONE_DEVICE struct page refcount

2020-09-14 Thread Dan Williams
On Mon, Sep 14, 2020 at 3:45 PM Ralph Campbell  wrote:
>
> ZONE_DEVICE struct pages have an extra reference count that complicates the
> code for put_page() and several places in the kernel that need to check the
> reference count to see that a page is not being used (gup, compaction,
> migration, etc.). Clean up the code so the reference count doesn't need to
> be treated specially for ZONE_DEVICE.
>
> Signed-off-by: Ralph Campbell 
> ---
>
> Matthew Wilcox, Ira Weiny, and others have complained that ZONE_DEVICE
> struct page reference counting is ugly/broken. This is my attempt to
> fix it and it works for the HMM migration self tests.

Can you link to a technical description of what's broken? Or better
yet, summarize that argument in the changelog?

> I'm only sending this out as a RFC since I'm not that familiar with the
> DAX, PMEM, XEN, and other uses of ZONE_DEVICE struct pages allocated
> with devm_memremap_pages() or memremap_pages() but my best reading of
> the code looks like it might be OK. I could use help testing these
> configurations.

Back in the 4.15 days I could not convince myself that some code paths
blindly assumed that pages with refcount==0 were on an lru list. Since
then, struct page has been reorganized to not collide the ->pgmap back
pointer with the ->lru list and there have been other cleanups for
page pinning that might make this incremental cleanup viable.

You also need to fix up ext4_break_layouts() and
xfs_break_dax_layouts() to expect ->_refcount is 0 instead of 1. This
also needs some fstests exposure.

> I have a modified THP migration patch series that applies on top of
> this one and is cleaner since I don't have to add code to handle the
> +1 reference count. The link below is for the earlier v2:
> ("mm/hmm/nouveau: add THP migration to migrate_vma_*")
> https://lore.kernel.org/linux-mm/20200902165830.5367-1-rcampb...@nvidia.com
>
>
>  arch/powerpc/kvm/book3s_hv_uvmem.c |  1 -
>  drivers/gpu/drm/nouveau/nouveau_dmem.c |  1 -
>  include/linux/memremap.h   |  6 +--
>  include/linux/mm.h | 39 ---
>  lib/test_hmm.c |  1 -
>  mm/gup.c   | 44 -
>  mm/memremap.c  | 20 
>  mm/migrate.c   |  5 --
>  mm/swap.c  | 66 +++---
>  9 files changed, 41 insertions(+), 142 deletions(-)

This diffstat is indeed appealing.

>
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
> b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index 84e5a2dc8be5..00d97050d7ff 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -711,7 +711,6 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
> gpa, struct kvm *kvm)
>
> dpage = pfn_to_page(uvmem_pfn);
> dpage->zone_device_data = pvt;
> -   get_page(dpage);
> lock_page(dpage);
> return dpage;
>  out_clear:
> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c 
> b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> index a13c6215bba8..2a4bbe01a455 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> @@ -324,7 +324,6 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm)
> return NULL;
> }
>
> -   get_page(page);
> lock_page(page);
> return page;
>  }
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index 4e9c738f4b31..7dd9802d2612 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -67,9 +67,9 @@ enum memory_type {
>
>  struct dev_pagemap_ops {
> /*
> -* Called once the page refcount reaches 1.  (ZONE_DEVICE pages never
> -* reach 0 refcount unless there is a refcount bug. This allows the
> -* device driver to implement its own memory management.)
> +* Called once the page refcount reaches 0. The reference count is
> +* reset to 1 before calling page_free(). This allows the
> +* device driver to implement its own memory management.

I'd clarify the order events / responsibility of the common core
page_free() and the device specific page_free(). At the same time, why
not update drivers to expect that the page is already refcount==0 on
entry? Seems odd to go through all this trouble to make the reference
count appear to be zero to the wider kernel but expect that drivers
get a fake reference on entry to their ->page_free() callbacks.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk

2019-08-16 Thread Dan Williams
On Fri, Aug 16, 2019 at 5:24 AM Jason Gunthorpe  wrote:
>
> On Thu, Aug 15, 2019 at 08:54:46PM -0700, Dan Williams wrote:
>
> > > However, this means we cannot do any processing of ZONE_DEVICE pages
> > > outside the driver lock, so eg, doing any DMA map that might rely on
> > > MEMORY_DEVICE_PCI_P2PDMA has to be done in the driver lock, which is
> > > a bit unfortunate.
> >
> > Wouldn't P2PDMA use page pins? Not needing to hold a lock over
> > ZONE_DEVICE page operations was one of the motivations for plumbing
> > get_dev_pagemap() with a percpu-ref.
>
> hmm_range_fault() doesn't use page pins at all, so if a ZONE_DEVICE
> page comes out of it then it needs to use another locking pattern.
>
> If I follow it all right:
>
> We can do a get_dev_pagemap inside the page_walk and touch the pgmap,
> or we can do the 'device mutex && retry' pattern and touch the pgmap
> in the driver, under that lock.
>
> However in all cases the current get_dev_pagemap()'s in the page walk
> are not necessary, and we can delete them.

Yes, as long as 'struct page' instances resulting from that lookup are
not passed outside of that lock.


Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk

2019-08-15 Thread Dan Williams
On Thu, Aug 15, 2019 at 5:41 PM Jason Gunthorpe  wrote:
>
> On Thu, Aug 15, 2019 at 01:47:12PM -0700, Dan Williams wrote:
> > On Thu, Aug 15, 2019 at 1:41 PM Jason Gunthorpe  wrote:
> > >
> > > On Thu, Aug 15, 2019 at 04:33:06PM -0400, Jerome Glisse wrote:
> > >
> > > > So nor HMM nor driver should dereference the struct page (i do not
> > > > think any iommu driver would either),
> > >
> > > Er, they do technically deref the struct page:
> > >
> > > nouveau_dmem_convert_pfn(struct nouveau_drm *drm,
> > >  struct hmm_range *range)
> > > struct page *page;
> > > page = hmm_pfn_to_page(range, range->pfns[i]);
> > > if (!nouveau_dmem_page(drm, page)) {
> > >
> > >
> > > nouveau_dmem_page(struct nouveau_drm *drm, struct page *page)
> > > {
> > > return is_device_private_page(page) && drm->dmem == 
> > > page_to_dmem(page)
> > >
> > >
> > > Which does touch 'page->pgmap'
> > >
> > > Is this OK without having a get_dev_pagemap() ?
> > >
> > > Noting that the collision-retry scheme doesn't protect anything here
> > > as we can have a concurrent invalidation while doing the above deref.
> >
> > As long take_driver_page_table_lock() in Jerome's flow can replace
> > percpu_ref_tryget_live() on the pagemap reference. It seems
> > nouveau_dmem_convert_pfn() happens after:
> >
> > mutex_lock(&svmm->mutex);
> > if (!nouveau_range_done(&range)) {
> >
> > ...so I would expect that to be functionally equivalent to validating
> > the reference count.
>
> Yes, OK, that makes sense, I was mostly surprised by the statement the
> driver doesn't touch the struct page..
>
> I suppose "doesn't touch the struct page out of the driver lock" is
> the case.
>
> However, this means we cannot do any processing of ZONE_DEVICE pages
> outside the driver lock, so eg, doing any DMA map that might rely on
> MEMORY_DEVICE_PCI_P2PDMA has to be done in the driver lock, which is
> a bit unfortunate.

Wouldn't P2PDMA use page pins? Not needing to hold a lock over
ZONE_DEVICE page operations was one of the motivations for plumbing
get_dev_pagemap() with a percpu-ref.


Re: [Nouveau] [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk

2019-08-15 Thread Dan Williams
On Thu, Aug 15, 2019 at 1:41 PM Jason Gunthorpe  wrote:
>
> On Thu, Aug 15, 2019 at 04:33:06PM -0400, Jerome Glisse wrote:
>
> > So nor HMM nor driver should dereference the struct page (i do not
> > think any iommu driver would either),
>
> Er, they do technically deref the struct page:
>
> nouveau_dmem_convert_pfn(struct nouveau_drm *drm,
>  struct hmm_range *range)
> struct page *page;
> page = hmm_pfn_to_page(range, range->pfns[i]);
> if (!nouveau_dmem_page(drm, page)) {
>
>
> nouveau_dmem_page(struct nouveau_drm *drm, struct page *page)
> {
> return is_device_private_page(page) && drm->dmem == page_to_dmem(page)
>
>
> Which does touch 'page->pgmap'
>
> Is this OK without having a get_dev_pagemap() ?
>
> Noting that the collision-retry scheme doesn't protect anything here
> as we can have a concurrent invalidation while doing the above deref.

As long take_driver_page_table_lock() in Jerome's flow can replace
percpu_ref_tryget_live() on the pagemap reference. It seems
nouveau_dmem_convert_pfn() happens after:

mutex_lock(&svmm->mutex);
if (!nouveau_range_done(&range)) {

...so I would expect that to be functionally equivalent to validating
the reference count.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk

2019-08-15 Thread Dan Williams
On Thu, Aug 15, 2019 at 12:44 PM Jerome Glisse  wrote:
>
> On Thu, Aug 15, 2019 at 12:36:58PM -0700, Dan Williams wrote:
> > On Thu, Aug 15, 2019 at 11:07 AM Jerome Glisse  wrote:
> > >
> > > On Wed, Aug 14, 2019 at 07:48:28AM -0700, Dan Williams wrote:
> > > > On Wed, Aug 14, 2019 at 6:28 AM Jason Gunthorpe  
> > > > wrote:
> > > > >
> > > > > On Wed, Aug 14, 2019 at 09:38:54AM +0200, Christoph Hellwig wrote:
> > > > > > On Tue, Aug 13, 2019 at 06:36:33PM -0700, Dan Williams wrote:
> > > > > > > Section alignment constraints somewhat save us here. The only 
> > > > > > > example
> > > > > > > I can think of a PMD not containing a uniform pgmap association 
> > > > > > > for
> > > > > > > each pte is the case when the pgmap overlaps normal dram, i.e. 
> > > > > > > shares
> > > > > > > the same 'struct memory_section' for a given span. Otherwise, 
> > > > > > > distinct
> > > > > > > pgmaps arrange to manage their own exclusive sections (and now
> > > > > > > subsections as of v5.3). Otherwise the implementation could not
> > > > > > > guarantee different mapping lifetimes.
> > > > > > >
> > > > > > > That said, this seems to want a better mechanism to determine 
> > > > > > > "pfn is
> > > > > > > ZONE_DEVICE".
> > > > > >
> > > > > > So I guess this patch is fine for now, and once you provide a better
> > > > > > mechanism we can switch over to it?
> > > > >
> > > > > What about the version I sent to just get rid of all the strange
> > > > > put_dev_pagemaps while scanning? Odds are good we will work with only
> > > > > a single pagemap, so it makes some sense to cache it once we find it?
> > > >
> > > > Yes, if the scan is over a single pmd then caching it makes sense.
> > >
> > > Quite frankly an easier an better solution is to remove the pagemap
> > > lookup as HMM user abide by mmu notifier it means we will not make
> > > use or dereference the struct page so that we are safe from any
> > > racing hotunplug of dax memory (as long as device driver using hmm
> > > do not have a bug).
> >
> > Yes, as long as the driver remove is synchronized against HMM
> > operations via another mechanism then there is no need to take pagemap
> > references. Can you briefly describe what that other mechanism is?
>
> So if you hotunplug some dax memory i assume that this can only
> happens once all the pages are unmapped (as it must have the
> zero refcount, well 1 because of the bias) and any unmap will
> trigger a mmu notifier callback. User of hmm mirror abiding by
> the API will never make use of information they get through the
> fault or snapshot function until checking for racing notifier
> under lock.

Hmm that first assumption is not guaranteed by the dev_pagemap core.
The dev_pagemap end of life model is "disable, invalidate, drain" so
it's possible to call devm_munmap_pages() while pages are still mapped
it just won't complete the teardown of the pagemap until the last
reference is dropped. New references are blocked during this teardown.

However, if the driver is validating the liveness of the mapping in
the mmu-notifier path and blocking new references it sounds like it
should be ok. Might there be GPU driver unit tests that cover this
racing teardown case?
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk

2019-08-15 Thread Dan Williams
On Thu, Aug 15, 2019 at 11:07 AM Jerome Glisse  wrote:
>
> On Wed, Aug 14, 2019 at 07:48:28AM -0700, Dan Williams wrote:
> > On Wed, Aug 14, 2019 at 6:28 AM Jason Gunthorpe  wrote:
> > >
> > > On Wed, Aug 14, 2019 at 09:38:54AM +0200, Christoph Hellwig wrote:
> > > > On Tue, Aug 13, 2019 at 06:36:33PM -0700, Dan Williams wrote:
> > > > > Section alignment constraints somewhat save us here. The only example
> > > > > I can think of a PMD not containing a uniform pgmap association for
> > > > > each pte is the case when the pgmap overlaps normal dram, i.e. shares
> > > > > the same 'struct memory_section' for a given span. Otherwise, distinct
> > > > > pgmaps arrange to manage their own exclusive sections (and now
> > > > > subsections as of v5.3). Otherwise the implementation could not
> > > > > guarantee different mapping lifetimes.
> > > > >
> > > > > That said, this seems to want a better mechanism to determine "pfn is
> > > > > ZONE_DEVICE".
> > > >
> > > > So I guess this patch is fine for now, and once you provide a better
> > > > mechanism we can switch over to it?
> > >
> > > What about the version I sent to just get rid of all the strange
> > > put_dev_pagemaps while scanning? Odds are good we will work with only
> > > a single pagemap, so it makes some sense to cache it once we find it?
> >
> > Yes, if the scan is over a single pmd then caching it makes sense.
>
> Quite frankly an easier an better solution is to remove the pagemap
> lookup as HMM user abide by mmu notifier it means we will not make
> use or dereference the struct page so that we are safe from any
> racing hotunplug of dax memory (as long as device driver using hmm
> do not have a bug).

Yes, as long as the driver remove is synchronized against HMM
operations via another mechanism then there is no need to take pagemap
references. Can you briefly describe what that other mechanism is?
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk

2019-08-14 Thread Dan Williams
On Wed, Aug 14, 2019 at 6:28 AM Jason Gunthorpe  wrote:
>
> On Wed, Aug 14, 2019 at 09:38:54AM +0200, Christoph Hellwig wrote:
> > On Tue, Aug 13, 2019 at 06:36:33PM -0700, Dan Williams wrote:
> > > Section alignment constraints somewhat save us here. The only example
> > > I can think of a PMD not containing a uniform pgmap association for
> > > each pte is the case when the pgmap overlaps normal dram, i.e. shares
> > > the same 'struct memory_section' for a given span. Otherwise, distinct
> > > pgmaps arrange to manage their own exclusive sections (and now
> > > subsections as of v5.3). Otherwise the implementation could not
> > > guarantee different mapping lifetimes.
> > >
> > > That said, this seems to want a better mechanism to determine "pfn is
> > > ZONE_DEVICE".
> >
> > So I guess this patch is fine for now, and once you provide a better
> > mechanism we can switch over to it?
>
> What about the version I sent to just get rid of all the strange
> put_dev_pagemaps while scanning? Odds are good we will work with only
> a single pagemap, so it makes some sense to cache it once we find it?

Yes, if the scan is over a single pmd then caching it makes sense.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk

2019-08-13 Thread Dan Williams
On Wed, Aug 7, 2019 at 11:59 PM Christoph Hellwig  wrote:
>
> On Wed, Aug 07, 2019 at 11:47:22AM -0700, Dan Williams wrote:
> > > Unrelated to this patch, but what is the point of getting checking
> > > that the pgmap exists for the page and then immediately releasing it?
> > > This code has this pattern in several places.
> > >
> > > It feels racy
> >
> > Agree, not sure what the intent is here. The only other reason call
> > get_dev_pagemap() is to just check in general if the pfn is indeed
> > owned by some ZONE_DEVICE instance, but if the intent is to make sure
> > the device is still attached/enabled that check is invalidated at
> > put_dev_pagemap().
> >
> > If it's the former case, validating ZONE_DEVICE pfns, I imagine we can
> > do something cheaper with a helper that is on the order of the same
> > cost as pfn_valid(). I.e. replace PTE_DEVMAP with a mem_section flag
> > or something similar.
>
> The hmm literally never dereferences the pgmap, so validity checking is
> the only explanation for it.
>
> > > +   /*
> > > +* We do put_dev_pagemap() here so that we can leverage
> > > +* get_dev_pagemap() optimization which will not re-take a
> > > +* reference on a pgmap if we already have one.
> > > +*/
> > > +   if (hmm_vma_walk->pgmap)
> > > +   put_dev_pagemap(hmm_vma_walk->pgmap);
> > > +
> >
> > Seems ok, but only if the caller is guaranteeing that the range does
> > not span outside of a single pagemap instance. If that guarantee is
> > met why not just have the caller pass in a pinned pagemap? If that
> > guarantee is not met, then I think we're back to your race concern.
>
> It iterates over multiple ptes in a non-huge pmd.  Is there any kind of
> limitations on different pgmap instances inside a pmd?  I can't think
> of one, so this might actually be a bug.

Section alignment constraints somewhat save us here. The only example
I can think of a PMD not containing a uniform pgmap association for
each pte is the case when the pgmap overlaps normal dram, i.e. shares
the same 'struct memory_section' for a given span. Otherwise, distinct
pgmaps arrange to manage their own exclusive sections (and now
subsections as of v5.3). Otherwise the implementation could not
guarantee different mapping lifetimes.

That said, this seems to want a better mechanism to determine "pfn is
ZONE_DEVICE".


Re: [Nouveau] [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk

2019-08-07 Thread Dan Williams
On Wed, Aug 7, 2019 at 10:45 AM Jason Gunthorpe  wrote:
>
> On Tue, Aug 06, 2019 at 07:05:42PM +0300, Christoph Hellwig wrote:
> > There is only a single place where the pgmap is passed over a function
> > call, so replace it with local variables in the places where we deal
> > with the pgmap.
> >
> > Signed-off-by: Christoph Hellwig 
> >  mm/hmm.c | 62 
> >  1 file changed, 27 insertions(+), 35 deletions(-)
> >
> > diff --git a/mm/hmm.c b/mm/hmm.c
> > index 9a908902e4cc..d66fa29b42e0 100644
> > +++ b/mm/hmm.c
> > @@ -278,7 +278,6 @@ EXPORT_SYMBOL(hmm_mirror_unregister);
> >
> >  struct hmm_vma_walk {
> >   struct hmm_range*range;
> > - struct dev_pagemap  *pgmap;
> >   unsigned long   last;
> >   unsigned intflags;
> >  };
> > @@ -475,6 +474,7 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
> >  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >   struct hmm_vma_walk *hmm_vma_walk = walk->private;
> >   struct hmm_range *range = hmm_vma_walk->range;
> > + struct dev_pagemap *pgmap = NULL;
> >   unsigned long pfn, npages, i;
> >   bool fault, write_fault;
> >   uint64_t cpu_flags;
> > @@ -490,17 +490,14 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
> >   pfn = pmd_pfn(pmd) + pte_index(addr);
> >   for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++) {
> >   if (pmd_devmap(pmd)) {
> > - hmm_vma_walk->pgmap = get_dev_pagemap(pfn,
> > -   hmm_vma_walk->pgmap);
> > - if (unlikely(!hmm_vma_walk->pgmap))
> > + pgmap = get_dev_pagemap(pfn, pgmap);
> > + if (unlikely(!pgmap))
> >   return -EBUSY;
>
> Unrelated to this patch, but what is the point of getting checking
> that the pgmap exists for the page and then immediately releasing it?
> This code has this pattern in several places.
>
> It feels racy

Agree, not sure what the intent is here. The only other reason call
get_dev_pagemap() is to just check in general if the pfn is indeed
owned by some ZONE_DEVICE instance, but if the intent is to make sure
the device is still attached/enabled that check is invalidated at
put_dev_pagemap().

If it's the former case, validating ZONE_DEVICE pfns, I imagine we can
do something cheaper with a helper that is on the order of the same
cost as pfn_valid(). I.e. replace PTE_DEVMAP with a mem_section flag
or something similar.

>
> >   }
> >   pfns[i] = hmm_device_entry_from_pfn(range, pfn) | cpu_flags;
> >   }
> > - if (hmm_vma_walk->pgmap) {
> > - put_dev_pagemap(hmm_vma_walk->pgmap);
> > - hmm_vma_walk->pgmap = NULL;
>
> Putting the value in the hmm_vma_walk would have made some sense to me
> if the pgmap was not set to NULL all over the place. Then the most
> xa_loads would be eliminated, as I would expect the pgmap tends to be
> mostly uniform for these use cases.
>
> Is there some reason the pgmap ref can't be held across
> faulting/sleeping? ie like below.

No restriction on holding refs over faulting / sleeping.

>
> Anyhow, I looked over this pretty carefully and the change looks
> functionally OK, I just don't know why the code is like this in the
> first place.
>
> Reviewed-by: Jason Gunthorpe 
>
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 9a908902e4cc38..4e30128c23a505 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -497,10 +497,6 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
> }
> pfns[i] = hmm_device_entry_from_pfn(range, pfn) | cpu_flags;
> }
> -   if (hmm_vma_walk->pgmap) {
> -   put_dev_pagemap(hmm_vma_walk->pgmap);
> -   hmm_vma_walk->pgmap = NULL;
> -   }
> hmm_vma_walk->last = end;
> return 0;
>  #else
> @@ -604,10 +600,6 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, 
> unsigned long addr,
> return 0;
>
>  fault:
> -   if (hmm_vma_walk->pgmap) {
> -   put_dev_pagemap(hmm_vma_walk->pgmap);
> -   hmm_vma_walk->pgmap = NULL;
> -   }
> pte_unmap(ptep);
> /* Fault any virtual address we were asked to fault */
> return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
> @@ -690,16 +682,6 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
> return r;
> }
> }
> -   if (hmm_vma_walk->pgmap) {
> -   /*
> -* We do put_dev_pagemap() here and not in 
> hmm_vma_handle_pte()
> -* so that we can leverage get_dev_pagemap() optimization 
> which
> -* will not re-take a reference on a pgmap if we already have
> -* one.
> -*/
> -   put_dev_pagemap(hmm_vma_walk->pgmap);
> -   hmm_vma_walk->pgmap = NULL;
> -   }
> pte_unmap(ptep - 1);
>
>

Re: [Nouveau] dev_pagemap related cleanups v4

2019-07-02 Thread Dan Williams
On Tue, Jul 2, 2019 at 11:42 AM Jason Gunthorpe  wrote:
>
> On Mon, Jul 01, 2019 at 10:25:17AM +0200, Christoph Hellwig wrote:
> > And I've demonstrated that I can't send patch series..  While this
> > has all the right patches, it also has the extra patches already
> > in the hmm tree, and four extra patches I wanted to send once
> > this series is merged.  I'll give up for now, please use the git
> > url for anything serious, as it contains the right thing.
>
> Okay, I sorted it all out and temporarily put it here:
>
> https://github.com/jgunthorpe/linux/commits/hmm
>
> Bit involved job:
> - Took Ira's v4 patch into hmm.git and confirmed it matches what
>   Andrew has in linux-next after all the fixups
> - Checked your github v4 and the v3 that hit the mailing list were
>   substantially similar (I never did get a clean v4) and largely
>   went with the github version
> - Based CH's v4 series on -rc7 and put back the removal hunk in swap.c
>   so it compiles
> - Merge'd CH's series to hmm.git and fixed all the conflicts with Ira
>   and Ralph's patches (such that swap.c remains unchanged)
> - Added Dan's ack's and tested-by's

Looks good. Test merge (with some collisions, see below) also passes
my test suite.

>
> I think this fairly closely follows what was posted to the mailing
> list.
>
> As it was more than a simple 'git am', I'll let it sit on github until
> I hear OK's then I'll move it to kernel.org's hmm.git and it will hit
> linux-next. 0-day should also run on this whole thing from my github.
>
> What I know is outstanding:
>  - The conflicting ARM patches, I understand Andrew will handle these
>post-linux-next
>  - The conflict with AMD GPU in -next, I am waiting to hear from AMD

Just a heads up that this also collides with the "sub-section" patches
in Andrew's tree. The resolution is straightforward, mostly just
colliding updates to arch_{add,remove}_memory() call sites in
kernel/memremap.c and collisions with pgmap_altmap() usage.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 16/25] device-dax: use the dev_pagemap internal refcount

2019-06-28 Thread Dan Williams
On Fri, Jun 28, 2019 at 12:02 PM Christoph Hellwig  wrote:
>
> On Fri, Jun 28, 2019 at 11:59:19AM -0700, Dan Williams wrote:
> > It's a bug that the call to put_devmap_managed_page() was gated by
> > MEMORY_DEVICE_PUBLIC in release_pages(). That path is also applicable
> > to MEMORY_DEVICE_FSDAX because it needs to trigger the ->page_free()
> > callback to wake up wait_on_var() via fsdax_pagefree().
> >
> > So I guess you could argue that the MEMORY_DEVICE_PUBLIC removal patch
> > left the original bug in place. In that sense we're no worse off, but
> > since we know about the bug, the fix and the patches have not been
> > applied yet, why not fix it now?
>
> The fix it now would simply be to apply Ira original patch now, but
> given that we are at -rc6 is this really a good time?  And if we don't
> apply it now based on the quilt based -mm worflow it just seems a lot
> easier to apply it after my series.  Unless we want to include it in
> the series, in which case I can do a quick rebase, we'd just need to
> make sure Andrew pulls it from -mm.

I believe -mm auto drops patches when they appear in the -next
baseline. So it should "just work" to pull it into the series and send
it along for -next inclusion.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [PATCH 16/25] device-dax: use the dev_pagemap internal refcount

2019-06-28 Thread Dan Williams
On Fri, Jun 28, 2019 at 11:52 AM Christoph Hellwig  wrote:
>
> On Fri, Jun 28, 2019 at 11:44:35AM -0700, Dan Williams wrote:
> > There is a problem with the series in CH's tree. It removes the
> > ->page_free() callback from the release_pages() path because it goes
> > too far and removes the put_devmap_managed_page() call.
>
> release_pages only called put_devmap_managed_page for device public
> pages.  So I can't see how that is in any way a problem.

It's a bug that the call to put_devmap_managed_page() was gated by
MEMORY_DEVICE_PUBLIC in release_pages(). That path is also applicable
to MEMORY_DEVICE_FSDAX because it needs to trigger the ->page_free()
callback to wake up wait_on_var() via fsdax_pagefree().

So I guess you could argue that the MEMORY_DEVICE_PUBLIC removal patch
left the original bug in place. In that sense we're no worse off, but
since we know about the bug, the fix and the patches have not been
applied yet, why not fix it now?


Re: [PATCH 16/25] device-dax: use the dev_pagemap internal refcount

2019-06-28 Thread Dan Williams
On Fri, Jun 28, 2019 at 11:29 AM Jason Gunthorpe  wrote:
>
> On Fri, Jun 28, 2019 at 10:10:12AM -0700, Dan Williams wrote:
> > On Fri, Jun 28, 2019 at 10:08 AM Dan Williams  
> > wrote:
> > >
> > > On Fri, Jun 28, 2019 at 10:02 AM Jason Gunthorpe  
> > > wrote:
> > > >
> > > > On Fri, Jun 28, 2019 at 09:27:44AM -0700, Dan Williams wrote:
> > > > > On Fri, Jun 28, 2019 at 8:39 AM Jason Gunthorpe  
> > > > > wrote:
> > > > > >
> > > > > > On Wed, Jun 26, 2019 at 02:27:15PM +0200, Christoph Hellwig wrote:
> > > > > > > The functionality is identical to the one currently open coded in
> > > > > > > device-dax.
> > > > > > >
> > > > > > > Signed-off-by: Christoph Hellwig 
> > > > > > > Reviewed-by: Ira Weiny 
> > > > > > >  drivers/dax/dax-private.h |  4 
> > > > > > >  drivers/dax/device.c  | 43 
> > > > > > > ---
> > > > > > >  2 files changed, 47 deletions(-)
> > > > > >
> > > > > > DanW: I think this series has reached enough review, did you want
> > > > > > to ack/test any further?
> > > > > >
> > > > > > This needs to land in hmm.git soon to make the merge window.
> > > > >
> > > > > I was awaiting a decision about resolving the collision with Ira's
> > > > > patch before testing the final result again [1]. You can go ahead and
> > > > > add my reviewed-by for the series, but my tested-by should be on the
> > > > > final state of the series.
> > > >
> > > > The conflict looks OK to me, I think we can let Andrew and Linus
> > > > resolve it.
> > >
> > > Andrew's tree effectively always rebases since it's a quilt series.
> > > I'd recommend pulling Ira's patch out of -mm and applying it with the
> > > rest of hmm reworks. Any other git tree I'd agree with just doing the
> > > late conflict resolution, but I'm not clear on what's the best
> > > practice when conflicting with -mm.
>
> What happens depends on timing as things arrive to Linus. I promised
> to send hmm.git early, so I understand that Andrew will quilt rebase
> his tree to Linus's and fix the conflict in Ira's patch before he
> sends it.
>
> > Regardless the patch is buggy. If you want to do the conflict
> > resolution it should be because the DEVICE_PUBLIC removal effectively
> > does the same fix otherwise we're knowingly leaving a broken point in
> > the history.
>
> I'm not sure I understand your concern, is there something wrong with
> CH's series as it stands? hmm is a non-rebasing git tree, so as long
> as the series is correct *when I apply it* there is no broken history.
>
> I assumed the conflict resolution for Ira's patch was to simply take
> the deletion of the if block from CH's series - right?
>
> If we do need to take Ira's patch into hmm.git it will go after CH's
> series (and Ira will have to rebase/repost it), so I think there is
> nothing to do at this moment - unless you are saying there is a
> problem with the series in CH's git tree?

There is a problem with the series in CH's tree. It removes the
->page_free() callback from the release_pages() path because it goes
too far and removes the put_devmap_managed_page() call.


Re: [PATCH 16/25] device-dax: use the dev_pagemap internal refcount

2019-06-28 Thread Dan Williams
On Fri, Jun 28, 2019 at 10:08 AM Dan Williams  wrote:
>
> On Fri, Jun 28, 2019 at 10:02 AM Jason Gunthorpe  wrote:
> >
> > On Fri, Jun 28, 2019 at 09:27:44AM -0700, Dan Williams wrote:
> > > On Fri, Jun 28, 2019 at 8:39 AM Jason Gunthorpe  wrote:
> > > >
> > > > On Wed, Jun 26, 2019 at 02:27:15PM +0200, Christoph Hellwig wrote:
> > > > > The functionality is identical to the one currently open coded in
> > > > > device-dax.
> > > > >
> > > > > Signed-off-by: Christoph Hellwig 
> > > > > Reviewed-by: Ira Weiny 
> > > > >  drivers/dax/dax-private.h |  4 
> > > > >  drivers/dax/device.c  | 43 
> > > > > ---
> > > > >  2 files changed, 47 deletions(-)
> > > >
> > > > DanW: I think this series has reached enough review, did you want
> > > > to ack/test any further?
> > > >
> > > > This needs to land in hmm.git soon to make the merge window.
> > >
> > > I was awaiting a decision about resolving the collision with Ira's
> > > patch before testing the final result again [1]. You can go ahead and
> > > add my reviewed-by for the series, but my tested-by should be on the
> > > final state of the series.
> >
> > The conflict looks OK to me, I think we can let Andrew and Linus
> > resolve it.
> >
>
> Andrew's tree effectively always rebases since it's a quilt series.
> I'd recommend pulling Ira's patch out of -mm and applying it with the
> rest of hmm reworks. Any other git tree I'd agree with just doing the
> late conflict resolution, but I'm not clear on what's the best
> practice when conflicting with -mm.

Regardless the patch is buggy. If you want to do the conflict
resolution it should be because the DEVICE_PUBLIC removal effectively
does the same fix otherwise we're knowingly leaving a broken point in
the history.


Re: [PATCH 16/25] device-dax: use the dev_pagemap internal refcount

2019-06-28 Thread Dan Williams
On Fri, Jun 28, 2019 at 10:02 AM Jason Gunthorpe  wrote:
>
> On Fri, Jun 28, 2019 at 09:27:44AM -0700, Dan Williams wrote:
> > On Fri, Jun 28, 2019 at 8:39 AM Jason Gunthorpe  wrote:
> > >
> > > On Wed, Jun 26, 2019 at 02:27:15PM +0200, Christoph Hellwig wrote:
> > > > The functionality is identical to the one currently open coded in
> > > > device-dax.
> > > >
> > > > Signed-off-by: Christoph Hellwig 
> > > > Reviewed-by: Ira Weiny 
> > > >  drivers/dax/dax-private.h |  4 
> > > >  drivers/dax/device.c  | 43 ---
> > > >  2 files changed, 47 deletions(-)
> > >
> > > DanW: I think this series has reached enough review, did you want
> > > to ack/test any further?
> > >
> > > This needs to land in hmm.git soon to make the merge window.
> >
> > I was awaiting a decision about resolving the collision with Ira's
> > patch before testing the final result again [1]. You can go ahead and
> > add my reviewed-by for the series, but my tested-by should be on the
> > final state of the series.
>
> The conflict looks OK to me, I think we can let Andrew and Linus
> resolve it.
>

Andrew's tree effectively always rebases since it's a quilt series.
I'd recommend pulling Ira's patch out of -mm and applying it with the
rest of hmm reworks. Any other git tree I'd agree with just doing the
late conflict resolution, but I'm not clear on what's the best
practice when conflicting with -mm.


Re: [Nouveau] [PATCH 16/25] device-dax: use the dev_pagemap internal refcount

2019-06-28 Thread Dan Williams
On Fri, Jun 28, 2019 at 8:39 AM Jason Gunthorpe  wrote:
>
> On Wed, Jun 26, 2019 at 02:27:15PM +0200, Christoph Hellwig wrote:
> > The functionality is identical to the one currently open coded in
> > device-dax.
> >
> > Signed-off-by: Christoph Hellwig 
> > Reviewed-by: Ira Weiny 
> > ---
> >  drivers/dax/dax-private.h |  4 
> >  drivers/dax/device.c  | 43 ---
> >  2 files changed, 47 deletions(-)
>
> DanW: I think this series has reached enough review, did you want
> to ack/test any further?
>
> This needs to land in hmm.git soon to make the merge window.

I was awaiting a decision about resolving the collision with Ira's
patch before testing the final result again [1]. You can go ahead and
add my reviewed-by for the series, but my tested-by should be on the
final state of the series.

[1]: 
https://lore.kernel.org/lkml/CAPcyv4gTOf+EWzSGrFrh2GrTZt5HVR=e+xicukepiy57px8...@mail.gmail.com/
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 05/22] mm: export alloc_pages_vma

2019-06-26 Thread Dan Williams
On Tue, Jun 25, 2019 at 10:46 PM Michal Hocko  wrote:
>
> On Tue 25-06-19 12:52:18, Dan Williams wrote:
> [...]
> > > Documentation/process/stable-api-nonsense.rst
> >
> > That document has failed to preclude symbol export fights in the past
> > and there is a reasonable argument to try not to retract functionality
> > that had been previously exported regardless of that document.
>
> Can you point me to any specific example where this would be the case
> for the core kernel symbols please?

The most recent example that comes to mind was the thrash around
__kernel_fpu_{begin,end} [1]. I referenced that when debating _GPL
symbol policy with Jérôme [2].

[1]: https://lore.kernel.org/lkml/20190522100959.ga15...@kroah.com/
[2]: 
https://lore.kernel.org/lkml/CAPcyv4gb+r==rikfxkvz7ggdnke62ybmz7xoa4ubbbyhnk9...@mail.gmail.com/
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 04/25] mm: remove MEMORY_DEVICE_PUBLIC support

2019-06-26 Thread Dan Williams
[ add Ira ]

On Wed, Jun 26, 2019 at 5:27 AM Christoph Hellwig  wrote:
>
> The code hasn't been used since it was added to the tree, and doesn't
> appear to actually be usable.
>
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Jason Gunthorpe 
> Acked-by: Michal Hocko 
[..]
> diff --git a/mm/swap.c b/mm/swap.c
> index 7ede3eddc12a..83107410d29f 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -740,17 +740,6 @@ void release_pages(struct page **pages, int nr)
> if (is_huge_zero_page(page))
> continue;
>
> -   /* Device public page can not be huge page */
> -   if (is_device_public_page(page)) {
> -   if (locked_pgdat) {
> -   
> spin_unlock_irqrestore(&locked_pgdat->lru_lock,
> -  flags);
> -   locked_pgdat = NULL;
> -   }
> -   put_devmap_managed_page(page);
> -   continue;
> -   }
> -

This collides with Ira's bug fix [1]. The MEMORY_DEVICE_FSDAX case
needs this to be converted to be independent of "public" pages.
Perhaps it should be pulled out of -mm and incorporated in this
series.

[1]: https://lore.kernel.org/lkml/20190605214922.17684-1-ira.we...@intel.com/
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 05/22] mm: export alloc_pages_vma

2019-06-25 Thread Dan Williams
On Tue, Jun 25, 2019 at 12:01 PM Michal Hocko  wrote:
>
> On Tue 25-06-19 11:03:53, Dan Williams wrote:
> > On Tue, Jun 25, 2019 at 8:01 AM Michal Hocko  wrote:
> > >
> > > On Tue 25-06-19 09:23:17, Christoph Hellwig wrote:
> > > > On Mon, Jun 24, 2019 at 11:24:48AM -0700, Dan Williams wrote:
> > > > > I asked for this simply because it was not exported historically. In
> > > > > general I want to establish explicit export-type criteria so the
> > > > > community can spend less time debating when to use EXPORT_SYMBOL_GPL
> > > > > [1].
> > > > >
> > > > > The thought in this instance is that it is not historically exported
> > > > > to modules and it is safer from a maintenance perspective to start
> > > > > with GPL-only for new symbols in case we don't want to maintain that
> > > > > interface long-term for out-of-tree modules.
> > > > >
> > > > > Yes, we always reserve the right to remove / change interfaces
> > > > > regardless of the export type, but history has shown that external
> > > > > pressure to keep an interface stable (contrary to
> > > > > Documentation/process/stable-api-nonsense.rst) tends to be less for
> > > > > GPL-only exports.
> > > >
> > > > Fully agreed.  In the end the decision is with the MM maintainers,
> > > > though, although I'd prefer to keep it as in this series.
> > >
> > > I am sorry but I am not really convinced by the above reasoning wrt. to
> > > the allocator API and it has been a subject of many changes over time. I
> > > do not remember a single case where we would be bending the allocator
> > > API because of external modules and I am pretty sure we will push back
> > > heavily if that was the case in the future.
> >
> > This seems to say that you have no direct experience of dealing with
> > changing symbols that that a prominent out-of-tree module needs? GPU
> > drivers and the core-mm are on a path to increase their cooperation on
> > memory management mechanisms over time, and symbol export changes for
> > out-of-tree GPU drivers have been a significant source of friction in
> > the past.
>
> I have an experience e.g. to rework semantic of some gfp flags and that is
> something that users usualy get wrong and never heard that an out of
> tree code would insist on an old semantic and pushing us to the corner.
>
> > > So in this particular case I would go with consistency and export the
> > > same way we do with other functions. Also we do not want people to
> > > reinvent this API and screw that like we have seen in other cases when
> > > external modules try reimplement core functionality themselves.
> >
> > Consistency is a weak argument when the cost to the upstream community
> > is negligible. If the same functionality was available via another /
> > already exported interface *that* would be an argument to maintain the
> > existing export policy. "Consistency" in and of itself is not a
> > precedent we can use more widely in default export-type decisions.
> >
> > Effectively I'm arguing EXPORT_SYMBOL_GPL by default with a later
> > decision to drop the _GPL. Similar to how we are careful to mark sysfs
> > interfaces in Documentation/ABI/ that we are not fully committed to
> > maintaining over time, or are otherwise so new that there is not yet a
> > good read on whether they can be made permanent.
>
> Documentation/process/stable-api-nonsense.rst

That document has failed to preclude symbol export fights in the past
and there is a reasonable argument to try not to retract functionality
that had been previously exported regardless of that document.

> Really. If you want to play with GPL vs. EXPORT_SYMBOL else this is up
> to you but I do not see any technical argument to make this particular
> interface to the page allocator any different from all others that are
> exported to modules.

I'm failing to find any practical substance to your argument, but in
the end I agree with Chrishoph, it's up to MM maintainers.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 05/22] mm: export alloc_pages_vma

2019-06-25 Thread Dan Williams
On Tue, Jun 25, 2019 at 8:01 AM Michal Hocko  wrote:
>
> On Tue 25-06-19 09:23:17, Christoph Hellwig wrote:
> > On Mon, Jun 24, 2019 at 11:24:48AM -0700, Dan Williams wrote:
> > > I asked for this simply because it was not exported historically. In
> > > general I want to establish explicit export-type criteria so the
> > > community can spend less time debating when to use EXPORT_SYMBOL_GPL
> > > [1].
> > >
> > > The thought in this instance is that it is not historically exported
> > > to modules and it is safer from a maintenance perspective to start
> > > with GPL-only for new symbols in case we don't want to maintain that
> > > interface long-term for out-of-tree modules.
> > >
> > > Yes, we always reserve the right to remove / change interfaces
> > > regardless of the export type, but history has shown that external
> > > pressure to keep an interface stable (contrary to
> > > Documentation/process/stable-api-nonsense.rst) tends to be less for
> > > GPL-only exports.
> >
> > Fully agreed.  In the end the decision is with the MM maintainers,
> > though, although I'd prefer to keep it as in this series.
>
> I am sorry but I am not really convinced by the above reasoning wrt. to
> the allocator API and it has been a subject of many changes over time. I
> do not remember a single case where we would be bending the allocator
> API because of external modules and I am pretty sure we will push back
> heavily if that was the case in the future.

This seems to say that you have no direct experience of dealing with
changing symbols that that a prominent out-of-tree module needs? GPU
drivers and the core-mm are on a path to increase their cooperation on
memory management mechanisms over time, and symbol export changes for
out-of-tree GPU drivers have been a significant source of friction in
the past.

> So in this particular case I would go with consistency and export the
> same way we do with other functions. Also we do not want people to
> reinvent this API and screw that like we have seen in other cases when
> external modules try reimplement core functionality themselves.

Consistency is a weak argument when the cost to the upstream community
is negligible. If the same functionality was available via another /
already exported interface *that* would be an argument to maintain the
existing export policy. "Consistency" in and of itself is not a
precedent we can use more widely in default export-type decisions.

Effectively I'm arguing EXPORT_SYMBOL_GPL by default with a later
decision to drop the _GPL. Similar to how we are careful to mark sysfs
interfaces in Documentation/ABI/ that we are not fully committed to
maintaining over time, or are otherwise so new that there is not yet a
good read on whether they can be made permanent.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [PATCH 05/22] mm: export alloc_pages_vma

2019-06-24 Thread Dan Williams
On Thu, Jun 20, 2019 at 12:17 PM Michal Hocko  wrote:
>
> On Thu 13-06-19 11:43:08, Christoph Hellwig wrote:
> > noveau is currently using this through an odd hmm wrapper, and I plan
> > to switch it to the real thing later in this series.
> >
> > Signed-off-by: Christoph Hellwig 
> > ---
> >  mm/mempolicy.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 01600d80ae01..f9023b5fba37 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -2098,6 +2098,7 @@ alloc_pages_vma(gfp_t gfp, int order, struct 
> > vm_area_struct *vma,
> >  out:
> >   return page;
> >  }
> > +EXPORT_SYMBOL_GPL(alloc_pages_vma);
>
> All allocator exported symbols are EXPORT_SYMBOL, what is a reason to
> have this one special?

I asked for this simply because it was not exported historically. In
general I want to establish explicit export-type criteria so the
community can spend less time debating when to use EXPORT_SYMBOL_GPL
[1].

The thought in this instance is that it is not historically exported
to modules and it is safer from a maintenance perspective to start
with GPL-only for new symbols in case we don't want to maintain that
interface long-term for out-of-tree modules.

Yes, we always reserve the right to remove / change interfaces
regardless of the export type, but history has shown that external
pressure to keep an interface stable (contrary to
Documentation/process/stable-api-nonsense.rst) tends to be less for
GPL-only exports.

[1]: 
https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2018-September/005688.html


Re: [Nouveau] [PATCH 18/22] mm: mark DEVICE_PUBLIC as broken

2019-06-19 Thread Dan Williams
On Wed, Jun 19, 2019 at 12:42 PM Jason Gunthorpe  wrote:
>
> On Thu, Jun 13, 2019 at 06:23:04PM -0700, John Hubbard wrote:
> > On 6/13/19 5:43 PM, Ira Weiny wrote:
> > > On Thu, Jun 13, 2019 at 07:58:29PM +, Jason Gunthorpe wrote:
> > >> On Thu, Jun 13, 2019 at 12:53:02PM -0700, Ralph Campbell wrote:
> > >>>
> > ...
> > >> Hum, so the only thing this config does is short circuit here:
> > >>
> > >> static inline bool is_device_public_page(const struct page *page)
> > >> {
> > >> return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) &&
> > >> IS_ENABLED(CONFIG_DEVICE_PUBLIC) &&
> > >> is_zone_device_page(page) &&
> > >> page->pgmap->type == MEMORY_DEVICE_PUBLIC;
> > >> }
> > >>
> > >> Which is called all over the place..
> > >
> > >   yes but the earlier patch:
> > >
> > > [PATCH 03/22] mm: remove hmm_devmem_add_resource
> > >
> > > Removes the only place type is set to MEMORY_DEVICE_PUBLIC.
> > >
> > > So I think it is ok.  Frankly I was wondering if we should remove the 
> > > public
> > > type altogether but conceptually it seems ok.  But I don't see any users 
> > > of it
> > > so...  should we get rid of it in the code rather than turning the config 
> > > off?
> > >
> > > Ira
> >
> > That seems reasonable. I recall that the hope was for those IBM Power 9
> > systems to use _PUBLIC, as they have hardware-based coherent device (GPU)
> > memory, and so the memory really is visible to the CPU. And the IBM team
> > was thinking of taking advantage of it. But I haven't seen anything on
> > that front for a while.
>
> Does anyone know who those people are and can we encourage them to
> send some patches? :)

I expect marking it CONFIG_BROKEN with the threat of deleting it if no
patches show up *is* the encouragement.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] dev_pagemap related cleanups v2

2019-06-19 Thread Dan Williams
On Wed, Jun 19, 2019 at 9:37 AM Jason Gunthorpe  wrote:
>
> On Wed, Jun 19, 2019 at 11:40:32AM +0200, Christoph Hellwig wrote:
> > On Tue, Jun 18, 2019 at 12:47:10PM -0700, Dan Williams wrote:
> > > > Git tree:
> > > >
> > > > git://git.infradead.org/users/hch/misc.git hmm-devmem-cleanup.2
> > > >
> > > > Gitweb:
> > > >
> > > > 
> > > > http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/hmm-devmem-cleanup.2
> >
> > >
> > > Attached is my incremental fixups on top of this series, with those
> > > integrated you can add:
> >
> > I've folded your incremental bits in and pushed out a new
> > hmm-devmem-cleanup.3 to the repo above.  Let me know if I didn't mess
> > up anything else.  I'll wait for a few more comments and Jason's
> > planned rebase of the hmm branch before reposting.
>
> I said I wouldn't rebase the hmm.git (as it needs to go to DRM, AMD
> and RDMA git trees)..
>
> Instead I will merge v5.2-rc5 to the tree before applying this series.
>
> I've understood this to be Linus's prefered workflow.
>
> So, please send the next iteration of this against either
> plainv5.2-rc5 or v5.2-rc5 merged with hmm.git and I'll sort it out.

Just make sure that when you backmerge v5.2-rc5 you have a clear
reason in the merge commit message about why you needed to do it.
While needless rebasing is top of the pet peeve list, second place, as
I found out, is mystery merges without explanations.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] dev_pagemap related cleanups v2

2019-06-18 Thread Dan Williams
On Mon, Jun 17, 2019 at 5:27 AM Christoph Hellwig  wrote:
>
> Hi Dan, Jérôme and Jason,
>
> below is a series that cleans up the dev_pagemap interface so that
> it is more easily usable, which removes the need to wrap it in hmm
> and thus allowing to kill a lot of code
>
> Note: this series is on top of the rdma/hmm branch + the dev_pagemap
> releas fix series from Dan that went into 5.2-rc5.
>
> Git tree:
>
> git://git.infradead.org/users/hch/misc.git hmm-devmem-cleanup.2
>
> Gitweb:
>
> 
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/hmm-devmem-cleanup.2
>
> Changes since v1:
>  - rebase
>  - also switch p2pdma to the internal refcount
>  - add type checking for pgmap->type
>  - rename the migrate method to migrate_to_ram
>  - cleanup the altmap_valid flag
>  - various tidbits from the reviews

Attached is my incremental fixups on top of this series, with those
integrated you can add:

Tested-by: Dan Williams 

...to the patches that touch kernel/memremap.c, drivers/dax, and drivers/nvdimm.

You can also add:

Reviewed-by: Dan Williams 

...for the series.
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index a9d7c90ecf1e..1af823b2fe6b 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -428,6 +428,7 @@ int dev_dax_probe(struct device *dev)
 		return -EBUSY;
 	}
 
+	dev_dax->pgmap.type = MEMORY_DEVICE_DEVDAX;
 	addr = devm_memremap_pages(dev, &dev_dax->pgmap);
 	if (IS_ERR(addr))
 		return PTR_ERR(addr);
diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index 54500798f23a..57d3a6c3ac70 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -118,4 +118,15 @@ config NVDIMM_KEYS
 	depends on ENCRYPTED_KEYS
 	depends on (LIBNVDIMM=ENCRYPTED_KEYS) || LIBNVDIMM=m
 
+config NVDIMM_TEST_BUILD
+	bool "Build the unit test core"
+	depends on COMPILE_TEST
+	default COMPILE_TEST
+	help
+	  Build the core of the unit test infrastructure.  The result of
+	  this build is non-functional for unit test execution, but it
+	  otherwise helps catch build errors induced by changes to the
+	  core devm_memremap_pages() implementation and other
+	  infrastructure.
+
 endif
diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 6f2a088afad6..40080c120363 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -28,3 +28,7 @@ libnvdimm-$(CONFIG_BTT) += btt_devs.o
 libnvdimm-$(CONFIG_NVDIMM_PFN) += pfn_devs.o
 libnvdimm-$(CONFIG_NVDIMM_DAX) += dax_devs.o
 libnvdimm-$(CONFIG_NVDIMM_KEYS) += security.o
+
+TOOLS := ../../tools
+TEST_SRC := $(TOOLS)/testing/nvdimm/test
+obj-$(CONFIG_NVDIMM_TEST_BUILD) := $(TEST_SRC)/iomap.o
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 7e0f072ddce7..470de68dabd6 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -55,12 +55,19 @@ struct vmem_altmap {
  * MEMORY_DEVICE_PCI_P2PDMA:
  * Device memory residing in a PCI BAR intended for use with Peer-to-Peer
  * transactions.
+ *
+ * MEMORY_DEVICE_DEVDAX:
+ * Host memory that has similar access semantics as System RAM i.e. DMA
+ * coherent and supports page pinning. In contrast to
+ * MEMORY_DEVICE_FS_DAX, this memory is access via a device-dax
+ * character device.
  */
 enum memory_type {
 	MEMORY_DEVICE_PRIVATE = 1,
 	MEMORY_DEVICE_PUBLIC,
 	MEMORY_DEVICE_FS_DAX,
 	MEMORY_DEVICE_PCI_P2PDMA,
+	MEMORY_DEVICE_DEVDAX,
 };
 
 struct dev_pagemap_ops {
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 60693a1e8e92..52b4968e62cd 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -173,6 +173,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 	};
 	pgprot_t pgprot = PAGE_KERNEL;
 	int error, nid, is_ram;
+	bool get_ops = true;
 
 	switch (pgmap->type) {
 	case MEMORY_DEVICE_PRIVATE:
@@ -199,6 +200,8 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 		}
 		break;
 	case MEMORY_DEVICE_PCI_P2PDMA:
+	case MEMORY_DEVICE_DEVDAX:
+		get_ops = false;
 		break;
 	default:
 		WARN(1, "Invalid pgmap type %d\n", pgmap->type);
@@ -222,7 +225,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 		}
 	}
 
-	if (pgmap->type != MEMORY_DEVICE_PCI_P2PDMA) {
+	if (get_ops) {
 		error = dev_pagemap_get_ops(dev, pgmap);
 		if (error)
 			return ERR_PTR(error);
diff --git a/tools/testing/nvdimm/test/iomap.c b/tools/testing/nvdimm/test/iomap.c
index 8cd9b9873a7f..9019dd8afbc1 100644
--- a/tools/testing/nvdimm/test/iomap.c
+++ b/tools/testing/nvdimm/test/iomap.c
@@ -106,7 +106,7 @@ EXPORT_SYMBOL(__wrap_devm_memremap);
 
 static void nfit_test_kill(void *_pgmap)
 {
-	WARN_ON(!pgmap || !pgmap->ref)
+	struct dev_pagemap *pgmap = _pgmap;
 
 	if (pgmap->ops && pgmap->ops->kill)
 		pgmap->ops->kill(pgmap);
@@ -121,20 +121,45 @@ static void nfit_test_kill(void *_pgmap)
 	}
 }
 
+static void dev_pagemap_percpu_

Re: [Nouveau] [PATCH 15/25] device-dax: use the dev_pagemap internal refcount

2019-06-18 Thread Dan Williams
On Mon, Jun 17, 2019 at 5:28 AM Christoph Hellwig  wrote:
>
> The functionality is identical to the one currently open coded in
> device-dax.
>
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/dax/dax-private.h |  4 
>  drivers/dax/device.c  | 43 ---
>  2 files changed, 47 deletions(-)

This needs the mock devm_memremap_pages() to setup the common
percpu_ref. Incremental patch attached:
From 875e71489c8485448a5b7df2d8a8b2ed77d2b555 Mon Sep 17 00:00:00 2001
From: Dan Williams 
Date: Tue, 18 Jun 2019 11:58:24 -0700
Subject: [PATCH] tools/testing/nvdimm: Support the 'internal' ref of
 dev_pagemap

For users of the common percpu-ref implementation, like device-dax,
arrange for nfit_test to initialize the common parameters.

Signed-off-by: Dan Williams 
---
 tools/testing/nvdimm/test/iomap.c | 41 ---
 1 file changed, 32 insertions(+), 9 deletions(-)

diff --git a/tools/testing/nvdimm/test/iomap.c b/tools/testing/nvdimm/test/iomap.c
index 3bc1c16c4ef9..9019dd8afbc1 100644
--- a/tools/testing/nvdimm/test/iomap.c
+++ b/tools/testing/nvdimm/test/iomap.c
@@ -108,8 +108,6 @@ static void nfit_test_kill(void *_pgmap)
 {
 	struct dev_pagemap *pgmap = _pgmap;
 
-	WARN_ON(!pgmap || !pgmap->ref);
-
 	if (pgmap->ops && pgmap->ops->kill)
 		pgmap->ops->kill(pgmap);
 	else
@@ -123,20 +121,45 @@ static void nfit_test_kill(void *_pgmap)
 	}
 }
 
+static void dev_pagemap_percpu_release(struct percpu_ref *ref)
+{
+	struct dev_pagemap *pgmap =
+		container_of(ref, struct dev_pagemap, internal_ref);
+
+	complete(&pgmap->done);
+}
+
 void *__wrap_devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 {
+	int error;
 	resource_size_t offset = pgmap->res.start;
 	struct nfit_test_resource *nfit_res = get_nfit_res(offset);
 
-	if (nfit_res) {
-		int rc;
+	if (!nfit_res)
+		return devm_memremap_pages(dev, pgmap);
 
-		rc = devm_add_action_or_reset(dev, nfit_test_kill, pgmap);
-		if (rc)
-			return ERR_PTR(rc);
-		return nfit_res->buf + offset - nfit_res->res.start;
+	pgmap->dev = dev;
+	if (!pgmap->ref) {
+		if (pgmap->ops && (pgmap->ops->kill || pgmap->ops->cleanup))
+			return ERR_PTR(-EINVAL);
+
+		init_completion(&pgmap->done);
+		error = percpu_ref_init(&pgmap->internal_ref,
+dev_pagemap_percpu_release, 0, GFP_KERNEL);
+		if (error)
+			return ERR_PTR(error);
+		pgmap->ref = &pgmap->internal_ref;
+	} else {
+		if (!pgmap->ops || !pgmap->ops->kill || !pgmap->ops->cleanup) {
+			WARN(1, "Missing reference count teardown definition\n");
+			return ERR_PTR(-EINVAL);
+		}
 	}
-	return devm_memremap_pages(dev, pgmap);
+
+	error = devm_add_action_or_reset(dev, nfit_test_kill, pgmap);
+	if (error)
+		return ERR_PTR(error);
+	return nfit_res->buf + offset - nfit_res->res.start;
 }
 EXPORT_SYMBOL_GPL(__wrap_devm_memremap_pages);
 
-- 
2.20.1

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 08/25] memremap: move dev_pagemap callbacks into a separate structure

2019-06-17 Thread Dan Williams
On Mon, Jun 17, 2019 at 12:59 PM Christoph Hellwig  wrote:
>
> On Mon, Jun 17, 2019 at 10:51:35AM -0700, Dan Williams wrote:
> > > -   struct dev_pagemap *pgmap = _pgmap;
> >
> > Whoops, needed to keep this line to avoid:
> >
> > tools/testing/nvdimm/test/iomap.c:109:11: error: ‘pgmap’ undeclared
> > (first use in this function); did you mean ‘_pgmap’?
>
> So I really shouldn't be tripping over this anymore, but can we somehow
> this mess?
>
>  - at least add it to the normal build system and kconfig deps instead
>of stashing it away so that things like buildbot can build it?
>  - at least allow building it (under COMPILE_TEST) if needed even when
>pmem.ko and friends are built in the kernel?

Done: https://patchwork.kernel.org/patch/11000477/
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 07/25] memremap: validate the pagemap type passed to devm_memremap_pages

2019-06-17 Thread Dan Williams
On Mon, Jun 17, 2019 at 12:59 PM Christoph Hellwig  wrote:
>
> On Mon, Jun 17, 2019 at 12:02:09PM -0700, Dan Williams wrote:
> > Need a lead in patch that introduces MEMORY_DEVICE_DEVDAX, otherwise:
>
> Or maybe a MEMORY_DEVICE_DEFAULT = 0 shared by fsdax and p2pdma?

I thought about that, but it seems is_pci_p2pdma_page() needs the
distinction between the 2 types.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 10/25] memremap: lift the devmap_enable manipulation into devm_memremap_pages

2019-06-17 Thread Dan Williams
On Mon, Jun 17, 2019 at 5:28 AM Christoph Hellwig  wrote:
>
> Just check if there is a ->page_free operation set and take care of the
> static key enable, as well as the put using device managed resources.
> Also check that a ->page_free is provided for the pgmaps types that
> require it, and check for a valid type as well while we are at it.
>
> Note that this also fixes the fact that hmm never called
> dev_pagemap_put_ops and thus would leave the slow path enabled forever,
> even after a device driver unload or disable.
>
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/nvdimm/pmem.c | 23 +++--
>  include/linux/mm.h| 10 
>  kernel/memremap.c | 57 ++-
>  mm/hmm.c  |  2 --
>  4 files changed, 39 insertions(+), 53 deletions(-)
>
[..]
> diff --git a/kernel/memremap.c b/kernel/memremap.c
> index ba7156bd52d1..7272027fbdd7 100644
> --- a/kernel/memremap.c
> +++ b/kernel/memremap.c
[..]
> @@ -190,6 +219,12 @@ void *devm_memremap_pages(struct device *dev, struct 
> dev_pagemap *pgmap)
> return ERR_PTR(-EINVAL);
> }
>
> +   if (pgmap->type != MEMORY_DEVICE_PCI_P2PDMA) {

Once we have MEMORY_DEVICE_DEVDAX then this check needs to be fixed up
to skip that case as well, otherwise:

 Missing page_free method
 WARNING: CPU: 19 PID: 1518 at kernel/memremap.c:33
devm_memremap_pages+0x745/0x7d0
 RIP: 0010:devm_memremap_pages+0x745/0x7d0
 Call Trace:
  dev_dax_probe+0xc6/0x1e0 [device_dax]
  really_probe+0xef/0x390
  ? driver_allows_async_probing+0x50/0x50
  driver_probe_device+0xb4/0x100
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 07/25] memremap: validate the pagemap type passed to devm_memremap_pages

2019-06-17 Thread Dan Williams
On Mon, Jun 17, 2019 at 5:27 AM Christoph Hellwig  wrote:
>
> Most pgmap types are only supported when certain config options are
> enabled.  Check for a type that is valid for the current configuration
> before setting up the pagemap.
>
> Signed-off-by: Christoph Hellwig 
> ---
>  kernel/memremap.c | 27 +++
>  1 file changed, 27 insertions(+)
>
> diff --git a/kernel/memremap.c b/kernel/memremap.c
> index 6e1970719dc2..6a2dd31a6250 100644
> --- a/kernel/memremap.c
> +++ b/kernel/memremap.c
> @@ -157,6 +157,33 @@ void *devm_memremap_pages(struct device *dev, struct 
> dev_pagemap *pgmap)
> pgprot_t pgprot = PAGE_KERNEL;
> int error, nid, is_ram;
>
> +   switch (pgmap->type) {
> +   case MEMORY_DEVICE_PRIVATE:
> +   if (!IS_ENABLED(CONFIG_DEVICE_PRIVATE)) {
> +   WARN(1, "Device private memory not supported\n");
> +   return ERR_PTR(-EINVAL);
> +   }
> +   break;
> +   case MEMORY_DEVICE_PUBLIC:
> +   if (!IS_ENABLED(CONFIG_DEVICE_PUBLIC)) {
> +   WARN(1, "Device public memory not supported\n");
> +   return ERR_PTR(-EINVAL);
> +   }
> +   break;
> +   case MEMORY_DEVICE_FS_DAX:
> +   if (!IS_ENABLED(CONFIG_ZONE_DEVICE) ||
> +   IS_ENABLED(CONFIG_FS_DAX_LIMITED)) {
> +   WARN(1, "File system DAX not supported\n");
> +   return ERR_PTR(-EINVAL);
> +   }
> +   break;
> +   case MEMORY_DEVICE_PCI_P2PDMA:

Need a lead in patch that introduces MEMORY_DEVICE_DEVDAX, otherwise:

 Invalid pgmap type 0
 WARNING: CPU: 6 PID: 1316 at kernel/memremap.c:183
devm_memremap_pages+0x1d8/0x700
 [..]
 RIP: 0010:devm_memremap_pages+0x1d8/0x700
 [..]
 Call Trace:
  dev_dax_probe+0xc7/0x1e0 [device_dax]
  really_probe+0xef/0x390
  driver_probe_device+0xb4/0x100
  device_driver_attach+0x4f/0x60
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [PATCH 08/25] memremap: move dev_pagemap callbacks into a separate structure

2019-06-17 Thread Dan Williams
On Mon, Jun 17, 2019 at 5:27 AM Christoph Hellwig  wrote:
>
> The dev_pagemap is a growing too many callbacks.  Move them into a
> separate ops structure so that they are not duplicated for multiple
> instances, and an attacker can't easily overwrite them.
>
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Logan Gunthorpe 
> Reviewed-by: Jason Gunthorpe 
> ---
>  drivers/dax/device.c  | 11 ++
>  drivers/dax/pmem/core.c   |  2 +-
>  drivers/nvdimm/pmem.c | 19 +---
>  drivers/pci/p2pdma.c  |  9 +---
>  include/linux/memremap.h  | 36 +--
>  kernel/memremap.c | 18 
>  mm/hmm.c  | 10 ++---
>  tools/testing/nvdimm/test/iomap.c |  9 
>  8 files changed, 65 insertions(+), 49 deletions(-)
>
[..]
> diff --git a/tools/testing/nvdimm/test/iomap.c 
> b/tools/testing/nvdimm/test/iomap.c
> index 219dd0a1cb08..a667d974155e 100644
> --- a/tools/testing/nvdimm/test/iomap.c
> +++ b/tools/testing/nvdimm/test/iomap.c
> @@ -106,11 +106,10 @@ EXPORT_SYMBOL(__wrap_devm_memremap);
>
>  static void nfit_test_kill(void *_pgmap)
>  {
> -   struct dev_pagemap *pgmap = _pgmap;

Whoops, needed to keep this line to avoid:

tools/testing/nvdimm/test/iomap.c:109:11: error: ‘pgmap’ undeclared
(first use in this function); did you mean ‘_pgmap’?


Re: [PATCH 06/25] mm: factor out a devm_request_free_mem_region helper

2019-06-17 Thread Dan Williams
On Mon, Jun 17, 2019 at 5:27 AM Christoph Hellwig  wrote:
>
> Keep the physical address allocation that hmm_add_device does with the
> rest of the resource code, and allow future reuse of it without the hmm
> wrapper.
>
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Jason Gunthorpe 
> Reviewed-by: John Hubbard 
> ---
>  include/linux/ioport.h |  2 ++
>  kernel/resource.c  | 39 +++
>  mm/hmm.c   | 33 -
>  3 files changed, 45 insertions(+), 29 deletions(-)
>
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index da0ebaec25f0..76a33ae3bf6c 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -286,6 +286,8 @@ static inline bool resource_overlaps(struct resource *r1, 
> struct resource *r2)
> return (r1->start <= r2->end && r1->end >= r2->start);
>  }
>
> +struct resource *devm_request_free_mem_region(struct device *dev,
> +   struct resource *base, unsigned long size);

This appears to need a 'static inline' helper stub in the
CONFIG_DEVICE_PRIVATE=n case, otherwise this compile error triggers:

ld: mm/hmm.o: in function `hmm_devmem_add':
/home/dwillia2/git/linux/mm/hmm.c:1427: undefined reference to
`devm_request_free_mem_region'


Re: [Nouveau] dev_pagemap related cleanups

2019-06-15 Thread Dan Williams
On Sat, Jun 15, 2019 at 1:34 AM Christoph Hellwig  wrote:
>
> On Fri, Jun 14, 2019 at 06:14:45PM -0700, Dan Williams wrote:
> > On Thu, Jun 13, 2019 at 11:14 PM Christoph Hellwig  wrote:
> > >
> > > On Thu, Jun 13, 2019 at 11:27:39AM -0700, Dan Williams wrote:
> > > > It also turns out the nvdimm unit tests crash with this signature on
> > > > that branch where base v5.2-rc3 passes:
> > >
> > > How do you run that test?
> >
> > This is the unit test suite that gets kicked off by running "make
> > check" from the ndctl source repository. In this case it requires the
> > nfit_test set of modules to create a fake nvdimm environment.
> >
> > The setup instructions are in the README, but feel free to send me
> > branches and I can kick off a test. One of these we'll get around to
> > making it automated for patch submissions to the linux-nvdimm mailing
> > list.
>
> Oh, now I remember, and that was the bummer as anything requiring modules
> just does not fit at all into my normal test flows that just inject
> kernel images and use otherwise static images.

Yeah... although we do have some changes being proposed from non-x86
devs to allow a subset of the tests to run without the nfit_test
modules: https://patchwork.kernel.org/patch/10980779/

...so this prompts me to go review that patch.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] dev_pagemap related cleanups

2019-06-14 Thread Dan Williams
On Thu, Jun 13, 2019 at 11:14 PM Christoph Hellwig  wrote:
>
> On Thu, Jun 13, 2019 at 11:27:39AM -0700, Dan Williams wrote:
> > It also turns out the nvdimm unit tests crash with this signature on
> > that branch where base v5.2-rc3 passes:
>
> How do you run that test?

This is the unit test suite that gets kicked off by running "make
check" from the ndctl source repository. In this case it requires the
nfit_test set of modules to create a fake nvdimm environment.

The setup instructions are in the README, but feel free to send me
branches and I can kick off a test. One of these we'll get around to
making it automated for patch submissions to the linux-nvdimm mailing
list.

https://github.com/pmem/ndctl/blob/master/README.md
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: dev_pagemap related cleanups

2019-06-13 Thread Dan Williams
On Thu, Jun 13, 2019 at 1:18 PM Logan Gunthorpe  wrote:
>
>
>
> On 2019-06-13 12:27 p.m., Dan Williams wrote:
> > On Thu, Jun 13, 2019 at 2:43 AM Christoph Hellwig  wrote:
> >>
> >> Hi Dan, Jérôme and Jason,
> >>
> >> below is a series that cleans up the dev_pagemap interface so that
> >> it is more easily usable, which removes the need to wrap it in hmm
> >> and thus allowing to kill a lot of code
> >>
> >> Diffstat:
> >>
> >>  22 files changed, 245 insertions(+), 802 deletions(-)
> >
> > Hooray!
> >
> >> Git tree:
> >>
> >> git://git.infradead.org/users/hch/misc.git hmm-devmem-cleanup
> >
> > I just realized this collides with the dev_pagemap release rework in
> > Andrew's tree (commit ids below are from next.git and are not stable)
> >
> > 4422ee8476f0 mm/devm_memremap_pages: fix final page put race
> > 771f0714d0dc PCI/P2PDMA: track pgmap references per resource, not globally
> > af37085de906 lib/genalloc: introduce chunk owners
> > e0047ff8aa77 PCI/P2PDMA: fix the gen_pool_add_virt() failure path
> > 0315d47d6ae9 mm/devm_memremap_pages: introduce devm_memunmap_pages
> > 216475c7eaa8 drivers/base/devres: introduce devm_release_action()
> >
> > CONFLICT (content): Merge conflict in tools/testing/nvdimm/test/iomap.c
> > CONFLICT (content): Merge conflict in mm/hmm.c
> > CONFLICT (content): Merge conflict in kernel/memremap.c
> > CONFLICT (content): Merge conflict in include/linux/memremap.h
> > CONFLICT (content): Merge conflict in drivers/pci/p2pdma.c
> > CONFLICT (content): Merge conflict in drivers/nvdimm/pmem.c
> > CONFLICT (content): Merge conflict in drivers/dax/device.c
> > CONFLICT (content): Merge conflict in drivers/dax/dax-private.h
> >
> > Perhaps we should pull those out and resend them through hmm.git?
>
> Hmm, I've been waiting for those patches to get in for a little while now ;(

Unless Andrew was going to submit as v5.2-rc fixes I think I should
rebase / submit them on current hmm.git and then throw these cleanups
from Christoph on top?


Re: [Nouveau] [PATCH 08/22] memremap: pass a struct dev_pagemap to ->kill

2019-06-13 Thread Dan Williams
On Thu, Jun 13, 2019 at 1:12 PM Logan Gunthorpe  wrote:
>
>
>
> On 2019-06-13 3:43 a.m., Christoph Hellwig wrote:
> > Passing the actual typed structure leads to more understandable code
> > vs the actual references.
>
> Ha, ok, I originally suggested this to Dan when he introduced the
> callback[1].
>
> Reviewed-by: Logan Gunthorpe 

Reviewed-by: Dan Williams 
Reported-by: Logan Gunthorpe  :)


>
> Logan
>
> [1]
> https://lore.kernel.org/lkml/8f0cae82-130f-8a64-cfbd-fda5fd76b...@deltatee.com/T/#u
>
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [PATCH 09/22] memremap: lift the devmap_enable manipulation into devm_memremap_pages

2019-06-13 Thread Dan Williams
On Thu, Jun 13, 2019 at 12:35 PM Jason Gunthorpe  wrote:
>
> On Thu, Jun 13, 2019 at 11:43:12AM +0200, Christoph Hellwig wrote:
> > Just check if there is a ->page_free operation set and take care of the
> > static key enable, as well as the put using device managed resources.
> > diff --git a/mm/hmm.c b/mm/hmm.c
> > index c76a1b5defda..6dc769feb2e1 100644
> > +++ b/mm/hmm.c
> > @@ -1378,8 +1378,6 @@ struct hmm_devmem *hmm_devmem_add(const struct 
> > hmm_devmem_ops *ops,
> >   void *result;
> >   int ret;
> >
> > - dev_pagemap_get_ops();
> > -
>
> Where was the matching dev_pagemap_put_ops() for this hmm case? This
> is a bug fix too?
>

It never existed. HMM turned on the facility and made everyone's
put_page() operations slower regardless of whether HMM was in active
use.

> The nouveau driver is the only one to actually call this hmm function
> and it does it as part of a probe function.
>
> Seems reasonable, however, in the unlikely event that it fails to init
> 'dmem' the driver will retain a dev_pagemap_get_ops until it unloads.
> This imbalance doesn't seem worth worrying about.

Right, unless/until the overhead of checking for put_page() callbacks
starts to hurt leaving pagemap_ops tied to lifetime of the driver load
seems acceptable because who unbinds their GPU device at runtime? On
the other hand it was simple enough for the pmem driver to drop the
reference each time a device was unbound just to close the loop.

>
> Reviewed-by: Christoph Hellwig 

...minor typo.


Re: [Nouveau] dev_pagemap related cleanups

2019-06-13 Thread Dan Williams
On Thu, Jun 13, 2019 at 2:43 AM Christoph Hellwig  wrote:
>
> Hi Dan, Jérôme and Jason,
>
> below is a series that cleans up the dev_pagemap interface so that
> it is more easily usable, which removes the need to wrap it in hmm
> and thus allowing to kill a lot of code
>
> Diffstat:
>
>  22 files changed, 245 insertions(+), 802 deletions(-)

Hooray!

> Git tree:
>
> git://git.infradead.org/users/hch/misc.git hmm-devmem-cleanup

I just realized this collides with the dev_pagemap release rework in
Andrew's tree (commit ids below are from next.git and are not stable)

4422ee8476f0 mm/devm_memremap_pages: fix final page put race
771f0714d0dc PCI/P2PDMA: track pgmap references per resource, not globally
af37085de906 lib/genalloc: introduce chunk owners
e0047ff8aa77 PCI/P2PDMA: fix the gen_pool_add_virt() failure path
0315d47d6ae9 mm/devm_memremap_pages: introduce devm_memunmap_pages
216475c7eaa8 drivers/base/devres: introduce devm_release_action()

CONFLICT (content): Merge conflict in tools/testing/nvdimm/test/iomap.c
CONFLICT (content): Merge conflict in mm/hmm.c
CONFLICT (content): Merge conflict in kernel/memremap.c
CONFLICT (content): Merge conflict in include/linux/memremap.h
CONFLICT (content): Merge conflict in drivers/pci/p2pdma.c
CONFLICT (content): Merge conflict in drivers/nvdimm/pmem.c
CONFLICT (content): Merge conflict in drivers/dax/device.c
CONFLICT (content): Merge conflict in drivers/dax/dax-private.h

Perhaps we should pull those out and resend them through hmm.git?

It also turns out the nvdimm unit tests crash with this signature on
that branch where base v5.2-rc3 passes:

BUG: kernel NULL pointer dereference, address: 0008
[..]
CPU: 15 PID: 1414 Comm: lt-libndctl Tainted: G   OE
5.2.0-rc3+ #3399
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
RIP: 0010:percpu_ref_kill_and_confirm+0x1e/0x180
[..]
Call Trace:
 release_nodes+0x234/0x280
 device_release_driver_internal+0xe8/0x1b0
 bus_remove_device+0xf2/0x160
 device_del+0x166/0x370
 unregister_dev_dax+0x23/0x50
 release_nodes+0x234/0x280
 device_release_driver_internal+0xe8/0x1b0
 unbind_store+0x94/0x120
 kernfs_fop_write+0xf0/0x1a0
 vfs_write+0xb7/0x1b0
 ksys_write+0x5c/0xd0
 do_syscall_64+0x60/0x240

The crash bisects to:

d8cc8bbe108c device-dax: use the dev_pagemap internal refcount
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH v1] ACPI: Switch to use generic UUID API

2017-05-06 Thread Dan Williams
On Thu, May 4, 2017 at 2:21 AM, Andy Shevchenko
 wrote:
> acpi_evaluate_dsm() and friends take a pointer to a raw buffer of 16
> bytes. Instead we convert them to use uuid_le type. At the same time we
> convert current users.
>
> acpi_str_to_uuid() becomes useless after the conversion and it's safe to
> get rid of it.
>
> The conversion fixes a potential bug in int340x_thermal as well since
> we have to use memcmp() on binary data.
>
> Cc: Rafael J. Wysocki 
> Cc: Mika Westerberg 
> Cc: Borislav Petkov 
> Cc: Dan Williams 
> Cc: Amir Goldstein 
> Cc: Jarkko Sakkinen 
> Cc: Jani Nikula 
> Cc: Ben Skeggs 
> Cc: Benjamin Tissoires 
> Cc: Joerg Roedel 
> Cc: Adrian Hunter 
> Cc: Yisen Zhuang 
> Cc: Bjorn Helgaas 
> Cc: Zhang Rui 
> Cc: Felipe Balbi 
> Cc: Mathias Nyman 
> Cc: Heikki Krogerus 
> Cc: Liam Girdwood 
> Cc: Mark Brown 
> Signed-off-by: Andy Shevchenko 
[..]
> diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
> index 0f7982a1caaf..bd3e45ede056 100644
> --- a/drivers/acpi/nfit/core.c
> +++ b/drivers/acpi/nfit/core.c
> @@ -74,11 +74,11 @@ struct nfit_table_prev {
> struct list_head flushes;
>  };
>
> -static u8 nfit_uuid[NFIT_UUID_MAX][16];
> +static uuid_le nfit_uuid[NFIT_UUID_MAX];
>
> -const u8 *to_nfit_uuid(enum nfit_uuids id)
> +const uuid_le *to_nfit_uuid(enum nfit_uuids id)
>  {
> -   return nfit_uuid[id];
> +   return &nfit_uuid[id];
>  }
>  EXPORT_SYMBOL(to_nfit_uuid);
>
> @@ -207,7 +207,7 @@ int acpi_nfit_ctl(struct nvdimm_bus_descriptor *nd_desc, 
> struct nvdimm *nvdimm,
> u32 offset, fw_status = 0;
> acpi_handle handle;
> unsigned int func;
> -   const u8 *uuid;
> +   const uuid_le *uuid;
> int rc, i;
>
> func = cmd;
> @@ -394,7 +394,7 @@ int nfit_spa_type(struct acpi_nfit_system_address *spa)
> int i;
>
> for (i = 0; i < NFIT_UUID_MAX; i++)
> -   if (memcmp(to_nfit_uuid(i), spa->range_guid, 16) == 0)
> +   if (!uuid_le_cmp_pp(to_nfit_uuid(i), (uuid_le 
> *)spa->range_guid))

What is _cmp_pp? Why not _compare?

Other than that, looks ok to me.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau