Re: [PATCH v8 10/18] dax, dm: introduce ->fs_{claim, release}() dax_device infrastructure

2018-04-03 Thread Dan Williams
On Tue, Apr 3, 2018 at 12:39 PM, Mike Snitzer  wrote:
> On Tue, Apr 03 2018 at  2:24pm -0400,
> Dan Williams  wrote:
>
>> On Fri, Mar 30, 2018 at 9:03 PM, Dan Williams  
>> wrote:
>> > In preparation for allowing filesystems to augment the dev_pagemap
>> > associated with a dax_device, add an ->fs_claim() callback. The
>> > ->fs_claim() callback is leveraged by the device-mapper dax
>> > implementation to iterate all member devices in the map and repeat the
>> > claim operation across the array.
>> >
>> > In order to resolve collisions between filesystem operations and DMA to
>> > DAX mapped pages we need a callback when DMA completes. With a callback
>> > we can hold off filesystem operations while DMA is in-flight and then
>> > resume those operations when the last put_page() occurs on a DMA page.
>> > The ->fs_claim() operation arranges for this callback to be registered,
>> > although that implementation is saved for a later patch.
>> >
>> > Cc: Alasdair Kergon 
>> > Cc: Mike Snitzer 
>>
>> Mike, do these DM touches look ok to you?  We need these ->fs_claim()
>> / ->fs_release() interfaces for device-mapper to set up filesystem-dax
>> infrastructure on all sub-devices whenever a dax-capable DM device is
>> mounted. It builds on the device-mapper dax dependency removal
>> patches.
>
> I'd prefer dm_dax_iterate() be renamed to dm_dax_iterate_devices()

Ok, I'll fix that up.

> But dm_dax_iterate() is weird... it is simply returning the struct
> dax_device *dax_dev that is passed: seemingly without actually directly
> changing anything about that dax_device (I can infer that you're
> claiming the underlying devices, but...)

I could at least add a note pointing to the comment in dm_dax_dev_claim().
The filesystem caller expects to get a dax_dev back, or NULL from
fs_dax_claim_bdev() if the claim failed. For fs_dax_claim() the return
value could simply be a bool for pass/fail, but I used dax_dev NULL /
not-NULL instead.

In the case of device-mapper the claim attempt can't fail for
conflicting ownership reasons because the exclusive ownership of the
underlying block device is already established by device-mapper before
the fs claims the device-mapper dax device.

> In general users of ti->type->iterate_devices can get a result back
> (via 'int' return)... you aren't using it that way (and maybe dax will
> never have a need to return an answer).  But all said, I think I'd
> prefer to see dm_dax_iterate_devices() return void.
>
> But please let me know if I'm missing something, thanks.

Oh, yeah, I like that better. I'll just make it return void and have
dm_dax_fs_claim() return the dax_dev directly.

Thanks Mike!


Re: [PATCH v8 10/18] dax, dm: introduce ->fs_{claim, release}() dax_device infrastructure

2018-04-03 Thread Mike Snitzer
On Tue, Apr 03 2018 at  2:24pm -0400,
Dan Williams  wrote:

> On Fri, Mar 30, 2018 at 9:03 PM, Dan Williams  
> wrote:
> > In preparation for allowing filesystems to augment the dev_pagemap
> > associated with a dax_device, add an ->fs_claim() callback. The
> > ->fs_claim() callback is leveraged by the device-mapper dax
> > implementation to iterate all member devices in the map and repeat the
> > claim operation across the array.
> >
> > In order to resolve collisions between filesystem operations and DMA to
> > DAX mapped pages we need a callback when DMA completes. With a callback
> > we can hold off filesystem operations while DMA is in-flight and then
> > resume those operations when the last put_page() occurs on a DMA page.
> > The ->fs_claim() operation arranges for this callback to be registered,
> > although that implementation is saved for a later patch.
> >
> > Cc: Alasdair Kergon 
> > Cc: Mike Snitzer 
> 
> Mike, do these DM touches look ok to you?  We need these ->fs_claim()
> / ->fs_release() interfaces for device-mapper to set up filesystem-dax
> infrastructure on all sub-devices whenever a dax-capable DM device is
> mounted. It builds on the device-mapper dax dependency removal
> patches.

I'd prefer dm_dax_iterate() be renamed to dm_dax_iterate_devices()

But dm_dax_iterate() is weird... it is simply returning the struct
dax_device *dax_dev that is passed: seemingly without actually directly
changing anything about that dax_device (I can infer that you're
claiming the underlying devices, but...)

In general users of ti->type->iterate_devices can get a result back
(via 'int' return)... you aren't using it that way (and maybe dax will
never have a need to return an answer).  But all said, I think I'd
prefer to see dm_dax_iterate_devices() return void.

But please let me know if I'm missing something, thanks.

Mike


Re: [PATCH v8 10/18] dax, dm: introduce ->fs_{claim, release}() dax_device infrastructure

2018-04-03 Thread Dan Williams
On Fri, Mar 30, 2018 at 9:03 PM, Dan Williams  wrote:
> In preparation for allowing filesystems to augment the dev_pagemap
> associated with a dax_device, add an ->fs_claim() callback. The
> ->fs_claim() callback is leveraged by the device-mapper dax
> implementation to iterate all member devices in the map and repeat the
> claim operation across the array.
>
> In order to resolve collisions between filesystem operations and DMA to
> DAX mapped pages we need a callback when DMA completes. With a callback
> we can hold off filesystem operations while DMA is in-flight and then
> resume those operations when the last put_page() occurs on a DMA page.
> The ->fs_claim() operation arranges for this callback to be registered,
> although that implementation is saved for a later patch.
>
> Cc: Alasdair Kergon 
> Cc: Mike Snitzer 

Mike, do these DM touches look ok to you?  We need these ->fs_claim()
/ ->fs_release() interfaces for device-mapper to set up filesystem-dax
infrastructure on all sub-devices whenever a dax-capable DM device is
mounted. It builds on the device-mapper dax dependency removal
patches.

> Cc: Matthew Wilcox 
> Cc: Ross Zwisler 
> Cc: "Jérôme Glisse" 
> Cc: Christoph Hellwig 
> Cc: Jan Kara 
> Signed-off-by: Dan Williams 
> ---
>  drivers/dax/super.c  |   80 ++
>  drivers/md/dm.c  |   56 
>  include/linux/dax.h  |   16 +
>  include/linux/memremap.h |8 +
>  4 files changed, 160 insertions(+)
>
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 2b2332b605e4..c4cf284dfe1c 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -29,6 +29,7 @@ static struct vfsmount *dax_mnt;
>  static DEFINE_IDA(dax_minor_ida);
>  static struct kmem_cache *dax_cache __read_mostly;
>  static struct super_block *dax_superblock __read_mostly;
> +static DEFINE_MUTEX(devmap_lock);
>
>  #define DAX_HASH_SIZE (PAGE_SIZE / sizeof(struct hlist_head))
>  static struct hlist_head dax_host_list[DAX_HASH_SIZE];
> @@ -169,9 +170,88 @@ struct dax_device {
> const char *host;
> void *private;
> unsigned long flags;
> +   struct dev_pagemap *pgmap;
> const struct dax_operations *ops;
>  };
>
> +#if IS_ENABLED(CONFIG_FS_DAX)
> +static void generic_dax_pagefree(struct page *page, void *data)
> +{
> +   /* TODO: wakeup page-idle waiters */
> +}
> +
> +struct dax_device *fs_dax_claim(struct dax_device *dax_dev, void *owner)
> +{
> +   struct dev_pagemap *pgmap;
> +
> +   if (!dax_dev->pgmap)
> +   return dax_dev;
> +   pgmap = dax_dev->pgmap;
> +
> +   mutex_lock(&devmap_lock);
> +   if (pgmap->data && pgmap->data == owner) {
> +   /* dm might try to claim the same device more than once... */
> +   mutex_unlock(&devmap_lock);
> +   return dax_dev;
> +   } else if (pgmap->page_free || pgmap->page_fault
> +   || pgmap->type != MEMORY_DEVICE_HOST) {
> +   put_dax(dax_dev);
> +   mutex_unlock(&devmap_lock);
> +   return NULL;
> +   }
> +
> +   pgmap->type = MEMORY_DEVICE_FS_DAX;
> +   pgmap->page_free = generic_dax_pagefree;
> +   pgmap->data = owner;
> +   mutex_unlock(&devmap_lock);
> +
> +   return dax_dev;
> +}
> +EXPORT_SYMBOL_GPL(fs_dax_claim);
> +
> +struct dax_device *fs_dax_claim_bdev(struct block_device *bdev, void *owner)
> +{
> +   struct dax_device *dax_dev;
> +
> +   if (!blk_queue_dax(bdev->bd_queue))
> +   return NULL;
> +   dax_dev = fs_dax_get_by_host(bdev->bd_disk->disk_name);
> +   if (dax_dev->ops->fs_claim)
> +   return dax_dev->ops->fs_claim(dax_dev, owner);
> +   else
> +   return fs_dax_claim(dax_dev, owner);
> +}
> +EXPORT_SYMBOL_GPL(fs_dax_claim_bdev);
> +
> +void __fs_dax_release(struct dax_device *dax_dev, void *owner)
> +{
> +   struct dev_pagemap *pgmap = dax_dev ? dax_dev->pgmap : NULL;
> +
> +   put_dax(dax_dev);
> +   if (!pgmap)
> +   return;
> +   if (!pgmap->data)
> +   return;
> +
> +   mutex_lock(&devmap_lock);
> +   WARN_ON(pgmap->data != owner);
> +   pgmap->type = MEMORY_DEVICE_HOST;
> +   pgmap->page_free = NULL;
> +   pgmap->data = NULL;
> +   mutex_unlock(&devmap_lock);
> +}
> +EXPORT_SYMBOL_GPL(__fs_dax_release);
> +
> +void fs_dax_release(struct dax_device *dax_dev, void *owner)
> +{
> +   if (dax_dev->ops->fs_release)
> +   dax_dev->ops->fs_release(dax_dev, owner);
> +   else
> +   __fs_dax_release(dax_dev, owner);
> +}
> +EXPORT_SYMBOL_GPL(fs_dax_release);
> +#endif
> +
>  static ssize_t write_cache_show(struct device *dev,
> struct device_attribute *attr, char *buf)
>  {
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index ffc93aecc02a..964cb7537f11 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1090,6 +1090,60 @@ static 

[PATCH v8 10/18] dax, dm: introduce ->fs_{claim, release}() dax_device infrastructure

2018-03-30 Thread Dan Williams
In preparation for allowing filesystems to augment the dev_pagemap
associated with a dax_device, add an ->fs_claim() callback. The
->fs_claim() callback is leveraged by the device-mapper dax
implementation to iterate all member devices in the map and repeat the
claim operation across the array.

In order to resolve collisions between filesystem operations and DMA to
DAX mapped pages we need a callback when DMA completes. With a callback
we can hold off filesystem operations while DMA is in-flight and then
resume those operations when the last put_page() occurs on a DMA page.
The ->fs_claim() operation arranges for this callback to be registered,
although that implementation is saved for a later patch.

Cc: Alasdair Kergon 
Cc: Mike Snitzer 
Cc: Matthew Wilcox 
Cc: Ross Zwisler 
Cc: "Jérôme Glisse" 
Cc: Christoph Hellwig 
Cc: Jan Kara 
Signed-off-by: Dan Williams 
---
 drivers/dax/super.c  |   80 ++
 drivers/md/dm.c  |   56 
 include/linux/dax.h  |   16 +
 include/linux/memremap.h |8 +
 4 files changed, 160 insertions(+)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 2b2332b605e4..c4cf284dfe1c 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -29,6 +29,7 @@ static struct vfsmount *dax_mnt;
 static DEFINE_IDA(dax_minor_ida);
 static struct kmem_cache *dax_cache __read_mostly;
 static struct super_block *dax_superblock __read_mostly;
+static DEFINE_MUTEX(devmap_lock);
 
 #define DAX_HASH_SIZE (PAGE_SIZE / sizeof(struct hlist_head))
 static struct hlist_head dax_host_list[DAX_HASH_SIZE];
@@ -169,9 +170,88 @@ struct dax_device {
const char *host;
void *private;
unsigned long flags;
+   struct dev_pagemap *pgmap;
const struct dax_operations *ops;
 };
 
+#if IS_ENABLED(CONFIG_FS_DAX)
+static void generic_dax_pagefree(struct page *page, void *data)
+{
+   /* TODO: wakeup page-idle waiters */
+}
+
+struct dax_device *fs_dax_claim(struct dax_device *dax_dev, void *owner)
+{
+   struct dev_pagemap *pgmap;
+
+   if (!dax_dev->pgmap)
+   return dax_dev;
+   pgmap = dax_dev->pgmap;
+
+   mutex_lock(&devmap_lock);
+   if (pgmap->data && pgmap->data == owner) {
+   /* dm might try to claim the same device more than once... */
+   mutex_unlock(&devmap_lock);
+   return dax_dev;
+   } else if (pgmap->page_free || pgmap->page_fault
+   || pgmap->type != MEMORY_DEVICE_HOST) {
+   put_dax(dax_dev);
+   mutex_unlock(&devmap_lock);
+   return NULL;
+   }
+
+   pgmap->type = MEMORY_DEVICE_FS_DAX;
+   pgmap->page_free = generic_dax_pagefree;
+   pgmap->data = owner;
+   mutex_unlock(&devmap_lock);
+
+   return dax_dev;
+}
+EXPORT_SYMBOL_GPL(fs_dax_claim);
+
+struct dax_device *fs_dax_claim_bdev(struct block_device *bdev, void *owner)
+{
+   struct dax_device *dax_dev;
+
+   if (!blk_queue_dax(bdev->bd_queue))
+   return NULL;
+   dax_dev = fs_dax_get_by_host(bdev->bd_disk->disk_name);
+   if (dax_dev->ops->fs_claim)
+   return dax_dev->ops->fs_claim(dax_dev, owner);
+   else
+   return fs_dax_claim(dax_dev, owner);
+}
+EXPORT_SYMBOL_GPL(fs_dax_claim_bdev);
+
+void __fs_dax_release(struct dax_device *dax_dev, void *owner)
+{
+   struct dev_pagemap *pgmap = dax_dev ? dax_dev->pgmap : NULL;
+
+   put_dax(dax_dev);
+   if (!pgmap)
+   return;
+   if (!pgmap->data)
+   return;
+
+   mutex_lock(&devmap_lock);
+   WARN_ON(pgmap->data != owner);
+   pgmap->type = MEMORY_DEVICE_HOST;
+   pgmap->page_free = NULL;
+   pgmap->data = NULL;
+   mutex_unlock(&devmap_lock);
+}
+EXPORT_SYMBOL_GPL(__fs_dax_release);
+
+void fs_dax_release(struct dax_device *dax_dev, void *owner)
+{
+   if (dax_dev->ops->fs_release)
+   dax_dev->ops->fs_release(dax_dev, owner);
+   else
+   __fs_dax_release(dax_dev, owner);
+}
+EXPORT_SYMBOL_GPL(fs_dax_release);
+#endif
+
 static ssize_t write_cache_show(struct device *dev,
struct device_attribute *attr, char *buf)
 {
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index ffc93aecc02a..964cb7537f11 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1090,6 +1090,60 @@ static size_t dm_dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
return ret;
 }
 
+static int dm_dax_dev_claim(struct dm_target *ti, struct dm_dev *dev,
+   sector_t start, sector_t len, void *owner)
+{
+   if (fs_dax_claim(dev->dax_dev, owner))
+   return 0;
+   /*
+* Outside of a kernel bug there is no reason a dax_dev should
+* fail a claim attempt. Device-mapper should have exclusive
+* ownership of the dm_dev and the filesystem should have
+* exclusive ownership of the dm_target.
+*/
+   WARN_ON_ONCE(1);