Re: [v2 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM

2019-04-25 Thread David Hildenbrand
On 25.04.19 14:30, Pavel Tatashin wrote:
>>
>> Yes, also I think you can let go of the device_lock in
>> check_memblocks_offline_cb, lock_device_hotplug() should take care of
>> this (see Documentation/core-api/memory-hotplug.rst - "locking internals")
>>
> Hi David,
> 
> Thank you for your comments. I went through memory-hotplug.rst, and I
> still think that device_lock() is needed here. In this particular case
> it can be replaced with something like READ_ONCE(), but for simplicity
> it is better to have device_lock()/device_unlock() as this is not a
> performance critical code.
> 
> I do not see any lock ordering issues with this code, as we are
> holding lock_device_hotplug() first that prevents userland from
> adding/removing memory during this check.

Yes, lock ordering is not an issue, I rather think that the device
hotplug lock will guard us in all situations. E.g. remove_memory() also
does not use it when checking if all blocks are offline. But you can
leave it in if you think it is needed.

> 
> https://soleen.com/source/xref/linux/arch/powerpc/platforms/powernv/memtrace.c?r=98fa15f3#248
> 
> Here we have a similar code:
> lock_device_hotplug();
>online_mem_block();
> device_online()
>  device_lock(dev);
> 
> Pasha
> 

-- 

Thanks,

David / dhildenb
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [v2 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM

2019-04-25 Thread Pavel Tatashin
>
> Yes, also I think you can let go of the device_lock in
> check_memblocks_offline_cb, lock_device_hotplug() should take care of
> this (see Documentation/core-api/memory-hotplug.rst - "locking internals")
>
Hi David,

Thank you for your comments. I went through memory-hotplug.rst, and I
still think that device_lock() is needed here. In this particular case
it can be replaced with something like READ_ONCE(), but for simplicity
it is better to have device_lock()/device_unlock() as this is not a
performance critical code.

I do not see any lock ordering issues with this code, as we are
holding lock_device_hotplug() first that prevents userland from
adding/removing memory during this check.

https://soleen.com/source/xref/linux/arch/powerpc/platforms/powernv/memtrace.c?r=98fa15f3#248

Here we have a similar code:
lock_device_hotplug();
   online_mem_block();
device_online()
 device_lock(dev);

Pasha
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [v2 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM

2019-04-25 Thread David Hildenbrand
On 24.04.19 23:34, Pavel Tatashin wrote:
 +static int
 +offline_memblock_cb(struct memory_block *mem, void *arg)
>>>
>>> Function name suggests that you are actually trying to offline memory
>>> here. Maybe check_memblocks_offline_cb(), just like we have in
>>> mm/memory_hotplug.c.
> 
> Makes sense, I will rename to check_memblocks_offline_cb()
> 
 + lock_device_hotplug();
 + rc = walk_memory_range(start_pfn, end_pfn, dev, offline_memblock_cb);
 + unlock_device_hotplug();
 +
 + /*
 +  * If admin has not offlined memory beforehand, we cannot hotremove 
 dax.
 +  * Unfortunately, because unbind will still succeed there is no way 
 for
 +  * user to hotremove dax after this.
 +  */
 + if (rc)
 + return rc;
>>>
>>> Can't it happen that there is a race between you checking if memory is
>>> offline and an admin onlining memory again? maybe pull the
>>> remove_memory() into the locked region, using __remove_memory() instead.
>>
>> I think the race is ok. The admin gets to keep the pieces of allowing
>> racing updates to the state and the kernel will keep the range active
>> until the next reboot.
> 
> Thank you for noticing this. I will pull it into locking region.
> Because, __remove_memory() has this code:
> 
> 1868   ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), 
> NULL,
> 1869   check_memblock_offlined_cb);
> 1870   if (ret)
> 1871  BUG();
> 

Yes, also I think you can let go of the device_lock in
check_memblocks_offline_cb, lock_device_hotplug() should take care of
this (see Documentation/core-api/memory-hotplug.rst - "locking internals")

Cheers!

-- 

Thanks,

David / dhildenb
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [v2 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM

2019-04-24 Thread Pavel Tatashin
> > > +static int
> > > +offline_memblock_cb(struct memory_block *mem, void *arg)
> >
> > Function name suggests that you are actually trying to offline memory
> > here. Maybe check_memblocks_offline_cb(), just like we have in
> > mm/memory_hotplug.c.

Makes sense, I will rename to check_memblocks_offline_cb()

> > > + lock_device_hotplug();
> > > + rc = walk_memory_range(start_pfn, end_pfn, dev, 
> > > offline_memblock_cb);
> > > + unlock_device_hotplug();
> > > +
> > > + /*
> > > +  * If admin has not offlined memory beforehand, we cannot hotremove 
> > > dax.
> > > +  * Unfortunately, because unbind will still succeed there is no way 
> > > for
> > > +  * user to hotremove dax after this.
> > > +  */
> > > + if (rc)
> > > + return rc;
> >
> > Can't it happen that there is a race between you checking if memory is
> > offline and an admin onlining memory again? maybe pull the
> > remove_memory() into the locked region, using __remove_memory() instead.
>
> I think the race is ok. The admin gets to keep the pieces of allowing
> racing updates to the state and the kernel will keep the range active
> until the next reboot.

Thank you for noticing this. I will pull it into locking region.
Because, __remove_memory() has this code:

1868   ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
1869   check_memblock_offlined_cb);
1870   if (ret)
1871  BUG();

Basically, panic if at least something is not offlined.

Pasha
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [v2 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM

2019-04-24 Thread Dan Williams
On Wed, Apr 24, 2019 at 1:55 PM David Hildenbrand  wrote:
>
> On 21.04.19 03:44, Pavel Tatashin wrote:
> > It is now allowed to use persistent memory like a regular RAM, but
> > currently there is no way to remove this memory until machine is
> > rebooted.
> >
> > This work expands the functionality to also allows hotremoving
> > previously hotplugged persistent memory, and recover the device for use
> > for other purposes.
> >
> > To hotremove persistent memory, the management software must first
> > offline all memory blocks of dax region, and than unbind it from
> > device-dax/kmem driver. So, operations should look like this:
> >
> > echo offline > echo offline > /sys/devices/system/memory/memoryN/state
> > ...
> > echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
> >
> > Note: if unbind is done without offlining memory beforehand, it won't be
> > possible to do dax0.0 hotremove, and dax's memory is going to be part of
> > System RAM until reboot.
> >
> > Signed-off-by: Pavel Tatashin 
> > ---
> >  drivers/dax/dax-private.h |  2 +
> >  drivers/dax/kmem.c| 91 +--
> >  2 files changed, 89 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
> > index a45612148ca0..999aaf3a29b3 100644
> > --- a/drivers/dax/dax-private.h
> > +++ b/drivers/dax/dax-private.h
> > @@ -53,6 +53,7 @@ struct dax_region {
> >   * @pgmap - pgmap for memmap setup / lifetime (driver owned)
> >   * @ref: pgmap reference count (driver owned)
> >   * @cmp: @ref final put completion (driver owned)
> > + * @dax_mem_res: physical address range of hotadded DAX memory
> >   */
> >  struct dev_dax {
> >   struct dax_region *region;
> > @@ -62,6 +63,7 @@ struct dev_dax {
> >   struct dev_pagemap pgmap;
> >   struct percpu_ref ref;
> >   struct completion cmp;
> > + struct resource *dax_kmem_res;
> >  };
> >
> >  static inline struct dev_dax *to_dev_dax(struct device *dev)
> > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> > index 4c0131857133..d4896b281036 100644
> > --- a/drivers/dax/kmem.c
> > +++ b/drivers/dax/kmem.c
> > @@ -71,21 +71,104 @@ int dev_dax_kmem_probe(struct device *dev)
> >   kfree(new_res);
> >   return rc;
> >   }
> > + dev_dax->dax_kmem_res = new_res;
> >
> >   return 0;
> >  }
> >
> > +#ifdef CONFIG_MEMORY_HOTREMOVE
> > +/*
> > + * Check that device-dax's memory_blocks are offline. If a memory_block is 
> > not
> > + * offline a warning is printed and an error is returned. dax hotremove can
> > + * succeed only when every memory_block is offlined beforehand.
> > + */
> > +static int
> > +offline_memblock_cb(struct memory_block *mem, void *arg)
>
> Function name suggests that you are actually trying to offline memory
> here. Maybe check_memblocks_offline_cb(), just like we have in
> mm/memory_hotplug.c.
>
> > +{
> > + struct device *mem_dev = >dev;
> > + bool is_offline;
> > +
> > + device_lock(mem_dev);
> > + is_offline = mem_dev->offline;
> > + device_unlock(mem_dev);
> > +
> > + if (!is_offline) {
> > + struct device *dev = (struct device *)arg;
> > + unsigned long spfn = section_nr_to_pfn(mem->start_section_nr);
> > + unsigned long epfn = section_nr_to_pfn(mem->end_section_nr);
> > + phys_addr_t spa = spfn << PAGE_SHIFT;
> > + phys_addr_t epa = epfn << PAGE_SHIFT;
> > +
> > + dev_warn(dev, "memory block [%pa-%pa] is not offline\n",
> > +  , );
> > +
> > + return -EBUSY;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static int dev_dax_kmem_remove(struct device *dev)
> > +{
> > + struct dev_dax *dev_dax = to_dev_dax(dev);
> > + struct resource *res = dev_dax->dax_kmem_res;
> > + resource_size_t kmem_start;
> > + resource_size_t kmem_size;
> > + unsigned long start_pfn;
> > + unsigned long end_pfn;
> > + int rc;
> > +
> > + /*
> > +  * dax kmem resource does not exist, means memory was never 
> > hotplugged.
> > +  * So, nothing to do here.
> > +  */
> > + if (!res)
> > + return 0;
> > +
> > + kmem_start = res->start;
> > + kmem_size = resource_size(res);
> > + start_pfn = kmem_start >> PAGE_SHIFT;
> > + end_pfn = start_pfn + (kmem_size >> PAGE_SHIFT) - 1;
> > +
> > + /*
> > +  * Walk and check that every singe memory_block of dax region is
> > +  * offline
> > +  */
> > + lock_device_hotplug();
> > + rc = walk_memory_range(start_pfn, end_pfn, dev, offline_memblock_cb);
> > + unlock_device_hotplug();
> > +
> > + /*
> > +  * If admin has not offlined memory beforehand, we cannot hotremove 
> > dax.
> > +  * Unfortunately, because unbind will still succeed there is no way 
> > for
> > +  * user to hotremove dax after this.
> > +  */
> > + if (rc)
> > + return rc;
>
> Can't it happen that there is 

Re: [v2 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM

2019-04-24 Thread David Hildenbrand
On 21.04.19 03:44, Pavel Tatashin wrote:
> It is now allowed to use persistent memory like a regular RAM, but
> currently there is no way to remove this memory until machine is
> rebooted.
> 
> This work expands the functionality to also allows hotremoving
> previously hotplugged persistent memory, and recover the device for use
> for other purposes.
> 
> To hotremove persistent memory, the management software must first
> offline all memory blocks of dax region, and than unbind it from
> device-dax/kmem driver. So, operations should look like this:
> 
> echo offline > echo offline > /sys/devices/system/memory/memoryN/state
> ...
> echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
> 
> Note: if unbind is done without offlining memory beforehand, it won't be
> possible to do dax0.0 hotremove, and dax's memory is going to be part of
> System RAM until reboot.
> 
> Signed-off-by: Pavel Tatashin 
> ---
>  drivers/dax/dax-private.h |  2 +
>  drivers/dax/kmem.c| 91 +--
>  2 files changed, 89 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
> index a45612148ca0..999aaf3a29b3 100644
> --- a/drivers/dax/dax-private.h
> +++ b/drivers/dax/dax-private.h
> @@ -53,6 +53,7 @@ struct dax_region {
>   * @pgmap - pgmap for memmap setup / lifetime (driver owned)
>   * @ref: pgmap reference count (driver owned)
>   * @cmp: @ref final put completion (driver owned)
> + * @dax_mem_res: physical address range of hotadded DAX memory
>   */
>  struct dev_dax {
>   struct dax_region *region;
> @@ -62,6 +63,7 @@ struct dev_dax {
>   struct dev_pagemap pgmap;
>   struct percpu_ref ref;
>   struct completion cmp;
> + struct resource *dax_kmem_res;
>  };
>  
>  static inline struct dev_dax *to_dev_dax(struct device *dev)
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index 4c0131857133..d4896b281036 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -71,21 +71,104 @@ int dev_dax_kmem_probe(struct device *dev)
>   kfree(new_res);
>   return rc;
>   }
> + dev_dax->dax_kmem_res = new_res;
>  
>   return 0;
>  }
>  
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +/*
> + * Check that device-dax's memory_blocks are offline. If a memory_block is 
> not
> + * offline a warning is printed and an error is returned. dax hotremove can
> + * succeed only when every memory_block is offlined beforehand.
> + */
> +static int
> +offline_memblock_cb(struct memory_block *mem, void *arg)

Function name suggests that you are actually trying to offline memory
here. Maybe check_memblocks_offline_cb(), just like we have in
mm/memory_hotplug.c.

> +{
> + struct device *mem_dev = >dev;
> + bool is_offline;
> +
> + device_lock(mem_dev);
> + is_offline = mem_dev->offline;
> + device_unlock(mem_dev);
> +
> + if (!is_offline) {
> + struct device *dev = (struct device *)arg;
> + unsigned long spfn = section_nr_to_pfn(mem->start_section_nr);
> + unsigned long epfn = section_nr_to_pfn(mem->end_section_nr);
> + phys_addr_t spa = spfn << PAGE_SHIFT;
> + phys_addr_t epa = epfn << PAGE_SHIFT;
> +
> + dev_warn(dev, "memory block [%pa-%pa] is not offline\n",
> +  , );
> +
> + return -EBUSY;
> + }
> +
> + return 0;
> +}
> +
> +static int dev_dax_kmem_remove(struct device *dev)
> +{
> + struct dev_dax *dev_dax = to_dev_dax(dev);
> + struct resource *res = dev_dax->dax_kmem_res;
> + resource_size_t kmem_start;
> + resource_size_t kmem_size;
> + unsigned long start_pfn;
> + unsigned long end_pfn;
> + int rc;
> +
> + /*
> +  * dax kmem resource does not exist, means memory was never hotplugged.
> +  * So, nothing to do here.
> +  */
> + if (!res)
> + return 0;
> +
> + kmem_start = res->start;
> + kmem_size = resource_size(res);
> + start_pfn = kmem_start >> PAGE_SHIFT;
> + end_pfn = start_pfn + (kmem_size >> PAGE_SHIFT) - 1;
> +
> + /*
> +  * Walk and check that every singe memory_block of dax region is
> +  * offline
> +  */
> + lock_device_hotplug();
> + rc = walk_memory_range(start_pfn, end_pfn, dev, offline_memblock_cb);
> + unlock_device_hotplug();
> +
> + /*
> +  * If admin has not offlined memory beforehand, we cannot hotremove dax.
> +  * Unfortunately, because unbind will still succeed there is no way for
> +  * user to hotremove dax after this.
> +  */
> + if (rc)
> + return rc;

Can't it happen that there is a race between you checking if memory is
offline and an admin onlining memory again? maybe pull the
remove_memory() into the locked region, using __remove_memory() instead.

> +
> + /* Hotremove memory, cannot fail because memory is already offlined */
> + remove_memory(dev_dax->target_node, kmem_start, kmem_size);
> +
> +