Re: [v2 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM
On 25.04.19 14:30, Pavel Tatashin wrote: >> >> Yes, also I think you can let go of the device_lock in >> check_memblocks_offline_cb, lock_device_hotplug() should take care of >> this (see Documentation/core-api/memory-hotplug.rst - "locking internals") >> > Hi David, > > Thank you for your comments. I went through memory-hotplug.rst, and I > still think that device_lock() is needed here. In this particular case > it can be replaced with something like READ_ONCE(), but for simplicity > it is better to have device_lock()/device_unlock() as this is not a > performance critical code. > > I do not see any lock ordering issues with this code, as we are > holding lock_device_hotplug() first that prevents userland from > adding/removing memory during this check. Yes, lock ordering is not an issue, I rather think that the device hotplug lock will guard us in all situations. E.g. remove_memory() also does not use it when checking if all blocks are offline. But you can leave it in if you think it is needed. > > https://soleen.com/source/xref/linux/arch/powerpc/platforms/powernv/memtrace.c?r=98fa15f3#248 > > Here we have a similar code: > lock_device_hotplug(); >online_mem_block(); > device_online() > device_lock(dev); > > Pasha > -- Thanks, David / dhildenb ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [v2 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM
> > Yes, also I think you can let go of the device_lock in > check_memblocks_offline_cb, lock_device_hotplug() should take care of > this (see Documentation/core-api/memory-hotplug.rst - "locking internals") > Hi David, Thank you for your comments. I went through memory-hotplug.rst, and I still think that device_lock() is needed here. In this particular case it can be replaced with something like READ_ONCE(), but for simplicity it is better to have device_lock()/device_unlock() as this is not a performance critical code. I do not see any lock ordering issues with this code, as we are holding lock_device_hotplug() first that prevents userland from adding/removing memory during this check. https://soleen.com/source/xref/linux/arch/powerpc/platforms/powernv/memtrace.c?r=98fa15f3#248 Here we have a similar code: lock_device_hotplug(); online_mem_block(); device_online() device_lock(dev); Pasha ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [v2 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM
On 24.04.19 23:34, Pavel Tatashin wrote: +static int +offline_memblock_cb(struct memory_block *mem, void *arg) >>> >>> Function name suggests that you are actually trying to offline memory >>> here. Maybe check_memblocks_offline_cb(), just like we have in >>> mm/memory_hotplug.c. > > Makes sense, I will rename to check_memblocks_offline_cb() > + lock_device_hotplug(); + rc = walk_memory_range(start_pfn, end_pfn, dev, offline_memblock_cb); + unlock_device_hotplug(); + + /* + * If admin has not offlined memory beforehand, we cannot hotremove dax. + * Unfortunately, because unbind will still succeed there is no way for + * user to hotremove dax after this. + */ + if (rc) + return rc; >>> >>> Can't it happen that there is a race between you checking if memory is >>> offline and an admin onlining memory again? maybe pull the >>> remove_memory() into the locked region, using __remove_memory() instead. >> >> I think the race is ok. The admin gets to keep the pieces of allowing >> racing updates to the state and the kernel will keep the range active >> until the next reboot. > > Thank you for noticing this. I will pull it into locking region. > Because, __remove_memory() has this code: > > 1868 ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), > NULL, > 1869 check_memblock_offlined_cb); > 1870 if (ret) > 1871 BUG(); > Yes, also I think you can let go of the device_lock in check_memblocks_offline_cb, lock_device_hotplug() should take care of this (see Documentation/core-api/memory-hotplug.rst - "locking internals") Cheers! -- Thanks, David / dhildenb ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [v2 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM
> > > +static int > > > +offline_memblock_cb(struct memory_block *mem, void *arg) > > > > Function name suggests that you are actually trying to offline memory > > here. Maybe check_memblocks_offline_cb(), just like we have in > > mm/memory_hotplug.c. Makes sense, I will rename to check_memblocks_offline_cb() > > > + lock_device_hotplug(); > > > + rc = walk_memory_range(start_pfn, end_pfn, dev, > > > offline_memblock_cb); > > > + unlock_device_hotplug(); > > > + > > > + /* > > > + * If admin has not offlined memory beforehand, we cannot hotremove > > > dax. > > > + * Unfortunately, because unbind will still succeed there is no way > > > for > > > + * user to hotremove dax after this. > > > + */ > > > + if (rc) > > > + return rc; > > > > Can't it happen that there is a race between you checking if memory is > > offline and an admin onlining memory again? maybe pull the > > remove_memory() into the locked region, using __remove_memory() instead. > > I think the race is ok. The admin gets to keep the pieces of allowing > racing updates to the state and the kernel will keep the range active > until the next reboot. Thank you for noticing this. I will pull it into locking region. Because, __remove_memory() has this code: 1868 ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL, 1869 check_memblock_offlined_cb); 1870 if (ret) 1871 BUG(); Basically, panic if at least something is not offlined. Pasha ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [v2 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM
On Wed, Apr 24, 2019 at 1:55 PM David Hildenbrand wrote: > > On 21.04.19 03:44, Pavel Tatashin wrote: > > It is now allowed to use persistent memory like a regular RAM, but > > currently there is no way to remove this memory until machine is > > rebooted. > > > > This work expands the functionality to also allows hotremoving > > previously hotplugged persistent memory, and recover the device for use > > for other purposes. > > > > To hotremove persistent memory, the management software must first > > offline all memory blocks of dax region, and than unbind it from > > device-dax/kmem driver. So, operations should look like this: > > > > echo offline > echo offline > /sys/devices/system/memory/memoryN/state > > ... > > echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind > > > > Note: if unbind is done without offlining memory beforehand, it won't be > > possible to do dax0.0 hotremove, and dax's memory is going to be part of > > System RAM until reboot. > > > > Signed-off-by: Pavel Tatashin > > --- > > drivers/dax/dax-private.h | 2 + > > drivers/dax/kmem.c| 91 +-- > > 2 files changed, 89 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h > > index a45612148ca0..999aaf3a29b3 100644 > > --- a/drivers/dax/dax-private.h > > +++ b/drivers/dax/dax-private.h > > @@ -53,6 +53,7 @@ struct dax_region { > > * @pgmap - pgmap for memmap setup / lifetime (driver owned) > > * @ref: pgmap reference count (driver owned) > > * @cmp: @ref final put completion (driver owned) > > + * @dax_mem_res: physical address range of hotadded DAX memory > > */ > > struct dev_dax { > > struct dax_region *region; > > @@ -62,6 +63,7 @@ struct dev_dax { > > struct dev_pagemap pgmap; > > struct percpu_ref ref; > > struct completion cmp; > > + struct resource *dax_kmem_res; > > }; > > > > static inline struct dev_dax *to_dev_dax(struct device *dev) > > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c > > index 4c0131857133..d4896b281036 100644 > > --- a/drivers/dax/kmem.c > > +++ b/drivers/dax/kmem.c > > @@ -71,21 +71,104 @@ int dev_dax_kmem_probe(struct device *dev) > > kfree(new_res); > > return rc; > > } > > + dev_dax->dax_kmem_res = new_res; > > > > return 0; > > } > > > > +#ifdef CONFIG_MEMORY_HOTREMOVE > > +/* > > + * Check that device-dax's memory_blocks are offline. If a memory_block is > > not > > + * offline a warning is printed and an error is returned. dax hotremove can > > + * succeed only when every memory_block is offlined beforehand. > > + */ > > +static int > > +offline_memblock_cb(struct memory_block *mem, void *arg) > > Function name suggests that you are actually trying to offline memory > here. Maybe check_memblocks_offline_cb(), just like we have in > mm/memory_hotplug.c. > > > +{ > > + struct device *mem_dev = >dev; > > + bool is_offline; > > + > > + device_lock(mem_dev); > > + is_offline = mem_dev->offline; > > + device_unlock(mem_dev); > > + > > + if (!is_offline) { > > + struct device *dev = (struct device *)arg; > > + unsigned long spfn = section_nr_to_pfn(mem->start_section_nr); > > + unsigned long epfn = section_nr_to_pfn(mem->end_section_nr); > > + phys_addr_t spa = spfn << PAGE_SHIFT; > > + phys_addr_t epa = epfn << PAGE_SHIFT; > > + > > + dev_warn(dev, "memory block [%pa-%pa] is not offline\n", > > + , ); > > + > > + return -EBUSY; > > + } > > + > > + return 0; > > +} > > + > > +static int dev_dax_kmem_remove(struct device *dev) > > +{ > > + struct dev_dax *dev_dax = to_dev_dax(dev); > > + struct resource *res = dev_dax->dax_kmem_res; > > + resource_size_t kmem_start; > > + resource_size_t kmem_size; > > + unsigned long start_pfn; > > + unsigned long end_pfn; > > + int rc; > > + > > + /* > > + * dax kmem resource does not exist, means memory was never > > hotplugged. > > + * So, nothing to do here. > > + */ > > + if (!res) > > + return 0; > > + > > + kmem_start = res->start; > > + kmem_size = resource_size(res); > > + start_pfn = kmem_start >> PAGE_SHIFT; > > + end_pfn = start_pfn + (kmem_size >> PAGE_SHIFT) - 1; > > + > > + /* > > + * Walk and check that every singe memory_block of dax region is > > + * offline > > + */ > > + lock_device_hotplug(); > > + rc = walk_memory_range(start_pfn, end_pfn, dev, offline_memblock_cb); > > + unlock_device_hotplug(); > > + > > + /* > > + * If admin has not offlined memory beforehand, we cannot hotremove > > dax. > > + * Unfortunately, because unbind will still succeed there is no way > > for > > + * user to hotremove dax after this. > > + */ > > + if (rc) > > + return rc; > > Can't it happen that there is
Re: [v2 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM
On 21.04.19 03:44, Pavel Tatashin wrote: > It is now allowed to use persistent memory like a regular RAM, but > currently there is no way to remove this memory until machine is > rebooted. > > This work expands the functionality to also allows hotremoving > previously hotplugged persistent memory, and recover the device for use > for other purposes. > > To hotremove persistent memory, the management software must first > offline all memory blocks of dax region, and than unbind it from > device-dax/kmem driver. So, operations should look like this: > > echo offline > echo offline > /sys/devices/system/memory/memoryN/state > ... > echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind > > Note: if unbind is done without offlining memory beforehand, it won't be > possible to do dax0.0 hotremove, and dax's memory is going to be part of > System RAM until reboot. > > Signed-off-by: Pavel Tatashin > --- > drivers/dax/dax-private.h | 2 + > drivers/dax/kmem.c| 91 +-- > 2 files changed, 89 insertions(+), 4 deletions(-) > > diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h > index a45612148ca0..999aaf3a29b3 100644 > --- a/drivers/dax/dax-private.h > +++ b/drivers/dax/dax-private.h > @@ -53,6 +53,7 @@ struct dax_region { > * @pgmap - pgmap for memmap setup / lifetime (driver owned) > * @ref: pgmap reference count (driver owned) > * @cmp: @ref final put completion (driver owned) > + * @dax_mem_res: physical address range of hotadded DAX memory > */ > struct dev_dax { > struct dax_region *region; > @@ -62,6 +63,7 @@ struct dev_dax { > struct dev_pagemap pgmap; > struct percpu_ref ref; > struct completion cmp; > + struct resource *dax_kmem_res; > }; > > static inline struct dev_dax *to_dev_dax(struct device *dev) > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c > index 4c0131857133..d4896b281036 100644 > --- a/drivers/dax/kmem.c > +++ b/drivers/dax/kmem.c > @@ -71,21 +71,104 @@ int dev_dax_kmem_probe(struct device *dev) > kfree(new_res); > return rc; > } > + dev_dax->dax_kmem_res = new_res; > > return 0; > } > > +#ifdef CONFIG_MEMORY_HOTREMOVE > +/* > + * Check that device-dax's memory_blocks are offline. If a memory_block is > not > + * offline a warning is printed and an error is returned. dax hotremove can > + * succeed only when every memory_block is offlined beforehand. > + */ > +static int > +offline_memblock_cb(struct memory_block *mem, void *arg) Function name suggests that you are actually trying to offline memory here. Maybe check_memblocks_offline_cb(), just like we have in mm/memory_hotplug.c. > +{ > + struct device *mem_dev = >dev; > + bool is_offline; > + > + device_lock(mem_dev); > + is_offline = mem_dev->offline; > + device_unlock(mem_dev); > + > + if (!is_offline) { > + struct device *dev = (struct device *)arg; > + unsigned long spfn = section_nr_to_pfn(mem->start_section_nr); > + unsigned long epfn = section_nr_to_pfn(mem->end_section_nr); > + phys_addr_t spa = spfn << PAGE_SHIFT; > + phys_addr_t epa = epfn << PAGE_SHIFT; > + > + dev_warn(dev, "memory block [%pa-%pa] is not offline\n", > + , ); > + > + return -EBUSY; > + } > + > + return 0; > +} > + > +static int dev_dax_kmem_remove(struct device *dev) > +{ > + struct dev_dax *dev_dax = to_dev_dax(dev); > + struct resource *res = dev_dax->dax_kmem_res; > + resource_size_t kmem_start; > + resource_size_t kmem_size; > + unsigned long start_pfn; > + unsigned long end_pfn; > + int rc; > + > + /* > + * dax kmem resource does not exist, means memory was never hotplugged. > + * So, nothing to do here. > + */ > + if (!res) > + return 0; > + > + kmem_start = res->start; > + kmem_size = resource_size(res); > + start_pfn = kmem_start >> PAGE_SHIFT; > + end_pfn = start_pfn + (kmem_size >> PAGE_SHIFT) - 1; > + > + /* > + * Walk and check that every singe memory_block of dax region is > + * offline > + */ > + lock_device_hotplug(); > + rc = walk_memory_range(start_pfn, end_pfn, dev, offline_memblock_cb); > + unlock_device_hotplug(); > + > + /* > + * If admin has not offlined memory beforehand, we cannot hotremove dax. > + * Unfortunately, because unbind will still succeed there is no way for > + * user to hotremove dax after this. > + */ > + if (rc) > + return rc; Can't it happen that there is a race between you checking if memory is offline and an admin onlining memory again? maybe pull the remove_memory() into the locked region, using __remove_memory() instead. > + > + /* Hotremove memory, cannot fail because memory is already offlined */ > + remove_memory(dev_dax->target_node, kmem_start, kmem_size); > + > +