Re: [PATCH v2] mm: Fix modifying of page protection by insert_pfn_pmd()
On Wed, Apr 24, 2019 at 6:37 PM Aneesh Kumar K.V wrote:
>
> On 4/24/19 11:43 PM, Dan Williams wrote:
> > On Wed, Apr 24, 2019 at 10:38 AM Matthew Wilcox wrote:
> >>
> >> On Wed, Apr 24, 2019 at 10:13:15AM -0700, Dan Williams wrote:
> >>> I think unaligned addresses have always been passed to
> >>> vmf_insert_pfn_pmd(), but nothing cared until this patch. I *think*
> >>> the only change needed is the following, thoughts?
> >>>
> >>> diff --git a/fs/dax.c b/fs/dax.c
> >>> index ca0671d55aa6..82aee9a87efa 100644
> >>> --- a/fs/dax.c
> >>> +++ b/fs/dax.c
> >>> @@ -1560,7 +1560,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
> >>>  		}
> >>>
> >>>  		trace_dax_pmd_insert_mapping(inode, vmf, PMD_SIZE, pfn, entry);
> >>> -		result = vmf_insert_pfn_pmd(vma, vmf->address, vmf->pmd, pfn,
> >>> +		result = vmf_insert_pfn_pmd(vma, pmd_addr, vmf->pmd, pfn,
> >>>  					write);
> >>
> >> We also call vmf_insert_pfn_pmd() in dax_insert_pfn_mkwrite() -- does
> >> that need to change too?
> >
> > It wasn't clear to me that it was a problem. I think that one already
> > happens to be pmd-aligned.
>
> How about vmf_insert_pfn_pud()?

That is currently not used by fsdax, only devdax, but it does seem that
it passes the unaligned fault address rather than the pud aligned
address. I'll add that to the proposed fix.

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH] arm64: configurable sparsemem section size
On 04/25/2019 01:18 AM, Pavel Tatashin wrote: > On Wed, Apr 24, 2019 at 5:07 AM Anshuman Khandual > wrote: >> >> On 04/24/2019 02:08 AM, Pavel Tatashin wrote: >>> sparsemem section size determines the maximum size and alignment that >>> is allowed to offline/online memory block. The bigger the size the less >>> the clutter in /sys/devices/system/memory/*. On the other hand, however, >>> there is less flexability in what granules of memory can be added and >>> removed. >> >> Is there any scenario where less than a 1GB needs to be added on arm64 ? > > Yes, DAX hotplug loses 1G of memory without allowing smaller sections. > Machines on which we are going to be using this functionality have 8G > of System RAM, therefore losing 1G is a big problem. > > For details about using scenario see this cover letter: > https://lore.kernel.org/lkml/20190421014429.31206-1-pasha.tatas...@soleen.com/ Its loosing 1GB because devdax has 2M alignment ? IIRC from Dan's subsection memory hot add series 2M comes from persistent memory HW controller's limitations. Does that limitation applicable across all platforms including arm64 for all possible persistent memory vendors. I mean is it universal ? IIUC subsection memory hot plug series is still getting reviewed. Hence should not we wait for it to get merged before enabling applicable platforms to accommodate these 2M limitations. > >> >>> >>> Recently, it was enabled in Linux to hotadd persistent memory that >>> can be either real NV device, or reserved from regular System RAM >>> and has identity of devdax. >> >> devdax (even ZONE_DEVICE) support has not been enabled on arm64 yet. 
> > Correct, I use your patches to enable ZONE_DEVICE, and thus devdax on ARM64: > https://lore.kernel.org/lkml/1554265806-11501-1-git-send-email-anshuman.khand...@arm.com/ > >> >>> >>> The problem is that because ARM64's section size is 1G, and devdax must >>> have 2M label section, the first 1G is always missed when device is >>> attached, because it is not 1G aligned. >> >> devdax has to be 2M aligned ? Does Linux enforce that right now ? > > Unfortunately, there is no way around this. Part of the memory can be > reserved as persistent memory via device tree. > memory@4000 { > device_type = "memory"; > reg = < 0x 0x4000 > 0x0002 0x >; > }; > > pmem@1c000 { > compatible = "pmem-region"; > reg = <0x0001 0xc000 >0x 0x8000>; > volatile; > numa-node-id = <0>; > }; > > So, while pmem is section aligned, as it should be, the dax device is > going to be pmem start address + label size, which is 2M. The actual Forgive my ignorance here but why dax device label size is 2M aligned. Again is that because of some persistent memory HW controller limitations ? > DAX device starts at: > 0x1c000 + 2M. > > Because section size is 1G, the hotplug will able to add only memory > starting from > 0x1c000 + 1G Got it but as mentioned before we will have to make sure that 2M alignment requirement is universal else we will be adjusting this multiple times. > >> 27 and 28 do not even compile for ARM64_64_PAGES because of MAX_ORDER and >> SECTION_SIZE mismatch. Even with 27 bits its 128 MB section size. How does it solve the problem with 2M ? The patch just wanted to reduce the memory wastage ? > > Can you please elaborate what configs are you using? I have no > problems compiling with 27 and 28 bit. After applying your patch [1] on current mainline kernel [2]. 
$make defconfig CONFIG_ARM64_64K_PAGES=y CONFIG_ARM64_VA_BITS_48=y CONFIG_ARM64_VA_BITS=48 CONFIG_ARM64_PA_BITS_48=y CONFIG_ARM64_PA_BITS=48 CONFIG_ARM64_SECTION_SIZE_BITS=27 [1] https://patchwork.kernel.org/patch/10913737/ [2] git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git It fails with CC arch/arm64/kernel/asm-offsets.s In file included from ./include/linux/gfp.h:6, from ./include/linux/slab.h:15, from ./include/linux/resource_ext.h:19, from ./include/linux/acpi.h:26, from ./include/acpi/apei.h:9, from ./include/acpi/ghes.h:5, from ./include/linux/arm_sdei.h:14, from arch/arm64/kernel/asm-offsets.c:21: ./include/linux/mmzone.h:1095:2: error: #error Allocator MAX_ORDER exceeds SECTION_SIZE #error Allocator MAX_ORDER exceeds SECTION_SIZE ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH v2] mm: Fix modifying of page protection by insert_pfn_pmd()
On 4/24/19 11:43 PM, Dan Williams wrote:
> On Wed, Apr 24, 2019 at 10:38 AM Matthew Wilcox wrote:
>>
>> On Wed, Apr 24, 2019 at 10:13:15AM -0700, Dan Williams wrote:
>>> I think unaligned addresses have always been passed to
>>> vmf_insert_pfn_pmd(), but nothing cared until this patch. I *think*
>>> the only change needed is the following, thoughts?
>>>
>>> diff --git a/fs/dax.c b/fs/dax.c
>>> index ca0671d55aa6..82aee9a87efa 100644
>>> --- a/fs/dax.c
>>> +++ b/fs/dax.c
>>> @@ -1560,7 +1560,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
>>>  		}
>>>
>>>  		trace_dax_pmd_insert_mapping(inode, vmf, PMD_SIZE, pfn, entry);
>>> -		result = vmf_insert_pfn_pmd(vma, vmf->address, vmf->pmd, pfn,
>>> +		result = vmf_insert_pfn_pmd(vma, pmd_addr, vmf->pmd, pfn,
>>>  					write);
>>
>> We also call vmf_insert_pfn_pmd() in dax_insert_pfn_mkwrite() -- does
>> that need to change too?
>
> It wasn't clear to me that it was a problem. I think that one already
> happens to be pmd-aligned.

How about vmf_insert_pfn_pud()?

-aneesh
Re: [v2 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM
> > > +static int
> > > +offline_memblock_cb(struct memory_block *mem, void *arg)
> >
> > Function name suggests that you are actually trying to offline memory
> > here. Maybe check_memblocks_offline_cb(), just like we have in
> > mm/memory_hotplug.c.

Makes sense, I will rename to check_memblocks_offline_cb()

> > > +	lock_device_hotplug();
> > > +	rc = walk_memory_range(start_pfn, end_pfn, dev, offline_memblock_cb);
> > > +	unlock_device_hotplug();
> > > +
> > > +	/*
> > > +	 * If admin has not offlined memory beforehand, we cannot hotremove dax.
> > > +	 * Unfortunately, because unbind will still succeed there is no way for
> > > +	 * user to hotremove dax after this.
> > > +	 */
> > > +	if (rc)
> > > +		return rc;
> >
> > Can't it happen that there is a race between you checking if memory is
> > offline and an admin onlining memory again? maybe pull the
> > remove_memory() into the locked region, using __remove_memory() instead.
>
> I think the race is ok. The admin gets to keep the pieces of allowing
> racing updates to the state and the kernel will keep the range active
> until the next reboot.

Thank you for noticing this. I will pull it into the locked region.
Because, __remove_memory() has this code:

1868		ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
1869					check_memblock_offlined_cb);
1870		if (ret)
1871			BUG();

Basically, panic if at least something is not offlined.

Pasha
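Pulling the removal under the lock, as agreed above, could look roughly like the following. This is a sketch against the v2 patch, not a tested implementation — the function names follow the patch and mm/memory_hotplug.c of that era:

```c
	/*
	 * Hold device_hotplug_lock across both the offline check and the
	 * removal, so an admin cannot re-online a memory block in between.
	 */
	lock_device_hotplug();
	rc = walk_memory_range(start_pfn, end_pfn, dev,
			       check_memblocks_offline_cb);
	if (!rc)
		/* Lock already held, so use the __ variant. */
		__remove_memory(dev_dax->target_node, kmem_start, kmem_size);
	unlock_device_hotplug();

	if (rc)
		return rc;
```

This also sidesteps the BUG() inside __remove_memory(): the offline check and the removal now see the same state.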
Re: [v2 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM
On Wed, Apr 24, 2019 at 1:55 PM David Hildenbrand wrote: > > On 21.04.19 03:44, Pavel Tatashin wrote: > > It is now allowed to use persistent memory like a regular RAM, but > > currently there is no way to remove this memory until machine is > > rebooted. > > > > This work expands the functionality to also allows hotremoving > > previously hotplugged persistent memory, and recover the device for use > > for other purposes. > > > > To hotremove persistent memory, the management software must first > > offline all memory blocks of dax region, and than unbind it from > > device-dax/kmem driver. So, operations should look like this: > > > > echo offline > echo offline > /sys/devices/system/memory/memoryN/state > > ... > > echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind > > > > Note: if unbind is done without offlining memory beforehand, it won't be > > possible to do dax0.0 hotremove, and dax's memory is going to be part of > > System RAM until reboot. > > > > Signed-off-by: Pavel Tatashin > > --- > > drivers/dax/dax-private.h | 2 + > > drivers/dax/kmem.c| 91 +-- > > 2 files changed, 89 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h > > index a45612148ca0..999aaf3a29b3 100644 > > --- a/drivers/dax/dax-private.h > > +++ b/drivers/dax/dax-private.h > > @@ -53,6 +53,7 @@ struct dax_region { > > * @pgmap - pgmap for memmap setup / lifetime (driver owned) > > * @ref: pgmap reference count (driver owned) > > * @cmp: @ref final put completion (driver owned) > > + * @dax_mem_res: physical address range of hotadded DAX memory > > */ > > struct dev_dax { > > struct dax_region *region; > > @@ -62,6 +63,7 @@ struct dev_dax { > > struct dev_pagemap pgmap; > > struct percpu_ref ref; > > struct completion cmp; > > + struct resource *dax_kmem_res; > > }; > > > > static inline struct dev_dax *to_dev_dax(struct device *dev) > > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c > > index 4c0131857133..d4896b281036 100644 > > --- 
a/drivers/dax/kmem.c > > +++ b/drivers/dax/kmem.c > > @@ -71,21 +71,104 @@ int dev_dax_kmem_probe(struct device *dev) > > kfree(new_res); > > return rc; > > } > > + dev_dax->dax_kmem_res = new_res; > > > > return 0; > > } > > > > +#ifdef CONFIG_MEMORY_HOTREMOVE > > +/* > > + * Check that device-dax's memory_blocks are offline. If a memory_block is > > not > > + * offline a warning is printed and an error is returned. dax hotremove can > > + * succeed only when every memory_block is offlined beforehand. > > + */ > > +static int > > +offline_memblock_cb(struct memory_block *mem, void *arg) > > Function name suggests that you are actually trying to offline memory > here. Maybe check_memblocks_offline_cb(), just like we have in > mm/memory_hotplug.c. > > > +{ > > + struct device *mem_dev = >dev; > > + bool is_offline; > > + > > + device_lock(mem_dev); > > + is_offline = mem_dev->offline; > > + device_unlock(mem_dev); > > + > > + if (!is_offline) { > > + struct device *dev = (struct device *)arg; > > + unsigned long spfn = section_nr_to_pfn(mem->start_section_nr); > > + unsigned long epfn = section_nr_to_pfn(mem->end_section_nr); > > + phys_addr_t spa = spfn << PAGE_SHIFT; > > + phys_addr_t epa = epfn << PAGE_SHIFT; > > + > > + dev_warn(dev, "memory block [%pa-%pa] is not offline\n", > > + , ); > > + > > + return -EBUSY; > > + } > > + > > + return 0; > > +} > > + > > +static int dev_dax_kmem_remove(struct device *dev) > > +{ > > + struct dev_dax *dev_dax = to_dev_dax(dev); > > + struct resource *res = dev_dax->dax_kmem_res; > > + resource_size_t kmem_start; > > + resource_size_t kmem_size; > > + unsigned long start_pfn; > > + unsigned long end_pfn; > > + int rc; > > + > > + /* > > + * dax kmem resource does not exist, means memory was never > > hotplugged. > > + * So, nothing to do here. 
> > + */ > > + if (!res) > > + return 0; > > + > > + kmem_start = res->start; > > + kmem_size = resource_size(res); > > + start_pfn = kmem_start >> PAGE_SHIFT; > > + end_pfn = start_pfn + (kmem_size >> PAGE_SHIFT) - 1; > > + > > + /* > > + * Walk and check that every singe memory_block of dax region is > > + * offline > > + */ > > + lock_device_hotplug(); > > + rc = walk_memory_range(start_pfn, end_pfn, dev, offline_memblock_cb); > > + unlock_device_hotplug(); > > + > > + /* > > + * If admin has not offlined memory beforehand, we cannot hotremove > > dax. > > + * Unfortunately, because unbind will still succeed there is no way > > for > > + * user to hotremove dax after this. > > + */ > > + if (rc) > > + return rc; > > Can't it happen that there is
a race between you checking if memory is offline and an admin onlining memory again? maybe pull the remove_memory() into the locked region, using __remove_memory() instead.
Re: [v2 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM
On 21.04.19 03:44, Pavel Tatashin wrote: > It is now allowed to use persistent memory like a regular RAM, but > currently there is no way to remove this memory until machine is > rebooted. > > This work expands the functionality to also allows hotremoving > previously hotplugged persistent memory, and recover the device for use > for other purposes. > > To hotremove persistent memory, the management software must first > offline all memory blocks of dax region, and than unbind it from > device-dax/kmem driver. So, operations should look like this: > > echo offline > echo offline > /sys/devices/system/memory/memoryN/state > ... > echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind > > Note: if unbind is done without offlining memory beforehand, it won't be > possible to do dax0.0 hotremove, and dax's memory is going to be part of > System RAM until reboot. > > Signed-off-by: Pavel Tatashin > --- > drivers/dax/dax-private.h | 2 + > drivers/dax/kmem.c| 91 +-- > 2 files changed, 89 insertions(+), 4 deletions(-) > > diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h > index a45612148ca0..999aaf3a29b3 100644 > --- a/drivers/dax/dax-private.h > +++ b/drivers/dax/dax-private.h > @@ -53,6 +53,7 @@ struct dax_region { > * @pgmap - pgmap for memmap setup / lifetime (driver owned) > * @ref: pgmap reference count (driver owned) > * @cmp: @ref final put completion (driver owned) > + * @dax_mem_res: physical address range of hotadded DAX memory > */ > struct dev_dax { > struct dax_region *region; > @@ -62,6 +63,7 @@ struct dev_dax { > struct dev_pagemap pgmap; > struct percpu_ref ref; > struct completion cmp; > + struct resource *dax_kmem_res; > }; > > static inline struct dev_dax *to_dev_dax(struct device *dev) > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c > index 4c0131857133..d4896b281036 100644 > --- a/drivers/dax/kmem.c > +++ b/drivers/dax/kmem.c > @@ -71,21 +71,104 @@ int dev_dax_kmem_probe(struct device *dev) > kfree(new_res); > return rc; > } > + 
dev_dax->dax_kmem_res = new_res; > > return 0; > } > > +#ifdef CONFIG_MEMORY_HOTREMOVE > +/* > + * Check that device-dax's memory_blocks are offline. If a memory_block is > not > + * offline a warning is printed and an error is returned. dax hotremove can > + * succeed only when every memory_block is offlined beforehand. > + */ > +static int > +offline_memblock_cb(struct memory_block *mem, void *arg) Function name suggests that you are actually trying to offline memory here. Maybe check_memblocks_offline_cb(), just like we have in mm/memory_hotplug.c. > +{ > + struct device *mem_dev = >dev; > + bool is_offline; > + > + device_lock(mem_dev); > + is_offline = mem_dev->offline; > + device_unlock(mem_dev); > + > + if (!is_offline) { > + struct device *dev = (struct device *)arg; > + unsigned long spfn = section_nr_to_pfn(mem->start_section_nr); > + unsigned long epfn = section_nr_to_pfn(mem->end_section_nr); > + phys_addr_t spa = spfn << PAGE_SHIFT; > + phys_addr_t epa = epfn << PAGE_SHIFT; > + > + dev_warn(dev, "memory block [%pa-%pa] is not offline\n", > + , ); > + > + return -EBUSY; > + } > + > + return 0; > +} > + > +static int dev_dax_kmem_remove(struct device *dev) > +{ > + struct dev_dax *dev_dax = to_dev_dax(dev); > + struct resource *res = dev_dax->dax_kmem_res; > + resource_size_t kmem_start; > + resource_size_t kmem_size; > + unsigned long start_pfn; > + unsigned long end_pfn; > + int rc; > + > + /* > + * dax kmem resource does not exist, means memory was never hotplugged. > + * So, nothing to do here. 
> + */ > + if (!res) > + return 0; > + > + kmem_start = res->start; > + kmem_size = resource_size(res); > + start_pfn = kmem_start >> PAGE_SHIFT; > + end_pfn = start_pfn + (kmem_size >> PAGE_SHIFT) - 1; > + > + /* > + * Walk and check that every singe memory_block of dax region is > + * offline > + */ > + lock_device_hotplug(); > + rc = walk_memory_range(start_pfn, end_pfn, dev, offline_memblock_cb); > + unlock_device_hotplug(); > + > + /* > + * If admin has not offlined memory beforehand, we cannot hotremove dax. > + * Unfortunately, because unbind will still succeed there is no way for > + * user to hotremove dax after this. > + */ > + if (rc) > + return rc; Can't it happen that there is a race between you checking if memory is offline and an admin onlining memory again? maybe pull the remove_memory() into the locked region, using __remove_memory() instead. > + > + /* Hotremove memory, cannot fail because memory is already offlined */ > + remove_memory(dev_dax->target_node, kmem_start, kmem_size); > + > +
Re: [PATCH v6 00/12] mm: Sub-section memory hotplug support
I am also taking a look at this work now. I will review and test it in the next couple of days. Pasha On Tue, Apr 23, 2019 at 9:17 AM Oscar Salvador wrote: > > On Wed, 2019-04-17 at 15:59 -0700, Dan Williams wrote: > > On Wed, Apr 17, 2019 at 3:04 PM Andrew Morton > org> wrote: > > > > > > On Wed, 17 Apr 2019 11:38:55 -0700 Dan Williams > > el.com> wrote: > > > > > > > The memory hotplug section is an arbitrary / convenient unit for > > > > memory > > > > hotplug. 'Section-size' units have bled into the user interface > > > > ('memblock' sysfs) and can not be changed without breaking > > > > existing > > > > userspace. The section-size constraint, while mostly benign for > > > > typical > > > > memory hotplug, has and continues to wreak havoc with 'device- > > > > memory' > > > > use cases, persistent memory (pmem) in particular. Recall that > > > > pmem uses > > > > devm_memremap_pages(), and subsequently arch_add_memory(), to > > > > allocate a > > > > 'struct page' memmap for pmem. However, it does not use the > > > > 'bottom > > > > half' of memory hotplug, i.e. never marks pmem pages online and > > > > never > > > > exposes the userspace memblock interface for pmem. This leaves an > > > > opening to redress the section-size constraint. > > > > > > v6 and we're not showing any review activity. Who would be > > > suitable > > > people to help out here? > > > > There was quite a bit of review of the cover letter from Michal and > > David, but you're right the details not so much as of yet. I'd like > > to > > call out other people where I can reciprocate with some review of my > > own. Oscar's altmap work looks like a good candidate for that. > > Thanks Dan for ccing me. > I will take a look at the patches soon. > > -- > Oscar Salvador > SUSE L3 ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH] arm64: configurable sparsemem section size
> This is yet another example of where we need to break down the section
> alignment requirement for arch_add_memory().
>
> https://lore.kernel.org/lkml/12633539.2015392.2477781120122237934.st...@dwillia2-desk3.amr.corp.intel.com/

Hi Dan,

Yes, that is exactly what I am trying to solve with this patch. I will
test if your series works with ARM64, and if it does not I will let you
know what is broken. But, I think, this patch is not needed if your
patches are accepted into mainline.

Thank you,
Pasha
Re: [PATCH] arm64: configurable sparsemem section size
On Wed, Apr 24, 2019 at 12:54 PM Pavel Tatashin wrote: > > from original email > > On Wed, Apr 24, 2019 at 3:48 PM Pavel Tatashin > wrote: > > > > On Wed, Apr 24, 2019 at 5:07 AM Anshuman Khandual > > wrote: > > > > > > On 04/24/2019 02:08 AM, Pavel Tatashin wrote: > > > > sparsemem section size determines the maximum size and alignment that > > > > is allowed to offline/online memory block. The bigger the size the less > > > > the clutter in /sys/devices/system/memory/*. On the other hand, however, > > > > there is less flexability in what granules of memory can be added and > > > > removed. > > > > > > Is there any scenario where less than a 1GB needs to be added on arm64 ? > > > > Yes, DAX hotplug loses 1G of memory without allowing smaller sections. > > Machines on which we are going to be using this functionality have 8G > > of System RAM, therefore losing 1G is a big problem. > > > > For details about using scenario see this cover letter: > > https://lore.kernel.org/lkml/20190421014429.31206-1-pasha.tatas...@soleen.com/ > > > > > > > > > > > > > Recently, it was enabled in Linux to hotadd persistent memory that > > > > can be either real NV device, or reserved from regular System RAM > > > > and has identity of devdax. > > > > > > devdax (even ZONE_DEVICE) support has not been enabled on arm64 yet. > > > > Correct, I use your patches to enable ZONE_DEVICE, and thus devdax on > > ARM64: > > https://lore.kernel.org/lkml/1554265806-11501-1-git-send-email-anshuman.khand...@arm.com/ > > > > > > > > > > > > > The problem is that because ARM64's section size is 1G, and devdax must > > > > have 2M label section, the first 1G is always missed when device is > > > > attached, because it is not 1G aligned. > > > > > > devdax has to be 2M aligned ? Does Linux enforce that right now ? > > > > Unfortunately, there is no way around this. Part of the memory can be > > reserved as persistent memory via device tree. 
> > memory@4000 { > > device_type = "memory"; > > reg = < 0x 0x4000 > > 0x0002 0x >; > > }; > > > > pmem@1c000 { > > compatible = "pmem-region"; > > reg = <0x0001 0xc000 > >0x 0x8000>; > > volatile; > > numa-node-id = <0>; > > }; > > > > So, while pmem is section aligned, as it should be, the dax device is > > going to be pmem start address + label size, which is 2M. The actual > > DAX device starts at: > > 0x1c000 + 2M. > > > > Because section size is 1G, the hotplug will able to add only memory > > starting from > > 0x1c000 + 1G This is yet another example of where we need to break down the section alignment requirement for arch_add_memory(). https://lore.kernel.org/lkml/12633539.2015392.2477781120122237934.st...@dwillia2-desk3.amr.corp.intel.com/ ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH] arm64: configurable sparsemem section size
from original email On Wed, Apr 24, 2019 at 3:48 PM Pavel Tatashin wrote: > > On Wed, Apr 24, 2019 at 5:07 AM Anshuman Khandual > wrote: > > > > On 04/24/2019 02:08 AM, Pavel Tatashin wrote: > > > sparsemem section size determines the maximum size and alignment that > > > is allowed to offline/online memory block. The bigger the size the less > > > the clutter in /sys/devices/system/memory/*. On the other hand, however, > > > there is less flexability in what granules of memory can be added and > > > removed. > > > > Is there any scenario where less than a 1GB needs to be added on arm64 ? > > Yes, DAX hotplug loses 1G of memory without allowing smaller sections. > Machines on which we are going to be using this functionality have 8G > of System RAM, therefore losing 1G is a big problem. > > For details about using scenario see this cover letter: > https://lore.kernel.org/lkml/20190421014429.31206-1-pasha.tatas...@soleen.com/ > > > > > > > > > Recently, it was enabled in Linux to hotadd persistent memory that > > > can be either real NV device, or reserved from regular System RAM > > > and has identity of devdax. > > > > devdax (even ZONE_DEVICE) support has not been enabled on arm64 yet. > > Correct, I use your patches to enable ZONE_DEVICE, and thus devdax on ARM64: > https://lore.kernel.org/lkml/1554265806-11501-1-git-send-email-anshuman.khand...@arm.com/ > > > > > > > > > The problem is that because ARM64's section size is 1G, and devdax must > > > have 2M label section, the first 1G is always missed when device is > > > attached, because it is not 1G aligned. > > > > devdax has to be 2M aligned ? Does Linux enforce that right now ? > > Unfortunately, there is no way around this. Part of the memory can be > reserved as persistent memory via device tree. 
> memory@4000 { > device_type = "memory"; > reg = < 0x 0x4000 > 0x0002 0x >; > }; > > pmem@1c000 { > compatible = "pmem-region"; > reg = <0x0001 0xc000 >0x 0x8000>; > volatile; > numa-node-id = <0>; > }; > > So, while pmem is section aligned, as it should be, the dax device is > going to be pmem start address + label size, which is 2M. The actual > DAX device starts at: > 0x1c000 + 2M. > > Because section size is 1G, the hotplug will able to add only memory > starting from > 0x1c000 + 1G > > > 27 and 28 do not even compile for ARM64_64_PAGES because of MAX_ORDER and > > SECTION_SIZE mismatch. > > Can you please elaborate what configs are you using? I have no > problems compiling with 27 and 28 bit. > > Thank you, > Pasha ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH] arm64: configurable sparsemem section size
On Wed, Apr 24, 2019 at 5:07 AM Anshuman Khandual wrote: > > On 04/24/2019 02:08 AM, Pavel Tatashin wrote: > > sparsemem section size determines the maximum size and alignment that > > is allowed to offline/online memory block. The bigger the size the less > > the clutter in /sys/devices/system/memory/*. On the other hand, however, > > there is less flexability in what granules of memory can be added and > > removed. > > Is there any scenario where less than a 1GB needs to be added on arm64 ? Yes, DAX hotplug loses 1G of memory without allowing smaller sections. Machines on which we are going to be using this functionality have 8G of System RAM, therefore losing 1G is a big problem. For details about using scenario see this cover letter: https://lore.kernel.org/lkml/20190421014429.31206-1-pasha.tatas...@soleen.com/ > > > > > Recently, it was enabled in Linux to hotadd persistent memory that > > can be either real NV device, or reserved from regular System RAM > > and has identity of devdax. > > devdax (even ZONE_DEVICE) support has not been enabled on arm64 yet. Correct, I use your patches to enable ZONE_DEVICE, and thus devdax on ARM64: https://lore.kernel.org/lkml/1554265806-11501-1-git-send-email-anshuman.khand...@arm.com/ > > > > > The problem is that because ARM64's section size is 1G, and devdax must > > have 2M label section, the first 1G is always missed when device is > > attached, because it is not 1G aligned. > > devdax has to be 2M aligned ? Does Linux enforce that right now ? Unfortunately, there is no way around this. Part of the memory can be reserved as persistent memory via device tree. memory@4000 { device_type = "memory"; reg = < 0x 0x4000 0x0002 0x >; }; pmem@1c000 { compatible = "pmem-region"; reg = <0x0001 0xc000 0x 0x8000>; volatile; numa-node-id = <0>; }; So, while pmem is section aligned, as it should be, the dax device is going to be pmem start address + label size, which is 2M. The actual DAX device starts at: 0x1c000 + 2M. 
Because section size is 1G, the hotplug will be able to add only memory
starting from 0x1c000 + 1G.

> 27 and 28 do not even compile for ARM64_64_PAGES because of MAX_ORDER and
> SECTION_SIZE mismatch.

Can you please elaborate what configs are you using? I have no
problems compiling with 27 and 28 bit.

Thank you,
Pasha
Re: [PATCH v1 11/17] kunit: test: add test managed resource tests
On Thu, Apr 04, 2019 at 03:06:46PM -0700, Brendan Higgins wrote: > From: Avinash Kondareddy > > Tests how tests interact with test managed resources in their lifetime. > > Signed-off-by: Avinash Kondareddy > Signed-off-by: Brendan Higgins > --- > kunit/test-test.c | 122 ++ > 1 file changed, 122 insertions(+) > > diff --git a/kunit/test-test.c b/kunit/test-test.c > index 4bd7a34d0a6cb..54add8ca418a0 100644 > --- a/kunit/test-test.c > +++ b/kunit/test-test.c > @@ -135,3 +135,125 @@ static struct kunit_module kunit_try_catch_test_module > = { > .test_cases = kunit_try_catch_test_cases, > }; > module_test(kunit_try_catch_test_module); > + > +/* > + * Context for testing test managed resources > + * is_resource_initialized is used to test arbitrary resources > + */ > +struct kunit_test_resource_context { > + struct kunit test; > + bool is_resource_initialized; > +}; > + > +static int fake_resource_init(struct kunit_resource *res, void *context) > +{ > + struct kunit_test_resource_context *ctx = context; > + > + res->allocation = >is_resource_initialized; > + ctx->is_resource_initialized = true; > + return 0; > +} > + > +static void fake_resource_free(struct kunit_resource *res) > +{ > + bool *is_resource_initialized = res->allocation; > + > + *is_resource_initialized = false; > +} > + > +static void kunit_resource_test_init_resources(struct kunit *test) > +{ > + struct kunit_test_resource_context *ctx = test->priv; > + > + kunit_init_test(>test, "testing_test_init_test"); > + > + KUNIT_EXPECT_TRUE(test, list_empty(>test.resources)); > +} > + > +static void kunit_resource_test_alloc_resource(struct kunit *test) > +{ > + struct kunit_test_resource_context *ctx = test->priv; > + struct kunit_resource *res; > + kunit_resource_free_t free = fake_resource_free; > + > + res = kunit_alloc_resource(>test, > +fake_resource_init, > +fake_resource_free, > +ctx); > + > + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, res); KUNIT_EXPECT_NOT_ERR_OR_NULL(test, res); Thanks! 
Masa
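For context, the distinction the review comment turns on: KUNIT_ASSERT_* terminates the test case immediately on failure, while KUNIT_EXPECT_* records the failure and keeps going. A sketch in the style of the patchset's API (names follow the RFC under review, not a released kernel):

```c
static void example_resource_test(struct kunit *test)
{
	struct kunit_test_resource_context *ctx = test->priv;
	struct kunit_resource *res;

	res = kunit_alloc_resource(&ctx->test,
				   fake_resource_init,
				   fake_resource_free,
				   ctx);

	/* EXPECT: on failure, log it and continue running the test case,
	 * so later checks in the same case still get exercised. */
	KUNIT_EXPECT_NOT_ERR_OR_NULL(test, res);

	/* ASSERT: on failure, bail out of the test case right here, so
	 * the dereference below can never run on a NULL/ERR pointer. */
	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, res);
	KUNIT_EXPECT_TRUE(test, res->allocation != NULL);
}
```

The review suggestion is to use the EXPECT form where the test can safely continue after the check, reserving ASSERT for checks that guard subsequent dereferences.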
Re: [PATCH v1 14/17] Documentation: kunit: add documentation for KUnit
Hi Brendan, KUNIT_ASSERT_NOT_ERR_OR_NULL() should be replaced to KUNIT_EXPECT_NOT_ERR_OR_NULL(), right? On Thu, Apr 04, 2019 at 03:06:49PM -0700, Brendan Higgins wrote: > Add documentation for KUnit, the Linux kernel unit testing framework. > - Add intro and usage guide for KUnit > - Add API reference > > Signed-off-by: Felix Guo > Signed-off-by: Brendan Higgins > --- > diff --git a/Documentation/kunit/start.rst b/Documentation/kunit/start.rst > new file mode 100644 > index 0..5cdba5091905e > --- /dev/null > +++ b/Documentation/kunit/start.rst > +Assertions > +~~ > + > +KUnit also has the concept of an *assertion*. An assertion is just like an > +expectation except the assertion immediately terminates the test case if it > is > +not satisfied. > + > +For example: > + > +.. code-block:: c > + > + static void mock_test_do_expect_default_return(struct kunit *test) > + { > + struct mock_test_context *ctx = test->priv; > + struct mock *mock = ctx->mock; > + int param0 = 5, param1 = -5; > + const char *two_param_types[] = {"int", "int"}; > + const void *two_params[] = {, }; > + const void *ret; > + > + ret = mock->do_expect(mock, > + "test_printk", test_printk, > + two_param_types, two_params, > + ARRAY_SIZE(two_params)); > + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ret); KUNIT_EXPECT_NOT_ERR_OR_NULL(test, ret); > + KUNIT_EXPECT_EQ(test, -4, *((int *) ret)); > + } > + > +In this example, the method under test should return a pointer to a value, so > +if the pointer returned by the method is null or an errno, we don't want to > +bother continuing the test since the following expectation could crash the > test > +case. `ASSERT_NOT_ERR_OR_NULL(...)` allows us to bail out of the test case if +case. `KUNIT_EXPECT_NOT_ERR_OR_NULL(...)` allows us to bail out of the test case if Thanks! Masa ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH v2] mm: Fix modifying of page protection by insert_pfn_pmd()
On Wed, Apr 24, 2019 at 10:38 AM Matthew Wilcox wrote:
>
> On Wed, Apr 24, 2019 at 10:13:15AM -0700, Dan Williams wrote:
> > I think unaligned addresses have always been passed to
> > vmf_insert_pfn_pmd(), but nothing cared until this patch. I *think*
> > the only change needed is the following, thoughts?
> >
> > diff --git a/fs/dax.c b/fs/dax.c
> > index ca0671d55aa6..82aee9a87efa 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -1560,7 +1560,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
> >  		}
> >
> >  		trace_dax_pmd_insert_mapping(inode, vmf, PMD_SIZE, pfn, entry);
> > -		result = vmf_insert_pfn_pmd(vma, vmf->address, vmf->pmd, pfn,
> > +		result = vmf_insert_pfn_pmd(vma, pmd_addr, vmf->pmd, pfn,
> >  					write);
>
> We also call vmf_insert_pfn_pmd() in dax_insert_pfn_mkwrite() -- does
> that need to change too?

It wasn't clear to me that it was a problem. I think that one already
happens to be pmd-aligned.
Re: [PATCH v6 06/12] mm/hotplug: Add mem-hotplug restrictions for remove_memory()
On Tue, Apr 23, 2019 at 2:21 PM David Hildenbrand wrote:
>
> On 17.04.19 20:39, Dan Williams wrote:
> > Teach the arch_remove_memory() path to consult the same 'struct
> > mhp_restrictions' context as was specified at arch_add_memory() time.
> >
> > No functional change, this is a preparation step for teaching
> > __remove_pages() about how and when to allow sub-section hot-remove, and
> > a cleanup for an unnecessary "is_dev_zone()" special case.
>
> I am not yet sure if this is the right thing to do. When adding memory,
> we obviously have to specify the "how". When removing memory, we usually
> should be able to look such stuff up.

True, the implementation can just use find_memory_block(), and there is no
need to plumb this flag.

> >  void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
> > -		unsigned long nr_pages, struct vmem_altmap *altmap)
> > +		unsigned long nr_pages, struct mhp_restrictions *restrictions)
> >  {
> >  	unsigned long i;
> > -	unsigned long map_offset = 0;
> >  	int sections_to_remove;
> > +	unsigned long map_offset = 0;
> > +	struct vmem_altmap *altmap = restrictions->altmap;
> >
> > -	/* In the ZONE_DEVICE case device driver owns the memory region */
> > -	if (is_dev_zone(zone)) {
> > -		if (altmap)
> > -			map_offset = vmem_altmap_offset(altmap);
> > -	}
> > +	if (altmap)
> > +		map_offset = vmem_altmap_offset(altmap);
>
> Why weren't we able to use this exact same hunk before? (after my
> resource deletion cleanup of course)
>
> IOW, do we really need struct mhp_restrictions here?

We don't need it. It was only for the memblock info that I added the
"restrictions" argument.

> After I factor out memory device handling into the caller of
> arch_remove_memory(), also the next patch ("mm/sparsemem: Prepare for
> sub-section ranges") should no longer need it. Or am I missing something?

That patch is still needed for the places where it adds the @nr_pages
argument, but the mhp_restrictions related bits can be dropped. The
subsection_check() helper needs to be refactored a bit to not rely on
mhp_restrictions.
Re: [PATCH v2] mm: Fix modifying of page protection by insert_pfn_pmd()
On Wed, Apr 24, 2019 at 10:13:15AM -0700, Dan Williams wrote:
> I think unaligned addresses have always been passed to
> vmf_insert_pfn_pmd(), but nothing cared until this patch. I *think*
> the only change needed is the following, thoughts?
>
> diff --git a/fs/dax.c b/fs/dax.c
> index ca0671d55aa6..82aee9a87efa 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1560,7 +1560,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  		}
>
>  		trace_dax_pmd_insert_mapping(inode, vmf, PMD_SIZE, pfn, entry);
> -		result = vmf_insert_pfn_pmd(vma, vmf->address, vmf->pmd, pfn,
> +		result = vmf_insert_pfn_pmd(vma, pmd_addr, vmf->pmd, pfn,
>  					write);

We also call vmf_insert_pfn_pmd() in dax_insert_pfn_mkwrite() -- does
that need to change too?
Re: [PATCH v2] mm: Fix modifying of page protection by insert_pfn_pmd()
On Tue, Apr 2, 2019 at 4:51 AM Aneesh Kumar K.V wrote:
>
> With some architectures like ppc64, set_pmd_at() cannot cope with
> a situation where there is already some (different) valid entry present.
>
> Use pmdp_set_access_flags() instead to modify the pfn, which is built to
> deal with modifying existing PMD entries.
>
> This is similar to
> commit cae85cb8add3 ("mm/memory.c: fix modifying of page protection by
> insert_pfn()")
>
> We also do a similar update w.r.t. insert_pfn_pud, even though ppc64
> doesn't support pud pfn entries now.
>
> Without this patch we also see the below message in the kernel log:
> "BUG: non-zero pgtables_bytes on freeing mm:"
>
> CC: sta...@vger.kernel.org
> Reported-by: Chandan Rajendra
> Signed-off-by: Aneesh Kumar K.V
> ---
> Changes from v1:
> * Fix the pgtable leak
>
>  mm/huge_memory.c | 36
>  1 file changed, 36 insertions(+)

This patch is triggering the following bug in v4.19.35.

    kernel BUG at arch/x86/mm/pgtable.c:515!
    invalid opcode: [#1] SMP NOPTI
    CPU: 51 PID: 43713 Comm: java Tainted: G OE 4.19.35 #1
    RIP: 0010:pmdp_set_access_flags+0x48/0x50
    [..]
    Call Trace:
     vmf_insert_pfn_pmd+0x198/0x350
     dax_iomap_fault+0xe82/0x1190
     ext4_dax_huge_fault+0x103/0x1f0
     ? __switch_to_asm+0x40/0x70
     __handle_mm_fault+0x3f6/0x1370
     ? __switch_to_asm+0x34/0x70
     ? __switch_to_asm+0x40/0x70
     handle_mm_fault+0xda/0x200
     __do_page_fault+0x249/0x4f0
     do_page_fault+0x32/0x110
     ? page_fault+0x8/0x30
     page_fault+0x1e/0x30

I asked the reporter to try a kernel with commit c6f3c5ee40c1
("mm/huge_memory.c: fix modifying of page protection by
insert_pfn_pmd()") reverted, and the failing test passed.

I think unaligned addresses have always been passed to
vmf_insert_pfn_pmd(), but nothing cared until this patch. I *think*
the only change needed is the following, thoughts?

diff --git a/fs/dax.c b/fs/dax.c
index ca0671d55aa6..82aee9a87efa 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1560,7 +1560,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 		}

 		trace_dax_pmd_insert_mapping(inode, vmf, PMD_SIZE, pfn, entry);
-		result = vmf_insert_pfn_pmd(vma, vmf->address, vmf->pmd, pfn,
+		result = vmf_insert_pfn_pmd(vma, pmd_addr, vmf->pmd, pfn,
 					write);
 		break;
 	case IOMAP_UNWRITTEN:

I'll ask the reporter to try this fix as well.
Urgent: How to introduce project management methods and train qualified project management talent
Original message info - From: "Urgent"; To: linux-nvdimm; Sent: 2019-4-24 23:36:47
Re: [PATCH] arm64: configurable sparsemem section size
On 04/24/2019 02:08 AM, Pavel Tatashin wrote:
> sparsemem section size determines the maximum size and alignment that
> is allowed to offline/online a memory block. The bigger the size, the
> less the clutter in /sys/devices/system/memory/*. On the other hand,
> however, there is less flexibility in what granules of memory can be
> added and removed.

Is there any scenario where less than 1GB needs to be added on arm64?

>
> Recently, it was enabled in Linux to hotadd persistent memory that
> can be either a real NV device, or reserved from regular System RAM
> and has the identity of devdax.

devdax (even ZONE_DEVICE) support has not been enabled on arm64 yet.

>
> The problem is that because ARM64's section size is 1G, and devdax must
> have a 2M label section, the first 1G is always missed when the device
> is attached, because it is not 1G aligned.

devdax has to be 2M aligned? Does Linux enforce that right now?

>
> Allow better flexibility by making the section size configurable.

Unless 2M is being enforced from Linux, not sure why this is necessary
at the moment.

>
> Signed-off-by: Pavel Tatashin
> ---
>  arch/arm64/Kconfig                 | 10 ++
>  arch/arm64/include/asm/sparsemem.h |  2 +-
>  2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index b5d8cf57e220..a0c5b9d13a7f 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -801,6 +801,16 @@ config ARM64_PA_BITS
>  	default 48 if ARM64_PA_BITS_48
>  	default 52 if ARM64_PA_BITS_52
>
> +config ARM64_SECTION_SIZE_BITS
> +	int "sparsemem section size shift"
> +	range 27 30

27 and 28 do not even compile for ARM64_64K_PAGES because of MAX_ORDER
and SECTION_SIZE mismatch.