Re: [PATCH v2] mm: Fix modifying of page protection by insert_pfn_pmd()

2019-04-24 Thread Dan Williams
On Wed, Apr 24, 2019 at 6:37 PM Aneesh Kumar K.V
 wrote:
>
> On 4/24/19 11:43 PM, Dan Williams wrote:
> > On Wed, Apr 24, 2019 at 10:38 AM Matthew Wilcox  wrote:
> >>
> >> On Wed, Apr 24, 2019 at 10:13:15AM -0700, Dan Williams wrote:
> >>> I think unaligned addresses have always been passed to
> >>> vmf_insert_pfn_pmd(), but nothing cared until this patch. I *think*
> >>> the only change needed is the following, thoughts?
> >>>
> >>> diff --git a/fs/dax.c b/fs/dax.c
> >>> index ca0671d55aa6..82aee9a87efa 100644
> >>> --- a/fs/dax.c
> >>> +++ b/fs/dax.c
> >>> @@ -1560,7 +1560,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
> >>>  }
> >>>
> >>>  trace_dax_pmd_insert_mapping(inode, vmf, PMD_SIZE, pfn, entry);
> >>> -   result = vmf_insert_pfn_pmd(vma, vmf->address, vmf->pmd, pfn,
> >>> +   result = vmf_insert_pfn_pmd(vma, pmd_addr, vmf->pmd, pfn,
> >>>  write);
> >>
> >> We also call vmf_insert_pfn_pmd() in dax_insert_pfn_mkwrite() -- does
> >> that need to change too?
> >
> > It wasn't clear to me that it was a problem. I think that one already
> > happens to be pmd-aligned.
> >
>
> How about vmf_insert_pfn_pud()?

That is currently not used by fsdax, only devdax, but it does seem
that it passes the unaligned fault address rather than the pud-aligned
address. I'll add that to the proposed fix.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm



Re: [PATCH] arm64: configurable sparsemem section size

2019-04-24 Thread Anshuman Khandual



On 04/25/2019 01:18 AM, Pavel Tatashin wrote:
> On Wed, Apr 24, 2019 at 5:07 AM Anshuman Khandual
>  wrote:
>>
>> On 04/24/2019 02:08 AM, Pavel Tatashin wrote:
>>> sparsemem section size determines the maximum size and alignment that
>>> is allowed to offline/online memory block. The bigger the size the less
>>> the clutter in /sys/devices/system/memory/*. On the other hand, however,
> >>> there is less flexibility in what granules of memory can be added and
>>> removed.
>>
>> Is there any scenario where less than a 1GB needs to be added on arm64 ?
> 
> Yes, DAX hotplug loses 1G of memory without allowing smaller sections.
> Machines on which we are going to be using this functionality have 8G
> of System RAM, therefore losing 1G is a big problem.
> 
> For details about the usage scenario, see this cover letter:
> https://lore.kernel.org/lkml/20190421014429.31206-1-pasha.tatas...@soleen.com/

It's losing 1GB because devdax has 2M alignment? IIRC from Dan's subsection
memory hot-add series, the 2M comes from persistent memory HW controller
limitations. Is that limitation applicable across all platforms, including
arm64, for all possible persistent memory vendors? I mean, is it universal?
IIUC the subsection memory hotplug series is still being reviewed. Hence,
should we not wait for it to get merged before enabling applicable platforms
to accommodate this 2M limitation?

> 
>>
>>>
>>> Recently, it was enabled in Linux to hotadd persistent memory that
>>> can be either real NV device, or reserved from regular System RAM
>>> and has identity of devdax.
>>
>> devdax (even ZONE_DEVICE) support has not been enabled on arm64 yet.
> 
> Correct, I use your patches to enable ZONE_DEVICE, and  thus devdax on ARM64:
> https://lore.kernel.org/lkml/1554265806-11501-1-git-send-email-anshuman.khand...@arm.com/
> 
>>
>>>
>>> The problem is that because ARM64's section size is 1G, and devdax must
>>> have 2M label section, the first 1G is always missed when device is
>>> attached, because it is not 1G aligned.
>>
>> devdax has to be 2M aligned ? Does Linux enforce that right now ?
> 
> Unfortunately, there is no way around this. Part of the memory can be
> reserved as persistent memory via device tree.
> memory@4000 {
> device_type = "memory";
> reg = < 0x 0x4000
> 0x0002 0x >;
> };
> 
> pmem@1c000 {
> compatible = "pmem-region";
> reg = <0x0001 0xc000
>0x 0x8000>;
> volatile;
> numa-node-id = <0>;
> };
> 
> So, while pmem is section aligned, as it should be, the dax device is
> going to be pmem start address + label size, which is 2M. The actual

Forgive my ignorance here, but why is the dax device label size 2M aligned?
Again, is that because of some persistent memory HW controller limitation?

> DAX device starts at:
> 0x1c000 + 2M.
> 
> Because section size is 1G, the hotplug will be able to add only memory
> starting from
> 0x1c000 + 1G

Got it, but as mentioned before, we will have to make sure the 2M alignment
requirement is universal, else we will be adjusting this multiple times.

> 
>> 27 and 28 do not even compile for ARM64_64_PAGES because of MAX_ORDER and
>> SECTION_SIZE mismatch.

Even with 27 bits it's a 128 MB section size. How does that solve the problem
with the 2M alignment? Or did the patch just aim to reduce the memory wastage?

> 
> Can you please elaborate what configs are you using? I have no
> problems compiling with 27 and 28 bit.

After applying your patch [1] on current mainline kernel [2].

$make defconfig

CONFIG_ARM64_64K_PAGES=y
CONFIG_ARM64_VA_BITS_48=y
CONFIG_ARM64_VA_BITS=48
CONFIG_ARM64_PA_BITS_48=y
CONFIG_ARM64_PA_BITS=48
CONFIG_ARM64_SECTION_SIZE_BITS=27

[1] https://patchwork.kernel.org/patch/10913737/
[2] git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

It fails with

  CC  arch/arm64/kernel/asm-offsets.s
In file included from ./include/linux/gfp.h:6,
 from ./include/linux/slab.h:15,
 from ./include/linux/resource_ext.h:19,
 from ./include/linux/acpi.h:26,
 from ./include/acpi/apei.h:9,
 from ./include/acpi/ghes.h:5,
 from ./include/linux/arm_sdei.h:14,
 from arch/arm64/kernel/asm-offsets.c:21:
./include/linux/mmzone.h:1095:2: error: #error Allocator MAX_ORDER exceeds SECTION_SIZE
 #error Allocator MAX_ORDER exceeds SECTION_SIZE


Re: [PATCH v2] mm: Fix modifying of page protection by insert_pfn_pmd()

2019-04-24 Thread Aneesh Kumar K.V

On 4/24/19 11:43 PM, Dan Williams wrote:

On Wed, Apr 24, 2019 at 10:38 AM Matthew Wilcox  wrote:


On Wed, Apr 24, 2019 at 10:13:15AM -0700, Dan Williams wrote:

I think unaligned addresses have always been passed to
vmf_insert_pfn_pmd(), but nothing cared until this patch. I *think*
the only change needed is the following, thoughts?

diff --git a/fs/dax.c b/fs/dax.c
index ca0671d55aa6..82aee9a87efa 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1560,7 +1560,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 }

 trace_dax_pmd_insert_mapping(inode, vmf, PMD_SIZE, pfn, entry);
-   result = vmf_insert_pfn_pmd(vma, vmf->address, vmf->pmd, pfn,
+   result = vmf_insert_pfn_pmd(vma, pmd_addr, vmf->pmd, pfn,
 write);


We also call vmf_insert_pfn_pmd() in dax_insert_pfn_mkwrite() -- does
that need to change too?


It wasn't clear to me that it was a problem. I think that one already
happens to be pmd-aligned.



How about vmf_insert_pfn_pud()?

-aneesh



Re: [v2 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM

2019-04-24 Thread Pavel Tatashin
> > > +static int
> > > +offline_memblock_cb(struct memory_block *mem, void *arg)
> >
> > Function name suggests that you are actually trying to offline memory
> > here. Maybe check_memblocks_offline_cb(), just like we have in
> > mm/memory_hotplug.c.

Makes sense, I will rename to check_memblocks_offline_cb()

> > > + lock_device_hotplug();
> > > + rc = walk_memory_range(start_pfn, end_pfn, dev, offline_memblock_cb);
> > > + unlock_device_hotplug();
> > > +
> > > + /*
> > > +  * If admin has not offlined memory beforehand, we cannot hotremove dax.
> > > +  * Unfortunately, because unbind will still succeed there is no way for
> > > +  * user to hotremove dax after this.
> > > +  */
> > > + if (rc)
> > > + return rc;
> >
> > Can't it happen that there is a race between you checking if memory is
> > offline and an admin onlining memory again? maybe pull the
> > remove_memory() into the locked region, using __remove_memory() instead.
>
> I think the race is ok. The admin gets to keep the pieces of allowing
> racing updates to the state and the kernel will keep the range active
> until the next reboot.

Thank you for noticing this. I will pull it into locking region.
Because, __remove_memory() has this code:

1868   ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
1869   check_memblock_offlined_cb);
1870   if (ret)
1871  BUG();

Basically, panic if at least something is not offlined.

Pasha


Re: [v2 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM

2019-04-24 Thread Dan Williams
On Wed, Apr 24, 2019 at 1:55 PM David Hildenbrand  wrote:
>
> On 21.04.19 03:44, Pavel Tatashin wrote:
> > It is now allowed to use persistent memory like a regular RAM, but
> > currently there is no way to remove this memory until machine is
> > rebooted.
> >
> > This work expands the functionality to also allow hotremoving
> > previously hotplugged persistent memory, and recover the device for use
> > for other purposes.
> >
> > To hotremove persistent memory, the management software must first
> > offline all memory blocks of the dax region, and then unbind it from
> > device-dax/kmem driver. So, operations should look like this:
> >
> > echo offline > /sys/devices/system/memory/memoryN/state
> > ...
> > echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
> >
> > Note: if unbind is done without offlining memory beforehand, it won't be
> > possible to do dax0.0 hotremove, and dax's memory is going to be part of
> > System RAM until reboot.
> >
> > Signed-off-by: Pavel Tatashin 
> > ---
> >  drivers/dax/dax-private.h |  2 +
> >  drivers/dax/kmem.c| 91 +--
> >  2 files changed, 89 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
> > index a45612148ca0..999aaf3a29b3 100644
> > --- a/drivers/dax/dax-private.h
> > +++ b/drivers/dax/dax-private.h
> > @@ -53,6 +53,7 @@ struct dax_region {
> >   * @pgmap - pgmap for memmap setup / lifetime (driver owned)
> >   * @ref: pgmap reference count (driver owned)
> >   * @cmp: @ref final put completion (driver owned)
> > + * @dax_mem_res: physical address range of hotadded DAX memory
> >   */
> >  struct dev_dax {
> >   struct dax_region *region;
> > @@ -62,6 +63,7 @@ struct dev_dax {
> >   struct dev_pagemap pgmap;
> >   struct percpu_ref ref;
> >   struct completion cmp;
> > + struct resource *dax_kmem_res;
> >  };
> >
> >  static inline struct dev_dax *to_dev_dax(struct device *dev)
> > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> > index 4c0131857133..d4896b281036 100644
> > --- a/drivers/dax/kmem.c
> > +++ b/drivers/dax/kmem.c
> > @@ -71,21 +71,104 @@ int dev_dax_kmem_probe(struct device *dev)
> >   kfree(new_res);
> >   return rc;
> >   }
> > + dev_dax->dax_kmem_res = new_res;
> >
> >   return 0;
> >  }
> >
> > +#ifdef CONFIG_MEMORY_HOTREMOVE
> > +/*
> > > + * Check that device-dax's memory_blocks are offline. If a memory_block is not
> > + * offline a warning is printed and an error is returned. dax hotremove can
> > + * succeed only when every memory_block is offlined beforehand.
> > + */
> > +static int
> > +offline_memblock_cb(struct memory_block *mem, void *arg)
>
> Function name suggests that you are actually trying to offline memory
> here. Maybe check_memblocks_offline_cb(), just like we have in
> mm/memory_hotplug.c.
>
> > +{
> > > + struct device *mem_dev = &mem->dev;
> > + bool is_offline;
> > +
> > + device_lock(mem_dev);
> > + is_offline = mem_dev->offline;
> > + device_unlock(mem_dev);
> > +
> > + if (!is_offline) {
> > + struct device *dev = (struct device *)arg;
> > + unsigned long spfn = section_nr_to_pfn(mem->start_section_nr);
> > + unsigned long epfn = section_nr_to_pfn(mem->end_section_nr);
> > + phys_addr_t spa = spfn << PAGE_SHIFT;
> > + phys_addr_t epa = epfn << PAGE_SHIFT;
> > +
> > + dev_warn(dev, "memory block [%pa-%pa] is not offline\n",
> > +  , );
> > +
> > + return -EBUSY;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static int dev_dax_kmem_remove(struct device *dev)
> > +{
> > + struct dev_dax *dev_dax = to_dev_dax(dev);
> > + struct resource *res = dev_dax->dax_kmem_res;
> > + resource_size_t kmem_start;
> > + resource_size_t kmem_size;
> > + unsigned long start_pfn;
> > + unsigned long end_pfn;
> > + int rc;
> > +
> > + /*
> > > +  * dax kmem resource does not exist, means memory was never hotplugged.
> > +  * So, nothing to do here.
> > +  */
> > + if (!res)
> > + return 0;
> > +
> > + kmem_start = res->start;
> > + kmem_size = resource_size(res);
> > + start_pfn = kmem_start >> PAGE_SHIFT;
> > + end_pfn = start_pfn + (kmem_size >> PAGE_SHIFT) - 1;
> > +
> > + /*
> > > +  * Walk and check that every single memory_block of dax region is
> > +  * offline
> > +  */
> > + lock_device_hotplug();
> > + rc = walk_memory_range(start_pfn, end_pfn, dev, offline_memblock_cb);
> > + unlock_device_hotplug();
> > +
> > + /*
> > > +  * If admin has not offlined memory beforehand, we cannot hotremove dax.
> > > +  * Unfortunately, because unbind will still succeed there is no way for
> > > +  * user to hotremove dax after this.
> > +  */
> > + if (rc)
> > + return rc;
>
> Can't it happen that there is a race between you checking if memory is
> offline and an admin onlining memory again? maybe pull the
> remove_memory() into the locked region, using __remove_memory() instead.

Re: [v2 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM

2019-04-24 Thread David Hildenbrand
On 21.04.19 03:44, Pavel Tatashin wrote:
> It is now allowed to use persistent memory like a regular RAM, but
> currently there is no way to remove this memory until machine is
> rebooted.
> 
> This work expands the functionality to also allow hotremoving
> previously hotplugged persistent memory, and recover the device for use
> for other purposes.
> 
> To hotremove persistent memory, the management software must first
> offline all memory blocks of the dax region, and then unbind it from
> device-dax/kmem driver. So, operations should look like this:
> 
> echo offline > /sys/devices/system/memory/memoryN/state
> ...
> echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
> 
> Note: if unbind is done without offlining memory beforehand, it won't be
> possible to do dax0.0 hotremove, and dax's memory is going to be part of
> System RAM until reboot.
> 
> Signed-off-by: Pavel Tatashin 
> ---
>  drivers/dax/dax-private.h |  2 +
>  drivers/dax/kmem.c| 91 +--
>  2 files changed, 89 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
> index a45612148ca0..999aaf3a29b3 100644
> --- a/drivers/dax/dax-private.h
> +++ b/drivers/dax/dax-private.h
> @@ -53,6 +53,7 @@ struct dax_region {
>   * @pgmap - pgmap for memmap setup / lifetime (driver owned)
>   * @ref: pgmap reference count (driver owned)
>   * @cmp: @ref final put completion (driver owned)
> + * @dax_mem_res: physical address range of hotadded DAX memory
>   */
>  struct dev_dax {
>   struct dax_region *region;
> @@ -62,6 +63,7 @@ struct dev_dax {
>   struct dev_pagemap pgmap;
>   struct percpu_ref ref;
>   struct completion cmp;
> + struct resource *dax_kmem_res;
>  };
>  
>  static inline struct dev_dax *to_dev_dax(struct device *dev)
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index 4c0131857133..d4896b281036 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -71,21 +71,104 @@ int dev_dax_kmem_probe(struct device *dev)
>   kfree(new_res);
>   return rc;
>   }
> + dev_dax->dax_kmem_res = new_res;
>  
>   return 0;
>  }
>  
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +/*
> + * Check that device-dax's memory_blocks are offline. If a memory_block is not
> + * offline a warning is printed and an error is returned. dax hotremove can
> + * succeed only when every memory_block is offlined beforehand.
> + */
> +static int
> +offline_memblock_cb(struct memory_block *mem, void *arg)

Function name suggests that you are actually trying to offline memory
here. Maybe check_memblocks_offline_cb(), just like we have in
mm/memory_hotplug.c.

> +{
> + struct device *mem_dev = &mem->dev;
> + bool is_offline;
> +
> + device_lock(mem_dev);
> + is_offline = mem_dev->offline;
> + device_unlock(mem_dev);
> +
> + if (!is_offline) {
> + struct device *dev = (struct device *)arg;
> + unsigned long spfn = section_nr_to_pfn(mem->start_section_nr);
> + unsigned long epfn = section_nr_to_pfn(mem->end_section_nr);
> + phys_addr_t spa = spfn << PAGE_SHIFT;
> + phys_addr_t epa = epfn << PAGE_SHIFT;
> +
> + dev_warn(dev, "memory block [%pa-%pa] is not offline\n",
> +  &spa, &epa);
> +
> + return -EBUSY;
> + }
> +
> + return 0;
> +}
> +
> +static int dev_dax_kmem_remove(struct device *dev)
> +{
> + struct dev_dax *dev_dax = to_dev_dax(dev);
> + struct resource *res = dev_dax->dax_kmem_res;
> + resource_size_t kmem_start;
> + resource_size_t kmem_size;
> + unsigned long start_pfn;
> + unsigned long end_pfn;
> + int rc;
> +
> + /*
> +  * dax kmem resource does not exist, means memory was never hotplugged.
> +  * So, nothing to do here.
> +  */
> + if (!res)
> + return 0;
> +
> + kmem_start = res->start;
> + kmem_size = resource_size(res);
> + start_pfn = kmem_start >> PAGE_SHIFT;
> + end_pfn = start_pfn + (kmem_size >> PAGE_SHIFT) - 1;
> +
> + /*
> +  * Walk and check that every single memory_block of dax region is
> +  * offline
> +  */
> + lock_device_hotplug();
> + rc = walk_memory_range(start_pfn, end_pfn, dev, offline_memblock_cb);
> + unlock_device_hotplug();
> +
> + /*
> +  * If admin has not offlined memory beforehand, we cannot hotremove dax.
> +  * Unfortunately, because unbind will still succeed there is no way for
> +  * user to hotremove dax after this.
> +  */
> + if (rc)
> + return rc;

Can't it happen that there is a race between you checking if memory is
offline and an admin onlining memory again? maybe pull the
remove_memory() into the locked region, using __remove_memory() instead.

> +
> + /* Hotremove memory, cannot fail because memory is already offlined */
> + remove_memory(dev_dax->target_node, kmem_start, kmem_size);
> +

Re: [PATCH v6 00/12] mm: Sub-section memory hotplug support

2019-04-24 Thread Pavel Tatashin
I am also taking a look at this work now. I will review and test it in
the next couple of days.

Pasha

On Tue, Apr 23, 2019 at 9:17 AM Oscar Salvador  wrote:
>
> On Wed, 2019-04-17 at 15:59 -0700, Dan Williams wrote:
> > > On Wed, Apr 17, 2019 at 3:04 PM Andrew Morton wrote:
> > >
> > > > On Wed, 17 Apr 2019 11:38:55 -0700 Dan Williams wrote:
> > >
> > > > The memory hotplug section is an arbitrary / convenient unit for
> > > > memory
> > > > hotplug. 'Section-size' units have bled into the user interface
> > > > ('memblock' sysfs) and can not be changed without breaking
> > > > existing
> > > > userspace. The section-size constraint, while mostly benign for
> > > > typical
> > > > memory hotplug, has and continues to wreak havoc with 'device-
> > > > memory'
> > > > use cases, persistent memory (pmem) in particular. Recall that
> > > > pmem uses
> > > > devm_memremap_pages(), and subsequently arch_add_memory(), to
> > > > allocate a
> > > > 'struct page' memmap for pmem. However, it does not use the
> > > > 'bottom
> > > > half' of memory hotplug, i.e. never marks pmem pages online and
> > > > never
> > > > exposes the userspace memblock interface for pmem. This leaves an
> > > > opening to redress the section-size constraint.
> > >
> > > v6 and we're not showing any review activity.  Who would be
> > > suitable
> > > people to help out here?
> >
> > There was quite a bit of review of the cover letter from Michal and
> > David, but you're right the details not so much as of yet. I'd like
> > to
> > call out other people where I can reciprocate with some review of my
> > own. Oscar's altmap work looks like a good candidate for that.
>
> Thanks Dan for ccing me.
> I will take a look at the patches soon.
>
> --
> Oscar Salvador
> SUSE L3


Re: [PATCH] arm64: configurable sparsemem section size

2019-04-24 Thread Pavel Tatashin
> This is yet another example of where we need to break down the section
> alignment requirement for arch_add_memory().
>
> https://lore.kernel.org/lkml/12633539.2015392.2477781120122237934.st...@dwillia2-desk3.amr.corp.intel.com/

Hi Dan,

Yes, that is exactly what I am trying to solve with this patch. I will
test whether your series works with ARM64, and if it does not, I will
let you know what is broken. But I think this patch is not needed if
your patches are accepted into mainline.

Thank you,
Pasha


Re: [PATCH] arm64: configurable sparsemem section size

2019-04-24 Thread Dan Williams
On Wed, Apr 24, 2019 at 12:54 PM Pavel Tatashin
 wrote:
>
>  from original email
>
> On Wed, Apr 24, 2019 at 3:48 PM Pavel Tatashin
>  wrote:
> >
> > On Wed, Apr 24, 2019 at 5:07 AM Anshuman Khandual
> >  wrote:
> > >
> > > On 04/24/2019 02:08 AM, Pavel Tatashin wrote:
> > > > sparsemem section size determines the maximum size and alignment that
> > > > is allowed to offline/online memory block. The bigger the size the less
> > > > the clutter in /sys/devices/system/memory/*. On the other hand, however,
> > > > there is less flexibility in what granules of memory can be added and
> > > > removed.
> > >
> > > Is there any scenario where less than a 1GB needs to be added on arm64 ?
> >
> > Yes, DAX hotplug loses 1G of memory without allowing smaller sections.
> > Machines on which we are going to be using this functionality have 8G
> > of System RAM, therefore losing 1G is a big problem.
> >
> > For details about the usage scenario, see this cover letter:
> > https://lore.kernel.org/lkml/20190421014429.31206-1-pasha.tatas...@soleen.com/
> >
> > >
> > > >
> > > > Recently, it was enabled in Linux to hotadd persistent memory that
> > > > can be either real NV device, or reserved from regular System RAM
> > > > and has identity of devdax.
> > >
> > > devdax (even ZONE_DEVICE) support has not been enabled on arm64 yet.
> >
> > Correct, I use your patches to enable ZONE_DEVICE, and thus devdax on ARM64:
> > https://lore.kernel.org/lkml/1554265806-11501-1-git-send-email-anshuman.khand...@arm.com/
> >
> > >
> > > >
> > > > The problem is that because ARM64's section size is 1G, and devdax must
> > > > have 2M label section, the first 1G is always missed when device is
> > > > attached, because it is not 1G aligned.
> > >
> > > devdax has to be 2M aligned ? Does Linux enforce that right now ?
> >
> > Unfortunately, there is no way around this. Part of the memory can be
> > reserved as persistent memory via device tree.
> > memory@4000 {
> > device_type = "memory";
> > reg = < 0x 0x4000
> > 0x0002 0x >;
> > };
> >
> > pmem@1c000 {
> > compatible = "pmem-region";
> > reg = <0x0001 0xc000
> >0x 0x8000>;
> > volatile;
> > numa-node-id = <0>;
> > };
> >
> > So, while pmem is section aligned, as it should be, the dax device is
> > going to be pmem start address + label size, which is 2M. The actual
> > DAX device starts at:
> > 0x1c000 + 2M.
> >
> > Because section size is 1G, the hotplug will be able to add only memory
> > starting from
> > 0x1c000 + 1G

This is yet another example of where we need to break down the section
alignment requirement for arch_add_memory().

https://lore.kernel.org/lkml/12633539.2015392.2477781120122237934.st...@dwillia2-desk3.amr.corp.intel.com/


Re: [PATCH] arm64: configurable sparsemem section size

2019-04-24 Thread Pavel Tatashin
 from original email

On Wed, Apr 24, 2019 at 3:48 PM Pavel Tatashin
 wrote:
>
> On Wed, Apr 24, 2019 at 5:07 AM Anshuman Khandual
>  wrote:
> >
> > On 04/24/2019 02:08 AM, Pavel Tatashin wrote:
> > > sparsemem section size determines the maximum size and alignment that
> > > is allowed to offline/online memory block. The bigger the size the less
> > > the clutter in /sys/devices/system/memory/*. On the other hand, however,
> > > there is less flexibility in what granules of memory can be added and
> > > removed.
> >
> > Is there any scenario where less than a 1GB needs to be added on arm64 ?
>
> Yes, DAX hotplug loses 1G of memory without allowing smaller sections.
> Machines on which we are going to be using this functionality have 8G
> of System RAM, therefore losing 1G is a big problem.
>
> For details about the usage scenario, see this cover letter:
> https://lore.kernel.org/lkml/20190421014429.31206-1-pasha.tatas...@soleen.com/
>
> >
> > >
> > > Recently, it was enabled in Linux to hotadd persistent memory that
> > > can be either real NV device, or reserved from regular System RAM
> > > and has identity of devdax.
> >
> > devdax (even ZONE_DEVICE) support has not been enabled on arm64 yet.
>
> Correct, I use your patches to enable ZONE_DEVICE, and  thus devdax on ARM64:
> https://lore.kernel.org/lkml/1554265806-11501-1-git-send-email-anshuman.khand...@arm.com/
>
> >
> > >
> > > The problem is that because ARM64's section size is 1G, and devdax must
> > > have 2M label section, the first 1G is always missed when device is
> > > attached, because it is not 1G aligned.
> >
> > devdax has to be 2M aligned ? Does Linux enforce that right now ?
>
> Unfortunately, there is no way around this. Part of the memory can be
> reserved as persistent memory via device tree.
> memory@4000 {
> device_type = "memory";
> reg = < 0x 0x4000
> 0x0002 0x >;
> };
>
> pmem@1c000 {
> compatible = "pmem-region";
> reg = <0x0001 0xc000
>0x 0x8000>;
> volatile;
> numa-node-id = <0>;
> };
>
> So, while pmem is section aligned, as it should be, the dax device is
> going to be pmem start address + label size, which is 2M. The actual
> DAX device starts at:
> 0x1c000 + 2M.
>
> > Because section size is 1G, the hotplug will be able to add only memory
> starting from
> 0x1c000 + 1G
>
> > 27 and 28 do not even compile for ARM64_64_PAGES because of MAX_ORDER and
> > SECTION_SIZE mismatch.
>
> Can you please elaborate what configs are you using? I have no
> problems compiling with 27 and 28 bit.
>
> Thank you,
> Pasha


Re: [PATCH] arm64: configurable sparsemem section size

2019-04-24 Thread Pavel Tatashin
On Wed, Apr 24, 2019 at 5:07 AM Anshuman Khandual
 wrote:
>
> On 04/24/2019 02:08 AM, Pavel Tatashin wrote:
> > sparsemem section size determines the maximum size and alignment that
> > is allowed to offline/online memory block. The bigger the size the less
> > the clutter in /sys/devices/system/memory/*. On the other hand, however,
> > there is less flexibility in what granules of memory can be added and
> > removed.
>
> Is there any scenario where less than a 1GB needs to be added on arm64 ?

Yes, DAX hotplug loses 1G of memory without allowing smaller sections.
Machines on which we are going to be using this functionality have 8G
of System RAM, therefore losing 1G is a big problem.

For details about the usage scenario, see this cover letter:
https://lore.kernel.org/lkml/20190421014429.31206-1-pasha.tatas...@soleen.com/

>
> >
> > Recently, it was enabled in Linux to hotadd persistent memory that
> > can be either real NV device, or reserved from regular System RAM
> > and has identity of devdax.
>
> devdax (even ZONE_DEVICE) support has not been enabled on arm64 yet.

Correct, I use your patches to enable ZONE_DEVICE, and  thus devdax on ARM64:
https://lore.kernel.org/lkml/1554265806-11501-1-git-send-email-anshuman.khand...@arm.com/

>
> >
> > The problem is that because ARM64's section size is 1G, and devdax must
> > have 2M label section, the first 1G is always missed when device is
> > attached, because it is not 1G aligned.
>
> devdax has to be 2M aligned ? Does Linux enforce that right now ?

Unfortunately, there is no way around this. Part of the memory can be
reserved as persistent memory via device tree.
memory@4000 {
device_type = "memory";
reg = < 0x 0x4000
0x0002 0x >;
};

pmem@1c000 {
compatible = "pmem-region";
reg = <0x0001 0xc000
   0x 0x8000>;
volatile;
numa-node-id = <0>;
};

So, while pmem is section aligned, as it should be, the dax device is
going to be pmem start address + label size, which is 2M. The actual
DAX device starts at:
0x1c000 + 2M.

Because section size is 1G, the hotplug will be able to add only memory
starting from
0x1c000 + 1G

> 27 and 28 do not even compile for ARM64_64_PAGES because of MAX_ORDER and
> SECTION_SIZE mismatch.

Can you please elaborate what configs are you using? I have no
problems compiling with 27 and 28 bit.

Thank you,
Pasha


Re: [PATCH v1 11/17] kunit: test: add test managed resource tests

2019-04-24 Thread Masayoshi Mizuma
On Thu, Apr 04, 2019 at 03:06:46PM -0700, Brendan Higgins wrote:
> From: Avinash Kondareddy 
> 
> Tests how tests interact with test managed resources in their lifetime.
> 
> Signed-off-by: Avinash Kondareddy 
> Signed-off-by: Brendan Higgins 
> ---
>  kunit/test-test.c | 122 ++
>  1 file changed, 122 insertions(+)
> 
> diff --git a/kunit/test-test.c b/kunit/test-test.c
> index 4bd7a34d0a6cb..54add8ca418a0 100644
> --- a/kunit/test-test.c
> +++ b/kunit/test-test.c
> @@ -135,3 +135,125 @@ static struct kunit_module kunit_try_catch_test_module = {
>   .test_cases = kunit_try_catch_test_cases,
>  };
>  module_test(kunit_try_catch_test_module);
> +
> +/*
> + * Context for testing test managed resources
> + * is_resource_initialized is used to test arbitrary resources
> + */
> +struct kunit_test_resource_context {
> + struct kunit test;
> + bool is_resource_initialized;
> +};
> +
> +static int fake_resource_init(struct kunit_resource *res, void *context)
> +{
> + struct kunit_test_resource_context *ctx = context;
> +
> + res->allocation = &ctx->is_resource_initialized;
> + ctx->is_resource_initialized = true;
> + return 0;
> +}
> +
> +static void fake_resource_free(struct kunit_resource *res)
> +{
> + bool *is_resource_initialized = res->allocation;
> +
> + *is_resource_initialized = false;
> +}
> +
> +static void kunit_resource_test_init_resources(struct kunit *test)
> +{
> + struct kunit_test_resource_context *ctx = test->priv;
> +
> + kunit_init_test(&ctx->test, "testing_test_init_test");
> +
> + KUNIT_EXPECT_TRUE(test, list_empty(&ctx->test.resources));
> +}
> +
> +static void kunit_resource_test_alloc_resource(struct kunit *test)
> +{
> + struct kunit_test_resource_context *ctx = test->priv;
> + struct kunit_resource *res;
> + kunit_resource_free_t free = fake_resource_free;
> +
> + res = kunit_alloc_resource(&ctx->test,
> +fake_resource_init,
> +fake_resource_free,
> +ctx);
> +

> + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, res);

KUNIT_EXPECT_NOT_ERR_OR_NULL(test, res);

Thanks!
Masa


Re: [PATCH v1 14/17] Documentation: kunit: add documentation for KUnit

2019-04-24 Thread Masayoshi Mizuma
Hi Brendan,

KUNIT_ASSERT_NOT_ERR_OR_NULL() should be replaced with
KUNIT_EXPECT_NOT_ERR_OR_NULL(), right?

On Thu, Apr 04, 2019 at 03:06:49PM -0700, Brendan Higgins wrote:
> Add documentation for KUnit, the Linux kernel unit testing framework.
> - Add intro and usage guide for KUnit
> - Add API reference
> 
> Signed-off-by: Felix Guo 
> Signed-off-by: Brendan Higgins 
> ---



> diff --git a/Documentation/kunit/start.rst b/Documentation/kunit/start.rst
> new file mode 100644
> index 0..5cdba5091905e
> --- /dev/null
> +++ b/Documentation/kunit/start.rst



> +Assertions
> +~~
> +
> +KUnit also has the concept of an *assertion*. An assertion is just like an
> +expectation except the assertion immediately terminates the test case if it is
> +not satisfied.
> +
> +For example:
> +
> +.. code-block:: c
> +
> + static void mock_test_do_expect_default_return(struct kunit *test)
> + {
> + struct mock_test_context *ctx = test->priv;
> + struct mock *mock = ctx->mock;
> + int param0 = 5, param1 = -5;
> + const char *two_param_types[] = {"int", "int"};
> + const void *two_params[] = {, };
> + const void *ret;
> +
> + ret = mock->do_expect(mock,
> +   "test_printk", test_printk,
> +   two_param_types, two_params,
> +   ARRAY_SIZE(two_params));

> + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ret);

KUNIT_EXPECT_NOT_ERR_OR_NULL(test, ret);

> + KUNIT_EXPECT_EQ(test, -4, *((int *) ret));
> + }
> +
> +In this example, the method under test should return a pointer to a value, so
> +if the pointer returned by the method is null or an errno, we don't want to
> +bother continuing the test since the following expectation could crash the test

> +case. `ASSERT_NOT_ERR_OR_NULL(...)` allows us to bail out of the test case if

  +case. `KUNIT_EXPECT_NOT_ERR_OR_NULL(...)` allows us to bail out of the test case if

Thanks!
Masa


Re: [PATCH v2] mm: Fix modifying of page protection by insert_pfn_pmd()

2019-04-24 Thread Dan Williams
On Wed, Apr 24, 2019 at 10:38 AM Matthew Wilcox  wrote:
>
> On Wed, Apr 24, 2019 at 10:13:15AM -0700, Dan Williams wrote:
> > I think unaligned addresses have always been passed to
> > vmf_insert_pfn_pmd(), but nothing cared until this patch. I *think*
> > the only change needed is the following, thoughts?
> >
> > diff --git a/fs/dax.c b/fs/dax.c
> > index ca0671d55aa6..82aee9a87efa 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -1560,7 +1560,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct
> > vm_fault *vmf, pfn_t *pfnp,
> > }
> >
> > trace_dax_pmd_insert_mapping(inode, vmf, PMD_SIZE, pfn, entry);
> > -   result = vmf_insert_pfn_pmd(vma, vmf->address, vmf->pmd, pfn,
> > +   result = vmf_insert_pfn_pmd(vma, pmd_addr, vmf->pmd, pfn,
> > write);
>
> We also call vmf_insert_pfn_pmd() in dax_insert_pfn_mkwrite() -- does
> that need to change too?

It wasn't clear to me that it was a problem. I think that one already
happens to be pmd-aligned.


Re: [PATCH v6 06/12] mm/hotplug: Add mem-hotplug restrictions for remove_memory()

2019-04-24 Thread Dan Williams
On Tue, Apr 23, 2019 at 2:21 PM David Hildenbrand  wrote:
>
> On 17.04.19 20:39, Dan Williams wrote:
> > Teach the arch_remove_memory() path to consult the same 'struct
> > mhp_restrictions' context as was specified at arch_add_memory() time.
> >
> > No functional change, this is a preparation step for teaching
> > __remove_pages() about how and when to allow sub-section hot-remove, and
> > a cleanup for an unnecessary "is_dev_zone()" special case.
>
> I am not yet sure if this is the right thing to do. When adding memory,
> we obviously have to specify the "how". When removing memory, we usually
> should be able to look such stuff up.

True, the implementation can just use find_memory_block(), and no need
to plumb this flag.

>
>
> >  void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
> > - unsigned long nr_pages, struct vmem_altmap *altmap)
> > + unsigned long nr_pages, struct mhp_restrictions *restrictions)
> >  {
> >   unsigned long i;
> > - unsigned long map_offset = 0;
> >   int sections_to_remove;
> > + unsigned long map_offset = 0;
> > + struct vmem_altmap *altmap = restrictions->altmap;
> >
> > - /* In the ZONE_DEVICE case device driver owns the memory region */
> > - if (is_dev_zone(zone)) {
> > - if (altmap)
> > - map_offset = vmem_altmap_offset(altmap);
> > - }
> > + if (altmap)
> > + map_offset = vmem_altmap_offset(altmap);
> >
>
> Why weren't we able to use this exact same hunk before? (after my
> resource deletion cleanup of course)
>
> IOW, do we really need struct mhp_restrictions here?

We don't need it. The memblock info was the only reason I added the
"restrictions" argument.

> After I factor out memory device handling into the caller of
> arch_remove_memory(), also the next patch ("mm/sparsemem: Prepare for
> sub-section ranges") should no longer need it. Or am I missing something?

That patch is still needed for the places where it adds the @nr_pages
argument, but the mhp_restrictions related bits can be dropped. The
subsection_check() helper needs to be refactored a bit to not rely on
mhp_restrictions.


Re: [PATCH v2] mm: Fix modifying of page protection by insert_pfn_pmd()

2019-04-24 Thread Matthew Wilcox
On Wed, Apr 24, 2019 at 10:13:15AM -0700, Dan Williams wrote:
> I think unaligned addresses have always been passed to
> vmf_insert_pfn_pmd(), but nothing cared until this patch. I *think*
> the only change needed is the following, thoughts?
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index ca0671d55aa6..82aee9a87efa 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1560,7 +1560,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct
> vm_fault *vmf, pfn_t *pfnp,
> }
> 
> trace_dax_pmd_insert_mapping(inode, vmf, PMD_SIZE, pfn, entry);
> -   result = vmf_insert_pfn_pmd(vma, vmf->address, vmf->pmd, pfn,
> +   result = vmf_insert_pfn_pmd(vma, pmd_addr, vmf->pmd, pfn,
> write);

We also call vmf_insert_pfn_pmd() in dax_insert_pfn_mkwrite() -- does
that need to change too?


Re: [PATCH v2] mm: Fix modifying of page protection by insert_pfn_pmd()

2019-04-24 Thread Dan Williams
On Tue, Apr 2, 2019 at 4:51 AM Aneesh Kumar K.V
 wrote:
>
> With some architectures like ppc64, set_pmd_at() cannot cope with
> a situation where there is already some (different) valid entry present.
>
> Use pmdp_set_access_flags() instead to modify the pfn which is built to
> deal with modifying existing PMD entries.
>
> This is similar to
> commit cae85cb8add3 ("mm/memory.c: fix modifying of page protection by insert_pfn()")
>
> We also do a similar update w.r.t. insert_pfn_pud even though ppc64 doesn't support
> pud pfn entries now.
>
> Without this patch we also see the below message in kernel log
> "BUG: non-zero pgtables_bytes on freeing mm:"
>
> CC: sta...@vger.kernel.org
> Reported-by: Chandan Rajendra 
> Signed-off-by: Aneesh Kumar K.V 
> ---
> Changes from v1:
> * Fix the pgtable leak
>
>  mm/huge_memory.c | 36 
>  1 file changed, 36 insertions(+)

This patch is triggering the following bug in v4.19.35.

 kernel BUG at arch/x86/mm/pgtable.c:515!
 invalid opcode:  [#1] SMP NOPTI
 CPU: 51 PID: 43713 Comm: java Tainted: G   OE 4.19.35 #1
 RIP: 0010:pmdp_set_access_flags+0x48/0x50
 [..]
 Call Trace:
  vmf_insert_pfn_pmd+0x198/0x350
  dax_iomap_fault+0xe82/0x1190
  ext4_dax_huge_fault+0x103/0x1f0
  ? __switch_to_asm+0x40/0x70
  __handle_mm_fault+0x3f6/0x1370
  ? __switch_to_asm+0x34/0x70
  ? __switch_to_asm+0x40/0x70
  handle_mm_fault+0xda/0x200
  __do_page_fault+0x249/0x4f0
  do_page_fault+0x32/0x110
  ? page_fault+0x8/0x30
  page_fault+0x1e/0x30

I asked the reporter to try a kernel with commit c6f3c5ee40c1
"mm/huge_memory.c: fix modifying of page protection by
insert_pfn_pmd()" reverted and the failing test passed.

I think unaligned addresses have always been passed to
vmf_insert_pfn_pmd(), but nothing cared until this patch. I *think*
the only change needed is the following, thoughts?

diff --git a/fs/dax.c b/fs/dax.c
index ca0671d55aa6..82aee9a87efa 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1560,7 +1560,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct
vm_fault *vmf, pfn_t *pfnp,
}

trace_dax_pmd_insert_mapping(inode, vmf, PMD_SIZE, pfn, entry);
-   result = vmf_insert_pfn_pmd(vma, vmf->address, vmf->pmd, pfn,
+   result = vmf_insert_pfn_pmd(vma, pmd_addr, vmf->pmd, pfn,
write);
break;
case IOMAP_UNWRITTEN:


I'll ask the reporter to try this fix as well.


Re: [PATCH] arm64: configurable sparsemem section size

2019-04-24 Thread Anshuman Khandual



On 04/24/2019 02:08 AM, Pavel Tatashin wrote:
> sparsemem section size determines the maximum size and alignment that
> is allowed to offline/online a memory block. The bigger the size, the less
> the clutter in /sys/devices/system/memory/*. On the other hand, however,
> there is less flexibility in what granules of memory can be added and
> removed.

Is there any scenario where less than 1GB needs to be added on arm64?

> 
> Recently, Linux gained the ability to hot-add persistent memory, which
> can be either a real NV device, or reserved from regular System RAM
> with the identity of devdax.

devdax (even ZONE_DEVICE) support has not been enabled on arm64 yet.

> 
> The problem is that because ARM64's section size is 1G, and devdax must
> have a 2M label section, the first 1G is always missed when the device is
> attached, because it is not 1G aligned.

devdax has to be 2M aligned? Does Linux enforce that right now?

> 
> Allow better flexibility by making the section size configurable.

Unless 2M alignment is being enforced by Linux, I am not sure why this is
necessary at the moment.

> 
> Signed-off-by: Pavel Tatashin 
> ---
>  arch/arm64/Kconfig | 10 ++
>  arch/arm64/include/asm/sparsemem.h |  2 +-
>  2 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index b5d8cf57e220..a0c5b9d13a7f 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -801,6 +801,16 @@ config ARM64_PA_BITS
>   default 48 if ARM64_PA_BITS_48
>   default 52 if ARM64_PA_BITS_52
>  
> +config ARM64_SECTION_SIZE_BITS
> + int "sparsemem section size shift"
> + range 27 30

27 and 28 do not even compile for ARM64_64K_PAGES because of a MAX_ORDER and
SECTION_SIZE mismatch.