Re: [PATCH] memremap: move from kernel/ to mm/

2019-07-22 Thread Anshuman Khandual



On 07/22/2019 03:11 PM, Christoph Hellwig wrote:
> memremap.c implements MM functionality for ZONE_DEVICE, so it really
> should be in the mm/ directory, not the kernel/ one.
> 
> Signed-off-by: Christoph Hellwig 

This always made sense.

FWIW

Reviewed-by: Anshuman Khandual 


Re: [PATCH] mm, memory-failure: clarify error message

2019-05-16 Thread Anshuman Khandual



On 05/17/2019 09:38 AM, Jane Chu wrote:
> Some user who install SIGBUS handler that does longjmp out

What is the longjmp about? Are you referring to the mechanism of catching the
signal for which the handler was registered?

> therefore keeping the process alive is confused by the error message
>   "[188988.765862] Memory failure: 0x1840200: Killing cellsrv:33395 due to hardware memory corruption"

It's a valid point because those are two distinct actions.

> Slightly modify the error message to improve clarity.
> 
> Signed-off-by: Jane Chu 
> ---
>  mm/memory-failure.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index fc8b517..14de5e2 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -216,10 +216,9 @@ static int kill_proc(struct to_kill *tk, unsigned long pfn, int flags)
>   short addr_lsb = tk->size_shift;
>   int ret;
>  
> - pr_err("Memory failure: %#lx: Killing %s:%d due to hardware memory 
> corruption\n",
> - pfn, t->comm, t->pid);
> -
>   if ((flags & MF_ACTION_REQUIRED) && t->mm == current->mm) {
> + pr_err("Memory failure: %#lx: Killing %s:%d due to hardware 
> memory "
> + "corruption\n", pfn, t->comm, t->pid);
>   ret = force_sig_mceerr(BUS_MCEERR_AR, (void __user *)tk->addr,
>  addr_lsb, current);
>   } else {
> @@ -229,6 +228,8 @@ static int kill_proc(struct to_kill *tk, unsigned long pfn, int flags)
>* This could cause a loop when the user sets SIGBUS
>* to SIG_IGN, but hopefully no one will do that?
>*/
> + pr_err("Memory failure: %#lx: Sending SIGBUS to %s:%d due to 
> hardware "
> + "memory corruption\n", pfn, t->comm, t->pid);
>   ret = send_sig_mceerr(BUS_MCEERR_AO, (void __user *)tk->addr,
> addr_lsb, t);  /* synchronous? */

As both pr_err() messages are very similar, could we not just switch between
"Killing" and "Sending SIGBUS to" based on a variable, e.g. action_[kill|sigbus],
evaluated beforehand with ((flags & MF_ACTION_REQUIRED) && t->mm == current->mm)?
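Something like this untested sketch is what I have in mind (reusing the context
from the quoted kill_proc() above; the tail of the function is elided):

static int kill_proc(struct to_kill *tk, unsigned long pfn, int flags)
{
	struct task_struct *t = tk->tsk;	/* as in the quoted function, I assume */
	short addr_lsb = tk->size_shift;
	bool action_required;
	int ret;

	action_required = (flags & MF_ACTION_REQUIRED) && t->mm == current->mm;

	/* One message, only the action verb switches */
	pr_err("Memory failure: %#lx: %s %s:%d due to hardware memory corruption\n",
	       pfn, action_required ? "Killing" : "Sending SIGBUS to",
	       t->comm, t->pid);

	if (action_required)
		ret = force_sig_mceerr(BUS_MCEERR_AR, (void __user *)tk->addr,
				       addr_lsb, current);
	else
		ret = send_sig_mceerr(BUS_MCEERR_AO, (void __user *)tk->addr,
				      addr_lsb, t);	/* synchronous? */

	/* ... rest of the function unchanged ... */
	return ret;
}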


Re: [RFC PATCH] mm/nvdimm: Fix kernel crash on devm_mremap_pages_release

2019-05-13 Thread Anshuman Khandual
On 05/14/2019 08:23 AM, Aneesh Kumar K.V wrote:
> When we initialize the namespace, if we support altmap, we don't initialize all the
> backing struct pages, whereas while releasing the namespace we look at some of
> these uninitialized struct pages. This results in a kernel crash as below.
Yes, this has been problematic. I have also encountered it previously, but in a
slightly different way (while searching memory resources).

> 
> kernel BUG at include/linux/mm.h:1034!
What would that be? I did not see a corresponding BUG_ON() at that line in the file.

> cpu 0x2: Vector: 700 (Program Check) at [c0024146b870]
> pc: c03788f8: devm_memremap_pages_release+0x258/0x3a0
> lr: c03788f4: devm_memremap_pages_release+0x254/0x3a0
> sp: c0024146bb00
>msr: 8282b033
>   current = 0xc00241382f00
>   paca= 0xc0003fffd680   irqmask: 0x03   irq_happened: 0x01
> pid   = 4114, comm = ndctl
>  c09bf8c0 devm_action_release+0x30/0x50
>  c09c0938 release_nodes+0x268/0x2d0
>  c09b95b4 device_release_driver_internal+0x164/0x230
>  c09b638c unbind_store+0x13c/0x190
>  c09b4f44 drv_attr_store+0x44/0x60
>  c058ccc0 sysfs_kf_write+0x70/0xa0
>  c058b52c kernfs_fop_write+0x1ac/0x290
>  c04a415c __vfs_write+0x3c/0x70
>  c04a85ac vfs_write+0xec/0x200
>  c04a8920 ksys_write+0x80/0x130
>  c000bee4 system_call+0x5c/0x70

I saw this as a memory hotplug problem with respect to ZONE_DEVICE based device
memory, hence a slightly different explanation, which I never posted. I guess parts
of the commit message here can be reused for a more comprehensive explanation of
the problem.

mm/hotplug: Initialize struct pages for vmem_altmap reserved areas

The following ZONE_DEVICE (altmap) areas have their struct pages allocated
from within the device memory range itself.

A. Driver reserved area [BASE -> BASE + RESV)
B. Device memmap area   [BASE + RESV -> BASE + RESV + FREE)
C. Device usable area   [BASE + RESV + FREE -> END]

BASE - pgmap->altmap.base_pfn (pgmap->res.start >> PAGE_SHIFT)
RESV - pgmap->altmap.reserve
FREE - pgmap->altmap.free
END  - pgmap->res.end >> PAGE_SHIFT

Struct page init for all these areas happens in two phases, which detect the
altmap use case and initialize parts of the device range in each phase.

1. memmap_init_zone()        (Device memmap area)
2. memmap_init_zone_device() (Device usable area)

memmap_init_zone() skips the driver reserved area and does not initialize its
struct pages. This is problematic primarily for two reasons.

Though NODE_DATA(device_node(dev))->node_zones[ZONE_DEVICE] contains the
device memory range in its entirety (in zone->spanned_pages), parts of this
range do not have the zone set to ZONE_DEVICE in their struct page.

__remove_pages(), called directly or from within arch_remove_memory() during
the ZONE_DEVICE tear down procedure (devm_memremap_pages_release), hits an
error (like below) if there are reserved pages. This is because the first pfn
of the device range (invariably also the first pfn of the reserved area)
cannot be identified as belonging to ZONE_DEVICE. This erroneously leads to a
range search within the iomem_resource region, which never contained this
device memory region, and eventually ends up printing the following error.

Unable to release resource <0x00068000-0x0006bfff> (-22)

Initialize struct pages for the driver reserved range while still staying
clear of its contents.
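Restating the above layout in code form (untested sketch, assuming the
struct dev_pagemap / vmem_altmap fields named above, i.e. altmap.base_pfn,
altmap.reserve, altmap.free and res):

static void dump_altmap_areas(struct dev_pagemap *pgmap)
{
	unsigned long base = pgmap->altmap.base_pfn;	/* res.start >> PAGE_SHIFT */
	unsigned long resv = pgmap->altmap.reserve;	/* driver reserved pfns */
	unsigned long free = pgmap->altmap.free;	/* pfns backing the memmap */
	unsigned long end  = pgmap->res.end >> PAGE_SHIFT;

	pr_info("A. driver reserved [%#lx, %#lx)\n", base, base + resv);
	pr_info("B. device memmap   [%#lx, %#lx)\n", base + resv, base + resv + free);
	pr_info("C. device usable   [%#lx, %#lx]\n", base + resv + free, end);
}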

> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  mm/page_alloc.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 59661106da16..892eabe1ec13 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5740,8 +5740,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>  
>  #ifdef CONFIG_ZONE_DEVICE
>   /*
> -  * Honor reservation requested by the driver for this ZONE_DEVICE
> -  * memory. We limit the total number of pages to initialize to just
> +  * We limit the total number of pages to initialize to just
The comment needs a small change to reflect the fact that both the driver reserved
area and the mapped area (containing the altmap struct pages) need init here.
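Perhaps something along these lines for the wording (just a suggestion):

	/*
	 * ZONE_DEVICE: initialize struct pages for both the driver reserved
	 * area and the area backing the memmap (altmap) here. The remaining
	 * device usable range is initialized later via
	 * memmap_init_zone_device().
	 */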


Re: [PATCH] arm64: configurable sparsemem section size

2019-04-24 Thread Anshuman Khandual



On 04/25/2019 01:18 AM, Pavel Tatashin wrote:
> On Wed, Apr 24, 2019 at 5:07 AM Anshuman Khandual
>  wrote:
>>
>> On 04/24/2019 02:08 AM, Pavel Tatashin wrote:
>>> sparsemem section size determines the maximum size and alignment that
>>> is allowed to offline/online memory block. The bigger the size the less
>>> the clutter in /sys/devices/system/memory/*. On the other hand, however,
>>> there is less flexibility in what granules of memory can be added and
>>> removed.
>>
>> Is there any scenario where less than a 1GB needs to be added on arm64 ?
> 
> Yes, DAX hotplug loses 1G of memory without allowing smaller sections.
> Machines on which we are going to be using this functionality have 8G
> of System RAM, therefore losing 1G is a big problem.
> 
> For details about using scenario see this cover letter:
> https://lore.kernel.org/lkml/20190421014429.31206-1-pasha.tatas...@soleen.com/

Is it losing 1GB because devdax has 2M alignment? IIRC from Dan's subsection memory
hot-add series, the 2M comes from persistent memory HW controller limitations. Is
that limitation applicable across all platforms, including arm64, for all possible
persistent memory vendors? I mean, is it universal? IIUC the subsection memory
hotplug series is still being reviewed, so should we not wait for it to get merged
before adjusting the applicable platforms to accommodate this 2M limitation?

> 
>>
>>>
>>> Recently, it was enabled in Linux to hotadd persistent memory that
>>> can be either real NV device, or reserved from regular System RAM
>>> and has identity of devdax.
>>
>> devdax (even ZONE_DEVICE) support has not been enabled on arm64 yet.
> 
> Correct, I use your patches to enable ZONE_DEVICE, and  thus devdax on ARM64:
> https://lore.kernel.org/lkml/1554265806-11501-1-git-send-email-anshuman.khand...@arm.com/
> 
>>
>>>
>>> The problem is that because ARM64's section size is 1G, and devdax must
>>> have 2M label section, the first 1G is always missed when device is
>>> attached, because it is not 1G aligned.
>>
>> devdax has to be 2M aligned ? Does Linux enforce that right now ?
> 
> Unfortunately, there is no way around this. Part of the memory can be
> reserved as persistent memory via device tree.
> memory@4000 {
> device_type = "memory";
> reg = < 0x 0x4000
> 0x0002 0x >;
> };
> 
> pmem@1c000 {
> compatible = "pmem-region";
> reg = <0x0001 0xc000
>0x 0x8000>;
> volatile;
> numa-node-id = <0>;
> };
> 
> So, while pmem is section aligned, as it should be, the dax device is
> going to be pmem start address + label size, which is 2M. The actual

Forgive my ignorance here, but why is the dax device label size 2M aligned? Again,
is that because of some persistent memory HW controller limitation?

> DAX device starts at:
> 0x1c000 + 2M.
> 
> Because section size is 1G, the hotplug will able to add only memory
> starting from
> 0x1c000 + 1G

Got it, but as mentioned before, we will have to make sure that the 2M alignment
requirement is universal, else we will be adjusting this multiple times.

> 
>> 27 and 28 do not even compile for ARM64_64K_PAGES because of the MAX_ORDER and
>> SECTION_SIZE mismatch.

Even with 27 bits it is a 128MB section size. How does that solve the problem with
2M? Was the patch just intended to reduce the memory wastage?

> 
> Can you please elaborate what configs are you using? I have no
> problems compiling with 27 and 28 bit.

After applying your patch [1] on current mainline kernel [2].

$make defconfig

CONFIG_ARM64_64K_PAGES=y
CONFIG_ARM64_VA_BITS_48=y
CONFIG_ARM64_VA_BITS=48
CONFIG_ARM64_PA_BITS_48=y
CONFIG_ARM64_PA_BITS=48
CONFIG_ARM64_SECTION_SIZE_BITS=27

[1] https://patchwork.kernel.org/patch/10913737/
[2] git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

It fails with

  CC  arch/arm64/kernel/asm-offsets.s
In file included from ./include/linux/gfp.h:6,
 from ./include/linux/slab.h:15,
 from ./include/linux/resource_ext.h:19,
 from ./include/linux/acpi.h:26,
 from ./include/acpi/apei.h:9,
 from ./include/acpi/ghes.h:5,
 from ./include/linux/arm_sdei.h:14,
 from arch/arm64/kernel/asm-offsets.c:21:
./include/linux/mmzone.h:1095:2: error: #error Allocator MAX_ORDER exceeds SECTION_SIZE
 #error Allocator MAX_ORDER exceeds SECTION_SIZE
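FWIW, the check that fires is this one in include/linux/mmzone.h; with 64K pages
PAGE_SHIFT is 16 and (IIRC) defconfig ends up with MAX_ORDER of 14 via
FORCE_MAX_ZONEORDER, so MAX_ORDER - 1 + PAGE_SHIFT = 29, which exceeds both 27
and 28:

#if (MAX_ORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS
#error Allocator MAX_ORDER exceeds SECTION_SIZE
#endif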


Re: [PATCH] arm64: configurable sparsemem section size

2019-04-24 Thread Anshuman Khandual



On 04/24/2019 02:08 AM, Pavel Tatashin wrote:
> sparsemem section size determines the maximum size and alignment that
> is allowed to offline/online memory block. The bigger the size the less
> the clutter in /sys/devices/system/memory/*. On the other hand, however,
> there is less flexibility in what granules of memory can be added and
> removed.

Is there any scenario where less than 1GB needs to be added on arm64?

> 
> Recently, it was enabled in Linux to hotadd persistent memory that
> can be either real NV device, or reserved from regular System RAM
> and has identity of devdax.

devdax (even ZONE_DEVICE) support has not been enabled on arm64 yet.

> 
> The problem is that because ARM64's section size is 1G, and devdax must
> have 2M label section, the first 1G is always missed when device is
> attached, because it is not 1G aligned.

Does devdax have to be 2M aligned? Does Linux enforce that right now?

> 
> Allow, better flexibility by making section size configurable.

Unless 2M alignment is being enforced by Linux, I am not sure why this is necessary
at the moment.

> 
> Signed-off-by: Pavel Tatashin 
> ---
>  arch/arm64/Kconfig | 10 ++
>  arch/arm64/include/asm/sparsemem.h |  2 +-
>  2 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index b5d8cf57e220..a0c5b9d13a7f 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -801,6 +801,16 @@ config ARM64_PA_BITS
>   default 48 if ARM64_PA_BITS_48
>   default 52 if ARM64_PA_BITS_52
>  
> +config ARM64_SECTION_SIZE_BITS
> + int "sparsemem section size shift"
> + range 27 30

27 and 28 do not even compile for ARM64_64K_PAGES because of the MAX_ORDER and
SECTION_SIZE mismatch.


Re: [PATCH v3 0/3] create sysfs representation of ACPI HMAT

2017-12-22 Thread Anshuman Khandual
On 12/23/2017 03:43 AM, Ross Zwisler wrote:
> On Fri, Dec 22, 2017 at 08:39:41AM +0530, Anshuman Khandual wrote:
>> On 12/14/2017 07:40 AM, Ross Zwisler wrote:
>>>  Quick Summary 
>>>
>>> Platforms exist today which have multiple types of memory attached to a
>>> single CPU.  These disparate memory ranges have some characteristics in
>>> common, such as CPU cache coherence, but they can have wide ranges of
>>> performance both in terms of latency and bandwidth.
>>
>> Right.
>>
>>>
>>> For example, consider a system that contains persistent memory, standard
>>> DDR memory and High Bandwidth Memory (HBM), all attached to the same CPU.
>>> There could potentially be an order of magnitude or more difference in
>>> performance between the slowest and fastest memory attached to that CPU.
>>
>> Right.
>>
>>>
>>> With the current Linux code NUMA nodes are CPU-centric, so all the memory
>>> attached to a given CPU will be lumped into the same NUMA node.  This makes
>>> it very difficult for userspace applications to understand the performance
>>> of different memory ranges on a given CPU.
>>
>> Right but that might require fundamental changes to the NUMA representation.
>> Plugging those memory as separate NUMA nodes, identify them through sysfs
>> and try allocating from it through mbind() seems like a short term solution.
>>
>> Though if we decide to go in this direction, sysfs interface or something
>> similar is required to enumerate memory properties.
> 
> Yep, and this patch series is trying to be the sysfs interface that is
> required to the memory properties.  :)  It's a certainty that we will have
> memory-only NUMA nodes, at least on platforms that support ACPI.  Supporting
> memory-only proximity domains (which Linux turns in to memory-only NUMA nodes)
> is explicitly supported with the introduction of the HMAT in ACPI 6.2.

Yeah, even POWER platforms can have memory-only NUMA nodes.

> 
> It also turns out that the existing memory management code already deals with
> them just fine - you see this with my hmat_examples setup:
> 
> https://github.com/rzwisler/hmat_examples
> 
> Both configurations created by this repo create memory-only NUMA nodes, even
> with upstream kernels.  My patches don't change that, they just provide a
> sysfs representation of the HMAT so users can discover the memory that exists
> in the system.

Once it is a NUMA node, everything will work as is from the MM interface
point of view. But the point is how we export these properties to
user space. My only concern is that we should not do it in a way which will
be locked in without first going through a NUMA redesign for this new
attribute based memory, that is all.

> 
>>> We solve this issue by providing userspace with performance information on
>>> individual memory ranges.  This performance information is exposed via
>>> sysfs:
>>>
>>>   # grep . mem_tgt2/* mem_tgt2/local_init/* 2>/dev/null
>>>   mem_tgt2/firmware_id:1
>>>   mem_tgt2/is_cached:0
>>>   mem_tgt2/local_init/read_bw_MBps:40960
>>>   mem_tgt2/local_init/read_lat_nsec:50
>>>   mem_tgt2/local_init/write_bw_MBps:40960
>>>   mem_tgt2/local_init/write_lat_nsec:50
>>
>> I might have missed discussions from earlier versions, why we have this
>> kind of a "source --> target" model ? We will enlist properties for all
>> possible "source --> target" on the system ? Right now it shows only
>> bandwidth and latency properties, can it accommodate other properties
>> as well in future ?
> 
> The initiator/target model is useful in preventing us from needing a
> MAX_NUMA_NODES x MAX_NUMA_NODES sized table for each performance attribute.  I
> talked about it a little more here:

That makes it even more complex. Not only do we have a memory attribute
like bandwidth specific to the range, we are also exporting its
relative values as seen from different CPU nodes. It is again a kind of
NUMA distance table being exported in the generic sysfs path like
/sys/devices/. The problem is that possible future memory attributes like
'reliability', 'density' and 'power consumption' might not need
a "source --> destination" kind of model, as they do not change
based on which CPU node is accessing them.

> 
> https://lists.01.org/pipermail/linux-nvdimm/2017-December/013654.html
> 
>>> This allows applications to easily find the memory that they want to use.
>>> We expect that the existing NUMA APIs will be enhanced to use this new
>>> information so that applications can continue to use them to 

Re: [PATCH v3 0/3] create sysfs representation of ACPI HMAT

2017-12-22 Thread Anshuman Khandual
On 12/22/2017 10:43 PM, Dave Hansen wrote:
> On 12/21/2017 07:09 PM, Anshuman Khandual wrote:
>> I had presented a proposal for NUMA redesign in the Plumbers Conference this
>> year where various memory devices with different kind of memory attributes
>> can be represented in the kernel and be used explicitly from the user space.
>> Here is the link to the proposal if you feel interested. The proposal is
>> very intrusive and also I dont have a RFC for it yet for discussion here.
> I think that's the best reason to "re-use NUMA" for this: it's _not_
> intrusive.
> 
> Also, from an x86 perspective, these HMAT systems *will* be out there.
> Old versions of Linux *will* see different types of memory as separate
> NUMA nodes.  So, if we are going to do something different, it's going
> to be interesting to un-teach those systems about using the NUMA APIs
> for this.  That ship has sailed.

I understand the need to fetch these details from ACPI/DT for
applications to target these distinct memory-only NUMA nodes.
This can be done by parsing platform specific values from the
/proc/acpi/ or /proc/device-tree/ interfaces, which can be a
short term solution before the NUMA redesign is figured out.
But adding generic devices like "hmat" in the /sys/devices/
path, which will then be locked in for good, seems problematic.
   



Re: [PATCH v3 0/3] create sysfs representation of ACPI HMAT

2017-12-22 Thread Anshuman Khandual
On 12/22/2017 04:01 PM, Kogut, Jaroslaw wrote:
>> ... first thinking about redesigning the NUMA for
>> heterogeneous memory may not be a good idea. Will look into this further.
> I agree with comment that first a direction should be defined how to handle 
> heterogeneous memory system.
> 
>> https://linuxplumbersconf.org/2017/ocw//system/presentations/4656/original/
>> Hierarchical_NUMA_Design_Plumbers_2017.pdf
> I miss in the presentation a user perspective of the new approach, e.g.
> - How does application developer see/understand the heterogeneous memory 
> system?

From a user perspective:

- Each memory node (with or without CPU) is a NUMA node with attributes
- The user should detect these NUMA nodes from sysfs (not part of the proposal)
- The user allocates/operates on/destroys VMAs with new syscalls (_mattr based)

> - How does app developer use the heterogeneous memory system?

- Through existing and new system calls

> - What are modification in API/sys interfaces?

- The presentation proposes the possible addition of new system calls with a
  'u64 _mattr' representation for memory attributes, which can be used while
  requesting different kinds of memory from the kernel
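Purely as an illustration of what such an interface could look like
(hypothetical names only, not from the presentation or any kernel tree):

#include <linux/types.h>

/* Hypothetical 'u64 _mattr' attribute bits -- illustrative only */
#define MATTR_CACHE_COHERENT	(1ULL << 0)
#define MATTR_HIGH_BANDWIDTH	(1ULL << 1)	/* e.g. HBM */
#define MATTR_LOW_LATENCY	(1ULL << 2)
#define MATTR_PERSISTENT	(1ULL << 3)	/* e.g. pmem */

/* Hypothetical mmap-like call carrying the attribute mask */
void *mmap_mattr(void *addr, size_t length, int prot, int flags, __u64 mattr);

/*
 * e.g. request high bandwidth, cache coherent anonymous memory:
 *
 *	buf = mmap_mattr(NULL, len, PROT_READ | PROT_WRITE,
 *			 MAP_PRIVATE | MAP_ANONYMOUS,
 *			 MATTR_HIGH_BANDWIDTH | MATTR_CACHE_COHERENT);
 */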

> 
> In other hand, if we assume that separate memory NUMA node has different 
> memory capabilities/attributes from stand point of particular CPU, it is easy 
> to explain for user how to describe/handle heterogeneous memory. 
> 
> Of course, current numa design is not sufficient in kernel in following areas 
> today:
> - Exposing memory attributes that describe heterogeneous memory system
> - Interfaces to use the heterogeneous memory system, e.g. more sophisticated 
> policies
> - Internal mechanism in memory management, e.g. automigration, maybe 
> something else.

Right, we would need

- Representation of NUMA with attributes
- APIs/syscalls for accessing the intended memory from user space
- Memory management policies and algorithms navigating through all these
  new attributes in various situations

IMHO, we should not consider sysfs interfaces for heterogeneous memory
(which will be an ABI going forward and hence cannot be changed easily)
before we get the NUMA redesign right.



Re: [PATCH v3 0/3] create sysfs representation of ACPI HMAT

2017-12-21 Thread Anshuman Khandual
On 12/14/2017 07:40 AM, Ross Zwisler wrote:
>  Quick Summary 
> 
> Platforms exist today which have multiple types of memory attached to a
> single CPU.  These disparate memory ranges have some characteristics in
> common, such as CPU cache coherence, but they can have wide ranges of
> performance both in terms of latency and bandwidth.

Right.

> 
> For example, consider a system that contains persistent memory, standard
> DDR memory and High Bandwidth Memory (HBM), all attached to the same CPU.
> There could potentially be an order of magnitude or more difference in
> performance between the slowest and fastest memory attached to that CPU.

Right.

> 
> With the current Linux code NUMA nodes are CPU-centric, so all the memory
> attached to a given CPU will be lumped into the same NUMA node.  This makes
> it very difficult for userspace applications to understand the performance
> of different memory ranges on a given CPU.

Right, but that might require fundamental changes to the NUMA representation.
Plugging in these memory ranges as separate NUMA nodes, identifying them through
sysfs and trying to allocate from them through mbind() seems like a short term
solution.

Though if we decide to go in this direction, a sysfs interface or something
similar is required to enumerate memory properties.

> 
> We solve this issue by providing userspace with performance information on
> individual memory ranges.  This performance information is exposed via
> sysfs:
> 
>   # grep . mem_tgt2/* mem_tgt2/local_init/* 2>/dev/null
>   mem_tgt2/firmware_id:1
>   mem_tgt2/is_cached:0
>   mem_tgt2/local_init/read_bw_MBps:40960
>   mem_tgt2/local_init/read_lat_nsec:50
>   mem_tgt2/local_init/write_bw_MBps:40960
>   mem_tgt2/local_init/write_lat_nsec:50

I might have missed discussions from earlier versions; why do we have this
kind of a "source --> target" model? Will we enlist properties for all
possible "source --> target" pairs on the system? Right now it shows only
bandwidth and latency properties; can it accommodate other properties
as well in the future?
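Just to illustrate the user space side being discussed here, a rough sketch of
how an application might read these attributes (assuming the mem_tgtN/local_init/
layout from the quoted output; the parent directory under /sys/devices/ is a
placeholder, since it is not shown above):

#include <stdio.h>

/* Placeholder parent path -- adjust to wherever mem_tgtN gets registered. */
#define MEM_TGT2_DIR	"/sys/devices/mem_tgt2/local_init/"

/* Read one numeric sysfs attribute, return -1 on any failure. */
static long read_sysfs_attr(const char *path)
{
	FILE *f = fopen(path, "r");
	long val = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

int main(void)
{
	long bw  = read_sysfs_attr(MEM_TGT2_DIR "read_bw_MBps");
	long lat = read_sysfs_attr(MEM_TGT2_DIR "read_lat_nsec");

	printf("mem_tgt2: read_bw = %ld MB/s, read_lat = %ld ns\n", bw, lat);
	return 0;
}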

> 
> This allows applications to easily find the memory that they want to use.
> We expect that the existing NUMA APIs will be enhanced to use this new
> information so that applications can continue to use them to select their
> desired memory.

I had presented a proposal for a NUMA redesign at the Plumbers Conference this
year, where various memory devices with different kinds of memory attributes
can be represented in the kernel and be used explicitly from user space.
Here is the link to the proposal if you are interested. The proposal is
very intrusive and I do not have an RFC for it yet for discussion here.

https://linuxplumbersconf.org/2017/ocw//system/presentations/4656/original/Hierarchical_NUMA_Design_Plumbers_2017.pdf

The problem is, designing the sysfs interface for memory attribute detection
from user space without first thinking about redesigning NUMA for
heterogeneous memory may not be a good idea. I will look into this further.



Re: [PATCH] mm: add ZONE_DEVICE statistics to smaps

2016-11-10 Thread Anshuman Khandual
On 11/11/2016 03:41 AM, Dan Williams wrote:
> ZONE_DEVICE pages are mapped into a process via the filesystem-dax and
> device-dax mechanisms.  There are also proposals to use ZONE_DEVICE
> pages for other usages outside of dax.  Add statistics to smaps so
> applications can debug that they are obtaining the mappings they expect,
> or otherwise accounting them.

This might also help when we have a ZONE_DEVICE based solution for
HMM based device memory.
