On 6/17/21 1:16 PM, David Gibson wrote:
On Tue, Jun 15, 2021 at 12:35:17PM +0530, Aneesh Kumar K.V wrote:
David Gibson <da...@gibson.dropbear.id.au> writes:

On Tue, Jun 15, 2021 at 11:27:50AM +0530, Aneesh Kumar K.V wrote:
David Gibson <da...@gibson.dropbear.id.au> writes:

...

It's weird to me that you'd want to consider them in different nodes
for those different purposes.


    ---------------------------------------
   |                            NUMA node0 |
   |    ProcA -----> MEMA                  |
   |    |                                  |
   |    |                                  |
   |    -------------------> PMEMB         |
   |                                       |
    ---------------------------------------

    ---------------------------------------
   |                            NUMA node1 |
   |                                       |
   |    ProcB -------> MEMC                |
   |    |                                  |
   |    -------------------> PMEMD         |
   |                                       |
   |                                       |
    ---------------------------------------
For a topology like the one above, an application running on ProcA wants to
find the persistent memory mount local to its NUMA node. Hence, when using
it as a pmem fsdax mount or devdax device, we want PMEMB to have the
associativity of NUMA node0 and PMEMD to have the associativity of NUMA
node1. But when we want to use it as memory via the dax kmem driver, we
want both PMEMB and PMEMD to appear as memory-only NUMA nodes at a distance
derived from the latency of the media.

I'm still not understanding why the latency we care about is different
in the two cases.  Can you give an example of when this would result
in different actual node assignments for the two different cases?


In the above example, in order to allow PMEMB and PMEMD to be used as memory-only
NUMA nodes, we need the platform to represent each of them in its own domainID.
Let's assume that the platform assigned ids 40 and 41; PMEMB and PMEMD will then
have associativity arrays like the ones below:

{ 4, 6, 0 }  -> PROCA/MEMA
{ 4, 6, 40 } -> PMEMB
{ 4, 6, 41 } -> PMEMD
{ 4, 6, 1 }  -> PROCB/MEMC

When we want to use PMEMB and PMEMD as fsdax/devdax devices, we essentially look for the first of the nearest online nodes. PMEMB and PMEMD share only the first two levels (4, 6) with both node0 and node1, so both nodes are equally near, and both PMEMB and PMEMD end up appearing as devices attached to node0. That is not ideal for many applications.
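To make the node0 outcome concrete, here is a minimal Python sketch of that lookup. It assumes a simplified distance (the number of associativity levels, counted from the widest, that two arrays do not share) rather than the kernel's actual formula, and the names `ASSOCIATIVITY`, `ONLINE_NODES`, `distance` and `nearest_online_node` are hypothetical. Nodes 40 and 41 are treated as not online, since the PMEM is not onlined as memory in the fsdax/devdax case.

```python
# Hypothetical model of the example: each array lists domain IDs from the
# widest level down to the NUMA domain (node) ID.
ASSOCIATIVITY = {
    "PROCA": (4, 6, 0),
    "PMEMB": (4, 6, 40),
    "PMEMD": (4, 6, 41),
    "PROCB": (4, 6, 1),
}

# Only nodes 0 and 1 are online; 40 and 41 stay offline while the PMEM
# is used as fsdax/devdax rather than as memory.
ONLINE_NODES = [0, 1]

# Map each node ID (last element) back to its associativity array.
NODE_ASSOC = {assoc[-1]: assoc for assoc in ASSOCIATIVITY.values()}


def distance(a, b):
    """Simplified distance (assumption): levels, from the widest, not shared by a and b."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return len(a) - shared


def nearest_online_node(assoc):
    """First online node at the minimum distance (min() keeps the first tie)."""
    return min(ONLINE_NODES, key=lambda node: distance(assoc, NODE_ASSOC[node]))


for dev in ("PMEMB", "PMEMD"):
    print(dev, "->", nearest_online_node(ASSOCIATIVITY[dev]))
# PMEMB -> 0
# PMEMD -> 0
```

Because node0 and node1 tie on distance in this model, the order of `ONLINE_NODES` decides the answer, which is why both devices land on node0.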

Using a secondary domainID index, as explained here, helps associate
each PMEM device with the right group. On a non-virtualized or hard-partitioned configuration, such a device tree representation can be viewed as a hint identifying which socket the device is actually connected to.
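A rough sketch of how a secondary domainID could be consumed, assuming a hypothetical per-device record that carries both the device's own domainID (40/41) and a secondary domainID naming the node it is attached to. The `PmemDevice` fields, the `fsdax_node`/`kmem_node` helpers and the `fallback` argument are illustrative assumptions; the actual device tree encoding is not shown in this thread.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical: nodes currently online (CPUs and/or memory).
ONLINE_NODES = {0, 1}


@dataclass
class PmemDevice:
    name: str
    primary_domain: int              # own domainID, used when onlined via dax kmem
    secondary_domain: Optional[int]  # hypothetical: node the device is attached to


def fsdax_node(dev: PmemDevice, fallback: int) -> int:
    """Node to report for fsdax/devdax use: prefer the secondary domainID
    when it names an online node, otherwise use the caller's fallback
    (a stand-in for the nearest-online-node lookup)."""
    if dev.secondary_domain is not None and dev.secondary_domain in ONLINE_NODES:
        return dev.secondary_domain
    return fallback


def kmem_node(dev: PmemDevice) -> int:
    """Node used when the device is onlined as a memory-only NUMA node."""
    return dev.primary_domain


pmemb = PmemDevice("PMEMB", primary_domain=40, secondary_domain=0)
pmemd = PmemDevice("PMEMD", primary_domain=41, secondary_domain=1)

for dev in (pmemb, pmemd):
    print(dev.name, "fsdax:", fsdax_node(dev, fallback=0), "kmem:", kmem_node(dev))
# PMEMB fsdax: 0 kmem: 40
# PMEMD fsdax: 1 kmem: 41
```

The point of keeping the two IDs separate in this sketch is that neither use case has to compromise: fsdax/devdax users see the socket-local node, while kmem still gets a distinct memory-only node.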

-aneesh
