Brice Goglin wrote:
> On 11/02/2023 at 02:53, Dan Williams wrote:
> 
> > Brice Goglin wrote:
> > [..]
> >>>> By the way, once configured in system ram, my CXL ram is merged into an
> >>>> existing "normal" NUMA node. How do I tell Qemu that a CXL region should
> >>>> be part of a new NUMA node? I assume that's what's going to happen on
> >>>> real hardware?
> >>> We don't yet have kernel code to deal with assigning a new NUMA node.
> >>> Was on the todo list in last sync call I think.
> >>
> > In fact, there is no plan to support "new" NUMA node creation. A node
> > can only be onlined / populated from the set of static nodes defined by
> > platform-firmware. The set of static nodes is defined by the union of
> > all the proximity domain numbers in the SRAT as well as a node per
> > CFMWS / QTG id. See:
> >
> >      fd49f99c1809 ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT
> >
> > ...for the CXL node enumeration scheme.
> >
> > Once you have a node per CFMWS then it is up to CDAT and the QTG DSM to
> > group devices by window. This scheme attempts to be as simple as
> > possible, but no simpler. If more granularity is necessary in practice,
> > that would be a good discussion to have soonish.. LSF/MM comes to mind.
> 
> Actually I was mistaken, there's already a new NUMA node when creating
> a region under Qemu, but my tools ignored it because it's empty.
> After daxctl online-memory, things look good.
> 
> Can you clarify your above sentences on a real node? If I connect two
> memory expanders on two slots of the same CPU, do I get a single CFMWS or two?
> What if I connect two devices to a single slot across a CXL switch?

Ultimately the answer is "ask your platform vendor", because this is a
firmware decision. However, my expectation is that, since the ACPI HMAT
requires a proximity domain per distinct performance class and needs to
distinguish the memory that is "attached" to a CPU initiator domain,
CXL memory will at a minimum be described in a proximity domain distinct
from "local DRAM".
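
One way to see how that distinction surfaces to the OS is to walk the
node sysfs hierarchy and dump the HMAT-derived access attributes. A
rough sketch (node/ and access0/ are the standard sysfs ABI; the
access0 files only appear when firmware publishes an HMAT, and
"memory-only node == CXL" is just a heuristic):

#!/usr/bin/env python3
# List NUMA nodes, whether they have CPUs, and their HMAT-derived
# access0 performance attributes. A CXL-backed region is expected to
# show up as a memory-only node separate from CPU-attached DRAM nodes.
import glob
import os

def read(path, default="n/a"):
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return default

for node in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    name = os.path.basename(node)
    cpus = read(os.path.join(node, "cpulist"), default="")
    perf = {attr: read(os.path.join(node, "access0", "initiators", attr))
            for attr in ("read_latency", "write_latency",
                         "read_bandwidth", "write_bandwidth")}
    kind = "has CPUs" if cpus else "memory-only (possibly CXL)"
    print(f"{name}: {kind}, cpus=[{cpus}]")
    print(f"  access0: {perf}")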

The number of CFMWS windows published is gated by how many degrees of
freedom platform-firmware wants to give the OS relative to the number of
CXL host-bridges in the system. One scheme that seems plausible is one
x1-interleave CFMWS window per host-bridge (to maximize RAS) plus one
CFMWS window with all host-bridges interleaved together (to maximize
performance).
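
To see what a given platform (or QEMU configuration) actually decided,
the CEDT can be parsed directly. A rough sketch, with field offsets
following struct acpi_cedt_cfmws in the kernel's include/acpi/actbl1.h;
it only decodes the power-of-two interleave-ways encodings and needs
root to read the table:

#!/usr/bin/env python3
# Dump each CFMWS window from the ACPI CEDT: base, size, interleave
# ways/granularity, QTG id, and the target host-bridge UIDs.
import struct

ACPI_CEDT_TYPE_CFMWS = 1

with open("/sys/firmware/acpi/tables/CEDT", "rb") as f:
    data = f.read()

offset = 36  # skip the standard 36-byte ACPI table header
while offset + 4 <= len(data):
    subtype, _, length = struct.unpack_from("<BBH", data, offset)
    if subtype == ACPI_CEDT_TYPE_CFMWS:
        base_hpa, window_size = struct.unpack_from("<QQ", data, offset + 8)
        eniw, = struct.unpack_from("<B", data, offset + 24)
        hbig, = struct.unpack_from("<I", data, offset + 28)
        qtg_id, = struct.unpack_from("<H", data, offset + 34)
        ways = 1 << eniw               # power-of-two encodings only
        granularity = 256 << hbig      # bytes
        targets = struct.unpack_from(f"<{ways}I", data, offset + 36)
        print(f"CFMWS: base=0x{base_hpa:x} size=0x{window_size:x} "
              f"ways={ways} granularity={granularity} qtg_id={qtg_id} "
              f"targets={list(targets)}")
    offset += length

With the "one window per host-bridge plus one big interleave" scheme
above, this would show N single-target windows and one N-target window.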

The above is just my personal opinion as a Linux kernel developer; a
platform implementation is free to be as restrictive or generous as it
wants with CFMWS resources.
