Changes since v1 [1]:
- Move 19 patches that have received a Reviewed-by to the 'pending'
branch in cxl.git (Thanks Alison, Adam, and Jonathan!)
- Improve the changelog and add more Cc's to "cxl/acpi: Track CXL
resources in iomem_resource" and highlight the new export of
insert_resource_expand_to_fit()
- Switch all occurrences of the pattern "rc = -ECODE; if (condition)
goto err;" to "if (condition) { rc = -ECODE; goto err; }" (Jonathan)
- Re-organize all the cxl_{root,switch,endpoint}_decoder() patches to
move the decoder-type-specific setup into the decoder-type-specific
allocation routines (Jonathan)
- Add kdoc to clarify the behavior of add_cxl_resources() (Jonathan)
- Add IORES_DESC_CXL for kernel components like EDAC to determine when
they might be dealing with a CXL address range (Tony)
- Drop usage of dev_set_drvdata() for passing @cxl_res (Jonathan)
- Drop @remove_action argument to __cxl_dpa_release(), make it behave
like any other devm_<free> helper (Jonathan)
- Clarify 'skip' vs 'skipped' in DPA handling helpers (Jonathan)
- Clarify why port teardown no proceeds under the lock with the
conversion from list to xarray (Jonathan)
- Revert rename of cxl_find_dport_by_dev() (Jonathan)
- Fold down_read() / up_write() mismatch fix to the patch that
introduced the problem (Jonathan)
- Fix description of interleave_ways and interleave_granularity in the
sysfs ABI document
- Clarify tangential cleanups in "resource: Introduce
alloc_free_mem_region()" (Jonathan)
- Clarify rationale for the region creation / naming ABI (Jonathan)
- Add SET_CXL_REGION_ATTR() to supplement CXL_REGION_ATTR() the former
is used to optionally added region attributes to an attribute list
(position independent) and the latter is used to retrieve a pointer to
the attribute in code. (Jonathan)
- For writes to region attributes allow the same value to be written
multiple times without error (Jonathan)
- Clarify the actions performed by cxl_port_attach_region() (Jonathan)
- Commit message spelling fixes (Alison and Jonathan)
- Rename cxl_dpa_resource() => cxl_dpa_resource_start() (Jonathan)
- Reword error message in cxl_parse_cfmws() (Adam)
- Keep @expected_len signed in cxl_acpi_cfmws_verify() (Jonathan)
- Miscellaneous formatting and doc fixes (Jonathan)
- Rename port->dpa_end port->hdm_end (Jonathan)
- Rename unregister_region() => unregister_nvdimm_region() (Jonathan)
[1]:
https://lore.kernel.org/linux-cxl/165603869943.551046.3498980330327696732.stgit@dwillia2-xfh
---
Until the CXL 2.0 definition arrived there was little reason for OS
drivers to care about CXL memory expanders. Similar to DDR they just
implemented a physical address range that was described to the OS by
platform firmware (EFI Memory Map + ACPI SRAT/SLIT/HMAT etc). The CXL
2.0 definition adds support for PMEM, hotplug, switch topologies, and
device-interleaving which exceeds the limits of what can be reasonably
abstracted by EFI + ACPI mechanisms. As a result, Linux needs a native
capability to provision new CXL regions.
The term "region" is the same term that originated in the LIBNVDIMM
implementation to describe a host physical / system physical address
range. For PMEM a region is a persistent memory range that can be
further sub-divided into namespaces. For CXL there are three
classifications of regions:
- PMEM: set up by CXL native tooling and persisted in CXL region labels
- RAM: set up dynamically by CXL native tooling after hotplug events, or
leftover capacity not mapped by platform firmware. Any persistent
configuration would come from set up scripts / configuration files in
userspace.
- System RAM: set up by platform firmware and described by EFI + ACPI
metadata, these regions are static.
For now, these patches implement just PMEM regions without region label
support. Note though that the infrastructure routines like
cxl_region_attach() and cxl_region_setup_targets() are building blocks
for region-label support, provisioning RAM regions, and enumerating
System RAM regions.
The general flow for provisioning a CXL region is to:
- Find a device or set of devices with available device-physical-address
(DPA) capacity
- Find a platform CXL window that has free capacity to map a new region
and that is able to target the devices in the previous step.
- Allocate DPA according to the CXL specification rules of sequential
enabling of decoders by id and when a device hosts multiple decoders
make sure that lower-id decoders map lower HPA and higher-id decoders
map higher HPA.
- Assign endpoint decoders to a region and validate that the switching
topology supports the requested configuration. Recall that
interleaving is governed by modulo or xormap math that constrains which
device can support which positions in a given region interleave.
- Program all the decoders an all endpoints and participating switches
to bring the new address range online.
Once the range is online then existing drivers like LIBNVDIMM or
device-dax can manage the memory range as if the ACPI BIOS had conveyed
its parameters at boot.
This patch kit is the result of significant amounts of path finding work
[2] and long discussions with Ben. Thank you Ben for all that work!
Where the patches in this kit go in a different design direction than
the RFC, the authorship is changed and a Co-developed-by is added mainly
so I get blamed for the bad decisions and not Ben. The major updates
from that last posting are:
- all CXL resources are reflected in full in iomem_resource
- host-physical-address (HPA) range allocation moves to a
devm_request_free_mem_region() derivative
- locking moves to two global rwsems, one for DPA / endpoint decoders
and one for HPA / regions.
- the existing port scanning path is augmented to cache more topology
information rather than recreate it at region creation time
[2]: https://lore.kernel.org/r/[email protected]
---
Ben Widawsky (4):
cxl/hdm: Add sysfs attributes for interleave ways + granularity
cxl/region: Add region creation support
cxl/region: Add a 'uuid' attribute
cxl/region: Add interleave geometry attributes
Dan Williams (24):
Documentation/cxl: Use a double line break between entries
cxl/core: Define a 'struct cxl_switch_decoder'
cxl/acpi: Track CXL resources in iomem_resource
cxl/core: Define a 'struct cxl_root_decoder'
cxl/core: Define a 'struct cxl_endpoint_decoder'
cxl/hdm: Enumerate allocated DPA
cxl/hdm: Add 'mode' attribute to decoder objects
cxl/hdm: Track next decoder to allocate
cxl/hdm: Add support for allocating DPA to an endpoint decoder
cxl/port: Record dport in endpoint references
cxl/port: Record parent dport when adding ports
cxl/port: Move 'cxl_ep' references to an xarray per port
cxl/port: Move dport tracking to an xarray
cxl/mem: Enumerate port targets before adding endpoints
resource: Introduce alloc_free_mem_region()
cxl/region: Allocate HPA capacity to regions
cxl/region: Enable the assignment of endpoint decoders to regions
cxl/acpi: Add a host-bridge index lookup mechanism
cxl/region: Attach endpoint decoders
cxl/region: Program target lists
cxl/hdm: Commit decoder state to hardware
cxl/region: Add region driver boiler plate
cxl/pmem: Fix offline_nvdimm_bus() to offline by bridge
cxl/region: Introduce cxl_pmem_region objects
Documentation/ABI/testing/sysfs-bus-cxl | 213 +++
Documentation/driver-api/cxl/memory-devices.rst | 11
drivers/cxl/Kconfig | 8
drivers/cxl/acpi.c | 185 ++
drivers/cxl/core/Makefile | 1
drivers/cxl/core/core.h | 49 +
drivers/cxl/core/hdm.c | 623 +++++++-
drivers/cxl/core/pmem.c | 4
drivers/cxl/core/port.c | 669 ++++++--
drivers/cxl/core/region.c | 1830 +++++++++++++++++++++++
drivers/cxl/cxl.h | 263 +++
drivers/cxl/cxlmem.h | 18
drivers/cxl/mem.c | 32
drivers/cxl/pmem.c | 259 +++
drivers/nvdimm/region_devs.c | 28
include/linux/ioport.h | 3
include/linux/libnvdimm.h | 5
kernel/resource.c | 185 ++
mm/Kconfig | 5
tools/testing/cxl/Kbuild | 1
tools/testing/cxl/test/cxl.c | 75 +
21 files changed, 4156 insertions(+), 311 deletions(-)
create mode 100644 drivers/cxl/core/region.c
base-commit: b060edfd8cdd52bc8648392500bf152a8dd6d4c5