Re: [PATCH v3 13/25] cxl/region: Add sparse DAX region support
On 8/16/2024 10:44 PM, ira.we...@intel.com wrote: > From: Navneet Singh > > Dynamic Capacity CXL regions must allow memory to be added or removed > dynamically. In addition to the quantity of memory available the > location of the memory within a DC partition is dynamic based on the > extents offered by a device. CXL DAX regions must accommodate the > sparseness of this memory in the management of DAX regions and devices. > > Introduce the concept of a sparse DAX region. Add a create_dc_region() > sysfs entry to create such regions. Special case DC capable regions to > create a 0 sized seed DAX device to maintain compatibility which > requires a default DAX device to hold a region reference. > > Indicate 0 byte available capacity until such time that capacity is > added. > > Sparse regions complicate the range mapping of dax devices. There is no > known use case for range mapping on sparse regions. Avoid the > complication by preventing range mapping of dax devices on sparse > regions. > > Interleaving is deferred for now. Add checks. > > Signed-off-by: Navneet Singh > Co-developed-by: Ira Weiny > Signed-off-by: Ira Weiny > > --- > Changes: > [Fan: use single function for dc region store] > [djiang: avoid setting dev_size twice] > [djbw: Check DCD support and interleave restriction on region creation] > [iweiny: squash patch : dax/region: Prevent range mapping allocation on > sparse regions] > [iwieny: remove reviews] > [iweiny: rebase to master] > [iweiny: push sysfs version to 6.12] > [iweiny: make cxled_to_mds inline] > --- > Documentation/ABI/testing/sysfs-bus-cxl | 22 > drivers/cxl/core/core.h | 12 + > drivers/cxl/core/port.c | 1 + > drivers/cxl/core/region.c | 46 > +++-- > drivers/dax/bus.c | 10 +++ > drivers/dax/bus.h | 1 + > drivers/dax/cxl.c | 16 ++-- > 7 files changed, 93 insertions(+), 15 deletions(-) > [...] > @@ -2185,8 +2191,13 @@ static size_t store_targetN(struct cxl_region *cxlr, > const char *buf, int pos, > goto out; > } > > - rc = attach_target(cxlr, to_cxl_endpoint_decoder(dev), pos, > -TASK_INTERRUPTIBLE); > + cxled = to_cxl_endpoint_decoder(dev); > + if (cxlr->mode == CXL_REGION_DC && > + !cxl_dcd_supported(cxled_to_mds(cxled))) { > + dev_dbg(dev, "DCD unsupported\n"); > + return -EINVAL; need a 'goto out' here to dereference the device? > + } > + rc = attach_target(cxlr, cxled, pos, TASK_INTERRUPTIBLE); > out: > put_device(dev); > } > @@ -2534,6 +2545,7 @@ static struct cxl_region *__create_region(struct > cxl_root_decoder *cxlrd, > switch (mode) { > case CXL_REGION_RAM: > case CXL_REGION_PMEM: > + case CXL_REGION_DC: > break; > default: > dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %s\n", > @@ -2587,6 +2599,20 @@ static ssize_t create_ram_region_store(struct device > *dev, > } > DEVICE_ATTR_RW(create_ram_region); > > +static ssize_t create_dc_region_show(struct device *dev, > + struct device_attribute *attr, char *buf) > +{ > + return __create_region_show(to_cxl_root_decoder(dev), buf); > +} > + > +static ssize_t create_dc_region_store(struct device *dev, > + struct device_attribute *attr, > + const char *buf, size_t len) > +{ > + return create_region_store(dev, buf, len, CXL_REGION_DC); > +} > +DEVICE_ATTR_RW(create_dc_region); > + > static ssize_t region_show(struct device *dev, struct device_attribute *attr, > char *buf) > { > @@ -3168,6 +3194,11 @@ static int devm_cxl_add_dax_region(struct cxl_region > *cxlr) > struct device *dev; > int rc; > > + if (cxlr->mode == CXL_REGION_DC && cxlr->params.interleave_ways != 1) { > + dev_err(&cxlr->dev, "Interleaving DC not supported\n"); > + return -EINVAL; > + } > + > cxlr_dax = cxl_dax_region_alloc(cxlr); > if (IS_ERR(cxlr_dax)) > return PTR_ERR(cxlr_dax); > @@ -3260,6 +3291,16 @@ static struct cxl_region *construct_region(struct > cxl_root_decoder *cxlrd, > return ERR_PTR(-EINVAL); > > mode = cxl_decoder_to_region_mode(cxled->mode); > + if (mode == CXL_REGION_DC) { > + if (!cxl_dcd_supported(cxled_to_mds(cxled))) { > + dev_err(&cxled->cxld.dev, "DCD unsupported\n"); > + return ERR_PTR(-EINVAL); > + } > + if (cxled->cxld.interleave_ways != 1) { > + dev_err(&cxled->cxld.dev, "Interleaving and DCD not > supported\n"); > + r
Re: [PATCH v3 18/25] cxl/extent: Process DCD events and realize region extents
On 8/16/2024 10:44 PM, ira.we...@intel.com wrote: > From: Navneet Singh > > A dynamic capacity device (DCD) sends events to signal the host for > changes in the availability of Dynamic Capacity (DC) memory. These > events contain extents describing a DPA range and meta data for memory > to be added or removed. Events may be sent from the device at any time. > > Three types of events can be signaled, Add, Release, and Force Release. > > On add, the host may accept or reject the memory being offered. If no > region exists, or the extent is invalid, the extent should be rejected. > Add extent events may be grouped by a 'more' bit which indicates those > extents should be processed as a group. > > On remove, the host can delay the response until the host is safely not > using the memory. If no region exists the release can be sent > immediately. The host may also release extents (or partial extents) at > any time. Thus the 'more' bit grouping of release events is of less > value and can be ignored in favor of sending multiple release capacity > responses for groups of release events. > > Force removal is intended as a mechanism between the FM and the device > and intended only when the host is unresponsive, out of sync, or > otherwise broken. Purposely ignore force removal events. > > Regions are made up of one or more devices which may be surfacing memory > to the host. Once all devices in a region have surfaced an extent the > region can expose a corresponding extent for the user to consume. > Without interleaving a device extent forms a 1:1 relationship with the > region extent. Immediately surface a region extent upon getting a > device extent. > > Per the specification the device is allowed to offer or remove extents > at any time. However, anticipated use cases can expect extents to be > offered, accepted, and removed in well defined chunks. > > Simplify extent tracking with the following restrictions. > > 1) Flag for removal any extent which overlaps a requested > release range. > 2) Refuse the offer of extents which overlap already accepted > memory ranges. > 3) Accept again a range which has already been accepted by the > host. (It is likely the device has an error because it > should already know that this range was accepted. But from > the host point of view it is safe to acknowledge that > acceptance again.) > > Management of the region extent devices must be synchronized with > potential uses of the memory within the DAX layer. Create region extent > devices as children of the cxl_dax_region device such that the DAX > region driver can co-drive them and synchronize with the DAX layer. > Synchronization and management is handled in a subsequent patch. > > Process DCD events and create region devices. > > Signed-off-by: Navneet Singh > Co-developed-by: Ira Weiny > Signed-off-by: Ira Weiny > > --- > Changes: > [iweiny: combine this with the extent surface patches to better show the > lifetime extent objects in review] > [iweiny: clean up commit message.] > [iweiny: move extent verification of the 'read extents on region > creation' to this patch] > [iweiny: Provide for a common path for extent realization between an add >event and adding existing extents.] > [iweiny: Persist a check that an extent is within an endpoint decoder] > [iweiny: reduce exported and non-static calls] > [iweiny: use %par] > > > > [Jonathan: implement the more bit with a simple algorithm which accepts > all extents it can. > Also include the response more bit to prevent payload > overflow] > [Fan: Do not error if a contained extent is added.] > [Jonathan: allocate ida after kzalloc] > [iweiny: fix ida resource leak] > [fan/djiang: remove unneeded memset] > [djiang: fix indentation] > [Jonathan: Fix indentation] > [Jonathan/djbw: make tag a uuid] > [djbw: create helper calc_hpa_range() straight away] > [djbw: Allow for multiple cxled_extents per region_extent] > [djbw: s/cxl_ed/cxled] > [djbw: s/cxl_release_ed_extent/cxled_release_extent/] > [djbw: s/reg_ext/region_extent/] > [djbw: s/dc_extent/extent/] > [Gregory/djbw: reject shared extents] > [iweiny: predicate extent.c compile on CONFIG_CXL_REGION] > --- > drivers/cxl/core/Makefile | 2 +- > drivers/cxl/core/core.h | 13 ++ > drivers/cxl/core/extent.c | 345 > ++ > drivers/cxl/core/mbox.c | 268 ++- > drivers/cxl/core/region.c | 6 + > drivers/cxl/cxl.h | 52 ++- > drivers/cxl/cxlmem.h | 26 > include/linux/cxl-event.h | 32 + > tools/testing/cxl/Kbuild | 3 +- > 9 files changed, 743 insertions(+), 4 deletions(-) [...] > + > +static bool extents_contain(struct cxl_dax_region *cxlr_dax, > + struct cxl_endpoint_decoder *cxled, > + struct range *new_range) > +{ > +
Re: [PATCH v3 04/25] cxl/pci: Delay event buffer allocation
On 8/16/2024 10:44 PM, Ira Weiny wrote: > The event buffer does not need to be allocated if something has failed in > setting up event irq's. > > In prep for adjusting event configuration for DCD events move the buffer > allocation to the end of the event configuration. > > Reviewed-by: Davidlohr Bueso > Reviewed-by: Dave Jiang > Reviewed-by: Jonathan Cameron > Signed-off-by: Ira Weiny Reviewed-by: Li Ming
Re: [PATCH v3 05/25] cxl/mbox: Flag support for Dynamic Capacity Devices (DCD)
On 8/16/2024 10:44 PM, ira.we...@intel.com wrote: > From: Navneet Singh > > Per the CXL 3.1 specification software must check the Command Effects > Log (CEL) for dynamic capacity command support. > > Detect support for the DCD commands while reading the CEL, including: > > Get DC Config > Get DC Extent List > Add DC Response > Release DC > > Signed-off-by: Navneet Singh > Reviewed-by: Jonathan Cameron > Reviewed-by: Fan Ni > Reviewed-by: Dave Jiang > Reviewed-by: Davidlohr Bueso > Co-developed-by: Ira Weiny > Signed-off-by: Ira Weiny > Reviewed-by: Li Ming
Re: [PATCH v3 07/25] cxl/core: Separate region mode from decoder mode
On 8/16/2024 10:44 PM, ira.we...@intel.com wrote: > From: Navneet Singh > > Until now region modes and decoder modes were equivalent in that both > modes were either PMEM or RAM. The addition of Dynamic > Capacity partitions defines up to 8 DC partitions per device. > > The region mode is thus no longer equivalent to the endpoint decoder > mode. IOW the endpoint decoders may have modes of DC0-DC7 while the > region mode is simply DC. > > Define a new region mode enumeration which applies to regions separate > from the decoder mode. Adjust the code to process these modes > independently. > > There is no equal to decoder mode dead in region modes. Avoid > constructing regions with decoders which have been flagged as dead. > > Suggested-by: Jonathan Cameron > Signed-off-by: Navneet Singh > Co-developed-by: Ira Weiny > Signed-off-by: Ira Weiny Reviewed-by: Li Ming