Re: [PATCH v3 13/25] cxl/region: Add sparse DAX region support

2024-09-02 Thread Li, Ming4
On 8/16/2024 10:44 PM, ira.we...@intel.com wrote:
> From: Navneet Singh 
>
> Dynamic Capacity CXL regions must allow memory to be added or removed
> dynamically.  In addition to the quantity of memory available the
> location of the memory within a DC partition is dynamic based on the
> extents offered by a device.  CXL DAX regions must accommodate the
> sparseness of this memory in the management of DAX regions and devices.
>
> Introduce the concept of a sparse DAX region.  Add a create_dc_region()
> sysfs entry to create such regions.  Special case DC capable regions to
> create a 0 sized seed DAX device to maintain compatibility which
> requires a default DAX device to hold a region reference.
>
> Indicate 0 byte available capacity until such time that capacity is
> added.
>
> Sparse regions complicate the range mapping of dax devices.  There is no
> known use case for range mapping on sparse regions.  Avoid the
> complication by preventing range mapping of dax devices on sparse
> regions.
>
> Interleaving is deferred for now.  Add checks.
>
> Signed-off-by: Navneet Singh 
> Co-developed-by: Ira Weiny 
> Signed-off-by: Ira Weiny 
>
> ---
> Changes:
> [Fan: use single function for dc region store]
> [djiang: avoid setting dev_size twice]
> [djbw: Check DCD support and interleave restriction on region creation]
> [iweiny: squash patch : dax/region: Prevent range mapping allocation on 
> sparse regions]
> [iwieny: remove reviews]
> [iweiny: rebase to master]
> [iweiny: push sysfs version to 6.12]
> [iweiny: make cxled_to_mds inline]
> ---
>  Documentation/ABI/testing/sysfs-bus-cxl | 22 
>  drivers/cxl/core/core.h | 12 +
>  drivers/cxl/core/port.c |  1 +
>  drivers/cxl/core/region.c   | 46 
> +++--
>  drivers/dax/bus.c   | 10 +++
>  drivers/dax/bus.h   |  1 +
>  drivers/dax/cxl.c   | 16 ++--
>  7 files changed, 93 insertions(+), 15 deletions(-)
>
[...]
> @@ -2185,8 +2191,13 @@ static size_t store_targetN(struct cxl_region *cxlr, 
> const char *buf, int pos,
>   goto out;
>   }
>  
> - rc = attach_target(cxlr, to_cxl_endpoint_decoder(dev), pos,
> -TASK_INTERRUPTIBLE);
> + cxled = to_cxl_endpoint_decoder(dev);
> + if (cxlr->mode == CXL_REGION_DC &&
> + !cxl_dcd_supported(cxled_to_mds(cxled))) {
> + dev_dbg(dev, "DCD unsupported\n");
> + return -EINVAL;

need a 'goto out' here to dereference the device?


> + }
> + rc = attach_target(cxlr, cxled, pos, TASK_INTERRUPTIBLE);
>  out:
>   put_device(dev);
>   }
> @@ -2534,6 +2545,7 @@ static struct cxl_region *__create_region(struct 
> cxl_root_decoder *cxlrd,
>   switch (mode) {
>   case CXL_REGION_RAM:
>   case CXL_REGION_PMEM:
> + case CXL_REGION_DC:
>   break;
>   default:
>   dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %s\n",
> @@ -2587,6 +2599,20 @@ static ssize_t create_ram_region_store(struct device 
> *dev,
>  }
>  DEVICE_ATTR_RW(create_ram_region);
>  
> +static ssize_t create_dc_region_show(struct device *dev,
> +  struct device_attribute *attr, char *buf)
> +{
> + return __create_region_show(to_cxl_root_decoder(dev), buf);
> +}
> +
> +static ssize_t create_dc_region_store(struct device *dev,
> +   struct device_attribute *attr,
> +   const char *buf, size_t len)
> +{
> + return create_region_store(dev, buf, len, CXL_REGION_DC);
> +}
> +DEVICE_ATTR_RW(create_dc_region);
> +
>  static ssize_t region_show(struct device *dev, struct device_attribute *attr,
>  char *buf)
>  {
> @@ -3168,6 +3194,11 @@ static int devm_cxl_add_dax_region(struct cxl_region 
> *cxlr)
>   struct device *dev;
>   int rc;
>  
> + if (cxlr->mode == CXL_REGION_DC && cxlr->params.interleave_ways != 1) {
> + dev_err(&cxlr->dev, "Interleaving DC not supported\n");
> + return -EINVAL;
> + }
> +
>   cxlr_dax = cxl_dax_region_alloc(cxlr);
>   if (IS_ERR(cxlr_dax))
>   return PTR_ERR(cxlr_dax);
> @@ -3260,6 +3291,16 @@ static struct cxl_region *construct_region(struct 
> cxl_root_decoder *cxlrd,
>   return ERR_PTR(-EINVAL);
>  
>   mode = cxl_decoder_to_region_mode(cxled->mode);
> + if (mode == CXL_REGION_DC) {
> + if (!cxl_dcd_supported(cxled_to_mds(cxled))) {
> + dev_err(&cxled->cxld.dev, "DCD unsupported\n");
> + return ERR_PTR(-EINVAL);
> + }
> + if (cxled->cxld.interleave_ways != 1) {
> + dev_err(&cxled->cxld.dev, "Interleaving and DCD not 
> supported\n");
> + r

Re: [PATCH v3 18/25] cxl/extent: Process DCD events and realize region extents

2024-09-02 Thread Li, Ming4
On 8/16/2024 10:44 PM, ira.we...@intel.com wrote:
> From: Navneet Singh 
>
> A dynamic capacity device (DCD) sends events to signal the host for
> changes in the availability of Dynamic Capacity (DC) memory.  These
> events contain extents describing a DPA range and meta data for memory
> to be added or removed.  Events may be sent from the device at any time.
>
> Three types of events can be signaled, Add, Release, and Force Release.
>
> On add, the host may accept or reject the memory being offered.  If no
> region exists, or the extent is invalid, the extent should be rejected.
> Add extent events may be grouped by a 'more' bit which indicates those
> extents should be processed as a group.
>
> On remove, the host can delay the response until the host is safely not
> using the memory.  If no region exists the release can be sent
> immediately.  The host may also release extents (or partial extents) at
> any time.  Thus the 'more' bit grouping of release events is of less
> value and can be ignored in favor of sending multiple release capacity
> responses for groups of release events.
>
> Force removal is intended as a mechanism between the FM and the device
> and intended only when the host is unresponsive, out of sync, or
> otherwise broken.  Purposely ignore force removal events.
>
> Regions are made up of one or more devices which may be surfacing memory
> to the host.  Once all devices in a region have surfaced an extent the
> region can expose a corresponding extent for the user to consume.
> Without interleaving a device extent forms a 1:1 relationship with the
> region extent.  Immediately surface a region extent upon getting a
> device extent.
>
> Per the specification the device is allowed to offer or remove extents
> at any time.  However, anticipated use cases can expect extents to be
> offered, accepted, and removed in well defined chunks.
>
> Simplify extent tracking with the following restrictions.
>
>   1) Flag for removal any extent which overlaps a requested
>  release range.
>   2) Refuse the offer of extents which overlap already accepted
>  memory ranges.
>   3) Accept again a range which has already been accepted by the
>  host.  (It is likely the device has an error because it
>  should already know that this range was accepted.  But from
>  the host point of view it is safe to acknowledge that
>  acceptance again.)
>
> Management of the region extent devices must be synchronized with
> potential uses of the memory within the DAX layer.  Create region extent
> devices as children of the cxl_dax_region device such that the DAX
> region driver can co-drive them and synchronize with the DAX layer.
> Synchronization and management is handled in a subsequent patch.
>
> Process DCD events and create region devices.
>
> Signed-off-by: Navneet Singh 
> Co-developed-by: Ira Weiny 
> Signed-off-by: Ira Weiny 
>
> ---
> Changes:
> [iweiny: combine this with the extent surface patches to better show the
>  lifetime extent objects in review]
> [iweiny: clean up commit message.]
> [iweiny: move extent verification of the 'read extents on region
>  creation' to this patch]
> [iweiny: Provide for a common path for extent realization between an add
>event and adding existing extents.]
> [iweiny: Persist a check that an extent is within an endpoint decoder]
> [iweiny: reduce exported and non-static calls]
> [iweiny: use %par]
>
>   
>
> [Jonathan: implement the more bit with a simple algorithm which accepts
>  all extents it can.
>  Also include the response more bit to prevent payload
>  overflow]
> [Fan: Do not error if a contained extent is added.]
> [Jonathan: allocate ida after kzalloc]
> [iweiny: fix ida resource leak]
> [fan/djiang: remove unneeded memset]
> [djiang: fix indentation]
> [Jonathan: Fix indentation]
> [Jonathan/djbw: make tag a uuid]
> [djbw: create helper calc_hpa_range() straight away]
> [djbw: Allow for multiple cxled_extents per region_extent]
> [djbw: s/cxl_ed/cxled]
> [djbw: s/cxl_release_ed_extent/cxled_release_extent/]
> [djbw: s/reg_ext/region_extent/]
> [djbw: s/dc_extent/extent/]
> [Gregory/djbw: reject shared extents]
> [iweiny: predicate extent.c compile on CONFIG_CXL_REGION]
> ---
>  drivers/cxl/core/Makefile |   2 +-
>  drivers/cxl/core/core.h   |  13 ++
>  drivers/cxl/core/extent.c | 345 
> ++
>  drivers/cxl/core/mbox.c   | 268 ++-
>  drivers/cxl/core/region.c |   6 +
>  drivers/cxl/cxl.h |  52 ++-
>  drivers/cxl/cxlmem.h  |  26 
>  include/linux/cxl-event.h |  32 +
>  tools/testing/cxl/Kbuild  |   3 +-
>  9 files changed, 743 insertions(+), 4 deletions(-)
[...]
> +
> +static bool extents_contain(struct cxl_dax_region *cxlr_dax,
> + struct cxl_endpoint_decoder *cxled,
> + struct range *new_range)
> +{
> +   

Re: [PATCH v3 04/25] cxl/pci: Delay event buffer allocation

2024-09-02 Thread Li, Ming4
On 8/16/2024 10:44 PM, Ira Weiny wrote:
> The event buffer does not need to be allocated if something has failed in
> setting up event irq's.
>
> In prep for adjusting event configuration for DCD events move the buffer
> allocation to the end of the event configuration.
>
> Reviewed-by: Davidlohr Bueso 
> Reviewed-by: Dave Jiang 
> Reviewed-by: Jonathan Cameron 
> Signed-off-by: Ira Weiny 
Reviewed-by: Li Ming 





Re: [PATCH v3 05/25] cxl/mbox: Flag support for Dynamic Capacity Devices (DCD)

2024-09-02 Thread Li, Ming4
On 8/16/2024 10:44 PM, ira.we...@intel.com wrote:
> From: Navneet Singh 
>
> Per the CXL 3.1 specification software must check the Command Effects
> Log (CEL) for dynamic capacity command support.
>
> Detect support for the DCD commands while reading the CEL, including:
>
>   Get DC Config
>   Get DC Extent List
>   Add DC Response
>   Release DC
>
> Signed-off-by: Navneet Singh 
> Reviewed-by: Jonathan Cameron 
> Reviewed-by: Fan Ni 
> Reviewed-by: Dave Jiang 
> Reviewed-by: Davidlohr Bueso 
> Co-developed-by: Ira Weiny 
> Signed-off-by: Ira Weiny 
>
Reviewed-by: Li Ming 




Re: [PATCH v3 07/25] cxl/core: Separate region mode from decoder mode

2024-09-02 Thread Li, Ming4
On 8/16/2024 10:44 PM, ira.we...@intel.com wrote:
> From: Navneet Singh 
>
> Until now region modes and decoder modes were equivalent in that both
> modes were either PMEM or RAM.  The addition of Dynamic
> Capacity partitions defines up to 8 DC partitions per device.
>
> The region mode is thus no longer equivalent to the endpoint decoder
> mode.  IOW the endpoint decoders may have modes of DC0-DC7 while the
> region mode is simply DC.
>
> Define a new region mode enumeration which applies to regions separate
> from the decoder mode.  Adjust the code to process these modes
> independently.
>
> There is no equal to decoder mode dead in region modes.  Avoid
> constructing regions with decoders which have been flagged as dead.
>
> Suggested-by: Jonathan Cameron 
> Signed-off-by: Navneet Singh 
> Co-developed-by: Ira Weiny 
> Signed-off-by: Ira Weiny 
Reviewed-by: Li Ming