On 10/8/2024 7:16 AM, ira.we...@intel.com wrote:
> From: Navneet Singh <navneet.si...@intel.com>
>
> A dynamic capacity device (DCD) sends events to signal the host for
> changes in the availability of Dynamic Capacity (DC) memory.  These
> events contain extents describing a DPA range and metadata for memory
> to be added or removed.  Events may be sent from the device at any time.
>
> Three types of events can be signaled, Add, Release, and Force Release.
>
> On add, the host may accept or reject the memory being offered.  If no
> region exists, or the extent is invalid, the extent should be rejected.
> Add extent events may be grouped by a 'more' bit which indicates those
> extents should be processed as a group.
>
> On remove, the host can delay the response until the host is safely not
> using the memory.  If no region exists the release can be sent
> immediately.  The host may also release extents (or partial extents) at
> any time.  Thus the 'more' bit grouping of release events is of less
> value and can be ignored in favor of sending multiple release capacity
> responses for groups of release events.
>
> Force removal is intended as a mechanism between the FM and the
> device, to be used only when the host is unresponsive, out of sync, or
> otherwise broken.  Purposely ignore force removal events.
>
> Regions are made up of one or more devices which may be surfacing memory
> to the host.  Once all devices in a region have surfaced an extent the
> region can expose a corresponding extent for the user to consume.
> Without interleaving, a device extent forms a 1:1 relationship with the
> region extent.  Immediately surface a region extent upon getting a
> device extent.
>
> Per the specification the device is allowed to offer or remove extents
> at any time.  However, anticipated use cases can expect extents to be
> offered, accepted, and removed in well defined chunks.
>
> Simplify extent tracking with the following restrictions.
>
>       1) Flag for removal any extent which overlaps a requested
>          release range.
>       2) Refuse the offer of extents which overlap already accepted
>          memory ranges.
>       3) Accept again a range which has already been accepted by the
>          host.  Eating duplicates serves three purposes.  First, it
>          simplifies the code if the device gets out of sync with the
>          host, and it should be safe to acknowledge the extent again.
>          Second, it simplifies processing of existing extents if the
>          extent list changes while it is being read.  Third,
>          duplicates for a given region seen during a race between the
>          hardware surfacing an extent and the cxl dax driver scanning
>          for existing extents will be ignored.
>
>          NOTE: Processing existing extents is done in a later patch.
>
> Management of the region extent devices must be synchronized with
> potential uses of the memory within the DAX layer.  Create region extent
> devices as children of the cxl_dax_region device such that the DAX
> region driver can co-drive them and synchronize with the DAX layer.
> Synchronization and management is handled in a subsequent patch.
>
> The DAX layer does not yet support tags.  To maintain compatibility
> with legacy DAX/region processing, only tags with a value of 0 are
> allowed.  This defines existing DAX devices as having a 0 tag, which
> makes the most logical sense as a default.
>
> Process DCD events and create region devices.
>
> Signed-off-by: Navneet Singh <navneet.si...@intel.com>
> Co-developed-by: Ira Weiny <ira.we...@intel.com>
> Signed-off-by: Ira Weiny <ira.we...@intel.com>
>
Hi Ira,

I guess you missed my comments on v3, so I'm repeating them on this patch.

> +static bool extents_contain(struct cxl_dax_region *cxlr_dax,
> +                         struct cxl_endpoint_decoder *cxled,
> +                         struct range *new_range)
> +{
> +     struct device *extent_device;
> +     struct match_data md = {
> +             .cxled = cxled,
> +             .new_range = new_range,
> +     };
> +
> +     extent_device = device_find_child(&cxlr_dax->dev, &md, match_contains);
> +     if (!extent_device)
> +             return false;
> +
> +     put_device(extent_device);
You could use __free(put_device) on the declaration of 'extent_device' and
drop this explicit put_device() call.
> +     return true;
> +}
[...]
> +static bool extents_overlap(struct cxl_dax_region *cxlr_dax,
> +                         struct cxl_endpoint_decoder *cxled,
> +                         struct range *new_range)
> +{
> +     struct device *extent_device;
> +     struct match_data md = {
> +             .cxled = cxled,
> +             .new_range = new_range,
> +     };
> +
> +     extent_device = device_find_child(&cxlr_dax->dev, &md, match_overlaps);
> +     if (!extent_device)
> +             return false;
> +
> +     put_device(extent_device);
Same as above.
> +     return true;
> +}
> +
[...]
> +static int cxl_send_dc_response(struct cxl_memdev_state *mds, int opcode,
> +                             struct xarray *extent_array, int cnt)
> +{
> +     struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> +     struct cxl_mbox_dc_response *p;
> +     struct cxl_mbox_cmd mbox_cmd;
> +     struct cxl_extent *extent;
> +     unsigned long index;
> +     u32 pl_index;
> +     int rc;
> +
> +     size_t pl_size = struct_size(p, extent_list, cnt);
> +     u32 max_extents = cnt;
> +
> +     /* May have to use more bit on response. */
> +     if (pl_size > cxl_mbox->payload_size) {
> +             max_extents = (cxl_mbox->payload_size - sizeof(*p)) /
> +                           sizeof(struct updated_extent_list);
> +             pl_size = struct_size(p, extent_list, max_extents);
> +     }
> +
> +     struct cxl_mbox_dc_response *response __free(kfree) =
> +                                             kzalloc(pl_size, GFP_KERNEL);
> +     if (!response)
> +             return -ENOMEM;
> +
> +     pl_index = 0;
> +     xa_for_each(extent_array, index, extent) {
> +
> +             response->extent_list[pl_index].dpa_start = extent->start_dpa;
> +             response->extent_list[pl_index].length = extent->length;
> +             pl_index++;
> +             response->extent_list_size = cpu_to_le32(pl_index);
> +
> +             if (pl_index == max_extents) {
> +                     mbox_cmd = (struct cxl_mbox_cmd) {
> +                             .opcode = opcode,
> +                             .size_in = struct_size(response, extent_list,
> +                                                    pl_index),
> +                             .payload_in = response,
> +                     };
> +
> +                     response->flags = 0;
> +                     if (pl_index < cnt)
> +                             response->flags &= CXL_DCD_EVENT_MORE;

It should be 'response->flags |= CXL_DCD_EVENT_MORE' here.

Another issue: if 'cnt' is an exact multiple of 'max_extents' (e.g. cnt=20,
max_extents=10), all responses are sent inside this xa_for_each(), and
CXL_DCD_EVENT_MORE ends up set on the last response even though no further
response follows.  The flag should only be set when more extents remain to
be sent after the current batch.


> +
> +                     rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
> +                     if (rc)
> +                             return rc;
> +                     pl_index = 0;
> +             }
> +     }
> +
> +     if (cnt == 0 || pl_index) {
> +             mbox_cmd = (struct cxl_mbox_cmd) {
> +                     .opcode = opcode,
> +                     .size_in = struct_size(response, extent_list,
> +                                            pl_index),
> +                     .payload_in = response,
> +             };
> +
> +             response->flags = 0;
> +             rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
> +             if (rc)
> +                     return rc;
> +     }
> +
> +     return 0;
> +}
> +