On Mon, 25 Mar 2024 12:02:27 -0700
nifan....@gmail.com wrote:

> From: Fan Ni <fan...@samsung.com>
> 
> To simulate FM functionalities for initiating Dynamic Capacity Add
> (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> add/release dynamic capacity extents requests.
> 
> With the change, we allow to release an extent only when its DPA range
> is contained by a single accepted extent in the device. That is to say,
> extent superset release is not supported yet.
> 
> 1. Add dynamic capacity extents:
> 
> For example, the command to add two continuous extents (each 128MiB long)
> to region 0 (starting at DPA offset 0) looks like below:
> 
> { "execute": "qmp_capabilities" }
> 
> { "execute": "cxl-add-dynamic-capacity",
>   "arguments": {
>       "path": "/machine/peripheral/cxl-dcd0",
>       "region-id": 0,
>       "extents": [
>       {
>           "offset": 0,
>           "len": 134217728
>       },
>       {
>           "offset": 134217728,
>           "len": 134217728
>       }

Hi Fan,

I talk more on this inline, but to me this interface takes multiple extents
so that we can treat them as a single 'offer' of capacity. That is they
should be linked in the event log with the more flag and the host should
have to handle them in one go (I known Ira and Navneet's code doesn't handle
this yet, but that doesn't mean QEMU shouldn't).

Alternative for now would be to only support a single entry.  Keep the
interface defined to take multiple entries but reject it at runtime.

I don't want to end up with a more complex interface in the end just
because we allowed this form to not set the MORE flag today.
We will need this to do tagged handling and ultimately sharing, so good
to get it right from the start.

For tagged handling I think the right option is to have the tag alongside
region-id not in the individual extents.  That way the interface is naturally
used to generate the right description to the host.

>       ]
>   }
> }
> 
> 2. Release dynamic capacity extents:
> 
> For example, the command to release an extent of size 128MiB from region 0
> (DPA offset 128MiB) looks like below:
> 
> { "execute": "cxl-release-dynamic-capacity",
>   "arguments": {
>       "path": "/machine/peripheral/cxl-dcd0",
>       "region-id": 0,
>       "extents": [
>       {
>           "offset": 134217728,
>           "len": 134217728
>       }
>       ]
>   }
> }
> 
> Signed-off-by: Fan Ni <fan...@samsung.com>



>          /* to-be-added range should not overlap with range already accepted 
> */
>          QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
> @@ -1585,9 +1586,13 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct 
> cxl_cmd *cmd,
>      CXLDCExtentList *extent_list = &ct3d->dc.extents;
>      uint32_t i;
>      uint64_t dpa, len;
> +    CXLDCExtent *ent;
>      CXLRetCode ret;
>  
>      if (in->num_entries_updated == 0) {
> +        /* Always remove the first pending extent when response received. */
> +        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
>          return CXL_MBOX_SUCCESS;
>      }
>  
> @@ -1604,6 +1609,8 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct 
> cxl_cmd *cmd,
>  
>      ret = cxl_dcd_add_dyn_cap_rsp_dry_run(ct3d, in);
>      if (ret != CXL_MBOX_SUCCESS) {
> +        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);

Ah this deals with the todo I suggest you add to the earlier patch.
I'd not mind so much if you hadn't been so thorough on other todo notes ;)
Add one in the earlier patch and get rid of ti here like you do below.

However as I note below I think we need to handle these as groups of extents
not single extents. That way we keep an 'offered' set offered at the same time 
by
a single command (and expose to host using the more flag) together and reject
them on mass.


>          return ret;
>      }
>  
> @@ -1613,10 +1620,9 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct 
> cxl_cmd *cmd,
>  
>          cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
>          ct3d->dc.total_extent_count += 1;
> -        /*
> -         * TODO: we will add a pending extent list based on event log record
> -         * and process the list according here.
> -         */
> +
> +        ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> +        cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);
>      }
>  
>      return CXL_MBOX_SUCCESS;
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 951bd79a82..74cb64e843 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c

>  
>  static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> @@ -1449,7 +1454,8 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
>          return CXL_EVENT_TYPE_FAIL;
>      case CXL_EVENT_LOG_FATAL:
>          return CXL_EVENT_TYPE_FATAL;
> -/* DCD not yet supported */

Drop the comment but don't add the code.  We are handling DCD differently
from other events, so this code should never deal with it.

> +    case CXL_EVENT_LOG_DYNCAP:
> +        return CXL_EVENT_TYPE_DYNAMIC_CAP;
>      default:
>          return -EINVAL;
>      }
> @@ -1700,6 +1706,250 @@ void qmp_cxl_inject_memory_module_event(const char 
> *path, CxlEventLog log,
>      }
>  }

> +/*
> + * Check whether the range [dpa, dpa + len -1] has overlaps with extents in

space after - (just looks odd otherwise)

> + * the list.
> + * Return value: return true if has overlaps; otherwise, return false
> + */
> +static bool cxl_extents_overlaps_dpa_range(CXLDCExtentList *list,
> +                                           uint64_t dpa, uint64_t len)
> +{
> +    CXLDCExtent *ent;
> +    Range range1, range2;
> +
> +    if (!list) {
> +        return false;
> +    }
> +
> +    range_init_nofail(&range1, dpa, len);
> +    QTAILQ_FOREACH(ent, list, node) {
> +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> +        if (range_overlaps_range(&range1, &range2)) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
> +/*
> + * Check whether the range [dpa, dpa + len -1] is contained by extents in 

space after -

> + * the list.
> + * Will check multiple extents containment once superset release is added.
> + * Return value: return true if range is contained; otherwise, return false
> + */
> +bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
> +                                    uint64_t dpa, uint64_t len)
> +{
> +    CXLDCExtent *ent;
> +    Range range1, range2;
> +
> +    if (!list) {
> +        return false;
> +    }
> +
> +    range_init_nofail(&range1, dpa, len);
> +    QTAILQ_FOREACH(ent, list, node) {
> +        range_init_nofail(&range2, ent->start_dpa, ent->len);
> +        if (range_contains_range(&range2, &range1)) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
> +/*
> + * The main function to process dynamic capacity event. Currently DC extents
> + * add/release requests are processed.
> + */
> +static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog 
> log,

As below. Don't pass in a CxlEventLog.  Whilst some infrastructure is shared
with other event logs, we don't want to accidentally enable other events
being added to the DC event log.

> +                                             CXLDCEventType type, uint16_t 
> hid,
> +                                             uint8_t rid,
> +                                             CXLDCExtentRecordList *records,
> +                                             Error **errp)
> +{
> +    Object *obj;
> +    CXLEventDynamicCapacity dCap = {};
> +    CXLEventRecordHdr *hdr = &dCap.hdr;
> +    CXLType3Dev *dcd;
> +    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
> +    uint32_t num_extents = 0;
> +    CXLDCExtentRecordList *list;
> +    g_autofree CXLDCExtentRaw *extents = NULL;
> +    uint8_t enc_log;
> +    uint64_t dpa, offset, len, block_size;
> +    int i, rc;
> +    g_autofree unsigned long *blk_bitmap = NULL;
> +
> +    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
> +    if (!obj) {
> +        error_setg(errp, "Unable to resolve CXL type 3 device");
> +        return;
> +    }
> +
> +    dcd = CXL_TYPE3(obj);
> +    if (!dcd->dc.num_regions) {
> +        error_setg(errp, "No dynamic capacity support from the device");
> +        return;
> +    }
> +
> +    rc = ct3d_qmp_cxl_event_log_enc(log);

enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP; always so don't look it up.

> +    if (rc < 0) {
> +        error_setg(errp, "Unhandled error log type");
> +        return;
> +    }
> +    enc_log = rc;
> +
> +    if (rid >= dcd->dc.num_regions) {
> +        error_setg(errp, "region id is too large");
> +        return;
> +    }
> +    block_size = dcd->dc.regions[rid].block_size;
> +    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
> +
> +    /* Sanity check and count the extents */
> +    list = records;
> +    while (list) {
> +        offset = list->value->offset;
> +        len = list->value->len;
> +        dpa = offset + dcd->dc.regions[rid].base;
> +
> +        if (len == 0) {
> +            error_setg(errp, "extent with 0 length is not allowed");
> +            return;
> +        }
> +
> +        if (offset % block_size || len % block_size) {
> +            error_setg(errp, "dpa or len is not aligned to region block 
> size");
> +            return;
> +        }
> +
> +        if (offset + len > dcd->dc.regions[rid].len) {
> +            error_setg(errp, "extent range is beyond the region end");
> +            return;
> +        }
> +
> +        /* No duplicate or overlapped extents are allowed */
> +        if (test_any_bits_set(blk_bitmap, offset / block_size,
> +                              len / block_size)) {
> +            error_setg(errp, "duplicate or overlapped extents are detected");
> +            return;
> +        }
> +        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> +
> +        num_extents++;
> +        if (type == DC_EVENT_RELEASE_CAPACITY) {
> +            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents_pending,
> +                                               dpa, len)) {
> +                error_setg(errp,
> +                           "cannot release extent with pending DPA range");
> +                return;
> +            }
> +            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents,
> +                                                dpa, len)) {
> +                error_setg(errp,
> +                           "cannot release extent with non-existing DPA 
> range");
> +                return;
> +            }
> +        }
> +        list = list->next;
> +    }
> +    if (num_extents == 0) {

We can just check if there is a first one.  That check can be done before
counting them and is probably a little more elegant than leaving it to down 
here.
I'm not sure we can pass in an empty list but if we can (easy to poke interface
and check) then I assume records == NULL. 

> +        error_setg(errp, "no valid extents to send to process");
> +        return;
> +    }
> +
> +    /* Create extent list for event being passed to host */
> +    i = 0;
> +    list = records;
> +    extents = g_new0(CXLDCExtentRaw, num_extents);
> +    while (list) {
> +        offset = list->value->offset;
> +        len = list->value->len;
> +        dpa = dcd->dc.regions[rid].base + offset;
> +
> +        extents[i].start_dpa = dpa;
> +        extents[i].len = len;
> +        memset(extents[i].tag, 0, 0x10);
> +        extents[i].shared_seq = 0;
> +        list = list->next;
> +        i++;
> +    }
> +
> +    /*
> +     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
> +     *
> +     * All Dynamic Capacity event records shall set the Event Record Severity
> +     * field in the Common Event Record Format to Informational Event. All
> +     * Dynamic Capacity related events shall be logged in the Dynamic 
> Capacity
> +     * Event Log.
> +     */
> +    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
> +                            cxl_device_get_timestamp(&dcd->cxl_dstate));
> +
> +    dCap.type = type;
> +    /* FIXME: for now, validity flag is cleared */
> +    dCap.validity_flags = 0;
> +    stw_le_p(&dCap.host_id, hid);
> +    /* only valid for DC_REGION_CONFIG_UPDATED event */
> +    dCap.updated_region_id = 0;
> +    /*
> +     * FIXME: for now, the "More" flag is cleared as there is only one
> +     * extent associating with each record and tag-based release is
> +     * not supported.

This is misleading by my understanding of the specification.
More isn't directly related to tags (though it is necessary for some
flows with tags, when sharing is enabled anyway).
The reference to record also isn't that relevant. The idea is you set
it for all but the last record pushed to the event log (from a given
action from an FM).

The whole reason to have a multi extent injection interface is to set
the more flag to indicate that the OS needs to treat a bunch of extents
as one 'offer' of capacity.  So a rejection from the OS needs to take
out 'all those records'.  The proposed linux code will currently reject
all by the first extent (I moaned about that yesterday). 

It is fine to not support this in the current code, but then I would check
the number of extents and reject any multi extent commands until we
do support it.

Ultimately I want a qmp command with more than one extent to mean
they are one 'offer' of capacity and must be handled as such by
the OS.  I.e. it can't reply with multiple unrelated acceptance
or reject replies.

On the add side this is easy to support, the fiddly bit is if the
OS rejects some or all of the capacity and you then need to
take out all the extents offered that it hasn't accepted in it's reply.

Pending list will need to maintain that association.
Maybe the easiest way is to have pending list be a list of sublists?
That way each sublist is handled in one go and any non accepted extents
in that sub list are dropped.

 
> +     */
> +    dCap.flags = 0;
> +    for (i = 0; i < num_extents; i++) {
> +        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
> +               sizeof(CXLDCExtentRaw));
> +
> +        if (type == DC_EVENT_ADD_CAPACITY) {
> +            cxl_insert_extent_to_extent_list(&dcd->dc.extents_pending,
> +                                             extents[i].start_dpa,
> +                                             extents[i].len,
> +                                             extents[i].tag,
> +                                             extents[i].shared_seq);
> +        }
> +
> +        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
> +                             (CXLEventRecordRaw *)&dCap)) {
> +            cxl_event_irq_assert(dcd);
> +        }
> +    }
> +}
> +
> +void qmp_cxl_add_dynamic_capacity(const char *path, uint8_t region_id,
> +                                  CXLDCExtentRecordList  *records,
> +                                  Error **errp)
> +{
> +   qmp_cxl_process_dynamic_capacity(path, CXL_EVENT_LOG_DYNCAP,

Drop passing in the log, it doesn't make sense given these events only occur
on that log and we can hard code it in the function.

> +                                    DC_EVENT_ADD_CAPACITY, 0,
> +                                    region_id, records, errp);
> +}
> +
> +void qmp_cxl_release_dynamic_capacity(const char *path, uint8_t region_id,
> +                                      CXLDCExtentRecordList  *records,
> +                                      Error **errp)
> +{
> +    qmp_cxl_process_dynamic_capacity(path, CXL_EVENT_LOG_DYNCAP,
> +                                     DC_EVENT_RELEASE_CAPACITY, 0,
> +                                     region_id, records, errp);
> +}
> +
>  static void ct3_class_init(ObjectClass *oc, void *data)
>  {


> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index df3511e91b..b84063d9f4 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -494,6 +494,7 @@ struct CXLType3Dev {
>           */
>          uint64_t total_capacity; /* 256M aligned */
>          CXLDCExtentList extents;
> +        CXLDCExtentList extents_pending;
>          uint32_t total_extent_count;
>          uint32_t ext_list_gen_seq;



>  #endif /* CXL_EVENTS_H */
> diff --git a/qapi/cxl.json b/qapi/cxl.json
> index 8cc4c72fa9..2645004666 100644
> --- a/qapi/cxl.json
> +++ b/qapi/cxl.json
> @@ -19,13 +19,16 @@
>  #
>  # @fatal: Fatal Event Log
>  #
> +# @dyncap: Dynamic Capacity Event Log
> +#
>  # Since: 8.1
>  ##
>  { 'enum': 'CxlEventLog',
>    'data': ['informational',
>             'warning',
>             'failure',
> -           'fatal']
> +           'fatal',
> +           'dyncap']

Does this have the side effect of letting us inject error events
onto the dynamic capacity log? 

>   }
>  
>  ##
> @@ -361,3 +364,59 @@
>  ##
>  {'command': 'cxl-inject-correctable-error',
>   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
...

> +##
> +# @cxl-add-dynamic-capacity:
> +#
> +# Command to start add dynamic capacity extents flow. The device will
> +# have to acknowledged the acceptance of the extents before they are usable.
> +#
> +# @path: CXL DCD canonical QOM path
> +# @region-id: id of the region where the extent to add
> +# @extents: Extents to add
> +#
> +# Since : 9.0

9.1

> +##
> +{ 'command': 'cxl-add-dynamic-capacity',
> +  'data': { 'path': 'str',
> +            'region-id': 'uint8',
> +            'extents': [ 'CXLDCExtentRecord' ]
> +           }
> +}
> +
> +##
> +# @cxl-release-dynamic-capacity:
> +#
> +# Command to start release dynamic capacity extents flow. The host will
> +# need to respond to indicate that it has released the capacity before it
> +# is made unavailable for read and write and can be re-added.
> +#
> +# @path: CXL DCD canonical QOM path
> +# @region-id: id of the region where the extent to release
> +# @extents: Extents to release
> +#
> +# Since : 9.0

9.1

> +##
> +{ 'command': 'cxl-release-dynamic-capacity',
> +  'data': { 'path': 'str',
> +            'region-id': 'uint8',
> +            'extents': [ 'CXLDCExtentRecord' ]
> +           }
> +}


Reply via email to