release dynamic capacity extents

fan Tue, 30 Apr 2024 10:49:03 -0700

On Mon, Apr 29, 2024 at 09:58:42AM +0200, Markus Armbruster wrote:
> fan <nifan....@gmail.com> writes:
> 
> > On Fri, Apr 26, 2024 at 11:12:50AM +0200, Markus Armbruster wrote:
> >> nifan....@gmail.com writes:
> 
> [...]
> 
> >> > diff --git a/qapi/cxl.json b/qapi/cxl.json
> >> > index 4281726dec..2dcf03d973 100644
> >> > --- a/qapi/cxl.json
> >> > +++ b/qapi/cxl.json
> >> > @@ -361,3 +361,72 @@
> >> >  ##
> >> >  {'command': 'cxl-inject-correctable-error',
> >> >   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> >> > +
> >> > +##
> >> > +# @CXLDCExtentRecord:
> >> 
> >> Such traffic jams of capital letters are hard to read.  What about
> >> CxlDynamicCapacityExtent?
> >> 
> >> > +#
> >> > +# Record of a single extent to add/release
> >> 
> >> Suggest "A dynamic capacity extent."
> >> 
> >> > +#
> >> > +# @offset: offset to the start of the region where the extent to be 
> >> > operated
> >> 
> >> Blank line here, please.
> >> 
> >> 
> >> 
> >> > +# @len: length of the extent
> >> > +#
> >> > +# Since: 9.1
> >> > +##
> >> > +{ 'struct': 'CXLDCExtentRecord',
> >> > +  'data': {
> >> > +      'offset':'uint64',
> >> > +      'len': 'uint64'
> >> > +  }
> >> > +}
> >> > +
> >> > +##
> >> > +# @cxl-add-dynamic-capacity:
> >> > +#
> >> > +# Command to start add dynamic capacity extents flow. The device will
> >> > +# have to acknowledged the acceptance of the extents before they are 
> >> > usable.
> >> 
> >> This text needs work.  More on that at the end of my review.
> >
> > Yes. I will work on it for the next version once all the feedbacks
> > are collected and comments are resolved.
> >
> > See below.
> >
> >> 
> >> docs/devel/qapi-code-gen.rst:
> >> 
> >>     For legibility, wrap text paragraphs so every line is at most 70
> >>     characters long.
> >> 
> >>     Separate sentences with two spaces.
> >> 
> >> More elsewhere.
> >> 
> >> > +#
> >> > +# @path: CXL DCD canonical QOM path
> >> 
> >> I'd prefer @qom-path, unless you can make a consistency argument for
> >> @path.
> >> 
> >> Sure the QOM path needs to be canonical?
> >> 
> >> If not, what about "path to the CXL dynamic capacity device in the QOM
> >> tree".  Intentionally close to existing descriptions of @qom-path
> >> elsewhere.
> >
> > From the same file, I saw "path" was used for other commands, like
> > "cxl-inject-memory-module-event", so I followed it.
> > DCD is nothing different from "type 3 device" expect it can dynamically
> > change capacity. 
> > Renaming it to "qom-path" is no problem for me, just want to make sure it
> > will not break the naming consistency.
> 
> Both @path and @qom-path are used (sadly).  @path is used for all kinds
> of paths, whereas @qom-path is only used for QOM paths.  That's why I
> prefer it.
> 
> However, you're making a compelling local consistency argument: cxl.json
> uses only @path.  Sticking to that makes sense.
> 
> >> > +# @hid: host id
> >> 
> >> @host-id, unless "HID" is established terminology in CXL DCD land.
> >
> > host-id works.
> >> 
> >> What is a host ID?
> >
> > It is an id identifying the host to which the capacity is being added.
> 
> How are these IDs assigned?


All the arguments passed to the command here are defined in CXL spec. I
will add reference to the spec.

Based on the spec, for LD-FAM (Fabric attached memory represented as
logical device), host id is the LD-ID of the host interface to which
the capacity is being added. LD-ID is a unique number (16-bit) assigned
to a host interface.

> 
> >> > +# @selection-policy: policy to use for selecting extents for adding 
> >> > capacity
> >> 
> >> Where are selection policies defined?
> >
> > It is defined in CXL specification: Specifies the policy to use for 
> > selecting
> > which extents comprise the added capacity
> 
> Include a reference to the spec here?
Wil do.
> 
> >> > +# @region-id: id of the region where the extent to add
> >> 
> >> Is "region ID" the established terminology in CXL DCD land?  Or is
> >> "region number" also used?  I'm asking because "ID" in this QEMU device
> >> context suggests a connection to a qdev ID.
> >> 
> >> If region number is fine, I'd rename to just @region, and rephrase the
> >> description to avoid "ID".  Perhaps "number of the region the extent is
> >> to be added to".  Not entirely happy with the phrasing, doesn't exactly
> >> roll off the tongue, but "where the extent to add" sounds worse to my
> >> ears.  Mind, I'm not a native speaker.
> >
> > Yes. region number is fine. Will rename it as "region"
> >
> >> 
> >> > +# @tag: Context field
> >> 
> >> What is this about?
> >
> > Based on the specification, it is "Context field utilized by implementations
> > that make use of the Dynamic Capacity feature.". Basically, it is a
> > string (label) attached to an dynamic capacity extent so we can achieve
> > specific purpose, like identifying or grouping extents.
> 
> Include a reference to the spec here?
Will do.
> 
> >> > +# @extents: Extents to add
> >> 
> >> Blank lines between argument descriptions, please.
> >> 
> >> > +#
> >> > +# Since : 9.1
> >> > +##
> >> > +{ 'command': 'cxl-add-dynamic-capacity',
> >> > +  'data': { 'path': 'str',
> >> > +            'hid': 'uint16',
> >> > +            'selection-policy': 'uint8',
> >> > +            'region-id': 'uint8',
> >> > +            'tag': 'str',
> >> > +            'extents': [ 'CXLDCExtentRecord' ]
> >> > +           }
> >> > +}
> >> > +
> >> > +##
> >> > +# @cxl-release-dynamic-capacity:
> >> > +#
> >> > +# Command to start release dynamic capacity extents flow. The host will
> >> > +# need to respond to indicate that it has released the capacity before 
> >> > it
> >> > +# is made unavailable for read and write and can be re-added.
> >> 
> >> This text needs work.  More on that at the end of my review.
> >
> > Will do.
> >
> >> 
> >> > +#
> >> > +# @path: CXL DCD canonical QOM path
> >> 
> >> My comment on cxl-add-dynamic-capacity applies.
> >> 
> >> > +# @hid: host id
> >> 
> >> Likewise.
> >> 
> >> > +# @flags: bit[3:0] for removal policy, bit[4] for forced removal, 
> >> > bit[5] for
> >> > +#     sanitize on release, bit[7:6] reserved
> >> 
> >> Where are these flags defined?
> >
> > Defined in the CXL specification, it defines the release behaviour.
> 
> Include a reference to the spec here?
Will do.
> 
> Is the numeric encoding of flags appropriate?
> 
> In general, we prefer symbolic encodings.  Numeric encodings can make
> sense when
> 
> • the encoding is stable, and
> 
> • QEMU doesn't need to decode it, only pass it on to something else, and
> 
> • both the QMP client and the "something else" prefer a numeric
>   encoding.
The encoding is from the specification, and we do not invent anything
here. It is stable and all the updates to the spec need to be backward
compatible.
> 
> >> > +# @region-id: id of the region where the extent to release
> >> 
> >> My comment on cxl-add-dynamic-capacity applies.
> >> 
> >> > +# @tag: Context field
> >> 
> >> Likewise.
> >> 
> >> > +# @extents: Extents to release
> >> > +#
> >> > +# Since : 9.1
> >> > +##
> >> > +{ 'command': 'cxl-release-dynamic-capacity',
> >> > +  'data': { 'path': 'str',
> >> > +            'hid': 'uint16',
> >> > +            'flags': 'uint8',
> >> > +            'region-id': 'uint8',
> >> > +            'tag': 'str',
> >> > +            'extents': [ 'CXLDCExtentRecord' ]
> >> > +           }
> >> > +}
> >> 
> >> During review of v5, you wrote:
> >> 
> >>     For add command, the host will send a mailbox command to response to
> >>     the add request to the device to indicate whether it accepts the add
> >>     capacity offer or not.
> >>     
> >>     For release command, the host send a mailbox command (not always a
> >>     response since the host can proactively release capacity if it does
> >>     not need it any more) to device to ask device release the capacity.
> >> 
> >> Can you briefly sketch the protocol?  Peers and messages involved.
> >> Possibly as a state diagram.
> >
> > Need to think about it. If we can polish the text nicely, maybe the
> > sketch is not needed. My concern is that the sketch may
> > introduce unwanted complexity as we expose too much details. The two
> > commands provide ways to add/release dynamic capacity to/from a host,
> > that is all. All the other information, like what the host will do, or
> > how the device will react, are consequence of the command, not sure
> > whether we want to include here.
> 
> The protocol sketch is for me, not necessarily the doc comment.  I'd
> like to understand at high level how this stuff works, because only then
> can I meaningfully review the docs.

--------------------------------
For add command, saying a user sends a request to FM to ask to add
extent A of the device (managed by FM) to host 0.
The function cxl-add-dynamic-capacity simulates what FM needs to do.
1. Verify extent A is valid (behaviour defined by the spec), return
error if not; otherwise,
2. Add a record to the device's event log (indicating the intent to
add extent A to host 0), update device internal extent tracking status,
signal an interrupt to host 0;
(The above step 1 & 2 are performed in the QMP interface, following
operations are QMP irrelevant, only host and device involved.)
3. Once the interrupt is received, host 0 fetch the event record from
the device's event log through some mailbox command (out of scope
of this patch series).
4. Host 0 decides whether it accepts extent A or not. Whether accept or
reject, host needs to send a response (add-response mailbox command) to
the device so the device can update its internal extent tracking
status accordingly.
The device return a value to the host showing whether the response is
successful or failed.
5. Based on the mailbox command return value, the host process
accordingly.
6. The host sends a mailbox command to the device to clear the event
record in the device's event log. 

---------------------------------
For release command, saying a user sends a request to FM to ask host 0
to release extent A and return it back to the device (managed by FM).

The function cxl-release-dynamic-capacity simulates what FM needs to do.
1. Verify extent A is valid (defined by the spec), return error if not;
otherwise,
2. Add a record to the event log (indicating the intent to
release extent A from host 0), signal an interrupt to host 0;
(The above step 1 & 2 are performed in the QMP interface, following
operations are QMP irrelevant, only host and device involved.
3. Once the interrupt is received, host 0 fetch the event record from
the device's event log through some mailbox command (out of scope
of this patch series).
4. Host 0 decides whether it can release extent A or not. Whether can or
cannot release, host needs to send a release (mailbox command) to the device
so the device can update its internal extent tracking status accordingly.
The device returns a value to host 0 showing whether the release is
successful or failed.
5. Based on the returned value, the host process accordingly.
6. The host sends mailbox command to clear the event record in the
device's event log. 

For release command, it is more complicated. Based on the release flag
passed to FM, FM can behaviour differently. For example, if the
forced-removal flag is set, FM can directly get the extent back from a
host for other uses without waiting for the host to send command to the
device. For the above step 2, their may be not event record to the event
log (no supported in this patch series yet).

Also, for the release interface here, it simulates FM initializes the
release request.
There is another case where the host can proactively release extents it
do not need any more back to device. However, this case is out of the
scope of this release interface.

Hope the above text helps a little for the context here.
Let me know if further clarification is needed.

Thanks,
Fan



> 
> > @Jonathan, Any thoughts on this?
> 
> Thanks!
>

Re: [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents

Reply via email to