alison.schofield@ wrote:
> From: Alison Schofield <[email protected]>
>
> Changes since v5:
> - Use a private parser for cxl_poison events. (Dan)
> Previously used the default parser and re-parsed per the cxl-list
> needs. Replace that with a private parsing method for cxl_poison.
> - Add a private context to support private parsers.
> - Add helpers to use with the cxl_poison parser.
> - cxl list json: drop nr_records field (Dan)
> - cxl list option: replace "poison" w "media-errors" (Dan)
> - cxl list json: replace "poison" w "media_errors" (Dan)
> - Link to v5:
> https://lore.kernel.org/linux-cxl/[email protected]/
>
>
> Begin cover letter:
>
> Add the option to add a memory devices poison list to the cxl-list
> json output. Offer the option by memdev and by region. Sample usage:
>
> # cxl list -m mem1 --media-errors
> [
> {
> "memdev":"mem1",
> "pmem_size":1073741824,
> "ram_size":1073741824,
> "serial":1,
> "numa_node":1,
> "host":"cxl_mem.1",
> "media_errors":[
> {
> "dpa":0,
> "dpa_length":64,
> "source":"Injected"
> },
> {
> "region":"region5",
It feels odd to list the region here. I feel like what really matters is
to list the endpoint decoder and if someone wants to associate endpoint
decoder to region, or endpoint decoder to memdev there are other queries
for that.
Then this format does not change between the "region" listing and
"memdev" listing, they both just output the endpoint decoder and leave
the rest to follow-on queries.
For example I expect operations software has already recorded the
endpoint decoder to region mapping, so when this data comes in the
endpoint decoder is a key to make that association. Otherwise:
cxl list -RT -e $endpoint_decoder
...can answer follow up questions about what is impacted by a given
media error record.
> "dpa":1073741824,
> "dpa_length":64,
The dpa_length is also the hpa_length, right? So maybe just call the
field "length".
> "hpa":1035355557888,
> "source":"Injected"
> },
> {
> "region":"region5",
> "dpa":1073745920,
> "dpa_length":64,
> "hpa":1035355566080,
> "source":"Injected"
This "source" field feels like debug data. In production nobody is going
to be doing poison injection, and if the administrator injected it then
its implied they know that status. Otherwise a media-error is a
media-error regardless of the source.
> }
> ]
> }
> ]
>
> # cxl list -r region5 --media-errors
> [
> {
> "region":"region5",
> "resource":1035355553792,
> "size":2147483648,
> "type":"pmem",
> "interleave_ways":2,
> "interleave_granularity":4096,
> "decode_state":"commit",
> "media_errors":[
> {
> "memdev":"mem1",
> "dpa":1073741824,
> "dpa_length":64,
> "hpa":1035355557888,
> "source":"Injected"
> },
> {
> "memdev":"mem1",
> "dpa":1073745920,
> "dpa_length":64,
> "hpa":1035355566080,
> "source":"Injected"
> }
> ]
> }
> ]
>
> Alison Schofield (7):
> libcxl: add interfaces for GET_POISON_LIST mailbox commands
> cxl: add an optional pid check to event parsing
> cxl/event_trace: add a private context for private parsers
> cxl/event_trace: add helpers get_field_[string|data]()
> cxl/list: collect and parse media_error records
> cxl/list: add --media-errors option to cxl list
> cxl/test: add cxl-poison.sh unit test
>
> Documentation/cxl/cxl-list.txt | 71 +++++++++++
> cxl/event_trace.c | 53 +++++++-
> cxl/event_trace.h | 9 +-
> cxl/filter.h | 3 +
> cxl/json.c | 218 +++++++++++++++++++++++++++++++++
> cxl/lib/libcxl.c | 47 +++++++
> cxl/lib/libcxl.sym | 6 +
> cxl/libcxl.h | 2 +
> cxl/list.c | 2 +
> test/cxl-poison.sh | 133 ++++++++++++++++++++
> test/meson.build | 2 +
> 11 files changed, 543 insertions(+), 3 deletions(-)
> create mode 100644 test/cxl-poison.sh
>
>
> base-commit: a871e6153b11fe63780b37cdcb1eb347b296095c
> --
> 2.37.3
>
>