On Mon, Jun 02, 2025 at 03:56:26PM -0500, Ben Cheatham wrote:
> v2 Changes:
> - Make the --clear option of 'inject-error' its own command (Alison)
> - Debugfs is now found using the /proc/mounts entry instead of
> providing the path using a --debugfs option
> - Man page added for 'clear-error'
> - Reword commit descriptions for clarity
>
> This series adds support for injecting CXL protocol (CXL.cache/mem)
> errors[1] into CXL RCH Downstream ports and VH root ports[2] and
> poison into CXL memory devices through the CXL debugfs. Errors are
> injected using a new 'inject-error' command, while errors are reported
> using a new cxl-list "-N"/"--injectable-errors" option. Device poison
> can be cleared using the 'clear-error' command.
>
> The 'inject-error'/'clear-error' commands and "-N" option of cxl-list all
> require access to the CXL driver's debugfs.
>
> The documentation for the new cxl-inject-error command shows both usage
> and the possible device/error types, as well as how to retrieve them
> using cxl-list. The documentation for cxl-list has also been updated to
> show the usage of the new injectable errors option.
>
> [1]: ACPI v6.5 spec, section 18.6.4
> [2]: ACPI v6.5 spec, table 18.31
>
> --
>
> Alison, I reached out to Junhyeok about his poison injection series but
> never heard back, so I've just continued with my original plans for a
> v2.
>
> Quick note: My testing setup is screwed up at the moment, so this
> revision is untested. I'll try to get it fixed for the next revision.

I applied this to v82 (it needs a sync-up in libcxl.sym) and ran the cxl-poison
unit test using your new cxl-cli commands instead of writing to debugfs
directly.[1] Works for me. Just thought I'd share that as proof of life until I
review it completely.

Adding more test cases to cxl-poison.sh makes sense for the device poison.
I'm wondering about the protocol errors, though. How do we test those?

[1]
diff --git a/test/cxl-poison.sh b/test/cxl-poison.sh
index 6ed890bc666c..41ab670b1094 100644
--- a/test/cxl-poison.sh
+++ b/test/cxl-poison.sh
@@ -68,7 +68,8 @@ inject_poison_sysfs()
 	memdev="$1"
 	addr="$2"
 
-	echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/inject_poison
+#	echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/inject_poison
+	$CXL inject-error "$memdev" -t poison -a "$addr"
 }
 
 clear_poison_sysfs()
@@ -76,7 +77,8 @@ clear_poison_sysfs()
 	memdev="$1"
 	addr="$2"
 
-	echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/clear_poison
+#	echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/clear_poison
+	$CXL clear-error "$memdev" -a "$addr"
 }
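
As an aside, the lines I commented out hard-code /sys/kernel/debug, while the
cover letter says the new commands look debugfs up in the mount table. If the
test ever needs the path itself, something like the following might do. This is
only a sketch: find_debugfs is a hypothetical name, not a helper from this
series or from the test suite, and it assumes the usual mount table layout
(device, mount point, fs type, ...).

```shell
# Hypothetical helper: print the first debugfs mount point found in a
# mount table (defaults to /proc/mounts). Returns non-zero if the table
# has no debugfs entry.
find_debugfs()
{
	mounts="${1:-/proc/mounts}"

	# Fields per line: device mountpoint fstype options dump pass
	awk '$3 == "debugfs" { print $2; found = 1; exit }
	     END { exit !found }' "$mounts"
}
```

A caller could then do dbg=$(find_debugfs) and skip the test when that fails,
rather than assuming the default mount point.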

While applying "Documentation: Add docs for inject/clear-error commands", I
got these whitespace complaints:

234: new blank line at EOF
158: space before tab in indent.
"offset":"0x1000",
159: space before tab in indent.
"length":64,
160: space before tab in indent.
"source":"Injected"
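
These can be caught before sending with git diff --check, and git am
--whitespace=fix will clean them up on apply. For checking a single file
outside of git, a rough equivalent of the "space before tab in indent" test
could look like this. It is a sketch only; space_before_tab is a hypothetical
name, and it does not cover the blank-line-at-EOF case.

```shell
# Hypothetical helper: list lines whose leading indent contains a space
# directly followed by a tab, printed as "<line number>:<line>".
# Returns non-zero when no such line exists (grep's no-match status).
space_before_tab()
{
	tab=$(printf '\t')

	# Leading run of blanks that ends in "space, tab"
	grep -n "^[ ${tab}]* ${tab}" "$1"
}
```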
-- snip