On 12/16/2025 10:47 PM, Alison Schofield wrote:
> On Mon, Dec 15, 2025 at 03:36:23PM -0600, Ben Cheatham wrote:
>> v5 Changes:
>>      - Use setmntent()/getmntent() instead of open-coding getting the
>>        debugfs path (Dave)
>>      - Use correct return code for sysfs_read_attr() (Dave)
>>
>> v4 Changes:
>>      - Variable renames for clarity (Dave)
>>      - Use errno instead of rc for access() calls (Dave)
>>      - Check returns for snprintf() (Dave)
>>      - Add util_cxl_dport_filter() (Dave)
>>      - Replace printf() calls with log_info() (Dave)
>>      - Write correct value to debugfs during protocol error injection
>>      (BIT(error) vs. error)
>>
>> v3 Changes:
>>      - Rebase on v83 release
>>      - Fix whitespace errors (Alison)
>>
>> v2 Changes:
>>      - Make the --clear option of 'inject-error' its own command (Alison)
>>      - Debugfs is now found using the /proc/mount entry instead of
>>      providing the path using a --debugfs option
>>      - Man page added for 'clear-error'
>>      - Reword commit descriptions for clarity
>>
>> This series adds support for injecting CXL protocol (CXL.cache/mem)
>> errors[1] into CXL RCH Downstream ports and VH root ports[2] and
>> poison into CXL memory devices through the CXL debugfs. Errors are
>> injected using a new 'inject-error' command, while errors are reported
>> using a new cxl-list "-N"/"--injectable-errors" option. Device poison
>> can be cleared using the 'clear-error' command.
>>
>> The 'inject-error'/'clear-error' commands and "-N" option of cxl-list all
>> require access to the CXL driver's debugfs.
>>
>> The documentation for the new cxl-inject-error command shows both usage
>> and the possible device/error types, as well as how to retrieve them
>> using cxl-list. The documentation for cxl-list has also been updated to
>> show the usage of the new injectable errors option.
>>
>> [1]: ACPI v6.5 spec, section 18.6.4
>> [2]: ACPI v6.5 spec, table 18.31
> 
> Hi Ben,
> 
> I did a patch by patch review but saved up a couple of usability things
> to chat about here:
> 
> Consider removing the -N option and simply adding the new info to the
> default memdev and bus listings. Both are only accessing debugfs files and
> don't add much to the default listing, especially the memdev one.

That makes sense, I'll do that.

> 
> For the protocol errors, the cxl list entry is always present, even when 
> empty,
> but the poison_injectable attribute is only present when true. Should that be
> always present and true/false? Or maybe true/false/unknown, where unknown is
> the status when CONFIG_DEBUG_FS is not enabled? 
> And, maybe something similar for protocol errors?

It's probably fine to have them be always present when CONFIG_DEBUG_FS is 
enabled.
I think it would be cool to have them be removed when CONFIG_DEBUG_FS is 
disabled,
but I'm not sure how that would work. If I can do that, that's what I'll do.

Otherwise, I'll just set poison_injectable to false. That's what makes sense to 
me
since poison injection and error injection into non-RCH ports aren't available
unless the debugfs files are there (AFAIK). For the protocol errors, I'll 
either do
"None", "N/A", or leave it as an empty list.

> 
> Please add more strong 'danger' warnings to the poison inject and clear
> command man pages. See Documentation/ABI/testing/debugfs-cxl for the language
> we converged on when adding the debugfs attributes.

For sure, I'll take a look and update.

> 
> I have no test for the protocol errors. Is there anything you can
> share for that?

I don't have any at the moment. My first idea for one is to modify the CXL test 
module(s)
to replace /sys/kernel/debug/cxl/einj_inject with a dummy that prints a message 
to the dmesg
then check for that message in the test after running the command. That would 
somewhat match
the real use case, but doesn't test any actual error injection. If you think 
that would
be useful let me know and I'll put something together.

> 
> I'll send a separate reply asking if you to append an updated cxl-poison
> unit test patch to this set.

That sounds fine to me, I'll append it on v6.

Thanks,
Ben

> 
> --Alison

Reply via email to