Re: [RFC PATCH 0/6] CCIX Protocol Error reporting
On Wed, 3 Jul 2019 14:08:49 +0100 Jonathan Cameron wrote: > On Wed, 3 Jul 2019 10:28:08 +0100 > James Morse wrote: > > > Hi Jonathan, > > > > (Beware: this CCIX thing is new to me, some of this will be questions on > > the scope of CCIX) > > Sure and welcome to our crazy world. > > > > > On 06/06/2019 13:36, Jonathan Cameron wrote: > > > UEFI 2.8 defines a new CPER record Appendix N for CCIX Protocol Error > > > Records > > > (PER). www.uefi.org > > > > > These include Protocol Error Record logs which are defined in the > > > CCIX 1.0 Base Specification www.ccixconsortium.com. > > > > I can't find a public version of this spec, and I don't have an account to > > look at the > > private one. (It looks like I could get one but I'm guessing there is an > > NDA between me > > and the document text!) > > > > Will it ever be public? > > +cc A few people who might be able to help. Evaluation version of the CCIX 1.0a base specification now available, (though there is a form to complete and license agreement).. https://www.ccixconsortium.com/ccix-library/download-form/ > > > > > > > > Handling of coherency protocol errors is complex and how Linux does this > > > will take some time to evolve. For now, fatal errors are handled via the > > > usual means and everything else is reported. > > > > > There are 6 types of error defined, covering: > > > * Memory errors > > > > How does this interact with the vanilla Memory-Error CPER records? Do we > > always get them > > as a pair? (that struct definition looks familiar...) > > The snag is that the standard memory error doesn't provide quite enough > information to > actually identify a component once you have buses involved. > The actual memory specific bit is nearly identical, but the header section > isn't > as it gives the CCIX agent ID (see below given you asked a more specific > question > on that). > > We did discuss this in a lot of detail and currently the answer is that the > UEFI spec + ccix allows you to report only the ccix error if you want to. > > Note that I believe there is an OSC bit that I'm not currently setting so > right now no firmware compliant with the CCIX SW Guide (information available > from your local Charles) should actually be generating these anyway because > the OS hasn't said it can handle them. I took the view I wanted to get > round 2 which actually hooks up handling rather than just reporting before > setting that bit. > > > Is memory described like this never present in the UEFI memory map? (so we > > never > > accidentally use it as host memory, for processes etc) > Yes it definitely can. If you want to you can have all your RAM attached by > CCIX. > (probably not a good idea performance wise :) > > More likely in the short term is SCM using something like the hmem patches > to hotplug it as if it were RAM. > https://patchwork.kernel.org/cover/10982677/ > > In order to support legacy OS, it wouldn't be unusual to have a bios switch > to make SCM on CCIX just appear as normal memory in the first place > (if that was true though it is highly unlikely you'd use the CCIX > PER messages.) > > > > > ~ > > > > include/linux/memremap.h has: > > | * Device memory that is cache coherent from device and CPU point of > > view. This > > | * is use on platform that have an advance system bus (like CAPI or > > CCIX). A > > | * driver can hotplug the device memory using ZONE_DEVICE and with that > > memory > > | * type. Any page of a process can be migrated to such memory. However no > > one > > | * should be allow to pin such memory so that it can always be evicted. > > > > Is this the same CCIX? > > Not always, but sometimes. Typically if it's on an accelerator (GPU etc) then > one software model is to do that. There are others though. Until we have > more devices out there, its not even clear that model with dominate. > > It's a much argued about topic. My personal thoughts are you either have: > 1. An accelerator that was designed from the ground up assuming that it had >tight control of its own memory layout etc - someone bolted on coherency >later. This is the current GPU model and fits with ZONE_DEVICE etc. GPUs >can and do require shuffling memory. > > 2. An accelerator designed to run without it's own memory. We just decided >to move some of the host memory to it's end of the bus for size or >efficiency reasons. As it's reasonably happy with fragmentation and >the assumption that someone else is doing it's memory management for it >(the OS) you can run in an SVM type mode, with a bit of hinting by >a userspace app on where it would prefer the accelerator accessed memory >comes from. Those have no more constraints than any other memory. If >you want to pin for RDMA for example, its fine. > > More importantly, it can just be normal DDR on a card with no extra functions. > In those cases it's just a memory
Re: [RFC PATCH 0/6] CCIX Protocol Error reporting
On Wed, 3 Jul 2019 10:28:08 +0100 James Morse wrote: > Hi Jonathan, > > (Beware: this CCIX thing is new to me, some of this will be questions on the > scope of CCIX) Sure and welcome to our crazy world. > > On 06/06/2019 13:36, Jonathan Cameron wrote: > > UEFI 2.8 defines a new CPER record Appendix N for CCIX Protocol Error > > Records > > (PER). www.uefi.org > > > These include Protocol Error Record logs which are defined in the > > CCIX 1.0 Base Specification www.ccixconsortium.com. > > I can't find a public version of this spec, and I don't have an account to > look at the > private one. (It looks like I could get one but I'm guessing there is an NDA > between me > and the document text!) > > Will it ever be public? +cc A few people who might be able to help. > > > > Handling of coherency protocol errors is complex and how Linux does this > > will take some time to evolve. For now, fatal errors are handled via the > > usual means and everything else is reported. > > > There are 6 types of error defined, covering: > > * Memory errors > > How does this interact with the vanilla Memory-Error CPER records? Do we > always get them > as a pair? (that struct definition looks familiar...) The snag is that the standard memory error doesn't provide quite enough information to actually identify a component once you have buses involved. The actual memory specific bit is nearly identical, but the header section isn't as it gives the CCIX agent ID (see below given you asked a more specific question on that). We did discuss this in a lot of detail and currently the answer is that the UEFI spec + ccix allows you to report only the ccix error if you want to. Note that I believe there is an OSC bit that I'm not currently setting so right now no firmware compliant with the CCIX SW Guide (information available from your local Charles) should actually be generating these anyway because the OS hasn't said it can handle them. I took the view I wanted to get round 2 which actually hooks up handling rather than just reporting before setting that bit. > Is memory described like this never present in the UEFI memory map? (so we > never > accidentally use it as host memory, for processes etc) Yes it definitely can. If you want to you can have all your RAM attached by CCIX. (probably not a good idea performance wise :) More likely in the short term is SCM using something like the hmem patches to hotplug it as if it were RAM. https://patchwork.kernel.org/cover/10982677/ In order to support legacy OS, it wouldn't be unusual to have a bios switch to make SCM on CCIX just appear as normal memory in the first place (if that was true though it is highly unlikely you'd use the CCIX PER messages.) > > ~ > > include/linux/memremap.h has: > | * Device memory that is cache coherent from device and CPU point of view. > This > | * is use on platform that have an advance system bus (like CAPI or CCIX). A > | * driver can hotplug the device memory using ZONE_DEVICE and with that > memory > | * type. Any page of a process can be migrated to such memory. However no > one > | * should be allow to pin such memory so that it can always be evicted. > > Is this the same CCIX? Not always, but sometimes. Typically if it's on an accelerator (GPU etc) then one software model is to do that. There are others though. Until we have more devices out there, its not even clear that model with dominate. It's a much argued about topic. My personal thoughts are you either have: 1. An accelerator that was designed from the ground up assuming that it had tight control of its own memory layout etc - someone bolted on coherency later. This is the current GPU model and fits with ZONE_DEVICE etc. GPUs can and do require shuffling memory. 2. An accelerator designed to run without it's own memory. We just decided to move some of the host memory to it's end of the bus for size or efficiency reasons. As it's reasonably happy with fragmentation and the assumption that someone else is doing it's memory management for it (the OS) you can run in an SVM type mode, with a bit of hinting by a userspace app on where it would prefer the accelerator accessed memory comes from. Those have no more constraints than any other memory. If you want to pin for RDMA for example, its fine. More importantly, it can just be normal DDR on a card with no extra functions. In those cases it's just a memory only NUMA node. > > If process memory can be migrated onto this stuff we need to kick off > memory_failure() in > response to memory errors, otherwise we don't mark the pages as poison, > signal user-space etc. Agreed. That was in the follow up patch set (which I haven't written yet) that actually does some error handling. I need to do some work on the emulation side to actually be able to generate sensible errors of that type. I'll stick some mention of that in the v2 cover letter. > > > > *
Re: [RFC PATCH 0/6] CCIX Protocol Error reporting
Hi Jonathan, (Beware: this CCIX thing is new to me, some of this will be questions on the scope of CCIX) On 06/06/2019 13:36, Jonathan Cameron wrote: > UEFI 2.8 defines a new CPER record Appendix N for CCIX Protocol Error Records > (PER). www.uefi.org > These include Protocol Error Record logs which are defined in the > CCIX 1.0 Base Specification www.ccixconsortium.com. I can't find a public version of this spec, and I don't have an account to look at the private one. (It looks like I could get one but I'm guessing there is an NDA between me and the document text!) Will it ever be public? > Handling of coherency protocol errors is complex and how Linux does this > will take some time to evolve. For now, fatal errors are handled via the > usual means and everything else is reported. > There are 6 types of error defined, covering: > * Memory errors How does this interact with the vanilla Memory-Error CPER records? Do we always get them as a pair? (that struct definition looks familiar...) Is memory described like this never present in the UEFI memory map? (so we never accidentally use it as host memory, for processes etc) ~ include/linux/memremap.h has: | * Device memory that is cache coherent from device and CPU point of view. This | * is use on platform that have an advance system bus (like CAPI or CCIX). A | * driver can hotplug the device memory using ZONE_DEVICE and with that memory | * type. Any page of a process can be migrated to such memory. However no one | * should be allow to pin such memory so that it can always be evicted. Is this the same CCIX? If process memory can be migrated onto this stuff we need to kick off memory_failure() in response to memory errors, otherwise we don't mark the pages as poison, signal user-space etc. > * Cache errors > * Address translation unit errors > * CCIX port errors > * CCIX link errors It looks like this stuff operates 'over' PCIe[0]. How does this interact with AER? Will we always get a vanilla PCIE-AER CPER in addition to these two? (I see 'CHECK THE AER EQUIVALENT' comments in your series, are these TODO:?) > * Agent internal errors. > > The set includes tracepoints to report the errors to RAS Daemon and a patch > set for RAS Daemon will follow shortly. > > There are several open questions for this RFC. > 1. Reporting of vendor data. We have little choice but to do this via a >dynamic array as these blocks can take arbitrary size. I had hoped >no one would actually use these given the odd mismatch between a >standard error structure and non standard element, but there are >already designs out there that do use it. I think its okay to spit these blobs out of the trace points, but could we avoid printing them in the kernel log? > 2. The trade off between explicit tracepoint fields, on which we might >want to filter, and the simplicity of a blob. I have gone for having >the whole of the block specific to the PER error type in an opaque blob. >Perhaps this is not the right balance? (I suspect I don't understand this). The filtering can be done by user-space. Isn't that enough? I see the memory-error event format file has 'pa', which is all anyone is likely to care about. Do 'source/component' indicate the device? (what do I pull out of the machine to stop these happening?) > 3. Whether defining 6 new tracepoints is sensible. I think it is: >* They are all defined by the CCIX specification as independant error > classes. >* Many of them can only be generated by particular types of agent. >* The handling required will vary widely depending on types. > In the kernel some map cleanly onto existing handling. Keeping the > whole flow separate will aide this. They vary by a similar amount > in scope to the RAS errors found on an existing system which have > independent tracepoints. >* Separating them out allows for filtering on the tracepoints by > elements that are not shared between them. >* Muxing the lot into one record type can lead to ugly code both in > kernel and in userspace. Sold! > This patch is being distributed by the CCIX Consortium, Inc. (CCIX) to > you and other parties that are paticipating (the "participants") in the > Linux kernel with the understanding that the participants will use CCIX's > name and trademark only when this patch is used in association with the > Linux kernel and associated user space. > > CCIX is also distributing this patch to these participants with the > understanding that if any portion of the CCIX specification will be > used or referenced in the Linux kernel, the participants will not modify > the cited portion of the CCIX specification and will give CCIX propery > copyright attribution by including the following copyright notice with > the cited part of the CCIX specification: > "© 2019 CCIX CONSORTIUM, INC. ALL RIGHTS RESERVED." Is this text permission from the CCIX-Consortium to
Re: [RFC PATCH 0/6] CCIX Protocol Error reporting
On Thu, 6 Jun 2019 20:36:48 +0800 Jonathan Cameron wrote: Hi All, I'm looking for some reviews on this series if anyone has time to take a look. Rasdaemon patches to match with this are on linux-edac but are waiting on the tracepoints merging. I'm not currently planning to upstream the qemu injection patches used to test this but anyone would like those I can certainly put a public branch up somewhere. Thanks, Jonathan > UEFI 2.8 defines a new CPER record Appendix N for CCIX Protocol Error Records > (PER). www.uefi.org > > These include Protocol Error Record logs which are defined in the > CCIX 1.0 Base Specification www.ccixconsortium.com. > > Handling of coherency protocol errors is complex and how Linux does this > will take some time to evolve. For now, fatal errors are handled via the > usual means and everything else is reported. > > There are 6 types of error defined, covering: > * Memory errors > * Cache errors > * Address translation unit errors > * CCIX port errors > * CCIX link errors > * Agent internal errors. > > The set includes tracepoints to report the errors to RAS Daemon and a patch > set for RAS Daemon will follow shortly. > > There are several open questions for this RFC. > 1. Reporting of vendor data. We have little choice but to do this via a >dynamic array as these blocks can take arbitrary size. I had hoped >no one would actually use these given the odd mismatch between a >standard error structure and non standard element, but there are >already designs out there that do use it. > 2. The trade off between explicit tracepoint fields, on which we might >want to filter, and the simplicity of a blob. I have gone for having >the whole of the block specific to the PER error type in an opaque blob. >Perhaps this is not the right balance? > 3. Whether defining 6 new tracepoints is sensible. I think it is: >* They are all defined by the CCIX specification as independant error > classes. >* Many of them can only be generated by particular types of agent. >* The handling required will vary widely depending on types. > In the kernel some map cleanly onto existing handling. Keeping the > whole flow separate will aide this. They vary by a similar amount > in scope to the RAS errors found on an existing system which have > independent tracepoints. >* Separating them out allows for filtering on the tracepoints by > elements that are not shared between them. >* Muxing the lot into one record type can lead to ugly code both in > kernel and in userspace. > > Rasdaemon patches will follow shortly. > > This patch is being distributed by the CCIX Consortium, Inc. (CCIX) to > you and other parties that are paticipating (the "participants") in the > Linux kernel with the understanding that the participants will use CCIX's > name and trademark only when this patch is used in association with the > Linux kernel and associated user space. > > CCIX is also distributing this patch to these participants with the > understanding that if any portion of the CCIX specification will be > used or referenced in the Linux kernel, the participants will not modify > the cited portion of the CCIX specification and will give CCIX propery > copyright attribution by including the following copyright notice with > the cited part of the CCIX specification: > "© 2019 CCIX CONSORTIUM, INC. ALL RIGHTS RESERVED." > > Jonathan Cameron (6): > efi / ras: CCIX Memory error reporting > efi / ras: CCIX Cache error reporting > efi / ras: CCIX Address Translation Cache error reporting > efi / ras: CCIX Port error reporting > efi / ras: CCIX Link error reporting > efi / ras: CCIX Agent internal error reporting > > drivers/acpi/apei/Kconfig| 8 + > drivers/acpi/apei/ghes.c | 59 ++ > drivers/firmware/efi/Kconfig | 5 + > drivers/firmware/efi/Makefile| 1 + > drivers/firmware/efi/cper-ccix.c | 916 +++ > drivers/firmware/efi/cper.c | 6 + > include/linux/cper.h | 333 +++ > include/ras/ras_event.h | 405 ++ > 8 files changed, 1733 insertions(+) > create mode 100644 drivers/firmware/efi/cper-ccix.c >
[RFC PATCH 0/6] CCIX Protocol Error reporting
UEFI 2.8 defines a new CPER record Appendix N for CCIX Protocol Error Records (PER). www.uefi.org These include Protocol Error Record logs which are defined in the CCIX 1.0 Base Specification www.ccixconsortium.com. Handling of coherency protocol errors is complex and how Linux does this will take some time to evolve. For now, fatal errors are handled via the usual means and everything else is reported. There are 6 types of error defined, covering: * Memory errors * Cache errors * Address translation unit errors * CCIX port errors * CCIX link errors * Agent internal errors. The set includes tracepoints to report the errors to RAS Daemon and a patch set for RAS Daemon will follow shortly. There are several open questions for this RFC. 1. Reporting of vendor data. We have little choice but to do this via a dynamic array as these blocks can take arbitrary size. I had hoped no one would actually use these given the odd mismatch between a standard error structure and non standard element, but there are already designs out there that do use it. 2. The trade off between explicit tracepoint fields, on which we might want to filter, and the simplicity of a blob. I have gone for having the whole of the block specific to the PER error type in an opaque blob. Perhaps this is not the right balance? 3. Whether defining 6 new tracepoints is sensible. I think it is: * They are all defined by the CCIX specification as independant error classes. * Many of them can only be generated by particular types of agent. * The handling required will vary widely depending on types. In the kernel some map cleanly onto existing handling. Keeping the whole flow separate will aide this. They vary by a similar amount in scope to the RAS errors found on an existing system which have independent tracepoints. * Separating them out allows for filtering on the tracepoints by elements that are not shared between them. * Muxing the lot into one record type can lead to ugly code both in kernel and in userspace. Rasdaemon patches will follow shortly. This patch is being distributed by the CCIX Consortium, Inc. (CCIX) to you and other parties that are paticipating (the "participants") in the Linux kernel with the understanding that the participants will use CCIX's name and trademark only when this patch is used in association with the Linux kernel and associated user space. CCIX is also distributing this patch to these participants with the understanding that if any portion of the CCIX specification will be used or referenced in the Linux kernel, the participants will not modify the cited portion of the CCIX specification and will give CCIX propery copyright attribution by including the following copyright notice with the cited part of the CCIX specification: "© 2019 CCIX CONSORTIUM, INC. ALL RIGHTS RESERVED." Jonathan Cameron (6): efi / ras: CCIX Memory error reporting efi / ras: CCIX Cache error reporting efi / ras: CCIX Address Translation Cache error reporting efi / ras: CCIX Port error reporting efi / ras: CCIX Link error reporting efi / ras: CCIX Agent internal error reporting drivers/acpi/apei/Kconfig| 8 + drivers/acpi/apei/ghes.c | 59 ++ drivers/firmware/efi/Kconfig | 5 + drivers/firmware/efi/Makefile| 1 + drivers/firmware/efi/cper-ccix.c | 916 +++ drivers/firmware/efi/cper.c | 6 + include/linux/cper.h | 333 +++ include/ras/ras_event.h | 405 ++ 8 files changed, 1733 insertions(+) create mode 100644 drivers/firmware/efi/cper-ccix.c -- 2.20.1