Re: [RFC PATCH 0/6] CCIX Protocol Error reporting

2019-08-06 Thread Jonathan Cameron
On Wed, 3 Jul 2019 14:08:49 +0100
Jonathan Cameron  wrote:

> On Wed, 3 Jul 2019 10:28:08 +0100
> James Morse  wrote:
> 
> > Hi Jonathan,
> > 
> > (Beware: this CCIX thing is new to me, some of this will be questions on 
> > the scope of CCIX)  
> 
> Sure and welcome to our crazy world.
> 
> > 
> > On 06/06/2019 13:36, Jonathan Cameron wrote:  
> > > UEFI 2.8 defines a new CPER record Appendix N for CCIX Protocol Error 
> > > Records
> > > (PER). www.uefi.org
> >   
> > > These include Protocol Error Record logs which are defined in the
> > > CCIX 1.0 Base Specification www.ccixconsortium.com.
> > 
> > I can't find a public version of this spec, and I don't have an account to 
> > look at the
> > private one. (It looks like I could get one but I'm guessing there is an 
> > NDA between me
> > and the document text!)
> > 
> > Will it ever be public?  
> 
> +cc A few people who might be able to help.

Evaluation version of the CCIX 1.0a base specification now available,
(though there is a form to complete and license agreement)..

https://www.ccixconsortium.com/ccix-library/download-form/


> 
> > 
> >   
> > > Handling of coherency protocol errors is complex and how Linux does this
> > > will take some time to evolve.  For now, fatal errors are handled via the
> > > usual means and everything else is reported.
> >   
> > > There are 6 types of error defined, covering:
> > > * Memory errors
> > 
> > How does this interact with the vanilla Memory-Error CPER records? Do we 
> > always get them
> > as a pair? (that struct definition looks familiar...)  
> 
> The snag is that the standard memory error doesn't provide quite enough 
> information to
> actually identify a component once you have buses involved.
> The actual memory specific bit is nearly identical, but the header section 
> isn't
> as it gives the CCIX agent ID (see below given you asked a more specific 
> question
> on that).
> 
> We did discuss this in a lot of detail and currently the answer is that the
> UEFI spec + ccix allows you to report only the ccix error if you want to.
> 
> Note that I believe there is an OSC bit that I'm not currently setting so
> right now no firmware compliant with the CCIX SW Guide (information available
> from your local Charles) should actually be generating these anyway because
> the OS hasn't said it can handle them.  I took the view I wanted to get
> round 2 which actually hooks up handling rather than just reporting before
> setting that bit.
> 
> > Is memory described like this never present in the UEFI memory map? (so we 
> > never
> > accidentally use it as host memory, for processes etc)  
> Yes it definitely can.  If you want to you can have all your RAM attached by 
> CCIX.
> (probably not a good idea performance wise :)
> 
> More likely in the short term is SCM using something like the hmem patches
> to hotplug it as if it were RAM.
> https://patchwork.kernel.org/cover/10982677/
> 
> In order to support legacy OS, it wouldn't be unusual to have a bios switch
> to make SCM on CCIX just appear as normal memory in the first place
> (if that was true though it is highly unlikely you'd use the CCIX
> PER messages.)
> 
> > 
> > ~
> > 
> > include/linux/memremap.h has:
> > |  * Device memory that is cache coherent from device and CPU point of 
> > view. This
> > |  * is use on platform that have an advance system bus (like CAPI or 
> > CCIX). A
> > |  * driver can hotplug the device memory using ZONE_DEVICE and with that 
> > memory
> > |  * type. Any page of a process can be migrated to such memory. However no 
> > one
> > |  * should be allow to pin such memory so that it can always be evicted.
> > 
> > Is this the same CCIX?  
> 
> Not always, but sometimes.  Typically if it's on an accelerator (GPU etc) then
> one software model is to do that.  There are others though. Until we have
> more devices out there, its not even clear that model with dominate.
> 
> It's a much argued about topic. My personal thoughts are you either have:
> 1. An accelerator that was designed from the ground up assuming that it had
>tight control of its own memory layout etc - someone bolted on coherency
>later.  This is the current GPU model and fits with ZONE_DEVICE etc. GPUs
>can and do require shuffling memory.
> 
> 2. An accelerator designed to run without it's own memory.  We just decided
>to move some of the host memory to it's end of the bus for size or
>efficiency reasons.  As it's reasonably happy with fragmentation and
>the assumption that someone else is doing it's memory management for it
>(the OS) you can run in an SVM type mode, with a bit of hinting by
>a userspace app on where it would prefer the accelerator accessed memory
>comes from.  Those have no more constraints than any other memory.  If
>you want to pin for RDMA for example, its fine.
> 
> More importantly, it can just be normal DDR on a card with no extra functions.
> In those cases it's just a memory 

Re: [RFC PATCH 0/6] CCIX Protocol Error reporting

2019-07-03 Thread Jonathan Cameron
On Wed, 3 Jul 2019 10:28:08 +0100
James Morse  wrote:

> Hi Jonathan,
> 
> (Beware: this CCIX thing is new to me, some of this will be questions on the 
> scope of CCIX)

Sure and welcome to our crazy world.

> 
> On 06/06/2019 13:36, Jonathan Cameron wrote:
> > UEFI 2.8 defines a new CPER record Appendix N for CCIX Protocol Error 
> > Records
> > (PER). www.uefi.org  
> 
> > These include Protocol Error Record logs which are defined in the
> > CCIX 1.0 Base Specification www.ccixconsortium.com.  
> 
> I can't find a public version of this spec, and I don't have an account to 
> look at the
> private one. (It looks like I could get one but I'm guessing there is an NDA 
> between me
> and the document text!)
> 
> Will it ever be public?

+cc A few people who might be able to help.

> 
> 
> > Handling of coherency protocol errors is complex and how Linux does this
> > will take some time to evolve.  For now, fatal errors are handled via the
> > usual means and everything else is reported.  
> 
> > There are 6 types of error defined, covering:
> > * Memory errors  
> 
> How does this interact with the vanilla Memory-Error CPER records? Do we 
> always get them
> as a pair? (that struct definition looks familiar...)

The snag is that the standard memory error doesn't provide quite enough 
information to
actually identify a component once you have buses involved.
The actual memory specific bit is nearly identical, but the header section isn't
as it gives the CCIX agent ID (see below given you asked a more specific 
question
on that).

We did discuss this in a lot of detail and currently the answer is that the
UEFI spec + ccix allows you to report only the ccix error if you want to.

Note that I believe there is an OSC bit that I'm not currently setting so
right now no firmware compliant with the CCIX SW Guide (information available
from your local Charles) should actually be generating these anyway because
the OS hasn't said it can handle them.  I took the view I wanted to get
round 2 which actually hooks up handling rather than just reporting before
setting that bit.

> Is memory described like this never present in the UEFI memory map? (so we 
> never
> accidentally use it as host memory, for processes etc)
Yes it definitely can.  If you want to you can have all your RAM attached by 
CCIX.
(probably not a good idea performance wise :)

More likely in the short term is SCM using something like the hmem patches
to hotplug it as if it were RAM.
https://patchwork.kernel.org/cover/10982677/

In order to support legacy OS, it wouldn't be unusual to have a bios switch
to make SCM on CCIX just appear as normal memory in the first place
(if that was true though it is highly unlikely you'd use the CCIX
PER messages.)

> 
> ~
> 
> include/linux/memremap.h has:
> |  * Device memory that is cache coherent from device and CPU point of view. 
> This
> |  * is use on platform that have an advance system bus (like CAPI or CCIX). A
> |  * driver can hotplug the device memory using ZONE_DEVICE and with that 
> memory
> |  * type. Any page of a process can be migrated to such memory. However no 
> one
> |  * should be allow to pin such memory so that it can always be evicted.
> 
> Is this the same CCIX?

Not always, but sometimes.  Typically if it's on an accelerator (GPU etc) then
one software model is to do that.  There are others though. Until we have
more devices out there, its not even clear that model with dominate.

It's a much argued about topic. My personal thoughts are you either have:
1. An accelerator that was designed from the ground up assuming that it had
   tight control of its own memory layout etc - someone bolted on coherency
   later.  This is the current GPU model and fits with ZONE_DEVICE etc. GPUs
   can and do require shuffling memory.

2. An accelerator designed to run without it's own memory.  We just decided
   to move some of the host memory to it's end of the bus for size or
   efficiency reasons.  As it's reasonably happy with fragmentation and
   the assumption that someone else is doing it's memory management for it
   (the OS) you can run in an SVM type mode, with a bit of hinting by
   a userspace app on where it would prefer the accelerator accessed memory
   comes from.  Those have no more constraints than any other memory.  If
   you want to pin for RDMA for example, its fine.

More importantly, it can just be normal DDR on a card with no extra functions.
In those cases it's just a memory only NUMA node.

> 
> If process memory can be migrated onto this stuff we need to kick off 
> memory_failure() in
> response to memory errors, otherwise we don't mark the pages as poison, 
> signal user-space etc.

Agreed.  That was in the follow up patch set (which I haven't written yet)
that actually does some error handling.  I need to do some work on the
emulation side to actually be able to generate sensible errors of that type.

I'll stick some mention of that in the v2 cover letter.
> 
> 
> > * 

Re: [RFC PATCH 0/6] CCIX Protocol Error reporting

2019-07-03 Thread James Morse
Hi Jonathan,

(Beware: this CCIX thing is new to me, some of this will be questions on the 
scope of CCIX)

On 06/06/2019 13:36, Jonathan Cameron wrote:
> UEFI 2.8 defines a new CPER record Appendix N for CCIX Protocol Error Records
> (PER). www.uefi.org

> These include Protocol Error Record logs which are defined in the
> CCIX 1.0 Base Specification www.ccixconsortium.com.

I can't find a public version of this spec, and I don't have an account to look 
at the
private one. (It looks like I could get one but I'm guessing there is an NDA 
between me
and the document text!)

Will it ever be public?


> Handling of coherency protocol errors is complex and how Linux does this
> will take some time to evolve.  For now, fatal errors are handled via the
> usual means and everything else is reported.

> There are 6 types of error defined, covering:
> * Memory errors

How does this interact with the vanilla Memory-Error CPER records? Do we always 
get them
as a pair? (that struct definition looks familiar...)

Is memory described like this never present in the UEFI memory map? (so we never
accidentally use it as host memory, for processes etc)

~

include/linux/memremap.h has:
|  * Device memory that is cache coherent from device and CPU point of view. 
This
|  * is use on platform that have an advance system bus (like CAPI or CCIX). A
|  * driver can hotplug the device memory using ZONE_DEVICE and with that memory
|  * type. Any page of a process can be migrated to such memory. However no one
|  * should be allow to pin such memory so that it can always be evicted.

Is this the same CCIX?

If process memory can be migrated onto this stuff we need to kick off 
memory_failure() in
response to memory errors, otherwise we don't mark the pages as poison, signal 
user-space etc.


> * Cache errors
> * Address translation unit errors

> * CCIX port errors 
> * CCIX link errors

It looks like this stuff operates 'over' PCIe[0]. How does this interact with 
AER?
Will we always get a vanilla PCIE-AER CPER in addition to these two?

(I see 'CHECK THE AER EQUIVALENT' comments in your series, are these TODO:?)


> * Agent internal errors.
> 
> The set includes tracepoints to report the errors to RAS Daemon and a patch
> set for RAS Daemon will follow shortly.
> 
> There are several open questions for this RFC.
> 1. Reporting of vendor data.  We have little choice but to do this via a
>dynamic array as these blocks can take arbitrary size. I had hoped
>no one would actually use these given the odd mismatch between a
>standard error structure and non standard element, but there are
>already designs out there that do use it.

I think its okay to spit these blobs out of the trace points, but could we 
avoid printing
them in the kernel log?


> 2. The trade off between explicit tracepoint fields, on which we might
>want to filter, and the simplicity of a blob. I have gone for having
>the whole of the block specific to the PER error type in an opaque blob.
>Perhaps this is not the right balance?

(I suspect I don't understand this).

The filtering can be done by user-space. Isn't that enough?

I see the memory-error event format file has 'pa', which is all anyone is 
likely to care
about.
Do 'source/component' indicate the device? (what do I pull out of the machine 
to stop
these happening?)


> 3. Whether defining 6 new tracepoints is sensible. I think it is:
>* They are all defined by the CCIX specification as independant error
>  classes.
>* Many of them can only be generated by particular types of agent.
>* The handling required will vary widely depending on types.
>  In the kernel some map cleanly onto existing handling. Keeping the
>  whole flow separate will aide this. They vary by a similar amount
>  in scope to the RAS errors found on an existing system which have
>  independent tracepoints.
>* Separating them out allows for filtering on the tracepoints by
>  elements that are not shared between them.
>* Muxing the lot into one record type can lead to ugly code both in
>  kernel and in userspace.

Sold!


> This patch is being distributed by the CCIX Consortium, Inc. (CCIX) to
> you and other parties that are paticipating (the "participants") in the
> Linux kernel with the understanding that the participants will use CCIX's
> name and trademark only when this patch is used in association with the
> Linux kernel and associated user space.
> 
> CCIX is also distributing this patch to these participants with the
> understanding that if any portion of the CCIX specification will be
> used or referenced in the Linux kernel, the participants will not modify
> the cited portion of the CCIX specification and will give CCIX propery
> copyright attribution by including the following copyright notice with
> the cited part of the CCIX specification:
> "© 2019 CCIX CONSORTIUM, INC. ALL RIGHTS RESERVED."

Is this text permission from the CCIX-Consortium to 

Re: [RFC PATCH 0/6] CCIX Protocol Error reporting

2019-06-25 Thread Jonathan Cameron
On Thu, 6 Jun 2019 20:36:48 +0800
Jonathan Cameron  wrote:

Hi All,

I'm looking for some reviews on this series if anyone has time to take
a look.  Rasdaemon patches to match with this are on linux-edac but
are waiting on the tracepoints merging.

I'm not currently planning to upstream the qemu injection patches
used to test this but anyone would like those I can certainly put
a public branch up somewhere.

Thanks,

Jonathan

> UEFI 2.8 defines a new CPER record Appendix N for CCIX Protocol Error Records
> (PER). www.uefi.org
> 
> These include Protocol Error Record logs which are defined in the
> CCIX 1.0 Base Specification www.ccixconsortium.com.
> 
> Handling of coherency protocol errors is complex and how Linux does this
> will take some time to evolve.  For now, fatal errors are handled via the
> usual means and everything else is reported.
> 
> There are 6 types of error defined, covering:
> * Memory errors
> * Cache errors
> * Address translation unit errors
> * CCIX port errors 
> * CCIX link errors
> * Agent internal errors.
> 
> The set includes tracepoints to report the errors to RAS Daemon and a patch
> set for RAS Daemon will follow shortly.
> 
> There are several open questions for this RFC.
> 1. Reporting of vendor data.  We have little choice but to do this via a
>dynamic array as these blocks can take arbitrary size. I had hoped
>no one would actually use these given the odd mismatch between a
>standard error structure and non standard element, but there are
>already designs out there that do use it.
> 2. The trade off between explicit tracepoint fields, on which we might
>want to filter, and the simplicity of a blob. I have gone for having
>the whole of the block specific to the PER error type in an opaque blob.
>Perhaps this is not the right balance?
> 3. Whether defining 6 new tracepoints is sensible. I think it is:
>* They are all defined by the CCIX specification as independant error
>  classes.
>* Many of them can only be generated by particular types of agent.
>* The handling required will vary widely depending on types.
>  In the kernel some map cleanly onto existing handling. Keeping the
>  whole flow separate will aide this. They vary by a similar amount
>  in scope to the RAS errors found on an existing system which have
>  independent tracepoints.
>* Separating them out allows for filtering on the tracepoints by
>  elements that are not shared between them.
>* Muxing the lot into one record type can lead to ugly code both in
>  kernel and in userspace.
> 
> Rasdaemon patches will follow shortly.
> 
> This patch is being distributed by the CCIX Consortium, Inc. (CCIX) to
> you and other parties that are paticipating (the "participants") in the
> Linux kernel with the understanding that the participants will use CCIX's
> name and trademark only when this patch is used in association with the
> Linux kernel and associated user space.
> 
> CCIX is also distributing this patch to these participants with the
> understanding that if any portion of the CCIX specification will be
> used or referenced in the Linux kernel, the participants will not modify
> the cited portion of the CCIX specification and will give CCIX propery
> copyright attribution by including the following copyright notice with
> the cited part of the CCIX specification:
> "© 2019 CCIX CONSORTIUM, INC. ALL RIGHTS RESERVED."
> 
> Jonathan Cameron (6):
>   efi / ras: CCIX Memory error reporting
>   efi / ras: CCIX Cache error reporting
>   efi / ras: CCIX Address Translation Cache error reporting
>   efi / ras: CCIX Port error reporting
>   efi / ras: CCIX Link error reporting
>   efi / ras: CCIX Agent internal error reporting
> 
>  drivers/acpi/apei/Kconfig|   8 +
>  drivers/acpi/apei/ghes.c |  59 ++
>  drivers/firmware/efi/Kconfig |   5 +
>  drivers/firmware/efi/Makefile|   1 +
>  drivers/firmware/efi/cper-ccix.c | 916 +++
>  drivers/firmware/efi/cper.c  |   6 +
>  include/linux/cper.h | 333 +++
>  include/ras/ras_event.h  | 405 ++
>  8 files changed, 1733 insertions(+)
>  create mode 100644 drivers/firmware/efi/cper-ccix.c
> 




[RFC PATCH 0/6] CCIX Protocol Error reporting

2019-06-06 Thread Jonathan Cameron
UEFI 2.8 defines a new CPER record Appendix N for CCIX Protocol Error Records
(PER). www.uefi.org

These include Protocol Error Record logs which are defined in the
CCIX 1.0 Base Specification www.ccixconsortium.com.

Handling of coherency protocol errors is complex and how Linux does this
will take some time to evolve.  For now, fatal errors are handled via the
usual means and everything else is reported.

There are 6 types of error defined, covering:
* Memory errors
* Cache errors
* Address translation unit errors
* CCIX port errors 
* CCIX link errors
* Agent internal errors.

The set includes tracepoints to report the errors to RAS Daemon and a patch
set for RAS Daemon will follow shortly.

There are several open questions for this RFC.
1. Reporting of vendor data.  We have little choice but to do this via a
   dynamic array as these blocks can take arbitrary size. I had hoped
   no one would actually use these given the odd mismatch between a
   standard error structure and non standard element, but there are
   already designs out there that do use it.
2. The trade off between explicit tracepoint fields, on which we might
   want to filter, and the simplicity of a blob. I have gone for having
   the whole of the block specific to the PER error type in an opaque blob.
   Perhaps this is not the right balance?
3. Whether defining 6 new tracepoints is sensible. I think it is:
   * They are all defined by the CCIX specification as independant error
 classes.
   * Many of them can only be generated by particular types of agent.
   * The handling required will vary widely depending on types.
 In the kernel some map cleanly onto existing handling. Keeping the
 whole flow separate will aide this. They vary by a similar amount
 in scope to the RAS errors found on an existing system which have
 independent tracepoints.
   * Separating them out allows for filtering on the tracepoints by
 elements that are not shared between them.
   * Muxing the lot into one record type can lead to ugly code both in
 kernel and in userspace.

Rasdaemon patches will follow shortly.

This patch is being distributed by the CCIX Consortium, Inc. (CCIX) to
you and other parties that are paticipating (the "participants") in the
Linux kernel with the understanding that the participants will use CCIX's
name and trademark only when this patch is used in association with the
Linux kernel and associated user space.

CCIX is also distributing this patch to these participants with the
understanding that if any portion of the CCIX specification will be
used or referenced in the Linux kernel, the participants will not modify
the cited portion of the CCIX specification and will give CCIX propery
copyright attribution by including the following copyright notice with
the cited part of the CCIX specification:
"© 2019 CCIX CONSORTIUM, INC. ALL RIGHTS RESERVED."

Jonathan Cameron (6):
  efi / ras: CCIX Memory error reporting
  efi / ras: CCIX Cache error reporting
  efi / ras: CCIX Address Translation Cache error reporting
  efi / ras: CCIX Port error reporting
  efi / ras: CCIX Link error reporting
  efi / ras: CCIX Agent internal error reporting

 drivers/acpi/apei/Kconfig|   8 +
 drivers/acpi/apei/ghes.c |  59 ++
 drivers/firmware/efi/Kconfig |   5 +
 drivers/firmware/efi/Makefile|   1 +
 drivers/firmware/efi/cper-ccix.c | 916 +++
 drivers/firmware/efi/cper.c  |   6 +
 include/linux/cper.h | 333 +++
 include/ras/ras_event.h  | 405 ++
 8 files changed, 1733 insertions(+)
 create mode 100644 drivers/firmware/efi/cper-ccix.c

-- 
2.20.1