Re: [PATCH V15 04/11] efi: parse ARM processor error

2017-04-25 Thread Borislav Petkov
On Tue, Apr 25, 2017 at 10:05:31AM -0600, Baicar, Tyler wrote:
> That seems like something that should be done outside of these patches (if
> added to the kernel at all). The decoding for this information would all be
> vendor specific, so I'm not sure if we want to pollute the EFI code with
> vendor specific error decoding. Currently we are using the RAS Daemon user
> space tool for the decoding of this information since vendors can easily
> pick up this tool and add an extension for their vendor specific parsing.
> These prints will only happen when the firmware supports the vendor specific
> error information anyway.

The same questions apply here: what do you do if the machine panics
because the error is fatal and you can't get it to run any userspace?
Wouldn't it be good to decode the whole error?

Right now we photograph screens on Intel x86 and feed the typed MCA info
by hand to mcelog. Hardly an optimal situation.

On AMD, there's a decoder which actually can dump the decoded critical
error. (Or could - that's in flux again :-\).

So yes, if stuff is too vendor-specific then you probably don't
want to decode it (or add a vendor-specific decoding module like
edac_mce_amd.ko, for example). But the tables from the UEFI spec,
documenting IP-specific error types which should be valid for most if
not all ARM64 implementations adhering to the spec, would be useful to
decode.

In general, the more we can decode the error in the kernel and the less
we need external help to do so, the better.
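As a rough illustration of decoding the spec-defined tables rather than dumping raw values, the sketch below maps the error-type field of the ARM processor error information structure to a string. It assumes the UEFI 2.6 encoding of that field (0 cache, 1 TLB, 2 bus, 3 micro-architectural, other values reserved). The table and function names are hypothetical, not existing kernel symbols, and the IP-specific bits of the 64-bit error information value are deliberately left undecoded here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical table: error-type encoding as we read UEFI 2.6, table 261. */
static const char * const arm_err_type_strs[] = {
	"cache error",
	"TLB error",
	"bus error",
	"micro-architectural error",
};

/* Return a human-readable name, or "unknown" for reserved values. */
const char *arm_err_type_str(uint8_t type)
{
	if (type < ARRAY_SIZE(arm_err_type_strs))
		return arm_err_type_strs[type];
	return "unknown";
}
```

The same table-driven pattern would extend to the per-type bit fields once their layouts are taken from the spec tables that follow table 261.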

Thanks.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH V15 04/11] efi: parse ARM processor error

2017-04-25 Thread Baicar, Tyler

On 4/24/2017 11:52 AM, Borislav Petkov wrote:
> On Fri, Apr 21, 2017 at 12:22:09PM -0600, Baicar, Tyler wrote:
>> I guess it's not really needed. It just may be useful considering there can
>> be numerous error info structures, numerous context info structures, and a
>> variable length vendor information section. I can move this print to only in
>> the length check failure cases.
>
> And? Why does the user care?
>
> I mean, it is good for debugging when you wanna see you're parsing the
> error info data properly but otherwise it doesn't improve the error
> reporting one bit.

I'll move this to just happen when the length check fails.

>> Because these are part of the error information structure. I wouldn't think
>> FW would populate error information structures that are different versions
>> in the same processor error, but it could be possible from the spec (at
>> least once there are different versions of the table).
>
> Same argument as above.

I can remove it then.

>> There is an error information 64 bit value in the ARM processor error
>> information structure. (UEFI spec 2.6 table 261)
>
> So that's IP-dependent and explained in the following tables. Any plans
> on decoding that too?

Yes, I do plan on adding further decoding for these values in the future.

>> Why's that? Dumping this vendor specific error information is similar to the
>> unrecognized CPER section reporting which is also meant for vendor specific
>> information https://lkml.org/lkml/2017/4/18/751
>
> And how do those naked bytes help the user understand the error happening?
>
> Even in your example you have:
>
> [  140.739210] {1}[Hardware Error]:   : 4d415201 4d492031 453a4d45 435f4343  .RAM1 IMEM:ECC_C
> [  140.739214] {1}[Hardware Error]:   0010: 53515f45 44525f42                    E_QSB_RD
>
> Which looks like some correctable ECC DRAM error and is actually begging
> to be decoded in a human-readable form. So let's do that completely and
> not dump partially decoded information.

That seems like something that should be done outside of these patches
(if added to the kernel at all). The decoding for this information would
all be vendor specific, so I'm not sure if we want to pollute the EFI
code with vendor specific error decoding. Currently we are using the RAS
Daemon user space tool for the decoding of this information since
vendors can easily pick up this tool and add an extension for their
vendor specific parsing. These prints will only happen when the firmware
supports the vendor specific error information anyway.


Thanks,
Tyler

--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.



Re: [PATCH V15 04/11] efi: parse ARM processor error

2017-04-24 Thread Borislav Petkov
On Fri, Apr 21, 2017 at 12:22:09PM -0600, Baicar, Tyler wrote:
> I guess it's not really needed. It just may be useful considering there can
> be numerous error info structures, numerous context info structures, and a
> variable length vendor information section. I can move this print to only in
> the length check failure cases.

And? Why does the user care?

I mean, it is good for debugging when you wanna see you're parsing the
error info data properly but otherwise it doesn't improve the error
reporting one bit.

> Because these are part of the error information structure. I wouldn't think
> FW would populate error information structures that are different versions
> in the same processor error, but it could be possible from the spec (at
> least once there are different versions of the table).

Same argument as above.

> There is an error information 64 bit value in the ARM processor error
> information structure. (UEFI spec 2.6 table 261)

So that's IP-dependent and explained in the following tables. Any plans
on decoding that too?

> Why's that? Dumping this vendor specific error information is similar to the
> unrecognized CPER section reporting which is also meant for vendor specific
> information https://lkml.org/lkml/2017/4/18/751

And how do those naked bytes help the user understand the error happening?

Even in your example you have:

[  140.739210] {1}[Hardware Error]:   : 4d415201 4d492031 453a4d45 435f4343  .RAM1 IMEM:ECC_C
[  140.739214] {1}[Hardware Error]:   0010: 53515f45 44525f42                    E_QSB_RD

Which looks like some correctable ECC DRAM error and is actually begging
to be decoded in a human-readable form. So let's do that completely and
not dump partially decoded information.
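To show that the dump above already carries readable text, the sketch below rebuilds the ASCII column from the 32-bit words. It assumes the words were printed as little-endian host values (as on a little-endian arm64 kernel), so the low byte of each word comes first in memory. words_to_ascii() is a hypothetical helper, not a kernel function.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: turn n 32-bit words, printed as little-endian host
 * values, back into the byte stream they came from, replacing
 * non-printable bytes with '.' like the ASCII column of a hex dump.
 * 'out' must hold at least 4 * n + 1 bytes.
 */
void words_to_ascii(const uint32_t *words, size_t n, char *out)
{
	size_t i, b;

	for (i = 0; i < n; i++) {
		/* Low byte first: that is its order in memory on little-endian. */
		for (b = 0; b < 4; b++) {
			unsigned char c = (words[i] >> (8 * b)) & 0xff;

			*out++ = isprint(c) ? (char)c : '.';
		}
	}
	*out = '\0';
}
```

Feeding it the two rows above reproduces ".RAM1 IMEM:ECC_C" and "E_QSB_RD"; interpreting that string is the part that still needs vendor knowledge.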

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH V15 04/11] efi: parse ARM processor error

2017-04-21 Thread Baicar, Tyler

On 4/21/2017 11:55 AM, Borislav Petkov wrote:
> On Tue, Apr 18, 2017 at 05:05:16PM -0600, Tyler Baicar wrote:
>> Add support for ARM Common Platform Error Record (CPER).
>> UEFI 2.6 specification adds support for ARM specific
>> processor error information to be reported as part of the
>> CPER records. This provides more detail for processor error logs.
>>
>> Signed-off-by: Tyler Baicar
>> CC: Jonathan (Zhixiong) Zhang
>> Reviewed-by: James Morse
>> Reviewed-by: Ard Biesheuvel
>> ---
>>  drivers/firmware/efi/cper.c | 135
>>  include/linux/cper.h        |  54 ++
>>  2 files changed, 189 insertions(+)
>>
>> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
>> index 46585f9..f959185 100644
>> --- a/drivers/firmware/efi/cper.c
>> +++ b/drivers/firmware/efi/cper.c
>> @@ -110,12 +110,15 @@ void cper_print_bits(const char *pfx, unsigned int bits,
>>  static const char * const proc_type_strs[] = {
>>     "IA32/X64",
>>     "IA64",
>> +   "ARM",
>>  };
>>
>>  static const char * const proc_isa_strs[] = {
>>     "IA32",
>>     "IA64",
>>     "X64",
>> +   "ARM A32/T32",
>> +   "ARM A64",
>>  };
>>
>>  static const char * const proc_error_type_strs[] = {
>> @@ -184,6 +187,128 @@ static void cper_print_proc_generic(const char *pfx,
>>     printk("%s""IP: 0x%016llx\n", pfx, proc->ip);
>>  }
>>
>> +#if defined(CONFIG_ARM64) || defined(CONFIG_ARM)
>> +static const char * const arm_reg_ctx_strs[] = {
>> +   "AArch32 general purpose registers",
>> +   "AArch32 EL1 context registers",
>> +   "AArch32 EL2 context registers",
>> +   "AArch32 secure context registers",
>> +   "AArch64 general purpose registers",
>> +   "AArch64 EL1 context registers",
>> +   "AArch64 EL2 context registers",
>> +   "AArch64 EL3 context registers",
>> +   "Misc. system register structure",
>> +};
>> +
>> +static void cper_print_proc_arm(const char *pfx,
>> +   const struct cper_sec_proc_arm *proc)
>> +{
>> +   int i, len, max_ctx_type;
>> +   struct cper_arm_err_info *err_info;
>> +   struct cper_arm_ctx_info *ctx_info;
>> +   char newpfx[64];
>> +
>> +   printk("%ssection length: %d\n", pfx, proc->section_length);
>
> We need to dump section length because?

I guess it's not really needed. It just may be useful considering there
can be numerous error info structures, numerous context info structures,
and a variable length vendor information section. I can move this print
to only in the length check failure cases.

>> +   printk("%sMIDR: 0x%016llx\n", pfx, proc->midr);
>> +
>> +   len = proc->section_length - (sizeof(*proc) +
>> +   proc->err_info_num * (sizeof(*err_info)));
>> +   if (len < 0) {
>> +   printk("%ssection length is too small\n", pfx);
>
> Now here we *can* dump it.
>
>> +   printk("%sfirmware-generated error record is incorrect\n", pfx);
>> +   printk("%sERR_INFO_NUM is %d\n", pfx, proc->err_info_num);
>> +   return;
>> +   }
>> +
>> +   if (proc->validation_bits & CPER_ARM_VALID_MPIDR)
>> +   printk("%sMPIDR: 0x%016llx\n", pfx, proc->mpidr);
>
> < newline here.
>
> Also, what is MPIDR and can it be written in a more user-friendly manner
> and not be an abbreviation?
>
>> +   if (proc->validation_bits & CPER_ARM_VALID_AFFINITY_LEVEL)
>> +   printk("%serror affinity level: %d\n", pfx,
>> +   proc->affinity_level);
>> +   if (proc->validation_bits & CPER_ARM_VALID_RUNNING_STATE) {
>> +   printk("%srunning state: 0x%x\n", pfx, proc->running_state);
>> +   printk("%sPSCI state: %d\n", pfx, proc->psci_state);
>
> One more abbreviation. Please consider whether having the abbreviations
> or actually writing them out is more user-friendly.
>
>> +   }
>> +
>> +   snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
>
> That INDENT_SP thing is just silly, someone should kill it.
>
>> +
>> +   err_info = (struct cper_arm_err_info *)(proc + 1);
>> +   for (i = 0; i < proc->err_info_num; i++) {
>> +   printk("%sError info structure %d:\n", pfx, i);
>> +   printk("%sversion:%d\n", newpfx, err_info->version);
>> +   printk("%slength:%d\n", newpfx, err_info->length);
>
> < newline here.
>
> Why do we even dump version and info for *every* err_info structure?

Because these are part of the error information structure. I wouldn't
think FW would populate error information structures that are different
versions in the same processor error, but it could be possible from the
spec (at least once there are different versions of the table).

>> +   if (err_info->validation_bits &
>> +   CPER_ARM_INFO_VALID_MULTI_ERR) {
>> +   if (err_info->multiple_error == 0)
>> +   printk("%ssingle error\n", newpfx);
>> +   else if (err_info->multiple_error == 1)
>> +   printk("%smultiple errors\n", newpfx);
>> +   else
>> +   printk("%smultiple 

Re: [PATCH V15 04/11] efi: parse ARM processor error

2017-04-21 Thread Borislav Petkov
On Tue, Apr 18, 2017 at 05:05:16PM -0600, Tyler Baicar wrote:
> Add support for ARM Common Platform Error Record (CPER).
> UEFI 2.6 specification adds support for ARM specific
> processor error information to be reported as part of the
> CPER records. This provides more detail for processor error logs.
> 
> Signed-off-by: Tyler Baicar 
> CC: Jonathan (Zhixiong) Zhang 
> Reviewed-by: James Morse 
> Reviewed-by: Ard Biesheuvel 
> ---
>  drivers/firmware/efi/cper.c | 135
>  include/linux/cper.h        |  54 ++
>  2 files changed, 189 insertions(+)
> 
> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index 46585f9..f959185 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -110,12 +110,15 @@ void cper_print_bits(const char *pfx, unsigned int bits,
>  static const char * const proc_type_strs[] = {
>   "IA32/X64",
>   "IA64",
> + "ARM",
>  };
>  
>  static const char * const proc_isa_strs[] = {
>   "IA32",
>   "IA64",
>   "X64",
> + "ARM A32/T32",
> + "ARM A64",
>  };
>  
>  static const char * const proc_error_type_strs[] = {
> @@ -184,6 +187,128 @@ static void cper_print_proc_generic(const char *pfx,
>   printk("%s""IP: 0x%016llx\n", pfx, proc->ip);
>  }
>  
> +#if defined(CONFIG_ARM64) || defined(CONFIG_ARM)
> +static const char * const arm_reg_ctx_strs[] = {
> + "AArch32 general purpose registers",
> + "AArch32 EL1 context registers",
> + "AArch32 EL2 context registers",
> + "AArch32 secure context registers",
> + "AArch64 general purpose registers",
> + "AArch64 EL1 context registers",
> + "AArch64 EL2 context registers",
> + "AArch64 EL3 context registers",
> + "Misc. system register structure",
> +};
> +
> +static void cper_print_proc_arm(const char *pfx,
> + const struct cper_sec_proc_arm *proc)
> +{
> + int i, len, max_ctx_type;
> + struct cper_arm_err_info *err_info;
> + struct cper_arm_ctx_info *ctx_info;
> + char newpfx[64];
> +
> + printk("%ssection length: %d\n", pfx, proc->section_length);

We need to dump section length because?

> + printk("%sMIDR: 0x%016llx\n", pfx, proc->midr);
> +
> + len = proc->section_length - (sizeof(*proc) +
> + proc->err_info_num * (sizeof(*err_info)));
> + if (len < 0) {
> + printk("%ssection length is too small\n", pfx);

Now here we *can* dump it.
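A minimal sketch of the direction agreed in this thread, reporting the section length only when the sanity check fails. To avoid guessing at the real structure layouts in include/linux/cper.h, the hypothetical helper below takes the header size and per-entry size as parameters, and printf() stands in for printk(). It compares before subtracting, so it does not depend on converting a wrapped unsigned result to int, which is what the `len < 0` test in the patch relies on.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: check that a section of section_length bytes can
 * hold its fixed header plus err_info_num error information entries.
 * The lengths are only printed when the check fails.
 */
bool arm_section_len_ok(const char *pfx, uint32_t section_length,
			size_t hdr_size, uint16_t err_info_num,
			size_t err_info_size)
{
	size_t needed = hdr_size + (size_t)err_info_num * err_info_size;

	/* Compare instead of subtracting, so nothing can wrap around. */
	if (section_length >= needed)
		return true;

	printf("%ssection length is too small\n", pfx);
	printf("%sfirmware-generated error record is incorrect\n", pfx);
	printf("%ssection length: %u, needed: %zu, ERR_INFO_NUM: %u\n",
	       pfx, (unsigned int)section_length, needed,
	       (unsigned int)err_info_num);
	return false;
}
```

In the kernel the sizes would come from sizeof(*proc) and sizeof(*err_info); the numbers one would pass in a quick test are arbitrary.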

> + printk("%sfirmware-generated error record is incorrect\n", pfx);
> + printk("%sERR_INFO_NUM is %d\n", pfx, proc->err_info_num);
> + return;
> + }
> +
> + if (proc->validation_bits & CPER_ARM_VALID_MPIDR)
> + printk("%sMPIDR: 0x%016llx\n", pfx, proc->mpidr);


< newline here.

Also, what is MPIDR and can it be written in a more user-friendly manner
and not be an abbreviation?
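MPIDR is the Multiprocessor Affinity Register (MPIDR_EL1 on AArch64). As a sketch of a friendlier print, the helper below spells the name out and splits the value into its affinity fields, using the architectural layout Aff0 in bits [7:0], Aff1 in [15:8], Aff2 in [23:16] and Aff3 in [39:32]. The function name and output wording are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format an MPIDR value with its affinity levels
 * spelled out. Field positions follow the AArch64 MPIDR_EL1 layout.
 */
int format_mpidr(char *buf, size_t size, uint64_t mpidr)
{
	unsigned int aff0 = mpidr & 0xff;
	unsigned int aff1 = (mpidr >> 8) & 0xff;
	unsigned int aff2 = (mpidr >> 16) & 0xff;
	unsigned int aff3 = (mpidr >> 32) & 0xff;

	return snprintf(buf, size,
			"Multiprocessor Affinity Register (MPIDR): 0x%016llx "
			"(Aff3 %u, Aff2 %u, Aff1 %u, Aff0 %u)",
			(unsigned long long)mpidr, aff3, aff2, aff1, aff0);
}
```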

> + if (proc->validation_bits & CPER_ARM_VALID_AFFINITY_LEVEL)
> + printk("%serror affinity level: %d\n", pfx,
> + proc->affinity_level);
> + if (proc->validation_bits & CPER_ARM_VALID_RUNNING_STATE) {
> + printk("%srunning state: 0x%x\n", pfx, proc->running_state);
> + printk("%sPSCI state: %d\n", pfx, proc->psci_state);

One more abbreviation. Please consider whether having the abbreviations
or actually writing them out is more user-friendly.
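PSCI is the Power State Coordination Interface. A sketch of writing both fields out in full is below. The helper name and wording are hypothetical, and reading bit 0 of the running state as "processor is running" is an assumption about the UEFI 2.6 encoding that should be checked against the spec table before relying on it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: print the running state and PSCI state with the
 * abbreviation written out.
 * ASSUMPTION: bit 0 of running_state is set when the processor is running.
 */
int format_running_state(char *buf, size_t size, uint32_t running_state,
			 uint32_t psci_state)
{
	return snprintf(buf, size,
			"running state: 0x%x (%s), "
			"Power State Coordination Interface state: %u",
			(unsigned int)running_state,
			(running_state & 1) ? "running" : "not running",
			(unsigned int)psci_state);
}
```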

> + }
> +
> + snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);

That INDENT_SP thing is just silly, someone should kill it.

> +
> + err_info = (struct cper_arm_err_info *)(proc + 1);
> + for (i = 0; i < proc->err_info_num; i++) {
> + printk("%sError info structure %d:\n", pfx, i);
> + printk("%sversion:%d\n", newpfx, err_info->version);
> + printk("%slength:%d\n", newpfx, err_info->length);

< newline here.

Why do we even dump version and info for *every* err_info structure?
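One possible answer to the per-entry version/length question, sketched under assumptions: instead of printing version and length for every entry, the walker below uses each entry's own length byte to find the next entry and only reports entries whose version differs from the expected one. The header layout (a 1-byte version followed by a 1-byte length covering the whole entry) is assumed from the fields the patch prints; the expected version value and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical walker over a packed array of error information entries.
 * ASSUMPTION: each entry starts with a 1-byte version and a 1-byte length
 * that covers the whole entry.
 * Returns the number of entries visited, or -1 if the data is malformed.
 */
int walk_err_info(const uint8_t *buf, size_t buf_len, unsigned int count,
		  uint8_t expected_version)
{
	size_t off = 0;
	unsigned int i;

	for (i = 0; i < count; i++) {
		uint8_t version, length;

		if (buf_len - off < 2)
			return -1;	/* no room for the entry header */
		version = buf[off];
		length = buf[off + 1];
		if (length < 2 || length > buf_len - off)
			return -1;	/* entry overruns the buffer */
		if (version != expected_version)
			printf("Error info structure %u: unexpected version %u\n",
			       i, (unsigned int)version);
		off += length;		/* advance by the entry's own size */
	}
	return (int)i;
}
```

Only the unusual case produces output, which keeps the common log short while still flagging mixed-version records.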

> + if (err_info->validation_bits &
> + CPER_ARM_INFO_VALID_MULTI_ERR) {
> + if (err_info->multiple_error == 0)
> + printk("%ssingle error\n", newpfx);
> + else if (err_info->multiple_error == 1)
> + printk("%smultiple errors\n", newpfx);
> + else
> + printk("%smultiple errors count:%u\n",
> + newpfx, err_info->multiple_error);

So this can be simply: "num errors: %d", err_info->multiple_error+1...

Without checking CPER_ARM_INFO_VALID_MULTI_ERR.
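For comparison, a sketch of the patch's three-way branch next to the single-line form suggested above, with multiple_error assumed to be a 16-bit field. The two are not equivalent: the patch prints values of 2 and up as a raw count, while the suggested form treats the field as "count minus one". Which reading matches UEFI 2.6 table 261 needs checking against the spec; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* The patch's reading: 0 = single, 1 = multiple, >= 2 = explicit count. */
int format_multi_err_patch(char *buf, size_t size, uint16_t multiple_error)
{
	if (multiple_error == 0)
		return snprintf(buf, size, "single error");
	if (multiple_error == 1)
		return snprintf(buf, size, "multiple errors");
	return snprintf(buf, size, "multiple errors count:%u",
			(unsigned int)multiple_error);
}

/* The suggested form: one line, field treated as count - 1. */
int format_multi_err_suggested(char *buf, size_t size, uint16_t multiple_error)
{
	return snprintf(buf, size, "num errors: %d", multiple_error + 1);
}
```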

> + }

< newline here.

> + if (err_info->validation_bits & CPER_ARM_INFO_VALID_FLAGS) {
> + if (err_info->flags & CPER_ARM_INFO_FLAGS_FIRST)
> + printk("%sfirst error captured\n", newpfx);
> + if (err_info->flags & 

[PATCH V15 04/11] efi: parse ARM processor error

2017-04-18 Thread Tyler Baicar
Add support for ARM Common Platform Error Record (CPER).
UEFI 2.6 specification adds support for ARM specific
processor error information to be reported as part of the
CPER records. This provides more detail for processor error logs.

Signed-off-by: Tyler Baicar 
CC: Jonathan (Zhixiong) Zhang 
Reviewed-by: James Morse 
Reviewed-by: Ard Biesheuvel 
---
 drivers/firmware/efi/cper.c | 135
 include/linux/cper.h        |  54 ++
 2 files changed, 189 insertions(+)

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 46585f9..f959185 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -110,12 +110,15 @@ void cper_print_bits(const char *pfx, unsigned int bits,
 static const char * const proc_type_strs[] = {
    "IA32/X64",
    "IA64",
+   "ARM",
 };
 
 static const char * const proc_isa_strs[] = {
    "IA32",
    "IA64",
    "X64",
+   "ARM A32/T32",
+   "ARM A64",
 };
 
 static const char * const proc_error_type_strs[] = {
@@ -184,6 +187,128 @@ static void cper_print_proc_generic(const char *pfx,
    printk("%s""IP: 0x%016llx\n", pfx, proc->ip);
 }
 
+#if defined(CONFIG_ARM64) || defined(CONFIG_ARM)
+static const char * const arm_reg_ctx_strs[] = {
+   "AArch32 general purpose registers",
+   "AArch32 EL1 context registers",
+   "AArch32 EL2 context registers",
+   "AArch32 secure context registers",
+   "AArch64 general purpose registers",
+   "AArch64 EL1 context registers",
+   "AArch64 EL2 context registers",
+   "AArch64 EL3 context registers",
+   "Misc. system register structure",
+};
+
+static void cper_print_proc_arm(const char *pfx,
+   const struct cper_sec_proc_arm *proc)
+{
+   int i, len, max_ctx_type;
+   struct cper_arm_err_info *err_info;
+   struct cper_arm_ctx_info *ctx_info;
+   char newpfx[64];
+
+   printk("%ssection length: %d\n", pfx, proc->section_length);
+   printk("%sMIDR: 0x%016llx\n", pfx, proc->midr);
+
+   len = proc->section_length - (sizeof(*proc) +
+   proc->err_info_num * (sizeof(*err_info)));
+   if (len < 0) {
+   printk("%ssection length is too small\n", pfx);
+   printk("%sfirmware-generated error record is incorrect\n", pfx);
+   printk("%sERR_INFO_NUM is %d\n", pfx, proc->err_info_num);
+   return;
+   }
+
+   if (proc->validation_bits & CPER_ARM_VALID_MPIDR)
+   printk("%sMPIDR: 0x%016llx\n", pfx, proc->mpidr);
+   if (proc->validation_bits & CPER_ARM_VALID_AFFINITY_LEVEL)
+   printk("%serror affinity level: %d\n", pfx,
+   proc->affinity_level);
+   if (proc->validation_bits & CPER_ARM_VALID_RUNNING_STATE) {
+   printk("%srunning state: 0x%x\n", pfx, proc->running_state);
+   printk("%sPSCI state: %d\n", pfx, proc->psci_state);
+   }
+
+   snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
+
+   err_info = (struct cper_arm_err_info *)(proc + 1);
+   for (i = 0; i < proc->err_info_num; i++) {
+   printk("%sError info structure %d:\n", pfx, i);
+   printk("%sversion:%d\n", newpfx, err_info->version);
+   printk("%slength:%d\n", newpfx, err_info->length);
+   if (err_info->validation_bits &
+   CPER_ARM_INFO_VALID_MULTI_ERR) {
+   if (err_info->multiple_error == 0)
+   printk("%ssingle error\n", newpfx);
+   else if (err_info->multiple_error == 1)
+   printk("%smultiple errors\n", newpfx);
+   else
+   printk("%smultiple errors count:%u\n",
+   newpfx, err_info->multiple_error);
+   }
+   if (err_info->validation_bits & CPER_ARM_INFO_VALID_FLAGS) {
+   if (err_info->flags & CPER_ARM_INFO_FLAGS_FIRST)
+   printk("%sfirst error captured\n", newpfx);
+   if (err_info->flags & CPER_ARM_INFO_FLAGS_LAST)
+   printk("%slast error captured\n", newpfx);
+   if (err_info->flags & CPER_ARM_INFO_FLAGS_PROPAGATED)
+   printk("%spropagated error captured\n",
+  newpfx);
+   if (err_info->flags & CPER_ARM_INFO_FLAGS_OVERFLOW)
+   printk("%soverflow occurred, error info is incomplete\n",
+  newpfx);
+   }
+   printk("%serror_type: %d, %s\n", newpfx, err_info->type,
+   err_info->type < ARRAY_SIZE(proc_error_type_strs) ?
+   proc_error_type_strs[err_info->type] : "unknown");
+   if