Hi Tyler, Thanks for your comments and testing.
On 2017/4/15 4:36, Baicar, Tyler wrote: > On 3/30/2017 4:31 AM, Xie XiuQi wrote: >> Add a new trace event for ARM processor error information, so that >> the user will know what error occurred. With this information the >> user may take appropriate action. >> >> These trace events are consistent with the ARM processor error >> information table which defined in UEFI 2.6 spec section N.2.4.4.1. >> >> --- >> v2: add trace enabled condition as Steven's suggestion. >> fix a typo. >> --- >> >> Cc: Steven Rostedt <rost...@goodmis.org> >> Cc: Tyler Baicar <tbai...@codeaurora.org> >> Signed-off-by: Xie XiuQi <xiexi...@huawei.com> >> --- > ... >> +#define ARM_PROC_ERR_TYPE \ >> + EM ( CPER_ARM_INFO_TYPE_CACHE, "cache error" ) \ >> + EM ( CPER_ARM_INFO_TYPE_TLB, "TLB error" ) \ >> + EM ( CPER_ARM_INFO_TYPE_BUS, "bus error" ) \ >> + EMe ( CPER_ARM_INFO_TYPE_UARCH, "micro-architectural error" ) >> + >> +#define ARM_PROC_ERR_FLAGS \ >> + EM ( CPER_ARM_INFO_FLAGS_FIRST, "First error captured" ) \ >> + EM ( CPER_ARM_INFO_FLAGS_LAST, "Last error captured" ) \ >> + EM ( CPER_ARM_INFO_FLAGS_PROPAGATED, "Propagated" ) \ >> + EMe ( CPER_ARM_INFO_FLAGS_OVERFLOW, "Overflow" ) >> + > Hello Xie XiuQi, > > This isn't compiling for me because of these definitions. Here you are using > ARM_*, but below in the TP_printk you are using ARCH_*. The compiler > complains the ARCH_* ones are undefined: > > ./include/trace/../../include/ras/ras_event.h:278:37: error: > 'ARCH_PROC_ERR_TYPE' undeclared (first use in this function) > __print_symbolic(__entry->type, ARCH_PROC_ERR_TYPE), > ./include/trace/../../include/ras/ras_event.h:280:38: error: > 'ARCH_PROC_ERR_FLAGS' undeclared (first use in this function) > __print_symbolic(__entry->flags, ARCH_PROC_ERR_FLAGS), Sorry, it's a typo. It should be ARM_xxx. > >> +/* >> + * First define the enums in MM_ACTION_RESULT to be exported to userspace >> + * via TRACE_DEFINE_ENUM(). >> + */ >> +#undef EM >> +#undef EMe >> +#define EM(a, b) TRACE_DEFINE_ENUM(a); >> +#define EMe(a, b) TRACE_DEFINE_ENUM(a); >> + >> +ARM_PROC_ERR_TYPE >> +ARM_PROC_ERR_FLAGS > Are the above two lines supposed to be here? >> + >> +/* >> + * Now redefine the EM() and EMe() macros to map the enums to the strings >> + * that will be printed in the output. >> + */ >> +#undef EM >> +#undef EMe >> +#define EM(a, b) { a, b }, >> +#define EMe(a, b) { a, b } >> + >> +TRACE_EVENT(arm_proc_err, > I think it would be better to keep this similar to the naming of the current > RAS trace events (right now we have mc_event, arm_event, aer_event, etc.). I > would suggest using "arm_err_info_event" since this is handling the error > information structures of the arm errors. >> + >> + TP_PROTO(const struct cper_arm_err_info *err), >> + >> + TP_ARGS(err), >> + >> + TP_STRUCT__entry( >> + __field(u8, type) >> + __field(u16, multiple_error) >> + __field(u8, flags) >> + __field(u64, error_info) >> + __field(u64, virt_fault_addr) >> + __field(u64, physical_fault_addr) > Validation bits should also be a part of this structure that way user space > tools will know which of these fields are valid. Could we use the default value to check the validation which we have checked in TP_fast_assign? >> + ), >> + >> + TP_fast_assign( >> + __entry->type = err->type; >> + >> + if (err->validation_bits & CPER_ARM_INFO_VALID_MULTI_ERR) >> + __entry->multiple_error = err->multiple_error; >> + else >> + __entry->multiple_error = ~0; >> + >> + if (err->validation_bits & CPER_ARM_INFO_VALID_FLAGS) >> + __entry->flags = err->flags; >> + else >> + __entry->flags = ~0; >> + >> + if (err->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO) >> + __entry->error_info = err->error_info; >> + else >> + __entry->error_info = 0ULL; >> + >> + if (err->validation_bits & CPER_ARM_INFO_VALID_VIRT_ADDR) >> + __entry->virt_fault_addr = err->virt_fault_addr; >> + else >> + __entry->virt_fault_addr = 0ULL; >> + >> + if (err->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR) >> + __entry->physical_fault_addr = err->physical_fault_addr; >> + else >> + __entry->physical_fault_addr = 0ULL; >> + ), >> + >> + TP_printk("ARM Processor Error: type %s; count: %u; flags: %s;" > I think the "ARM Processor Error:" part of this should just be removed. > Here's the output with this removed and the trace event renamed to > arm_err_info_event. I think this looks much cleaner and matches the style > used with the arm_event. > > <idle>-0 [020] .ns. 366.592434: arm_event: affinity level: 2; > MPIDR: 0000000000000000; MIDR: 00000000510f8000; running state: 1; PSCI > state: 0 > <idle>-0 [020] .ns. 366.592437: arm_err_info_event: type > cache error; count: 0; flags: 0x3; error info: 0000000000c20058; virtual > address: 0000000000000000; physical address: 0000000000000000 I agree. It looks much better. > > Thanks, > Tyler > -- Thanks, Xie XiuQi