Re: [PATCH] powerpc/mce: check if event info is valid
Excerpts from Michael Ellerman's message of October 7, 2021 10:09 pm: > Ganesh writes: >> On 8/6/21 6:53 PM, Ganesh Goudar wrote: >> >>> Check if the event info is valid before printing the >>> event information. When a fwnmi enabled nested kvm guest >>> hits a machine check exception L0 and L2 would generate >>> machine check event info, But L1 would not generate any >>> machine check event info as it won't go through 0x200 >>> vector and prints some unwanted message. >>> >>> To fix this, 'in_use' variable in machine check event info is >>> no more in use, rename it to 'valid' and check if the event >>> information is valid before logging the event information. >>> >>> without this patch L1 would print following message for >>> exceptions encountered in L2, as event structure will be >>> empty in L1. >>> >>> "Machine Check Exception, Unknown event version 0". >>> >>> Signed-off-by: Ganesh Goudar >>> --- >> >> Hi mpe, Any comments on this patch. > > The variable rename is a bit of a distraction. > > But ignoring that, how do we end up processing a machine_check_event > that has in_use = 0? > > You don't give much detail on what call path you're talking about. I > guess we're coming in via the calls in the KVM code? > > In the definition of kvm_vcpu_arch we have: > > struct machine_check_event mce_evt; /* Valid if trap == 0x200 */ > > And you said we're not going via 0x200 in L1. But so shouldn't we be > teaching the KVM code not to use mce_evt when trap is not 0x200? I'm not sure we want the MCE to skip the L1 either. It should match the L0 hypervisor behaviour as closely as reasonably possible. We might have to teach the KVM pseries path to do something about the 0x200 before the common HV guest exit handler (which is where the L1 message comes from). Thanks, Nick
Re: [PATCH] powerpc/mce: check if event info is valid
Ganesh writes: > On 8/6/21 6:53 PM, Ganesh Goudar wrote: > >> Check if the event info is valid before printing the >> event information. When a fwnmi enabled nested kvm guest >> hits a machine check exception L0 and L2 would generate >> machine check event info, But L1 would not generate any >> machine check event info as it won't go through 0x200 >> vector and prints some unwanted message. >> >> To fix this, 'in_use' variable in machine check event info is >> no more in use, rename it to 'valid' and check if the event >> information is valid before logging the event information. >> >> without this patch L1 would print following message for >> exceptions encountered in L2, as event structure will be >> empty in L1. >> >> "Machine Check Exception, Unknown event version 0". >> >> Signed-off-by: Ganesh Goudar >> --- > > Hi mpe, Any comments on this patch. The variable rename is a bit of a distraction. But ignoring that, how do we end up processing a machine_check_event that has in_use = 0? You don't give much detail on what call path you're talking about. I guess we're coming in via the calls in the KVM code? In the definition of kvm_vcpu_arch we have: struct machine_check_event mce_evt; /* Valid if trap == 0x200 */ And you said we're not going via 0x200 in L1. But so shouldn't we be teaching the KVM code not to use mce_evt when trap is not 0x200? cheers
Re: [PATCH] powerpc/mce: check if event info is valid
On 8/6/21 6:53 PM, Ganesh Goudar wrote: Check if the event info is valid before printing the event information. When a fwnmi enabled nested kvm guest hits a machine check exception L0 and L2 would generate machine check event info, But L1 would not generate any machine check event info as it won't go through 0x200 vector and prints some unwanted message. To fix this, 'in_use' variable in machine check event info is no more in use, rename it to 'valid' and check if the event information is valid before logging the event information. without this patch L1 would print following message for exceptions encountered in L2, as event structure will be empty in L1. "Machine Check Exception, Unknown event version 0". Signed-off-by: Ganesh Goudar --- Hi mpe, Any comments on this patch.
[PATCH] powerpc/mce: check if event info is valid
Check if the event info is valid before printing the event information. When a fwnmi enabled nested kvm guest hits a machine check exception L0 and L2 would generate machine check event info, But L1 would not generate any machine check event info as it won't go through 0x200 vector and prints some unwanted message. To fix this, 'in_use' variable in machine check event info is no more in use, rename it to 'valid' and check if the event information is valid before logging the event information. without this patch L1 would print following message for exceptions encountered in L2, as event structure will be empty in L1. "Machine Check Exception, Unknown event version 0". Signed-off-by: Ganesh Goudar --- arch/powerpc/include/asm/mce.h | 2 +- arch/powerpc/kernel/mce.c | 7 +-- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h index 331d944280b8..3646f53f228f 100644 --- a/arch/powerpc/include/asm/mce.h +++ b/arch/powerpc/include/asm/mce.h @@ -113,7 +113,7 @@ enum MCE_LinkErrorType { struct machine_check_event { enum MCE_Versionversion:8; - u8 in_use; + u8 valid; enum MCE_Severity severity:8; enum MCE_Initiator initiator:8; enum MCE_ErrorType error_type:8; diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index 47a683cd00d2..b778394a06b5 100644 --- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.c @@ -114,7 +114,7 @@ void save_mce_event(struct pt_regs *regs, long handled, mce->srr0 = nip; mce->srr1 = regs->msr; mce->gpr3 = regs->gpr[3]; - mce->in_use = 1; + mce->valid = 1; mce->cpu = get_paca()->paca_index; /* Mark it recovered if we have handled it and MSR(RI=1). */ @@ -202,7 +202,7 @@ int get_mce_event(struct machine_check_event *mce, bool release) if (mce) *mce = *mc_evt; if (release) - mc_evt->in_use = 0; + mc_evt->valid = 0; ret = 1; } /* Decrement the count to free the slot. */ @@ -413,6 +413,9 @@ void machine_check_print_event_info(struct machine_check_event *evt, "Probable Software error (some chance of hardware cause)", }; + if (!evt->valid) + return; + /* Print things out */ if (evt->version != MCE_V1) { pr_err("Machine Check Exception, Unknown event version %d !\n", -- 2.31.1