On 08/08/2013 10:21 AM, Paul Mackerras wrote: > On Wed, Aug 07, 2013 at 03:08:15PM +0530, Mahesh J Salgaonkar wrote: >> From: Mahesh Salgaonkar <mah...@linux.vnet.ibm.com> >> >> Move machine check entry point into Linux. So far we were dependent on >> firmware to decode MCE error details and handover the high level info to OS. >> >> This patch introduces early machine check routine that saves the MCE >> information (srr1, srr0, dar and dsisr) to the emergency stack. We allocate >> stack frame on emergency stack and set the r1 accordingly. This allows us >> to be prepared to take another exception without loosing context. One thing >> to note here that, if we get another machine check while ME bit is off then >> we risk a checkstop. Hence we restrict ourselves to save only MCE information >> and turn the ME bit on. > > Some comments below: > >> + * Swicth to emergency stack and handle re-entrancy (though we currently > ^ > Switch > >> + * don't test for overflow). Save MCE registers srr1, srr0, dar and >> + * dsisr and then turn the ME bit on. >> + */ >> +#define __EARLY_MACHINE_CHECK_HANDLER(area, label) \ >> + /* Check if we are laready using emergency stack. */ \ >> + ld r10,PACAEMERGSP(r13); \ >> + subi r10,r10,THREAD_SIZE; \ >> + rldicr r10,r10,0,(63 - THREAD_SHIFT); \ >> + rldicr r11,r1,0,(63 - THREAD_SHIFT); \ >> + cmpd r10,r11; /* Are we using emergency stack? */ \ > > This assumes that r1 contains a host kernel stack pointer value. > However, we could be coming in from a guest, or from userspace, in > which case r1 could have any value at all, and we shouldn't even be > considering using it as our stack pointer.
I see your point. I need to have different mechanism to keep track of emergency stack pointer. May be using some sratch registers or something in paca structure. > > There is another complication, too: if we are coming in from a KVM > guest running on a secondary thread (not thread 0), we will be using > part of the emergency stack already (see kvm_start_guest in > arch/powerpc/kvm/book3s_hv_rmhandlers.S). However, because we have > come in from a guest, r1 contains a guest value, and we would have to > look in KVM data structures to find out how much of the emergency > stack we have already used. I'm not sure at the moment what the best > way to fix this is. How about adding a paca member to maintain current emergency stack pointer in use? > >> + mr r11,r1; /* Save current stack pointer */\ >> + beq 0f; \ >> + ld r1,PACAEMERGSP(r13); /* Use emergency stack */ \ >> +0: subi r1,r1,INT_FRAME_SIZE; /* alloc stack frame */ \ >> + std r11,GPR1(r1); \ >> + std r11,0(r1); /* make stack chain pointer */ \ >> + mfspr r11,SPRN_SRR0; /* Save SRR0 */ \ >> + std r11,_NIP(r1); \ >> + mfspr r11,SPRN_SRR1; /* Save SRR1 */ \ >> + std r11,_MSR(r1); \ >> + mfspr r11,SPRN_DAR; /* Save DAR */ \ >> + std r11,_DAR(r1); \ >> + mfspr r11,SPRN_DSISR; /* Save DSISR */ \ >> + std r11,_DSISR(r1); \ >> + mfmsr r11; /* get MSR value */ \ >> + ori r11,r11,MSR_ME; /* turn on ME bit */ \ >> + ld r12,PACAKBASE(r13); /* get high part of &label */ \ >> + LOAD_HANDLER(r12,label) \ >> + mtspr SPRN_SRR0,r12; \ >> + mtspr SPRN_SRR1,r11; \ >> + rfid; \ >> + b . /* prevent speculative execution */ >> +#define EARLY_MACHINE_CHECK_HANDLER(area, label) \ >> + __EARLY_MACHINE_CHECK_HANDLER(area, label) > > Given this is only used in one place, why make this a macro? Wouldn't > it be simpler and neater to just spell out this code in the place > where you use the macro? Sure, will do that. > >> + /* >> + * Handle machine check early in real mode. We come here with >> + * ME bit on with MMU (IR and DR) off and using emergency stack. >> + */ >> + .align 7 >> + .globl machine_check_handle_early >> +machine_check_handle_early: >> + std r9,_CCR(r1) /* Save CR in stackframe */ >> + std r0,GPR0(r1) /* Save r0, r2 - r10 */ >> + EXCEPTION_PROLOG_COMMON_2(0x200, PACA_EXMC) >> + bl .save_nvgprs >> + addi r3,r1,STACK_FRAME_OVERHEAD >> + bl .machine_check_early >> + cmpdi r3,0 /* Did we handle it? */ >> + ld r9,_MSR(r1) >> + beq 1f >> + ori r9,r9,MSR_RI /* Yes we have handled it. */ > > I don't think this is a good idea. MSR_RI being off means either that > the machine check happened at a time when SRR0 and SRR1 contained live > contents, which have been overwritten by the machine check interrupt, > or else that the CPU was not able to determine a precise state at > which the interrupt could be taken. Neither of those problems would > be fixed by our machine check handler. Agree. > > What was the rationale for setting RI in this situation? When I tested SLB multihit, I got an interrupt with RI=0. In opal_machine_check() the very first check I added is to see if MSR_RI is set or not and return accordingly which ended up in die and panic in machine_check_exception() routine. But, I think I should depend on evt->disposition==MCE_DISPOSITION_RECOVERED and not play with MSR_RI and SRR1. I will fix my code. > >> + /* Move original SRR0 and SRR1 into the respective regs */ >> +1: mtspr SPRN_SRR1,r9 >> + ld r3,_NIP(r1) >> + mtspr SPRN_SRR0,r3 >> + REST_NVGPRS(r1) >> + ld r9,_CTR(r1) >> + mtctr r9 >> + ld r9,_XER(r1) >> + mtxer r9 >> + ld r9,_CCR(r1) >> + mtcr r9 >> +BEGIN_FTR_SECTION >> + ld r9,ORIG_GPR3(r1) >> + mtspr SPRN_CFAR,r9 >> +END_FTR_SECTION_IFSET(CPU_FTR_CFAR) >> + ld r9,_LINK(r1) >> + mtlr r9 >> + REST_GPR(0, r1) >> + REST_8GPRS(2, r1) >> + REST_2GPRS(10, r1) >> + REST_GPR(12, r1) >> + /* >> + * restore orig stack pointer. We have already saved MCE info >> + * to per cpu MCE event buffer. >> + */ >> + ld r1,GPR1(r1) >> + b machine_check_pSeries > > What about r13? I don't see that you have restored it at this point. Will fix it. > >> + >> STD_EXCEPTION_COMMON_ASYNC(0x500, hardware_interrupt, do_IRQ) >> STD_EXCEPTION_COMMON_ASYNC(0x900, decrementer, .timer_interrupt) >> STD_EXCEPTION_COMMON(0x980, hdecrementer, .hdec_interrupt) >> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c >> index bf33c22..1720e08 100644 >> --- a/arch/powerpc/kernel/traps.c >> +++ b/arch/powerpc/kernel/traps.c >> @@ -286,6 +286,18 @@ void system_reset_exception(struct pt_regs *regs) >> >> /* What should we do here? We could issue a shutdown or hard reset. */ >> } >> + >> +/* >> + * This function is called in real mode. Strictly no printk's please. >> + * >> + * regs->nip and regs->msr contains srr0 and ssr1. >> + */ >> +long machine_check_early(struct pt_regs *regs) >> +{ >> + /* TODO: handle/decode machine check reason */ >> + return 0; >> +} >> + >> #endif >> >> /* > _______________________________________________ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev