On 23.11.2012, at 22:42, Paul Mackerras wrote: > On Fri, Nov 23, 2012 at 03:13:09PM +0100, Alexander Graf wrote: >> >> On 22.11.2012, at 10:25, Paul Mackerras wrote: >> >>> + /* Do they have an SLB shadow buffer registered? */ >>> + slb = vcpu->arch.slb_shadow.pinned_addr; >>> + if (!slb) >>> + return; >> >> Mind to explain this case? What happens here? Do we leave the guest with an >> empty SLB? Why would this ever happen? What happens next as soon as we go >> back into the guest? > > Yes, we leave the guest with an empty SLB, the access gets retried and > this time the guest gets an SLB miss interrupt, which it can hopefully > handle using an SLB miss handler that runs entirely in real mode. > This could happen for instance while the guest is in SLOF or yaboot or > some other code that runs basically in real mode but occasionally > turns the MMU on for some accesses, and happens to have a bug where it > creates a duplicate SLB entry.
Is this what pHyp does? Also, is this what we want? Why don't we populate an #MC into the guest so it knows it did something wrong? Alex > >>> + /* Sanity check */ >>> + n = slb->persistent; >>> + if (n > SLB_MIN_SIZE) >>> + n = SLB_MIN_SIZE; >> >> Please use a max() macro here. > > OK. > >>> + rb = 0x800; /* IS field = 0b10, flush congruence class */ >>> + for (i = 0; i < 128; ++i) { >> >> Please put up a #define for this. POWER7_TLB_SIZE or so. Is there any way to >> fetch that number from an SPR? I don't really want to have a p7+ and a p8 >> function in here too ;). >> >>> + asm volatile("tlbiel %0" : : "r" (rb)); >>> + rb += 0x1000; >> >> I assume this also means (1 << TLBIE_ENTRY_SHIFT)? Would be nice to keep the >> code readable without guessing :). > > The 0x800 and 0x1000 are taken from the architecture - it defines > fields in the RB value for the flush type and TLB index. The 128 is > POWER7-specific and isn't in any SPR that I know of. Eventually we'll > probably have to put it (the number of TLB congruence classes) in the > cputable, but for now I'll just do a define. > >> So I take it that p7 does not implement tlbia? > > Correct. > >> So we never return 0? How about ECC errors and the likes? Wouldn't those >> also be #MCs that the host needs to handle? > > Yes, true. In fact the OPAL firmware gets to see the machine checks > before we do (see the opal_register_exception_handler() calls in > arch/powerpc/platforms/powernv/opal.c), so it should have already > handled recoverable things like L1 cache parity errors. > > I'll make the function return 0 if there's an error that it doesn't > know about. > >>> ld r8, HSTATE_VMHANDLER(r13) >>> ld r7, HSTATE_HOST_MSR(r13) >>> >>> cmpwi r12, BOOK3S_INTERRUPT_EXTERNAL >>> +BEGIN_FTR_SECTION >>> beq 11f >>> - cmpwi r12, BOOK3S_INTERRUPT_MACHINE_CHECK >>> +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206) >> >> Mind to explain the logic that happens here? Do we get external interrupts >> on 970? If not, the cmpwi should also be inside the FTR section. Also, if we >> do a beq here, why do the beqctr below again? > > I was making it not call the host kernel machine check handler if it > was a machine check that pulled us out of the guest. In fact we > probably do still want to call the handler, but we don't want to jump > to 0x200, since that has been patched by OPAL, and we don't want to > make OPAL think we just got another machine check. Instead we would > need to jump to machine_check_pSeries. > > The feature section is because POWER7 sets HSRR0/1 on external > interrupts, whereas PPC970 sets SRR0/1. > > Paul. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html