On 23.11.2012, at 22:42, Paul Mackerras wrote:

> On Fri, Nov 23, 2012 at 03:13:09PM +0100, Alexander Graf wrote:
>> 
>> On 22.11.2012, at 10:25, Paul Mackerras wrote:
>> 
>>> +   /* Do they have an SLB shadow buffer registered? */
>>> +   slb = vcpu->arch.slb_shadow.pinned_addr;
>>> +   if (!slb)
>>> +           return;
>> 
>> Mind to explain this case? What happens here? Do we leave the guest with an 
>> empty SLB? Why would this ever happen? What happens next as soon as we go 
>> back into the guest?
> 
> Yes, we leave the guest with an empty SLB, the access gets retried and
> this time the guest gets an SLB miss interrupt, which it can hopefully
> handle using an SLB miss handler that runs entirely in real mode.
> This could happen for instance while the guest is in SLOF or yaboot or
> some other code that runs basically in real mode but occasionally
> turns the MMU on for some accesses, and happens to have a bug where it
> creates a duplicate SLB entry.

Is this what pHyp does? Also, is this what we want? Why don't we populate an 
#MC into the guest so it knows it did something wrong?


Alex

> 
>>> +   /* Sanity check */
>>> +   n = slb->persistent;
>>> +   if (n > SLB_MIN_SIZE)
>>> +           n = SLB_MIN_SIZE;
>> 
>> Please use a max() macro here.
> 
> OK.
> 
>>> +   rb = 0x800;             /* IS field = 0b10, flush congruence class */
>>> +   for (i = 0; i < 128; ++i) {
>> 
>> Please put up a #define for this. POWER7_TLB_SIZE or so. Is there any way to 
>> fetch that number from an SPR? I don't really want to have a p7+ and a p8 
>> function in here too ;).
>> 
>>> +           asm volatile("tlbiel %0" : : "r" (rb));
>>> +           rb += 0x1000;
>> 
>> I assume this also means (1 << TLBIE_ENTRY_SHIFT)? Would be nice to keep the 
>> code readable without guessing :).
> 
> The 0x800 and 0x1000 are taken from the architecture - it defines
> fields in the RB value for the flush type and TLB index.  The 128 is
> POWER7-specific and isn't in any SPR that I know of.  Eventually we'll
> probably have to put it (the number of TLB congruence classes) in the
> cputable, but for now I'll just do a define.
> 
>> So I take it that p7 does not implement tlbia?
> 
> Correct.
> 
>> So we never return 0? How about ECC errors and the likes? Wouldn't those 
>> also be #MCs that the host needs to handle?
> 
> Yes, true.  In fact the OPAL firmware gets to see the machine checks
> before we do (see the opal_register_exception_handler() calls in
> arch/powerpc/platforms/powernv/opal.c), so it should have already
> handled recoverable things like L1 cache parity errors.
> 
> I'll make the function return 0 if there's an error that it doesn't
> know about.
> 
>>>     ld      r8, HSTATE_VMHANDLER(r13)
>>>     ld      r7, HSTATE_HOST_MSR(r13)
>>> 
>>>     cmpwi   r12, BOOK3S_INTERRUPT_EXTERNAL
>>> +BEGIN_FTR_SECTION
>>>     beq     11f
>>> -   cmpwi   r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>>> +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
>> 
>> Mind to explain the logic that happens here? Do we get external interrupts 
>> on 970? If not, the cmpwi should also be inside the FTR section. Also, if we 
>> do a beq here, why do the beqctr below again?
> 
> I was making it not call the host kernel machine check handler if it
> was a machine check that pulled us out of the guest.  In fact we
> probably do still want to call the handler, but we don't want to jump
> to 0x200, since that has been patched by OPAL, and we don't want to
> make OPAL think we just got another machine check.  Instead we would
> need to jump to machine_check_pSeries.
> 
> The feature section is because POWER7 sets HSRR0/1 on external
> interrupts, whereas PPC970 sets SRR0/1.
> 
> Paul.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to