On 12/11/15 09:09, Thomas Huth wrote:
> On 11/11/15 18:16, Aravinda Prasad wrote:
>> Memory error such as bit flips that cannot be corrected
>> by hardware are passed on to the kernel for handling.
>> If the memory address in error belongs to guest then
>> guest kernel is responsible for taking suitable action.
>> Patch [1] enhances KVM to exit guest with exit reason
>> set to KVM_EXIT_NMI in such cases.
>>
>> This patch handles KVM_EXIT_NMI exit. If the guest OS
>> has registered the machine check handling routine by
>> calling "ibm,nmi-register", then the handler builds
>> the error log and invokes the registered handler else
>> invokes the handler at 0x200.
>>
>> [1] http://marc.info/?l=kvm-ppc&m=144726114408289
>>
>> Signed-off-by: Aravinda Prasad <aravi...@linux.vnet.ibm.com>
>> ---
>>  target-ppc/kvm.c     |   69 +++++++++++++++++++++++++++++++++++++++++++
>>  target-ppc/kvm_ppc.h |   81 
>> ++++++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 150 insertions(+)
>>
>> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
>> index 110436d..e2e5170 100644
>> --- a/target-ppc/kvm.c
>> +++ b/target-ppc/kvm.c
>> @@ -1665,6 +1665,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run 
>> *run)
>>          ret = 0;
>>          break;
>>  
>> +    case KVM_EXIT_NMI:
>> +        DPRINTF("handle NMI exception\n");
>> +        ret = kvm_handle_nmi(cpu);
>> +        break;
>> +
>>      default:
>>          fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
>>          ret = -1;
>> @@ -2484,3 +2489,67 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
>>  {
>>      return data & 0xffff;
>>  }
>> +
>> +int kvm_handle_nmi(PowerPCCPU *cpu)
>> +{
>> +    struct rtas_mc_log mc_log;
>> +    CPUPPCState *env = &cpu->env;
>> +    sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>> +
>> +    cpu_synchronize_state(CPU(ppc_env_get_cpu(env)));
>> +
>> +    /* Properly set bits in MSR before we invoke the handler */
>> +    env->msr = 0;
>> +
>> +    if (!(*pcc->interrupts_big_endian)(cpu)) {
>> +        env->msr |= (1ULL << MSR_LE);
>> +    }
>> +
>> +#ifdef TARGET_PPC64
>> +    env->msr |= (1ULL << MSR_SF);
>> +#endif
>> +
>> +    if (!spapr->guest_machine_check_addr) {
>> +        /*
>> +         * If OS has not registered with "ibm,nmi-register"
>> +         * jump to 0x200
>> +         */
> 
> Shouldn't you also check MSR_ME here first and enter checkstop when
> machine checks are disabled?
> Also I think you have to set up some more registers for machine check
> interrupts, like SRR0 and SRR1?
> 
>> +        env->nip = 0x200;
>> +        return 0;
>> +    }
>> +
>> +    qemu_mutex_lock(&spapr->mc_in_progress);
> 
> Using a mutex here is definitely wrong. The kvm_arch_handle_exit() code
> is run under the Big QEMU Lockā„¢ (see qemu_mutex_lock_iothread() in
> kvm_cpu_exec()),

In case you're looking for the calls, I just noticed that the
qemu_mutex_lock_iothread() have recently been pushed into
kvm_arch_handle_exit() itself.

> so if you would ever get one thread waiting for this
> mutex here, it could never be unlocked again in rtas_ibm_nmi_interlock()
> because the other code would wait forever to get the BQL ==> Deadlock.
> 
> I think if you want to be able to handle multiple NMIs at once, you
> likely need something like an error log per CPU instead. And if an NMI
> happens one CPU while there is already a NMI handler running on the very
> same CPU, you could likely simply track this with an boolean variable
> and put the CPU into checkstop if this happens?

Ok, I now had a look into the LoPAPR spec, and if I've got that right,
you really have to serialize the NMIs in case they happen at multiple
CPUs at the same time. So I guess the best thing you can do here is
something like:

   while (spapr->mc_in_progress) {
       /*
        * There is already another NMI in progress, thus we need
        * to yield here to wait until it has been finsihed
        */
       qemu_mutex_unlock_iothread();
       usleep(10);
       qemu_mutex_lock_iothread();
   }
   spapr->mc_in_progress = true;

Also LoPAPR talks about 'subsequent processors report "fatal error
previously reported"', so maybe the other processors should report that
condition in this case?
And of course you've also got to check that the same CPU is not getting
multiple NMIs before the interlock function has been called again.

 Thomas


Reply via email to