* Chris J Arges <chris.j.ar...@canonical.com> wrote:

> /sys/module/kvm_intel/parameters/enable_apicv on the affected 
> hardware is not enabled, and unfortunately my hardware doesn't have 
> the necessary features to enable it. So we are dealing with KVM's 
> lapic implementation only.

That's actually pretty fortunate, as we don't have to worry about 
hardware state nearly as much!

> FYI, I'm working on getting better data at the moment and here is my approach:
> * For the L0 kernel:
>  - In arch/x86/kvm/lapic.c, I enabled 'apic_debug' to get more output
>    (and print the addresses of various useful structures)
>  - Setup crash to live dump kvm_lapic structures and associated registers for
>    both vCPUs

It would also be nice to double check the stuck vCPU's normal CPU 
state: is it truly able to receive interrupts? (IRQ flags are on, or 
is it sitting in the idle loop, etc.?)

If the IRQ flag (in EFLAGS) is off then the vCPU is not able to 
receive interrupts, regardless of local APIC state.
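As a rough illustration of that check: IF is bit 9 of EFLAGS/RFLAGS 
(the kernel calls it X86_EFLAGS_IF), so a value taken from a vCPU 
register dump can be tested as below. This is only a user-space sketch; 
the helper name vcpu_irqs_enabled() is made up for the example and how 
you obtain the RFLAGS value (crash, tracing, etc.) is left open.

```c
#include <stdbool.h>
#include <stdint.h>

/* IF (interrupt enable flag) is bit 9 of EFLAGS/RFLAGS on x86. */
#define EFLAGS_IF (UINT64_C(1) << 9)

/*
 * Hypothetical helper: given an RFLAGS value read out of a vCPU
 * register dump, report whether maskable interrupts are enabled.
 */
static bool vcpu_irqs_enabled(uint64_t rflags)
{
        return (rflags & EFLAGS_IF) != 0;
}
```

For example, an RFLAGS value of 0x246 has IF set, while 0x46 does not.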

> * For the L1 kernel:
>  - Dump a stacktrace when we detect a lockup.
>  - Detect a lockup and try to not alter the state.
>  - Have a reliable signal such that the L0 hypervisor can dump the lapic
>    structures and registers when csd_lock_wait detects a softlockup.

I'd also suggest adding a printk() to IPI receipt, to rule out the 
possibility that the CSD code is simply not being called after the 
IPI resend attempt. To make sure you only get messages after the CPU 
got stuck, add a 'locked_up' flag that signals this, and only print 
the messages while the lockup scenario is happening.

I'd do it by adding something like this to 
kernel/smp.c::generic_smp_call_function_single_interrupt():

        if (csd_locked_up)
                printk("CSD: Function call IPI callback on CPU#%d\n",
                       raw_smp_processor_id());

With this message in place we could confirm that the IPI indeed did 
not get generated on the stuck vCPU: in that case the message would 
never show up.
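The flag side of this could look roughly like the following. It is a 
user-space model, not kernel code: mark_csd_locked_up() and 
report_ipi_callback() are names invented for the sketch, printf() 
stands in for printk(), and where exactly the flag gets set (presumably 
wherever csd_lock_wait decides it has waited too long) is an assumption.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Set by the waiter once it decides the lock wait has gone on too long. */
static atomic_bool csd_locked_up;

/* Hypothetical hook for the waiter side (e.g. the csd_lock_wait path). */
static void mark_csd_locked_up(void)
{
        atomic_store(&csd_locked_up, true);
}

/*
 * Model of the IPI callback side: stay silent in normal operation and
 * report only after a lockup has been flagged. Returns true if it printed.
 */
static bool report_ipi_callback(int cpu)
{
        if (!atomic_load(&csd_locked_up))
                return false;
        printf("CSD: Function call IPI callback on CPU#%d\n", cpu);
        return true;
}
```

The point of the gate is to keep the log quiet until the lockup has 
been detected, so that any message seen afterwards belongs to the 
stuck scenario.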

Thanks,

        Ingo