Hi,

I work on the Yocto Project and we use qemu to test boot our Linux
images and run tests against them. We've been noticing some instability
for ppc where the images sometimes hang, usually around udevd bring up
time so just after booting into userspace.

To cut a long story short, I've tracked down what I think is the
problem. I believe the decrementer timer stops receiving interrupts, so
tasks in our images hang indefinitely because the timer has stopped.

It can be summed up with this line of debug:

ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 00000100req 00000004

It should normally read:

ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 00000100req 00000002

The question is why CPU_INTERRUPT_EXITTB ends up being set when the
lines above this log message clearly set CPU_INTERRUPT_HARD (via
cpu_interrupt()).

I note in cpu.h:

    /* updates protected by BQL */
    uint32_t interrupt_request;

(for struct CPUState)

The ppc code does "cs->interrupt_request |= CPU_INTERRUPT_EXITTB" in 5
places, 3 in excp_helper.c and 2 in helper_regs.h. In all 5 cases,
g_assert(qemu_mutex_iothread_locked()); fails, so the BQL is not held
at these call sites. If I do something like:

/* interrupt_request updates need the BQL; take it if not already held */
if (!qemu_mutex_iothread_locked()) {
    qemu_mutex_lock_iothread();
    cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
    qemu_mutex_unlock_iothread();
} else {
    cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
}

in these call sites then I can no longer lock qemu up with my test
case.

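I suspect the _HARD setting gets overwritten: the unlocked "|=" is a
read-modify-write, so it can race with cpu_interrupt() setting
CPU_INTERRUPT_HARD from another thread and write back a stale value.
That would stop the decrementer interrupts being delivered.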
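
To make the suspected race concrete, here is a minimal standalone sketch
(not QEMU code) that replays the interleaving by hand in a single thread.
The flag values are the ones visible in the "req" field of the log lines
above; the function names and the local variable standing in for
cs->interrupt_request are made up for illustration, and the exact
interleaving is an assumption rather than something observed directly.

```c
#include <stdint.h>

/* Flag values as they appear in the "req" field of the log lines above. */
#define CPU_INTERRUPT_HARD   0x0002
#define CPU_INTERRUPT_EXITTB 0x0004

/*
 * Hypothetical replay of an unlocked "|= CPU_INTERRUPT_EXITTB" (thread A)
 * racing with a cpu_interrupt(cs, CPU_INTERRUPT_HARD) call (thread B).
 * The parameter stands in for cs->interrupt_request.
 */
uint32_t racy_interleaving(uint32_t interrupt_request)
{
    /* Thread A: load half of the read-modify-write. */
    uint32_t stale = interrupt_request;

    /* Thread B: sets the HARD bit in between. */
    interrupt_request |= CPU_INTERRUPT_HARD;

    /* Thread A: store half, based on the stale value; HARD is lost. */
    interrupt_request = stale | CPU_INTERRUPT_EXITTB;

    return interrupt_request;
}

/* The same two updates when they cannot interleave (e.g. both under a lock). */
uint32_t serialized_updates(uint32_t interrupt_request)
{
    interrupt_request |= CPU_INTERRUPT_HARD;
    interrupt_request |= CPU_INTERRUPT_EXITTB;
    return interrupt_request;
}
```

Starting from 0, racy_interleaving() returns 0x4, which matches the
"req 00000004" seen in the bad log line, while serialized_updates()
keeps both bits and returns 0x6.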

I don't know if taking this lock in these situations is going to be bad
for performance and whether such a patch would be right/wrong.

At this point I therefore wanted to seek advice on what the real issue
is here and how to fix it!

Cheers,

Richard



