Re: [Qemu-devel] qemu-system-ppc hangs

Richard Purdie Tue, 21 Nov 2017 09:34:15 -0800

On Tue, 2017-11-21 at 09:55 +0000, Alex Bennée wrote:
> Richard Purdie <richard.pur...@linuxfoundation.org> writes:
> > At this point I therefore wanted to seek advice on what the real
> > issue
> > is here and how to fix it!
> Code should be using cpu_interrupt() to change things but we have
> plenty
> of examples in the code of messing with cpu->interrupt_request
> directly
> which is often why the assert() in cpu_interrupt() doesn't get a
> chance
> to fire.
> 
> The two most prolific direct users seems to be i386 and ppc.
> 
> The i386 cases are all most likely OK as it tends to be in KVM
> emulation
> code where by definition the BQL is already held by the time you get
> there. For PPC it will depend on how you got there. The
> multi-thread-tcg.txt document describes the approach:
> 
>   Emulated hardware state
>   -----------------------
> 
>   Currently thanks to KVM work any access to IO memory is
> automatically
>   protected by the global iothread mutex, also known as the BQL (Big
>   Qemu Lock). Any IO region that doesn't use global mutex is expected
> to
>   do its own locking.
> 
>   However IO memory isn't the only way emulated hardware state can be
>   modified. Some architectures have model specific registers that
>   trigger hardware emulation features. Generally any translation
> helper
>   that needs to update more than a single vCPUs of state should take
> the
>   BQL.
> 
>   As the BQL, or global iothread mutex is shared across the system we
>   push the use of the lock as far down into the TCG code as possible
> to
>   minimise contention.
> 
>   (Current solution)
> 
>   MMIO access automatically serialises hardware emulation by way of
> the
>   BQL. Currently ARM targets serialise all ARM_CP_IO register
> accesses
>   and also defer the reset/startup of vCPUs to the vCPU context by
> way
>   of async_run_on_cpu().
> 
>   Updates to interrupt state are also protected by the BQL as they
> can
>   often be cross vCPU.
> 
> So basically it comes down to the call-path into your final
> cpu_interrupt() call which is why I guess your doing the:
> 
>   if (!qemu_mutex_iothread_locked()) {
>      qemu_mutex_lock_iothread();
>      cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
>      qemu_mutex_unlock_iothread();
>   }
> 
> dance. I suspect the helper functions are called both from device
> emulation (where BQL is held) and from vCPU helpers (where it is
> currently not).
> 
> So I suggest the fix is:
> 
>  1. Convert sites doing their own manipulation of
>  cpu->interrupt_request() to call the helper instead
>  2. If helper directly called from TCG code:
>       - take BQL before calling cpu_interrupt(), release after
>     Else if helper shared between MMIO/TCG paths
>       - take BQL from TCG path, call helper, release after
> 
> It might be there is some sensible re-factoring that could be done to
> make things clearer but I'll leave that to the PPC experts to weigh
> in
> on.
> 
> Hope that helps!


It does indeed, thanks. I've sent out a proposed patch which does the
above so people can review it and see if its the right thing to do.
Certainly its working fine locally.

Cheers,

Richard

Re: [Qemu-devel] qemu-system-ppc hangs

Reply via email to