On Tue, 2017-11-21 at 09:55 +0000, Alex Bennée wrote:
> Richard Purdie <richard.pur...@linuxfoundation.org> writes:
> > At this point I therefore wanted to seek advice on what the real
> > issue
> > is here and how to fix it!
> Code should be using cpu_interrupt() to change things but we have
> plenty
> of examples in the code of messing with cpu->interrupt_request
> directly
> which is often why the assert() in cpu_interrupt() doesn't get a
> chance
> to fire.
> 
> The two most prolific direct users seems to be i386 and ppc.
> 
> The i386 cases are all most likely OK as it tends to be in KVM
> emulation
> code where by definition the BQL is already held by the time you get
> there. For PPC it will depend on how you got there. The
> multi-thread-tcg.txt document describes the approach:
> 
>   Emulated hardware state
>   -----------------------
> 
>   Currently thanks to KVM work any access to IO memory is
> automatically
>   protected by the global iothread mutex, also known as the BQL (Big
>   Qemu Lock). Any IO region that doesn't use global mutex is expected
> to
>   do its own locking.
> 
>   However IO memory isn't the only way emulated hardware state can be
>   modified. Some architectures have model specific registers that
>   trigger hardware emulation features. Generally any translation
> helper
>   that needs to update more than a single vCPUs of state should take
> the
>   BQL.
> 
>   As the BQL, or global iothread mutex is shared across the system we
>   push the use of the lock as far down into the TCG code as possible
> to
>   minimise contention.
> 
>   (Current solution)
> 
>   MMIO access automatically serialises hardware emulation by way of
> the
>   BQL. Currently ARM targets serialise all ARM_CP_IO register
> accesses
>   and also defer the reset/startup of vCPUs to the vCPU context by
> way
>   of async_run_on_cpu().
> 
>   Updates to interrupt state are also protected by the BQL as they
> can
>   often be cross vCPU.
> 
> So basically it comes down to the call-path into your final
> cpu_interrupt() call which is why I guess your doing the:
> 
>   if (!qemu_mutex_iothread_locked()) {
>      qemu_mutex_lock_iothread();
>      cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
>      qemu_mutex_unlock_iothread();
>   }
> 
> dance. I suspect the helper functions are called both from device
> emulation (where BQL is held) and from vCPU helpers (where it is
> currently not).
> 
> So I suggest the fix is:
> 
>  1. Convert sites doing their own manipulation of
>  cpu->interrupt_request() to call the helper instead
>  2. If helper directly called from TCG code:
>       - take BQL before calling cpu_interrupt(), release after
>     Else if helper shared between MMIO/TCG paths
>       - take BQL from TCG path, call helper, release after
> 
> It might be there is some sensible re-factoring that could be done to
> make things clearer but I'll leave that to the PPC experts to weigh
> in
> on.
> 
> Hope that helps!

It does indeed, thanks. I've sent out a proposed patch which does the
above so people can review it and see if its the right thing to do.
Certainly its working fine locally.

Cheers,

Richard

Reply via email to