On Tue, 2017-11-21 at 09:55 +0000, Alex Bennée wrote: > Richard Purdie <richard.pur...@linuxfoundation.org> writes: > > At this point I therefore wanted to seek advice on what the real > > issue > > is here and how to fix it! > Code should be using cpu_interrupt() to change things but we have > plenty > of examples in the code of messing with cpu->interrupt_request > directly > which is often why the assert() in cpu_interrupt() doesn't get a > chance > to fire. > > The two most prolific direct users seems to be i386 and ppc. > > The i386 cases are all most likely OK as it tends to be in KVM > emulation > code where by definition the BQL is already held by the time you get > there. For PPC it will depend on how you got there. The > multi-thread-tcg.txt document describes the approach: > > Emulated hardware state > ----------------------- > > Currently thanks to KVM work any access to IO memory is > automatically > protected by the global iothread mutex, also known as the BQL (Big > Qemu Lock). Any IO region that doesn't use global mutex is expected > to > do its own locking. > > However IO memory isn't the only way emulated hardware state can be > modified. Some architectures have model specific registers that > trigger hardware emulation features. Generally any translation > helper > that needs to update more than a single vCPUs of state should take > the > BQL. > > As the BQL, or global iothread mutex is shared across the system we > push the use of the lock as far down into the TCG code as possible > to > minimise contention. > > (Current solution) > > MMIO access automatically serialises hardware emulation by way of > the > BQL. Currently ARM targets serialise all ARM_CP_IO register > accesses > and also defer the reset/startup of vCPUs to the vCPU context by > way > of async_run_on_cpu(). > > Updates to interrupt state are also protected by the BQL as they > can > often be cross vCPU. > > So basically it comes down to the call-path into your final > cpu_interrupt() call which is why I guess your doing the: > > if (!qemu_mutex_iothread_locked()) { > qemu_mutex_lock_iothread(); > cpu_interrupt(cs, CPU_INTERRUPT_EXITTB); > qemu_mutex_unlock_iothread(); > } > > dance. I suspect the helper functions are called both from device > emulation (where BQL is held) and from vCPU helpers (where it is > currently not). > > So I suggest the fix is: > > 1. Convert sites doing their own manipulation of > cpu->interrupt_request() to call the helper instead > 2. If helper directly called from TCG code: > - take BQL before calling cpu_interrupt(), release after > Else if helper shared between MMIO/TCG paths > - take BQL from TCG path, call helper, release after > > It might be there is some sensible re-factoring that could be done to > make things clearer but I'll leave that to the PPC experts to weigh > in > on. > > Hope that helps!
It does indeed, thanks. I've sent out a proposed patch which does the above so people can review it and see if its the right thing to do. Certainly its working fine locally. Cheers, Richard