David Gibson <da...@gibson.dropbear.id.au> writes: > On Mon, Oct 24, 2016 at 03:44:01PM +0100, Alex Bennée wrote: >> >> Alex Bennée <alex.ben...@linaro.org> writes: >> >> > Hi, >> > >> > In the MTTCG patch set one of the big patches is to remove the >> > requirement to hold the BQL while running code: >> > >> > tcg: drop global lock during TCG code execution >> > >> > And this broke the PPC code because emulate_ppc_hypercall can cause >> > changes to the global state. This function just calls spapr_hypercall() >> > and puts the results into the TCG register file. Normally >> > spapr_hypercall() is called under the BQL in KVM as >> > kvm_arch_handle_exit() does things with the BQL held. >> > >> > I blithely wrapped the called in a lock/unlock pair only to find the >> > ppc64 check builds failed as the hypercall was made during the >> > cc->do_interrupt() code which also holds the BQL. >> > >> > I'm a little confused by the nature of PPC hypercalls in TCG? Are they >> > not all detectable at code generation time? What is the case that causes >> > an exception to occur rather than the helper function doing the >> > hypercall? >> > >> > I guess it comes down to can I avoid doing: >> > >> > /* If we come via cc->do_interrupt BQL may already be held */ >> > if (!qemu_mutex_iothread_locked()) { >> > g_mutex_lock_iothread(); >> > env->gpr[3] = spapr_hypercall(cpu, env->gpr[3], &env->gpr[4]); >> > g_muetx_unlock_iothread(); >> > } else { >> > env->gpr[3] = spapr_hypercall(cpu, env->gpr[3], &env->gpr[4]); >> > } >> >> Of course I mean: >> >> /* If we come via cc->do_interrupt BQL may already be held */ >> if (!qemu_mutex_iothread_locked()) { >> qemu_mutex_lock_iothread(); >> env->gpr[3] = spapr_hypercall(cpu, env->gpr[3], &env->gpr[4]); >> qemu_mutex_unlock_iothread(); >> } else { >> env->gpr[3] = spapr_hypercall(cpu, env->gpr[3], &env->gpr[4]); >> } >> >> > Any thoughts? > > So, I understand why the hypercall is being called from exception code > and therefore with the BQL held. On Power, the hypercall instruction > is the same as the guest-level system call instruction, just with a > flag bit set. System calls are, of course, treated as exceptions, > because they change the CPU's privilege mode. Likewise if we were > implementing a full host system (like the upcoming 'powernv' machine > type) we'd need to treat hypercalls as exceptions for the same reason. > > We could detect hypercalls at translation time, but at present we > don't: we go into the exception path, then detect that it's a "level > 1" (i.e. hypervisor) sc instruction and branch off to the hypercall > emulation code if that's been set up. It just seemed the simplet > approach at the time. > > What I *don't* understand is how the hypercall code is ever being > invoked *without* the BQL. I grepped through and the only entry paths > I can see are the one in the exception handling and KVM.
Ahh I think I have figured out what happened. In the pre-MTTCG world TCG code always held the BQL. After Jan's patch the BQL was held for interrupt processing but not exception handling. I've since fixed this. I think I must have tried this hack before fixing the root cause because I had another assertion failure when trying to ensure all IRQ processing has the BQL held (as the target specific code can do all sorts of things on an IRQ/Exception). > Could you try to get a backtrace from the case where we're entering > the hypercall without the BQL? I'll remove the hack, add an assert and see if it comes up again in testing. -- Alex Bennée