On Fri, Oct 19, 2018 at 18:01:18 +0200, Paolo Bonzini wrote: > On 19/10/2018 16:50, Emilio G. Cota wrote: > > On Fri, Oct 19, 2018 at 08:59:24 +0200, Paolo Bonzini wrote: > >> On 19/10/2018 03:05, Emilio G. Cota wrote: > >>> I'm calling this series a v3 because it supersedes the two series > >>> I previously sent about using atomics for interrupt_request: > >>> https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg02013.html > >>> The approach in that series cannot work reliably; using (locked) atomics > >>> to set interrupt_request but not using (locked) atomics to read it > >>> can lead to missed updates. > >> > >> The idea here was that changes to protected fields are all followed by > >> kick. That may not have been the case, granted, but I wonder if the > >> plan is unworkable. > > > > I suspect that the cpu->interrupt_request+kick mechanism is not the issue, > > otherwise master should not work--we do atomic_read(cpu->interrupt_request) > > and only if that read != 0 we take the BQL. > > > > My guess is that the problem is with other reads of cpu->interrupt_request, > > e.g. those in cpu_has_work. Currently those reads happen with the > > BQL held, and updates to cpu->interrupt_request take the BQL. If we drop > > the BQL from the setters to instead use locked atomics (like in the > > aforementioned series), those BQL-protected readers might miss updates. > > cpu_has_work is only needed to handle the processor's halted state (or > is it?). If it is, OR+kick should work. > > > Given that we need a per-CPU lock anyway to remove the BQL from the > > CPU loop, extending this lock to protect cpu->interrupt_request is > > a simple solution that keeps the current logic and allows for > > greater scalability. > > Sure, I was just curious what the problem was. KVM uses OR+kick with no > problems.
I never found exactly where things break. The hangs happen pretty early when booting a large (-smp > 16) x86_64 Ubuntu guest. Booting never completes (ssh unresponsive) if I don't have the console output (I suspect the console output slows things down enough to hide some races). I only see a few threads busy: a couple of vCPU threads, and the I/O thread. I didn't have time to debug any further, so I moved on to an alternative approach. So it is possible that it was my implementation, and not the approach, what was at fault :-) Thanks, E.