Hi again, On 12/7/18 4:59 PM, Peter Maydell wrote: > We use cpu_stop_current() to ensure the current CPU has stopped > from places like qemu_system_reset_request(). Unfortunately its > current implementation has a race. It calls qemu_cpu_stop(), > which sets cpu->stopped to true even though the CPU hasn't > actually stopped yet. The main thread will look at the flags > set by qemu_system_reset_request() and call pause_all_vcpus(). > pause_all_vcpus() waits for every cpu to have cpu->stopped true, > so it can continue (and we will start the system reset operation) > before the vcpu thread has got back to its top level loop. > > Instead, just set cpu->stop and call cpu_exit(). This will > cause the vcpu to exit back to the top level loop, and there > (as part of the wait_io_event code) it will call qemu_cpu_stop(). > > This fixes bugs where the reset request appeared to be ignored > or the CPU misbehaved because the reset operation started > to change vcpu state while the vcpu thread was still using it. > > Signed-off-by: Peter Maydell <peter.mayd...@linaro.org> > --- > We discussed this a little while back: > https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg00154.html > and Jaap reported a bug which I suspect of being the same thing: > https://lists.gnu.org/archive/html/qemu-discuss/2018-10/msg00014.html > > Annoyingly I have lost the test case that demonstrated this > race, but I analysed it at the time and this should definitely > fix it. I have opted not to try to address any of the other > possible cleanup here (eg vm_stop() has a potential similar > race if called from a vcpu thread I suspect), since it gets > pretty tangled. > > Jaap: could you test whether this patch fixes the issue you > were seeing, please? > --- > cpus.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/cpus.c b/cpus.c > index 0ddeeefc14f..b09b7027126 100644 > --- a/cpus.c > +++ b/cpus.c > @@ -2100,7 +2100,8 @@ void qemu_init_vcpu(CPUState *cpu) > void cpu_stop_current(void) > { > if (current_cpu) { > - qemu_cpu_stop(current_cpu, true); > + current_cpu->stop = true; > + cpu_exit(current_cpu); > } > } > >
The patch also fixed the issue on my production system. Thanks! regards, Jaap