Re: [Qemu-devel] [PATCH] cpus.c: Fix race condition in cpu_stop_current()

2018-12-10 Thread Jaap Crezee
Hi again,

On 12/7/18 4:59 PM, Peter Maydell wrote:
> We use cpu_stop_current() to ensure the current CPU has stopped
> from places like qemu_system_reset_request(). Unfortunately its
> current implementation has a race. It calls qemu_cpu_stop(),
> which sets cpu->stopped to true even though the CPU hasn't
> actually stopped yet. The main thread will look at the flags
> set by qemu_system_reset_request() and call pause_all_vcpus().
> pause_all_vcpus() waits for every cpu to have cpu->stopped true,
> so it can continue (and we will start the system reset operation)
> before the vcpu thread has got back to its top level loop.
> 
> Instead, just set cpu->stop and call cpu_exit(). This will
> cause the vcpu to exit back to the top level loop, and there
> (as part of the wait_io_event code) it will call qemu_cpu_stop().
> 
> This fixes bugs where the reset request appeared to be ignored
> or the CPU misbehaved because the reset operation started
> to change vcpu state while the vcpu thread was still using it.
> 
> Signed-off-by: Peter Maydell 
> ---
> We discussed this a little while back:
> https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg00154.html
> and Jaap reported a bug which I suspect of being the same thing:
> https://lists.gnu.org/archive/html/qemu-discuss/2018-10/msg00014.html
> 
> Annoyingly I have lost the test case that demonstrated this
> race, but I analysed it at the time and this should definitely
> fix it. I have opted not to try to address any of the other
> possible cleanup here (eg vm_stop() has a potential similar
> race if called from a vcpu thread I suspect), since it gets
> pretty tangled.
> 
> Jaap: could you test whether this patch fixes the issue you
> were seeing, please?
> ---
>  cpus.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/cpus.c b/cpus.c
> index 0ddeeefc14f..b09b7027126 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -2100,7 +2100,8 @@ void qemu_init_vcpu(CPUState *cpu)
>  void cpu_stop_current(void)
>  {
>  if (current_cpu) {
> -qemu_cpu_stop(current_cpu, true);
> +current_cpu->stop = true;
> +cpu_exit(current_cpu);
>  }
>  }
>  
> 


The patch also fixed the issue on my production system. Thanks!

regards,


Jaap



Re: [Qemu-devel] [PATCH] cpus.c: Fix race condition in cpu_stop_current()

2018-12-09 Thread Jaap Crezee
Hello all,

On 12/7/18 4:59 PM, Peter Maydell wrote:
> Jaap: could you test whether this patch fixes the issue you
> were seeing, please?


My test is going very well. With the patch applied, I have no longer been able 
to freeze/hang the VM. Currently at 7024 reboots and counting over
runtime 1 day 23 hours. I will start testing on my production environment as 
well.

Tested-by: Jaap Crezee 


regards,


Jaap



Re: [Qemu-devel] [PATCH] cpus.c: Fix race condition in cpu_stop_current()

2018-12-08 Thread Jaap Crezee
Hi all,

On 12/7/18 4:59 PM, Peter Maydell wrote:
> Jaap: could you test whether this patch fixes the issue you
> were seeing, please?

I have applied the patch and started my test tool against it. It will need some 
time as I have also seen cases where it only failed after 600 reboots.
My testtool logs into the VM over ssh and reboots it. Meanwhile it will check 
the device going offline and coming back online again and starts over
(including a random wait).

I will keep you updated. Thanks!

Jaap