Re: [PATCH] accel/tcg: mttcg remove false-negative halted assertion
On Fri Sep 22, 2023 at 4:25 AM AEST, Michael Tokarev wrote:
> 29.08.2023 04:06, Nicholas Piggin wrote:
> > mttcg asserts that an execution ending with EXCP_HALTED must have
> > cpu->halted. However between the event or instruction that sets
> > cpu->halted and requests exit and the assertion here, an
> > asynchronous event could clear cpu->halted.
> >
> > This leads to crashes running AIX on ppc/pseries because it uses
> > H_CEDE/H_PROD hcalls, where H_CEDE sets self->halted = 1 and
> > H_PROD sets other cpu->halted = 0 and kicks it.
> >
> > H_PROD could be turned into an interrupt to wake, but several other
> > places in ppc, sparc, and semihosting follow what looks like a similar
> > pattern setting halted = 0 directly. So remove this assertion.
> >
> > Reported-by: Ivan Warren
> > Signed-off-by: Nicholas Piggin
>
> This one also smells like a stable material, is it not?

Yeah I would say it is.

Thanks,
Nick

> Thanks,
>
> /mjt
>
> > diff --git a/accel/tcg/tcg-accel-ops-mttcg.c b/accel/tcg/tcg-accel-ops-mttcg.c
> > index b276262007..d0b6f288d9 100644
> > --- a/accel/tcg/tcg-accel-ops-mttcg.c
> > +++ b/accel/tcg/tcg-accel-ops-mttcg.c
> > @@ -98,17 +98,6 @@ static void *mttcg_cpu_thread_fn(void *arg)
> >              case EXCP_DEBUG:
> >                  cpu_handle_guest_debug(cpu);
> >                  break;
> > -            case EXCP_HALTED:
> > -                /*
> > -                 * during start-up the vCPU is reset and the thread is
> > -                 * kicked several times. If we don't ensure we go back
> > -                 * to sleep in the halted state we won't cleanly
> > -                 * start-up when the vCPU is enabled.
> > -                 *
> > -                 * cpu->halted should ensure we sleep in wait_io_event
> > -                 */
> > -                g_assert(cpu->halted);
> > -                break;
> >              case EXCP_ATOMIC:
> >                  qemu_mutex_unlock_iothread();
> >                  cpu_exec_step_atomic(cpu);
Re: [PATCH] accel/tcg: mttcg remove false-negative halted assertion
29.08.2023 04:06, Nicholas Piggin wrote:
> mttcg asserts that an execution ending with EXCP_HALTED must have
> cpu->halted. However between the event or instruction that sets
> cpu->halted and requests exit and the assertion here, an
> asynchronous event could clear cpu->halted.
>
> This leads to crashes running AIX on ppc/pseries because it uses
> H_CEDE/H_PROD hcalls, where H_CEDE sets self->halted = 1 and
> H_PROD sets other cpu->halted = 0 and kicks it.
>
> H_PROD could be turned into an interrupt to wake, but several other
> places in ppc, sparc, and semihosting follow what looks like a similar
> pattern setting halted = 0 directly. So remove this assertion.
>
> Reported-by: Ivan Warren
> Signed-off-by: Nicholas Piggin

This one also smells like a stable material, is it not?

Thanks,

/mjt

> diff --git a/accel/tcg/tcg-accel-ops-mttcg.c b/accel/tcg/tcg-accel-ops-mttcg.c
> index b276262007..d0b6f288d9 100644
> --- a/accel/tcg/tcg-accel-ops-mttcg.c
> +++ b/accel/tcg/tcg-accel-ops-mttcg.c
> @@ -98,17 +98,6 @@ static void *mttcg_cpu_thread_fn(void *arg)
>              case EXCP_DEBUG:
>                  cpu_handle_guest_debug(cpu);
>                  break;
> -            case EXCP_HALTED:
> -                /*
> -                 * during start-up the vCPU is reset and the thread is
> -                 * kicked several times. If we don't ensure we go back
> -                 * to sleep in the halted state we won't cleanly
> -                 * start-up when the vCPU is enabled.
> -                 *
> -                 * cpu->halted should ensure we sleep in wait_io_event
> -                 */
> -                g_assert(cpu->halted);
> -                break;
>              case EXCP_ATOMIC:
>                  qemu_mutex_unlock_iothread();
>                  cpu_exec_step_atomic(cpu);
Re: [PATCH] accel/tcg: mttcg remove false-negative halted assertion
On 8/28/23 18:06, Nicholas Piggin wrote:
> mttcg asserts that an execution ending with EXCP_HALTED must have
> cpu->halted. However between the event or instruction that sets
> cpu->halted and requests exit and the assertion here, an
> asynchronous event could clear cpu->halted.
>
> This leads to crashes running AIX on ppc/pseries because it uses
> H_CEDE/H_PROD hcalls, where H_CEDE sets self->halted = 1 and
> H_PROD sets other cpu->halted = 0 and kicks it.
>
> H_PROD could be turned into an interrupt to wake, but several other
> places in ppc, sparc, and semihosting follow what looks like a similar
> pattern setting halted = 0 directly. So remove this assertion.
>
> Reported-by: Ivan Warren
> Signed-off-by: Nicholas Piggin
> ---
>  accel/tcg/tcg-accel-ops-mttcg.c | 11 ---
>  1 file changed, 11 deletions(-)

The adjustments of 'halted' and 'prod' are done under the io lock in both
cases, so there's no race there. It is perfectly reasonable that after
thread A sets halted and drops the lock, thread B may acquire the lock and
clear halted before thread A has a chance to complete longjmp and cycle
through its main loop.

Reviewed-by: Richard Henderson

> diff --git a/accel/tcg/tcg-accel-ops-mttcg.c b/accel/tcg/tcg-accel-ops-mttcg.c
> index b276262007..d0b6f288d9 100644
> --- a/accel/tcg/tcg-accel-ops-mttcg.c
> +++ b/accel/tcg/tcg-accel-ops-mttcg.c
> @@ -98,17 +98,6 @@ static void *mttcg_cpu_thread_fn(void *arg)
>              case EXCP_DEBUG:
>                  cpu_handle_guest_debug(cpu);
>                  break;
> -            case EXCP_HALTED:
> -                /*
> -                 * during start-up the vCPU is reset and the thread is
> -                 * kicked several times. If we don't ensure we go back
> -                 * to sleep in the halted state we won't cleanly
> -                 * start-up when the vCPU is enabled.
> -                 *
> -                 * cpu->halted should ensure we sleep in wait_io_event
> -                 */
> -                g_assert(cpu->halted);
> -                break;

I adjusted the patch to keep the case label and update the comment, still
dropping the assert.

Queued to tcg-next.

r~
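
The ordering described above (thread A sets halted and drops the lock,
thread B clears halted before thread A gets back to its main loop) can be
sketched as a small deterministic model. This is not QEMU code, only an
illustration under stated assumptions: ToyCPU, toy_lock, toy_unlock,
toy_cede, toy_prod, toy_old_assert_holds and toy_replay are all
hypothetical names, the lock is a plain flag standing in for the io lock,
and the two threads are replayed sequentially in the order of interest
rather than run concurrently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for a vCPU; not QEMU's CPUState. */
typedef struct {
    bool locked;   /* toy flag playing the role of the io lock */
    bool halted;   /* models cpu->halted */
} ToyCPU;

enum { TOY_EXCP_HALTED = 1 };   /* stand-in for EXCP_HALTED */

/* The toy lock only checks lock discipline; it does no real blocking. */
static void toy_lock(ToyCPU *cpu)   { assert(!cpu->locked); cpu->locked = true; }
static void toy_unlock(ToyCPU *cpu) { assert(cpu->locked);  cpu->locked = false; }

/* Thread A, step 1: halt self under the lock (the role H_CEDE plays in the
 * description) and hand back the exit reason carried to the main loop. */
static int toy_cede(ToyCPU *self)
{
    toy_lock(self);
    self->halted = true;
    toy_unlock(self);
    return TOY_EXCP_HALTED;
}

/* Thread B: wake the target under the lock (the role H_PROD plays). */
static void toy_prod(ToyCPU *target)
{
    toy_lock(target);
    target->halted = false;
    toy_unlock(target);
}

/* Thread A, step 2: back in its main loop. Reports whether a check of the
 * form "exit reason is HALTED implies halted is set" would still hold. */
static bool toy_old_assert_holds(ToyCPU *self, int exit_reason)
{
    bool halted;

    toy_lock(self);
    halted = self->halted;
    toy_unlock(self);
    return exit_reason != TOY_EXCP_HALTED || halted;
}

/* Replay one ordering: optionally let thread B run in the window between
 * thread A dropping the lock and thread A re-checking its state. */
static bool toy_replay(bool prod_in_window)
{
    ToyCPU cpu = { .locked = false, .halted = false };
    int exit_reason = toy_cede(&cpu);

    if (prod_in_window) {
        toy_prod(&cpu);
    }
    return toy_old_assert_holds(&cpu, exit_reason);
}
```

In this model toy_replay(false) reports that the check holds, while
toy_replay(true) reports that it does not, even though every access to
halted happens with the lock held. That matches the point made in the
review: the accesses are properly serialized, and it is the assumption
behind the assertion that is wrong.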