"Nicholas Piggin" <npig...@gmail.com> writes:

> On Sat Sep 16, 2023 at 1:29 PM AEST, Richard Henderson wrote:
>> From: Nicholas Piggin <npig...@gmail.com>
>>
>> mttcg asserts that an execution ending with EXCP_HALTED must have
>> cpu->halted. However between the event or instruction that sets
>> cpu->halted and requests exit and the assertion here, an
>> asynchronous event could clear cpu->halted.
>>
>> This leads to crashes running AIX on ppc/pseries because it uses
>> H_CEDE/H_PROD hcalls, where H_CEDE sets self->halted = 1 and
>> H_PROD sets other cpu->halted = 0 and kicks it.
>>
>> H_PROD could be turned into an interrupt to wake, but several other
>> places in ppc, sparc, and semihosting follow what looks like a similar
>> pattern setting halted = 0 directly. So remove this assertion.
>>
>> Reported-by: Ivan Warren <i...@vmfacility.fr>
>> Signed-off-by: Nicholas Piggin <npig...@gmail.com>
>> Message-Id: <20230829010658.8252-1-npig...@gmail.com>
>> [rth: Keep the case label and adjust the comment.]
>
> Hey Richard,
>
> Thanks for picking this up.
>
> I think EXCP_HLT and EXCP_HALTED are effectively the same, so they could
> be merged after this.
>
> I couldn't quite decipher the intended difference between them, HLT is
> "hlt instruction reached", but it does tend to go into a mode where it
> is halted waiting for external event. Is there some useful difference in
> semantics we should retain (and at least try to find a way to assert)?

I always thought HALTED was where the system was halted (e.g. during a
shutdown) but I agree its less than clear.

Do both effectively end up in wait_for_io for some event to start the
loop again?

>
> I did look at how to avoid the halted race and keep the assert, e.g.,
> have the CPU only modify its own halted, and external events would have
> a wakeup field to set. In the end it wasn't clear that that was any
> simpler and you still have races to reason about, now between the two
> fields. So unless someone wants to keep both, should we merge?
>
> Thanks,
> Nick
>
>> Signed-off-by: Richard Henderson <richard.hender...@linaro.org>
>> ---
>>  accel/tcg/tcg-accel-ops-mttcg.c | 9 ++-------
>>  1 file changed, 2 insertions(+), 7 deletions(-)
>>
>> diff --git a/accel/tcg/tcg-accel-ops-mttcg.c 
>> b/accel/tcg/tcg-accel-ops-mttcg.c
>> index b276262007..4b0dfb4be7 100644
>> --- a/accel/tcg/tcg-accel-ops-mttcg.c
>> +++ b/accel/tcg/tcg-accel-ops-mttcg.c
>> @@ -100,14 +100,9 @@ static void *mttcg_cpu_thread_fn(void *arg)
>>                  break;
>>              case EXCP_HALTED:
>>                  /*
>> -                 * during start-up the vCPU is reset and the thread is
>> -                 * kicked several times. If we don't ensure we go back
>> -                 * to sleep in the halted state we won't cleanly
>> -                 * start-up when the vCPU is enabled.
>> -                 *
>> -                 * cpu->halted should ensure we sleep in wait_io_event
>> +                 * Usually cpu->halted is set, but may have already been
>> +                 * reset by another thread by the time we arrive here.
>>                   */
>> -                g_assert(cpu->halted);
>>                  break;
>>              case EXCP_ATOMIC:
>>                  qemu_mutex_unlock_iothread();


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Reply via email to