Hi David, Thanks for reviewing.
On 17.03.24 09:37, Keqian Zhu via wrote: >> Both main loop thread and vCPU thread are allowed to call >> pause_all_vcpus(), and in general resume_all_vcpus() is called after >> it. Two issues live in pause_all_vcpus(): > >In general, calling pause_all_vcpus() from VCPU threads is quite dangerous. > >Do we have reproducers for the cases below? > I produce the issues by testing ARM vCPU hotplug feature: QEMU changes for vCPU hotplug could be cloned from below site, https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v2 Guest Kernel changes (by James Morse, ARM) are available here: https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git virtual_cpu_hotplug/rfc/v2 The procedure to produce problems: 1. Startup a Linux VM (e.g., called OS-vcpuhotplug) with 32 possible vCPUs and 16 current vCPUs. 2. Log in guestOS and run script[1] to continuously online/offline CPU. 3. At host side, run script[2] to continuously hotplug/unhotplug vCPU. After several minutes, we can hit these problems. Script[1] to online/offline CPU: for ((time=1;time<10000000;time++)); do for ((cpu=16;cpu<32;cpu++)); do echo 1 > /sys/devices/system/cpu/cpu$cpu/online done for ((cpu=16;cpu<32;cpu++)); do echo 0 > /sys/devices/system/cpu/cpu$cpu/online done done Script[2] to hotplug/unhotplug vCPU: for ((time=1;time<1000000;time++)); do echo $time for ((cpu=16;cpu<=32;cpu++)); do echo "virsh setvcpus OS-vcpuhotplug --count $cpu --live" virsh setvcpus OS-vcpuhotplug --count $cpu --live sleep 2 done for ((cpu=32;cpu>=16;cpu--)); do echo "virsh setvcpus OS-vcpuhotplug --count $cpu --live" virsh setvcpus OS-vcpuhotplug --count $cpu --live sleep 2 done for ((cpu=16;cpu<=32;cpu+=2)); do echo "virsh setvcpus OS-vcpuhotplug --count $cpu --live" virsh setvcpus OS-vcpuhotplug --count $cpu --live sleep 2 done for ((cpu=32;cpu>=16;cpu-=2)); do echo "virsh setvcpus OS-vcpuhotplug --count $cpu --live" virsh setvcpus OS-vcpuhotplug --count $cpu --live sleep 2 done done The script[1] will call PSCI CPU_ON which emulated by QEMU, which result in calling cpu_reset() on vCPU thread. For ARM architecture, it needs to reset GICC registers, which is only possible when all vcpus paused. So script[1] will call pause_all_vcpus() in vCPU thread. The script[2] also calls cpu_reset() for newly hotplugged vCPU, which is done in main loop thread. So this scenario causes problems as I state in commit message. >> >> 1. There is possibility that during thread T1 waits on qemu_pause_cond >> with bql unlocked, other thread has called >> pause_all_vcpus() and resume_all_vcpus(), then thread T1 will stuck, >> because the condition all_vcpus_paused() is always false. > >How can this happen? > >Two threads calling pause_all_vcpus() is borderline broken, as you note. > >IIRC, we should call pause_all_vcpus() only if some other mechanism prevents >these races. For example, based on runstate changes. > We already has bql to prevent concurrent calling of pause_all_vcpus() and resume_all_vcpus(). But pause_all_vcpus() will unlock bql in the half way, which gives change for other thread to call pause and resume. In the past, code does not consider this problem, now I add retry mechanism to fix it. > >Just imagine one thread calling pause_all_vcpus() while another one >calls resume_all_vcpus(). It cannot possibly work. With bql, we can make sure all vcpus are paused after pause_all_vcpus() finish, and all vcpus are resumed after resume_all_vcpus() finish. For example, the following situation may occur: Thread T1: lock bql -> pause_all_vcpus -> wait on cond and unlock bql -> wait T2 unlock bql to lock bql -> lock bql && all_vcpu_paused -> success and do other work -> unlock bql Thread T2: wait T1 unlock bql to lock bql -> lock bql -> resume_all_vcpus -> success and do other work -> unlock bql Thanks, Keqian > > >> >> 2. After all_vcpus_paused() has been checked as true, we will >> unlock bql to relock replay_mutex. During the bql was unlocked, >> the vcpu's state may has been changed by other thread, so we >> must retry. >> >> Signed-off-by: Keqian Zhu <zhukeqi...@huawei.com> >> --- >> system/cpus.c | 29 ++++++++++++++++++++++++----- >> 1 file changed, 24 insertions(+), 5 deletions(-) >> > diff --git a/system/cpus.c b/system/cpus.c > index 68d161d96b..4e41abe23e 100644 > --- a/system/cpus.c > +++ b/system/cpus.c > @@ -571,12 +571,14 @@ static bool all_vcpus_paused(void) > return true; > } > > -void pause_all_vcpus(void) > +static void request_pause_all_vcpus(void) > { > CPUState *cpu; > > - qemu_clock_enable(QEMU_CLOCK_VIRTUAL, false); > CPU_FOREACH(cpu) { > + if (cpu->stopped) { > + continue; > + } > if (qemu_cpu_is_self(cpu)) { > qemu_cpu_stop(cpu, true); > } else { > @@ -584,6 +586,14 @@ void pause_all_vcpus(void) > qemu_cpu_kick(cpu); > } > } > +} > + > +void pause_all_vcpus(void) > +{ > + qemu_clock_enable(QEMU_CLOCK_VIRTUAL, false); > + > +retry: > + request_pause_all_vcpus(); > > /* We need to drop the replay_lock so any vCPU threads woken up > * can finish their replay tasks > @@ -592,14 +602,23 @@ void pause_all_vcpus(void) > > while (!all_vcpus_paused()) { > qemu_cond_wait(&qemu_pause_cond, &bql); > - CPU_FOREACH(cpu) { > - qemu_cpu_kick(cpu); > - } > + /* During we waited on qemu_pause_cond the bql was unlocked, > + * the vcpu's state may has been changed by other thread, so > + * we must request the pause state on all vcpus again. > + */ > + request_pause_all_vcpus(); > } > > bql_unlock(); > replay_mutex_lock(); > bql_lock(); > + > + /* During the bql was unlocked, the vcpu's state may has been > + * changed by other thread, so we must retry. > + */ > + if (!all_vcpus_paused()) { > + goto retry; > + } > } > > void cpu_resume(CPUState *cpu) -- Cheers, David / dhildenb