On 2025/11/15 0:30, Peter Xu wrote:
> On Fri, Nov 14, 2025 at 10:24:40AM +0900, Akihiko Odaki wrote:
>> However it creates another problem. Think of the following sequence:
>>
>> call_rcu_thread()                 |
>>                                   | call_rcu1()
>>                                   |   qatomic_fetch_inc(&rcu_call_count)
>>                                   |   qemu_sem_post(&sync_event)
>>                                   |
>> qatomic_read(&rcu_call_count)     |
>> enter_qs(false)                   |
>>   wait_for_readers(false)         |
>>     qemu_sem_timedwait(           |
>>       &sync_event, 10)            |
>>
>> Here qemu_sem_timedwait() returns immediately because of the earlier
>> post, so it incorrectly interrupts the RCU thread, which then enters
>> the force quiescent state.
> The first thing to mention is: IIUC the above can't happen, because if
> call_rcu_thread() has already been woken up and is reaching enter_qs(),
> then there must have been, for example, a prior call_rcu1() that
> incremented rcu_call_count and posted to sync_event. Hence rcu_call_count
> cannot be 0 anymore in the call_rcu1() above, because the sub happens
> later:
>
> call_rcu_thread:
>     ... sees rcu_call_count > 0, quit loop ...
>     ...
>     enter_qs(sleep);
>     qatomic_sub(&rcu_call_count, n); <-------------------------
>     ...
>
> That means the concurrent call_rcu1() above will not post to the sem
> anymore, because it only posts when it sees rcu_call_count == 0.
Below is an extended version of the sequence:

call_rcu_thread()                   |
  qatomic_sub(&rcu_call_count, n)   |
  (sets rcu_call_count to 0)        |
                                    | call_rcu1()
                                    |   qatomic_fetch_inc(&rcu_call_count)
                                    |   qemu_sem_post(&sync_event)
qatomic_read(&rcu_call_count)       |
enter_qs(false)                     |
  wait_for_readers(false)           |
    qemu_sem_timedwait(             |
      &sync_event, 10)              |

Note that there may not be a qemu_sem_wait(&sync_event) call between
qatomic_sub(&rcu_call_count, n) and qatomic_read(&rcu_call_count).
> Besides, IMHO replacing the event with a sem shouldn't change such
> behavior compared to using events, because any spot that can post()
> concurrently can also call qemu_event_set() concurrently... After all, we
> only have a few spots resetting the event in the original patch, and
> after the reset a concurrent qemu_event_set() will re-activate it.
The sequence I showed is handled correctly with a properly placed
qemu_event_reset():

call_rcu_thread()                   |
  qatomic_sub(&rcu_call_count, n)   |
  (sets rcu_call_count to 0)        |
                                    | call_rcu1()
                                    |   qatomic_fetch_inc(&rcu_call_count)
                                    |   qemu_event_set(&sync_event)
qatomic_read(&rcu_call_count)       |
qemu_event_reset(&sync_event)       |
enter_qs(false)                     |
  wait_for_readers(false)           |
    qemu_event_timedwait(           |
      &sync_event, 10)              |
Note that a concurrent qemu_event_set() after resetting the event can only
be triggered by force_rcu(), which is intended to interrupt
wait_for_readers().
> Sem does introduce possible false positives on events, but as I was
> trying to justify, a generic VM boot and shutdown should need fewer than
> 1000 rcu calls, so the worst case is that we loop reading rcu_call_count
> 1000 times... I also didn't measure how many of such call_rcu1() calls
> will see N > 0 so that they won't post() to the sem at all; I believe
> that will be common. So the perf overhead should really be rare, IMHO.
> Note that I was not requesting you to replace this event with a sem
> before merging. Frankly, I think your work is good enough to land.
>
> However, I still want to discuss a sem replacement, not only because this
> is the best place to discuss it (after all, force rcu starts to introduce
> a genuine second user of the event besides call_rcu1, hence "being able
> to remember more than 1" will start to make more sense than avoiding
> resets), but also because it looks cleaner to remove some of the new code
> force rcu adds.
How does "being able to remember more than 1" help?
Regards,
Akihiko Odaki