On Sat, Feb 25, 2023 at 11:31:37AM -0500, Peter Xu wrote: > [not for merging, but for discussion; this is something I found when > looking at another issue on Chuang's optimization for migration downtime] > > Summary: we tried to access memory_listeners, address_spaces, etc. in RCU > way. However we didn't implement them with RCU-safety. This patchset is > trying to do that; at least making it closer. > > NOTE! It's doing it wrongly for now, so please feel free to see this as a > thread to start discussing this problem, as in subject. > > The core problem here is how to make sure memory listeners will be freed in > RCU ways, per when unlinking them from the global memory_listeners list. > > The current patchset (in patch 1) did it with drain_call_rcu(), but of > course it's wrong, because of at least two things: > > (1) drain_call_rcu() will release BQL; currently there's no way to me to > guarantee that releasing BQL is safe here. > > (2) memory_listener_unregister() can be called within a RCU read lock > itself (we're so happy to take rcu read lock in many places but we > don't think much on how long it'll be taken; at least not as strict > as the kernel variance, so we're just less care about that fact yet). > It means, drain_call_rcu() should deadlock there waiting for itself. > For an example, see Appendix A. > > Side question to Stefan / Maxim: why do we need drain_call_rcu() and what's > its difference from synchronize_rcu() in API level besides releasing and > retaking BQL when taken?
Hi, I haven't taken a look at the patches or thought about the larger problem you're tackling here, but I wanted to reply to this specific question. It's been a long time since Maxim, Paolo, and I discussed this, but drain_call_rcu() and synchronize_rcu() do different things: - drain_call_rcu() is about waiting until the current thread's call_rcu() callbacks have completed. - synchronize_rcu() is about waiting until there are no more readers in the last grace period. Calling synchronize_rcu() doesn't guarantee that call_rcu_thread() has completed pending call_rcu() callbacks. Therefore it's not appropriate for the existing drain_call_rcu() callers because they rely on previous call_rcu() callbacks to have finished. Stefan
signature.asc
Description: PGP signature