On Tue, Feb 28, 2023 at 07:09:57PM -0500, Stefan Hajnoczi wrote: > On Sat, Feb 25, 2023 at 11:31:37AM -0500, Peter Xu wrote: > > [not for merging, but for discussion; this is something I found when > > looking at another issue on Chuang's optimization for migration downtime] > > > > Summary: we tried to access memory_listeners, address_spaces, etc. in RCU > > way. However we didn't implement them with RCU-safety. This patchset is > > trying to do that; at least making it closer. > > > > NOTE! It's doing it wrongly for now, so please feel free to see this as a > > thread to start discussing this problem, as in subject. > > > > The core problem here is how to make sure memory listeners will be freed in > > RCU ways, per when unlinking them from the global memory_listeners list. > > > > The current patchset (in patch 1) did it with drain_call_rcu(), but of > > course it's wrong, because of at least two things: > > > > (1) drain_call_rcu() will release BQL; currently there's no way to me to > > guarantee that releasing BQL is safe here. > > > > (2) memory_listener_unregister() can be called within a RCU read lock > > itself (we're so happy to take rcu read lock in many places but we > > don't think much on how long it'll be taken; at least not as strict > > as the kernel variance, so we're just less care about that fact yet). > > It means, drain_call_rcu() should deadlock there waiting for itself. > > For an example, see Appendix A. > > > > Side question to Stefan / Maxim: why do we need drain_call_rcu() and what's > > its difference from synchronize_rcu() in API level besides releasing and > > retaking BQL when taken? > > Hi, > I haven't taken a look at the patches or thought about the larger > problem you're tackling here, but I wanted to reply to this specific > question. > > It's been a long time since Maxim, Paolo, and I discussed this, but > drain_call_rcu() and synchronize_rcu() do different things: > - drain_call_rcu() is about waiting until the current thread's > call_rcu() callbacks have completed. > - synchronize_rcu() is about waiting until there are no more readers in > the last grace period. > > Calling synchronize_rcu() doesn't guarantee that call_rcu_thread() has > completed pending call_rcu() callbacks. Therefore it's not appropriate > for the existing drain_call_rcu() callers because they rely on previous > call_rcu() callbacks to have finished.
Ah I missed that detail. I was quickly thinking whether such a requirement can also be done with a customized rcu callback that will simply kick a signal after the real "free" is done, while the call_rcu() context can wait for the signal. It's just that assuming RCU callbacks will be executed in order is slightly tricky. But I guess it's also hard if the call_rcu() is deep in the stack so drain_call_rcu() should avoid fiddling on the details. Thanks Stefan! -- Peter Xu