* Markus Armbruster (arm...@redhat.com) wrote: > "Dr. David Alan Gilbert" <dgilb...@redhat.com> writes: > > > * Markus Armbruster (arm...@redhat.com) wrote: > >> "Dr. David Alan Gilbert" <dgilb...@redhat.com> writes: > >> > >> > Hi, > >> > I wondered what it would take to be able to do a lock-free monitor; > >> > i.e. one that could respond to (some) commands while the qemu big lock > >> > is held. > >> > >> Requires a careful audit of the monitor code. > >> > >> For instance, cur_mon needs to be made thread local for running multiple > >> monitors concurrently. > > > > Hmm that's fun; and cur_mon is used all over the place when 'fd' passing > > is used - although that probably should be cleaned up somehow. > > When cur_mon got created, we had no threads. Even now, monitors only > ever run under the BQL, so cur_mon's still fine. > > Having a current monitor is simpler than passing one around, especially > when you go through layers that don't know about it. Such cases exist, > and they made us abandon commit 376253e's hope to eliminate cur_mon. > Unfortunately, I've since forgotten the details, so I can't point you to > an example.
Yes, I think maybe if we can try and remove the use of cur_mon one use at a time outside of the monitor it might move it in the right direction. > >> > My reason for asking is that there are cases in migration and colo > >> > where a network block device has failed and is blocking, but it fails > >> > at just the wrong time while the lock is held, and now, you don't have > >> > >> Is it okay to block while holding the BQL? > > > > I'd hope the simple answer was no; unfortunately the more complex answer > > is that we keep finding places that do. > > Would you call these bugs? > > Even if you do, we may want to recover from them. They're a bug in the end result that needs fixing; so for example a crashing NBD server shouldn't lock you out of the monitor during a migrate, and I don't think we current;y have other solutions. > > The cases I'm aware of are > > mostly bdrv_drain_all calls in migration/colo, where one of the block > > devices (e.g. an NBD network device) fails. Generally these are called > > at the end of a migration cycle when we're just ensuring that everything > > really is drained and are normally called with the lock held to ensure > > that nothing else is generating new traffic as we're clearing everything > > else. > > Using the BQL for that drives a cork into the I/O pipeline with a Really > Big Hammer. Can we do better? > > The answer may be "yes, but don't hold your breath." Then we may need > means to better cope with the bugs. Yeh there are some places that the migraiton code holds the BQL where I don't really understand all the things it's guarding against. > >> > any way to unblock it since you can't do anything on the monitors. > >> > If I could issue a command then I could have it end up doing a > >> > shutdown(2) > >> > on the network connection and free things up. > >> > > >> > Maybe this would also help real-time people? > >> > > >> > I was thinking then, we could: > >> > a) Have a monitor that wasn't tied to the main loop at all - each > >> > instance > >> > would be it's own thread and have it's own poll() (probably using the new > >> > IO channels?) > >> > b) for most commands it would take the big lock before execute the > >> > command > >> > and release it afterwards - so there would be no risk in those commands. > >> > >> Saves you the auditing their code, which would probably be impractical. > >> > >> > c) Some commands you could mark as 'lock free' - they would be > >> > required > >> > not to take any locks or wait for *anything* and for those the monitor > >> > would > >> > not take the lock. > >> > >> They also must not access shared state without suitable synchronization. > > > > Right; I'd expect most of the cases to either be reading simple status > > information, > > or having to find whatever they need to do using rcu list walks. > > > >> > d) Perhaps have some way of restricting a particular monitor session > >> > to only > >> > allowing non-blocking commands. > >> > >> Why? To ensure you always have an emergency monitor available? > > > > Yes, mostly to stop screwups of putting a potentially blocking command down > > your > > emergency route. > > Understand. > > >> > e) Then I'd have to have a way of finding the broken device in a > >> > lockfree > >> > way (RTU list walk?) and issuing the shutdown(2). > >> > > >> > Bonus points to anyone who can statically check commands in (c). > >> > >> Tall order. > > > > Yes :-) A (compile time selected) dynamic check might be possible > > where the normal global lock functions and rcu block functions check if > > they're > > in a monitor thread and assert. > > > >> > Does this make sense to everyone else, or does anyone have any better > >> > suggestions? > >> > >> Sounds like a big, hairy task to me. > >> > >> Any alternatives? > > > > I hope so; although is the idea of making a lock-free monitor a generally > > good idea we should do anyway? > > There's no shortage of good ideas, but our bandwidth to implement, > review, document and test them has limits. > > The general plan is to reduce the scope of the BQL gradually. > Liberating the monitor from the BQL is one step on the road to complete > BQL elimination. The question is whether this step needs to be taken > *now*. COLO probably needs something; although in it's case I'm going to investigate if some evil with iptables might be able to force a recovery by causing the socket to close (after all we're assuming that case involves the destination is dead - but I don't want to wait for a full TCP timeout). Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK