On Tue, 08/22 13:59, Peter Xu wrote: > On Tue, Aug 22, 2017 at 12:15:19PM +0800, Fam Zheng wrote: > > On Tue, 08/22 10:56, Peter Xu wrote: > > > I haven't really encountered (c), but I think it's the migrate_cancel > > > command that matters, which should not need BQL as well. > > > > There is bdrv_invalidate_cache_all() in migrate_cancel which clearly isn't > > safe. > > Is that if block unreachable in this case? If so we should assert, otherwise > > this command is not okay to run without BQL. > > Ah. I see. Even if so, if that is the only usage of BQL, IMHO we can > still mark migrate_cancel as "without-bql=true", instead we take the > BQL before calling bdrv_invalidate_cache_all(). Then migrate_cancel > can be BQL-free at least when block migration is not active. > > > > > Generically, what guarantee the thread-safety of a qmp command when you > > decide > > BQL is not needed? In other words, how do you prove commands are safe > > without > > BQL? I think almost every command accesses global state, but lock-free data > > structures are rare AFAICT. > > I would suggest we split the problem into at least three parts. IMHO > we need to answer below questions one by one to know what we should do > next: > > 1. whether we can handle monitor commands outside iothread, or say, in > an isolated thread? > > This is basically what patch 2 does, the "per-monitor threads". > > IMHO this is the very first question to ask. So now I know that at > least current code cannot do it. We need to at least do something > to remove/replace the assertion to make this happen. Can we? I > don't really know the answer yet. If this is undoable, we can skip > question 2/3 below and may need to rethink on how to solve the > problem that postcopy recovery encounters. > > 2. whether there is any monitor commands can run without BQL? > > This is basically what patch 3/5 does, one for QMP, one for HMP. > > If we can settle question 1, then we can possibly start consider > this question. This step does not really allow any command to run > without BQL, but we need to know whether it's possible in general, > and if possible, we provide a framework to allow QMP/HMP developers > to specify that. If you see patch 3/5, the default behavior is > still taking the BQL for all commands. > > IMHO doing this whole thing is generally good in the sense that > this is actually forcing ourselves to break the BQL into smaller > locks. Take the migration commands for example: migrate_incoming > do not need BQL, and when we write codes around it we know that we > don't need to think about thread-safety. That's not good IMHO. I > think it's time we should start consider thread-safety always. > Again, for migrate_incoming to do this, actually we'll possibly at > least need a migration management lock (the smaller lock) to make > sure e.g. the user is not running two migrate_incoming commands in > parallel (after per-monitor threads, it can happen). But it's > better than BQL, because BQL is for sure too big, so even a guest > page access (as long as it held the BQL) can block migration > commands.
Yes, this is my point. You cannot just declare a command "BQL-free" without adding small locks first, and I think this is actually missing in this series. As you said, two per-monitor threads can race if they do migrate_incoming in parallel. This is also the answer to 3. Fam > > 3. which monitor commands can be run without BQL? > > This is what patch 4/6 was doing. It tries to move > migrate_incoming command out as the first candidate BQL-free > command. > > Yes it's hard to say which command can be run without BQL. So we > need to investigate, possibly modify existing codes to make sure > it's thread-safe, prove validity, then we can add the new ones into > the BQL-free list. > > If after evaluating the pros and cons, we found that one command > can be put into BQL-free but not worth the time for working on it, > we can also keep those commands under BQL. > > I assume question 3 is the one you were asking, and I'd say we may > need to solve question 1/2 first. If we are done with 1/2, we just > need to spend time on each command to prove whether it is doable to > let that command run without BQL, and whether it worths itself to move > the command out of BQL. Then we decide. Thanks, > > -- > Peter Xu