On Tue, Aug 22, 2017 at 12:15:19PM +0800, Fam Zheng wrote: > On Tue, 08/22 10:56, Peter Xu wrote: > > I haven't really encountered (c), but I think it's the migrate_cancel > > command that matters, which should not need BQL as well. > > There is bdrv_invalidate_cache_all() in migrate_cancel which clearly isn't > safe. > Is that if block unreachable in this case? If so we should assert, otherwise > this command is not okay to run without BQL.
Ah. I see. Even if so, if that is the only usage of BQL, IMHO we can still mark migrate_cancel as "without-bql=true", instead we take the BQL before calling bdrv_invalidate_cache_all(). Then migrate_cancel can be BQL-free at least when block migration is not active. > > Generically, what guarantee the thread-safety of a qmp command when you decide > BQL is not needed? In other words, how do you prove commands are safe without > BQL? I think almost every command accesses global state, but lock-free data > structures are rare AFAICT. I would suggest we split the problem into at least three parts. IMHO we need to answer below questions one by one to know what we should do next: 1. whether we can handle monitor commands outside iothread, or say, in an isolated thread? This is basically what patch 2 does, the "per-monitor threads". IMHO this is the very first question to ask. So now I know that at least current code cannot do it. We need to at least do something to remove/replace the assertion to make this happen. Can we? I don't really know the answer yet. If this is undoable, we can skip question 2/3 below and may need to rethink on how to solve the problem that postcopy recovery encounters. 2. whether there is any monitor commands can run without BQL? This is basically what patch 3/5 does, one for QMP, one for HMP. If we can settle question 1, then we can possibly start consider this question. This step does not really allow any command to run without BQL, but we need to know whether it's possible in general, and if possible, we provide a framework to allow QMP/HMP developers to specify that. If you see patch 3/5, the default behavior is still taking the BQL for all commands. IMHO doing this whole thing is generally good in the sense that this is actually forcing ourselves to break the BQL into smaller locks. Take the migration commands for example: migrate_incoming do not need BQL, and when we write codes around it we know that we don't need to think about thread-safety. That's not good IMHO. I think it's time we should start consider thread-safety always. Again, for migrate_incoming to do this, actually we'll possibly at least need a migration management lock (the smaller lock) to make sure e.g. the user is not running two migrate_incoming commands in parallel (after per-monitor threads, it can happen). But it's better than BQL, because BQL is for sure too big, so even a guest page access (as long as it held the BQL) can block migration commands. 3. which monitor commands can be run without BQL? This is what patch 4/6 was doing. It tries to move migrate_incoming command out as the first candidate BQL-free command. Yes it's hard to say which command can be run without BQL. So we need to investigate, possibly modify existing codes to make sure it's thread-safe, prove validity, then we can add the new ones into the BQL-free list. If after evaluating the pros and cons, we found that one command can be put into BQL-free but not worth the time for working on it, we can also keep those commands under BQL. I assume question 3 is the one you were asking, and I'd say we may need to solve question 1/2 first. If we are done with 1/2, we just need to spend time on each command to prove whether it is doable to let that command run without BQL, and whether it worths itself to move the command out of BQL. Then we decide. Thanks, -- Peter Xu