On 11/12/18 1:25 AM, Max Reitz wrote:
> This is what I’ve taken from two or three BoF-like get-togethers on
> blocky things.  Amendments are more than welcome, of course.
>
>
> Permission system
> =================
>
> GRAPH_MOD
> ---------
>
> We need some way for the commit job to prevent graph changes on its
> chain while it is running.  Our current blocker doesn’t do the job,
> however.  What to do?
>
> - We have no idea how to make a *permission* work.  Maybe the biggest
>   problem is that it just doesn’t work as a permission, because the
>   commit job doesn’t own the BdrvChildren that would need to be
>   blocked (namely the @backing BdrvChild).
>
> - A property of BdrvChild that can be set by a non-parent seems more
>   feasible, e.g. a counter where changing the child is possible only
>   if the counter is 0.  This also actually makes sense in what it
>   means.
>   (We never quite knew what “taking the GRAPH_MOD permission” or
>   “unsharing the GRAPH_MOD permission” was supposed to mean.  Figuring
>   that out always took like half an hour in any face-to-face meeting,
>   and then we decided it was pretty much useless for any case we had
>   at hand.)
>
>
> Reopen
> ------
>
> How should permissions be handled while the reopen is under way?
> Maybe we should take the union of @perm before and after, and the
> intersection of @shared before and after?
>
> - Taking permissions is a transaction that can fail.  Reopen, too, is
>   a transaction, and we want to go from the intermediate to the final
>   permissions in reopen’s commit part, so that transition is not
>   allowed to fail.
>   Since with the above model we would only relax things during that
>   transition (relinquishing bits from @perm and adding bits to
>   @shared), this transition should in theory be possible without any
>   failure.  However, in practice things are different, as permission
>   changes on file-posix nodes imply lock changes on the filesystem --
>   which may always fail.  Arguably, failures from changing the
>   file-posix locks can be ignored, because they just mean that the
>   file claims more permissions to be taken and fewer to be shared
>   than is actually the case.  That means you may not be able to open
>   the file in some other application while you should be able to,
>   but that’s the benign kind of error: you won’t be able to access
>   data in a way you shouldn’t be able to.
> - Note that we already have this issue, and in general dropping
>   permissions sometimes aborts because code assumes that dropping
>   permissions is always safe and can never result in an error.  It
>   seems best to ignore such protocol-layer errors in the generic
>   block layer rather than handling this in every protocol driver
>   itself.
>   (The block layer should discard errors from dropping permissions
>   on the protocol layer.)
>
> - Is it possible that changing an option may require taking an
>   intermediate permission that is required neither before nor after
>   the reopen process?
>   Changing a child link comes to mind (like changing a child from one
>   BDS to another, where the visible data changes, which would mean we
>   may want to e.g. unshare CONSISTENT_READ during the reopen).
>   However:
>   1. It is unfeasible to unshare that for all child changes.
>      Effectively everything requires CONSISTENT_READ, and for good
>      reason.
>   2. Why would a user even change a BDS to something of different
>      content?
>   3. Anything that currently allows you to change a child node
>      assumes that the user always changes it to something of the same
>      content (some take extra care to verify this, like mirror, which
>      makes sure that @replaces and the target are connected, and that
>      there are only filter nodes in between).
>   Always using the same enforcing model as mirror does (no. 3 above)
>   does not really work, though, because one use case is to copy a
>   backing file offline to some different storage and then replace the
>   files via QMP.  To qemu, both files are completely unrelated.
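The union/intersection rule proposed above is easy to state precisely.
Here is a minimal sketch, using illustrative permission flag names
(the real flags in qemu are the BLK_PERM_* constants; this is not the
actual reopen code):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative permission bits, stand-ins for qemu's BLK_PERM_* flags. */
enum {
    PERM_CONSISTENT_READ = 1u << 0,
    PERM_WRITE           = 1u << 1,
    PERM_RESIZE          = 1u << 2,
};

/*
 * Intermediate permissions while a reopen is under way: the union of
 * @perm before and after (everything either configuration needs), and
 * the intersection of @shared before and after (only what both
 * configurations tolerate).  Committing the reopen then only relaxes
 * permissions, which in theory cannot fail.
 */
static void reopen_intermediate(uint64_t perm_old, uint64_t shared_old,
                                uint64_t perm_new, uint64_t shared_new,
                                uint64_t *perm_mid, uint64_t *shared_mid)
{
    *perm_mid = perm_old | perm_new;
    *shared_mid = shared_old & shared_new;
}
```

For example, going from a read-only configuration that shares WRITE to
a writing configuration that shares only CONSISTENT_READ, the
intermediate state takes both READ and WRITE while sharing only
CONSISTENT_READ -- the most restrictive state on both axes, so each
side of the transition only ever relaxes it.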
>
>
> Block jobs, including blockdev-copy
> ===================================
>
> Example for use of the fleecing filter:
> - The real target is on slow storage.  Put an overlay on fast storage
>   on top of it.  Then use that overlay as the target of the fleecing
>   filter (and commit the data later or on the side), so that the
>   backup job does not slow down the guest.
>
> For a unified copy job, having a backup/fleecing filter is not a
> problem on the way.  One thing we definitely have to and can do is to
> copy common functionality into a shared file so that the different
> jobs can at least share that.
>
> COR/Stream:
> - There should be a way to discard ranges that have been copied into
>   the overlay from the backing files, to save space.
> - Also, the COR filter should be integrated with the stream job (at
>   some point, as always).
>
> Hole punching with active commit:
> - Putting data into the target and punching holes in the overlays to
>   make it visible on the active disk may be reasonable for some, but
>   not for others -- it should be an option.  You want this if saving
>   space is important, but you may not want it if speed is more
>   important (that depends on your backing chain length and other
>   factors, but it’s your choice).
>
> - Another thing: If we don’t need to punch any holes because the
>   intermediate layers aren’t allocated anyway, we don’t need to write
>   the data into the active disk either.  This can probably be done
>   indiscriminately, because the check for this does not concern the
>   protocol layer but only qemu-controlled metadata, so it should be
>   deterministically fast (want_zero=false).
>
>
> qcow2
> =====
>
> Recovering corrupt images:
> - A salvaging qemu-img convert would help (one that doesn’t abort
>   everything on encountering a single I/O error).
> - We may want to add an in-sync L1 table copy to recover from the
>   worst kinds of corruption.  Checksumming would be a good idea
>   (then), too.
> - Should we update the checksum every time?  If it’s just the sum of
>   all L1 entry values, why not: the update is then trivial and does
>   not involve looking at any entries but the ones modified.
>
> Online check:
> - This would need to be a block job.
> - The check function would probably need to be a proper coroutine
>   (one that does not just lock everything).
> - It would be very complicated if you wanted it to work on R/W
>   images.  It’s probably best to focus on making this work for
>   read-only images, because you can always put a temporary snapshot
>   on top of the image for the duration of the check and then commit
>   it down once the check is done.
>
>
> Bitmaps
> =======
>
> (Got this section from sneaking into a BoF I wasn’t invited to.  Oh
> well.  It won’t hurt to include it here.)
>
> Currently, when dirty bitmaps are loaded, all IN_USE bitmaps are
> simply not loaded and completely ignored.  That isn’t correct,
> though: they should either still be loaded (and automatically treated
> and written back as fully dirty), or at least qemu-img check should
> “repair” them (i.e. fully dirty them).
>
> Sometimes qemu (running in a mode as bare as possible) is better than
> using qemu-img convert, for instance.  It gives you more control
> (through QMP; you get e.g. better progress reporting), you get all of
> the mirror optimizations (we do have optimizations for convert, too,
> but whether it’s any good to write the same (or different?)
> optimizations twice is another question), and you get a common
> interface for everything (online and offline).
> Note that besides a bare qemu we’ve also always wanted to convert as
> many qemu-img operations into frontends for block jobs as possible.
> We have only done this for commit, however, even though convert
> looked like basically the ideal target.  It was just too hard with
> too little apparent gain, as always (and convert supports additional
> features like concatenation, which we don’t have in the runtime block
> layer yet).
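Back to the additive L1 checksum idea from the qcow2 section above: a
plain sum is attractive precisely because it can be updated
incrementally when a single entry changes.  A minimal sketch, assuming
a hypothetical checksum over an in-memory qcow2 L1 table (qcow2 has no
such checksum field today):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical additive checksum over a qcow2 L1 table: the sum of
 * all entry values, modulo 2^64.  Not an actual qcow2 feature. */
static uint64_t l1_checksum(const uint64_t *l1, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += l1[i];
    }
    return sum;
}

/* Updating one entry needs only the old and new value, not a rescan:
 * subtract the old entry from the sum and add the new one. */
static uint64_t l1_checksum_update(uint64_t sum, uint64_t old_entry,
                                   uint64_t new_entry)
{
    return sum - old_entry + new_entry;
}
```

Unsigned wraparound makes the subtract-then-add update exact, so the
incremental result always matches a full recomputation; the price is
that a sum is a weak detector (e.g. two compensating errors cancel),
which is part of the trade-off the question above is getting at.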
>
>
> Someone (not that someone™, but actually some specific someone) is
> about to make qemu-img info display the list of persistent bitmaps.
> Potential reviewers should be aware that this should be done by
> adding that information to ImageInfoSpecificQCow2.

/me
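For illustration, one hypothetical shape such a QAPI addition could
take (the struct and member names below, other than
ImageInfoSpecificQCow2 itself, are guesses for the sketch, not the
final schema):

```
##
# Hypothetical sketch only -- not the actual block-core.json change.
##
{ 'struct': 'Qcow2BitmapInfo',
  'data': { 'name': 'str', 'granularity': 'uint32',
            'flags': ['str'] } }

{ 'struct': 'ImageInfoSpecificQCow2',
  'data': { ...existing members...,
            '*bitmaps': ['Qcow2BitmapInfo'] } }
```

Making the new member optional (the `*` prefix) keeps qemu-img info
output unchanged for images that carry no persistent bitmaps.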
Den