On 06/19/2014 10:26 AM, Jeff Cody wrote: > Having said that, to be fair, the new QAPI command change-backing-file > does propagate this top-layer in-use flag semantic, but I would prefer > that patch to be dropped rather than not committing this series.
Libvirt would prefer to have change-backing-file in the same release that adds the optional backing-file parameters, as a witness that backing-file is under libvirt's control (even if libvirt does not CALL change-backing-file, the command needs to exist). One possibility for 2.1: create the command (so it shows up in 'query-commands' and satisfies libvirt probing) but make it fail unless the node being changed is the active node (that is, NO way to rewrite the metadata of a backing file). Then, in 2.2, when we figure out op-blockers for child nodes, we can enable the full power of change-backing-file to work on any node, not just the active device node. >> My questions are: >> a. How do we fix resize, snapshot-sync, etc? It seems like we need to >> propagate child op blockers. >> >> b. Is it a good idea to perform op blocker checks on the root node? >> It's inconsistent with resize, snapshot-sync, etc. Permissions in >> BDS graphs with multiple root nodes (e.g. guest device and NBD >> run-time server) will be different depending on which root you >> specify. > > I don't think (b) is the ultimate solution. It is used as a stop-gap > because op blockers in the current implementation is essentially > analogous to the in-use flag. But is it good enough for 2.1? If > *everything* checks the topmost node in 2.1, then I think we are OK in > all cases except where images files share a common BDS. > > The ability for internal BDSs to share a common base BDS makes some > block jobs unsafe currently, I believe. A crude and ugly fix is to > only allow a single block-job to occur at any given time, but that > doesn't seem feasible, so let's ignore that. Can we even create a common base BDS in qemu 2.1? I know that the blockdev-add command was designed with this possibility in mind, but to my knowledge, we have not actually wired it up yet. If 2.1 still creates a distinct BDS for every element of every chain, even if some of those distinct BDS are visiting the same file, then blocking jobs based on device is sufficient. Then, in 2.2 when we finally figure out how to do op-blockers on child nodes, we can also figure out sharing nodes. Of course, until we have shared nodes, management apps must realize that requesting an operation on a node that happens to have a duplicate BDS visiting the same file from another device will NOT be flagged by qemu as dangerous. That's just a quality-of-implementation issue, though - just because qemu can't flag ALL stupid actions of management doesn't mean qemu is buggy. Or put another way, if I have: disk1: base <- img1 disk2: base <- img2 if qemu allows shared BDS, and I am able to open 'base' as a shared BDS for both disks, then an attempt to block-commit img1 into base will be blocked because base is still in use by img2. But as a management app, I shouldn't be attempting that in the first place. Yes, it would be nice if qemu blocks me from being stupid; but even if qemu allows shared BDS, if I don't open a shared BDS on 'base' in the first place, it is no different than if qemu doesn't allow shared BDS. Either way, it's not qemu's fault if I, as management, try to commit into a file that is in use by more than one chain. if I am not able to do a shared BDS, I'm still and as the management, I did not request that qemu open 'base' as a shared BDS between both > > Perhaps, for 2.1, provide an overlay pointer list inside each BDS > (some of my earlier patches in this series had a single overlay, but > that is not enough). We could then apply op blockers to the topmost > nodes for any affected BDS image in a chain, by navigating upwards. > Not sure how complex this would be in practice, though. > > We could also apply child blockers to all nodes in all directions in a > graph, if we don't want to rely on the topmost image as a blocker > proxy for the whole drive. > >> >> c. We're painting ourselves into a corner by using the root node for op >> blocker checks. We'll have to apply the same op blockers to all >> nodes in a graph. There's no opportunity to apply different op >> blockers to a subset of the child nodes. I *think* this can be >> changed later without affecting the QMP API, so it's not a critical >> issue. > > We've already painted ourselves in that corner, alas. > > I agree that from a QAPI perspective, the change is not critical: > once op blockers are correctly applied to all child nodes, any API > change (e.g. commit or stream) would likely be optional only (such as, > making 'device' optional instead of mandatory), and thus discoverable. Well, discoverable if we ever get QAPI introspection added, or if we can rely on the same hacks as we are already planning to use with optional-'top' of getting a distinct GenericError vs. DeviceNotFound class when abusing the QMP call with a bogus device name during probing time. > >> >> The answer seems to be that op blockers should be propagated to all >> child nodes and commands should test the node, not the drive/root node. >> That gives us the flexibility for per-node op blockers in the future and >> maintains compatibility with existing node-name users. >> > > Long term thoughts: > > So if I think of operations that are done on block devices from a > block job, and chuck them into categories, I think we have: > > 1) Read of guest-visible data > 2) Write of guest-visible data > 3) Read of host-visible data (e.g. image file metadata) > 4) Write of host-visible data (e.g. image file metadata, such as > the backing-file) > 5) Block chain manipulations (e.g. movement of a BDS, change to r/w > instead of r/o, etc..) > 6) I/O attribute changes (e.g. throttling, etc..) > > Does this make sense, and did I miss any (likely so)? It seems like > we should issue blockers not based on specific commands (e.g. > BLOCK_OP_TYPE_COMMIT), but rather based on what specific category of > interaction on a BDS we want to prohibit / allow. Yes, I like that train of thought - it's not the operation that we are blocking, but the access category(ies) required by that operation. > > I don't think specific command blockers provide enough granularity, > and doesn't necessarily scale well as new commands are added. It > forces a new block job author to go through the specific > implementation of the other block job commands, and interpret what > operations to prohibit based on what other jobs do. Whereas if each > command issues blockers based on the operation category, it takes care > of itself, and I just issue blockers based on my block job behavior. > > Each command would then issue appropriate operational blockers to each > BDS affected by an operation. For instance, at first blush, I think > block-commit would want (at the very least) to block (and check) the > following, in this example chain: > > > [base] <-- [int1] <-- [int2] <-- [int3] <-- [top] <-- [overlay] > > becomes: > > [base] <-- [overlay] > > > Blocked operations per image: > > * 'base' image > > Operation | Blocked > ----------------------------- > GUEST READ | Y > GUEST WRITE | Y > HOST READ | > HOST WRITE | > CHAIN | Y > I/O ATTRIBUTE | Makes sense: blocking guest read because we are actively modifying the contents; if any other operation branches off of base, they will see data that is not consistent with what the guest sees through overlay. blocking guest write because it is a backing chain, and overlay still depends on base image for sectors that have not yet been commited.. blocking chain manipulations because we are going to do a chain manipulation once our command completes. Maybe you also want to block HOST_WRITE (our chain manipulation is done by writing host metadata; so if someone else writes that under our feet, we may have races on what gets written). > > > * Intermediate images between 'base' up to and including 'top': > > Operation | Blocked > ----------------------------- > GUEST READ | > GUEST WRITE | Y > HOST READ | > HOST WRITE | Y > CHAIN | Y > I/O ATTRIBUTE | > This needs to also block GUEST_READ - as soon as the commit starts modifying base, ALL intermediate images are invalid (they no longer represent a state of memory as was seen by the guest). Once you start a commit, you MUST NOT allow a fleecing operation on any of the intermediate operations. > > * The overlay of 'top', if it exists: > > Operation | Blocked > ----------------------------- > GUEST READ | > GUEST WRITE | > HOST READ | > HOST WRITE | Y > CHAIN | Y > I/O ATTRIBUTE | > > > -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
signature.asc
Description: OpenPGP digital signature