Le Monday 16 Sep 2013 à 15:41:45 (+0800), Fam Zheng a écrit : > On Sun, 09/15 20:10, Benoît Canet wrote: > > Le Friday 06 Sep 2013 à 11:55:38 (+0200), Kevin Wolf a écrit : > > > Am 06.09.2013 um 11:18 hat Fam Zheng geschrieben: > > > > On Fri, 09/06 10:45, Kevin Wolf wrote: > > > > > Am 06.09.2013 um 09:56 hat Fam Zheng geschrieben: > > > > > > Since BlockDriver.bdrv_snapshot_create() is an optional operation, > > > > > > blockdev.c > > > > > > can navigate down the tree from top node, until hitting some layer > > > > > > where the op > > > > > > is implemented (the QCow2 bs), so we get rid of this > > > > > > top_node_below_filter > > > > > > pointer. > > > > > > > > > > Is it even inherent to a block driver (like a filter), if a snapshot > > > > > is > > > > > to be taken at its level? Or is it rather a policy decision that > > > > > should > > > > > be made by the user? > > > > > > > > > OK, getting the point that user should have full flexibility and fine > > > > operation > > > > granularity. It also stands against > > > > block_backend->top_node_below_filter. Do we > > > > really have the assumption that all the filters are on top of the tree > > > > and linear? > > > > Shouldn't this be possible? > > > > > > > > Block Backend > > > > | > > > > | > > > > Quodrum BDS > > > > / | \ > > > > throttle filter | \ > > > > / | \ > > > > qcow2 qcow2 qcow2 > > > > > > > > So we throttle only a particular image, not the whole device. But this > > > > will > > > > make a top_node_below_filter pointer impossible. > > > > > > I was assuming that Benoît's model works for the special case of > > > snapshotting in one predefined way, but this is actually a very good > > > example of why it doesn't. > > > > > > The approach relies on snapshotting siblings together, and in this case > > > the siblings would be throttle/qcow2/qcow2, while throttle is still a > > > filter. This > > > would mean that either throttle needs to be top_node_below_filter and > > > throttling doesn't stay on top, or the left qcow2 is > > > top_node_below_filter and the other Quorum images aren't snapshotted. > > > > > > > > In our example, the quorum driver, it's not at all clear to me that > > > > > you > > > > > want to snapshot all children. In order to roll back to a previous > > > > > state, one snapshot is enough, you don't need multiple copies of the > > > > > same one. Perhaps you want two so that we can still compare them for > > > > > verification. Or all of them because you can afford the disk space and > > > > > want ultimate safety. I don't think qemu can know which one is true. > > > > > > > > > Only if quorum ever knows about and operates on snapshots, it should be > > > > considered specifically, but no. So we need to achieve this in the > > > > general > > > > design: allow user to take snapshot, or set throttle limits on > > > > particular > > > > BDSes, as above graph. > > > > > > > > > In the same way, in a typical case you may want to keep I/O throttling > > > > > for the whole drive, including the new snapshot. But what if the > > > > > throttling was used in order to not overload the network where the > > > > > image > > > > > is stored, and you're now doing a local snapshot, to which you want to > > > > > stream the image? The I/O throttling should apply only to the backing > > > > > file, not the new snapshot. > > > > > > > > > Yes, and OTOH, throttling really suits to be a filter only if it can be > > > > a non > > > > top one, otherwise it's no better than what we have now. > > > > > > Well, it would be a cleaner architecture in any case, but having it in > > > the middle of the stack feels useful indeed, so we should support it. > > > > > > > > So perhaps what we really need is a more flexible snapshot/BDS tree > > > > > manipulation command that describes in detail which structure you want > > > > > to have in the end. > > > > > > Designing the corresponding QMP command is the hard part, I guess. > > > > During my vacation I though about the fact that JSON is pretty good to > > build a > > tree. > > > > QMP, HMP and the command line could take a "block-tree" argument which would > > look like the following. > > > > block-tree = { 'quorum': [ > > { > > 'throttle' : { > > 'qcow2' : { 'filename': "img1.qcow2" } > > 'snapshotable': true, > > What's the 'snapshotable' for?
Kevin mentioned the fact that when taking a snapshot with QUORUM we may want to snapshot only a part of the QUORUM branches. snapshotable is a way to indicate to the QCOW2 driver that it must create a new snapshot when the snapshoting order cascade down the bs tree. Best regards Benoît > > > }, > > 'throttle-iops' : 150, > > 'throttle-iops-max' : 1000, > > }, > > { > > 'qcow2' : { 'filename': "img2.qcow2" }, > > 'snapshotable': true, > > }, > > { > > 'qcow2' : { 'filename': "img3.qcow2" } > > 'snapshotable': false, > > } > > ] }; > > > > It's not very clear to me. Does this mean a key associated with a dict value > means creating type? What do you put in the inner dict (i.e. why filename > here) > and what to put outter besides the key (i.e. snapshotable)? Where to put 'id'? > > I think JSON is flexible enough to specify anything we can take, but the > format > needs to be designed carefully. And do we really want to use JSON in the > command line options? Very hard to imagine that. :) > > Thanks, > > Fam > > > This would be passed to QEMU in a compact form without carriage return and > > spaces. > > > > The block layer would convert this to C structs like the QMP code would do > > for a > > QMP command and the bs tree would be recursively build from top to bottom by > > the Block Backend and each Block driver in the path using the C structs. > > > > Each level would instanciate the lower level until a raw or protocol driver > > is > > reached. > > > > What about this ? > > > > Best regards > > > > Benoît > >