Re: [Qemu-devel] Minutes from the "Stuttgart block Gipfele"
On Mon, Jan 11, 2016 at 04:10:12PM +0100, Kevin Wolf wrote:
> Am 23.12.2015 um 09:33 hat Stefan Hajnoczi geschrieben:
> > On Fri, Dec 18, 2015 at 02:15:38PM +0100, Markus Armbruster wrote:
> > Another problem is that the backup block job and other operations that
> > require a single command today could require sequences of low-level
> > setup commands to create filter nodes. The QMP client would need to
> > first create a write notifier filter and then start the backup block job
> > with the write notifier node name. It's clumsy.
>
> I don't think splitting it up into several low-level commands is
> necessary. We don't expect the user to set any options for the filter
> (and if we did, they would probably have to match the same options for
> the block job), so we can keep the BDS creation as a part of starting
> the block job.
>
> The important part is that the management tool knows that a filter is
> going to be inserted and how to address it. In order to achieve that, we
> could simply add a 'filter-node-name' option to the QMP command starting
> the job.

That sounds like a good approach.

Stefan
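On the wire, the 'filter-node-name' approach agreed on above might look
like the following sketch. This is hypothetical: the option is only
proposed in this thread, and the exact command shape is an assumption.

    # A single QMP command starts the job and, per Kevin's proposal,
    # names the implicitly created filter node so the management tool
    # can address it later.  'filter-node-name' is the option proposed
    # in this thread, not an existing interface.
    import json

    start_backup = {
        "execute": "blockdev-backup",
        "arguments": {
            "device": "drive0",
            "target": "target0",
            "sync": "full",
            "filter-node-name": "backup-filter0",
        },
    }
    print(json.dumps(start_backup))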
Re: [Qemu-devel] Minutes from the "Stuttgart block Gipfele"
Am 23.12.2015 um 09:33 hat Stefan Hajnoczi geschrieben:
> On Fri, Dec 18, 2015 at 02:15:38PM +0100, Markus Armbruster wrote:
> > What should happen when the user asks for a mutation at a place where we
> > have implicit filter(s)?
>
> Please suspend your disbelief for a second:
>
> In principle it's simplest not to have implicit filters. The client
> needs to set up throttling nodes or the backup filter explicitly.
>
> Okay, now it's time to tear this apart:
>
> For backwards compatibility it's necessary to support throttling,
> copy-on-read, backup notifier, etc. It may be possible to tag implicit
> filter nodes so that mutation operations that wouldn't be possible today
> are rejected. The client must use the explicit syntax to do mutations
> on implicit filters. This is easier said than done; I'm not sure it can
> be implemented cleanly.

Yes, backwards compatibility is what complicates things a bit.

> Another problem is that the backup block job and other operations that
> require a single command today could require sequences of low-level
> setup commands to create filter nodes. The QMP client would need to
> first create a write notifier filter and then start the backup block job
> with the write notifier node name. It's clumsy.

I don't think splitting it up into several low-level commands is
necessary. We don't expect the user to set any options for the filter
(and if we did, they would probably have to match the same options for
the block job), so we can keep the BDS creation as a part of starting
the block job.

The important part is that the management tool knows that a filter is
going to be inserted and how to address it. In order to achieve that, we
could simply add a 'filter-node-name' option to the QMP command starting
the job.

Kevin
Re: [Qemu-devel] Minutes from the "Stuttgart block Gipfele"
On Mon, 01/04 13:16, Stefan Hajnoczi wrote:
> On Wed, Dec 23, 2015 at 06:15:20PM +0800, Fam Zheng wrote:
> > On Fri, 12/18 14:15, Markus Armbruster wrote:
> > In that theory, all other block job types, mirror/stream/commit, fit into a
> > "pull" model, which follows a specified dirty bitmap and copies data from a
> > specified src BDS. In this pull model,
> >
> > mirror (device=d0 target=d1) becomes a pull filter:
> >
> >     BB[d0]          BB[d1]
> >     |               |
> >     throttle        [pull,src=d0]
> >     |               |
> >     detect-zero     detect-zero
> >     |               |
> >     copy-on-read    copy-on-read
> >     |               |
> >     BDS             BDS
> >
> > Note: the pull filter reuses most of the block/mirror.c code, except that
> > s->dirty_bitmap will be initialized depending on the block job type. In the
> > case of mirror, it is trivially the same as now.
>
> I don't understand the pull filter. Is there also a mirror block job
> coroutine?
>
> Does anything perform I/O to BB[d1]?

Yes, the filter will have a mirror block job coroutine, and it writes to
the BDS behind BB[d1]. This is conceptually different from the "block
jobs have their own BBs" design.

> If nothing is writing to/reading from BB[d1], then I don't understand
> the purpose of the pull filter.
>
> Stefan
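As a toy model of that answer (plain Python, no QEMU APIs, all names
invented): a filter object that hosts its own long-running copy loop and
is the only thing writing to the target, so the I/O to the target comes
from the filter itself rather than from a separate job with its own BB.

    import queue
    import threading

    class PullFilter:
        """Toy pull filter: hosts its own copy loop (standing in for the
        mirror block job coroutine) and is the sole writer to the target."""

        def __init__(self, source, target):
            self.source = source           # toy BDS: offset -> data
            self.target = target
            self.dirty = queue.Queue()     # stand-in for s->dirty_bitmap
            for off in source:             # mirror: everything starts dirty
                self.dirty.put(off)
            threading.Thread(target=self._copy_loop, daemon=True).start()

        def write(self, off, data):
            """Guest write path: update the source, mark the region dirty."""
            self.source[off] = data
            self.dirty.put(off)

        def _copy_loop(self):
            """Background copy: drain dirty regions into the target."""
            while True:
                off = self.dirty.get()
                self.target[off] = self.source[off]
                self.dirty.task_done()

    src, dst = {0: b"a", 1: b"b"}, {}
    f = PullFilter(src, dst)
    f.write(1, b"c")        # guest write while the "job" runs
    f.dirty.join()          # wait for convergence
    assert dst == {0: b"a", 1: b"c"}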
Re: [Qemu-devel] Minutes from the "Stuttgart block Gipfele"
On Wed, Dec 23, 2015 at 06:15:20PM +0800, Fam Zheng wrote:
> On Fri, 12/18 14:15, Markus Armbruster wrote:
> In that theory, all other block job types, mirror/stream/commit, fit into a
> "pull" model, which follows a specified dirty bitmap and copies data from a
> specified src BDS. In this pull model,
>
> mirror (device=d0 target=d1) becomes a pull filter:
>
>     BB[d0]          BB[d1]
>     |               |
>     throttle        [pull,src=d0]
>     |               |
>     detect-zero     detect-zero
>     |               |
>     copy-on-read    copy-on-read
>     |               |
>     BDS             BDS
>
> Note: the pull filter reuses most of the block/mirror.c code, except that
> s->dirty_bitmap will be initialized depending on the block job type. In the
> case of mirror, it is trivially the same as now.

I don't understand the pull filter. Is there also a mirror block job
coroutine?

Does anything perform I/O to BB[d1]?

If nothing is writing to/reading from BB[d1], then I don't understand
the purpose of the pull filter.

Stefan
Re: [Qemu-devel] Minutes from the "Stuttgart block Gipfele"
On Fri, 12/18 14:15, Markus Armbruster wrote:
> First, let's examine what such a chain could look like. If we read the
> current code correctly, it behaves as if we had a chain
>
>     BB
>     |
>     throttle
>     |
>     detect-zero
>     |
>     copy-on-read
>     |
>     BDS
>
> Except for the backup job, which behaves as if we had
>
>        backup job
>       /
>     notifier
>     |
>     detect-zero
>     |
>     BDS

Just to brainstorm block jobs in the dynamically reconfigured node graph
(not sure if this is useful):

Nothing stops us from viewing backup as a self-contained filter,

    [backup]
    |
    detect-zero
    |
    BDS

where its .bdrv_co_writev copies out the old data, and at instantiation
time it also creates a long-running coroutine (backup_run).

In that theory, all other block job types, mirror/stream/commit, fit into
a "pull" model, which follows a specified dirty bitmap and copies data
from a specified src BDS. In this pull model,

mirror (device=d0 target=d1) becomes a pull filter:

    BB[d0]          BB[d1]
    |               |
    throttle        [pull,src=d0]
    |               |
    detect-zero     detect-zero
    |               |
    copy-on-read    copy-on-read
    |               |
    BDS             BDS

Note: the pull filter reuses most of the block/mirror.c code, except that
s->dirty_bitmap will be initialized depending on the block job type. In
the case of mirror, it is trivially the same as now.

stream (device=d0 base=base0) becomes a pull filter:

    BB[d0]
    |
    [pull,src=base0]
    |
    detect-zero
    |
    copy-on-read
    |
    BDS
    |
    BDS[base0]

Note: s->dirty_bitmap will be initialized with the blocks which should be
copied by block-stream.

Similarly, active commit (device=d0 base=base0) becomes a pull filter:

    BB[d0]
    |
    detect-zero
    |
    copy-on-read
    |
    BDS
    |
    [pull,src=d0]
    |
    BDS[base0]

and commit (device=d0 top=base1 base=base0) becomes a pull filter:

    BB[d0]
    |
    detect-zero
    |
    copy-on-read
    |
    BDS
    |
    BDS[base1]
    |
    [pull,src=base1]
    |
    BDS[base0]

If this could work, I'm looking forward to a pretty diffstat if we can
unify the coroutine code of all four jobs. :)

Fam
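Restating the four graphs as data, assuming a hypothetical 'pull'
blockdev driver (no such driver or options exist in QEMU; the option
names below are invented for illustration): each job reduces to inserting
the same filter at a different place, with a different source node and a
different dirty bitmap initialization, which is the unification argument.

    # Purely hypothetical: every driver and option name here is invented.
    PULL_CONFIGS = {
        # mirror (device=d0 target=d1): filter tops d1's chain
        "mirror":        {"src": "d0",    "insert-at": "top-of-d1",
                          "bitmap-init": "as-mirror-today"},
        # stream (device=d0 base=base0): filter tops d0's chain
        "stream":        {"src": "base0", "insert-at": "top-of-d0",
                          "bitmap-init": "blocks-to-stream"},
        # active commit (device=d0 base=base0): filter above base0
        "active-commit": {"src": "d0",    "insert-at": "above-base0",
                          "bitmap-init": "allocated-in-top"},
        # commit (device=d0 top=base1 base=base0): filter above base0
        "commit":        {"src": "base1", "insert-at": "above-base0",
                          "bitmap-init": "allocated-in-top"},
    }

    def pull_add_command(job, node_name="pull0"):
        """Build a (hypothetical) blockdev-add for one job type."""
        return {"execute": "blockdev-add",
                "arguments": {"driver": "pull", "node-name": node_name,
                              **PULL_CONFIGS[job]}}

    for job in PULL_CONFIGS:
        print(job, "->", pull_add_command(job))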
Re: [Qemu-devel] Minutes from the "Stuttgart block Gipfele"
On Fri, Dec 18, 2015 at 02:15:38PM +0100, Markus Armbruster wrote:
> What should happen when the user asks for a mutation at a place where we
> have implicit filter(s)?

Please suspend your disbelief for a second:

In principle it's simplest not to have implicit filters. The client
needs to set up throttling nodes or the backup filter explicitly.

Okay, now it's time to tear this apart:

For backwards compatibility it's necessary to support throttling,
copy-on-read, backup notifier, etc. It may be possible to tag implicit
filter nodes so that mutation operations that wouldn't be possible today
are rejected. The client must use the explicit syntax to do mutations
on implicit filters. This is easier said than done; I'm not sure it can
be implemented cleanly.

Another problem is that the backup block job and other operations that
require a single command today could require sequences of low-level
setup commands to create filter nodes. The QMP client would need to
first create a write notifier filter and then start the backup block job
with the write notifier node name. It's clumsy.

Stefan
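To illustrate the clumsiness: a fully explicit setup might require a
sequence like the sketch below. None of these exact commands or options
exist; the 'write-notifier' driver and 'notifier-node' argument are
invented here to show the shape of the multi-command approach.

    import json

    # Hypothetical wire-level sketch of the explicit, low-level approach.
    explicit_backup = [
        # Step 1: create the filter node by hand...
        {"execute": "blockdev-add",
         "arguments": {"driver": "write-notifier",  # invented driver name
                       "node-name": "backup-notifier0",
                       "file": "drive0-bds"}},
        # Step 2: ...then start the job, pointing it at the filter.
        {"execute": "blockdev-backup",
         "arguments": {"device": "drive0",
                       "target": "target0",
                       "sync": "full",
                       "notifier-node": "backup-notifier0"}},  # invented
    ]
    for cmd in explicit_backup:
        print(json.dumps(cmd))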
[Qemu-devel] Minutes from the "Stuttgart block Gipfele"
Kevin, Max and I used an opportunity to meet and discuss block layer
matters. We examined two topics in some depth: BlockBackend, and block
filters and dynamic reconfiguration.

Not nearly enough people to call it a block summit. But the local
dialect is known for its use of diminutives, and "Gipfele" is the
diminutive of "summit" :)

= BlockBackend =

Background: BlockBackend (BB) was split off BlockDriverState (BDS) to
separate the block layer's external interface (BB) from its internal
building block (BDS). Block layer clients such as device models and the
NBD server attach to a BB by BB name. A BB has zero or one BDS (zero
means no medium).

Multiple device models using the same BB is dangerous, so we allow
attaching only one. We don't currently enforce an "only one" restriction
for other clients. This is problematic, because

* Different clients may want to configure the BB in conflicting ways,
  e.g. writeback caching mode (still to be moved from the BDS's
  enable_write_cache to the BB).

* When the BDS graph gets dynamically reconfigured, say when a block
  filter gets spliced in, clients that started out in the same spot may
  need to move differently.

Instead, each client should connect to its own BB.

This leads to the next question: how should this BB be created?

Initially, what is now the BB was mashed into the BDS. In a way, the BB
got created along with the BDS. The current code lets you create a BB
along with a BDS when you need one, or create a new BB for an existing
BDS. The BB has a name, and the BDS may have a node-name.

The obvious low-level building blocks would be "create BB", "connect BB
to a BDS" (we have that as x-blockdev-insert-medium), "disconnect BB from
a BDS" (x-blockdev-remove-medium) and "destroy BB" (x-blockdev-del); see
the sketch at the end of this section. Management applications probably
don't mind having to work at this low level, but for human users, it's
cumbersome. Perhaps the BB should be created along with the client, at
least optionally.

Means to create BBs separately are mostly useful when the BB needs to be
configured by the user: instead of duplicating the BB configuration
within each client, we keep it neatly separate. We're not aware of
user-configurable knobs, though.

Currently, a client is configured to attach to a BB by specifying a BB
name. For instance, a device model has a "drive" property that names a
BB. If we create the BB automatically, we need client configuration to
name a BDS instead, i.e. we need a node-name instead of a BB name.

Of course, we'll have to keep the legacy configuration working. The
"drive" property will have to refer to a BDS, like it did before BBs
were invented. We could:

* Move the BB name back into the BDS.

* Move the BB name into DriveInfo, where the other legacy stuff lives.
  DriveInfo needs to be changed to hang off BDS rather than BB.

Regardless, dynamic reconfiguration may have to move the name / the
DriveInfo to a different BDS.

We're not entirely sure whether automatic creation of the BB is
worthwhile.

Next steps:

* Support multiple BBs sharing the same BDS.

* Restrict BB to only one client of any kind instead of special-casing
  device models.

* Block jobs should go through BB.

* Investigate automatic creation of BB.
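As a sketch, the low-level sequence described above might look like this
on the wire. The "create BB" command is invented, since no dedicated
command for it exists yet, and the argument names of the experimental
x-blockdev-* commands are assumptions.

    import json

    # Hypothetical reconfiguration using the low-level building blocks:
    # create a BB, connect it to an existing BDS, disconnect it again,
    # and finally destroy it.
    low_level_sequence = [
        {"execute": "blockdev-create-bb",         # invented command name
         "arguments": {"name": "bb0"}},
        {"execute": "x-blockdev-insert-medium",   # connect BB to a BDS
         "arguments": {"device": "bb0", "node-name": "node0"}},
        {"execute": "x-blockdev-remove-medium",   # disconnect BB from BDS
         "arguments": {"device": "bb0"}},
        {"execute": "x-blockdev-del",             # destroy the BB
         "arguments": {"id": "bb0"}},
    ]
    for cmd in low_level_sequence:
        print(json.dumps(cmd))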
= Block filters =

We already have a few block filters:

* blkdebug, blkverify, quorum

Encryption should become another one.

Moreover, we have a few things mashed into BDS that should be filters:

* throttle (only at a root, i.e. right below a BB), copy-on-read,
  notifier (for backup block job), detect-zero

Dynamic reconfiguration means altering the BDS graph while it's in use.
Existing mutators:

* snapshot, mirror-complete, commit-complete, x-blockdev-change

Things become interesting when nodes get implicitly inserted into the
graph, e.g.:

* A backup job inserts its notifier filter

* We create an implicit throttle filter to implement legacy throttling
  configuration

And so forth. Nothing of the sort exists just yet.

What should happen when the user asks for a mutation at a place where we
have implicit filter(s)?

First, let's examine what such a chain could look like. If we read the
current code correctly, it behaves as if we had a chain

    BB
    |
    throttle
    |
    detect-zero
    |
    copy-on-read
    |
    BDS

Except for the backup job, which behaves as if we had

       backup job
      /
    notifier
    |
    detect-zero
    |
    BDS

We believe that the following cleaned up filter stack should work:

    BB
    |
    throttle      \
    |              \
    copy-on-read    ) fixed at creation time
    |              /
    detect-zero   /
    |
    |  backup job
    | /
    notifier        ) dynamically inserted by the job
    |
    BDS

Clients (device model, NBD server) connect through a BB on top.

Snapshot cuts in between the BDS and its implicit filters, like this:

    BB