On Tue, 18 Jul 2017 15:42:08 +0200 Maxime Coquelin <maxime.coque...@redhat.com> wrote:
> This is a revival of a thread I initiated earlier this year [0], that
> I had to postpone due to other priorities.
>
> First, I'd like to thank the reviewers of my first proposal; this new
> version tries to address the comments made:
> 1. It is Nova's role, not Libvirt's, to query the hosts' supported
> compatibility modes and to select one, since Nova adds the vhost-user
> ports and has visibility on other hosts. Hence I removed the libvirt ML
> and added the OpenStack one to the recipient list.
> 2. By default, the compatibility version selected is the most recent
> one, unless the admin selects an older compat version.
>
> The goal of this thread is to draft a solution based on the outcomes
> of discussions with contributors of the different parties (DPDK/OVS
> /Nova/...).
>
> I'm really interested in feedback from OVS & Nova contributors,
> as my experience with these projects is rather limited.
>
> Problem statement:
> ==================
>
> When migrating a VM from one host to another, the interfaces exposed by
> QEMU must stay unchanged in order to guarantee a successful migration.
> In the case of the vhost-user interface, parameters like the supported
> Virtio feature set, max number of queues, max vring sizes, etc. must
> remain compatible. Indeed, since the frontend is not re-initialized, no
> re-negotiation happens at migration time.
>
> For example, we have a VM that runs on host A, whose vhost-user
> backend advertises the VIRTIO_F_RING_INDIRECT_DESC feature. Since the
> guest also supports this feature, it is successfully negotiated, and the
> guest transmits packets using indirect descriptor tables, which the
> backend knows how to handle.
>
> At some point, the VM is migrated to host B, which runs an older
> version of the backend that does not support the
> VIRTIO_F_RING_INDIRECT_DESC feature.
> The migration would break, because the guest still has the
> VIRTIO_F_RING_INDIRECT_DESC bit set, and the virtqueue contains some
> descriptors pointing to indirect tables that backend B doesn't know how
> to handle.
> This is just one example of Virtio feature compatibility, but other
> backend implementation details could cause other failures (e.g.
> configurable queue sizes).
>
> What we need is to be able to query the destination host's backend to
> ensure migration is possible before it is initiated.
>
> The proposal below has been drafted based on how QEMU manages machine
> types:
>
> Proposal
> ========
>
> The idea is to have a table of supported version strings in OVS, each
> associated with key/value pairs. Nova or any other management tool could
> query OVS for the list of supported version strings of each host.
> By default, the latest compatibility version will be selected, but the
> admin can manually select an older compatibility mode in order to ensure
> successful migration to an older destination host.
>
> Then, Nova would add OVS's vhost-user port, passing the selected
> version (compatibility mode) as an extra parameter.
>
> Before starting the VM migration, Nova will ensure both source and
> destination hosts' vhost-user interfaces run in the same compatibility
> mode, and will prevent the migration if this is not the case.
>
> For example, host A runs OVS-2.7, and host B OVS-2.6.
> Host A's OVS-2.7 has an OVS-2.6 compatibility mode (e.g. with indirect
> descriptors disabled), which should be selected at vhost-user port add
> time to ensure migration to host B will succeed.
>
> The advantage of doing so is that Nova does not need any update if new
> keys are introduced (i.e. it does not need to know how the new keys have
> to be handled); all these checks remain in OVS's vhost-user
> implementation.
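The management-side logic of this proposal boils down to comparing lists of version strings. A minimal sketch, assuming each host reports its compatibility version strings ordered newest first (as the vu_compat table in this thread is); host_supports_mode(), pick_common_mode() and migration_allowed() are hypothetical helpers, not existing Nova or OVS APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does a host's list of compat version strings
 * contain `mode`? */
static bool host_supports_mode(const char *const modes[], size_t n_modes,
                               const char *mode)
{
    assert(mode != NULL);
    for (size_t i = 0; i < n_modes; i++) {
        if (strcmp(modes[i], mode) == 0) {
            return true;
        }
    }
    return false;
}

/* Hypothetical helper: newest compat mode supported by both hosts.
 * Assumes both lists are ordered newest first. Returns NULL if the hosts
 * share no mode at all. */
static const char *pick_common_mode(const char *const src[], size_t n_src,
                                    const char *const dst[], size_t n_dst)
{
    for (size_t i = 0; i < n_src; i++) {
        if (host_supports_mode(dst, n_dst, src[i])) {
            return src[i];
        }
    }
    return NULL;
}

/* Hypothetical pre-migration check: the destination must offer the exact
 * mode the source port was added with. */
static bool migration_allowed(const char *src_port_mode,
                              const char *const dst[], size_t n_dst)
{
    return src_port_mode != NULL
           && host_supports_mode(dst, n_dst, src_port_mode);
}
```

With host A offering "upstream.ovs-dpdk.v2.7" and "upstream.ovs-dpdk.v2.6" and host B only "upstream.ovs-dpdk.v2.6", pick_common_mode() returns the v2.6 string, matching the host A/B example. Picking the newest common mode up front is one possible policy; with the proposal's default (most recent mode on A), migration_allowed() would instead refuse the A-to-B move.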
>
> Ideally, we would support a per-vhost-user-interface compatibility mode,
> which may also have an impact on the DPDK API, as the Virtio feature
> update API is global, and not per port.

It sounds like this covers the OVS side, but not the QEMU side. What if
OVS is ahead of QEMU? We might select OVS-3.0, which supports features
that the installed QEMU doesn't support.

>
> - Implementation:
> -----------------
>
> The goal here is just to illustrate this proposal; I'm sure you will
> have good suggestions to improve it.
> In the OVS vhost-user library, we would introduce a new structure, for
> example (neither compiled nor tested):
>
> #include <stdint.h>
>
> struct vhostuser_compat {
>     const char *version;
>     uint64_t virtio_features;
>     uint32_t max_rx_queue_sz;
>     uint32_t max_nr_queues;
> };
>
> The *version* field is the compatibility version string. It could be
> something like "upstream.ovs-dpdk.v2.6". If, for example, Fedora adds
> some more patches to its package that would break migration to the
> upstream version, it could have a dedicated compatibility string:
> "fc26.ovs-dpdk.v2.6". If OVS-v2.7 does not break compatibility with the
> previous OVS-v2.6 version, then there is no need to create a new entry;
> just keep the v2.6 one.
>
> The *virtio_features* field is the Virtio feature set for a given
> compatibility version. When an OVS tag is created, it would be
> associated with a DPDK version. The Virtio features for this version
> would be stored in this field. This would allow upgrading the DPDK
> package, for example from v16.07 to v16.11, without breaking migration.
> If the distribution wants to benefit from the latest Virtio features,
> it would have to create a new entry to ensure migration won't be broken.
>
> The *max_rx_queue_sz* and *max_nr_queues* fields are just here as
> examples; I don't think they are needed today. I just want to
> illustrate that we have to anticipate parameters other than the Virtio
> feature set, even if they are not necessary at the moment.
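The role of *virtio_features* can be made concrete with a small lookup-and-mask sketch. It is trimmed to the version and virtio_features fields and assumes a table ordered newest first, with a NULL version meaning "use the latest", as described in this thread. compat_lookup(), compat_port_features(), VU_BIT() and both feature masks are hypothetical and illustrative only; the bit numbers 15, 28, 29 and 32 are those of VIRTIO_NET_F_MRG_RXBUF, VIRTIO_F_RING_INDIRECT_DESC, VIRTIO_F_RING_EVENT_IDX and VIRTIO_F_VERSION_1 in the virtio 1.0 specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Virtio feature bit numbers (virtio 1.0 specification). */
#define VIRTIO_NET_F_MRG_RXBUF       15
#define VIRTIO_F_RING_INDIRECT_DESC  28
#define VIRTIO_F_RING_EVENT_IDX      29
#define VIRTIO_F_VERSION_1           32

#define VU_BIT(n) (UINT64_C(1) << (n))

/* Trimmed version of the proposed structure. */
struct vhostuser_compat {
    const char *version;
    uint64_t virtio_features;
};

/* Hypothetical table, newest first. The masks are illustrative only:
 * the v2.6 mode simply leaves out indirect descriptors. */
static const struct vhostuser_compat vu_compat[] = {
    { "upstream.ovs-dpdk.v2.7",
      VU_BIT(VIRTIO_NET_F_MRG_RXBUF) | VU_BIT(VIRTIO_F_RING_INDIRECT_DESC)
      | VU_BIT(VIRTIO_F_RING_EVENT_IDX) | VU_BIT(VIRTIO_F_VERSION_1) },
    { "upstream.ovs-dpdk.v2.6",
      VU_BIT(VIRTIO_NET_F_MRG_RXBUF) | VU_BIT(VIRTIO_F_RING_EVENT_IDX)
      | VU_BIT(VIRTIO_F_VERSION_1) },
};

/* Find a compat entry by version string; NULL selects the most recent
 * entry (index 0). Returns NULL for an unknown version. */
static const struct vhostuser_compat *compat_lookup(const char *version)
{
    if (version == NULL) {
        return &vu_compat[0];
    }
    for (size_t i = 0; i < sizeof vu_compat / sizeof vu_compat[0]; i++) {
        if (strcmp(vu_compat[i].version, version) == 0) {
            return &vu_compat[i];
        }
    }
    return NULL;
}

/* Features a port may advertise: what the backend supports, restricted
 * to the feature set of the selected compat mode. */
static uint64_t compat_port_features(uint64_t backend_features,
                                     const struct vhostuser_compat *compat)
{
    assert(compat != NULL);
    return backend_features & compat->virtio_features;
}
```

For a backend offering mergeable buffers and indirect descriptors, the v2.7 entry keeps both bits while the v2.6 entry drops indirect descriptors, which corresponds to the "OVS-2.6 compatibility mode" of the host A/B example. Since the mask is applied with a bitwise AND, a newer DPDK that supports more features does not change what an existing compat entry advertises.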
>
> We create a table with different compatibility versions in the OVS
> vhost-user lib:
>
> static const struct vhostuser_compat vu_compat[] = {
>     {
>         .version = "upstream.ovs-dpdk.v2.7",
>         .virtio_features = 0x12045694,
>         .max_rx_queue_sz = 512,
>     },
>     {
>         .version = "upstream.ovs-dpdk.v2.6",
>         .virtio_features = 0x10045694,
>         .max_rx_queue_sz = 1024,
>     },
> };
>
> At some point during installation or system init, the table would be
> parsed, and the compatibility version strings would be stored in the OVS
> database; alternatively, a new tool would be created to list these
> strings, or a config file packaged with OVS would store the list of
> compatibility versions.
>
> Before launching the VM, Nova will query the version strings for the
> host so that the admin can select an older compatibility mode. If none
> is selected by the admin, then the most recent one will be used by
> default, and passed to OVS's add-port command as a parameter. Note that
> if no compatibility mode is passed to the add-port command, the most
> recent one is selected by OVS as the default.
>
> When the vhost-user connection is initiated, OVS would know in which
> compatibility mode to initialize the interface, for example by
> restricting the supported Virtio features of the interface.
>
> Cheers,
> Maxime
>
> [0]:
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-February/328257.html
> <b2a5501c-7df7-ad2a-002f-d731c445a...@redhat.com>

I'd think of implementing at least 3 new commands in OVS.

1) One to dump the list of supported features, for instance:
# ovs-appctl vhostuser/list-features

2) Then one to dump the negotiated features of a port:
# ovs-appctl vhostuser/dump-negotiated-features <bridge> <port>

3) Port options to enable specific features:
ovs-vsctl add-port ... features=<feature1|feature2|...>

On the QEMU side, we would need a dump of supported features as well.
These would be strings in a standardized format: token|token|token.
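With features expressed as token|token|token strings, the migration check becomes a subset test and the cluster minimum becomes an intersection. A minimal sketch, assuming a hypothetical token vocabulary (mrg_rxbuf, indirect_desc, event_idx, version_1) mapped to virtio bit numbers; parse_features(), migration_ok() and cluster_min_features() are illustrative names, not existing OVS or QEMU APIs, and the commands proposed above do not exist yet, so their real output format is unknown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical token vocabulary; bit numbers follow the virtio spec. */
static const struct {
    const char *name;
    unsigned bit;
} feature_tokens[] = {
    { "mrg_rxbuf", 15 },     /* VIRTIO_NET_F_MRG_RXBUF */
    { "indirect_desc", 28 }, /* VIRTIO_F_RING_INDIRECT_DESC */
    { "event_idx", 29 },     /* VIRTIO_F_RING_EVENT_IDX */
    { "version_1", 32 },     /* VIRTIO_F_VERSION_1 */
};

/* Parse "token|token|token" into a feature bitmask. Returns false, leaving
 * *out untouched, on an unknown or empty token (e.g. "a||b" or "a|"). */
static bool parse_features(const char *s, uint64_t *out)
{
    assert(s != NULL && out != NULL);
    uint64_t mask = 0;

    while (*s != '\0') {
        size_t len = strcspn(s, "|");   /* length of the current token */
        bool found = false;

        for (size_t i = 0; i < sizeof feature_tokens / sizeof feature_tokens[0]; i++) {
            if (strlen(feature_tokens[i].name) == len
                && strncmp(feature_tokens[i].name, s, len) == 0) {
                mask |= UINT64_C(1) << feature_tokens[i].bit;
                found = true;
                break;
            }
        }
        if (!found) {
            return false;
        }
        s += len;
        if (*s == '|') {
            s++;                        /* skip the separator */
            if (*s == '\0') {
                return false;           /* trailing separator: empty token */
            }
        }
    }
    *out = mask;
    return true;
}

/* Negotiated features can migrate only if the destination's QEMU and
 * OVS both support every one of them. */
static bool migration_ok(uint64_t negotiated, uint64_t dst_qemu,
                         uint64_t dst_ovs)
{
    return (negotiated & ~(dst_qemu & dst_ovs)) == 0;
}

/* Minimum common feature set of a cluster: the intersection of all
 * hosts' supported sets (0 for an empty cluster). */
static uint64_t cluster_min_features(const uint64_t hosts[], size_t n)
{
    if (n == 0) {
        return 0;
    }
    uint64_t common = hosts[0];
    for (size_t i = 1; i < n; i++) {
        common &= hosts[i];
    }
    return common;
}
```

For example, a port whose negotiated features parse to indirect_desc cannot move to a host where either QEMU or OVS lacks that token, and adding a feature in a newer version only means adding one entry to the token table.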
So, during deployment, those commands could be used to find out the
minimum set of features available in the cluster. At some point, the CMS
can decide to allow more features for newer vhost-user ports. The only
thing needed during migration is that the features reported by
dump-negotiated-features are included in the supported features of the
destination host, for both QEMU and OVS.

Doing so, we only need to add more keywords representing new features in
newer versions. The admin doesn't need to know about this. Nova can check
whenever there is a change in the cluster to calculate the new minimum
common set of features.

--
Flavio
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev