Hi all,
I wanted to start a discussion about a live migration problem that currently 
exists in the nova-neutron communication.

Basics: Live Migration and Nova-Neutron communication
-----------------------------------------------------
On a high level, Nova live migration happens in 3 stages (--> marks what's 
happening from a network perspective):
#1 pre_live_migration
   --> libvirtdriver: nova plugs the interface (for ovs hybrid it sets up the 
linuxbridge + veth pair and connects it to br-int)
#2 live_migration_operation
   --> the instance is migrated (using libvirt with the domain.xml that is 
currently active on the migration source)
#3 post_live_migration
   --> binding:host_id is updated for the port
   --> libvirtdriver: the domain.xml is regenerated
More details can be found here [1].
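
To make step #3 a bit more tangible: the binding:host_id update is just a 
regular port update towards the neutron API, roughly like this (a minimal 
sketch using python-neutronclient; credentials, uuid and host name are of 
course placeholders):

    from neutronclient.v2_0 import client

    # connect to neutron (placeholder credentials)
    neutron = client.Client(username='admin', password='secret',
                            tenant_name='admin',
                            auth_url='http://controller:5000/v2.0')

    # moving the port to the destination host is what triggers the ML2
    # portbinding in the mechanism drivers - today this only happens in
    # post_live_migration, i.e. after the guest has already moved
    neutron.update_port('PORT_UUID',
                        {'port': {'binding:host_id': 'dest-host'}})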

The problem - portbinding fails
-------------------------------
With this flow, ML2 portbinding is triggered in post_live_migration. At this 
point, the instance has already been migrated and is active on the migration 
destination.
Part of the port binding happens in the mechanism drivers, where the vif 
information for the port (vif_type, vif_details, ...) is updated.
If this portbinding fails, the port gets the binding:vif_type "binding_failed".
After that, the nova libvirt driver regenerates the domain xml in order to 
persist it. Part of this generation is also generating the interface 
definition.
This fails, as the vif_type is "binding_failed", and Nova sets the instance to 
error state. --> There is no rollback, as it's already too late!

Just a remark: There is no explicit check for the vif_type binding_failed. I 
have the feeling that it (luckily) fails by accident when generating the xml.
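
To illustrate what I mean (rough sketch of the logic only, not the real nova 
libvirt vif driver code): the driver only knows how to build an interface 
definition for real vif types, so "binding_failed" simply falls through to 
the generic error path:

    def get_interface_config(vif_type):
        # rough illustration only - not the actual nova code
        builders = {
            'ovs': lambda: '<interface type="bridge">...</interface>',
            'bridge': lambda: '<interface type="bridge">...</interface>',
            # ... more known vif types ...
        }
        try:
            return builders[vif_type]()
        except KeyError:
            # "binding_failed" ends up here; there is no dedicated
            # handling for it, the instance just goes to ERROR
            raise Exception('Unexpected vif_type=%s' % vif_type)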

--> Ideally we would trigger the portbinding before the migration starts - in 
pre_live_migration. Then, if the binding fails, we could abort the migration 
before it has even started. The instance would still be active and fully 
functional on the source host. I have a WIP patchset out proposing this 
change [2].
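
A minimal sketch of the idea (function and exception names are made up for 
illustration, [2] is the actual patch): bind the port to the destination 
first and bail out if it fails:

    def bind_port_to_destination(neutron, port_id, dest_host):
        # trigger the portbinding on the destination before anything
        # has been migrated
        port = neutron.update_port(
            port_id, {'port': {'binding:host_id': dest_host}})['port']
        if port['binding:vif_type'] == 'binding_failed':
            # the instance is still fully functional on the source,
            # so the migration can simply be aborted here
            raise RuntimeError('port %s failed to bind on host %s'
                               % (port_id, dest_host))
        return port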


The impact
----------
Patchset [2] proposes updating the host_id already in pre_live_migration. 
During the migration, the port would then already be owned by the migration 
target (although the guest is still active on the source).
Technically this works fine for all the reference implementations, but it 
could be a problem for some third party mech drivers, if they shut down the 
port on the source and activate it on the target - although the instance is 
still running on the source.

Any thoughts on this?


Additional use cases that would be enabled with this change
-----------------------------------------------------------
When updating the host_id in pre_live_migration, we could also modify the 
domain.xml with the new vif information before the live migration (see patch 
[2] and nova spec [4]).
This enables the following use cases:

#1 Live Migration between nodes that run different l2 agents
   E.g. you could migrate an instance from an ovs node to an lb node and vice 
versa. This could be used as an l2 agent transition strategy!
#2 Live Migration with macvtap agent
   It would enable the macvtap agent to live migrate instances between hosts 
that use a different physical_interface_mapping. See bug [3] and the sketch 
below.

--> #2 is the use case that got me thinking about this whole topic....
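
To give an idea what "modify the domain.xml with the new vif information" 
means for use case #2: for macvtap, the source device in the interface 
definition differs between hosts with different physical_interface_mappings. 
A hypothetical helper (names and devices are just examples) could rewrite it 
before the migration:

    import xml.etree.ElementTree as ET

    def update_macvtap_source(domain_xml, new_source_dev):
        # rewrite the macvtap (type='direct') source device, e.g. from
        # eth0 on the source host to eth1 on the target host
        root = ET.fromstring(domain_xml)
        for iface in root.findall("./devices/interface[@type='direct']"):
            iface.find('source').set('dev', new_source_dev)
        return ET.tostring(root)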

Potential other solutions
-------------------------
#1 Have something like simultaneous portbinding - during migration, a port is 
bound to 2 hosts (like a dvr port can be today).
For this, some database refactoring would be required (work has already been 
started in the DVR context [7]), and the REST API would need to be changed so 
that not a single binding, but a list of bindings is returned (which of course 
also needs to be created & updated).
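
Purely hypothetical, just to visualize what such an API could return for a 
port during a migration (none of this exists today):

    port = {
        'id': 'PORT_UUID',
        'bindings': [
            {'binding:host_id': 'source-host', 'binding:vif_type': 'ovs',
             'status': 'ACTIVE'},
            {'binding:host_id': 'dest-host', 'binding:vif_type': 'ovs',
             'status': 'INACTIVE'},
        ],
    }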

#2 Execute the portbinding without saving it to the db
We could also introduce a new api (like update port, with a live migration 
flag) that would run through the portbinding code and return the port 
information for the target node, but would not persist this information. So 
on port-show you would still get the old information. The update would only 
happen if the migration flag is not present (in post_live_migration, like 
today).
Alternatively, the generated portbinding could be stored in the port context 
and be used on the final port_update instead of running through all the code 
paths again.
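
For alternative #2, a client-side view could look like this (the flag and the 
behaviour are purely hypothetical, just to make the idea concrete):

    def dry_run_binding(neutron, port_id, dest_host):
        # hypothetical: run the portbinding for dest_host, but do not
        # persist the result in the db
        port = neutron.update_port(port_id, {'port': {
            'binding:host_id': dest_host,
            'binding:live_migration': True}})['port']   # made-up flag
        # the response carries vif_type/vif_details for dest_host, while
        # port-show keeps returning the old binding until the final,
        # flag-less update in post_live_migration
        return port['binding:vif_type'], port.get('binding:vif_details')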


Other efforts in the area of nova-neutron live migration
--------------------------------------------------------
Just for reference, these are the other activities around nova-neutron live 
migration that I'm aware of. But none of them is directly related to this, IMO.

#1 ovs-hybrid plug: wait for the vif-plug event before doing the live migration
see patches [5]
--> on nova plug, nova creates the linuxbridge and the veth pair and plugs it 
into br-int. This plug is detected by the ovs agent, which then reports the 
device as up, which in turn triggers the vif-plug event. This does not solve 
the problem, as portbinding is not involved at all. The patch also cannot be 
used for lb, ovs normal and macvtap, as for those vif-types libvirt sets up 
the device the agent is looking for - and that only happens during the live 
migration operation.
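
Roughly what the patches in [5] do, heavily simplified (not the actual diff; 
virtapi, driver and instance come from the nova compute context):

    def plug_and_wait(virtapi, driver, instance, timeout=300):
        # wait until neutron reports the freshly plugged vifs as up
        events = [('network-vif-plugged', vif['id'])
                  for vif in instance.get_network_info()]
        with virtapi.wait_for_instance_event(instance, events,
                                             deadline=timeout):
            # the ovs agent detects the new device, sets it to up and
            # neutron sends the network-vif-plugged event
            driver.plug_vifs(instance, instance.get_network_info())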

#2 Implement setup_networks_on_host for Neutron networks
Notifies Neutron so that it can set up a DVR router attachment on the target node
see patch [6] + related patches

#3 I also know that midonet faces some challenges during nova plug,
but this is also a separate topic.



Any discussion / input would be helpful, thanks a lot!


[1] 
https://review.openstack.org/#/c/274097/6/doc/source/devref/live_migration.rst
[2] https://review.openstack.org/297100
[3] https://bugs.launchpad.net/neutron/+bug/1550400
[4] https://review.openstack.org/301090
[5] https://review.openstack.org/246898 & https://review.openstack.org/246910
[6] https://review.openstack.org/275073
[7] https://bugs.launchpad.net/neutron/+bug/1367391


-- 
-----
Andreas (IRC: scheuran)

