[Yahoo-eng-team] [Bug 1483601] [NEW] l2 population failed when bulk live migrate VMs

shihanzhang Tue, 11 Aug 2015 02:31:21 -0700

Public bug reported:

When we bulk live migrate VMs, l2 population may (not always) fail on the
destination compute node. When nova migrates a VM to the destination compute
node, it only updates the port's binding:host; the port's status is still
active. From neutron's perspective, the port status then progresses as:
active -> build -> active.
In the case below, l2 population will fail:
1. nova successfully live migrates VM A and VM B from compute A to compute B.
2. port A and port B are both active, and their binding:host is compute B.
3. The l2 agent scans these two ports, then handles them one by one.
4. neutron-server handles port A first; its status becomes build (remember
that port B's status is still active), and it then performs the l2 population
check below. The check fails, presumably because port B is still counted as
an active port on compute B, so agent_active_ports is not 1:


def _update_port_up(self, context):
    ......
    if agent_active_ports == 1 or (
            self.get_agent_uptime(agent) < cfg.CONF.l2pop.agent_boot_time):
        # First port activated on current agent in this network,
        # we have to provide it with the whole list of fdb entries
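
The failing condition can be reproduced in isolation with a small simulation.
This is a minimal sketch, not neutron's code: Port, count_active_ports,
needs_full_fdb and simulate_bulk_migration are hypothetical names, and the
AGENT_BOOT_TIME value is only a placeholder for cfg.CONF.l2pop.agent_boot_time.
It assumes the count is taken per host over ports whose status is active, as
the report describes.

```python
from dataclasses import dataclass

# Placeholder for cfg.CONF.l2pop.agent_boot_time (value is an assumption).
AGENT_BOOT_TIME = 180


@dataclass
class Port:
    """Hypothetical stand-in for a neutron port record."""
    name: str
    host: str    # plays the role of binding:host
    status: str  # "ACTIVE" or "BUILD"


def count_active_ports(ports, host):
    """Count ports bound to `host` whose status is ACTIVE."""
    return sum(1 for p in ports if p.host == host and p.status == "ACTIVE")


def needs_full_fdb(ports, host, agent_uptime):
    """Mirror the condition quoted in the report."""
    return (count_active_ports(ports, host) == 1
            or agent_uptime < AGENT_BOOT_TIME)


def simulate_bulk_migration():
    # After migration both ports are already bound to compute B and ACTIVE.
    port_a = Port("A", "compute-B", "ACTIVE")
    port_b = Port("B", "compute-B", "ACTIVE")
    ports = [port_a, port_b]
    uptime = 3600  # the agent on compute B has been running for a long time

    results = []
    for port in (port_a, port_b):
        # Each port is handled in turn: ACTIVE -> BUILD -> ACTIVE.
        port.status = "BUILD"
        port.status = "ACTIVE"
        # The other port is ACTIVE the whole time, so the count is 2 here.
        results.append(needs_full_fdb(ports, "compute-B", uptime))
    return tuple(results)
```

Under these assumptions the check is false for both ports, so compute B is
never sent the whole list of fdb entries, whereas a single migrated port
would satisfy it.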

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1483601

Title:
  l2 population failed when bulk live migrate VMs

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1483601/+subscriptions
