Problem Description:
When a guest that has a tap device and continuously sends and receives ARP
packets is live-migrated, it mistakenly concludes, immediately after migration,
that another guest has the same IP address.

The steps to reproduce the problem:
1 Define and start a domain with its network configured as:
    <interface type='bridge'>
      <mac address='52:54:00:7d:b0:af'/>
      <source bridge='br0'/>
      <virtualport type='openvswitch'>
        <parameters interfaceid='e4ad3dbb-7808-4175-83ee-ee0cba1c5456'/>
      </virtualport>
      <model type='virtio'/>
      <driver name='vhost' queues='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
2 The guest continuously sends ARP packets for its own IP (xx.xx.xx.xx): arping -I ethX xx.xx.xx.xx
3 Meanwhile, the guest continuously captures ARP packets for its own IP: tcpdump -i ethX arp host xx.xx.xx.xx -entttt
4 After migration, the guest on the destination side receives a burst of ARP
packets that were sent by the guest on the source side and were queued while
the destination guest was still paused.
For example:
2015-03-27 16:45:56.695166 52:54:00:7d:b0:af > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 9.61.108.208 (ff:ff:ff:ff:ff:ff) tell 9.61.108.208
2015-03-27 16:45:56.695197 52:54:00:7d:b0:af > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 9.61.108.208 (ff:ff:ff:ff:ff:ff) tell 9.61.108.208
2015-03-27 16:45:56.695205 52:54:00:7d:b0:af > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 9.61.108.208 (ff:ff:ff:ff:ff:ff) tell 9.61.108.208
2015-03-27 16:45:56.695214 52:54:00:7d:b0:af > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 9.61.108.208 (ff:ff:ff:ff:ff:ff) tell 9.61.108.208
2015-03-27 16:45:56.695244 52:54:00:7d:b0:af > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 9.61.108.208 (ff:ff:ff:ff:ff:ff) tell 9.61.108.208
2015-03-27 16:45:56.695256 52:54:00:7d:b0:af > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 9.61.108.208 (ff:ff:ff:ff:ff:ff) tell 9.61.108.208
2015-03-27 16:45:56.695264 52:54:00:7d:b0:af > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 9.61.108.208 (ff:ff:ff:ff:ff:ff) tell 9.61.108.208
2015-03-27 16:45:56.695291 52:54:00:7d:b0:af > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 9.61.108.208 (ff:ff:ff:ff:ff:ff) tell 9.61.108.208
2015-03-27 16:45:56.695324 52:54:00:7d:b0:af > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 9.61.108.208 (ff:ff:ff:ff:ff:ff) tell 9.61.108.208
2015-03-27 16:45:56.695337 52:54:00:7d:b0:af > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 9.61.108.208 (ff:ff:ff:ff:ff:ff) tell 9.61.108.208
2015-03-27 16:45:56.695344 52:54:00:7d:b0:af > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 9.61.108.208 (ff:ff:ff:ff:ff:ff) tell 9.61.108.208
2015-03-27 16:45:56.695364 52:54:00:7d:b0:af > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 9.61.108.208 (ff:ff:ff:ff:ff:ff) tell 9.61.108.208
5 These packets confuse my process: it may conclude that another VM is using
the same IP address as the guest.
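
One way for a guest-side process to tell a genuine address conflict from the guest's own probes is to compare the source MAC of each captured ARP frame with the guest's own MAC. The following is only a minimal sketch under assumptions: the helper names parse_arp_line and classify are hypothetical, and the regular expression assumes the "tcpdump -e" line format from the capture above, with one packet per line.

```python
import re

# Assumed format: the "tcpdump -e" ARP request lines from the capture above.
ARP_LINE = re.compile(
    r"(?P<src>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}) > "
    r"(?P<dst>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}), ethertype ARP .*?"
    r"who-has (?P<target>\d+\.\d+\.\d+\.\d+) .*?"
    r"tell (?P<sender>\d+\.\d+\.\d+\.\d+)",
    re.IGNORECASE,
)


def parse_arp_line(line):
    """Return a dict with src, dst, target and sender, or None if no match."""
    match = ARP_LINE.search(line)
    return match.groupdict() if match else None


def classify(pkt, own_mac, own_ip):
    """Classify an ARP request relative to this guest (hypothetical helper).

    "self"     -> sender IP and source MAC are both ours (our own probe,
                  possibly one sent by the source-side instance)
    "conflict" -> sender IP is ours, but the source MAC is not
    "other"    -> the request does not claim our IP
    """
    if pkt["sender"] != own_ip:
        return "other"
    if pkt["src"].lower() == own_mac.lower():
        return "self"
    return "conflict"


SAMPLE = (
    "2015-03-27 16:45:56.695166 52:54:00:7d:b0:af > ff:ff:ff:ff:ff:ff, "
    "ethertype ARP (0x0806), length 60: arp who-has 9.61.108.208 "
    "(ff:ff:ff:ff:ff:ff) tell 9.61.108.208"
)
pkt = parse_arp_line(SAMPLE)
verdict = classify(pkt, "52:54:00:7d:b0:af", "9.61.108.208")
```

Because the source-side and destination-side instances share one MAC address, frames classified as "self" cannot come from a different guest. This only works around the symptom inside the guest; it does not stop the stale frames from being delivered.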

Reasons for the problem:
The tap device is brought up as soon as it is created (in
virNetDevTapCreateInBridgePort), before the vCPUs are unpaused.
It therefore starts queuing incoming data before the guest starts to run; note
that this data was sent from the source side.
As soon as the guest starts running, it processes the queued data and concludes
that it came from another guest with the same IP, when it in fact came from
the same guest on the source side.
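
The effect of this ordering can be illustrated with a toy model. This is only a sketch: the TapQueue class, the run function and the event names are hypothetical and do not correspond to libvirt or kernel code. It models a tap device that queues frames only while it is up, and a guest that sees the queued frames once its vCPUs resume.

```python
from collections import deque


class TapQueue:
    """Toy tap device: queues incoming frames only while it is up."""

    def __init__(self):
        self.online = False
        self.frames = deque()

    def receive(self, frame):
        if self.online:
            self.frames.append(frame)


def run(events):
    """Replay events in order; return the frames the guest sees on resume."""
    tap = TapQueue()
    for event in events:
        if event == "tap_up":
            tap.online = True
        elif event == "resume_vcpus":
            return list(tap.frames)
        else:
            # Any other event is a frame arriving from the source side.
            tap.receive(event)
    return []


# Tap brought up at creation time: stale frames are queued for the guest.
early = run(["tap_up", "arp#1", "arp#2", "resume_vcpus"])
# Tap brought up just before the vCPUs resume: nothing is queued.
late = run(["arp#1", "arp#2", "tap_up", "resume_vcpus"])
```

In this model the first ordering delivers both stale frames to the guest, while the second delivers none. The model ignores bridge behaviour entirely, so it says nothing about the cost of bringing the device up late.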




There was a patch, "network: Bring netdevs online later", which moved the
virNetDevSetOnline() call for network devices to just before the VM's vCPUs
are started. But in his reply, Laine Stump said: "It turns out, though, that
regular tap devices which will be connected to a bridge should be ifup'ed and
attached to the bridge as soon as possible, so that the forwarding delay timer
of the bridge can start to count down."

I agree with Laine Stump that bringing the tap device up right before the
vCPUs start running is not a perfect solution.
So here is the question:
what can we do to ensure that our guests do not receive their own ARP packets
from the source side during migration?
