[Bridge] linux bridge does not forward arp reply back packets in a vmware vm
Hello,

I have a strange issue with a Linux bridge created by openstack-neutron (Pike release). The bridge is hosted in a VMware VM running the latest CentOS 7, with a single network interface in promiscuous mode.

From the OpenStack Neutron perspective, the networking setup is simple: a single flat external provider network, with a single cirros VM instance connected to it.

Therefore, the Linux bridge running on the CentOS 7 host (the VMware VM) has 3 interfaces:

    # brctl show
    bridge name      bridge id           STP enabled   interfaces
    brq025a9a94-58   8000.005056a6b378   no            ens160
                                                       tap2eb4cad6-cd   <- neutron DHCP agent tap interface
                                                       tap6d31a191-9f   <- cirros VM instance tap interface

ens160 is the "physical" interface of the CentOS 7 host, and it is in promiscuous mode.

The tap2eb4cad6-cd tap interface is the Neutron DHCP agent interface, and the tap6d31a191-9f tap interface is used by the cirros VM instance.

The problem is the following:

With tcpdump, I can see the ARP request (ARP, Request who-has 10.20.21.1 tell 10.20.21.233) going out from the cirros VM instance on tap interface tap6d31a191-9f, as well as on the bridge itself (brq025a9a94-58). However, the reply to the ARP request (Reply 10.20.21.1 is-at 00:17:08:c4:52:80) does not reach the cirros VM instance. With tcpdump, I can see the ARP replies on the bridge (brq025a9a94-58), but they do not show up on the cirros VM instance tap interface tap6d31a191-9f.

To me it seems that, for whatever reason, the bridge does not forward the ARP replies to the cirros VM tap interface, and I do not understand why. The strange thing is that after a while, for no apparent reason, a single ARP reply gets through the bridge and the tap interface, and the ARP table in the cirros VM instance gets updated with the correct entry. However, if I clear the ARP table in the cirros VM instance, it again takes 10 to 15 minutes of continuously sending ARP requests until a single ARP reply gets through.

I banged my head against the table for a few days over this issue. Finally, for no particular reason, I manually set the bridge max ageing time to 0, which turns the bridge into a hub, and from that moment everything started to work without any issue. Still, I do not understand why this is happening, and obviously I cannot manually set the ageing time to 0 on every bridge that openstack-neutron creates automatically.

Any thoughts?

Many thanks in advance.

Best regards,
Adrian
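The remark that an ageing time of 0 turns the bridge into a hub can be illustrated with a toy model of a learning bridge. This is only a sketch under stated assumptions: `LearningBridge` and `reply_ports` are hypothetical names and not kernel code, `VM_MAC` is a placeholder because the thread never gives the VM's own MAC address, and the step where the VM's address is learned on ens160 is assumed purely so that the model reproduces the symptom. The thread itself does not establish the cause.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningBridge:
    """Toy learning bridge: a forwarding table (FDB) plus an ageing time."""

    def __init__(self, ports, ageing_time):
        self.ports = list(ports)
        self.ageing_time = ageing_time  # seconds; with 0 no entry is ever usable
        self.fdb = {}  # MAC address -> (port, time last seen)

    def receive(self, in_port, src, dst, now):
        """Return the list of ports the frame is sent out on."""
        # Learn (or refresh) the port on which the source address was seen.
        self.fdb[src] = (in_port, now)

        entry = self.fdb.get(dst)
        if entry is not None and now - entry[1] < self.ageing_time:
            out_port = entry[0]
            # A destination believed to sit on the ingress port is filtered.
            return [] if out_port == in_port else [out_port]
        # Unknown, broadcast, or expired destination: flood to all other ports.
        return [p for p in self.ports if p != in_port]


PORTS = ["ens160", "tap2eb4cad6-cd", "tap6d31a191-9f"]
GATEWAY_MAC = "00:17:08:c4:52:80"  # from the ARP reply quoted in the thread
VM_MAC = "02:00:00:00:00:01"       # hypothetical placeholder for the cirros VM


def reply_ports(ageing_time):
    bridge = LearningBridge(PORTS, ageing_time)
    # The VM's ARP request enters on its tap port and is flooded.
    bridge.receive("tap6d31a191-9f", VM_MAC, BROADCAST, now=0)
    # ASSUMPTION: a frame carrying the VM's source MAC is then seen on ens160,
    # so the FDB entry for the VM now points at the wrong port.
    bridge.receive("ens160", VM_MAC, BROADCAST, now=1)
    # The gateway's unicast ARP reply arrives on ens160.
    return bridge.receive("ens160", GATEWAY_MAC, VM_MAC, now=2)


print(reply_ports(300))  # non-zero ageing time: the reply is filtered
print(reply_ports(0))    # ageing time 0: the reply is flooded to both taps
```

In this model a non-zero ageing time lets a wrong FDB entry swallow the reply, while an ageing time of 0 makes every lookup miss, so the frame is flooded and reaches the VM's tap port.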
Re: [Bridge] linux bridge does not forward arp reply back packets in a vmware vm
On Fri, 15 Dec 2017 15:37:39 +0200 Adrian P wrote:

> To me it seems that, for whatever reason, the bridge does not forward
> the ARP replies to the cirros VM tap interface, and I do not
> understand why.

Does each tap instance and the ens160 have a different and valid Ethernet address? Also make sure that these are in the bridge forwarding table.
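One way to read "different and valid" is sketched below. This is an interpretation, not a definition given in the thread: "valid" is assumed to mean well-formed, non-zero, and unicast, and the function names are hypothetical. The addresses are those of the three bridge ports as reported in Adrian's follow-up message in this thread.

```python
import re

MAC_RE = re.compile(r"[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}")


def is_valid_station_mac(mac):
    """True for a well-formed, non-zero, unicast MAC address."""
    if not MAC_RE.fullmatch(mac):
        return False
    octets = bytes.fromhex(mac.replace(":", ""))
    if not any(octets):
        return False                  # all-zero address
    return (octets[0] & 0x01) == 0    # I/G bit clear means unicast


def check_ports(port_macs):
    """Return (ports with an invalid MAC, MACs shared by more than one port)."""
    invalid = sorted(p for p, m in port_macs.items() if not is_valid_station_mac(m))
    by_mac = {}
    for port, mac in port_macs.items():
        by_mac.setdefault(mac.lower(), []).append(port)
    shared = {m: ports for m, ports in by_mac.items() if len(ports) > 1}
    return invalid, shared


PORT_MACS = {
    "ens160": "00:50:56:a6:b3:78",
    "tap2eb4cad6-cd": "8a:b2:15:4c:96:55",
    "tap6d31a191-9f": "fe:16:3e:9a:04:95",
}

print(check_ports(PORT_MACS))
```

An empty list and an empty dict mean that, under these assumptions, every port has a usable address and no two ports share one.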
Re: [Bridge] linux bridge does not forward arp reply back packets in a vmware vm
On Fri, Dec 15, 2017 at 5:55 PM, Stephen Hemminger wrote:
> Does each tap instance and the ens160 have a different and valid Ethernet
> address? Also make sure that these are in the bridge forwarding table.

Yes, they have valid Ethernet addresses, and they do show up in the forwarding table twice, see below:

    # ip addr
    <...>
    2: ens160: mtu 1500 qdisc mq master brq025a9a94-58 state UP qlen 1000
        link/ether 00:50:56:a6:b3:78 brd ff:ff:ff:ff:ff:ff
        inet6 fe80::250:56ff:fea6:b378/64 scope link
           valid_lft forever preferred_lft forever
    4: tap2eb4cad6-cd@if2: mtu 1500 qdisc noqueue master brq025a9a94-58 state UP qlen 1000
        link/ether 8a:b2:15:4c:96:55 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    5: brq025a9a94-58: mtu 1500 qdisc noqueue state UP qlen 1000
        link/ether 00:50:56:a6:b3:78 brd ff:ff:ff:ff:ff:ff
        inet 10.20.21.249/24 brd 10.20.21.255 scope global brq025a9a94-58
           valid_lft forever preferred_lft forever
        inet6 fe80::803d:d0ff:fe2e:3ae4/64 scope link
           valid_lft forever preferred_lft forever
    6: tap6d31a191-9f: mtu 1500 qdisc pfifo_fast master brq025a9a94-58 state UNKNOWN qlen 1000
        link/ether fe:16:3e:9a:04:95 brd ff:ff:ff:ff:ff:ff
        inet6 fe80::fc16:3eff:fe9a:495/64 scope link
           valid_lft forever preferred_lft forever

    # brctl showmacs brq025a9a94-58
    port no   mac addr            is local?   ageing timer
      1       00:50:56:a6:b3:78   yes         0.00
      1       00:50:56:a6:b3:78   yes         0.00
      2       8a:b2:15:4c:96:55   yes         0.00
      2       8a:b2:15:4c:96:55   yes         0.00
      3       fe:16:3e:9a:04:95   yes         0.00
      3       fe:16:3e:9a:04:95   yes         0.00
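The `brctl showmacs` listing above can also be summarised mechanically, which helps when a table is longer than six rows. The following is a minimal sketch, assuming the columns are separated by whitespace; `parse_showmacs` is a hypothetical helper, and `SHOWMACS` is the output from this message retyped with the columns spaced out.

```python
from collections import Counter

SHOWMACS = """\
port no mac addr           is local?  ageing timer
  1     00:50:56:a6:b3:78  yes        0.00
  1     00:50:56:a6:b3:78  yes        0.00
  2     8a:b2:15:4c:96:55  yes        0.00
  2     8a:b2:15:4c:96:55  yes        0.00
  3     fe:16:3e:9a:04:95  yes        0.00
  3     fe:16:3e:9a:04:95  yes        0.00
"""


def parse_showmacs(text):
    """Return a list of (port, mac, is_local, age) tuples from showmacs output."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly four columns and start with a port number;
        # the header line and blank lines are skipped.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        port, mac, is_local, age = fields
        entries.append((int(port), mac.lower(), is_local == "yes", float(age)))
    return entries


entries = parse_showmacs(SHOWMACS)

# (port, MAC) pairs that are listed more than once.
counts = Counter((port, mac) for port, mac, _, _ in entries)
repeated = {key: n for key, n in counts.items() if n > 1}

# Entries the bridge learned from traffic, as opposed to its own port addresses.
learned = [e for e in entries if not e[2]]

print(repeated)
print(learned)
```

For this capture, every (port, MAC) pair is listed twice and `learned` is empty: the table shown contains only the bridge ports' own addresses.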
Re: [Bridge] linux bridge does not forward arp reply back packets in a vmware vm
On Fri, 15 Dec 2017 18:29:58 +0200 Adrian P wrote:

> Yes, they have valid Ethernet addresses, and they do show up in the
> forwarding table twice, see below:
>
> # brctl showmacs brq025a9a94-58
> port no   mac addr            is local?   ageing timer
>   1       00:50:56:a6:b3:78   yes         0.00
>   1       00:50:56:a6:b3:78   yes         0.00
>   2       8a:b2:15:4c:96:55   yes         0.00
>   2       8a:b2:15:4c:96:55   yes         0.00
>   3       fe:16:3e:9a:04:95   yes         0.00
>   3       fe:16:3e:9a:04:95   yes         0.00

Since there are multiple entries per port maybe you are also usi
Re: [Bridge] linux bridge does not forward arp reply back packets in a vmware vm
On Sat, Dec 16, 2017 at 3:47 AM, Stephen Hemminger wrote:
> On Fri, 15 Dec 2017 18:29:58 +0200
> Adrian P wrote:
>
>> Yes, they have valid Ethernet addresses, and they do show up in the
>> forwarding table twice, see below:
>>
>> # brctl showmacs brq025a9a94-58
>> port no   mac addr            is local?   ageing timer
>>   1       00:50:56:a6:b3:78   yes         0.00
>>   1       00:50:56:a6:b3:78   yes         0.00
>>   2       8a:b2:15:4c:96:55   yes         0.00
>>   2       8a:b2:15:4c:96:55   yes         0.00
>>   3       fe:16:3e:9a