I think I see a problem in the implementation of bonding when recirculation is available. Are you able to try out a patch? If so, try the following. It is not a proper fix, because it turns recirculation off for every balance-tcp bond, but it should show whether recirculation is the problem.
diff --git a/ofproto/bond.c b/ofproto/bond.c
index f87cdba7908f..bb6a80411de5 100644
--- a/ofproto/bond.c
+++ b/ofproto/bond.c
@@ -927,7 +928,7 @@ bond_recirculation_account(struct bond *bond)
 static bool
 bond_may_recirc(const struct bond *bond)
 {
-    return bond->balance == BM_TCP && bond->recirc_id;
+    return bond->balance == BM_TCP && bond->recirc_id && false;
 }
 
 static void

On Fri, Oct 05, 2018 at 04:45:21AM +0000, Arun Navasivasakthivelsamy wrote:
> Also, ofproto/trace is suggesting that the packet will be hashed to a slave
> (instead of just to the active port) with lacp-fallback-ab option.
>
> This is with active-backup bond mode:
>
> > [root@frankfurter02-4 ~]# ovs-appctl ofproto/trace br0 in_port=6,dl_dst=28:99:3a:08:7a:cf
> > Bridge: br0
> > Flow: in_port=6,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=28:99:3a:08:7a:cf,dl_type=0x0000
> >
> > Rule: table=0 cookie=0 priority=0
> > OpenFlow actions=NORMAL
> > forwarding to learned port
> >
> > Final flow: unchanged
> > Megaflow: recirc_id=0,in_port=6,vlan_tci=0x0000/0x1fff,dl_src=00:00:00:00:00:00,dl_dst=28:99:3a:08:7a:cf,dl_type=0x0000
> > Datapath actions: 5
>
> This is with balance-tcp with lacp-fallback-ab mode (LACP is not negotiated):
>
> > [root@frankfurter02-4 ~]# ovs-appctl ofproto/trace br0 in_port=6,dl_dst=28:99:3a:08:7a:cf
> > Bridge: br0
> > Flow: in_port=6,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=28:99:3a:08:7a:cf,dl_type=0x0000
> >
> > Rule: table=0 cookie=0 priority=0
> > OpenFlow actions=NORMAL
> > forwarding to learned port
> >
> > Final flow: unchanged
> > Megaflow: recirc_id=0,in_port=6,vlan_tci=0x0000/0x1fff,dl_src=00:00:00:00:00:00,dl_dst=28:99:3a:08:7a:cf,dl_type=0x0000
> > Datapath actions: hash(hash_l4(0)),recirc(0x1)
>
> From: Arunkumar Navasiva <arunkum.nava...@nutanix.com<mailto:arunkum.nava...@nutanix.com>>
> Date: Thursday, October 4, 2018 at 4:18 PM
> To: "ovs-dev@openvswitch.org<mailto:ovs-dev@openvswitch.org>" <ovs-dev@openvswitch.org<mailto:ovs-dev@openvswitch.org>>
> Subject: Issue with OVS lacp-fallback-ab option
>
> Hello folks,
>
> We’re seeing an issue with lacp-fallback-ab option on ovs 2.5/2.6/2.8. It
> looks like when LACP is not enabled on the TOR switch ports, OVS on the
> centos server is not falling back cleanly to active-backup, and continues to
> send some portions of the traffic through the backup interface (we’ve seen
> this occur with various TOR vendor switches). Please see the attached
> screenshot which shows that some traffic still hashes to backup interface. We
> looked at the TOR forwarding table, and MAC addresses of VMs running on this
> server flaps between the two corresponding TOR switch ports of the bond . I’m
> still in the early stages of debugging this, but wanted to reach out to you
> to see if this is already a known issue? If not, any help on how to debug
> this further will be helpful.
>
>
> Thanks
> -Arun
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev