On Tue, Apr 27, 2021 at 6:00 PM Francois <rigault.franc...@gmail.com> wrote:
>
> On Tue, 27 Apr 2021 at 23:08, Numan Siddique <num...@ovn.org> wrote:
> >
> > On Tue, Apr 27, 2021 at 4:58 PM Francois <rigault.franc...@gmail.com> wrote:
> > >
> > > On Tue, 27 Apr 2021 at 22:20, Numan Siddique <num...@ovn.org> wrote:
> > > >
> > > > On Tue, Apr 27, 2021 at 9:11 AM Francois <rigault.franc...@gmail.com>
> > > > wrote:
> > > >
> > > >
> > > > The ovn-controller running on chassis-1 will not detect the BFD
> > > > failover.
> > >
> > > Thanks for your answer! Ok for chassis-1.
> > >
> > > What I don't understand is why chassis-2, who is aware that chassis-1
> > > is down, is not able to act as a gateway for its own ports.
> >
> > I see what's going on. So ovn-controller on chassis-2 detects the failover
> > and claims the cr-<gateway_port>. But ovn-controller on chassis-1 which has
> > higher priority claims it back because according to it, BFD is fine.
> >
> > You can probably monitor the ovn-controller logs on both chassis, and you
> > might notice claim/release logs.
> >
> > Or you can do "tail -f ovnsb_db.db" and see that there are constant updates
> > to the cr-<gateway_port>.
> >
> > Having 3 chassis will not result in this split brain scenario which you have
> > probably observed.
>
> I am going to do a bit more research and see what happens on some
> real OpenStack installation, maybe I messed up somewhere.
>
> There is nothing logged in the ovn-controller, and nothing flooding
> the DB (+one line saying port_binding is down). My understanding was
> that the move of gateway (as it happens for chassis-3) happens
> without the involvement of the control plane, in other words in case
> the first gateway fails, the flows to move to the second gateway are
> already installed and can be used straight away.
>
> I am puzzled because if I trace the packet from chassis-2 before and
> after chassis-1 dies, it always end up in flow
>
> 37. reg15=0x3,metadata=0x4, priority 100, cookie 0x7a15360f
>     set_field:0x4/0xffffff->tun_id
>     set_field:0x3->tun_metadata0
>     move:NXM_NX_REG14[0..14]->NXM_NX_TUN_METADATA0[16..30]
>      -> NXM_NX_TUN_METADATA0[16..30] is now 0x1
>     bundle(eth_src,0,active_backup,ofport,members:7)
>
> Only difference is, when chassis-1 is up, the added
>      -> output to kernel tunnel
>
> It seems that there is no backup flow for packets not going through a
> tunnel, straight to external.
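
The claim behaviour discussed in the quoted text can be illustrated with a small toy model. This is only a sketch and not ovn-controller's actual code: the function names, the priority values 20 and 10, and the rule "claim if I am the highest-priority chassis that I believe is alive" are all assumptions made for illustration. It covers the two-chassis case only.

```python
# Toy model only: NOT ovn-controller's real logic. All names, priority
# values and the decision rule are hypothetical, chosen for illustration.

def wants_claim(me, priorities, bfd_up):
    """Return True if chassis `me` would try to claim the cr- port.

    priorities: dict mapping chassis name -> gateway priority (higher wins)
    bfd_up:     dict mapping peer name -> bool, as seen by `me` over BFD
    """
    # A chassis always counts itself as alive; a peer counts only if this
    # chassis's own BFD view says the peer is up.
    alive = [c for c in priorities if c == me or bfd_up.get(c, False)]
    return max(alive, key=lambda c: priorities[c]) == me


def claimants(priorities, views):
    """List every chassis with a running controller that wants the port.

    views: dict mapping chassis name -> that chassis's BFD view. Only
    chassis whose controller is still running appear as keys.
    """
    return [c for c in sorted(views) if wants_claim(c, priorities, views[c])]


priorities = {"chassis-1": 20, "chassis-2": 10}

# Case 1: only the datapath on chassis-1 is dead. Assumption: its controller
# keeps running and still believes BFD is fine, while chassis-2 sees it down.
both_running = {
    "chassis-1": {"chassis-2": True},
    "chassis-2": {"chassis-1": False},
}
print(claimants(priorities, both_running))

# Case 2: the controller on chassis-1 is killed as well, so only chassis-2
# is left to evaluate the rule.
only_chassis_2 = {"chassis-2": {"chassis-1": False}}
print(claimants(priorities, only_chassis_2))
```

In this model the first case yields two claimants, which corresponds to the conflicting views described above, and the second case leaves chassis-2 as the only claimant.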

I think it is expected, because ovn-controller on chassis-1 has claimed
the gateway port (i.e. cr-<gw_port>), and hence ovn-controller on chassis-2
has the above flow you mentioned.

If you run "ovn-sbctl show" you would see chassis-1 claiming the gateway
chassis port. (I am talking about your 2 chassis scenario here.)

Along with killing ovs-vswitchd, if you also kill ovn-controller on
chassis-1, you should not see the above tunnel flow. Instead, ovn-controller
on chassis-2 would claim the gateway chassis port (confirm by running
"ovn-sbctl show") and also remove the above table 37 flow.

Thanks
Numan

>
> Before tackling the tricky cases, I would like to make it work when
> it fails "as documented" :), just one chassis dying but traffic being
> quickly dispatched somewhere else.
>
> Thanks

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss