On Mon, Jul 4, 2022 at 12:56 PM Vladislav Odintsov <odiv...@gmail.com> wrote:
>
> Thanks Numan,
>
> would you have time to fix it, or maybe give an idea of how to do it,
> so I can try?

It would be great if you want to give it a try.

I was thinking of two possible approaches to fix this issue.  Even though
pinctrl.c sets the status to offline, it cannot set the service monitor
to offline when ovn-controller releases the port binding.  The first
option is to handle this in binding.c, as you suggested earlier.
Alternatively, ovn-northd can set the status to offline if the service
monitor's logical port is no longer claimed by any ovn-controller.  I'm
more inclined to handle this in ovn-northd.  What do you think?
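For illustration, the ovn-northd option could look something like the
rough sketch below.  It's untested, and the helper names and integration
point (e.g. ovn_port_find() and the 'ports' hmap) are assumptions based
on northd's existing conventions, so the real patch would likely differ:

/* Rough sketch (untested): mark a Service_Monitor offline when its
 * logical port is no longer claimed by any chassis.  ovn_port_find()
 * and the 'ports' hmap follow northd.c's existing conventions; the
 * exact place to call this from would need to be worked out. */
static void
sync_service_monitor_status(struct ovsdb_idl *ovnsb_idl,
                            const struct hmap *ports)
{
    const struct sbrec_service_monitor *svc_mon;
    SBREC_SERVICE_MONITOR_FOR_EACH (svc_mon, ovnsb_idl) {
        struct ovn_port *op = ovn_port_find(ports, svc_mon->logical_port);

        /* If no chassis claims the port binding, pinctrl.c on the
         * controller side can no longer update the status, so reset
         * it here. */
        if (!op || !op->sb || !op->sb->chassis) {
            if (svc_mon->status && strcmp(svc_mon->status, "offline")) {
                sbrec_service_monitor_set_status(svc_mon, "offline");
            }
        }
    }
}

Handling it in ovn-northd would also avoid touching the release path in
binding.c, which, as you noted, could have side effects during events
like VM live migration.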
Thanks
Numan

>
> Regards,
> Vladislav Odintsov
>
> > On 4 Jul 2022, at 19:51, Numan Siddique <num...@ovn.org> wrote:
> >
> > On Mon, Jul 4, 2022 at 7:48 AM Vladislav Odintsov <odiv...@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> we've found incorrect behaviour of the service_monitor record status
> >> for a health-checked Load Balancer.
> >> Its status can stay online forever, even if the virtual machine is
> >> stopped.  This leads to load-balanced traffic being sent to a dead
> >> backend.
> >>
> >> Below is a script to reproduce the issue, since I'm unsure about the
> >> correct place for a possible fix (my guess is that it should be fixed
> >> in controller/binding.c, in the function binding_lport_set_down, but
> >> I'm not sure how this could affect VM live migration…):
> >>
> >> # cat ./repro.sh
> >> #!/bin/bash -x
> >>
> >> ovn-nbctl ls-add ls1
> >> ovn-nbctl lsp-add ls1 lsp1 -- \
> >>     lsp-set-addresses lsp1 "00:00:00:00:00:01 192.168.0.10"
> >> ovn-nbctl lb-add lb1 192.168.0.100:80 192.168.0.10:80
> >> ovn-nbctl set Load_Balancer lb1 \
> >>     ip_port_mappings:192.168.0.10=lsp1:192.168.0.8
> >> ovn-nbctl --id=@id create Load_Balancer_Health_Check \
> >>     vip='"192.168.0.100:80"' -- set Load_Balancer lb1 health_check=@id
> >> ovn-nbctl ls-lb-add ls1 lb1
> >>
> >> ovs-vsctl add-port br-int test-lb -- set interface test-lb \
> >>     type=internal external_ids:iface-id=lsp1
> >> ip link set test-lb address 00:00:00:00:00:01
> >> ip a add 192.168.0.10/24 dev test-lb
> >> ip link set test-lb up
> >>
> >> # check service_monitor
> >> ovn-sbctl list service_mon
> >>
> >> # ensure the state became offline
> >> sleep 4
> >> ovn-sbctl list service_mon
> >>
> >> # start listening on :80 with netcat
> >> ncat -k -l 192.168.0.10 80 &
> >>
> >> # ensure the state turned online
> >> sleep 4
> >> ovn-sbctl list service_mon
> >>
> >> # trigger binding release
> >> ovs-vsctl remove interface test-lb external_ids iface-id
> >>
> >> # ensure the state remains online (this is the bug)
> >> sleep 10
> >> ovn-sbctl list service_mon
> >>
> >> # ensure the OVS group still has the backend in its bucket
> >> ovs-ofctl dump-groups br-int | grep 192.168.0.10
> >
> >
> > Thanks for the bug report.  I could reproduce it locally.  It looks to
> > me like it should be fixed in pinctrl.c, as that is where the service
> > monitor status is set.
> >
> > I've also raised a bugzilla here:
> > https://bugzilla.redhat.com/show_bug.cgi?id=2103740
> >
> > Numan
> >
> >>
> >> ————
> >> Looking forward to hearing any thoughts on this.
> >>
> >> PS. don't forget to kill ncat ;)
> >>
> >>
> >> Regards,
> >> Vladislav Odintsov

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev