Re: [ovs-dev] [ovn] bug: load balancer health check status is not updated if port binding is released from chassis

2022-07-05 Thread Vladislav Odintsov
Thanks Numan for the hint.
I’ve submitted the full patch here: 
https://patchwork.ozlabs.org/project/ovn/patch/20220705175154.3095150-1-odiv...@gmail.com/
It would be great if you could find some time to review it.

Thanks.

Regards,
Vladislav Odintsov

> On 5 Jul 2022, at 18:44, Numan Siddique  wrote:
> 
> On Tue, Jul 5, 2022 at 8:32 AM Vladislav Odintsov  wrote:
>> 
>> Hi Numan,
>> 
>> I’ve sent a draft patch: 
>> https://patchwork.ozlabs.org/project/ovn/patch/20220705122031.2568471-1-odiv...@gmail.com/
>> 
>> While implementing, I’ve encountered a problem: the port binding record, 
>> which I’m trying to get from struct ovn_port, has stale state. It doesn’t 
>> reflect the "up" field changes that I see arriving as updates in the 
>> ovn-northd logs, and I have no idea how to pull them in.
>> 
>> From ovn-northd debug logs:
>> 
>> port binding "up" field update comes from SB:
>> 
>> 2022-07-05T12:17:25.445Z|00412|jsonrpc|DBG|unix:/home/ec2-user/ovn/tests/system-kmod-testsuite.dir/097/ovn-sb/ovn-sb.sock:
>>  received notification, method="update3", 
>> params=[["monid","OVN_Southbound"],"----",{"Port_Binding":{"ec0e94cc-f2d0-44de-8ad0-7fd62b7d50d6":{"modify":{"up":false,"chassis":["set",[]]]
>> 
>> next I print op->sb->up ? "UP" : "DOWN" and it remains "UP":
>> 
> 
> You need to do: (op->sb->n_up && op->sb->up[0]) ? "UP" : "DOWN".
> 
> Thanks
> Numan
> 
> 
> 
>> 2022-07-05T12:17:25.446Z|00414|northd|INFO|PORT: sw1-p1, pbid: 
>> ec0e94cc-f2d0-44de-8ad0-7fd62b7d50d6, PB: UP, lspid: 
>> aed98c29-03db-4761-b178-42f288a12692, LSP: UP, svc->logical_port: sw1-p1, 
>> svc->status: online
>> 
>> I guess this is a result of my misunderstanding of the principles of 
>> incremental engine operation.
>> Can you help me understand why the port_binding structure has stale state 
>> and how to "pull changes" for it?
>> 
>> Regards,
>> Vladislav Odintsov
>> 
>>> On 4 Jul 2022, at 22:13, Numan Siddique  wrote:
>>> 
>>> On Mon, Jul 4, 2022 at 12:56 PM Vladislav Odintsov  
>>> wrote:
 
 Thanks Numan,
 
 would you have time to fix it or maybe give an idea how to do it, so I can 
 try?
>>> 
>>> It would be great if you want to give it a try.  I was thinking of 2
>>> possible approaches to fix this issue.
>>> Even though pinctrl.c sets the status to offline, it cannot set the
>>> service monitor to offline when ovn-controller releases the port
>>> binding.
>>> The first option is to handle this in binding.c as you suggested
>>> earlier.  Or ovn-northd can set the status to offline, if the service
>>> monitor's logical port
>>> is no longer claimed by any ovn-controller.  I'm more inclined to
>>> handle this in ovn-northd.   What do you think?
>>> 
>>> Thanks
>>> Numan
>>> 
 
 Regards,
 Vladislav Odintsov
 
> On 4 Jul 2022, at 19:51, Numan Siddique  wrote:
> 
> On Mon, Jul 4, 2022 at 7:48 AM Vladislav Odintsov  wrote:
>> 
>> Hi,
>> 
>> we’ve found wrong behaviour in the service_monitor record status for a 
>> health-checked Load Balancer.
>> Its status can stay online forever even if the virtual machine is stopped. 
>> This leads to load-balanced traffic being sent to a dead backend.
>> 
>> Below is a script to reproduce the issue. I’m unsure about the 
>> correct place for a possible fix (my guess is it should be fixed in 
>> controller/binding.c in the function binding_lport_set_down, but I’m not 
>> sure how this could affect VM live migration…):
>> 
>> # cat ./repro.sh
>> #!/bin/bash -x
>> 
>> ovn-nbctl ls-add ls1
>> ovn-nbctl lsp-add ls1 lsp1 -- \
>>  lsp-set-addresses lsp1 "00:00:00:00:00:01 192.168.0.10"
>> ovn-nbctl lb-add lb1 192.168.0.100:80 192.168.0.10:80
>> ovn-nbctl set Load_balancer lb1 
>> ip_port_mappings:192.168.0.10=lsp1:192.168.0.8
>> ovn-nbctl --id=@id create Load_Balancer_Health_Check 
>> vip='"192.168.0.100:80"' -- set Load_Balancer lb1 health_check=@id
>> ovn-nbctl ls-lb-add ls1 lb1
>> 
>> ovs-vsctl add-port br-int test-lb -- set interface test-lb type=internal 
>> external_ids:iface-id=lsp1
>> ip li set test-lb addr 00:00:00:00:00:01
>> ip a add 192.168.0.10/24 dev test-lb
>> ip li set test-lb up
>> 
>> # check service_monitor
>> ovn-sbctl list service_mon
>> 
>> # ensure state became offline
>> sleep 4
>> ovn-sbctl list service_mon
>> 
>> # start listen on :80 with netcat
>> ncat -k -l 192.168.0.10 80 &
>> 
>> # ensure state turned to online
>> sleep 4
>> ovn-sbctl list service_mon
>> 
>> # trigger binding release
>> ovs-vsctl remove interface test-lb external_ids iface-id
>> 
>> # ensure state remains online
>> sleep 10
>> ovn-sbctl list service_mon
>> 
>> # ensure OVS group and backend is still in bucket
>> ovs-ofctl dump-groups br-int | grep 192.168.0.10
> 
> 
> 

Re: [ovs-dev] [ovn] bug: load balancer health check status is not updated if port binding is released from chassis

2022-07-05 Thread Numan Siddique
On Tue, Jul 5, 2022 at 8:32 AM Vladislav Odintsov  wrote:
>
> Hi Numan,
>
> I’ve sent a draft patch: 
> https://patchwork.ozlabs.org/project/ovn/patch/20220705122031.2568471-1-odiv...@gmail.com/
>
> While implementing, I’ve encountered a problem: the port binding record, 
> which I’m trying to get from struct ovn_port, has stale state. It doesn’t 
> reflect the "up" field changes that I see arriving as updates in the 
> ovn-northd logs, and I have no idea how to pull them in.
>
> From ovn-northd debug logs:
>
> port binding "up" field update comes from SB:
>
> 2022-07-05T12:17:25.445Z|00412|jsonrpc|DBG|unix:/home/ec2-user/ovn/tests/system-kmod-testsuite.dir/097/ovn-sb/ovn-sb.sock:
>  received notification, method="update3", 
> params=[["monid","OVN_Southbound"],"----",{"Port_Binding":{"ec0e94cc-f2d0-44de-8ad0-7fd62b7d50d6":{"modify":{"up":false,"chassis":["set",[]]]
>
> next I print op->sb->up ? "UP" : "DOWN" and it remains "UP":
>

You need to do: (op->sb->n_up && op->sb->up[0]) ? "UP" : "DOWN".

Thanks
Numan



> 2022-07-05T12:17:25.446Z|00414|northd|INFO|PORT: sw1-p1, pbid: 
> ec0e94cc-f2d0-44de-8ad0-7fd62b7d50d6, PB: UP, lspid: 
> aed98c29-03db-4761-b178-42f288a12692, LSP: UP, svc->logical_port: sw1-p1, 
> svc->status: online
>
> I guess this is a result of my misunderstanding of the principles of 
> incremental engine operation.
> Can you help me understand why the port_binding structure has stale state 
> and how to "pull changes" for it?
>
> Regards,
> Vladislav Odintsov
>
> > On 4 Jul 2022, at 22:13, Numan Siddique  wrote:
> >
> > On Mon, Jul 4, 2022 at 12:56 PM Vladislav Odintsov  
> > wrote:
> >>
> >> Thanks Numan,
> >>
> >> would you have time to fix it or maybe give an idea how to do it, so I can 
> >> try?
> >
> > It would be great if you want to give it a try.  I was thinking of 2
> > possible approaches to fix this issue.
> > Even though pinctrl.c sets the status to offline, it cannot set the
> > service monitor to offline when ovn-controller releases the port
> > binding.
> > The first option is to handle this in binding.c as you suggested
> > earlier.  Or ovn-northd can set the status to offline, if the service
> > monitor's logical port
> > is no longer claimed by any ovn-controller.  I'm more inclined to
> > handle this in ovn-northd.   What do you think?
> >
> > Thanks
> > Numan
> >
> >>
> >> Regards,
> >> Vladislav Odintsov
> >>
> >>> On 4 Jul 2022, at 19:51, Numan Siddique  wrote:
> >>>
> >>> On Mon, Jul 4, 2022 at 7:48 AM Vladislav Odintsov  wrote:
> 
>  Hi,
> 
>  we’ve found wrong behaviour in the service_monitor record status for a 
>  health-checked Load Balancer.
>  Its status can stay online forever even if the virtual machine is stopped. 
>  This leads to load-balanced traffic being sent to a dead backend.
> 
>  Below is a script to reproduce the issue. I’m unsure about the 
>  correct place for a possible fix (my guess is it should be fixed in 
>  controller/binding.c in the function binding_lport_set_down, but I’m not 
>  sure how this could affect VM live migration…):
> 
>  # cat ./repro.sh
>  #!/bin/bash -x
> 
>  ovn-nbctl ls-add ls1
>  ovn-nbctl lsp-add ls1 lsp1 -- \
>    lsp-set-addresses lsp1 "00:00:00:00:00:01 192.168.0.10"
>  ovn-nbctl lb-add lb1 192.168.0.100:80 192.168.0.10:80
>  ovn-nbctl set Load_balancer lb1 
>  ip_port_mappings:192.168.0.10=lsp1:192.168.0.8
>  ovn-nbctl --id=@id create Load_Balancer_Health_Check 
>  vip='"192.168.0.100:80"' -- set Load_Balancer lb1 health_check=@id
>  ovn-nbctl ls-lb-add ls1 lb1
> 
>  ovs-vsctl add-port br-int test-lb -- set interface test-lb type=internal 
>  external_ids:iface-id=lsp1
>  ip li set test-lb addr 00:00:00:00:00:01
>  ip a add 192.168.0.10/24 dev test-lb
>  ip li set test-lb up
> 
>  # check service_monitor
>  ovn-sbctl list service_mon
> 
>  # ensure state became offline
>  sleep 4
>  ovn-sbctl list service_mon
> 
>  # start listen on :80 with netcat
>  ncat -k -l 192.168.0.10 80 &
> 
>  # ensure state turned to online
>  sleep 4
>  ovn-sbctl list service_mon
> 
>  # trigger binding release
>  ovs-vsctl remove interface test-lb external_ids iface-id
> 
>  # ensure state remains online
>  sleep 10
>  ovn-sbctl list service_mon
> 
>  # ensure OVS group and backend is still in bucket
>  ovs-ofctl dump-groups br-int | grep 192.168.0.10
> >>>
> >>>
> >>> Thanks for the bug report.  I could reproduce it locally.  It looks to me
> >>> like it should be fixed in pinctrl.c, as it sets the service monitor status.
> >>>
> >>> I've also raised a bugzilla here -
> >>> https://bugzilla.redhat.com/show_bug.cgi?id=2103740 
> >>> 
> >>>
> >>> Numan
> >>>
> 
> 
>  
>  Looking forward to hearing any thoughts on this.

Re: [ovs-dev] [ovn] bug: load balancer health check status is not updated if port binding is released from chassis

2022-07-05 Thread Vladislav Odintsov
Hi Numan,

I’ve sent a draft patch: 
https://patchwork.ozlabs.org/project/ovn/patch/20220705122031.2568471-1-odiv...@gmail.com/

While implementing, I’ve encountered a problem: the port binding record, which 
I’m trying to get from struct ovn_port, has stale state. It doesn’t reflect the 
"up" field changes that I see arriving as updates in the ovn-northd logs, and 
I have no idea how to pull them in.

From ovn-northd debug logs:

port binding "up" field update comes from SB:

2022-07-05T12:17:25.445Z|00412|jsonrpc|DBG|unix:/home/ec2-user/ovn/tests/system-kmod-testsuite.dir/097/ovn-sb/ovn-sb.sock:
 received notification, method="update3", 
params=[["monid","OVN_Southbound"],"----",{"Port_Binding":{"ec0e94cc-f2d0-44de-8ad0-7fd62b7d50d6":{"modify":{"up":false,"chassis":["set",[]]]

next I print op->sb->up ? "UP" : "DOWN" and it remains "UP":

2022-07-05T12:17:25.446Z|00414|northd|INFO|PORT: sw1-p1, pbid: 
ec0e94cc-f2d0-44de-8ad0-7fd62b7d50d6, PB: UP, lspid: 
aed98c29-03db-4761-b178-42f288a12692, LSP: UP, svc->logical_port: sw1-p1, 
svc->status: online

I guess this is a result of my misunderstanding of the principles of 
incremental engine operation.
Can you help me understand why the port_binding structure has stale state and 
how to "pull changes" for it?

Regards,
Vladislav Odintsov

> On 4 Jul 2022, at 22:13, Numan Siddique  wrote:
> 
> On Mon, Jul 4, 2022 at 12:56 PM Vladislav Odintsov  wrote:
>> 
>> Thanks Numan,
>> 
>> would you have time to fix it or maybe give an idea how to do it, so I can 
>> try?
> 
> It would be great if you want to give it a try.  I was thinking of 2
> possible approaches to fix this issue.
> Even though pinctrl.c sets the status to offline, it cannot set the
> service monitor to offline when ovn-controller releases the port
> binding.
> The first option is to handle this in binding.c as you suggested
> earlier.  Or ovn-northd can set the status to offline, if the service
> monitor's logical port
> is no longer claimed by any ovn-controller.  I'm more inclined to
> handle this in ovn-northd.   What do you think?
> 
> Thanks
> Numan
> 
>> 
>> Regards,
>> Vladislav Odintsov
>> 
>>> On 4 Jul 2022, at 19:51, Numan Siddique  wrote:
>>> 
>>> On Mon, Jul 4, 2022 at 7:48 AM Vladislav Odintsov  wrote:
 
 Hi,
 
 we’ve found wrong behaviour in the service_monitor record status for a 
 health-checked Load Balancer.
 Its status can stay online forever even if the virtual machine is stopped. 
 This leads to load-balanced traffic being sent to a dead backend.
 
 Below is a script to reproduce the issue. I’m unsure about the 
 correct place for a possible fix (my guess is it should be fixed in 
 controller/binding.c in the function binding_lport_set_down, but I’m not 
 sure how this could affect VM live migration…):
 
 # cat ./repro.sh
 #!/bin/bash -x
 
 ovn-nbctl ls-add ls1
 ovn-nbctl lsp-add ls1 lsp1 -- \
   lsp-set-addresses lsp1 "00:00:00:00:00:01 192.168.0.10"
 ovn-nbctl lb-add lb1 192.168.0.100:80 192.168.0.10:80
 ovn-nbctl set Load_balancer lb1 
 ip_port_mappings:192.168.0.10=lsp1:192.168.0.8
 ovn-nbctl --id=@id create Load_Balancer_Health_Check 
 vip='"192.168.0.100:80"' -- set Load_Balancer lb1 health_check=@id
 ovn-nbctl ls-lb-add ls1 lb1
 
 ovs-vsctl add-port br-int test-lb -- set interface test-lb type=internal 
 external_ids:iface-id=lsp1
 ip li set test-lb addr 00:00:00:00:00:01
 ip a add 192.168.0.10/24 dev test-lb
 ip li set test-lb up
 
 # check service_monitor
 ovn-sbctl list service_mon
 
 # ensure state became offline
 sleep 4
 ovn-sbctl list service_mon
 
 # start listen on :80 with netcat
 ncat -k -l 192.168.0.10 80 &
 
 # ensure state turned to online
 sleep 4
 ovn-sbctl list service_mon
 
 # trigger binding release
 ovs-vsctl remove interface test-lb external_ids iface-id
 
 # ensure state remains online
 sleep 10
 ovn-sbctl list service_mon
 
 # ensure OVS group and backend is still in bucket
 ovs-ofctl dump-groups br-int | grep 192.168.0.10
>>> 
>>> 
>>> Thanks for the bug report.  I could reproduce it locally.  It looks to me
>>> like it should be fixed in pinctrl.c, as it sets the service monitor status.
>>> 
>>> I've also raised a bugzilla here -
>>> https://bugzilla.redhat.com/show_bug.cgi?id=2103740 
>>> 
>>> 
>>> Numan
>>> 
 
 
 
 Looking forward to hear any thoughts on this.
 
 PS. don’t forget to kill ncat ;)
 
 
 
 Regards,
 Vladislav Odintsov
 ___
 dev mailing list
 d...@openvswitch.org 
 https://mail.openvswitch.org/mailman/listinfo/ovs-dev 
 
>> 

Re: [ovs-dev] [ovn] bug: load balancer health check status is not updated if port binding is released from chassis

2022-07-04 Thread Numan Siddique
On Mon, Jul 4, 2022 at 12:56 PM Vladislav Odintsov  wrote:
>
> Thanks Numan,
>
> would you have time to fix it or maybe give an idea how to do it, so I can 
> try?

It would be great if you want to give it a try.  I was thinking of two
possible approaches to fix this issue.
Even though pinctrl.c sets the status to offline, it cannot set the
service monitor to offline when ovn-controller releases the port
binding.
The first option is to handle this in binding.c, as you suggested
earlier.  Alternatively, ovn-northd can set the status to offline if the
service monitor's logical port is no longer claimed by any ovn-controller.
I'm more inclined to handle this in ovn-northd.  What do you think?

Thanks
Numan

>
> Regards,
> Vladislav Odintsov
>
> > On 4 Jul 2022, at 19:51, Numan Siddique  wrote:
> >
> > On Mon, Jul 4, 2022 at 7:48 AM Vladislav Odintsov  wrote:
> >>
> >> Hi,
> >>
> >> we’ve found wrong behaviour in the service_monitor record status for a 
> >> health-checked Load Balancer.
> >> Its status can stay online forever even if the virtual machine is stopped. 
> >> This leads to load-balanced traffic being sent to a dead backend.
> >>
> >> Below is a script to reproduce the issue. I’m unsure about the 
> >> correct place for a possible fix (my guess is it should be fixed in 
> >> controller/binding.c in the function binding_lport_set_down, but I’m not 
> >> sure how this could affect VM live migration…):
> >>
> >> # cat ./repro.sh
> >> #!/bin/bash -x
> >>
> >> ovn-nbctl ls-add ls1
> >> ovn-nbctl lsp-add ls1 lsp1 -- \
> >>lsp-set-addresses lsp1 "00:00:00:00:00:01 192.168.0.10"
> >> ovn-nbctl lb-add lb1 192.168.0.100:80 192.168.0.10:80
> >> ovn-nbctl set Load_balancer lb1 
> >> ip_port_mappings:192.168.0.10=lsp1:192.168.0.8
> >> ovn-nbctl --id=@id create Load_Balancer_Health_Check 
> >> vip='"192.168.0.100:80"' -- set Load_Balancer lb1 health_check=@id
> >> ovn-nbctl ls-lb-add ls1 lb1
> >>
> >> ovs-vsctl add-port br-int test-lb -- set interface test-lb type=internal 
> >> external_ids:iface-id=lsp1
> >> ip li set test-lb addr 00:00:00:00:00:01
> >> ip a add 192.168.0.10/24 dev test-lb
> >> ip li set test-lb up
> >>
> >> # check service_monitor
> >> ovn-sbctl list service_mon
> >>
> >> # ensure state became offline
> >> sleep 4
> >> ovn-sbctl list service_mon
> >>
> >> # start listen on :80 with netcat
> >> ncat -k -l 192.168.0.10 80 &
> >>
> >> # ensure state turned to online
> >> sleep 4
> >> ovn-sbctl list service_mon
> >>
> >> # trigger binding release
> >> ovs-vsctl remove interface test-lb external_ids iface-id
> >>
> >> # ensure state remains online
> >> sleep 10
> >> ovn-sbctl list service_mon
> >>
> >> # ensure OVS group and backend is still in bucket
> >> ovs-ofctl dump-groups br-int | grep 192.168.0.10
> >
> >
> > Thanks for the bug report.  I could reproduce it locally.  It looks to me
> > like it should be fixed in pinctrl.c, as it sets the service monitor status.
> >
> > I've also raised a bugzilla here -
> > https://bugzilla.redhat.com/show_bug.cgi?id=2103740 
> > 
> >
> > Numan
> >
> >>
> >>
> >> 
> >> Looking forward to hear any thoughts on this.
> >>
> >> PS. don’t forget to kill ncat ;)
> >>
> >>
> >>
> >> Regards,
> >> Vladislav Odintsov


Re: [ovs-dev] [ovn] bug: load balancer health check status is not updated if port binding is released from chassis

2022-07-04 Thread Vladislav Odintsov
Thanks Numan,

would you have time to fix it, or maybe give me an idea of how to do it so I can try?

Regards,
Vladislav Odintsov

> On 4 Jul 2022, at 19:51, Numan Siddique  wrote:
> 
> On Mon, Jul 4, 2022 at 7:48 AM Vladislav Odintsov  wrote:
>> 
>> Hi,
>> 
>> we’ve found wrong behaviour in the service_monitor record status for a 
>> health-checked Load Balancer.
>> Its status can stay online forever even if the virtual machine is stopped. 
>> This leads to load-balanced traffic being sent to a dead backend.
>> 
>> Below is a script to reproduce the issue. I’m unsure about the 
>> correct place for a possible fix (my guess is it should be fixed in 
>> controller/binding.c in the function binding_lport_set_down, but I’m not 
>> sure how this could affect VM live migration…):
>> 
>> # cat ./repro.sh
>> #!/bin/bash -x
>> 
>> ovn-nbctl ls-add ls1
>> ovn-nbctl lsp-add ls1 lsp1 -- \
>>lsp-set-addresses lsp1 "00:00:00:00:00:01 192.168.0.10"
>> ovn-nbctl lb-add lb1 192.168.0.100:80 192.168.0.10:80
>> ovn-nbctl set Load_balancer lb1 
>> ip_port_mappings:192.168.0.10=lsp1:192.168.0.8
>> ovn-nbctl --id=@id create Load_Balancer_Health_Check 
>> vip='"192.168.0.100:80"' -- set Load_Balancer lb1 health_check=@id
>> ovn-nbctl ls-lb-add ls1 lb1
>> 
>> ovs-vsctl add-port br-int test-lb -- set interface test-lb type=internal 
>> external_ids:iface-id=lsp1
>> ip li set test-lb addr 00:00:00:00:00:01
>> ip a add 192.168.0.10/24 dev test-lb
>> ip li set test-lb up
>> 
>> # check service_monitor
>> ovn-sbctl list service_mon
>> 
>> # ensure state became offline
>> sleep 4
>> ovn-sbctl list service_mon
>> 
>> # start listen on :80 with netcat
>> ncat -k -l 192.168.0.10 80 &
>> 
>> # ensure state turned to online
>> sleep 4
>> ovn-sbctl list service_mon
>> 
>> # trigger binding release
>> ovs-vsctl remove interface test-lb external_ids iface-id
>> 
>> # ensure state remains online
>> sleep 10
>> ovn-sbctl list service_mon
>> 
>> # ensure OVS group and backend is still in bucket
>> ovs-ofctl dump-groups br-int | grep 192.168.0.10
> 
> 
> Thanks for the bug report.  I could reproduce it locally.  It looks to me
> like it should be fixed in pinctrl.c, as it sets the service monitor status.
> 
> I've also raised a bugzilla here -
> https://bugzilla.redhat.com/show_bug.cgi?id=2103740 
> 
> 
> Numan
> 
>> 
>> 
>> 
>> Looking forward to hearing any thoughts on this.
>> 
>> PS. don’t forget to kill ncat ;)
>> 
>> 
>> 
>> Regards,
>> Vladislav Odintsov


Re: [ovs-dev] [ovn] bug: load balancer health check status is not updated if port binding is released from chassis

2022-07-04 Thread Numan Siddique
On Mon, Jul 4, 2022 at 7:48 AM Vladislav Odintsov  wrote:
>
> Hi,
>
> we’ve found wrong behaviour in the service_monitor record status for a 
> health-checked Load Balancer.
> Its status can stay online forever even if the virtual machine is stopped. 
> This leads to load-balanced traffic being sent to a dead backend.
>
> Below is a script to reproduce the issue. I’m unsure about the 
> correct place for a possible fix (my guess is it should be fixed in 
> controller/binding.c in the function binding_lport_set_down, but I’m not 
> sure how this could affect VM live migration…):
>
> # cat ./repro.sh
> #!/bin/bash -x
>
> ovn-nbctl ls-add ls1
> ovn-nbctl lsp-add ls1 lsp1 -- \
> lsp-set-addresses lsp1 "00:00:00:00:00:01 192.168.0.10"
> ovn-nbctl lb-add lb1 192.168.0.100:80 192.168.0.10:80
> ovn-nbctl set Load_balancer lb1 ip_port_mappings:192.168.0.10=lsp1:192.168.0.8
> ovn-nbctl --id=@id create Load_Balancer_Health_Check vip='"192.168.0.100:80"' 
> -- set Load_Balancer lb1 health_check=@id
> ovn-nbctl ls-lb-add ls1 lb1
>
> ovs-vsctl add-port br-int test-lb -- set interface test-lb type=internal 
> external_ids:iface-id=lsp1
> ip li set test-lb addr 00:00:00:00:00:01
> ip a add 192.168.0.10/24 dev test-lb
> ip li set test-lb up
>
> # check service_monitor
> ovn-sbctl list service_mon
>
> # ensure state became offline
> sleep 4
> ovn-sbctl list service_mon
>
> # start listen on :80 with netcat
> ncat -k -l 192.168.0.10 80 &
>
> # ensure state turned to online
> sleep 4
> ovn-sbctl list service_mon
>
> # trigger binding release
> ovs-vsctl remove interface test-lb external_ids iface-id
>
> # ensure state remains online
> sleep 10
> ovn-sbctl list service_mon
>
> # ensure OVS group and backend is still in bucket
> ovs-ofctl dump-groups br-int | grep 192.168.0.10


Thanks for the bug report.  I could reproduce it locally.  It looks to me
like it should be fixed in pinctrl.c, as it sets the service monitor status.

I've also raised a bugzilla here -
https://bugzilla.redhat.com/show_bug.cgi?id=2103740

Numan

>
>
> 
> Looking forward to hearing any thoughts on this.
>
> PS. don’t forget to kill ncat ;)
>
>
>
> Regards,
> Vladislav Odintsov


[ovs-dev] [ovn] bug: load balancer health check status is not updated if port binding is released from chassis

2022-07-04 Thread Vladislav Odintsov
Hi,

we’ve found wrong behaviour in the service_monitor record status for a 
health-checked Load Balancer.
Its status can stay online forever even if the virtual machine is stopped. 
This leads to load-balanced traffic being sent to a dead backend.

Below is a script to reproduce the issue. I’m unsure about the correct 
place for a possible fix (my guess is it should be fixed in 
controller/binding.c in the function binding_lport_set_down, but I’m not 
sure how this could affect VM live migration…):

# cat ./repro.sh
#!/bin/bash -x

ovn-nbctl ls-add ls1
ovn-nbctl lsp-add ls1 lsp1 -- \
lsp-set-addresses lsp1 "00:00:00:00:00:01 192.168.0.10"
ovn-nbctl lb-add lb1 192.168.0.100:80 192.168.0.10:80
ovn-nbctl set Load_balancer lb1 ip_port_mappings:192.168.0.10=lsp1:192.168.0.8
ovn-nbctl --id=@id create Load_Balancer_Health_Check vip='"192.168.0.100:80"' \
    -- set Load_Balancer lb1 health_check=@id
ovn-nbctl ls-lb-add ls1 lb1

ovs-vsctl add-port br-int test-lb -- set interface test-lb type=internal \
    external_ids:iface-id=lsp1
ip li set test-lb addr 00:00:00:00:00:01
ip a add 192.168.0.10/24 dev test-lb
ip li set test-lb up

# check service_monitor
ovn-sbctl list service_mon

# ensure state became offline
sleep 4
ovn-sbctl list service_mon

# start listening on :80 with netcat
ncat -k -l 192.168.0.10 80 &

# ensure state turned to online
sleep 4
ovn-sbctl list service_mon

# trigger binding release
ovs-vsctl remove interface test-lb external_ids iface-id

# ensure state remains online
sleep 10
ovn-sbctl list service_mon

# ensure OVS group and backend is still in bucket
ovs-ofctl dump-groups br-int | grep 192.168.0.10
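As an aside, the health check's probe timing can be tuned via the options column of Load_Balancer_Health_Check. A sketch of a variant of the creation step above, assuming the standard NB schema option names (interval, timeout, success_count, failure_count); the values are illustrative:

```shell
# Create the health check with explicit probe timing instead of the defaults.
# Option names are from the OVN NB schema; values here are illustrative.
ovn-nbctl --id=@id create Load_Balancer_Health_Check \
    vip='"192.168.0.100:80"' \
    options:interval=2 options:timeout=1 \
    options:success_count=2 options:failure_count=2 \
    -- set Load_Balancer lb1 health_check=@id
```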



Looking forward to hearing any thoughts on this.

PS. don’t forget to kill ncat ;)



Regards,
Vladislav Odintsov