On Wed, Apr 20, 2022 at 2:46 AM Ales Musil <amu...@redhat.com> wrote:
> > > On Wed, Apr 20, 2022 at 8:14 AM Ales Musil <amu...@redhat.com> wrote: > >> Thank you for the review, see my reply below. >> >> On Tue, Apr 19, 2022 at 9:26 PM Ihar Hrachyshka <ihrac...@redhat.com> >> wrote: >> >>> Thanks for the patch. Two comments for the test case below. >>> >> >>> On Tue, Apr 19, 2022 at 1:12 AM Ales Musil <amu...@redhat.com> wrote: >>> > >>> > The GARP was sent even on chassis that were not serving >>> > as l3gw for the specified router. This could lead to race >>> > on the physical network when multiple chassis send the same >>> > ARP but the physical interfaces have different port numbers >>> > on the external bridge. >>> > >>> > Add check to filter out port_bindings for l3gw that >>> > are sitting on different chassis. >>> > >>> > Reported-at: https://bugzilla.redhat.com/2062580 >>> > Signed-off-by: Ales Musil <amu...@redhat.com> >>> > --- >>> > controller/pinctrl.c | 2 +- >>> > tests/ovn.at | 106 >>> +++++++++++++++++++++++++++++++++++++++++++ >>> > 2 files changed, 107 insertions(+), 1 deletion(-) >>> > >>> > diff --git a/controller/pinctrl.c b/controller/pinctrl.c >>> > index 25b37ee88..42d1c2a46 100644 >>> > --- a/controller/pinctrl.c >>> > +++ b/controller/pinctrl.c >>> > @@ -5554,7 +5554,7 @@ get_localnet_vifs_l3gwports( >>> > sbrec_port_binding_index_set_datapath(target, ld->datapath); >>> > SBREC_PORT_BINDING_FOR_EACH_EQUAL (pb, target, >>> > >>> sbrec_port_binding_by_datapath) { >>> > - if (!strcmp(pb->type, "l3gateway") >>> > + if ((!strcmp(pb->type, "l3gateway") && pb->chassis == >>> chassis) >>> > || !strcmp(pb->type, "patch")) { >>> > sset_add(local_l3gw_ports, pb->logical_port); >>> > } >>> > diff --git a/tests/ovn.at b/tests/ovn.at >>> > index f9551b843..a02a9dab2 100644 >>> > --- a/tests/ovn.at >>> > +++ b/tests/ovn.at >>> > @@ -8707,6 +8707,112 @@ OVN_CLEANUP([hv1]) >>> > AT_CLEANUP >>> > ]) >>> > >>> > +OVN_FOR_EACH_NORTHD([ >>> > +AT_SETUP([send gratuitous arp for l3gateway only on selected chassis]) >>> > +ovn_start >>> > + >>> > +# Create logical switch >>> > +ovn-nbctl ls-add ls0 >>> > +# Create gateway router >>> > +ovn-nbctl lr-add lr0 >>> > +# Add router port to gateway router >>> > +ovn-nbctl lrp-add lr0 lr0-ls0 f0:00:00:00:00:01 192.168.0.1/24 >>> > +ovn-nbctl lsp-add ls0 ls0-lr0 -- set Logical_Switch_Port ls0-lr0 \ >>> > + type=router options:router-port=lr0-ls0 >>> addresses='"f0:00:00:00:00:01"' >>> > + >>> > +# Create a localnet port. >>> > +ovn-nbctl lsp-add ls0 ln_port >>> > +ovn-nbctl lsp-set-addresses ln_port unknown >>> > +ovn-nbctl lsp-set-type ln_port localnet >>> > +ovn-nbctl --wait=hv lsp-set-options ln_port network_name=physnet1 >>> > + >>> > +# Prepare packets >>> > +touch empty_expected >>> > +echo >>> "fffffffffffff0000000000108060001080006040001f00000000001c0a80001000000000000c0a80001" >>> > arp_expected >>> > + >>> > +net_add n1 >>> > +sim_add hv1 >>> > +as hv1 >>> > +ovs-vsctl \ >>> > + -- add-br br-phys \ >>> > + -- add-br br-eth0 >>> > + >>> > +ovn_attach n1 br-phys 192.168.0.10 >>> > + >>> > +AT_CHECK([ovs-vsctl set Open_vSwitch . >>> external-ids:ovn-bridge-mappings=physnet1:br-eth0]) >>> > +AT_CHECK([ovs-vsctl add-port br-eth0 snoopvif -- set Interface >>> snoopvif options:tx_pcap=hv1/snoopvif-tx.pcap >>> options:rxq_pcap=hv1/snoopvif-rx.pcap]) >>> > + >>> > +sim_add hv2 >>> > +as hv2 >>> > +ovs-vsctl \ >>> > + -- add-br br-phys \ >>> > + -- add-br br-eth0 >>> > + >>> > +ovn_attach n1 br-phys 192.168.0.20 >>> > + >>> > +AT_CHECK([ovs-vsctl set Open_vSwitch . >>> external-ids:ovn-bridge-mappings=physnet1:br-eth0]) >>> > +AT_CHECK([ovs-vsctl add-port br-eth0 snoopvif -- set Interface >>> snoopvif options:tx_pcap=hv2/snoopvif-tx.pcap >>> options:rxq_pcap=hv2/snoopvif-rx.pcap]) >>> > + >>> > +ovn-sbctl dump-flows > sbflows >>> > +AT_CAPTURE_FILE([sbflows]) >>> > + >>> > +# Wait until the patch ports are created in hv1 and hv2 to connect >>> br-int to br-eth0 >>> > +AT_CHECK([ovn-nbctl set logical_router lr0 options:chassis=hv1]) >>> > +OVS_WAIT_UNTIL([test 1 = `as hv1 ovs-vsctl show | \ >>> > +grep "Port patch-br-int-to-ln_port" | wc -l`]) >>> > +AT_CHECK([ovn-nbctl set logical_router lr0 options:chassis=hv2]) >>> > +OVS_WAIT_UNTIL([test 1 = `as hv2 ovs-vsctl show | \ >>> > +grep "Port patch-br-int-to-ln_port" | wc -l`]) >>> > + >>> > +# Temporarily remove lr0 chassis >>> > +AT_CHECK([ovn-nbctl remove logical_router lr0 options chassis]) >>> > + >>> > +reset_pcap_file() { >>> > + local hv=$1 >>> > + local iface=$2 >>> > + local pcap_file=$3 >>> > + as $hv >>> > + ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \ >>> > +options:rxq_pcap=dummy-rx.pcap >>> > + rm -f ${pcap_file}*.pcap >>> > + ovs-vsctl -- set Interface $iface >>> options:tx_pcap=${pcap_file}-tx.pcap \ >>> > +options:rxq_pcap=${pcap_file}-rx.pcap >>> > +} >>> > + >>> > +reset_pcap_file hv1 snoopvif hv1/snoopvif >>> > +reset_pcap_file hv2 snoopvif hv2/snoopvif >>> > + >>> > +hv1_uuid=$(ovn-sbctl --bare --columns _uuid list chassis hv1) >>> > +AT_CHECK([ovn-nbctl set logical_router lr0 options:chassis=hv1]) >>> > +OVS_WAIT_UNTIL([ >>> > + ls0_lr0=$(ovn-sbctl --bare --columns chassis list port_binding >>> ls0-lr0) >>> > + test "$ls0_lr0" = $hv1_uuid >>> > +]) >>> >>> Since GARPs are periodic, you should give controllers some time (with >>> a sleep?) to produce the GARPs. >>> >> >> It shouldn't be needed as the OVN_CHECK_PACKETS is waiting for it. >> On the other hand in the spirit of potential flakiness it is probably a >> good idea. >> >> >>> >>> > + >>> > +OVN_CHECK_PACKETS([hv1/snoopvif-tx.pcap], [arp_expected]) >>> > +OVN_CHECK_PACKETS([hv2/snoopvif-tx.pcap], [empty_expected]) >>> > + >>> > +# Temporarily remove lr0 chassis >>> > +AT_CHECK([ovn-nbctl remove logical_router lr0 options chassis]) >>> > + >>> > +reset_pcap_file hv1 snoopvif hv1/snoopvif >>> > +reset_pcap_file hv2 snoopvif hv2/snoopvif >>> > + >>> > +hv2_uuid=$(ovn-sbctl --bare --columns _uuid list chassis hv2) >>> > +AT_CHECK([ovn-nbctl set logical_router lr0 options:chassis=hv2]) >>> > +OVS_WAIT_UNTIL([ >>> > + ls0_lr0=$(ovn-sbctl --bare --columns chassis list port_binding >>> ls0-lr0) >>> > + test "$ls0_lr0" = $hv2_uuid >>> > +]) >>> >>> This test case is flaky. If you put "sleep 3" here, you'll see that >>> GAPR is issued on both interfaces. This is because the code for l3gw >>> is affected by the same issue as fixed for regular ports in: >>> >>> https://patchwork.ozlabs.org/project/ovn/patch/20220407025822.3861491-1-ihrac...@redhat.com/ >>> >>> Namely, when the l3gw chassis is changed from hv1 to hv2, hv1 never >>> gets the port unregistered from the queue of periodic GARPs. Strictly >>> speaking, it is not the bug that you try to fix, but since you add a >>> test case that switches chassis for a port, I think the fix should be >>> included here. (Alternatively, the test should be split in two parts - >>> one just setting a chassis and checking if only one of them observes >>> GARPs; and the other part updating chassis and checking that the old >>> chassis stops issuing GARPs.) >>> >> >> Didn't know that, however when I was testing it I did not come across the >> GARP being still sent >> over the old interface. Maybe I have not waited long enough. >> >> Anyway if I understand that right, you suggest also fix from your patch >> and leaving the test as is? >> Problematic part is that we always need set both chassis to setup the >> patch ports, otherwise we would >> not see the garp on the second one. That being said I think the best >> approach would be to squash the >> fixes and leave this test since it tests both cases WDYT? >> > > Actually looking through the code this scenario cannot hit it, even if I > add sleep. In my test case > the send_garp_rarp_delete code is called in the for each earlier, > pinctrl.c:5771. So those cases are different, > I'll add the sleep to avoid any flakiness but IMO those patches can be > separated. > > You're right, sorry, my fault. Indeed since you filter out ports that don't belong to the chassis before sset_add, you don't need to explicitly delete it later. (I should do the same in my patch.) The test changes - sleep to give chassis time to produce a GARP in case of "empty" check + _PACKETS_CONTAIN to accommodate for multiple GARPs generated - stand. Thank you. > >> >>> >>> > + >>> > +OVN_CHECK_PACKETS([hv2/snoopvif-tx.pcap], [arp_expected]) >>> > +OVN_CHECK_PACKETS([hv1/snoopvif-tx.pcap], [empty_expected]) >>> > + >>> > +OVN_CLEANUP([hv1],[hv2]) >>> > + >>> > +AT_CLEANUP >>> > +]) >>> > + >>> > OVN_FOR_EACH_NORTHD([ >>> > AT_SETUP([send gratuitous arp with nat-addresses router in localnet]) >>> > ovn_start >>> > -- >>> > 2.35.1 >>> > >>> > _______________________________________________ >>> > dev mailing list >>> > d...@openvswitch.org >>> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> > >>> >>> >> >> -- >> >> Ales Musil >> >> Senior Software Engineer - RHV Network >> >> Red Hat EMEA <https://www.redhat.com> >> >> amu...@redhat.com IM: amusil >> <https://red.ht/sig> >> > > > -- > > Ales Musil > > Senior Software Engineer - RHV Network > > Red Hat EMEA <https://www.redhat.com> > > amu...@redhat.com IM: amusil > <https://red.ht/sig> > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev