On Fri, Oct 13, 2023 at 4:28 AM Eelco Chaudron <echau...@redhat.com> wrote:
>
>
>
> On 8 Oct 2023, at 7:26, Mike Pattrick wrote:
>
> > Currently a bond will not always revalidate when an active member
> > changes. This can result in counter-intuitive behaviors like the fact
> > that using ovs-appctl bond/set-active-member will cause the bond to
> > revalidate but changing other_config:bond-primary will not trigger a
> > revalidate in the bond.
> >
> > When revalidation is not set but the active member changes in an
> > unbalanced bond, OVS may send traffic out of previously active member
> > instead of the new active member.
> >
> > This change will always mark unbalanced bonds for revalidation if the
> > active member changes.
> >
> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2214979
> > Signed-off-by: Mike Pattrick <m...@redhat.com>
>
> Hi Mike,
>
> Thanks for the patch. Some comments below, and the subject needs to end with 
> a dot.
>
> //Eelco
>
>
> > ---
> > v2: Added a test
> > v3: Made the test more reliable
> > ---
> >  ofproto/bond.c          |  8 +++++--
> >  tests/system-traffic.at | 50 +++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 56 insertions(+), 2 deletions(-)
> >
> > diff --git a/ofproto/bond.c b/ofproto/bond.c
> > index cfdf44f85..fb108d30a 100644
> > --- a/ofproto/bond.c
> > +++ b/ofproto/bond.c
> > @@ -193,6 +193,7 @@ static void bond_update_post_recirc_rules__(struct bond 
> > *, bool force)
> >  static bool bond_is_falling_back_to_ab(const struct bond *);
> >  static void bond_add_lb_output_buckets(const struct bond *);
> >  static void bond_del_lb_output_buckets(const struct bond *);
> > +static bool bond_is_balanced(const struct bond *bond) 
> > OVS_REQ_RDLOCK(rwlock);
> >
> >
> >  /* Attempts to parse 's' as the name of a bond balancing mode.  If 
> > successful,
> > @@ -552,11 +553,15 @@ bond_find_member_by_mac(const struct bond *bond, 
> > const struct eth_addr mac)
> >
> >  static void
> >  bond_active_member_changed(struct bond *bond)
> > +    OVS_REQ_WRLOCK(rwlock)
> >  {
> >      if (bond->active_member) {
> >          struct eth_addr mac;
> >          netdev_get_etheraddr(bond->active_member->netdev, &mac);
> >          bond->active_member_mac = mac;
> > +        if (!bond_is_balanced(bond)) {
> > +            bond->bond_revalidate = true;
> > +        }
> >      } else {
> >          bond->active_member_mac = eth_addr_zero;
> >      }
> > @@ -1124,7 +1129,7 @@ bond_get_recirc_id_and_hash_basis(struct bond *bond, 
> > uint32_t *recirc_id,
> >  /* Rebalancing. */
> >
> >  static bool
> > -bond_is_balanced(const struct bond *bond) OVS_REQ_RDLOCK(rwlock)
>
> Any reason why the annotation was removed? clang would be smart enough to see 
> that a write lock is fine for a readlock. It seems to compile fine here. Or 
> were you trying to clean this up and move it to the next line :)

The annotation moved up to the prototype above.

>
> > +bond_is_balanced(const struct bond *bond)
> >  {
> >      return bond->rebalance_interval
> >          && (bond->balance == BM_SLB || bond->balance == BM_TCP)
> > @@ -1728,7 +1733,6 @@ bond_unixctl_set_active_member(struct unixctl_conn 
> > *conn,
> >      }
> >
> >      if (bond->active_member != member) {
> > -        bond->bond_revalidate = true;
> >          bond->active_member = member;
> >          VLOG_INFO("bond %s: active member is now %s",
> >                    bond->name, member->name);
> > diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> > index 945037ec0..7075c35ea 100644
> > --- a/tests/system-traffic.at
> > +++ b/tests/system-traffic.at
> > @@ -291,6 +291,56 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 
> > -w 2 10.1.1.2 | FORMAT_PING
> >  OVS_TRAFFIC_VSWITCHD_STOP
> >  AT_CLEANUP
> >
> > +AT_SETUP([datapath - ping over active-backup bond])
>
> This does not really represent the test case. What about ‘datapath - bond 
> active-backup failover’
>
> > +OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "priority=1 actions=drop"])
> > +AT_CHECK([ovs-ofctl add-flow br1 "priority=1 actions=drop"])
> > +AT_CHECK([ovs-ofctl add-flow br0 "ip,icmp actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br1 "ip,icmp actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br0 "arp actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br1 "arp actions=normal"])
>
> Do we need this ‘complex’ set of rules, will the following not work?
>
> AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> AT_CHECK([ovs-ofctl add-flow br1 "actions=normal"])
>
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH(p1, at_ns1, br1, "10.1.1.2/24")
> > +on_exit 'ip link del link0a'
> > +on_exit 'ip link del link0b'
> > +AT_CHECK([ip link add link0a type veth peer name link1a])
> > +AT_CHECK([ip link add link0b type veth peer name link1b])
> > +
> > +AT_CHECK([ip link set dev link0a up])
> > +AT_CHECK([ip link set dev link1a up])
> > +AT_CHECK([ip link set dev link0b up])
> > +AT_CHECK([ip link set dev link1b up])
> > +
> > +AT_CHECK([ovs-vsctl add-bond br0 bond0 link0a link0b 
> > bond_mode=active-backup])
> > +AT_CHECK([ovs-vsctl add-bond br1 bond1 link1a link1b 
> > bond_mode=active-backup])
> > +
> > +dnl Set primary
>
> dnl Set the primary bond member.
>
>
> > +AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0a -- \
> > +                    set port bond1 other_config:bond-primary=link1a])
> > +
> > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping -c 1 10.1.1.2])
> > +
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 6 -i 0.3 -w 6 10.1.1.2 | FORMAT_PING], 
> > [0], [dnl
> > +6 packets transmitted, 6 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Change primary
>
> dnl Change the primary bond member.
>
> > +AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0b -- \
> > +                    set port bond1 other_config:bond-primary=link1b])
> > +
> > +sleep 0
>
> This short sleep makes it work, however, it could result in a flaky test. Is 
> there something else we could wait for to be 100% sure we are ready?
>
> > +
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 6 -i 0.5 -w 6 10.1.1.2 | FORMAT_PING], 
> > [0], [dnl
> > +6 packets transmitted, 6 received, 0% packet loss, time 0ms
> > +])
>
> Should we not add some code to verify traffic goes over the correct link?
>
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> >  AT_SETUP([datapath - ping over vxlan tunnel])
> >  OVS_CHECK_TUNNEL_TSO()
> >  OVS_CHECK_VXLAN()
>
>
> Can you also check this test on all datapaths? It’s failing for me once out 
> of ten on userspace;
>
>
> @@ -1,2 +1,2 @@
> -6 packets transmitted, 6 received, 0% packet loss, time 0ms
> +7 packets transmitted, 6 received, 14.2857% packet loss, time 0ms
>
> sudo make check-kernel TESTSUITEFLAGS="-k 'ping over active-backup bond'" &&  
> sudo make check-offloads TESTSUITEFLAGS="-k 'ping over active-backup bond'" 
> &&  sudo make check-system-userspace TESTSUITEFLAGS="-k 'ping over 
> active-backup bond'"

I've changed the test and ran it 100 times with and without the patch
with and without false positives or false negatives, so I'm hoping the
latest revision is a lot more reliable for you.

The other changes make sense so you should see them in the next
version of the patch.


Thanks for the review,
Mike

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to