On 2/9/24 08:06, Adrian Moreno wrote:
> In order to properly balance bond traffic, ofproto/bond periodically
> reads usage statistics of the post-recirculation rules (which are added
> to a hidden internal table).
> 
> To do that, each "struct bond_entry" (which represents a hash within a
> bond) stores the last seen statistics for its rule. When a hash is moved
> to another member (due to a bond rebalance or the previous member going
> down), the rule is typically just modified, i.e: same match different
> actions. In this case, statistics are preserved and accounting continues
> to work.
> 
> However, if the rule gets completely deleted (e.g: when all bond members
> go down) and then re-created, the new rule will have 0 tx_bytes but its
> associated entry will still store a non-zero last-seen value.
> This situation leads to an overflow of the delta calculation (computed
> as [current_stats_value - last_seen_value]), which can affect traffic
> as the hash will be considered to carry a lot of traffic and rebalancing
> will kick in.
> 
> In order to fix this situation, reset the value of last seen statistics
> on rule deletion.
> 
> Implementation note:
> The field ((struct bond_entry *) -> pr_tx_bytes) is guarded by rwlock
> but, as the comment above the function indicates, it's only called
> without having locked it when the bond is being deleted. Given that
> bond->hash is free()ed before deleting the rules (which makes writing to
> the entries a use-after-free anyway), we can use that to avoid
> double-locking.

clang is not happy about this.  All the clang jobs failed in CI.
We have to either remove the guarding annotation or just take the
lock in the unref() function.  The latter probably makes more sense.
Is there a good reason to not take the lock there?

> 
> Reported-at: https://github.com/openvswitch/ovs-issues/issues/319
> Signed-off-by: Adrian Moreno <amore...@redhat.com>
> ---
>  ofproto/bond.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/ofproto/bond.c b/ofproto/bond.c
> index cfdf44f85..ad56bb96b 100644
> --- a/ofproto/bond.c
> +++ b/ofproto/bond.c
> @@ -407,6 +407,10 @@ update_recirc_rules__(struct bond *bond)
>  
>                  VLOG_ERR("failed to remove post recirculation flow %s", 
> err_s);
>                  free(err_s);
> +            } else if (bond->hash) {
> +                uint32_t hash = pr_op->hmap_node.hash & BOND_MASK;
> +                struct bond_entry *entry = &bond->hash[hash];
> +                entry->pr_tx_bytes = 0;

A few lines below we will be leaning up the rule pointer regardless
of the error.  Should this code be moved there?  Potentially re-ordered
with hmap_remove to be sure that the hash is kept intact?

>              }
>  
>              hmap_remove(&bond->pr_rule_ops, &pr_op->hmap_node);

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to