On 8/1/25 4:26 AM, chenyongch...@cestc.cn wrote:
> Hi all:
>        Thank you for your attention to this issue.
> 
>        Initially, we noticed that bond_shift_load triggered  
> bond->bond_revalidate = true,  which then led to   bond_reconfigure -> 
> bond_entry_reset    being executed. This ultimately caused 
> USERSPACE_INVALID_PORT_DROP to occur.
>        We believe that no packet loss should happen during traffic load 
> balancing.
> 
>        Later, we discovered that changes to the bond-rebalance-interval 
> setting could also lead to the same issue, which is why we reported the 
> problem.
> 
>       In fact, we are more concerned about the packet drops caused by 
> bond_shift_load. If the packet loss due to bond-rebalance-interval changes is 
> truly unavoidable, we can accept that.
>       Apologies for not describing the issue clearly enough earlier.

No worries.  Thanks for the clarification!

> 
> 
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> chenyongch...@cestc.cn
> 
>      
>     *发件人:* Ilya Maximets <mailto:i.maxim...@ovn.org>
>     *发送时间:* 2025-08-01 04:48
>     *收件人:* Aaron Conole <mailto:acon...@redhat.com>
>     *抄送:* i.maximets <mailto:i.maxim...@ovn.org>; dev 
> <mailto:d...@openvswitch.org>; mhou <mailto:m...@redhat.com>; chenyongchang 
> <mailto:chenyongch...@cestc.cn>; Adrian Moreno <mailto:amore...@redhat.com>
>     *主题:* Re: [ovs-dev] [PATCH] bond: Do not flag a rebalance when adjusting 
> time.【请注意,邮件由i.maximets....@gmail.com代发】
>     On 7/31/25 8:41 PM, Aaron Conole wrote:
>     > Ilya Maximets <i.maxim...@ovn.org> writes:
>     >
>     >> On 7/30/25 4:47 PM, Aaron Conole via dev wrote:
>     >>> When adjusting bond parameters, any adjustment is considered 
> sufficient
>     >>> for triggering a rebalance.  This is a very simplistic config update
>     >>> scheme that triggers complete rebalancing even if the time adjustment
>     >>> would move the next expiration out beyond the last calculated 
> expiration.
>     >>>
>     >>> For the interval parameter only, we can simply recalculate the expiry
>     >>> deadline and let the next bond_run() event do the rebalance if needed.
>     >>> Even if the recalculation would cause the deadline to have occurred in
>     >>> the past, it should execute on the next bond_run() anyway.  This is
>     >>> still okay, as the rebalance interval timeout may not result in a
>     >>> full rebalance anyway.
>     >>>
>     >>> Reported-at: 
> https://www.mail-archive.com/ovs-discuss@openvswitch.org/msg10409.html
>     >>> Signed-off-by: Aaron Conole <acon...@redhat.com>
>     >>> ---
>     >>>  ofproto/bond.c | 8 +++++++-
>     >>>  1 file changed, 7 insertions(+), 1 deletion(-)
>     >>>
>     >>> diff --git a/ofproto/bond.c b/ofproto/bond.c
>     >>> index 3859ddca08..86e21607e5 100644
>     >>> --- a/ofproto/bond.c
>     >>> +++ b/ofproto/bond.c
>     >>> @@ -459,8 +459,14 @@ bond_reconfigure(struct bond *bond, const struct 
> bond_settings *s)
>     >>>      }
>     >>> 
>     >>>      if (bond->rebalance_interval != s->rebalance_interval) {
>     >>> +        /* Recompute the next rebalance interval by moving the 
> next_rebalance
>     >>> +         * to be offset by the new interval.  Then let the rebalance 
> code
>     >>> +         * trigger a rebalance based on the new details.
>     >>
>     >> This sentence doesn't seem necessary.
>     >
>     > Removed.
>     >
>     >>>  In this case, if
>     >>> +         * all that was updated is the rebalance interval, we can 
> skip
>     >>> +         * triggering the rest of the port reconfigure mechanism. */
>     >>
>     >> s/reconfigure mechanism/reconfiguration/
>     >
>     > Ack.
>     >
>     >>> +        int old_start_time = bond->next_rebalance - 
> bond->rebalance_interval;
>     >>>          bond->rebalance_interval = s->rebalance_interval;
>     >>> -        revalidate = true;
>     >>> +        bond->next_rebalance = old_start_time + 
> bond->rebalance_interval;
>     >>>      }
>     >>
>     >> We should likely still revalidate when the value goes to or from zero,
>     >> because that impacts if the bond considered balanced or not.  Even if
>     >> that doesn't cause any problems today, it will inevitably be forgotten
>     >> in the future.
>     >
>     > That makes sense to me - I'll add a case for it.
>     >
>     >> Also, the subtraction may technically overflow if the next_rebalance is
>     >> not initialized.  E.g. if it is not initialized yet when the bond is
>     >> going from active-backup to balance-tcp.  Which may be fine, given 
> those
>     >> are signed integers, but still doesn't feel right.
>     >
>     > I'm trying to envision a scenario that would make this happen.  For
>     > example, it gets reset when going from ab -> balance-{tcp,slb}.  And
>     > that will be reconfigured later in this very code path as well (since we
>     > will call the bond_entry_reset if the config got changed for any config
>     > option other than the interval).  I can put a guard on !next_rebalance
>     > and tell it to trigger revalidation then (since the very first execution
>     > or something should be initialized to '0' and obviously that could
>     > probably find a trigger).  But still struggling to see a place where
>     > changing bond mode won't also force the whole thing to be reinitialized
>     > in bond_entry_reset() since revalidate will be 'true' via other means.
>      
>     It doesn't cause any issues, I agree, but it just didn't feel right. :)
>     Feel free to keep this code as-is.
>      
>     >
>     >> And this patch needs a Fixes tag.
>     >
>     > I didn't put one because I'm not sure if it's really a bugfix or an
>     > improvement.  If you consider it one, I'll add the fixes tag.
>      
>     Looking closer to the code, there are two separate issues here:
>      
>     1. Traffic is re-shuffled among bond members on the rebalance interval
>        change.
>      
>     2. Traffic is being dropped for some time after the bond is reconfigured
>        until the next bond_run().  This happens if other config options
>        are changed as well.
>      
>     And, I think, the latter is what the original reporter was concerned 
> about.
>     This is a behavior introduced in:
>      
>       586adfd047cb ("bond: Avoid deadlock while updating post recirculation 
> rules.")
>      
>     Before that change we would not update post-recirc rules while the
>     configuration is cleared and not re-constructed yet.  But after the
>     586adfd047cb fix, we're now calling the update after we zeroed out the
>     configuration, effectively just removing all the post recirculation rules
>     or the lb-output mappings.
>      
>     Saying that, maybe the real fix is to use bond_update_post_recirc_rules__
>     with force instead of direct call to update_recirc_rules().  That will
>     ensure that the configuration is not blank and we actually have real
>     post-recirc rules / lb-output mapping at all times.  I do not remember
>     why I went with the direct call in that fix though.
>      
>     What do you think?
>      
>     Not re-setting the config on timeout change is still needed, as that
>     avoids re-shuffling the traffic among bond members (issue 1).
>      
>     The behavior in point 1 was always this way, AFAIU, so that can be treated
>     as just an enhancement, I suppose.
>      
>     Best regards, Ilya Maximets.
>      
>      
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to