On 17.03.2025 22:26, Ilya Maximets wrote:
> On 3/17/25 16:56, Mike Pattrick wrote:
>> On Fri, Mar 14, 2025 at 4:11 PM Vladislav Odintsov <[email protected]>
>> wrote:
>>> With this commit the rebalance of hash entries between bonding members
>>> becomes less frequent. If we know all bond members' speed, we do not
>>> move hash entries between them if load difference is less than 1.5% of
>>> minimum link speed of bond members.
>>>
>>> Reported-at:
>>> https://mail.openvswitch.org/pipermail/ovs-dev/2025-March/422028.html
>>> Suggested-by: Ilya Maximets <[email protected]>
>>> Signed-off-by: Vladislav Odintsov <[email protected]>
>>> ---
> Hi, Vladislav. Thanks for the patch!
>
> As Eelco mentioned, this will need an update in the documentation as well.
> In particular in Documentation/topics/bonding.rst.
>
> May also be good to add NEWS entry, since it's a user-visible change.
Hi Eelco, Mike and Ilya,
thanks for your time and comments & reviews!
I'll address most of the comments in v2. I've got some comments inline. PSB.
>
>>> ofproto/bond.c | 11 +++++++++--
>>> 1 file changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/ofproto/bond.c b/ofproto/bond.c
>>> index 17bf10be5..70219db89 100644
>>> --- a/ofproto/bond.c
>>> +++ b/ofproto/bond.c
>>> @@ -1336,6 +1336,7 @@ bond_rebalance(struct bond *bond)
>>> struct ovs_list bals;
>>> bool rebalanced = false;
>>> bool use_recirc;
>>> + uint32_t min_member_mbps = UINT32_MAX;
> Reverse x-mass tree, please. It's not perfect here, but we shouldn't
> make it worse.
Ack.
>
>>> ovs_rwlock_wrlock(&rwlock);
>>> if (!bond_is_balanced(bond) || time_msec() < bond->next_rebalance) {
>>> @@ -1378,7 +1379,11 @@ bond_rebalance(struct bond *bond)
>>> ovs_list_init(&bals);
>>> HMAP_FOR_EACH (member, hmap_node, &bond->members) {
>>> if (member->enabled) {
>>> + uint32_t mbps;
>>> +
>>> insert_bal(&bals, member);
>>> + netdev_get_speed(member->netdev, &mbps, NULL);
>> This function may return an error. And also it may not return an error
>> but still set mbps to zero. We should probably discard the value in
>> those cases.
> It's not really necessary to check the result of this function here.
> If it failed, we'll just get the perdefined minimum value in the MAX
> condition below. We do similar thing in sFlow processing, for example.
>
> But I agree that it makes sense to add a comment about this.
Ack.
>
>>> + min_member_mbps = MIN(mbps, min_member_mbps);
>>> }
>>> }
>>
>> Should probably also have some sane default if all the calls to
>> netdev_get_speed failed and min_member_mbps is still UINT32_MAX.
> In the current implementation, it can't end up to still be UINT32_MAX,
> unless there are no active members. But in that case the load shift
> below will also not run, as the list will be empty.
>
>>
>>> log_bals(bond, &bals);
>>> @@ -1392,10 +1397,12 @@ bond_rebalance(struct bond *bond)
>>> uint64_t overload;
>>>
>>> overload = from->tx_bytes - to->tx_bytes;
>>> - if (overload < to->tx_bytes >> 5 || overload < 100000) {
>>> + if (overload < to->tx_bytes >> 5 ||
>>> + overload / 1000000 < MAX(1, min_member_mbps / 8 >> 6)) {
> Overload is in bytes here, so the left side is in MBps (not Mbps).
> On the right side we have the link speed converted to MBps as well
> and then shifted to get ~1.5%. So, the compared values are of the
> same order, which is good. But the MAX is taken from 1 and it means
> that this 1 is 1 MBps, and not 1 Mbps as it was before.
>
> What we could do:
>
> 1. overload * 8 / 1000000 < MAX(1, min_member_mbps >> 6)
> That will keep the calculations in bits and will match the old
> behavior in case the link speed is not available or too low.
> It's likely not possible to overflow the overload here.
>
> 2. Keep the calculations in bytes as in this patch. This corresponds
> to a link speed of 512 Mbps ((1 << 6) * 8), which maybe a little
> too high. 100 Mbps links are still common on smaller devices.
I can switch to using bits instead of bytes [1] to save old behavior
with 1Mbit/s, but is it really a significant change if we switch 1Mb to
1MB (1% vs 1,048576% for the 100Mbit/s link)?
Also, when I changed this code I noticed some strange comparison logic
(at least for me): in the left side we have total amount of data in
bytes transferred in last rebalance interval (default 10s). And on the
right side we compare it with % of a link speed in bytes *per second*.
It that correct? Shouldn't we normalize left side and divide it by
rebalance interval to compare data amount per second with data amount
per second? This is why I changed 1Mbps to 1MB (without ps) in the
comment below.
>
>>> /* The extra load on 'from' (and all less-loaded members),
>>> compared
>>> * to that of 'to' (the least-loaded member), is less than
>>> ~3%, or
>>> - * it is less than ~1Mbps. No point in rebalancing. */
>>> + * it is less than MAX of 1MB and ~1.5% of minimum bond member
>> Should this comment be "1MB or ~1.5%" ?
> nit: Don't omit the 'per second' part of the M[Bb]ps.
>
>>
>> Cheers,
>> M
>>
>>> + * link speed, if present. No point in rebalancing. */
>>> break;
>>> }
>>>
>>> --
>>> 2.48.1
>>>
>>> _______________________________________________
>>> dev mailing list
>>> [email protected]
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>
--
regards,
Vladislav Odintsov
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev