On 14.03.2025 16:43, Ilya Maximets wrote:
> On 3/14/25 13:03, Ilya Maximets wrote:
>> On 3/14/25 10:26, Vladislav Odintsov wrote:
>>> On 13.03.2025 22:05, Ilya Maximets wrote:
>>>> On 3/12/25 19:07, Vladislav Odintsov wrote:
>>>>> Sorry, I hit the send button too early.
>>>>>
>>>>> On 12.03.2025 20:49, Vladislav Odintsov wrote:
>>>>>> On 12.03.2025 19:16, Ilya Maximets wrote:
>>>>>>> On 3/12/25 16:01, Vladislav Odintsov wrote:
>>>>>>>> On 12.03.2025 17:45, Ilya Maximets wrote:
>>>>>>>>> On 3/12/25 12:41, Vladislav Odintsov wrote:
>>>>>>>>>> On 12.03.2025 14:11, Ilya Maximets wrote:
>>>>>>>>>>> On 3/12/25 11:22, Vladislav Odintsov wrote:
>>>>>>>>>>>> Hi Han, Ilya and Adrian,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm also suffering from these messages in logs. Han, do you have
>>>>>>>>>>>> plans
>>>>>>>>>>>> on working with requested changes?
>>>>>>>>>>>>
>>>>>>>>>>>> If you've got no time at the moment, I can continue this work
>>>>>>>>>>>> addressing
>>>>>>>>>>>> reviewers' comments. Ilya, you said 1kB is a small chunk of data
>>>>>>>>>>>> and we
>>>>>>>>>>>> shouldn't care too much about it. Which value is okay to ignore
>>>>>>>>>>>> from
>>>>>>>>>>>> your perspective?
>>>>>>>>>>> Are you suffering from excessive logging or from not enough
>>>>>>>>>>> granularity
>>>>>>>>>>> in the logs? This patch increases the accuracy and doesn't
>>>>>>>>>>> reduce the
>>>>>>>>>>> amount of logs.
>>>>>>>>>>>
>>>>>>>>>>> If the amount of logs is a problem, then maybe we should move the
>>>>>>>>>>> logs
>>>>>>>>>>> for individual load shifts to DBG level (maybe keep a
>>>>>>>>>>> particularly heavy
>>>>>>>>>>> ones like >1MB move at INFO) and have a separate overview of the
>>>>>>>>>>> change
>>>>>>>>>>> instead showing the load distribution after the rebalancing, e.g.:
>>>>>>>>>>>
>>>>>>>>>>> bond X: rebalanced (now carrying: portA:XXXkB, portB:YYYkB,
>>>>>>>>>>> portC:ZZZkB)
>>>>>>>>>>>
>>>>>>>>>>> For the debug logs we could improve the accuracy as this patch does.
>>>>>>>>>>>
>>>>>>>>>>> What do you think?
>>>>>>>>>> We suffer from excessive logging of "0 kB" records. There are tons of
>>>>>>>>>> them. So I planned to take your proposal and do not to log these
>>>>>>>>>> events
>>>>>>>>>> at all. But I like your last idea about hiding messages of
>>>>>>>>>> re-balancing
>>>>>>>>>> <1MB under DBG log level and keep INFO for others.
>>>>>>>>>>
>>>>>>>>>> If you guys are okay with that, I can prepare and send a patch.
>>>>>>>>> I think, if we're hiding some information under DBG, then we need
>>>>>>>>> to add
>>>>>>>>> a new log with an overview of the rebalance with a new load
>>>>>>>>> distribution.
>>>>>>>>> With that, sounds fine to me.
>>>>>>>> Did you mean that we should log total cumulative re-balanced amount
>>>>>>>> once
>>>>>>>> in a timeframe (let say 5-15 minutes) or just log with some rate-limit?
>>>>>>> The total cumulative amount at the end of
>>>>>>> ofproto/bond.c:bond_rebalance()
>>>>>>> if rebalanced == true. See the example log above. So, it the log
>>>>>>> should
>>>>>>> appear at most once per rebalance interval, which is 10s by default.
>>>>>>>
>>>>>>> The current 'shift X of load' log will remain but at DBG level,
>>>>>>> except for
>>>>>>> a very large ones.
>>>>>>>
>>>>>>> Will that solve your problem, or is that still too much of logging (once
>>>>>>> in 10s) ?
>>>>>> 50/50 :)
>>>>>>
>>>>>> # grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | wc -l
>>>>>> 47388
>>>>>> # grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | grep shift -c
>>>>>> 47339
>>>>>> # grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | grep 'shift
>>>>>> 0kB' | wc -l
>>>>>> 20384
>>>>>> # grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | grep shift |
>>>>>> cut -d '|' -f1 | uniq -c | sort -n | sed -e 's/^[[:space:]]*//' | cut
>>>>>> -d ' ' -f1 | uniq -c | sort -n
>>>>>> 1 102
>>>>>> 1 108
>>>>>> 1 109
>>>> <snip>
>>>>
>>>>>> 70 6
>>>>>> 112 5
>>>>>> 205 4
>>>>>> 449 3
>>>>>> 741 1
>>>>>> 1034 2
>>>>>>
>>>>>> The last command shows the amount (1st column) of shift events in one
>>>>>> microsecond. Like this:
>>>>> Sorry, there was an error:
>>>>>
>>>>> The *second* column (value) shows the total amount of shift log lines
>>>>> occurred in any one millisecond in a log by last day.
>>>>> The *first* column (key) shows the total times in a log by last day
>>>>> where there was a (value) events in one millisecond.
>>>>>
>>>>> Like this:
>>>>>
>>>>> 2025-03-12T17:49:12.307Z|2772298|bond|INFO|bond directconnect: shift
>>>>> 407kB of load (with hash 40) from eth6 to eth7 (now carrying 1283kB and
>>>>> 935kB load, respectively)
>>>>> 2025-03-12T17:49:12.307Z|2772299|bond|INFO|bond directconnect: shift 1kB
>>>>> of load (with hash 82) from eth6 to eth7 (now carrying 1282kB and 936kB
>>>>> load, respectively)
>>>>> 2025-03-12T17:49:12.307Z|2772300|bond|INFO|bond directconnect: shift 0kB
>>>>> of load (with hash 85) from eth6 to eth7 (now carrying 1281kB and 937kB
>>>>> load, respectively)
>>>>> 2025-03-12T17:49:12.307Z|2772301|bond|INFO|bond directconnect: shift 0kB
>>>>> of load (with hash 45) from eth6 to eth7 (now carrying 1280kB and 938kB
>>>>> load, respectively)
>>>>> 2025-03-12T17:49:12.307Z|2772302|bond|INFO|bond directconnect: shift 0kB
>>>>> of load (with hash 233) from eth6 to eth7 (now carrying 1280kB and 938kB
>>>>> load, respectively)
>>>>> 2025-03-12T17:49:12.307Z|2772303|bond|INFO|bond directconnect: shift 0kB
>>>>> of load (with hash 19) from eth6 to eth7 (now carrying 1279kB and 938kB
>>>>> load, respectively)
>>>>> 2025-03-12T17:49:12.307Z|2772304|bond|INFO|bond directconnect: shift 0kB
>>>>> of load (with hash 1) from eth6 to eth7 (now carrying 1279kB and 939kB
>>>>> load, respectively)
>>>>> 2025-03-12T17:49:12.307Z|2772305|bond|INFO|bond directconnect: shift 0kB
>>>>> of load (with hash 227) from eth6 to eth7 (now carrying 1279kB and 939kB
>>>>> load, respectively)
>>>>> 2025-03-12T17:49:12.307Z|2772306|bond|INFO|bond directconnect: shift 0kB
>>>>> of load (with hash 76) from eth6 to eth7 (now carrying 1279kB and 939kB
>>>>> load, respectively)
>>>>> 2025-03-12T17:49:12.307Z|2772307|bond|INFO|bond directconnect: shift 0kB
>>>>> of load (with hash 14) from eth6 to eth7 (now carrying 1279kB and 939kB
>>>>> load, respectively)
>>>>>
>>>>> This server lightly loaded (<100mb/s of network traffic).
>>>>>
>>>>> Additionally:
>>>>>
>>>>> # grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | grep shift |
>>>>> cut -d '|' -f1 | uniq -c | sort -n
>>>>> ...(a lot of lines).
>>>>> 243 2025-03-12T06:32:25.462Z
>>>>> 243 2025-03-12T06:47:36.978Z
>>>>> 243 2025-03-12T15:28:17.831Z
>>>>> 244 2025-03-12T07:50:35.699Z
>>>>> 244 2025-03-12T13:22:43.546Z
>>>>> 245 2025-03-12T12:46:39.905Z
>>>>> 247 2025-03-12T06:51:57.431Z
>>>>> 247 2025-03-12T08:49:12.899Z
>>>>> 248 2025-03-12T07:24:31.333Z
>>>>> 248 2025-03-12T14:40:52.260Z
>>>>> 249 2025-03-12T13:24:33.725Z
>>>>> 249 2025-03-12T14:23:29.867Z
>>>>> 250 2025-03-12T08:22:30.169Z
>>>>> 250 2025-03-12T13:37:25.005Z
>>>>> 251 2025-03-12T07:15:20.215Z
>>>>> 251 2025-03-12T13:10:22.321Z
>>>>> 251 2025-03-12T14:24:39.985Z
>>>>> 252 2025-03-12T07:57:47.565Z
>>>>> 252 2025-03-12T14:40:32.224Z
>>>>> 252 2025-03-12T15:26:17.663Z
>>>>> 254 2025-03-12T08:05:08.456Z
>>>>> 254 2025-03-12T08:21:50.104Z
>>>>> 254 2025-03-12T12:42:09.455Z
>>>>>
>>>>> The 1st column here is the total shift log lines from 1 millisecond
>>>>> (shown in 2nd column).
>>>>>
>>>>> So, from my example, if there is a maximum 1 log line per 10s, the
>>>>> logfile still will be filled with only rebalance lines, but the total
>>>>> amount will be noticeably reduced.
>>>>>
>>>>> By the way, why do we need this information in log? What can we conclude
>>>>> from this information?
>>>> This information is useful for correlation with networking issues. We had
>>>> a
>>>> few cases in the past where some weird behavior happened exactly right
>>>> after
>>>> rebalancing and having rebalancing logs was crucial to figure that out.
>>>>
>>>> The proposed change will significantly reduce the amount of logging for
>>>> you,
>>>> it seems, so I'd say we should proceed with that.
>>>>
>>>> The next step, I think, would be to increase the rebalancing thresholds.
>>>> Today we do not rebalance if the overload is less than 3% between ports or
>>>> less than ~1Mbps. On modern hardware, 1Mbps is practically nothing and
>>>> even
>>>> 10 or 100 Mbps may not be worth rebalancing in some cases. So, instead of
>>>> hardcoding the 1Mbps threshold we may get a MIN of the link speeds of the
>>>> ports and use 1-3% of that as a threshold. On your lightly loaded system
>>>> that will likely eliminate most or all the rebalancing altogether.
>>>>
>>>> It will be a little tricky, since it's not always possible to get the link
>>>> speed and the link speed can be changed dynamically, but should generally
>>>> be doable.
>>>>
>>>> This way we'll keep the logs, but avoid a log spam for all cases, except
>>>> the ones where rebalancing is actually necessary.
>>>>
>>>> WDYT?
>>> Sounds good for me.
>>>
>>> I'll post logging changes shortly, since Han is not answering and the
>>> final solution is different from original.
>>>
>>> Regarding rebalance threshold, did you mean something like this?
>> Yes, something like this. I thought to store the speed in the member
>> structure
>> initially, but it creates challenges if the speed changes, and it's probably
>> not
>> a big deal to check the speed once in the rebalancing interval. So, the
>> solution
>> below might be fine. I didn't check the math in the code below, but see some
>> comments inline.
Thanks for the review!
I'll address your comments and send out the patch series.
>>
>>> diff --git a/ofproto/bond.c b/ofproto/bond.c
>>> index 45a36fabb..083c0be40 100644
>>> --- a/ofproto/bond.c
>>> +++ b/ofproto/bond.c
>>> @@ -1371,6 +1371,8 @@ bond_rebalance(struct bond *bond)
>>> }
>>> }
>>>
>>> + uint32_t mbps;
>> This is temporary and can be defined inside the if condition in the loop.
>>
>>> + uint64_t min_member_bps = UINT64_MAX;
>> An empty line would be nice. But this may be moved to the top as well,
>> since this variable is used not only for the closest loop and this function
>> generally defines most of its variables at the top.
>>
>>> /* Add enabled members to 'bals' in descending order of tx_bytes.
>>> *
>>> * XXX This is O(n**2) in the number of members but it could be
>>> O(n lg n)
>>> @@ -1379,6 +1381,10 @@ bond_rebalance(struct bond *bond)
>>> HMAP_FOR_EACH (member, hmap_node, &bond->members) {
>>> if (member->enabled) {
>>> insert_bal(&bals, member);
>>> + netdev_get_speed(member->netdev, &mbps, NULL);
>>> + if (min_member_bps > mbps) {
>>> + min_member_bps = mbps * 1000000;
>>> + }
>> The comparison above is not correct as it compares apples (bps) to
>> oranges (Mbps). Use the MIN macro to avoid this issue.
You're right. Initially I compared mbps, but then changed variable to
bps and lost this place.
Thanks for the hint, I'll fix this.
> Also, this multiplication will easily overflow uin32_t, so UINT64_C(1000000)
> should be used instead to avoid type conversion issues. Ideally, I'd suggest
> to keep the value in Mbps and divide the overload if necessary instead.
> We do not care much about accuracy on small values, but we do care about not
> cropping the large ones.
You're right. Will fix that.
>
>>> }
>>> }
>>> log_bals(bond, &bals);
>>> @@ -1392,10 +1398,13 @@ bond_rebalance(struct bond *bond)
>>> uint64_t overload;
>>>
>>> overload = from->tx_bytes - to->tx_bytes;
>>> - if (overload < to->tx_bytes >> 5 || overload < 100000) {
>>> + if (overload < to->tx_bytes >> 5 ||
>>> + overload < 100000 ||
>>> + overload * 8 < min_member_bps >> 6) { /* speed can be 0. */
>> Comments should start with a capital letter. But also, I'd suggest we just
>> remove the 1Mbps condition above and instead do mbps = MAX(mbps, <some
>> value>)
>> at the point of getting the speed, where <some value> will be the link speed
>> that will result in an equivalent 1Mbps overload check.
Ack.
>>
>>> /* The extra load on 'from' (and all less-loaded members),
>>> compared
>>> * to that of 'to' (the least-loaded member), is less than
>>> ~3%, or
>>> - * it is less than ~1Mbps. No point in rebalancing. */
>>> + * it is less than ~1Mbps, or this extra load is < ~1.5% of
>>> minimum
>>> + bond member link speed, if present. No point in
>>> rebalancing. */
>>> break;
>>> }
>>>
>>> If yes, I can send both patches in a series.
>> Sure.
>>
>>>>>> I guess we should not log 0kb movement at all.
>>>>>>
>>>>>>>>> Note: chinatelecom.cn mail server doesn't seem to like
>>>>>>>>> communicating with
>>>>>>>>> gmail smtp, and so my replies likely do not reach Han. Let's wait
>>>>>>>>> maybe a
>>>>>>>>> couple days for a reply, in case at least some of these messages
>>>>>>>>> are delivered.
>>>>>>>>>
>>>>>>>>>>> Best regards, Ilya Maximets.
>>>>>>>>>>>
>>>>>>>>>>>> On 05.08.2024 10:08, Adrián Moreno wrote:
>>>>>>>>>>>>> On Fri, Aug 02, 2024 at 06:38:19PM GMT, Ilya Maximets wrote:
>>>>>>>>>>>>>> On 7/2/24 16:36, Han Ding wrote:
>>>>>>>>>>>>>>> When the delta is less than 1024 in bond_shift_load, it print
>>>>>>>>>>>>>>> "shift 0kB of load".
>>>>>>>>>>>>>>> Like this:
>>>>>>>>>>>>>>> bond dpdkbond0: shift 0kB of load (with hash 71) from nic1 to
>>>>>>>>>>>>>>> nic2 (now carrying 20650165kB and 8311662kB load, respectively)
>>>>>>>>>>>>>> Hi, Han. Thanks for the patch!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm curious, is this information about movements that small
>>>>>>>>>>>>>> practically useful?
>>>>>>>>>>>>>>
>>>>>>>>>>>>> I had the same thought.
>>>>>>>>>>>>> I guess it should not happen very often given how the algorithm
>>>>>>>>>>>>> works but
>>>>>>>>>>>>> OTOH, printing "0kB" is definitely not useful.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Signed-off-by: Han Ding<[email protected]>
>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>> ofproto/bond.c | 24 +++++++++++++++++-------
>>>>>>>>>>>>>>> 1 file changed, 17 insertions(+), 7 deletions(-)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> diff --git a/ofproto/bond.c b/ofproto/bond.c
>>>>>>>>>>>>>>> index c31869a..5b1975d 100644
>>>>>>>>>>>>>>> --- a/ofproto/bond.c
>>>>>>>>>>>>>>> +++ b/ofproto/bond.c
>>>>>>>>>>>>>>> @@ -1192,13 +1192,23 @@ bond_shift_load(struct bond_entry
>>>>>>>>>>>>>>> *hash, struct bond_member *to)
>>>>>>>>>>>>>>> struct bond *bond = from->bond;
>>>>>>>>>>>>>>> uint64_t delta = hash->tx_bytes;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - VLOG_INFO("bond %s: shift %"PRIu64"kB of load (with hash
>>>>>>>>>>>>>>> %"PRIdPTR") "
>>>>>>>>>>>>>>> - "from %s to %s (now carrying %"PRIu64"kB and "
>>>>>>>>>>>>>>> - "%"PRIu64"kB load, respectively)",
>>>>>>>>>>>>>>> - bond->name, delta / 1024, hash - bond->hash,
>>>>>>>>>>>>>>> - from->name, to->name,
>>>>>>>>>>>>>>> - (from->tx_bytes - delta) / 1024,
>>>>>>>>>>>>>>> - (to->tx_bytes + delta) / 1024);
>>>>>>>>>>>>>>> + if (delta >= 1024) {
>>>>>>>>>>>>>>> + VLOG_INFO("bond %s: shift %"PRIu64"kB of load (with
>>>>>>>>>>>>>>> hash %"PRIdPTR") "
>>>>>>>>>>>>>>> + "from %s to %s (now carrying %"PRIu64"kB and "
>>>>>>>>>>>>>>> + "%"PRIu64"kB load, respectively)",
>>>>>>>>>>>>>>> + bond->name, delta / 1024, hash - bond->hash,
>>>>>>>>>>>>>>> + from->name, to->name,
>>>>>>>>>>>>>>> + (from->tx_bytes - delta) / 1024,
>>>>>>>>>>>>>>> + (to->tx_bytes + delta) / 1024);
>>>>>>>>>>>>>>> + } else {
>>>>>>>>>>>>>>> + VLOG_INFO("bond %s: shift %"PRIu64"B of load (with
>>>>>>>>>>>>>>> hash %"PRIdPTR") "
>>>>>>>>>>>>>>> + "from %s to %s (now carrying %"PRIu64"kB and "
>>>>>>>>>>>>> Apart from Ilya's comment, missing one more indentation space
>>>>>>>>>>>>> in this
>>>>>>>>>>>>> line (and all others).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> + "%"PRIu64"kB load, respectively)",
>>>>>>>>>>>>>>> + bond->name, delta, hash - bond->hash,
>>>>>>>>>>>>>>> + from->name, to->name,
>>>>>>>>>>>>>>> + (from->tx_bytes - delta) / 1024,
>>>>>>>>>>>>>>> + (to->tx_bytes + delta) / 1024);
>>>>>>>>>>>>>>> + }
>>>>>>>>>>>>>> I'd suggest instead of copying the whole thing, just replace "kB"
>>>>>>>>>>>>>> with another %s and use a couple of ternary operators to produce
>>>>>>>>>>>>>> correct value and "kB" or "B" depending on the delta.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards, Ilya Maximets.
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> dev mailing list
>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> dev mailing list
>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
--
Regards,
Vladislav Odintsov
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev