On 3/31/23 11:07, Simon Horman wrote:
> On Thu, Mar 30, 2023 at 09:04:02PM +0200, Ilya Maximets wrote:
>> On 3/30/23 11:45, Simon Horman wrote:
>>> On Fri, Mar 17, 2023 at 09:47:36PM +0100, Eelco Chaudron wrote:
>>>>
>>>>
>>>> Send from my phone
>>>>
>>>>> Op 17 mrt. 2023 om 21:11 heeft Marcelo Ricardo Leitner 
>>>>> <mleit...@redhat.com> het volgende geschreven:
>>>>>
>>>>> On Thu, Mar 16, 2023 at 09:51:34AM +0100, Eelco Chaudron wrote:
>>>>>>
>>>>>>
>>>>>>> On 22 Dec 2022, at 13:32, Eelco Chaudron wrote:
>>>>>>>
>>>>>>>> On 22 Dec 2022, at 10:26, Balazs Nemeth wrote:
>>>>>>>
>>>>>>>> The only way that stats->{n_packets,n_bytes} would decrease is due to 
>>>>>>>> an
>>>
>>> nit: s/way/ways/
>>>
>>>>>>>> overflow, or if there are bugs in how statistics are handled. In the
>>>>>>>> past, there were multiple bugs that caused a jump backward. A
>>>>>>>> workaround was in place to set the statistics to 0 in that case. When
>>>>>>>> this happened while the revalidator was under heavy load, the 
>>>>>>>> workaround
>>>>>>>> had an unintended side effect where should_revalidate returned false
>>>>>>>> causing the flow to be removed because the metric it calculated was
>>>>>>>> based on a bogus value. Since many of those bugs have now been
>>>>>>>> identified and resolved, there is no need to set the statistics to 0. 
>>>>>>>> In
>>>>>>>> addition, the (unlikely) overflow still needs to be handled
>>>>>>>> appropriately.
>>>
>>> 1. Perhaps it would be prudent to log/count if/when this occurs
>>
>> +1
>> We do have a coverage counter that will indicate the case where stats
>> jump back.  However, if we're certain that this should never happen,
>> we should, probably, emit a warning or even an error log as well, so
>> users are aware that something went wrong.
> 
> I was thinking more of a counter, which seems to already be covered.
> But I have no objection to your reasoning about having a warning (too).
> 
>>
>>> 2. I take it that the overflow handling would be follow-up work,
>>>    is that correct?
>>
>> The unsigned arithmetic should take case of overflowed counters,
>> because the result of subtraction will still give a correct difference
>> between the old and a new value, even if it overflowed and the new
>> value is smaller.  Unless, of course, it overflowed more than once.
> 
> More than once between samples?
> If so, I'm assuming that is not a case we can hit unless there is a bug.

Right.  It's actually should be practically not possible to overflow
even once with a current hardware.  Assuming we have a fancy 400 Gbps
NIC, then it should take 11.7 years to overflow a byte counter.

So, this patch is mostly removing a workaround for some bug that we
hope we fixed.  But it's not clear what the original bug was as the
commit message for this workaround didn't specify a root cause.  So,
it's hard to say if it's fixed or not.  And that's why I'm thinking
that the error message is needed.

> 
>>>>>>>> Signed-off-by: Balazs Nemeth <bnem...@redhat.com>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> Depends on:
>>>>>>>> - netdev-offload-tc: Preserve tc statistics when flow gets modified.
>>>>>>>> - tc: Add support for TCA_STATS_PKT64
>>>>>>>> - ofproto-dpif-upcall: Wait for valid hw flow stats before applying 
>>>>>>>> min-revalidate-pps
>>>>>>>> - ofproto-dpif-upcall: Use last known stats ukey stats on revalidate 
>>>>>>>> missed dp flows.
>>>>>>>
>>>>>>> Did a lot of tests with this patch applied, but as mentioned it does 
>>>>>>> need all four patches mentioned above applied first.
>>>>>>>
>>>>>>> Acked-by: Eelco Chaudron <echau...@redhat.com>
>>>>>>
>>>>>> Now that all the dependent patches have been applied and backported to 
>>>>>> 2.17, can we have another look at this patch?
>>>>>
>>>>> Not sure what you mean here, Eelco. You mean, just a patch review or
>>>>> you mean something else, like evaluating if the patch is still needed
>>>>> at all?
>>>>
>>>> Guess I was not clear enough :) what I meant was for the maintainers to
>>>> have another look at this patch and if no further objections apply it.
>>>
>>> There seems to now be minor fuzz when applying this patch (locally).
>>> Perhaps a rebase is in order?
>>
>> Yeah, the coverage counter got introduced in a close proximity.

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to