On Fri, Mar 31, 2023 at 12:05:09PM +0200, Ilya Maximets wrote:
> On 3/31/23 11:07, Simon Horman wrote:
> > On Thu, Mar 30, 2023 at 09:04:02PM +0200, Ilya Maximets wrote:
> >> On 3/30/23 11:45, Simon Horman wrote:
> >>> On Fri, Mar 17, 2023 at 09:47:36PM +0100, Eelco Chaudron wrote:
> >>>>> Op 17 mrt. 2023 om 21:11 heeft Marcelo Ricardo Leitner 
> >>>>> <mleit...@redhat.com> het volgende geschreven:
> >>>>> Thu, Mar 16, 2023 at 09:51:34AM +0100, Eelco Chaudron wrote:
> >>>>>>> On 22 Dec 2022, at 13:32, Eelco Chaudron wrote:
> >>>>>>>> On 22 Dec 2022, at 10:26, Balazs Nemeth wrote:
> >>>>>>>
> >>>>>>>> The only way that stats->{n_packets,n_bytes} would decrease is due 
> >>>>>>>> to an
> >>>
> >>> nit: s/way/ways/

> >>>>>>>> overflow, or if there are bugs in how statistics are handled. In the
> >>>>>>>> past, there were multiple bugs that caused a jump backward. A
> >>>>>>>> workaround was in place to set the statistics to 0 in that case. When
> >>>>>>>> this happened while the revalidator was under heavy load, the 
> >>>>>>>> workaround
> >>>>>>>> had an unintended side effect where should_revalidate returned false
> >>>>>>>> causing the flow to be removed because the metric it calculated was
> >>>>>>>> based on a bogus value. Since many of those bugs have now been
> >>>>>>>> identified and resolved, there is no need to set the statistics to 
> >>>>>>>> 0. In
> >>>>>>>> addition, the (unlikely) overflow still needs to be handled
> >>>>>>>> appropriately.
> >>>
> >>> 1. Perhaps it would be prudent to log/count if/when this occurs
> >>
> >> +1
> >> We do have a coverage counter that will indicate the case where stats
> >> jump back.  However, if we're certain that this should never happen,
> >> we should, probably, emit a warning or even an error log as well, so
> >> users are aware that something went wrong.
> > 
> > I was thinking more of a counter, which seems to already be covered.
> > But I have no objection to your reasoning about having a warning (too).
> > 
> >>
> >>> 2. I take it that the overflow handling would be follow-up work,
> >>>    is that correct?
> >>
> >> The unsigned arithmetic should take case of overflowed counters,
> >> because the result of subtraction will still give a correct difference
> >> between the old and a new value, even if it overflowed and the new
> >> value is smaller.  Unless, of course, it overflowed more than once.
> > 
> > More than once between samples?
> > If so, I'm assuming that is not a case we can hit unless there is a bug.
> 
> Right.  It's actually should be practically not possible to overflow
> even once with a current hardware.  Assuming we have a fancy 400 Gbps
> NIC, then it should take 11.7 years to overflow a byte counter.
> 
> So, this patch is mostly removing a workaround for some bug that we
> hope we fixed.  But it's not clear what the original bug was as the
> commit message for this workaround didn't specify a root cause.  So,
> it's hard to say if it's fixed or not.  And that's why I'm thinking
> that the error message is needed.

Yes, I agree that is prudent.

...
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to