On 3/31/23 11:07, Simon Horman wrote: > On Thu, Mar 30, 2023 at 09:04:02PM +0200, Ilya Maximets wrote: >> On 3/30/23 11:45, Simon Horman wrote: >>> On Fri, Mar 17, 2023 at 09:47:36PM +0100, Eelco Chaudron wrote: >>>> >>>> >>>> Send from my phone >>>> >>>>> Op 17 mrt. 2023 om 21:11 heeft Marcelo Ricardo Leitner >>>>> <mleit...@redhat.com> het volgende geschreven: >>>>> >>>>> On Thu, Mar 16, 2023 at 09:51:34AM +0100, Eelco Chaudron wrote: >>>>>> >>>>>> >>>>>>> On 22 Dec 2022, at 13:32, Eelco Chaudron wrote: >>>>>>> >>>>>>>> On 22 Dec 2022, at 10:26, Balazs Nemeth wrote: >>>>>>> >>>>>>>> The only way that stats->{n_packets,n_bytes} would decrease is due to >>>>>>>> an >>> >>> nit: s/way/ways/ >>> >>>>>>>> overflow, or if there are bugs in how statistics are handled. In the >>>>>>>> past, there were multiple bugs that caused a jump backward. A >>>>>>>> workaround was in place to set the statistics to 0 in that case. When >>>>>>>> this happened while the revalidator was under heavy load, the >>>>>>>> workaround >>>>>>>> had an unintended side effect where should_revalidate returned false >>>>>>>> causing the flow to be removed because the metric it calculated was >>>>>>>> based on a bogus value. Since many of those bugs have now been >>>>>>>> identified and resolved, there is no need to set the statistics to 0. >>>>>>>> In >>>>>>>> addition, the (unlikely) overflow still needs to be handled >>>>>>>> appropriately. >>> >>> 1. Perhaps it would be prudent to log/count if/when this occurs >> >> +1 >> We do have a coverage counter that will indicate the case where stats >> jump back. However, if we're certain that this should never happen, >> we should, probably, emit a warning or even an error log as well, so >> users are aware that something went wrong. > > I was thinking more of a counter, which seems to already be covered. > But I have no objection to your reasoning about having a warning (too). > >> >>> 2. I take it that the overflow handling would be follow-up work, >>> is that correct? >> >> The unsigned arithmetic should take case of overflowed counters, >> because the result of subtraction will still give a correct difference >> between the old and a new value, even if it overflowed and the new >> value is smaller. Unless, of course, it overflowed more than once. > > More than once between samples? > If so, I'm assuming that is not a case we can hit unless there is a bug.
Right. It's actually should be practically not possible to overflow even once with a current hardware. Assuming we have a fancy 400 Gbps NIC, then it should take 11.7 years to overflow a byte counter. So, this patch is mostly removing a workaround for some bug that we hope we fixed. But it's not clear what the original bug was as the commit message for this workaround didn't specify a root cause. So, it's hard to say if it's fixed or not. And that's why I'm thinking that the error message is needed. > >>>>>>>> Signed-off-by: Balazs Nemeth <bnem...@redhat.com> >>>>>>>> --- >>>>>>>> >>>>>>>> Depends on: >>>>>>>> - netdev-offload-tc: Preserve tc statistics when flow gets modified. >>>>>>>> - tc: Add support for TCA_STATS_PKT64 >>>>>>>> - ofproto-dpif-upcall: Wait for valid hw flow stats before applying >>>>>>>> min-revalidate-pps >>>>>>>> - ofproto-dpif-upcall: Use last known stats ukey stats on revalidate >>>>>>>> missed dp flows. >>>>>>> >>>>>>> Did a lot of tests with this patch applied, but as mentioned it does >>>>>>> need all four patches mentioned above applied first. >>>>>>> >>>>>>> Acked-by: Eelco Chaudron <echau...@redhat.com> >>>>>> >>>>>> Now that all the dependent patches have been applied and backported to >>>>>> 2.17, can we have another look at this patch? >>>>> >>>>> Not sure what you mean here, Eelco. You mean, just a patch review or >>>>> you mean something else, like evaluating if the patch is still needed >>>>> at all? >>>> >>>> Guess I was not clear enough :) what I meant was for the maintainers to >>>> have another look at this patch and if no further objections apply it. >>> >>> There seems to now be minor fuzz when applying this patch (locally). >>> Perhaps a rebase is in order? >> >> Yeah, the coverage counter got introduced in a close proximity. _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev