sorry, I was wrong. the old ukey and new ukey are independent. and current code does not copy the stats, which means the stats have already been cleared.
Peng He <xnhp0...@gmail.com> 于2023年4月27日周四 16:56写道: > Hi, Eelco and Ilya > > It has been a long time and I see there are a lot fixes on the > revdalidator's code about this statistics code. > Has this stats inconsistent issue been solved? > > I just realize that making dump seq equal in *try_ukey_replace* is not > enough. > we might need to clear the old ukey's stats. > > since we only reuse the old ukey in EVICTED states, which means we have > executed delete-megaflow-op on this ukey and > the stats info has been pushed. > > so if there are new megaflow upcalls and replaces the existing ukey's, the > ukey->stats should be cleared > to sync with the new megaflows. > > if not, when revaldiator tries to delete this ukey, the push_dp_ops will > find inconsistent stats between ukey->stats( as it's synced with > old megaflow) and new magaflow, which is just the case you describe here. > > > > > > Peng He <xnhp0...@gmail.com> 于2022年5月21日周六 13:02写道: > >> Hi, Eelco and Ilya, >> >> We have observed the similar case in our production environments. >> >> After digging some code, I would like a discussion: >> >> In the situation that the ovs revalidator is under high load, i.e. the >> number of megaflows are too large, >> and the revalidators would try to delete some megaflows, at the same >> time, there are traffic keeps >> generating new megaflows through upcalls. >> >> the underlying cmap of megaflows changes due to a lot of deletion and >> insertion (cmap will change its bucket number, capacity, >> and re-place the megaflows in different buckets), resulting in a same >> magaflow might be dumped twice in two revalidator threads in one dump stage. >> >> Consider a case: >> >> revalidator 1 gets the megaflow A dumped and its related ukey u(A), and >> decide to kill the megaflow A. >> After megaflow deletion, the pmd generates a new megaflow B with the same >> ufid, i.e. the same match >> and the same actions, and replace the ukey u(A) with u(B). >> >> Now revalidator 2 gets the old megaflow A again, and it will find the >> ukey u(B). >> u(B) is the new ukey, it's dump_seq is 0, and this leads megaflow A is >> viewed as megaflow B, and its >> statistics are contributed into ukey(B). >> >> The megaflow A and megaflow B is in essential the same one, the only >> difference is the statistics value. >> This mismatch of megaflow and ukey has two side effects: >> >> 1) the megaflow A's statistics have been contributed twice, leading to a >> amplified value of openflow rule's >> statistics. This is our observation in our environments. >> >> 2) revalidator 2 might also decides to kill ukey(B), but now its >> statistics equals to megaflow A, and thus >> results in a mismatch value in the email that Eelco observed. >> >> To fix the case 1), I think, during the ukey replace, the new ukey >> generated by upcall should take the dump_seq >> value of the replaced old key, and avoid the old megaflow being dumped >> and find the new key. >> >> and the case 2) is also fixed. >> >> Any thoughts? >> >> >> Eelco Chaudron <echau...@redhat.com> 于2022年2月22日周二 16:10写道: >> >>> >>> >>> On 21 Feb 2022, at 12:29, Eelco Chaudron wrote: >>> >>> > On 17 Feb 2022, at 14:10, Ilya Maximets wrote: >>> > >>> >> On 1/31/22 11:54, Eelco Chaudron wrote: >>> >>> Make sure to only update packet and byte counters when valid, >>> >>> or else this could lead to "temporarily/occasionally" >>> >>> out-of-sync flow counters. >>> >> >>> >> There was already the same patch submitted here: >>> >> >>> https://patchwork.ozlabs.org/project/openvswitch/patch/20200602075036.78112-1-zhaozha...@163.com/ >>> >> >>> >> And I'm still not comfortable with the change, because it seems >>> >> like it only hides the underlying datapath problem. Do you know >>> >> why exactly datapath stats become lower than previously reported? >>> >> >>> >> If it's some kind of a statistics flush, that will mean that flow >>> >> statistics will not be updated until new stats will catch up to the >>> >> old value leading to the flow revalidation and incorrect flow stats >>> >> anyway. >>> > >>> > This was very hard to reproduce, only once out of 20-30 runs if I >>> remember correctly. >>> > >>> > Without the patch, it would sometimes show very high numbers and then >>> got updated after a while with the correct numbers. >>> > At least this is what I remember, as I did not take any notes, and my >>> brain wanted to forget this patchset :) >>> > >>> > >>> > Guess the fix is needed anyway as this behavior was there since day >>> one in revalidate_ukey(), e79a6c833. >>> > >>> > I also see that you got no reply to your comments, so I’ll take >>> another stab to make sure this patch is really fixing the problem, or >>> fixing a problem, and hiding another TC problem. >>> > >>> >> We fixed incorrect stats on flow modification for dpdk offload >>> provider >>> >> previously here: >>> >> >>> https://patchwork.ozlabs.org/project/openvswitch/patch/20201012142735.5304-1-el...@nvidia.com/ >>> >> >>> >> Do we need something similar for tc? >>> > >>> > I’ll take a close look at the DPDK patch, once I get my reproducer >>> going. >>> >>> Unfortunately, after an afternoon and night or running tests, I can not >>> replicate the problem with the weird counters :( >>> >>> So for now I’ll drop this patch from the set, and hopefully, it will >>> resurface when I’m integrating the system-traffic tests into the hardware >>> offload set. >>> >>> <SNIP> >>> >>> _______________________________________________ >>> dev mailing list >>> d...@openvswitch.org >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> >> >> >> -- >> hepeng >> > > > -- > hepeng > -- hepeng _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev