sorry, I was wrong.
the old ukey and new ukey are independent. and current code does not
copy the stats,
which means the stats have already been cleared.

Peng He <xnhp0...@gmail.com> 于2023年4月27日周四 16:56写道:

> Hi, Eelco and Ilya
>
> It has been a long time and I see there are a lot fixes on the
> revdalidator's code about this statistics code.
> Has this stats inconsistent issue been solved?
>
> I just realize that making dump seq equal in *try_ukey_replace* is not
> enough.
> we might need to clear the old ukey's stats.
>
> since we only reuse the old ukey in EVICTED states, which means we have
> executed delete-megaflow-op on this ukey and
> the stats info has been pushed.
>
> so if there are new megaflow upcalls and replaces the existing ukey's, the
> ukey->stats should be cleared
> to sync with the new megaflows.
>
> if not, when revaldiator tries to delete this ukey, the push_dp_ops will
> find inconsistent stats between ukey->stats( as it's synced with
> old megaflow) and new magaflow, which is just the case you describe here.
>
>
>
>
>
> Peng He <xnhp0...@gmail.com> 于2022年5月21日周六 13:02写道:
>
>> Hi, Eelco and Ilya,
>>
>> We have observed the similar case in our production environments.
>>
>> After digging some code, I would like a discussion:
>>
>> In the situation that the ovs revalidator is under high load, i.e. the
>> number of megaflows are too large,
>> and the revalidators would try to delete some megaflows, at the same
>> time, there are traffic keeps
>> generating new megaflows through upcalls.
>>
>> the underlying cmap of megaflows changes due to a lot of deletion and
>> insertion (cmap will change its bucket number, capacity,
>> and re-place the megaflows in different buckets), resulting in a same
>> magaflow might be dumped twice in two revalidator threads in one dump stage.
>>
>> Consider a case:
>>
>> revalidator 1 gets the megaflow A dumped and its related ukey u(A), and
>> decide to kill the megaflow A.
>> After megaflow deletion, the pmd generates a new megaflow B with the same
>> ufid, i.e. the same match
>> and the same actions, and replace the ukey u(A) with u(B).
>>
>> Now revalidator 2 gets the old megaflow A again, and it will find the
>> ukey u(B).
>> u(B) is the new ukey, it's dump_seq is 0, and this leads megaflow A is
>> viewed as megaflow B, and its
>> statistics are contributed into ukey(B).
>>
>> The megaflow A and megaflow B is in essential the same one, the only
>> difference is the statistics value.
>> This mismatch of megaflow and ukey has two side effects:
>>
>> 1) the megaflow A's statistics have been contributed twice, leading to a
>> amplified value of openflow rule's
>> statistics. This is our observation in our environments.
>>
>> 2) revalidator 2 might also decides to kill ukey(B), but now its
>> statistics equals to megaflow A, and thus
>> results in a mismatch value in the email that Eelco observed.
>>
>> To fix the case 1), I think, during the ukey replace, the new ukey
>> generated by upcall should take the dump_seq
>> value of the replaced old key, and avoid the old megaflow being dumped
>> and find the new key.
>>
>> and the case 2) is also fixed.
>>
>> Any thoughts?
>>
>>
>> Eelco Chaudron <echau...@redhat.com> 于2022年2月22日周二 16:10写道:
>>
>>>
>>>
>>> On 21 Feb 2022, at 12:29, Eelco Chaudron wrote:
>>>
>>> > On 17 Feb 2022, at 14:10, Ilya Maximets wrote:
>>> >
>>> >> On 1/31/22 11:54, Eelco Chaudron wrote:
>>> >>> Make sure to only update packet and byte counters when valid,
>>> >>> or else this could lead to "temporarily/occasionally"
>>> >>> out-of-sync flow counters.
>>> >>
>>> >> There was already the same patch submitted here:
>>> >>
>>> https://patchwork.ozlabs.org/project/openvswitch/patch/20200602075036.78112-1-zhaozha...@163.com/
>>> >>
>>> >> And I'm still not comfortable with the change, because it seems
>>> >> like it only hides the underlying datapath problem.  Do you know
>>> >> why exactly datapath stats become lower than previously reported?
>>> >>
>>> >> If it's some kind of a statistics flush, that will mean that flow
>>> >> statistics will not be updated until new stats will catch up to the
>>> >> old value leading to the flow revalidation and incorrect flow stats
>>> >> anyway.
>>> >
>>> > This was very hard to reproduce, only once out of 20-30 runs if I
>>> remember correctly.
>>> >
>>> > Without the patch, it would sometimes show very high numbers and then
>>> got updated after a while with the correct numbers.
>>> > At least this is what I remember, as I did not take any notes, and my
>>> brain wanted to forget this patchset :)
>>> >
>>> >
>>> > Guess the fix is needed anyway as this behavior was there since day
>>> one in revalidate_ukey(), e79a6c833.
>>> >
>>> > I also see that you got no reply to your comments, so I’ll take
>>> another stab to make sure this patch is really fixing the problem, or
>>> fixing a problem, and hiding another TC problem.
>>> >
>>> >> We fixed incorrect stats on flow modification for dpdk offload
>>> provider
>>> >> previously here:
>>> >>
>>> https://patchwork.ozlabs.org/project/openvswitch/patch/20201012142735.5304-1-el...@nvidia.com/
>>> >>
>>> >> Do we need something similar for tc?
>>> >
>>> > I’ll take a close look at the DPDK patch, once I get my reproducer
>>> going.
>>>
>>> Unfortunately, after an afternoon and night or running tests, I can not
>>> replicate the problem with the weird counters :(
>>>
>>> So for now I’ll drop this patch from the set, and hopefully, it will
>>> resurface when I’m integrating the system-traffic tests into the hardware
>>> offload set.
>>>
>>> <SNIP>
>>>
>>> _______________________________________________
>>> dev mailing list
>>> d...@openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>
>>
>>
>> --
>> hepeng
>>
>
>
> --
> hepeng
>


-- 
hepeng
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to