Hi, Eelco

I am trying to reproduce the issues.
And I found every time I am running the test case #50, it will
fail at below check:

 AT_CHECK([ovs-ofctl dump-flows br0 | grep "in_port=4" | ofctl_strip], [0],
[dnl
  n_packets=1, n_bytes=100, priority=1,ip,in_port=4 actions=drop
 ])

- n_packets=1, n_bytes=100, priority=1,ip,in_port=4 actions=drop
+ priority=1,ip,in_port=4 actions=drop


It looks like it gets zero stats. I am not sure if this fits in the fact
you mentioned sometimes it get 0 stats.

After I did some test, I found that, this is due to the fact that the
environment setting is not correct:

you can try below diff, and test

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index dfbc30e47..fadeeac29 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -1771,6 +1771,8 @@ AT_CHECK([ovs-ofctl add-flows br0 flows.txt])

 AT_CHECK([ovs-ofctl del-flows br-underlay])
 AT_DATA([flows-underlay.txt], [dnl
+priority=99,arp,in_port=1,actions=LOCAL
+priority=99,arp,in_port=LOCAL,actions=1
 priority=99,dl_type=0x0800,nw_proto=47,in_port=1,actions=LOCAL
 priority=99,dl_type=0x0800,nw_proto=47,in_port=LOCAL,ip_dst=
172.31.1.1/24,actions=1
 priority=1,actions=drop



Eelco Chaudron <echau...@redhat.com> 于2022年9月30日周五 23:47写道:

> On 30 Sep 2022, at 17:41, Peng He wrote:
>
> > It's so easy to reproduce ?
> >
> > Thanks! then I should have to dig it again.
>
> It’s been on my to-do for a while but did not get to it. Sometimes it
> reproduced easily, and sometimes it takes 100+ runs :(
>
> If you apply the following series:
>
>   https://patchwork.ozlabs.org/project/openvswitch/list/?series=316861
>
> And then the diff below you can run something like:
>
>   sudo bash -c 'for i in {1..100}; do make check-offloads
> TESTSUITEFLAGS="49 50"
>
>  49: datapath - truncate and output to gre tunnel by simulated packets ok
>  50: datapath - truncate and output to gre tunnel    ok
>
> And after a while, it will fail with the error (sometimes also 0 bytes, I
> hope it’s the same issue :):
>
> system-traffic.at:1663: wait failed after 30 seconds
> n_bytes=18446744073709393054
>
>
>
> $ git diff
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index 57f94df54..52d61d0a4 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -1868,6 +1868,7 @@ try_ukey_replace(struct umap *umap, struct udpif_key
> *old_ukey,
>              ovs_mutex_lock(&new_ukey->mutex);
>              cmap_replace(&umap->cmap, &old_ukey->cmap_node,
>                           &new_ukey->cmap_node, new_ukey->hash);
> +            new_ukey->dump_seq = old_ukey->dump_seq;
>              ovsrcu_postpone(ukey_delete__, old_ukey);
>              transition_ukey(old_ukey, UKEY_DELETED);
>              transition_ukey(new_ukey, UKEY_VISIBLE);
> diff --git a/tests/system-offloads-testsuite.at b/tests/
> system-offloads-testsuite.at
> index 318e6d1e6..4d546011d 100644
> --- a/tests/system-offloads-testsuite.at
> +++ b/tests/system-offloads-testsuite.at
> @@ -114,8 +114,8 @@ conntrack - DNAT load balancing with NC
>
>  # Occasionalt we fail with extreme high byte counters, i.e.
>  # n_bytes=18446744073705804134
> -datapath - truncate and output to gre tunnel by simulated packets
> -datapath - truncate and output to gre tunnel
> +#datapath - truncate and output to gre tunnel by simulated packets
> +#datapath - truncate and output to gre tunnel
>  "
>  echo "$ovs_test_skip_list" | sed "s/<SPC>/ /g"])
>
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index 528d2ca64..e86699d1d 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -1708,7 +1708,10 @@ OVS_REVALIDATOR_PURGE()
>  OVS_WAIT_UNTIL([ovs-ofctl dump-flows br0 | grep "in_port=4" | ofctl_strip
> | grep "n_packets=1, n_bytes=100, priority=1,ip,in_port=4 actions=drop"],
>                 [ovs-ofctl dump-flows br0 | grep "in_port=4" |
> ofctl_strip])
>
> -OVS_TRAFFIC_VSWITCHD_STOP
> +OVS_TRAFFIC_VSWITCHD_STOP(["dnl
> +/.*lost packet on handler.*/d
> +/.failed to flow_get.*/d
> +/.*Failed to acquire udpif_key corresponding to unexpected flow.*/d"])
>  AT_CLEANUP
>
>  dnl Create 2 bridges and 2 namespaces to test truncate over
> @@ -1834,7 +1837,10 @@ AT_CHECK([ovs-ofctl dump-flows br0 | grep
> "in_port=4" | ofctl_strip], [0], [dnl
>   n_packets=1, n_bytes=100, priority=1,ip,in_port=4 actions=drop
>  ])
>
> -OVS_TRAFFIC_VSWITCHD_STOP
> +OVS_TRAFFIC_VSWITCHD_STOP(["dnl
> +/.*lost packet on handler.*/d
> +/.failed to flow_get.*/d
> +/.*Failed to acquire udpif_key corresponding to unexpected flow.*/d"])
>  AT_CLEANUP
>
>  AT_SETUP([datapath - configure cache size])
>
>
>
> > Eelco Chaudron <echau...@redhat.com> 于2022年9月30日周五 23:39写道:
> >
> >>
> >>
> >> On 30 Sep 2022, at 17:26, Peng He wrote:
> >>
> >>> Eelco Chaudron <echau...@redhat.com> 于2022年9月30日周五 23:01写道:
> >>>
> >>>>
> >>>>
> >>>> On 23 Sep 2022, at 18:29, Peng He wrote:
> >>>>
> >>>>> The userspace datapath mananges all the magaflows by a cmap. The cmap
> >>>>> data structrue will grow/shrink during the datapath processing and it
> >>>>> will re-position megaflows. This might result in two revalidator
> >> threads
> >>>>> might process a same megaflow during one dump stage.
> >>>>>
> >>>>> Consider a situation that, revalidator 1 processes a megaflow A, and
> >>>>> decides to delete it from the datapath, at the mean time, this
> megaflow
> >>>>> A is also queued in the process batch of revalidator 2. Normally it's
> >> ok
> >>>>> for revalidators to process the same megaflow multiple times, as the
> >>>>> dump_seq shows it's already dumped and the stats will not be
> >> contributed
> >>>>> twice.
> >>>>>
> >>>>> Assume that right after A is deleted, a PMD thread generates again
> >>>>> a new megaflow B which has the same match and action of A. The ukey
> >>>>> of megaflow B will replace the one of megaflow A. Now the ukey B is
> >>>>> new to the revalidator system and its dump seq is 0.
> >>>>>
> >>>>> Now since the dump seq of ukey B is 0, when processing megaflow A,
> >>>>> the revalidator 2 will not identify this megaflow A has already been
> >>>>> dumped by revalidator 1 and will contribute the old megaflow A's
> stats
> >>>>> again, this results in an inconsistent stats between ukeys and
> >> megaflows.
> >>>>>
> >>>>> To fix this, the newly generated the ukey B should take the dump_seq
> >>>>> of the replaced ukey A to avoid a same megaflow being revalidated
> >>>>> twice in one dump stage.
> >>>>>
> >>>>> We observe in the production environment, the OpenFlow rules' stats
> >>>>> sometimes are amplified compared to the actual value. I believe this
> >>>>> is also the reason that why somtimes there is mismatch between the
> >>>>> ukey and megaflow in stats value. The Eelco's patch
> >>>>> [ovs-dev] [PATCH v2 09/10] revalidator: Fix datapath statistics
> update
> >>>>> tried to fix it in the past.
> >>>>
> >>>> This sounds plausible, are your statistics extremely elevated?
> >>>> Mine are in the likes of n_bytes=18446744073705804134 where it should
> be
> >>>> around 100.
> >>>>
> >>>> It looks more like an overflow.
> >>> I just get the report from another team, I need to ask them, we will
> >> have a
> >>> 7 days off due to the national day.
> >>> so it will take time to get the value. :(
> >>
> >> Enjoy your time off!! This fix is not solving my problem;
> >>
> >> Still get the error once out of X runs, n_bytes=18446744073709393054
> >>
> >>> I’ll try to get my old setup up and run it continuously over the
> weekend
> >>>> and see if it’s replicated again.
> >>>>
> >>>> thanks!
> >>>
> >>>
> >>>>> Signed-off-by: Peng He <hepeng.0...@bytedance.com>
> >>>>> ---
> >>>>>  ofproto/ofproto-dpif-upcall.c | 1 +
> >>>>>  1 file changed, 1 insertion(+)
> >>>>>
> >>>>> diff --git a/ofproto/ofproto-dpif-upcall.c
> >>>> b/ofproto/ofproto-dpif-upcall.c
> >>>>> index e8bbcfeaf..89fad1bdf 100644
> >>>>> --- a/ofproto/ofproto-dpif-upcall.c
> >>>>> +++ b/ofproto/ofproto-dpif-upcall.c
> >>>>> @@ -1877,6 +1877,7 @@ try_ukey_replace(struct umap *umap, struct
> >>>> udpif_key *old_ukey,
> >>>>>              ovs_mutex_lock(&new_ukey->mutex);
> >>>>>              cmap_replace(&umap->cmap, &old_ukey->cmap_node,
> >>>>>                           &new_ukey->cmap_node, new_ukey->hash);
> >>>>> +            new_ukey->dump_seq = old_ukey->dump_seq;
> >>>>>              ovsrcu_postpone(ukey_delete__, old_ukey);
> >>>>>              transition_ukey(old_ukey, UKEY_DELETED);
> >>>>>              transition_ukey(new_ukey, UKEY_VISIBLE);
> >>>>> --
> >>>>> 2.25.1
> >>>>>
> >>>>> _______________________________________________
> >>>>> dev mailing list
> >>>>> d...@openvswitch.org
> >>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>>>
> >>>>
> >>>
> >>> --
> >>> hepeng
> >>
> >>
> >
> > --
> > hepeng
>
>

-- 
hepeng
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to