Hi, Eelco I am trying to reproduce the issues. And I found every time I am running the test case #50, it will fail at below check:
AT_CHECK([ovs-ofctl dump-flows br0 | grep "in_port=4" | ofctl_strip], [0], [dnl n_packets=1, n_bytes=100, priority=1,ip,in_port=4 actions=drop ]) - n_packets=1, n_bytes=100, priority=1,ip,in_port=4 actions=drop + priority=1,ip,in_port=4 actions=drop It looks like it gets zero stats. I am not sure if this fits in the fact you mentioned sometimes it get 0 stats. After I did some test, I found that, this is due to the fact that the environment setting is not correct: you can try below diff, and test diff --git a/tests/system-traffic.at b/tests/system-traffic.at index dfbc30e47..fadeeac29 100644 --- a/tests/system-traffic.at +++ b/tests/system-traffic.at @@ -1771,6 +1771,8 @@ AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) AT_CHECK([ovs-ofctl del-flows br-underlay]) AT_DATA([flows-underlay.txt], [dnl +priority=99,arp,in_port=1,actions=LOCAL +priority=99,arp,in_port=LOCAL,actions=1 priority=99,dl_type=0x0800,nw_proto=47,in_port=1,actions=LOCAL priority=99,dl_type=0x0800,nw_proto=47,in_port=LOCAL,ip_dst= 172.31.1.1/24,actions=1 priority=1,actions=drop Eelco Chaudron <echau...@redhat.com> 于2022年9月30日周五 23:47写道: > On 30 Sep 2022, at 17:41, Peng He wrote: > > > It's so easy to reproduce ? > > > > Thanks! then I should have to dig it again. > > It’s been on my to-do for a while but did not get to it. Sometimes it > reproduced easily, and sometimes it takes 100+ runs :( > > If you apply the following series: > > https://patchwork.ozlabs.org/project/openvswitch/list/?series=316861 > > And then the diff below you can run something like: > > sudo bash -c 'for i in {1..100}; do make check-offloads > TESTSUITEFLAGS="49 50" > > 49: datapath - truncate and output to gre tunnel by simulated packets ok > 50: datapath - truncate and output to gre tunnel ok > > And after a while, it will fail with the error (sometimes also 0 bytes, I > hope it’s the same issue :): > > system-traffic.at:1663: wait failed after 30 seconds > n_bytes=18446744073709393054 > > > > $ git diff > diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c > index 57f94df54..52d61d0a4 100644 > --- a/ofproto/ofproto-dpif-upcall.c > +++ b/ofproto/ofproto-dpif-upcall.c > @@ -1868,6 +1868,7 @@ try_ukey_replace(struct umap *umap, struct udpif_key > *old_ukey, > ovs_mutex_lock(&new_ukey->mutex); > cmap_replace(&umap->cmap, &old_ukey->cmap_node, > &new_ukey->cmap_node, new_ukey->hash); > + new_ukey->dump_seq = old_ukey->dump_seq; > ovsrcu_postpone(ukey_delete__, old_ukey); > transition_ukey(old_ukey, UKEY_DELETED); > transition_ukey(new_ukey, UKEY_VISIBLE); > diff --git a/tests/system-offloads-testsuite.at b/tests/ > system-offloads-testsuite.at > index 318e6d1e6..4d546011d 100644 > --- a/tests/system-offloads-testsuite.at > +++ b/tests/system-offloads-testsuite.at > @@ -114,8 +114,8 @@ conntrack - DNAT load balancing with NC > > # Occasionalt we fail with extreme high byte counters, i.e. > # n_bytes=18446744073705804134 > -datapath - truncate and output to gre tunnel by simulated packets > -datapath - truncate and output to gre tunnel > +#datapath - truncate and output to gre tunnel by simulated packets > +#datapath - truncate and output to gre tunnel > " > echo "$ovs_test_skip_list" | sed "s/<SPC>/ /g"]) > > diff --git a/tests/system-traffic.at b/tests/system-traffic.at > index 528d2ca64..e86699d1d 100644 > --- a/tests/system-traffic.at > +++ b/tests/system-traffic.at > @@ -1708,7 +1708,10 @@ OVS_REVALIDATOR_PURGE() > OVS_WAIT_UNTIL([ovs-ofctl dump-flows br0 | grep "in_port=4" | ofctl_strip > | grep "n_packets=1, n_bytes=100, priority=1,ip,in_port=4 actions=drop"], > [ovs-ofctl dump-flows br0 | grep "in_port=4" | > ofctl_strip]) > > -OVS_TRAFFIC_VSWITCHD_STOP > +OVS_TRAFFIC_VSWITCHD_STOP(["dnl > +/.*lost packet on handler.*/d > +/.failed to flow_get.*/d > +/.*Failed to acquire udpif_key corresponding to unexpected flow.*/d"]) > AT_CLEANUP > > dnl Create 2 bridges and 2 namespaces to test truncate over > @@ -1834,7 +1837,10 @@ AT_CHECK([ovs-ofctl dump-flows br0 | grep > "in_port=4" | ofctl_strip], [0], [dnl > n_packets=1, n_bytes=100, priority=1,ip,in_port=4 actions=drop > ]) > > -OVS_TRAFFIC_VSWITCHD_STOP > +OVS_TRAFFIC_VSWITCHD_STOP(["dnl > +/.*lost packet on handler.*/d > +/.failed to flow_get.*/d > +/.*Failed to acquire udpif_key corresponding to unexpected flow.*/d"]) > AT_CLEANUP > > AT_SETUP([datapath - configure cache size]) > > > > > Eelco Chaudron <echau...@redhat.com> 于2022年9月30日周五 23:39写道: > > > >> > >> > >> On 30 Sep 2022, at 17:26, Peng He wrote: > >> > >>> Eelco Chaudron <echau...@redhat.com> 于2022年9月30日周五 23:01写道: > >>> > >>>> > >>>> > >>>> On 23 Sep 2022, at 18:29, Peng He wrote: > >>>> > >>>>> The userspace datapath mananges all the magaflows by a cmap. The cmap > >>>>> data structrue will grow/shrink during the datapath processing and it > >>>>> will re-position megaflows. This might result in two revalidator > >> threads > >>>>> might process a same megaflow during one dump stage. > >>>>> > >>>>> Consider a situation that, revalidator 1 processes a megaflow A, and > >>>>> decides to delete it from the datapath, at the mean time, this > megaflow > >>>>> A is also queued in the process batch of revalidator 2. Normally it's > >> ok > >>>>> for revalidators to process the same megaflow multiple times, as the > >>>>> dump_seq shows it's already dumped and the stats will not be > >> contributed > >>>>> twice. > >>>>> > >>>>> Assume that right after A is deleted, a PMD thread generates again > >>>>> a new megaflow B which has the same match and action of A. The ukey > >>>>> of megaflow B will replace the one of megaflow A. Now the ukey B is > >>>>> new to the revalidator system and its dump seq is 0. > >>>>> > >>>>> Now since the dump seq of ukey B is 0, when processing megaflow A, > >>>>> the revalidator 2 will not identify this megaflow A has already been > >>>>> dumped by revalidator 1 and will contribute the old megaflow A's > stats > >>>>> again, this results in an inconsistent stats between ukeys and > >> megaflows. > >>>>> > >>>>> To fix this, the newly generated the ukey B should take the dump_seq > >>>>> of the replaced ukey A to avoid a same megaflow being revalidated > >>>>> twice in one dump stage. > >>>>> > >>>>> We observe in the production environment, the OpenFlow rules' stats > >>>>> sometimes are amplified compared to the actual value. I believe this > >>>>> is also the reason that why somtimes there is mismatch between the > >>>>> ukey and megaflow in stats value. The Eelco's patch > >>>>> [ovs-dev] [PATCH v2 09/10] revalidator: Fix datapath statistics > update > >>>>> tried to fix it in the past. > >>>> > >>>> This sounds plausible, are your statistics extremely elevated? > >>>> Mine are in the likes of n_bytes=18446744073705804134 where it should > be > >>>> around 100. > >>>> > >>>> It looks more like an overflow. > >>> I just get the report from another team, I need to ask them, we will > >> have a > >>> 7 days off due to the national day. > >>> so it will take time to get the value. :( > >> > >> Enjoy your time off!! This fix is not solving my problem; > >> > >> Still get the error once out of X runs, n_bytes=18446744073709393054 > >> > >>> I’ll try to get my old setup up and run it continuously over the > weekend > >>>> and see if it’s replicated again. > >>>> > >>>> thanks! > >>> > >>> > >>>>> Signed-off-by: Peng He <hepeng.0...@bytedance.com> > >>>>> --- > >>>>> ofproto/ofproto-dpif-upcall.c | 1 + > >>>>> 1 file changed, 1 insertion(+) > >>>>> > >>>>> diff --git a/ofproto/ofproto-dpif-upcall.c > >>>> b/ofproto/ofproto-dpif-upcall.c > >>>>> index e8bbcfeaf..89fad1bdf 100644 > >>>>> --- a/ofproto/ofproto-dpif-upcall.c > >>>>> +++ b/ofproto/ofproto-dpif-upcall.c > >>>>> @@ -1877,6 +1877,7 @@ try_ukey_replace(struct umap *umap, struct > >>>> udpif_key *old_ukey, > >>>>> ovs_mutex_lock(&new_ukey->mutex); > >>>>> cmap_replace(&umap->cmap, &old_ukey->cmap_node, > >>>>> &new_ukey->cmap_node, new_ukey->hash); > >>>>> + new_ukey->dump_seq = old_ukey->dump_seq; > >>>>> ovsrcu_postpone(ukey_delete__, old_ukey); > >>>>> transition_ukey(old_ukey, UKEY_DELETED); > >>>>> transition_ukey(new_ukey, UKEY_VISIBLE); > >>>>> -- > >>>>> 2.25.1 > >>>>> > >>>>> _______________________________________________ > >>>>> dev mailing list > >>>>> d...@openvswitch.org > >>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > >>>> > >>>> > >>> > >>> -- > >>> hepeng > >> > >> > > > > -- > > hepeng > > -- hepeng _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev