On Mon, 2022-06-27 at 11:32 +0200, Eelco Chaudron wrote: > > > On 21 Jun 2022, at 10:22, Jianbo Liu wrote: > > > On Mon, 2022-06-20 at 12:18 +0200, Eelco Chaudron wrote: > > > On 27 May 2022, at 11:00, Jianbo Liu wrote: > > > > > > > OVS meters are created in advance and openflow rules refer to > > > > them > > > > by > > > > their unique ID. New tc_police API is used to offload them. By > > > > calling > > > > the API, police actions are created and meters are mapped to > > > > them. > > > > These actions then can be used in tc filter rules by the index. > > > > > > > > Signed-off-by: Jianbo Liu <jian...@nvidia.com> > > > > --- > > > > NEWS | 2 ++ > > > > lib/dpif-netlink.c | 31 +++++++++++++++++---- > > > > tests/system-offloads-traffic.at | 48 > > > > ++++++++++++++++++++++++++++++++ > > > > 3 files changed, 76 insertions(+), 5 deletions(-) > > > > > > > > diff --git a/NEWS b/NEWS > > > > index eece0d0b2..dfd659d4e 100644 > > > > --- a/NEWS > > > > +++ b/NEWS > > > > @@ -29,6 +29,8 @@ Post-v2.17.0 > > > > - Windows: > > > > * Conntrack support for TCPv6, UDPv6, ICMPv6, FTPv6. > > > > * IPv6 Geneve tunnel support. > > > > + - Linux datapath: > > > > + * Add offloading meter tc police. > > > > > > > > > > > > v2.17.0 - 17 Feb 2022 > > > > diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c > > > > index 06e1e8ca0..0af9ee77e 100644 > > > > --- a/lib/dpif-netlink.c > > > > +++ b/lib/dpif-netlink.c > > > > @@ -4163,11 +4163,18 @@ static int > > > > dpif_netlink_meter_set(struct dpif *dpif_, ofproto_meter_id > > > > meter_id, > > > > struct ofputil_meter_config *config) > > > > { > > > > + int err; > > > > + > > > > if (probe_broken_meters(dpif_)) { > > > > return ENOMEM; > > > > } > > > > > > > > - return dpif_netlink_meter_set__(dpif_, meter_id, config); > > > > + err = dpif_netlink_meter_set__(dpif_, meter_id, config); > > > > + if (!err && netdev_is_flow_api_enabled()) { > > > > + meter_offload_set(meter_id, config); > > > > > > Although currently we always return 0, we should still account > > > for it > > > to change in the future, so we should set err to the return > > > value. > > > > > > > If there is err from meter_offload_set, it will be passed to the > > caller > > of dpif_netlink_meter_set(). I don't agree with that, because the > > caller thinks meter_set operation fail, but actually not. Besides, > > we > > allow the case that dp meter_set success, but offloading fails, so > > the > > return the error of meter_offload_set seems unnecessary. > > If this is the case, we should change the dpif_netlink_meter_set() > API to return void. > And add a comment to the function why it would not return an error: >
OK. > --- a/lib/netdev-offload.c > +++ b/lib/netdev-offload.c > @@ -196,7 +196,7 @@ netdev_assign_flow_api(struct netdev *netdev) > return -1; > } > > -int > +void > meter_offload_set(ofproto_meter_id meter_id, > struct ofputil_meter_config *config) > { > @@ -212,8 +212,8 @@ meter_offload_set(ofproto_meter_id meter_id, > } > } > } > - > - return 0; > + /* Offload APIs could fail, for example, because the offload is > not > + * supported. This is fine, as the offload API should take care > of this. */ > } > > > > > > + err = meter_offload_set(meter_id, config); > > > > > > > + } > > > > + > > <SNIP> > > > > > diff --git a/tests/system-offloads-traffic.at b/tests/system- > > > > offloads-traffic.at > > > > index 80bc1dd5c..7ec75340f 100644 > > > > --- a/tests/system-offloads-traffic.at > > > > +++ b/tests/system-offloads-traffic.at > > > > @@ -168,3 +168,51 @@ matchall > > > > ]) > > > > OVS_TRAFFIC_VSWITCHD_STOP > > > > AT_CLEANUP > > > > + > > > > +AT_SETUP([offloads - check if meter offloading ]) > > > > > > Can we replace if with interface, as I keep on reading it as > > > "if". > > > > > > > Sure. > > > > > > +AT_KEYWORDS([meter]) > > > > +AT_SKIP_IF([test $SUPPORT_TC_INGRESS_PPS = "no"]) > > > > +OVS_TRAFFIC_VSWITCHD_START() > > > > + > > > > +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:hw- > > > > offload=true]) > > > > +AT_CHECK([ovs-ofctl -O OpenFlow13 add-meter br0 'meter=1 pktps > > > > bands=type=drop rate=1']) > > > > + > > > > +ADD_NAMESPACES(at_ns0, at_ns1) > > > > +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", "f0:00:00:01:01:01") > > > > +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", "f0:00:00:01:01:02") > > > > + > > > > +NS_CHECK_EXEC([at_ns0], [ip neigh add 10.1.1.2 lladdr > > > > f0:00:00:01:01:02 dev p0]) > > > > +NS_CHECK_EXEC([at_ns1], [ip neigh add 10.1.1.1 lladdr > > > > f0:00:00:01:01:01 dev p1]) > > > > + > > > > +AT_CHECK([ovs-ofctl -O OpenFlow13 add-flow br0 > > > > "actions=normal"]) > > > > +NS_CHECK_EXEC([at_ns0], [ping -q -c 10 -i 0.1 -w 2 10.1.1.2 | > > > > FORMAT_PING], [0], [dnl > > > > +10 packets transmitted, 10 received, 0% packet loss, time 0ms > > > > +]) > > > > + > > > > +NETNS_DAEMONIZE([at_ns1], [nc -u -l 5678 > /dev/null ], > > > > [nc0.pid]) > > > > + > > > > +AT_CHECK([ovs-ofctl -O OpenFlow13 del-flows br0]) > > > > +AT_CHECK([ovs-ofctl -O OpenFlow13 add-flow br0 > > > > "priority=10,in_port=ovs-p0,udp actions=meter:1,normal"]) > > > > +AT_CHECK([ovs-ofctl -O OpenFlow13 add-flow br0 "priority=1 > > > > actions=normal"]) > > > > + > > > > +NS_CHECK_EXEC([at_ns0], [sleep 0.5; echo "mark" | nc -u > > > > 10.1.1.2 > > > > 5678 -p 6789]) > > > > > > Any specific reason why you need the sleep 0.5 here? Is it to be > > > sure > > > the flow is programmed? > > > > I remember I added this because there are failures sometimes. I > > don't > > know why, but obviously they are related to this patchset. So I > > added > > the sleep to avoid them. It's only 0.5s, should be no problem, > > right? > > I did a lot of runs, but could not get it to fail without it. So if > it fails in your case it would be good to investigate. I can't reproduce today, though I run many times. It's related to my setup, I don't test on physical machine, but a virtual machine. > > > > If so, you might just want to do a ovs-vsctl dump-flows and > > > checke > > > the output? > > > > > > > I don't understand your qustion. I checked ovs-vsctl dump-flows > > below. If I remember correctly, the used is "never", not "0.001s". > > I mean in the failure case without he 0.5 seconds, what do you get. > > > > > +AT_CHECK([ovs-appctl dpctl/dump-flows | grep "meter" | > > > > DUMP_CLEAN_SORTED], [0], [dnl > > > > +in_port(2),eth(macs),eth_type(0x0800),ipv4(proto=17,frag=no), > > > > packets:0, bytes:0, used:0.001s, actions:outputmeter(0),3 > > > > +]) > > > > > > Here you verify the DP rule is inserted, but should you not also > > > wait > > > a second to make sure the meter is reset? > > > > OK. I will add. > > > > > Because in theory your could/should sent 11 packets in 1 second, > > > so > > > 10 should be dropped!? > > > This is the case in the kernel environment, but with TC the > > > learning > > > packet is not passing trough the TC meter (this might also be a > > > corner case we need to document). > > > > > > > Yes, it's the reason. But where to document? > > Not sure where either, I guess in NEWS and maybe it’s time to add a > howto/tc-offload.rst file? > Maybe just adding comments here, to explain why it's the result. And it's not any related to "howto". > > > > > +for i in `seq 10`; do > > > > +NS_CHECK_EXEC([at_ns0], [echo "mark" | nc -u 10.1.1.2 5678 -p > > > > 6789]) > > > > +done > > > > + > > > > +AT_CHECK([ovs-appctl dpctl/dump-flows | grep "meter" | > > > > DUMP_CLEAN_SORTED], [0], [dnl > > > > +in_port(2),eth(macs),eth_type(0x0800),ipv4(proto=17,frag=no), > > > > packets:10, bytes:330, used:0.001s, actions:outputmeter(0),3 > > > > +]) > > > > + > > > > +AT_CHECK([ovs-ofctl -O OpenFlow13 meter-stats br0 | sed -e > > > > 's/duration:[[0-9]].[[0-9]]*s/duration:0.001s/'], [0], [dnl > > > > +OFPST_METER reply (OF1.3) (xid=0x2): > > > > +meter:1 flow_count:1 packet_in_count:11 byte_in_count:377 > > > > duration:0.001s bands: > > > > > > byte_in_count:517 is a lot larger than with kernel code, do we > > > know > > > why? We should document it. > > > > > > > I think DP also count the bytes of mac layer. > > Any idea why? From ovs_meter_execute() in datapath/meter.c, the packet size of band stats is added by skb->len, which should include the len of mac layer. > > > > > +0: packet_count:9 byte_count:0 > > > > > > Here in theory we should report byte_count, like this: > > > > > > 0: packet_count:10 byte_count:470 > > > > > > However, TC does not support dropped byte count, only > > > packet_count. > > > So we should be ok for now, but we must add this limitation to > > > the > > > documentation somewhere that it's clear we will not report byte > > > count > > > with TC offload. > > > > > > If you run this test without HW offload enabled you can see the > > > difference in behavior, and I think there should be none (or at > > > least > > > the corner cases should be documented). > > > You could also add a "- offloads disabled" variant of this test, > > > like > > > other features do and add some additional reasoning why it's > > > different there. > > > > > > > Sure, I will add. > > Guess this could go to the same howto/tc-offload.rst in the > limitations section? > Maybe also add comments here, don't waste time to find info from the other document :) > > > > > > +]) > > > > + > > > > +OVS_TRAFFIC_VSWITCHD_STOP > > > > +AT_CLEANUP > > > > -- > > > > 2.26.2 > > > > > > This concludes my review of v5. > > > > > > //Eelco > > > > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev