On 27 Jun 2022, at 16:58, Jianbo Liu wrote:
> On Mon, 2022-06-27 at 11:32 +0200, Eelco Chaudron wrote: >> >> >> On 21 Jun 2022, at 10:22, Jianbo Liu wrote: >> >>> On Mon, 2022-06-20 at 12:18 +0200, Eelco Chaudron wrote: >>>> On 27 May 2022, at 11:00, Jianbo Liu wrote: >>>> >>>>> OVS meters are created in advance and openflow rules refer to >>>>> them >>>>> by >>>>> their unique ID. New tc_police API is used to offload them. By >>>>> calling >>>>> the API, police actions are created and meters are mapped to >>>>> them. >>>>> These actions then can be used in tc filter rules by the index. >>>>> >>>>> Signed-off-by: Jianbo Liu <jian...@nvidia.com> >>>>> --- >>>>> NEWS | 2 ++ >>>>> lib/dpif-netlink.c | 31 +++++++++++++++++---- >>>>> tests/system-offloads-traffic.at | 48 >>>>> ++++++++++++++++++++++++++++++++ >>>>> 3 files changed, 76 insertions(+), 5 deletions(-) >>>>> >>>>> diff --git a/NEWS b/NEWS >>>>> index eece0d0b2..dfd659d4e 100644 >>>>> --- a/NEWS >>>>> +++ b/NEWS >>>>> @@ -29,6 +29,8 @@ Post-v2.17.0 >>>>> - Windows: >>>>> * Conntrack support for TCPv6, UDPv6, ICMPv6, FTPv6. >>>>> * IPv6 Geneve tunnel support. >>>>> + - Linux datapath: >>>>> + * Add offloading meter tc police. >>>>> >>>>> >>>>> v2.17.0 - 17 Feb 2022 >>>>> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c >>>>> index 06e1e8ca0..0af9ee77e 100644 >>>>> --- a/lib/dpif-netlink.c >>>>> +++ b/lib/dpif-netlink.c >>>>> @@ -4163,11 +4163,18 @@ static int >>>>> dpif_netlink_meter_set(struct dpif *dpif_, ofproto_meter_id >>>>> meter_id, >>>>> struct ofputil_meter_config *config) >>>>> { >>>>> + int err; >>>>> + >>>>> if (probe_broken_meters(dpif_)) { >>>>> return ENOMEM; >>>>> } >>>>> >>>>> - return dpif_netlink_meter_set__(dpif_, meter_id, config); >>>>> + err = dpif_netlink_meter_set__(dpif_, meter_id, config); >>>>> + if (!err && netdev_is_flow_api_enabled()) { >>>>> + meter_offload_set(meter_id, config); >>>> >>>> Although currently we always return 0, we should still account >>>> for it >>>> to change in the future, so we should set err to the return >>>> value. >>>> >>> >>> If there is err from meter_offload_set, it will be passed to the >>> caller >>> of dpif_netlink_meter_set(). I don't agree with that, because the >>> caller thinks meter_set operation fail, but actually not. Besides, >>> we >>> allow the case that dp meter_set success, but offloading fails, so >>> the >>> return the error of meter_offload_set seems unnecessary. >> >> If this is the case, we should change the dpif_netlink_meter_set() >> API to return void. >> And add a comment to the function why it would not return an error: >> > > OK. > >> --- a/lib/netdev-offload.c >> +++ b/lib/netdev-offload.c >> @@ -196,7 +196,7 @@ netdev_assign_flow_api(struct netdev *netdev) >> return -1; >> } >> >> -int >> +void >> meter_offload_set(ofproto_meter_id meter_id, >> struct ofputil_meter_config *config) >> { >> @@ -212,8 +212,8 @@ meter_offload_set(ofproto_meter_id meter_id, >> } >> } >> } >> - >> - return 0; >> + /* Offload APIs could fail, for example, because the offload is >> not >> + * supported. This is fine, as the offload API should take care >> of this. */ >> } >> >> >> >>>> + err = meter_offload_set(meter_id, config); >>>> >>>>> + } >>>>> + >> >> <SNIP> >> >>>>> diff --git a/tests/system-offloads-traffic.at b/tests/system- >>>>> offloads-traffic.at >>>>> index 80bc1dd5c..7ec75340f 100644 >>>>> --- a/tests/system-offloads-traffic.at >>>>> +++ b/tests/system-offloads-traffic.at >>>>> @@ -168,3 +168,51 @@ matchall >>>>> ]) >>>>> OVS_TRAFFIC_VSWITCHD_STOP >>>>> AT_CLEANUP >>>>> + >>>>> +AT_SETUP([offloads - check if meter offloading ]) >>>> >>>> Can we replace if with interface, as I keep on reading it as >>>> "if". >>>> >>> >>> Sure. >>> >>>>> +AT_KEYWORDS([meter]) >>>>> +AT_SKIP_IF([test $SUPPORT_TC_INGRESS_PPS = "no"]) >>>>> +OVS_TRAFFIC_VSWITCHD_START() >>>>> + >>>>> +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:hw- >>>>> offload=true]) >>>>> +AT_CHECK([ovs-ofctl -O OpenFlow13 add-meter br0 'meter=1 pktps >>>>> bands=type=drop rate=1']) >>>>> + >>>>> +ADD_NAMESPACES(at_ns0, at_ns1) >>>>> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", "f0:00:00:01:01:01") >>>>> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", "f0:00:00:01:01:02") >>>>> + >>>>> +NS_CHECK_EXEC([at_ns0], [ip neigh add 10.1.1.2 lladdr >>>>> f0:00:00:01:01:02 dev p0]) >>>>> +NS_CHECK_EXEC([at_ns1], [ip neigh add 10.1.1.1 lladdr >>>>> f0:00:00:01:01:01 dev p1]) >>>>> + >>>>> +AT_CHECK([ovs-ofctl -O OpenFlow13 add-flow br0 >>>>> "actions=normal"]) >>>>> +NS_CHECK_EXEC([at_ns0], [ping -q -c 10 -i 0.1 -w 2 10.1.1.2 | >>>>> FORMAT_PING], [0], [dnl >>>>> +10 packets transmitted, 10 received, 0% packet loss, time 0ms >>>>> +]) >>>>> + >>>>> +NETNS_DAEMONIZE([at_ns1], [nc -u -l 5678 > /dev/null ], >>>>> [nc0.pid]) >>>>> + >>>>> +AT_CHECK([ovs-ofctl -O OpenFlow13 del-flows br0]) >>>>> +AT_CHECK([ovs-ofctl -O OpenFlow13 add-flow br0 >>>>> "priority=10,in_port=ovs-p0,udp actions=meter:1,normal"]) >>>>> +AT_CHECK([ovs-ofctl -O OpenFlow13 add-flow br0 "priority=1 >>>>> actions=normal"]) >>>>> + >>>>> +NS_CHECK_EXEC([at_ns0], [sleep 0.5; echo "mark" | nc -u >>>>> 10.1.1.2 >>>>> 5678 -p 6789]) >>>> >>>> Any specific reason why you need the sleep 0.5 here? Is it to be >>>> sure >>>> the flow is programmed? >>> >>> I remember I added this because there are failures sometimes. I >>> don't >>> know why, but obviously they are related to this patchset. So I >>> added >>> the sleep to avoid them. It's only 0.5s, should be no problem, >>> right? >> >> I did a lot of runs, but could not get it to fail without it. So if >> it fails in your case it would be good to investigate. > > I can't reproduce today, though I run many times. It's related to my > setup, I don't test on physical machine, but a virtual machine. > >> >>>> If so, you might just want to do a ovs-vsctl dump-flows and >>>> checke >>>> the output? >>>> >>> >>> I don't understand your qustion. I checked ovs-vsctl dump-flows >>> below. > > If I remember correctly, the used is "never", not "0.001s". Forgot to answer this ;) The kernel always reports “never” for this test case, that’s why I suggested the offload disabled test to see all the differences and understand why. >> >> I mean in the failure case without he 0.5 seconds, what do you get. >> >>>>> +AT_CHECK([ovs-appctl dpctl/dump-flows | grep "meter" | >>>>> DUMP_CLEAN_SORTED], [0], [dnl >>>>> +in_port(2),eth(macs),eth_type(0x0800),ipv4(proto=17,frag=no), >>>>> packets:0, bytes:0, used:0.001s, actions:outputmeter(0),3 >>>>> +]) >>>> >>>> Here you verify the DP rule is inserted, but should you not also >>>> wait >>>> a second to make sure the meter is reset? >>> >>> OK. I will add. >>> >>>> Because in theory your could/should sent 11 packets in 1 second, >>>> so >>>> 10 should be dropped!? >>>> This is the case in the kernel environment, but with TC the >>>> learning >>>> packet is not passing trough the TC meter (this might also be a >>>> corner case we need to document). >>>> >>> >>> Yes, it's the reason. But where to document? >> >> Not sure where either, I guess in NEWS and maybe it’s time to add a >> howto/tc-offload.rst file? >> > > Maybe just adding comments here, to explain why it's the result. And > it's not any related to "howto". > >> >>>>> +for i in `seq 10`; do >>>>> +NS_CHECK_EXEC([at_ns0], [echo "mark" | nc -u 10.1.1.2 5678 -p >>>>> 6789]) >>>>> +done >>>>> + >>>>> +AT_CHECK([ovs-appctl dpctl/dump-flows | grep "meter" | >>>>> DUMP_CLEAN_SORTED], [0], [dnl >>>>> +in_port(2),eth(macs),eth_type(0x0800),ipv4(proto=17,frag=no), >>>>> packets:10, bytes:330, used:0.001s, actions:outputmeter(0),3 >>>>> +]) >>>>> + >>>>> +AT_CHECK([ovs-ofctl -O OpenFlow13 meter-stats br0 | sed -e >>>>> 's/duration:[[0-9]].[[0-9]]*s/duration:0.001s/'], [0], [dnl >>>>> +OFPST_METER reply (OF1.3) (xid=0x2): >>>>> +meter:1 flow_count:1 packet_in_count:11 byte_in_count:377 >>>>> duration:0.001s bands: >>>> >>>> byte_in_count:517 is a lot larger than with kernel code, do we >>>> know >>>> why? We should document it. >>>> >>> >>> I think DP also count the bytes of mac layer. >> >> Any idea why? > > From ovs_meter_execute() in datapath/meter.c, the packet size of band > stats is added by skb->len, which should include the len of mac layer. > >> >>>>> +0: packet_count:9 byte_count:0 >>>> >>>> Here in theory we should report byte_count, like this: >>>> >>>> 0: packet_count:10 byte_count:470 >>>> >>>> However, TC does not support dropped byte count, only >>>> packet_count. >>>> So we should be ok for now, but we must add this limitation to >>>> the >>>> documentation somewhere that it's clear we will not report byte >>>> count >>>> with TC offload. >>>> >>>> If you run this test without HW offload enabled you can see the >>>> difference in behavior, and I think there should be none (or at >>>> least >>>> the corner cases should be documented). >>>> You could also add a "- offloads disabled" variant of this test, >>>> like >>>> other features do and add some additional reasoning why it's >>>> different there. >>>> >>> >>> Sure, I will add. >> >> Guess this could go to the same howto/tc-offload.rst in the >> limitations section? >> > > Maybe also add comments here, don't waste time to find info from the > other document :) > >>> >>>>> +]) >>>>> + >>>>> +OVS_TRAFFIC_VSWITCHD_STOP >>>>> +AT_CLEANUP >>>>> -- >>>>> 2.26.2 >>>> >>>> This concludes my review of v5. >>>> >>>> //Eelco >>>> >> _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev