On 2 May 2024, at 14:46, Ilya Maximets wrote:
> On 4/29/24 16:48, Eelco Chaudron wrote:
>> While offloading header modifications to TC, OVS is using {TCA_PEDIT} +
>> {TCA_CSUM} combination as that it the only way to represent header
>> rewrite. However, {TCA_CSUM} is unable to calculate L4 checksums for
>> IP fragments.
>>
>> Since TC already applies fragmentation bit masking, this patch simply
>> needs to prevent these packets from being processed through TC.
>>
>> Signed-off-by: Eelco Chaudron <echau...@redhat.com>
>> ---
>
> Thanks, Eelco. Beside the avx512 failure, see a few comments below.
>
> Should this also have a Fixes tag?
>
>> lib/netdev-offload-tc.c | 32 +++++++++++++++++++++
>> tests/system-traffic.at | 62 +++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 94 insertions(+)
>>
>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>> index 921d52317..7e915d419 100644
>> --- a/lib/netdev-offload-tc.c
>> +++ b/lib/netdev-offload-tc.c
>> @@ -1488,6 +1488,30 @@ parse_put_flow_ct_action(struct tc_flower *flower,
>> return 0;
>> }
>>
>> +static bool
>> +will_tc_add_l4_checksum(struct tc_flower *flower, int type)
>> +{
>> + /* This function returns true if the tc layer will add a l4 checksum
>> action
>> + * for this set action. Refer to the csum_update_flag() function for
>> + * detailed logic. Note that even the kernel only supports updating TCP,
>> + * UDP and ICMPv6. */
>
> Nit: double spaces between sentences.
Thanks for reminding me. I keep forgetting this...
> We should add a similar comment to the scum_update_flag() prompting
> anyone changing this function to update this one as well.
ACK.
> Ideally we would just fail in csum_update_flag() itself to avoid the
> logic duplication in different places. Can it be done or is it a
> performance concern?
As we verify the validity of the offload at the netdev layer, I decided to do
it here, i.e. before we actually attempt to install the tc rule. Also once we
are in tc_replace_flower() we do a lot of pre-processing before we come to the
conclusion that we can not offload the flow.
So my preference is to keep it as is, with adding the additional comment.
>> + switch (type) {
>> + case OVS_KEY_ATTR_IPV4:
>> + case OVS_KEY_ATTR_IPV6:
>> + case OVS_KEY_ATTR_TCP:
>> + case OVS_KEY_ATTR_UDP:
>> + switch (flower->key.ip_proto) {
>> + case IPPROTO_TCP:
>> + case IPPROTO_UDP:
>> + case IPPROTO_ICMPV6:
>> + case IPPROTO_UDPLITE:
>> + return true;
>> + }
>> + break;
>> + }
>> + return false;
>> +}
>> +
>> static int
>> parse_put_flow_set_masked_action(struct tc_flower *flower,
>> struct tc_action *action,
>> @@ -1520,6 +1544,14 @@ parse_put_flow_set_masked_action(struct tc_flower
>> *flower,
>> return EOPNOTSUPP;
>> }
>>
>> + if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT
>> + && will_tc_add_l4_checksum(flower, type)) {
>> + VLOG_DBG_RL(&rl, "set action type %d not supported on fragments "
>> + "due to checksum limitation", type);
>> + ofpbuf_uninit(&set_buf);
>> + return EOPNOTSUPP;
>> + }
>> +
>> for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) {
>> struct netlink_field *f = &set_flower_map[type][i];
>>
>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
>> index bd7647cbe..d06d2f66a 100644
>> --- a/tests/system-traffic.at
>> +++ b/tests/system-traffic.at
>> @@ -8977,3 +8977,65 @@ OVS_WAIT_UNTIL([cat p2.pcap | grep -E "0x0050: *0000
>> *0000 *5002 *2000 *b85e *00
>>
>> OVS_TRAFFIC_VSWITCHD_STOP
>> AT_CLEANUP
>> +
>> +
>
> One line is enough.
Ack
>> +AT_SETUP([datapath - IP mod_nw_src/set_field on fragments])
>
> Should 'IP' be glued to 'fragments' instead?
ACK
>> +AT_SKIP_IF([test $HAVE_TCPDUMP = no])
>> +OVS_TRAFFIC_VSWITCHD_START()
>> +
>> +ADD_NAMESPACES(at_ns0, at_ns1)
>> +
>> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", 36:b1:ee:7c:01:03)
>> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", 36:b1:ee:7c:01:02)
>> +
>> +AT_DATA([flows.txt], [dnl
>> + in_port=ovs-p0,ip,nw_src=10.1.1.1 actions=mod_nw_src=11.1.1.1,ovs-p1
>> + in_port=ovs-p0,ipv6,ipv6_src=fc00::1
>> actions=set_field:fc00::100->ipv6_src,ovs-p1
>> +])
>> +
>> +AT_CHECK([ovs-ofctl del-flows br0])
>> +AT_CHECK([ovs-ofctl -Oopenflow13 add-flows br0 flows.txt])
>> +
>> +NETNS_DAEMONIZE([at_ns1], [tcpdump -l -nn -xx -U -i p1 > p1.out],
>> + [tcpdump.pid])
>> +sleep 1
>
> Instead of sleeping we should wait for tcpdump to start listening.
> Look for the following check in some other tests:
>
> OVS_WAIT_UNTIL([grep "listening" tcpdump0_err])
ACK
>> +
>> +dnl We send each packet multiple times, ones for learning which will flow
>> +dnl through the used datapath for learning, and the others will go through
>> the
>> +dnl actuall datapath.
>
> typo: actual
ACK
>> +for i in 1 2 3 4 5; do
>> + NS_CHECK_EXEC([at_ns0], [$PYTHON3 $srcdir/sendpkt.py p0 \
>> + 36 b1 ee 7c 01 02 36 b1 ee 7c 01 03 08 00 45 00 00 26 00 01 20 00 40 11
>> \
>> + 44 c2 0a 01 01 01 0a 01 01 02 0b c4 08 84 00 26 e9 64 01 02 03 04 05 06
>> \
>> + 07 08 09 0a > /dev/null])
>
> Ideally we would use compose-packet for this, but it can't generate such
> packets today, so it's fine to just plain-code the packet. But, please,
> add a description on what this packet is. It's very inconvenient to
> copy-paste this into some packet decoder.
Will add a decode.
>> +done
>> +
>> +OVS_WAIT_UNTIL([test $(grep -c -E \
>> + "0x0000: *36b1 *ee7c *0102 *36b1 *ee7c *0103 *0800 *4500" p1.out) -eq 5])
>> +OVS_WAIT_UNTIL([test $(grep -c -E \
>> + "0x0010: *0026 *0001 *2000 *4011 *43c2 *0b01 *0101 *0a01" p1.out) -eq 5])
>> +OVS_WAIT_UNTIL([test $(grep -c -E \
>> + "0x0020: *0102 *0bc4 *0884 *0026 *e864 *0102 *0304 *0506" p1.out) -eq 5])
>
> Maybe it's better to use pcap output on tcpdump and compare with ovs-pcap?
> Maybe also split the hex in parts like we do in tunnel tests here:
>
> https://github.com/openvswitch/ovs/blob/bd8e9f48f180800292c10e12f26824833f18506a/tests/tunnel-push-pop.at#L834
Good idea, will sent a v2 with this change.
> Same for the IPv6 check below.
>
>> +
>> +dnl Repeat similar test with IPv6.
>> +for i in 1 2 3 4 5; do
>> + NS_CHECK_EXEC([at_ns0], [$PYTHON3 $srcdir/sendpkt.py p0 \
>> + 36 b1 ee 7c 01 02 36 b1 ee 7c 01 03 86 dd 60 00 00 00 00 18 2c 40 fc 00
>> \
>> + 00 00 00 00 00 00 00 00 00 00 00 00 00 01 fc 00 00 00 00 00 00 00 00 00
>> \
>> + 00 00 00 00 00 02 11 00 00 01 23 16 ab 36 0b c4 08 84 00 26 07 65 01 02
>> \
>> + 03 04 05 06 07 08 > /dev/null])
>> +done
>> +
>> +OVS_WAIT_UNTIL([test $(grep -c -E \
>> + "0x0000: *36b1 *ee7c *0102 *36b1 *ee7c *0103 *86dd *6000" p1.out) -eq 5])
>> +OVS_WAIT_UNTIL([test $(grep -c -E \
>> + "0x0010: *0000 *0018 *2c40 *fc00 *0000 *0000 *0000 *0000" p1.out) -eq 5])
>> +OVS_WAIT_UNTIL([test $(grep -c -E \
>> + "0x0020: *0000 *0000 *0100 *fc00 *0000 *0000 *0000 *0000" p1.out) -eq 5])
>> +OVS_WAIT_UNTIL([test $(grep -c -E \
>> + "0x0030: *0000 *0000 *0002 *1100 *0001 *2316 *ab36 *0bc4" p1.out) -eq 5])
>> +OVS_WAIT_UNTIL([test $(grep -c -E \
>> + "0x0040: *0884 *0026 *0666 *0102 *0304 *0506 *0708" p1.out) -eq 5])
>> +
>> +OVS_TRAFFIC_VSWITCHD_STOP
>> +AT_CLEANUP
>
> Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev