On 2 May 2024, at 14:46, Ilya Maximets wrote:

> On 4/29/24 16:48, Eelco Chaudron wrote:
>> While offloading header modifications to TC, OVS is using {TCA_PEDIT} +
>> {TCA_CSUM} combination as that it the only way to represent header
>> rewrite.  However, {TCA_CSUM} is unable to calculate L4 checksums for
>> IP fragments.
>>
>> Since TC already applies fragmentation bit masking, this patch simply
>> needs to prevent these packets from being processed through TC.
>>
>> Signed-off-by: Eelco Chaudron <echau...@redhat.com>
>> ---
>
> Thanks, Eelco.  Beside the avx512 failure, see a few comments below.
>
> Should this also have a Fixes tag?
>
>>  lib/netdev-offload-tc.c | 32 +++++++++++++++++++++
>>  tests/system-traffic.at | 62 +++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 94 insertions(+)
>>
>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>> index 921d52317..7e915d419 100644
>> --- a/lib/netdev-offload-tc.c
>> +++ b/lib/netdev-offload-tc.c
>> @@ -1488,6 +1488,30 @@ parse_put_flow_ct_action(struct tc_flower *flower,
>>          return 0;
>>  }
>>
>> +static bool
>> +will_tc_add_l4_checksum(struct tc_flower *flower, int type)
>> +{
>> +    /* This function returns true if the tc layer will add a l4 checksum 
>> action
>> +     * for this set action. Refer to the csum_update_flag() function for
>> +     * detailed logic. Note that even the kernel only supports updating TCP,
>> +     * UDP and ICMPv6. */
>
> Nit: double spaces between sentences.

Thanks for reminding me. I keep forgetting this...

> We should add a similar comment to the scum_update_flag() prompting
> anyone changing this function to update this one as well.

ACK.

> Ideally we would just fail in csum_update_flag() itself to avoid the
> logic duplication in different places.  Can it be done or is it a
> performance concern?

As we verify the validity of the offload at the netdev layer, I decided to do 
it here, i.e. before we actually attempt to install the tc rule. Also once we 
are in tc_replace_flower() we do a lot of pre-processing before we come to the 
conclusion that we can not offload the flow.

So my preference is to keep it as is, with adding the additional comment.

>> +    switch (type) {
>> +    case OVS_KEY_ATTR_IPV4:
>> +    case OVS_KEY_ATTR_IPV6:
>> +    case OVS_KEY_ATTR_TCP:
>> +    case OVS_KEY_ATTR_UDP:
>> +        switch (flower->key.ip_proto) {
>> +        case IPPROTO_TCP:
>> +        case IPPROTO_UDP:
>> +        case IPPROTO_ICMPV6:
>> +        case IPPROTO_UDPLITE:
>> +            return true;
>> +        }
>> +        break;
>> +    }
>> +    return false;
>> +}
>> +
>>  static int
>>  parse_put_flow_set_masked_action(struct tc_flower *flower,
>>                                   struct tc_action *action,
>> @@ -1520,6 +1544,14 @@ parse_put_flow_set_masked_action(struct tc_flower 
>> *flower,
>>          return EOPNOTSUPP;
>>      }
>>
>> +    if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT
>> +        && will_tc_add_l4_checksum(flower, type)) {
>> +        VLOG_DBG_RL(&rl, "set action type %d not supported on fragments "
>> +                    "due to checksum limitation", type);
>> +        ofpbuf_uninit(&set_buf);
>> +        return EOPNOTSUPP;
>> +    }
>> +
>>      for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) {
>>          struct netlink_field *f = &set_flower_map[type][i];
>>
>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
>> index bd7647cbe..d06d2f66a 100644
>> --- a/tests/system-traffic.at
>> +++ b/tests/system-traffic.at
>> @@ -8977,3 +8977,65 @@ OVS_WAIT_UNTIL([cat p2.pcap | grep -E "0x0050: *0000 
>> *0000 *5002 *2000 *b85e *00
>>
>>  OVS_TRAFFIC_VSWITCHD_STOP
>>  AT_CLEANUP
>> +
>> +
>
> One line is enough.

Ack
>> +AT_SETUP([datapath - IP mod_nw_src/set_field on fragments])
>
> Should 'IP' be glued to 'fragments' instead?

ACK

>> +AT_SKIP_IF([test $HAVE_TCPDUMP = no])
>> +OVS_TRAFFIC_VSWITCHD_START()
>> +
>> +ADD_NAMESPACES(at_ns0, at_ns1)
>> +
>> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", 36:b1:ee:7c:01:03)
>> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", 36:b1:ee:7c:01:02)
>> +
>> +AT_DATA([flows.txt], [dnl
>> +  in_port=ovs-p0,ip,nw_src=10.1.1.1 actions=mod_nw_src=11.1.1.1,ovs-p1
>> +  in_port=ovs-p0,ipv6,ipv6_src=fc00::1 
>> actions=set_field:fc00::100->ipv6_src,ovs-p1
>> +])
>> +
>> +AT_CHECK([ovs-ofctl del-flows br0])
>> +AT_CHECK([ovs-ofctl -Oopenflow13 add-flows br0 flows.txt])
>> +
>> +NETNS_DAEMONIZE([at_ns1], [tcpdump -l -nn -xx -U -i p1 > p1.out],
>> +  [tcpdump.pid])
>> +sleep 1
>
> Instead of sleeping we should wait for tcpdump to start listening.
> Look for the following check in some other tests:
>
> OVS_WAIT_UNTIL([grep "listening" tcpdump0_err])

ACK

>> +
>> +dnl We send each packet multiple times, ones for learning which will flow
>> +dnl through the used datapath for learning, and the others will go through 
>> the
>> +dnl actuall datapath.
>
> typo: actual

ACK

>> +for i in 1 2 3 4 5; do
>> +  NS_CHECK_EXEC([at_ns0], [$PYTHON3 $srcdir/sendpkt.py p0 \
>> +    36 b1 ee 7c 01 02 36 b1 ee 7c 01 03 08 00 45 00 00 26 00 01 20 00 40 11 
>> \
>> +    44 c2 0a 01 01 01 0a 01 01 02 0b c4 08 84 00 26 e9 64 01 02 03 04 05 06 
>> \
>> +    07 08 09 0a > /dev/null])
>
> Ideally we would use compose-packet for this, but it can't generate such
> packets today, so it's fine to just plain-code the packet.  But, please,
> add a description on what this packet is.  It's very inconvenient to
> copy-paste this into some packet decoder.

Will add a decode.

>> +done
>> +
>> +OVS_WAIT_UNTIL([test $(grep -c -E \
>> +  "0x0000:  *36b1 *ee7c *0102 *36b1 *ee7c *0103 *0800 *4500" p1.out) -eq 5])
>> +OVS_WAIT_UNTIL([test $(grep -c -E \
>> +  "0x0010:  *0026 *0001 *2000 *4011 *43c2 *0b01 *0101 *0a01" p1.out) -eq 5])
>> +OVS_WAIT_UNTIL([test $(grep -c -E \
>> +  "0x0020:  *0102 *0bc4 *0884 *0026 *e864 *0102 *0304 *0506" p1.out) -eq 5])
>
> Maybe it's better to use pcap output on tcpdump and compare with ovs-pcap?
> Maybe also split the hex in parts like we do in tunnel tests here:
>  
> https://github.com/openvswitch/ovs/blob/bd8e9f48f180800292c10e12f26824833f18506a/tests/tunnel-push-pop.at#L834

Good idea, will sent a v2 with this change.

> Same for the IPv6 check below.
>
>> +
>> +dnl Repeat similar test with IPv6.
>> +for i in 1 2 3 4 5; do
>> +  NS_CHECK_EXEC([at_ns0], [$PYTHON3 $srcdir/sendpkt.py p0 \
>> +    36 b1 ee 7c 01 02 36 b1 ee 7c 01 03 86 dd 60 00 00 00 00 18 2c 40 fc 00 
>> \
>> +    00 00 00 00 00 00 00 00 00 00 00 00 00 01 fc 00 00 00 00 00 00 00 00 00 
>> \
>> +    00 00 00 00 00 02 11 00 00 01 23 16 ab 36 0b c4 08 84 00 26 07 65 01 02 
>> \
>> +    03 04 05 06 07 08 > /dev/null])
>> +done
>> +
>> +OVS_WAIT_UNTIL([test $(grep -c -E \
>> +  "0x0000:  *36b1 *ee7c *0102 *36b1 *ee7c *0103 *86dd *6000" p1.out) -eq 5])
>> +OVS_WAIT_UNTIL([test $(grep -c -E \
>> +  "0x0010:  *0000 *0018 *2c40 *fc00 *0000 *0000 *0000 *0000" p1.out) -eq 5])
>> +OVS_WAIT_UNTIL([test $(grep -c -E \
>> +  "0x0020:  *0000 *0000 *0100 *fc00 *0000 *0000 *0000 *0000" p1.out) -eq 5])
>> +OVS_WAIT_UNTIL([test $(grep -c -E \
>> +  "0x0030:  *0000 *0000 *0002 *1100 *0001 *2316 *ab36 *0bc4" p1.out) -eq 5])
>> +OVS_WAIT_UNTIL([test $(grep -c -E \
>> +  "0x0040:  *0884 *0026 *0666 *0102 *0304 *0506 *0708" p1.out) -eq 5])
>> +
>> +OVS_TRAFFIC_VSWITCHD_STOP
>> +AT_CLEANUP
>
> Best regards, Ilya Maximets.

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to