Re: [ovs-dev] [PATCH v2] netdev-tc-offloads: Don't offload header modification on ip fragments.
On 2 Jul 2024, at 15:38, Ilya Maximets wrote: > On 6/27/24 15:03, Eelco Chaudron wrote: >> >> >> On 4 Jun 2024, at 13:52, Ilya Maximets wrote: >> >>> On 6/4/24 13:42, Eelco Chaudron wrote: On 1 Jun 2024, at 0:08, Ilya Maximets wrote: > On 5/7/24 15:52, Eelco Chaudron wrote: >> While offloading header modifications to TC, OVS is using {TCA_PEDIT} + >> {TCA_CSUM} combination as that it the only way to represent header >> rewrite. However, {TCA_CSUM} is unable to calculate L4 checksums for >> IP fragments. >> >> Since TC already applies fragmentation bit masking, this patch simply >> needs to prevent these packets from being processed through TC. >> >> Signed-off-by: Eelco Chaudron >> --- >> v2: - Fixed and added some comments. >> - Use ovs-pcap to compare packets. >> >> NOTE: This patch needs an AVX512 fix before it can be applied. >> Intel is working on this. >> --- >> lib/netdev-offload-tc.c | 32 ++ >> lib/tc.c| 5 ++- >> tests/system-traffic.at | 93 + >> 3 files changed, 129 insertions(+), 1 deletion(-) >> >> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c >> index 921d52317..bdd307933 100644 >> --- a/lib/netdev-offload-tc.c >> +++ b/lib/netdev-offload-tc.c >> @@ -1488,6 +1488,30 @@ parse_put_flow_ct_action(struct tc_flower *flower, >> return 0; >> } >> >> +static bool >> +will_tc_add_l4_checksum(struct tc_flower *flower, int type) >> +{ >> +/* This function returns true if the tc layer will add a l4 >> checksum action >> + * for this set action. Refer to the csum_update_flag() function >> for >> + * detailed logic. Note that even the kernel only supports >> updating TCP, >> + * UDP and ICMPv6. */ > > This comment should be outside of the function, I think. It's strange > to have it here. I see csum_update_flag() has a comment inside, but > that's strange as well. That function has other style issues as well, > there is no need to copy them. ACK, will clean this up in the next rev. >> +switch (type) { >> +case OVS_KEY_ATTR_IPV4: >> +case OVS_KEY_ATTR_IPV6: >> +case OVS_KEY_ATTR_TCP: >> +case OVS_KEY_ATTR_UDP: >> +switch (flower->key.ip_proto) { >> +case IPPROTO_TCP: >> +case IPPROTO_UDP: >> +case IPPROTO_ICMPV6: >> +case IPPROTO_UDPLITE: >> +return true; >> +} >> +break; >> +} >> +return false; >> +} >> + >> static int >> parse_put_flow_set_masked_action(struct tc_flower *flower, >> struct tc_action *action, >> @@ -1520,6 +1544,14 @@ parse_put_flow_set_masked_action(struct tc_flower >> *flower, >> return EOPNOTSUPP; >> } >> >> +if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT > > We're using this field to make an offloading decision. We must > also set in the mask. If for some reason we're not matching on > the fragment bits, we may first receive a non-fragmented packet > and offload it, then fragmented traffic may match and fail the > checksumming. So, we need to enable the bit in the mask. The dp always matches on the fragment bit for IPv4 packets, I did some tests with this. So if we sent a non-fragment packet first the dp rule will match on fragment = 0. Or did I miss something? >>> >>> It is true today, but nothing ensures that on the netdev-offload-tc level. >>> Moreover, the netdev-offload-tc is written in a way that it expects the >>> frag bits to potentially not be in the mask: >>> https://github.com/openvswitch/ovs/blob/1d681ffe3b208a0db4945b6389142ab18404a4d1/lib/netdev-offload-tc.c#L2432-L2448 >>> So, it is going to be internally inconsistent if we do not set the bit >>> explicitly. >>> >>> And if someday we'll stop adding the frag bit, we'll never know if we >>> forget to set it in netdev-offload-tc. At the very least we'll need an >>> assertion that it is set. But having an assertion will still be internally >>> inconsistent with the code linked above. So, it may be better to just fix >>> it instead anyway. >> >> I finally got some time to look at the code, and I can add the below abort() >> in case OVS decides to no longer mask out the frag bits by default. >> >> However, I have no clear idea how to make this work if we ever stop adding >> this frag mask. I don't think we can report back to request a wider mask or >> track what exists and force a revalidate. >> >> @@ -2447,6 +2479,8 @@ netdev_tc_flow_put(struct netdev *netdev, struct match >> *match, >> } >> >> mask->nw_frag = 0; >> +} else {
Re: [ovs-dev] [PATCH v2] netdev-tc-offloads: Don't offload header modification on ip fragments.
On 6/27/24 15:03, Eelco Chaudron wrote: > > > On 4 Jun 2024, at 13:52, Ilya Maximets wrote: > >> On 6/4/24 13:42, Eelco Chaudron wrote: >>> >>> >>> On 1 Jun 2024, at 0:08, Ilya Maximets wrote: >>> On 5/7/24 15:52, Eelco Chaudron wrote: > While offloading header modifications to TC, OVS is using {TCA_PEDIT} + > {TCA_CSUM} combination as that it the only way to represent header > rewrite. However, {TCA_CSUM} is unable to calculate L4 checksums for > IP fragments. > > Since TC already applies fragmentation bit masking, this patch simply > needs to prevent these packets from being processed through TC. > > Signed-off-by: Eelco Chaudron > --- > v2: - Fixed and added some comments. > - Use ovs-pcap to compare packets. > > NOTE: This patch needs an AVX512 fix before it can be applied. > Intel is working on this. > --- > lib/netdev-offload-tc.c | 32 ++ > lib/tc.c| 5 ++- > tests/system-traffic.at | 93 + > 3 files changed, 129 insertions(+), 1 deletion(-) > > diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c > index 921d52317..bdd307933 100644 > --- a/lib/netdev-offload-tc.c > +++ b/lib/netdev-offload-tc.c > @@ -1488,6 +1488,30 @@ parse_put_flow_ct_action(struct tc_flower *flower, > return 0; > } > > +static bool > +will_tc_add_l4_checksum(struct tc_flower *flower, int type) > +{ > +/* This function returns true if the tc layer will add a l4 checksum > action > + * for this set action. Refer to the csum_update_flag() function for > + * detailed logic. Note that even the kernel only supports updating > TCP, > + * UDP and ICMPv6. */ This comment should be outside of the function, I think. It's strange to have it here. I see csum_update_flag() has a comment inside, but that's strange as well. That function has other style issues as well, there is no need to copy them. >>> >>> ACK, will clean this up in the next rev. >>> > +switch (type) { > +case OVS_KEY_ATTR_IPV4: > +case OVS_KEY_ATTR_IPV6: > +case OVS_KEY_ATTR_TCP: > +case OVS_KEY_ATTR_UDP: > +switch (flower->key.ip_proto) { > +case IPPROTO_TCP: > +case IPPROTO_UDP: > +case IPPROTO_ICMPV6: > +case IPPROTO_UDPLITE: > +return true; > +} > +break; > +} > +return false; > +} > + > static int > parse_put_flow_set_masked_action(struct tc_flower *flower, > struct tc_action *action, > @@ -1520,6 +1544,14 @@ parse_put_flow_set_masked_action(struct tc_flower > *flower, > return EOPNOTSUPP; > } > > +if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT We're using this field to make an offloading decision. We must also set in the mask. If for some reason we're not matching on the fragment bits, we may first receive a non-fragmented packet and offload it, then fragmented traffic may match and fail the checksumming. So, we need to enable the bit in the mask. >>> >>> The dp always matches on the fragment bit for IPv4 packets, I did some >>> tests with this. >>> So if we sent a non-fragment packet first the dp rule will match on >>> fragment = 0. Or >>> did I miss something? >> >> It is true today, but nothing ensures that on the netdev-offload-tc level. >> Moreover, the netdev-offload-tc is written in a way that it expects the >> frag bits to potentially not be in the mask: >> >> https://github.com/openvswitch/ovs/blob/1d681ffe3b208a0db4945b6389142ab18404a4d1/lib/netdev-offload-tc.c#L2432-L2448 >> So, it is going to be internally inconsistent if we do not set the bit >> explicitly. >> >> And if someday we'll stop adding the frag bit, we'll never know if we >> forget to set it in netdev-offload-tc. At the very least we'll need an >> assertion that it is set. But having an assertion will still be internally >> inconsistent with the code linked above. So, it may be better to just fix >> it instead anyway. > > I finally got some time to look at the code, and I can add the below abort() > in case OVS decides to no longer mask out the frag bits by default. > > However, I have no clear idea how to make this work if we ever stop adding > this frag mask. I don't think we can report back to request a wider mask or > track what exists and force a revalidate. > > @@ -2447,6 +2479,8 @@ netdev_tc_flow_put(struct netdev *netdev, struct match > *match, > } > > mask->nw_frag = 0; > +} else { > +ovs_abort(0, "TC offload assumes nw_frag is always masked"); > } We don't need to abort. We should just narrow down the match by adding frag
Re: [ovs-dev] [PATCH v2] netdev-tc-offloads: Don't offload header modification on ip fragments.
On 4 Jun 2024, at 13:52, Ilya Maximets wrote: > On 6/4/24 13:42, Eelco Chaudron wrote: >> >> >> On 1 Jun 2024, at 0:08, Ilya Maximets wrote: >> >>> On 5/7/24 15:52, Eelco Chaudron wrote: While offloading header modifications to TC, OVS is using {TCA_PEDIT} + {TCA_CSUM} combination as that it the only way to represent header rewrite. However, {TCA_CSUM} is unable to calculate L4 checksums for IP fragments. Since TC already applies fragmentation bit masking, this patch simply needs to prevent these packets from being processed through TC. Signed-off-by: Eelco Chaudron --- v2: - Fixed and added some comments. - Use ovs-pcap to compare packets. NOTE: This patch needs an AVX512 fix before it can be applied. Intel is working on this. --- lib/netdev-offload-tc.c | 32 ++ lib/tc.c| 5 ++- tests/system-traffic.at | 93 + 3 files changed, 129 insertions(+), 1 deletion(-) diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c index 921d52317..bdd307933 100644 --- a/lib/netdev-offload-tc.c +++ b/lib/netdev-offload-tc.c @@ -1488,6 +1488,30 @@ parse_put_flow_ct_action(struct tc_flower *flower, return 0; } +static bool +will_tc_add_l4_checksum(struct tc_flower *flower, int type) +{ +/* This function returns true if the tc layer will add a l4 checksum action + * for this set action. Refer to the csum_update_flag() function for + * detailed logic. Note that even the kernel only supports updating TCP, + * UDP and ICMPv6. */ >>> >>> This comment should be outside of the function, I think. It's strange >>> to have it here. I see csum_update_flag() has a comment inside, but >>> that's strange as well. That function has other style issues as well, >>> there is no need to copy them. >> >> ACK, will clean this up in the next rev. >> +switch (type) { +case OVS_KEY_ATTR_IPV4: +case OVS_KEY_ATTR_IPV6: +case OVS_KEY_ATTR_TCP: +case OVS_KEY_ATTR_UDP: +switch (flower->key.ip_proto) { +case IPPROTO_TCP: +case IPPROTO_UDP: +case IPPROTO_ICMPV6: +case IPPROTO_UDPLITE: +return true; +} +break; +} +return false; +} + static int parse_put_flow_set_masked_action(struct tc_flower *flower, struct tc_action *action, @@ -1520,6 +1544,14 @@ parse_put_flow_set_masked_action(struct tc_flower *flower, return EOPNOTSUPP; } +if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT >>> >>> We're using this field to make an offloading decision. We must >>> also set in the mask. If for some reason we're not matching on >>> the fragment bits, we may first receive a non-fragmented packet >>> and offload it, then fragmented traffic may match and fail the >>> checksumming. So, we need to enable the bit in the mask. >> >> The dp always matches on the fragment bit for IPv4 packets, I did some tests >> with this. >> So if we sent a non-fragment packet first the dp rule will match on fragment >> = 0. Or >> did I miss something? > > It is true today, but nothing ensures that on the netdev-offload-tc level. > Moreover, the netdev-offload-tc is written in a way that it expects the > frag bits to potentially not be in the mask: > > https://github.com/openvswitch/ovs/blob/1d681ffe3b208a0db4945b6389142ab18404a4d1/lib/netdev-offload-tc.c#L2432-L2448 > So, it is going to be internally inconsistent if we do not set the bit > explicitly. > > And if someday we'll stop adding the frag bit, we'll never know if we > forget to set it in netdev-offload-tc. At the very least we'll need an > assertion that it is set. But having an assertion will still be internally > inconsistent with the code linked above. So, it may be better to just fix > it instead anyway. I finally got some time to look at the code, and I can add the below abort() in case OVS decides to no longer mask out the frag bits by default. However, I have no clear idea how to make this work if we ever stop adding this frag mask. I don't think we can report back to request a wider mask or track what exists and force a revalidate. @@ -2447,6 +2479,8 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match, } mask->nw_frag = 0; +} else { +ovs_abort(0, "TC offload assumes nw_frag is always masked"); } +&& will_tc_add_l4_checksum(flower, type)) { +VLOG_DBG_RL(, "set action type %d not supported on fragments " +"due to checksum limitation", type); +ofpbuf_uninit(_buf); +return
Re: [ovs-dev] [PATCH v2] netdev-tc-offloads: Don't offload header modification on ip fragments.
On 4 Jun 2024, at 13:52, Ilya Maximets wrote: > On 6/4/24 13:42, Eelco Chaudron wrote: >> >> >> On 1 Jun 2024, at 0:08, Ilya Maximets wrote: >> >>> On 5/7/24 15:52, Eelco Chaudron wrote: While offloading header modifications to TC, OVS is using {TCA_PEDIT} + {TCA_CSUM} combination as that it the only way to represent header rewrite. However, {TCA_CSUM} is unable to calculate L4 checksums for IP fragments. Since TC already applies fragmentation bit masking, this patch simply needs to prevent these packets from being processed through TC. Signed-off-by: Eelco Chaudron --- v2: - Fixed and added some comments. - Use ovs-pcap to compare packets. NOTE: This patch needs an AVX512 fix before it can be applied. Intel is working on this. --- lib/netdev-offload-tc.c | 32 ++ lib/tc.c| 5 ++- tests/system-traffic.at | 93 + 3 files changed, 129 insertions(+), 1 deletion(-) diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c index 921d52317..bdd307933 100644 --- a/lib/netdev-offload-tc.c +++ b/lib/netdev-offload-tc.c @@ -1488,6 +1488,30 @@ parse_put_flow_ct_action(struct tc_flower *flower, return 0; } +static bool +will_tc_add_l4_checksum(struct tc_flower *flower, int type) +{ +/* This function returns true if the tc layer will add a l4 checksum action + * for this set action. Refer to the csum_update_flag() function for + * detailed logic. Note that even the kernel only supports updating TCP, + * UDP and ICMPv6. */ >>> >>> This comment should be outside of the function, I think. It's strange >>> to have it here. I see csum_update_flag() has a comment inside, but >>> that's strange as well. That function has other style issues as well, >>> there is no need to copy them. >> >> ACK, will clean this up in the next rev. >> +switch (type) { +case OVS_KEY_ATTR_IPV4: +case OVS_KEY_ATTR_IPV6: +case OVS_KEY_ATTR_TCP: +case OVS_KEY_ATTR_UDP: +switch (flower->key.ip_proto) { +case IPPROTO_TCP: +case IPPROTO_UDP: +case IPPROTO_ICMPV6: +case IPPROTO_UDPLITE: +return true; +} +break; +} +return false; +} + static int parse_put_flow_set_masked_action(struct tc_flower *flower, struct tc_action *action, @@ -1520,6 +1544,14 @@ parse_put_flow_set_masked_action(struct tc_flower *flower, return EOPNOTSUPP; } +if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT >>> >>> We're using this field to make an offloading decision. We must >>> also set in the mask. If for some reason we're not matching on >>> the fragment bits, we may first receive a non-fragmented packet >>> and offload it, then fragmented traffic may match and fail the >>> checksumming. So, we need to enable the bit in the mask. >> >> The dp always matches on the fragment bit for IPv4 packets, I did some tests >> with this. >> So if we sent a non-fragment packet first the dp rule will match on fragment >> = 0. Or >> did I miss something? > > It is true today, but nothing ensures that on the netdev-offload-tc level. > Moreover, the netdev-offload-tc is written in a way that it expects the > frag bits to potentially not be in the mask: > > https://github.com/openvswitch/ovs/blob/1d681ffe3b208a0db4945b6389142ab18404a4d1/lib/netdev-offload-tc.c#L2432-L2448 > So, it is going to be internally inconsistent if we do not set the bit > explicitly. > > And if someday we'll stop adding the frag bit, we'll never know if we > forget to set it in netdev-offload-tc. At the very least we'll need an > assertion that it is set. But having an assertion will still be internally > inconsistent with the code linked above. So, it may be better to just fix > it instead anyway. Ok we could just move the ‘mask->nw_frag = 0’ out of the if branch ;) I’ll take another look before sending a v3. >> +&& will_tc_add_l4_checksum(flower, type)) { +VLOG_DBG_RL(, "set action type %d not supported on fragments " +"due to checksum limitation", type); +ofpbuf_uninit(_buf); +return EOPNOTSUPP; +} + for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) { struct netlink_field *f = _flower_map[type][i]; diff --git a/lib/tc.c b/lib/tc.c index e9bcae4e4..20472d13c 100644 --- a/lib/tc.c +++ b/lib/tc.c @@ -2958,7 +2958,10 @@ csum_update_flag(struct tc_flower *flower, * eth(dst=),eth_type(0x0800) actions=set(ipv4(src=)) * we need to force a more specific
Re: [ovs-dev] [PATCH v2] netdev-tc-offloads: Don't offload header modification on ip fragments.
On 6/4/24 13:42, Eelco Chaudron wrote: > > > On 1 Jun 2024, at 0:08, Ilya Maximets wrote: > >> On 5/7/24 15:52, Eelco Chaudron wrote: >>> While offloading header modifications to TC, OVS is using {TCA_PEDIT} + >>> {TCA_CSUM} combination as that it the only way to represent header >>> rewrite. However, {TCA_CSUM} is unable to calculate L4 checksums for >>> IP fragments. >>> >>> Since TC already applies fragmentation bit masking, this patch simply >>> needs to prevent these packets from being processed through TC. >>> >>> Signed-off-by: Eelco Chaudron >>> --- >>> v2: - Fixed and added some comments. >>> - Use ovs-pcap to compare packets. >>> >>> NOTE: This patch needs an AVX512 fix before it can be applied. >>> Intel is working on this. >>> --- >>> lib/netdev-offload-tc.c | 32 ++ >>> lib/tc.c| 5 ++- >>> tests/system-traffic.at | 93 + >>> 3 files changed, 129 insertions(+), 1 deletion(-) >>> >>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c >>> index 921d52317..bdd307933 100644 >>> --- a/lib/netdev-offload-tc.c >>> +++ b/lib/netdev-offload-tc.c >>> @@ -1488,6 +1488,30 @@ parse_put_flow_ct_action(struct tc_flower *flower, >>> return 0; >>> } >>> >>> +static bool >>> +will_tc_add_l4_checksum(struct tc_flower *flower, int type) >>> +{ >>> +/* This function returns true if the tc layer will add a l4 checksum >>> action >>> + * for this set action. Refer to the csum_update_flag() function for >>> + * detailed logic. Note that even the kernel only supports updating >>> TCP, >>> + * UDP and ICMPv6. */ >> >> This comment should be outside of the function, I think. It's strange >> to have it here. I see csum_update_flag() has a comment inside, but >> that's strange as well. That function has other style issues as well, >> there is no need to copy them. > > ACK, will clean this up in the next rev. > >>> +switch (type) { >>> +case OVS_KEY_ATTR_IPV4: >>> +case OVS_KEY_ATTR_IPV6: >>> +case OVS_KEY_ATTR_TCP: >>> +case OVS_KEY_ATTR_UDP: >>> +switch (flower->key.ip_proto) { >>> +case IPPROTO_TCP: >>> +case IPPROTO_UDP: >>> +case IPPROTO_ICMPV6: >>> +case IPPROTO_UDPLITE: >>> +return true; >>> +} >>> +break; >>> +} >>> +return false; >>> +} >>> + >>> static int >>> parse_put_flow_set_masked_action(struct tc_flower *flower, >>> struct tc_action *action, >>> @@ -1520,6 +1544,14 @@ parse_put_flow_set_masked_action(struct tc_flower >>> *flower, >>> return EOPNOTSUPP; >>> } >>> >>> +if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT >> >> We're using this field to make an offloading decision. We must >> also set in the mask. If for some reason we're not matching on >> the fragment bits, we may first receive a non-fragmented packet >> and offload it, then fragmented traffic may match and fail the >> checksumming. So, we need to enable the bit in the mask. > > The dp always matches on the fragment bit for IPv4 packets, I did some tests > with this. > So if we sent a non-fragment packet first the dp rule will match on fragment > = 0. Or > did I miss something? It is true today, but nothing ensures that on the netdev-offload-tc level. Moreover, the netdev-offload-tc is written in a way that it expects the frag bits to potentially not be in the mask: https://github.com/openvswitch/ovs/blob/1d681ffe3b208a0db4945b6389142ab18404a4d1/lib/netdev-offload-tc.c#L2432-L2448 So, it is going to be internally inconsistent if we do not set the bit explicitly. And if someday we'll stop adding the frag bit, we'll never know if we forget to set it in netdev-offload-tc. At the very least we'll need an assertion that it is set. But having an assertion will still be internally inconsistent with the code linked above. So, it may be better to just fix it instead anyway. > >>> +&& will_tc_add_l4_checksum(flower, type)) { >>> +VLOG_DBG_RL(, "set action type %d not supported on fragments " >>> +"due to checksum limitation", type); >>> +ofpbuf_uninit(_buf); >>> +return EOPNOTSUPP; >>> +} >>> + >>> for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) { >>> struct netlink_field *f = _flower_map[type][i]; >>> >>> diff --git a/lib/tc.c b/lib/tc.c >>> index e9bcae4e4..20472d13c 100644 >>> --- a/lib/tc.c >>> +++ b/lib/tc.c >>> @@ -2958,7 +2958,10 @@ csum_update_flag(struct tc_flower *flower, >>> * eth(dst=),eth_type(0x0800) actions=set(ipv4(src=)) >>> * we need to force a more specific flow as this can, for example, >>> * need a recalculation of icmp checksum if the packet that passes >>> - * is ICMPv6 and tcp checksum if its tcp. */ >>> + * is ICMPv6 and tcp checksum if its tcp. >>> + * >>> + * Ensure that you keep the pre-check function in
Re: [ovs-dev] [PATCH v2] netdev-tc-offloads: Don't offload header modification on ip fragments.
On 1 Jun 2024, at 0:08, Ilya Maximets wrote: > On 5/7/24 15:52, Eelco Chaudron wrote: >> While offloading header modifications to TC, OVS is using {TCA_PEDIT} + >> {TCA_CSUM} combination as that it the only way to represent header >> rewrite. However, {TCA_CSUM} is unable to calculate L4 checksums for >> IP fragments. >> >> Since TC already applies fragmentation bit masking, this patch simply >> needs to prevent these packets from being processed through TC. >> >> Signed-off-by: Eelco Chaudron >> --- >> v2: - Fixed and added some comments. >> - Use ovs-pcap to compare packets. >> >> NOTE: This patch needs an AVX512 fix before it can be applied. >> Intel is working on this. >> --- >> lib/netdev-offload-tc.c | 32 ++ >> lib/tc.c| 5 ++- >> tests/system-traffic.at | 93 + >> 3 files changed, 129 insertions(+), 1 deletion(-) >> >> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c >> index 921d52317..bdd307933 100644 >> --- a/lib/netdev-offload-tc.c >> +++ b/lib/netdev-offload-tc.c >> @@ -1488,6 +1488,30 @@ parse_put_flow_ct_action(struct tc_flower *flower, >> return 0; >> } >> >> +static bool >> +will_tc_add_l4_checksum(struct tc_flower *flower, int type) >> +{ >> +/* This function returns true if the tc layer will add a l4 checksum >> action >> + * for this set action. Refer to the csum_update_flag() function for >> + * detailed logic. Note that even the kernel only supports updating >> TCP, >> + * UDP and ICMPv6. */ > > This comment should be outside of the function, I think. It's strange > to have it here. I see csum_update_flag() has a comment inside, but > that's strange as well. That function has other style issues as well, > there is no need to copy them. ACK, will clean this up in the next rev. >> +switch (type) { >> +case OVS_KEY_ATTR_IPV4: >> +case OVS_KEY_ATTR_IPV6: >> +case OVS_KEY_ATTR_TCP: >> +case OVS_KEY_ATTR_UDP: >> +switch (flower->key.ip_proto) { >> +case IPPROTO_TCP: >> +case IPPROTO_UDP: >> +case IPPROTO_ICMPV6: >> +case IPPROTO_UDPLITE: >> +return true; >> +} >> +break; >> +} >> +return false; >> +} >> + >> static int >> parse_put_flow_set_masked_action(struct tc_flower *flower, >> struct tc_action *action, >> @@ -1520,6 +1544,14 @@ parse_put_flow_set_masked_action(struct tc_flower >> *flower, >> return EOPNOTSUPP; >> } >> >> +if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT > > We're using this field to make an offloading decision. We must > also set in the mask. If for some reason we're not matching on > the fragment bits, we may first receive a non-fragmented packet > and offload it, then fragmented traffic may match and fail the > checksumming. So, we need to enable the bit in the mask. The dp always matches on the fragment bit for IPv4 packets, I did some tests with this. So if we sent a non-fragment packet first the dp rule will match on fragment = 0. Or did I miss something? >> +&& will_tc_add_l4_checksum(flower, type)) { >> +VLOG_DBG_RL(, "set action type %d not supported on fragments " >> +"due to checksum limitation", type); >> +ofpbuf_uninit(_buf); >> +return EOPNOTSUPP; >> +} >> + >> for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) { >> struct netlink_field *f = _flower_map[type][i]; >> >> diff --git a/lib/tc.c b/lib/tc.c >> index e9bcae4e4..20472d13c 100644 >> --- a/lib/tc.c >> +++ b/lib/tc.c >> @@ -2958,7 +2958,10 @@ csum_update_flag(struct tc_flower *flower, >> * eth(dst=),eth_type(0x0800) actions=set(ipv4(src=)) >> * we need to force a more specific flow as this can, for example, >> * need a recalculation of icmp checksum if the packet that passes >> - * is ICMPv6 and tcp checksum if its tcp. */ >> + * is ICMPv6 and tcp checksum if its tcp. >> + * >> + * Ensure that you keep the pre-check function in netdev-offload-tc.c, >> + * will_tc_add_l4_checksum(), in sync if you make any changes here. */ >> >> switch (htype) { >> case TCA_PEDIT_KEY_EX_HDR_TYPE_IP4: >> diff --git a/tests/system-traffic.at b/tests/system-traffic.at >> index bd7647cbe..8f392abd5 100644 >> --- a/tests/system-traffic.at >> +++ b/tests/system-traffic.at >> @@ -8977,3 +8977,96 @@ OVS_WAIT_UNTIL([cat p2.pcap | grep -E "0x0050: * >> * *5002 *2000 *b85e *00 >> >> OVS_TRAFFIC_VSWITCHD_STOP >> AT_CLEANUP >> + >> +AT_SETUP([datapath - mod_nw_src/set_field on IP fragments]) >> +AT_SKIP_IF([test $HAVE_TCPDUMP = no]) >> +OVS_TRAFFIC_VSWITCHD_START() >> + >> +ADD_NAMESPACES(at_ns0, at_ns1) >> + >> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", 36:b1:ee:7c:01:03) >> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", 36:b1:ee:7c:01:02) >> + >> +AT_DATA([flows.txt], [dnl >> +
Re: [ovs-dev] [PATCH v2] netdev-tc-offloads: Don't offload header modification on ip fragments.
On 5/7/24 15:52, Eelco Chaudron wrote: > While offloading header modifications to TC, OVS is using {TCA_PEDIT} + > {TCA_CSUM} combination as that it the only way to represent header > rewrite. However, {TCA_CSUM} is unable to calculate L4 checksums for > IP fragments. > > Since TC already applies fragmentation bit masking, this patch simply > needs to prevent these packets from being processed through TC. > > Signed-off-by: Eelco Chaudron > --- > v2: - Fixed and added some comments. > - Use ovs-pcap to compare packets. > > NOTE: This patch needs an AVX512 fix before it can be applied. > Intel is working on this. > --- > lib/netdev-offload-tc.c | 32 ++ > lib/tc.c| 5 ++- > tests/system-traffic.at | 93 + > 3 files changed, 129 insertions(+), 1 deletion(-) > > diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c > index 921d52317..bdd307933 100644 > --- a/lib/netdev-offload-tc.c > +++ b/lib/netdev-offload-tc.c > @@ -1488,6 +1488,30 @@ parse_put_flow_ct_action(struct tc_flower *flower, > return 0; > } > > +static bool > +will_tc_add_l4_checksum(struct tc_flower *flower, int type) > +{ > +/* This function returns true if the tc layer will add a l4 checksum > action > + * for this set action. Refer to the csum_update_flag() function for > + * detailed logic. Note that even the kernel only supports updating TCP, > + * UDP and ICMPv6. */ This comment should be outside of the function, I think. It's strange to have it here. I see csum_update_flag() has a comment inside, but that's strange as well. That function has other style issues as well, there is no need to copy them. > +switch (type) { > +case OVS_KEY_ATTR_IPV4: > +case OVS_KEY_ATTR_IPV6: > +case OVS_KEY_ATTR_TCP: > +case OVS_KEY_ATTR_UDP: > +switch (flower->key.ip_proto) { > +case IPPROTO_TCP: > +case IPPROTO_UDP: > +case IPPROTO_ICMPV6: > +case IPPROTO_UDPLITE: > +return true; > +} > +break; > +} > +return false; > +} > + > static int > parse_put_flow_set_masked_action(struct tc_flower *flower, > struct tc_action *action, > @@ -1520,6 +1544,14 @@ parse_put_flow_set_masked_action(struct tc_flower > *flower, > return EOPNOTSUPP; > } > > +if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT We're using this field to make an offloading decision. We must also set in the mask. If for some reason we're not matching on the fragment bits, we may first receive a non-fragmented packet and offload it, then fragmented traffic may match and fail the checksumming. So, we need to enable the bit in the mask. > +&& will_tc_add_l4_checksum(flower, type)) { > +VLOG_DBG_RL(, "set action type %d not supported on fragments " > +"due to checksum limitation", type); > +ofpbuf_uninit(_buf); > +return EOPNOTSUPP; > +} > + > for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) { > struct netlink_field *f = _flower_map[type][i]; > > diff --git a/lib/tc.c b/lib/tc.c > index e9bcae4e4..20472d13c 100644 > --- a/lib/tc.c > +++ b/lib/tc.c > @@ -2958,7 +2958,10 @@ csum_update_flag(struct tc_flower *flower, > * eth(dst=),eth_type(0x0800) actions=set(ipv4(src=)) > * we need to force a more specific flow as this can, for example, > * need a recalculation of icmp checksum if the packet that passes > - * is ICMPv6 and tcp checksum if its tcp. */ > + * is ICMPv6 and tcp checksum if its tcp. > + * > + * Ensure that you keep the pre-check function in netdev-offload-tc.c, > + * will_tc_add_l4_checksum(), in sync if you make any changes here. */ > > switch (htype) { > case TCA_PEDIT_KEY_EX_HDR_TYPE_IP4: > diff --git a/tests/system-traffic.at b/tests/system-traffic.at > index bd7647cbe..8f392abd5 100644 > --- a/tests/system-traffic.at > +++ b/tests/system-traffic.at > @@ -8977,3 +8977,96 @@ OVS_WAIT_UNTIL([cat p2.pcap | grep -E "0x0050: * > * *5002 *2000 *b85e *00 > > OVS_TRAFFIC_VSWITCHD_STOP > AT_CLEANUP > + > +AT_SETUP([datapath - mod_nw_src/set_field on IP fragments]) > +AT_SKIP_IF([test $HAVE_TCPDUMP = no]) > +OVS_TRAFFIC_VSWITCHD_START() > + > +ADD_NAMESPACES(at_ns0, at_ns1) > + > +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", 36:b1:ee:7c:01:03) > +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", 36:b1:ee:7c:01:02) > + > +AT_DATA([flows.txt], [dnl > + in_port=ovs-p0,ip,nw_src=10.1.1.1 actions=mod_nw_src=11.1.1.1,ovs-p1 > + in_port=ovs-p0,ipv6,ipv6_src=fc00::1 > actions=set_field:fc00::100->ipv6_src,ovs-p1 > +]) > + > +AT_CHECK([ovs-ofctl del-flows br0]) > +AT_CHECK([ovs-ofctl -Oopenflow13 add-flows br0 flows.txt]) > + > +NETNS_DAEMONIZE([at_ns1], > +[tcpdump -l -nn -xx -U -i p1 -w p1.pcap 2> tcpdump.err], > +[tcpdump.pid]) > +OVS_WAIT_UNTIL([grep
[ovs-dev] [PATCH v2] netdev-tc-offloads: Don't offload header modification on ip fragments.
While offloading header modifications to TC, OVS is using {TCA_PEDIT} + {TCA_CSUM} combination as that it the only way to represent header rewrite. However, {TCA_CSUM} is unable to calculate L4 checksums for IP fragments. Since TC already applies fragmentation bit masking, this patch simply needs to prevent these packets from being processed through TC. Signed-off-by: Eelco Chaudron --- v2: - Fixed and added some comments. - Use ovs-pcap to compare packets. NOTE: This patch needs an AVX512 fix before it can be applied. Intel is working on this. --- lib/netdev-offload-tc.c | 32 ++ lib/tc.c| 5 ++- tests/system-traffic.at | 93 + 3 files changed, 129 insertions(+), 1 deletion(-) diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c index 921d52317..bdd307933 100644 --- a/lib/netdev-offload-tc.c +++ b/lib/netdev-offload-tc.c @@ -1488,6 +1488,30 @@ parse_put_flow_ct_action(struct tc_flower *flower, return 0; } +static bool +will_tc_add_l4_checksum(struct tc_flower *flower, int type) +{ +/* This function returns true if the tc layer will add a l4 checksum action + * for this set action. Refer to the csum_update_flag() function for + * detailed logic. Note that even the kernel only supports updating TCP, + * UDP and ICMPv6. */ +switch (type) { +case OVS_KEY_ATTR_IPV4: +case OVS_KEY_ATTR_IPV6: +case OVS_KEY_ATTR_TCP: +case OVS_KEY_ATTR_UDP: +switch (flower->key.ip_proto) { +case IPPROTO_TCP: +case IPPROTO_UDP: +case IPPROTO_ICMPV6: +case IPPROTO_UDPLITE: +return true; +} +break; +} +return false; +} + static int parse_put_flow_set_masked_action(struct tc_flower *flower, struct tc_action *action, @@ -1520,6 +1544,14 @@ parse_put_flow_set_masked_action(struct tc_flower *flower, return EOPNOTSUPP; } +if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT +&& will_tc_add_l4_checksum(flower, type)) { +VLOG_DBG_RL(, "set action type %d not supported on fragments " +"due to checksum limitation", type); +ofpbuf_uninit(_buf); +return EOPNOTSUPP; +} + for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) { struct netlink_field *f = _flower_map[type][i]; diff --git a/lib/tc.c b/lib/tc.c index e9bcae4e4..20472d13c 100644 --- a/lib/tc.c +++ b/lib/tc.c @@ -2958,7 +2958,10 @@ csum_update_flag(struct tc_flower *flower, * eth(dst=),eth_type(0x0800) actions=set(ipv4(src=)) * we need to force a more specific flow as this can, for example, * need a recalculation of icmp checksum if the packet that passes - * is ICMPv6 and tcp checksum if its tcp. */ + * is ICMPv6 and tcp checksum if its tcp. + * + * Ensure that you keep the pre-check function in netdev-offload-tc.c, + * will_tc_add_l4_checksum(), in sync if you make any changes here. */ switch (htype) { case TCA_PEDIT_KEY_EX_HDR_TYPE_IP4: diff --git a/tests/system-traffic.at b/tests/system-traffic.at index bd7647cbe..8f392abd5 100644 --- a/tests/system-traffic.at +++ b/tests/system-traffic.at @@ -8977,3 +8977,96 @@ OVS_WAIT_UNTIL([cat p2.pcap | grep -E "0x0050: * * *5002 *2000 *b85e *00 OVS_TRAFFIC_VSWITCHD_STOP AT_CLEANUP + +AT_SETUP([datapath - mod_nw_src/set_field on IP fragments]) +AT_SKIP_IF([test $HAVE_TCPDUMP = no]) +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", 36:b1:ee:7c:01:03) +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", 36:b1:ee:7c:01:02) + +AT_DATA([flows.txt], [dnl + in_port=ovs-p0,ip,nw_src=10.1.1.1 actions=mod_nw_src=11.1.1.1,ovs-p1 + in_port=ovs-p0,ipv6,ipv6_src=fc00::1 actions=set_field:fc00::100->ipv6_src,ovs-p1 +]) + +AT_CHECK([ovs-ofctl del-flows br0]) +AT_CHECK([ovs-ofctl -Oopenflow13 add-flows br0 flows.txt]) + +NETNS_DAEMONIZE([at_ns1], +[tcpdump -l -nn -xx -U -i p1 -w p1.pcap 2> tcpdump.err], +[tcpdump.pid]) +OVS_WAIT_UNTIL([grep "listening" tcpdump.err]) + +dnl IPv4 Packet content: +dnl Ethernet II, Src: 36:b1:ee:7c:01:03, Dst: 36:b1:ee:7c:01:02 +dnl Type: IPv4 (0x0800) +dnl Internet Protocol Version 4, Src: 10.1.1.1, Dst: 10.1.1.2 +dnl 0100 = Version: 4 +dnl 0101 = Header Length: 20 bytes (5) +dnl Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT) +dnl Total Length: 38 +dnl Identification: 0x0001 (1) +dnl 001. = Flags: 0x1, More fragments +dnl 0... = Reserved bit: Not set +dnl .0.. = Don't fragment: Not set +dnl ..1. = More fragments: Set +dnl ...0 = Fragment Offset: 0 +dnl Time to Live: 64 +dnl Protocol: UDP (17) +dnl Header Checksum: 0x44c2 +dnl Data (18 bytes) +eth="36 b1 ee 7c 01 02 36 b1 ee 7c