Re: [ovs-dev] [PATCH v2] netdev-tc-offloads: Don't offload header modification on ip fragments.

2024-07-03 Thread Eelco Chaudron


On 2 Jul 2024, at 15:38, Ilya Maximets wrote:

> On 6/27/24 15:03, Eelco Chaudron wrote:
>>
>>
>> On 4 Jun 2024, at 13:52, Ilya Maximets wrote:
>>
>>> On 6/4/24 13:42, Eelco Chaudron wrote:


 On 1 Jun 2024, at 0:08, Ilya Maximets wrote:

> On 5/7/24 15:52, Eelco Chaudron wrote:
>> While offloading header modifications to TC, OVS is using {TCA_PEDIT} +
>> {TCA_CSUM} combination as that it the only way to represent header
>> rewrite.  However, {TCA_CSUM} is unable to calculate L4 checksums for
>> IP fragments.
>>
>> Since TC already applies fragmentation bit masking, this patch simply
>> needs to prevent these packets from being processed through TC.
>>
>> Signed-off-by: Eelco Chaudron 
>> ---
>> v2: - Fixed and added some comments.
>> - Use ovs-pcap to compare packets.
>>
>> NOTE: This patch needs an AVX512 fix before it can be applied.
>>   Intel is working on this.
>> ---
>>  lib/netdev-offload-tc.c | 32 ++
>>  lib/tc.c|  5 ++-
>>  tests/system-traffic.at | 93 +
>>  3 files changed, 129 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>> index 921d52317..bdd307933 100644
>> --- a/lib/netdev-offload-tc.c
>> +++ b/lib/netdev-offload-tc.c
>> @@ -1488,6 +1488,30 @@ parse_put_flow_ct_action(struct tc_flower *flower,
>>  return 0;
>>  }
>>
>> +static bool
>> +will_tc_add_l4_checksum(struct tc_flower *flower, int type)
>> +{
>> +/* This function returns true if the tc layer will add a l4 
>> checksum action
>> + * for this set action.  Refer to the csum_update_flag() function 
>> for
>> + * detailed logic.  Note that even the kernel only supports 
>> updating TCP,
>> + * UDP and ICMPv6. */
>
> This comment should be outside of the function, I think.  It's strange
> to have it here.  I see csum_update_flag() has a comment inside, but
> that's strange as well.  That function has other style issues as well,
> there is no need to copy them.

 ACK, will clean this up in the next rev.

>> +switch (type) {
>> +case OVS_KEY_ATTR_IPV4:
>> +case OVS_KEY_ATTR_IPV6:
>> +case OVS_KEY_ATTR_TCP:
>> +case OVS_KEY_ATTR_UDP:
>> +switch (flower->key.ip_proto) {
>> +case IPPROTO_TCP:
>> +case IPPROTO_UDP:
>> +case IPPROTO_ICMPV6:
>> +case IPPROTO_UDPLITE:
>> +return true;
>> +}
>> +break;
>> +}
>> +return false;
>> +}
>> +
>>  static int
>>  parse_put_flow_set_masked_action(struct tc_flower *flower,
>>   struct tc_action *action,
>> @@ -1520,6 +1544,14 @@ parse_put_flow_set_masked_action(struct tc_flower 
>> *flower,
>>  return EOPNOTSUPP;
>>  }
>>
>> +if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT
>
> We're using this field to make an offloading decision.  We must
> also set in the mask.  If for some reason we're not matching on
> the fragment bits, we may first receive a non-fragmented packet
> and offload it, then fragmented traffic may match and fail the
> checksumming.  So, we need to enable the bit in the mask.

 The dp always matches on the fragment bit for IPv4 packets, I did some 
 tests with this.
 So if we sent a non-fragment packet first the dp rule will match on 
 fragment = 0. Or
 did I miss something?
>>>
>>> It is true today, but nothing ensures that on the netdev-offload-tc level.
>>> Moreover, the netdev-offload-tc is written in a way that it expects the
>>> frag bits to potentially not be in the mask:
>>> https://github.com/openvswitch/ovs/blob/1d681ffe3b208a0db4945b6389142ab18404a4d1/lib/netdev-offload-tc.c#L2432-L2448
>>> So, it is going to be internally inconsistent if we do not set the bit
>>> explicitly.
>>>
>>> And if someday we'll stop adding the frag bit, we'll never know if we
>>> forget to set it in netdev-offload-tc.  At the very least we'll need an
>>> assertion that it is set.  But having an assertion will still be internally
>>> inconsistent with the code linked above.  So, it may be better to just fix
>>> it instead anyway.
>>
>> I finally got some time to look at the code, and I can add the below abort()
>> in case OVS decides to no longer mask out the frag bits by default.
>>
>> However, I have no clear idea how to make this work if we ever stop adding
>> this frag mask. I don't think we can report back to request a wider mask or
>> track what exists and force a revalidate.
>>
>> @@ -2447,6 +2479,8 @@ netdev_tc_flow_put(struct netdev *netdev, struct match 
>> *match,
>>  }
>>
>>  mask->nw_frag = 0;
>> +} else {

Re: [ovs-dev] [PATCH v2] netdev-tc-offloads: Don't offload header modification on ip fragments.

2024-07-02 Thread Ilya Maximets
On 6/27/24 15:03, Eelco Chaudron wrote:
> 
> 
> On 4 Jun 2024, at 13:52, Ilya Maximets wrote:
> 
>> On 6/4/24 13:42, Eelco Chaudron wrote:
>>>
>>>
>>> On 1 Jun 2024, at 0:08, Ilya Maximets wrote:
>>>
 On 5/7/24 15:52, Eelco Chaudron wrote:
> While offloading header modifications to TC, OVS is using {TCA_PEDIT} +
> {TCA_CSUM} combination as that it the only way to represent header
> rewrite.  However, {TCA_CSUM} is unable to calculate L4 checksums for
> IP fragments.
>
> Since TC already applies fragmentation bit masking, this patch simply
> needs to prevent these packets from being processed through TC.
>
> Signed-off-by: Eelco Chaudron 
> ---
> v2: - Fixed and added some comments.
> - Use ovs-pcap to compare packets.
>
> NOTE: This patch needs an AVX512 fix before it can be applied.
>   Intel is working on this.
> ---
>  lib/netdev-offload-tc.c | 32 ++
>  lib/tc.c|  5 ++-
>  tests/system-traffic.at | 93 +
>  3 files changed, 129 insertions(+), 1 deletion(-)
>
> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
> index 921d52317..bdd307933 100644
> --- a/lib/netdev-offload-tc.c
> +++ b/lib/netdev-offload-tc.c
> @@ -1488,6 +1488,30 @@ parse_put_flow_ct_action(struct tc_flower *flower,
>  return 0;
>  }
>
> +static bool
> +will_tc_add_l4_checksum(struct tc_flower *flower, int type)
> +{
> +/* This function returns true if the tc layer will add a l4 checksum 
> action
> + * for this set action.  Refer to the csum_update_flag() function for
> + * detailed logic.  Note that even the kernel only supports updating 
> TCP,
> + * UDP and ICMPv6. */

 This comment should be outside of the function, I think.  It's strange
 to have it here.  I see csum_update_flag() has a comment inside, but
 that's strange as well.  That function has other style issues as well,
 there is no need to copy them.
>>>
>>> ACK, will clean this up in the next rev.
>>>
> +switch (type) {
> +case OVS_KEY_ATTR_IPV4:
> +case OVS_KEY_ATTR_IPV6:
> +case OVS_KEY_ATTR_TCP:
> +case OVS_KEY_ATTR_UDP:
> +switch (flower->key.ip_proto) {
> +case IPPROTO_TCP:
> +case IPPROTO_UDP:
> +case IPPROTO_ICMPV6:
> +case IPPROTO_UDPLITE:
> +return true;
> +}
> +break;
> +}
> +return false;
> +}
> +
>  static int
>  parse_put_flow_set_masked_action(struct tc_flower *flower,
>   struct tc_action *action,
> @@ -1520,6 +1544,14 @@ parse_put_flow_set_masked_action(struct tc_flower 
> *flower,
>  return EOPNOTSUPP;
>  }
>
> +if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT

 We're using this field to make an offloading decision.  We must
 also set in the mask.  If for some reason we're not matching on
 the fragment bits, we may first receive a non-fragmented packet
 and offload it, then fragmented traffic may match and fail the
 checksumming.  So, we need to enable the bit in the mask.
>>>
>>> The dp always matches on the fragment bit for IPv4 packets, I did some 
>>> tests with this.
>>> So if we sent a non-fragment packet first the dp rule will match on 
>>> fragment = 0. Or
>>> did I miss something?
>>
>> It is true today, but nothing ensures that on the netdev-offload-tc level.
>> Moreover, the netdev-offload-tc is written in a way that it expects the
>> frag bits to potentially not be in the mask:
>>  
>> https://github.com/openvswitch/ovs/blob/1d681ffe3b208a0db4945b6389142ab18404a4d1/lib/netdev-offload-tc.c#L2432-L2448
>> So, it is going to be internally inconsistent if we do not set the bit
>> explicitly.
>>
>> And if someday we'll stop adding the frag bit, we'll never know if we
>> forget to set it in netdev-offload-tc.  At the very least we'll need an
>> assertion that it is set.  But having an assertion will still be internally
>> inconsistent with the code linked above.  So, it may be better to just fix
>> it instead anyway.
> 
> I finally got some time to look at the code, and I can add the below abort()
> in case OVS decides to no longer mask out the frag bits by default.
> 
> However, I have no clear idea how to make this work if we ever stop adding
> this frag mask. I don't think we can report back to request a wider mask or
> track what exists and force a revalidate.
> 
> @@ -2447,6 +2479,8 @@ netdev_tc_flow_put(struct netdev *netdev, struct match 
> *match,
>  }
> 
>  mask->nw_frag = 0;
> +} else {
> +ovs_abort(0, "TC offload assumes nw_frag is always masked");
>  }

We don't need to abort.  We should just narrow down the match by adding
frag 

Re: [ovs-dev] [PATCH v2] netdev-tc-offloads: Don't offload header modification on ip fragments.

2024-06-27 Thread Eelco Chaudron



On 4 Jun 2024, at 13:52, Ilya Maximets wrote:

> On 6/4/24 13:42, Eelco Chaudron wrote:
>>
>>
>> On 1 Jun 2024, at 0:08, Ilya Maximets wrote:
>>
>>> On 5/7/24 15:52, Eelco Chaudron wrote:
 While offloading header modifications to TC, OVS is using {TCA_PEDIT} +
 {TCA_CSUM} combination as that it the only way to represent header
 rewrite.  However, {TCA_CSUM} is unable to calculate L4 checksums for
 IP fragments.

 Since TC already applies fragmentation bit masking, this patch simply
 needs to prevent these packets from being processed through TC.

 Signed-off-by: Eelco Chaudron 
 ---
 v2: - Fixed and added some comments.
 - Use ovs-pcap to compare packets.

 NOTE: This patch needs an AVX512 fix before it can be applied.
   Intel is working on this.
 ---
  lib/netdev-offload-tc.c | 32 ++
  lib/tc.c|  5 ++-
  tests/system-traffic.at | 93 +
  3 files changed, 129 insertions(+), 1 deletion(-)

 diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
 index 921d52317..bdd307933 100644
 --- a/lib/netdev-offload-tc.c
 +++ b/lib/netdev-offload-tc.c
 @@ -1488,6 +1488,30 @@ parse_put_flow_ct_action(struct tc_flower *flower,
  return 0;
  }

 +static bool
 +will_tc_add_l4_checksum(struct tc_flower *flower, int type)
 +{
 +/* This function returns true if the tc layer will add a l4 checksum 
 action
 + * for this set action.  Refer to the csum_update_flag() function for
 + * detailed logic.  Note that even the kernel only supports updating 
 TCP,
 + * UDP and ICMPv6. */
>>>
>>> This comment should be outside of the function, I think.  It's strange
>>> to have it here.  I see csum_update_flag() has a comment inside, but
>>> that's strange as well.  That function has other style issues as well,
>>> there is no need to copy them.
>>
>> ACK, will clean this up in the next rev.
>>
 +switch (type) {
 +case OVS_KEY_ATTR_IPV4:
 +case OVS_KEY_ATTR_IPV6:
 +case OVS_KEY_ATTR_TCP:
 +case OVS_KEY_ATTR_UDP:
 +switch (flower->key.ip_proto) {
 +case IPPROTO_TCP:
 +case IPPROTO_UDP:
 +case IPPROTO_ICMPV6:
 +case IPPROTO_UDPLITE:
 +return true;
 +}
 +break;
 +}
 +return false;
 +}
 +
  static int
  parse_put_flow_set_masked_action(struct tc_flower *flower,
   struct tc_action *action,
 @@ -1520,6 +1544,14 @@ parse_put_flow_set_masked_action(struct tc_flower 
 *flower,
  return EOPNOTSUPP;
  }

 +if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT
>>>
>>> We're using this field to make an offloading decision.  We must
>>> also set in the mask.  If for some reason we're not matching on
>>> the fragment bits, we may first receive a non-fragmented packet
>>> and offload it, then fragmented traffic may match and fail the
>>> checksumming.  So, we need to enable the bit in the mask.
>>
>> The dp always matches on the fragment bit for IPv4 packets, I did some tests 
>> with this.
>> So if we sent a non-fragment packet first the dp rule will match on fragment 
>> = 0. Or
>> did I miss something?
>
> It is true today, but nothing ensures that on the netdev-offload-tc level.
> Moreover, the netdev-offload-tc is written in a way that it expects the
> frag bits to potentially not be in the mask:
>  
> https://github.com/openvswitch/ovs/blob/1d681ffe3b208a0db4945b6389142ab18404a4d1/lib/netdev-offload-tc.c#L2432-L2448
> So, it is going to be internally inconsistent if we do not set the bit
> explicitly.
>
> And if someday we'll stop adding the frag bit, we'll never know if we
> forget to set it in netdev-offload-tc.  At the very least we'll need an
> assertion that it is set.  But having an assertion will still be internally
> inconsistent with the code linked above.  So, it may be better to just fix
> it instead anyway.

I finally got some time to look at the code, and I can add the below abort()
in case OVS decides to no longer mask out the frag bits by default.

However, I have no clear idea how to make this work if we ever stop adding
this frag mask. I don't think we can report back to request a wider mask or
track what exists and force a revalidate.

@@ -2447,6 +2479,8 @@ netdev_tc_flow_put(struct netdev *netdev, struct match 
*match,
 }

 mask->nw_frag = 0;
+} else {
+ovs_abort(0, "TC offload assumes nw_frag is always masked");
 }

 +&& will_tc_add_l4_checksum(flower, type)) {
 +VLOG_DBG_RL(, "set action type %d not supported on fragments "
 +"due to checksum limitation", type);
 +ofpbuf_uninit(_buf);
 +return 

Re: [ovs-dev] [PATCH v2] netdev-tc-offloads: Don't offload header modification on ip fragments.

2024-06-04 Thread Eelco Chaudron


On 4 Jun 2024, at 13:52, Ilya Maximets wrote:

> On 6/4/24 13:42, Eelco Chaudron wrote:
>>
>>
>> On 1 Jun 2024, at 0:08, Ilya Maximets wrote:
>>
>>> On 5/7/24 15:52, Eelco Chaudron wrote:
 While offloading header modifications to TC, OVS is using {TCA_PEDIT} +
 {TCA_CSUM} combination as that it the only way to represent header
 rewrite.  However, {TCA_CSUM} is unable to calculate L4 checksums for
 IP fragments.

 Since TC already applies fragmentation bit masking, this patch simply
 needs to prevent these packets from being processed through TC.

 Signed-off-by: Eelco Chaudron 
 ---
 v2: - Fixed and added some comments.
 - Use ovs-pcap to compare packets.

 NOTE: This patch needs an AVX512 fix before it can be applied.
   Intel is working on this.
 ---
  lib/netdev-offload-tc.c | 32 ++
  lib/tc.c|  5 ++-
  tests/system-traffic.at | 93 +
  3 files changed, 129 insertions(+), 1 deletion(-)

 diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
 index 921d52317..bdd307933 100644
 --- a/lib/netdev-offload-tc.c
 +++ b/lib/netdev-offload-tc.c
 @@ -1488,6 +1488,30 @@ parse_put_flow_ct_action(struct tc_flower *flower,
  return 0;
  }

 +static bool
 +will_tc_add_l4_checksum(struct tc_flower *flower, int type)
 +{
 +/* This function returns true if the tc layer will add a l4 checksum 
 action
 + * for this set action.  Refer to the csum_update_flag() function for
 + * detailed logic.  Note that even the kernel only supports updating 
 TCP,
 + * UDP and ICMPv6. */
>>>
>>> This comment should be outside of the function, I think.  It's strange
>>> to have it here.  I see csum_update_flag() has a comment inside, but
>>> that's strange as well.  That function has other style issues as well,
>>> there is no need to copy them.
>>
>> ACK, will clean this up in the next rev.
>>
 +switch (type) {
 +case OVS_KEY_ATTR_IPV4:
 +case OVS_KEY_ATTR_IPV6:
 +case OVS_KEY_ATTR_TCP:
 +case OVS_KEY_ATTR_UDP:
 +switch (flower->key.ip_proto) {
 +case IPPROTO_TCP:
 +case IPPROTO_UDP:
 +case IPPROTO_ICMPV6:
 +case IPPROTO_UDPLITE:
 +return true;
 +}
 +break;
 +}
 +return false;
 +}
 +
  static int
  parse_put_flow_set_masked_action(struct tc_flower *flower,
   struct tc_action *action,
 @@ -1520,6 +1544,14 @@ parse_put_flow_set_masked_action(struct tc_flower 
 *flower,
  return EOPNOTSUPP;
  }

 +if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT
>>>
>>> We're using this field to make an offloading decision.  We must
>>> also set in the mask.  If for some reason we're not matching on
>>> the fragment bits, we may first receive a non-fragmented packet
>>> and offload it, then fragmented traffic may match and fail the
>>> checksumming.  So, we need to enable the bit in the mask.
>>
>> The dp always matches on the fragment bit for IPv4 packets, I did some tests 
>> with this.
>> So if we sent a non-fragment packet first the dp rule will match on fragment 
>> = 0. Or
>> did I miss something?
>
> It is true today, but nothing ensures that on the netdev-offload-tc level.
> Moreover, the netdev-offload-tc is written in a way that it expects the
> frag bits to potentially not be in the mask:
>  
> https://github.com/openvswitch/ovs/blob/1d681ffe3b208a0db4945b6389142ab18404a4d1/lib/netdev-offload-tc.c#L2432-L2448
> So, it is going to be internally inconsistent if we do not set the bit
> explicitly.
>
> And if someday we'll stop adding the frag bit, we'll never know if we
> forget to set it in netdev-offload-tc.  At the very least we'll need an
> assertion that it is set.  But having an assertion will still be internally
> inconsistent with the code linked above.  So, it may be better to just fix
> it instead anyway.

Ok we could just move the ‘mask->nw_frag = 0’ out of the if branch ;) I’ll take 
another look before sending a v3.

>>
 +&& will_tc_add_l4_checksum(flower, type)) {
 +VLOG_DBG_RL(, "set action type %d not supported on fragments "
 +"due to checksum limitation", type);
 +ofpbuf_uninit(_buf);
 +return EOPNOTSUPP;
 +}
 +
  for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) {
  struct netlink_field *f = _flower_map[type][i];

 diff --git a/lib/tc.c b/lib/tc.c
 index e9bcae4e4..20472d13c 100644
 --- a/lib/tc.c
 +++ b/lib/tc.c
 @@ -2958,7 +2958,10 @@ csum_update_flag(struct tc_flower *flower,
   * eth(dst=),eth_type(0x0800) actions=set(ipv4(src=))
   * we need to force a more specific 

Re: [ovs-dev] [PATCH v2] netdev-tc-offloads: Don't offload header modification on ip fragments.

2024-06-04 Thread Ilya Maximets
On 6/4/24 13:42, Eelco Chaudron wrote:
> 
> 
> On 1 Jun 2024, at 0:08, Ilya Maximets wrote:
> 
>> On 5/7/24 15:52, Eelco Chaudron wrote:
>>> While offloading header modifications to TC, OVS is using {TCA_PEDIT} +
>>> {TCA_CSUM} combination as that it the only way to represent header
>>> rewrite.  However, {TCA_CSUM} is unable to calculate L4 checksums for
>>> IP fragments.
>>>
>>> Since TC already applies fragmentation bit masking, this patch simply
>>> needs to prevent these packets from being processed through TC.
>>>
>>> Signed-off-by: Eelco Chaudron 
>>> ---
>>> v2: - Fixed and added some comments.
>>> - Use ovs-pcap to compare packets.
>>>
>>> NOTE: This patch needs an AVX512 fix before it can be applied.
>>>   Intel is working on this.
>>> ---
>>>  lib/netdev-offload-tc.c | 32 ++
>>>  lib/tc.c|  5 ++-
>>>  tests/system-traffic.at | 93 +
>>>  3 files changed, 129 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>>> index 921d52317..bdd307933 100644
>>> --- a/lib/netdev-offload-tc.c
>>> +++ b/lib/netdev-offload-tc.c
>>> @@ -1488,6 +1488,30 @@ parse_put_flow_ct_action(struct tc_flower *flower,
>>>  return 0;
>>>  }
>>>
>>> +static bool
>>> +will_tc_add_l4_checksum(struct tc_flower *flower, int type)
>>> +{
>>> +/* This function returns true if the tc layer will add a l4 checksum 
>>> action
>>> + * for this set action.  Refer to the csum_update_flag() function for
>>> + * detailed logic.  Note that even the kernel only supports updating 
>>> TCP,
>>> + * UDP and ICMPv6. */
>>
>> This comment should be outside of the function, I think.  It's strange
>> to have it here.  I see csum_update_flag() has a comment inside, but
>> that's strange as well.  That function has other style issues as well,
>> there is no need to copy them.
> 
> ACK, will clean this up in the next rev.
> 
>>> +switch (type) {
>>> +case OVS_KEY_ATTR_IPV4:
>>> +case OVS_KEY_ATTR_IPV6:
>>> +case OVS_KEY_ATTR_TCP:
>>> +case OVS_KEY_ATTR_UDP:
>>> +switch (flower->key.ip_proto) {
>>> +case IPPROTO_TCP:
>>> +case IPPROTO_UDP:
>>> +case IPPROTO_ICMPV6:
>>> +case IPPROTO_UDPLITE:
>>> +return true;
>>> +}
>>> +break;
>>> +}
>>> +return false;
>>> +}
>>> +
>>>  static int
>>>  parse_put_flow_set_masked_action(struct tc_flower *flower,
>>>   struct tc_action *action,
>>> @@ -1520,6 +1544,14 @@ parse_put_flow_set_masked_action(struct tc_flower 
>>> *flower,
>>>  return EOPNOTSUPP;
>>>  }
>>>
>>> +if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT
>>
>> We're using this field to make an offloading decision.  We must
>> also set in the mask.  If for some reason we're not matching on
>> the fragment bits, we may first receive a non-fragmented packet
>> and offload it, then fragmented traffic may match and fail the
>> checksumming.  So, we need to enable the bit in the mask.
> 
> The dp always matches on the fragment bit for IPv4 packets, I did some tests 
> with this.
> So if we sent a non-fragment packet first the dp rule will match on fragment 
> = 0. Or
> did I miss something?

It is true today, but nothing ensures that on the netdev-offload-tc level.
Moreover, the netdev-offload-tc is written in a way that it expects the
frag bits to potentially not be in the mask:
  
https://github.com/openvswitch/ovs/blob/1d681ffe3b208a0db4945b6389142ab18404a4d1/lib/netdev-offload-tc.c#L2432-L2448
So, it is going to be internally inconsistent if we do not set the bit
explicitly.

And if someday we'll stop adding the frag bit, we'll never know if we
forget to set it in netdev-offload-tc.  At the very least we'll need an
assertion that it is set.  But having an assertion will still be internally
inconsistent with the code linked above.  So, it may be better to just fix
it instead anyway.

> 
>>> +&& will_tc_add_l4_checksum(flower, type)) {
>>> +VLOG_DBG_RL(, "set action type %d not supported on fragments "
>>> +"due to checksum limitation", type);
>>> +ofpbuf_uninit(_buf);
>>> +return EOPNOTSUPP;
>>> +}
>>> +
>>>  for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) {
>>>  struct netlink_field *f = _flower_map[type][i];
>>>
>>> diff --git a/lib/tc.c b/lib/tc.c
>>> index e9bcae4e4..20472d13c 100644
>>> --- a/lib/tc.c
>>> +++ b/lib/tc.c
>>> @@ -2958,7 +2958,10 @@ csum_update_flag(struct tc_flower *flower,
>>>   * eth(dst=),eth_type(0x0800) actions=set(ipv4(src=))
>>>   * we need to force a more specific flow as this can, for example,
>>>   * need a recalculation of icmp checksum if the packet that passes
>>> - * is ICMPv6 and tcp checksum if its tcp. */
>>> + * is ICMPv6 and tcp checksum if its tcp.
>>> + *
>>> + * Ensure that you keep the pre-check function in 

Re: [ovs-dev] [PATCH v2] netdev-tc-offloads: Don't offload header modification on ip fragments.

2024-06-04 Thread Eelco Chaudron



On 1 Jun 2024, at 0:08, Ilya Maximets wrote:

> On 5/7/24 15:52, Eelco Chaudron wrote:
>> While offloading header modifications to TC, OVS is using {TCA_PEDIT} +
>> {TCA_CSUM} combination as that it the only way to represent header
>> rewrite.  However, {TCA_CSUM} is unable to calculate L4 checksums for
>> IP fragments.
>>
>> Since TC already applies fragmentation bit masking, this patch simply
>> needs to prevent these packets from being processed through TC.
>>
>> Signed-off-by: Eelco Chaudron 
>> ---
>> v2: - Fixed and added some comments.
>> - Use ovs-pcap to compare packets.
>>
>> NOTE: This patch needs an AVX512 fix before it can be applied.
>>   Intel is working on this.
>> ---
>>  lib/netdev-offload-tc.c | 32 ++
>>  lib/tc.c|  5 ++-
>>  tests/system-traffic.at | 93 +
>>  3 files changed, 129 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>> index 921d52317..bdd307933 100644
>> --- a/lib/netdev-offload-tc.c
>> +++ b/lib/netdev-offload-tc.c
>> @@ -1488,6 +1488,30 @@ parse_put_flow_ct_action(struct tc_flower *flower,
>>  return 0;
>>  }
>>
>> +static bool
>> +will_tc_add_l4_checksum(struct tc_flower *flower, int type)
>> +{
>> +/* This function returns true if the tc layer will add a l4 checksum 
>> action
>> + * for this set action.  Refer to the csum_update_flag() function for
>> + * detailed logic.  Note that even the kernel only supports updating 
>> TCP,
>> + * UDP and ICMPv6. */
>
> This comment should be outside of the function, I think.  It's strange
> to have it here.  I see csum_update_flag() has a comment inside, but
> that's strange as well.  That function has other style issues as well,
> there is no need to copy them.

ACK, will clean this up in the next rev.

>> +switch (type) {
>> +case OVS_KEY_ATTR_IPV4:
>> +case OVS_KEY_ATTR_IPV6:
>> +case OVS_KEY_ATTR_TCP:
>> +case OVS_KEY_ATTR_UDP:
>> +switch (flower->key.ip_proto) {
>> +case IPPROTO_TCP:
>> +case IPPROTO_UDP:
>> +case IPPROTO_ICMPV6:
>> +case IPPROTO_UDPLITE:
>> +return true;
>> +}
>> +break;
>> +}
>> +return false;
>> +}
>> +
>>  static int
>>  parse_put_flow_set_masked_action(struct tc_flower *flower,
>>   struct tc_action *action,
>> @@ -1520,6 +1544,14 @@ parse_put_flow_set_masked_action(struct tc_flower 
>> *flower,
>>  return EOPNOTSUPP;
>>  }
>>
>> +if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT
>
> We're using this field to make an offloading decision.  We must
> also set in the mask.  If for some reason we're not matching on
> the fragment bits, we may first receive a non-fragmented packet
> and offload it, then fragmented traffic may match and fail the
> checksumming.  So, we need to enable the bit in the mask.

The dp always matches on the fragment bit for IPv4 packets, I did some tests 
with this. So if we sent a non-fragment packet first the dp rule will match on 
fragment = 0. Or did I miss something?

>> +&& will_tc_add_l4_checksum(flower, type)) {
>> +VLOG_DBG_RL(, "set action type %d not supported on fragments "
>> +"due to checksum limitation", type);
>> +ofpbuf_uninit(_buf);
>> +return EOPNOTSUPP;
>> +}
>> +
>>  for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) {
>>  struct netlink_field *f = _flower_map[type][i];
>>
>> diff --git a/lib/tc.c b/lib/tc.c
>> index e9bcae4e4..20472d13c 100644
>> --- a/lib/tc.c
>> +++ b/lib/tc.c
>> @@ -2958,7 +2958,10 @@ csum_update_flag(struct tc_flower *flower,
>>   * eth(dst=),eth_type(0x0800) actions=set(ipv4(src=))
>>   * we need to force a more specific flow as this can, for example,
>>   * need a recalculation of icmp checksum if the packet that passes
>> - * is ICMPv6 and tcp checksum if its tcp. */
>> + * is ICMPv6 and tcp checksum if its tcp.
>> + *
>> + * Ensure that you keep the pre-check function in netdev-offload-tc.c,
>> + * will_tc_add_l4_checksum(), in sync if you make any changes here. */
>>
>>  switch (htype) {
>>  case TCA_PEDIT_KEY_EX_HDR_TYPE_IP4:
>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
>> index bd7647cbe..8f392abd5 100644
>> --- a/tests/system-traffic.at
>> +++ b/tests/system-traffic.at
>> @@ -8977,3 +8977,96 @@ OVS_WAIT_UNTIL([cat p2.pcap | grep -E "0x0050: * 
>> * *5002 *2000 *b85e *00
>>
>>  OVS_TRAFFIC_VSWITCHD_STOP
>>  AT_CLEANUP
>> +
>> +AT_SETUP([datapath - mod_nw_src/set_field on IP fragments])
>> +AT_SKIP_IF([test $HAVE_TCPDUMP = no])
>> +OVS_TRAFFIC_VSWITCHD_START()
>> +
>> +ADD_NAMESPACES(at_ns0, at_ns1)
>> +
>> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", 36:b1:ee:7c:01:03)
>> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", 36:b1:ee:7c:01:02)
>> +
>> +AT_DATA([flows.txt], [dnl
>> +  

Re: [ovs-dev] [PATCH v2] netdev-tc-offloads: Don't offload header modification on ip fragments.

2024-05-31 Thread Ilya Maximets
On 5/7/24 15:52, Eelco Chaudron wrote:
> While offloading header modifications to TC, OVS is using {TCA_PEDIT} +
> {TCA_CSUM} combination as that it the only way to represent header
> rewrite.  However, {TCA_CSUM} is unable to calculate L4 checksums for
> IP fragments.
> 
> Since TC already applies fragmentation bit masking, this patch simply
> needs to prevent these packets from being processed through TC.
> 
> Signed-off-by: Eelco Chaudron 
> ---
> v2: - Fixed and added some comments.
> - Use ovs-pcap to compare packets.
> 
> NOTE: This patch needs an AVX512 fix before it can be applied.
>   Intel is working on this.
> ---
>  lib/netdev-offload-tc.c | 32 ++
>  lib/tc.c|  5 ++-
>  tests/system-traffic.at | 93 +
>  3 files changed, 129 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
> index 921d52317..bdd307933 100644
> --- a/lib/netdev-offload-tc.c
> +++ b/lib/netdev-offload-tc.c
> @@ -1488,6 +1488,30 @@ parse_put_flow_ct_action(struct tc_flower *flower,
>  return 0;
>  }
>  
> +static bool
> +will_tc_add_l4_checksum(struct tc_flower *flower, int type)
> +{
> +/* This function returns true if the tc layer will add a l4 checksum 
> action
> + * for this set action.  Refer to the csum_update_flag() function for
> + * detailed logic.  Note that even the kernel only supports updating TCP,
> + * UDP and ICMPv6. */

This comment should be outside of the function, I think.  It's strange
to have it here.  I see csum_update_flag() has a comment inside, but
that's strange as well.  That function has other style issues as well,
there is no need to copy them.

> +switch (type) {
> +case OVS_KEY_ATTR_IPV4:
> +case OVS_KEY_ATTR_IPV6:
> +case OVS_KEY_ATTR_TCP:
> +case OVS_KEY_ATTR_UDP:
> +switch (flower->key.ip_proto) {
> +case IPPROTO_TCP:
> +case IPPROTO_UDP:
> +case IPPROTO_ICMPV6:
> +case IPPROTO_UDPLITE:
> +return true;
> +}
> +break;
> +}
> +return false;
> +}
> +
>  static int
>  parse_put_flow_set_masked_action(struct tc_flower *flower,
>   struct tc_action *action,
> @@ -1520,6 +1544,14 @@ parse_put_flow_set_masked_action(struct tc_flower 
> *flower,
>  return EOPNOTSUPP;
>  }
>  
> +if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT

We're using this field to make an offloading decision.  We must
also set in the mask.  If for some reason we're not matching on
the fragment bits, we may first receive a non-fragmented packet
and offload it, then fragmented traffic may match and fail the
checksumming.  So, we need to enable the bit in the mask.

> +&& will_tc_add_l4_checksum(flower, type)) {
> +VLOG_DBG_RL(, "set action type %d not supported on fragments "
> +"due to checksum limitation", type);
> +ofpbuf_uninit(_buf);
> +return EOPNOTSUPP;
> +}
> +
>  for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) {
>  struct netlink_field *f = _flower_map[type][i];
>  
> diff --git a/lib/tc.c b/lib/tc.c
> index e9bcae4e4..20472d13c 100644
> --- a/lib/tc.c
> +++ b/lib/tc.c
> @@ -2958,7 +2958,10 @@ csum_update_flag(struct tc_flower *flower,
>   * eth(dst=),eth_type(0x0800) actions=set(ipv4(src=))
>   * we need to force a more specific flow as this can, for example,
>   * need a recalculation of icmp checksum if the packet that passes
> - * is ICMPv6 and tcp checksum if its tcp. */
> + * is ICMPv6 and tcp checksum if its tcp.
> + *
> + * Ensure that you keep the pre-check function in netdev-offload-tc.c,
> + * will_tc_add_l4_checksum(), in sync if you make any changes here. */
>  
>  switch (htype) {
>  case TCA_PEDIT_KEY_EX_HDR_TYPE_IP4:
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index bd7647cbe..8f392abd5 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -8977,3 +8977,96 @@ OVS_WAIT_UNTIL([cat p2.pcap | grep -E "0x0050: * 
> * *5002 *2000 *b85e *00
>  
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
> +
> +AT_SETUP([datapath - mod_nw_src/set_field on IP fragments])
> +AT_SKIP_IF([test $HAVE_TCPDUMP = no])
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", 36:b1:ee:7c:01:03)
> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", 36:b1:ee:7c:01:02)
> +
> +AT_DATA([flows.txt], [dnl
> +  in_port=ovs-p0,ip,nw_src=10.1.1.1 actions=mod_nw_src=11.1.1.1,ovs-p1
> +  in_port=ovs-p0,ipv6,ipv6_src=fc00::1 
> actions=set_field:fc00::100->ipv6_src,ovs-p1
> +])
> +
> +AT_CHECK([ovs-ofctl del-flows br0])
> +AT_CHECK([ovs-ofctl -Oopenflow13 add-flows br0 flows.txt])
> +
> +NETNS_DAEMONIZE([at_ns1],
> +[tcpdump -l -nn -xx -U -i p1 -w p1.pcap 2> tcpdump.err],
> +[tcpdump.pid])
> +OVS_WAIT_UNTIL([grep 

[ovs-dev] [PATCH v2] netdev-tc-offloads: Don't offload header modification on ip fragments.

2024-05-07 Thread Eelco Chaudron
While offloading header modifications to TC, OVS is using {TCA_PEDIT} +
{TCA_CSUM} combination as that it the only way to represent header
rewrite.  However, {TCA_CSUM} is unable to calculate L4 checksums for
IP fragments.

Since TC already applies fragmentation bit masking, this patch simply
needs to prevent these packets from being processed through TC.

Signed-off-by: Eelco Chaudron 
---
v2: - Fixed and added some comments.
- Use ovs-pcap to compare packets.

NOTE: This patch needs an AVX512 fix before it can be applied.
  Intel is working on this.
---
 lib/netdev-offload-tc.c | 32 ++
 lib/tc.c|  5 ++-
 tests/system-traffic.at | 93 +
 3 files changed, 129 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index 921d52317..bdd307933 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -1488,6 +1488,30 @@ parse_put_flow_ct_action(struct tc_flower *flower,
 return 0;
 }
 
+static bool
+will_tc_add_l4_checksum(struct tc_flower *flower, int type)
+{
+/* This function returns true if the tc layer will add a l4 checksum action
+ * for this set action.  Refer to the csum_update_flag() function for
+ * detailed logic.  Note that even the kernel only supports updating TCP,
+ * UDP and ICMPv6. */
+switch (type) {
+case OVS_KEY_ATTR_IPV4:
+case OVS_KEY_ATTR_IPV6:
+case OVS_KEY_ATTR_TCP:
+case OVS_KEY_ATTR_UDP:
+switch (flower->key.ip_proto) {
+case IPPROTO_TCP:
+case IPPROTO_UDP:
+case IPPROTO_ICMPV6:
+case IPPROTO_UDPLITE:
+return true;
+}
+break;
+}
+return false;
+}
+
 static int
 parse_put_flow_set_masked_action(struct tc_flower *flower,
  struct tc_action *action,
@@ -1520,6 +1544,14 @@ parse_put_flow_set_masked_action(struct tc_flower 
*flower,
 return EOPNOTSUPP;
 }
 
+if (flower->key.flags & TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT
+&& will_tc_add_l4_checksum(flower, type)) {
+VLOG_DBG_RL(, "set action type %d not supported on fragments "
+"due to checksum limitation", type);
+ofpbuf_uninit(_buf);
+return EOPNOTSUPP;
+}
+
 for (i = 0; i < ARRAY_SIZE(set_flower_map[type]); i++) {
 struct netlink_field *f = _flower_map[type][i];
 
diff --git a/lib/tc.c b/lib/tc.c
index e9bcae4e4..20472d13c 100644
--- a/lib/tc.c
+++ b/lib/tc.c
@@ -2958,7 +2958,10 @@ csum_update_flag(struct tc_flower *flower,
  * eth(dst=),eth_type(0x0800) actions=set(ipv4(src=))
  * we need to force a more specific flow as this can, for example,
  * need a recalculation of icmp checksum if the packet that passes
- * is ICMPv6 and tcp checksum if its tcp. */
+ * is ICMPv6 and tcp checksum if its tcp.
+ *
+ * Ensure that you keep the pre-check function in netdev-offload-tc.c,
+ * will_tc_add_l4_checksum(), in sync if you make any changes here. */
 
 switch (htype) {
 case TCA_PEDIT_KEY_EX_HDR_TYPE_IP4:
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index bd7647cbe..8f392abd5 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -8977,3 +8977,96 @@ OVS_WAIT_UNTIL([cat p2.pcap | grep -E "0x0050: * 
* *5002 *2000 *b85e *00
 
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
+
+AT_SETUP([datapath - mod_nw_src/set_field on IP fragments])
+AT_SKIP_IF([test $HAVE_TCPDUMP = no])
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", 36:b1:ee:7c:01:03)
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", 36:b1:ee:7c:01:02)
+
+AT_DATA([flows.txt], [dnl
+  in_port=ovs-p0,ip,nw_src=10.1.1.1 actions=mod_nw_src=11.1.1.1,ovs-p1
+  in_port=ovs-p0,ipv6,ipv6_src=fc00::1 
actions=set_field:fc00::100->ipv6_src,ovs-p1
+])
+
+AT_CHECK([ovs-ofctl del-flows br0])
+AT_CHECK([ovs-ofctl -Oopenflow13 add-flows br0 flows.txt])
+
+NETNS_DAEMONIZE([at_ns1],
+[tcpdump -l -nn -xx -U -i p1 -w p1.pcap 2> tcpdump.err],
+[tcpdump.pid])
+OVS_WAIT_UNTIL([grep "listening" tcpdump.err])
+
+dnl IPv4 Packet content:
+dnl   Ethernet II, Src: 36:b1:ee:7c:01:03, Dst: 36:b1:ee:7c:01:02
+dnl   Type: IPv4 (0x0800)
+dnl   Internet Protocol Version 4, Src: 10.1.1.1, Dst: 10.1.1.2
+dnl   0100  = Version: 4
+dnl    0101 = Header Length: 20 bytes (5)
+dnl   Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
+dnl   Total Length: 38
+dnl   Identification: 0x0001 (1)
+dnl   001.  = Flags: 0x1, More fragments
+dnl   0...  = Reserved bit: Not set
+dnl   .0..  = Don't fragment: Not set
+dnl   ..1.  = More fragments: Set
+dnl   ...0    = Fragment Offset: 0
+dnl   Time to Live: 64
+dnl   Protocol: UDP (17)
+dnl   Header Checksum: 0x44c2
+dnl   Data (18 bytes)
+eth="36 b1 ee 7c 01 02 36 b1 ee 7c