On 9/4/25 1:57 PM, Yael Chemla wrote:
> On 01/09/2025 22:15, Ilya Maximets wrote:
>> On 8/26/25 2:31 PM, Yael Chemla wrote:
>>> On 25/07/2025 22:02, Ilya Maximets wrote:
>>>> Hi, Yael.  Thanks for the patch.
>>>>
>>>> On 7/23/25 10:13 PM, Yael Chemla wrote:
>>>>> Commit aee4f9aec ("netdev-offload-tc: Match against tunnel flags
>>>>> if supported") added support to match against 'csum' and 'df' tunnel
>>>>> flags in TC, assuming they’re only set when actually needed. But
>>>>> the code in tnl_wc_init() always sets these flags by default,
>>>>> without checking if HW offload is enabled or if the flow explicitly
>>>>> requests them. This breaks HW offload, since mlx5 (and other NICs)
>>>>> do not support offloading when these flags are set.
>>>>>
>>>>> To work around this, users are currently have to add the following
>>>>> options to their flows "options:df_default=false options:csum=false".
>>>>> This patch helps by adjusting the wildcard mask according to actual
>>>>> tunnel config. If 'csum' or 'df' are disabled, the mask will no
>>>>> longer include them.
>>>>
>>>> These are the fields of the packet header and they nave nothing to do
>>>> with the configuration of the tunnel port on the destination.  Instead,
>>>> they depend on the tunnel configuration on the traffic source.  So this
>>>> doesn't make a lot of sense to match or not match based on the
>>>> configuration of the destination port.
>>>>
>>>
>>> Hi Ilya, thank you for the clarifications.
>>> It seems we misunderstood the semantics of these options.
>>>>>
>>>>> Note that the decision to match these flags depends on
>>>>> `enc_flags_support`, which is probed once via
>>>>> `probe_enc_flags_support()`. In most cases, this probe is performed
>>>>> against the ovs-system device with `tc_policy=SKIP_HW`, which causes
>>>>> the probe to succeed regardless of whether the underlying hardware
>>>>> can actually offload these flags. Therefore, relying solely on
>>>>> `enc_flags_support` is insufficient and can result in incorrect
>>>>> behavior. This patch works around the issue by ensuring that even
>>>>> if support is globally enabled, the wildcard mask reflects the actual
>>>>> per-port tunnel configuration to avoid unnecessary matches that would
>>>>> break hardware offload.
>>>>
>>>> Again, this match represents the actual packet header that depends on
>>>> the configuration of the tunnel port on the traffic source.
>>>>
>>>> Can this be resolved on a kernel level, i.e. in the driver?  These are
>>>> just packet header fields, so it should be possible for the NIC to set
>>>> them.  E.g. the csum bit should be set if the udphdr->check != 0.
>>>>
>>>
>>> In the case of the mlx5 driver, when hardware offload is enabled the
>>> hardware cannot match on DF or CSUM. From your perspective, would it be
>>> acceptable in such a case to skip matching those bits?
>>
>> I don't think so.  We can change OVS to avoid matching on these flags
>> when it is not necessary, but datapath implementation should not ignore
>> things in general, especially since matching on these bits is exposed
>> not only through OVS but is also available for users setting up TC
>> flower directly.
> 
> We can choose not to match on DF/CSUM if using TC flower directly, while
> OVS doesn't provide that flexibility. The hw-offload is broken after
> kernel commit aee4f9aec ("netdev-offload-tc: Match against tunnel flags
> if supported").>

Sure, as I said before, we can also change OVS to avoid matches on these
flags when they are not necessary.  But, again, the datapath itself should
not ignore the bits, as it can't, in general, make that kind of decisions.

>> Is it a limitation in the hardware?  Seems strange that it can match
>> on the key flag, but can't match on df bit.  Or is the key flag also
>> mostly just ignored?
> 
> Yes, this is indeed a hardware limitation. From the kernel patches, it
> seems to be a limitation across all mainstream NICs:
> 5a1b015d521d ice: flower: validate encapsulation control flags
> 34cdd9847820 nfp: flower: validate encapsulation control flags
> 28d19ec91755 net/mlx5e: flower: validate encapsulation control flags
> 2ede54f8786f sfc: use flow_rule_is_supp_enc_control_flags()
> b48a1540b73a flow_offload: add encapsulation control flag helpers

These commits were added as a stopgap to avoid falsely accept flows offload
for which is not implemented in the driver.  I highly doubt that hardware
capabilities were consulted before making the change.  The thing is that
driver needs to have an explicit support and they do not today.  It doesn't
mean the hardware can't extract this information.

> Which key flag are you referring to? Did you mean key:option?

I meant the IP_TUNNEL_KEY_BIT that signifies existence of the tunnel ID
(which is a different thing for different tunnels, e.g. always present for
VXLAN, but is optional for GRE).

>>>> From OVS perspective, kernel reports that these fields are supported in
>>>> TC, so we use them, because that's the correct thing to do.  There is
>>>> no real way to probe the actual card, especially for tunnel ports, as
>>>> offloading is happening through the bridge port instead.
>>>
>>> Currently, probe_enc_flags_support() does not consider whether hardware
>>> offload is enabled. We have another patch for netdev-offload-tc.c that
>>> modifies this probing logic to explicitly test whether TC can match on
>>> DF and CSUM.
>>
>> How are you probing this?  I guess it's not by trying to install flows
>> with skip_sw, as you mentioned above that it's not enough, but I'm not
>> sure how the probing can be done otherwise.
> 
> Yes, it seems no other way but installing flow with skip_sw to probe.
> I’ll send the patch shortly. In essence, it tries a tc_replace_flower
> with the CSUM and DF tunnel flags on tunnel devices, and updates the
> enc_flags_support accordingly based on the result.

I'm still not sure how that supposed to work when the tunnel flows are
installed on the bridge ingress qdisc.

>>>> There is an argument that can be made that OVS doesn't really need to
>>>> match on these two bits, but it's a different argument with a different
>>>> implementation.  And we may need to match on these bits in the future.
>>>>
>>> Could you please point me to that discussion? I’d like to review the
>>> correspondence to better understand the context.
>>
>> "can be made", i.e. it was not previously discussed, so someone needs
>> to make that argument.  The base idea would be that these flags are not
>> exposed through OpenFlow today and OVS doesn't make a lot of decisions
>> based on these two flags.  At the same time the argument will fall apart
>> as soon some of these conditions are no longer true, in which case we
>> still need the datapath to behave correctly, i.e. reject the flows that
>> it can't properly support.
> 
> Thanks for clarifying. Understood that the base idea is these flags are
> not currently exposed via OpenFlow, and OVS doesn’t rely heavily on them
> today. But as you point out, this may change in the future, and we’ll
> still need the datapath to reject flows it cannot properly support.>

This can be properly implemented if we not include the bits in the mask
unless they are set by the classifier.  This way if users will explicitly
ask for a match on these bits through OpenFlow, such flows will not be
offloaded, but we have many other non-offloadable matches and actions
today, so it should be fine.  At the same time, if the driver can be
updated to properly match on the bits, that would be better.

>>>> Beside the patch being corrupted, as Aaron mentioned, it is also missing
>>>> some unit tests for the change.
>>>>
>>>> Best regards, Ilya Maximets.
>>>
>>
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to