On 9/7/25 1:33 PM, Eli Britstein wrote:
>
>
>> -----Original Message-----
>> From: Ilya Maximets <[email protected]>
>> Sent: Thursday, 21 August 2025 13:32
>> To: Reuven Plevinsky <[email protected]>; [email protected]
>> Cc: [email protected]; Eli Britstein <[email protected]>; Maor Dickman
>> <[email protected]>
>> Subject: Re: [ovs-dev] [PATCH v2 1/1] ofproto-dpif-xlate: Clear tunnel
>> wildcard
>> bits properly.
>>
>> External email: Use caution opening links or attachments
>>
>>
>> On 8/19/25 11:01 AM, Reuven Plevinsky wrote:
>>> On 06/08/2025 14:26, Ilya Maximets wrote:
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> On 8/4/25 4:29 PM, Reuven Plevinsky via dev wrote:
>>>>> For example, consider the following OpenFlow rule:
>>>>> - IPv6 -> IPv4 tunnel translation:
>>>>> add-flow br-int 'in_port=vxlan_br-int_0, \
>>>>> actions=set_field:8.8.8.7->tun_src, \
>>>>> set_field:8.8.8.8->tun_dst,set_field:43->tun_id,geneve_br-int_1'
>>>>>
>>>>> As any set_field action, flow wildcards are set to the tun_src/dst.
>>>>> The offload layer fails if not all matches are handled ("consumed").
>>>>> Clear the irrelevant wildcards.
>>>>>
>>>>> Signed-off-by: Reuven Plevinsky <[email protected]>
>>>>> ---
>>>>> ofproto/ofproto-dpif-xlate.c | 17 +++++++++++++
>>>>> tests/tunnel-push-pop.at | 48
>> ++++++++++++++++++++++++++++++++++++
>>>>> 2 files changed, 65 insertions(+)
>>>>>
>>>>> diff --git a/ofproto/ofproto-dpif-xlate.c
>>>>> b/ofproto/ofproto-dpif-xlate.c index 2c8197fb7307..e2ba7e6b0250
>>>>> 100644
>>>>> --- a/ofproto/ofproto-dpif-xlate.c
>>>>> +++ b/ofproto/ofproto-dpif-xlate.c
>>>>> @@ -8078,6 +8078,20 @@ xlate_wc_init(struct xlate_ctx *ctx)
>>>>> tnl_wc_init(&ctx->xin->flow, ctx->wc);
>>>>> }
>>>>>
>>>>> +static void
>>>>> +clear_tunnel_wc(const struct flow_tnl *spec, struct flow_tnl *mask)
>>>>> +{
>>>>> + if (!spec->ip_src && !spec->ip_dst) {
>>>>> + mask->ip_src = 0;
>>>>> + mask->ip_dst = 0;
>>>>> + }
>>>>> + if (!ipv6_addr_is_set(&spec->ipv6_src) &&
>>>>> + !ipv6_addr_is_set(&spec->ipv6_dst)) {
>>>>> + mask->ipv6_src = in6addr_any;
>>>>> + mask->ipv6_dst = in6addr_any;
>>>>> + }
>>>>> +}
>>>>> +
>>>>> static void
>>>>> xlate_wc_finish(struct xlate_ctx *ctx)
>>>>> {
>>>>> @@ -8152,6 +8166,9 @@ xlate_wc_finish(struct xlate_ctx *ctx)
>>>>> if (!flow_tnl_dst_is_set(&ctx->xin->upcall_flow->tunnel)) {
>>>>> memset(&ctx->wc->masks.tunnel, 0, sizeof ctx->wc->masks.tunnel);
>>>>> }
>>>>> + /* Both IPv4/IPv6 tunnel wc may be set because of set_fields action.
>>>>> + * Clear the irrelevant one. */
>>>>> + clear_tunnel_wc(&ctx->xin->upcall_flow->tunnel,
>>>>> + &ctx->wc->masks.tunnel);
>>>>> }
>>>>>
>>>>> /* This will tweak the odp actions generated. For now, it will:
>>>>> diff --git a/tests/tunnel-push-pop.at b/tests/tunnel-push-pop.at
>>>>> index 64c41460bbb2..9b46d249fe9f 100644
>>>>> --- a/tests/tunnel-push-pop.at
>>>>> +++ b/tests/tunnel-push-pop.at
>>>>> @@ -1440,3 +1440,51 @@ AT_CHECK([tail -1 stdout], [0],
>>>>>
>>>>> OVS_VSWITCHD_STOP
>>>>> AT_CLEANUP
>>>>> +
>>>>> +AT_SETUP([tunnel - decap and re-encap with outer IP version
>>>>> +conversion]) OVS_VSWITCHD_START([add-port br0 p0 -- set Interface
>>>>> +p0 type=dummy]) AT_CHECK([ovs-vsctl add-br int-br -- set bridge
>>>>> +int-br datapath_type=dummy], [0]) AT_CHECK([ovs-vsctl add-port int-br
>> t1 -- set Interface t1 type=vxlan \
>>>>> + options:key=flow options:remote_ip=flow \
>>>>> + -- add-port int-br t2 -- set Interface t2
>>>>> type=geneve \
>>>>> + options:key=flow options:remote_ip=flow \
>>>>> + ], [0])
>>>>> +
>>>>> +AT_CHECK([ovs-ofctl add-flow br0 action=normal]) dnl Setup dummy
>>>>> +interface IP addresses.
>>>>> +AT_CHECK([ovs-appctl netdev-dummy/ip4addr br0 1.1.1.1/24], [0], [OK
>>>>> +])
>>>>> +AT_CHECK([ovs-appctl netdev-dummy/ip6addr br0 2001:cafe::1/24],
>>>>> +[0], [OK
>>>>> +])
>>>>> +dnl Checking that a local routes for added IPs were successfully
>>>>> installed.
>>>>> +AT_CHECK([ovs-appctl ovs/route/show | grep Cached | sort], [0],
>>>>> +[dnl
>>>>> +Cached: 1.1.1.0/24 dev br0 SRC 1.1.1.1 local
>>>>> +Cached: 2001:ca00::/24 dev br0 SRC 2001:cafe::1 local
>>>>> +])
>>>>> +dnl Add entries
>>>>> +AT_CHECK([ovs-appctl tnl/neigh/set br0 1.1.1.2 aa:bb:cc:00:00:01],
>>>>> +[0], [OK
>>>>> +])
>>>>> +AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::2
>>>>> +aa:bb:cc:00:00:01], [0], [OK
>>>>> +])
>>>>> +AT_CHECK([ovs-appctl tnl/neigh/show | grep br0 | sort], [0], [dnl
>>>>> +1.1.1.2 aa:bb:cc:00:00:01 br0
>>>>> +2001:cafe::2 aa:bb:cc:00:00:01 br0
>>>>> +])
>>>>> +
>>>>> +AT_CHECK([ovs-ofctl add-flow int-br
>>>>> +'in_port=t1,tun_ipv6_src=2001:cafe::1,tun_ipv6_dst=2001:cafe::2,tun
>>>>> +_id=0x7a,actions=set_field:1.1.1.1->tun_src,set_field:1.1.1.2->tun_
>>>>> +dst,set_field:0x7b->tun_id,t2']) AT_CHECK([ovs-ofctl add-flow
>>>>> +int-br
>>>>> +'in_port=t2,tun_src=1.1.1.1,tun_dst=1.1.1.2,tun_id=0x7b,actions=set
>>>>> +_field:2001:cafe::1->tun_ipv6_src,set_field:2001:cafe::2->tun_ipv6_
>>>>> +dst,set_field:0x7a->tun_id,set_field:0.0.0.0->tun_src,set_field:0.0
>>>>> +.0.0->tun_dst,t1'])
>>>>> +
>>>>> +AT_CHECK([ovs-appctl ofproto/trace --names ovs-dummy
>>>>> +'tunnel(tun_id=0x7a,ipv6_src=2001:cafe::1,ipv6_dst=2001:cafe::2,ttl
>>>>> +=64),in_port(4789)'], [0], [stdout]) AT_CHECK([tail -2 stdout],
>>>>> +[0],
>>>>> + [Megaflow:
>>>>> +recirc_id=0,eth,tun_id=0x7a,tun_ipv6_src=2001:cafe::1,tun_ipv6_dst=
>>>>> +2001:cafe::2,tun_tos=0,tun_flags=-df-csum+key,in_port=t1,dl_type=0x
>>>>> +05ff Datapath actions:
>>>>> +tnl_push(tnl_port(genev_sys_6081),header(size=50,type=5,eth(dst=aa:
>>>>> +bb:cc:00:00:01,src=aa:55:aa:55:00:00,dl_type=0x0800),ipv4(src=1.1.1
>>>>> +.1,dst=1.1.1.2,proto=17,tos=0,ttl=64,frag=0x4000),udp(src=0,dst=608
>>>>> +1,csum=0x0),geneve(vni=0x7b)),out_port(br0)),p0
>>>>> +])
>>>>> +
>>>>> +AT_CHECK([ovs-appctl ofproto/trace --names ovs-dummy
>>>>> +'tunnel(tun_id=0x7b,src=1.1.1.1,dst=1.1.1.2,ttl=64),in_port(6081)']
>>>>> +, [0], [stdout]) AT_CHECK([tail -2 stdout], [0],
>>>>> + [Megaflow:
>>>>> +recirc_id=0,eth,tun_id=0x7b,tun_src=1.1.1.1,tun_dst=1.1.1.2,tun_tos
>>>>> +=0,tun_flags=-df-csum+key,in_port=t2,dl_type=0x05ff
>>>>> +Datapath actions:
>>>>> +tnl_push(tnl_port(vxlan_sys_4789),header(size=70,type=4,eth(dst=aa:
>>>>> +bb:cc:00:00:01,src=aa:55:aa:55:00:00,dl_type=0x86dd),ipv6(src=2001:
>>>>> +cafe::1,dst=2001:cafe::2,label=0,proto=17,tclass=0x0,hlimit=64),udp
>>>>> +(src=0,dst=4789,csum=0xffff),vxlan(flags=0x8000000,vni=0x7a)),out_p
>>>>> +ort(br0)),p0
>>>>> +])
>>>>> +
>>>>> +OVS_VSWITCHD_STOP
>>>>> +AT_CLEANUP
>>>> Hi, Reuven. Thanks for the update and the test. I see what you're
>>>> trying to achive now.
>>>>
>>>> However, I don't think this is a right approach as we can't actually
>>>> clear tunnel metadata at xlate_wc_finish() stage. This is because we
>>>> must have prerequisites unwildcarded in order to clear the masks that
>>>> we think are unnecessary. And tunnel metadata doesn't have any
>>>> prerequisites, so there is no way to tell if it is necessary to match
>>>> on or not. The main problem is a match on empty metadata that the
>>>> user can add into OpenFlow rules. We must never clear those as it
>>>> may be necessary to distinguish tunneled and non-tunneled packets.
>>>> E.g. match on all-zero tun_dst signifies that the packet is not
>>>> coming from an ipv4 tunnel. Note that input port match is not enough
>>>> as a prerequisite as controller can inject packets from any port.
>>>>
>>>> For example, if you add the following change to your test:
>>>>
>>>> diff --git a/tests/tunnel-push-pop.at b/tests/tunnel-push-pop.at
>>>> index 9b46d249f..1d19e7e5b 100644
>>>> --- a/tests/tunnel-push-pop.at
>>>> +++ b/tests/tunnel-push-pop.at
>>>> @@ -1473,6 +1473,13 @@ AT_CHECK([ovs-appctl tnl/neigh/show | grep
>> br0
>>>> | sort], [0], [dnl
>>>>
>>>> AT_CHECK([ovs-ofctl add-flow int-br
>> 'in_port=t1,tun_ipv6_src=2001:cafe::1,tun_ipv6_dst=2001:cafe::2,tun_id=0x7a
>> ,actions=set_field:1.1.1.1->tun_src,set_field:1.1.1.2->tun_dst,set_field:0x7b-
>>> tun_id,t2'])
>>>> AT_CHECK([ovs-ofctl add-flow int-br
>>>> 'in_port=t2,tun_src=1.1.1.1,tun_dst=1.1.1.2,tun_id=0x7b,actions=set_f
>>>> ield:2001:cafe::1->tun_ipv6_src,set_field:2001:cafe::2->tun_ipv6_dst,
>>>> set_field:0x7a->tun_id,set_field:0.0.0.0->tun_src,set_field:0.0.0.0->
>>>> tun_dst,t1'])
>>>> +AT_CHECK([ovs-ofctl add-flow int-br
>>>> +'tun_dst=0.0.0.0,tun_ipv6_src=::,actions=drop'])
>>>> +
>>>> +AT_CHECK([ovs-appctl ofproto/trace --names int-br in_port=t1], [0],
>>>> +[stdout]) AT_CHECK([tail -2 stdout], [0],
>>>> + [Megaflow:
>>>> +recirc_id=0,eth,tun_dst=0.0.0.0,tun_ipv6_src=::,in_port=t1,dl_type=0
>>>> +x0000
>>>> +Datapath actions: drop
>>>> +])
>>>>
>>>> AT_CHECK([ovs-appctl ofproto/trace --names ovs-dummy
>> 'tunnel(tun_id=0x7a,ipv6_src=2001:cafe::1,ipv6_dst=2001:cafe::2,ttl=64),in_p
>> ort(4789)'], [0], [stdout])
>>>> AT_CHECK([tail -2 stdout], [0],
>>>> ---
>>>>
>>>> The test will fail and the actual datapath flow will look like this:
>>>>
>>>> Megaflow: recirc_id=0,eth,in_port=t1,dl_type=0x0000
>>>>
>>>> Such a flow will drop all the traffic and not just non-tunneled one
>>>> as it was intended by the OpenFlow rules.
>>>>
>>>> This issue actually already exist for a while and need to also revert
>>>> the previous commit:
>>>> d046453b56a9 ("ofproto-dpif-xlate: Clear tunnel wc bits if
>>>> original packet is non-tunnel.") Though, I think, we need a better
>>>> solution for this before reverting as conditions where it would be a
>>>> problem
>> are kind of unlikely.
>>>>
>>>> A proper solution would be to avoid unwildcarding these fields while
>>>> setting the metadata through varois OpenFlow actions (set_field,
>>>> load, stack pop, etc.) but preserve the masks coming from the classifier.
>>>> The reason why we can actually avoid unwildcarding while setting the
>>>> fileds is that unlike other fileds, tunnel metadata is not a subject
>>>> for the optimization where we skip datapath actions if the value is
>>>> already the same. So, there is no real reason to unwildcard them.
>>>>
>>>> Do you want to try and work on this kind of change?
>>>>
>>>> BTW, I didn't test, but it seems like IPV4/IPv6 mix is a problem not
>>>> only for the hardware offloading code, but also for the kernel
>>>> datapath in general as it will reject to install the flow with mixed
>>>> metadata types. Userspace works fine in that case though. So,
>>>> technicaly, we also need to probe the datapath implementation for
>>>> support and slowpath such traffic, if it doesn't, i.e. set the SLOW_MATCH
>> flag.
>>>>
>>>> Best regards, Ilya Maximets.
>>>
>>> Hi Ilya,
>>>
>>> Thanks a lot for your review and comments. Since this issue is
>>> specific to offload, and is not supported by the current upstream DPDK
>>> implementation (while it is supported by our DOCA implementation),
>>
>> It should be supported by TC offload implementation, AFAIU. It also may be a
>> problem for the kernel datapath without offload as I mentioned above.
>>
>>> we’ve
>>> decided to take a different approach and handle it within the DOCA
>>> offload provider instead. For now, we will abandon this change.
>>
>> The same as we can't simply clear these bits at the wc_finish stage, the
>> datapath or the offload implementaion can't make this decision either without
>> breaking user-specified OpenFlow logic.
>
> Hi Ilya,
> Could you please elaborate on your concern about a potential breakage?
> Could you please provide an example?
See the test case I provided in my previous reply above. It shows that flow
with an overly broad match is getting generated and it will match traffic
that was not intended to be matched by the provided set of OpenFlow rules.
> I'd like to refer you to 2 other commits that are doing something like what
> we want to do:
> 262eded5fbd7 ("netdev-offload-tc: Fix ignoring unknown tunnel keys.")
This was a temporary workaround and it is now removed for kernels that
support matching on tunnel flags. You can see a lot of "This is wrong!"
comments in this change.
The commit message of this patch also describes other issues like such
flows will be frequently removed and re-added during revalidation, i.e.
un-offloaded and offlaoded back, because ofproto layer will see the overly
broad match and try to re-generate the narrower one on every revalidation
cycle.
> 0ef70536eb41 ("netdev-offload-dpdk: Support vxlan encap offload with load
> actions.")
This only affects the case where the original packet wasn't from a tunnel,
which is not a case here. You're trying to solve the case where the packet
is actually from a tunnel. And we'll need to revert 0ef70536eb41 along with
d046453b56a9 when we have a proper solution that doesn't unmask tunnel
metadata fields on 'set'-style actions, but preserves values comming from
the classifier, as both of these commits are practically the same and so
have the same issue with potentially installing overly broad datapath flows.
(I provided reasoning on why we can avoid unwildcarding on actions earlier
in this thread.)
All in all, the code on current main is not really correct. But the point
is to avoid making it even less correct. Instead, work towards a proper
solution and revert all the incorrect parts.
Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev