On 8/19/25 11:01 AM, Reuven Plevinsky wrote:
> On 06/08/2025 14:26, Ilya Maximets wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On 8/4/25 4:29 PM, Reuven Plevinsky via dev wrote:
>>> For example, consider the following OpenFlow rule:
>>> - IPv6 -> IPv4 tunnel translation:
>>> add-flow br-int 'in_port=vxlan_br-int_0, \
>>>      actions=set_field:8.8.8.7->tun_src,  \
>>>      set_field:8.8.8.8->tun_dst,set_field:43->tun_id,geneve_br-int_1'
>>>
>>> As any set_field action, flow wildcards are set to the
>>> tun_src/dst. The offload layer fails if not all matches are
>>> handled ("consumed"). Clear the irrelevant wildcards.
>>>
>>> Signed-off-by: Reuven Plevinsky <rplevin...@nvidia.com>
>>> ---
>>>   ofproto/ofproto-dpif-xlate.c | 17 +++++++++++++
>>>   tests/tunnel-push-pop.at     | 48 ++++++++++++++++++++++++++++++++++++
>>>   2 files changed, 65 insertions(+)
>>>
>>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>>> index 2c8197fb7307..e2ba7e6b0250 100644
>>> --- a/ofproto/ofproto-dpif-xlate.c
>>> +++ b/ofproto/ofproto-dpif-xlate.c
>>> @@ -8078,6 +8078,20 @@ xlate_wc_init(struct xlate_ctx *ctx)
>>>       tnl_wc_init(&ctx->xin->flow, ctx->wc);
>>>   }
>>>
>>> +static void
>>> +clear_tunnel_wc(const struct flow_tnl *spec, struct flow_tnl *mask)
>>> +{
>>> +    if (!spec->ip_src && !spec->ip_dst) {
>>> +        mask->ip_src = 0;
>>> +        mask->ip_dst = 0;
>>> +    }
>>> +    if (!ipv6_addr_is_set(&spec->ipv6_src) &&
>>> +        !ipv6_addr_is_set(&spec->ipv6_dst)) {
>>> +        mask->ipv6_src = in6addr_any;
>>> +        mask->ipv6_dst = in6addr_any;
>>> +    }
>>> +}
>>> +
>>>   static void
>>>   xlate_wc_finish(struct xlate_ctx *ctx)
>>>   {
>>> @@ -8152,6 +8166,9 @@ xlate_wc_finish(struct xlate_ctx *ctx)
>>>       if (!flow_tnl_dst_is_set(&ctx->xin->upcall_flow->tunnel)) {
>>>           memset(&ctx->wc->masks.tunnel, 0, sizeof ctx->wc->masks.tunnel);
>>>       }
>>> +    /* Both IPv4/IPv6 tunnel wc may be set because of set_fields action.
>>> +     * Clear the irrelevant one. */
>>> +    clear_tunnel_wc(&ctx->xin->upcall_flow->tunnel, 
>>> &ctx->wc->masks.tunnel);
>>>   }
>>>
>>>   /* This will tweak the odp actions generated. For now, it will:
>>> diff --git a/tests/tunnel-push-pop.at b/tests/tunnel-push-pop.at
>>> index 64c41460bbb2..9b46d249fe9f 100644
>>> --- a/tests/tunnel-push-pop.at
>>> +++ b/tests/tunnel-push-pop.at
>>> @@ -1440,3 +1440,51 @@ AT_CHECK([tail -1 stdout], [0],
>>>
>>>   OVS_VSWITCHD_STOP
>>>   AT_CLEANUP
>>> +
>>> +AT_SETUP([tunnel - decap and re-encap with outer IP version conversion])
>>> +OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy])
>>> +AT_CHECK([ovs-vsctl add-br int-br -- set bridge int-br 
>>> datapath_type=dummy], [0])
>>> +AT_CHECK([ovs-vsctl add-port int-br t1 -- set Interface t1 type=vxlan \
>>> +                       options:key=flow options:remote_ip=flow \
>>> +                    -- add-port int-br t2 -- set Interface t2 type=geneve \
>>> +                       options:key=flow options:remote_ip=flow \
>>> +                       ], [0])
>>> +
>>> +AT_CHECK([ovs-ofctl add-flow br0 action=normal])
>>> +dnl Setup dummy interface IP addresses.
>>> +AT_CHECK([ovs-appctl netdev-dummy/ip4addr br0 1.1.1.1/24], [0], [OK
>>> +])
>>> +AT_CHECK([ovs-appctl netdev-dummy/ip6addr br0 2001:cafe::1/24], [0], [OK
>>> +])
>>> +dnl Checking that a local routes for added IPs were successfully installed.
>>> +AT_CHECK([ovs-appctl ovs/route/show | grep Cached | sort], [0], [dnl
>>> +Cached: 1.1.1.0/24 dev br0 SRC 1.1.1.1 local
>>> +Cached: 2001:ca00::/24 dev br0 SRC 2001:cafe::1 local
>>> +])
>>> +dnl Add entries
>>> +AT_CHECK([ovs-appctl tnl/neigh/set br0 1.1.1.2 aa:bb:cc:00:00:01], [0], [OK
>>> +])
>>> +AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::2 aa:bb:cc:00:00:01], 
>>> [0], [OK
>>> +])
>>> +AT_CHECK([ovs-appctl tnl/neigh/show | grep br0 | sort], [0], [dnl
>>> +1.1.1.2                                       aa:bb:cc:00:00:01   br0
>>> +2001:cafe::2                                  aa:bb:cc:00:00:01   br0
>>> +])
>>> +
>>> +AT_CHECK([ovs-ofctl add-flow int-br 
>>> 'in_port=t1,tun_ipv6_src=2001:cafe::1,tun_ipv6_dst=2001:cafe::2,tun_id=0x7a,actions=set_field:1.1.1.1->tun_src,set_field:1.1.1.2->tun_dst,set_field:0x7b->tun_id,t2'])
>>> +AT_CHECK([ovs-ofctl add-flow int-br 
>>> 'in_port=t2,tun_src=1.1.1.1,tun_dst=1.1.1.2,tun_id=0x7b,actions=set_field:2001:cafe::1->tun_ipv6_src,set_field:2001:cafe::2->tun_ipv6_dst,set_field:0x7a->tun_id,set_field:0.0.0.0->tun_src,set_field:0.0.0.0->tun_dst,t1'])
>>> +
>>> +AT_CHECK([ovs-appctl ofproto/trace --names ovs-dummy 
>>> 'tunnel(tun_id=0x7a,ipv6_src=2001:cafe::1,ipv6_dst=2001:cafe::2,ttl=64),in_port(4789)'],
>>>  [0], [stdout])
>>> +AT_CHECK([tail -2 stdout], [0],
>>> +  [Megaflow: 
>>> recirc_id=0,eth,tun_id=0x7a,tun_ipv6_src=2001:cafe::1,tun_ipv6_dst=2001:cafe::2,tun_tos=0,tun_flags=-df-csum+key,in_port=t1,dl_type=0x05ff
>>> +Datapath actions: 
>>> tnl_push(tnl_port(genev_sys_6081),header(size=50,type=5,eth(dst=aa:bb:cc:00:00:01,src=aa:55:aa:55:00:00,dl_type=0x0800),ipv4(src=1.1.1.1,dst=1.1.1.2,proto=17,tos=0,ttl=64,frag=0x4000),udp(src=0,dst=6081,csum=0x0),geneve(vni=0x7b)),out_port(br0)),p0
>>> +])
>>> +
>>> +AT_CHECK([ovs-appctl ofproto/trace --names ovs-dummy 
>>> 'tunnel(tun_id=0x7b,src=1.1.1.1,dst=1.1.1.2,ttl=64),in_port(6081)'], [0], 
>>> [stdout])
>>> +AT_CHECK([tail -2 stdout], [0],
>>> +  [Megaflow: 
>>> recirc_id=0,eth,tun_id=0x7b,tun_src=1.1.1.1,tun_dst=1.1.1.2,tun_tos=0,tun_flags=-df-csum+key,in_port=t2,dl_type=0x05ff
>>> +Datapath actions: 
>>> tnl_push(tnl_port(vxlan_sys_4789),header(size=70,type=4,eth(dst=aa:bb:cc:00:00:01,src=aa:55:aa:55:00:00,dl_type=0x86dd),ipv6(src=2001:cafe::1,dst=2001:cafe::2,label=0,proto=17,tclass=0x0,hlimit=64),udp(src=0,dst=4789,csum=0xffff),vxlan(flags=0x8000000,vni=0x7a)),out_port(br0)),p0
>>> +])
>>> +
>>> +OVS_VSWITCHD_STOP
>>> +AT_CLEANUP
>> Hi, Reuven.  Thanks for the update and the test.  I see what you're
>> trying to achive now.
>>
>> However, I don't think this is a right approach as we can't actually
>> clear tunnel metadata at xlate_wc_finish() stage.  This is because
>> we must have prerequisites unwildcarded in order to clear the masks
>> that we think are unnecessary.  And tunnel metadata doesn't have
>> any prerequisites, so there is no way to tell if it is necessary to
>> match on or not.  The main problem is a match on empty metadata that
>> the user can add into OpenFlow rules.  We must never clear those as
>> it may be necessary to distinguish tunneled and non-tunneled packets.
>> E.g. match on all-zero tun_dst signifies that the packet is not coming
>> from an ipv4 tunnel.  Note that input port match is not enough as a
>> prerequisite as controller can inject packets from any port.
>>
>> For example, if you add the following change to your test:
>>
>> diff --git a/tests/tunnel-push-pop.at b/tests/tunnel-push-pop.at
>> index 9b46d249f..1d19e7e5b 100644
>> --- a/tests/tunnel-push-pop.at
>> +++ b/tests/tunnel-push-pop.at
>> @@ -1473,6 +1473,13 @@ AT_CHECK([ovs-appctl tnl/neigh/show | grep br0 | 
>> sort], [0], [dnl
>>
>>   AT_CHECK([ovs-ofctl add-flow int-br 
>> 'in_port=t1,tun_ipv6_src=2001:cafe::1,tun_ipv6_dst=2001:cafe::2,tun_id=0x7a,actions=set_field:1.1.1.1->tun_src,set_field:1.1.1.2->tun_dst,set_field:0x7b->tun_id,t2'])
>>   AT_CHECK([ovs-ofctl add-flow int-br 
>> 'in_port=t2,tun_src=1.1.1.1,tun_dst=1.1.1.2,tun_id=0x7b,actions=set_field:2001:cafe::1->tun_ipv6_src,set_field:2001:cafe::2->tun_ipv6_dst,set_field:0x7a->tun_id,set_field:0.0.0.0->tun_src,set_field:0.0.0.0->tun_dst,t1'])
>> +AT_CHECK([ovs-ofctl add-flow int-br 
>> 'tun_dst=0.0.0.0,tun_ipv6_src=::,actions=drop'])
>> +
>> +AT_CHECK([ovs-appctl ofproto/trace --names int-br in_port=t1], [0], 
>> [stdout])
>> +AT_CHECK([tail -2 stdout], [0],
>> +  [Megaflow: 
>> recirc_id=0,eth,tun_dst=0.0.0.0,tun_ipv6_src=::,in_port=t1,dl_type=0x0000
>> +Datapath actions: drop
>> +])
>>
>>   AT_CHECK([ovs-appctl ofproto/trace --names ovs-dummy 
>> 'tunnel(tun_id=0x7a,ipv6_src=2001:cafe::1,ipv6_dst=2001:cafe::2,ttl=64),in_port(4789)'],
>>  [0], [stdout])
>>   AT_CHECK([tail -2 stdout], [0],
>> ---
>>
>> The test will fail and the actual datapath flow will look like this:
>>
>>    Megaflow: recirc_id=0,eth,in_port=t1,dl_type=0x0000
>>
>> Such a flow will drop all the traffic and not just non-tunneled one
>> as it was intended by the OpenFlow rules.
>>
>> This issue actually already exist for a while and need to also revert
>> the previous commit:
>>    d046453b56a9 ("ofproto-dpif-xlate: Clear tunnel wc bits if original 
>> packet is non-tunnel.")
>> Though, I think, we need a better solution for this before reverting
>> as conditions where it would be a problem are kind of unlikely.
>>
>> A proper solution would be to avoid unwildcarding these fields while
>> setting the metadata through varois OpenFlow actions (set_field, load,
>> stack pop, etc.) but preserve the masks coming from the classifier.
>> The reason why we can actually avoid unwildcarding while setting the
>> fileds is that unlike other fileds, tunnel metadata is not a subject
>> for the optimization where we skip datapath actions if the value is
>> already the same.  So, there is no real reason to unwildcard them.
>>
>> Do you want to try and work on this kind of change?
>>
>> BTW, I didn't test, but it seems like IPV4/IPv6 mix is a problem not
>> only for the hardware offloading code, but also for the kernel datapath
>> in general as it will reject to install the flow with mixed metadata
>> types.  Userspace works fine in that case though.  So, technicaly,
>> we also need to probe the datapath implementation for support and
>> slowpath such traffic, if it doesn't, i.e. set the SLOW_MATCH flag.
>>
>> Best regards, Ilya Maximets.
> 
> Hi Ilya,
> 
> Thanks a lot for your review and comments. Since this issue is specific 
> to offload, and is not supported by the current upstream DPDK 
> implementation (while it is supported by our DOCA implementation),

It should be supported by TC offload implementation, AFAIU.  It also
may be a problem for the kernel datapath without offload as I mentioned
above.

> we’ve 
> decided to take a different approach and handle it within the DOCA 
> offload provider instead. For now, we will abandon this change.

The same as we can't simply clear these bits at the wc_finish stage,
the datapath or the offload implementaion can't make this decision either
without breaking user-specified OpenFlow logic.

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to