On 28.07.2025 19:59, Han Zhou wrote:


On Fri, Jul 25, 2025 at 8:19 AM Frode Nordahl <fnord...@ubuntu.com 
<mailto:fnord...@ubuntu.com>> wrote:
 >
 > While there is a hardware offload friendly approach to fixing
 > processing of fragmented traffic to load balancers in 8e6f9a8355e2
 > ("northd: Fix HW offload problem related to ct_tuple."), the change
 > is quite invasive, and not compatible with our requirement to make
 > UDP fragmentation work for switch egress pipeline on user space
 > data path in existing releases.
 >
 > In a subsequent patch an optional workaround to this problem will be
 > provided, which relies on these fields being available.

Hi Frode, thanks for the patch.

Hello, Han, thank you for taking a look at it!

To my understanding, this patch itself is the workaround for supporting 
fragmented UDP in ACLs by just putting fields like ct_udp.dst in the match, 
correct? The work-in-progress [0] seems to be the real solution instead of a 
workaround. Could you clarify a little?

The work-in-progress in [0] does indeed have the goal of being a real solution, 
containing the fix to this issue in OVN, avoiding the need to coordinate a fix 
with CMS.

We do however need a language to express intent between ovn-northd and 
ovn-controller, and this patch was sent ahead of time with hope of making the 
required field(s) available awaiting completion of [0].

Unfortunately the OVN Northbound ACL match column and OVN Southbound 
Logical_Flow match column appear to share namespace and processing in OVN 
[1][2][3], and consequently expose this API in its entirety to the CMS through 
the Northbound API.

1: 
https://github.com/ovn-org/ovn/blob/ab19375413cfc6429172d3101f125a8426a20b25/ovn-nb.xml#L2558-L2579
2: 
https://github.com/ovn-org/ovn/blob/ab19375413cfc6429172d3101f125a8426a20b25/northd/northd.c#L7425-L7426
3: 
https://github.com/ovn-org/ovn/blob/ab19375413cfc6429172d3101f125a8426a20b25/northd/northd.c#L7470-L7471


 >
 > The documentation for OVN Northbound ACL table already states that
 > the match column uses the same expression language as the OVN
 > Southbound Logical_Flow match column.
 >
 > As we know, the use of these fields may be problematic for hardware
 > offload, consequently a note is added to discourage their use.
 >
 > Related: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/2115795 
<https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/2115795>
 > Related: https://issues.redhat.com/browse/FDP-684 
<https://issues.redhat.com/browse/FDP-684>
 > Signed-off-by: Frode Nordahl <fnord...@ubuntu.com 
<mailto:fnord...@ubuntu.com>>
 > ---
 >
 > As discussed in the IRC meeting July 24th, I post a prerequisite patch
 > in anticipation of a patch that provides a workaround for a bug
 > preventing fragmented UDP packets to pass ACLs that match on protocol
 > and port numbers.
 >
 > There is a work in progress available [0], and we intend to propose
 > this optional workaround behind a NB_Global bug flag which can be used
 > until a CT_EXTRACT based stage is made available for the switch egress
 > pipeline.
 >

Just want to clarify here. I think the point (as emphasized by @Ilya Maximets ) 
was that the CT fields are at a lower layer. So the real reason for this 
workaround is not about missing CT_EXTRACT stage but about the complexity of 
parsing and rewriting tcp/udp/sctp fields from user defined ACLs to use CT 
metadata. Of course if we move forward to implement the rewriting approach, 
CT_EXTRACT stage would be needed to make sure HW offload works.

Agreed, the reason for bringing up the CT_EXTRACT stage in this context is 
because if we had them for the switch egress pipeline, we could use the 
registers in the logical flow language instead of bringing back the ct_ logical 
fields.

However, our ultimate goal is also to make this fix available for already 
released versions (at least for UDP), and the recent discussion suggested that 
backport of pipeline changes such as the CT_EXTRACT work would likely be off 
the table.

Or did I misunderstand this part of the discussion?

 > 0: https://github.com/sombrafam/ovn/commits/fix-fragmentation-for-acls/ 
<https://github.com/sombrafam/ovn/commits/fix-fragmentation-for-acls/>
 >
 >  lib/logical-fields.c | 15 +++++++++++++++
 >  ovn-nb.xml           |  8 ++++++++
 >  tests/ovn.at <http://ovn.at>         |  7 +++++++
 >  3 files changed, 30 insertions(+)
 >
 > diff --git a/lib/logical-fields.c b/lib/logical-fields.c
 > index e479a78c1..efbc75b72 100644
 > --- a/lib/logical-fields.c
 > +++ b/lib/logical-fields.c
 > @@ -359,6 +359,21 @@ ovn_init_symtab(struct shash *symtab)
 >
 >      expr_symtab_add_ovn_field(symtab, "icmp4.frag_mtu", OVN_ICMP4_FRAG_MTU);
 >      expr_symtab_add_ovn_field(symtab, "icmp6.frag_mtu", OVN_ICMP6_FRAG_MTU);
 > +
 > +    expr_symtab_add_field(symtab, "ct_proto", MFF_CT_NW_PROTO,
 > +                          "ct.trk", false);
 > +
 > +    expr_symtab_add_predicate(symtab, "ct_udp", "ct_proto == 17");
 > +    expr_symtab_add_field(symtab, "ct_udp.dst", MFF_CT_TP_DST,
 > +                          "ct_udp", false);
 > +
 > +    expr_symtab_add_predicate(symtab, "ct_tcp", "ct_proto == 6");
 > +    expr_symtab_add_field(symtab, "ct_tcp.dst", MFF_CT_TP_DST,
 > +                          "ct_tcp", false);
 > +
 > +    expr_symtab_add_predicate(symtab, "ct_sctp", "ct_proto == 132");
 > +    expr_symtab_add_field(symtab, "ct_sctp.dst", MFF_CT_TP_DST,
 > +                          "ct_sctp", false);
 >  }
 >

These symbols need to be documented in ovn-sb.xml.

Ack.

 >  const char *
 > diff --git a/ovn-nb.xml b/ovn-nb.xml
 > index 4a7581807..caf946eac 100644
 > --- a/ovn-nb.xml
 > +++ b/ovn-nb.xml
 > @@ -2576,6 +2576,14 @@ or
 >          Note that you can not create an ACL matching on a port with
 >          type=router or type=localnet.
 >        </p>
 > +
 > +      <p>
 > +        Note that matching directly on ct_state subfields such as
 > +        <code>ct.new <http://ct.new></code>, <code>ct.trk</code>, or 
connection tracker
 > +        metadata fields such as <code>ct_proto</code>, <code>ct_udp</code>,
 > +        <code>ct_tcp</code>, <code>ct_sctp</code> is not recommended as 
their
 > +        use may impact performance and prevent hardware offload from 
working.

As mentioned above, the reason for discouraging the use of these CT fields is 
not only about performance, but also because they are at a lower level, and 
related to internal implementation of the upper level (user-friendly) fields. 
Shall we be clear about this in this note? Probably we should also mention that 
it is only needed for supporting fragmented packets in ACL, mainly for UDP.

In general I am ok with this workaround if we make these notes clear so that 
the users understand the risks. But we should also understand that once these 
fields are released, it would be hard to remove them in the future considering 
backward compatibility. Thoughts?

I definitively feel the weight and risk of eternally having to support an API 
once making it available.

Would it help/an alternative be to find some way of having these fields 
strictly as internal OVN API, existing only between ovn-northd and 
ovn-controller in the SB DB?  Either through having ovn-northd refusing to pass 
these through from NB to SB, or if we find some light weight way of namespacing 
the expr symtab and of course documenting them as internal API?

--
Frode Nordahl

Thanks,
Han

 > +      </p>
 >      </column>
 >
 >      <column name="action">
 > diff --git a/tests/ovn.at <http://ovn.at> b/tests/ovn.at <http://ovn.at>
 > index 0dabec8d9..3ba958dec 100644
 > --- a/tests/ovn.at <http://ovn.at>
 > +++ b/tests/ovn.at <http://ovn.at>
 > @@ -154,7 +154,14 @@ ct_mark.natted = ct_mark[1]
 >  ct_mark.obs_collector_id = ct_mark[16..23]
 >  ct_mark.obs_stage = ct_mark[4..5]
 >  ct_mark.skip_snat = ct_mark[2]
 > +ct_proto = NXM_NX_CT_NW_PROTO
 > +ct_sctp = ct_proto == 132
 > +ct_sctp.dst = NXM_NX_CT_TP_DST
 >  ct_state = NXM_NX_CT_STATE
 > +ct_tcp = ct_proto == 6
 > +ct_tcp.dst = NXM_NX_CT_TP_DST
 > +ct_udp = ct_proto == 17
 > +ct_udp.dst = NXM_NX_CT_TP_DST
 >  ]])
 >  AT_CLEANUP
 >
 > --
 > 2.43.0

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to