On 28.07.2025 19:59, Han Zhou wrote:
On Fri, Jul 25, 2025 at 8:19 AM Frode Nordahl <fnord...@ubuntu.com <mailto:fnord...@ubuntu.com>> wrote: > > While there is a hardware offload friendly approach to fixing > processing of fragmented traffic to load balancers in 8e6f9a8355e2 > ("northd: Fix HW offload problem related to ct_tuple."), the change > is quite invasive, and not compatible with our requirement to make > UDP fragmentation work for switch egress pipeline on user space > data path in existing releases. > > In a subsequent patch an optional workaround to this problem will be > provided, which relies on these fields being available. Hi Frode, thanks for the patch.
Hello, Han, thank you for taking a look at it!
To my understanding, this patch itself is the workaround for supporting fragmented UDP in ACLs by just putting fields like ct_udp.dst in the match, correct? The work-in-progress [0] seems to be the real solution instead of a workaround. Could you clarify a little?
The work-in-progress in [0] does indeed have the goal of being a real solution, containing the fix to this issue in OVN, avoiding the need to coordinate a fix with CMS. We do however need a language to express intent between ovn-northd and ovn-controller, and this patch was sent ahead of time with hope of making the required field(s) available awaiting completion of [0]. Unfortunately the OVN Northbound ACL match column and OVN Southbound Logical_Flow match column appear to share namespace and processing in OVN [1][2][3], and consequently expose this API in its entirety to the CMS through the Northbound API. 1: https://github.com/ovn-org/ovn/blob/ab19375413cfc6429172d3101f125a8426a20b25/ovn-nb.xml#L2558-L2579 2: https://github.com/ovn-org/ovn/blob/ab19375413cfc6429172d3101f125a8426a20b25/northd/northd.c#L7425-L7426 3: https://github.com/ovn-org/ovn/blob/ab19375413cfc6429172d3101f125a8426a20b25/northd/northd.c#L7470-L7471
> > The documentation for OVN Northbound ACL table already states that > the match column uses the same expression language as the OVN > Southbound Logical_Flow match column. > > As we know, the use of these fields may be problematic for hardware > offload, consequently a note is added to discourage their use. > > Related: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/2115795 <https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/2115795> > Related: https://issues.redhat.com/browse/FDP-684 <https://issues.redhat.com/browse/FDP-684> > Signed-off-by: Frode Nordahl <fnord...@ubuntu.com <mailto:fnord...@ubuntu.com>> > --- > > As discussed in the IRC meeting July 24th, I post a prerequisite patch > in anticipation of a patch that provides a workaround for a bug > preventing fragmented UDP packets to pass ACLs that match on protocol > and port numbers. > > There is a work in progress available [0], and we intend to propose > this optional workaround behind a NB_Global bug flag which can be used > until a CT_EXTRACT based stage is made available for the switch egress > pipeline. > Just want to clarify here. I think the point (as emphasized by @Ilya Maximets ) was that the CT fields are at a lower layer. So the real reason for this workaround is not about missing CT_EXTRACT stage but about the complexity of parsing and rewriting tcp/udp/sctp fields from user defined ACLs to use CT metadata. Of course if we move forward to implement the rewriting approach, CT_EXTRACT stage would be needed to make sure HW offload works.
Agreed, the reason for bringing up the CT_EXTRACT stage in this context is because if we had them for the switch egress pipeline, we could use the registers in the logical flow language instead of bringing back the ct_ logical fields. However, our ultimate goal is also to make this fix available for already released versions (at least for UDP), and the recent discussion suggested that backport of pipeline changes such as the CT_EXTRACT work would likely be off the table. Or did I misunderstand this part of the discussion?
> 0: https://github.com/sombrafam/ovn/commits/fix-fragmentation-for-acls/ <https://github.com/sombrafam/ovn/commits/fix-fragmentation-for-acls/> > > lib/logical-fields.c | 15 +++++++++++++++ > ovn-nb.xml | 8 ++++++++ > tests/ovn.at <http://ovn.at> | 7 +++++++ > 3 files changed, 30 insertions(+) > > diff --git a/lib/logical-fields.c b/lib/logical-fields.c > index e479a78c1..efbc75b72 100644 > --- a/lib/logical-fields.c > +++ b/lib/logical-fields.c > @@ -359,6 +359,21 @@ ovn_init_symtab(struct shash *symtab) > > expr_symtab_add_ovn_field(symtab, "icmp4.frag_mtu", OVN_ICMP4_FRAG_MTU); > expr_symtab_add_ovn_field(symtab, "icmp6.frag_mtu", OVN_ICMP6_FRAG_MTU); > + > + expr_symtab_add_field(symtab, "ct_proto", MFF_CT_NW_PROTO, > + "ct.trk", false); > + > + expr_symtab_add_predicate(symtab, "ct_udp", "ct_proto == 17"); > + expr_symtab_add_field(symtab, "ct_udp.dst", MFF_CT_TP_DST, > + "ct_udp", false); > + > + expr_symtab_add_predicate(symtab, "ct_tcp", "ct_proto == 6"); > + expr_symtab_add_field(symtab, "ct_tcp.dst", MFF_CT_TP_DST, > + "ct_tcp", false); > + > + expr_symtab_add_predicate(symtab, "ct_sctp", "ct_proto == 132"); > + expr_symtab_add_field(symtab, "ct_sctp.dst", MFF_CT_TP_DST, > + "ct_sctp", false); > } > These symbols need to be documented in ovn-sb.xml.
Ack.
> const char * > diff --git a/ovn-nb.xml b/ovn-nb.xml > index 4a7581807..caf946eac 100644 > --- a/ovn-nb.xml > +++ b/ovn-nb.xml > @@ -2576,6 +2576,14 @@ or > Note that you can not create an ACL matching on a port with > type=router or type=localnet. > </p> > + > + <p> > + Note that matching directly on ct_state subfields such as > + <code>ct.new <http://ct.new></code>, <code>ct.trk</code>, or connection tracker > + metadata fields such as <code>ct_proto</code>, <code>ct_udp</code>, > + <code>ct_tcp</code>, <code>ct_sctp</code> is not recommended as their > + use may impact performance and prevent hardware offload from working. As mentioned above, the reason for discouraging the use of these CT fields is not only about performance, but also because they are at a lower level, and related to internal implementation of the upper level (user-friendly) fields. Shall we be clear about this in this note? Probably we should also mention that it is only needed for supporting fragmented packets in ACL, mainly for UDP. In general I am ok with this workaround if we make these notes clear so that the users understand the risks. But we should also understand that once these fields are released, it would be hard to remove them in the future considering backward compatibility. Thoughts?
I definitively feel the weight and risk of eternally having to support an API once making it available. Would it help/an alternative be to find some way of having these fields strictly as internal OVN API, existing only between ovn-northd and ovn-controller in the SB DB? Either through having ovn-northd refusing to pass these through from NB to SB, or if we find some light weight way of namespacing the expr symtab and of course documenting them as internal API? -- Frode Nordahl
Thanks, Han > + </p> > </column> > > <column name="action"> > diff --git a/tests/ovn.at <http://ovn.at> b/tests/ovn.at <http://ovn.at> > index 0dabec8d9..3ba958dec 100644 > --- a/tests/ovn.at <http://ovn.at> > +++ b/tests/ovn.at <http://ovn.at> > @@ -154,7 +154,14 @@ ct_mark.natted = ct_mark[1] > ct_mark.obs_collector_id = ct_mark[16..23] > ct_mark.obs_stage = ct_mark[4..5] > ct_mark.skip_snat = ct_mark[2] > +ct_proto = NXM_NX_CT_NW_PROTO > +ct_sctp = ct_proto == 132 > +ct_sctp.dst = NXM_NX_CT_TP_DST > ct_state = NXM_NX_CT_STATE > +ct_tcp = ct_proto == 6 > +ct_tcp.dst = NXM_NX_CT_TP_DST > +ct_udp = ct_proto == 17 > +ct_udp.dst = NXM_NX_CT_TP_DST > ]]) > AT_CLEANUP > > -- > 2.43.0
_______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev