On Wed, Jun 24, 2026 at 6:14 AM Lorenzo Bianconi
<[email protected]> wrote:
>
> On Jun 16, Numan Siddique wrote:
> > From: Numan Siddique <[email protected]>
> >
> > Stateless DNAT in OVN rewrites only the outer IPv4 destination via a
> > flow-based 'ip4.dst = <logical_ip>' action. This is fine for normal
> > reply traffic, but it leaves the inner payload of an inbound ICMPv4
> > error untouched. When such an error reaches the downstream logical
> > switch pipeline, conntrack tries to correlate the embedded original
> > packet with the tracked outgoing flow. Because that embedded packet
> > is the VM's outbound datagram after stateless SNAT, its inner source
> > still carries the external (post-NAT) IP, the lookup fails, and the
> > packet is marked ct.inv, causing the LS ACL stage to drop it. The VM
> > never sees the ICMP error, kernel PMTU discovery (RFC 1191) breaks,
> > and TCP/UDP traffic to destinations beyond a smaller-MTU link
> > black-holes.
> >
> > Emit an additional, higher-priority logical flow for each stateless
> > NAT entry that matches the external IP plus any ICMPv4 Destination
> > Unreachable error (type 3) and uses the new 'icmp4.inner_ip4.src'
> > action to un-NAT the embedded inner source back to the logical IP, in
> > addition to rewriting the outer destination. Every type-3 code quotes
> > the original datagram (RFC 792), so this is correct for all of them;
> > it covers Fragmentation Needed (code 4, for PMTUD) as well as the
> > host/port unreachable codes. After this rewrite,
> > conntrack in the LS zone can correlate the error with the tracked
> > outgoing flow, the LS ACL stage allows it, and the VM receives a
> > well-formed ICMP error whose inner source is its own private address
> > - so the kernel's PMTU update path installs a correct route
> > exception.
> >
> > This behavior is gated by a per-NAT option,
> > options:stateless_icmp_helper, on the Northbound NAT entry. It
> > defaults to true, so the inner-IP rewrite flow is emitted for every
> > stateless NAT entry out of the box and PMTUD works without any extra
> > configuration. Operators who do not want the additional flow (for
> > example to avoid the pinctrl round-trip for these ICMP errors) can
> > opt out by setting options:stateless_icmp_helper=false on the
> > individual NAT entry.
> >
> > The new flow uses priority + 1 so that:
> > - The exempted-ext-ips bypass flow (priority + 2, emits 'next;')
> > still wins for traffic explicitly excluded from NAT.
> > - Non-ICMP traffic falls through to the existing stateless DNAT
> > flow at the original priority.
> >
> > The rewrite flow round-trips the packet through ovn-controller, so it is
> > emitted with the logical router's icmp4-error CoPP meter (when one is
> > configured), rate-limiting the punt the same way OVN does for the other
> > controller-handled ICMPv4 errors.
> >
> > Only IPv4 is wired up; IPv6 stateless NAT is not currently supported
> > in OVN, so no equivalent action is needed for icmp6 Packet Too Big.
> >
> > The pinctrl-side implementation of icmp4.inner_ip4.src is in the
> > previous patch.
> >
> > Note that this is required when CMS doesn't use the gateway_mtu
> > option and an external PE router generates the ICMPv4 error
> > message.
> >
> > Assisted-by: Claude Opus 4.7, Claude Code
> > Signed-off-by: Numan Siddique <[email protected]>
> > ---
> > Documentation/ref/ovn-logical-flows.7.rst | 32 ++++++
> > NEWS | 7 ++
> > northd/northd.c | 48 ++++++++-
> > ovn-nb.xml | 38 ++++++++
> > tests/multinode.at | 73 ++++++++++++++
> > tests/ovn-northd.at | 35 ++++++-
> > tests/ovn.at | 113 +++++++++++++++++++++-
> > 7 files changed, 340 insertions(+), 6 deletions(-)
> >
> > diff --git a/Documentation/ref/ovn-logical-flows.7.rst
> > b/Documentation/ref/ovn-logical-flows.7.rst
> > index ce4dd53559..56e7b3ef00 100644
> > --- a/Documentation/ref/ovn-logical-flows.7.rst
> > +++ b/Documentation/ref/ovn-logical-flows.7.rst
> > @@ -2775,6 +2775,24 @@ flows do not get programmed for load balancers with
> > IPv6 *VIPs*.
> > rule is of type dnat_and_snat and has ``stateless=true`` in the options,
> > then
> > the action would be ``ip4/6.dst=(B)``.
> >
> > + For an IPv4 stateless ``dnat_and_snat`` rule that has
> > + ``options:stateless_icmp_helper`` set to ``true`` (the default), an
> > + additional flow at priority *P + 1* is added that matches ``ip && ip4.dst
> > + == A && icmp4 && icmp4.type == 3`` with the action
> > + ``ip4.dst = B; icmp4.inner_ip4.src = B; next;``, where *P* is the priority of
> > + the flow above. This rewrites the outer destination and un-NATs the source
> > + embedded in the inbound ICMPv4 Destination Unreachable error payload (from
> > + the external IP *A* back to the logical IP *B*) - every type-3 code quotes
> > + the original datagram, including ``Fragmentation Needed`` (code 4) - so that
> > + conntrack in the downstream logical switch can correlate the error with the
> > + tracked outgoing flow and Path MTU discovery (RFC 1191) works end-to-end
> > + across stateless NAT. See ``options:stateless_icmp_helper`` in the ``NAT``
> > + table of the
> > + ``OVN_Northbound`` database (``ovn-nb`` (5)). The priority is *P + 1* so that
> > + the ``exempted_ext_ips`` bypass flow (at *P + 2*) still wins for traffic
> > + excluded from NAT, and non-ICMP traffic falls through to the regular
> > + stateless DNAT flow.
> > +
> > If the NAT rule has ``allowed_ext_ips`` configured, then there is an
> > additional match ``ip4.src == allowed_ext_ips``. Similarly, for IPV6,
> > match
> > would be ``ip6.src == allowed_ext_ips``.
> > @@ -2815,6 +2833,20 @@ the egress pipeline.
> > the IPv6 case. If the NAT rule is of type dnat_and_snat and has
> > ``stateless=true`` in the options, then the action would be
> > ``ip4/6.dst=(B)``.
> >
> > + For an IPv4 stateless ``dnat_and_snat`` rule that has
> > + ``options:stateless_icmp_helper`` set to ``true`` (the default), an
> > + additional priority-101 flow is added that matches ``ip && ip4.dst == B &&
> > + inport == GW && icmp4 && icmp4.type == 3`` with the action
> > + ``ip4.dst = B; icmp4.inner_ip4.src = B; next;``. This rewrites the outer
> > + destination and un-NATs the source embedded in the inbound ICMPv4
> > + Destination Unreachable error payload (back to the logical IP *B*) - every
> > + type-3 code quotes the original datagram, including ``Fragmentation Needed``
> > + (code 4) - so that conntrack in the downstream logical switch can correlate
> > + the error with the tracked outgoing flow and Path MTU discovery (RFC 1191)
> > + works end-to-end across stateless NAT. See
> > + ``options:stateless_icmp_helper`` in the ``NAT`` table of the
> > + ``OVN_Northbound`` database (``ovn-nb`` (5)).
> > +
> > If the NAT rule cannot be handled in a distributed manner, then the
> > priority-100 flow above is only programmed on the gateway chassis.
> >
> > diff --git a/NEWS b/NEWS
> > index 748ae30eb2..18374dc71b 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -33,6 +33,13 @@ Post v26.03.0
> > The DHCP and unbound-router ARP/ND drop lflows for external
> > ports were updated to key on the external LSP's inport
> > accordingly.
> > + - Added a new "icmp4.inner_ip4.src" action that rewrites the source
> > + IPv4 address embedded in an ICMPv4 error's inner packet. ovn-northd
> > + uses it for stateless "dnat_and_snat" rules, controlled by the new
> > + "options:stateless_icmp_helper" NAT option (default true), so that
> > + inbound ICMPv4 Destination Unreachable errors (type 3, including the
> > + "fragmentation needed" message) generated by an external router are
> > + un-NATed correctly and Path MTU discovery works through stateless NAT.
> >
> > OVN v26.03.0 - xxx xx xxxx
> > --------------------------
> > diff --git a/northd/northd.c b/northd/northd.c
> > index f5aa5cca38..3bb9cafaac 100644
> > --- a/northd/northd.c
> > +++ b/northd/northd.c
> > @@ -17584,7 +17584,9 @@ static void
> > build_lrouter_in_dnat_flow(struct lflow_table *lflows,
> > const struct ovn_datapath *od,
> > const struct lr_nat_record *lrnat_rec,
> > - const struct ovn_nat *nat_entry, struct ds *match,
> > + const struct ovn_nat *nat_entry,
> > + const struct shash *meter_groups,
> > + struct ds *match,
> > struct ds *actions, bool distributed_nat,
> > int cidr_bits, bool is_v6,
> > struct ovn_port *l3dgw_port, bool stateless,
> > @@ -17657,6 +17659,45 @@ build_lrouter_in_dnat_flow(struct lflow_table
> > *lflows,
> >
> > ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, priority, ds_cstr(match),
> > ds_cstr(actions), lflow_ref, WITH_HINT(&nat->header_));
> > +
> > + /* For stateless DNAT, the action above only rewrites the outer IPv4
> > + * destination. An inbound ICMPv4 error (RFC 792 / RFC 1191) carries
> > + * the original (post-NAT) packet inside its payload, whose source is
> > + * the external (post-SNAT) IP. The conntrack-based ACL check in the
> > + * downstream logical switch zone uses that inner tuple to match the
> > + * reverse direction of the tracked outgoing flow; without un-NATing
> > + * the inner ip4.src back to the logical IP, that lookup fails and the
> > + * error is dropped as ct.inv.
> > + *
> > + * Emit a higher-priority flow that matches the same external IP plus any
> > + * ICMPv4 Destination Unreachable error (type 3) and rewrites the outer
> > + * ip4.dst and the embedded inner ip4.src to the logical IP. Every type-3
> > + * code quotes the original datagram (RFC 792), so the inner-source un-NAT
> > + * is correct for all of them; this covers Fragmentation Needed (code 4,
> > + * for PMTUD per RFC 1191) as well as host/port unreachable and the rest,
> > + * so PMTUD works end-to-end through stateless NAT. */
> > + if (stateless && !is_v6 &&
> > + smap_get_bool(&nat_entry->nb->options, "stateless_icmp_helper",
> > + true)) {
> > + const char *icmp4_meter = copp_meter_get(COPP_ICMP4_ERR,
> > + od->nbr->copp,
> > + meter_groups);
> > + size_t match_len = match->length;
> > +
> > + ds_put_cstr(match, " && icmp4 && icmp4.type == 3");
>
> Based on the NAT entry's match, we can potentially end up with a match rule
> like:
>
> match = ip && ip4.dst == IP && (udp) && icmp4 && icmp4.type == 3
>
> that is always false, right?
Thanks Lorenzo. That's a great finding. I'll address it in v3.
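
To spell out why that combined match can never hit, here is a toy model
of the match semantics. It is only an illustration: it is not OVN's
expression parser, and the helper names (user_match_udp, helper_match,
combined_match) are made up for this sketch. It assumes "udp" reduces to
IP protocol 17 and "icmp4" to IP protocol 1.

```python
# Toy model of the match semantics only; this is NOT OVN's expression parser.
# All function names below are hypothetical and exist just for this sketch.
IPPROTO_ICMP = 1
IPPROTO_UDP = 17


def user_match_udp(pkt):
    """Stand-in for the NAT entry's user-supplied match, '(udp)'."""
    return pkt["ip_proto"] == IPPROTO_UDP


def helper_match(pkt):
    """Stand-in for the appended ' && icmp4 && icmp4.type == 3'."""
    return pkt["ip_proto"] == IPPROTO_ICMP and pkt.get("icmp_type") == 3


def combined_match(pkt):
    """Both predicates ANDed together, as in the quoted match string."""
    return user_match_udp(pkt) and helper_match(pkt)


def any_packet_matches():
    """Brute-force every (IP protocol, ICMP type) pair in this toy model."""
    for proto in range(256):
        for icmp_type in range(256):
            if combined_match({"ip_proto": proto, "icmp_type": icmp_type}):
                return True
    return False


# An inbound "fragmentation needed" error is ICMP (protocol 1), type 3.
frag_needed = {"ip_proto": IPPROTO_ICMP, "icmp_type": 3}
print(helper_match(frag_needed))    # True
print(combined_match(frag_needed))  # False
print(any_packet_matches())         # False
```

A single IPv4 header carries one protocol number, so the two predicates
are mutually exclusive and the helper flow would be dead whenever the NAT
entry's match pins a non-ICMP protocol.
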
Thanks
Numan
>
> Regards,
> Lorenzo
>
> > +
> > + ds_clear(actions);
> > + ds_put_format(actions,
> > + "ip4.dst=%s; icmp4.inner_ip4.src = %s; next;",
> > + nat->logical_ip, nat->logical_ip);
> > +
> > + ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, priority + 1,
> > + ds_cstr(match), ds_cstr(actions), lflow_ref,
> > + WITH_CTRL_METER(icmp4_meter),
> > + WITH_HINT(&nat->header_));
> > +
> > + ds_truncate(match, match_len);
> > + }
> > }
> >
> > static void
> > @@ -18404,8 +18445,9 @@ build_lrouter_nat_defrag_and_lb(
> > lflow_ref);
> > }
> > /* S_ROUTER_IN_DNAT */
> > - build_lrouter_in_dnat_flow(lflows, od, lrnat_rec, nat_entry, match,
> > - actions, nat_entry->is_distributed,
> > + build_lrouter_in_dnat_flow(lflows, od, lrnat_rec, nat_entry,
> > + meter_groups, match, actions,
> > + nat_entry->is_distributed,
> > cidr_bits, is_v6, nat_entry->l3dgw_port,
> > stateless, lflow_ref);
> >
> > diff --git a/ovn-nb.xml b/ovn-nb.xml
> > index 15fb1d7e86..2fc8543868 100644
> > --- a/ovn-nb.xml
> > +++ b/ovn-nb.xml
> > @@ -5457,6 +5457,44 @@ or
> > tracking state or not.
> > </column>
> >
> > + <column name="options" key="stateless_icmp_helper"
> > + type='{"type": "boolean"}'>
> > + <p>
> > + Applies only to stateless <code>dnat_and_snat</code> rules (that
> > + is, NATs with <ref column="options" key="stateless"/> set to
> > + <code>true</code>) on IPv4 addresses. Defaults to
> > + <code>true</code>.
> > + </p>
> > +
> > + <p>
> > + A stateless DNAT rule rewrites only the outer IPv4 destination of
> > + inbound packets. For an inbound ICMPv4 error (for example a
> > + <code>Fragmentation Needed</code> message generated for Path MTU
> > + discovery, RFC 1191), the original packet embedded in the ICMP
> > + payload still carries the external, post-NAT IP as its source.
> > + When the error reaches the downstream logical switch, conntrack
> > + cannot correlate the embedded tuple with the tracked outgoing
> > + flow, the packet is marked <code>ct.inv</code> and dropped by the
> > + ACL stage, and PMTU discovery breaks.
> > + </p>
> > +
> > + <p>
> > + When this option is <code>true</code>, <code>ovn-northd</code>
> > + emits an additional, higher-priority logical flow in the router
> > + ingress DNAT stage that matches ICMPv4 Destination Unreachable
> > + (type 3) errors destined to the external IP. Every type-3 code
> > + quotes the original datagram, so this covers
> > + <code>Fragmentation Needed</code> (code 4) for PMTU discovery as
> > + well as host/port unreachable and the rest. It rewrites the outer
> > + IPv4 destination to the logical IP and, using the
> > + <code>icmp4.inner_ip4.src</code> action, un-NATs the embedded
> > + inner IPv4 source from the external IP back to the logical IP, so
> > + that conntrack can correlate the error and PMTU discovery works
> > + end-to-end. Set it to <code>false</code> to suppress this flow for
> > + an individual NAT entry.
> > + </p>
> > + </column>
> > +
> > <column name="options" key="add_route">
> > If set to <code>true</code>, then neighbor routers will have logical
> > flows added that will allow for routing to the NAT address. It also
> > will
> > diff --git a/tests/multinode.at b/tests/multinode.at
> > index 069f2a677d..37ef523f95 100644
> > --- a/tests/multinode.at
> > +++ b/tests/multinode.at
> > @@ -1041,6 +1041,79 @@ run_ns_traffic
> > AT_CLEANUP
> > ])
> >
> > +AT_SETUP([ovn multinode stateless NAT - icmp4 PMTUD inner src un-NAT])
> > +
> > +# Check that ovn-fake-multinode setup is up and running
> > +check_fake_multinode_setup
> > +
> > +# Delete the multinode NB and OVS resources before starting the test.
> > +cleanup_multinode_resources
> > +
> > +m_as ovn-chassis-1 ip link del sw0p1-p
> > +
> > +# Reset geneve tunnels
> > +for c in ovn-chassis-1 ovn-gw-1
> > +do
> > + m_as $c ovs-vsctl set open . external-ids:ovn-encap-type=geneve
> > +done
> > +
> > +OVS_WAIT_UNTIL([m_as ovn-chassis-1 ip link show | grep -q genev_sys])
> > +OVS_WAIT_UNTIL([m_as ovn-gw-1 ip link show | grep -q genev_sys])
> > +
> > +# Internal switch with one VM (10.0.0.3) on ovn-chassis-1.
> > +check multinode_nbctl ls-add sw0
> > +check multinode_nbctl lsp-add sw0 sw0-port1
> > +check multinode_nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:03
> > 10.0.0.3"
> > +
> > +m_as ovn-chassis-1 /data/create_fake_vm.sh sw0-port1 sw0p1
> > 50:54:00:00:00:03 1342 10.0.0.3 24 10.0.0.1
> > +
> > +# Gateway router pinned to ovn-gw-1.
> > +check multinode_nbctl lr-add lr0 -- set Logical_Router lr0
> > options:chassis=ovn-gw-1
> > +check multinode_nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.1/24
> > +check multinode_nbctl lsp-add-router-port sw0 sw0-lr0 lr0-sw0
> > +
> > +# External / provider network.
> > +check multinode_nbctl ls-add public
> > +check multinode_nbctl lsp-add-localnet-port public ln-public public
> > +check multinode_nbctl lrp-add lr0 lr0-public 00:11:22:00:ff:01
> > 172.20.0.100/24
> > +check multinode_nbctl lsp-add-router-port public public-lr0 lr0-public
> > +check multinode_nbctl lr-route-add lr0 0.0.0.0/0 172.20.0.1
> > +
> > +# Stateless dnat_and_snat for the VM. options:stateless_icmp_helper defaults
> > +# to true, so ovn-northd emits the extra DNAT flow that, for an inbound ICMPv4
> > +# "fragmentation needed" error destined to 172.20.0.110, un-NATs the inner
> > +# source from 172.20.0.110 back to 10.0.0.3 (icmp4.inner_ip4.src). Unlike
> > +# stateful NAT (where conntrack fixes the embedded header automatically),
> > +# stateless NAT relies entirely on this flow for PMTUD to work.
> > +check multinode_nbctl --stateless lr-nat-add lr0 dnat_and_snat
> > 172.20.0.110 10.0.0.3
> > +
> > +m_as ovn-gw-1 ovs-vsctl set open .
> > external-ids:ovn-bridge-mappings=public:br-ex
> > +m_as ovn-chassis-1 ovs-vsctl set open .
> > external-ids:ovn-bridge-mappings=public:br-ex
> > +
> > +m_wait_for_ports_up
> > +
> > +# ovn-ext0 routes between the OVN public net (172.20.0.0/24) and a
> > downstream
> > +# net (172.20.1.0/24); ovn-ext2 (172.20.1.2) is the far host behind it.
> > +m_add_internal_port ovn-gw-1 ovn-ext0 br-ex ext0 172.20.0.1/24
> > +m_add_internal_port ovn-gw-1 ovn-ext0 br-ex ext1 172.20.1.1/24
> > +m_add_internal_port ovn-gw-1 ovn-ext2 br-ex ext2 172.20.1.2/24 172.20.1.1
> > +
> > +# Baseline: the VM reaches the far host through stateless NAT.
> > +M_NS_CHECK_EXEC([ovn-chassis-1], [sw0p1], [ping -q -c 3 -i 0.3 -w 2
> > 172.20.1.2 | FORMAT_PING], \
> > +[0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +# Shrink the downstream link MTU so ovn-ext0 emits ICMPv4 "fragmentation
> > +# needed" (type 3, code 4) towards 172.20.0.110 (the VM's stateless SNAT
> > +# address) for oversized DF traffic. The error's inner packet carries
> > +# 172.20.0.110 as its source; the VM only honors the PMTU signal if OVN
> > +# un-NATs that inner source back to 10.0.0.3.
> > +M_NS_CHECK_EXEC([ovn-gw-1], [ovn-ext0], [ip link set dev ext1 mtu 1100])
> > +M_NS_CHECK_EXEC([ovn-chassis-1], [sw0p1], [ping -c 20 -i 0.5 -s 1300 -M do
> > 172.20.1.2 2>&1 | grep -q "mtu = 1100"])
> > +
> > +AT_CLEANUP
> > +
> > PMTUD_SWITCH_TESTS(["geneve"])
> > PMTUD_SWITCH_TESTS(["vxlan"])
> >
> > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> > index 7f4a88d4ec..6c19690b3b 100644
> > --- a/tests/ovn-northd.at
> > +++ b/tests/ovn-northd.at
> > @@ -1087,13 +1087,29 @@ check ovn-nbctl lr-nat-del R1 dnat_and_snat
> > 172.16.1.1
> > echo
> > echo "IPv4: stateless"
> > check ovn-nbctl --wait=sb --stateless lr-nat-add R1 dnat_and_snat
> > 172.16.1.1 50.0.0.11
> > +dnl Two ip4.dst= flows: the regular stateless DNAT flow plus the default
> > +dnl stateless_icmp_helper flow that also rewrites the inner ICMPv4 src.
> > +check_flow_match_sets 2 0 0 2 1 0 0
> > +dnl stateless_icmp_helper defaults to true, so the inner-IP rewrite flow
> > +dnl is present.
> > +check_flow_matches "icmp4.inner_ip4.src" 1
> > +check ovn-nbctl lr-nat-del R1 dnat_and_snat 172.16.1.1
> > +
> > +echo
> > +echo "IPv4: stateless, stateless_icmp_helper=false"
> > +check ovn-nbctl --wait=sb --stateless lr-nat-add R1 dnat_and_snat
> > 172.16.1.1 50.0.0.11
> > +check ovn-nbctl --wait=sb set NAT . options:stateless_icmp_helper=false
> > +dnl With the helper disabled, only the regular stateless DNAT flow remains
> > +dnl and the inner-IP rewrite flow is gone.
> > check_flow_match_sets 2 0 0 1 1 0 0
> > +check_flow_matches "icmp4.inner_ip4.src" 0
> > check ovn-nbctl lr-nat-del R1 dnat_and_snat 172.16.1.1
> >
> > echo
> > echo "IPv4: stateless with match"
> > check ovn-nbctl --wait=sb --match="udp" --stateless lr-nat-add R1
> > dnat_and_snat 172.16.1.1 50.0.0.11
> > -check_flow_match_sets 2 0 0 1 1 0 0
> > +dnl As above, the stateless_icmp_helper flow adds a second ip4.dst= flow.
> > +check_flow_match_sets 2 0 0 2 1 0 0
> > check ovn-nbctl lr-nat-del R1 dnat_and_snat 172.16.1.1
> >
> > echo
> > @@ -1118,6 +1134,23 @@ echo
> > echo "IPv6: stateless with match"
> > check ovn-nbctl --wait=sb --match="udp" --stateless lr-nat-add R1
> > dnat_and_snat fd01::1 fd11::2
> > check_flow_match_sets 2 0 0 0 0 1 1
> > +check ovn-nbctl lr-nat-del R1 dnat_and_snat fd01::1
> > +
> > +echo
> > +echo "IPv4: stateless, stateless_icmp_helper rate-limited by CoPP"
> > +dnl The inner-IP rewrite flow round-trips through ovn-controller, so it is
> > +dnl emitted with the router's icmp4-error CoPP meter (when configured) to
> > +dnl rate-limit the punt.
> > +check ovn-nbctl --wait=sb meter-add m-icmp4-err drop 100 pktps 10
> > +check ovn-nbctl --wait=sb copp-add copp-r1 icmp4-error m-icmp4-err
> > +check ovn-nbctl --wait=sb lr-copp-add copp-r1 R1
> > +check ovn-nbctl --wait=sb --stateless lr-nat-add R1 dnat_and_snat
> > 172.16.1.1 50.0.0.11
> > +ovn-sbctl dump-flows R1 > r1-flows
> > +AT_CAPTURE_FILE([r1-flows])
> > +check_flow_matches "icmp4.inner_ip4.src" 1
> > +dnl The stateless_icmp_helper flow carries the icmp4-error meter.
> > +AT_CHECK([ovn-sbctl list logical_flow | grep "icmp4.inner_ip4.src" -A 2 | \
> > + grep -q "controller_meter.*m-icmp4-err"], [0], [], [ignore])
> >
> > OVN_CLEANUP_NORTHD
> > AT_CLEANUP
> > diff --git a/tests/ovn.at b/tests/ovn.at
> > index 59b41bde82..e43579cdd7 100644
> > --- a/tests/ovn.at
> > +++ b/tests/ovn.at
> > @@ -22714,6 +22714,113 @@ OVN_CLEANUP([hv1])
> > AT_CLEANUP
> > ])
> >
> > +AT_SETUP([stateless NAT - icmp4.inner_ip4.src rewrite])
> > +AT_KEYWORDS([stateless-nat icmp4-inner-ip4-src])
> > +AT_SKIP_IF([test $HAVE_SCAPY = no])
> > +ovn_start
> > +
> > +# Topology:
> > +# "external" host (ext1, on the public LS) --- lr0 (gw router) --- vm1
> > (on sw0)
> > +#
> > +# lr0 has a stateless dnat_and_snat rule that maps the external IP
> > +# 172.168.0.110 to the logical IP 10.0.0.3 (vm1). With
> > +# options:stateless_icmp_helper defaulting to true, ovn-northd emits a
> > +# higher-priority DNAT flow that matches inbound ICMPv4 Destination
> > +# Unreachable errors (type 3, including code 4) destined to the external IP
> > +# and applies "ip4.dst = 10.0.0.3; icmp4.inner_ip4.src = 10.0.0.3; next;".
> > +#
> > +# An inbound ICMP error quotes the VM's original outbound (post-SNAT)
> > +# datagram, so its inner source is the external IP 172.168.0.110. This
> > test
> > +# injects such an error from the external side and verifies that
> > +# ovn-controller (pinctrl) DNATs the outer destination to 10.0.0.3 and
> > +# un-NATs the inner (embedded) IPv4 source back to 10.0.0.3 - while leaving
> > +# the inner destination untouched - before delivering the packet to vm1.
> > +
> > +vm1_mac=50:54:00:00:00:01
> > +vm1_ip=10.0.0.3
> > +rtr_int_mac=00:00:00:00:ff:01
> > +rtr_ext_mac=00:00:20:20:12:13
> > +ext1_mac=00:00:00:00:00:99
> > +ext1_ip=172.168.0.50
> > +nat_ext_ip=172.168.0.110
> > +
> > +# Internal switch with vm1.
> > +check ovn-nbctl ls-add sw0
> > +check ovn-nbctl lsp-add sw0 sw0-vm1
> > +check ovn-nbctl lsp-set-addresses sw0-vm1 "$vm1_mac $vm1_ip"
> > +
> > +# Router (gateway router pinned to hv1).
> > +check ovn-nbctl lr-add lr0
> > +check ovn-nbctl lrp-add lr0 lr0-sw0 $rtr_int_mac 10.0.0.1/24
> > +check ovn-nbctl lsp-add-router-port sw0 sw0-lr0 lr0-sw0
> > +
> > +# Public switch with the external host.
> > +check ovn-nbctl ls-add public
> > +check ovn-nbctl lrp-add lr0 lr0-public $rtr_ext_mac 172.168.0.100/24
> > +check ovn-nbctl lsp-add-router-port public public-lr0 lr0-public
> > +check ovn-nbctl lsp-add public ext1
> > +check ovn-nbctl lsp-set-addresses ext1 "$ext1_mac $ext1_ip"
> > +
> > +check ovn-nbctl set logical_router lr0 options:chassis=hv1
> > +
> > +# Stateless dnat_and_snat: external 172.168.0.110 <-> logical 10.0.0.3.
> > +check ovn-nbctl --wait=sb --stateless lr-nat-add lr0 dnat_and_snat \
> > + $nat_ext_ip $vm1_ip
> > +
> > +net_add n1
> > +sim_add hv1
> > +as hv1
> > +ovs-vsctl add-br br-phys
> > +ovn_attach n1 br-phys 192.168.0.1
> > +ovs-vsctl -- add-port br-int hv1-vif1 -- \
> > + set interface hv1-vif1 external-ids:iface-id=sw0-vm1 \
> > + options:tx_pcap=hv1/vif1-tx.pcap \
> > + options:rxq_pcap=hv1/vif1-rx.pcap \
> > + ofport-request=1
> > +ovs-vsctl -- add-port br-int hv1-vif2 -- \
> > + set interface hv1-vif2 external-ids:iface-id=ext1 \
> > + options:tx_pcap=hv1/vif2-tx.pcap \
> > + options:rxq_pcap=hv1/vif2-rx.pcap \
> > + ofport-request=2
> > +
> > +wait_for_ports_up
> > +check ovn-nbctl --wait=hv sync
> > +
> > +# The inner packet is the original (post-SNAT) datagram quoted in the ICMP
> > +# error. Use a raw 8-byte blob for the embedded L4 header so that no inner
> > +# L4 checksum (which the action does not recompute) is involved.
> > +inner_l4="0102030405060708"
> > +
> > +# ICMP fragmentation-needed error from ext1 to the NAT external IP. The
> > +# embedded original packet is the VM's outbound datagram after stateless
> > +# SNAT, so its inner source is the external IP and its inner destination is
> > +# some far host (50.0.0.100).
> > +packet=$(fmt_pkt "Ether(dst='$rtr_ext_mac', src='$ext1_mac')/ \
> > + IP(src='$ext1_ip', dst='$nat_ext_ip', ttl=64)/ \
> > + ICMP(type=3, code=4, nexthopmtu=1400)/ \
> > + IP(src='$nat_ext_ip', dst='50.0.0.100', ttl=63,
> > proto=17)/ \
> > + bytes.fromhex('$inner_l4')")
> > +
> > +# Expected packet delivered to vm1: the router has DNATed the outer
> > +# destination to 10.0.0.3, decremented the outer TTL, rewritten the L2
> > +# addresses, and (via pinctrl) un-NATed the inner IPv4 source to 10.0.0.3.
> > +# The inner destination (50.0.0.100) is left unchanged.
> > +expected=$(fmt_pkt "Ether(dst='$vm1_mac', src='$rtr_int_mac')/ \
> > + IP(src='$ext1_ip', dst='$vm1_ip', ttl=63)/ \
> > + ICMP(type=3, code=4, nexthopmtu=1400)/ \
> > + IP(src='$vm1_ip', dst='50.0.0.100', ttl=63, proto=17)/
> > \
> > + bytes.fromhex('$inner_l4')")
> > +echo $expected > vif1.expected
> > +
> > +as hv1 reset_pcap_file hv1-vif1 hv1/vif1
> > +
> > +check as hv1 ovs-appctl netdev-dummy/receive hv1-vif2 $packet
> > +
> > +OVN_CHECK_PACKETS([hv1/vif1-tx.pcap], [vif1.expected])
> > +
> > +OVN_CLEANUP([hv1])
> > +AT_CLEANUP
> > +
> > OVN_FOR_EACH_NORTHD([
> > AT_SETUP([IP packet buffering])
> > AT_KEYWORDS([ip-buffering])
> > @@ -26941,14 +27048,16 @@ test_ip vif11 f00000000011 000001010203 $sip $dip
> > vif-north
> > # Confirm that South to North traffic works fine.
> > OVN_CHECK_PACKETS_REMOVE_BROADCAST([hv4/vif-north-tx.pcap],
> > [vif-north.expected])
> >
> > -# Confirm that NATing happened without connection tracker
> > +# Confirm that NATing happened without connection tracker.
> > +# Two ip4.dst= flows are expected: the regular stateless DNAT flow plus the
> > +# default stateless_icmp_helper flow (which also carries
> > icmp4.inner_ip4.src).
> > ovn-sbctl dump-flows router > sbflows
> > AT_CAPTURE_FILE([sbflows])
> > AT_CHECK([for regex in ct_snat ct_dnat ip4.dst= ip4.src=; do
> > grep -c "$regex" sbflows;
> > done], [0], [0
> > 0
> > -1
> > +2
> > 1
> > ])
> >
> > --
> > 2.54.0
> >
> > _______________________________________________
> > dev mailing list
> > [email protected]
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >