On Tue, Jan 06, 2026 at 12:40:17PM +0100, Ilya Maximets wrote:
> On 1/6/26 11:05 AM, Dumitru Ceara wrote:
> > On 1/5/26 11:39 PM, Ilya Maximets wrote:
> >> On 1/5/26 11:03 PM, Alexandra Rukomoinikova wrote:
> >>> When a logical router port has multiple IPv4 addresses from different
> >>> networks, northd generates multiple TTL exceeded flows with identical
> >>> match conditions but different actions (different ICMP reply source IPs).
> >>> This causes non-deterministic behavior where replies may use an incorrect
> >>> source IP not belonging to the original packet's network.
> >>>
> >>> The fix adds source IP network matching to flow generation, ensuring
> >>> ICMP TTL exceeded replies always originate from an IP in the same
> >>> network as the destination of the original packet.
> >>>
> >>> Reported-at: https://issues.redhat.com/browse/FDP-2870
> >>> Signed-off-by: Alexandra Rukomoinikova <[email protected]>
> >>> ---
> >>
> >> Hi. Thanks for the patch!
> >>
> >
> > Hi Alexandra, Ilya,
> >
> > Thanks for the patch and review!
> >
> >> It is an interesting problem. Though, it seems like the real solution
> >> would be a fair bit more complicated than what's presented here.
> >>
> >> The current code is not correct for neither IPv4 not IPv6, but in
> >> different ways. The IPv4 code is non-deterministic as we may be replying
> >> with different source address every time, which is not good, as some of
> >> those addresses may not be reachable by the packet source. However,
> >> it seems like in IPv6 case we have the network match and we will actually
> >> not even reply to a packet forwarded to us from a network that is not
> >> directly attached to the router port. Which is also incorrect.
> >>
> >
> > You're right. For IPv6 we don't match on the more specific flow if the
> > source IP is not in the same subnet as any of the router port IPs. We
> > then hit the flow that drops all packets with TTL = {0, 1} indiscriminately:
> >
> > /* TTL discard */
> > ovn_lflow_add(lflows, od, S_ROUTER_IN_IP_INPUT, 30,
> > "ip.ttl == {0, 1}", debug_drop_action(),
> > lflow_ref);
> >
> >> The full solution here, AFAIU, is to actually perform a route lookup
> >> on the packet source IP and use the nexthop network as a source for
> >> the ICMP error. This is very similar to the issue we fixed earlier
> >> for the SNAT:
> >>
> >> af6e83707568 ("northd: Use next-hop network for SNAT when
> >> lb_force_snat_ip=router_ip.")
> >>
> >> But the implementation may be a little more complicated as we do not
> >> already have a routing result here.
> >>
> >
> > Indeed, we never do route lookup based on source IP today.
> >
> >> A much simpler, but not fully correct solution would be to match the
> >> source network as it is done in this patch, but we also need a lower
> >> priority logical flow that will capture all the packets from networks
> >> not directly attached to this router port. And there we would need to
> >> just reply from the first address, if we don't want to perform a full
> >> route lookup. This will ignore all the routing configuration including
> >> policies and stuff and will reply from a wrong IP whenever the packet
> >> is from a remote network, but will surely be better than what we have
> >> today.
> >>
> >
> > I think we could just change the default drop flow mentioned above to
> > instead reply with an ICMP error from the first configured IP address on
> > that router port. We might want to use different flows for v4 and v6
> > though as router ports always have an IPv6 LLA configured and we should
> > probably not use the LLA to reply to non link local traffic.
>
> We may need both, the "default reply from a first non-LLA" per port and
> a "default discard" for the whole router. So we do not continue processing
> on packets with zero ttl in case the router port has no v4 ip address, for
> example. But yes, the "default reply" would be different for v4 and v6.
Hi everyone,
i wanted to add an opinion since the ideas above seem to be contrary to
my experiences with phyiscal routers. They seem to quite often just use
a loopback ip.
Generally for IPv4 according to RFC792 it is valid to use any ipv4
address as the source address of a time exceeded message.
For IPv6 (RFC4443 2.2) it is a little more complicated:
* If the original dst ip is owned by the LR, then the src ip of the icmp
packet must be this exact ip.
* If the original dst ip is any other ip we should decide the source ip
based on the ip the LR would use if it would send a packet to the
destination.
If we combine this i think we could a simple yet valid solution:
* lr_in_ip_input
1. Have individual flows that match on all the LR ips and generate
the icmp message with that LR ip as source
2. Have a lower priority flow that generates the icmp message with 0
as source ip
* lr_in_ip_routing: should do normal routing lookups, it writes the ip
of the sending interface to reg5 or xxreg1
* Somewhere below lr_in_policy_ecmp: if we have an icmp packet and the
source ip is 0: replace it with reg5/xxreg1
That should implement all requirements for ipv6. Since ipv4 does not
care we can use the same logic.
Not sure if having the source ip being 0 temporarily will cause any
issues in the tables in between.
Maybe this helps you, if it does not feel free to ignore it :)
Thanks a lot,
Felix
>
> >
> >> Thoughts?
> >>
> >
> > Also, I think "wrong" is a strong word in the statement above. It's of
> > course possible that users configure multiple IP networks on the same
> > broadcast domain with only some of the hosts in that domain being able
> > to route towards all the OVN IPs. But my hunch is that this kind of
> > topology is not that common.
> >
> >> Dumitru, since you're the reporter of this issue, what do you think?
> >>
> >
> > Given that as far as I know we had no complaints from users about the
> > (incorrect) behavior I'm inclining towards fixing the issue by doing
> > what Alexandra suggested in this patch with the additional change of the
> > default drop flow to always reply with the first router port IP. At
> > least for now.
> >
> > This could also be easily backported to all stable branches.
>
> Makes sense. Going with a simpler solution that covers vast majority
> of cases seems reasonable at the moment.
>
> > We'd also
> > need to add:
> >
> > Fixes: c0321040c703 ("OVN: add ICMPv6 time exceeded support to OVN
> > logical router")
> > Fixes: 7f19374c5933 ("OVN: add ICMP time exceeded support to OVN logical
> > router")
> >
> > For the "full" solution, I guess we should probably mark somehow that a
> > packet needs an ICMP error to be generated because its TTL expired (all
> > other ICMP errors we currently generate are for IP traffic destined to
> > router owned IPs). In that same flow we could already swap the source
> > and destination IPs.
> >
> > Then later, after the route lookup (which now happens with the original
> > source as destination in the packet), match on the flag we set earlier
> > (*) and use the same logic as in af6e83707568 ("northd: Use next-hop
> > network for SNAT when lb_force_snat_ip=router_ip.") and select the
> > correct network IP to use as source (esentially move the icpm4/6 {}
> > action later in the pipeline).
> >
> > But this is a more intrusive change to the pipeline and I'd do it as a
> > follow up patch.
> >
> > (*) I guess we could also avoid the flag by just matching again on TTL =
> > {0, 1}.
> >
> > Regards,
> > Dumitru
> >
> >> Best regards, Ilya Maximets.
> >>
> >
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev