On 1/5/26 11:39 PM, Ilya Maximets wrote:
> On 1/5/26 11:03 PM, Alexandra Rukomoinikova wrote:
>> When a logical router port has multiple IPv4 addresses from different
>> networks, northd generates multiple TTL exceeded flows with identical
>> match conditions but different actions (different ICMP reply source IPs).
>> This causes non-deterministic behavior where replies may use an incorrect
>> source IP not belonging to the original packet's network.
>>
>> The fix adds source IP network matching to flow generation, ensuring
>> ICMP TTL exceeded replies always originate from an IP in the same
>> network as the destination of the original packet.
>>
>> Reported-at: https://issues.redhat.com/browse/FDP-2870
>> Signed-off-by: Alexandra Rukomoinikova <[email protected]>
>> ---
>
> Hi. Thanks for the patch!
>
Hi Alexandra, Ilya,
Thanks for the patch and review!
> It is an interesting problem. Though, it seems like the real solution
> would be a fair bit more complicated than what's presented here.
>
> The current code is not correct for neither IPv4 not IPv6, but in
> different ways. The IPv4 code is non-deterministic as we may be replying
> with different source address every time, which is not good, as some of
> those addresses may not be reachable by the packet source. However,
> it seems like in IPv6 case we have the network match and we will actually
> not even reply to a packet forwarded to us from a network that is not
> directly attached to the router port. Which is also incorrect.
>
You're right. For IPv6 we don't match on the more specific flow if the
source IP is not in the same subnet as any of the router port IPs. We
then hit the flow that drops all packets with TTL = {0, 1} indiscriminately:
/* TTL discard */
ovn_lflow_add(lflows, od, S_ROUTER_IN_IP_INPUT, 30,
"ip.ttl == {0, 1}", debug_drop_action(),
lflow_ref);
> The full solution here, AFAIU, is to actually perform a route lookup
> on the packet source IP and use the nexthop network as a source for
> the ICMP error. This is very similar to the issue we fixed earlier
> for the SNAT:
>
> af6e83707568 ("northd: Use next-hop network for SNAT when
> lb_force_snat_ip=router_ip.")
>
> But the implementation may be a little more complicated as we do not
> already have a routing result here.
>
Indeed, we never do route lookup based on source IP today.
> A much simpler, but not fully correct solution would be to match the
> source network as it is done in this patch, but we also need a lower
> priority logical flow that will capture all the packets from networks
> not directly attached to this router port. And there we would need to
> just reply from the first address, if we don't want to perform a full
> route lookup. This will ignore all the routing configuration including
> policies and stuff and will reply from a wrong IP whenever the packet
> is from a remote network, but will surely be better than what we have
> today.
>
I think we could just change the default drop flow mentioned above to
instead reply with an ICMP error from the first configured IP address on
that router port. We might want to use different flows for v4 and v6
though as router ports always have an IPv6 LLA configured and we should
probably not use the LLA to reply to non link local traffic.
> Thoughts?
>
Also, I think "wrong" is a strong word in the statement above. It's of
course possible that users configure multiple IP networks on the same
broadcast domain with only some of the hosts in that domain being able
to route towards all the OVN IPs. But my hunch is that this kind of
topology is not that common.
> Dumitru, since you're the reporter of this issue, what do you think?
>
Given that as far as I know we had no complaints from users about the
(incorrect) behavior I'm inclining towards fixing the issue by doing
what Alexandra suggested in this patch with the additional change of the
default drop flow to always reply with the first router port IP. At
least for now.
This could also be easily backported to all stable branches. We'd also
need to add:
Fixes: c0321040c703 ("OVN: add ICMPv6 time exceeded support to OVN
logical router")
Fixes: 7f19374c5933 ("OVN: add ICMP time exceeded support to OVN logical
router")
For the "full" solution, I guess we should probably mark somehow that a
packet needs an ICMP error to be generated because its TTL expired (all
other ICMP errors we currently generate are for IP traffic destined to
router owned IPs). In that same flow we could already swap the source
and destination IPs.
Then later, after the route lookup (which now happens with the original
source as destination in the packet), match on the flag we set earlier
(*) and use the same logic as in af6e83707568 ("northd: Use next-hop
network for SNAT when lb_force_snat_ip=router_ip.") and select the
correct network IP to use as source (esentially move the icpm4/6 {}
action later in the pipeline).
But this is a more intrusive change to the pipeline and I'd do it as a
follow up patch.
(*) I guess we could also avoid the flag by just matching again on TTL =
{0, 1}.
Regards,
Dumitru
> Best regards, Ilya Maximets.
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev