Re: [Babel-users] MTU based routing for tunnel based babel networks?

2023-07-26 Thread David Schinazi
On Wed, Jul 26, 2023 at 11:10 AM Juliusz Chroboczek  wrote:

> > CONNECT-UDP
>
> Come on, David, we all know that MASQUE is an elaborate practical joke.
> With draft-asedeno-masque-connect-ethernet, you guys are obviously trying
> to see how far you can go before people realise you're taking the piss.
>

On that note, you know a lot about 802.11, can you help us with our
upcoming draft-masque-connect-layer-1?

But more seriously, L2 VPNs already exist (cf. L2TP, OpenVPN, etc.), MASQUE
is just trying to match the state of the art here. Whether the state of the
art is where we want it to be is a different question, but for that we need
a few chairs and beers. Are you planning on attending the Prague IETF in
November?

> > (You could even perform dichotomy there to measure the exact MTU and
> > update the OS link MTU based on that,
>
> Sure.  With v4-via-v6, we're already silently enabling IPv6 transit, so
> there's some precedent to fixing the system without the admin's knowledge
> :-)
>

Nice. :-)
___
Babel-users mailing list
Babel-users@alioth-lists.debian.net
https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/babel-users


Re: [Babel-users] MTU based routing for tunnel based babel networks?

2023-07-26 Thread Juliusz Chroboczek
> CONNECT-UDP

Come on, David, we all know that MASQUE is an elaborate practical joke.
With draft-asedeno-masque-connect-ethernet, you guys are obviously trying
to see how far you can go before people realise you're taking the piss.

> (You could even perform dichotomy there to measure the exact MTU and update
> the OS link MTU based on that,

Sure.  With v4-via-v6, we're already silently enabling IPv6 transit, so
there's some precedent to fixing the system without the admin's knowledge :-)



Re: [Babel-users] MTU based routing for tunnel based babel networks?

2023-07-26 Thread David Schinazi
On Wed, Jul 26, 2023 at 5:18 AM Juliusz Chroboczek  wrote:

> > While you're absolutely right that this MUST NOT happen, in practice it
> > does.
>
> I think we're in at least partial agreement.  The point I'm making is that
> this configuration is not something that's supported by IP, and that VPN
> implementations that cause MTU blackholes are quite simply buggy.
>

Agreed.

>   (There's an argument to be made that IPv6 should support variable MTU
>   links.  Good luck pushing this idea at the IETF, which, of late, appears
>   to be mostly interested in breaking the e2e principle and proxying
>   everything at the application layer.  Sorry for the rant.)
>

(As a proxy enthusiast, I have thoughts :P. In my view, the e2e principle
as we knew it broke when people started deploying TCP "accelerators".
We brought back transport-layer e2e with QUIC thanks to e2e encryption.
So in my view, QUIC is e2e but TCP, UDP, and IP are not. In that world,
CONNECT-UDP allows you to maintain e2e because it allows QUIC.
Sorry for the rant reply, but I couldn't resist)

> Of course, in practice misconfiguration happens, and so it's a good thing
> to be able to automatically detect misconfiguration and discard
> the link.


Definitely. Thanks for implementing and deploying that by the way.


> It would be even better to be able to notify the network
> administrator of the issue, but that would be a little more work than I'm
> willing to do right now.
>

babeld automatically emailing sysadmins sounds like a fun time :-)

> (For example, we could send Hellos in small packets, in order to
> discover neighbours, and then send a small number of Ack Requests padded
> to MTU to every discovered neighbour.  If a neighbour never answers the
> Ack Request, then it's fairly strong evidence that there's something
> wrong.)
>

(You could even perform dichotomy there to measure the exact MTU and update
the OS link MTU based on that, but I agree that's not necessarily babeld's
job.)
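
The "dichotomy" mentioned above is a binary search over probe sizes. A
minimal sketch of just the search logic (names are hypothetical, and
`probe()` is a stand-in for "send an Ack Request padded to `size` and wait
for the Ack", simulated here against a known link MTU):

```c
/* probe() stands in for sending a padded Ack Request and waiting for
 * the reply; here it is simulated against a hypothetical true link
 * MTU so the search logic can be shown on its own. */
static int probe(int size, int link_mtu)
{
    return size <= link_mtu;    /* 1 = acked, 0 = silently dropped */
}

/* Largest size in [lo, hi] that gets through, assuming probe(lo)
 * succeeds -- IPv6 guarantees a minimum MTU of 1280. */
int measure_mtu(int lo, int hi, int link_mtu)
{
    while (lo < hi) {
        int mid = lo + (hi - lo + 1) / 2;  /* round up so lo advances */
        if (probe(mid, link_mtu))
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}
```

Since each step halves the interval, pinning the MTU down within
[1280, 1500] takes about eight probes.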

David


Re: [Babel-users] MTU based routing for tunnel based babel networks?

2023-07-26 Thread Juliusz Chroboczek
> While you're absolutely right that this MUST NOT happen, in practice it does.

I think we're in at least partial agreement.  The point I'm making is that
this configuration is not something that's supported by IP, and that VPN
implementations that cause MTU blackholes are quite simply buggy.

  (There's an argument to be made that IPv6 should support variable MTU
  links.  Good luck pushing this idea at the IETF, which, of late, appears
  to be mostly interested in breaking the e2e principle and proxying
  everything at the application layer.  Sorry for the rant.)

Of course, in practice misconfiguration happens, and so it's a good thing
to be able to automatically detect misconfiguration and discard
the link.  It would be even better to be able to notify the network
administrator of the issue, but that would be a little more work than I'm
willing to do right now.

(For example, we could send Hellos in small packets, in order to
discover neighbours, and then send a small number of Ack Requests padded
to MTU to every discovered neighbour.  If a neighbour never answers the
Ack Request, then it's fairly strong evidence that there's something wrong.)
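
A rough sketch of the bookkeeping this scheme implies (the names and the
threshold are hypothetical, not babeld's actual code):

```c
/* Per-neighbour probe state for the scheme described above. */
struct probe_state {
    int hello_heard;            /* neighbour discovered via small Hello */
    int padded_reqs_sent;       /* Ack Requests padded to interface MTU */
    int padded_acks_received;
};

/* Strong evidence of an MTU blackhole: small packets pass, but
 * full-MTU packets are never acknowledged. */
int mtu_blackhole_suspected(const struct probe_state *s, int min_probes)
{
    return s->hello_heard &&
           s->padded_reqs_sent >= min_probes &&
           s->padded_acks_received == 0;
}
```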

-- Juliusz



Re: [Babel-users] MTU based routing for tunnel based babel networks?

2023-07-26 Thread Daniel Gröber
On Wed, Jul 26, 2023 at 02:02:14PM +0200, Juliusz Chroboczek wrote:
> > Uups, nevermind this. I was looking at the other node's hellos. The
> > neighbour relationship goes down properly as you'd expect.
> 
> Merged into master.  Shall I release 13.1?

I think you mean 1.13, but that's already released so it'll have to be 14.1,
er, 1.14 :)

I would appreciate it yeah.

--Daniel



Re: [Babel-users] MTU based routing for tunnel based babel networks?

2023-07-26 Thread Juliusz Chroboczek
> Uups, nevermind this. I was looking at the other node's hellos. The
> neighbour relationship goes down properly as you'd expect.

Merged into master.  Shall I release 13.1?



Re: [Babel-users] MTU based routing for tunnel based babel networks?

2023-07-20 Thread Juliusz Chroboczek
> I can observe (some) hellos using the padding depending on the option
> setting. Problem is when I force the interface MTU to 1280 instead of the
> initial 1420 the padded hellos get dropped and don't reach the other side
> as you'd expect, but the regular sized hellos still make it through and so
> the neighbourship relationship stays up.

Clarified in the other mail, good.

> Here's an idea: what if we pad the IHU response instead of all hellos? That
> might have slightly less control overhead when RTT isn't enabled as you
> don't need to respond to every hello then? I'm not sure how babeld
> schedules IHU sending exactly.

The Hellos are periodic, so the overhead is constant.  The number of IHUs
is proportional to the number of neighbours, so there might be arbitrarily
many of those.

I'm pretty sure it doesn't matter much in practice, but at least with
Hellos the amount of overhead is easy to predict.

-- Juliusz



Re: [Babel-users] MTU based routing for tunnel based babel networks?

2023-07-20 Thread Daniel Gröber
On Thu, Jul 20, 2023 at 02:40:40PM +0200, Daniel Gröber wrote:
> On Wed, Jul 19, 2023 at 11:25:52PM +0200, Juliusz Chroboczek wrote:
> > Could you please test the new branch "probe-mtu"?  It's now using the
> > IPV6_DONTFRAG cmsg in sendmsg, so it's enough to say
> > 
> > default probe-mtu true
> > 
> > (No global options, only per-interface options.)
> 
> I can observe (some) hellos using the padding depending on the option
> setting. Problem is when I force the interface MTU to 1280 instead of the
> initial 1420 the padded hellos get dropped and don't reach the other side
> as you'd expect, but the regular sized hellos still make it through and so
> the neighbourship relationship stays up.

Uups, nevermind this. I was looking at the other node's hellos. The
neighbour relationship goes down properly as you'd expect.

--Daniel



Re: [Babel-users] MTU based routing for tunnel based babel networks?

2023-07-20 Thread Daniel Gröber
Hi Juliusz,

On Wed, Jul 19, 2023 at 11:25:52PM +0200, Juliusz Chroboczek wrote:
> Could you please test the new branch "probe-mtu"?  It's now using the
> IPV6_DONTFRAG cmsg in sendmsg, so it's enough to say
> 
> default probe-mtu true
> 
> (No global options, only per-interface options.)

I can observe (some) hellos using the padding depending on the option
setting. Problem is when I force the interface MTU to 1280 instead of the
initial 1420 the padded hellos get dropped and don't reach the other side
as you'd expect, but the regular sized hellos still make it through and so
the neighbourship relationship stays up.

Here's an idea: what if we pad the IHU response instead of all hellos? That
might have slightly less control overhead when RTT isn't enabled as you
don't need to respond to every hello then? I'm not sure how babeld
schedules IHU sending exactly.

--Daniel



Re: [Babel-users] MTU based routing for tunnel based babel networks?

2023-07-19 Thread Juliusz Chroboczek
Daniel,

Could you please test the new branch "probe-mtu"?  It's now using the
IPV6_DONTFRAG cmsg in sendmsg, so it's enough to say

default probe-mtu true

(No global options, only per-interface options.)

-- Juliusz



Re: [Babel-users] MTU based routing for tunnel based babel networks?

2023-07-19 Thread Juliusz Chroboczek
> To test dont-fragment I first set it to disabled and changed the
> (wireguard) interface MTU from 1420 to 1280 at runtime. Doing this I can
> observe babel hellos being fragmented in tcpdump.
> 
> When setting dont-fragment true this trick doesn't work and the neighbour
> relationship to the other node doesn't get established.
> 
> So it looks like it's working.

Thanks for the report.

I think I'll rework it to use the per-message option as you suggested in
a previous mail, so that probe-mtu automatically triggers dont-fragment on
the affected interface.  One configuration option less.

Then, after you confirm it still works for you, I'll merge into master.

Thanks again,

-- Juliusz



Re: [Babel-users] MTU based routing for tunnel based babel networks?

2023-07-19 Thread Daniel Gröber
Hi Juliusz,

While my (now fixed) tunnel stacking mitigation works for locally generated
wg packets it doesn't when they are being routed for another host on the
(ethernet) network. This was what motivated the MTU probing idea in the
first place.

I believe the probe-mtu option is still useful in general and even somewhat
in the wireguard case. The hello packet padding will force PMTU discovery
on the tunnel endpoint address to happen, which in turn allows my nftables
rule to trigger even when the apparent interface MTU is 1500 :)

Since that's a bit of a hack I've added another rule to my mitigation to
just filter fragmented wireguard packets outright:

meta mark 0x1000  meta protocol ip6  exthdr frag != missing  counter drop

On Wed, Jul 19, 2023 at 12:04:02AM +0200, Juliusz Chroboczek wrote:
> Completely untested.  Please checkout the branch "probe-mtu", then say
> this in your config file:
> 
> dont-fragment true
> default probe-mtu true

The padding logic looks good. I can see hello packets of the right
(interface MTU) size leaving when probe-mtu is enabled.

To test dont-fragment I first set it to disabled and changed the
(wireguard) interface MTU from 1420 to 1280 at runtime. Doing this I can
observe babel hellos being fragmented in tcpdump.

When setting dont-fragment true this trick doesn't work and the neighbour
relationship to the other node doesn't get established.

So it looks like it's working.

Thanks,
--Daniel



Re: [Babel-users] MTU based routing for tunnel based babel networks?

2023-07-19 Thread Daniel Gröber
Hi,

after some more testing and tcpdumping I have to revise my theory of what's
going on; turns out Juliusz was right all along (as usual :)

On Mon, Jul 17, 2023 at 02:26:40AM +0200, Daniel Gröber wrote:
> Let me try to give some more context:
> 
> My mesh network deploys two wg tunnels per node. One wg-over-v6 and one
> wg-over-v4 tunnel to support dualstack, v4-only and v6-only underlay
> networks.
> 
> Nodes run babel over all wg interfaces and will receive a default route
> covering the wg-over-v6 tunnel endpoint addresses. Some nodes are served by
> IPv6 routers that are themselves part of the wg mesh network and only have
> v6 connectivity via wg-over-v4.
> 
> This can cause wg-over-v6 tunnels on such nodes to want to cross a
> wg-over-v4 tunnel.
> 
> All wg interfaces have MTU 1420 configured which is the worst case for
> wg-over-v6 or v4 (with MTU 1500). In the wg-over-wg-over-v4 case this
> results in packets that are too big for the v4 underlay network
> (1420+80+60=1560).
> 
> Wireguard drops packets when they exceed the underlay network's MTU.

Not true, wireguard will fragment its UDP packets based on PMTU results if
available in the route cache (ip -6 route show cache). It does this by
setting skb->ignore_df=1.

> When this happens no PTB ICMP errors are generated by wireguard inside
> the tunnel

This is still true, wireguard does not forward ICMP PTB errors from the
endpoint to inside the tunnel, but it doesn't need to since fragmentation
happens on its UDP packets.

Now one would expect wg tunnel stacking to just work despite fragmentation
of the encapsulated packets being inefficient and still undesirable. However
it turns out two of my tunnel stacking mitigation attempts taken together
were conspiring against me!

My first approach to fixing the tunnel stacking was to force wireguard
output packets to be sent over the (ethernet) upstream interface only,
using policy routing. This turns out to be ineffective, see below.

On top of this I applied the following nftables rule to prevent wg output
from ever going over interfaces with MTU less than 1500. This was
originally conceived to accommodate workstations rather than "core" routers
but was rolled out on the routers too.

meta mark 0x1000  meta protocol ip6  rt mtu < 1440 \
counter reject with icmpx type admin-prohibited \
comment "wg endpoint loopback prevention"

Note the `rt mtu` match is misnamed and is actually in terms of TCP MSS so
1440+60=1500 (depends on the underlying IP protocol though). Fwmark 0x1000
is what the wg tunnels tag their encapsulated packets with.

The fatal problem here is that the first mitigation will cause the upstream
router to just hairpin the wg packets back at us since we're (usually) also
announcing the endpoint's prefix via BGP. This will obviously cause the
fwmark to get stripped, so the otherwise effective nftables loopback
prevention rule was being bypassed. Doh!

After removing the policy routing bit stacked tunnels seem to get pruned as
they should now.

Thanks,
--Daniel




Re: [Babel-users] MTU based routing for tunnel based babel networks?

2023-07-18 Thread Daniel Gröber
On Tue, Jul 18, 2023 at 03:37:14PM -0700, David Schinazi wrote:
> [ ... an hour passes by with this email half written ... ]
> 
> Oh, and in the meantime Juliusz just went ahead and implemented probe-mtu.
> Nicely done, sir! Looking at the PR it validates that the kernel-provided
> MTU gets through the network. I wonder if that breaks popular tunnel
> implementations today, as I suspect many don't set that correctly.

Ha, good thing you mentioned it; I was just about to go back to patch
writing.

Interesting approach. IPV6_DONTFRAG is (again) not documented in ipv6(7) so
I had no idea this exists :)

FYI: From looking at the linux code it looks like it's possible to set
IPV6_DONTFRAG per-sendmsg() call (in the cmsg field, see
ip6_datagram_send_ctl() in linux) so this could also be a per-interface
option.
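
For reference, the per-sendmsg() variant looks roughly like this (a sketch
with error handling trimmed; `send_dontfrag` is a made-up name, and the
fallback #define is only for older libc headers):

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>

#ifndef IPV6_DONTFRAG
#define IPV6_DONTFRAG 62        /* Linux value; absent from old headers */
#endif

/* Send one datagram with IPV6_DONTFRAG set via ancillary data, so the
 * don't-fragment behaviour applies to this packet only rather than
 * socket-wide via setsockopt. */
ssize_t send_dontfrag(int fd, const struct sockaddr_in6 *dst,
                      const void *buf, size_t len)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    char cbuf[CMSG_SPACE(sizeof(int))];
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int one = 1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_name = (void *)dst;
    msg.msg_namelen = sizeof(*dst);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = cbuf;
    msg.msg_controllen = sizeof(cbuf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = IPPROTO_IPV6;
    cmsg->cmsg_type = IPV6_DONTFRAG;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &one, sizeof(int));

    return sendmsg(fd, &msg, 0);
}
```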

Awesome work Juliusz!

--Daniel



Re: [Babel-users] MTU based routing for tunnel based babel networks?

2023-07-18 Thread David Schinazi
Hi Juliusz,

While you're absolutely right that this MUST NOT happen, in practice it
does. A rare scenario is when routes change deep in a network causing the
e2e PMTU to change without the link MTU on the endpoints observing any
change. This phenomenon happens much more commonly on tunnels when the
tunnel takes a new path (e.g., moving IKEv2/IPsec to a different underlying
interface via RFC 4555) - in that scenario the endpoint experiencing the
migration (e.g. the cell phone) knows that something changed but the e2e
peer does not. In IPv4 this can be (poorly) solved by in-network
fragmentation, but that's not allowed in v6.

If Babel were to magically know the MTU of its interfaces (including
tunnels), it would make sense to consider that information as part of route
metrics. The remaining question is where to perform the PMTUD; it feels
like the responsibility of the tunnel but could also be reused across
different tunnel types.

[ ... an hour passes by with this email half written ... ]

Oh, and in the meantime Juliusz just went ahead and implemented probe-mtu.
Nicely done, sir! Looking at the PR it validates that the kernel-provided
MTU gets through the network. I wonder if that breaks popular tunnel
implementations today, as I suspect many don't set that correctly.

David

On Tue, Jul 18, 2023 at 1:42 PM Juliusz Chroboczek  wrote:

> >> RFC 2460: "link MTU - the maximum transmission unit, i.e., maximum
> >> packet size in octets, that can be conveyed over a link."
>
> > I read this as "link MTU" being the maximum packet size that you could
> > ever hope to be able to send but the link technology could very well not
> > allow the maximum at times.
>
> Daniel, the specs are perfectly clear: there is no licence given to nodes
> to systematically drop packets smaller than MTU.  In fact, such links
> break TCP, as you've discovered.
>
> > I'm still not sold on your argument, but it hardly matters. Tunnels on
> top
> > of the internet exist so we kind of just have to deal with it.
>
> Nobody is denying that.  Please see RFC 4459, which describes how to make
> them work reasonably well.
>
> -- Juliusz
>


Re: [Babel-users] MTU based routing for tunnel based babel networks?

2023-07-18 Thread Juliusz Chroboczek
Completely untested.  Please checkout the branch "probe-mtu", then say
this in your config file:

dont-fragment true
default probe-mtu true

-- Juliusz



Re: [Babel-users] MTU based routing for tunnel based babel networks?

2023-07-18 Thread Juliusz Chroboczek
>> RFC 2460: "link MTU - the maximum transmission unit, i.e., maximum packet
>>size in octets, that can be conveyed over a link."

> I read this as "link MTU" being the maximum packet size that you could ever
> hope to be able to send but the link technology could very well not allow
> the maximum at times.

Daniel, the specs are perfectly clear: there is no licence given to nodes
to systematically drop packets smaller than MTU.  In fact, such links
break TCP, as you've discovered.

> I'm still not sold on your argument, but it hardly matters. Tunnels on top
> of the internet exist so we kind of just have to deal with it.

Nobody is denying that.  Please see RFC 4459, which describes how to make
them work reasonably well.

-- Juliusz



Re: [Babel-users] MTU based routing for tunnel based babel networks?

2023-07-17 Thread Daniel Gröber
On Mon, Jul 17, 2023 at 11:41:01AM +0200, Juliusz Chroboczek wrote:
> >> Sorry, I wasn't clear.  IP requires every link to have a well-defined
> >> MTU: all the nodes connected to a link must agree on the link's MTU.
> 
> > I don't think that can be true either. PMTU can vary and paths can be
> > asymmetric so two nodes could very well see different MTUs across the
> > internet. There's just not many ASen that run with less than 1500 MTU :)
> 
> I'm not speaking about PMTU.  I'm speaking about link MTU.

Yeah I got that confused. That's what happens when you write technical emails
at 2am ;)

> > Do you have a reference for this "MTU well-definedness" criterion, I don't
> > think I ever heard of this.
> 
> RFC 2460: "link MTU - the maximum transmission unit, i.e., maximum packet
>size in octets, that can be conveyed over a link."

I read this as "link MTU" being the maximum packet size that you could ever
hope to be able to send but the link technology could very well not allow
the maximum at times. Unfortunately they didn't use the usual RFC 2119
requirement level terminology here so who knows :)

> RFC 4861: "All nodes on a link must use the same MTU (or Maximum Receive
>Unit) in order for multicast to work properly."

I mean that only applies when you want to run NDP over the link so that's
hardly relevant for L3 tunnel interfaces or internet backbone links in
general.

I'm still not sold on your argument, but it hardly matters. Tunnels on top
of the internet exist so we kind of just have to deal with it.

> > Wireguard drops packets when they exceed the underlay network's MTU. When
> > this happens no PTB ICMP errors are generated by wireguard inside the
> > tunnel,
> 
> If true, that's very surprising, and looks to me like a bug in Wireguard.
> 
> But yeah, I'll add an option to probe for MTU on each Hello.

I've been looking at how to implement this probing. The IPV6_MTU_DISCOVER
sockopt used to configure the kernel behaviour unfortunately conflates
multiple behaviours (oh joy). This list is for IPv6; on v4 the DF bit also
comes into play, but thankfully babel only uses a v6 socket:

- whether EMSGSIZE is returned to send() when a UDP packet is too big (or
  the packet is simply dropped)
- whether the interface MTU or PMTU result controls the above error
  condition when enabled
- whether UDP send() calls with too large a size are automatically
  fragmented locally or return the error
- whether ICMP PTB messages are interpreted at all (a DNS-over-UDP security
  feature apparently)

That got me wondering: is babeld currently relying on the kernel to
fragment large UPDATE packets? From my reading of the code it doesn't look
like it. If my reading is right `(struct buffered).size` determines the
maximum UDP payload size and this is initialized from the interface MTU.

This means we can probably just set IPV6_MTU_DISCOVER to the undocumented
IP_PMTUDISC_INTERFACE[1] to maximally disable PMTU behaviour. This option
1) prevents local fragmentation of any sort (interface MTU or PMTU), 2)
disables updating the PMTU cache from ICMP-PTB messages for this socket
since we don't need that anyway and 3) causes too big send() calls to fail
with EMSGSIZE (if my reading of the kernel code is right).

[1]: Introduced around 2013, see kernel commits 482fc6094a 93b36cf342
1b34657635 0b95227a7b for the full story.

In principle we could also use the older IP_PMTUDISC_DONT since we don't
technically have to turn off ICMP-PTB interpretation, but I feel like it's
neater if we disable that too.
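
A sketch of what that setup would look like (for a v6 socket the option
value is spelled IPV6_PMTUDISC_INTERFACE; the fallback #define covers older
libc headers that predate the kernel commits cited above):

```c
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef IPV6_PMTUDISC_INTERFACE
#define IPV6_PMTUDISC_INTERFACE 4   /* Linux value; missing in old headers */
#endif

/* Cap datagrams at the interface MTU: no local fragmentation, no PMTU
 * cache lookups from ICMP-PTB, and an oversized send() fails with
 * EMSGSIZE instead of being fragmented. */
int disable_pmtud(int fd)
{
    int val = IPV6_PMTUDISC_INTERFACE;
    return setsockopt(fd, IPPROTO_IPV6, IPV6_MTU_DISCOVER,
                      &val, sizeof(val));
}
```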

--Daniel



Re: [Babel-users] MTU based routing for tunnel based babel networks?

2023-07-17 Thread Johannes Kimmel

Hi,

I also wish there were some way of ensuring a path with a minimum MTU.
My use case is providing a minimum MTU for VXLAN overlay networks in
very heterogeneous networks consisting of different tunnel mechanisms
(gre, wireguard, via v4 and v6), direct ethernet links and ptp connections.


To allow proper L2 connections with an MTU of 1500, links must at least 
have an MTU of 1570 to have room for unfragmented VXLAN packets. This is 
important since VXLAN VTEPs must not fragment packets [0], which is very 
annoying in this case.
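
For the record, a worked breakdown of where 1570 comes from, assuming an
IPv6 underlay (the IPv4 case needs 20 bytes less):

```c
/* Per-packet VXLAN encapsulation overhead (RFC 7348 framing):
 * outer IP header + outer UDP (8) + VXLAN header (8) + inner
 * Ethernet header (14). */
enum { UDP_HDR = 8, VXLAN_HDR = 8, INNER_ETH = 14 };

int vxlan_underlay_mtu(int inner_mtu, int outer_ip_hdr)
{
    return inner_mtu + outer_ip_hdr + UDP_HDR + VXLAN_HDR + INNER_ETH;
}
```

With a 40-byte outer IPv6 header that gives 1500 + 70 = 1570; over IPv4 it
is 1500 + 50 = 1550.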


Having some sort of mechanism within babel that propagates routes 
between (not necessarily directly connected) VTEPs, that ensures a 
minimum MTU along a path would be very welcome. Otherwise packets might 
choose a path with lower metric and insufficient MTU, which will result
in dropped packets.


Cheers

[0]: https://datatracker.ietf.org/doc/html/rfc7348#section-4.3

On 16.07.23 20:51, Daniel Gröber wrote:

Hi babelers,

I've been running babel on top of my wireguard IPv6 network for a while now
and I have a problem that keeps biting me and I can't find a good solution
for: babel is oblivious to a link's MTU and picks paths that involve
wireguard-in-wireguard tunnels even though paths without this stacking are
available.

The stacking (and subsequent path MTU reduction) is, I believe, not even
bounded, so there is no static MTU I could configure on all my hosts to
take care of this like one would do with a plain wireguard setup.

I was able to fix this on my routers by configuring the firewall to drop
UDP tunnel packets that are going to traverse interfaces with
MTU<=1440. This works alright but I also have babel running on workstations
that are behind these routers and there is no good way to classify which
UDP packets are part of my network's wireguard tunnels and which aren't.

So this got me thinking (for the hundredth time) perhaps this should be
something the routing protocol takes care of? Babeld would essentially have
to pad its hello packets to a (configurable) size to detect if
fragmentation is required (or they are being blackholed outright).

My use-case would be well served if I could just specify a minimum MTU all
paths must satisfy, though more elaborate things could be done I suppose
(metric based on MTU?).

Opinions? Anybody have any better ideas on how to prevent this sort of
tunnel stacking?

Thanks,
--Daniel

PS: Just to clarify why the tunnel stacking happens in my setup: my network
tunnels IPv6 over IPv4 (most of the time), but I want to support IPv6-only
underlay networks so I have wireguard tunnels with IPv6 endpoints which can
in turn get routed over V6-over-V4 wg tunnels (when the ether is flowing
just right).





Re: [Babel-users] MTU based routing for tunnel based babel networks?

2023-07-17 Thread Juliusz Chroboczek
>> Sorry, I wasn't clear.  IP requires every link to have a well-defined
>> MTU: all the nodes connected to a link must agree on the link's MTU.

> I don't think that can be true either. PMTU can vary and paths can be
> asymmetric so two nodes could very well see different MTUs across the
> internet. There's just not many ASen that run with less than 1500 MTU :)

I'm not speaking about PMTU.  I'm speaking about link MTU.

> Do you have a reference for this "MTU well-definedness" criterion, I don't
> think I ever heard of this.

RFC 2460: "link MTU - the maximum transmission unit, i.e., maximum packet
   size in octets, that can be conveyed over a link."

RFC 4861: "All nodes on a link must use the same MTU (or Maximum Receive
   Unit) in order for multicast to work properly."

> Wireguard drops packets when they exceed the underlay network's MTU. When
> this happens no PTB ICMP errors are generated by wireguard inside the
> tunnel,

If true, that's very surprising, and looks to me like a bug in Wireguard.

But yeah, I'll add an option to probe for MTU on each Hello.

-- Juliusz



Re: [Babel-users] MTU based routing for tunnel based babel networks?

2023-07-16 Thread Daniel Gröber
On Mon, Jul 17, 2023 at 12:47:30AM +0200, Juliusz Chroboczek wrote:
> >> IP does not support variable MTU links.
> > 
> > Excuse me, but that's plain false. IP was designed in an environment where
> > (non-ethernet) networks with various MTU standards were commonplace
> 
> Sorry, I wasn't clear.  IP requires every link to have a well-defined
> MTU: all the nodes connected to a link must agree on the link's MTU.

I don't think that can be true either. PMTU can vary and paths can be
asymmetric so two nodes could very well see different MTUs across the
internet. There's just not many ASen that run with less than 1500 MTU :)

Do you have a reference for this "MTU well-definedness" criterion, I don't
think I ever heard of this.

> > There is a way: My routing protocol just has to stop picking links that are
> > obviously going to cause a problem.
> 
> Could you please describe the problem in detail?  Because I'm probably
> missing something.

Let me try to give some more context:

My mesh network deploys two wg tunnels per node. One wg-over-v6 and one
wg-over-v4 tunnel to support dualstack, v4-only and v6-only underlay
networks.

Nodes run babel over all wg interfaces and will receive a default route
covering the wg-over-v6 tunnel endpoint addresses. Some nodes are served by
IPv6 routers that are themselves part of the wg mesh network and only have
v6 connectivity via wg-over-v4.

This can cause wg-over-v6 tunnels on such nodes to want to cross a
wg-over-v4 tunnel.

All wg interfaces have MTU 1420 configured which is the worst case for
wg-over-v6 or v4 (with MTU 1500). In the wg-over-wg-over-v4 case this
results in packets that are too big for the v4 underlay network
(1420+80+60=1560).
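
Spelling the arithmetic out (the 80/60 figures are WireGuard's usual
per-packet overhead: a 16-byte data-message header plus 16-byte auth tag on
top of UDP/IP, so 80 over IPv6 and 60 over IPv4):

```c
enum { IPV4_HDR = 20, IPV6_HDR = 40, UDP_HDR = 8, WG_OVERHEAD = 32 };

/* A full 1420-byte packet entering the wg-over-v6 tunnel, whose
 * output is then routed through the wg-over-v4 tunnel: */
int stacked_packet_size(int inner_mtu)
{
    int over_v6 = inner_mtu + IPV6_HDR + UDP_HDR + WG_OVERHEAD; /* 1500 */
    return over_v6 + IPV4_HDR + UDP_HDR + WG_OVERHEAD;          /* 1560 */
}
```

1560 exceeds the v4 underlay's 1500-byte MTU, hence the drops.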

Wireguard drops packets when they exceed the underlay network's MTU. When
this happens no PTB ICMP errors are generated by wireguard inside the
tunnel, packets are simply dropped and TCP applications running on the
overlay IPv6 network break badly as no ICMP errors reach the sender.

This can be avoided by simply ignoring the wg-over-v6 tunnel which only
exists for deployment consistency as a wg-over-v4 tunnel with (actual) 1440
MTU is available too which can reach the entire network.

Worth mentioning: The reason I have to run two wg tunnels per node to begin
with is that wireguard's strategy for dual-stack support is that it doesn't
have one. It supports only one endpoint address per tunnel (well, per
wg-peer really), and if you pick wrong because, say, IPv6 addresses are
available but don't work, the tunnel simply blackholes everything. Yay,
joy is me.

> If Wireguard implements RFC 4459 Section 3.2, then when pushing a too
> large packet over the tunnel, Wireguard should synthesise an ICMP "packet
> too large", which will cause the sender to retry with a smaller packet.
> Is that not the case?

Yeah, having wg forward PTB errors from the underlay to inside the tunnel
was something I considered for fixing this, but I believe that would be
called "insecure" by the wg project since the ICMP errors aren't
authenticated like normal wireguard packets. So what happens when an
attacker sends a spoofed PTB with MTU=0, etc.? ;)

Furthermore, on IPv4, which unfortunately is the underlay in my network
more often than not, ICMP blackholes are very common, so breakage could
ensue again.

This really is just putting lipstick on a pig. It would "work" I suppose
but I don't want my network to use these paths because the double
encapsulation is just plain inefficient!

Prune thy inefficient paths I say :]

> I'm not opposed to your probing idea, but I'd really prefer to fully
> understand the problem first.

Sure thing, I'm not opposed to working the problem. I've just been dealing
with this problem (and the duct-tape "solutions" surrounding it) for a
while now, and I just want to get this squared away so I can go back to my
(mostly) IPv6-only bliss :D

I think RFC 4459 simply didn't consider L3 routing protocol based
solutions. Probably because the usual network vendor suspects would never
implement something uncouth like this, but we need not be constrained by
the inefficiencies of the commercial world in the free software community,
now do we :)

Speaking of which I'm working on a babeld patch to see if my idea
works. Just have to dig through the kernel code first to figure out which
one of the amazingly (badly) named IP_PMTUDISC_* options I want to use to
force it to neither do fragmentation nor attempt PMTU for the babel socket.
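For reference, a minimal sketch of what that sockopt call might look like
on Linux (`IP_PMTUDISC_PROBE` sets DF=1 while bypassing the kernel's cached
path-MTU estimate, and `IPV6_PMTUDISC_PROBE` is the v6 analogue;
`open_probe_socket` is a hypothetical helper, not babeld code):

```c
#include <assert.h>
#include <netinet/in.h>   /* IP_MTU_DISCOVER, IP_PMTUDISC_PROBE, v6 variants */
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a UDP socket whose outgoing packets carry
 * DF=1 but ignore the kernel's path-MTU cache, so oversized probes are
 * dropped by the path rather than fragmented or clamped behind our backs. */
static int open_probe_socket(int family)
{
    int fd = socket(family, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    int level = (family == AF_INET6) ? IPPROTO_IPV6 : IPPROTO_IP;
    int opt   = (family == AF_INET6) ? IPV6_MTU_DISCOVER : IP_MTU_DISCOVER;
    int val   = (family == AF_INET6) ? IPV6_PMTUDISC_PROBE : IP_PMTUDISC_PROBE;

    if (setsockopt(fd, level, opt, &val, sizeof(val)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

By contrast, `IP_PMTUDISC_DONT` would clear DF (allowing underlay
fragmentation) and `IP_PMTUDISC_DO` would let the kernel's PMTU estimate
clamp the packets, neither of which is what a padded-hello probe wants.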

Thanks,
--Daniel

___
Babel-users mailing list
Babel-users@alioth-lists.debian.net
https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/babel-users


Re: [Babel-users] MTU based routing for tunnel based babel networks?

2023-07-16 Thread Juliusz Chroboczek
>> IP does not support variable MTU links.
> 
> Excuse me, but that's plain false. IP was designed in an environment where
> (non-ethernet) networks with various MTU standards were commonplace

Sorry, I wasn't clear.  IP requires every link to have a well-defined
MTU: all the nodes connected to a link must agree on the link's MTU.

Now, I agree that it is possible to simulate a variable-MTU link, as
described in RFC 4459 Section 3.2, and it will mostly work.  But that's
not what IP was designed for, and I don't know whether it's possible to
make it reliable.

> There is a way: My routing protocol just has to stop picking links that are
> obviously going to cause a problem.

Could you please describe the problem in detail?  Because I'm probably
missing something.

If Wireguard implements RFC 4459 Section 3.2, then when pushing a too
large packet over the tunnel, Wireguard should synthesise an ICMP "packet
too large", which will cause the sender to retry with a smaller packet.
Is that not the case?

I'm not opposed to your probing idea, but I'd really prefer to fully
understand the problem first.

-- Juliusz



Re: [Babel-users] MTU based routing for tunnel based babel networks?

2023-07-16 Thread Daniel Gröber
On Sun, Jul 16, 2023 at 11:43:44PM +0200, Juliusz Chroboczek wrote:
> IP does not support variable MTU links.

Excuse me, but that's plain false. IP was designed in an environment where
(non-ethernet) networks with various MTU standards were commonplace, and
this is very much supported. Why else would we have standards for Path MTU
discovery (cf. RFC 1191/RFC 1981), which has become mandatory for IPv6?

> And every tunnel is able to carry packets up to its MTU?  If that's not
> the case, then there's no way your network can work,

There is a way: my routing protocol just has to stop picking links that
are obviously going to cause a problem. The way my network is structured
(remember: mesh network), there always is a path that avoids the tunnel
overhead stacking problem, but since babel is blind to it, it can and does
pick problematic paths sometimes.

> > Enable a config option for "minimum path MTU" on each babel node. Nodes
> > then pad all hello packets to this size and set appropriate sockopts to
> > stop the kernel from doing PMTUdisc behind our backs (on IPv6) and setting
> > DF=1 (on IPv4).
>
> We can only control fragmentation in the overlay.

True, but controlling fragmentation in the underlay is simply not
necessary. If the tunnel underlay were to fragment[1], my tunnel MTU
wouldn't be impacted, so it doesn't break anything and babel can feel free
to use that path. Only the case where the underlay drops packets instead
of fragmenting is relevant.

[1]: Which has pps performance implications and is hence usually
avoided. Wireguard in particular doesn't allow fragmentation.

FYI: Do note that with IPv6, in-network fragmentation is not a "thing"
anymore; that's IPv4 legacy thinking :) Endpoints fragment, nobody else.

> Can you explain what the tunnelling protocol will do, and whether it will
> prevent fragmentation in the underlay?

From what I observed it's clear Wireguard never fragments its UDP packets,
so it likely sets DF=1 when run on top of v4 and ignores PMTU on v6. IMO
that's reasonable behaviour for a tunnel protocol.

Thanks,
--Daniel



Re: [Babel-users] MTU based routing for tunnel based babel networks?

2023-07-16 Thread Juliusz Chroboczek
> Problem is when the underlay L3 network is composed of more tunnels and not
> 1500 MTU ethernet links, then at each hop the path MTU could be reduced by
> the tunnel overhead again and again and again (across the entire
> path). Hence no predictable MTU I can deploy across all my interfaces
> exists. QED :)

I'm still not following.  Every tunnel has an MTU, right?  And every
tunnel is able to carry packets up to its MTU?  If that's not the case,
then there's no way your network can work, since IP does not support
variable MTU links.

> Enable a config option for "minimum path MTU" on each babel node. Nodes
> then pad all hello packets to this size and set appropriate sockopts to
> stop the kernel from doing PMTUdisc behind our backs (on IPv6) and setting
> DF=1 (on IPv4).

We can only control fragmentation in the overlay.  Can you explain what
the tunnelling protocol will do, and whether it will prevent fragmentation
in the underlay?

-- Juliusz





Re: [Babel-users] MTU based routing for tunnel based babel networks?

2023-07-16 Thread Daniel Gröber
Hi Juliusz,

On Sun, Jul 16, 2023 at 09:22:40PM +0200, Juliusz Chroboczek wrote:
> > I've been running babel on top of my wireguard IPv6 network for a while now
> > and I have a problem that keeps biting me and I can't find a good solution
> > for: babel is oblivious to a link's MTU and picks paths that involve
> > wireguard-in-wireguard tunnels even though paths without this stacking are
> > available.
> 
> Is the MTU of your interfaces set correctly?  Please type
> 
> ip link show
> 
> and check that the value is right.
> 
> Babeld already checks the interface's MTU, so if the MTU is set correctly,
> it's a simple matter of tweaking this code:
> 
>   https://github.com/jech/babeld/blob/master/interface.c#L300
> 
> If the MTU is not set correctly, then you'll run into trouble with
> higher-layer protocols.

I must not have explained the problem sufficiently, because the interface
MTU doesn't matter at all here. All that's important is that tunnel
interfaces are involved in the L3 network carrying tunnel packets.

"Usually" the underlying L3 network is the IPv4 internet, which has a
(more or less) predictable 1500 MTU, though I would call that a very
1500-MTU-normative assessment. So the tunnel interface's MTU will be
1500 minus overhead. Easy.

The problem is when the underlay L3 network is composed of more tunnels,
not 1500 MTU ethernet links: at each hop the path MTU could be reduced by
the tunnel overhead again and again and again (across the entire path).
Hence no predictable MTU that I could deploy across all my interfaces
exists. QED :)

Babeld really has to take care of the *PATH* MTU, not just look at
whatever is configured on the local interfaces, for this to work.

Here's one way this could be done:

Enable a config option for "minimum path MTU" on each babel node. Nodes
then pad all hello packets to this size and set appropriate sockopts to
stop the kernel from doing PMTUdisc behind our backs (on IPv6) and to set
DF=1 (on IPv4). When paths with a lesser MTU are encountered, these
packets will simply get dropped by the network, preventing neighbour
relationships from forming.

Problem solved :)
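A sketch of what that padding could look like on the wire, using the
Pad1/PadN TLVs that Babel (RFC 8966) already defines for exactly this kind
of filler (the helper is illustrative, not babeld's actual packet-assembly
code):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Babel TLV types from RFC 8966: Pad1 is a single zero byte, PadN is a
 * 2-byte type/length header followed by `length` bytes of padding. */
enum { BABEL_PAD1 = 0, BABEL_PADN = 1 };

/* Illustrative helper: grow a packet body from `len` bytes up to
 * `min_mtu` bytes by appending padding TLVs, so hellos only get through
 * paths that can actually carry `min_mtu`-byte packets.  `buf` must have
 * room for `min_mtu` bytes. */
static size_t pad_to_min_mtu(unsigned char *buf, size_t len, size_t min_mtu)
{
    while (len < min_mtu) {
        size_t gap = min_mtu - len;
        if (gap == 1) {
            buf[len++] = BABEL_PAD1;       /* 1-byte TLV, no length field */
        } else {
            size_t pad = gap - 2;          /* bytes after the TLV header */
            if (pad > 255)                 /* PadN length field is one byte */
                pad = 255;
            buf[len] = BABEL_PADN;
            buf[len + 1] = (unsigned char)pad;
            memset(buf + len + 2, 0, pad);
            len += 2 + pad;
        }
    }
    return len;
}
```

Combined with DF=1 and kernel PMTUD disabled on the socket, such padded
hellos are silently dropped by any path with a smaller MTU, so the
neighbour relationship never forms over it.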

--Daniel



Re: [Babel-users] MTU based routing for tunnel based babel networks?

2023-07-16 Thread Juliusz Chroboczek
> I've been running babel on top of my wireguard IPv6 network for a while now
> and I have a problem that keeps biting me and I can't find a good solution
> for: babel is oblivious to a link's MTU and picks paths that involve
> wireguard-in-wireguard tunnels even though paths without this stacking are
> available.

Is the MTU of your interfaces set correctly?  Please type

ip link show

and check that the value is right.

Babeld already checks the interface's MTU, so if the MTU is set correctly,
it's a simple matter of tweaking this code:

  https://github.com/jech/babeld/blob/master/interface.c#L300

If the MTU is not set correctly, then you'll run into trouble with
higher-layer protocols.

> So this got me thinking (for the hundredth time): perhaps this should be
> something the routing protocol takes care of? Babeld would essentially
> have to pad its hello packets to a (configurable) size to detect if
> fragmentation is required (or they are being blackholed outright).

That's certainly a good idea, it would allow us to discard interfaces
whose MTU is set incorrectly.  I'll think it over.

> PS: Just to clarify why the tunnel stacking happens in my setup: my network
> tunnels IPv6 over IPv4 (most of the time), but I want to support IPv6-only
> underlay networks so I have wireguard tunnels with IPv6 endpoints which can
> in turn get routed over V6-over-V4 wg tunnels (when the ether is flowing
> just right).

Hehe.

-- Juliusz
