Re: BIRD patches for IP-in-IP

2016-12-22 Thread Ondrej Zajicek
On Thu, Dec 22, 2016 at 01:57:56AM +, Mohammad Banikazemi wrote:
>Hi, I just came across the following exchange on the BIRD mailing list and
>wanted to verify if the suggested solution is already available in BIRD.
>In particular,

Hi

No, it is not. But thanks for reminding me of it. I definitely should add that.

-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."




Re: BIRD patches for IP-in-IP

2016-12-21 Thread Mohammad Banikazemi
Hi, I just came across the following exchange on the BIRD mailing list and wanted to verify if the suggested solution is already available in BIRD. In particular,
 
>> it seems like simplest approach is just to allow setting
>> 'onlink' flag and iface from BGP import filter, like:
>>
>> onlink = true;
>> iface = "tunl0";
>> gw = bgp_nexthop;
 
Does BIRD support setting these options (onlink, iface, and gw) already?
 
>> and some option that avoids default gateway setting by BGP protocol.
 
Is this something that can be simply configured in BIRD, or does it require changes to BIRD itself? Could you please elaborate?
 
Thanks,
 
Mohammad
 
 
 
On Tue, Sep 27, 2016 at 03:09:52PM +, Neil Jerram wrote:
> Hi BIRD users!
>
> Attached are 3 patches that my team has been using for routing through
> IP-in-IP tunnels, rebased on 1.6.1.  I'd like to explain why we find them
> useful, and start a conversation about whether they or something like them
> could be upstreamed (or perhaps if there's some better way of achieving our
> aims).
> ...
> 1. Does the routing approach above make sense?  (Or is there some better or
> simpler or already supported way that we could achieve the same thing?)
Hi
Using BGP-based routing in NBMA tunnels is an interesting approach. We
definitely should support this. But i would avoid things like 'krt_tunnel'
attribute until we have support for lightweight tunnels using RTA_ENCAP.

For IPIP tunnels, it seems like simplest approach is just to allow setting
'onlink' flag and iface from BGP import filter, like:

onlink = true;
iface = "tunl0";
gw = bgp_nexthop;

and some option that avoids default gateway setting by BGP protocol.
Does this make sense?

BTW, it seems that this approach works for NBMA IPIP tunnels but not for
NBMA GRE tunnels, due to a hack that IPIP code accepts 'onlink' gw as an
outer IP address, while GRE code resolves next hops through 'neighbor
cache' to get outer IP addresses, so this must be used to get similar
behavior:

 ip neigh add 10.1.1.1 lladdr 10.1.1.1 dev gre0
 ip route add 10.1.2.0/24 via 10.1.1.1 dev gre0 onlink

This is conceptually more clear and has some other advantages, but in
this case IPIP behavior is more useful. Does anybody know if there is a
way how to convince GRE iface to behave like IPIP iface in this regard?
-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago at crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."



Re: BIRD patches for IP-in-IP

2016-09-28 Thread 'Gustavo Ponza'

On 09/29/2016 12:04 AM, Ondrej Zajicek wrote:

On Wed, Sep 28, 2016 at 05:29:01PM +0200, 'Gustavo Ponza' wrote:

Hi Ondrej,


Using BGP-based routing in NBMA tunnels is an interesting approach. We
definitely should support this. But i would avoid things like 'krt_tunnel'
attribute until we have support for lightweight tunnels using RTA_ENCAP.

For IPIP tunnels, it seems like simplest approach is just to allow setting
'onlink' flag and iface from BGP import filter, like:

onlink = true;
iface = "tunl0";
gw = bgp_nexthop;

and some option that avoids default gateway setting by BGP protocol.
Does this make sense?

is it possible to extend the above feature for IPIP encapsulation
in a RIPv2 routing environment? Thanks

Yes, most of that is protocol-independent.



Many thanks!

--

73, gus i0ojj
A proud member of linux team



Re: BIRD patches for IP-in-IP

2016-09-28 Thread Ondrej Zajicek
On Wed, Sep 28, 2016 at 05:29:01PM +0200, 'Gustavo Ponza' wrote:
> Hi Ondrej,
> 
> >Using BGP-based routing in NBMA tunnels is an interesting approach. We
> >definitely should support this. But i would avoid things like 'krt_tunnel'
> >attribute until we have support for lightweight tunnels using RTA_ENCAP.
> >
> >For IPIP tunnels, it seems like simplest approach is just to allow setting
> >'onlink' flag and iface from BGP import filter, like:
> >
> >onlink = true;
> >iface = "tunl0";
> >gw = bgp_nexthop;
> >
> >and some option that avoids default gateway setting by BGP protocol.
> >Does this make sense?
> 
> is it possible to extend the above feature for IPIP encapsulation
> in a RIPv2 routing environment? Thanks

Yes, most of that is protocol-independent.

-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."




Re: BIRD patches for IP-in-IP

2016-09-28 Thread 'Gustavo Ponza'

Hi Ondrej,


Using BGP-based routing in NBMA tunnels is an interesting approach. We
definitely should support this. But i would avoid things like 'krt_tunnel'
attribute until we have support for lightweight tunnels using RTA_ENCAP.

For IPIP tunnels, it seems like simplest approach is just to allow setting
'onlink' flag and iface from BGP import filter, like:

onlink = true;
iface = "tunl0";
gw = bgp_nexthop;

and some option that avoids default gateway setting by BGP protocol.
Does this make sense?


is it possible to extend the above feature for IPIP encapsulation
in a RIPv2 routing environment? Thanks

--

73, gus i0ojj
A proud member of linux team



Re: BIRD patches for IP-in-IP

2016-09-28 Thread Ondrej Zajicek
On Wed, Sep 28, 2016 at 02:54:26PM +0200, Christian Tacke wrote:
> 
> Hi,
> 
> I have followed this only a little...
> 
> 
> On Wed, Sep 28, 2016 at 14:24:32 +0200, Ondrej Zajicek wrote:
> [...]
> > For IPIP tunnels, it seems like simplest approach is just to allow setting
> > 'onlink' flag and iface from BGP import filter, like:
> > 
> > onlink = true;
> > iface = "tunl0";
> [...]
> 
> Hmm, why not the krt_ prefix?

These are generic route options. Option iface is already here [*], just
read-only.

[*] Well, named ifname / ifindex, not iface.

> And maybe they should be set in the kernel export filter
> then?

You should be able to set any option anywhere.
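For illustration, a filter that only reads that attribute might look like
this (the filter name and printed text are just an example):

filter log_iface
{
  # ifname / ifindex are the read-only route attributes mentioned above;
  # this only prints them, it does not change how the route is installed
  print "route ", net, " via iface ", ifname;
  accept;
}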

-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."




Re: BIRD patches for IP-in-IP

2016-09-28 Thread Christian Tacke

Hi,

I have followed this only a little...


On Wed, Sep 28, 2016 at 14:24:32 +0200, Ondrej Zajicek wrote:
[...]
> For IPIP tunnels, it seems like simplest approach is just to allow setting
> 'onlink' flag and iface from BGP import filter, like:
> 
> onlink = true;
> iface = "tunl0";
[...]

Hmm, why not the krt_ prefix?
And maybe they should be set in the kernel export filter
then?

Or maybe I have missed something... Excuse me in that case!


Cheers

Christian

-- 
www.cosmokey.com


Re: BIRD patches for IP-in-IP

2016-09-28 Thread Ondrej Zajicek
On Tue, Sep 27, 2016 at 03:09:52PM +, Neil Jerram wrote:
> Hi BIRD users!
> 
> Attached are 3 patches that my team has been using for routing through
> IP-in-IP tunnels, rebased on 1.6.1.  I'd like to explain why we find them
> useful, and start a conversation about whether they or something like them
> could be upstreamed (or perhaps if there's some better way of achieving our
> aims).
> ...
> 1. Does the routing approach above make sense?  (Or is there some better or
> simpler or already supported way that we could achieve the same thing?)


Hi

Using BGP-based routing in NBMA tunnels is an interesting approach. We
definitely should support this. But i would avoid things like 'krt_tunnel'
attribute until we have support for lightweight tunnels using RTA_ENCAP.

For IPIP tunnels, it seems like simplest approach is just to allow setting
'onlink' flag and iface from BGP import filter, like:

onlink = true;
iface = "tunl0";
gw = bgp_nexthop;

and some option that avoids default gateway setting by BGP protocol.
Does this make sense?
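For illustration only, a BGP import filter using the proposed attributes
might look roughly like this. Note that the onlink and iface setters are the
proposal above, not options that exist in stock BIRD (per the 2016-12 reply
earlier in this archive), and the AS numbers and peer address are
illustrative; in stock filters the BGP next hop attribute is spelled
bgp_next_hop:

protocol bgp peer_b {
  local as 64512;
  neighbor 10.240.0.5 as 64512;
  import filter {
    # proposed writable attributes, not yet implemented:
    onlink = true;
    iface = "tunl0";
    gw = bgp_nexthop;
    accept;
  };
}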


BTW, it seems that this approach works for NBMA IPIP tunnels but not for
NBMA GRE tunnels, due to a hack that IPIP code accepts 'onlink' gw as an
outer IP address, while GRE code resolves next hops through 'neighbor
cache' to get outer IP addresses, so this must be used to get similar
behavior:

 ip neigh add 10.1.1.1 lladdr 10.1.1.1 dev gre0
 ip route add 10.1.2.0/24 via 10.1.1.1 dev gre0 onlink

This is conceptually more clear and has some other advantages, but in
this case IPIP behavior is more useful. Does anybody know if there is a
way how to convince GRE iface to behave like IPIP iface in this regard?
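For comparison, the IPIP case described above needs no such neighbor entry,
since the 'onlink' gateway is used directly as the outer address; a minimal
sketch with the same illustrative addresses:

 modprobe ipip      # loading the module creates the fallback tunl0 device
 ip link set tunl0 up
 ip route add 10.1.2.0/24 via 10.1.1.1 dev tunl0 onlink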


-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."




Re: BIRD patches for IP-in-IP

2016-09-27 Thread Neil Jerram
Hi, and thanks for your answer!

Yes, we can certainly use that approach instead.  In some of our testing we
use L2TP to create tunnels as you suggest, and then run BIRD through those
tunnels.  This approach doesn't require any BIRD modification.

However, the big advantage of the IP-in-IP approach is that it doesn't
require us to allocate (and manage) an extra IP address on every compute
host (or to do the L2TP tunnel setup, of course).  That makes many
deployments a lot simpler - and so possibly justifies the kind of BIRD
enhancement that I've described?

Regards,
Neil


On Tue, Sep 27, 2016 at 4:51 PM Baptiste Jonglez <bapti...@bitsofnetworks.org> wrote:

> Hi,
>
> On Tue, Sep 27, 2016 at 03:09:52PM +, Neil Jerram wrote:
> > Attached are 3 patches that my team has been using for routing through
> > IP-in-IP tunnels, rebased on 1.6.1.  I'd like to explain why we find them
> > useful, and start a conversation about whether they or something like them
> > could be upstreamed (or perhaps if there's some better way of achieving our
> > aims).
> >
> > Calico [1] uses BIRD for BGP routing between the hosts in various cloud
> > orchestration systems (Kubernetes, OpenStack etc.), to distribute routes to
> > the pods/VMs/containers in those systems, each of which has its own IP.  If
> > all the hosts are directly connected to each other, this is
> > straightforward, but sometimes they are not.  For example GCE instances are
> > not directly connected to each other: there is at least one router between
> > them, that knows about routing GCE addresses, and to/from the Internet, and
> > we cannot peer with it or otherwise tell it how to route pod/VM/container
> > IPs.  So if we use GCE to create e.g. OpenStack compute hosts, with Calico
> > networking, we need to do something extra to allow VM-addressed data to
> > pass between the compute hosts.
> >
> > One of our solutions is to use IP-in-IP; it works as shown by this diagram:
> >
> >10.65.0.3 via 10.240.0.5 dev tunl0 onlink
> >default via 10.240.0.1
> >|
> >  +-|--+ ++
> >  | o  | ||
> >  |   Host A   | ++  |   Host B   |
> >  ||-| Router |--||
> >  | 10.240.0.4 | ++  | 10.240.0.5 |
> >  ||---. ||
> >  ++|++
> >^   ^   +---v---+|
> >  src 10.65.0.2 |   |   | tunl0 ||
> >  dst 10.65.0.3 |   |   +---+|
> >|\  |v
> >  +---+   ''
>  +---+
> >  |   Pod A   |  src 10.240.0.4|   Pod B
>  |
> >  | 10.65.0.2 |  dst 10.240.0.5|
> 10.65.0.3 |
> >  +---+  --
> +---+
> >  src 10.65.0.2
> >  dst 10.65.0.3
>
> Can't you just use a tunnel between Host A and Host B and run BGP on top
> of this tunnel?  It would seem to be cleaner than hacking multi-hop BGP to
> obtain appropriate next-hop values, unless I am missing something.
>
> It would look something like this:
>
>  +-|--+ ++
>  | o Host A   | |   Host B   |
>  || ++  ||
>  |  10.240.0.4|-| Router |--|10.240.0.5  |
>  || ++  ||
>  |   10.65.0.4|--.  +---+   +---+ .->10.65.0.5   |
>  ++   `>| tunlA |-->| tunlB |-  ++
> +---+   +---+
>
>
> The BGP session would be established between 10.65.0.4 (IP of host A on
> tunlA) and 10.65.0.5 (IP of host B on tunlB), so that the routes learnt
> via BGP would be immediately correct.
>
> Basically, it's a simple overlay network.
>
> > The diagram shows Pod A sending a packet to Pod B, using IP addresses that
> > are unknown to the 'Router' between the two hosts.  Host A has an IP-in-IP
> > device, tunl0, and a route that says to use that device for data to Pod B's
> > address (10.65.0.3).  When the packet has passed through that device, it
> > has a new outer IP header, with src 10.240.0.4 and dst 10.240.0.5, and is
> > routed again according to the routing table - so now it can successfully
> > reach Host B.
> >
> > So how is BIRD involved?  We statically program the local Pod route on each
> > host:
> >
> > On Host A: 10.65.0.2 dev 
> > On Host B: 10.65.0

Re: BIRD patches for IP-in-IP

2016-09-27 Thread Baptiste Jonglez
Hi,

On Tue, Sep 27, 2016 at 03:09:52PM +, Neil Jerram wrote:
> Attached are 3 patches that my team has been using for routing through
> IP-in-IP tunnels, rebased on 1.6.1.  I'd like to explain why we find them
> useful, and start a conversation about whether they or something like them
> could be upstreamed (or perhaps if there's some better way of achieving our
> aims).
> 
> Calico [1] uses BIRD for BGP routing between the hosts in various cloud
> orchestration systems (Kubernetes, OpenStack etc.), to distribute routes to
> the pods/VMs/containers in those systems, each of which has its own IP.  If
> all the hosts are directly connected to each other, this is
> straightforward, but sometimes they are not.  For example GCE instances are
> not directly connected to each other: there is at least one router between
> them, that knows about routing GCE addresses, and to/from the Internet, and
> we cannot peer with it or otherwise tell it how to route pod/VM/container
> IPs.  So if we use GCE to create e.g. OpenStack compute hosts, with Calico
> networking, we need to do something extra to allow VM-addressed data to
> pass between the compute hosts.
> 
> One of our solutions is to use IP-in-IP; it works as shown by this diagram:
> 
>10.65.0.3 via 10.240.0.5 dev tunl0 onlink
>default via 10.240.0.1
>|
>  +-|--+ ++
>  | o  | ||
>  |   Host A   | ++  |   Host B   |
>  ||-| Router |--||
>  | 10.240.0.4 | ++  | 10.240.0.5 |
>  ||---. ||
>  ++|++
>^   ^   +---v---+|
>  src 10.65.0.2 |   |   | tunl0 ||
>  dst 10.65.0.3 |   |   +---+|
>|\  |v
>  +---+   ''   +---+
>  |   Pod A   |  src 10.240.0.4|   Pod B   |
>  | 10.65.0.2 |  dst 10.240.0.5| 10.65.0.3 |
>  +---+  --+---+
>  src 10.65.0.2
>  dst 10.65.0.3

Can't you just use a tunnel between Host A and Host B and run BGP on top
of this tunnel?  It would seem to be cleaner than hacking multi-hop BGP to
obtain appropriate next-hop values, unless I am missing something.

It would look something like this:

 +-|--+ ++
 | o Host A   | |   Host B   |
 || ++  ||
 |  10.240.0.4|-| Router |--|10.240.0.5  |
 || ++  ||
 |   10.65.0.4|--.  +---+   +---+ .->10.65.0.5   |
 ++   `>| tunlA |-->| tunlB |-  ++
+---+   +---+


The BGP session would be established between 10.65.0.4 (IP of host A on
tunlA) and 10.65.0.5 (IP of host B on tunlB), so that the routes learnt
via BGP would be immediately correct.

Basically, it's a simple overlay network.
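A minimal sketch of that overlay on Host A, assuming IPIP tunnels and the
addresses from the diagram (the tunnel name, AS number and prefix lengths are
illustrative; Host B would mirror this):

 # one-off tunnel setup on Host A
 ip tunnel add tunlA mode ipip local 10.240.0.4 remote 10.240.0.5
 ip addr add 10.65.0.4/32 dev tunlA
 ip link set tunlA up
 ip route add 10.65.0.5/32 dev tunlA

 # BIRD: run the BGP session over the tunnel addresses, so that learnt
 # next hops already point into the tunnel
 protocol bgp host_b {
   local 10.65.0.4 as 64512;
   neighbor 10.65.0.5 as 64512;
   import all;
   export all;
 }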

> The diagram shows Pod A sending a packet to Pod B, using IP addresses that
> are unknown to the 'Router' between the two hosts.  Host A has an IP-in-IP
> device, tunl0, and a route that says to use that device for data to Pod B's
> address (10.65.0.3).  When the packet has passed through that device, it
> has a new outer IP header, with src 10.240.0.4 and dst 10.240.0.5, and is
> routed again according to the routing table - so now it can successfully
> reach Host B.
> 
> So how is BIRD involved?  We statically program the local Pod route on each
> host:
> 
> On Host A: 10.65.0.2 dev 
> On Host B: 10.65.0.3 dev 
> 
> then run a BIRD BGP session between Host A and Host B to propagate those
> routes to the other host - which would normally give us:
> 
> On Host A: 10.65.0.3 via 10.240.0.5
> On Host B: 10.65.0.2 via 10.240.0.4
> 
> But we don't want those normal routes, because then the data would get lost
> at 'Router'.  So we enhance and configure BIRD as follows.
> 
> - In the export filter for protocol kernel, for the relevant routes, we set
> an attribute 'krt_tunnel = tunl0'.
> 
> - We modify BIRD, as in the attached patches, to understand that that means
> that those routes should have 'dev tunl0'.
> 
> Then instead, we get:
> 
> On Host A: 10.65.0.3 via 10.240.0.5 dev tunl0 onlink
> On Host B: 10.65.0.2 via 10.240.0.4 dev tunl0 onlink
> 
> which allows successful routing of data between the Pods.

Re: BIRD patches for IP-in-IP

2016-09-27 Thread Neil Jerram
And here are the patches :-)

On Tue, Sep 27, 2016 at 4:09 PM Neil Jerram  wrote:

> Hi BIRD users!
>
> Attached are 3 patches that my team has been using for routing through
> IP-in-IP tunnels, rebased on 1.6.1.  I'd like to explain why we find them
> useful, and start a conversation about whether they or something like them
> could be upstreamed (or perhaps if there's some better way of achieving our
> aims).
>
> Calico [1] uses BIRD for BGP routing between the hosts in various cloud
> orchestration systems (Kubernetes, OpenStack etc.), to distribute routes to
> the pods/VMs/containers in those systems, each of which has its own IP.  If
> all the hosts are directly connected to each other, this is
> straightforward, but sometimes they are not.  For example GCE instances are
> not directly connected to each other: there is at least one router between
> them, that knows about routing GCE addresses, and to/from the Internet, and
> we cannot peer with it or otherwise tell it how to route pod/VM/container
> IPs.  So if we use GCE to create e.g. OpenStack compute hosts, with Calico
> networking, we need to do something extra to allow VM-addressed data to
> pass between the compute hosts.
>
> One of our solutions is to use IP-in-IP; it works as shown by this diagram:
>
>10.65.0.3 via 10.240.0.5 dev tunl0 onlink
>default via 10.240.0.1
>|
>  +-|--+ ++
>  | o  | ||
>  |   Host A   | ++  |   Host B   |
>  ||-| Router |--||
>  | 10.240.0.4 | ++  | 10.240.0.5 |
>  ||---. ||
>  ++|++
>^   ^   +---v---+|
>  src 10.65.0.2 |   |   | tunl0 ||
>  dst 10.65.0.3 |   |   +---+|
>|\  |v
>  +---+   ''   +---+
>  |   Pod A   |  src 10.240.0.4|   Pod B   |
>  | 10.65.0.2 |  dst 10.240.0.5| 10.65.0.3 |
>  +---+  --+---+
>  src 10.65.0.2
>  dst 10.65.0.3
>
> The diagram shows Pod A sending a packet to Pod B, using IP addresses that
> are unknown to the 'Router' between the two hosts.  Host A has an IP-in-IP
> device, tunl0, and a route that says to use that device for data to Pod B's
> address (10.65.0.3).  When the packet has passed through that device, it
> has a new outer IP header, with src 10.240.0.4 and dst 10.240.0.5, and is
> routed again according to the routing table - so now it can successfully
> reach Host B.
>
> So how is BIRD involved?  We statically program the local Pod route on
> each host:
>
> On Host A: 10.65.0.2 dev 
> On Host B: 10.65.0.3 dev 
>
> then run a BIRD BGP session between Host A and Host B to propagate those
> routes to the other host - which would normally give us:
>
> On Host A: 10.65.0.3 via 10.240.0.5
> On Host B: 10.65.0.2 via 10.240.0.4
>
> But we don't want those normal routes, because then the data would get
> lost at 'Router'.  So we enhance and configure BIRD as follows.
>
> - In the export filter for protocol kernel, for the relevant routes, we
> set an attribute 'krt_tunnel = tunl0'.
>
> - We modify BIRD, as in the attached patches, to understand that that
> means that those routes should have 'dev tunl0'.
>
> Then instead, we get:
>
> On Host A: 10.65.0.3 via 10.240.0.5 dev tunl0 onlink
> On Host B: 10.65.0.2 via 10.240.0.4 dev tunl0 onlink
>
> which allows successful routing of data between the Pods.
>
>
> Thanks for reading this far!  I now have three questions:
>
> 1. Does the routing approach above make sense?  (Or is there some better
> or simpler or already supported way that we could achieve the same thing?)
>
> 2. If (1), would the BIRD team accept patches broadly on the lines of
> those that are attached?
>
> 3. If (2), please let me know if the attached patches are already
> acceptable, or otherwise what further work is needed for them.
>
> Many thanks,
> Neil
>
>
From bed50e27dd14aa98a89a2c9e0e7a63a87bcaa830 Mon Sep 17 00:00:00 2001
From: Shaun Crampton 
Date: Wed, 17 Jun 2015 16:14:41 -0700
Subject: [PATCH 3/3] Disable recursive route check for GCE compatibility.

GCE uses a /32 for VM IPs with a default gateway that is, thus,
off-subnet.  This patch removes a check in BIRD that prevents
BIRD from accepting such a route as a valid next hop.

The following trail asserts that the check is not really needed
but that it is normally a useful sanity check:

https:/
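For reference, the guest routing this refers to looks roughly like the
following (addresses illustrative; this is the /32 address plus off-subnet
default gateway pattern the patch has to cope with, not something the patch
itself configures):

 ip addr add 10.240.0.4/32 dev eth0
 ip route add 10.240.0.1 dev eth0 scope link
 ip route add default via 10.240.0.1 dev eth0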

BIRD patches for IP-in-IP

2016-09-27 Thread Neil Jerram
Hi BIRD users!

Attached are 3 patches that my team has been using for routing through
IP-in-IP tunnels, rebased on 1.6.1.  I'd like to explain why we find them
useful, and start a conversation about whether they or something like them
could be upstreamed (or perhaps if there's some better way of achieving our
aims).

Calico [1] uses BIRD for BGP routing between the hosts in various cloud
orchestration systems (Kubernetes, OpenStack etc.), to distribute routes to
the pods/VMs/containers in those systems, each of which has its own IP.  If
all the hosts are directly connected to each other, this is
straightforward, but sometimes they are not.  For example GCE instances are
not directly connected to each other: there is at least one router between
them, that knows about routing GCE addresses, and to/from the Internet, and
we cannot peer with it or otherwise tell it how to route pod/VM/container
IPs.  So if we use GCE to create e.g. OpenStack compute hosts, with Calico
networking, we need to do something extra to allow VM-addressed data to
pass between the compute hosts.

One of our solutions is to use IP-in-IP; it works as shown by this diagram:

   10.65.0.3 via 10.240.0.5 dev tunl0 onlink
   default via 10.240.0.1
   |
 +-|--+ ++
 | o  | ||
 |   Host A   | ++  |   Host B   |
 ||-| Router |--||
 | 10.240.0.4 | ++  | 10.240.0.5 |
 ||---. ||
 ++|++
   ^   ^   +---v---+|
 src 10.65.0.2 |   |   | tunl0 ||
 dst 10.65.0.3 |   |   +---+|
   |\  |v
 +---+   ''   +---+
 |   Pod A   |  src 10.240.0.4|   Pod B   |
 | 10.65.0.2 |  dst 10.240.0.5| 10.65.0.3 |
 +---+  --+---+
 src 10.65.0.2
 dst 10.65.0.3

The diagram shows Pod A sending a packet to Pod B, using IP addresses that
are unknown to the 'Router' between the two hosts.  Host A has an IP-in-IP
device, tunl0, and a route that says to use that device for data to Pod B's
address (10.65.0.3).  When the packet has passed through that device, it
has a new outer IP header, with src 10.240.0.4 and dst 10.240.0.5, and is
routed again according to the routing table - so now it can successfully
reach Host B.

So how is BIRD involved?  We statically program the local Pod route on each
host:

On Host A: 10.65.0.2 dev 
On Host B: 10.65.0.3 dev 

then run a BIRD BGP session between Host A and Host B to propagate those
routes to the other host - which would normally give us:

On Host A: 10.65.0.3 via 10.240.0.5
On Host B: 10.65.0.2 via 10.240.0.4

But we don't want those normal routes, because then the data would get lost
at 'Router'.  So we enhance and configure BIRD as follows.

- In the export filter for protocol kernel, for the relevant routes, we set
an attribute 'krt_tunnel = tunl0'.

- We modify BIRD, as in the attached patches, to understand that that means
that those routes should have 'dev tunl0'.

Then instead, we get:

On Host A: 10.65.0.3 via 10.240.0.5 dev tunl0 onlink
On Host B: 10.65.0.2 via 10.240.0.4 dev tunl0 onlink

which allows successful routing of data between the Pods.
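For concreteness, with the attached patches the kernel export filter might
look roughly like this (the prefix, the RTS_BGP test and the exact form of
the krt_tunnel value are illustrative; krt_tunnel itself comes from the
patches and is not in stock 1.6.1):

protocol kernel {
  export filter {
    # push Pod routes learnt over BGP through the IP-in-IP device
    if (net ~ [ 10.65.0.0/16+ ]) && (source = RTS_BGP) then {
      krt_tunnel = "tunl0";
    }
    accept;
  };
}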


Thanks for reading this far!  I now have three questions:

1. Does the routing approach above make sense?  (Or is there some better or
simpler or already supported way that we could achieve the same thing?)

2. If (1), would the BIRD team accept patches broadly on the lines of those
that are attached?

3. If (2), please let me know if the attached patches are already
acceptable, or otherwise what further work is needed for them.

Many thanks,
Neil