Re: Scaling BFD support

2022-06-24 Thread Matthew Walster
On Fri, 24 Jun 2022, 22:34 Mikhail Grishin wrote:

>
>
> Arnold Nipper wrote on 24.06.2022 12:32:
> > On 23.06.2022 23:41, Douglas Fischer wrote:
> >> Sincerely, what caught my attention was the "Auth: none" part.
> >> In a room with more than a thousand people, confirming that the voice you
> >> hear is really from the person you think it is makes sense to me.
> >>
> >
> > Well, on an IX LAN, you should know who is talking to you ;-) Requiring
> > auth doesn't add any security IMO.
>

Not to mention it only affects BFD, not the BGP session it supports. You
aren't affecting anything of value by targeting unauthenticated BFD.

> It is also up to customers' wishes. We provide selective BFD timers.
> Some IXP members are local, some are 1000+ kilometers away. Some "require"
> sub-second failure detection (you need to think about your
> infrastructure to support this).
>

Those people are silly. Sub-second failure detection is fine when you're
talking about an MPLS tunnel with precomputed secondary paths or fast
reroute, but this is BGP. Your network is very unlikely to reconverge in
under a second after a BGP session goes down if there are more than a
handful of prefixes, as everything has to recalculate best routes etc.
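
To put a number on "sub-second": in BFD asynchronous mode (RFC 5880, section 6.8.4) the detection time is the remote detect multiplier times the agreed interval. A rough sketch of that arithmetic; the timer values are made-up examples, not anything a particular IXP uses:

```python
def bfd_detection_time_ms(remote_detect_mult, local_required_min_rx_ms,
                          remote_desired_min_tx_ms):
    """Detection time for BFD asynchronous mode (RFC 5880, section 6.8.4)."""
    # The agreed interval is the slower of what we are willing to receive
    # and what the peer wants to transmit.
    agreed_interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval_ms


if __name__ == "__main__":
    # Hypothetical timers: 300 ms in both directions, multiplier 3.
    print(bfd_detection_time_ms(3, 300, 300), "ms")
```

That only covers noticing the failure; the best-path recalculation described above comes on top of it.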

But hey, it probably fixes *someone's* use case...

M

>


Re: Bird setting TTL to 1 at the end of a passive BGP session opening

2022-04-01 Thread Matthew Walster
The setup of the TCP session is handled by the kernel, hence the higher
TTL. Once TCP is established, (e)BGP tends to use a TTL of 1 unless it's a
multihop session, or you're using GTSM.

It's expected, and is partly due to the limitations of how sockets are
implemented in Linux.
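
As a rough illustration (not BIRD's actual code), here is the same ordering with plain sockets in Python: the handshake is complete by the time accept() returns, and only then can the application change the TTL on that one connection. The loopback address and the helper name are just for the demo:

```python
import socket


def accept_then_set_ttl(ttl=1):
    """Accept a loopback TCP connection, then change the TTL on the accepted socket."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname(), timeout=2)
    conn, _peer = listener.accept()
    try:
        # TTL on the socket right after accept(); the SYN/SYN-ACK exchange
        # is already over at this point.
        ttl_at_accept = conn.getsockopt(socket.IPPROTO_IP, socket.IP_TTL)
        # Only now can the application lower it for this connection.
        conn.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
        ttl_after = conn.getsockopt(socket.IPPROTO_IP, socket.IP_TTL)
        return ttl_at_accept, ttl_after
    finally:
        conn.close()
        client.close()
        listener.close()


if __name__ == "__main__":
    print(accept_then_set_ttl(1))
```

Every packet sent after the setsockopt() call leaves with the new TTL, which matches the PUSH and FIN packets in the capture below.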

M

On Fri, 1 Apr 2022, 03:10 Rumen Telbizov wrote:

> Hello Bird users,
>
> First time poster and new subscriber.
> I noticed something strange and wanted to report it here in case this is
> in fact a bug that deserves attention.
>
> I run bird 2.0.7-4.1 on Debian 11.
> I have a BGP section configured as passive that acts as a TCP health-check
> endpoint.
>
> It is as follows:
> *--- cut --*
> protocol bgp HEALTHCHECKv4 {
> hold time 6;
> startup hold time 20;
> connect delay time 3;
> connect retry time 6;
> error wait time 3, 12;
> passive on;
>
> local 100.64.0.5 as 65000;
> neighbor 100.64.0.4 as 65535;
> }
> *--- cut --*
>
>
> What ends up happening on the wire is this:
> *--- cut --*
> 23:15:09.443792 IP (tos 0x0, ttl 254, id 4040, offset 0, flags [DF], proto
> TCP (6), length 60)
> 100.64.0.4.16141 > 100.64.0.5.179: Flags [S], cksum 0xa78f (correct),
> seq 723435095, win 8961, options [mss 8621,sackOK,TS val 3290475421 ecr
> 0,nop,wscale 0], length 0
> 23:15:09.443823 IP (tos 0xc0, ttl 255, id 0, offset 0, flags [DF], proto
> TCP (6), length 60)
> 100.64.0.5.179 > 100.64.0.4.16141: Flags [S.], cksum 0xc8b7 (incorrect
> -> 0x0785), seq 124371865, ack 723435096, win 62643, options [mss
> 8961,sackOK,TS val 2210037294 ecr 3290475421,nop,wscale 7], length 0
> 23:15:09.37 IP (tos 0x0, ttl 254, id 4041, offset 0, flags [DF], proto
> TCP (6), length 52)
> 100.64.0.4.16141 > 100.64.0.5.179: Flags [.], cksum 0x2550 (correct),
> seq 1, ack 1, win 8961, options [nop,nop,TS val 3290475422 ecr 2210037294],
> length 0
> 23:15:09.71 IP (tos 0x0, ttl 254, id 4042, offset 0, flags [DF], proto
> TCP (6), length 52)
> 100.64.0.4.16141 > 100.64.0.5.179: Flags [F.], cksum 0x254e (correct),
> seq 1, ack 1, win 8961, options [nop,nop,TS val 3290475423 ecr 2210037294],
> length 0
> 23:15:09.444576 IP (tos 0xc0, *ttl 1*, id 55411, offset 0, flags [DF],
> proto TCP (6), length 99)
> 100.64.0.5.179 > 100.64.0.4.16141: Flags [P.], cksum 0xc8de (incorrect
> -> 0x58b6), seq 1:48, ack 2, win 490, options [nop,nop,TS val 2210037294
> ecr 3290475423], length 47: BGP
> Open Message (1), length: 47
>   Version 4, my AS 65000, Holdtime 6s, ID 100.64.0.5
>   Optional parameters, length: 18
> Option Capabilities Advertisement (2), length: 16
>   Route Refresh (2), length: 0
>   Graceful Restart (64), length: 2
> Restart Flags: [none], Restart Time 120s
> 0x:  0078
>   32-Bit AS Number (65), length: 4
>  4 Byte AS 65000
> 0x:   fde8
>   Enhanced Route Refresh (70), length: 0
> no decoder for Capability 70
>   Long-lived Graceful Restart (71), length: 0
>
> 23:15:09.444602 IP (tos 0xc0, *ttl 1*, id 55412, offset 0, flags [DF],
> proto TCP (6), length 52)
> 100.64.0.5.179 > 100.64.0.4.16141: Flags [F.], cksum 0xc8af (incorrect
> -> 0x4635), seq 48, ack 2, win 490, options [nop,nop,TS val 2210037294 ecr
> 3290475423], length 0
> 23:15:09.444670 IP (tos 0x0, ttl 64, id 1, offset 0, flags [none], proto
> ICMP (1), length 56)
> 100.64.0.4 > 100.64.0.5: ICMP time exceeded in-transit, length 36
> IP (tos 0xc0, ttl 1, id 55411, offset 0, flags [DF], proto TCP (6),
> length 99)
> 100.64.0.5.179 > 100.64.0.4.16141:  [|tcp]
> *--- cut --*
>
> As you can see the TTL on our packets is initially set to 255. At the end
> of the connection
> during the last PUSH and FIN packets all of a sudden bird sets the TTL to
> 1.
>
> I have no ttl security enabled and even if I explicitly disable it the
> problem persists.
> A workaround that I found is to pass the multihop 20 directive, which
> then changes the ttl 1 above to ttl 20
> and alleviates the problem.
>
> Let me know if you need any additional information.
>
> Regards,
> --
> Rumen Telbizov
> Site Reliability Engineer 
>


Re: BIRD router/route server functions

2018-06-05 Thread Matthew Walster
On Tue, 5 Jun 2018 at 21:04, Rae Ho (ITSC)  wrote:

> Seems the problem is domain name?
>

No, I think either you've got a firewall (iptables etc) running and
blocking tcp/179, or you haven't put "listen bgp" into your configuration,
so bird is not listening on tcp/179.

What is the output of "netstat -lnt"?
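
If netstat isn't handy, a quick connect test tells you the same thing. A minimal sketch; the 127.0.0.1 target assumes you run it on the BIRD box itself and that bird is not bound to one specific address only:

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered (timeout) or unreachable: nothing usable there.
        return False


if __name__ == "__main__":
    # 179 is the BGP port.
    print("tcp/179 reachable:", is_listening("127.0.0.1", 179))
```

A "False" from the box itself points at bird not listening; a "True" locally but "False" from the peer points at a firewall in between.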

M


Re: about the bgp route reflector problem?

2017-11-02 Thread Matthew Walster
Maybe I'm missing something here, but in the examples you show,
120.26.0.0/18 and 122.72.90.78/32 have not been reflected? Route Reflection
deals with iBGP only -- eBGP to iBGP does not need a route reflector. Only
the routes learned on the iBGP session from R4 will be reflected (and
have the originator and cluster attributes set) to R2.
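
A small sketch of that rule (RFC 4456) in Python, using the addresses from your output. The function names and the dict layout are made up for illustration; this is not how BIRD stores routes:

```python
def reflect(route, originator_router_id, cluster_id):
    """Return a copy of an iBGP-learned route as a route reflector re-advertises it."""
    reflected = dict(route)
    # ORIGINATOR_ID is only added if the route does not carry one already.
    reflected.setdefault("originator_id", originator_router_id)
    # The reflector prepends its own cluster ID to CLUSTER_LIST.
    reflected["cluster_list"] = [cluster_id] + list(route.get("cluster_list", []))
    return reflected


def advertise_to_client(route, learned_via, originator_router_id, cluster_id):
    # eBGP-learned routes are ordinary iBGP advertisements: nothing is
    # "reflected", so no reflection attributes are attached.
    if learned_via == "ebgp":
        return dict(route)
    return reflect(route, originator_router_id, cluster_id)


if __name__ == "__main__":
    from_r4 = {"net": "121.52.236.16/32", "next_hop": "192.168.2.4"}
    from_r3 = {"net": "120.26.0.0/18", "next_hop": "192.168.1.1", "as_path": [100]}
    print(advertise_to_client(from_r4, "ibgp", "192.168.2.4", "1.1.1.1"))
    print(advertise_to_client(from_r3, "ebgp", None, "1.1.1.1"))
```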

Does that make sense?

Matthew Walster

On 2 November 2017 at 11:30, 曾小小  wrote:

> about the bgp route reflector problem?
> Why does the reflector client receive EBGP routing entries without
> attributes??
>
>
> my topology is shown below:
>
>        R3 (as 100)
>         |
>         | (ebgp)
>         |
>        R1 (RR) ---(ibgp)--- R4 (client) (as 200)
>         |
>         | (ibgp)
>         |
>        R2 (client) (as 200)
>
> My configuration is as follows:
>
> ===The R1 (RR) configuration is as follows:
>
> #ebgp parts
>
> protocol bgp bgp_pa_r3 {
> description "ebgp-pa-r3";
> multihop 10;
> table tab_pa_adsl;
> igp table tab_ospf_10;
> local as 200;
> neighbor 192.168.1.1 as 100;
> source address 192.168.1.2;
> import all;
> export all;
> next hop self;
> default bgp_local_pref 5;
> }
>
> #rr parts
> template bgp rr_client {
> description "ibgp-rr1";
> local 192.168.2.1 as 200;
> multihop;
> rr client;
> rr cluster id 1.1.1.1;
> }
>
> protocol bgp bgp_pa_r2 from rr_client {
> debug all;
> enable route refresh on;
> table tab_rr_1;
> igp table tab_ospf_10;
> neighbor 192.168.2.2 as 200;
> export all;
> import all;
> }
>
> protocol bgp bgp_pa_r4 from rr_client {
> table tab_rr_1;
> igp table tab_ospf_10;
> neighbor 192.168.2.4 as 200;
> export all;
> import all;
># next hop self;
> }
>
> ==The R2 client configuration is as follows:
>
> protocol bgp bgp_pa_r2 {
> router id 192.168.2.2;
> debug all;
> #   debug { states,interfaces,events };
> description "ibgp-rr1";
> import all;
> export all;
> local as 200;
> neighbor 192.168.2.1 as 200;
> source address 192.168.2.2;
> next hop self;
> }
>
> ==
>
> Checking the routes on R2, I find that the route entries received from R4 have the
> BGP.originator_id and BGP.cluster_list attributes,
> but the entries received from R1 do not have these attributes. Why?
>
>  Thank you very much for your help!! thanks!!
>
>
> The route entries viewed by R2 are as follows:
>
> bird> show route protocol bgp_pa_r2 all
> 1007-121.52.236.16/32
> 1008- Type: BGP unicast univ
> 1012- BGP.origin: IGP
>   BGP.as_path:
>   BGP.next_hop: 192.168.2.4
>   BGP.local_pref: 100
>  * BGP.originator_id: 192.168.2.4*
> *  BGP.cluster_list: 1.1.1.1*
> 1007-116.211.98.20/32
> 1008- Type: BGP unicast univ
> 1012- BGP.origin: IGP
>   BGP.as_path:
>   BGP.next_hop: 192.168.2.4
>   BGP.local_pref: 100
>*   BGP.originator_id: 192.168.2.4*
> *  BGP.cluster_list: 1.1.1.1*
> 1007-120.26.0.0/18
> 1008- Type: BGP unicast univ
> 1012- BGP.origin: IGP
>   BGP.as_path: 100
>   BGP.next_hop: 192.168.1.1
>   BGP.local_pref: 5
>
> 1007-122.72.90.78/32
> 1008- Type: BGP unicast univ
> 1012- BGP.origin: IGP
>   BGP.as_path: 100
>   BGP.next_hop: 192.168.1.1
>   BGP.local_pref: 5
>
>


Re: Version 1.6.2

2017-08-31 Thread Matthew Walster
Ondrej,

On 11 July 2017 at 13:43, Ondrej Zajicek  wrote:

> On Tue, Jul 11, 2017 at 01:06:28PM +0200, Job Snijders wrote:
> > Hi all,
> >
> > Apologies for bumping up an old thread
> >
> > Can I maybe help testing this feature?
> >
> > The use case is that I operate a large public BGP Looking Glass for the
> > NLNOG RING project at http://lg.ring.nlnog.net/summary/lg01/ipv4 BIRD is
> > used as the collector as it is quite fast and memory efficient.
>
> Hi
>
> Thanks for reminding me. We have some outdated branch with MRT table dump
> code that i forgot to review and merge to the master branch. I will do
> that ASAP.


Can I quietly bump, please =)

Matthew Walster


Re: TCP md5 authentication failures for almost on all the server's BGP peering

2017-08-23 Thread Matthew Walster
Harish,

On 22 August 2017 at 09:24, Harish Shetty  wrote:

> I am using bird-1.4.5-1.el6,
>

That release is more than 3 years old at this point. bird-1.6.3 was
released 2016-12-22, so your best bet is probably to try that first and see
if the problem is fixed.

M


EBGP Multihop TTL vs. TCP accept()

2017-08-16 Thread Matthew Walster
Hello,

I've just spent a while trying to debug a BGP session not coming up with
bird. It turns out I had mistakenly set the multihop setting to "1" instead
of "2" -- my fault.

However, the reason it took me so long to realise this error was because
the TCP session is established, with default connection parameters, THEN
the TTL limit is assigned to the skb. As far as I'm aware, with TCP sockets
in both Linux and BSD, the connections are already established when
accept() returns a new connection.

As I understand it, that's why inbound connections can't be filtered by the
incoming source address either until the TCP session is already
established, which usually means iptables/pf rules on the box synced with
whoever the configured peers are. Which tends to lead to log spam when
[known security company] does yet another scan of global address space for
open BGP neighbours and fires a bunch of alerts...

This may be a case of "that's just what you have to live with" but I was
wondering if there's an alternative? Obviously libwrap0 (tcp-wrappers)
won't help here because that also just processes a rule-set after accept()
-- it's presumably something that needs to be addressed in the kernel. In
any case, as far as I'm aware, you can't set the TTL on a listen() for the
SYN-ACK that is returned on a SYN unless you modify the default TTL for all
TCP connections; you can't even set it with "ip route" or "route" on a
per-destination basis.
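
To make the ordering concrete, here is a small Python sketch of what any userspace daemon is stuck with: the client's connect() has already succeeded by the time the source address can be checked. ALLOWED_PEERS and the function names are made up for the demo and have nothing to do with bird's internals:

```python
import socket

ALLOWED_PEERS = {"192.0.2.1"}  # hypothetical configured neighbours


def accept_and_filter(listener, allowed):
    """Accept one connection; drop it if the peer is not a configured neighbour."""
    conn, (peer_ip, _peer_port) = listener.accept()
    if peer_ip not in allowed:
        # Too late to pretend the port is closed: the handshake is done.
        conn.close()
        return None
    return conn


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    try:
        # connect() returns once the kernel has completed the three-way
        # handshake, before the application has even called accept().
        client = socket.create_connection(listener.getsockname(), timeout=2)
        client_connected = client.getpeername() == listener.getsockname()
        conn = accept_and_filter(listener, ALLOWED_PEERS)
        kept = conn is not None
        if conn is not None:
            conn.close()
        client.close()
        return client_connected, kept
    finally:
        listener.close()


if __name__ == "__main__":
    print(demo())
```

The scanner sees an open port either way; all the daemon can do is hang up afterwards, which is where the log spam comes from.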

How do other people handle this situation? Do they create
iptables/pf/whatever rules dynamically generated from their bird.conf for
neighbor access control? How about ebgp multihop -- or does everyone just
set it to 16/64 and forget about it? Or do people largely not care?

M


Re: AUTOMATIC INCOMING FILTERING

2017-08-10 Thread Matthew Walster
On 10 August 2017 at 16:27, Janvier Rwakagabo 
wrote:
>
> Has anyone automated prefix filtering, for example so that when a peer acquires a new
> prefix it is accepted automatically, maybe from an IRR? If so, could you share a
> working configuration?
>

Janvier,

Yes, there are many ways of doing this. In the past, I've used things like
https://github.com/snar/bgpq3 and I've been playing around with my own
version too: https://github.com/dotwaffle/prefixlister

Essentially, you run those tools periodically with the ASN or AS-SET you
want to generate the prefixes for, saving the output to a file. You then
include that file from within your main bird.conf and specify that prefix
set within your policy.
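
The "save to a file" step can be as simple as the sketch below, which turns a list of prefixes (however you obtained them) into a BIRD prefix-set definition. The set name and the example prefixes are made up, and the helper is only an illustration, not part of either tool:

```python
import ipaddress


def bird_prefix_set(name, prefixes):
    """Render prefixes as a BIRD 'define NAME = [ ... ];' prefix set."""
    if not prefixes:
        # Assumption: better to fail loudly than to write an empty set and
        # silently change what the filter accepts.
        raise ValueError("refusing to write an empty prefix set")
    # Parsing validates each entry (strict=True rejects host bits) and
    # the set comprehension removes duplicates.
    nets = sorted(
        {ipaddress.ip_network(p, strict=True) for p in prefixes},
        key=lambda n: (n.version, int(n.network_address), n.prefixlen),
    )
    body = ",\n".join(f"    {net}" for net in nets)
    return f"define {name} = [\n{body}\n];\n"


if __name__ == "__main__":
    # Documentation prefixes standing in for real IRR data.
    print(bird_prefix_set("AS64500_V4", ["198.51.100.0/24", "192.0.2.0/24"]))
```

You would then pull the generated file in with an include statement in bird.conf and match against it in the import filter, e.g. "if net ~ AS64500_V4 then accept;".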

Be warned, though: While the RIPE region generally has very good IRR
listings (route/route6 objects) things aren't so good in other RIRs -- many
North American networks register at RADB, as do other regions if there
isn't a nice IRRDB available at their RIR, but especially in regions like
Asia and South America you will find a very low takeup of RPSL entries in
an IRRDB.

If you choose to peer with a network that does not have route objects
covering all of its networks, you would do well to at the very least
implement a prefix-limit on the BGP session that stays "hard down" if it
is tripped.

Matthew Walster


Re: Hardware requirements for BIRD

2017-03-07 Thread Matthew Walster
On 7 March 2017 at 05:57, Clément Guivy  wrote:

> Hello, I am considering the setup of BIRD as a router to handle our
> internet traffic. One information I fail to find is hardware requirements.
>

Clément,

Let's just clear one thing up straight away -- BIRD is a daemon for routing
protocols, not for routing traffic itself. BIRD itself will handle your
requirements in terms of the BGP information incredibly well. As I
understand it, BIRD only utilises one CPU core, but this is not the
bottleneck factor here.

When the FIB has been calculated, it is usually exported to your kernel
(we'll assume Linux for now) via Netlink messages. Depending on how
efficient your kernel is at building the trie structure, this may actually
take more time than processing the BGP Updates!

Once the routes are loaded into the kernel, it is the kernel (usually) that
forwards the traffic. Forwarding performance is usually (roughly)
proportional to the performance of your processor. You will probably have to
make iptables changes to prevent it from restricting performance at high
traffic levels.

That said, 1Gbps of IMIX traffic should easily be forwarded by any modern
x86-like server out there. Just be aware that it will be more susceptible
to small-packet attacks due to the lower packet-per-second throughput
compared to routers you may be used to.
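
For a feel of why small packets are the problem, here is the line-rate arithmetic for Ethernet as a short Python sketch. The 20 bytes of per-frame overhead are the preamble/SFD (8) plus the inter-frame gap (12); the frame sizes are just the usual minimum and maximum:

```python
ETHERNET_OVERHEAD_BYTES = 20  # 8 bytes preamble/SFD + 12 bytes inter-frame gap


def max_packets_per_second(link_bps, frame_bytes):
    """Theoretical line-rate packet rate for a given Ethernet frame size."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


if __name__ == "__main__":
    for frame in (64, 1518):
        pps = int(max_packets_per_second(1_000_000_000, frame))
        print(f"{frame}-byte frames at 1 Gbps: {pps} pps")
```

At 1 Gbps that is roughly 1.49 million packets per second with minimum-size frames versus about 81 thousand with full-size ones, so the same link can ask for nearly twenty times the per-packet work.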

Hope that helps!

Matthew Walster


Re: full route table

2017-01-19 Thread Matthew Walster
On 19 January 2017 at 13:14, Ondrej Zajicek  wrote:

> There is no direct way to import MRT data to BIRD. You could use some
> script
> to parse MRT data and convert it to static route definitions.
>

In the past, when seeking to do this I've used bgpdump[0], bgpsimple[1],
and/or imported it into exabgp with mrtparse[2].
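
If you go the static-route way Ondrej describes, the conversion step could look roughly like the sketch below. It assumes you have already extracted (prefix, next hop) pairs from the dump with one of those tools, writes BIRD 1.x style static syntax, and uses a made-up protocol name; all BGP attributes are lost this way:

```python
import ipaddress


def bird_static_protocol(name, routes):
    """Render (prefix, next_hop) pairs as a BIRD 1.x 'protocol static' block."""
    lines = [f"protocol static {name} {{"]
    for prefix, next_hop in routes:
        # Parse both values so that malformed input fails here rather
        # than when bird reads the config.
        net = ipaddress.ip_network(prefix)
        hop = ipaddress.ip_address(next_hop)
        lines.append(f"    route {net} via {hop};")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Documentation addresses standing in for parsed MRT entries.
    print(bird_static_protocol("mrt_import", [("192.0.2.0/24", "198.51.100.1")]))
```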

Hope they help,

M


[0] https://bitbucket.org/ripencc/bgpdump/wiki/Home
[1] https://code.google.com/archive/p/bgpsimple/wikis/README.wiki
[2] https://github.com/t2mune/mrtparse


Optimizing large BGP -> Linux kernel additions/removals

2015-08-17 Thread Matthew Walster
When a BGP session comes up/down, there is a period of high CPU while all
the new best-routes are computed -- this is understandable.

However, on boxes I have where this table is then promoted to the Linux
kernel, the CPU usage stays high for some considerable time. For a full
(~600,000 prefix)

From the looks of things, a bunch of netlink messages are being generated
to the kernel to RTM_DELROUTE then RTM_NEWROUTE -- each of which is causing
the kernel's trie to rebalance which is a fairly costly operation.

I was wondering if anyone had experimented with anything such as
implementing either:

1. a corking mechanism (i.e. stop balancing the trie until a signal is sent
to uncork)

2. fib_trie garbage collection (i.e. only rebalance the trie once per time
interval)

3. "double buffering" (i.e. for a given operation such as protocol flap,
memcpy the trie, perform operations, then update the root node pointer to
the new optimised trie)

Any and all of these ideas may be horrific, I'm just interested whether
anyone's running full tables in Linux, filled by (e.g.) BIRD, and has
encountered this issue.

Unfortunately there doesn't appear to be an RTM_CHANGE or similar in Linux,
so the DELROUTE will seemingly cause a tree to either be pruned or
re-branched, followed by the NEWROUTE causing a full rebalance run --
whereas a CHANGE would (could) hopefully just over-write the value.
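
To illustrate what I mean by a CHANGE, here is a userspace sketch in Python that collapses a DEL/NEW stream into one net operation per prefix. It is purely a thought experiment: the "CHANGE" operation is hypothetical, the tuples are made up, and this is not how BIRD or the kernel currently behave:

```python
def coalesce(ops):
    """Collapse ("DEL"|"NEW", prefix, next_hop) operations to one per prefix."""
    first_action = {}  # what happened first tells us if the route pre-existed
    final_hop = {}     # next hop after the last operation, None if absent
    for action, prefix, next_hop in ops:
        first_action.setdefault(prefix, action)
        final_hop[prefix] = next_hop if action == "NEW" else None

    result = []
    for prefix, first in first_action.items():
        hop = final_hop[prefix]
        if first == "DEL" and hop is not None:
            # Existing route ends up with a new value: overwrite in place.
            result.append(("CHANGE", prefix, hop))
        elif first == "DEL":
            result.append(("DEL", prefix, None))
        elif hop is not None:
            result.append(("NEW", prefix, hop))
        # A NEW that is later deleted again cancels out entirely.
    return result


if __name__ == "__main__":
    flap = [
        ("DEL", "192.0.2.0/24", None),
        ("NEW", "192.0.2.0/24", "198.51.100.1"),
        ("DEL", "198.51.100.0/24", None),
    ]
    for op in coalesce(flap):
        print(op)
```

With something like this, a protocol flap that re-selects the same prefixes would touch each trie leaf once instead of removing and re-adding it.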

Many thanks in advance,



Matthew Walster