Re: use TCP_NODELAY on TCP sockets?

2024-05-16 Thread Erin Shepherd
send(..., MSG_MORE)
send(..., 0)

Is roughly equivalent to
setsockopt(..., TCP_CORK, 1)
send(..., 0)
send(..., 0)
setsockopt(..., TCP_CORK, 0)

In cases where you know that you're going to write more but you want to be able 
to discard the user space buffer, MSG_MORE is useful. If holding onto the 
buffers is easy so they can then be passed to sendmsg(), then you may as well 
do that. 

(And of course you can combine both techniques)

- Erin

On Thu, 16 May 2024, at 01:32, Job Snijders wrote:
> On Wed, May 15, 2024 at 09:09:47PM +0200, Erin Shepherd wrote:
> > It seems absent from the BSDs, but on Linux you can pass the MSG_MORE
> > flag to send() to override TCP_NODELAY for a specific write
> 
> Am I understanding correctly this is a variant on TCP_NOPUSH/TCP_CORK?
> "more data is coming, dont push the send button yet!"
> 
> In OpenBGPD, TCP_NODELAY is set on the socket (a socket option available
> on all platforms, I think?), and then all data is coalesced into
> sendmsg(), no need for 'corking'. From my limited testing it seems a
> full routing table should fit in ~ TCP 41,000 packets.
> 
> BIRD has a code path sk_sendmsg()->sendmsg() called from
> sk_maybe_write(); but based my limited testing I'm not sure this path is
> followed in all cases, because I see way more than 41K packets for a
> full table feed (with TCP_NODELAY enabled).
> 
> Perhaps there are two separate questions here:
> 
> - are BGP messages (slightly) delayed because of TCP_NODELAY not being
>   set? (I think yes)
> - are BGP messages as efficiently coalesced into as few TCP packets as
>   possible? (with TCP_NODELAY set, I am not sure)
> 
> Kind regards,
> 
> Job
> 
> ps. To clarify why I started this thread: last week I fell into the TCP
> subsystem rabbit hole: why are things the way they are? I started
> auditing various programs related to my $dayjob and thought it would be
> good to open a conversation with the BIRD developer community. My goal
> is not necessarily to get this patch 'as-is' merged, but to learn from
> and with friendly and respected BGP developers.
> 

Re: use TCP_NODELAY on TCP sockets?

2024-05-15 Thread Erin Shepherd
It seems absent from the BSDs, but on Linux you can pass the MSG_MORE flag to 
send() to override TCP_NODELAY for a specific write

On Wed, 15 May 2024, at 19:40, Job Snijders via Bird-users wrote:
> Dear Marco,
> 
> On Wed, 15 May 2024 at 19:27, Marco d'Itri  wrote:
>> On May 15, Job Snijders via Bird-users  wrote:
>> 
>> > But please be very careful in considering this patch, because it does
>> > introduce some subtle changes in the on-the-wire behaviour of BIRD. For
>> > example, without this patch an UPDATE for a handful of routes and the
>> > End-of-RIB marker might end up in the same TCP packet (if this fits);
>> > but with this patch, the End-of-RIB marker ends up in its own TCP
>> > packet. As things are today, setting TCP_NODELAY will increase the
> 
>> 
>> I think that this can be fixed easily with TCP_CORK.
>> Basically, with TCP_NODELAY + TCP_CORK you can have the optimal balance 
>> of latency and overhead without using writev.
> 
> 
> Yes, thanks for sharing this suggestion.
> 
> Note that TCP_CORK is a Linux-only feature (on FreeBSD it seems aliased to 
> TCP_NOPUSH, but I might be misunderstanding that code). There are subtle 
> differences between NOPUSH and CORK: resetting CORK triggers a flush, whereas 
> resetting NOPUSH does not.
> 
> It is of course reasonable to optimize one platform and not others, we work 
> with the tools that we have - but a portable approach (using writev()?) would 
> seem attractive to me :-)
> 
> Kind regards,
> 
> Job
>> 

Re: babel RTT metric false samples

2024-04-13 Thread Erin Shepherd
I guess it might not fit with bird's abstractions (or perhaps the Babel 
protocol), but has thought been given to using SO_TIMESTAMPING to have the 
kernel compute TX/RX timestamps? 

- Erin


On Sat, 13 Apr 2024, at 16:14, Maria Matejka via Bird-users wrote:
> Hello Stephanie, Toke and list,
> 
> On Fri, Apr 12, 2024 at 04:22:50PM +0200, Toke Høiland-Jørgensen via 
> Bird-users wrote:
> 
>> Stephanie Wilde-Hobbs via Bird-users bird-users@network.cz writes:
>> 
>>> The babel RTT metric measurements provided by bird appears suspect for my 
>>> setup. The metric through a tunnel with a latency of about 5ms is shown in 
>>> babel as 150+ms.
>>> 
> […]
> 
>>> Debug logs show many RTT samples with approximately correct timestamps 
>>> (4-6ms) then the occasional IHU with 800-1200ms calculated instead. 
>>> Calculating the RTT metric by hand using babel packet logs shows that the 
>>> calculations are correct. By correlating two packet dumps (the machines 
>>> have <1ms NTP timekeeping) I can also see that the packets for which high 
>>> RTT is calculated have similar transit times through the tunnel as other 
>>> packets. Hence, I suspect the accuracy of the packet timestamps recorded by 
>>> bird. Is the current packet timestamping system giving correct timestamps 
>>> if the packet arrives while babel is processing another event?
>>> 
>> Hmm, so Babel implementation in Bird tries to get a timestamp as early as 
>> possible after receiving the packet, and set it as late as possible before 
>> sending out the packet. However, the former in practice means after 
>> returning from poll(), so if the packet has been sitting around in the OS 
>> buffer for a while before Bird gets around to process it, the timestamp is 
>> not set until Bird is done processing it. Likewise, if the packet sits 
>> around in a socket buffer (or in a lower-level buffer on the sending side) 
>> after Bird has sent it out, that time will also be counted as part of the 
>> RTT.
>> 
> I would suspect that the kernel table prune routine may be the case. It just 
> runs from begin to end synchronously.
> 
> I have just fast-tracked Babel in its own thread for BIRD 3, it may be worth 
> checking. (There should be also artifacts from the build process for download 
> available.) This should get you rid of most of the cases of suspiciously high 
> RTT.
> 
> `https://gitlab.nic.cz/labs/bird/-/tree/babel-in-threads`
> Just to be noted, updating a route in BIRD 3 is still a locking process so it 
> may still tamper the RTT measurements. At least it should happen only in 
> cases where Babel is doing the update. Anyway, with BIRD 3 internals, it 
> should be possible to easily *detect* such situations and disregard these 
> single measurements as unreliable. (Not implemented, though.)
> 
> There are even some thoughts on implementing lockless import queues for 
> routing tables, yet now we have to prioritize BIRD 3 stabilization to 
> actually release it as a stable version. Import queues must wait.
> 
> Also with this testing, feel free to report any weird behavior, notably 
> crashes of BIRD 3, as bugs. That would be very helpful with stabilizing BIRD 
> 3. Thanks a lot!
> 
> Maria
> 
> – Maria Matejka (she/her) | BIRD Team Leader | CZ.NIC, z.s.p.o.
> 

Re: Take Specific Value Inside BGP Community

2024-02-19 Thread Erin Shepherd
It's a bit non-obvious, but you can do this with a loop. For example we 
basically replicate the Euro-IX routeserver announcement control communities 
inside our network, and we use the following function to translate them when 
exporting routes to a (supporting) route server

# Translate Euro-IX common communities for use by a route server
function translate_routeserver_communities(
  int dest_asn
)
{
  lclist announce_controls = filter(bgp_large_community, [(OURAS, 0..1, *)]);
  for lc c in announce_controls do {
bgp_large_community.add((dest_asn, c.data1, c.data2));
  }

  lclist prepends = filter(bgp_large_community, [(OURAS, 101..103, *)]);
  for lc c in prepends do {
bgp_large_community.add((dest_asn, c.data1, c.data2));
  }
}

(I think we could combine those into one with a filter(bgp_large_community, 
[(OURAS, 0..1, *), (OURAS, 101..103, *)])? This code originally looked slightly 
different and at the time combining these wasn't posible.

Note that this doesn't strip the input communities - we do that as a separate 
step later where we strip all non-informational communities

- Erin

On Tue, 20 Feb 2024, at 00:30, Ilham Maulana wrote:
> Hi,
> 
> 
> Is it possible in bird to copy specific value inside BGP Community?
> 
> Example:
> 
> Route 1, Community (a,b,*c*) -> inbound
> Route 2, Community (a,b,*d*) -> inbound
> --
> Route 1, Community (k,l,*c*) -> outbound
> Route 2, Community (k,l,*d*) -> outbound
> 
> The specific value I want to preserve is c and d, and it is dynamic variable.
> 
> whatever c and d inbound, copy to c and d outbound.
> 
> Thanks.
> Ilham
> 5f622eb5-c189-4e05-8584-e4a1d2071a6b


Re: Overloading RTR to load IRR (Was: Defines for mixed IPv6/IPv4)

2024-01-25 Thread Erin Shepherd
Spitballing slightly here, but could you avoid this problem by adding 
0.0.0.0/0+ ::0/0+ AS0 RoAs to the table and accepting ROA Unknowns?

Obviously the disadvantage here is that if your IRR RTR server goes down you're 
basically unfiltered, but it at least avoids the availability problem

- Erin 

On Thu, 25 Jan 2024, at 11:41, Job Snijders via Bird-users wrote:
> On Thu, Jan 25, 2024 at 11:13:51AM +0100, Jeroen Massar via Bird-users wrote:
> > a quick stab at generating the slurm file:
> 
> why use SLURM though? SLURM doesn't have a 'maxLength' field like the
> regular JSON input formatted in this style has:
> https://console.rpki-client.org/rpki.json - which might help with
> aggregation.
> 
> More importantly, a risk I perceive with overloading RTR functionality
> to load IRR data into routers is in the realm of fail-safes:
> 
> For RPKI-derived data, most ISPs do something along the lines of:
> 
>if (roa_check(rpki, net, bgp_path.last) = ROA_INVALID) then reject;
> 
> For IRR-derived data, you'd have to do:
> 
>if (roa_check(irr, net, peerasn) != ROA_VALID) then reject;
> 
> The above means that suddenly your EBGP routers/route servers have a
> very hard dependency on the IRR RTR session being up in order to accept
> routes (fail closed), whereas how the RPKI-derived data is used is in a
> 'fail open' fashion.
> 
> The above friction goes back to RPKI ROAs being defined as "if ROAs
> exist, all BGP routes that do not match any of the ROAs are invalid"
> (following the RFC 6811 algorithm), but for IRR route/route6 objects
> such a definition was never established, those predate the RFC 6811
> algorithm.
> 
> Kind regards,
> 
> Job
> 

Re: [Babel-users] [RFC] Replace WireGuard AllowedIPs with IP route attribute

2023-11-18 Thread Erin Shepherd
On Sat, 18 Nov 2023, at 03:19, Daniel Gröber wrote:
> Hi Alexander,
> 
> On Thu, Nov 09, 2023 at 12:57:26PM +0100, Alexander Zubkov wrote:
> > But as I understood the technology, it works only in one way (for
> > outgoing packets) and the decapsulation should be processed separately,
> > for example in case of VXLAN and MPLS they have their own tables.
> 
> That would be a problem as I specifically want to tie the source address
> filtering to this too. I'll have a look at the internals (if and) when I
> get around to starting work on this.

Is tying source address filtering to the routing table the right thing to do 
here? It seems to me that it would cause issues similar to those we see more 
generally with Unicast Reverse Path Filtering

Bird considers the VRF interface to be outside the VRF

2023-07-18 Thread Erin Shepherd
Bird only treats the interfaces enslaved to the VRF as part of the VRF, but not 
the VRF virtual interface itself. This means that e.g. OSPF won't pick up 
loopback addresses defined on the VRF interface itself. You have to 
additionally add a dummy interfaces with the IPs attached, which seems to cause 
some confusion of its own on the kernel side.

Ideally the VRF interfaces would be considered to be in the VRF.

I've attached a patch which fixes this; I don't think the design is quite 
right, and its possible I introduced some bugs, but in testing it seems to work 
fine

- Erin

0001-Treat-the-VRF-interface-as-inside-the-VRF.patch
Description: Binary data


[PATCH] Graceful shutdown request signal

2023-07-17 Thread Erin Shepherd
I ended up picking this up and implementing it, and in my experiments it seems 
to be working fine

Patch attached

- Erin

On Tue, 20 Jun 2023, at 21:07, Maria Matejka via Bird-users wrote:
> Hello!
> 
> Well, it's a pity that systemd doesn't allow for custom operations – in such 
> case you could call "systemd graceful bird2" or "systemd restart bird2"…
> 
> Anyway, feel free to implement it, it should be like 10 lines of code. 
> Sigusr2 is probably ok. Then it'll be on anybody to choose whether to restart 
> gracefully or hardly by systemd.
> 
> Maria
> 
> 
> On 20 June 2023 20:20:50 CEST, Erin Shepherd  
> wrote:
>> Hello,
>> 
>> I run bird on a system which uses systemd as a service supervisor, and would 
>> like to implement graceful restart in a way which works well with it.
>> 
>> In particular, what I'd like to do is:
>>  • If I restart the bird service (e.g. for an upgrade), Bird performs a 
>> graceful restart
>>  • If I manually stop bird (systemctl stop bird2), shutdown the system, or 
>> any other action which is liable to cause a longer period of downtime, 
>> perform a graceful restart (for BGP at least)
>> Ideally systemd would have an ExecStopRestart option which could be used to 
>> distinguish between stops and restarts, but unfortunately it doesn't (I 
>> might submit a feature request for this regardless). However, it does 
>> support `KillSignal and RestartKillSignal options.`
>> 
>> `So therefore I was wondering how the Bird developers and users would feel 
>> about introducing a signal (SIGUSR2 maybe?) which can be used to request 
>> that Bird perform a graceful restart?`
>> 
>> `- Erin`
> -- 
> Maria Matejka (she/her) | BIRD Team Leader | CZ.NIC, z.s.p.o.


0001-Add-support-for-graceful-restart-by-signal-SIGUSR2.patch
Description: Binary data


Graceful shutdown request signal

2023-06-20 Thread Erin Shepherd
Hello,

I run bird on a system which uses systemd as a service supervisor, and would 
like to implement graceful restart in a way which works well with it.

In particular, what I'd like to do is:
 • If I restart the bird service (e.g. for an upgrade), Bird performs a 
graceful restart
 • If I manually stop bird (systemctl stop bird2), shutdown the system, or any 
other action which is liable to cause a longer period of downtime, perform a 
graceful restart (for BGP at least)
Ideally systemd would have an ExecStopRestart option which could be used to 
distinguish between stops and restarts, but unfortunately it doesn't (I might 
submit a feature request for this regardless). However, it does support 
`KillSignal and RestartKillSignal options.`

`So therefore I was wondering how the Bird developers and users would feel 
about introducing a signal (SIGUSR2 maybe?) which can be used to request that 
Bird perform a graceful restart?`

`- Erin`