Re: [Babel-users] babeld slashes kernel route manipulation performance by 17000%
On Thu, Apr 14, 2022 at 08:58:53PM +0200, Toke Høiland-Jørgensen wrote:
> Side note: why is bird replacing all the routes in the first place? :)

FYI: I figured this out in the end. It turns out I had the BGP sessions in
bird configured as `multihop 1`/direct with global-scope addresses that
babel would (sometimes) re-route indirectly. Every time that happened, the
BGP session would break :)

I use link-locals now, much better. It doesn't really make sense to
re-route the BGP session endpoints haha.

--Daniel
___
Babel-users mailing list
Babel-users@alioth-lists.debian.net
https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/babel-users
Re: [Babel-users] babeld slashes kernel route manipulation performance by 17000%
Hi Toke,

so I was debugging why I saw no performance improvement from your patch;
see below.

On Fri, Apr 15, 2022 at 12:48:05AM +0200, Toke Høiland-Jørgensen wrote:
> diff --git a/kernel_netlink.c b/kernel_netlink.c
> index efe1243c3b07..36aae29124a5 100644
> --- a/kernel_netlink.c
> +++ b/kernel_netlink.c
> @@ -236,7 +236,7 @@ static int nl_setup = 0;
>  static int
>  netlink_socket(struct netlink *nl, uint32_t groups)
>  {
> -    int rc;
> +    int rc, strict;
>      int rcvsize = 512 * 1024;
>
>      nl->sock = socket(PF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
> @@ -271,6 +271,9 @@ netlink_socket(struct netlink *nl, uint32_t groups)
>              perror("setsockopt(SO_RCVBUF)");
>          }
>      }
> +    rc = setsockopt(nl->sock, SOL_NETLINK, NETLINK_GET_STRICT_CHK, &strict, sizeof(strict));

You're using `strict` uninitialized here. If I set strict = 1 it works
though :D

--Daniel
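The fix Daniel describes can be sketched as a small helper. This is only an illustration, not the code that went into babeld: the function name `set_strict_check` is hypothetical, and the two fallback defines are there only in case the installed headers predate the option. The point is that the value passed to setsockopt() is initialised before the call.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Fallbacks for older headers; the values match the Linux UAPI. */
#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif
#ifndef NETLINK_GET_STRICT_CHK
#define NETLINK_GET_STRICT_CHK 12
#endif

/* Hypothetical helper (not part of babeld): turn strict checking of
   dump requests on (enable = 1) or off (enable = 0) for a netlink
   socket.  Returns the setsockopt() result, i.e. 0 or -1. */
static int
set_strict_check(int sock, int enable)
{
    int strict = enable;        /* initialised, unlike in the quoted patch */
    int rc;

    assert(enable == 0 || enable == 1);
    rc = setsockopt(sock, SOL_NETLINK, NETLINK_GET_STRICT_CHK,
                    &strict, sizeof(strict));
    if(rc < 0)
        perror("setsockopt(NETLINK_GET_STRICT_CHK)");
    return rc;
}
```

A caller would presumably treat a failure as non-fatal, since kernels without the option will reject it and simply keep returning unfiltered dumps.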
Re: [Babel-users] babeld slashes kernel route manipulation performance by 17000%
Hi Toke,

On Fri, Apr 15, 2022 at 12:48:05AM +0200, Toke Høiland-Jørgensen wrote:
> Poked a bit more into the kernel fib code; the more I'm looking at it,
> the more I'm convinced that the contention between add and dump is a
> fundamental feature of the way the routing table is implemented, so I'm
> not so sure it's simply a "bug" that can be "fixed" :(

Hmm, so do you think we should still send a report to netdev then?

> > That is a good question, bird should really be able to see that the route
> > is already installed and just don't bother. I see this del/add behaviour
> > even when bgp is otherwise nice and converged though so I assumed bird is
> > just like this.
>
> Hmm, that's odd. What's your "background radiation" (i.e. route updates
> per second when Bird is running normally and babeld isn't started)? I
> just checked my own router (which also imports a full v6 table), and
> that churns less than one route per second. So if you're seeing a lot of
> churn, maybe it's something in your config that could be fixed?

No, no, it's also just a couple of routes per second here.

> Alternatively, an option could be to improve Bird's performance when
> replacing routes; for one thing, there's this comment in bird's
> netlink.c:

Right, if they don't quite trust the kernel, that could explain the
add/del behaviour.

> I've been meaning to look into adding nexthop support to Bird anyway, so
> this could be a nice occasion to bump that up my list. Don't take that
> as a promise, though... :P

I'm not sure how nexthop objects would help with this problem
specifically. If bird doesn't trust the kernel, then even if it can update
the nexthop directly it can't necessarily trust that the other route
attributes are right either, and so it would still have to replace the FIB
entry.

> > As I said before it always triggers when I (re)start babeld but I can't see
> > anything obvious in the log even with debug on as to why. Particularly I
> > don't see any bgp state events so the sessions should be fine but for some
> > reason it decides to churn everything anyway.
>
> Well, the trigger when starting babeld would be the initial route dump,
> I suppose: If you have lots of route churn happening in the background,
> the drop in insert performance caused by the dump would be your trigger,
> no?

To clarify: when I start babeld, bird is not yet churning, just doing
background-level updates, but the act of starting babeld seems to somehow
make bird start churning routes soon after. I don't think the route dump
alone should/could make bird do anything to its routes, otherwise an
iproute2 dump would likely also do it.

I can tell bird is starting to churn because its CPU usage goes up to 100%
(mostly in the kernel). It's pretty mystifying how this could be
connected, to be sure; I'll have to do more testing.

> One thing I noticed when playing around with your reproducer example,
> which may be something we could apply to the babeld case: If I run 'ip
> -6 route show table 1337' I get the slowdown, but if I just run a
> regular 'ip -6 route show', I do not. This seems to be because iproute2
> is adding the table to the route dump request, which will make the
> kernel dump only the requested table. And since the lock that's being
> contended is per table, that should nicely get rid of the contention. A
> patch to do this is included below (only compile-tested, so no idea if
> it'll actually work :)).

Ah, this is excellent, thanks! I was wondering whether the kernel keeps the
tables in separate data structures or not.

Your patch seems to be against a different babeld branch than what I have
(I can't see any CHANGE_RULE stuff here), but with that bit removed it
applies fine.

I just tested it and it does indeed seem to work. However, I think we also
need to make bird use table-specific dumps, since I'm still seeing the
slowdown and bird doesn't seem to set rtm_table in nl_request_dump_route
either. I'll get on that.

--Daniel
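A table-specific dump request of the kind discussed above might be built roughly as follows. This is a sketch under assumptions, not code from babeld, bird or iproute2: the struct and function names are hypothetical, and it assumes the kernel only honours the table filter once strict checking (NETLINK_GET_STRICT_CHK) has been enabled on the socket. Table ids above 255 do not fit in the 8-bit rtm_table field, so the id is always carried in an RTA_TABLE attribute as well.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request layout: netlink header, rtmsg, then a single
   RTA_TABLE attribute.  struct rtmsg is 12 bytes, so the attribute
   header lands on a 4-byte boundary without padding. */
struct table_dump_req {
    struct nlmsghdr nh;
    struct rtmsg rtm;
    struct rtattr rta;      /* RTA_TABLE attribute header */
    uint32_t table;         /* RTA_TABLE payload */
};

/* Fill in an RTM_GETROUTE dump request restricted to one table. */
static void
build_table_dump_req(struct table_dump_req *req, int family, uint32_t table)
{
    assert(req != NULL);
    memset(req, 0, sizeof(*req));

    req->nh.nlmsg_len = sizeof(*req);
    req->nh.nlmsg_type = RTM_GETROUTE;
    req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;

    req->rtm.rtm_family = family;
    /* The legacy field is only 8 bits wide; larger ids go in RTA_TABLE. */
    req->rtm.rtm_table = table < 256 ? table : RT_TABLE_UNSPEC;

    req->rta.rta_type = RTA_TABLE;
    req->rta.rta_len = RTA_LENGTH(sizeof(req->table));
    req->table = table;
}
```

The filled-in struct would then be passed to send() on a NETLINK_ROUTE socket, e.g. `build_table_dump_req(&req, AF_INET6, 1337)` for the table used in the reproducer.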
Re: [Babel-users] babeld slashes kernel route manipulation performance by 17000%
On Thu, Apr 14, 2022 at 08:58:53PM +0200, Toke Høiland-Jørgensen wrote:
> Yeah, I do. He's also one of the maintainers of the routing code, so
> definitely the right person to Cc on this (explicitly Cc'ing maintainers
> makes sure they see your email as not everyone follows netdev
> rigorously).

Kk :)

> Ah, okay, that's interesting. Playing around with your examples, on my
> laptop the performance goes from ~90k/s to ~1k/s when doing just a
> single 'ip -6 route show table 1337'. The dump itself takes between 5-10
> seconds, so with the 30-sec interval in babeld I guess the periodic dump
> can coincide with the update at random.
>
> Side note: why is bird replacing all the routes in the first place? :)

That is a good question, bird should really be able to see that the route
is already installed and just don't bother. I see this del/add behaviour
even when bgp is otherwise nice and converged though, so I assumed bird is
just like this.

As I said before, it always triggers when I (re)start babeld, but I can't
see anything obvious in the log, even with debug on, as to why.
Particularly I don't see any bgp state events, so the sessions should be
fine, but for some reason it decides to churn everything anyway.

> > I'm currently working on babel ECMP support in bird though maybe I'll
> > have a stab at RTT after that.
>
> On the subject of ECMP and Babel, you may want to read this thread:
> https://mailarchive.ietf.org/arch/msg/babel/i4tqsRIL3DS9e22GJ0QuoMef-P0/
>
> I.e., it's not just a matter of writing the code we'll also need to
> define the semantics in the spec. Just so you know what you're getting
> yourself into ;)

Interesting thread, thanks. I think for my use-case the loop avoidance
point is moot though, since I'm mainly interested in using this on
endpoints, not routers. So perhaps calling this ECMP is not the right
nomenclature?

--Daniel
Re: [Babel-users] babeld slashes kernel route manipulation performance by 17000%
Hi Toke,

On Thu, Apr 14, 2022 at 12:12:36AM +0200, Toke Høiland-Jørgensen wrote:
> How about submitting this report to netdev and asking for advice there?
> From a quick glance at the kernel fib code, this does not look like it's
> an easy fix (if it can be fixed at all), but we should really get
> someone who is an expert in the kernel routing code (which I'm not,
> sadly) to weigh in. You could add an explicit Cc to David Ahern
> when submitting, and please keep me in Cc as
> well. Or if you'd prefer, I can submit the report on your behalf?

I'll try to get around to that but no promises :) Do you know David? I
don't like just CCing people I don't know at random.

> As for why you're seeing this in particular when Babel is running, now
> that we know the route dump is the culprit, it's quite obvious: While
> Babel listens for new route notifications from the kernel, it doesn't
> actually use those notifications directly; instead, it just sets a flag
> (see kernel_route_notify() in babeld.c), and does a full dump whenever
> it gets a notification. Which obviously interacts really badly with lots
> of routes being inserted at the same time, as that will basically send
> Babel into a loop of doing nothing but route dumps.

I saw that too, and I was poking at the babeld code for a while before
settling on the iproute2 reproducer. I also compared it quite closely with
bird, and I can't say I really see a difference in what they do other than
netlink buffer sizing.

Both will periodically dump the whole table, so if I had two instances of
bird running concurrently I could experience the same problem. It seems to
be the recvmsg call that's blocking forever in the kernel while the table
churn is going on, so it's not even related to babeld doing a quadratic
number of dumps or anything.

What is also interesting is that babeld already seems to correctly filter
the notifications by table id, so all my route churn never actually sets
the kernel_routes_changed flag (see the import_tables check at the bottom
of parse_kernel_route_rta).

> Bird does things a bit differently: it will directly update its internal
> routing table from the netlink notification messages, and only does a
> full dump at intervals (by default once every minute, but it can be
> configured to run entirely without dumps).

Right, but the important part is that it does very much still do the
dumps :)

Also, I wonder how netlink buffer overruns are dealt with when there isn't
a periodic dump? Wouldn't it still have to do a full dump to resync if
that happens?

> AFAICT the babeld code will require quite a bit of surgery to change
> this behaviour; to the point where I think it may be simpler to
> implement the RTT extension in Bird (but I'm obviously biased here)... :)

In order to scale the number of native babel routes further you're
probably right, but that's not necessary for my use-case anyway. If this
kernel bug goes away, babeld would still work fine IMO.

I'm currently working on babel ECMP support in bird, though maybe I'll
have a stab at RTT after that.

--Daniel
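The table-id filtering of notifications that Daniel mentions can be illustrated with a short sketch. It is not babeld's parse_kernel_route_rta; both function names are hypothetical. It reads the table id from a route notification (the RTA_TABLE attribute if present, otherwise the 8-bit rtm_table field) and reports whether the message concerns a watched table, so that churn in other tables need not mark any state as changed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Return the table id of a route message: RTA_TABLE if the attribute
   is present, else the legacy rtm_table field. */
static uint32_t
route_msg_table(struct nlmsghdr *nh)
{
    struct rtmsg *rtm;
    struct rtattr *rta;
    uint32_t table;
    int len;

    assert(nh->nlmsg_len >= NLMSG_LENGTH(sizeof(struct rtmsg)));
    rtm = NLMSG_DATA(nh);
    rta = RTM_RTA(rtm);
    len = RTM_PAYLOAD(nh);
    table = rtm->rtm_table;

    /* Walk the attributes that follow the rtmsg header. */
    for(; RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
        if(rta->rta_type == RTA_TABLE && RTA_PAYLOAD(rta) >= sizeof(table))
            memcpy(&table, RTA_DATA(rta), sizeof(table));
    }
    return table;
}

/* Hypothetical filter: 1 if the notification is a route add/delete in
   the watched table, 0 otherwise. */
static int
notification_is_relevant(struct nlmsghdr *nh, uint32_t watched_table)
{
    if(nh->nlmsg_type != RTM_NEWROUTE && nh->nlmsg_type != RTM_DELROUTE)
        return 0;
    return route_msg_table(nh) == watched_table;
}
```

Note that a filter like this runs in userspace after the message has already been delivered, so it does not by itself reduce the work the kernel does for a full dump.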
Re: [Babel-users] babeld slashes kernel route manipulation performance by 17000%
> As for why you're seeing this in particular when Babel is running, now
> that we know the route dump is the culprit, it's quite obvious: While
> Babel listens for new route notifications from the kernel, it doesn't
> actually use those notifications directly; instead, it just sets a flag
> (see kernel_route_notify() in babeld.c), and does a full dump whenever
> it gets a notification.

You're right, as usual.

> Bird does things a bit differently: it will directly update its internal
> routing table from the netlink notification messages, and only does a
> full dump at intervals (by default once every minute, but it can be
> configured to run entirely without dumps).

Yeah, that's the right way. Could you please point me at the place in BIRD
where you parse a netlink notification?

-- Juliusz
Re: [Babel-users] babeld slashes kernel route manipulation performance by 17000%
Hi Toke,

So this is definitely a kernel bug. I've managed to reproduce it with only
iproute2 commands. The problem seems to be dumping the whole FIB while
lots of individual route modifications are taking place.

First we have to generate some ip-route(8) -batch commands to use. You can
use a bgp route dump I've uploaded or create some synthetic prefixes if
you like:

    $ get_prefixes () { curl https://dxld.net/bgp.prefixes; }
    $ get_prefixes | awk '{ print "route add table 1337 unreachable " $1 }' > add-routes
    $ get_prefixes | awk '{ print "route del table 1337 unreachable " $1; print "route add table 1337 unreachable " $1 }' > change-routes

To reproduce this I first insert a bunch of routes from that route dump
using ip -batch:

    # ip -batch ./add-routes

Then, to simulate what bird is doing, I use a version of this dump where
every route is removed and re-added in a loop:

    # while sleep 0.1; do ip -batch ./change-routes; done

While this is going on, monitor route insertion performance using

    # while sleep 0.1; do { timeout 1 ip -6 monitor; } | wc -l; done

On my system this shows ~10k routes/s.

If we now dump the table while change-routes is running, the performance
drops to ~500 routes/s on my system:

    # while sleep 0.1; do ip -6 route show table 1337 >/dev/null; done

FYI: peeking at `perf top` shows fib6_walk_continue and
mutex_spin_on_owner as the main offenders, and almost all of the CPU time
during this test is spent in the kernel.

--Daniel

PS: To clean up use `ip -6 route flush table 1337`.

On Fri, Apr 08, 2022 at 02:38:43PM +0200, d...@darkboxed.org wrote:
> On Fri, Apr 08, 2022 at 01:57:01PM +0200, Toke Høiland-Jørgensen wrote:
> > Daniel Gröber writes:
> > > I'll probably try that tomorrow then.
> >
> > Alright, let me know how it goes; I can go poking at the kernel, but
> > having a reproducer makes that a lot easier :)
>
> So I tried ip -batch but it seems it's, um, batching the sendmsg calls too
> much :)
>
> Bird does a separate sendto call for each route but iproute2 batches them
> into only 1k ish calls for 100k routes so I can't reproduce the problem
> with that unfortunately.
>
> I did do some stracing against babeld with `strace -e raw=all | ts -i '%.s'`
> just to see what the timing of recvmsg calls is and how they vary. It seems
> to me the problem only happens when babeld is exclusively calling recvmsg
> (I assume during kernel_dump()), when it's in a steady state and starts
> calling select() between the recvmsg() calls performance is fine.
>
> From skimming the code it seems babeld occasionally schedules a full dump
> though so that might be why the reproducibility is so sporadic.
>
> During babeld startup seems to be the best chance for repro. For some
> reason bird pretty reliably also starts churning pretty soon after I
> restart babeld not sure why but it makes testing easier so I'll debug that
> later :)
>
> I also tried tweaking the iov_len size for recvmsg() in babeld to match
> that of bird which is quite large without much change. Lowering the size
> just gave me message truncated errors not sure what's up with that.
>
> If you want to play along, `while sleep 0.1; do { timeout 1 ip -6 monitor
> route; } | wc -l; done` is what I'm using to monitor the route insertion
> performance now. The {} is load bearing (for some reason) and it does error
> with "No buffer space available" when lots of churn is going on but it
> works anyway.
>
> --Daniel
Re: [Babel-users] babeld slashes kernel route manipulation performance by 17000%
On Thu, Apr 07, 2022 at 11:02:15PM +0200, Daniel Gröber wrote:
> Hi Toke,
>
> On Thu, Apr 07, 2022 at 12:19:46AM +0200, Toke Høiland-Jørgensen wrote:
> > I doubt that can have fixed this, though? But if it's gone, well, good
> > news? :P
>
> I just managed to trigger it again so probably no :) This is on
> 5.10.0-13-amd64/5.10.106-1 with babeld 1.9.1 and bird version 2.0.9.

Just got another trigger, this time with babeld 1.11 + bird 2.0.9.

Interesting tidbit: if I SIGSTOP babeld, insertion performance goes right
back up. When I SIGCONT it, it goes back to snailtown.

Note it's probably not babeld starving bird of CPU, as this is on a
quad-core AMD GX-412TC and babeld/bird are using at most one CPU each.

--Daniel
Re: [Babel-users] babeld slashes kernel route manipulation performance by 17000%
Hi Toke,

On Thu, Apr 07, 2022 at 12:19:46AM +0200, Toke Høiland-Jørgensen wrote:
> I doubt that can have fixed this, though? But if it's gone, well, good
> news? :P

I just managed to trigger it again so probably no :) This is on
5.10.0-13-amd64/5.10.106-1 with babeld 1.9.1 and bird version 2.0.9.

While it's happening, bird prints something like the following every once
in a while:

    I/O loop cycle took 6793 ms for 6 events

I get 500-2000ish events from `ip -6 monitor` over a 10sec interval, as
opposed to the 80-150k I usually see.

> Hmm, I've definitely had issues with dnsmasq not handling lots of route
> updates well. I got rid of it now, but when I was running a full BGP
> table on an oldish openwrt, I basically had to kill dnsmasq every time
> the BGP session went up or down, otherwise it would take several minutes
> to recover :/

I did kill dnsmasq after the slowdown had already started, but that didn't
help. On the other hand, if I kill babeld the insertion speed seems to go
back up again. So that would seem to suggest dnsmasq isn't the problem.

Interestingly, I can't seem to trigger it by just (non-gracefully)
restarting bird, so there must be a trigger other than just lots of route
insertion activity going on.

For completeness' sake: I added a babel protocol to my bird config before
this triggered again, but it's not involved in the main bgp routing table,
only a tiny isolated one, nor is it handling any of the interfaces babeld
is on. For now I'm assuming this is not causing the problem.

--Daniel
Re: [Babel-users] babeld slashes kernel route manipulation performance by 17000%
Hi Toke,

Looks like I never responded to this :O.

On Thu, Feb 24, 2022 at 11:36:06PM +0100, Toke Høiland-Jørgensen wrote:
> Yeah, I find this a bit surprising as well. What kernel version are you
> seeing this on, and what does the CPU usage show while it's ongoing
> (just starting 'top' and sorting by CPU usage should show you which
> process(es) are using the most CPU time).

The CPU usage is pretty much what you'd expect: babel and bird are the top
offenders, but dnsmasq is also spinning quite heavily. During the original
tests I was monitoring CPU usage with htop to make sure I wasn't measuring
idle route insertion activity, FYI.

I just tried to reproduce the problem to make sure dnsmasq isn't
interfering as well, but I can't seem to reproduce it now. Perhaps this
was actually a kernel bug that has since been fixed by a kernel upgrade.

This is on a Debian 11 (bullseye) system. According to my dpkg.log I
likely had 5.10.0-11-amd64/5.10.92-1 at the time of the original tests,
whereas I have 5.10.0-12-amd64/5.10.103-1 now. I tried with the old
version too, but I can't seem to get the problem to trigger anymore. Good
I guess, but unsatisfying :/

> > I am aware of the babel support in bird, but in my setup the whole
> > point of using babel is for the RTT metric support which bird doesn't
> > seem to support yet.
>
> Ah, right, yeah, it doesn't. But good to know there's demand for this,
> that's a motivation for implementing it :)

Definitely :)

--Daniel
Re: [Babel-users] babeld slashes kernel route manipulation performance by 17000%
Juliusz Chroboczek writes:

>> Probably because babeld subscribes to netlink notifications for all new
>> routes, and only filters them on the table name fairly late,
>> specifically here:
>>
>> https://github.com/jech/babeld/blob/master/kernel_netlink.c#L1175
>
> Do you see how it can be done better?

Hmm, no, not really :(

I looked at the Bird code, and it seems to be doing both the subscribing
and the parsing in much the same way babeld is. So it's actually a bit
puzzling why it's hurting performance that much. The only obvious
difference that I can see from my admittedly cursory glance is that babeld
makes heavy use of indirect calls; but we're not talking millions of
operations per second here, so it really shouldn't be taking such a heavy
toll...

-Toke
Re: [Babel-users] babeld slashes kernel route manipulation performance by 17000%
Ages ago I attempted a string of optimizations for babeld which sped it up
by a lot, but introduced new bugs along the way, and juliusz preferred the
relative cleanliness of the babeld code compared to:

    using ebpf to filter routes in the kernel (big speedup, but it was buggy)
    inlining qsort (2x speedup), and leveraging sse/neon for comparisons
    using uthashes to manage scheduling updates (or one day a timerwheel)
    switching to a struct for the common params (someone else's branch)
    threads for managing I/O and bailing out

Some of that work made it back into the mainline. Some things (like
unicast updates) sort of fell out of that. In actuality I was also
experimenting with a custom processor and trying to find optimizations
that could make it into hw of some form or another, and it was a mess. I
am not proud of those few weeks of hacking/flailing.

I had a goal of 64k routes, and ultimately wanted to be working on
optimizing updates (the protocol supports longer lasting announcements) in
response to not being able to meet compute deadlines, and fell short (with
still buggy code) at about 30k routes on the very limited hw I was using.

Anyway, that tree

https://github.com/dtaht/rabeld

had the semi-broken ebpf code for filtering kernel updates better. I've
also long held out hope that some new kernel support for switching routes
faster could be leveraged, and I've kind of longed for someone else to
stress out the bird version, not just of babel, but of multiple routing
protocols, using tools like rtod, here:

https://github.com/dtaht/rtod

I like bird's codebase a lot.
--
I tried to build a better future, a few times:
https://wayforward.archive.org/?site=https%3A%2F%2Fwww.icei.org

Dave Täht CEO, TekLibre, LLC
Re: [Babel-users] babeld slashes kernel route manipulation performance by 17000%
Hi Toke and Juliusz,

On Wed, Feb 23, 2022 at 09:43:29PM +0100, Toke Høiland-Jørgensen wrote:
> Probably because babeld subscribes to netlink notifications for all new
> routes, and only filters them on the table name fairly late,
> specifically here:
>
> https://github.com/jech/babeld/blob/master/kernel_netlink.c#L1175

Thanks for the pointer. I figured it would be something like that, but I'm
still surprised babeld should be able to (seemingly) block the kernel from
processing other netlink messages. I haven't had the time to review the
code properly yet, though.

I would have expected the kernel to just drop events when babeld falls
behind with processing.

> So babeld will process and parse all route entries even if it won't
> export them.

Right, so I wonder if there is a way to let the kernel do the filtering
before passing events to babeld. Perhaps just making babeld faster at
processing route updates would be a better solution though. Maybe I'll try
my hand at some profiling when I get a chance.

> implementation in Bird as well; that has no issues with running
> concurrently with a full BGP table. It is even possible to run babel and
> BGP in the same Bird instance, but I split mine out to two instances
> (one for BGP, one for Babel) because I had issues with the
> single-threaded nature of Bird causing Babel to miss hello updates while
> processing a large BGP update.

I am aware of the babel support in bird, but in my setup the whole point
of using babel is the RTT metric support, which bird doesn't seem to have
yet.

I had a look at FRR too, since it supposedly does support RTT, but
according to the babel homepage using it is discouraged. I was wondering
if that is still correct, actually?

On Thu, Feb 24, 2022 at 12:55:01AM +0100, Juliusz Chroboczek wrote:
> > I run Bird in a similar setup as yours, BTW, but using the Babel
> > implementation in Bird
>
> Just to clarify: there are two major implementations of Babel:
>
> - babeld, which is a research project, and was written over the years by
>   myself and a number of students, most of whom only stayed during an
>   internship before moving on;
>
> While I find babeld more convenient than BIRD, since it requires little
> configuration in many common cases, I recommend that people use BIRD in
> preference to babeld in production deployments.

Yeah, like I said above, I am using babeld because of the RTT metric
support; otherwise I would have preferred bird :)

--Daniel
Re: [Babel-users] babeld slashes kernel route manipulation performance by 17000%
> Probably because babeld subscribes to netlink notifications for all new
> routes, and only filters them on the table name fairly late,
> specifically here:
>
> https://github.com/jech/babeld/blob/master/kernel_netlink.c#L1175

Do you see how it can be done better?

> I run Bird in a similar setup as yours, BTW, but using the Babel
> implementation in Bird

Just to clarify: there are two major implementations of Babel:

  - babeld, which is a research project, and was written over the years by
    myself and a number of students, most of whom only stayed during an
    internship before moving on;

  - the one that's integrated in BIRD, which was written by Toke, one of
    the most competent programmers I have had the pleasure to meet.

While I find babeld more convenient than BIRD, since it requires little
configuration in many common cases, I recommend that people use BIRD in
preference to babeld in production deployments.

(Both BIRD and babeld aim to implement Babel and its extensions as
standardised and documented at the IETF, so any failure of the two
implementations to interoperate is considered as a bug. In other words,
you should be able to mix and match BIRD and babeld in a single network.)

-- Juliusz
[Babel-users] babeld slashes kernel route manipulation performance by 17000%
Hi,

I'm seeing a rather odd issue in my babeld deployment. I'm using babeld on
one of my Linux routers, which also has bird running with a v6 BGP full
table session that's being inserted into the kernel FIB.

Whenever bird tries to clear/reinsert all routes in the kernel table I'm
seeing a 17000% reduction[1] in route insertion performance if babeld is
running simultaneously :)

Since I anticipated babeld not being quite ready to even see/filter a
kernel table with 100k+ routes, I've set things up such that bird inserts
its routes into a separate table, so babeld has a chance to just
completely ignore them and only see a nice and short main routing table.

Any ideas/pointers on how babeld could be slowing this down so much?

--Daniel

[1]: Measured by counting how many lines ip monitor spits out over a
10 second period while bird is having at it, with or without babeld
running, thusly: `timeout 10 ip -6 monitor | wc -l`.

With babeld:

    root@Debby:~# { timeout 10 ip -6 monitor; } | wc -l
    296

Without babeld:

    root@Debby:~# { timeout 10 ip -6 monitor; } | wc -l
    104809
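For readers who want to turn the two line counts from the footnote into comparable figures, here is a small arithmetic sketch. The function names are hypothetical, and it only works from the two counts quoted above (296 and 104809 lines per 10 seconds); it makes no claim about how the percentage in the subject line was derived.

```c
#include <assert.h>

/* How many times slower insertion is while babeld is running. */
static double
slowdown_factor(long without_babeld, long with_babeld)
{
    assert(with_babeld > 0);
    return (double)without_babeld / (double)with_babeld;
}

/* Share of throughput lost, in percent (always below 100). */
static double
percent_reduction(long without_babeld, long with_babeld)
{
    assert(without_babeld > 0);
    return 100.0 * (1.0 - (double)with_babeld / (double)without_babeld);
}
```

With the counts above, `slowdown_factor(104809, 296)` comes to roughly 354, and `percent_reduction(104809, 296)` to roughly 99.7.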