On Fri, Feb 24, 2017 at 12:19 PM, Juliusz Chroboczek <j...@irif.fr> wrote:
> Applied, thanks, although you're just fixing the symptom, not the cause.
>
> I'm really not happy with this part of the code, it's one of the most
> fragile bits in babeld. Plus it's racy (if the interface goes down and
> back up before we notice, bad things may happen). I suspect it's
> unavoidable, the kernel interfaces we're using are intrinsically racy.

I have a patch pending here that recognizes errors better, but it still
needs some work and stress testing.

https://github.com/dtaht/rabeld/blob/master/kernel_netlink.c#L470

These were all errnos that I saw in my stress testing over the past few
weeks, and why they happened.

We could get away from having a single netlink socket for all messages
and instead have one per logical group (routes, interfaces, addresses,
rules), doing an interface check when ENETDOWN hits, for example.

I too have now seen the kernel return EAGAIN indefinitely. In my case
wpa_supplicant or the wifi driver had crashed beyond recovery, and I
haven't the foggiest idea what to do about it... (closing and reopening
the socket crashed things harder).

The new noprefixroute stuff helps a little in separating interface
changes from automatic kernel flushes.

> If anyone competent wants to rework this code, I'll be grateful.

I'll try in the coming weeks. Assuming you think I'm competent.

>
> -- Juliusz

-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org
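
A minimal sketch, in C, of the per-group error handling idea mentioned above: each logical group gets its own channel state, and an errno seen on that channel is mapped to a recovery action. Every name here (nl_group, nl_action, nl_channel, classify_error, MAX_EAGAIN_RETRIES) is hypothetical and comes from neither babeld nor the linked patch. Capping EAGAIN retries and then reporting failure is only an assumption about one possible policy, not a known fix for the indefinite-EAGAIN case.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical logical groups; the idea is one netlink socket per group. */
enum nl_group { NL_ROUTES, NL_INTERFACES, NL_ADDRESSES, NL_RULES };

/* Hypothetical recovery actions the caller could take. */
enum nl_action { ACT_OK, ACT_RETRY, ACT_CHECK_INTERFACES, ACT_GIVE_UP };

/* Assumed bound on consecutive EAGAIN results before giving up. */
#define MAX_EAGAIN_RETRIES 5

/* Per-group state; a real version would also hold the socket descriptor. */
struct nl_channel {
    enum nl_group group;
    int eagain_count;   /* consecutive EAGAIN results on this channel */
};

/* Map the errno from the last operation on a channel to an action. */
static enum nl_action classify_error(struct nl_channel *ch, int err)
{
    assert(ch != NULL);

    switch (err) {
    case 0:
        ch->eagain_count = 0;           /* success resets the retry counter */
        return ACT_OK;
    case EAGAIN:
        /* Retry a bounded number of times, then report failure upward
           instead of closing and reopening the socket. */
        if (++ch->eagain_count > MAX_EAGAIN_RETRIES)
            return ACT_GIVE_UP;
        return ACT_RETRY;
    case ENETDOWN:
        return ACT_CHECK_INTERFACES;    /* re-scan interface state */
    default:
        return ACT_GIVE_UP;             /* unclassified errno */
    }
}
```

Keeping the counter per channel means a stuck routes socket would not affect how errors on the interfaces or addresses sockets are judged, which is the main point of splitting the single socket up.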