On Wed, Feb 10, 2016 at 04:38:03PM +0100, Mikhail Sennikovskii wrote: > Hello Fellows, > > as a follow-up for the recent discussion about BIRD support of IPv6 ECMP on > Linux > ( http://trubka.network.cz/pipermail/bird-users/2016-January/010163.html ), > I've created a patch, which addresses this problem.
Hi Thanks for the patch, i have several comments: 1) In the patch, the incoming direction (collection from kernel) is handled in unix/krt.c, while outgoing direction (krt_replace_rte()) is handled in linux/netlink.c . It would make more sense to handle both directions in the same place. As BIRD uses internally representation with one route and mpnh structure, i would prefer to move collection also to linux/netlink.c, so generic code in unix/krt.c would not care about OS-specific interface (and does not need any changes). Linux code would just collect next hops and then call krt_got_route() for the whole multipath route. 2) Function krt_replace_rte() should be smarter than just flushing the whole old multipath rte. It should compare old and new next hops (we could suppose they are sorted) and either add missing or remove surplussing next hops by calling nl_send_route(). 3) When collecting received routes, you should avoid creating list of routes for the same net, and then merging such list by rt_merge_list(), you could simply collect next hops (struct mpnh *) for the current route. 4) Also rt_merge_list() uses mpnh_merge_rta() which uses rte_update_pool. That is probably not safe outside of nest update code. The linux netlink code should have its own linpool (static and shared, because netlink socket is also shared), so you could allocate mpnh structures from it. The linpool should be flushed when the route is collected and propagated to krt_got_route(). So you don't need to do rta_lookup() for temporary next hops. Note that such linpool could be also used for IPv4 multipath next hops in nl_parse_multipath() instead of nh_buffer. > Note that the patch still does not support the learn mode for IPv6 ECMP. > The support for it can be added later. Well, the learn mode is not really a problem. With the modifications mentioned above, learning would work automatically (it is handled by unix/krt.c, regardless of how route is generated). What is problematic is handling asynchronous notifications (because you don't know inside netlink code what are other next hops and whether you will get another one related to the same net). -- Elen sila lumenn' omentielvo Ondrej 'Santiago' Zajicek (email: santi...@crfreenet.org) OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net) "To err is human -- to blame it on a computer is even more so."