So, here is puzzle someone might have a hint.Both of these are tested with the SAME FreeBSD package. Both are installed on
basically the same DUT FreeBSD (cloned from the same master).Both are based on the git commit 04a3aabf58d95 (which is the candidate for
the bad commit identified earlier) One of them consistent fails, the other passes the test.
Now look at the attached PDF (sorry, of PDF, but was easier to highlight diff).
On the left side is the GOOD run, on the right side the BAD run.You’ll notice that the “rib_link: 30.0.5.0/24 vrf 0: adding dest to table” is already missing at the first instance when the route should get installed and many more
FIB related calls are missing. Anyone have any guess what’s going on here? Relevant function is in zebra_rib.c: static void rib_link (struct route_node *rn, struct rib *rib) { struct rib *head; rib_dest_t *dest; assert (rib && rn); if (IS_ZEBRA_DEBUG_RIB) rnode_debug (rn, "rn %p, rib %p", (void *)rn, (void *)rib); dest = rib_dest_from_rnode (rn); if (!dest) { if (IS_ZEBRA_DEBUG_RIB) rnode_debug (rn, "adding dest to table"); dest = XCALLOC (MTYPE_RIB_DEST, sizeof (rib_dest_t)); route_lock_node (rn); /* rn route table reference */ rn->info = dest; dest->rnode = rn; } […]The first rnode_debug seems to work in both cases, but then in the bad case,
the dest has a value and the “if (!dest)” is skipped. Potentially some missed clearing of data structures?(This is not immediately after startup - but in both runs I run the same sequence)
- Martin On 23 Feb 2016, at 14:44, Martin Winter wrote:
Donald,give me a few more hours to narrow the issue down. Still have a few more ideasto correctly replicate this.At this time, I worry you would not be able to reproduce the issue and justwaste your time. (Getting the config.log from the port build isn’t that trivial) For hints on how I set up the FreeBSD system, see https://git-us.netdef.org/projects/OSR/repos/ci-files/browse/doc/Quagga_FreeBSD10.md - Martin On 23 Feb 2016, at 13:21, Donald Sharp wrote:Martin - Can I see the config.h and config.log generated for your test? thanks! donald On Tue, Feb 23, 2016 at 4:15 PM, Martin Winter < mwin...@opensourcerouting.org> wrote:On 23 Feb 2016, at 5:23, Donald Sharp wrote: Martin -What's your configure line for freebsd? Can you point me at yourconfig.log for this build? My freebsd vm is unhappy with me right now.That’s where I have issues. My CI integrated system spots the error (andevery time), but if I try to run it outside, then it seems to work. Current config inside the CI system is as follows: --enable-user=quagga --enable-group=quagga--sysconfdir=/usr/local/etc/quagga-test --localstatedir=/var/run/quagga--enable-vtysh --with-pkg-extra-version=-ci.NetDEF.org-20160223.101813-git.04a3aab --enable-fpm --enable-tcp-zebra --disable-irdb --enable-isisd --enable-isis-topology --enable-bgp-announce --enable-opaque-lsa --without-libpam --enable-pimd --enable-rtadv --disable-snmp --enable-tcp-zebra --prefix=/usr/local --mandir=/usr/local/man --infodir=/usr/local/info/ --build=amd64-portbld-freebsd10.1Still trying to reproduce it outside of the automated system (so I can domore troubleshooting)(The automated CI system uses my CI plans to build packages, then installsthemand uses them for the tests. When I do it outside of CI system I’m justdoingthe usual make/make install and usually with less optimization to get debuginfo) This bug affects at least 3, probably more BGP errors Anyway, beside of someone looking at the specific patch and see if somethingmight go wrong on FreeBSD, I just need a bit more time to reproduce it.Hope to know more in another 8..12 hrs. - Martin On Mon, Feb 22, 2016 at 10:26 PM, Martin Winter <mwin...@opensourcerouting.org> wrote: On 22 Feb 2016, at 6:40, Donald Sharp wrote:Martin -Any word on this? I'd like to push this patch if it fixes your issues.Not yet.I’m dealing with (most likely) several issues which seem overlap. So ittakes a bit more time.I still tend to think that Timo’s fix is good - but only addressing partof the problems.I’ll prefer to keep it on hold at this time until at least one moreFreeBSD bug is fixed. One of the git bisect on BGP errors pointed to commit commit 04a3aabf58d95d01c4c8168eeff43cf9d9892eee Author: Nicolas Dichtel <nicolas.dich...@6wind.com> Date: Thu Sep 3 10:47:43 2015 +0200 vrf: add a runtime check before playing with netns This patch adds a runtime check to determine if netns are available. Some systems like OpenWRT have the system call setns() but don't have the kernel option CONFIG_NET_NS enabled. Reported-by: Christian Franke <ch...@opensourcerouting.org> Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com> Tested-by: Christian Franke <ch...@opensourcerouting.org>Doing a git bisect points to this commit breaking at least 3 BGP tests. I’m still trying to manually verify and understand how this breaks it, sogive me more time.(Feel free to guess on how this commit could break FreeBSD… seems to berelated to similar not correctly installing forwarding entries as well) - Martin On Thu, Feb 18, 2016 at 10:30 PM, Martin Winter <mwin...@opensourcerouting.org> wrote: On 18 Feb 2016, at 18:35, Donald Sharp wrote:Martin -rt_socket.c is not compiled on linux so no need for verification.Once youhave a run please let me know and I'll push your updated patch.It’s running now. Will know soon. It definitely fixes some of theerrors and I hope it actually fixes all of the FreeBSD errors. Will know in approx one more day.(BTW: Testing on a git commit which does no longer exist because ofrebase,but I needed the same to compare. Lucky that I still had the VMs withthe old git checkout running…) - Martin donaldOn Thu, Feb 18, 2016 at 9:19 PM, Martin Winter < mwin...@opensourcerouting.org> wrote: Timo,On 17 Feb 2016, at 23:01, Timo Teras wrote: On Wed, 17 Feb 2016 20:11:07 -0800 "Martin Winter" <mwin...@opensourcerouting.org> wrote:This is based on Quagga proposed 6 branch of Feb 10 (git f48d5b9957- which does no longer exist, [no] thanks to rebase on a publicgit) on FreeBSD 10.2.Zebra seems to fail delete any routes in FreeBSD RIB. It fails withupdates (better routes to different interface) and it fails to remove them when quagga is killed.Thanks for the testing. Sounds like this is breakage caused by myatomic FIB patch which was pretty untested on *BSD.Looks like the code talking to kernel does not handle RTM_CHANGE correctly. As first test, perhaps just removing RTM_CHANGE usagemight fix the issues and revert to how it used to work.Using RTM_CHANGE on kernels where it works is better, but it's leftanexercise for developer who has access and will to fix it on *BSD.Thanks for the patch. Seems like you never tested it (failed tocompile),but was close enough to guess what you probably meant. See inline onpatchdiff --git a/zebra/rt_socket.c b/zebra/rt_socket.cindex 4d0a7db..9012280 100644--- a/zebra/rt_socket.c +++ b/zebra/rt_socket.c @@ -68,7 +68,7 @@ sin_masklen (struct in_addr mask) /* Interface between zebra message and rtm message. */ static int-kernel_rtm_ipv4 (int cmd, struct prefix *p, struct rib *rib, intfamily)+kernel_rtm_ipv4 (int cmd, struct prefix *p, struct rib *rib){ struct sockaddr_in *mask = NULL; @@ -245,7 +245,7 @@ sin6_masklen (struct in6_addr mask) /* Interface between zebra message and rtm message. */ static int-kernel_rtm_ipv6 (int cmd, struct prefix *p, struct rib *rib, intfamily)+kernel_rtm_ipv6 (int cmd, struct prefix *p, struct rib *rib){struct sockaddr_in6 *mask; struct sockaddr_in6 sin_dest, sin_mask, sin_gate;@@ -363,33 +363,32 @@ kernel_rtm_ipv6 (int cmd, struct prefix *p,structrib *rib, int family)#endif +static int +kernel_rtm_ipv6 (int cmd, struct prefix *p, struct rib *rib) I assume this should bekernel_rtm - not kernel_rtm_ipv6(otherwise you have a duplicate for kernel_rtm_ipv6() and a loopin case of AF_INET6) +{ + switch (PREFIX_FAMILY(p))+ { + case AF_INET: + return kernel_rtm_ipv4 (cmd, p, rib); + case AF_INET6: + return kernel_rtm_ipv6 (cmd, p, rib); + } + return 0; +} + intkernel_route_rib (struct prefix *p, struct rib *old, struct rib*new) { - struct rib *rib; - int route = 0, cmd; - - if (!old && new) - cmd = RTM_ADD; - else if (old && !new) - cmd = RTM_DELETE; - else - cmd = RTM_CHANGE; - - rib = new ? new : old; + int route = 0; if (zserv_privs.change(ZPRIVS_RAISE)) zlog (NULL, LOG_ERR, "Can't raise privileges"); - switch (PREFIX_FAMILY(p)) - { - case AF_INET: - route = kernel_rtm_ipv4 (cmd, p, rib, AF_INET); - break; - case AF_INET6: - route = kernel_rtm_ipv6 (cmd, p, rib, AF_INET6); - break; - } + if (old) + route |= kernel_rtm (RTM_DELETE, p, rib); Changed toroute |= kernel_rtm (RTM_DELETE, p, old); + + if (new)+ route |= kernel_rtm (RTM_ADD, p, rib); and changed toroute |= kernel_rtm (RTM_ADD, p, new);(You removed the declaration of “rib” above - so rib is undefined)if (zserv_privs.change(ZPRIVS_LOWER))zlog (NULL, LOG_ERR, "Can't lower privileges"); Attached is a updated patchWith the changes it fixes the specific issue I’ve mentioned. I havenot verified thepatch on Linux yet. Will do a full run with this patch to see howmany ofmy approx20..30 failing FreeBSD testcases it fixes (I assume many to all…)- Martin _______________________________________________ Quagga-dev mailing list Quagga-dev@lists.quagga.net https://lists.quagga.net/mailman/listinfo/quagga-dev
diff_zebra.pdf
Description: Adobe PDF document
_______________________________________________ Quagga-dev mailing list Quagga-dev@lists.quagga.net https://lists.quagga.net/mailman/listinfo/quagga-dev