So, here is puzzle someone might have a hint.

Both of these are tested with the SAME FreeBSD package. Both are installed on
basically the same DUT FreeBSD (cloned from the same master).
Both are based on the git commit 04a3aabf58d95 (which is the candidate for
the bad commit identified earlier)
One of them consistent fails, the other passes the test.

Now look at the attached PDF (sorry, of PDF, but was easier to highlight diff).

On the left side is the GOOD run, on the right side the BAD run.

You’ll notice that the “rib_link: 30.0.5.0/24 vrf 0: adding dest to table” is already missing at the first instance when the route should get installed and many more
FIB related calls are missing.

Anyone have any guess what’s going on here?

Relevant function is in zebra_rib.c:

        static void
        rib_link (struct route_node *rn, struct rib *rib)
        {
          struct rib *head;
          rib_dest_t *dest;

          assert (rib && rn);

          if (IS_ZEBRA_DEBUG_RIB)
            rnode_debug (rn, "rn %p, rib %p", (void *)rn, (void *)rib);

          dest = rib_dest_from_rnode (rn);
          if (!dest)
            {
              if (IS_ZEBRA_DEBUG_RIB)
                rnode_debug (rn, "adding dest to table");

              dest = XCALLOC (MTYPE_RIB_DEST, sizeof (rib_dest_t));
              route_lock_node (rn); /* rn route table reference */
              rn->info = dest;
              dest->rnode = rn;
            }
        […]

The first rnode_debug seems to work in both cases, but then in the bad case,
the dest has a value and the “if (!dest)” is skipped.

Potentially some missed clearing of data structures?
(This is not immediately after startup - but in both runs I run the same sequence)

- Martin


On 23 Feb 2016, at 14:44, Martin Winter wrote:

Donald,

give me a few more hours to narrow the issue down. Still have a few more ideas
to correctly replicate this.
At this time, I worry you would not be able to reproduce the issue and just
waste your time.

(Getting the config.log from the port build isn’t that trivial)

For hints on how I set up the FreeBSD system, see
https://git-us.netdef.org/projects/OSR/repos/ci-files/browse/doc/Quagga_FreeBSD10.md

- Martin

On 23 Feb 2016, at 13:21, Donald Sharp wrote:

Martin -

Can I see the config.h and config.log generated for your test?

thanks!

donald

On Tue, Feb 23, 2016 at 4:15 PM, Martin Winter <
mwin...@opensourcerouting.org> wrote:

On 23 Feb 2016, at 5:23, Donald Sharp wrote:

Martin -

What's your configure line for freebsd?  Can you point me at your
config.log for this build? My freebsd vm is unhappy with me right now.


That’s where I have issues. My CI integrated system spots the error (and
every time), but if
I try to run it outside, then it seems to work.

Current config inside the CI system is as follows:

--enable-user=quagga --enable-group=quagga
--sysconfdir=/usr/local/etc/quagga-test --localstatedir=/var/run/quagga
--enable-vtysh
--with-pkg-extra-version=-ci.NetDEF.org-20160223.101813-git.04a3aab
--enable-fpm --enable-tcp-zebra --disable-irdb --enable-isisd
--enable-isis-topology --enable-bgp-announce --enable-opaque-lsa
--without-libpam --enable-pimd --enable-rtadv --disable-snmp
--enable-tcp-zebra --prefix=/usr/local --mandir=/usr/local/man
--infodir=/usr/local/info/ --build=amd64-portbld-freebsd10.1

Still trying to reproduce it outside of the automated system (so I can do
more troubleshooting)

(The automated CI system uses my CI plans to build packages, then installs
them
and uses them for the tests. When I do it outside of CI system I’m just
doing
the usual make/make install and usually with less optimization to get debug
info)

This bug affects at least 3, probably more BGP errors

Anyway, beside of someone looking at the specific patch and see if
something
might go wrong on FreeBSD, I just need a bit more time to reproduce it.
Hope to know more in another 8..12 hrs.

- Martin


On Mon, Feb 22, 2016 at 10:26 PM, Martin Winter <
mwin...@opensourcerouting.org> wrote:

On 22 Feb 2016, at 6:40, Donald Sharp wrote:

Martin -


Any word on this? I'd like to push this patch if it fixes your issues.


Not yet.
I’m dealing with (most likely) several issues which seem overlap. So it
takes a bit
more time.
I still tend to think that Timo’s fix is good - but only addressing part
of the problems.
I’ll prefer to keep it on hold at this time until at least one more
FreeBSD bug is fixed.


One of the git bisect on BGP errors pointed to commit

commit 04a3aabf58d95d01c4c8168eeff43cf9d9892eee
Author: Nicolas Dichtel <nicolas.dich...@6wind.com>
Date:   Thu Sep 3 10:47:43 2015 +0200

  vrf: add a runtime check before playing with netns

  This patch adds a runtime check to determine if netns are
available. Some
  systems like OpenWRT have the system call setns() but don't
have the kernel
  option CONFIG_NET_NS enabled.

  Reported-by: Christian Franke <ch...@opensourcerouting.org>
  Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
  Tested-by: Christian Franke <ch...@opensourcerouting.org>


Doing a git bisect points to this commit breaking at least 3 BGP tests. I’m still trying to manually verify and understand how this breaks it, so
give me
more time.

(Feel free to guess on how this commit could break FreeBSD… seems to be
related
to similar not correctly installing forwarding entries as well)

- Martin



On Thu, Feb 18, 2016 at 10:30 PM, Martin Winter <

mwin...@opensourcerouting.org> wrote:

On 18 Feb 2016, at 18:35, Donald Sharp wrote:


Martin -


rt_socket.c is not compiled on linux so no need for verification.
Once

you

have a run please let me know and I'll push your updated patch.


It’s running now. Will know soon. It definitely fixes some of the
errors
and I hope it actually fixes all of the FreeBSD errors.

Will know in approx one more day.

(BTW: Testing on a git commit which does no longer exist because of
rebase,
but I needed the same to compare. Lucky that I still had the VMs with
the
old git checkout running…)

- Martin


donald

On Thu, Feb 18, 2016 at 9:19 PM, Martin Winter <
mwin...@opensourcerouting.org> wrote:

Timo,


On 17 Feb 2016, at 23:01, Timo Teras wrote:

On Wed, 17 Feb 2016 20:11:07 -0800

"Martin Winter" <mwin...@opensourcerouting.org> wrote:

This is based on Quagga proposed 6 branch of Feb 10 (git f48d5b9957
-

which does no longer exist, [no] thanks to rebase on a public
git) on FreeBSD 10.2.

Zebra seems to fail delete any routes in FreeBSD RIB. It fails with
updates (better routes to different interface) and
it fails to remove them when quagga is killed.


Thanks for the testing. Sounds like this is breakage caused by my
atomic FIB patch which was pretty untested on *BSD.

Looks like the code talking to kernel does not handle RTM_CHANGE correctly. As first test, perhaps just removing RTM_CHANGE usage
might
fix the issues and revert to how it used to work.

Using RTM_CHANGE on kernels where it works is better, but it's left
an
exercise for developer who has access and will to fix it on *BSD.


Thanks for the patch. Seems like you never tested it (failed to

compile),


but was close enough to guess what you probably meant. See inline on


patch



diff --git a/zebra/rt_socket.c b/zebra/rt_socket.c

index 4d0a7db..9012280 100644
--- a/zebra/rt_socket.c
+++ b/zebra/rt_socket.c
@@ -68,7 +68,7 @@ sin_masklen (struct in_addr mask)

/* Interface between zebra message and rtm message. */
static int
-kernel_rtm_ipv4 (int cmd, struct prefix *p, struct rib *rib, int

family)


+kernel_rtm_ipv4 (int cmd, struct prefix *p, struct rib *rib)


{
struct sockaddr_in *mask = NULL;
@@ -245,7 +245,7 @@ sin6_masklen (struct in6_addr mask)

/* Interface between zebra message and rtm message. */
static int
-kernel_rtm_ipv6 (int cmd, struct prefix *p, struct rib *rib, int

family)


+kernel_rtm_ipv6 (int cmd, struct prefix *p, struct rib *rib)

{
struct sockaddr_in6 *mask;
struct sockaddr_in6 sin_dest, sin_mask, sin_gate;
@@ -363,33 +363,32 @@ kernel_rtm_ipv6 (int cmd, struct prefix *p,

struct


rib *rib, int family)


#endif

+static int
+kernel_rtm_ipv6 (int cmd, struct prefix *p, struct rib *rib)


I assume this should be
kernel_rtm - not kernel_rtm_ipv6
(otherwise you have a duplicate for kernel_rtm_ipv6() and a loop
in case of AF_INET6)


+{

+  switch (PREFIX_FAMILY(p))
+    {
+    case AF_INET:
+      return kernel_rtm_ipv4 (cmd, p, rib);
+    case AF_INET6:
+      return kernel_rtm_ipv6 (cmd, p, rib);
+    }
+  return 0;
+}
+
int
kernel_route_rib (struct prefix *p, struct rib *old, struct rib
*new)
{
-  struct rib *rib;
-  int route = 0, cmd;
-
-  if (!old && new)
-    cmd = RTM_ADD;
-  else if (old && !new)
-    cmd = RTM_DELETE;
-  else
-    cmd = RTM_CHANGE;
-
-  rib = new ? new : old;
+  int route = 0;

if (zserv_privs.change(ZPRIVS_RAISE))
zlog (NULL, LOG_ERR, "Can't raise privileges");

-  switch (PREFIX_FAMILY(p))
-    {
-    case AF_INET:
-      route = kernel_rtm_ipv4 (cmd, p, rib, AF_INET);
-      break;
-    case AF_INET6:
-      route = kernel_rtm_ipv6 (cmd, p, rib, AF_INET6);
-      break;
-    }
+  if (old)
+    route |= kernel_rtm (RTM_DELETE, p, rib);


Changed to
route |= kernel_rtm (RTM_DELETE, p, old);

+

+  if (new)
+    route |= kernel_rtm (RTM_ADD, p, rib);


and changed to
route |= kernel_rtm (RTM_ADD, p, new);

(You removed the declaration of “rib” above - so rib is undefined)


if (zserv_privs.change(ZPRIVS_LOWER))

zlog (NULL, LOG_ERR, "Can't lower privileges");


Attached is a updated patch

With the changes it fixes the specific issue I’ve mentioned. I have
not
verified the
patch on Linux yet. Will do a full run with this patch to see how
many

of


my approx

20..30 failing FreeBSD testcases it fixes (I assume many to all…)

- Martin


_______________________________________________
Quagga-dev mailing list
Quagga-dev@lists.quagga.net
https://lists.quagga.net/mailman/listinfo/quagga-dev




Attachment: diff_zebra.pdf
Description: Adobe PDF document

_______________________________________________
Quagga-dev mailing list
Quagga-dev@lists.quagga.net
https://lists.quagga.net/mailman/listinfo/quagga-dev

Reply via email to