Re: Unwanted IPv6 static route flap

2020-03-10 Thread Neil Jerram
Hi Ondrej,

I've added my guesses below, about what the logs mean.  Would you mind
taking a look and commenting further?

> [...]  Here is a case where node R was killed (-9) at 15:50:03 and
> restarted at 15:50:08.  Node M saw this route flap at 15:50:11:
>
> [2020-03-04T15:50:11.257106] Deleted fd00:10:244:0:1cc0:b1ac:ad47:e7c0/122
> via 2001:20::2 dev eth0 proto bird metric 1024 pref medium
> [2020-03-04T15:50:11.257459] fd00:10:244:0:1cc0:b1ac:ad47:e7c0/122 via
> 2001:20::2 dev eth0 proto bird metric 1024 pref medium
>
> Here is the node M log with "debug all", for both v4 and v6, from when R
> restarted until the route flap:
>
> 2020-03-04T15:50:09.246741514Z KRT: Scanning routing table
> 2020-03-04T15:50:09.246805309Z BGP: connect_timeout
> 2020-03-04T15:50:09.246820405Z BGP: Closing connection
> 2020-03-04T15:50:09.246830788Z BGP: Connecting
> 2020-03-04T15:50:09.246843075Z bird: Mesh_172_17_0_2: Connecting to
> 172.17.0.2 from local address 172.17.0.3
> 2020-03-04T15:50:09.246852834Z KRT: Got 2001:20::/64, type=1, oif=36,
> table=254, prid=2, proto=kernel1
> 2020-03-04T15:50:09.246861952Z KRT: Got
> fd00:10:244:0:1cc0:b1ac:ad47:e7c0/122, type=1, oif=36, table=254, prid=12,
> proto=kernel1
>

I presume this means that this BIRD6 instance has seen that the e7c0 route is
present in the kernel routing table.


> 2020-03-04T15:50:09.246871442Z KRT: Got
> fd00:10:244:0:586d:4461:e980:a280/128, type=1, oif=6, table=254, prid=3,
> proto=kernel1
> 2020-03-04T15:50:09.246879832Z Running filter
> `calico_kernel_programming'...done (2)
> 2020-03-04T15:50:09.246889555Z KRT: Got
> fd00:10:244:0:586d:4461:e980:a280/122, type=6, oif=1, table=254, prid=12,
> proto=kernel1
> 2020-03-04T15:50:09.246898969Z KRT: Got
> fd00:10:244:0:58fd:b191:5c13:9cc0/122, type=1, oif=36, table=254, prid=12,
> proto=kernel1
> 2020-03-04T15:50:09.246908792Z Running filter
> `calico_kernel_programming'...done (2)
> 2020-03-04T15:50:09.24691778Z KRT: Got fe80::/64, type=1, oif=36,
> table=254, prid=2, proto=kernel1
> 2020-03-04T15:50:09.246927197Z KRT: Ignoring route - strange class/scope
> 2020-03-04T15:50:09.246935966Z KRT: Got fe80::/64, type=1, oif=6,
> table=254, prid=2, proto=kernel1
> 2020-03-04T15:50:09.246944298Z KRT: Ignoring route - strange class/scope
> 2020-03-04T15:50:09.247009224Z KRT: Got ::1/128, type=2, oif=1, table=255,
> prid=2, proto=(none)
> 2020-03-04T15:50:09.247065042Z KRT: Ignoring route - unknown table 255
> 2020-03-04T15:50:09.247088778Z KRT: Got 2001:20::/128, type=4, oif=36,
> table=255, prid=2, proto=(none)
> 2020-03-04T15:50:09.247115256Z KRT: Ignoring route - unknown table 255
> 2020-03-04T15:50:09.247131555Z KRT: Got 2001:20::1/128, type=2, oif=36,
> table=255, prid=2, proto=(none)
> 2020-03-04T15:50:09.247145697Z KRT: Ignoring route - unknown table 255
> 2020-03-04T15:50:09.247159089Z KRT: Got fd00:10:96::ac35/128, type=2,
> oif=3, table=255, prid=2, proto=(none)
> 2020-03-04T15:50:09.24717877Z KRT: Ignoring route - unknown table 255
> 2020-03-04T15:50:09.247192816Z KRT: Got fe80::/128, type=4, oif=36,
> table=255, prid=2, proto=(none)
> 2020-03-04T15:50:09.247206392Z KRT: Ignoring route - unknown table 255
> 2020-03-04T15:50:09.247219343Z KRT: Got fe80::/128, type=4, oif=6,
> table=255, prid=2, proto=(none)
> 2020-03-04T15:50:09.247262729Z KRT: Ignoring route - unknown table 255
> 2020-03-04T15:50:09.247279256Z KRT: Got fe80::42:acff:fe11:3/128, type=2,
> oif=36, table=255, prid=2, proto=(none)
> 2020-03-04T15:50:09.247294287Z KRT: Ignoring route - unknown table 255
> 2020-03-04T15:50:09.247307585Z KRT: Got fe80::ecee:eeff:feee:/128,
> type=2, oif=6, table=255, prid=2, proto=(none)
> 2020-03-04T15:50:09.247322176Z KRT: Ignoring route - unknown table 255
> 2020-03-04T15:50:09.247335799Z KRT: Got ff00::/8, type=1, oif=36,
> table=255, prid=3, proto=(none)
> 2020-03-04T15:50:09.247349484Z KRT: Ignoring route - unknown table 255
> 2020-03-04T15:50:09.24736411Z KRT: Got ff00::/8, type=1, oif=2, table=255,
> prid=3, proto=(none)
> 2020-03-04T15:50:09.247378221Z KRT: Ignoring route - unknown table 255
> 2020-03-04T15:50:09.247391919Z KRT: Got ff00::/8, type=1, oif=6,
> table=255, prid=3, proto=(none)
> 2020-03-04T15:50:09.247405262Z KRT: Ignoring route - unknown table 255
> 2020-03-04T15:50:09.247417972Z Running filter
> `calico_kernel_programming'...done (2)
> 2020-03-04T15:50:09.247439521Z KRT: Pruning table master
> 2020-03-04T15:50:09.247455481Z Running filter
> `calico_kernel_programming'...done (2)
> 2020-03-04T15:50:09.247554245Z Running filter
> `calico_kernel_programming'...done (2)
> 2020-03-04T15:50:09.247607455Z KRT: Pruning inherited routes
> 2020-03-04T15:50:09.247630001Z fd00:10:244:0:586d:4461:e980:a280/128:
> uptodate (metric=1024)
> 2020-03-04T15:50:09.24764445Z KRT: Scanning interfaces
> 2020-03-04T15:50:09.247654547Z bird6: device1: Scanning interfaces
> 2020-03-04T15:50:09.247664209Z BGP: Waiting for connect success
> 2020-03-04T15:50:09.247672821Z KRT: Scanning routing table
> 2020-03-04T15:50:09.

Re: [PATCH 0/4] Add MAC authentication support to the Babel protocol

2020-03-10 Thread Ondrej Zajicek
On Tue, Mar 10, 2020 at 04:58:26PM +0100, Toke Høiland-Jørgensen wrote:
> > I think that random_bytes() should not fail.
> 
> Preferably not; but we don't really have any guarantees that the syscall
> will succeed, do we? I guess I can add some sanity checks on startup and
> bail out if (e.g.) /dev/urandom cannot be opened. It would still be
> possible for read() or getrandom() to fail later on, though, no?

It is mostly a question of whether we want error handling directly in
random_bytes() or in the caller code. If we could have some reasonable error
handling in the caller (e.g. log an error message and drop the packet), then we
can do that, but otherwise (as there is no error handling code in the Babel
patch) it seems better to just die() directly in random_bytes() if the
underlying syscalls fail.

Definitely, we should not silently ignore these errors.

It seems getrandom() and getentropy() should not fail for buflen <= 256,
so we may die() for unexpected errors from these syscalls.

For read() from /dev/urandom, it might be good to handle EINTR, but we
may die() for other errors.
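
For illustration only (this is not code from the patch), the /dev/urandom
case described above might be sketched roughly as follows. The names
random_bytes_sketch() and die_sketch() are hypothetical stand-ins for
random_bytes() and BIRD's die(), and the sketch assumes a POSIX system where
/dev/urandom exists:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for BIRD's die(): report the problem and abort. */
static void die_sketch(const char *what, const char *why)
{
  fprintf(stderr, "%s: %s\n", what, why);
  abort();
}

/*
 * Fill buf with len random bytes read from /dev/urandom.
 * EINTR is retried; any other failure is fatal, so callers never
 * have to handle an error return.
 */
void random_bytes_sketch(void *buf, size_t len)
{
  static int fd = -1;           /* opened lazily, kept open afterwards */
  unsigned char *p = buf;

  assert(buf != NULL || len == 0);

  if (fd < 0)
  {
    fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0)
      die_sketch("open(/dev/urandom)", strerror(errno));
  }

  while (len > 0)
  {
    ssize_t n = read(fd, p, len);

    if (n < 0)
    {
      if (errno == EINTR)
        continue;               /* interrupted by a signal: just retry */
      die_sketch("read(/dev/urandom)", strerror(errno));
    }

    if (n == 0)
      die_sketch("read(/dev/urandom)", "unexpected end of file");

    p += n;                     /* short reads are fine, keep going */
    len -= (size_t) n;
  }
}
```

With this shape a caller simply does random_bytes_sketch(nonce, sizeof(nonce))
and has no error path of its own, which matches the "should not fail" idea.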


> > There are '#if defined(NATIVE_LITTLE_ENDIAN)' in the code, does
> > anybody define these?
> 
> Hmm, probably not? The FreeBSD Blake implementation seems to have a
> #define based on __BYTE_ORDER, so guess we could just add something like
> that as well?

We already have CPU_BIG_ENDIAN in BIRD, so perhaps just use that.
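
As a rough sketch of what that could look like (not taken from the patch):
the helper name load32_le() is hypothetical, and the __BYTE_ORDER__ fallback
is only there so the fragment compiles outside the BIRD tree with GCC or
Clang, where CPU_BIG_ENDIAN would not be defined by configure:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Standalone fallback only; inside BIRD, configure is assumed to define
 * CPU_BIG_ENDIAN on big-endian hosts. */
#if !defined(CPU_BIG_ENDIAN) && defined(__BYTE_ORDER__) && \
    defined(__ORDER_BIG_ENDIAN__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define CPU_BIG_ENDIAN 1
#endif

/* Load a 32-bit little-endian word from src (BLAKE2 is defined in terms
 * of little-endian words). */
static inline uint32_t load32_le(const void *src)
{
  assert(src != NULL);

#ifndef CPU_BIG_ENDIAN
  /* Little-endian host: the in-memory layout already matches. */
  uint32_t w;
  memcpy(&w, src, sizeof(w));
  return w;
#else
  /* Big-endian host: assemble the word byte by byte. */
  const uint8_t *p = src;
  return ((uint32_t) p[0])
       | ((uint32_t) p[1] << 8)
       | ((uint32_t) p[2] << 16)
       | ((uint32_t) p[3] << 24);
#endif
}
```

The byte-by-byte branch is correct on any host, so the #ifndef branch is only
an optimization; that keeps a wrong or missing define from silently breaking
the hash on little-endian machines.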


-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."



Re: [PATCH 0/4] Add MAC authentication support to the Babel protocol

2020-03-10 Thread Toke Høiland-Jørgensen
Ondrej Zajicek writes:

> On Sun, Feb 23, 2020 at 11:56:33PM +0100, Toke Høiland-Jørgensen wrote:
>> This series adds MAC authentication support to the Babel protocol as specified
>> by the IETF Babel working group in draft-babel-hmac-10:
>
> Hi
>
> Some more comments / questions:
>
>
> 1/4:
>
> BIRD_CHECK_GETRANDOM_SYSCALL - direct syscall case seems unnecessary,
> as we can fall back to /dev/urandom anyway.
>
> BIRD_CHECK_GETRANDOM - just use generic AC_CHECK_FUNCS /
> AC_SEARCH_LIBS ?

OK.

> I think that random_bytes() should not fail.

Preferably not; but we don't really have any guarantees that the syscall
will succeed, do we? I guess I can add some sanity checks on startup and
bail out if (e.g.) /dev/urandom cannot be opened. It would still be
possible for read() or getrandom() to fail later on, though, no?

> 2/4:
>
> blake2 - We definitely need unit tests here. Ideally there should exist some
> reference data / hash pairs for blake2. See mac_test.c

Yup, there do seem to be some test vectors in the blake2 repository;
will add those.

> There are '#if defined(NATIVE_LITTLE_ENDIAN)' in the code, does
> anybody define these?

Hmm, probably not? The FreeBSD Blake implementation seems to have a
#define based on __BYTE_ORDER, so guess we could just add something like
that as well?

> 3/4:
>
> What is the point of separating babel_parse_state and babel_read_state?
>
> Why export packet/TLV structures from packets.c?

Well, I did both of these to be able to have all the auth-related code
in a separate file, while still reusing the packet parsing macros etc...

> General pattern in BIRD (including Babel) is that wire format details
> are hidden in packets.c and more abstract structures are exported
> outside (e.g. union babel_msg). Seems to me that it would make sense
> to have low-level auth code (TLV read/write code, packet
> signing/verifying) directly in packets.c, while high-level code
> (challenge response mechanism) in babel.c.

Hmm, I could have sworn that I got the idea of splitting it into its own
file from one of the other protocols, but looking at them now that does
not actually seem to be the case. So I guess I'll just move everything
back into {packets,babel}.c :)

-Toke



Re: [PATCH 0/4] Add MAC authentication support to the Babel protocol

2020-03-10 Thread Ondrej Zajicek
On Sun, Feb 23, 2020 at 11:56:33PM +0100, Toke Høiland-Jørgensen wrote:
> This series adds MAC authentication support to the Babel protocol as specified
> by the IETF Babel working group in draft-babel-hmac-10:

Hi

Some more comments / questions:


1/4:

BIRD_CHECK_GETRANDOM_SYSCALL - direct syscall case seems unnecessary,
as we can fall back to /dev/urandom anyway.

BIRD_CHECK_GETRANDOM - just use generic AC_CHECK_FUNCS / AC_SEARCH_LIBS ?

I think that random_bytes() should not fail.


2/4:

blake2 - We definitely need unit tests here. Ideally there should exist some
reference data / hash pairs for blake2. See mac_test.c

There are '#if defined(NATIVE_LITTLE_ENDIAN)' in the code, does anybody define
these?


3/4:

What is the point of separating babel_parse_state and babel_read_state?

Why export packet/TLV structures from packets.c? General pattern in BIRD
(including Babel) is that wire format details are hidden in packets.c and more
abstract structures are exported outside (e.g. union babel_msg). Seems to me
that it would make sense to have low-level auth code (TLV read/write code,
packet signing/verifying) directly in packets.c, while high-level code
(challenge response mechanism) in babel.c.

-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."



Re: BGP routes not being propagated to kernel via OSPF

2020-03-10 Thread Ondrej Zajicek
On Mon, Mar 09, 2020 at 10:56:33AM -0600, Brian Topping wrote:
> As a followup to this thread, I don’t know why I didn’t try this to start 
> with, but installing 2.0.4 on the router having problems cleared it up with 
> no change to configuration. 
> 
> I did try to upgrade to 2.0.5 and the problem returned.
> 
> Now that I know for sure that this is a problem in BIRD (either with a 
> regression or some change that is required by spec but not clear how to 
> configure for),  I will try to diff the sources and see what might have 
> caused this.

Hi

Could you compare the output of 'show ospf state' and 'show ospf lsadb' between
the routers that compute the external routes correctly and the ones that do not?

Also, in your original post the 'show route' output from the problematic OSPF
is missing not only the external routes, but also 10.10.0.0/22:

> 10.10.0.0/22 unicast [backbone 02:07:46.207] I (150/10) [c.d.143.113]

There were some changes between 2.0.4 and 2.0.5, but they were supposed to be
limited to NSSA (type 7) routes.

What exactly is your topology? You seem to use NBMA; do you have a fully
consistent set of neighbors on all eligible routers?

-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."