Re: IPv6 default route flapping

2021-04-20 Thread Greg Troxel

Jan Schaumann  writes:

> Apr 20 01:32:32 netbsd dhcpcd[17397]: xennet0: soliciting an IPv6 router
> Apr 20 01:32:34 netbsd dhcpcd[17397]: xennet0: Router Advertisement from fe80::caa:49ff:feaf:1815
> Apr 20 01:32:35 netbsd dhcpcd[17397]: xennet0: fe80::caa:49ff:feaf:1815 is unreachable
> Apr 20 01:32:35 netbsd dhcpcd[17397]: xennet0: soliciting an IPv6 router
> Apr 20 01:32:44 netbsd dhcpcd[17397]: xennet0: fe80::caa:49ff:feaf:1815 is reachable again
> Apr 20 01:32:52 netbsd dhcpcd[17397]: xennet0: fe80::caa:49ff:feaf:1815 is unreachable
> Apr 20 01:32:52 netbsd dhcpcd[17397]: xennet0: soliciting an IPv6 router
> Apr 20 01:32:54 netbsd dhcpcd[17397]: xennet0: Router Advertisement from fe80::caa:49ff:feaf:1815
> Apr 20 01:32:55 netbsd dhcpcd[17397]: xennet0: fe80::caa:49ff:feaf:1815 is unreachable
> Apr 20 01:32:56 netbsd dhcpcd[17397]: xennet0: soliciting an IPv6 router

I don't know what's wrong, but obviously you should 'tcpdump -w IPV6
ip6' and look at it.




Re: IPv6 default route flapping

2021-04-20 Thread Jan Schaumann
Greg Troxel  wrote:
> 
> Jan Schaumann  writes:
> 
> > Apr 20 01:32:32 netbsd dhcpcd[17397]: xennet0: soliciting an IPv6 router
> > Apr 20 01:32:34 netbsd dhcpcd[17397]: xennet0: Router Advertisement from fe80::caa:49ff:feaf:1815
> > Apr 20 01:32:35 netbsd dhcpcd[17397]: xennet0: fe80::caa:49ff:feaf:1815 is unreachable
> > Apr 20 01:32:35 netbsd dhcpcd[17397]: xennet0: soliciting an IPv6 router
> > Apr 20 01:32:44 netbsd dhcpcd[17397]: xennet0: fe80::caa:49ff:feaf:1815 is reachable again
> > Apr 20 01:32:52 netbsd dhcpcd[17397]: xennet0: fe80::caa:49ff:feaf:1815 is unreachable
> > Apr 20 01:32:52 netbsd dhcpcd[17397]: xennet0: soliciting an IPv6 router
> > Apr 20 01:32:54 netbsd dhcpcd[17397]: xennet0: Router Advertisement from fe80::caa:49ff:feaf:1815
> > Apr 20 01:32:55 netbsd dhcpcd[17397]: xennet0: fe80::caa:49ff:feaf:1815 is unreachable
> > Apr 20 01:32:56 netbsd dhcpcd[17397]: xennet0: soliciting an IPv6 router
> 
> I don't know what's wrong, but obviously you should 'tcpdump -w IPV6
> ip6' and look at it.

That's unfortunately not very revealing, as it merely
shows the neighbor solicitations and router advertisements:

IP6 fe80::7e12:b688:167c:785f > fe80::caa:49ff:feaf:1815: ICMP6, neighbor solicitation, who has fe80::caa:49ff:feaf:1815, length 32
IP6 fe80::7e12:b688:167c:785f > ff02::1:ffaf:1815: ICMP6, neighbor solicitation, who has fe80::caa:49ff:feaf:1815, length 32
IP6 fe80::7e12:b688:167c:785f > ff02::2: ICMP6, router solicitation, length 16
IP6 fe80::7e12:b688:167c:785f > ff02::1:ffaf:1815: ICMP6, neighbor solicitation, who has fe80::caa:49ff:feaf:1815, length 32
IP6 fe80::7e12:b688:167c:785f > ff02::1:ffaf:1815: ICMP6, neighbor solicitation, who has fe80::caa:49ff:feaf:1815, length 32
IP6 fe80::caa:49ff:feaf:1815 > ff02::1: ICMP6, router advertisement, length 56
IP6 fe80::7e12:b688:167c:785f > ff02::1:ffaf:1815: ICMP6, neighbor solicitation, who has fe80::caa:49ff:feaf:1815, length 32
IP6 fe80::7e12:b688:167c:785f > ff02::2: ICMP6, router solicitation, length 16
IP6 fe80::7e12:b688:167c:785f > ff02::1:ffaf:1815: ICMP6, neighbor solicitation, who has fe80::caa:49ff:feaf:1815, length 32
IP6 fe80::7e12:b688:167c:785f > ff02::2: ICMP6, router solicitation, length 16
IP6 fe80::7e12:b688:167c:785f > ff02::2: ICMP6, router solicitation, length 16
IP6 fe80::caa:49ff:feaf:1815 > ff02::1: ICMP6, router advertisement, length 56
IP6 fe80::7e12:b688:167c:785f > fe80::caa:49ff:feaf:1815: ICMP6, neighbor solicitation, who has fe80::caa:49ff:feaf:1815, length 32
IP6 fe80::7e12:b688:167c:785f > fe80::caa:49ff:feaf:1815: ICMP6, neighbor solicitation, who has fe80::caa:49ff:feaf:1815, length 32
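The trace contains two kinds of neighbor solicitation: multicast ones to ff02::1:ffaf:1815, which RFC 4861 uses for address resolution when the sender has no link-layer address for the target, and unicast ones to fe80::caa:49ff:feaf:1815 itself, which are reachability probes for an existing cache entry. The multicast ones suggest that at those moments NetBSD had no usable link-layer address for the router. A minimal Python sketch of how the multicast destination is derived (the solicited-node prefix ff02::1:ff00:0/104 plus the low 24 bits of the target, per RFC 4291); the function name is illustrative and not part of any NetBSD tool:

```python
import ipaddress

# Solicited-node multicast prefix ff02::1:ff00:0/104 (RFC 4291, section 2.7.1).
SOLICITED_NODE_BASE = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node_group(target: str) -> str:
    """Return the group an address-resolution NS for `target` is sent to."""
    # Drop a "%zone" suffix such as "%xennet0" before parsing.
    addr = ipaddress.IPv6Address(target.split("%", 1)[0])
    # Keep only the low-order 24 bits of the target address.
    low24 = int(addr) & 0xFFFFFF
    return str(ipaddress.IPv6Address(SOLICITED_NODE_BASE | low24))


print(solicited_node_group("fe80::caa:49ff:feaf:1815%xennet0"))  # ff02::1:ffaf:1815
```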

-Jan


Re: IPv6 default route flapping

2021-04-20 Thread Martin Husemann
On Tue, Apr 20, 2021 at 10:09:47AM -0400, Jan Schaumann wrote:
> That's unfortunately not very revealing, as it merely
> shows the neighbor solicitations and router advertisements:
> 
> IP6 fe80::7e12:b688:167c:785f > fe80::caa:49ff:feaf:1815: ICMP6, neighbor solicitation, who has fe80::caa:49ff:feaf:1815, length 32
> IP6 fe80::7e12:b688:167c:785f > ff02::1:ffaf:1815: ICMP6, neighbor solicitation, who has fe80::caa:49ff:feaf:1815, length 32
> IP6 fe80::7e12:b688:167c:785f > ff02::2: ICMP6, router solicitation, length 16
> IP6 fe80::7e12:b688:167c:785f > ff02::1:ffaf:1815: ICMP6, neighbor solicitation, who has fe80::caa:49ff:feaf:1815, length 32
> IP6 fe80::7e12:b688:167c:785f > ff02::1:ffaf:1815: ICMP6, neighbor solicitation, who has fe80::caa:49ff:feaf:1815, length 32
> IP6 fe80::caa:49ff:feaf:1815 > ff02::1: ICMP6, router advertisement, length 56

Well, adding -v or -vv would help here, especially to show the RA lifetime.

But more interesting is what happens on the routing socket, as that is
where dhcpcd gets the idea of reachability from.

You can see that with: route monitor

Martin


Re: IPv6 default route flapping

2021-04-20 Thread Jan Schaumann
Martin Husemann  wrote:

> Well, adding -v or -vv would help here, especially to show the RA lifetime.

RA lifetime is 1800s:

IP6 (hlim 255, next-header ICMPv6 (58) payload length: 56) fe80::caa:49ff:feaf:1815 > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 56
  hop limit 255, Flags [managed], pref medium, router lifetime 1800s, reachable time 0ms, retrans timer 0ms
    source link-address option (1), length 8 (1): 0e:aa:49:af:18:15
      0x0000:  0eaa 49af 1815
    prefix info option (3), length 32 (4): 2600:1f18:400c:b800::/64, Flags [onlink], valid time infinity, pref. time infinity
      0x0000:  4080 ffff ffff ffff ffff 0000 0000 2600
      0x0010:  1f18 400c b800 0000 0000 0000 0000
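The hex lines under "prefix info option" are the option body defined in RFC 4861, section 4.6.2: prefix length, a flags byte (L = on-link 0x80, A = autonomous 0x40), valid and preferred lifetimes in seconds (0xffffffff meaning infinity), four reserved bytes, and the 16-byte prefix. A small Python sketch that decodes such a body, assuming it has already been separated from the 2-byte type/length header; the dictionary keys are illustrative names, not tcpdump's:

```python
import ipaddress
import struct

INFINITY = 0xFFFFFFFF  # lifetime value meaning "infinity"


def parse_prefix_info(body: bytes) -> dict:
    """Decode the 30-byte body of an RA Prefix Information option (type 3)."""
    if len(body) != 30:
        raise ValueError("prefix info option body must be 30 bytes")
    # prefix length, flags, valid lifetime, preferred lifetime, reserved
    plen, flags, valid, preferred, _reserved = struct.unpack("!BBIII", body[:14])
    prefix = ipaddress.IPv6Address(body[14:30])
    return {
        # strict=False: bits past the prefix length must be ignored, not rejected
        "prefix": ipaddress.IPv6Network(f"{prefix}/{plen}", strict=False),
        "onlink": bool(flags & 0x80),
        "autonomous": bool(flags & 0x40),
        "valid": None if valid == INFINITY else valid,
        "preferred": None if preferred == INFINITY else preferred,
    }


# Body built from the fields tcpdump decoded above.
body = bytes.fromhex("4080" + "ffffffff" * 2 + "00000000" + "26001f18400cb800" + "00" * 8)
info = parse_prefix_info(body)
print(info["prefix"], info["onlink"], info["autonomous"], info["valid"])
# 2600:1f18:400c:b800::/64 True False None
```

This agrees with tcpdump's reading: only the on-link flag is set, and both lifetimes are infinite.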

> But more interesting is what happens on the routing socket, as that is
> where dhcpcd gets the idea of reachability from.
> 
> You can see that with: route monitor

got message of size 168 on Tue Apr 20 14:21:51 2021
RTM_MISS: Lookup failed on this address: len 168, pid 0, seq 0, errno 0, flags: 0x40
locks: 0 inits: 0
sockaddrs: 0x3
 fe80::caa:49ff:feaf:1815%xennet0 link#1
got message of size 168 on Tue Apr 20 14:21:54 2021
RTM_MISS: Lookup failed on this address: len 168, pid 0, seq 0, errno 0, flags: 0x40
locks: 0 inits: 0
sockaddrs: 0x3
 fe80::caa:49ff:feaf:1815%xennet0 link#1
got message of size 200 on Tue Apr 20 14:21:57 2021
RTM_MISS: Lookup failed on this address: len 200, pid 0, seq 0, errno 0, flags: 0x40
locks: 0 inits: 0
sockaddrs: 0x43
 fe80::caa:49ff:feaf:1815%xennet0 link#1
2600:1f18:400c:b800:bc3c:63cc:7e5d:1f96
got message of size 168 on Tue Apr 20 14:22:03 2021
RTM_CHANGE: Change Metrics, Flags or Gateway: len 168, pid 0, seq 0, errno 0, flags: 0x2445
locks: 0 inits: 0
sockaddrs: 0x3
 fe80::caa:49ff:feaf:1815%xennet0 0e:aa:49:af:18:15
got message of size 168 on Tue Apr 20 14:22:11 2021
RTM_MISS: Lookup failed on this address: len 168, pid 0, seq 0, errno 0, flags: 0x40
locks: 0 inits: 0
sockaddrs: 0x3
 fe80::caa:49ff:feaf:1815%xennet0 link#1
got message of size 168 on Tue Apr 20 14:22:14 2021
RTM_MISS: Lookup failed on this address: len 168, pid 0, seq 0, errno 0, flags: 0x40
locks: 0 inits: 0
sockaddrs: 0x3
 fe80::caa:49ff:feaf:1815%xennet0 link#1
got message of size 200 on Tue Apr 20 14:22:17 2021
RTM_MISS: Lookup failed on this address: len 200, pid 0, seq 0, errno 0, flags: 0x40
locks: 0 inits: 0
sockaddrs: 0x43
 fe80::caa:49ff:feaf:1815%xennet0 link#1
2600:1f18:400c:b800:bc3c:63cc:7e5d:1f96
got message of size 168 on Tue Apr 20 14:22:23 2021
RTM_CHANGE: Change Metrics, Flags or Gateway: len 168, pid 0, seq 0, errno 0, flags: 0x2445
locks: 0 inits: 0
sockaddrs: 0x3
 fe80::caa:49ff:feaf:1815%xennet0 0e:aa:49:af:18:15

and so on.
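The "sockaddrs:" value in each message is a bitmask saying which addresses follow the header. A Python sketch that decodes it, assuming NetBSD uses the same RTA_* bit values as other 4.4BSD-derived <net/route.h> headers (worth checking against your own header before relying on it):

```python
# RTA_* bits from 4.4BSD-style <net/route.h>; assumed to match NetBSD's values.
RTA_BITS = (
    (0x01, "DST"),
    (0x02, "GATEWAY"),
    (0x04, "NETMASK"),
    (0x08, "GENMASK"),
    (0x10, "IFP"),
    (0x20, "IFA"),
    (0x40, "AUTHOR"),
    (0x80, "BRD"),
)


def decode_sockaddrs(mask: int) -> list:
    """Names of the sockaddrs present, in the order they follow the header."""
    return [name for bit, name in RTA_BITS if mask & bit]


for mask in (0x3, 0x43):
    print(hex(mask), decode_sockaddrs(mask))
# 0x3 ['DST', 'GATEWAY']
# 0x43 ['DST', 'GATEWAY', 'AUTHOR']
```

Read this way, the 0x43 messages carry a third address, 2600:1f18:400c:b800:bc3c:63cc:7e5d:1f96, in the RTA_AUTHOR slot, in addition to the destination and gateway that the 0x3 messages carry.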

-Jan


Re: IPv6 default route flapping

2021-04-20 Thread Martin Husemann
On Tue, Apr 20, 2021 at 10:30:23AM -0400, Jan Schaumann wrote:
> RTM_MISS: Lookup failed on this address: len 200, pid 0, seq 0, errno 0, flags: 0x40
> locks: 0 inits: 0
> sockaddrs: 0x43
>  fe80::caa:49ff:feaf:1815%xennet0 link#1
> 2600:1f18:400c:b800:bc3c:63cc:7e5d:1f96

This must be triggering the "unreachable"...

> got message of size 168 on Tue Apr 20 14:22:03 2021
> RTM_CHANGE: Change Metrics, Flags or Gateway: len 168, pid 0, seq 0, errno 0, flags: 0x2445
> locks: 0 inits: 0
> sockaddrs: 0x3
>  fe80::caa:49ff:feaf:1815%xennet0 0e:aa:49:af:18:15

... and this should make it reachable again.

I don't understand the RTM_MISS with GATEWAY flag message at all.

Martin


Re: IPv6 default route flapping

2021-04-20 Thread Joerg Sonnenberger
On Tue, Apr 20, 2021 at 10:09:47AM -0400, Jan Schaumann wrote:
> Greg Troxel  wrote:
> > 
> > Jan Schaumann  writes:
> > 
> > > Apr 20 01:32:32 netbsd dhcpcd[17397]: xennet0: soliciting an IPv6 router
> > > Apr 20 01:32:34 netbsd dhcpcd[17397]: xennet0: Router Advertisement from fe80::caa:49ff:feaf:1815
> > > Apr 20 01:32:35 netbsd dhcpcd[17397]: xennet0: fe80::caa:49ff:feaf:1815 is unreachable
> > > Apr 20 01:32:35 netbsd dhcpcd[17397]: xennet0: soliciting an IPv6 router
> > > Apr 20 01:32:44 netbsd dhcpcd[17397]: xennet0: fe80::caa:49ff:feaf:1815 is reachable again
> > > Apr 20 01:32:52 netbsd dhcpcd[17397]: xennet0: fe80::caa:49ff:feaf:1815 is unreachable
> > > Apr 20 01:32:52 netbsd dhcpcd[17397]: xennet0: soliciting an IPv6 router
> > > Apr 20 01:32:54 netbsd dhcpcd[17397]: xennet0: Router Advertisement from fe80::caa:49ff:feaf:1815
> > > Apr 20 01:32:55 netbsd dhcpcd[17397]: xennet0: fe80::caa:49ff:feaf:1815 is unreachable
> > > Apr 20 01:32:56 netbsd dhcpcd[17397]: xennet0: soliciting an IPv6 router
> > 
> > I don't know what's wrong, but obviously you should 'tcpdump -w IPV6
> > ip6' and look at it.
> 
> That's unfortunately not very revealing, as it merely
> shows the neighbor solicitations and router advertisements:

In the cases we had before, it was always two advertisements in quick
succession. The first one declared the router to no longer be active and
the second declared it to be alive again. I forget which of the timer
fields was the marker for that.

I was wondering about adding a hackish workaround for such misconfigured
routers so that the first event is not processed for a second or two and
ignored if it is cancelled within that window.

Joerg


Re: IPv6 default route flapping

2021-04-20 Thread Greg Troxel

Martin Husemann  writes:

> Well, adding -v or -vv would help here, especially to show the RA lifetime.
>
> But more interesting is what happens on the routing socket, as that is
> where dhcpcd gets the idea of reachability from.

A great suggestion, and I'll add turning on dhcpcd debugging and looking
at all three.

Besides -v, tcpdump prints timestamps, and those were omitted.   Also the
lines were miswrapped, making them very hard to read.

It looks like there are neighbor solicitations and I am not seeing the
replies.

Are you really sure you aren't blocking ND replies in a firewall?   Also
add to your time history the packets that were blocked, with timestamps.




Re: IPv6 default route flapping

2021-04-20 Thread Jan Schaumann
Greg Troxel  wrote:

> Besides -v, tcpdump prints timestamps and those were omitted.   Also the
> lines were miswrapped making them very hard to read.

You can grab a pcap file from here:
https://www.netmeister.org/tmp/ipv6.pcap

That's "tcpdump -w ipv6.pcap ip6" while running "ping6
www.yahoo.com".

> It looks like there are neighbor solicitations and I am not seeing the
> replies.
> 
> Are you really sure you aren't blocking ND replies in a firewall?   Also
> add to your time history the packets that were blocked, with timestamps.

The Security Group has no restrictions and allows any
and all traffic.

None of the other OS instances on the same VPC are
seeing this problem.  I've tried FreeBSD, Ubuntu,
CentOS and OmniOS; if it were a network issue, I'd
expect other OS instances to see similar
problems.

-Jan


Re: IPv6 default route flapping

2021-04-20 Thread Robert Elz
Date: Tue, 20 Apr 2021 12:54:57 -0400
From: Jan Schaumann
Message-ID:  <20210420165457.ge5...@netmeister.org>

  | You can grab a pcap file from here:
  | https://www.netmeister.org/tmp/ipv6.pcap

It looks to me as if the problem might be that NetBSD has removed too much
of RA processing from the kernel.

It seems as if what is happening, is that the router is sending RA's with
the source-link addr option, which isn't being added to the neighbour
cache.

Then NetBSD is doing a NS to discover the address it ignored from the RA,
but instead of replying with a ND as would perhaps be expected, the router
is simply sending another RA (containing the relevant addr info, which would
answer the NS if processed).

I'd suggest putting RA processing back into the kernel to avoid this kind
of issue.

I have another reason for wanting that ... I run a reasonably current HEAD
kernel (9.99.80 from mid Feb at the minute, though I can upgrade that as
soon as there's a reason) but a fairly old userland (8.99.12 from late Jan, 
2018)

Upgrading the kernel without upgrading userland is supposed to be supported.


My dhcpcd comes from that vintage (more recent than NetBSD 8 which is
still supported), which is before the RA processing was moved from kernel
to dhcpcd.

Things mostly work, except that if the router fails to deprecate an old
IPv6 addr (which can happen if the router reboots - it doesn't keep state
of that kind of thing) and gets handed a new IPv6 addr from the ISP - it
advertises the new one, and says nothing about the old.   In my system
currently nothing is aging out that old v6 addr, I need to "ifconfig -alias"
it manually.   Much the same happens if for some other reason (like the
cable being removed) my laptop goes off net at the time the ISP decides I
have been using my old addr long enough, and issues a new one.  When I'm
connected to the net when that happens, everything works, the old addr is
deprecated, and then removed (so the router is clearly sending the lifetime 0
RA for the old prefix) but if I am not on the net, and so miss that packet
(or packet sequence) the old addr remains live, seemingly forever (I've never
managed to be quite patient enough to wait for forever to arrive to test
this theory...)
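For comparison, SLAAC addresses are supposed to age out on their own: once the preferred lifetime from the last Prefix Information option runs out the address becomes deprecated, and once the valid lifetime runs out it is removed (RFC 4862, section 5.5.4). A minimal Python sketch of that classification, assuming lifetimes are counted from the last RA that refreshed them and using None for an infinite lifetime; it deliberately omits RFC 4862's two-hour rule for RAs that try to shorten a valid lifetime.

```python
from typing import Optional


def address_state(age: float, preferred: Optional[float], valid: Optional[float]) -> str:
    """Classify a SLAAC address `age` seconds after its lifetimes were last set.

    None stands for an infinite lifetime (0xffffffff on the wire).
    """
    if valid is not None and age >= valid:
        return "invalid"  # should be removed from the interface
    if preferred is not None and age >= preferred:
        return "deprecated"  # kept for existing connections, avoided for new ones
    return "preferred"


print(address_state(4000, preferred=3600, valid=7200))  # deprecated
print(address_state(10**9, preferred=None, valid=None))  # preferred
```

The last case is the catch: with infinite lifetimes nothing ever ages out unless a later RA changes them, so a missed lifetime-0 RA is never made up for.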

kre



Re: IPv6 default route flapping

2021-04-20 Thread Joerg Sonnenberger
On Wed, Apr 21, 2021 at 12:54:36AM +0700, Robert Elz wrote:
> It seems as if what is happening, is that the router is sending RA's with
> the source-link addr option, which isn't being added to the neighbour
> cache.
> 
> Then NetBSD is doing a NS to discover the address it ignored from the RA,
> but instead of replying with a ND as would perhaps be expected, the router
> is simply sending another RA (containing the relevant addr info, which would
> answer the NS if processed).

I'm not entirely sure that the behavior of sending an RA as "answer" to an
NS is valid under RFC 4861, but I also don't understand why the existing
code (nd6_rtr_cache) doesn't cover this. That would be a good place to
start debugging this.

> I'd suggest putting RA processing back into the kernel to avoid this kind
> of issue.

Except of course that it introduces back all the reasons for why it was
removed in the first place and ignores that it shouldn't happen.

> 
> I have another reason for wanting that ... I run a reasonably current HEAD
> kernel (9.99.80 from mid Feb at the minute, though I can upgrade that as
> soon as there's a reason) but a fairly old userland (8.99.12 from late Jan, 
> 2018)
> 
> Upgrading the kernel without upgrading userland is supposed to be supported.

Are you really arguing that non-essential functionality should be kept
in the kernel just because it was once present? That doesn't strike me
as very useful at all...

Joerg


Re: IPv6 default route flapping

2021-04-20 Thread Greg Troxel

Joerg Sonnenberger  writes:

> On Wed, Apr 21, 2021 at 12:54:36AM +0700, Robert Elz wrote:
>> It seems as if what is happening, is that the router is sending RA's with
>> the source-link addr option, which isn't being added to the neighbour
>> cache.
>> 
>> Then NetBSD is doing a NS to discover the address it ignored from the RA,
>> but instead of replying with a ND as would perhaps be expected, the router
>> is simply sending another RA (containing the relevant addr info, which would
>> answer the NS if processed).
>
> I'm not entirely sure that the behavior of sending a RA as "answer" to a
> NS is valid under RFC 4861

Agreed.   Not sending an ND reply to an ND query is obviously highly
irregular, even if it turns out to be allowed.




Re: IPv6 default route flapping

2021-04-20 Thread Robert Elz
Date: Tue, 20 Apr 2021 21:13:15 +0200
From: Joerg Sonnenberger
Message-ID:  

  | I'm not entirely sure that the behavior of sending a RA as "answer" to a
  | NS is valid under RFC 4861,

Nor am I, but it appears that is what is happening, and that other clients
don't mind (and probably old NetBSD wouldn't either), so it may end up being
something we need to support, just because otherwise it all looks like our
fault, even if whatever router is doing that is technically broken.

(That part of this reply also applies to Greg's message).

  | but I also don't understand why the existing
  | code (nd6_rtr_cache) doesn't cover this.

I didn't look yet, but I will look at it when I get a chance.

  | > I'd suggest putting RA processing back into the kernel to avoid this kind
  | > of issue.
  |
  | Except of course that it introduces back all the reasons for why it was
  | removed in first place and ignores that it shouldn't happen.

I'm not sure which "it" you're meaning here (the second one, the first
is obvious, but I doubt that the two are referring to the same thing).
Is the second "it" the RA in response to a NS?

Perhaps that's true, but this might become a de facto thing to do by
(at least some subset of) the router community - the RA conveys all the
same info an ND would, plus more, potentially reducing traffic, so strictly
valid or not, I can see how someone might decide that's a good approach to
take.

As for "all the reasons for why it was removed in first place" - I never
believed any of those to be valid.  I can believe that some fairly minor
work was required on the implementation, but nothing so difficult as to
require its removal.

Note that I'm not suggesting that processing using dhcpcd might not generally
be the better approach to use, but the kernel ought to be able to cope when
(for whatever reason) that's not running, or not doing its job.

  | Are you really arguing that non-essential functionality should be kept
  | in the kernel just because it was once present? That doesn't strike me
  | as very useful at all...

Of course, in many cases, yes.  There is tons of that already, much of it
controlled by COMPAT_xxx options - and this could be as well if anyone really
believes that an extra kernel option to eliminate this small piece of code
is really worth the effort.   But we have lots of support for ancient code
(we still support all the old sys calls and interfaces that think a time_t
is 32 bits for example, and the last NetBSD that had 32 bit time_t was eons
ago).

When something truly is useless (drivers for hardware no-one has and no-one
would want, even if still available to be obtained for example) then sure,
it can simply be removed, similarly support for protocol stacks that no-one
uses.   But keeping old (particularly versions that are still supported, but
really, all) userland running correctly with current kernels should be
considered mandatory.   No exceptions.

kre



Re: IPv6 default route flapping

2021-04-20 Thread Robert Elz
Date: Wed, 21 Apr 2021 03:48:00 +0700
From: Robert Elz
Message-ID:  <2704.1618951...@jinx.noi.kre.to>


  |   | Except of course that it introduces back all the reasons for why it was
  |   | removed in first place and ignores that it shouldn't happen.
  |
  | I'm not sure which "it" you're meaning here (the second one, the first
  | is obvious, but I doubt that the two are referring to the same thing).
  | Is the second "it" the RA in response to a NS?

Note that I can't count, there were 3 "it"s in that sentence, I missed
the first one so in my sentence replace "first" with "second" and "second"
with "third" ... the (real) first "it" was also clear (my request/desire)
which was replacing the second "it" (kernel RA processing), but the third
remains less certain.

kre



Re: IPv6 default route flapping

2021-04-20 Thread Jan Schaumann
Robert Elz  wrote:
 
> It seems as if what is happening, is that the router is sending RA's with
> the source-link addr option, which isn't being added to the neighbour
> cache.

Yes, it looks like that's what's going on here.

It seems that:

An RS is sent by the node.

The router replies with a RA, including the source
link-layer address option.

The node follows RFC4861:

"If there is no existing Neighbor Cache entry for the
solicitation's sender, the router creates one,
installs the link-layer address and sets its
reachability state to STALE as specified in Section
7.3.3."

So now we have a STALE cache entry.

So when we then want to send a packet, the node
changes the state to DELAY with a 5s timer and, if
nothing confirms reachability before that expires,
sends an NS.

The router appears to either ignore the NS or treat it
as an RS and, instead of replying with an NA, sends an
RA, again with the source link-layer address, which
restarts the cycle.

Now on Ubuntu, it looks like the node similarly puts
the link-layer address it learned via the RA into the
STALE state and moves it from STALE to DELAY, but then,
upon receiving the RA, changes it to REACHABLE without
sending an NS.

This appears to be in violation of RFC4861:

"Receipt of other Neighbor Discovery messages, such
as Router Advertisements and Neighbor Advertisement
with the Solicited flag set to zero, MUST NOT be
treated as a reachability confirmation."

(The RAs do _not_ have the Solicited flag set.)

So if this is correct, then it looks like (a) the
router is misbehaving (it should send an NA when we so
politely ask), and (b) at least Ubuntu is wrong in
accepting an unsolicited RA as a reachability
confirmation.
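The sequence above can be replayed against a simplified model of the RFC 4861 neighbor cache state machine (section 7.3.3 and Appendix C). A Python sketch using the RFC's default constants; it is not NetBSD's nd6 code, and the class, the method names and the `ra_confirms` switch (which mimics the Ubuntu behaviour described above) are assumptions made for illustration:

```python
from enum import Enum

# Default protocol constants from RFC 4861, section 10.
DELAY_FIRST_PROBE_TIME = 5.0  # seconds spent in DELAY before probing
RETRANS_TIMER = 1.0           # seconds between unicast probes
MAX_UNICAST_SOLICIT = 3       # unicast NS probes before giving up


class State(Enum):
    STALE = 1
    DELAY = 2
    PROBE = 3
    REACHABLE = 4
    GONE = 5  # entry discarded: neighbor considered unreachable


class NeighborEntry:
    """Simplified neighbor cache entry after RFC 4861, section 7.3.3.

    Omits INCOMPLETE, ReachableTime expiry, the Override flag and
    upper-layer hints. `ra_confirms` is a hypothetical, non-conforming switch.
    """

    def __init__(self, lladdr: str, ra_confirms: bool = False):
        # Created from an RA's Source Link-Layer Address option: STALE.
        self.lladdr = lladdr
        self.state = State.STALE
        self.deadline = None  # when the DELAY or PROBE timer fires
        self.probes = 0
        self.ra_confirms = ra_confirms

    def send_packet(self, now: float) -> None:
        if self.state is State.STALE:
            self.state = State.DELAY
            self.deadline = now + DELAY_FIRST_PROBE_TIME
            self.probes = 0

    def rx_ra(self, lladdr: str) -> None:
        if self.state is State.GONE or lladdr != self.lladdr:
            # No entry, or a different address: (re)create it as STALE.
            self.lladdr, self.state, self.deadline = lladdr, State.STALE, None
            self.probes = 0
        elif self.ra_confirms and self.state in (State.DELAY, State.PROBE):
            # RFC 4861 forbids this: an RA is not a reachability confirmation.
            self._confirm()
        # Otherwise the RFC leaves the state unchanged.

    def rx_na(self, solicited: bool) -> None:
        # Only a solicited NA confirms reachability; others are ignored here.
        if solicited and self.state is not State.GONE:
            self._confirm()

    def _confirm(self) -> None:
        self.state, self.deadline, self.probes = State.REACHABLE, None, 0

    def tick(self, now: float) -> int:
        """Fire due timers; return how many unicast NS probes were sent."""
        sent = 0
        while self.deadline is not None and now >= self.deadline:
            if self.probes < MAX_UNICAST_SOLICIT:
                # DELAY has expired, or the next PROBE retransmission is due.
                self.state = State.PROBE
                self.probes += 1
                sent += 1
                self.deadline += RETRANS_TIMER
            else:
                # All probes went unanswered: drop the entry.
                self.state, self.deadline = State.GONE, None
        return sent


e = NeighborEntry("0e:aa:49:af:18:15")  # created from the RA
e.send_packet(now=0.0)                  # STALE -> DELAY
e.rx_ra("0e:aa:49:af:18:15")            # same address: still DELAY
print(e.tick(now=8.0), e.state)         # 3 State.GONE
```

With the default `ra_confirms=False`, repeated RAs carrying the same link-layer address never rescue the entry: unless a solicited NA arrives, the three probes go unanswered, the entry is dropped, and the next RA creates a fresh STALE entry, which is one way a loop like the one described could keep repeating. With `ra_confirms=True`, the RA alone moves the entry to REACHABLE.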

Now the really strange thing is that on FreeBSD,
I notice that after the RA, it sends out an NS, and it
receives an NA from the router!  I can't make sense of
this.

Here are the three pcaps:
http://www.netmeister.org/tmp/ubuntu.pcap
http://www.netmeister.org/tmp/freebsd.pcap
http://www.netmeister.org/tmp/netbsd.pcap

All three on the same VPC talking to the same router.

Btw, if you want to replicate the setup and have an
AWS account, you can use ami-0018b2d98332ba7e3 (in
us-east-1), which is the AMI I'm using here.

-Jan


Re: IPv6 default route flapping

2021-04-20 Thread Jan Schaumann
Jan Schaumann  wrote:

> Btw, if you want to replicate the setup and have an
> AWS account, you can use ami-0018b2d98332ba7e3 (in
> us-east-1), which is the AMI I'm using here.

Oh, and if anybody here wants to access the actual
system, send me an ssh pubkey, and I can add you to an
instance that exhibits this problem.

-Jan


Re: IPv6 default route flapping

2021-04-21 Thread John Nemeth
On Apr 20, 21:13, Joerg Sonnenberger wrote:
} On Wed, Apr 21, 2021 at 12:54:36AM +0700, Robert Elz wrote:
} > It seems as if what is happening, is that the router is sending RA's with
} > the source-link addr option, which isn't being added to the neighbour
} > cache.
} > 
} > Then NetBSD is doing a NS to discover the address it ignored from the RA,
} > but instead of replying with a ND as would perhaps be expected, the router
} > is simply sending another RA (containing the relevant addr info, which would
} > answer the NS if processed).
} 
} I'm not entirely sure that the behavior of sending a RA as "answer" to a
} NS is valid under RFC 4861, but I also don't understand why the existing
} code (nd6_rtr_cache) doesn't cover this. That would be a good place to
} debugging this.
} 
} > I'd suggest putting RA processing back into the kernel to avoid this kind
} > of issue.
} 
} Except of course that it introduces back all the reasons for why it was
} removed in first place and ignores that it shouldn't happen.

 RA processing is fundamental to the operation of IPv6.  Removing
it was extremely stupid and not well thought out.  The decision
should be reverted.  I didn't have a problem with a sysctl to
disable it, but I have a huge problem with removing it completely.

} > I have another reason for wanting that ... I run a reasonably current HEAD
} > kernel (9.99.80 from mid Feb at the minute, though I can upgrade that as
} > soon as there's a reason) but a fairly old userland (8.99.12 from late Jan, 
} > 2018)
} > 
} > Upgrading the kernel without upgrading userland is supposed to be supported.
} 
} Are you really arguing that non-essential functionality should be kept

 It is essential functionality.  Arguing otherwise is just silly.

} in the kernel just because it was once present? That doesn't strike me
} as very useful at all...

 Also, we should not be forced to use one particular application
to get basic IPv6 functionality.  dhcpcd started out as a simple
DHCP client and then grew into being a full blown network manager
as well as encompassing half the IPv6 stack.

 If nothing else, consider the case of trying to rescue a broken
system.  You shouldn't have to run an over complicated program just
to enable basic network functionality (the day will come when you
will be on a network that doesn't run IPv4, or have useful services
on IPv4).

}-- End of excerpt from Joerg Sonnenberger


Re: IPv6 default route flapping

2021-04-21 Thread Joerg Sonnenberger
On Wed, Apr 21, 2021 at 02:21:51AM -0700, John Nemeth wrote:
> On Apr 20, 21:13, Joerg Sonnenberger wrote:
> } On Wed, Apr 21, 2021 at 12:54:36AM +0700, Robert Elz wrote:
> } > It seems as if what is happening, is that the router is sending RA's with
> } > the source-link addr option, which isn't being added to the neighbour
> } > cache.
> } > 
> } > Then NetBSD is doing a NS to discover the address it ignored from the RA,
> } > but instead of replying with a ND as would perhaps be expected, the router
> } > is simply sending another RA (containing the relevant addr info, which would
> } > answer the NS if processed).
> } 
> } I'm not entirely sure that the behavior of sending a RA as "answer" to a
> } NS is valid under RFC 4861, but I also don't understand why the existing
> } code (nd6_rtr_cache) doesn't cover this. That would be a good place to
> } debugging this.
> } 
> } > I'd suggest putting RA processing back into the kernel to avoid this kind
> } > of issue.
> } 
> } Except of course that it introduces back all the reasons for why it was
> } removed in first place and ignores that it shouldn't happen.
> 
>  RA processing is fundamental to the operation of IPv6.  Removing
> it was extremely stupid and not well thought out.  The decision
> should be reverted.  I didn't have a problem with a sysctl to
> disable it, but I have a huge problem with removing it completely.

RA processing is not time-sensitive. It doesn't require any special
interfaces. A correct and useful implementation requires non-trivial
policy decisions. There are a lot of reasons for why it should not be
part of the kernel and very few of why it should be. In fact, other
systems never implemented it in the kernel or have similarly moved it for
pretty much the same reason.

> } > I have another reason for wanting that ... I run a reasonably current HEAD
> } > kernel (9.99.80 from mid Feb at the minute, though I can upgrade that as
> } > soon as there's a reason) but a fairly old userland (8.99.12 from late Jan,
> } > 2018)
> } > 
> } > Upgrading the kernel without upgrading userland is supposed to be supported.
> } 
> } Are you really arguing that non-essential functionality should be kept
> 
>  It is essential functionality.  Arguing otherwise is just silly.

It is core functionality. It is not essential. It is not necessary for
link-local communication or when using static configuration. Again, that
doesn't say anything about whether it *has* to be in the kernel.

> } in the kernel just because it was once present? That doesn't strike me
> } as very useful at all...
> 
>  Also, we should not be forced to use one particular application
> to get basic IPv6 functionality.  dhcpcd started out as a simple
> DHCP client and then grew into being a full blown network manager
> as well as encompasing half the IPv6 stack.

Maybe because the problem domain is larger than "a DHCP(v4) client" and
IPv4 and IPv6 functionality are often coupled? Maybe because splitting
it artificially would result in duplication of code for little benefit?
Let's just ignore the silly exaggeration...

>  If nothing else, consider the case of trying to rescue a broken
> system.  You shouldn't have to run an over complicated program just
> to enable basic network functionality (the day will come when you
> will be on a network that doesn't run IPv4, or have useful services
> on IPv4).

You can configure the local network by hand via ifconfig or route, just
like you would for IPv4. Or you know, use the same tool that you would
use for DHCP. 

Joerg


Re: IPv6 default route flapping

2021-04-21 Thread Robert Swindells


Joerg Sonnenberger  wrote:
>On Wed, Apr 21, 2021 at 02:21:51AM -0700, John Nemeth wrote:
>> On Apr 20, 21:13, Joerg Sonnenberger wrote:
>> } On Wed, Apr 21, 2021 at 12:54:36AM +0700, Robert Elz wrote:
>> } > It seems as if what is happening, is that the router is sending RA's with
>> } > the source-link addr option, which isn't being added to the neighbour
>> } > cache.
>> } > 
>> } > Then NetBSD is doing a NS to discover the address it ignored from the RA,
>> } > but instead of replying with a ND as would perhaps be expected, the router
>> } > is simply sending another RA (containing the relevant addr info, which would
>> } > answer the NS if processed).
>> } 
>> } I'm not entirely sure that the behavior of sending a RA as "answer" to a
>> } NS is valid under RFC 4861, but I also don't understand why the existing
>> } code (nd6_rtr_cache) doesn't cover this. That would be a good place to
>> } debugging this.
>> } 
>> } > I'd suggest putting RA processing back into the kernel to avoid this kind
>> } > of issue.
>> } 
>> } Except of course that it introduces back all the reasons for why it was
>> } removed in first place and ignores that it shouldn't happen.
>> 
>>  RA processing is fundamental to the operation of IPv6.  Removing
>> it was extremely stupid and not well thought out.  The decision
>> should be reverted.  I didn't have a problem with a sysctl to
>> disable it, but I have a huge problem with removing it completely.
>
>RA processing is not time-sensitive. It doesn't require any special
>interfaces. A correct and useful implementation requires non-trivial
>policy decisions. There are a lot of reasons for why it should not be
>part of the kernel and very few of why it should be. In fact, other
>systems never implemented it in the kernel or have similary moved it for
>pretty much the same reason.

One subsystem that made use of RA processing being in the kernel is
MobileIPv6: it used notifications from the kernel to work out that a
node has moved. I did get a suggestion from roy@ on an alternative way
to do this, but haven't implemented it yet.

Think we need to make dhcpcd work with rump if it is the only way to do
RA processing.




Re: IPv6 default route flapping

2021-04-21 Thread Paul Goyette

On Wed, 21 Apr 2021, Robert Swindells wrote:


> Think we need to make dhcpcd work with rump if it is the only way to
> do RA processing.


And create an appropriate set of rump-based atf tests to detect future
regressions.




Re: IPv6 default route flapping

2021-04-21 Thread Jan Schaumann
Jan Schaumann  wrote:

> So if this is correct, then it looks like (a) the
> router is misbehaving (it should send a NA when we so
> politely ask)

Joerg found the culprit:

It turns out that the router apparently will only
reply with NAs to an NS that originates from a
predicted IPv6 address.  That is, it seems to be
configured to only reply to a SLAAC address derived
from the hardware address (which it knows), not to
the DUID-derived SLAAC address.

Changing 'slaac private' to 'slaac hwaddr' in
/etc/dhcpcd.conf did the trick -- now the router
replies with NAs and my IPv6 connection is stable.
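For reference, an address "derived from the hardware address" here means a modified EUI-64 interface identifier (RFC 4291, Appendix A): ff:fe is inserted in the middle of the 48-bit MAC and the universal/local bit (0x02) of the first octet is inverted; `slaac hwaddr` presumably builds addresses this way from the node's own MAC, which does not appear in the thread. A Python sketch, assuming a /64 prefix; the function name is illustrative. The router's own link-local address is a handy check: fe80::caa:49ff:feaf:1815 is exactly what this construction gives for its advertised link-layer address 0e:aa:49:af:18:15, whereas the node's fe80::7e12:b688:167c:785f has no ff:fe in the middle and so is not EUI-64 based.

```python
import ipaddress


def eui64_address(mac: str, prefix: str = "fe80::/64") -> str:
    """Combine a /64 prefix with a modified EUI-64 interface identifier."""
    octets = bytes.fromhex(mac.replace(":", ""))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Flip the universal/local bit and insert 0xfffe between the two halves.
    iid = bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:]
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("SLAAC interface identifiers assume a /64 prefix")
    return str(net.network_address + int.from_bytes(iid, "big"))


print(eui64_address("0e:aa:49:af:18:15"))  # fe80::caa:49ff:feaf:1815
```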

Thanks everybody helping me debug this!

-Jan