I'm currently on vacation and can't look into this soon. One thing that comes to mind: do these machines keep proper time, or are their timer interrupts stopping because of a too-new KVM version and a missing hypervisor flag? (Could someone with access to a real computer please chip in with a link to the thread where this has been discussed before, and the name of the KVM flag?)
I fixed a bug in this area in the summer and you would observe this kind of behavior if timers are not running correctly.

Thanks,
Florian

On October 23, 2018 6:18:19 PM GMT+02:00, "Aaron A. Glenn" <aag@bsd.network> wrote:
>Hello,
>
>AS57335 operates a 49 node anycast instance exclusively running OpenBSD.
>All instances are hosted virtual machines (aka "VPS" instances) and all
>are running a recent snapshot (kern.version=OpenBSD 6.4-current
>(GENERIC.MP) #381: Mon Oct 22 22:18:48 MDT 2018). Eleven of these nodes
>exhibit strange ndp(8) behavior causing IPv6 BGP sessions to flap at
>inconsistent intervals.
>
>All eleven instances have the following in common:
>
>  vio(4) network interface
>  netmask of /64
>  do not use autoconf
>  Linux KVM hypervisor hosts (hw.vendor=QEMU & pvbus0 at mainbus0: KVM)
>  kern.timecounter.hardware=acpihpet0
>  v6 gateway is Cisco (based on OUI lookup)
>  no pf(4) rules
>
>BGP session traffic is the only regular/recurring v6 traffic on the
>nodes. Running a `ping6 google.com` in the background will occasionally
>allow BGP sessions to stay alive for 6-12 hours (in some cases, one to
>two days).
>
>From looking at `ndp -nA 1` output, the gateway address state will
>change to Delay with an expiry of ~45 seconds then set to Stale with an
>expiry of 24h. When set Stale with 24h expiry, a link-local address with
>the gateway linklayer in it (ex. fe80::e25f:b9ff:fed1:527f%vio0) is
>added with a state of Delay and an expiry of 5 seconds. Once expiry
>reaches 1 second remaining, the link-local entry begins three attempts
>at Probe, and at the first attempt the gateway address expiry goes from
>23h59m55s to 5s. After three Probe attempts, the link-local entry is
>removed, and the gateway address expiry goes to 45s or sometimes a bit
>less (38s is the lowest I've caught).
>
>I admit of all the RFCs I've read, NDP is not any of them; nor have I
>gone spelunking in the code base at all (I peeked once; and would need a
>buddy to have a useful look again).
>
>I am happy to add a pubkey to any and all systems exhibiting this
>behavior; and of course provide any additional detail that might be
>useful.
>
>Thanks
>
>(please cc me as I am not subscribed to this list)
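The Stale, Delay and Probe states in the report come from the Neighbor Unreachability Detection state machine in RFC 4861. In that machine, Reachable to Stale, Delay to Probe, and Probe retransmission/removal are all driven by timers, which is why guest timers that are not firing correctly, as suggested above, could plausibly distort what `ndp -nA 1` shows. The following is a minimal Python sketch of that state machine under stated assumptions, not OpenBSD's nd6 code. It uses the RFC 4861 default constants. The 24-hour lifetime for Stale entries is an assumption modeled on the 24h expiry in the ndp output above. The class, method and constant names are hypothetical.

```python
from enum import Enum

# RFC 4861 section 10 defaults (seconds / counts).
DELAY_FIRST_PROBE_TIME = 5
MAX_UNICAST_SOLICIT = 3
RETRANS_TIMER = 1
REACHABLE_TIME = 30  # real stacks randomize ReachableTime around this base
# Assumption: Stale entries are garbage-collected after 24h (matches the ndp output above).
STALE_GC_TIME = 24 * 60 * 60


class State(Enum):
    REACHABLE = "reachable"
    STALE = "stale"
    DELAY = "delay"
    PROBE = "probe"
    GONE = "gone"


class NeighborEntry:
    """Hypothetical toy neighbor cache entry driven by an explicit clock."""

    def __init__(self):
        self.state = State.REACHABLE
        self.expiry = REACHABLE_TIME
        self.probes_sent = 0

    def confirm(self):
        # A solicited Neighbor Advertisement confirms reachability.
        self.state = State.REACHABLE
        self.expiry = REACHABLE_TIME
        self.probes_sent = 0

    def send_packet(self):
        # Sending to a Stale neighbor starts the Delay timer.
        if self.state is State.STALE:
            self.state = State.DELAY
            self.expiry = DELAY_FIRST_PROBE_TIME

    def tick(self, seconds):
        """Advance the clock one second at a time; without ticks nothing expires."""
        for _ in range(seconds):
            if self.state is State.GONE:
                return
            self.expiry -= 1
            if self.expiry > 0:
                continue
            if self.state is State.REACHABLE:
                self.state, self.expiry = State.STALE, STALE_GC_TIME
            elif self.state is State.STALE:
                self.state = State.GONE
            elif self.state is State.DELAY:
                # First unicast Neighbor Solicitation goes out.
                self.state, self.expiry = State.PROBE, RETRANS_TIMER
                self.probes_sent = 1
            elif self.state is State.PROBE:
                if self.probes_sent < MAX_UNICAST_SOLICIT:
                    self.probes_sent += 1
                    self.expiry = RETRANS_TIMER
                else:
                    # No answer after MAX_UNICAST_SOLICIT probes: drop the entry.
                    self.state = State.GONE
```

Calling `tick(30)`, then `send_packet()`, then `tick(8)` ends with the entry removed: five seconds in Delay followed by three one-second probes. If no confirmation arrives in that window, the next packet to the gateway has to start resolution from scratch.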