bgpd and indirect nexthops
Hi! I have ibgp setup where I receive number of routes with indirect nexthops. "nexthop qualify via bgp" is configured. I'm using snapshot from 31.11.2016 and current bgpd sources. When I start bgpd it usually comes up with either some or none of the indirect nexthops resolved: # bgpctl show nex Flags: * = nexthop valid Nexthop Route Prio Gateway Iface 10.150.0.10 * 10.150.0.12 10.150.0.0/2348 10.150.8.1 <~50 nexthops some working some not removed> * 10.150.8.1 10.150.8.0/24 4 connected vio0 (UP, active) "bgpctl reload" always fixes the problem and all nexthops are resolved to 10.150.0.0/23 -> 10.150.8.1 I have tried to find a way to fix it properly but so far I have managed to get it working with following definitely wrong patch. There seems to be race somewhere between nexthop insertion and validation. Any idea where should I look or what to check? vio0: flags=8843 mtu 1500 lladdr 55:54:00:0b:39:f1 index 1 priority 0 llprio 3 groups: core egress media: Ethernet autoselect status: active inet 10.150.8.5 netmask 0xff00 broadcast 10.150.8.255 config: AS 65001 router-id 10.150.8.5 nexthop qualify via bgp network inet connected neighbor 10.150.8.1 { remote-as 65001 } match from any prefix 0.0.0.0/0 prefixlen >= 1 set rtlabel INT match to any set localpref 90 Index: kroute.c === RCS file: /cvs/src/usr.sbin/bgpd/kroute.c,v retrieving revision 1.210 diff -u -p -r1.210 kroute.c --- kroute.c5 Oct 2016 07:38:06 - 1.210 +++ kroute.c3 Nov 2016 19:15:49 - @@ -897,6 +897,8 @@ kr_nexthop_add(u_int rtableid, struct bg return (-1); } + RB_FOREACH(h, knexthop_tree, KT2KNT(kt)) + knexthop_validate(kt, h); return (0); }
Re: SOCKET_LOCK looking for testers
On Thu, Nov 03, 2016 at 11:21:52AM +0100, Martin Pieuchot wrote: > Here's the next iteration of my diff introducing a rwlock to serialize > the network input path with socket paths. Changes are: This crashes with very similar symptoms to the ones I reported to you when I tested the netlock patch. It panics as soon as I scan the box with Nessus (running on another host). (This is hand-typed, since the loop doesn't end (ddb.panic=0 since the USB keyboard doesn't work in ddb).) [...] End of stack trace. panic: rw_enter: socketlock locking against myself Starting stack trace... panic() at panic+0x10b rw_enter() at rw_enter+0x1f8 if_downall() at if_downall+0x26 boot() at boot+0xe4 reboot() at reboot+0x30 panic() at panic+0x106 rw_enter() at rw_enter+0x1f8 if_downall() at if_downall+0x26 boot() at boot+0xe4 reboot() at reboot+0x30 panic() at panic+0x106 rw_enter() at rw_enter+0x1f8 if_downall() at if_downall+0x26 boot() at boot+0xe4 reboot() at reboot+0x30 panic() at panic+0x106 rw_enter() at rw_enter+0x1f8 if_downall() at if_downall+0x26 boot() at boot+0xe4 reboot() at reboot+0x30 panic() at panic+0x106 [...] panic() at panic+0x106 rw_enter() at rw_enter+0x1f8 soclose() at soclose+0x1e soo_close() at soo_close+0x1c fdrop() at fdrop+0x2c closef() at closef+0xcb doaccept() at doaccept+0x416 syscall() at syscall+0x27b --- syscall (number 30) --- end of kernel end trace frame: 0x4, count: 154 0x242f17f09aa: End of stack trace. panic: rw_enter: socketlock locking against myself [...] dmesg: OpenBSD 6.0-current (GENERIC.MP) #3: Thu Nov 3 11:40:49 CET 2016 bu...@build.tako.de:/usr/src/sys/arch/amd64/compile/GENERIC.MP real mem = 4272250880 (4074MB) avail mem = 4138213376 (3946MB) mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xbfec5000 (44 entries) bios0: vendor Apple Inc. version "IM81.88Z.00C1.B00.0802091538" date 02/09/08 bios0: Apple Inc. iMac8,1 acpi0 at bios0: rev 2 acpi0: sleep states S0 S3 S4 S5 acpi0: tables DSDT FACP HPET APIC MCFG ASF! SBST ECDT SSDT SSDT SSDT SSDT acpi0: wakeup devices FRWR(S3) ARPT(S3) GIGE(S3) UHC1(S3) UHC2(S3) UHC3(S3) UHC4(S3) UHC5(S3) EHC1(S3) EHC2(S3) EC__(S3) acpitimer0 at acpi0: 3579545 Hz, 24 bits acpihpet0 at acpi0: 14318179 Hz acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Core(TM)2 Duo CPU E8135 @ 2.40GHz, 2392.61 MHz cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,NXE,LONG,LAHF,PERF,SENSOR cpu0: 6MB 64b/line 16-way L2 cache cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges cpu0: apic clock running at 265MHz cpu0: mwait min=64, max=64, C-substates=0.2.2.2.2, IBE cpu1 at mainbus0: apid 1 (application processor) cpu1: Intel(R) Core(TM)2 Duo CPU E8135 @ 2.40GHz, 2392.26 MHz cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,NXE,LONG,LAHF,PERF,SENSOR cpu1: 6MB 64b/line 16-way L2 cache cpu1: smt 0, core 1, package 0 ioapic0 at mainbus0: apid 1 pa 0xfec0, version 20, 24 pins acpimcfg0 at acpi0 addr 0xf000, bus 0-255 acpiec0 at acpi0 acpiprt0 at acpi0: bus 0 (PCI0) acpiprt1 at acpi0: bus 1 (PEGP) acpiprt2 at acpi0: bus 3 (RP04) acpiprt3 at acpi0: bus 4 (RP05) acpiprt4 at acpi0: bus 5 (RP06) acpicpu0 at acpi0: !C3(100@57 mwait.3@0x31), !C2(500@1 mwait@0x10), C1(1000@1 mwait), PSS acpicpu1 at acpi0: !C3(100@57 mwait.3@0x31), !C2(500@1 mwait@0x10), C1(1000@1 mwait), PSS "APP0002" at acpi0 not configured acpibtn0 at acpi0: PWRB acpibtn1 at acpi0: SLPB "APP0001" at acpi0 not configured acpivideo0 at acpi0: GFX0 cpu0: Enhanced SpeedStep 2392 MHz: speeds: 2400, 2133, 1867, 1600, 800 MHz pci0 at mainbus0 bus 0 pchb0 at pci0 dev 0 function 0 "Intel GM965 Host" rev 0x03 ppb0 at pci0 dev 1 function 0 "Intel GM965 PCIE" rev 0x03: msi pci1 at ppb0 bus 1 radeondrm0 at pci1 dev 0 function 0 "ATI Mobility Radeon HD 2400 XT" rev 0x00 drm0 at radeondrm0 radeondrm0: msi uhci0 at pci0 dev 26 function 0 "Intel 82801H USB" rev 0x04: apic 1 int 20 uhci1 at pci0 dev 26 function 1 "Intel 82801H USB" rev 0x04: apic 1 int 16 ehci0 at pci0 dev 26 function 7 "Intel 82801H USB" rev 0x04: apic 1 int 21 usb0 at ehci0: USB revision 2.0 uhub0 at usb0 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 addr 1 azalia0 at pci0 dev 27 function 0 "Intel 82801H HD Audio" rev 0x04: msi azalia0: codecs: Realtek ALC885 audio0 at azalia0 ppb1 at pci0 dev 28 function 0 "Intel 82801H PCIE" rev 0x04: msi pci2 at ppb1 bus 2 ppb2 at pci0 dev 28 function 3 "Intel 82801H PCIE" rev 0x04: msi pci3 at ppb2 bus 3 "AT&T/Lucent FW643 1394" rev 0x06 at pci3 dev 0 function 0 not configured ppb3 at pci0 dev 28 function 4 "Intel 82801H PCIE" rev 0x04: msi pci4 at ppb3 bus 4 "Br
Re: simplify vxlan_lookup return value handling
On Wed, Oct 26, 2016 at 17:40 +0200, Mike Belopuhov wrote: > On Tue, Oct 25, 2016 at 21:22 +0200, Mike Belopuhov wrote: > > After my previous commit to the vxlan(4) driver it can > > no longer return -1 making this code path obsolete. > > OK to remove it? > > Reyk has pointed out that the error assignment can be omitted. > OK? > Still looking for OKs... > diff --git sys/netinet/udp_usrreq.c sys/netinet/udp_usrreq.c > index 60a7bdf..7ecbb9b 100644 > --- sys/netinet/udp_usrreq.c > +++ sys/netinet/udp_usrreq.c > @@ -388,17 +388,12 @@ udp_input(struct mbuf *m, ...) > #if NVXLAN > 0 > if (vxlan_enable > 0 && > #if NPF > 0 > !(m->m_pkthdr.pf.flags & PF_TAG_DIVERTED) && > #endif > - (error = vxlan_lookup(m, uh, iphlen, &srcsa.sa, &dstsa.sa)) != 0) { > - if (error == -1) { > - udpstat.udps_hdrops++; > - m_freem(m); > - } > + vxlan_lookup(m, uh, iphlen, &srcsa.sa, &dstsa.sa) != 0) > return; > - } > #endif > > if (m->m_flags & (M_BCAST|M_MCAST)) { > struct inpcb *last; > /*
SOCKET_LOCK looking for testers
Here's the next iteration of my diff introducing a rwlock to serialize the network input path with socket paths. Changes are: - more timeout_set_proc() that should fix problems reported by Chris Jackman. - I introduced a set of macro to make it easier to audit existing splsoftnet(). - It makes use of splassert_fail() if the lock is not held. My plan is to commit it, assuming it is stable enough, then fix the remaining issues in tree. This includes: - Analyze and if needed fix the two code paths were we do an unlock/lock dance - Remove unneeded/recursive splsoftnet() dances. Once that's done we should be able to remove the KERNEL_LOCK() from the input path. So please test and report back. diff --git sys/kern/sys_socket.c sys/kern/sys_socket.c index 7a90f78..a7be8a1 100644 --- sys/kern/sys_socket.c +++ sys/kern/sys_socket.c @@ -133,7 +133,7 @@ soo_poll(struct file *fp, int events, struct proc *p) int revents = 0; int s; - s = splsoftnet(); + SOCKET_LOCK(s); if (events & (POLLIN | POLLRDNORM)) { if (soreadable(so)) revents |= events & (POLLIN | POLLRDNORM); @@ -159,7 +159,7 @@ soo_poll(struct file *fp, int events, struct proc *p) so->so_snd.sb_flagsintr |= SB_SEL; } } - splx(s); + SOCKET_UNLOCK(s); return (revents); } diff --git sys/kern/uipc_socket.c sys/kern/uipc_socket.c index 9e8d05f..dd067b3 100644 --- sys/kern/uipc_socket.c +++ sys/kern/uipc_socket.c @@ -89,6 +89,11 @@ struct pool sosplice_pool; struct taskq *sosplice_taskq; #endif +/* + * Serialize socket operations. + */ +struct rwlock socketlock = RWLOCK_INITIALIZER("socketlock"); + void soinit(void) { @@ -123,7 +128,7 @@ socreate(int dom, struct socket **aso, int type, int proto) return (EPROTONOSUPPORT); if (prp->pr_type != type) return (EPROTOTYPE); - s = splsoftnet(); + SOCKET_LOCK(s); so = pool_get(&socket_pool, PR_WAITOK | PR_ZERO); TAILQ_INIT(&so->so_q0); TAILQ_INIT(&so->so_q); @@ -141,10 +146,10 @@ socreate(int dom, struct socket **aso, int type, int proto) if (error) { so->so_state |= SS_NOFDREF; sofree(so); - splx(s); + SOCKET_UNLOCK(s); return (error); } - splx(s); + SOCKET_UNLOCK(s); *aso = so; return (0); } @@ -154,9 +159,9 @@ sobind(struct socket *so, struct mbuf *nam, struct proc *p) { int s, error; - s = splsoftnet(); + SOCKET_LOCK(s); error = (*so->so_proto->pr_usrreq)(so, PRU_BIND, NULL, nam, NULL, p); - splx(s); + SOCKET_UNLOCK(s); return (error); } @@ -171,11 +176,11 @@ solisten(struct socket *so, int backlog) if (isspliced(so) || issplicedback(so)) return (EOPNOTSUPP); #endif /* SOCKET_SPLICE */ - s = splsoftnet(); + SOCKET_LOCK(s); error = (*so->so_proto->pr_usrreq)(so, PRU_LISTEN, NULL, NULL, NULL, curproc); if (error) { - splx(s); + SOCKET_UNLOCK(s); return (error); } if (TAILQ_FIRST(&so->so_q) == NULL) @@ -185,14 +190,14 @@ solisten(struct socket *so, int backlog) if (backlog < sominconn) backlog = sominconn; so->so_qlimit = backlog; - splx(s); + SOCKET_UNLOCK(s); return (0); } void sofree(struct socket *so) { - splsoftassert(IPL_SOFTNET); + SOCKET_ASSERT_LOCKED(); if (so->so_pcb || (so->so_state & SS_NOFDREF) == 0) return; @@ -232,7 +237,7 @@ soclose(struct socket *so) struct socket *so2; int s, error = 0; - s = splsoftnet(); + SOCKET_LOCK(s); if (so->so_options & SO_ACCEPTCONN) { while ((so2 = TAILQ_FIRST(&so->so_q0)) != NULL) { (void) soqremque(so2, 0); @@ -256,7 +261,7 @@ soclose(struct socket *so) (so->so_state & SS_NBIO)) goto drop; while (so->so_state & SS_ISCONNECTED) { - error = tsleep(&so->so_timeo, + error = rwsleep(&so->so_timeo, &socketlock, PSOCK | PCATCH, "netcls", so->so_linger * hz); if (error) @@ -276,14 +281,14 @@ discard: panic("soclose NOFDREF: so %p, so_type %d", so, so->so_type); so->so_state |= SS_NOFDREF; sofree(so); - splx(s); + SOCKET_UNLOCK(s); return (error); } int soabort(struct socket *so) { - splsoftassert(IPL_SOFTNET); + SOCKET_ASSERT_LOCKED(); return (*so->so_proto->pr_usrreq)(so, PRU_ABORT, NULL, NULL, NULL, curproc); @@ -294,7 +299
Re: Problems with rdomain and net/if.c v1.455
On Wed, Nov 02, 2016 at 05:44:14PM +0100, Martin Pieuchot wrote: > On 28/10/16(Fri) 16:27, Claudio Jeker wrote: > > On Fri, Oct 28, 2016 at 04:19:35PM +0200, Nils Frohberg wrote: > > > I currently cannot access the local IP of an interface on rdomain 1: > > > > > > Script started on Fri Oct 28 15:02:20 2016 > > > $ doas pfctl -d > > > pfctl: pf not enabled > > > $ doas ifconfig vether0 > > > vether0: no such interface > > > $ doas ifconfig vether0 rdomain 1 > > > $ doas ifconfig vether0 inet 192.168.42.2 > > > $ doas ifconfig vether0 > > > vether0: flags=8843 rdomain 1 mtu > > > 1500 > > > lladdr fe:e1:ba:d1:73:55 > > > index 7 priority 0 llprio 3 > > > groups: vether > > > media: Ethernet autoselect > > > status: active > > > inet 192.168.42.2 netmask 0xff00 broadcast 192.168.42.255 > > > $ route -T1 -n show -inet > > > Routing tables > > > > > > Internet: > > > DestinationGatewayFlags Refs Use Mtu Prio > > > Iface > > > 192.168.42/24 192.168.42.2 UCn00 - 4 > > > vether0 > > > 192.168.42.2 fe:e1:ba:d1:73:55 UHLl 00 - 1 > > > vether0 > > > 192.168.42.255 192.168.42.2 UHb00 - 1 > > > vether0 > > > $ ping -V1 192.168.42.2 > > > PING 192.168.42.2 (192.168.42.2): 56 data bytes > > > ^C > > > --- 192.168.42.2 ping statistics --- > > > 4 packets transmitted, 0 packets received, 100.0% packet loss > > > $ ^D > > > > > > Script done on Fri Oct 28 15:04:16 2016 > > > > > > Connecting to other boxes via routes on rdomain 1 works as usual > > > (not shown in the script output above). > > > > > > I tracked the problem to being caused by v1.455 of if.c. It seems > > > that ifp->if_rdomain == 0 when running the ping above. In this case, > > > m->m_pkthdr.ph_rtableid should be set back to 1 after the reset > > > though (as it was before m_resethdr(m)). Shouldn't ifp->if_rdomain > > > be equal to 1 in this case? If yes, there must be a bug earlier in > > > the code. > > > > > > The following diff fixes things for me, but I don't know if > > > ifp->if_rdomain isn't the one that needs to be fixed: > > > > Can you check the ifp->if_index. I bet it is pointing to the loopback > > interface lo0 and so you end up with the wrong rdomain id. > > This is a nasty consequence of using lo0 accross domains. > > Diff below should fix that by automagically creating a loopback > interface when a new routing domain is created. That mean loX will > now correspond to routing domain X. Thanks (to all) for looking into this. Your patch works for me: Script started on Thu Nov 3 08:30:47 2016 $ doas pfctl -d pfctl: pf not enabled $ ifconfig vether0 vether0: no such interface $ ifconfig lo1 lo1: no such interface $ doas ifconfig vether0 rdomain 1 $ ifconfig lo1 lo1: flags=8008 rdomain 1 mtu 32768 index 7 priority 0 llprio 3 groups: lo $ doas ifconfig vether0 inet 192.168.42.2 $ ifconfig vether0 vether0: flags=8843 rdomain 1 mtu 1500 lladdr fe:e1:ba:d0:20:89 index 6 priority 0 llprio 3 groups: vether media: Ethernet autoselect status: active inet 192.168.42.2 netmask 0xff00 broadcast 192.168.42.255 $ route -T1 -n show -inet Routing tables Internet: DestinationGatewayFlags Refs Use Mtu Prio Iface 192.168.42/24 192.168.42.2 UCn00 - 4 vether0 192.168.42.2 fe:e1:ba:d0:20:89 UHLl 00 - 1 vether0 192.168.42.255 192.168.42.2 UHb00 - 1 vether0 $ ping -V1 192.168.42.2 PING 192.168.42.2 (192.168.42.2): 56 data bytes 64 bytes from 192.168.42.2: icmp_seq=0 ttl=255 time=0.095 ms 64 bytes from 192.168.42.2: icmp_seq=1 ttl=255 time=0.087 ms 64 bytes from 192.168.42.2: icmp_seq=2 ttl=255 time=0.026 ms 64 bytes from 192.168.42.2: icmp_seq=3 ttl=255 time=0.069 ms ^C --- 192.168.42.2 ping statistics --- 4 packets transmitted, 4 packets received, 0.0% packet loss round-trip min/avg/max/std-dev = 0.026/0.069/0.095/0.027 ms $ ^D Script done on Thu Nov 3 08:34:03 2016 TCP connections also work (tested with a little nc -l/nc dance). > > Not that if you are currently using a lo2 and a rdomain 2 this diff > will break your rdomain setup. > > I'll write the manual page for rtable_loindex(9) if you're ok with > the approach. > > Index: kern/init_main.c > === > RCS file: /cvs/src/sys/kern/init_main.c,v > retrieving revision 1.261 > diff -u -p -r1.261 init_main.c > --- kern/init_main.c 24 Oct 2016 04:38:44 - 1.261 > +++ kern/init_main.c 2 Nov 2016 16:05:43 - > @@ -388,6 +388,9 @@ main(void *framep) > msginit(); > #endif > > + /* Create default routing table before attaching lo0. */ > + rtable_init(); > + > /* Attach ps