bgpd and indirect nexthops

2016-11-03 Thread Rivo Nurges
Hi!

I have ibgp setup where I receive number of routes with indirect nexthops.
"nexthop qualify via bgp" is configured. I'm using snapshot from 31.11.2016 and 
current bgpd sources.

When I start bgpd it usually comes up with either some or none of the indirect 
nexthops resolved:
# bgpctl show nex
Flags: * = nexthop valid

  Nexthop Route  Prio Gateway Iface
  10.150.0.10
* 10.150.0.12 10.150.0.0/2348 10.150.8.1
<~50 nexthops some working some not removed>
* 10.150.8.1  10.150.8.0/24 4 connected   vio0 (UP, active)

"bgpctl reload" always fixes the problem and all nexthops are resolved to 
10.150.0.0/23 -> 10.150.8.1

I have tried to find a way to fix it properly but so far I have managed to get 
it working with following definitely wrong patch.
There seems to be race somewhere between nexthop insertion and validation.
Any idea where should I look or what to check?

vio0: flags=8843 mtu 1500
lladdr 55:54:00:0b:39:f1
index 1 priority 0 llprio 3
groups: core egress
media: Ethernet autoselect
status: active
inet 10.150.8.5 netmask 0xff00 broadcast 10.150.8.255

config:
AS 65001
router-id 10.150.8.5
nexthop qualify via bgp
network inet connected
neighbor 10.150.8.1 {
  remote-as 65001
}
match from any prefix 0.0.0.0/0 prefixlen >= 1 set rtlabel INT
match to any set localpref 90

Index: kroute.c
===
RCS file: /cvs/src/usr.sbin/bgpd/kroute.c,v
retrieving revision 1.210
diff -u -p -r1.210 kroute.c
--- kroute.c5 Oct 2016 07:38:06 -   1.210
+++ kroute.c3 Nov 2016 19:15:49 -
@@ -897,6 +897,8 @@ kr_nexthop_add(u_int rtableid, struct bg
return (-1);
}

+   RB_FOREACH(h, knexthop_tree, KT2KNT(kt))
+   knexthop_validate(kt, h);
return (0);
}



Re: SOCKET_LOCK looking for testers

2016-11-03 Thread Nils Frohberg
On Thu, Nov 03, 2016 at 11:21:52AM +0100, Martin Pieuchot wrote:
> Here's the next iteration of my diff introducing a rwlock to serialize
> the network input path with socket paths.  Changes are:

This crashes with very similar symptoms to the ones I reported to
you when I tested the netlock patch. It panics as soon as I scan
the box with Nessus (running on another host).

(This is hand-typed, since the loop doesn't end (ddb.panic=0 since
the USB keyboard doesn't work in ddb).)

[...]
End of stack trace.
panic: rw_enter: socketlock locking against myself
Starting stack trace...
panic() at panic+0x10b
rw_enter() at rw_enter+0x1f8
if_downall() at if_downall+0x26
boot() at boot+0xe4
reboot() at reboot+0x30
panic() at panic+0x106
rw_enter() at rw_enter+0x1f8
if_downall() at if_downall+0x26
boot() at boot+0xe4
reboot() at reboot+0x30
panic() at panic+0x106
rw_enter() at rw_enter+0x1f8
if_downall() at if_downall+0x26
boot() at boot+0xe4
reboot() at reboot+0x30
panic() at panic+0x106
rw_enter() at rw_enter+0x1f8
if_downall() at if_downall+0x26
boot() at boot+0xe4
reboot() at reboot+0x30
panic() at panic+0x106
[...]
panic() at panic+0x106
rw_enter() at rw_enter+0x1f8
soclose() at soclose+0x1e
soo_close() at soo_close+0x1c
fdrop() at fdrop+0x2c
closef() at closef+0xcb
doaccept() at doaccept+0x416
syscall() at syscall+0x27b
--- syscall (number 30) ---
end of kernel
end trace frame: 0x4, count: 154
0x242f17f09aa:
End of stack trace.
panic: rw_enter: socketlock locking against myself
[...]


dmesg:

OpenBSD 6.0-current (GENERIC.MP) #3: Thu Nov  3 11:40:49 CET 2016
bu...@build.tako.de:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4272250880 (4074MB)
avail mem = 4138213376 (3946MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xbfec5000 (44 entries)
bios0: vendor Apple Inc. version "IM81.88Z.00C1.B00.0802091538" date 02/09/08
bios0: Apple Inc. iMac8,1
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP HPET APIC MCFG ASF! SBST ECDT SSDT SSDT SSDT SSDT
acpi0: wakeup devices FRWR(S3) ARPT(S3) GIGE(S3) UHC1(S3) UHC2(S3) UHC3(S3) 
UHC4(S3) UHC5(S3) EHC1(S3) EHC2(S3) EC__(S3)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM)2 Duo CPU E8135 @ 2.40GHz, 2392.61 MHz
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,NXE,LONG,LAHF,PERF,SENSOR
cpu0: 6MB 64b/line 16-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 265MHz
cpu0: mwait min=64, max=64, C-substates=0.2.2.2.2, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Core(TM)2 Duo CPU E8135 @ 2.40GHz, 2392.26 MHz
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,NXE,LONG,LAHF,PERF,SENSOR
cpu1: 6MB 64b/line 16-way L2 cache
cpu1: smt 0, core 1, package 0
ioapic0 at mainbus0: apid 1 pa 0xfec0, version 20, 24 pins
acpimcfg0 at acpi0 addr 0xf000, bus 0-255
acpiec0 at acpi0
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 1 (PEGP)
acpiprt2 at acpi0: bus 3 (RP04)
acpiprt3 at acpi0: bus 4 (RP05)
acpiprt4 at acpi0: bus 5 (RP06)
acpicpu0 at acpi0: !C3(100@57 mwait.3@0x31), !C2(500@1 mwait@0x10), C1(1000@1 
mwait), PSS
acpicpu1 at acpi0: !C3(100@57 mwait.3@0x31), !C2(500@1 mwait@0x10), C1(1000@1 
mwait), PSS
"APP0002" at acpi0 not configured
acpibtn0 at acpi0: PWRB
acpibtn1 at acpi0: SLPB
"APP0001" at acpi0 not configured
acpivideo0 at acpi0: GFX0
cpu0: Enhanced SpeedStep 2392 MHz: speeds: 2400, 2133, 1867, 1600, 800 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel GM965 Host" rev 0x03
ppb0 at pci0 dev 1 function 0 "Intel GM965 PCIE" rev 0x03: msi
pci1 at ppb0 bus 1
radeondrm0 at pci1 dev 0 function 0 "ATI Mobility Radeon HD 2400 XT" rev 0x00
drm0 at radeondrm0
radeondrm0: msi
uhci0 at pci0 dev 26 function 0 "Intel 82801H USB" rev 0x04: apic 1 int 20
uhci1 at pci0 dev 26 function 1 "Intel 82801H USB" rev 0x04: apic 1 int 16
ehci0 at pci0 dev 26 function 7 "Intel 82801H USB" rev 0x04: apic 1 int 21
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 
addr 1
azalia0 at pci0 dev 27 function 0 "Intel 82801H HD Audio" rev 0x04: msi
azalia0: codecs: Realtek ALC885
audio0 at azalia0
ppb1 at pci0 dev 28 function 0 "Intel 82801H PCIE" rev 0x04: msi
pci2 at ppb1 bus 2
ppb2 at pci0 dev 28 function 3 "Intel 82801H PCIE" rev 0x04: msi
pci3 at ppb2 bus 3
"AT&T/Lucent FW643 1394" rev 0x06 at pci3 dev 0 function 0 not configured
ppb3 at pci0 dev 28 function 4 "Intel 82801H PCIE" rev 0x04: msi
pci4 at ppb3 bus 4
"Br

Re: simplify vxlan_lookup return value handling

2016-11-03 Thread Mike Belopuhov
On Wed, Oct 26, 2016 at 17:40 +0200, Mike Belopuhov wrote:
> On Tue, Oct 25, 2016 at 21:22 +0200, Mike Belopuhov wrote:
> > After my previous commit to the vxlan(4) driver it can
> > no longer return -1 making this code path obsolete.
> > OK to remove it?
> 
> Reyk has pointed out that the error assignment can be omitted.
> OK?
>

Still looking for OKs...

> diff --git sys/netinet/udp_usrreq.c sys/netinet/udp_usrreq.c
> index 60a7bdf..7ecbb9b 100644
> --- sys/netinet/udp_usrreq.c
> +++ sys/netinet/udp_usrreq.c
> @@ -388,17 +388,12 @@ udp_input(struct mbuf *m, ...)
>  #if NVXLAN > 0
>   if (vxlan_enable > 0 &&
>  #if NPF > 0
>   !(m->m_pkthdr.pf.flags & PF_TAG_DIVERTED) &&
>  #endif
> - (error = vxlan_lookup(m, uh, iphlen, &srcsa.sa, &dstsa.sa)) != 0) {
> - if (error == -1) {
> - udpstat.udps_hdrops++;
> - m_freem(m);
> - }
> + vxlan_lookup(m, uh, iphlen, &srcsa.sa, &dstsa.sa) != 0)
>   return;
> - }
>  #endif
>  
>   if (m->m_flags & (M_BCAST|M_MCAST)) {
>   struct inpcb *last;
>   /*



SOCKET_LOCK looking for testers

2016-11-03 Thread Martin Pieuchot
Here's the next iteration of my diff introducing a rwlock to serialize
the network input path with socket paths.  Changes are:

  - more timeout_set_proc() that should fix problems reported by
Chris Jackman.

  - I introduced a set of macro to make it easier to audit existing
splsoftnet().

  - It makes use of splassert_fail() if the lock is not held.


My plan is to commit it, assuming it is stable enough, then fix the
remaining issues in tree.  This includes:

  - Analyze and if needed fix the two code paths were we do an unlock/lock
dance

  - Remove unneeded/recursive splsoftnet() dances.

Once that's done we should be able to remove the KERNEL_LOCK() from the
input path.

So please test and report back.

diff --git sys/kern/sys_socket.c sys/kern/sys_socket.c
index 7a90f78..a7be8a1 100644
--- sys/kern/sys_socket.c
+++ sys/kern/sys_socket.c
@@ -133,7 +133,7 @@ soo_poll(struct file *fp, int events, struct proc *p)
int revents = 0;
int s;
 
-   s = splsoftnet();
+   SOCKET_LOCK(s);
if (events & (POLLIN | POLLRDNORM)) {
if (soreadable(so))
revents |= events & (POLLIN | POLLRDNORM);
@@ -159,7 +159,7 @@ soo_poll(struct file *fp, int events, struct proc *p)
so->so_snd.sb_flagsintr |= SB_SEL;
}
}
-   splx(s);
+   SOCKET_UNLOCK(s);
return (revents);
 }
 
diff --git sys/kern/uipc_socket.c sys/kern/uipc_socket.c
index 9e8d05f..dd067b3 100644
--- sys/kern/uipc_socket.c
+++ sys/kern/uipc_socket.c
@@ -89,6 +89,11 @@ struct pool sosplice_pool;
 struct taskq *sosplice_taskq;
 #endif
 
+/*
+ * Serialize socket operations.
+ */
+struct rwlock socketlock = RWLOCK_INITIALIZER("socketlock");
+
 void
 soinit(void)
 {
@@ -123,7 +128,7 @@ socreate(int dom, struct socket **aso, int type, int proto)
return (EPROTONOSUPPORT);
if (prp->pr_type != type)
return (EPROTOTYPE);
-   s = splsoftnet();
+   SOCKET_LOCK(s);
so = pool_get(&socket_pool, PR_WAITOK | PR_ZERO);
TAILQ_INIT(&so->so_q0);
TAILQ_INIT(&so->so_q);
@@ -141,10 +146,10 @@ socreate(int dom, struct socket **aso, int type, int 
proto)
if (error) {
so->so_state |= SS_NOFDREF;
sofree(so);
-   splx(s);
+   SOCKET_UNLOCK(s);
return (error);
}
-   splx(s);
+   SOCKET_UNLOCK(s);
*aso = so;
return (0);
 }
@@ -154,9 +159,9 @@ sobind(struct socket *so, struct mbuf *nam, struct proc *p)
 {
int s, error;
 
-   s = splsoftnet();
+   SOCKET_LOCK(s);
error = (*so->so_proto->pr_usrreq)(so, PRU_BIND, NULL, nam, NULL, p);
-   splx(s);
+   SOCKET_UNLOCK(s);
return (error);
 }
 
@@ -171,11 +176,11 @@ solisten(struct socket *so, int backlog)
if (isspliced(so) || issplicedback(so))
return (EOPNOTSUPP);
 #endif /* SOCKET_SPLICE */
-   s = splsoftnet();
+   SOCKET_LOCK(s);
error = (*so->so_proto->pr_usrreq)(so, PRU_LISTEN, NULL, NULL, NULL,
curproc);
if (error) {
-   splx(s);
+   SOCKET_UNLOCK(s);
return (error);
}
if (TAILQ_FIRST(&so->so_q) == NULL)
@@ -185,14 +190,14 @@ solisten(struct socket *so, int backlog)
if (backlog < sominconn)
backlog = sominconn;
so->so_qlimit = backlog;
-   splx(s);
+   SOCKET_UNLOCK(s);
return (0);
 }
 
 void
 sofree(struct socket *so)
 {
-   splsoftassert(IPL_SOFTNET);
+   SOCKET_ASSERT_LOCKED();
 
if (so->so_pcb || (so->so_state & SS_NOFDREF) == 0)
return;
@@ -232,7 +237,7 @@ soclose(struct socket *so)
struct socket *so2;
int s, error = 0;
 
-   s = splsoftnet();
+   SOCKET_LOCK(s);
if (so->so_options & SO_ACCEPTCONN) {
while ((so2 = TAILQ_FIRST(&so->so_q0)) != NULL) {
(void) soqremque(so2, 0);
@@ -256,7 +261,7 @@ soclose(struct socket *so)
(so->so_state & SS_NBIO))
goto drop;
while (so->so_state & SS_ISCONNECTED) {
-   error = tsleep(&so->so_timeo,
+   error = rwsleep(&so->so_timeo, &socketlock,
PSOCK | PCATCH, "netcls",
so->so_linger * hz);
if (error)
@@ -276,14 +281,14 @@ discard:
panic("soclose NOFDREF: so %p, so_type %d", so, so->so_type);
so->so_state |= SS_NOFDREF;
sofree(so);
-   splx(s);
+   SOCKET_UNLOCK(s);
return (error);
 }
 
 int
 soabort(struct socket *so)
 {
-   splsoftassert(IPL_SOFTNET);
+   SOCKET_ASSERT_LOCKED();
 
return (*so->so_proto->pr_usrreq)(so, PRU_ABORT, NULL, NULL, NULL,
   curproc);
@@ -294,7 +299

Re: Problems with rdomain and net/if.c v1.455

2016-11-03 Thread Nils Frohberg
On Wed, Nov 02, 2016 at 05:44:14PM +0100, Martin Pieuchot wrote:
> On 28/10/16(Fri) 16:27, Claudio Jeker wrote:
> > On Fri, Oct 28, 2016 at 04:19:35PM +0200, Nils Frohberg wrote:
> > > I currently cannot access the local IP of an interface on rdomain 1:
> > > 
> > > Script started on Fri Oct 28 15:02:20 2016
> > > $ doas pfctl -d
> > > pfctl: pf not enabled  
> > > $ doas ifconfig vether0
> > > vether0: no such interface  
> > > $ doas ifconfig vether0 rdomain 1
> > > $ doas ifconfig vether0 inet 192.168.42.2
> > > $ doas ifconfig vether0
> > > vether0: flags=8843 rdomain 1 mtu 
> > > 1500  
> > > lladdr fe:e1:ba:d1:73:55  
> > > index 7 priority 0 llprio 3  
> > > groups: vether  
> > > media: Ethernet autoselect  
> > > status: active  
> > > inet 192.168.42.2 netmask 0xff00 broadcast 192.168.42.255  
> > > $ route -T1 -n show -inet
> > > Routing tables  
> > > 
> > > Internet:  
> > > DestinationGatewayFlags   Refs  Use   Mtu  Prio 
> > > Iface  
> > > 192.168.42/24  192.168.42.2   UCn00 - 4 
> > > vether0 
> > > 192.168.42.2   fe:e1:ba:d1:73:55  UHLl   00 - 1 
> > > vether0
> > > 192.168.42.255 192.168.42.2   UHb00 - 1 
> > > vether0 
> > > $ ping -V1 192.168.42.2
> > > PING 192.168.42.2 (192.168.42.2): 56 data bytes  
> > > ^C  
> > > --- 192.168.42.2 ping statistics ---  
> > > 4 packets transmitted, 0 packets received, 100.0% packet loss  
> > > $ ^D
> > > 
> > > Script done on Fri Oct 28 15:04:16 2016
> > > 
> > > Connecting to other boxes via routes on rdomain 1 works as usual
> > > (not shown in the script output above).
> > > 
> > > I tracked the problem to being caused by v1.455 of if.c. It seems
> > > that ifp->if_rdomain == 0 when running the ping above. In this case,
> > > m->m_pkthdr.ph_rtableid should be set back to 1 after the reset
> > > though (as it was before m_resethdr(m)). Shouldn't ifp->if_rdomain
> > > be equal to 1 in this case? If yes, there must be a bug earlier in
> > > the code.
> > > 
> > > The following diff fixes things for me, but I don't know if
> > > ifp->if_rdomain isn't the one that needs to be fixed:
> > 
> > Can you check the ifp->if_index. I bet it is pointing to the loopback
> > interface lo0 and so you end up with the wrong rdomain id.
> > This is a nasty consequence of using lo0 accross domains.
> 
> Diff below should fix that by automagically creating a loopback
> interface when a new routing domain is created.  That mean loX will
> now correspond to routing domain X.

Thanks (to all) for looking into this. Your patch works for me:

Script started on Thu Nov  3 08:30:47 2016
$ doas pfctl -d
pfctl: pf not enabled
$ ifconfig vether0
vether0: no such interface
$ ifconfig lo1
lo1: no such interface
$ doas ifconfig vether0 rdomain 1
$ ifconfig lo1
lo1: flags=8008 rdomain 1 mtu 32768
index 7 priority 0 llprio 3
groups: lo
$ doas ifconfig vether0 inet 192.168.42.2
$ ifconfig vether0
vether0: flags=8843 rdomain 1 mtu 1500
lladdr fe:e1:ba:d0:20:89
index 6 priority 0 llprio 3
groups: vether
media: Ethernet autoselect
status: active
inet 192.168.42.2 netmask 0xff00 broadcast 192.168.42.255
$ route -T1 -n show -inet
Routing tables

Internet:
DestinationGatewayFlags   Refs  Use   Mtu  Prio Iface
192.168.42/24  192.168.42.2   UCn00 - 4 vether0
192.168.42.2   fe:e1:ba:d0:20:89  UHLl   00 - 1 vether0
192.168.42.255 192.168.42.2   UHb00 - 1 vether0
$ ping -V1 192.168.42.2
PING 192.168.42.2 (192.168.42.2): 56 data bytes
64 bytes from 192.168.42.2: icmp_seq=0 ttl=255 time=0.095 ms
64 bytes from 192.168.42.2: icmp_seq=1 ttl=255 time=0.087 ms
64 bytes from 192.168.42.2: icmp_seq=2 ttl=255 time=0.026 ms
64 bytes from 192.168.42.2: icmp_seq=3 ttl=255 time=0.069 ms
^C
--- 192.168.42.2 ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 0.026/0.069/0.095/0.027 ms
$ ^D

Script done on Thu Nov  3 08:34:03 2016

TCP connections also work (tested with a little nc -l/nc dance).

> 
> Not that if you are currently using a lo2 and a rdomain 2 this diff
> will break your rdomain setup.
> 
> I'll write the manual page for rtable_loindex(9) if you're ok with
> the approach.
> 
> Index: kern/init_main.c
> ===
> RCS file: /cvs/src/sys/kern/init_main.c,v
> retrieving revision 1.261
> diff -u -p -r1.261 init_main.c
> --- kern/init_main.c  24 Oct 2016 04:38:44 -  1.261
> +++ kern/init_main.c  2 Nov 2016 16:05:43 -
> @@ -388,6 +388,9 @@ main(void *framep)
>   msginit();
>  #endif
>  
> + /* Create default routing table before attaching lo0. */
> + rtable_init();
> +
>   /* Attach ps