ospfd seq out of order in ls_upd floods

2021-06-05 Thread Stuart Henderson
Sometimes I see authentication errors from ospfd, mainly (though
possibly not always) on a 30-minute cycle, e.g. these log entries:

2021-06-03T05:30:04.952Z  ospfd[31748]: spf_calc: area 0.0.0.0 calculated
2021-06-03T05:51:43.785Z  ospfd[76044]: auth_validate: decreasing seq num, interface vlan760
2021-06-03T05:51:43.785Z  ospfd[76044]: recv_packet: authentication error, interface vlan760
2021-06-03T05:56:03.248Z  ospfd[76044]: auth_validate: decreasing seq num, interface vlan760
2021-06-03T05:56:03.248Z  ospfd[76044]: recv_packet: authentication error, interface vlan760
2021-06-03T05:59:58.978Z  ospfd[31748]: spf_calc: area 0.0.0.0 calculated
2021-06-03T08:21:43.851Z  ospfd[76044]: auth_validate: decreasing seq num, interface vlan760
2021-06-03T08:21:43.851Z  ospfd[76044]: recv_packet: authentication error, interface vlan760
2021-06-03T08:26:03.434Z  ospfd[76044]: auth_validate: decreasing seq num, interface vlan760
2021-06-03T08:26:03.434Z  ospfd[76044]: recv_packet: authentication error, interface vlan760
2021-06-03T08:29:59.087Z  ospfd[31748]: spf_calc: area 0.0.0.0 calculated
2021-06-03T08:30:04.097Z  ospfd[31748]: spf_calc: area 0.0.0.0 calculated
2021-06-03T08:51:43.858Z  ospfd[76044]: auth_validate: decreasing seq num, interface vlan760
2021-06-03T08:51:43.858Z  ospfd[76044]: recv_packet: authentication error, interface vlan760
2021-06-03T08:56:03.470Z  ospfd[76044]: auth_validate: decreasing seq num, interface vlan760
2021-06-03T08:56:03.470Z  ospfd[76044]: recv_packet: authentication error, interface vlan760
2021-06-03T09:00:01.033Z  ospfd[31748]: spf_calc: area 0.0.0.0 calculated
2021-06-03T09:00:06.043Z  ospfd[31748]: spf_calc: area 0.0.0.0 calculated
2021-06-03T09:21:43.876Z  ospfd[76044]: auth_validate: decreasing seq num, interface vlan760
2021-06-03T09:21:43.876Z  ospfd[76044]: recv_packet: authentication error, interface vlan760
2021-06-03T09:26:03.518Z  ospfd[76044]: auth_validate: decreasing seq num, interface vlan760
2021-06-03T09:26:03.518Z  ospfd[76044]: recv_packet: authentication error, interface vlan760
2021-06-03T09:30:00.060Z  ospfd[31748]: spf_calc: area 0.0.0.0 calculated
2021-06-03T09:30:05.070Z  ospfd[31748]: spf_calc: area 0.0.0.0 calculated
2021-06-03T09:51:43.893Z  ospfd[76044]: auth_validate: decreasing seq num, interface vlan760
2021-06-03T09:51:43.893Z  ospfd[76044]: recv_packet: authentication error, interface vlan760
2021-06-03T09:56:03.555Z  ospfd[76044]: auth_validate: decreasing seq num, interface vlan760
2021-06-03T09:56:03.555Z  ospfd[76044]: recv_packet: authentication error, interface vlan760
2021-06-03T09:59:59.086Z  ospfd[31748]: spf_calc: area 0.0.0.0 calculated
2021-06-03T11:00:05.088Z  ospfd[31748]: spf_calc: area 0.0.0.0 calculated
2021-06-03T11:21:43.932Z  ospfd[76044]: auth_validate: decreasing seq num, interface vlan760
2021-06-03T11:21:43.932Z  ospfd[76044]: recv_packet: authentication error, interface vlan760
2021-06-03T11:26:03.675Z  ospfd[76044]: auth_validate: decreasing seq num, interface vlan760
2021-06-03T11:26:03.675Z  ospfd[76044]: recv_packet: authentication error, interface vlan760

The cyclical nature suggests it's tied to the ls_upd floods, and looking at
the packets confirms that; note the ls_upd below, where seq 8715337 is sent
before 8715336:

11:26:02.246716 xxx.xx.xxx.28 > 224.0.0.5: OSPFv2-hello  52: rtrid 
yyy.yy.yyy.13 backbone auth MD5 key-id 1 seq 8715334 off 0 len 16 E mask 
255.255.255.248 int 1 pri 1 dead 4 dr xxx.xx.xxx.28 bdr xxx.xx.xxx.25 nbrs 
yyy.yy.yyy.11 yyy.yy.yyy.8 [tos 0xc0] [ttl 1] (id 38681, len 88)
11:26:03.249345 xxx.xx.xxx.28 > 224.0.0.5: OSPFv2-hello  52: rtrid 
yyy.yy.yyy.13 backbone auth MD5 key-id 1 seq 8715335 off 0 len 16 E mask 
255.255.255.248 int 1 pri 1 dead 4 dr xxx.xx.xxx.28 bdr xxx.xx.xxx.25 nbrs 
yyy.yy.yyy.11 yyy.yy.yyy.8 [tos 0xc0] [ttl 1] (id 38060, len 88)
11:26:03.584768 xxx.xx.xxx.28 > 224.0.0.5: OSPFv2-ls_upd  628: rtrid 
yyy.yy.yyy.13 backbone auth MD5 key-id 1 seq 8715337 off 0 len 16 { S 864D 
age 9 ase 10.2.62.0 asbr yyy.yy.yyy.9 mask 255.255.255.0 type 1 tos 0 metric 
100 } { S 864D age 9 ase 10.2.66.0 asbr yyy.yy.yyy.9 mask 255.255.255.0 
type 1 tos 0 metric 100 } { E S 80013EB3 age 9 rtr yyy.yy.yyy.9 E { net 
185.91.102.64 mask 255.255.255.248 tos 0 metric 10 } { net 10.88.15.0 mask 
255.255.255.224 tos 0 metric 10 } { net xxx.xx.xxx.64 mask 255.255.255.255 tos 
0 metric 10 } { net yyy.yy.yyy.72 mask 255.255.255.248 tos 0 metric 10 } { net 
zzz.zzz.44.0 mask 255.255.255.192 tos 0 metric 10 } { net zzz.zzz.43.96 mask 
255.255.255.240 tos 0 metric 10 } { net zzz.zzz.43.64 mask 255.255.255.224 tos 
0 metric 10 } { net zzz.zzz.43.128 mask 255.255.255.240 tos 0 metric 10 } { net 
zzz.zzz.43.32 mask 255.255.255.224 tos 0 metric 10 } { net zzz.zzz.41.176 mask 
255.255.255.248 tos 0 metric 10 } { net zzz.zzz.41.160 mask 255.255.255.240 tos 
0 metric 10 } { net zzz.zzz.41.144 mask 255.255.255.240 tos 0 metric 10 } { net 
zzz.zzz.41.32 mask 255.255.255.224 tos 0 
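
For context, the reason an out-of-order flood shows up as an authentication
error is that OSPF cryptographic authentication (RFC 2328, Appendix D)
requires the per-neighbor sequence number to be non-decreasing. A minimal
sketch of that receiver-side check (illustrative only, not ospfd's actual
code) looks like this:

#include <stdint.h>

/* hypothetical per-neighbor state; ospfd keeps the real value elsewhere */
struct nbr_auth {
	uint32_t	crypt_seq_num;	/* highest sequence number accepted so far */
};

/*
 * Returns 1 if the packet's MD5 sequence number is acceptable.  When seq
 * 8715337 is flooded before 8715336, the second packet fails this check
 * and is logged as "decreasing seq num" / "authentication error".
 */
static int
auth_seq_ok(struct nbr_auth *nbr, uint32_t pkt_seq)
{
	if (pkt_seq < nbr->crypt_seq_num)
		return 0;			/* dropped by the receiver */
	nbr->crypt_seq_num = pkt_seq;		/* remember the new high-water mark */
	return 1;
}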

Re: [External] : Re: parallel forwarding vs. bridges

2021-06-05 Thread Alexandr Nedvedicky
Hello David,


> 
> my best argument for mutexes in pf is that currently we use smr critical
> sections around things like the bridge and aggr port selection, which
> are very obviously read-only workloads. pf may be a largely read-only
> workload, but where it is at the moment means it's not going to get run
> concurrently anyway.
> 

I'm glad you mention bridges and SMR. With bluhm's parallel forwarding
diff [1], Hrvoje noticed panics with various bridges:

db_enter() at db_enter+0x10
panic() at panic+0x12a
__assert() at __assert+0x2b
assertwaitok() at assertwaitok+0xdc
mi_switch() at mi_switch+0x40
sleep_finish() at sleep_finish+0x11c
rw_enter() at rw_enter+0x229
pf_test() at pf_test+0x1055
tpmr_pf() at tpmr_pf+0x7e
tpmr_input() at tpmr_input+0x124
ether_input() at ether_input+0xf5
if_input_process() at if_input_process+0x6f
ifiq_process() at ifiq_process+0x69

Things start to go downhill right here in ether_input():

 426
 427 	smr_read_enter();
 428 	eb = SMR_PTR_GET(&ac->ac_brport);
 429 	if (eb != NULL) {
 430 		m = (*eb->eb_input)(ifp, m, dst, eb->eb_port);
 431 		if (m == NULL) {
 432 			smr_read_leave();
 433 			return;
 434 		}
 435 	}
 436 	smr_read_leave();
 437

We panic because PF is going to sleep on one of its locks.
This is the case for 1 of 40 packets: the one which needs
to create state. ether_input() cannot expect that ->eb_input()
won't block. Maybe some day in the future, but we are not
there yet.

We don't see the panic in the current tree because of the exclusive
net_lock. ether_input() runs exclusively as a writer on the netlock,
thus all further locks down the stack are grabbed without blocking.

My personal thought on this: a non-blocking ->eb_input() is a
pretty hard requirement. I don't mind if packet processing
gives up a CPU to wait for another packet to get its job
done. This gives other processes a chance to use the CPU, be it
another packet or any process in the system. A lack of sleeping
points may lead to a less responsive system when the network
gets busier.

The diff below combines the smr_read_enter()/smr_read_leave() section with
reference counting using traditional atomic ops. The idea is to adjust the
code found in ether_input():

 426
 427 	smr_read_enter();
 428 	eb = SMR_PTR_GET(&ac->ac_brport);
 429 	if (eb != NULL)
 430 		eb->eb_port_take(eb->eb_port);
 431 	smr_read_leave();
 432 	if (eb != NULL) {
 433 		m = (*eb->eb_input)(ifp, m, dst, eb->eb_port);
 434 		eb->eb_port_rele(eb->eb_port);
 435 		if (m == NULL) {
 436 			return;
 437 		}
 438 	}
 439
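
The take/rele pair the adjusted code relies on is essentially plain atomic
reference counting; a self-contained sketch (hypothetical userspace names
and C11 atomics, not the actual eb_port_take()/eb_port_rele() from the
diff) could look like this:

#include <stdatomic.h>

/* hypothetical stand-in for the bridge port object */
struct port {
	atomic_uint	refs;		/* starts at 1 while the port is attached */
};

/* taken while still inside the SMR read section, where the pointer is
 * known to be alive */
static void
port_take(struct port *p)
{
	atomic_fetch_add_explicit(&p->refs, 1, memory_order_relaxed);
}

/* released after ->eb_input() has returned, possibly having slept */
static void
port_rele(struct port *p)
{
	if (atomic_fetch_sub_explicit(&p->refs, 1, memory_order_acq_rel) == 1) {
		/* last reference gone: the detach path may now free the port */
	}
}

The point is that the SMR read section is held only long enough to take the
reference, so ->eb_input() is free to sleep.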

According to early tests the diff works well. Currently there is one
mysterious panic, which Hrvoje may be able to comment on
how to trigger. The stack looks as follows:

kernel diagnostic assertion "smr->smr_func == NULL" failed: file
"/sys/kern/kern_smr.c", line 247
panic() at panic+0x12a
__assert() at __assert+0x2b
smr_call_impl() at smr_call_impl+0xd4
veb_port_input() at veb_port_input+0x2fa
ether_input() at ether_input+0x10b
if_input_process() at if_input_process+0x6f
ifiq_process() at ifiq_process+0x69
taskq_thread() at taskq_thread+0x81

It's not obvious from the stack above, but we die in etherbridge_map(),
which is being called from here in veb_port_input():

#if 0 && defined(IPSEC)
	if (ISSET(ifp->if_flags, IFF_LINK2) &&
	    (m = veb_ipsec_in(ifp0, m)) == NULL)
		return (NULL);
#endif

	eh = mtod(m, struct ether_header *);

	if (ISSET(p->p_bif_flags, IFBIF_LEARNING))
		etherbridge_map(&sc->sc_eb, p, src);

	CLR(m->m_flags, M_BCAST|M_MCAST);
	SET(m->m_flags, M_PROTO1);

The panic itself must be triggered by 'ebe_rele(oebe);' found
at the end of etherbridge_map(). I've spent a few days with it
and have no idea how it could happen. It looks like the result of
a memory corruption/use-after-free bug.
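
For reference, the assertion that fires guards against queueing a second
callback on an smr_entry that already has one pending. A cut-down,
self-contained paraphrase (field names approximate, not the exact
kern_smr.c source) of what smr_call_impl() checks:

#include <assert.h>
#include <stddef.h>

/* paraphrase only: a reduced smr_entry with just the fields that matter */
struct smr_entry_sketch {
	void	(*smr_func)(void *);
	void	 *smr_arg;
};

static void
sketch_smr_call(struct smr_entry_sketch *smr, void (*func)(void *), void *arg)
{
	/* the tripped assertion: the entry must not already have a pending
	 * callback, so queueing the same entry twice (e.g. via a
	 * use-after-free) panics exactly as in the trace above */
	assert(smr->smr_func == NULL);
	smr->smr_func = func;
	smr->smr_arg = arg;
	/* the real code then appends the entry to a deferred list that runs
	 * after a grace period */
}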

Anyway, I would like to know if it would make sense to go with the diff
below. This will allow us to keep hunting for other bugs sitting
in the parallel forwarding path.

thanks and
sashan

[1] https://marc.info/?l=openbsd-tech&m=161903387904923&w=2 

8<---8<---8<--8<
diff --git a/sys/net/if_bridge.c b/sys/net/if_bridge.c
index 703be01648f..c41f3c93668 100644
--- a/sys/net/if_bridge.c
+++ b/sys/net/if_bridge.c

Re: [External] : Re: parallel forwarding vs. bridges

2021-06-05 Thread Alexandr Nedvedicky
Hello David,



> the scope of the pf locks likely needs reduction anyway. one of my

I agree. pf_lock covers too much in PF currently. It protects
all rules, tables and fragment caches.

> production firewalls panicked with the pf lock trying to lock against
> itself a couple of nights ago:
> 
> db_enter() at db_enter+0x5
> panic(81d47212) at panic+0x12a
> rw_enter(82060e70,1) at rw_enter+0x261
> pf_test(18,2,8158f000,800024d03db8) at pf_test+0x118c
> ip6_output(fd80049a3a00,0,0,0,800024d03eb0,0) at ip6_output+0xd33
> nd6_ns_output(8158f000,0,800024d04228,fd8279b3b420,0) at nd6_ns_output+0x3e2
> nd6_resolve(8158f000,fd816c6b2ae0,fd80659d8300,800024d04220,800024d040d8) at nd6_resolve+0x29d
> ether_resolve(8158f000,fd80659d8300,800024d04220,fd816c6b2ae0,800024d040d8) at ether_resolve+0x127
> ether_output(8158f000,fd80659d8300,800024d04220,fd816c6b2ae0) at ether_output+0x2a
> ip6_output(fd80659d8300,0,0,0,0,0) at ip6_output+0x1180
> pfsync_undefer_notify(fd841dbec7b8) at pfsync_undefer_notify+0xac
> pfsync_undefer(fd841dbec7b8,0) at pfsync_undefer+0x8d
> pfsync_defer(fd82303cb310,fd8065a2) at pfsync_defer+0xfe
> pf_test_rule(800024d04600,800024d045e8,800024d045e0,800024d045f0,800024d045d8,800024d045fe) at pf_test_rule+0x693
> pf_test(2,3,815a9000,800024d04798) at pf_test+0x10f1
> ip_output(fd8065a2,0,800024d04850,1,0,0) at ip_output+0x829
> ip_forward(fd8065a2,81541800,fd817a7722d0,0) at ip_forward+0x27a
> ip_input_if(800024d04a80,800024d04a7c,4,0,81541800) at ip_input_if+0x5fd
> ipv4_input(81541800,fd8065a2) at ipv4_input+0x37
> carp_input(8158f000,fd8065a2,5e000158) at carp_input+0x1ac
> ether_input(8158f000,fd8065a2) at ether_input+0x1c0
> vlan_input(8152f000,fd8065a2) at vlan_input+0x19a
> ether_input(8152f000,fd8065a2) at ether_input+0x76
> if_input_process(801df048,800024d04ca8) at if_input_process+0x5a
> ifiq_process(801dbe00) at ifiq_process+0x6f
> taskq_thread(8002b080) at taskq_thread+0x7b
> end trace frame: 0x0, count: -26
> 

The diff below postpones the call to pfsync_undefer() to the moment when
PF_LOCK() is released. I believe this should fix the panic you see. The
change does not look nice and can be reverted later, when the pf_lock scope
gets reduced.
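
A rough caller-side sketch of that idea (one possible shape, not the posted
diff): pfsync_defer() hands any deferral it had to dequeue back to its
caller through the new third argument, and the dispatch happens only after
the lock is dropped, so pfsync_undefer() -> ip_output() -> pf_test() can no
longer try to take pf_lock recursively.

	struct pfsync_deferral *pd = NULL;

	PF_LOCK();
	/* ... rule evaluation, state creation, pfsync_defer(st, m, &pd) ... */
	PF_UNLOCK();

	if (pd != NULL)
		pfsync_undefer(pd, 0);	/* dispatched with pf_lock released */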

I took a closer look at pfsync_defer() here:

1941
1942 	if (sc->sc_deferred >= 128) {
1943 		mtx_enter(&sc->sc_deferrals_mtx);
1944 		pd = TAILQ_FIRST(&sc->sc_deferrals);
1945 		TAILQ_REMOVE(&sc->sc_deferrals, pd, pd_entry);
1946 		sc->sc_deferred--;
1947 		mtx_leave(&sc->sc_deferrals_mtx);
1948 		if (timeout_del(&pd->pd_tmo))
1949 			pfsync_undefer(pd, 0);
1950 	}
1951
1952 	pd = pool_get(&sc->sc_pool, M_NOWAIT);
1953 	if (pd == NULL)
1954 		return (0);

It seems to me we may leak the `pd` we took at line 1944. The leak
happens in case timeout_del() returns zero.

The diff below ignores the return value from timeout_del() and makes
sure we always call pfsync_undefer().
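
Concretely, the overflow path with the leak closed would look roughly like
this (a sketch of the change as described, not the diff itself):

	if (sc->sc_deferred >= 128) {
		mtx_enter(&sc->sc_deferrals_mtx);
		pd = TAILQ_FIRST(&sc->sc_deferrals);
		TAILQ_REMOVE(&sc->sc_deferrals, pd, pd_entry);
		sc->sc_deferred--;
		mtx_leave(&sc->sc_deferrals_mtx);
		timeout_del(&pd->pd_tmo);	/* return value deliberately ignored */
		pfsync_undefer(pd, 0);		/* always dispatched, so pd is not leaked */
	}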

I have not seen such a panic in my test environment. Would you be able
to give my diff a try?

thanks a lot
regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/if_pfsync.c b/sys/net/if_pfsync.c
index b90aa934de4..a5e5e1ae0f3 100644
--- a/sys/net/if_pfsync.c
+++ b/sys/net/if_pfsync.c
@@ -272,7 +272,6 @@ voidpfsync_syncdev_state(void *);
 void   pfsync_ifdetach(void *);
 
 void   pfsync_deferred(struct pf_state *, int);
-void   pfsync_undefer(struct pfsync_deferral *, int);
 void   pfsync_defer_tmo(void *);
 
 void   pfsync_cancel_full_update(struct pfsync_softc *);
@@ -1927,7 +1926,7 @@ pfsync_insert_state(struct pf_state *st)
 }
 
 int
-pfsync_defer(struct pf_state *st, struct mbuf *m)
+pfsync_defer(struct pf_state *st, struct mbuf *m, struct pfsync_deferral **ppd)
 {
struct pfsync_softc *sc = pfsyncif;
struct pfsync_deferral *pd;
@@ -1939,19 +1938,29 @@ pfsync_defer(struct pf_state *st, struct mbuf *m)
m->m_flags & (M_BCAST|M_MCAST))
return (0);
 
+   pd = pool_get(&sc->sc_pool, M_NOWAIT);
+   if (pd == NULL)
+   return (0);
+
+   /*
+* deferral queue grows faster, than timeout can consume,
+* we have to ask packet (caller) to help timer and dispatch
+* one deferral for us.
+*
+* We wish to call pfsync_undefer() here. Unfortunately we can't,
+* because pfsync_undefer() will be calling to ip_output(),
+* which in turn will call to pf_test(), which 

Re: macppc: add ld.script for kernel, ofwboot

2021-06-05 Thread Brad Smith
On Mon, May 10, 2021 at 09:49:24PM +0200, Mark Kettenis wrote:
> > Date: Mon, 10 May 2021 14:22:33 -0400
> > From: George Koehler 
> > 
> > On Fri, 7 May 2021 10:31:55 +0200 (CEST)
> > Mark Kettenis  wrote:
> > 
> > > Makes sense to me.  It seems ldd always requires a little bit
> > > more coercion to produce non-standard binaries.  We use linker scripts
> > > for the various EFI bootloaders as well.
> > > 
> > > ok kettenis@
> > 
> > My diff had an extra "pwd" in arch/macppc/stand/ofwboot/Makefile;
> > I deleted the "pwd" before committing it.
> > 
> > > > -${PROG}: ${OBJS} ${LIBSA} ${LIBZ}
> > > > -   ${LD} -nopie -znorelro -N -X -Ttext ${RELOC} -e ${ENTRY} -o 
> > > > ${PROG} \
> > > > +${PROG}: ${OBJS} ${LIBSA} ${LIBZ} ld.script
> > > > +   pwd
> > > > +   ${LD} -nopie -znorelro -N -X -T ${.CURDIR}/ld.script -o ${PROG} 
> > > > \
> > > > ${OBJS} ${LIBS}
> > 
> > From my experiments with lld 10, I believe that macppc is almost ready
> > to switch from ld.bfd to ld.lld.  I know of 2 other problems:
> > 
> >   1.  ports/lang/gcc/8 needs USE_LLD = No, because lld 10 can't link
> >   C++ code from gcc.  (I have not yet checked lld 11.)  lld had no
> >   problem with Fortran ports built by gcc.
> > 
> >   2.  All instances of -Wl,-relax or -Wl,--relax in src or ports must
> >   be deleted, because it is an unknown option to lld, but lld can
> >   link large binaries without the option.
> 
> Maybe just coordinate with Theo and the ports folks and move ahead.


Index: gnu/usr.bin/clang/Makefile.inc
===
RCS file: /home/cvs/src/gnu/usr.bin/clang/Makefile.inc,v
retrieving revision 1.24
diff -u -p -u -p -r1.24 Makefile.inc
--- gnu/usr.bin/clang/Makefile.inc  19 May 2021 23:18:40 -  1.24
+++ gnu/usr.bin/clang/Makefile.inc  5 Jun 2021 03:40:20 -
@@ -77,7 +77,3 @@ DPADD+=   ${.OBJDIR}/../lib${lib}/lib${lib
 LDADD+=${.OBJDIR}/../lib${lib}/lib${lib}.a
 .endfor
 LDADD+=-Wl,--end-group
-
-.if ${MACHINE_ARCH} == "powerpc"
-LDADD+=-Wl,-relax
-.endif



Index: devel/clang-tools-extra/Makefile
===
RCS file: /home/cvs/ports/devel/clang-tools-extra/Makefile,v
retrieving revision 1.16
diff -u -p -u -p -r1.16 Makefile
--- devel/clang-tools-extra/Makefile18 May 2021 06:19:48 -  1.16
+++ devel/clang-tools-extra/Makefile5 Jun 2021 03:32:01 -
@@ -67,11 +67,6 @@ CONFIGURE_ARGS +=-DCLANG_ENABLE_STATIC_
-DLLVM_INCLUDE_EXAMPLES=OFF \
-DLLVM_INCLUDE_TESTS=OFF
 
-.if ${MACHINE_ARCH} == "powerpc"
-CONFIGURE_ARGS +=  -DCMAKE_EXE_LINKER_FLAGS="-Wl,--relax"
-CONFIGURE_ARGS +=  -DCMAKE_SHARED_LINKER_FLAGS="-Wl,--relax"
-.endif
-
 GCC_VER =  8.4.0
 .if ${MACHINE_ARCH} == "amd64"
 GCC_CONFIG =   x86_64-unknown-openbsd${OSREV}
Index: devel/llvm/Makefile
===
RCS file: /home/cvs/ports/devel/llvm/Makefile,v
retrieving revision 1.277
diff -u -p -u -p -r1.277 Makefile
--- devel/llvm/Makefile 22 May 2021 20:27:35 -  1.277
+++ devel/llvm/Makefile 5 Jun 2021 03:32:12 -
@@ -112,11 +112,6 @@ CXXFLAGS+= -mno-retpoline
 CXXFLAGS+= -fomit-frame-pointer
 .endif
 
-.if ${MACHINE_ARCH} == "powerpc"
-CONFIGURE_ARGS +=  -DCMAKE_EXE_LINKER_FLAGS="-Wl,-relax"
-CONFIGURE_ARGS +=  -DCMAKE_SHARED_LINKER_FLAGS="-Wl,-relax"
-.endif
-
 TEST_TARGET =  check check-clang
 
 # XXX sync
Index: games/godot/Makefile
===
RCS file: /home/cvs/ports/games/godot/Makefile,v
retrieving revision 1.18
diff -u -p -u -p -r1.18 Makefile
--- games/godot/Makefile3 May 2021 19:10:24 -   1.18
+++ games/godot/Makefile5 Jun 2021 03:32:32 -
@@ -76,11 +76,6 @@ LDFLAGS += -latomic
 WANTLIB += atomic
 .endif
 
-# Fix relocation overflows
-.if ${MACHINE_ARCH:Mpowerpc}
-LDFLAGS += -Wl,--relax
-.endif
-
 post-extract:
cp -R ${FILESDIR}/sndio ${WRKDIST}/drivers
 
Index: games/scummvm/Makefile
===
RCS file: /home/cvs/ports/games/scummvm/Makefile,v
retrieving revision 1.88
diff -u -p -u -p -r1.88 Makefile
--- games/scummvm/Makefile  4 Jan 2021 17:43:02 -   1.88
+++ games/scummvm/Makefile  5 Jun 2021 03:32:48 -
@@ -40,11 +40,6 @@ LIB_DEPENDS= audio/fluidsynth \
 CXXFLAGS+= -mxgot
 .endif
 
-# Fix relocation overflows
-.if ${MACHINE_ARCH} == "powerpc"
-LDFLAGS+=  -Wl,--relax
-.endif
-
 CONFIGURE_STYLE=simple
 CONFIGURE_ARGS+=--disable-alsa \
--disable-cloud \
Index: lang/gcc/8/Makefile
===
RCS file: /home/cvs/ports/lang/gcc/8/Makefile,v
retrieving revision 1.41
diff -u -p -u -p -r1.41 Makefile
--- lang/gcc/8/Makefile 9 Feb 2021 02:1