ospfd seq out of order in ls_upd floods
Sometimes I see authentication errors from ospfd, mainly (though possibly not entirely always) on a 30 minute cycle, e.g. these log entries 2021-06-03T05:30:04.952Z ospfd[31748]: spf_calc: area 0.0.0.0 calculated 2021-06-03T05:51:43.785Z ospfd[76044]: auth_validate: decreasing seq num, interface vlan760 2021-06-03T05:51:43.785Z ospfd[76044]: recv_packet: authentication error, interface vlan760 2021-06-03T05:56:03.248Z ospfd[76044]: auth_validate: decreasing seq num, interface vlan760 2021-06-03T05:56:03.248Z ospfd[76044]: recv_packet: authentication error, interface vlan760 2021-06-03T05:59:58.978Z ospfd[31748]: spf_calc: area 0.0.0.0 calculated 2021-06-03T08:21:43.851Z ospfd[76044]: auth_validate: decreasing seq num, interface vlan760 2021-06-03T08:21:43.851Z ospfd[76044]: recv_packet: authentication error, interface vlan760 2021-06-03T08:26:03.434Z ospfd[76044]: auth_validate: decreasing seq num, interface vlan760 2021-06-03T08:26:03.434Z ospfd[76044]: recv_packet: authentication error, interface vlan760 2021-06-03T08:29:59.087Z ospfd[31748]: spf_calc: area 0.0.0.0 calculated 2021-06-03T08:30:04.097Z ospfd[31748]: spf_calc: area 0.0.0.0 calculated 2021-06-03T08:51:43.858Z ospfd[76044]: auth_validate: decreasing seq num, interface vlan760 2021-06-03T08:51:43.858Z ospfd[76044]: recv_packet: authentication error, interface vlan760 2021-06-03T08:56:03.470Z ospfd[76044]: auth_validate: decreasing seq num, interface vlan760 2021-06-03T08:56:03.470Z ospfd[76044]: recv_packet: authentication error, interface vlan760 2021-06-03T09:00:01.033Z ospfd[31748]: spf_calc: area 0.0.0.0 calculated 2021-06-03T09:00:06.043Z ospfd[31748]: spf_calc: area 0.0.0.0 calculated 2021-06-03T09:21:43.876Z ospfd[76044]: auth_validate: decreasing seq num, interface vlan760 2021-06-03T09:21:43.876Z ospfd[76044]: recv_packet: authentication error, interface vlan760 2021-06-03T09:26:03.518Z ospfd[76044]: auth_validate: decreasing seq num, interface vlan760 2021-06-03T09:26:03.518Z ospfd[76044]: recv_packet: authentication error, interface vlan760 2021-06-03T09:30:00.060Z ospfd[31748]: spf_calc: area 0.0.0.0 calculated 2021-06-03T09:30:05.070Z ospfd[31748]: spf_calc: area 0.0.0.0 calculated 2021-06-03T09:51:43.893Z ospfd[76044]: auth_validate: decreasing seq num, interface vlan760 2021-06-03T09:51:43.893Z ospfd[76044]: recv_packet: authentication error, interface vlan760 2021-06-03T09:56:03.555Z ospfd[76044]: auth_validate: decreasing seq num, interface vlan760 2021-06-03T09:56:03.555Z ospfd[76044]: recv_packet: authentication error, interface vlan760 2021-06-03T09:59:59.086Z ospfd[31748]: spf_calc: area 0.0.0.0 calculated 2021-06-03T11:00:05.088Z ospfd[31748]: spf_calc: area 0.0.0.0 calculated 2021-06-03T11:21:43.932Z ospfd[76044]: auth_validate: decreasing seq num, interface vlan760 2021-06-03T11:21:43.932Z ospfd[76044]: recv_packet: authentication error, interface vlan760 2021-06-03T11:26:03.675Z ospfd[76044]: auth_validate: decreasing seq num, interface vlan760 2021-06-03T11:26:03.675Z ospfd[76044]: recv_packet: authentication error, interface vlan760 The cyclical nature suggests it's in the ls_upd floods and looking at packets confirms that, note the ls_upd where 8715337 and 8715336 are sent out of order: 11:26:02.246716 xxx.xx.xxx.28 > 224.0.0.5: OSPFv2-hello 52: rtrid yyy.yy.yyy.13 backbone auth MD5 key-id 1 seq 8715334 off 0 len 16 E mask 255.255.255.248 int 1 pri 1 dead 4 dr xxx.xx.xxx.28 bdr xxx.xx.xxx.25 nbrs yyy.yy.yyy.11 yyy.yy.yyy.8 [tos 0xc0] [ttl 1] (id 38681, len 88) 11:26:03.249345 xxx.xx.xxx.28 > 224.0.0.5: OSPFv2-hello 52: rtrid yyy.yy.yyy.13 backbone auth MD5 key-id 1 seq 8715335 off 0 len 16 E mask 255.255.255.248 int 1 pri 1 dead 4 dr xxx.xx.xxx.28 bdr xxx.xx.xxx.25 nbrs yyy.yy.yyy.11 yyy.yy.yyy.8 [tos 0xc0] [ttl 1] (id 38060, len 88) 11:26:03.584768 xxx.xx.xxx.28 > 224.0.0.5: OSPFv2-ls_upd 628: rtrid yyy.yy.yyy.13 backbone auth MD5 key-id 1 seq 8715337 off 0 len 16 { S 864D age 9 ase 10.2.62.0 asbr yyy.yy.yyy.9 mask 255.255.255.0 type 1 tos 0 metric 100 } { S 864D age 9 ase 10.2.66.0 asbr yyy.yy.yyy.9 mask 255.255.255.0 type 1 tos 0 metric 100 } { E S 80013EB3 age 9 rtr yyy.yy.yyy.9 E { net 185.91.102.64 mask 255.255.255.248 tos 0 metric 10 } { net 10.88.15.0 mask 255.255.255.224 tos 0 metric 10 } { net xxx.xx.xxx.64 mask 255.255.255.255 tos 0 metric 10 } { net yyy.yy.yyy.72 mask 255.255.255.248 tos 0 metric 10 } { net zzz.zzz.44.0 mask 255.255.255.192 tos 0 metric 10 } { net zzz.zzz.43.96 mask 255.255.255.240 tos 0 metric 10 } { net zzz.zzz.43.64 mask 255.255.255.224 tos 0 metric 10 } { net zzz.zzz.43.128 mask 255.255.255.240 tos 0 metric 10 } { net zzz.zzz.43.32 mask 255.255.255.224 tos 0 metric 10 } { net zzz.zzz.41.176 mask 255.255.255.248 tos 0 metric 10 } { net zzz.zzz.41.160 mask 255.255.255.240 tos 0 metric 10 } { net zzz.zzz.41.144 mask 255.255.255.240 tos 0 metric 10 } { net zzz.zzz.41.32 mask 255.255.255.224 tos 0
Re: [External] : Re: parallel forwarding vs. bridges
Hello David, > > my best argument for mutexes in pf is that currently we use smr critical > sections around things like the bridge and aggr port selection, which > are very obviously read-only workloads. pf may be a largely read-only > workload, but where it is at the moment means it's not going to get run > concurrently anyway. > I'm glad you mention bridges and smr. with bluhm's parallel forwarding diff [1] Hrvoje noticed panics with various bridges: db_enter() at db_enter+0x10 panic() at panic+0x12a __assert() at __assert+0x2b assertwaitok() at assertwaitok+0xdc mi_switch() at mi_switch+0x40 sleep_finish() at sleep_finish+0x11c rw_enter() at rw_enter+0x229 pf_test() at pf_test+0x1055 tpmr_pf() at tpmr_pf+0x7e tpmr_input() at tpmr_input+0x124 ether_input() at ether_input+0xf5 if_input_process() at if_input_process+0x6f ifiq_process() at ifiq_process+0x69 the things start to go downhill right here in ether_input(): 426 427 smr_read_enter(); 428 eb = SMR_PTR_GET(&ac->ac_brport); 429 if (eb != NULL) { 430 m = (*eb->eb_input)(ifp, m, dst, eb->eb_port); 431 if (m == NULL) { 432 smr_read_leave(); 433 return; 434 } 435 } 436 smr_read_leave(); 437 we panic, because PF is going to sleep on one of its locks. this is the case of 1 of 40 packets. The one, which needs to create state. ether_input() can not expect that ->eb_input(), won't block. May be someday one day in future, but we are not there yet. we don't see panic in current tree, because of exclusive net_lock. The ether_input() runs exclusively as a writer on netlock, thus all further locks down the stack are grabbed without blocking. My personal though on this: non-blocking ->eb_input() is pretty hard requirement. I don't mind if packet processing gives up a CPU to wait for another packet to get its job done. This gives other process chance to use cpu, be it another packet or any process in system. lack of sleeping points may lead to less responsive system, when network will get more busy. diff below combines smr_read... with reference counting using traditional atomic ops. The idea is to adjust code found in ether_input(): 426 427 smr_read_enter(); 428 eb = SMR_PTR_GET(&ac->ac_brport); 429 if (eb != NULL) 430 eb->eb_port_take(eb->eb_port); 431 smr_read_leave(); 432 if (eb != NULL) { 433 m = (*eb->eb_input)(ifp, m, dst, eb->eb_port); 434 eb->eb_port_rele(eb->eb_port); 435 if (m == NULL) { 436 return; 437 } 438 } 439 According to early tests it works well. Currently there is a one mysterious panic, which Hrvoje may be able to comment on how to trigger it. The stack looks as follows: kernel diagnostic assertion "smr->smr_func == NULL" failed: file "/sys/kern/kern_smr.c", line 247 panic() at panic+0x12a __assert() at __assert+0x2b smr_call_impl() at smr_call_impl+0xd4 veb_port_input() at veb_port_input+0x2fa ether_input() at ether_input+0x10b if_input_process() at if_input_process+0x6f ifiq_process() at ifiq_process+0x69 taskq_thread() at taskq_thread+0x81 It's not obvious from stack above, but we die in etherbridge_map(), which is being called from here in veb_port_input(): #if 0 && defined(IPSEC) if (ISSET(ifp->if_flags, IFF_LINK2) && (m = veb_ipsec_in(ifp0, m)) == NULL) return (NULL); #endif eh = mtod(m, struct ether_header *); if (ISSET(p->p_bif_flags, IFBIF_LEARNING)) etherbridge_map(&sc->sc_eb, p, src); CLR(m->m_flags, M_BCAST|M_MCAST); SET(m->m_flags, M_PROTO1); the panic itself must be triggered by 'ebe_rele(oebe);' found at the end of etherbridge_map(). I've spent few days with it, and have no idea how it could happen. looks like result of memory corruption/use-after-free bug. anyway I would like to know if it would make sense to go with diff below. This will allow us to keep hunting for other bugs sitting in parallel forwarding path. thanks and sashan [1] paralle://marc.info/?l=openbsd-tech&m=161903387904923&w=2 8<---8<---8<--8< diff --git a/sys/net/if_bridge.c b/sys/net/if_bridge.c index 703be01648f..c41f3c93668 100644 --- a/sys/net/if_bridge.c +++ b/sys/net/if_bridge.c
Re: [External] : Re: parallel forwarding vs. bridges
Hello David, > the scope of the pf locks likely needs reduction anyway. one of my I agree. pf_lock covers too much in PF currently. it protects, all rules, tables and fragment caches. > production firewalls panicked with the pf lock trying to lock against > itself a couple of nights ago: > > db_enter() at db_enter+0x5 > panic(81d47212) at panic+0x12a > rw_enter(82060e70,1) at rw_enter+0x261 > pf_test(18,2,8158f000,800024d03db8) at pf_test+0x118c > ip6_output(fd80049a3a00,0,0,0,800024d03eb0,0) at > ip6_output+0xd33 > nd6_ns_output(8158f000,0,800024d04228,fd8279b3b420,0) at > nd6_ns > _output+0x3e2 > nd6_resolve(8158f000,fd816c6b2ae0,fd80659d8300,800024d04220 > ,800024d040d8) at nd6_resolve+0x29d > ether_resolve(8158f000,fd80659d8300,800024d04220,fd816c6b2a > e0,800024d040d8) at ether_resolve+0x127 > ether_output(8158f000,fd80659d8300,800024d04220,fd816c6b2ae > 0) at ether_output+0x2a > ip6_output(fd80659d8300,0,0,0,0,0) at ip6_output+0x1180 > pfsync_undefer_notify(fd841dbec7b8) at pfsync_undefer_notify+0xac > pfsync_undefer(fd841dbec7b8,0) at pfsync_undefer+0x8d > pfsync_defer(fd82303cb310,fd8065a2) at pfsync_defer+0xfe > pf_test_rule(800024d04600,800024d045e8,800024d045e0,800024d045f > 0,800024d045d8,800024d045fe) at pf_test_rule+0x693 > pf_test(2,3,815a9000,800024d04798) at pf_test+0x10f1 > ip_output(fd8065a2,0,800024d04850,1,0,0) at ip_output+0x829 > ip_forward(fd8065a2,81541800,fd817a7722d0,0) at > ip_forward+ > 0x27a > ip_input_if(800024d04a80,800024d04a7c,4,0,81541800) at > ip_input > _if+0x5fd > ipv4_input(81541800,fd8065a2) at ipv4_input+0x37 > carp_input(8158f000,fd8065a2,5e000158) at > carp_input+0x1ac > ether_input(8158f000,fd8065a2) at ether_input+0x1c0 > vlan_input(8152f000,fd8065a2) at vlan_input+0x19a > ether_input(8152f000,fd8065a2) at ether_input+0x76 > if_input_process(801df048,800024d04ca8) at > if_input_process+0x5a > ifiq_process(801dbe00) at ifiq_process+0x6f > taskq_thread(8002b080) at taskq_thread+0x7b > end trace frame: 0x0, count: -26 > diff below postpones call to pfsync_undefer() to moment, when PF_LOCK() is released. I believe this should fix the panic you see. the change does not look nice. and can be later reverted, when pf_lock scope will be reduced. I took a closer look at pfsync_defer() here: 1941 1942 if (sc->sc_deferred >= 128) { 1943 mtx_enter(&sc->sc_deferrals_mtx); 1944 pd = TAILQ_FIRST(&sc->sc_deferrals); 1945 TAILQ_REMOVE(&sc->sc_deferrals, pd, pd_entry); 1946 sc->sc_deferred--; 1947 mtx_leave(&sc->sc_deferrals_mtx); 1948 if (timeout_del(&pd->pd_tmo)) 1949 pfsync_undefer(pd, 0); 1950 } 1951 1952 pd = pool_get(&sc->sc_pool, M_NOWAIT); 1953 if (pd == NULL) 1954 return (0); it seems to me we may leak `pd` we took at line 1944. The leak happens in case timeout_del() return zero. diff below ignores the return value from timeout_del() and makes sure we always call pfsync_undefefer() I have not seen such panic in my test environment. Would you be able to give my diff a try? thanks a lot regards sashan 8<---8<---8<--8< diff --git a/sys/net/if_pfsync.c b/sys/net/if_pfsync.c index b90aa934de4..a5e5e1ae0f3 100644 --- a/sys/net/if_pfsync.c +++ b/sys/net/if_pfsync.c @@ -272,7 +272,6 @@ voidpfsync_syncdev_state(void *); void pfsync_ifdetach(void *); void pfsync_deferred(struct pf_state *, int); -void pfsync_undefer(struct pfsync_deferral *, int); void pfsync_defer_tmo(void *); void pfsync_cancel_full_update(struct pfsync_softc *); @@ -1927,7 +1926,7 @@ pfsync_insert_state(struct pf_state *st) } int -pfsync_defer(struct pf_state *st, struct mbuf *m) +pfsync_defer(struct pf_state *st, struct mbuf *m, struct pfsync_deferral **ppd) { struct pfsync_softc *sc = pfsyncif; struct pfsync_deferral *pd; @@ -1939,19 +1938,29 @@ pfsync_defer(struct pf_state *st, struct mbuf *m) m->m_flags & (M_BCAST|M_MCAST)) return (0); + pd = pool_get(&sc->sc_pool, M_NOWAIT); + if (pd == NULL) + return (0); + + /* +* deferral queue grows faster, than timeout can consume, +* we have to ask packet (caller) to help timer and dispatch +* one deferral for us. +* +* We wish to call pfsync_undefer() here. Unfortunately we can't, +* because pfsync_undefer() will be calling to ip_output(), +* which in turn will call to pf_test(), which
Re: macppc: add ld.script for kernel, ofwboot
On Mon, May 10, 2021 at 09:49:24PM +0200, Mark Kettenis wrote: > > Date: Mon, 10 May 2021 14:22:33 -0400 > > From: George Koehler > > > > On Fri, 7 May 2021 10:31:55 +0200 (CEST) > > Mark Kettenis wrote: > > > > > Makes sense to me. It seems ldd always seems to require a little bit > > > more coercion to produce non-standard binaries. We use linker scripts > > > for the various EFI bootloaders as well. > > > > > > ok kettenis@ > > > > My diff had an extra "pwd" in arch/macppc/stand/ofwboot/Makefile; > > I deleted the "pwd" before committing it. > > > > > > -${PROG}: ${OBJS} ${LIBSA} ${LIBZ} > > > > - ${LD} -nopie -znorelro -N -X -Ttext ${RELOC} -e ${ENTRY} -o > > > > ${PROG} \ > > > > +${PROG}: ${OBJS} ${LIBSA} ${LIBZ} ld.script > > > > + pwd > > > > + ${LD} -nopie -znorelro -N -X -T ${.CURDIR}/ld.script -o ${PROG} > > > > \ > > > > ${OBJS} ${LIBS} > > > > >From my experiments with lld 10, I believe that macppc is almost ready > > to switch from ld.bfd to ld.lld. I know of 2 other problems: > > > > 1. ports/lang/gcc/8 needs USE_LLD = No, because lld 10 can't link > > C++ code from gcc. (I have not yet checked lld 11.) lld had no > > problem with Fortran ports built by gcc. > > > > 2. All instances of -Wl,-relax or -Wl,--relax in src or ports must > > be deleted, because it is an unknown option to lld, but lld can > > link large binaries without the option. > > Maybe just coordinate with Theo and the ports folks and move ahead. Index: gnu/usr.bin/clang/Makefile.inc === RCS file: /home/cvs/src/gnu/usr.bin/clang/Makefile.inc,v retrieving revision 1.24 diff -u -p -u -p -r1.24 Makefile.inc --- gnu/usr.bin/clang/Makefile.inc 19 May 2021 23:18:40 - 1.24 +++ gnu/usr.bin/clang/Makefile.inc 5 Jun 2021 03:40:20 - @@ -77,7 +77,3 @@ DPADD+= ${.OBJDIR}/../lib${lib}/lib${lib LDADD+=${.OBJDIR}/../lib${lib}/lib${lib}.a .endfor LDADD+=-Wl,--end-group - -.if ${MACHINE_ARCH} == "powerpc" -LDADD+=-Wl,-relax -.endif Index: devel/clang-tools-extra/Makefile === RCS file: /home/cvs/ports/devel/clang-tools-extra/Makefile,v retrieving revision 1.16 diff -u -p -u -p -r1.16 Makefile --- devel/clang-tools-extra/Makefile18 May 2021 06:19:48 - 1.16 +++ devel/clang-tools-extra/Makefile5 Jun 2021 03:32:01 - @@ -67,11 +67,6 @@ CONFIGURE_ARGS +=-DCLANG_ENABLE_STATIC_ -DLLVM_INCLUDE_EXAMPLES=OFF \ -DLLVM_INCLUDE_TESTS=OFF -.if ${MACHINE_ARCH} == "powerpc" -CONFIGURE_ARGS += -DCMAKE_EXE_LINKER_FLAGS="-Wl,--relax" -CONFIGURE_ARGS += -DCMAKE_SHARED_LINKER_FLAGS="-Wl,--relax" -.endif - GCC_VER = 8.4.0 .if ${MACHINE_ARCH} == "amd64" GCC_CONFIG = x86_64-unknown-openbsd${OSREV} Index: devel/llvm/Makefile === RCS file: /home/cvs/ports/devel/llvm/Makefile,v retrieving revision 1.277 diff -u -p -u -p -r1.277 Makefile --- devel/llvm/Makefile 22 May 2021 20:27:35 - 1.277 +++ devel/llvm/Makefile 5 Jun 2021 03:32:12 - @@ -112,11 +112,6 @@ CXXFLAGS+= -mno-retpoline CXXFLAGS+= -fomit-frame-pointer .endif -.if ${MACHINE_ARCH} == "powerpc" -CONFIGURE_ARGS += -DCMAKE_EXE_LINKER_FLAGS="-Wl,-relax" -CONFIGURE_ARGS += -DCMAKE_SHARED_LINKER_FLAGS="-Wl,-relax" -.endif - TEST_TARGET = check check-clang # XXX sync Index: games/godot/Makefile === RCS file: /home/cvs/ports/games/godot/Makefile,v retrieving revision 1.18 diff -u -p -u -p -r1.18 Makefile --- games/godot/Makefile3 May 2021 19:10:24 - 1.18 +++ games/godot/Makefile5 Jun 2021 03:32:32 - @@ -76,11 +76,6 @@ LDFLAGS += -latomic WANTLIB += atomic .endif -# Fix relocation overflows -.if ${MACHINE_ARCH:Mpowerpc} -LDFLAGS += -Wl,--relax -.endif - post-extract: cp -R ${FILESDIR}/sndio ${WRKDIST}/drivers Index: games/scummvm/Makefile === RCS file: /home/cvs/ports/games/scummvm/Makefile,v retrieving revision 1.88 diff -u -p -u -p -r1.88 Makefile --- games/scummvm/Makefile 4 Jan 2021 17:43:02 - 1.88 +++ games/scummvm/Makefile 5 Jun 2021 03:32:48 - @@ -40,11 +40,6 @@ LIB_DEPENDS= audio/fluidsynth \ CXXFLAGS+= -mxgot .endif -# Fix relocation overflows -.if ${MACHINE_ARCH} == "powerpc" -LDFLAGS+= -Wl,--relax -.endif - CONFIGURE_STYLE=simple CONFIGURE_ARGS+=--disable-alsa \ --disable-cloud \ Index: lang/gcc/8/Makefile === RCS file: /home/cvs/ports/lang/gcc/8/Makefile,v retrieving revision 1.41 diff -u -p -u -p -r1.41 Makefile --- lang/gcc/8/Makefile 9 Feb 2021 02:1