Re: [External] : pfsync mutex mpfloor

2022-04-13 Thread Alexandr Nedvedicky
On Wed, Apr 13, 2022 at 09:33:52PM +0200, Alexander Bluhm wrote:
> Hi,
> 
> Hrvoje has hit a witness issue in pfsync.
> 
> panic: acquiring blockable sleep lock with spinlock or critical
> section held (kernel_lock) &kernel_lock
> 
> panic(81f45bb7) at panic+0xbf
> witness_checkorder(8246e970,9,0) at witness_checkorder+0xb61
> __mp_lock(8246e768) at __mp_lock+0x5f
> kpageflttrap(800020b26dc0,17) at kpageflttrap+0x173
> kerntrap(800020b26dc0) at kerntrap+0x91
> alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
> pfsync_q_del(fd875f6336c0) at pfsync_q_del+0x70
> pfsync_delete_state(fd875f6336c0) at pfsync_delete_state+0x118
> 
> pf and pfsync are running without kernel lock, so the mutexes
> must have at least mpfloor spl protection.
> 
> ok?
> 

looks correct to me.

OK sashan



Re: [External] : pfsync mutex

2022-02-24 Thread Alexandr Nedvedicky
Hello,

On Thu, Feb 24, 2022 at 10:42:58PM +0100, Alexander Bluhm wrote:
> Hi,
> 
> Hrvoje reported some crashes with pfsync, IPsec and parallel
> forwarding.  Some locks were missing around the tdb flags in pfsync.
> 
> ok?
> 

change looks good to me. I just have a single concern
about pfsync_update_tdb() which I think deserves some
attention:

2474 void
2475 pfsync_update_tdb(struct tdb *t, int output)
2476 {
2477 struct pfsync_softc *sc = pfsyncif;
2478 size_t nlen, sc_len;
2479 
2480 if (sc == NULL)
2481 return;
2482 
2483 if (!ISSET(t->tdb_flags, TDBF_PFSYNC)) {
2484 do {
2485 mtx_enter(&sc->sc_tdb_mtx);
2486 nlen = sizeof(struct pfsync_tdb);
2487 
2488 if (TAILQ_EMPTY(&sc->sc_tdb_q))
2489 nlen += sizeof(struct pfsync_subheader);
2490 
2491 sc_len = atomic_add_long_nv(&sc->sc_len, nlen);
2492 if (sc_len > sc->sc_if.if_mtu) {
2493 atomic_sub_long(&sc->sc_len, nlen);
2494 mtx_leave(&sc->sc_tdb_mtx);
2495 pfsync_sendout();
2496 continue;
2497 }

my concern is we check ->tdb_flags at line 2483 _without_ holding
->tdb_mtx. I think we should re-check the flag as soon as we
grab ->tdb_mtx in the 'do {} while ()' loop. diff below does that.
it applies on top of your diff. with tweak suggested below the change
is OK sashan@

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/if_pfsync.c b/sys/net/if_pfsync.c
index c934392af0f..620aa73060b 100644
--- a/sys/net/if_pfsync.c
+++ b/sys/net/if_pfsync.c
@@ -2485,19 +2485,27 @@ pfsync_update_tdb(struct tdb *t, int output)
mtx_enter(&sc->sc_tdb_mtx);
nlen = sizeof(struct pfsync_tdb);
 
+   mtx_enter(&t->tdb_mtx);
+   if (ISSET(t->tdb_flags, TDBF_PFSYNC)) {
+   /* we've lost race, no action for us then */
+   mtx_leave(&t->tdb_mtx);
+   mtx_leave(&sc->sc_tdb_mtx);
+   break;
+   }
+
if (TAILQ_EMPTY(&sc->sc_tdb_q))
nlen += sizeof(struct pfsync_subheader);
 
sc_len = atomic_add_long_nv(&sc->sc_len, nlen);
if (sc_len > sc->sc_if.if_mtu) {
atomic_sub_long(&sc->sc_len, nlen);
+   mtx_leave(&t->tdb_mtx);
mtx_leave(&sc->sc_tdb_mtx);
pfsync_sendout();
continue;
}
 
TAILQ_INSERT_TAIL(&sc->sc_tdb_q, t, tdb_sync_entry);
-   mtx_enter(&t->tdb_mtx);
SET(t->tdb_flags, TDBF_PFSYNC);
mtx_leave(&t->tdb_mtx);