On Fri, Dec 19, 2025 at 12:10:09PM +0100, Kristof Provost wrote:
K> I’m seeing panics on pfsync interface destruction now:
K> 
K>      panic: mld_change_state: bad ifp
K>      cpuid = 19
K>      time = 1766142554
K>      KDB: stack backtrace:
K>      db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01843fd990
K>      vpanic() at vpanic+0x136/frame 0xfffffe01843fdac0
K>      panic() at panic+0x43/frame 0xfffffe01843fdb20
K>      mld_change_state() at mld_change_state+0x6d0/frame 0xfffffe01843fdb90
K>      in6_leavegroup_locked() at in6_leavegroup_locked+0xa9/frame 0xfffffe01843fdbf0
K>      in6_leavegroup() at in6_leavegroup+0x32/frame 0xfffffe01843fdc10
K>      pfsync_multicast_cleanup() at pfsync_multicast_cleanup+0x83/frame 0xfffffe01843fdc40
K>      pfsync_clone_destroy() at pfsync_clone_destroy+0x260/frame 0xfffffe01843fdc90
K>      ifc_simple_destroy_wrapper() at ifc_simple_destroy_wrapper+0x26/frame 0xfffffe01843fdca0
K>      if_clone_destroyif_flags() at if_clone_destroyif_flags+0x69/frame 0xfffffe01843fdce0
K>      if_clone_detach() at if_clone_detach+0xe6/frame 0xfffffe01843fdd10
K>      vnet_pfsync_uninit() at vnet_pfsync_uninit+0xf0/frame 0xfffffe01843fdd30
K>      vnet_destroy() at vnet_destroy+0x154/frame 0xfffffe01843fdd60
K>      prison_deref() at prison_deref+0xaf5/frame 0xfffffe01843fddd0
K>      sys_jail_remove() at sys_jail_remove+0x15c/frame 0xfffffe01843fde00
K>      amd64_syscall() at amd64_syscall+0x169/frame 0xfffffe01843fdf30
K>      fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe01843fdf30
K>      --- syscall (508, FreeBSD ELF64, jail_remove), rip = 0x2d8234c9e31a, rsp = 0x2d823179b928, rbp = 0x2d823179b9b0 ---
K>      KDB: enter: panic
K> 
K> The pfsync:basic_ipv6 seems to trigger this reliably.

This actually surfaced an interesting problem, and pfsync being an interface
isn't the culprit here :)  Neither are my changes.

The problem is that the IPv6 multicast layer, in in6_getmulti(), calls into
the interface multicast layer via if_addmulti() to allocate a struct
ifmultiaddr.  This newborn ifmultiaddr has a refcount of 1, but it is
referenced both by the struct in6_multi and by the interface's linked list, so
it should have a refcount of 2.  In all normal cases the in6_multi structs are
also somehow associated with the interface they were allocated for, and during
the teardown sequence they all go away together, so this refcounting bug never
triggers.

But with pfsync calling in6_joingroup() on some ifnet from its own pfsync
context, we end up in a situation where the struct in6_multi is external to
the ifnet it is associated with.  If this ifnet is detached before the pfsync
context is destroyed, then our in6_multi will point at a detached ifnet that is
hanging on its last reference (all methods point to if_dead), and this
in6_multi will also point at a freed ifmultiaddr.
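
To make the sequence above concrete, here is a minimal userland sketch of the
refcounting problem.  It is a toy model, not kernel code: every toy_* name is
hypothetical, and a "freed" flag stands in for the real free so the state can
still be inspected afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins (hypothetical); not the kernel's real structures. */
struct toy_ifma {
	int refcount;
	int freed;	/* flag instead of free(), so the state stays inspectable */
};

struct toy_ifnet {
	struct toy_ifma *ifma;	/* pointer kept on the interface's list */
};

struct toy_inm {
	struct toy_ifnet *ifp;
	struct toy_ifma *ifma;	/* pointer kept by the IPv6 multicast layer */
};

/* Drop one reference; mark the object freed when the count hits zero. */
static void
toy_ifma_rele(struct toy_ifma *ifma)
{
	assert(ifma->refcount > 0);
	if (--ifma->refcount == 0)
		ifma->freed = 1;
}

/*
 * Join: two pointers end up aimed at one ifma.  initial_refs is 1 to model
 * the count described above, or 2 to model the count it should have.
 */
static void
toy_join(struct toy_inm *inm, struct toy_ifnet *ifp, struct toy_ifma *ifma,
    int initial_refs)
{
	ifma->refcount = initial_refs;
	ifma->freed = 0;
	ifp->ifma = ifma;
	inm->ifp = ifp;
	inm->ifma = ifma;
}

/* Interface detach drops only the reference held by the interface's list. */
static void
toy_if_detach(struct toy_ifnet *ifp)
{
	toy_ifma_rele(ifp->ifma);
	ifp->ifma = NULL;
}

/* Returns 1 if the in6_multi stand-in now points at a freed ifma. */
static int
toy_inm_dangles(const struct toy_inm *inm)
{
	return (inm->ifma->freed);
}
```

In this model, toy_join() with initial_refs of 1 followed by toy_if_detach()
leaves toy_inm_dangles() returning 1, which corresponds to the external holder
outliving the ifnet.  With initial_refs of 2 the same sequence leaves one
reference behind for the in6_multi stand-in to drop later.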

I'm looking at either a proper fix or at hiding it back under the carpet, as
it was before.

-- 
Gleb Smirnoff
