On 7.6.2022. 2:16, Alexandr Nedvedicky wrote:
> Hello,
> 
> below is a diff which hopes to fix the issue. Although diff is fairly
> large the change itself is kind of straightforward. Let me briefly
> explain what's going on here. Diff introduces a mutex to pf_state,
> which protects array of keys (pf_state::key) bound to state.
> 
> The panic which diff below hopes to fix is caused by a race between timer
> thread, which expires state and pfsync dispatch task, which updates a peer.
> According to data provided by Hrvoje we panic due to NULL pointer dereference
> in pf_state_export(), which finds sk->key[] to be NULL. This may happen 
> because
> purge state mechanism detaches state key from state under protection
> of PF_STATE_LOCK, while pfsync dispatch task just keeps a reference to state
> without using a PF_STATE_LOCK to access a state instance.
> 
> In order to synchronize access to pf_statey::key between purge thread
> and pfsync dispatch task diff below introduces pf_state::mtx.
> pfsync uses pf_state::mtx to attempt to grab references to keys bound
> to state, while purge task uses mtx to safely invalidate state keys
> in pf_detach_state().
> 
> Such change requires pfsync(4) to deal with situation when state
> got detached while waiting in dispatch queue to update a peer.
> We have to a .write() operation on sync-queue to indicate a failure
> so pfsync_sendout() will just skip the state when processing dispatch
> queue.
> 
> Also diff changes pf_state_key_detach() such caller must pass pointer to state
> key instead of key index to be detached from state.  It also requires caller 
> to
> invalidate a state key entry in pf_state::key member.
> 
> I've just smoked tested the diff _without_ pfsync.

Hi,

while booting with this diff I've got this log:

starting early daemons: syslogd pflogd ntpdwitness: lock_object
uninitialized: 0xfffffd8785c81a
90
Starting stack trace...
witness_checkorder(fffffd8785c81a90,9,0) at witness_checkorder+0xad
mtx_enter(fffffd8785c81a80) at mtx_enter+0x34
pf_remove_state(fffffd8785c81988) at pf_remove_state+0x1da
pfsync_in_del_c(fffffd80028977b0,c,2,2) at pfsync_in_del_c+0x9f
pfsync_input(ffff800020b056e8,ffff800020b056f4,f0,2) at pfsync_input+0x33c
ip_deliver(ffff800020b056e8,ffff800020b056f4,f0,2) at ip_deliver+0x103
ip_local(ffff800020b056e8,ffff800020b056f4,fe0000007fff0220,0) at
ip_local+0x1b7
ipintr() at ipintr+0x5f
if_netisr(0) at if_netisr+0xca
taskq_thread(ffff800000036000) at taskq_thread+0x11a
end trace frame: 0x0, count: 247
End of stack trace.
witness: lock_object uninitialized: 0xfffffd8786d61d50
Starting stack trace...
witness_checkorder(fffffd8786d61d50,9,0) at witness_checkorder+0xad
mtx_enter(fffffd8786d61d40) at mtx_enter+0x34
pf_remove_state(fffffd8786d61c48) at pf_remove_state+0x1da
pfsync_in_del_c(fffffd80028d04e0,c,2,2) at pfsync_in_del_c+0x9f
pfsync_input(ffff800020b056e8,ffff800020b056f4,f0,2) at pfsync_input+0x33c
ip_deliver(ffff800020b056e8,ffff800020b056f4,f0,2) at ip_deliver+0x103
ip_local(ffff800020b056e8,ffff800020b056f4,fe00000000000003,0) at
ip_local+0x1b7
ipintr() at ipintr+0x5f
if_netisr(0) at if_netisr+0xca
taskq_thread(ffff800000036000) at taskq_thread+0x11a
end trace frame: 0x0, count: 247
End of stack trace.
witness: lock_object uninitialized: 0xfffffd8786d61bd0
Starting stack trace...
witness_checkorder(fffffd8786d61bd0,9,0) at witness_checkorder+0xad
mtx_enter(fffffd8786d61bc0) at mtx_enter+0x34
pf_remove_state(fffffd8786d61ac8) at pf_remove_state+0x1da
pfsync_in_del_c(fffffd80028d04e0,c,2,2) at pfsync_in_del_c+0x9f
pfsync_input(ffff800020b056e8,ffff800020b056f4,f0,2) at pfsync_input+0x33c
ip_deliver(ffff800020b056e8,ffff800020b056f4,f0,2) at ip_deliver+0x103
ip_local(ffff800020b056e8,ffff800020b056f4,fe00000000000003,0) at
ip_local+0x1b7
ipintr() at ipintr+0x5f
if_netisr(0) at if_netisr+0xca
taskq_thread(ffff800000036000) at taskq_thread+0x11a
end trace frame: 0x0, count: 247
End of stack trace.
witness: lock_object uninitialized: 0xfffffd87846cebc8
Starting stack trace...
witness_checkorder(fffffd87846cebc8,9,0) at witness_checkorder+0xad
mtx_enter(fffffd87846cebb8) at mtx_enter+0x34
pf_remove_state(fffffd87846ceac0) at pf_remove_state+0x1da
pfsync_in_del_c(fffffd8070be0450,c,2,2) at pfsync_in_del_c+0x9f
pfsync_input(ffff800020b056e8,ffff800020b056f4,f0,2) at pfsync_input+0x33c
ip_deliver(ffff800020b056e8,ffff800020b056f4,f0,2) at ip_deliver+0x103
ip_local(ffff800020b056e8,ffff800020b056f4,fe00000000000003,0) at
ip_local+0x1b7
ipintr() at ipintr+0x5f
if_netisr(0) at if_netisr+0xca
taskq_thread(ffff800000036000) at taskq_thread+0x11a
end trace frame: 0x0, count: 247
End of stack trace.
witness: lock_object uninitialized: 0xfffffd87846ce748
Starting stack trace...
witness_checkorder(fffffd87846ce748,9,0) at witness_checkorder+0xad
mtx_enter(fffffd87846ce738) at mtx_enter+0x34
pf_remove_state(fffffd87846ce640) at pf_remove_state+0x1da
pfsync_in_del_c(fffffd8070be0450,c,2,2) at pfsync_in_del_c+0x9f
pfsync_input(ffff800020b056e8,ffff800020b056f4,f0,2) at pfsync_input+0x33c
ip_deliver(ffff800020b056e8,ffff800020b056f4,f0,2) at ip_deliver+0x103
ip_local(ffff800020b056e8,ffff800020b056f4,fe00000000000003,0) at
ip_local+0x1b7
ipintr() at ipintr+0x5f
if_netisr(0) at if_netisr+0xca
taskq_thread(ffff800000036000) at taskq_thread+0x11a
end trace frame: 0x0, count: 247
End of stack trace.
starting RPC daemons:.
savecore: no core dump
checking quotas: done.
clearing /tmp
kern.securelevel: 0 -> 1
creating runtime link editor directory cache.
preserving editor files.
starting network daemons: sshd snmpd dhcpd smtpd.
starting package daemons: lldpd.
starting local daemons: cron.
Tue Jun  7 08:19:28 CEST 2022

Reply via email to