On 7.6.2022. 2:16, Alexandr Nedvedicky wrote: > Hello, > > below is a diff which hopes to fix the issue. Although diff is fairly > large the change itself is kind of straightforward. Let me briefly > explain what's going on here. Diff introduces a mutex to pf_state, > which protects array of keys (pf_state::key) bound to state. > > The panic which diff below hopes to fix is caused by a race between timer > thread, which expires state and pfsync dispatch task, which updates a peer. > According to data provided by Hrvoje we panic due to NULL pointer dereference > in pf_state_export(), which finds sk->key[] to be NULL. This may happen > because > purge state mechanism detaches state key from state under protection > of PF_STATE_LOCK, while pfsync dispatch task just keeps a reference to state > without using a PF_STATE_LOCK to access a state instance. > > In order to synchronize access to pf_statey::key between purge thread > and pfsync dispatch task diff below introduces pf_state::mtx. > pfsync uses pf_state::mtx to attempt to grab references to keys bound > to state, while purge task uses mtx to safely invalidate state keys > in pf_detach_state(). > > Such change requires pfsync(4) to deal with situation when state > got detached while waiting in dispatch queue to update a peer. > We have to a .write() operation on sync-queue to indicate a failure > so pfsync_sendout() will just skip the state when processing dispatch > queue. > > Also diff changes pf_state_key_detach() such caller must pass pointer to state > key instead of key index to be detached from state. It also requires caller > to > invalidate a state key entry in pf_state::key member. > > I've just smoked tested the diff _without_ pfsync.
Hi, while booting with this diff I've got this log: starting early daemons: syslogd pflogd ntpdwitness: lock_object uninitialized: 0xfffffd8785c81a 90 Starting stack trace... witness_checkorder(fffffd8785c81a90,9,0) at witness_checkorder+0xad mtx_enter(fffffd8785c81a80) at mtx_enter+0x34 pf_remove_state(fffffd8785c81988) at pf_remove_state+0x1da pfsync_in_del_c(fffffd80028977b0,c,2,2) at pfsync_in_del_c+0x9f pfsync_input(ffff800020b056e8,ffff800020b056f4,f0,2) at pfsync_input+0x33c ip_deliver(ffff800020b056e8,ffff800020b056f4,f0,2) at ip_deliver+0x103 ip_local(ffff800020b056e8,ffff800020b056f4,fe0000007fff0220,0) at ip_local+0x1b7 ipintr() at ipintr+0x5f if_netisr(0) at if_netisr+0xca taskq_thread(ffff800000036000) at taskq_thread+0x11a end trace frame: 0x0, count: 247 End of stack trace. witness: lock_object uninitialized: 0xfffffd8786d61d50 Starting stack trace... witness_checkorder(fffffd8786d61d50,9,0) at witness_checkorder+0xad mtx_enter(fffffd8786d61d40) at mtx_enter+0x34 pf_remove_state(fffffd8786d61c48) at pf_remove_state+0x1da pfsync_in_del_c(fffffd80028d04e0,c,2,2) at pfsync_in_del_c+0x9f pfsync_input(ffff800020b056e8,ffff800020b056f4,f0,2) at pfsync_input+0x33c ip_deliver(ffff800020b056e8,ffff800020b056f4,f0,2) at ip_deliver+0x103 ip_local(ffff800020b056e8,ffff800020b056f4,fe00000000000003,0) at ip_local+0x1b7 ipintr() at ipintr+0x5f if_netisr(0) at if_netisr+0xca taskq_thread(ffff800000036000) at taskq_thread+0x11a end trace frame: 0x0, count: 247 End of stack trace. witness: lock_object uninitialized: 0xfffffd8786d61bd0 Starting stack trace... witness_checkorder(fffffd8786d61bd0,9,0) at witness_checkorder+0xad mtx_enter(fffffd8786d61bc0) at mtx_enter+0x34 pf_remove_state(fffffd8786d61ac8) at pf_remove_state+0x1da pfsync_in_del_c(fffffd80028d04e0,c,2,2) at pfsync_in_del_c+0x9f pfsync_input(ffff800020b056e8,ffff800020b056f4,f0,2) at pfsync_input+0x33c ip_deliver(ffff800020b056e8,ffff800020b056f4,f0,2) at ip_deliver+0x103 ip_local(ffff800020b056e8,ffff800020b056f4,fe00000000000003,0) at ip_local+0x1b7 ipintr() at ipintr+0x5f if_netisr(0) at if_netisr+0xca taskq_thread(ffff800000036000) at taskq_thread+0x11a end trace frame: 0x0, count: 247 End of stack trace. witness: lock_object uninitialized: 0xfffffd87846cebc8 Starting stack trace... witness_checkorder(fffffd87846cebc8,9,0) at witness_checkorder+0xad mtx_enter(fffffd87846cebb8) at mtx_enter+0x34 pf_remove_state(fffffd87846ceac0) at pf_remove_state+0x1da pfsync_in_del_c(fffffd8070be0450,c,2,2) at pfsync_in_del_c+0x9f pfsync_input(ffff800020b056e8,ffff800020b056f4,f0,2) at pfsync_input+0x33c ip_deliver(ffff800020b056e8,ffff800020b056f4,f0,2) at ip_deliver+0x103 ip_local(ffff800020b056e8,ffff800020b056f4,fe00000000000003,0) at ip_local+0x1b7 ipintr() at ipintr+0x5f if_netisr(0) at if_netisr+0xca taskq_thread(ffff800000036000) at taskq_thread+0x11a end trace frame: 0x0, count: 247 End of stack trace. witness: lock_object uninitialized: 0xfffffd87846ce748 Starting stack trace... witness_checkorder(fffffd87846ce748,9,0) at witness_checkorder+0xad mtx_enter(fffffd87846ce738) at mtx_enter+0x34 pf_remove_state(fffffd87846ce640) at pf_remove_state+0x1da pfsync_in_del_c(fffffd8070be0450,c,2,2) at pfsync_in_del_c+0x9f pfsync_input(ffff800020b056e8,ffff800020b056f4,f0,2) at pfsync_input+0x33c ip_deliver(ffff800020b056e8,ffff800020b056f4,f0,2) at ip_deliver+0x103 ip_local(ffff800020b056e8,ffff800020b056f4,fe00000000000003,0) at ip_local+0x1b7 ipintr() at ipintr+0x5f if_netisr(0) at if_netisr+0xca taskq_thread(ffff800000036000) at taskq_thread+0x11a end trace frame: 0x0, count: 247 End of stack trace. starting RPC daemons:. savecore: no core dump checking quotas: done. clearing /tmp kern.securelevel: 0 -> 1 creating runtime link editor directory cache. preserving editor files. starting network daemons: sshd snmpd dhcpd smtpd. starting package daemons: lldpd. starting local daemons: cron. Tue Jun 7 08:19:28 CEST 2022