Hello, On Fri, Jun 10, 2022 at 09:54:47AM +1000, David Gwynne wrote: > > i upgraded one of the work firewalls to -current and added the diff > below in, and got what looks like a different panic: > > ddb{6}> tr > db_enter() at db_enter+0x5 > panic(ffffffff81e2cc31) at panic+0xbf > __assert(ffffffff81eae23b,ffffffff81ee4549,797,ffffffff81e7fd91) at > __assert+0x25 > pfsync_insert_state(fffffd816375b720) at pfsync_insert_state+0xec > pf_state_insert(ffff8000001f0000,ffff800024cd9cc0,ffff800024cd9cb8,fffffd816375b720) > at pf_state_insert+0x2df > pf_test_rule(ffff800024cd9e50,ffff800024cd9e38,ffff800024cd9e30,ffff800024cd9e40,ffff800024cd9e28,ffff800024cd9e4e,1) > at pf_test_rule+0xec4 > pf_test(2,3,ffff8000016ab800,ffff800024cd9fe8) at pf_test+0x1126 > ip_output(fffffd8052dae400,0,ffff800024cda0b0,1,0,0,ffff800001640801) at > ip_out put+0x72a > ip_forward(fffffd8052dae400,ffff800001640800,fffffd81840e2ad0,0) at > ip_forward+0x286 > ip_input_if(ffff800024cda2e0,ffff800024cda2dc,4,0,ffff800001640800) at > ip_input_if+0x347 > ipv4_input(ffff800001640800,fffffd8052dae400) at ipv4_input+0x37 > ether_input(ffff800001640800,fffffd8052dae400) at ether_input+0x394 > carp_input(ffff8000016a2000,fffffd8052dae400,5e000156) at carp_input+0x186 > ether_input(ffff8000016a2000,fffffd8052dae400) at ether_input+0x1c5 > vlan_input(ffff80000161a000,fffffd8052dae400,ffff800024cda4fc) at > vlan_input+0x22d > ether_input(ffff80000161a000,fffffd8052dae400) at ether_input+0x83 > if_input_process(ffff80000019a048,ffff800024cda588) at if_input_process+0x4a > ifiq_process(ffff8000001f0800) at ifiq_process+0x8e > taskq_thread(ffff80000002c080) at taskq_thread+0xfa > end trace frame: 0x0, count: -19 > ddb{6}> sh panic > *cpu6: kernel diagnostic assertion "st->sync_state == PFSYNC_S_NONE" failed: > file "/usr/src/sys/net/if_pfsync.c", line 1943 > > i'll try and have a look at it. i am probably most responsible for the > code :( >
I have a theory. This is where things start to go downhill. snippet comes from pf_state_insert(): 1036 pf_state_list_insert(&pf_state_list, s); 1037 pf_status.fcounters[FCNT_STATE_INSERT]++; 1038 pf_status.states++; 1039 pfi_kif_ref(kif, PFI_KIF_REF_STATE); 1040 PF_STATE_EXIT_WRITE(); 1041 #if NPFSYNC > 0 1042 pfsync_insert_state(s); 1043 #endif /* NPFSYNC > 0 */ 1044 return (0); 1045 } is there any chance the state we've just inserted would be found by another packet, which might be running on other CPU? packets which do state look up run as readers on state lock. if this happens then the assertion might not be true any more, because the other packet/cpu could alter ::sync_state member from PFSYNC_S_NONE state. just a thought. regards sashan