Hello,

On Fri, Jun 10, 2022 at 09:54:47AM +1000, David Gwynne wrote:
> 
> i upgraded one of the work firewalls to -current and added the diff
> below in, and got what looks like a different panic:
> 
> ddb{6}> tr
> db_enter() at db_enter+0x5
> panic(ffffffff81e2cc31) at panic+0xbf
> __assert(ffffffff81eae23b,ffffffff81ee4549,797,ffffffff81e7fd91) at 
> __assert+0x25
> pfsync_insert_state(fffffd816375b720) at pfsync_insert_state+0xec
> pf_state_insert(ffff8000001f0000,ffff800024cd9cc0,ffff800024cd9cb8,fffffd816375b720)
>  at pf_state_insert+0x2df
> pf_test_rule(ffff800024cd9e50,ffff800024cd9e38,ffff800024cd9e30,ffff800024cd9e40,ffff800024cd9e28,ffff800024cd9e4e,1)
>  at pf_test_rule+0xec4
> pf_test(2,3,ffff8000016ab800,ffff800024cd9fe8) at pf_test+0x1126
> ip_output(fffffd8052dae400,0,ffff800024cda0b0,1,0,0,ffff800001640801) at 
> ip_out put+0x72a
> ip_forward(fffffd8052dae400,ffff800001640800,fffffd81840e2ad0,0) at 
> ip_forward+0x286
> ip_input_if(ffff800024cda2e0,ffff800024cda2dc,4,0,ffff800001640800) at 
> ip_input_if+0x347
> ipv4_input(ffff800001640800,fffffd8052dae400) at ipv4_input+0x37
> ether_input(ffff800001640800,fffffd8052dae400) at ether_input+0x394
> carp_input(ffff8000016a2000,fffffd8052dae400,5e000156) at carp_input+0x186
> ether_input(ffff8000016a2000,fffffd8052dae400) at ether_input+0x1c5
> vlan_input(ffff80000161a000,fffffd8052dae400,ffff800024cda4fc) at 
> vlan_input+0x22d
> ether_input(ffff80000161a000,fffffd8052dae400) at ether_input+0x83
> if_input_process(ffff80000019a048,ffff800024cda588) at if_input_process+0x4a
> ifiq_process(ffff8000001f0800) at ifiq_process+0x8e
> taskq_thread(ffff80000002c080) at taskq_thread+0xfa
> end trace frame: 0x0, count: -19
> ddb{6}> sh panic
> *cpu6: kernel diagnostic assertion "st->sync_state == PFSYNC_S_NONE" failed: 
> file "/usr/src/sys/net/if_pfsync.c", line 1943
> 
> i'll try and have a look at it. i am probably most responsible for the
> code :(
> 

    I have a theory. This is where things start to go downhill. snippet comes
    from pf_state_insert():

1036         pf_state_list_insert(&pf_state_list, s);
1037         pf_status.fcounters[FCNT_STATE_INSERT]++;
1038         pf_status.states++;
1039         pfi_kif_ref(kif, PFI_KIF_REF_STATE);
1040         PF_STATE_EXIT_WRITE();
1041 #if NPFSYNC > 0
1042         pfsync_insert_state(s);
1043 #endif  /* NPFSYNC > 0 */
1044         return (0);
1045 }


    is there any chance the state we've just inserted would be found by
    another packet, which might be running on other CPU? packets which
    do state look up run as readers on state lock.

    if this happens then the assertion might not be true any more,
    because the other packet/cpu could alter ::sync_state member
    from PFSYNC_S_NONE state.

just a thought.

regards
sashan

Reply via email to