Hi sashan, On Tue, Aug 20, 2019 at 9:42 PM Alexandr Nedvedicky < alexandr.nedvedi...@oracle.com> wrote:
> Hello, > > </snip> > > > >How-To-Repeat: > > # echo "set timeout interval 1" >> /etc/pf.conf > > # pfctl -f /etc/pf.conf > > > > <wait a few seconds> > > > > >Fix: > > Not known. > > I've used same steps to reproduce the panic. This is what I could > see in resulting crash: > > The system died at line 1441 in pf_free_state(): > > 1415 void > 1416 pf_free_state(struct pf_state *cur) > 1417 { > 1418 struct pf_rule_item *ri; > 1419 > 1420 PF_ASSERT_LOCKED(); > .... > 1440 pf_normalize_tcp_cleanup(cur); > 1441 pfi_kif_unref(cur->kif, PFI_KIF_REF_STATE); > 1442 TAILQ_REMOVE(&state_list, cur, entry_list); > 1443 if (cur->tag) > 1444 pf_tag_unref(cur->tag); > 1445 pf_state_unref(cur); > > the instruction, where we died was as follows: > > Stopped at pf_free_state+0xfe: movq %rcx,0x28(%rax) > > there was -1 in rax. the -1 at offset 0x28 indicates the `cur` has been > removed from state_list. To explain how is it possible, we need to step > back to pf_free_state() caller, which is pf_purge_expired_states(): > > 1453 static struct pf_state *cur = NULL; > 1454 struct pf_state *next; > 1455 SLIST_HEAD(pf_state_gcl, pf_state) gcl; > 1456 > 1457 PF_ASSERT_UNLOCKED(); > 1458 SLIST_INIT(&gcl); > 1459 > 1460 PF_STATE_ENTER_READ(); > 1461 while (maxcheck--) { > 1462 /* wrap to start of list when we hit the end */ > 1463 if (cur == NULL) { > 1464 cur = > pf_state_ref(TAILQ_FIRST(&state_list)); > 1465 if (cur == NULL) > 1466 break; /* list empty */ > 1467 } > 1468 > 1469 /* get next state, as cur may get deleted */ > 1470 next = TAILQ_NEXT(cur, entry_list); > 1471 > 1472 if ((cur->timeout == PFTM_UNLINKED) || > 1473 (pf_state_expires(cur) <= time_uptime)) > 1474 SLIST_INSERT_HEAD(&gcl, cur, gc_list); > 1475 else > 1476 pf_state_unref(cur); > 1477 > 1478 cur = pf_state_ref(next); > 1479 } > > I think the problem is the first 'while (maxcheck--)' loop may actually > wrap around the state_list and re-insert the `cur` to garbage collector > list > again. Such sequence of events would match the panic I could see. I think > the right fix is to break from the while loop as soon, as we reach > the end of the state_list. > > I've also noticed yet another minor problem there in loop, which > processes the garbage collector list: > > 1487 while ((next = SLIST_FIRST(&gcl)) != NULL) { > 1488 SLIST_REMOVE_HEAD(&gcl, gc_list); > 1489 if (next->timeout == PFTM_UNLINKED) > 1490 pf_free_state(next); > 1491 else if (pf_state_expires(next) <= time_uptime)) { > 1492 pf_remove_state(next); > 1493 pf_free_state(next); > 1494 } > 1495 > 1496 pf_state_unref(next); > 1497 } > > at line 1491, we don't need to recalculate expiration time. The `next` > item is > on the garbage collector list already, so it must be expired for sure. > > patch below fixes both glitches. I wonder if it will work for you as well. > Thank you for your quick response. I've applied the diff and it worked, no more crashes! Thanks again, --Kor > > thanks for testing > and sorry for inconveniences > > regards > sashan > > --------8<---------------8<---------------8<------------------8<-------- > diff --git a/sys/net/pf.c b/sys/net/pf.c > index bc33fa723ea..8cadd1f53c2 100644 > --- a/sys/net/pf.c > +++ b/sys/net/pf.c > @@ -1476,6 +1476,9 @@ pf_purge_expired_states(u_int32_t maxcheck) > pf_state_unref(cur); > > cur = pf_state_ref(next); > + > + if (cur == NULL) > + break; > } > PF_STATE_EXIT_READ(); > > @@ -1485,7 +1488,7 @@ pf_purge_expired_states(u_int32_t maxcheck) > SLIST_REMOVE_HEAD(&gcl, gc_list); > if (next->timeout == PFTM_UNLINKED) > pf_free_state(next); > - else if (pf_state_expires(next) <= time_uptime) { > + else { > pf_remove_state(next); > pf_free_state(next); > } > > >