Re: pf.conf "set timeout interval 1" causes kernel crash

Kor son of Rynar Tue, 20 Aug 2019 21:42:53 -0700

Hi sashan,

On Tue, Aug 20, 2019 at 9:42 PM Alexandr Nedvedicky <
alexandr.nedvedi...@oracle.com> wrote:


> Hello,
>
> </snip>
>
> > >How-To-Repeat:
> > # echo "set timeout interval 1" >> /etc/pf.conf
> > # pfctl -f /etc/pf.conf
> >
> >   <wait a few seconds>
> >
> > >Fix:
> >         Not known.
>
> I've used same steps to reproduce the panic. This is what I could
> see in resulting crash:
>
> The system died at line 1441 in pf_free_state():
>
>     1415 void
>     1416 pf_free_state(struct pf_state *cur)
>     1417 {
>     1418         struct pf_rule_item *ri;
>     1419
>     1420         PF_ASSERT_LOCKED();
>     ....
>     1440         pf_normalize_tcp_cleanup(cur);
>     1441         pfi_kif_unref(cur->kif, PFI_KIF_REF_STATE);
>     1442         TAILQ_REMOVE(&state_list, cur, entry_list);
>     1443         if (cur->tag)
>     1444                 pf_tag_unref(cur->tag);
>     1445         pf_state_unref(cur);
>
> the instruction, where we died was as follows:
>
>     Stopped at      pf_free_state+0xfe:     movq    %rcx,0x28(%rax)
>
> there was -1 in rax. the -1 at offset 0x28 indicates the `cur` has been
> removed from state_list. To explain how is it possible, we need to step
> back to pf_free_state() caller, which is pf_purge_expired_states():
>
>     1453         static struct pf_state  *cur = NULL;
>     1454         struct pf_state         *next;
>     1455         SLIST_HEAD(pf_state_gcl, pf_state) gcl;
>     1456
>     1457         PF_ASSERT_UNLOCKED();
>     1458         SLIST_INIT(&gcl);
>     1459
>     1460         PF_STATE_ENTER_READ();
>     1461         while (maxcheck--) {
>     1462                 /* wrap to start of list when we hit the end */
>     1463                 if (cur == NULL) {
>     1464                         cur =
> pf_state_ref(TAILQ_FIRST(&state_list));
>     1465                         if (cur == NULL)
>     1466                                 break;  /* list empty */
>     1467                 }
>     1468
>     1469                 /* get next state, as cur may get deleted */
>     1470                 next = TAILQ_NEXT(cur, entry_list);
>     1471
>     1472                 if ((cur->timeout == PFTM_UNLINKED) ||
>     1473                     (pf_state_expires(cur) <= time_uptime))
>     1474                         SLIST_INSERT_HEAD(&gcl, cur, gc_list);
>     1475                 else
>     1476                         pf_state_unref(cur);
>     1477
>     1478                 cur = pf_state_ref(next);
>     1479         }
>
> I think the problem is the first 'while (maxcheck--)' loop may actually
> wrap around the state_list and re-insert the `cur` to garbage collector
> list
> again. Such sequence of events would match the panic I could see. I think
> the right fix is to break from the while loop as soon, as we reach
> the end of the state_list.
>
> I've also noticed yet another minor problem there in loop, which
> processes the garbage collector list:
>
>     1487         while ((next = SLIST_FIRST(&gcl)) != NULL) {
>     1488                 SLIST_REMOVE_HEAD(&gcl, gc_list);
>     1489                 if (next->timeout == PFTM_UNLINKED)
>     1490                         pf_free_state(next);
>     1491                 else if (pf_state_expires(next) <= time_uptime)) {
>     1492                         pf_remove_state(next);
>     1493                         pf_free_state(next);
>     1494                 }
>     1495
>     1496                 pf_state_unref(next);
>     1497         }
>
> at line 1491, we don't need to recalculate expiration time. The `next`
> item is
> on the garbage collector list already, so it must be expired for sure.
>
> patch below fixes both glitches. I wonder if it will work for you as well.
>

Thank you for your quick response.  I've applied the diff and it worked, no
more crashes!

Thanks again,
--Kor


>
> thanks for testing
> and sorry for inconveniences
>
> regards
> sashan
>
> --------8<---------------8<---------------8<------------------8<--------
> diff --git a/sys/net/pf.c b/sys/net/pf.c
> index bc33fa723ea..8cadd1f53c2 100644
> --- a/sys/net/pf.c
> +++ b/sys/net/pf.c
> @@ -1476,6 +1476,9 @@ pf_purge_expired_states(u_int32_t maxcheck)
>                         pf_state_unref(cur);
>
>                 cur = pf_state_ref(next);
> +
> +               if (cur == NULL)
> +                       break;
>         }
>         PF_STATE_EXIT_READ();
>
> @@ -1485,7 +1488,7 @@ pf_purge_expired_states(u_int32_t maxcheck)
>                 SLIST_REMOVE_HEAD(&gcl, gc_list);
>                 if (next->timeout == PFTM_UNLINKED)
>                         pf_free_state(next);
> -               else if (pf_state_expires(next) <= time_uptime) {
> +               else {
>                         pf_remove_state(next);
>                         pf_free_state(next);
>                 }
>
>
>

Re: pf.conf "set timeout interval 1" causes kernel crash

Reply via email to