Hey,

I'm seeing a semi-reproducible panic with pfctl in 5.3-RELEASE.  The
machine is an IPv6 router/firewall, and the pf ruleset includes rules
for both IPv4 and IPv6.  Sadly, the machine is not running a debug
kernel, but I do have dumps of the panics.  It's a single-processor
machine with one fxp card and one xl card.  IPv4 is only on the xl card;
IPv6 is native on the xl card and also runs over a vlan(4) on the fxp
card.

I've actually had three panics with this machine today, the latter two of
which were while running pfctl, and the first was when running sysctl -a
(which I guess also looks at pf-related values).  I only have dumps of the
two pfctl panics, but I'm not sure the dumps can yield any secrets as in
both cases the stacks seem to have been smashed.  However, it does seem
that both panics occurred in rn_walktree.

First panic:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x8
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc067ec6a
stack pointer           = 0x10:0xc8856820
frame pointer           = 0x10:0xc885682c
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1060 (pfctl)
trap number             = 12
panic: page fault
Uptime: 2h48m2s
Dumping 128 MB
 16 32 48 64 80 96 112

(kgdb) bt
#0  0xc060bc12 in doadump ()
#1  0xc060c1e5 in boot ()
#2  0xc060c4a1 in panic ()
#3  0xc07b6060 in trap_fatal ()
#4  0xc07b5dcb in trap_pfault ()
#5  0xc07b5a0d in trap ()
#6  0xc07a5a9a in calltrap ()
#7  0xc0670018 in if_attachdomain1 ()
#8  0xc15bc90f in ?? ()
#9  0xc159a300 in ?? ()
#10 0xc15bcfe0 in ?? ()
#11 0xc8856840 in ?? ()
#12 0x00000000 in ?? ()
#13 0x00000000 in ?? ()
#14 0x00000000 in ?? ()
#15 0x00000000 in ?? ()
#16 0xc15e44d8 in ?? ()
#17 0xc15e44d8 in ?? ()
(etc etc - another 40 "??" lines not inside functions)

See below, after the second panic, for more info on this panic.


Second panic:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xc13f53
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc067ec6a
stack pointer           = 0x10:0xc8842818
frame pointer           = 0x10:0xc8842824
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 564 (pfctl)
trap number             = 12
panic: page fault
Uptime: 9m54s
Dumping 128 MB
 16 32 48 64 80 96 112

(kgdb) bt
#0  0xc060bc12 in doadump ()
#1  0xc060c1e5 in boot ()
#2  0xc060c4a1 in panic ()
#3  0xc07b6060 in trap_fatal ()
#4  0xc07b5dcb in trap_pfault ()
#5  0xc07b5a0d in trap ()
#6  0xc07a5a9a in calltrap ()
#7  0x00000018 in ?? ()
#8  0x00000010 in ?? ()
#9  0xc1750010 in ?? ()
#10 0xc8842838 in ?? ()
#11 0xc16091f0 in ?? ()
#12 0xc8842824 in ?? ()
#13 0xc8842804 in ?? ()
#14 0xc1594734 in ?? ()
#15 0x00c13f4b in ?? ()
#16 0xc159475d in ?? ()
#17 0xc159475d in ?? ()
#18 0x0000000c in ?? ()
#19 0x00000000 in ?? ()
#20 0xc067ec6a in rn_walktree ()
#21 0xc15cd890 in ?? ()
#22 0xc1594700 in ?? ()
#23 0xc15cdfe0 in ?? ()
#24 0xc8842838 in ?? ()
#25 0x00000002 in ?? ()
#26 0xc884286c in ?? ()
#27 0x00000006 in ?? ()
#28 0x00000000 in ?? ()
(etc etc - another 60 lines not inside functions)

In both cases the fault was at instruction pointer 0xc067ec6a, which
seems to be line 1069 in rn_walktree (I'll spare you the disassembly,
just trust me :)

  1062          for (;;) {
  1063                  base = rn;
  1064                  /* If at right child go back up, otherwise, go right */
  1065                  while (rn->rn_parent->rn_right == rn
  1066                         && (rn->rn_flags & RNF_ROOT) == 0)
  1067                          rn = rn->rn_parent;
  1068                  /* Find the next *leaf* since next node might vanish, too */
  1069  --->            for (rn = rn->rn_parent->rn_right; rn->rn_bit >= 0;)
  1070                          rn = rn->rn_left;
  1071                  next = rn;
  1072                  /* Process leaves */
  1073                  while ((rn = base)) {
  1074                          base = rn->rn_dupedkey;
  1075                          if (!(rn->rn_flags & RNF_ROOT)
  1076                              && (error = (*f)(rn, w)))
  1077                                  return (error);
  1078                  }
  1079                  rn = next;
  1080                  if (rn->rn_flags & RNF_ROOT)
  1081                          return (0);
  1082          }

(specifically, it's the read of rn->rn_bit that triggers the panic)
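
For what it's worth, the two fault addresses look consistent with that
read.  A minimal sketch, assuming that on i386 struct radix_node starts
with two 4-byte pointers (rn_mklist, rn_parent) ahead of rn_bit; the
names rn_model, rn_model_selfcheck and rn_from_fault are made up here
purely for illustration and are not kernel code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified stand-in for the leading fields of struct radix_node as
 * laid out on a 32-bit machine.  ASSUMPTION: two 4-byte pointers come
 * before rn_bit.  Pointers are modelled as uint32_t so the offsets do
 * not depend on the host this is compiled on.
 */
struct rn_model {
	uint32_t rn_mklist;	/* 4-byte pointer on i386 (assumed) */
	uint32_t rn_parent;	/* 4-byte pointer on i386 (assumed) */
	int16_t  rn_bit;	/* the field read on line 1069 */
};

/* Sanity check on the model itself: rn_bit should sit at offset 8. */
void
rn_model_selfcheck(void)
{
	assert(offsetof(struct rn_model, rn_bit) == 8);
}

/*
 * Given a fault address from the trap output, return the value rn would
 * have held if the faulting access was the read of rn->rn_bit.
 */
uint32_t
rn_from_fault(uint32_t fault_addr)
{
	return (fault_addr - (uint32_t)offsetof(struct rn_model, rn_bit));
}
```

Under that assumption, the first panic's fault address 0x8 maps back to
rn == NULL, and the second panic's 0xc13f53 maps back to 0x00c13f4b,
which is the value sitting in frame #15 of the second backtrace.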

And at this point I can't really investigate this further.  I guess the
radix tree is corrupt, but I suspect it's going to be hard to establish
what events actually happened leading up to the panic.

I'm relatively happy that the hardware is OK; it's been running Linux for
over a year without problems.  I'm happy to do any more debugging or
investigation with the core files, or to supply the machine's config (e.g.
firewall ruleset) and core files to "trusted" people.  I will compile up a
debug kernel, but would appreciate any suggestions if anyone has any.

Gavin

_______________________________________________
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
