Re: testing request: pf internals rearrangement
I have had both firewalls in a carp/pfsync pair, which are running the same snapshot, crash:

uvm_fault(0xd07f81e0, 0x0, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at      pf_test_rule+0x8a0:     movl    0x58(%eax),%ecx

ddb> trace
pf_test_rule(d51efd64,d51efd5c,1,d0ba8500,db53ad00) at pf_test_rule+0x8a0
pf_test(1,d0c54800,d51efe64,0) at pf_test+0x8c1
ipv4_input(db53ad00,d0b9a180,6840,7c10c) at ipv4_input+0x124
ipintr(58,10,d51e0010,d0480010,7c10c) at ipintr+0x64
Bad frame pointer: 0xd51efe7c

ddb> ps
  PID  PPID  PGRP  UID  S      FLAGS  WAIT         COMMAND
29392 24940 24940    0  2    0x40100               sendmail
  897 22729   897    0  2     0x4002               top
22729 16691 22729    0  3     0x4082  pause        ksh
16691  9570 16691    0  2     0x4180               sshd
17835  3928 17835    0  3     0x4082  ttyin        ksh
 3928  9570  3928    0  2     0x4180               sshd
 6950     1  6950    0  3     0x4082  ttyin        getty
27487     1 27487    0  3     0x4082  ttyin        getty
 1375     1  1375    0  3     0x4082  ttyin        getty
11547     1 11547    0  3     0x4082  ttyin        getty
 3455     1  3455    0  3     0x4082  ttyin        getty
 4587     1  4587    0  2          0               cron
 9570     1  9570    0  3       0x80  select       sshd
24940     1 24940    0  2    0x40100               sendmail
  138     1   138    0  3      0x180  select       inetd
30566 14663 14663   83  2      0x100               ntpd
14663     1 14663    0  2          0               ntpd
10627 31688 31688   70  3      0x100  uvn_getpage  named
31688     1 31688    0  3      0x180  netio        named
 5097  8724  8724   74  2      0x100               pflogd
 8724     1  8724    0  3       0x80  netio        pflogd
16751  7245  7245   73  2      0x100               syslogd
 7245     1  7245    0  3       0x88  netio        syslogd
   12     0     0    0  3   0x100200  bored        crypto
   11     0     0    0  3   0x100200  aiodoned     aiodoned
   10     0     0    0  2   0x100200               update
    9     0     0    0  3   0x100200  cleaner      cleaner
    8     0     0    0  3   0x100200  reaper       reaper
   *7     0     0    0  7   0x100200               pagedaemon
    6     0     0    0  2   0x100600               pfpurge
    5     0     0    0  3   0x100200  acpi_idle    acpi0
    4     0     0    0  3   0x100200  bored        syswq
    3     0     0    0  3   0x100200               idle0
    2     0     0    0  3   0x100200  km_alloc1wk  kmthread
    1     0     1    0  3     0x4080  wait         init
    0    -1     0    0  3    0x80200  scheduler    swapper

Here is the last pf related info I have off of one of the machines before it crashed (less than 30 minutes later):

Sun Jun  8 01:30:03 NZST 2008

Status: Enabled for 5 days 05:53:11           Debug: Misc

State Table                          Total             Rate
  current entries                      513
  searches                       112428343          248.1/s
  inserts                           919608            2.0/s
  removals                          919095            2.0/s
Counters
  match                             973538            2.1/s
  bad-offset                             0            0.0/s
  fragment                               0            0.0/s
  short                                  0            0.0/s
  normalize                              0            0.0/s
  memory                                 0            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                             0            0.0/s
  ip-option                              0            0.0/s
  proto-cksum                            0            0.0/s
  state-mismatch                       108            0.0/s
  state-insert                           0            0.0/s
  state-limit                            0            0.0/s
  src-limit                              0            0.0/s
  synproxy                               0            0.0/s

Name       Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
pfiaddrpl   100        4    0        0     1     0     1     1     0     8    0
pfrulepl    848       30    0       10     8     0     8     8     0     8    2
pfstatepl   192  5122566    0  5122068    49     0    49    49     0
Re: testing request: pf internals rearrangement
Ok, it seems to be pf.c line 3397 (maybe a little offset for you, I have changes in my tree), which is the

	pool_put(pf_src_tree_pl, nsn);

in the second if block after the cleanup label. This is in source node tracking, i.e. not new or changed code. A crash there means we either tried to pool_put something invalid (double free style?) or we have pool corruption.

* Josh [EMAIL PROTECTED] [2008-06-08 17:11]:
> I have had both firewalls in a carp/pfsync pair, which are running the
> same snapshot, crash: [...]
Re: testing request: pf internals rearrangement
This was not meant to go to the list, and the analysis was off due to a difference in kernel sources. Meanwhile mbalmer ran into that bug too, and I found it. It is obvious why nobody ran into it before: it is IPv6-only.

Index: pf.c
===================================================================
RCS file: /cvs/src/sys/net/pf.c,v
retrieving revision 1.579
diff -u -p -r1.579 pf.c
--- pf.c	2 Jun 2008 11:38:22 -0000	1.579
+++ pf.c	8 Jun 2008 17:13:11 -0000
@@ -3058,7 +3058,8 @@ pf_test_rule(struct pf_rule **rm, struct
 			goto cleanup;
 		}
 
-		bip_sum = *pd->ip_sum;
+		if (pd->ip_sum)
+			bip_sum = *pd->ip_sum;
 
 		switch (pd->proto) {
 		case IPPROTO_TCP:

-- 
Henning Brauer, [EMAIL PROTECTED], [EMAIL PROTECTED]
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam