Re: testing request: pf internals rearrangement

2008-06-08 Thread Josh
I have had both firewalls in a carp/pfsync pair which are running the same 
snapshot crash;


uvm_fault(0xd07f81e0, 0x0, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at      pf_test_rule+0x8a0:     movl    0x58(%eax),%ecx
ddb> pf_test_rule(d51efd64,d51efd5c,1,d0ba8500,db53ad00) at pf_test_rule+0x8a0
pf_test(1,d0c54800,d51efe64,0) at pf_test+0x8c1
ipv4_input(db53ad00,d0b9a180,6840,7c10c) at ipv4_input+0x124
ipintr(58,10,d51e0010,d0480010,7c10c) at ipintr+0x64
Bad frame pointer: 0xd51efe7c
ddb>    PID   PPID   PGRP  UID  S     FLAGS  WAIT         COMMAND
      29392  24940  24940    0  2   0x40100               sendmail
        897  22729    897    0  2    0x4002               top
      22729  16691  22729    0  3    0x4082  pause        ksh
      16691   9570  16691    0  2    0x4180               sshd
      17835   3928  17835    0  3    0x4082  ttyin        ksh
       3928   9570   3928    0  2    0x4180               sshd
       6950      1   6950    0  3    0x4082  ttyin        getty
      27487      1  27487    0  3    0x4082  ttyin        getty
       1375      1   1375    0  3    0x4082  ttyin        getty
      11547      1  11547    0  3    0x4082  ttyin        getty
       3455      1   3455    0  3    0x4082  ttyin        getty
       4587      1   4587    0  2         0               cron
       9570      1   9570    0  3      0x80  select       sshd
      24940      1  24940    0  2   0x40100               sendmail
        138      1    138    0  3     0x180  select       inetd
      30566  14663  14663   83  2     0x100               ntpd
      14663      1  14663    0  2         0               ntpd
      10627  31688  31688   70  3     0x100  uvn_getpage  named
      31688      1  31688    0  3     0x180  netio        named
       5097   8724   8724   74  2     0x100               pflogd
       8724      1   8724    0  3      0x80  netio        pflogd
      16751   7245   7245   73  2     0x100               syslogd
       7245      1   7245    0  3      0x88  netio        syslogd
         12      0      0    0  3  0x100200  bored        crypto
         11      0      0    0  3  0x100200  aiodoned     aiodoned
         10      0      0    0  2  0x100200               update
          9      0      0    0  3  0x100200  cleaner      cleaner
          8      0      0    0  3  0x100200  reaper       reaper
         *7      0      0    0  7  0x100200               pagedaemon
          6      0      0    0  2  0x100600               pfpurge
          5      0      0    0  3  0x100200  acpi_idle    acpi0
          4      0      0    0  3  0x100200  bored        syswq
          3      0      0    0  3  0x100200               idle0
          2      0      0    0  3  0x100200  km_alloc1w   kmthread
          1      0      1    0  3    0x4080  wait         init
          0     -1      0    0  3   0x80200  scheduler    swapper



Here is the last pf related info I have off of one of the machines
before it crashed (less than 30 minutes later):


Sun Jun  8 01:30:03 NZST 2008

Status: Enabled for 5 days 05:53:11 Debug: Misc

State Table                          Total             Rate
  current entries                      513
  searches                       112428343          248.1/s
  inserts                           919608            2.0/s
  removals                          919095            2.0/s
Counters
  match                             973538            2.1/s
  bad-offset                             0            0.0/s
  fragment                               0            0.0/s
  short                                  0            0.0/s
  normalize                              0            0.0/s
  memory                                 0            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                             0            0.0/s
  ip-option                              0            0.0/s
  proto-cksum                            0            0.0/s
  state-mismatch                       108            0.0/s
  state-insert                           0            0.0/s
  state-limit                            0            0.0/s
  src-limit                              0            0.0/s
  synproxy                               0            0.0/s

Name Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
pfiaddrpl100400 1 0 1 1 0 80
pfrulepl 848   300   10 8 0 8 8 0 82
pfstatepl192  51225660  512206849 04949 0

Re: testing request: pf internals rearrangement

2008-06-08 Thread Henning Brauer
ok it seems to be pf.c line 3397 (maybe a little offset for you, i have 
changes in my tree), which is
	pool_put(&pf_src_tree_pl, nsn);
in the second if block after the cleanup label. this is in source node 
tracking, as in, not new or changed code. a crash there means we either 
tried to pool_put something invalid (double free style?) or we have 
pool corruption.
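
To make the "double free style" worry concrete, here is a minimal sketch of a free-list pool in C. It is a toy, not the kernel's pool(9) code: struct toy_pool, struct toy_item and the toy_pool_* functions are hypothetical names invented for this illustration, and the in_use flag is an assumed debugging aid rather than anything pf or pool(9) is known to do.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_ITEMS 4

/* Hypothetical toy pool; NOT the kernel's pool(9) implementation. */
struct toy_item {
	struct toy_item *next;	/* free-list link, meaningful only while free */
	int in_use;		/* assumed debug flag: 1 while handed out */
};

struct toy_pool {
	struct toy_item items[POOL_ITEMS];
	struct toy_item *free_head;
};

/* Put every item on the free list. */
static void
toy_pool_init(struct toy_pool *pp)
{
	size_t i;

	pp->free_head = NULL;
	for (i = 0; i < POOL_ITEMS; i++) {
		pp->items[i].in_use = 0;
		pp->items[i].next = pp->free_head;
		pp->free_head = &pp->items[i];
	}
}

/* Hand out one item, or NULL when the pool is exhausted. */
static struct toy_item *
toy_pool_get(struct toy_pool *pp)
{
	struct toy_item *it = pp->free_head;

	if (it == NULL)
		return NULL;
	pp->free_head = it->next;
	it->in_use = 1;
	return it;
}

/*
 * Return an item to the pool.
 * Returns 0 on success, -1 if the item is NULL or not currently
 * handed out (that is, a double put).
 */
static int
toy_pool_put(struct toy_pool *pp, struct toy_item *it)
{
	if (it == NULL || !it->in_use)
		return -1;
	it->in_use = 0;
	it->next = pp->free_head;
	pp->free_head = it;
	return 0;
}
```

Without the in_use check, putting the same item twice in a row would set it->next to the item itself, so the free list loops and two later gets return the same memory. That is the kind of damage that tends to show up as a fault far away from the actual mistake.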


* Josh [EMAIL PROTECTED] [2008-06-08 17:11]:
> I have had both firewalls in a carp/pfsync pair which are running the same 
> snapshot crash;
 
 

Re: testing request: pf internals rearrangement

2008-06-08 Thread Henning Brauer
this was not meant to go to the list, and the analysis was off due to a 
difference in kernel sources. meanwhile mbalmer ran into that bug too 
and I found it. It is obvious why nobody ran into it yet; it is IPv6 
only.

Index: pf.c
===================================================================
RCS file: /cvs/src/sys/net/pf.c,v
retrieving revision 1.579
diff -u -p -r1.579 pf.c
--- pf.c	2 Jun 2008 11:38:22 -0000	1.579
+++ pf.c	8 Jun 2008 17:13:11 -0000
@@ -3058,7 +3058,8 @@ pf_test_rule(struct pf_rule **rm, struct
 		goto cleanup;
 	}
 
-	bip_sum = *pd->ip_sum;
+	if (pd->ip_sum)
+		bip_sum = *pd->ip_sum;
 
 	switch (pd->proto) {
 	case IPPROTO_TCP:
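
For readers without the pf.c context, the guard in the diff can be sketched in isolation. This is a rough illustration, not the kernel code: struct pdesc_sketch and save_ip_sum() are hypothetical stand-ins, and the fallback value of 0 is an assumption made here (the diff does not show how bip_sum is initialized). The sketch only assumes what the diff implies: ip_sum can be a null pointer, which fits the IPv6 header having no checksum field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the one field of the packet descriptor used here. */
struct pdesc_sketch {
	uint16_t *ip_sum;	/* assumed NULL when there is no IP header checksum */
};

/*
 * Save the IP checksum before translation.
 * Returns 0 when the descriptor carries no checksum pointer; that
 * default is an assumption of this sketch, not taken from pf.c.
 */
static uint16_t
save_ip_sum(const struct pdesc_sketch *pd)
{
	uint16_t bip_sum = 0;

	if (pd->ip_sum)			/* the check the diff adds */
		bip_sum = *pd->ip_sum;	/* unguarded, this faults on NULL */
	return bip_sum;
}
```

Calling save_ip_sum() with ip_sum pointing at a real checksum returns that value; calling it with ip_sum set to NULL returns 0 instead of faulting on the dereference.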


-- 
Henning Brauer, [EMAIL PROTECTED], [EMAIL PROTECTED]
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam