Re: panic: no appropriate pool

2015-08-03 Thread Mike Belopuhov
On Mon, Aug 03, 2015 at 11:14 +1000, Jonathan Gray wrote:
> On Mon, Aug 03, 2015 at 12:55:46AM +0200, Mike Belopuhov wrote:
> > On 2 August 2015 at 22:00, RD Thrush  wrote:
> > > On 08/02/15 13:37, Mike Belopuhov wrote:
> > >> most likely it's triggered by the reply-to statement.  you may try the 
> > >> attached
> > >> diff to see which rule the state belongs to. since you're using
> > >> anchors, figuring
> > >> out rule numbers will not be easy but you may try to see if one of those 
> > >> give
> > >> you something reasonable:
> > >>
> > >>  pfctl -a '*' -vvsr
> > >>  pfctl -a 'ext1' -vvsr
> > >>  pfctl -a 'ext2' -vvsr
> > >
> > > Thanks, "panic: no appropriate pool for 23/23" is the new result.  Since 
> > > the main pf has less than 23 rules and only one of the anchors has an 
> > > active interface, I assume it's rule 23 from the ext1 anchor.  I've 
> > > attached the pfctl results from above as well as a short gdb session w/ 
> > > the crash dump.
> > >
> > > panic: no appropriate pool for 23/23
> > 
> > thanks for testing.  rule 23 is a reply-to rule.  jonathan, if
> > you don't object, i think we should commit the diff as is at least
> > for the release.
> > 
> 
> Well if we want to do that the diff should really be a return where
> the panic is.
>

Ultimately it's the same fix, but sure, go ahead with this version.
OK mikeb

> Index: pf_lb.c
> ===
> RCS file: /cvs/src/sys/net/pf_lb.c,v
> retrieving revision 1.48
> diff -u -p -r1.48 pf_lb.c
> --- pf_lb.c   20 Jul 2015 18:42:08 -  1.48
> +++ pf_lb.c   3 Aug 2015 01:13:02 -
> @@ -873,7 +873,7 @@ pf_postprocess_addr(struct pf_state *cur
>   else if (nr->route.addr.type != PF_ADDR_NONE)
>   rpool = nr->route;
>   else
> - panic("no appropriate pool");
> + return (0);
>  
>   if (((rpool.opts & PF_POOL_TYPEMASK) != PF_POOL_LEASTSTATES))
>   return (0);



Re: panic: no appropriate pool

2015-08-02 Thread Joerg Jung


> On 3 Aug 2015, at 03:14, Jonathan Gray  wrote:
> 
>> On Mon, Aug 03, 2015 at 12:55:46AM +0200, Mike Belopuhov wrote:
>>> On 2 August 2015 at 22:00, RD Thrush  wrote:
 On 08/02/15 13:37, Mike Belopuhov wrote:
 most likely it's triggered by the reply-to statement.  you may try the 
 attached
 diff to see which rule the state belongs to. since you're using
 anchors, figuring
 out rule numbers will not be easy but you may try to see if one of those 
 give
 you something reasonable:
 
 pfctl -a '*' -vvsr
 pfctl -a 'ext1' -vvsr
 pfctl -a 'ext2' -vvsr
>>> 
>>> Thanks, "panic: no appropriate pool for 23/23" is the new result.  Since 
>>> the main pf has less than 23 rules and only one of the anchors has an 
>>> active interface, I assume it's rule 23 from the ext1 anchor.  I've 
>>> attached the pfctl results from above as well as a short gdb session w/ the 
>>> crash dump.
>>> 
>>> panic: no appropriate pool for 23/23
>> 
>> thanks for testing.  rule 23 is a reply-to rule.  jonathan, if
>> you don't object, i think we should commit the diff as is at least
>> for the release.
> 
> Well if we want to do that the diff should really be a return where
> the panic is.

So pf_postprocess_addr() only reduces state counters in the least-states case.

As far as I understand, the given triggering ruleset does not involve
least-states for the mentioned rule 23, so the function returns two lines
later anyway, right?

However, panic() seems too much here: in the worst case the decrement of the
least-states counters just doesn't happen (and thus least-states is somewhat
broken).
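
(For reference, this is the check meant by "two lines later", as it appears in
the diff hunks quoted in this thread; with the earlier memset() variant a
zeroed rpool takes the same early return, assuming PF_POOL_LEASTSTATES is a
non-zero pool type:)

	if (((rpool.opts & PF_POOL_TYPEMASK) != PF_POOL_LEASTSTATES))
		return (0);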

So, ok jung@ for the diff below.

> Index: pf_lb.c
> ===
> RCS file: /cvs/src/sys/net/pf_lb.c,v
> retrieving revision 1.48
> diff -u -p -r1.48 pf_lb.c
> --- pf_lb.c   20 Jul 2015 18:42:08 -  1.48
> +++ pf_lb.c   3 Aug 2015 01:13:02 -
> @@ -873,7 +873,7 @@ pf_postprocess_addr(struct pf_state *cur
>   else if (nr->route.addr.type != PF_ADDR_NONE)
>   rpool = nr->route;
>   else
> - panic("no appropriate pool");
> + return (0);
>  
>   if (((rpool.opts & PF_POOL_TYPEMASK) != PF_POOL_LEASTSTATES))
>   return (0);
> 



Re: panic: no appropriate pool

2015-08-02 Thread Jonathan Gray
On Mon, Aug 03, 2015 at 12:55:46AM +0200, Mike Belopuhov wrote:
> On 2 August 2015 at 22:00, RD Thrush  wrote:
> > On 08/02/15 13:37, Mike Belopuhov wrote:
> >> most likely it's triggered by the reply-to statement.  you may try the 
> >> attached
> >> diff to see which rule the state belongs to. since you're using
> >> anchors, figuring
> >> out rule numbers will not be easy but you may try to see if one of those 
> >> give
> >> you something reasonable:
> >>
> >>  pfctl -a '*' -vvsr
> >>  pfctl -a 'ext1' -vvsr
> >>  pfctl -a 'ext2' -vvsr
> >
> > Thanks, "panic: no appropriate pool for 23/23" is the new result.  Since 
> > the main pf has less than 23 rules and only one of the anchors has an 
> > active interface, I assume it's rule 23 from the ext1 anchor.  I've 
> > attached the pfctl results from above as well as a short gdb session w/ the 
> > crash dump.
> >
> > panic: no appropriate pool for 23/23
> 
> thanks for testing.  rule 23 is a reply-to rule.  jonathan, if
> you don't object, i think we should commit the diff as is at least
> for the release.
> 

Well if we want to do that the diff should really be a return where
the panic is.

Index: pf_lb.c
===
RCS file: /cvs/src/sys/net/pf_lb.c,v
retrieving revision 1.48
diff -u -p -r1.48 pf_lb.c
--- pf_lb.c 20 Jul 2015 18:42:08 -  1.48
+++ pf_lb.c 3 Aug 2015 01:13:02 -
@@ -873,7 +873,7 @@ pf_postprocess_addr(struct pf_state *cur
else if (nr->route.addr.type != PF_ADDR_NONE)
rpool = nr->route;
else
-   panic("no appropriate pool");
+   return (0);
 
if (((rpool.opts & PF_POOL_TYPEMASK) != PF_POOL_LEASTSTATES))
return (0);
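
With this version the tail of pf_postprocess_addr() would read roughly as
below (a reconstruction from the diff hunks in this thread, not copied from
the tree).  The early return skips the rest of the function, which assumes
rpool has been set, for any state whose rule carries no rdr/nat/route pool:

	/* check for appropriate pool */
	if (nr->rdr.addr.type != PF_ADDR_NONE)
		rpool = nr->rdr;
	else if (nr->nat.addr.type != PF_ADDR_NONE)
		rpool = nr->nat;
	else if (nr->route.addr.type != PF_ADDR_NONE)
		rpool = nr->route;
	else
		return (0);	/* e.g. the plain reply-to rule reported here */

	if (((rpool.opts & PF_POOL_TYPEMASK) != PF_POOL_LEASTSTATES))
		return (0);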



Re: panic: no appropriate pool

2015-08-02 Thread Mike Belopuhov
On 2 August 2015 at 22:00, RD Thrush  wrote:
> On 08/02/15 13:37, Mike Belopuhov wrote:
>> most likely it's triggered by the reply-to statement.  you may try the 
>> attached
>> diff to see which rule the state belongs to. since you're using
>> anchors, figuring
>> out rule numbers will not be easy but you may try to see if one of those give
>> you something reasonable:
>>
>>  pfctl -a '*' -vvsr
>>  pfctl -a 'ext1' -vvsr
>>  pfctl -a 'ext2' -vvsr
>
> Thanks, "panic: no appropriate pool for 23/23" is the new result.  Since the 
> main pf has less than 23 rules and only one of the anchors has an active 
> interface, I assume it's rule 23 from the ext1 anchor.  I've attached the 
> pfctl results from above as well as a short gdb session w/ the crash dump.
>
> panic: no appropriate pool for 23/23

thanks for testing.  rule 23 is a reply-to rule.  jonathan, if
you don't object, i think we should commit the diff as is at least
for the release.



Re: panic: no appropriate pool

2015-08-02 Thread RD Thrush
On 08/02/15 13:37, Mike Belopuhov wrote:
> On 2 August 2015 at 15:28, RD Thrush  wrote:
>> On 08/01/15 19:31, Jonathan Gray wrote:
>>> On Sat, Aug 01, 2015 at 08:46:00PM +0200, Mike Belopuhov wrote:
   [... snip ...]
 You're slightly overanalyzing here: panic has caught the unhandled
 case, but it's not needed per se.

>>>
>>> The code directly after the panic assumes rpool is set.
>>> Something is clearly wrong in the pf code if this triggers.
>>>
>>> Without a pf.conf it is hard to guess as to why this triggers...
>>
>> I've attached a partially sanitized concatenation of pf rules, ifconfig, 
>> netstat -nr, cat /etc/hostname.$if.  Please let me know what more info would 
>> be helpful.
>>
>> FWIW, this firewall has been operating successfully with snaps for many 
>> years.  The pf configuration is not tuned as it is somewhat a testbed with 
>> an accumulation of various failed/successful experiments.  Also, the urtwn 
>> interface has been removed for at least the past month so treat the 
>> associated rules accordingly.
>>
> 
> most likely it's triggered by the reply-to statement.  you may try the 
> attached
> diff to see which rule the state belongs to. since you're using
> anchors, figuring
> out rule numbers will not be easy but you may try to see if one of those give
> you something reasonable:
> 
>  pfctl -a '*' -vvsr
>  pfctl -a 'ext1' -vvsr
>  pfctl -a 'ext2' -vvsr

Thanks, "panic: no appropriate pool for 23/23" is the new result.  Since the 
main pf has less than 23 rules and only one of the anchors has an active 
interface, I assume it's rule 23 from the ext1 anchor.  I've attached the pfctl 
results from above as well as a short gdb session w/ the crash dump.

panic: no appropriate pool for 23/23
Stopped at  Debugger+0x7:   leave
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
ddb> trace
Debugger(d09d272c,f5233e18,d09a5231,f5233e18,0) at Debugger+0x7
panic(d09a5231,17,17,d03b7608,d179ec30) at panic+0x71
pf_postprocess_addr(d5daf1c4,d09a3978,751611d9,d608fba0,f5233f40) at pf_postprocess_addr+0x2db
pf_unlink_state(d5daf1c4,8,0,0,0) at pf_unlink_state+0x2f
pf_purge_expired_states(8,20,d09a3b9f,64,d608fba0) at pf_purge_expired_states+0x8e
pf_purge_thread(d608fba0) at pf_purge_thread+0x66
ddb> ps
   PID   PPID   PGRP    UID  S       FLAGS  WAIT       COMMAND
 28959  29744  29744      0  3        0x82  nanosleep  newsyslog
 29744  15742  29744      0  3        0x8a  pause      sh
 15742  29063  29063      0  3        0x80  piperd     cron
 30635   3057  18902   1000  3        0x82  nanosleep  sleep
 10521   3942  10521   1000  3        0x83  ttyin      bash
  3942  13206  13206   1000  3        0x90  select     sshd
 13206  10345  13206      0  3        0x92  poll       sshd
 24303      1  24303      0  3        0x83  ttyin      getty
  3057  18902  18902   1000  3        0x8a  pause      sh
 18902  17843  18902   1000  3        0x8a  pause      sh
 17843  29063  29063      0  3        0x80  piperd     cron
 29063      1  29063      0  3        0x80  poll       cron
 14529      1  18325    750  3        0x81  nanosleep  perl
 31306  19765  19765    606  3        0x90  kqread     ladvd
 19765      1  19765      0  3        0x80  kqread     ladvd
   628      1    628     99  3        0x90  poll       sndiod
  5370      1   5370     79  3        0x90  kqread     tftpd
 26243  17852  17852     67  3        0x90  kqread     httpd
  5443  17852  17852     67  3        0x90  kqread     httpd
 23826  28804  23826     67  3        0x90  kqread     httpd
 17852  28804  17852     67  3        0x90  kqread     httpd
 28804      1  28804      0  3        0x80  kqread     httpd
 28240  26040  26040     95  3        0x90  kqread     smtpd
 28040  26040  26040     95  3        0x90  kqread     smtpd
 17835  26040  26040     95  3        0x90  kqread     smtpd
 13222  26040  26040     95  3        0x90  kqread     smtpd
 21891  26040  26040     95  3        0x90  kqread     smtpd
 31023  26040  26040    103  3        0x90  kqread     smtpd
 26040      1  26040      0  3        0x80  kqread     smtpd
  7321      1   7321     77  3        0x90  poll       dhcpd
 10345      1  10345      0  3        0x80  select     sshd
  1103      0      0      0  3     0x14280  nfsidl     nfsio
 11485      0      0      0  3     0x14280  nfsidl     nfsio
 32022      0      0      0  3     0x14280  nfsidl     nfsio
   609      0      0      0  3     0x14280  nfsidl     nfsio
 20992      1  20992      0  3        0x80  poll       ntpd
 16027  17436  16027     83  3        0x90  poll       ntpd
 17436      1  17436     83  3        0x90  poll       ntpd
 15716      1  15716     53  3        0x90  kqread     unbound
 26708  10651   9354     97  3        0x90  kqread     nsd
 10651   9354   9354  

Re: panic: no appropriate pool

2015-08-02 Thread Mike Belopuhov
On 2 August 2015 at 15:28, RD Thrush  wrote:
> On 08/01/15 19:31, Jonathan Gray wrote:
>> On Sat, Aug 01, 2015 at 08:46:00PM +0200, Mike Belopuhov wrote:
>>>   [... snip ...]
>>> You're slightly overanalyzing here: panic has caught the unhandled
>>> case, but it's not needed per se.
>>>
>>
>> The code directly after the panic assumes rpool is set.
>> Something is clearly wrong in the pf code if this triggers.
>>
>> Without a pf.conf it is hard to guess as to why this triggers...
>
> I've attached a partially sanitized concatenation of pf rules, ifconfig, 
> netstat -nr, cat /etc/hostname.$if.  Please let me know what more info would 
> be helpful.
>
> FWIW, this firewall has been operating successfully with snaps for many 
> years.  The pf configuration is not tuned as it is somewhat a testbed with an 
> accumulation of various failed/successful experiments.  Also, the urtwn 
> interface has been removed for at least the past month so treat the 
> associated rules accordingly.
>

most likely it's triggered by the reply-to statement.  you may try the attached
diff to see which rule the state belongs to. since you're using
anchors, figuring
out rule numbers will not be easy but you may try to see if one of those give
you something reasonable:

 pfctl -a '*' -vvsr
 pfctl -a 'ext1' -vvsr
 pfctl -a 'ext2' -vvsr

cheers,
mike
diff --git sys/net/pf_lb.c sys/net/pf_lb.c
index 4e8d0cd..4cd9b04 100644
--- sys/net/pf_lb.c
+++ sys/net/pf_lb.c
@@ -866,6 +866,7 @@ pf_postprocess_addr(struct pf_state *cur)
}
 
/* check for appropriate pool */
+   memset(&rpool, 0, sizeof(rpool));
if (nr->rdr.addr.type != PF_ADDR_NONE)
rpool = nr->rdr;
else if (nr->nat.addr.type != PF_ADDR_NONE)
@@ -873,7 +874,8 @@ pf_postprocess_addr(struct pf_state *cur)
else if (nr->route.addr.type != PF_ADDR_NONE)
rpool = nr->route;
else
-   panic("no appropriate pool");
+   panic("no appropriate pool for %d/%d", cur->rule.ptr ?
+   cur->rule.ptr->nr : -1, nr->nr);
 
if (((rpool.opts & PF_POOL_TYPEMASK) != PF_POOL_LEASTSTATES))
return (0);
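
With this diff the panic line carries the number of the state's rule and of
the rule whose pools are being checked (nr).  Those numbers can then be
matched against the rule listings, since pfctl prefixes each rule with its
number when run with -vv.  Purely illustrative, not output from the affected
box:

  # pfctl -a 'ext1' -vvsr
  ...
  @23 pass in quick on pppoe0 inet proto tcp from any to any port = 22 flags S/SA synproxy state reply-to pppoe0
    [ Evaluations: ...  Packets: ...  Bytes: ...  States: ... ]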


Re: panic: no appropriate pool

2015-08-02 Thread RD Thrush
On 08/01/15 19:31, Jonathan Gray wrote:
> On Sat, Aug 01, 2015 at 08:46:00PM +0200, Mike Belopuhov wrote:
>>   [... snip ...]
>> You're slightly overanalyzing here: panic has caught the unhandled
>> case, but it's not needed per se.
>>
> 
> The code directly after the panic assumes rpool is set.
> Something is clearly wrong in the pf code if this triggers.
> 
> Without a pf.conf it is hard to guess as to why this triggers...

I've attached a partially sanitized concatenation of pf rules, ifconfig, 
netstat -nr, cat /etc/hostname.$if.  Please let me know what more info would be 
helpful.

FWIW, this firewall has been operating successfully with snaps for many years.  
The pf configuration is not tuned as it is somewhat a testbed with an 
accumulation of various failed/successful experiments.  Also, the urtwn 
interface has been removed for at least the past month so treat the associated 
rules accordingly.

### pfctl-sr:pfctl -a "*" -sr ###
pass all flags S/SA
match out on egress all set ( prio(5, 6) )
match all scrub (no-df)
match out on pppoe all scrub (max-mss 1440)
block drop all label "block_all"
block drop in on ! int inet from 10.1.2.0/24 to any
block drop in inet from 10.1.2.1 to any
block drop in on ! dsl inet from 192.168.7.0/24 to any
block drop in inet from 192.168.7.2 to any
block drop in quick on int from any to  label "bogus_in"
anchor "ext1" on pppoe0 all {
  block drop in on ! pppoe0 from (pppoe0:network) to any
  block drop in from (pppoe0) to any
  block drop in log quick on pppoe0 proto tcp from  to any port = 22 
label "ssh bruteforce_pppoe0"
  block drop in log quick on pppoe0 from  to any label "bogon_in_pppoe0"
  block drop out log on pppoe0 from any to  label "bogon_out_pppoe0"
  pass out log on pppoe0 all flags S/SA label "out_pppoe0"
  match in on pppoe0 inet proto tcp from any to (pppoe0) port = 80 rdr-to 
10.1.2.30
  match in on pppoe0 inet proto tcp from any to (pppoe0) port = 6081 rdr-to 
10.1.2.30
  match in on pppoe0 inet proto tcp from any to (pppoe0) port = 8080 rdr-to 
10.1.2.30
  match in on pppoe0 inet proto tcp from any to (pppoe0) port = 9418 rdr-to 
10.1.2.18 port 9418
  match in on pppoe0 inet proto tcp from any to (pppoe0) port = 119 rdr-to 
10.1.2.10 port 119
  match in on pppoe0 inet proto tcp from any to (pppoe0) port = 65429 rdr-to 
10.1.2.18 port 22
  match in on pppoe0 inet proto tcp from any to (pppoe0) port = 65428 rdr-to 
10.1.2.30 port 22
  match in on pppoe0 inet proto tcp from any to (pppoe0) port = 65427 rdr-to 
10.1.2.31 port 22
  match in on pppoe0 inet proto tcp from any to (pppoe0) port = 65426 rdr-to 
10.1.2.33 port 22
  match in on pppoe0 inet proto tcp from any to (pppoe0) port = 65425 rdr-to 
10.1.2.11 port 22
  match in on pppoe0 inet proto tcp from any to (pppoe0) port = 65424 rdr-to 
10.1.2.12 port 22
  match in on pppoe0 inet proto tcp from any to (pppoe0) port = 65423 rdr-to 
10.1.2.15 port 22
  match in on pppoe0 inet proto tcp from any to (pppoe0) port = 65420 rdr-to 
10.1.2.143 port 22
  match in on pppoe0 inet proto tcp from any to (pppoe0) port = 65419 rdr-to 
10.1.2.144 port 22
  match in on pppoe0 inet proto tcp from any to (pppoe0) port = 46889 set ( 
prio 2 ) rdr-to 10.1.2.17 port 46889
  match in on pppoe0 inet proto udp from any to (pppoe0) port = 46889 set ( 
prio 2 ) rdr-to 10.1.2.17 port 46889
  match out on pppoe0 inet from  to any nat-to (pppoe0:0)
  pass in quick on pppoe0 inet proto tcp from any to any port = 22 flags S/SA 
synproxy state reply-to pppoe0
  pass in quick on pppoe0 inet proto tcp from any to any port = 52122 flags 
S/SA synproxy state reply-to pppoe0
  pass in quick on pppoe0 inet proto tcp from any to any port = 65432 flags 
S/SA synproxy state reply-to pppoe0
  pass in quick on pppoe0 inet proto tcp from any to any port = 65431 flags 
S/SA synproxy state reply-to pppoe0
  pass in quick on pppoe0 inet proto tcp from any to any port = 65430 flags 
S/SA synproxy state reply-to pppoe0
  pass in quick on pppoe0 inet proto tcp from any to any port = 65429 flags 
S/SA synproxy state reply-to pppoe0
  pass in quick on pppoe0 inet proto tcp from any to any port = 65428 flags 
S/SA synproxy state reply-to pppoe0
  pass in quick on pppoe0 inet proto tcp from any to any port = 65427 flags 
S/SA synproxy state reply-to pppoe0
  pass in quick on pppoe0 inet proto tcp from any to any port = 65426 flags 
S/SA synproxy state reply-to pppoe0
  pass in quick on pppoe0 inet proto tcp from any to any port = 65425 flags 
S/SA synproxy state reply-to pppoe0
  pass in quick on pppoe0 inet proto tcp from any to any port = 65424 flags 
S/SA synproxy state reply-to pppoe0
  pass in quick on pppoe0 inet proto tcp from any to any port = 65423 flags 
S/SA synproxy state reply-to pppoe0
  pass in quick on pppoe0 inet proto tcp from any to any port = 65420 flags 
S/SA synproxy state reply-to pppoe0
  pass in quick on pppoe0 inet proto tcp from any to any port = 65419 flags 
S/SA synproxy state reply-to pppoe0
  pass in on pppoe0 inet pr

Re: panic: no appropriate pool

2015-08-01 Thread Jonathan Gray
On Sat, Aug 01, 2015 at 08:46:00PM +0200, Mike Belopuhov wrote:
> On 1 August 2015 at 19:20, RD Thrush  wrote:
> >
> > The patch ran without panic for 20+ hours.
> >
> 
> Thanks for testing!
> 
> > I wondered about the removal of the panic() statement so I tried
> > another kernel that added the memset() but kept the panic() statement, as 
> > follows:
> >
> [snip]
> >
> > That kernel panic'd as before with "no appropriate pool".
> 
> Well of course.  Not all rules are rdr/nat/route-to.
> 
> > Was the Jul 20 cvs commit (panic addition) incorrect?
> 
> It has served its purpose well: it has found this bug.
> But panic'ing here in general is of course incorrect.
> 
> > If not, it appears the memset() addition didn't fix the panic.
> >
> 
> It did, clearly.  You can run your setup again (-:
> 
> > I was able to take a crash dump with the above change and have
> > attached a gdb transcript.  The stack is apparently damaged in the
> > pf_postprocess_addr() function; however, I'm over my head at this
> > point.  How may I help further troubleshoot?
> 
> You're slightly overanalyzing here: panic has caught the unhandled
> case, but it's not needed per se.
> 

The code directly after the panic assumes rpool is set.
Something is clearly wrong in the pf code if this triggers.

Without a pf.conf it is hard to guess as to why this triggers...



Re: panic: no appropriate pool

2015-08-01 Thread RD Thrush
On 08/01/15 14:46, Mike Belopuhov wrote:
> On 1 August 2015 at 19:20, RD Thrush  wrote:
>>
>> The patch ran without panic for 20+ hours.
>>
> 
> Thanks for testing!
> 
>> I wondered about the removal of the panic() statement so I tried
>> another kernel that added the memset() but kept the panic() statement, as 
>> follows:
>>
> [snip]
>>
>> That kernel panic'd as before with "no appropriate pool".
> 
> Well of course.  Not all rules are rdr/nat/route-to.
> 
>> Was the Jul 20 cvs commit (panic addition) incorrect?
> 
> It has served its purpose well: it has found this bug.
> But panic'ing here in general is of course incorrect.

Fair enough.  I'll run with your patch until a snapshot includes it.


>> If not, it appears the memset() addition didn't fix the panic.
>>
> 
> It did, clearly.  You can run your setup again (-:
> 
>> I was able to take a crash dump with the above change and have
>> attached a gdb transcript.  The stack is apparently damaged in the
>> pf_postprocess_addr() function; however, I'm over my head at this
>> point.  How may I help further troubleshoot?
> 
> You're slightly overanalyzing here: panic has caught the unhandled
> case, but it's not needed per se.

Thanks for the explanation.



Re: panic: no appropriate pool

2015-08-01 Thread Mike Belopuhov
On 1 August 2015 at 19:20, RD Thrush  wrote:
>
> The patch ran without panic for 20+ hours.
>

Thanks for testing!

> I wondered about the removal of the panic() statement so I tried
> another kernel that added the memset() but kept the panic() statement, as 
> follows:
>
[snip]
>
> That kernel panic'd as before with "no appropriate pool".

Well of course.  Not all rules are rdr/nat/route-to.

> Was the Jul 20 cvs commit (panic addition) incorrect?

It has served its purpose well: it has found this bug.
But panic'ing here in general is of course incorrect.

> If not, it appears the memset() addition didn't fix the panic.
>

It did, clearly.  You can run your setup again (-:

> I was able to take a crash dump with the above change and have
> attached a gdb transcript.  The stack is apparently damaged in the
> pf_postprocess_addr() function; however, I'm over my head at this
> point.  How may I help further troubleshoot?

You're slightly overanalyzing here: panic has caught the unhandled
case, but it's not needed per se.



Re: panic: no appropriate pool

2015-08-01 Thread RD Thrush
On 07/31/15 11:22, Mike Belopuhov wrote:
> On Fri, Jul 31, 2015 at 10:57 -0400, RD Thrush wrote:
>>> Synopsis:   panic in sys/net/pf_lb.c
>>> Category:   kernel
>>> Environment:
>>  System  : OpenBSD 5.8
>>  Details : OpenBSD 5.8 (GENERIC) #1047: Thu Jul 30 23:24:48 MDT 2015
>>   
>> dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
>>
>>  Architecture: OpenBSD.i386
>>  Machine : i386
>>> Description:
>>  Repeatable crash after a few minutes with Jul 28,29 and 30 snapshots.
>>  Crashes w/ Jul 30 sp and mp snapshots w/ kern.pool_debug set to 0/1.
>>  The ddb transcript containing the following commands is appended:
>>  trace
>>  ps
>>  show registers
>>  show malloc
>>  show proc
>>  show uvmexp
>>  callout
>>  ps /w
>>  ps /a
>>  show bcstats
>>  show all pools
>>  show all pools /a
>>  show extents
>>  boot sync
>>  Please note that usbdevs and pcidump were done w/ the Jun 22 snapshot.
>>  acpi doesn't exist on this soekris 5501.
>>> How-To-Repeat:
>>  Install recent snapshot, sysmerge, reboot and wait a few minutes.
>>> Fix:
>>  Reboot with Jun 22 sp snapshot.  According to cvs, the panic diagnostic
>>  was added Jul 20 to src/sys/net/pf_lb.c.
>>
> 
> Can you please try this diff.
> 
> diff --git sys/net/pf_lb.c sys/net/pf_lb.c
> index 4e8d0cd..2c36b45 100644
> --- sys/net/pf_lb.c
> +++ sys/net/pf_lb.c
> @@ -866,14 +866,13 @@ pf_postprocess_addr(struct pf_state *cur)
>   }
>  
>   /* check for appropriate pool */
> + memset(&rpool, 0, sizeof(rpool));
>   if (nr->rdr.addr.type != PF_ADDR_NONE)
>   rpool = nr->rdr;
>   else if (nr->nat.addr.type != PF_ADDR_NONE)
>   rpool = nr->nat;
>   else if (nr->route.addr.type != PF_ADDR_NONE)
>   rpool = nr->route;
> - else
> - panic("no appropriate pool");
>  
>   if (((rpool.opts & PF_POOL_TYPEMASK) != PF_POOL_LEASTSTATES))
>   return (0);

The patch ran without panic for 20+ hours.

I wondered about the removal of the panic() statement so I tried another kernel 
that added the memset() but kept the panic() statement, as follows:

cvs diff -u /usr/src/sys/net/pf_lb.c
Index: /usr/src/sys/net/pf_lb.c
===
RCS file: /cvs/OpenBSD/src/sys/net/pf_lb.c,v
retrieving revision 1.48
diff -u -p -u -r1.48 pf_lb.c
--- /usr/src/sys/net/pf_lb.c   20 Jul 2015 18:42:08 -  1.48
+++ /usr/src/sys/net/pf_lb.c   1 Aug 2015 14:31:07 -
@@ -866,6 +866,7 @@ pf_postprocess_addr(struct pf_state *cur
}

/* check for appropriate pool */
+   memset(&rpool, 0, sizeof(rpool));
if (nr->rdr.addr.type != PF_ADDR_NONE)
rpool = nr->rdr;
else if (nr->nat.addr.type != PF_ADDR_NONE)

That kernel panic'd as before with "no appropriate pool".  Was the Jul 20 cvs 
commit (panic addition) incorrect?  If not, it appears the memset() addition 
didn't fix the panic.

I was able to take a crash dump with the above change and have attached a gdb 
transcript.  The stack is apparently damaged in the pf_postprocess_addr() 
function; however, I'm over my head at this point.  How may I help further 
troubleshoot?

obsd32:i386/tmp 3>sudo gdb
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-openbsd5.8".
(gdb) file bsd.gdb
Reading symbols from /usr/obj/i386/tmp/bsd.gdb...done.
(gdb) target kvm bsd.1.core
#0  0xd0557a78 in boot (howto=0) at ../../../../arch/i386/i386/machdep.c:2637
2637            dumpsys();
(gdb) where
#0  0xd0557a78 in boot (howto=0) at ../../../../arch/i386/i386/machdep.c:2637
#1  0xd03bafff in reboot (howto=0) at ../../../../kern/kern_xxx.c:69
#2  0xd037f462 in db_boot_crash_cmd (addr=Could not find the frame base for "db_boot_crash_cmd".) at ../../../../ddb/db_command.c:730
#3  0xd037fb44 in db_command (last_cmdp=0x0, cmd_table=0xd0b22dc0) at ../../../../ddb/db_command.c:260
#4  0xd037fd8f in db_command_loop () at ../../../../ddb/db_command.c:643
#5  0xd0383f5a in db_trap (type=1, code=0) at ../../../../ddb/db_trap.c:94
#6  0xd0553e8c in kdb_trap (type=1, code=0, regs=0xf5233d8c) at ../../../../arch/i386/i386/db_interface.c:157
#7  0xd0565aa5 in trap (frame=0xf5233d8c) at ../../../../arch/i386/i386/trap.c:189
#8  0xd0200b12 in calltrap ()
#9  0xd0553c07 in Debugger () at ../../../../arch/i386/i386/db_interface.c:359
#10 0xd03c9741 in panic (fmt=0xd09a51d1 "no appropriate pool") at ../../../../kern

Re: panic: no appropriate pool

2015-07-31 Thread RD Thrush
Thanks, you caught me in transition.  I'll be able to report a bit later.

FWIW, I've been running this firewall for years w/ -current.  This is the first
time I can remember this type of show stopper.  The pf rules haven't been
changed much in the past year and not at all between the Jun 22 and Jul 28
snapshots.  If necessary, I can provide them privately.


On 07/31/15 11:22, Mike Belopuhov wrote:
> On Fri, Jul 31, 2015 at 10:57 -0400, RD Thrush wrote:
>>> Synopsis:   panic in sys/net/pf_lb.c
>>> Category:   kernel
>>> Environment:
>>  System  : OpenBSD 5.8
>>  Details : OpenBSD 5.8 (GENERIC) #1047: Thu Jul 30 23:24:48 MDT 2015
>>   
>> dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
>>
>>  Architecture: OpenBSD.i386
>>  Machine : i386
>>> Description:
>>  Repeatable crash after a few minutes with Jul 28,29 and 30 snapshots.
>>  Crashes w/ Jul 30 sp and mp snapshots w/ kern.pool_debug set to 0/1.
>>  The ddb transcript containing the following commands is appended:
>>  trace
>>  ps
>>  show registers
>>  show malloc
>>  show proc
>>  show uvmexp
>>  callout
>>  ps /w
>>  ps /a
>>  show bcstats
>>  show all pools
>>  show all pools /a
>>  show extents
>>  boot sync
>>  Please note that usbdevs and pcidump were done w/ the Jun 22 snapshot.
>>  acpi doesn't exist on this soekris 5501.
>>> How-To-Repeat:
>>  Install recent snapshot, sysmerge, reboot and wait a few minutes.
>>> Fix:
>>  Reboot with Jun 22 sp snapshot.  According to cvs, the panic diagnostic
>>  was added Jul 20 to src/sys/net/pf_lb.c.
>>
> 
> Can you please try this diff.
> 
> diff --git sys/net/pf_lb.c sys/net/pf_lb.c
> index 4e8d0cd..2c36b45 100644
> --- sys/net/pf_lb.c
> +++ sys/net/pf_lb.c
> @@ -866,14 +866,13 @@ pf_postprocess_addr(struct pf_state *cur)
>   }
>  
>   /* check for appropriate pool */
> + memset(&rpool, 0, sizeof(rpool));
>   if (nr->rdr.addr.type != PF_ADDR_NONE)
>   rpool = nr->rdr;
>   else if (nr->nat.addr.type != PF_ADDR_NONE)
>   rpool = nr->nat;
>   else if (nr->route.addr.type != PF_ADDR_NONE)
>   rpool = nr->route;
> - else
> - panic("no appropriate pool");
>  
>   if (((rpool.opts & PF_POOL_TYPEMASK) != PF_POOL_LEASTSTATES))
>   return (0);
> 



Re: panic: no appropriate pool

2015-07-31 Thread Mike Belopuhov
On Fri, Jul 31, 2015 at 10:57 -0400, RD Thrush wrote:
> >Synopsis:   panic in sys/net/pf_lb.c
> >Category:   kernel
> >Environment:
>   System  : OpenBSD 5.8
>   Details : OpenBSD 5.8 (GENERIC) #1047: Thu Jul 30 23:24:48 MDT 2015
>
> dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
> 
>   Architecture: OpenBSD.i386
>   Machine : i386
> >Description:
>   Repeatable crash after a few minutes with Jul 28,29 and 30 snapshots.
>   Crashes w/ Jul 30 sp and mp snapshots w/ kern.pool_debug set to 0/1.
>   The ddb transcript containing the following commands is appended:
>   trace
>   ps
>   show registers
>   show malloc
>   show proc
>   show uvmexp
>   callout
>   ps /w
>   ps /a
>   show bcstats
>   show all pools
>   show all pools /a
>   show extents
>   boot sync
>   Please note that usbdevs and pcidump were done w/ the Jun 22 snapshot.
>   acpi doesn't exist on this soekris 5501.
> >How-To-Repeat:
>   Install recent snapshot, sysmerge, reboot and wait a few minutes.
> >Fix:
>   Reboot with Jun 22 sp snapshot.  According to cvs, the panic diagnostic
>   was added Jul 20 to src/sys/net/pf_lb.c.
> 

Can you please try this diff.

diff --git sys/net/pf_lb.c sys/net/pf_lb.c
index 4e8d0cd..2c36b45 100644
--- sys/net/pf_lb.c
+++ sys/net/pf_lb.c
@@ -866,14 +866,13 @@ pf_postprocess_addr(struct pf_state *cur)
}
 
/* check for appropriate pool */
+   memset(&rpool, 0, sizeof(rpool));
if (nr->rdr.addr.type != PF_ADDR_NONE)
rpool = nr->rdr;
else if (nr->nat.addr.type != PF_ADDR_NONE)
rpool = nr->nat;
else if (nr->route.addr.type != PF_ADDR_NONE)
rpool = nr->route;
-   else
-   panic("no appropriate pool");
 
if (((rpool.opts & PF_POOL_TYPEMASK) != PF_POOL_LEASTSTATES))
return (0);
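
The idea, as I read it: rpool is a local pool that is only assigned when the
rule actually has an rdr/nat/route pool, so zeroing it up front makes the
no-pool case fall through harmlessly once the panic() is gone.  A minimal
sketch of that flow (not the actual pf code; assuming rpool is the usual
struct pf_pool, other names as in the diff above):

	struct pf_pool	 rpool;			/* unassigned for rules without a pool */

	memset(&rpool, 0, sizeof(rpool));	/* rpool.opts is now 0 */

	if (nr->rdr.addr.type != PF_ADDR_NONE)
		rpool = nr->rdr;		/* struct copy */
	/* ... nat and route pools are checked the same way ... */

	/*
	 * With no matching pool, opts stays 0: not a least-states pool,
	 * so there are no state counters to decrement and we just return.
	 */
	if (((rpool.opts & PF_POOL_TYPEMASK) != PF_POOL_LEASTSTATES))
		return (0);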



Re: panic: no appropriate pool

2015-07-31 Thread Mike Belopuhov
On Fri, Jul 31, 2015 at 10:57 -0400, RD Thrush wrote:
> >Synopsis:   panic in sys/net/pf_lb.c
> >Category:   kernel
> >Environment:
>   System  : OpenBSD 5.8
>   Details : OpenBSD 5.8 (GENERIC) #1047: Thu Jul 30 23:24:48 MDT 2015
>
> dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
> 
>   Architecture: OpenBSD.i386
>   Machine : i386
> >Description:
>   Repeatable crash after a few minutes with Jul 28,29 and 30 snapshots.
>   Crashes w/ Jul 30 sp and mp snapshots w/ kern.pool_debug set to 0/1.
>   The ddb transcript containing the following commands is appended:
>   trace
>   ps
>   show registers
>   show malloc
>   show proc
>   show uvmexp
>   callout
>   ps /w
>   ps /a
>   show bcstats
>   show all pools
>   show all pools /a
>   show extents
>   boot sync
>   Please note that usbdevs and pcidump were done w/ the Jun 22 snapshot.
>   acpi doesn't exist on this soekris 5501.
> >How-To-Repeat:
>   Install recent snapshot, sysmerge, reboot and wait a few minutes.
> >Fix:
>   Reboot with Jun 22 sp snapshot.  According to cvs, the panic diagnostic
>   was added Jul 20 to src/sys/net/pf_lb.c.
> 

Please post your ruleset ASAP.

Thanks.