Re: /etc/rc.d/ipfw can't deal with firewall_type?

2011-05-03 Thread Ian Smith
On Wed, 4 May 2011, KIRIYAMA Kazuhiko wrote:
 > At Wed, 4 May 2011 03:47:02 +1000 (EST),
 > Ian Smith wrote:
 > > 
 > > On Wed, 4 May 2011, KIRIYAMA Kazuhiko wrote:
 > >  > Hi all,
 > >  > I recently upgraded to 8.2-STABLE and reconfigured a natd + jailed box, but
 > >  > no packets could get through the NAT box. I investigated and found that
 > >  > /etc/rc.firewall does not receive firewall_type as an argument, so ipfw
 > >  > does not divert packets and natd cannot do its work. The cause is that
 > >  > /etc/rc.d/ipfw is incorrect. I think the patch below should be applied to
 > >  > /etc/rc.d/ipfw. Is there any problem with doing this?
 > > 
 > > Yes.  Assuming using the default firewall_script="/etc/rc.firewall", 
 > > then as it says early in /etc/rc.firewall, you just needed to:
 > > 
 > ># Define the firewall type in /etc/rc.conf.  Valid values are:
 > >[..]

It's just occurred to me that - assuming you are NOT trying to start ipfw 
or natd inside a jail, which won't work - you may well be running into 
another problem related to some PRs/patches hrs@ (cc'd) is reviewing re 
startup order and loading of modules for ipfw and natd.  You mentioned 
running an 'OPEN' firewall which (like any other type) will fail to load 
divert rule/s unless ipdivert.ko is already loaded or built into kernel.

This can be solved meanwhile by either a) adding to /boot/loader.conf:

ipdivert_load="YES"

or b) by applying the following patch to /etc/rc.d/ipfw (on 7.x or 8.x)

cheers, Ian

--- rc.d_ipfw.1.24  Sat Jan  8 18:13:46 2011
+++ ipfw    Sat Jan  8 21:00:18 2011
@@ -27,9 +27,9 @@
        fi
 
        if checkyesno firewall_nat_enable; then
-               if ! checkyesno natd_enable; then
-                       required_modules="$required_modules ipfw_nat"
-               fi
+               required_modules="$required_modules ipfw_nat"
+       elif checkyesno natd_enable; then
+               required_modules="$required_modules ipdivert"
        fi
 }
 
@@ -105,6 +105,7 @@
 }
 
 load_rc_config $name
-firewall_coscripts="/etc/rc.d/natd ${firewall_coscripts}"
+checkyesno natd_enable && ! checkyesno firewall_nat_enable && \
+       firewall_coscripts="/etc/rc.d/natd ${firewall_coscripts}"
 
 run_rc_command $*
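
To see the effect of the first hunk in isolation, here is a minimal sketch
of the module-selection logic as a standalone sh function. The name
select_nat_module and the simplified checkyesno stand-in are illustrative
assumptions, not part of rc.subr or /etc/rc.d/ipfw; the real checkyesno
handles more cases and warns on bad values.

```sh
#!/bin/sh
# Simplified stand-in for rc.subr's checkyesno (assumption: only the
# common "yes" spellings are recognised; anything else counts as "no").
checkyesno()
{
    eval "_value=\${$1}"
    case "${_value}" in
    [Yy][Ee][Ss]|[Tt][Rr][Uu][Ee]|[Oo][Nn]|1) return 0 ;;
    *) return 1 ;;
    esac
}

# Hypothetical helper mirroring the patched logic: in-kernel NAT takes
# precedence, otherwise natd needs the divert module.
select_nat_module()
{
    if checkyesno firewall_nat_enable; then
        echo "ipfw_nat"
    elif checkyesno natd_enable; then
        echo "ipdivert"
    fi
}
```

With natd_enable="YES" and firewall_nat_enable unset, the function prints
ipdivert, which is the module the unpatched script never asks for.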
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: /etc/rc.d/ipfw can't deal with firewall_type?

2011-05-03 Thread Ian Smith
On Wed, 4 May 2011, KIRIYAMA Kazuhiko wrote:
 > At Wed, 4 May 2011 03:47:02 +1000 (EST),
 > Ian Smith wrote:
 > > 
 > > On Wed, 4 May 2011, KIRIYAMA Kazuhiko wrote:
 > >  > Hi all,
 > >  > I recently upgraded to 8.2-STABLE and reconfigured a natd + jailed box, but
 > >  > no packets could get through the NAT box. I investigated and found that
 > >  > /etc/rc.firewall does not receive firewall_type as an argument, so ipfw
 > >  > does not divert packets and natd cannot do its work. The cause is that
 > >  > /etc/rc.d/ipfw is incorrect. I think the patch below should be applied to
 > >  > /etc/rc.d/ipfw. Is there any problem with doing this?
 > > 
 > > Yes.  Assuming using the default firewall_script="/etc/rc.firewall", 
 > > then as it says early in /etc/rc.firewall, you just needed to:
 > > 
 > ># Define the firewall type in /etc/rc.conf.  Valid values are:
 > >[..]
 > > 
 > > Sure, /etc/rc.firewall can set firewall_type to a parameter if you pass 
 > > it one, but otherwise uses whatever $firewall_type is set to when you 
 > > start ipfw.  I guess the code below allows you to use syntax like:
 > > 
 > >  # /etc/rc.d/ipfw start client

 > I missed that it was intended for command-line use, but /etc/rc.d/* scripts
 > are usually run by rc at startup. If /etc/rc.d/ipfw needs two arguments,
 > firewall_type is always undefined at startup even though it is specified in
 > /etc/rc.conf. That is a very serious problem, isn't it?

/etc/rc.d/ipfw normally only takes one argument, {,quiet}start|stop|etc.  
The use of $1 in ipfw_start() surprised me actually, I'm only assuming 
its above intended use, but it's clearly an extra argument passed by rc, 
not the first argument to /etc/rc.d/ipfw itself (ie start|stop etc).  

Sorry to repeat, but normally firewall_type should be set in rc.conf - 
which works properly; no patching of /etc/rc.d/ipfw is needed.
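
As a concrete illustration of that advice, an /etc/rc.conf fragment along
these lines is what is being described. The variable names are the usual
rc.conf(5) knobs; the values, in particular the "open" type and the bge0
interface taken from the diagram quoted further down, are assumptions about
this particular setup.

```sh
# /etc/rc.conf (sketch; values are assumptions for this setup)
firewall_enable="YES"       # start ipfw from rc at boot
firewall_type="open"        # picked up by /etc/rc.firewall, no argument needed
natd_enable="YES"           # userland NAT via divert sockets
natd_interface="bge0"       # assumed external interface
```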

 > > to override the $firewall_type set in /etc/rc.conf, but it's not the 
 > > common usage, nor is it how ipfw is started normally by rc.
 > > 
 > > So just set firewall_type in rc.conf and you should be fine .. unless 
 > > you meant that you're trying to run ipfw & natd INSIDE a jail?
 > 
 > The network being configured is as follows:
 >.../27
 > -+
 >  |53
 >   +--+---+
 >   |bge0 jailed natd box  |
 >   |t2.st.foo (ipfw `OPEN')   |
 >   |+++++++
 >   |firewall|   ns   |  ldap  |diskless|  mail  |  web   |  ftp   |
 >   |  bge1  |  bge1  |  bge1  |  bge1  |  bge1  |  bge1  |  bge1  |
 >   ++---++---++---++---++---++---++---+
 > 254|   1|   2|   3|   4|   5|   6|
 > ---+++++++
 >192.168.2.0/24

I'm not entirely sure how to interpret your diagram, but as far as I am 
aware you can run neither ipfw nor natd within a jail; both scripts have 
'KEYWORD: nojail' so they won't be run on jail startup.  There's been 
mention of work underway with VIMAGE toward a full stack inside jail(s), 
but for now you can run ipfw (and natd) only on the host system.
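
A quick way to check the keyword mentioned above is to grep the script
header. This is only a sketch; has_nojail_keyword is a hypothetical helper
name, and it assumes the keyword sits on a "# KEYWORD:" comment line as in
the stock rc.d scripts.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the rc.d script named in $1 carries
# the nojail keyword on its "# KEYWORD:" line.
has_nojail_keyword()
{
    grep -Eq '^#[[:space:]]*KEYWORD:(.*[[:space:]])?nojail([[:space:]]|$)' "$1"
}
```

For example, `has_nojail_keyword /etc/rc.d/ipfw && echo "skipped in jails"`.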

 > >  > --- /etc/rc.d/ipfw.org  2011-05-03 18:19:28.0 +0900
 > >  > +++ /etc/rc.d/ipfw  2011-05-03 22:08:14.0 +0900
 > >  > @@ -35,15 +35,11 @@
 > >  >  
 > >  >  ipfw_start()
 > >  >  {
 > >  > -   local   _firewall_type
 > >  > -
 > >  > -   _firewall_type=$1
 > >  > -
 > >  > # set the firewall rules script if none was specified
 > >  > [ -z "${firewall_script}" ] && firewall_script=/etc/rc.firewall
 > >  >  
 > >  > if [ -r "${firewall_script}" ]; then
 > >  > -   /bin/sh "${firewall_script}" "${_firewall_type}"
 > >  > +   /bin/sh "${firewall_script}" "${firewall_type}"
 > >  > echo 'Firewall rules loaded.'
 > >  > elif [ "`ipfw list 65535`" = "65535 deny ip from any to any" ]; then
 > >  > echo 'Warning: kernel has firewall functionality, but' \
 > 
 > For command-line usage, the above patch should be modified as
 > follows:
 > 
 > --- /etc/rc.d/ipfw.org   2011-05-03 18:19:28.0 +0900
 > +++ /etc/rc.d/ipfw   2011-05-04 09:31:09.0 +0900
 > @@ -37,7 +37,11 @@
 >  {
 >  local   _firewall_type
 >  
 > -_firewall_type=$1
 > +if [ -n "${1}" ]; then
 > +_firewall_type=$1
 > +elif [ -n "${firewall_type}" ]
 > +_firewall_type=${firewall_type}
 > +fi  
 >  
 >  # set the firewall rules script if none was specified
 >  [ -z "${firewall_script}" ] && firewall_script=/etc/rc.firewall

It's still unnecessary to mess with this.  See /etc/rc.firewall for its 
use of $fi

Re: /etc/rc.d/ipfw can't deal with firewall_type?

2011-05-03 Thread KIRIYAMA Kazuhiko
At Wed, 04 May 2011 10:40:12 +0900,
My wrote:
> 
> At Wed, 4 May 2011 03:47:02 +1000 (EST),
> Ian Smith wrote:
> > 
> >  > +++ /etc/rc.d/ipfw   2011-05-03 22:08:14.0 +0900
> >  > @@ -35,15 +35,11 @@
> >  >  
> >  >  ipfw_start()
> >  >  {
> >  > -local   _firewall_type
> >  > -
> >  > -_firewall_type=$1
> >  > -
> >  >  # set the firewall rules script if none was specified
> >  >  [ -z "${firewall_script}" ] && firewall_script=/etc/rc.firewall
> >  >  
> >  >  if [ -r "${firewall_script}" ]; then
> >  > -/bin/sh "${firewall_script}" "${_firewall_type}"
> >  > +/bin/sh "${firewall_script}" "${firewall_type}"
> >  >  echo 'Firewall rules loaded.'
> >  >  elif [ "`ipfw list 65535`" = "65535 deny ip from any to any" ]; then
> >  >  echo 'Warning: kernel has firewall functionality, but' \
> 
> For command-line usage, the above patch should be modified as
> follows:
> 
> --- /etc/rc.d/ipfw.org2011-05-03 18:19:28.0 +0900
> +++ /etc/rc.d/ipfw2011-05-04 09:31:09.0 +0900
> @@ -37,7 +37,11 @@
>  {
>   local   _firewall_type
>  
> - _firewall_type=$1
> + if [ -n "${1}" ]; then
> + _firewall_type=$1
> + elif [ -n "${firewall_type}" ]
> + _firewall_type=${firewall_type}
> + fi  
>  
>   # set the firewall rules script if none was specified
>   [ -z "${firewall_script}" ] && firewall_script=/etc/rc.firewall

The above patch has a typo. The correct one is as follows:

--- /etc/rc.d/ipfw.org  2011-05-03 18:19:28.0 +0900
+++ /etc/rc.d/ipfw  2011-05-04 09:53:40.0 +0900
@@ -37,7 +37,11 @@
 {
        local   _firewall_type
 
-       _firewall_type=$1
+       if [ -n "${1}" ]; then
+               _firewall_type=$1
+       elif [ -n "${firewall_type}" ]; then
+               _firewall_type=${firewall_type}
+       fi
 
        # set the firewall rules script if none was specified
        [ -z "${firewall_script}" ] && firewall_script=/etc/rc.firewall
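
The same fallback can be written more compactly with sh parameter
expansion. This is only a sketch of the idea, not a proposed change to
/etc/rc.d/ipfw; pick_firewall_type is a hypothetical name used for
illustration.

```sh
#!/bin/sh
# Hypothetical helper: prefer an explicit, non-empty argument; otherwise
# fall back to ${firewall_type} (normally set in /etc/rc.conf).
pick_firewall_type()
{
    printf '%s\n' "${1:-${firewall_type}}"
}
```

The ${1:-...} form treats an empty argument the same as a missing one,
which matches the [ -n "${1}" ] test in the patch.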


Re: /etc/rc.d/ipfw can't deal with firewall_type?

2011-05-03 Thread KIRIYAMA Kazuhiko
At Wed, 4 May 2011 03:47:02 +1000 (EST),
Ian Smith wrote:
> 
> On Wed, 4 May 2011, KIRIYAMA Kazuhiko wrote:
>  > Hi all,
>  > I recently upgraded to 8.2-STABLE and reconfigured a natd + jailed box, but
>  > no packets could get through the NAT box. I investigated and found that
>  > /etc/rc.firewall does not receive firewall_type as an argument, so ipfw
>  > does not divert packets and natd cannot do its work. The cause is that
>  > /etc/rc.d/ipfw is incorrect. I think the patch below should be applied to
>  > /etc/rc.d/ipfw. Is there any problem with doing this?
> 
> Yes.  Assuming using the default firewall_script="/etc/rc.firewall", 
> then as it says early in /etc/rc.firewall, you just needed to:
> 
>   # Define the firewall type in /etc/rc.conf.  Valid values are:
>   [..]
> 
> Sure, /etc/rc.firewall can set firewall_type to a parameter if you pass 
> it one, but otherwise uses whatever $firewall_type is set to when you 
> start ipfw.  I guess the code below allows you to use syntax like:
> 
>  # /etc/rc.d/ipfw start client

I missed that it was intended for command-line use, but /etc/rc.d/* scripts
are usually run by rc at startup. If /etc/rc.d/ipfw needs two arguments,
firewall_type is always undefined at startup even though it is specified in
/etc/rc.conf. That is a very serious problem, isn't it?

> to override the $firewall_type set in /etc/rc.conf, but it's not the 
> common usage, nor is it how ipfw is started normally by rc.
> 
> So just set firewall_type in rc.conf and you should be fine .. unless 
> you meant that you're trying to run ipfw & natd INSIDE a jail?

The network being configured is as follows:
   .../27
-+
 |53
  +--+---+
  |bge0 jailed natd box  |
  |t2.st.foo (ipfw `OPEN')   |
  |+++++++
  |firewall|   ns   |  ldap  |diskless|  mail  |  web   |  ftp   |
  |  bge1  |  bge1  |  bge1  |  bge1  |  bge1  |  bge1  |  bge1  |
  ++---++---++---++---++---++---++---+
254|   1|   2|   3|   4|   5|   6|
---+++++++
   192.168.2.0/24
> cheers, Ian
> 
>  > --- /etc/rc.d/ipfw.org 2011-05-03 18:19:28.0 +0900
>  > +++ /etc/rc.d/ipfw 2011-05-03 22:08:14.0 +0900
>  > @@ -35,15 +35,11 @@
>  >  
>  >  ipfw_start()
>  >  {
>  > -  local   _firewall_type
>  > -
>  > -  _firewall_type=$1
>  > -
>  ># set the firewall rules script if none was specified
>  >[ -z "${firewall_script}" ] && firewall_script=/etc/rc.firewall
>  >  
>  >if [ -r "${firewall_script}" ]; then
>  > -  /bin/sh "${firewall_script}" "${_firewall_type}"
>  > +  /bin/sh "${firewall_script}" "${firewall_type}"
>  >echo 'Firewall rules loaded.'
>  >elif [ "`ipfw list 65535`" = "65535 deny ip from any to any" ]; then
>  >echo 'Warning: kernel has firewall functionality, but' \

For command-line usage, the above patch should be modified as
follows:

--- /etc/rc.d/ipfw.org  2011-05-03 18:19:28.0 +0900
+++ /etc/rc.d/ipfw  2011-05-04 09:31:09.0 +0900
@@ -37,7 +37,11 @@
 {
local   _firewall_type
 
-   _firewall_type=$1
+   if [ -n "${1}" ]; then
+   _firewall_type=$1
+   elif [ -n "${firewall_type}" ]
+   _firewall_type=${firewall_type}
+   fi  
 
# set the firewall rules script if none was specified
[ -z "${firewall_script}" ] && firewall_script=/etc/rc.firewall


Re: zpool upgrade, can't boot

2011-05-03 Thread Scot Hetzel
On Tue, May 3, 2011 at 12:34 PM, Eric Damien  wrote:
> Hi Scot,
>
> the link you provided is for a FreeBSD MBR Slice.
> How about the GPT? Because I have the exact same problem,
> and after following  2.7 (modified for no mirror) on
>        http://wiki.freebsd.org/RootOnZFS/InstallingFreeBSD
>
> I did
>  Fixit# sysctl kern.geom.debugflags=0x10
>  Fixit# gpart bootcode -b /zroot/boot/pmbr -p /zroot/boot/gptzfsboot -i 1 ad0
>
> but got the following error:
>        gpart: /dev/ad0p1: operation not permitted
>
>

That should have worked.

Is partition 1 (ad0p1) your freebsd-boot partition?
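
One way to answer that question is to look at the partition table with
`gpart show ad0` and find the index of the freebsd-boot entry. The filter
below is a sketch only: boot_partition_index is a hypothetical helper, and
it assumes the usual gpart show column order (start, size, index, type).

```sh
#!/bin/sh
# Hypothetical helper: read `gpart show <disk>` output on stdin and print
# the index of every partition whose type is freebsd-boot.
boot_partition_index()
{
    awk '$4 == "freebsd-boot" { print $3 }'
}
```

Usage would be `gpart show ad0 | boot_partition_index`; the printed number
is the value to pass to `gpart bootcode ... -i`.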

Scot


Re: zpool upgrade, can't boot

2011-05-03 Thread Eric Damien
Hi Scot,

the link you provided is for a FreeBSD MBR Slice. 
How about the GPT? Because I have the exact same problem,
and after following  2.7 (modified for no mirror) on 
http://wiki.freebsd.org/RootOnZFS/InstallingFreeBSD

I did
 Fixit# sysctl kern.geom.debugflags=0x10
 Fixit# gpart bootcode -b /zroot/boot/pmbr -p /zroot/boot/gptzfsboot -i 1 ad0

but got the following error:
gpart: /dev/ad0p1: operation not permitted




On Tue, 3 May 2011 01:26:21 -0500
Scot Hetzel  wrote:

> On Mon, May 2, 2011 at 11:42 AM, Jeff Blank 
> wrote:
> > Hi,
> >
> > I recently upgraded from 8.0-STABLE to 8.2-STABLE (Apr. 29 checkout)
> > and upgraded my zpool (includes root FS) from v13 to v15.  This is a
> > dual-boot laptop, so I'm using MBR/boot0 and not GPT.  Here's what
> > happens when I boot:
> >
> > F1  Win
> > F2  ?
> > F3  FreeBSD
> >
> > F6 PXE
> > Boot:  F3
> > ZFS: unsupported ZFS version 15 (should be 13)
> > No ZFS pools located, can't boot
> >
> > I've googled around, but I can't find anything relevant for
> > MBR/boot0 configurations, just GPT.  I've ensured that the loaders
> > and boot0/boot1/boot2 are all new, and I rebuilt/reinstalled them
> > in a fixit environment just to be sure.  I also ran 'boot0cfg
> > -B' (with an appropriate -b), but nothing has changed.  How can I
> > get my pool booting again?
> >
> 
> You need to re-install the zfsboot code similar to step 10 (Install
> ZFS boot) in
> 
> http://wiki.freebsd.org/RootOnZFS/ZFSBootPartition
> 
> Scot


Re: /etc/rc.d/ipfw can't deal with firewall_type?

2011-05-03 Thread Ian Smith
On Wed, 4 May 2011, KIRIYAMA Kazuhiko wrote:
 > Hi all,
 > I recently upgraded to 8.2-STABLE and reconfigured a natd + jailed box, but
 > no packets could get through the NAT box. I investigated and found that
 > /etc/rc.firewall does not receive firewall_type as an argument, so ipfw
 > does not divert packets and natd cannot do its work. The cause is that
 > /etc/rc.d/ipfw is incorrect. I think the patch below should be applied to
 > /etc/rc.d/ipfw. Is there any problem with doing this?

Yes.  Assuming using the default firewall_script="/etc/rc.firewall", 
then as it says early in /etc/rc.firewall, you just needed to:

# Define the firewall type in /etc/rc.conf.  Valid values are:
[..]

Sure, /etc/rc.firewall can set firewall_type to a parameter if you pass 
it one, but otherwise uses whatever $firewall_type is set to when you 
start ipfw.  I guess the code below allows you to use syntax like:

 # /etc/rc.d/ipfw start client

to override the $firewall_type set in /etc/rc.conf, but it's not the 
common usage, nor is it how ipfw is started normally by rc.

So just set firewall_type in rc.conf and you should be fine .. unless 
you meant that you're trying to run ipfw & natd INSIDE a jail?

cheers, Ian

 > --- /etc/rc.d/ipfw.org   2011-05-03 18:19:28.0 +0900
 > +++ /etc/rc.d/ipfw   2011-05-03 22:08:14.0 +0900
 > @@ -35,15 +35,11 @@
 >  
 >  ipfw_start()
 >  {
 > -local   _firewall_type
 > -
 > -_firewall_type=$1
 > -
 >  # set the firewall rules script if none was specified
 >  [ -z "${firewall_script}" ] && firewall_script=/etc/rc.firewall
 >  
 >  if [ -r "${firewall_script}" ]; then
 > -/bin/sh "${firewall_script}" "${_firewall_type}"
 > +/bin/sh "${firewall_script}" "${firewall_type}"
 >  echo 'Firewall rules loaded.'
 >  elif [ "`ipfw list 65535`" = "65535 deny ip from any to any" ]; then
 >  echo 'Warning: kernel has firewall functionality, but' \


Re: mps driver instability under stable/8

2011-05-03 Thread Dmitry Morozovsky
On Tue, 3 May 2011, Kenneth D. Merry wrote:

KDM> Sorry you ran into all of those problems!  Needless to say I haven't seen
KDM> that with the 9.0 firmware in my environment, but then again I've got a
KDM> different setup.

I just posted a comment on the LSI KB forum; we'll see how they respond.

KDM> If the firmware doesn't fix it, we'll go down the path of trying to see why
KDM> the IOC fault is happening.

I'm staying tuned, while conserver is writing logs ;-)

-- 
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***



Re: mps driver instability under stable/8

2011-05-03 Thread Kenneth D. Merry
On Tue, May 03, 2011 at 21:28:27 +0400, Dmitry Morozovsky wrote:
> 
> On Tue, 3 May 2011, Dmitry Morozovsky wrote:
> 
> DM> DM> Well, I tried, and unfortunately I can not say that I'm happy after
> DM> DM> the upgrade. :(
> DM> DM> 
> DM> DM> In particular, the adapter now takes a *VERY* long time (>10 minutes)
> DM> DM> to initialize, and is reported as "ERROR" in the BIOS utility (it
> DM> DM> sees all 24 disks; however, it reports 8 x36 expanders instead of
> DM> DM> one).
> DM> DM> 
> DM> DM> I can't boot the system off this array yet; will experiment further :(
> DM> 
> DM> booted from USB stick, I have constantly repeating
> DM> 
> DM> (ses3:mps0:0:25:0): lost device
> DM> (ses3:mps0:0:25:0): removing device entry
> DM> ses3 at mps0 bus 0 scbus0 target 25 lun 0
> DM> ses3:  Fixed Enclosure Services SCSI-5 device 
> DM> ses3: 600.000MB/s transfers
> DM> ses3: Command Queueing enabled
> DM> ses3: SCSI-3 SES Device
> DM> 
> DM> for different sesN, which are detected many times:
> DM> 
> DM> at scbus0 target 0 lun 0 (da0,pass0)
> DM> at scbus0 target 1 lun 0 (da1,pass1)
> DM> at scbus0 target 2 lun 0 (da2,pass2)
> DM> at scbus0 target 3 lun 0 (pass11,da4)
> DM> at scbus0 target 4 lun 0 (pass12,da5)
> DM> at scbus0 target 5 lun 0 (pass9,da3)
> DM> at scbus0 target 24 lun 0 (pass5,ses3)
> DM> at scbus0 target 25 lun 0 (pass19,ses5)
> DM> at scbus0 target 26 lun 0 (pass10,ses4)
> DM> at scbus0 target 27 lun 0 (pass14,ses7)
> DM> at scbus0 target 33 lun 0 (pass13,ses6)
> DM> at scbus0 target 39 lun 0 (pass3,ses0)
> DM> at scbus0 target 45 lun 0 (pass4,ses1)
> DM> at scbus0 target 51 lun 0 (pass8,ses2)
> DM> at scbus0 target 55 lun 0 (pass15,da7)
> DM> at scbus0 target 63 lun 0 (pass16,da8)
> DM> at scbus0 target 71 lun 0 (pass17,da9)
> DM> at scbus0 target 79 lun 0 (pass18,da10)
> DM> at scbus0 target 87 lun 0 (pass6,da11)
> DM> at scbus0 target 95 lun 0 (pass20,da12)
> DM> at scbus0 target 103 lun 0 (pass21,da13)
> 
> Well, using 
> http://kb.lsi.com/KnowledgebaseArticle16414.aspx
> I downgraded to version 8-fixed, and at least topology errors disappear.
> 
> Just booted successfully (errm, it was a few nervous hours, to be honest :)
> 
> Now I have in verbose kernel messages
> 
> mps0:  port 0xc000-0xc0ff mem 0xfb43c000-0xfb43,0xfb44-0xfb47 irq 16 at device 0.0 on pci2
> mps0: Reserved 0x4000 bytes for rid 0x14 type 3 at 0xfb43c000
> mps0: Firmware: 08.00.00.00
> mps0: IOCCapabilities: 185c
> mps0: attempting to allocate 1 MSI-X vectors (15 supported)
> msi: routing MSI-X IRQ 256 to local APIC 0 vector 49
> mps0: using IRQ 256 for MSI-X
> mps0: [MPSAFE]
> mps0: [ITHREAD]

Sorry you ran into all of those problems!  Needless to say I haven't seen
that with the 9.0 firmware in my environment, but then again I've got a
different setup.

> Will see whether it helps.

Yes.  I know the 8.0 firmware also works well.  The only problems I ran into
there were the topology issues that I'm guessing they fixed in that build.

If the firmware doesn't fix it, we'll go down the path of trying to see why
the IOC fault is happening.

Ken
-- 
Kenneth Merry
k...@freebsd.org


Re: mps driver instability under stable/8

2011-05-03 Thread Dmitry Morozovsky

On Tue, 3 May 2011, Dmitry Morozovsky wrote:

DM> DM> Well, I tried, and unfortunately I can not say that I'm happy after the 
DM> DM> upgrade. :(
DM> DM> 
DM> DM> In particular, the adapter now takes a *VERY* long time (>10 minutes)
DM> DM> to initialize, and is reported as "ERROR" in the BIOS utility (it sees
DM> DM> all 24 disks; however, it reports 8 x36 expanders instead of one).
DM> DM> 
DM> DM> I can't boot the system off this array yet; will experiment further :(
DM> 
DM> booted from USB stick, I have constantly repeating
DM> 
DM> (ses3:mps0:0:25:0): lost device
DM> (ses3:mps0:0:25:0): removing device entry
DM> ses3 at mps0 bus 0 scbus0 target 25 lun 0
DM> ses3:  Fixed Enclosure Services SCSI-5 device 
DM> ses3: 600.000MB/s transfers
DM> ses3: Command Queueing enabled
DM> ses3: SCSI-3 SES Device
DM> 
DM> for different sesN, which are detected many times:
DM> 
DM> at scbus0 target 0 lun 0 (da0,pass0)
DM> at scbus0 target 1 lun 0 (da1,pass1)
DM> at scbus0 target 2 lun 0 (da2,pass2)
DM> at scbus0 target 3 lun 0 (pass11,da4)
DM> at scbus0 target 4 lun 0 (pass12,da5)
DM> at scbus0 target 5 lun 0 (pass9,da3)
DM> at scbus0 target 24 lun 0 (pass5,ses3)
DM> at scbus0 target 25 lun 0 (pass19,ses5)
DM> at scbus0 target 26 lun 0 (pass10,ses4)
DM> at scbus0 target 27 lun 0 (pass14,ses7)
DM> at scbus0 target 33 lun 0 (pass13,ses6)
DM> at scbus0 target 39 lun 0 (pass3,ses0)
DM> at scbus0 target 45 lun 0 (pass4,ses1)
DM> at scbus0 target 51 lun 0 (pass8,ses2)
DM> at scbus0 target 55 lun 0 (pass15,da7)
DM> at scbus0 target 63 lun 0 (pass16,da8)
DM> at scbus0 target 71 lun 0 (pass17,da9)
DM> at scbus0 target 79 lun 0 (pass18,da10)
DM> at scbus0 target 87 lun 0 (pass6,da11)
DM> at scbus0 target 95 lun 0 (pass20,da12)
DM> at scbus0 target 103 lun 0 (pass21,da13)

Well, using 
http://kb.lsi.com/KnowledgebaseArticle16414.aspx
I downgraded to version 8-fixed, and at least the topology errors have disappeared.

Just booted successfully (errm, it was a few nervous hours, to be honest :)

Now the verbose kernel messages show:

mps0:  port 0xc000-0xc0ff mem 0xfb43c000-0xfb43,0xfb44-0xfb47 irq 16 at device 0.0 on pci2
mps0: Reserved 0x4000 bytes for rid 0x14 type 3 at 0xfb43c000
mps0: Firmware: 08.00.00.00
mps0: IOCCapabilities: 185c
mps0: attempting to allocate 1 MSI-X vectors (15 supported)
msi: routing MSI-X IRQ 256 to local APIC 0 vector 49
mps0: using IRQ 256 for MSI-X
mps0: [MPSAFE]
mps0: [ITHREAD]


Will see whether it helps.

-- 
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***



/etc/rc.d/ipfw can't deal with firewall_type?

2011-05-03 Thread KIRIYAMA Kazuhiko
Hi all,
I recently upgraded to 8.2-STABLE and reconfigured a natd + jailed box, but
no packets could get through the NAT box. I investigated and found that
/etc/rc.firewall does not receive firewall_type as an argument, so ipfw
does not divert packets and natd cannot do its work. The cause is that
/etc/rc.d/ipfw is incorrect. I think the patch below should be applied to
/etc/rc.d/ipfw. Is there any problem with doing this?

--- /etc/rc.d/ipfw.org  2011-05-03 18:19:28.0 +0900
+++ /etc/rc.d/ipfw  2011-05-03 22:08:14.0 +0900
@@ -35,15 +35,11 @@
 
 ipfw_start()
 {
-       local   _firewall_type
-
-       _firewall_type=$1
-
        # set the firewall rules script if none was specified
        [ -z "${firewall_script}" ] && firewall_script=/etc/rc.firewall
 
        if [ -r "${firewall_script}" ]; then
-               /bin/sh "${firewall_script}" "${_firewall_type}"
+               /bin/sh "${firewall_script}" "${firewall_type}"
                echo 'Firewall rules loaded.'
        elif [ "`ipfw list 65535`" = "65535 deny ip from any to any" ]; then
                echo 'Warning: kernel has firewall functionality, but' \


Re: mps driver instability under stable/8

2011-05-03 Thread Dmitry Morozovsky
On Tue, 3 May 2011, Dmitry Morozovsky wrote:

DM> KDM> It looks like you have a SAS2008, with the 4.0 firmware.  I think it would
DM> KDM> be worthwhile to upgrade to the 9.0 firmware.  I know for sure there are
DM> KDM> issues with the 2.0 firmware, and I know the 9.0 firmware works fairly
DM> KDM> well.  I don't know whether the 4.0 firmware has any severe issues, but it
DM> KDM> would be good to eliminate firmware bugs before we chase driver issues.
DM> 
DM> [snip]
DM> 
DM> KDM> Well, I think the first thing to do is upgrade the firmware and see if that
DM> KDM> fixes it.
DM> 
DM> Well, I tried, and unfortunately I can not say that I'm happy after the 
DM> upgrade. :(
DM> 
DM> In particular, the adapter now takes a *VERY* long time (>10 minutes) to
DM> initialize, and is reported as "ERROR" in the BIOS utility (it sees all 24
DM> disks; however, it reports 8 x36 expanders instead of one).
DM> 
DM> I can't boot the system off this array yet; will experiment further :(

Booted from a USB stick, I see the following repeating constantly:

(ses3:mps0:0:25:0): lost device
(ses3:mps0:0:25:0): removing device entry
ses3 at mps0 bus 0 scbus0 target 25 lun 0
ses3:  Fixed Enclosure Services SCSI-5 device 
ses3: 600.000MB/s transfers
ses3: Command Queueing enabled
ses3: SCSI-3 SES Device

for different sesN, which are detected many times:

at scbus0 target 0 lun 0 (da0,pass0)
at scbus0 target 1 lun 0 (da1,pass1)
at scbus0 target 2 lun 0 (da2,pass2)
at scbus0 target 3 lun 0 (pass11,da4)
at scbus0 target 4 lun 0 (pass12,da5)
at scbus0 target 5 lun 0 (pass9,da3)
at scbus0 target 24 lun 0 (pass5,ses3)
at scbus0 target 25 lun 0 (pass19,ses5)
at scbus0 target 26 lun 0 (pass10,ses4)
at scbus0 target 27 lun 0 (pass14,ses7)
at scbus0 target 33 lun 0 (pass13,ses6)
at scbus0 target 39 lun 0 (pass3,ses0)
at scbus0 target 45 lun 0 (pass4,ses1)
at scbus0 target 51 lun 0 (pass8,ses2)
at scbus0 target 55 lun 0 (pass15,da7)
at scbus0 target 63 lun 0 (pass16,da8)
at scbus0 target 71 lun 0 (pass17,da9)
at scbus0 target 79 lun 0 (pass18,da10)
at scbus0 target 87 lun 0 (pass6,da11)
at scbus0 target 95 lun 0 (pass20,da12)
at scbus0 target 103 lun 0 (pass21,da13)


-- 
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***



Re: ZFS vs OSX Time Machine

2011-05-03 Thread Daniel O'Connor

On 29/04/2011, at 10:38, Jeremy Chadwick wrote:
>> The OSX box is connected via an Airport Express (11n).
> 
> Can you connect something to it via Ethernet and attempt an FTP transfer
> (both PUT (store on server) and GET (retrieve from server)) from a
> client on the wired network?  Make sure whatever you're PUT'ing and
> GET'ing are using the ZFS filesystem.  Don't forget "binary" mode too.

I tried dd'ing /dev/zero over SMB and got 40MB/sec (although I'm not using
AIO yet..)

FTP'ing a 300 MB file averages 60-70MB/sec (the speed of my laptop HD)

ttcp between the hosts hits wire speed (100MB/sec)

>> OK. I don't think TM can use CIFS, I will try ISCSI as someone else 
>> suggested, perhaps it will help.
> 
> Be aware there are all sorts of caveats/complexities with iSCSI on
> FreeBSD.  There are past threads on -stable and -fs talking about them
> in great detail.  I personally wouldn't go this route.
> 
> Why can't OS X use CIFS?  It has the ability to mount a SMB filesystem,
> right?  Is there some reason you can't mount that, then tell TM to write
> its backups to /mountedcifs?

It looks like I had a dodgy disk which was being tickled by the Time Machine
backup (e.g. a dodgy sector where the backup was located), so I have been
chasing a ghost :)

However, thanks to everyone for your helpful suggestions!

I still haven't tried iSCSI; given that I can't do a bare metal restore from
it, it doesn't seem worth it (also I don't have the time..)

--
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C








Re: mps driver instability under stable/8

2011-05-03 Thread Dmitry Morozovsky
On Mon, 2 May 2011, Kenneth D. Merry wrote:

KDM> It looks like you have a SAS2008, with the 4.0 firmware.  I think it would
KDM> be worthwhile to upgrade to the 9.0 firmware.  I know for sure there are
KDM> issues with the 2.0 firmware, and I know the 9.0 firmware works fairly
KDM> well.  I don't know whether the 4.0 firmware has any severe issues, but it
KDM> would be good to eliminate firmware bugs before we chase driver issues.

[snip]

KDM> Well, I think the first thing to do is upgrade the firmware and see if that
KDM> fixes it.

Well, I tried, and unfortunately I can not say that I'm happy after the 
upgrade. :(

In particular, the adapter now takes a *VERY* long time (>10 minutes) to
initialize, and is reported as "ERROR" in the BIOS utility (it sees all 24
disks; however, it reports 8 x36 expanders instead of one).

I can't boot the system off this array yet; will experiment further :(

-- 
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***



Re: Automatic reboot doesn't reboot

2011-05-03 Thread Jeremy Chadwick
On Tue, May 03, 2011 at 02:30:15PM +0200, Olaf Seibert wrote:
> On Tue 03 May 2011 at 05:20:52 -0700, Jeremy Chadwick wrote:
> > To be on the safe side, pick something that's small at first, then work
> > your way up.  You'll need probably 1+ weeks of heavy ZFS I/O between
> > tests (e.g. don't change the tunable, reboot, then 4 hours later declare
> > the new (larger) value as stable).
> 
> Ah, that's important: so far it seemed to me that a *too small* value
> (for all various tunables) would cause problems, but now you're saying
> that *too large* is the problem (at least for vfs.zfs.arc_max)! 

Too small = not-so-great performance (less data in the ARC means more
reads from the disks.  Disks are slower than RAM :-) ).

Too large = increased risk of kmem exhaustion panic.

> This machine has mixed loads; from time to time somebody starts a big
> job with lots of I/O, and in between it is much more modestly loaded.

I would recommend starting small (maybe 1/3rd of your physical RAM?) and
increase from there.  You can try the opposite technique too -- start
large (e.g. 3/4ths of RAM) and wait for a panic.  I'm of the opinion
that I'd rather have a stable system with less memory used for ARC than
a system which could panic and have more memory for ARC.

Sadly there's no 100% reliable way to calculate what's "ideal".  For
example I might use a smaller value than 6144M on a machine where mysqld
is tuned to utilise lots of RAM.  There's a balancing act that goes on
that takes some time to figure out.

For example, on our FreeBSD ZFS-backed NFS filer on our network, I ran
with a 3/4th amount for quite some time (we're talking 4-5 months).
Then suddenly one day I noticed the client machines were complaining
about NFS timeouts, etc...  Got on the filer, lo and behold kmem
exhaustion.  I decreased arc_max by about 1024M and it's been fine
since.

There's a lot of evolution that's occurred in the FreeBSD ZFS kernel
code over the years too.  Originally arc_max was a "high-water mark" of
some sort, but code was changed to make it a hard limit as much as it
could be.  Then some edge cases were found where it could still exceed
the maximum, so those were fixed, etc...  Tracking all the changes is
very difficult (I became very frustrated/irate at having to do so,
wishing that there was more of a "state of ZFS" announcement sent out
every so often so users/admins would know what's changed and adjust
things appropriately), requiring an admin to follow commits.  That's
just the nature of the beast.

> > So for example on an 8GB RAM machine, I might recommend starting with
> > vfs.zfs.arc_max="4096M" and let that run for a while.  If you find your
> > "Wired" value in top(1) remains fairly constant after a week or so of
> > heavy I/O, consider bumping up the value a bit more (say 4608M).
> 
> I'll do just that.

Let us know how things turn out.  Follow-ups that indicate things are
working are just as important as initial mails stating things aren't,
especially if you're someone searching the Web to try and find an answer
to what this kmem thing is all about.  :-)

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.   PGP 4BD6C0CB |



Re: RELENG_8 pf stack issue (state count spiraling out of control)

2011-05-03 Thread Jeremy Chadwick
On Tue, May 03, 2011 at 10:31:57AM +0100, Vincent Hoffman wrote:
> On 03/05/2011 10:16, Jeremy Chadwick wrote:
> 
> 
> > Sadly I don't see a way with bsnmpd(8) to monitor things like interrupt
> > usage, etc. otherwise I'd be graphing that.  The more monitoring the
> > better; at least then I could say "wow, interrupts really did shoot
> > through the roof -- the box went crazy!" and RMA the thing.  :-)
> >
> You could use net-mgmt/bsnmp-regex, although I don't know what the
> overhead for that is like.

Thanks for the tip.  I've investigated that plugin before, and its
implementation model seems like a very hackish way to accomplish
something that should ultimately be done inside of bsnmpd(8) itself via
native C.  It's good for parsing a single log file via tail -F (not
"tail -f" like the man page indicates), but it doesn't scale well.

bsnmpd(8) just needs to be enhanced and fixed, and I know there are
efforts underway by syrinx@ to do exactly that.  I have chatted with her
about some existing problems with bsnmpd(8) and its SNMP parser, and have
chatted with philip@ about a pf-related bug with bsnmpd(8) (but I can't
remember what the details of that one are; I have a file with the info
around here somewhere...)

There was also a recent commit to net-mgmt/net-snmp that pertains to
*properly* monitoring swap, which makes me wonder if net-mgmt/bsnmp-ucd
(which a lot of people, myself included, rely on) also does the wrong
thing.

http://www.freebsd.org/cgi/query-pr.cgi?pr=153179
http://www.freebsd.org/cgi/cvsweb.cgi/ports/net-mgmt/net-snmp/files/patch-memory_freebsd.c

Things like this make me question my graphs and my monitoring data
pretty much every time I look at them.

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.   PGP 4BD6C0CB |



Re: Automatic reboot doesn't reboot

2011-05-03 Thread Olaf Seibert
On Tue 03 May 2011 at 05:20:52 -0700, Jeremy Chadwick wrote:
> To be on the safe side, pick something that's small at first, then work
> your way up.  You'll need probably 1+ weeks of heavy ZFS I/O between
> tests (e.g. don't change the tunable, reboot, then 4 hours later declare
> the new (larger) value as stable).

Ah, that's important: so far it seemed to me that a *too small* value
(for all various tunables) would cause problems, but now you're saying
that *too large* is the problem (at least for vfs.zfs.arc_max)! 

This machine has mixed loads; from time to time somebody starts a big
job with lots of I/O, and in between it is much more modestly loaded.

> So for example on an 8GB RAM machine, I might recommend starting with
> vfs.zfs.arc_max="4096M" and let that run for a while.  If you find your
> "Wired" value in top(1) remains fairly constant after a week or so of
> heavy I/O, consider bumping up the value a bit more (say 4608M).

I'll do just that.

> Sorry to make this long-winded; bad habit of mine that I've never
> managed to break.

Oh no problem, it turns out to be eye-opening!

> | Jeremy Chadwick   j...@parodius.com |
-Olaf.
-- 
Pipe rene = new PipePicture(); assert(Not rene.GetType().Equals(Pipe));


Re: Automatic reboot doesn't reboot

2011-05-03 Thread Jeremy Chadwick
On Tue, May 03, 2011 at 12:08:54PM +0200, Olaf Seibert wrote:
> On Tue 03 May 2011 at 02:21:13 -0700, Jeremy Chadwick wrote:
> > There are two things you might try fiddling with.  These are sysctls so
> > you can try them on the fly:
> > 
> > hw.acpi.disable_on_reboot
> > hw.acpi.handle_reboot
> 
> Thanks. For now I've set the second to 1 and we'll see if that affects
> matters.
> 
> > Check out the thread Peter Jeremy provided.  This is a near-sure
> > indicator of ZFS ARC exhaustion, and you seem to know of that.  What's
> > very interesting to me is this part of your mail:
> ...
> > 
> > Is this box running i386 or amd64?  If amd64, I can't explain why your
> 
> It's amd64. I double-checked just one, you never know what stupid
> mistakes one might make :-)
> 
> > /boot/loader.conf settings aren't taking -- they should be for sure.
> > Maybe provide us a full dmesg and XXX out things you consider
> > sensitive.  If i386, I'm not too surprised that some automatic defaults
> > get chosen instead of what you ask.
> 
> Based on one of your mails where setting vm.kmem_size to twice the real
> RAM size had adverse effects, I've taken the setting out to see if that
> improves matters. I'll have to wait until the next crash (or opportunity
> to reboot without too much disturbance) to see the effect.

The ill-effects are a result of an underlying change that I had
forgotten about but others remembered -- vm.kmem_size_scale used to be
set to something like "2" by default, but it was changed to "1" prior to
8.2-RELEASE.

So basically here's the current situation and how all of our 8.2-STABLE
machines are tuned for ARC: we only set one single tunable for ARC
"management": vfs.zfs.arc_max.  We don't touch vm.kmem_size.

Here's what we have literally in our /boot/loader.conf:

# Limit ZFS ARC maximum.
# NOTE #1: In 8.2-RELEASE and onward, vm.kmem_size_scale defaults to 1,
# which means vm.kmem_size should match the amount of RAM installed
# in the system.  If using an earlier FreeBSD release, be sure to set
# vm.kmem_size manually to the amount of RAM you have.
# NOTE #2: Do not set vm.kmem_size to 2x that of physical RAM, otherwise
# vfs.zfs.arc_max effectively becomes halved.
# http://lists.freebsd.org/pipermail/freebsd-fs/2011-March/010875.html
vfs.zfs.arc_max="6144M"

The value specified here (6144MBytes) is for a machine with 8GB of RAM.

Keep in mind that there is evidence that kmap/kmem exhaustion can still
happen even if you tune the ARC like this.  Apparently memory
fragmentation plays a role, and there's some overhead as well, so
calculating a 100% stable value is a little difficult.  I can point you
to that (very recent, as in last month) thread if you'd like.

To be on the safe side, pick something that's small at first, then work
your way up.  You'll need probably 1+ weeks of heavy ZFS I/O between
tests (e.g. don't change the tunable, reboot, then 4 hours later declare
the new (larger) value as stable).

So for example on an 8GB RAM machine, I might recommend starting with
vfs.zfs.arc_max="4096M" and let that run for a while.  If you find your
"Wired" value in top(1) remains fairly constant after a week or so of
heavy I/O, consider bumping up the value a bit more (say 4608M).
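
The start-small-and-step-up approach described above can be sketched in a
few lines of Python.  This is only an illustration: the function names, the
50% starting fraction, the 512M step and the 75% ceiling are assumptions
drawn from the 8GB example (4096M, then 4608M, with 6144M in production),
not a formula taken from the ZFS code.

```python
def arc_max_schedule(ram_mb, start_fraction=0.5, step_mb=512,
                     ceiling_fraction=0.75):
    """Return candidate vfs.zfs.arc_max values in MB, smallest first.

    All defaults are illustrative assumptions, not kernel-derived limits.
    """
    value = int(ram_mb * start_fraction)
    ceiling = int(ram_mb * ceiling_fraction)
    candidates = []
    while value <= ceiling:
        candidates.append(value)
        value += step_mb
    return candidates


def loader_conf_line(arc_max_mb):
    """Format one candidate as a /boot/loader.conf entry."""
    return 'vfs.zfs.arc_max="%dM"' % arc_max_mb


schedule = arc_max_schedule(8192)  # an 8GB machine
for mb in schedule:
    print(loader_conf_line(mb))
```

Each candidate would still need a week or more of heavy ZFS I/O before
moving on to the next one; the script only lists the values to try.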

Sorry to make this long-winded; bad habit of mine that I've never
managed to break.

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.   PGP 4BD6C0CB |



Re: RELENG_8 pf stack issue (state count spiraling out of control)

2011-05-03 Thread Daniel Hartmeier
On Mon, May 02, 2011 at 06:58:54PM -0700, Jeremy Chadwick wrote:

> The next thing I tried was "/etc/rc.d/pf stop", which worked.  Then I
> did "/etc/rc.d/pf start", which also worked.  However, what I saw next
> surely indicated a bug in the pf layer somewhere -- "pfctl -s states"
> and "pfctl -s info" disagreed on the state count:

This can be explained. Note that "/etc/rc.d/pf start" first flushes
all states by calling pfctl -F all.

This calls pf_unlink_state() for every state in the kernel, which
marks each state with PFTM_UNLINKED, but doesn't free it yet.

Such states do not show up in pfctl -s state output, but are still
counted in pfctl -s info output. Normally, they are freed the next
time the pfpurge thread runs (once per second).
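
The mismatch between the two counters can be shown with a toy model in
Python.  This is not kernel code: the StateTable class and its method names
are invented for illustration, and only the idea of marking states with
PFTM_UNLINKED and freeing them later in a purge pass comes from the
description above.

```python
PFTM_UNLINKED = "unlinked"  # stand-in for the kernel's timeout marker


class StateTable:
    """Toy model: states are first marked unlinked, then freed by a purge."""

    def __init__(self):
        self.states = []  # each entry is a dict with "id" and "timeout"

    def insert(self, state_id):
        self.states.append({"id": state_id, "timeout": "active"})

    def flush_all(self):
        # Models "pfctl -F all": every state is marked, none is freed yet.
        for state in self.states:
            state["timeout"] = PFTM_UNLINKED

    def show_states(self):
        # Models "pfctl -s states": unlinked states are not listed.
        return [s["id"] for s in self.states
                if s["timeout"] != PFTM_UNLINKED]

    def info_count(self):
        # Models the counter in "pfctl -s info": still counts unlinked states.
        return len(self.states)

    def purge(self):
        # Models one pass of the purge thread: unlinked states are freed.
        self.states = [s for s in self.states
                       if s["timeout"] != PFTM_UNLINKED]


table = StateTable()
for i in range(3):
    table.insert(i)
table.flush_all()
print(len(table.show_states()), table.info_count())  # listing vs. counter
table.purge()
print(len(table.show_states()), table.info_count())
```

If purge() never runs, the listing stays empty while the counter keeps its
old value, which is the disagreement described in the quoted report.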

It looks like the pfpurge thread was either

  a) sleeping indefinitely, not returning once a second from

         tsleep(pf_purge_thread, PWAIT, "pftm", 1 * hz);

 or

  b) constantly failing to acquire a lock with

         if (!sx_try_upgrade(&pf_consistency_lock))
                 return (0);

Maybe a) is possible when CLOCK_MONOTONIC is decreasing? And the
"POKED TIMER" messages you get from BIND, too?

Kind regards,
Daniel


Re: RELENG_8 pf stack issue (state count spiraling out of control)

2011-05-03 Thread Vlad Galu
On Tue, May 3, 2011 at 12:12 PM, Vlad Galu  wrote:

>
>
> On Tue, May 3, 2011 at 11:31 AM, Vincent Hoffman wrote:
>
>> On 03/05/2011 10:16, Jeremy Chadwick wrote:
>>
>> 
>> > Sadly I don't see a way with bsnmpd(8) to monitor things like interrupt
>> > usage, etc. otherwise I'd be graphing that.  The more monitoring the
>> > better; at least then I could say "wow, interrupts really did shoot
>> > through the roof -- the box went crazy!" and RMA the thing.  :-)
>> >
>> You could use net-mgmt/bsnmp-regex, although I don't know what the
>> overhead for that is like.
>>
>
> I use munin for graphing, as it allows easy scripting without using SNMP.
>
> My case is a bit different from Jeremy's. Every once in a while there is a
> sudden traffic spike which impacts pf performance as well. However, the
> graphed figures are nowhere near what I'd consider alarming levels (this box
> has withstood more in the past). I was able to coincidentally log in after
> such a spike and noticed the pfpurge thread eating up about 30% of the CPU
> while using the normal optimization policy. In my case, it could be related
> to another issue I'm seeing on this box - mbuma allocation failures. Here
> are my graphs:
>
> http://dl.dropbox.com/u/14650083/PF/bge_bits_1-week.png
> http://dl.dropbox.com/u/14650083/PF/bge_packets_1-week.png
> http://dl.dropbox.com/u/14650083/PF/bge_stats_1-week.png
> http://dl.dropbox.com/u/14650083/PF/load-week.png
> http://dl.dropbox.com/u/14650083/PF/mbuf_errors-week.png
> http://dl.dropbox.com/u/14650083/PF/mbuf_usage-week.png
> http://dl.dropbox.com/u/14650083/PF/pf_inserts-week.png
> http://dl.dropbox.com/u/14650083/PF/pf_matches-week.png
> http://dl.dropbox.com/u/14650083/PF/pf_removals-week.png
> http://dl.dropbox.com/u/14650083/PF/pf_searches-week.png
> http://dl.dropbox.com/u/14650083/PF/pf_src_limit-week.png
> http://dl.dropbox.com/u/14650083/PF/pf_states-week.png
> http://dl.dropbox.com/u/14650083/PF/pf_synproxy-week.png
>
> I'll wait for the next time the symptom occurs to switch to a stateless
> configuration.
>
>
I forgot to mention this is a UP box using TSC for timekeeping and running
ntpd.

-- /boot/loader.conf --
hint.p4tcc.0.disabled="1"
hint.acpi_throttle.0.disabled="1"
debug.acpi.disabled="timer"
-- /boot/loader.conf --

-- sysctl output --
kern.timecounter.choice: TSC(800) i8254(0) dummy(-100)
kern.timecounter.hardware: TSC
-- sysctl output --


-- 
Good, fast & cheap. Pick any two.


Re: RELENG_8 pf stack issue (state count spiraling out of control)

2011-05-03 Thread Vlad Galu
On Tue, May 3, 2011 at 11:31 AM, Vincent Hoffman  wrote:

> On 03/05/2011 10:16, Jeremy Chadwick wrote:
>
> 
> > Sadly I don't see a way with bsnmpd(8) to monitor things like interrupt
> > usage, etc. otherwise I'd be graphing that.  The more monitoring the
> > better; at least then I could say "wow, interrupts really did shoot
> > through the roof -- the box went crazy!" and RMA the thing.  :-)
> >
> You could use net-mgmt/bsnmp-regex, although I don't know what the
> overhead for that is like.
>

I use munin for graphing, as it allows easy scripting without using SNMP.

My case is a bit different from Jeremy's. Every once in a while there is a
sudden traffic spike which impacts pf performance as well. However, the
graphed figures are nowhere near what I'd consider alarming levels (this box
has withstood more in the past). I was able to coincidentally log in after
such a spike and noticed the pfpurge thread eating up about 30% of the CPU
while using the normal optimization policy. In my case, it could be related
to another issue I'm seeing on this box - mbuma allocation failures. Here
are my graphs:

http://dl.dropbox.com/u/14650083/PF/bge_bits_1-week.png
http://dl.dropbox.com/u/14650083/PF/bge_packets_1-week.png
http://dl.dropbox.com/u/14650083/PF/bge_stats_1-week.png
http://dl.dropbox.com/u/14650083/PF/load-week.png
http://dl.dropbox.com/u/14650083/PF/mbuf_errors-week.png
http://dl.dropbox.com/u/14650083/PF/mbuf_usage-week.png
http://dl.dropbox.com/u/14650083/PF/pf_inserts-week.png
http://dl.dropbox.com/u/14650083/PF/pf_matches-week.png
http://dl.dropbox.com/u/14650083/PF/pf_removals-week.png
http://dl.dropbox.com/u/14650083/PF/pf_searches-week.png
http://dl.dropbox.com/u/14650083/PF/pf_src_limit-week.png
http://dl.dropbox.com/u/14650083/PF/pf_states-week.png
http://dl.dropbox.com/u/14650083/PF/pf_synproxy-week.png

I'll wait for the next time the symptom occurs to switch to a stateless
configuration.



-- 
Good, fast & cheap. Pick any two.


Re: Automatic reboot doesn't reboot

2011-05-03 Thread Olaf Seibert
On Tue 03 May 2011 at 02:21:13 -0700, Jeremy Chadwick wrote:
> There are two things you might try fiddling with.  These are sysctls so
> you can try them on the fly:
> 
> hw.acpi.disable_on_reboot
> hw.acpi.handle_reboot

Thanks. For now I've set the second to 1 and we'll see if that affects
matters.

> Check out the thread Peter Jeremy provided.  This is a near-sure
> indicator of ZFS ARC exhaustion, and you seem to know of that.  What's
> very interesting to me is this part of your mail:
...
> 
> Is this box running i386 or amd64?  If amd64, I can't explain why your

It's amd64. I double-checked just one, you never know what stupid
mistakes one might make :-)

> /boot/loader.conf settings aren't taking -- they should be for sure.
> Maybe provide us a full dmesg and XXX out things you consider
> sensitive.  If i386, I'm not too surprised that some automatic defaults
> get chosen instead of what you ask.

Based on one of your mails where setting vm.kmem_size to twice the real
RAM size had adverse effects, I've taken the setting out to see if that
improves matters. I'll have to wait until the next crash (or opportunity
to reboot without too much disturbance) to see the effect.

I put dmesg.boot in my other reply.

Thanks,
> | Jeremy Chadwick   j...@parodius.com |
-Olaf.
-- 
Pipe rene = new PipePicture(); assert(Not rene.GetType().Equals(Pipe));


Re: Automatic reboot doesn't reboot

2011-05-03 Thread Olaf Seibert
On Tue 03 May 2011 at 17:21:52 +1000, Peter Jeremy wrote:
> On 2011-May-02 16:32:30 +0200, Olaf Seibert  wrote:
> >However, it doesn't automatically reboot in 15 seconds, as promised.
> >It just sits there the whole weekend, until I log onto the IPMI console
> >and press the virtual reset button.
> 
> Your reference to IPMI indicates this is not a consumer PC.  Can you
> please provide some details of the hardware.

It is a Supermicro H8DME-2 motherboard with 2 dual Opteron S-F 2000
series CPUs (according to the spec sheet I have here). The IPMI console
(front-end processor as one would call it in the mainframe years ;-) is
an AOC-SiM1U+ with KVM over a dedicated LAN port. I usually access it
via its built-in webserver. I have appended dmesg.boot at the end.

> Are you running ipmitools or similar? 

Not so far.

> Does "shutdown -r" or "reboot" work normally?

Yes, when I last used it while upgrading from 8.1 to 8.2 "shutdown -r"
worked fine, and on previous upgrades it worked too.

I can possibly imagine that the IPMI console would press a key just at
this inconvenient moment (so that the fault is entirely outside FreeBSD's
domain), but since it doesn't seem to do this at other moments, it seems
unlikely. Would pressing a key like "shift" stop a reboot?

> >panic: kmem_alloc(131072): kmem_map too small: 3428782080 total allocated
> 
> I suggest you have a read of the thread beginning
> http://lists.freebsd.org/pipermail/freebsd-fs/2011-March/010862.html
> (note that mailman has split it into at least 3 threads).

Thanks for the link. There seems to be some contradictory advice there,
though: tune vm.kmem_size to twice the physical RAM size, or the same
size, or even 1.5 times. Apparently it is supposed to default to 1 x RAM
size, but for some reason on this machine it doesn't:

$ sysctl hw.realmem hw.physmem hw.usermem vm.kmem_size
hw.realmem: 9126805504
hw.physmem: 8580272128
hw.usermem: 3317899264
vm.kmem_size: 3739230208

$ sysctl vm.kmem_size_scale
vm.kmem_size_scale: 1

despite the setting of 2 x RAM size in /boot/loader.conf.

I can imagine that since vfs.zfs.arc_max="4G" is larger than
vm.kmem_size, this might present a problem. On the other hand the
currently set value has apparently also been adjusted down:

$ sysctl vfs.zfs.arc_max
vfs.zfs.arc_max: 2665488384

This resembles the findings of Jeremy Chadwick in
http://lists.freebsd.org/pipermail/freebsd-fs/2011-March/010880.html .
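
To put the sysctl values above into perspective, here is a small Python
sketch that converts them to GiB and compares them.  The variable names are
just labels for the numbers printed by sysctl.  That the effective arc_max
was derived from kmem_size is only a guess suggested by the arithmetic, not
something checked against the kernel source.

```python
GIB = 1024 ** 3

# Values copied from the sysctl output above.
hw_physmem = 8580272128
vm_kmem_size = 3739230208
vfs_zfs_arc_max = 2665488384


def gib(nbytes):
    """Convert a byte count to GiB."""
    return nbytes / GIB


print("hw.physmem      %.2f GiB" % gib(hw_physmem))
print("vm.kmem_size    %.2f GiB" % gib(vm_kmem_size))
print("vfs.zfs.arc_max %.2f GiB" % gib(vfs_zfs_arc_max))

# Gap between the effective kmem_size and the effective arc_max, in bytes.
gap = vm_kmem_size - vfs_zfs_arc_max
print("kmem_size - arc_max = %d bytes" % gap)
```

The gap comes out at exactly 1 GiB, so neither the requested 16G nor the
requested 4G is in effect, and the two effective values appear to be tied
to each other.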

I think, based on that, that I will simply take out these settings
altogether, and after the next reboot we'll see how that affects
matters.

> -- 
> Peter Jeremy


Copyright (c) 1992-2011 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.2-RELEASE #3: Tue Apr 19 13:02:11 CEST 2011
r...@fourquid.cs.ru.nl:/usr/obj/usr/src/sys/FOURQUID amd64
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Dual-Core AMD Opteron(tm) Processor 2212 (2010.32-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x40f13  Family = f  Model = 41  Stepping = 3
  Features=0x178bfbff
  Features2=0x2001
  AMD Features=0xea500800
  AMD Features2=0x1f
real memory  = 8589934592 (8192 MB)
avail memory = 8267616256 (7884 MB)
ACPI APIC Table: 
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 2 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
ioapic0  irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0:  on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of fec0, 1000 (3) failed
acpi0: reservation of fee0, 1000 (3) failed
acpi0: reservation of 0, a (3) failed
acpi0: reservation of 10, dff0 (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x2008-0x200b on acpi0
cpu0:  on acpi0
cpu1:  on acpi0
cpu2:  on acpi0
cpu3:  on acpi0
pcib0:  port 0xcf8-0xcff on acpi0
pci0:  on pcib0
pci0:  at device 0.0 (no driver attached)
isab0:  at device 1.0 on pci0
isa0:  on isab0
pci0:  at device 1.1 (no driver attached)
ohci0:  mem 0xfe9bf000-0xfe9b irq 22 at 
device 2.0 on pci0
ohci0: [ITHREAD]
usbus0:  on ohci0
ehci0:  mem 0xfe9bec00-0xfe9becff irq 
23 at device 2.1 on pci0
ehci0: [ITHREAD]
usbus1: EHCI version 1.0
usbus1:  on ehci0
atapci0:  port 
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 4.0 on pci0
ata0:  on atapci0
ata0: [ITHREAD]
ata1:  on atapci0
ata1: [ITHREAD]
atapci1:  port 
0xd480-0xd487,0xd400-0xd403,0xd080-0xd087,0xd000-0xd003,0xcc00-0xcc0f mem 
0xfe9bd000-0xfe9bdfff irq 21 at device 5.0 on pci0
atapci1: [ITHREAD]
ata2:  on atapci1
ata2: [ITHREAD]
ata3:  on atapci1
ata3: [ITHREAD]
atapci2:  port 
0xc880-0xc887,0xc800-0xc803,0xc480-0xc487,0xc400-0xc403,0xc080-0xc08f mem 
0xfe9bc000-0xfe9bcfff irq 22 at device 5.1 on pci0
atapci2: [ITHREAD]
ata4:  on a

Re: RELENG_8 pf stack issue (state count spiraling out of control)

2011-05-03 Thread Vincent Hoffman
On 03/05/2011 10:16, Jeremy Chadwick wrote:


> Sadly I don't see a way with bsnmpd(8) to monitor things like interrupt
> usage, etc. otherwise I'd be graphing that.  The more monitoring the
> better; at least then I could say "wow, interrupts really did shoot
> through the roof -- the box went crazy!" and RMA the thing.  :-)
>
You could use net-mgmt/bsnmp-regex, although I don't know what the
overhead for that is like.

Vince


Re: RELENG_8 pf stack issue (state count spiraling out of control)

2011-05-03 Thread Daniel Hartmeier
On Mon, May 02, 2011 at 06:58:54PM -0700, Jeremy Chadwick wrote:

> Here's one piece of core.0.txt which makes no sense to me -- the "rate"
> column.  I have a very hard time believing that was the interrupt rate
> of all the relevant devices at the time (way too high).  Maybe this data
> becomes wrong only during a coredump?  The total column I could believe.
> 
> 
> vmstat -i
> 
> interrupt                  total       rate
> irq4: uart0                54768        912
> irq6: fdc0                     1          0
> irq17: uhci1+                172          2
> irq23: uhci3 ehci1+         2367         39
> cpu0: timer          13183882632  219731377
> irq256: em0            260491055    4341517
> irq257: em1            127555036    2125917
> irq258: ahci0          225923164    3765386
> cpu2: timer          13183881837  219731363
> cpu1: timer          13002196469  216703274
> cpu3: timer          13183881783  219731363
> Total                53167869284  886131154
> 
> 
> Here's what a normal "vmstat -i" shows from the command-line:
> 
> # vmstat -i
> interrupt                  total       rate
> irq4: uart0                  518          0
> irq6: fdc0                     1          0
> irq23: uhci3 ehci1+          145          0
> cpu0: timer             19041199       1999
> irq256: em0               614280         64
> irq257: em1               168529         17
> irq258: ahci0             355536         37
> cpu2: timer             19040462       1999
> cpu1: timer             19040458       1999
> cpu3: timer             19040454       1999
> Total                   77301582       8119

The cpu0-3 timer totals seem consistent in the first output: 
13183881783/1999/60/60/24 matches 76 days of uptime.

The high rate in the first output comes from vmstat.c dointr()'s
division of the total by the uptime:

    struct timespec sp;
    clock_gettime(CLOCK_MONOTONIC, &sp);
    uptime = sp.tv_sec;
    for (i = 0; i < nintr; i++) {
        printf("%-*s %20lu %10lu\n", istrnamlen, intrname,
            *intrcnt, *intrcnt / uptime);
    }

From this we can deduce that the value of uptime must have been
13183881783/219731363 = 60 (seconds).

Since the uptime was 76 days (and not just 60 seconds), the
CLOCK_MONOTONIC clock must have reset, wrapped, or been overwritten.
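
The two calculations above can be reproduced with a short Python sketch.
The totals and rates are copied from the quoted "vmstat -i" output in the
crash report, and 1999 is the per-CPU timer rate from the normal output.
Integer division is used to mirror the unsigned division in the excerpt;
the variable names are only illustrative.

```python
# (name, total, reported rate) from the crash-time "vmstat -i" output.
rows = [
    ("cpu0: timer", 13183882632, 219731377),
    ("cpu2: timer", 13183881837, 219731363),
    ("cpu1: timer", 13002196469, 216703274),
    ("cpu3: timer", 13183881783, 219731363),
    ("Total",       53167869284, 886131154),
]

TIMER_HZ = 1999  # per-CPU timer rate seen in the normal output

# Real uptime implied by the cpu3 timer total at its normal rate.
cpu3_total = rows[3][1]
uptime_days = cpu3_total / TIMER_HZ / 60 / 60 / 24
print("uptime implied by timer total: %.1f days" % uptime_days)

# Uptime value that dointr() must have divided by, per row.
implied = [total // rate for _, total, rate in rows]
print("uptime implied by rate column:", implied, "seconds")

# Cross-check: dividing each total by 60 reproduces the reported rate.
recomputed = [total // 60 for _, total, _ in rows]
```

Every row yields the same divisor of 60 seconds, which supports the
conclusion that the clock value used for the division was wrong rather
than the interrupt totals.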

I don't know how that's possible, but if this means that the kernel
variable time_second was possibly going back, that could very well
have messed up pf's state purging.

Daniel


Re: Automatic reboot doesn't reboot

2011-05-03 Thread Jeremy Chadwick
On Mon, May 02, 2011 at 04:32:30PM +0200, Olaf Seibert wrote:
> I have a FreeBSD/amd64 8.2 server that has a few ZFS file systems served
> over NFS.  It has 8 GB of memory. There are 6 disks of 1.5 TB each
> forming a pool with raidz2.
> 
> From time to time it crashes with some stack backtrace (included below).
> This already happened before the upgrade to 8.2.
> 
> Now a crash of a file server is annoying, but if it reboots
> automatically, there is just a few minutes of downtime (most of it is
> even spent by the BIOS before it gets to boot the OS).
> 
> However, it doesn't automatically reboot in 15 seconds, as promised.
> It just sits there the whole weekend, until I log onto the IPMI console
> and press the virtual reset button.

There are two things you might try fiddling with.  These are sysctls so
you can try them on the fly:

hw.acpi.disable_on_reboot
hw.acpi.handle_reboot

On our systems we set hw.acpi.handle_reboot=1 to speed up the reboot
process.  I remember hearing long ago how some people had issues getting
their machines to reboot (sometimes 100% of the time, other times
occasionally); using ACPI to reboot the machine fixed their issues.

> This was visible before I did that (4-finger copy):
> 
> panic: kmem_alloc(131072): kmem_map too small: 3428782080 total allocated
> cpuid = 0

Check out the thread Peter Jeremy provided.  This is a near-sure
indicator of ZFS ARC exhaustion, and you seem to know of that.  What's
very interesting to me is this part of your mail:

> There is some tuning in /boot/loader.conf from previous attempts to
> avoid crashes.
> 
> vm.kmem_size="16G"
> vfs.zfs.arc_max="4G"
> 
> Is that still useful, or does it harm by now? Real memory is 8 GB.
> I note that if I look with sysctl, I see
> 
> vm.kmem_size: 3739230208
> vfs.zfs.arc_max: 2665488384
> 
> which doesn't seem to match these attempted settings.

Is this box running i386 or amd64?  If amd64, I can't explain why your
/boot/loader.conf settings aren't taking -- they should be for sure.
Maybe provide us a full dmesg and XXX out things you consider
sensitive.  If i386, I'm not too surprised that some automatic defaults
get chosen instead of what you ask.

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.   PGP 4BD6C0CB |



Re: RELENG_8 pf stack issue (state count spiraling out of control)

2011-05-03 Thread Jeremy Chadwick
On Tue, May 03, 2011 at 10:48:00AM +0200, Daniel Hartmeier wrote:
> On Mon, May 02, 2011 at 06:58:54PM -0700, Jeremy Chadwick wrote:
> 
> > Status: Enabled for 76 days 06:49:10  Debug: Urgent
> 
> > The "pf uptime" shown above, by the way, matches system uptime.
> 
> > ps -axl
> > 
> >   UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT       TIME COMMAND
> >     0   422     0   0 -16  0     0     0 pftm   DL    ?? 1362773081:04.00 [pfpurge]
> 
> This looks weird, too. 1362773081 minutes would be >2500 years.
> 
> Usually, you should see [idle] with almost uptime in minutes, and
> [pfpurge] with much less, like in
> 
>   # uptime
>   10:22AM  up 87 days, 19:36, 1 user, load averages: 0.00, 0.03, 0.05
>   # echo "((87*24)+19)*60+36" | bc
>   126456
> 
>   # ps -axl
>   UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT       TIME COMMAND
>     0     7     0   0  44  0     0     8 pftm   DL    ??    0:13.16 [pfpurge]
>     0    11     0   0 171  0     0     8 -      RL    ?? 124311:23.04 [idle]
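
The figures quoted above are easy to check.  The Python sketch below redoes
the bc calculation and converts the [pfpurge] TIME value from the crash
output into years.  The helper name uptime_minutes is made up for
illustration, and a 365-day year is assumed.

```python
def uptime_minutes(days, hours, minutes):
    """Convert an uptime(1)-style 'N days, HH:MM' value to minutes."""
    return (days * 24 + hours) * 60 + minutes


# "up 87 days, 19:36" from the healthy machine.
healthy_uptime = uptime_minutes(87, 19, 36)
idle_minutes = 124311  # minutes part of the [idle] TIME column

# Minutes part of the [pfpurge] TIME column from the crash output.
pfpurge_minutes = 1362773081
pfpurge_years = pfpurge_minutes / 60 / 24 / 365

print("uptime in minutes:", healthy_uptime)
print("idle share of uptime: %.1f%%" % (100.0 * idle_minutes / healthy_uptime))
print("pfpurge TIME in years: %.0f" % pfpurge_years)
```

On the healthy box [idle] accounts for nearly all of the uptime, while the
crash-time [pfpurge] value is far larger than any possible uptime.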

Agreed -- and that's exactly how things look on the same box right now:

$ ps -axl | egrep 'UID|pfpurge|idle'
  UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT       TIME COMMAND
    0    11     0   0 171  0     0    64 -      RL    ?? 2375:15.91 [idle]
    0   422     0   0 -16  0     0    16 pftm   DL    ??    0:00.28 [pfpurge]

The ps -axl output I provided earlier came from /var/crash/core.0.txt.
So it's interesting that ps -axl as well as vmstat -i both showed
something off-the-wall.  I wonder if this can happen when within ddb?
Unsure.  I do have the core from "call doadump", so I should be able to
go back and re-examine it with kgdb.  I just wish I knew what to poke
around looking for in there.

Sadly I don't see a way with bsnmpd(8) to monitor things like interrupt
usage, etc. otherwise I'd be graphing that.  The more monitoring the
better; at least then I could say "wow, interrupts really did shoot
through the roof -- the box went crazy!" and RMA the thing.  :-)

> How is time handled on your machine? ntpdate on boot and then ntpd?

Yep, you got it:

ntpdate_enable="yes"
ntpdate_config="/conf/ME/ntp.conf"
ntpd_enable="yes"
ntpd_config="/conf/ME/ntp.conf"

I don't use ntpd_sync_on_start because I've never had reason to.  I
always set the system/BIOS clock to UTC time when building a system.  I
use ntpd's complaint about excessive offset as an indicator that
something bad happened.  /conf/ME/ntp.conf on this machine syncs from
another on the private network (em1) only, and that machine syncs from
a series of geographically-diverse stratum 2 servers and one stratum 1
server.  I've never seen high delays, offsets, or jitter using "ntpq -c
peers" on any box we have.

Actual timecounters (not time itself) are handled by ACPI-safe or
ACPI-fast (varies per boot; I've talked to jhb@ about this before and
it's normal).

powerd is in use on all our systems, and on this box we also use processor
sleep states (lowest state = C2; the physical CPU only supports C0-C2 and I
wouldn't go any lower than that anyway :-) ).  Appropriate
/boot/loader.conf entries that pertain to it:

# Enable use of P-state CPU frequency throttling.
# http://wiki.freebsd.org/TuningPowerConsumption
hint.p4tcc.0.disabled="1"
hint.acpi_throttle.0.disabled="1"

There are numerous other systems exactly like this one (literally same
model of hardware, RAM amount, CPU model, BIOS version and settings, and
system configuration, including pf) that have much higher load and fire
many more interrupts (particularly the NFS server!) that haven't
exhibited any problems.  This box had an uptime of 72 days, and prior to
that around 100 (before being taken down for world/kernel upgrades).
All machines have ECC RAM too, and MCA/MCE is in use.

You don't know how badly I'd love to blame this on a hardware issue (it's
always possible in some way or another), but the way this manifested
itself was extremely specific.  The problem could be super rare and
something triggered it that hasn't been seen before by developers.  So
far there's only 1 other user who has seen this behaviour but his was
attributed to use of "reassemble tcp" which I wasn't using; so the true
problem could still be out there.  I feel better knowing I'm not the
only one who's seen this oddity.

Since his post, I've removed all scrub rules from all of our machines as
a precaution.  If it ever happens again we'll have one more thing to
safely rule out.

We have other machines (different hardware, running RELENG_7 i386) which
have had 1+ year uptimes also using pf, so the possibility of just some
"crazy fluke" is plausible to me.

> Any manual time changes since the last boot?

None unless adjkerntz did something during the PST->PDT switchover, but
that would manifest itself as a +1 hour offset difference.

Since the machine rebooted, the system has synced its time without issue
and well within an acceptable delta (1.075993 sec).  I did not power-cycle the
box during any of this; pure soft 

Re: RELENG_8 pf stack issue (state count spiraling out of control)

2011-05-03 Thread Daniel Hartmeier
On Mon, May 02, 2011 at 06:58:54PM -0700, Jeremy Chadwick wrote:

> Status: Enabled for 76 days 06:49:10  Debug: Urgent

> The "pf uptime" shown above, by the way, matches system uptime.

> ps -axl
> 
>   UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT   TIME COMMAND
>     0   422     0   0 -16  0     0     0 pftm   DL    ?? 1362773081:04.00 [pfpurge]

This looks weird, too. 1362773081 minutes would be >2500 years.

Usually, you should see [idle] with almost uptime in minutes, and
[pfpurge] with much less, like in

  # uptime
  10:22AM  up 87 days, 19:36, 1 user, load averages: 0.00, 0.03, 0.05
  # echo "((87*24)+19)*60+36" | bc
  126456

  # ps -axl
  UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT         TIME COMMAND
    0     7     0   0  44  0     0     8 pftm   DL    ??      0:13.16 [pfpurge]
    0    11     0   0 171  0     0     8 -      RL    ?? 124311:23.04 [idle]
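
The comparison above can be scripted as a rough sanity check.  This is
only a sketch: the helper names are made up for illustration, the values
are the ones quoted in this thread, and ncpu=4 is an assumption based on
the cpu0..cpu3 timer lines in the vmstat -i output.

```sh
#!/bin/sh
# Sketch only; helper names are hypothetical, values are from this thread.

# uptime_minutes DAYS HOURS MINUTES: uptime expressed in minutes
uptime_minutes() {
    echo $(( ($1 * 24 + $2) * 60 + $3 ))
}

# ps_time_minutes TIME: minutes part of a ps TIME field (MMM:SS.hh)
ps_time_minutes() {
    echo "${1%%:*}"
}

# minutes_to_years MINUTES: whole 365-day years
minutes_to_years() {
    echo $(( $1 / (60 * 24 * 365) ))
}

# Healthy box: [idle] TIME should be close to the uptime in minutes.
up=$(uptime_minutes 87 19 36)
idle=$(ps_time_minutes "124311:23.04")
echo "uptime: $up min, [idle]: $idle min"

# Affected box: pf reported 76 days 06:49 of uptime.  A process cannot
# have accumulated more CPU time than ncpu * uptime.
ncpu=4    # assumption, taken from the cpu0..cpu3 timer lines
up=$(uptime_minutes 76 6 49)
bogus=$(ps_time_minutes "1362773081:04.00")
if [ "$bogus" -gt $(( ncpu * up )) ]; then
    echo "[pfpurge] TIME is impossible: about $(minutes_to_years "$bogus") years"
fi
```

Any TIME value that fails the last test points at a broken time source or
corrupted accounting rather than at real CPU usage.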

How is time handled on your machine? ntpdate on boot and then ntpd?
Any manual time changes since the last boot?

Daniel
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Automatic reboot doesn't reboot

2011-05-03 Thread Peter Jeremy
On 2011-May-02 16:32:30 +0200, Olaf Seibert  wrote:
>However, it doesn't automatically reboot in 15 seconds, as promised.
>It just sits there the whole weekend, until I log onto the IPMI console
>and press the virtual reset button.

Your reference to IPMI indicates this is not a consumer PC.  Can you
please provide some details of the hardware?  Are you running ipmitool
or similar?  Does "shutdown -r" or "reboot" work normally?

>panic: kmem_alloc(131072): kmem_map too small: 3428782080 total allocated

I suggest you have a read of the thread beginning
http://lists.freebsd.org/pipermail/freebsd-fs/2011-March/010862.html
(note that mailman has split it into at least 3 threads).

-- 
Peter Jeremy



Re: RELENG_8 pf stack issue (state count spiraling out of control)

2011-05-03 Thread Jeremy Chadwick
On Tue, May 03, 2011 at 09:00:42AM +0200, Daniel Hartmeier wrote:
> I read those graphs differently: the problem doesn't arise slowly,
> but rather seems to start suddenly at 13:00.
> 
> Right after 13:00, traffic on em0 drops, i.e. the firewall seems
> to stop forwarding packets completely.
> 
> Yet, at the same time, the states start to increase, almost linearly
> at about one state every two seconds, until the limit of 10,000 is
> reached. Reaching the limit seems to be only a side-effect of a
> problem that started at 13:00.
> 
> > Here's one piece of core.0.txt which makes no sense to me -- the "rate"
> > column.  I have a very hard time believing that was the interrupt rate
> > of all the relevant devices at the time (way too high).  Maybe this data
> > becomes wrong only during a coredump?  The total column I could believe.
> > 
> > 
> > vmstat -i
> > 
> > interrupt                          total       rate
> > irq4: uart0                        54768        912
> > irq6: fdc0                             1          0
> > irq17: uhci1+                        172          2
> > irq23: uhci3 ehci1+                 2367         39
> > cpu0: timer                  13183882632  219731377
> > irq256: em0                    260491055    4341517
> > irq257: em1                    127555036    2125917
> > irq258: ahci0                  225923164    3765386
> > cpu2: timer                  13183881837  219731363
> > cpu1: timer                  13002196469  216703274
> > cpu3: timer                  13183881783  219731363
> > Total                        53167869284  886131154
> > 
> 
> I find this suspect as well, but I don't have an explanation yet.
> 
> Are you using anything non-GENERIC related to timers, like a changed
> HZ or polling enabled?

HZ is standard (1000 is the default I believe), and I do not use
polling.
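
As far as I know, vmstat -i derives the rate column by dividing each
total by the system uptime in seconds, so dividing total by rate recovers
the uptime vmstat was working with.  A minimal sketch using the figures
above (helper names are made up; the 76 days 06:49:10 value is the pf
uptime quoted earlier, which matched system uptime):

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical, figures are copied from the
# vmstat -i and pfctl output quoted in this thread.
# Needs shell arithmetic wider than 32 bits.

# implied_uptime TOTAL RATE: uptime in seconds implied by a vmstat -i row
implied_uptime() {
    echo $(( $1 / $2 ))
}

# expected_rate TOTAL UPTIME_SECONDS: average interrupts per second
expected_rate() {
    echo $(( $1 / $2 ))
}

# pf: "Enabled for 76 days 06:49:10"
real_uptime=$(( 76 * 86400 + 6 * 3600 + 49 * 60 + 10 ))

# report NAME TOTAL RATE
report() {
    echo "$1: implied uptime $(implied_uptime "$2" "$3") s," \
         "average over real uptime $(expected_rate "$2" "$real_uptime")/s"
}

report "cpu0: timer" 13183882632 219731377
report "irq256: em0" 260491055 4341517
report "irq257: em1" 127555036 2125917
report "irq258: ahci0" 225923164 3765386
```

With these figures every row implies an uptime of about 60 seconds, while
the cpu0 timer total divided by the real uptime gives roughly 2000/s.
That would suggest the totals are sane and only the uptime used for the
division was wrong when the dump was taken.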

> Are you sure the problem didn't start right at 13:00 and cause complete
> packet loss for the entire period, rather than growing gradually
> worse?

It's hard to discern from the graphs, but I can tell you exactly what I
saw TCP-wise since I did have some already-existing/established TCP
connections to the box (e.g. connections which already had ESTABLISHED
states according to pfctl -s state) when it began exhibiting issues.

Any packets which already had existing state entries in pf's state table
continued to work, and bidirectionally.  New inbound connections to the
box via em0 would result in no response/timeout (and as indicated per
pfctl, such packets were being dropped due to the state limit being
reached).  Outbound connections from the box via em0 to the outside
world also resulted in no response/timeout.  I will show you evidence of
the latter.

The first indication of a problem in syslog is the following message
from named -- this is the first time in my life I've ever seen this
message, but it seems to indicate some kind of internal watchdog fired
within named itself.  The log I'm looking at, by the way, is
/var/log/all.log -- yes, I do turn that on (for reasons exactly like
this).  This box is a secondary nameserver (public), so keep that in
mind too.  Anyway:

May  1 12:50:14 isis named[728]: *** POKED TIMER ***

Seconds later, I see unexpected RCODE messages, lame server messages,
etc. -- all of which indicate packets are getting through to some degree
("the usual" badly-configured nameservers on the Internet).

A few minutes later:

May  1 12:53:15 isis named[728]: *** POKED TIMER ***
May  1 12:53:54 isis named[728]: *** POKED TIMER ***

With more of the usual unexpected RCODE/SERVFAIL messages after that.
The next message:

May  1 13:28:55 isis named[728]: *** POKED TIMER ***
May  1 13:29:13 isis named[728]: *** POKED TIMER ***
May  1 13:30:11 isis last message repeated 3 times

Then more RCODE/SERVFAIL and something called "FORMERR" but that could
be normal as well.  Remember, all from named.

This "cycle" of behaviour continued, with the number of POKED TIMER
messages gradually increasing more and more as time went on.  By 16:07
on May 1st, these messages were arriving usually in "bursts" of 5 or 6.
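
To put numbers on that trend, the messages can be counted per hour.  This
is a rough sketch only: poked_per_hour is a made-up helper name, and it
assumes the standard syslog line layout shown in the excerpts here.

```sh
#!/bin/sh
# Sketch only; poked_per_hour is a hypothetical helper.
# Reads syslog lines on stdin and prints "Month Day Hour Count" per hour.
# "last message repeated N times" lines are not counted, so bursts are
# under-reported.  Output is sorted lexically, which is good enough
# within a single day.
poked_per_hour() {
    awk 'index($0, "*** POKED TIMER ***") > 0 {
             n[$1 " " $2 " " substr($3, 1, 2)]++
         }
         END { for (k in n) print k, n[k] }' | sort
}

# Usage:
#   poked_per_hour < /var/log/all.log
```

A steadily rising count per hour would match the "gradually increasing"
pattern described above; a flat count followed by a jump would not.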

Things finally "exploded", from named's perspective, here (with slaved
zones X'd out):

May  1 19:23:21 isis named[728]: *** POKED TIMER ***
May  1 19:28:59 isis named[728]: zone /IN: refresh: failure trying master x.x.x.x#53 (source x.x.x.x#0): operation canceled
May  1 19:35:32 isis named[728]: host unreachable resolving 'dns2.djaweb.dz/A/IN': 213.179.160.66#53
May  1 19:35:32 isis named[728]: host unreachable resolving 'dns2.djaweb.dz/A/IN': 193.0.12.4#53
May  1 19:35:32 isis named[728]: host unreachable resolving 'dns2.djaweb.dz/A/IN': 193.194.64.242#53
May  1 19:35:32 isis named[728]: host unreachable resolving 'dns2.djaweb.dz/A/IN': 192.134.0.49#53

And many other slaved zones reporting the exact same error.  The
hostnam

Re: RELENG_8 pf stack issue (state count spiraling out of control)

2011-05-03 Thread Daniel Hartmeier
I read those graphs differently: the problem doesn't arise slowly,
but rather seems to start suddenly at 13:00.

Right after 13:00, traffic on em0 drops, i.e. the firewall seems
to stop forwarding packets completely.

Yet, at the same time, the states start to increase, almost linearly
at about one state every two seconds, until the limit of 10,000 is
reached. Reaching the limit seems to be only a side-effect of a
problem that started at 13:00.
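
As a back-of-the-envelope check on that growth rate, here is a small
sketch.  The helper names are made up, and it assumes the state table
starts empty, so the result is an upper bound; any states already
present shorten the time.

```sh
#!/bin/sh
# Sketch only; helper names are hypothetical.

# seconds_to_limit LIMIT START SECS_PER_STATE: time until the limit is hit
seconds_to_limit() {
    echo $(( ($1 - $2) * $3 ))
}

# hms SECONDS: format a duration as HH:MM:SS
hms() {
    printf '%02d:%02d:%02d\n' $(( $1 / 3600 )) $(( $1 % 3600 / 60 )) $(( $1 % 60 ))
}

# Assumption: zero states at the start, one new state every two seconds.
hms "$(seconds_to_limit 10000 0 2)"
```

At one state every two seconds, an empty table would take a little over
five and a half hours to reach the 10,000 limit.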

> Here's one piece of core.0.txt which makes no sense to me -- the "rate"
> column.  I have a very hard time believing that was the interrupt rate
> of all the relevant devices at the time (way too high).  Maybe this data
> becomes wrong only during a coredump?  The total column I could believe.
> 
> 
> vmstat -i
> 
> interrupt                          total       rate
> irq4: uart0                        54768        912
> irq6: fdc0                             1          0
> irq17: uhci1+                        172          2
> irq23: uhci3 ehci1+                 2367         39
> cpu0: timer                  13183882632  219731377
> irq256: em0                    260491055    4341517
> irq257: em1                    127555036    2125917
> irq258: ahci0                  225923164    3765386
> cpu2: timer                  13183881837  219731363
> cpu1: timer                  13002196469  216703274
> cpu3: timer                  13183881783  219731363
> Total                        53167869284  886131154
> 

I find this suspect as well, but I don't have an explanation yet.

Are you using anything non-GENERIC related to timers, like a changed
HZ or polling enabled?

Are you sure the problem didn't start right at 13:00 and cause complete
packet loss for the entire period, rather than growing gradually
worse?

Daniel