Re: /etc/rc.d/ipfw can't deal with firewall_type?
On Wed, 4 May 2011, KIRIYAMA Kazuhiko wrote:
> At Wed, 4 May 2011 03:47:02 +1000 (EST),
> Ian Smith wrote:
> >
> > On Wed, 4 May 2011, KIRIYAMA Kazuhiko wrote:
> > > Hi all,
> > >
> > > Recently I upgraded to 8.2-STABLE and reconfigured a natd + jailed box, but
> > > no packets could pass through the NAT box. I investigated and found that
> > > /etc/rc.firewall does not receive firewall_type as an argument, so ipfw does
> > > not divert and natd cannot work. The cause is that /etc/rc.d/ipfw is
> > > incorrect. I think the patch below should be applied to /etc/rc.d/ipfw. Is
> > > there any problem with doing this?
> >
> > Yes. Assuming using the default firewall_script="/etc/rc.firewall",
> > then as it says early in /etc/rc.firewall, you just needed to:
> >
> > # Define the firewall type in /etc/rc.conf. Valid values are:
> > [..]

It's just occurred to me that - assuming you are NOT trying to start ipfw
or natd inside a jail, which won't work - you may well be running into
another problem, related to some PRs/patches that hrs@ (cc'd) is reviewing
regarding startup order and loading of modules for ipfw and natd.

You mentioned running an 'OPEN' firewall which (like any other type) will
fail to load divert rule/s unless ipdivert.ko is already loaded or built
into the kernel. In the meantime this can be solved by either

a) adding to /boot/loader.conf:

ipdivert_load="YES"

or b) applying the following patch to /etc/rc.d/ipfw (on 7.x or 8.x)

cheers, Ian

--- rc.d_ipfw.1.24 Sat Jan 8 18:13:46 2011
+++ ipfw Sat Jan 8 21:00:18 2011
@@ -27,9 +27,9 @@
     fi

     if checkyesno firewall_nat_enable; then
-        if ! checkyesno natd_enable; then
-            required_modules="$required_modules ipfw_nat"
-        fi
+        required_modules="$required_modules ipfw_nat"
+    elif checkyesno natd_enable; then
+        required_modules="$required_modules ipdivert"
     fi
 }

@@ -105,6 +105,7 @@
 }

 load_rc_config $name
-firewall_coscripts="/etc/rc.d/natd ${firewall_coscripts}"
+checkyesno natd_enable && ! checkyesno firewall_nat_enable && \
+    firewall_coscripts="/etc/rc.d/natd ${firewall_coscripts}"

 run_rc_command $*
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
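For anyone wanting to try the module selection from the patch above in isolation, here is a minimal sketch. It is not the rc.d script itself: checkyesno below is a simplified stand-in for the rc.subr helper, pick_modules is a hypothetical name used only for illustration, and the starting value "ipfw" for required_modules is an assumption.

```sh
#!/bin/sh
# Simplified stand-in for rc.subr's checkyesno (assumption: only the
# spelling "yes", in any case, counts as true in this sketch).
checkyesno()
{
    eval "_value=\${$1}"
    case "${_value}" in
    [Yy][Ee][Ss]) return 0 ;;
    *) return 1 ;;
    esac
}

# Hypothetical helper mirroring the patched logic: in-kernel NAT needs
# ipfw_nat, while natd on its own needs ipdivert.
pick_modules()
{
    required_modules="ipfw"    # assumed starting value
    if checkyesno firewall_nat_enable; then
        required_modules="$required_modules ipfw_nat"
    elif checkyesno natd_enable; then
        required_modules="$required_modules ipdivert"
    fi
    echo "$required_modules"
}

firewall_nat_enable="NO"
natd_enable="YES"
pick_modules
```

With natd_enable set and firewall_nat_enable unset, as in the last three lines, the sketch prints "ipfw ipdivert", which is the case the thread is about.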
Re: /etc/rc.d/ipfw can't deal with firewall_type?
On Wed, 4 May 2011, KIRIYAMA Kazuhiko wrote:
> At Wed, 4 May 2011 03:47:02 +1000 (EST),
> Ian Smith wrote:
> >
> > On Wed, 4 May 2011, KIRIYAMA Kazuhiko wrote:
> > > Hi all,
> > >
> > > Recently I upgraded to 8.2-STABLE and reconfigured a natd + jailed box, but
> > > no packets could pass through the NAT box. I investigated and found that
> > > /etc/rc.firewall does not receive firewall_type as an argument, so ipfw does
> > > not divert and natd cannot work. The cause is that /etc/rc.d/ipfw is
> > > incorrect. I think the patch below should be applied to /etc/rc.d/ipfw. Is
> > > there any problem with doing this?
> >
> > Yes. Assuming using the default firewall_script="/etc/rc.firewall",
> > then as it says early in /etc/rc.firewall, you just needed to:
> >
> > # Define the firewall type in /etc/rc.conf. Valid values are:
> > [..]
> >
> > Sure, /etc/rc.firewall can set firewall_type to a parameter if you pass
> > it one, but otherwise uses whatever $firewall_type is set to when you
> > start ipfw. I guess the code below allows you to use syntax like:
> >
> > # /etc/rc.d/ipfw start client
>
> I missed that it was intended for command-line use, but /etc/rc.d/* scripts
> are usually run by rc at startup. If /etc/rc.d/ipfw requires two arguments,
> firewall_type is always undefined at startup even though it is specified in
> /etc/rc.conf. That is a very serious problem, isn't it?

/etc/rc.d/ipfw normally only takes one argument, {,quiet}start|stop|etc.

The use of $1 in ipfw_start() surprised me actually; I'm only assuming its
intended use is as above, but it's clearly an extra argument passed by rc,
not the first argument to /etc/rc.d/ipfw itself (i.e. start|stop etc).

Sorry to repeat, but normally firewall_type should be set in rc.conf -
which works properly; no patching of /etc/rc.d/ipfw is needed.

> > to override the $firewall_type set in /etc/rc.conf, but it's not the
> > common usage, nor is it how ipfw is started normally by rc.
> >
> > So just set firewall_type in rc.conf and you should be fine .. unless
> > you meant that you're trying to run ipfw & natd INSIDE a jail?
>
> The network being configured is as follows:
>
> .../27
> -+
> |53
> +--+---+
> |bge0 jailed natd box |
> |t2.st.foo (ipfw `OPEN') |
> |+++++++
> |firewall| ns | ldap |diskless| mail | web | ftp |
> | bge1 | bge1 | bge1 | bge1 | bge1 | bge1 | bge1 |
> ++---++---++---++---++---++---++---+
> 254| 1| 2| 3| 4| 5| 6|
> ---+++++++
> 192.168.2.0/24

I'm not entirely sure how to interpret your diagram, but as far as I am
aware you can run neither ipfw nor natd within a jail; both scripts have
'KEYWORD: nojail' so they won't be run on jail startup.

There's been mention of work underway with VIMAGE toward a full stack
inside jail(s), but for now you can run ipfw (and natd) only on the host
system.

> > > --- /etc/rc.d/ipfw.org 2011-05-03 18:19:28.0 +0900
> > > +++ /etc/rc.d/ipfw 2011-05-03 22:08:14.0 +0900
> > > @@ -35,15 +35,11 @@
> > >
> > >  ipfw_start()
> > >  {
> > > -    local _firewall_type
> > > -
> > > -    _firewall_type=$1
> > > -
> > >      # set the firewall rules script if none was specified
> > >      [ -z "${firewall_script}" ] && firewall_script=/etc/rc.firewall
> > >
> > >      if [ -r "${firewall_script}" ]; then
> > > -        /bin/sh "${firewall_script}" "${_firewall_type}"
> > > +        /bin/sh "${firewall_script}" "${firewall_type}"
> > >          echo 'Firewall rules loaded.'
> > >      elif [ "`ipfw list 65535`" = "65535 deny ip from any to any" ]; then
> > >          echo 'Warning: kernel has firewall functionality, but' \
>
> To allow for command-line usage, the above patch should be modified as
> follows:
>
> --- /etc/rc.d/ipfw.org 2011-05-03 18:19:28.0 +0900
> +++ /etc/rc.d/ipfw 2011-05-04 09:31:09.0 +0900
> @@ -37,7 +37,11 @@
>  {
>      local _firewall_type
>
> -    _firewall_type=$1
> +    if [ -n "${1}" ]; then
> +        _firewall_type=$1
> +    elif [ -n "${firewall_type}" ]
> +        _firewall_type=${firewall_type}
> +    fi
>
>      # set the firewall rules script if none was specified
>      [ -z "${firewall_script}" ] && firewall_script=/etc/rc.firewall

It's still unnecessary to mess with this. See /etc/rc.firewall for its use of $fi
Re: /etc/rc.d/ipfw can't deal with firewall_type?
At Wed, 04 May 2011 10:40:12 +0900,
I wrote:
>
> At Wed, 4 May 2011 03:47:02 +1000 (EST),
> Ian Smith wrote:
> > > +++ /etc/rc.d/ipfw 2011-05-03 22:08:14.0 +0900
> > > @@ -35,15 +35,11 @@
> > >
> > >  ipfw_start()
> > >  {
> > > -    local _firewall_type
> > > -
> > > -    _firewall_type=$1
> > > -
> > >      # set the firewall rules script if none was specified
> > >      [ -z "${firewall_script}" ] && firewall_script=/etc/rc.firewall
> > >
> > >      if [ -r "${firewall_script}" ]; then
> > > -        /bin/sh "${firewall_script}" "${_firewall_type}"
> > > +        /bin/sh "${firewall_script}" "${firewall_type}"
> > >          echo 'Firewall rules loaded.'
> > >      elif [ "`ipfw list 65535`" = "65535 deny ip from any to any" ]; then
> > >          echo 'Warning: kernel has firewall functionality, but' \
>
> To allow for command-line usage, the above patch should be modified as
> follows:
>
> --- /etc/rc.d/ipfw.org 2011-05-03 18:19:28.0 +0900
> +++ /etc/rc.d/ipfw 2011-05-04 09:31:09.0 +0900
> @@ -37,7 +37,11 @@
>  {
>      local _firewall_type
>
> -    _firewall_type=$1
> +    if [ -n "${1}" ]; then
> +        _firewall_type=$1
> +    elif [ -n "${firewall_type}" ]
> +        _firewall_type=${firewall_type}
> +    fi
>
>      # set the firewall rules script if none was specified
>      [ -z "${firewall_script}" ] && firewall_script=/etc/rc.firewall

The above patch has a typo (the elif line is missing "; then"). The correct
one is as follows:

--- /etc/rc.d/ipfw.org 2011-05-03 18:19:28.0 +0900
+++ /etc/rc.d/ipfw 2011-05-04 09:53:40.0 +0900
@@ -37,7 +37,11 @@
 {
     local _firewall_type

-    _firewall_type=$1
+    if [ -n "${1}" ]; then
+        _firewall_type=$1
+    elif [ -n "${firewall_type}" ]; then
+        _firewall_type=${firewall_type}
+    fi

     # set the firewall rules script if none was specified
     [ -z "${firewall_script}" ] && firewall_script=/etc/rc.firewall
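The fallback that the corrected patch is aiming for can be tried outside the rc.d script. This is only a sketch: pick_firewall_type is a hypothetical function name, and the replies in this thread indicate that setting firewall_type in rc.conf already works without patching.

```sh
#!/bin/sh
# Hypothetical helper: prefer an explicit argument, otherwise fall back
# to the firewall_type value that rc.conf would have provided.
pick_firewall_type()
{
    if [ -n "${1}" ]; then
        echo "${1}"
    elif [ -n "${firewall_type}" ]; then
        echo "${firewall_type}"
    fi
}

firewall_type="open"          # as if set in /etc/rc.conf
pick_firewall_type            # no argument: falls back to the rc.conf value
pick_firewall_type client     # an explicit argument wins
```

Run as written, the two calls print "open" and then "client".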
Re: /etc/rc.d/ipfw can't deal with firewall_type?
At Wed, 4 May 2011 03:47:02 +1000 (EST),
Ian Smith wrote:
>
> On Wed, 4 May 2011, KIRIYAMA Kazuhiko wrote:
> > Hi all,
> >
> > Recently I upgraded to 8.2-STABLE and reconfigured a natd + jailed box, but
> > no packets could pass through the NAT box. I investigated and found that
> > /etc/rc.firewall does not receive firewall_type as an argument, so ipfw does
> > not divert and natd cannot work. The cause is that /etc/rc.d/ipfw is
> > incorrect. I think the patch below should be applied to /etc/rc.d/ipfw. Is
> > there any problem with doing this?
>
> Yes. Assuming using the default firewall_script="/etc/rc.firewall",
> then as it says early in /etc/rc.firewall, you just needed to:
>
> # Define the firewall type in /etc/rc.conf. Valid values are:
> [..]
>
> Sure, /etc/rc.firewall can set firewall_type to a parameter if you pass
> it one, but otherwise uses whatever $firewall_type is set to when you
> start ipfw. I guess the code below allows you to use syntax like:
>
> # /etc/rc.d/ipfw start client

I missed that it was intended for command-line use, but /etc/rc.d/* scripts
are usually run by rc at startup. If /etc/rc.d/ipfw requires two arguments,
firewall_type is always undefined at startup even though it is specified in
/etc/rc.conf. That is a very serious problem, isn't it?

> to override the $firewall_type set in /etc/rc.conf, but it's not the
> common usage, nor is it how ipfw is started normally by rc.
>
> So just set firewall_type in rc.conf and you should be fine .. unless
> you meant that you're trying to run ipfw & natd INSIDE a jail?

The network being configured is as follows:

.../27
-+
|53
+--+---+
|bge0 jailed natd box |
|t2.st.foo (ipfw `OPEN') |
|+++++++
|firewall| ns | ldap |diskless| mail | web | ftp |
| bge1 | bge1 | bge1 | bge1 | bge1 | bge1 | bge1 |
++---++---++---++---++---++---++---+
254| 1| 2| 3| 4| 5| 6|
---+++++++
192.168.2.0/24

> cheers, Ian
>
> > --- /etc/rc.d/ipfw.org 2011-05-03 18:19:28.0 +0900
> > +++ /etc/rc.d/ipfw 2011-05-03 22:08:14.0 +0900
> > @@ -35,15 +35,11 @@
> >
> >  ipfw_start()
> >  {
> > -    local _firewall_type
> > -
> > -    _firewall_type=$1
> > -
> >      # set the firewall rules script if none was specified
> >      [ -z "${firewall_script}" ] && firewall_script=/etc/rc.firewall
> >
> >      if [ -r "${firewall_script}" ]; then
> > -        /bin/sh "${firewall_script}" "${_firewall_type}"
> > +        /bin/sh "${firewall_script}" "${firewall_type}"
> >          echo 'Firewall rules loaded.'
> >      elif [ "`ipfw list 65535`" = "65535 deny ip from any to any" ]; then
> >          echo 'Warning: kernel has firewall functionality, but' \

To allow for command-line usage, the above patch should be modified as
follows:

--- /etc/rc.d/ipfw.org 2011-05-03 18:19:28.0 +0900
+++ /etc/rc.d/ipfw 2011-05-04 09:31:09.0 +0900
@@ -37,7 +37,11 @@
 {
     local _firewall_type

-    _firewall_type=$1
+    if [ -n "${1}" ]; then
+        _firewall_type=$1
+    elif [ -n "${firewall_type}" ]
+        _firewall_type=${firewall_type}
+    fi

     # set the firewall rules script if none was specified
     [ -z "${firewall_script}" ] && firewall_script=/etc/rc.firewall
Re: zpool upgrade, can't boot
On Tue, May 3, 2011 at 12:34 PM, Eric Damien wrote:
> Hi Scot,
>
> the link you provided is for a FreeBSD MBR Slice.
> How about the GPT? Because I have the exact same problem,
> and after following 2.7 (modified for no mirror) on
> http://wiki.freebsd.org/RootOnZFS/InstallingFreeBSD
>
> I did
> Fixit# sysctl kern.geom.debugflags=0x10
> Fixit# gpart bootcode -b /zroot/boot/pmbr -p /zroot/boot/gptzfsboot -i 1 ad0
>
> but got the following error:
> gpart: /dev/ad0p1: operation not permitted
>

That should have worked. Is partition 1 (ad0p1) your freebsd-boot partition?

Scot
Re: zpool upgrade, can't boot
Hi Scot,

the link you provided is for a FreeBSD MBR Slice.
How about the GPT? Because I have the exact same problem,
and after following 2.7 (modified for no mirror) on
http://wiki.freebsd.org/RootOnZFS/InstallingFreeBSD

I did
Fixit# sysctl kern.geom.debugflags=0x10
Fixit# gpart bootcode -b /zroot/boot/pmbr -p /zroot/boot/gptzfsboot -i 1 ad0

but got the following error:
gpart: /dev/ad0p1: operation not permitted

On Tue, 3 May 2011 01:26:21 -0500
Scot Hetzel wrote:

> On Mon, May 2, 2011 at 11:42 AM, Jeff Blank wrote:
> > Hi,
> >
> > I recently upgraded from 8.0-STABLE to 8.2-STABLE (Apr. 29 checkout)
> > and upgraded my zpool (includes root FS) from v13 to v15. This is a
> > dual-boot laptop, so I'm using MBR/boot0 and not GPT. Here's what
> > happens when I boot:
> >
> > F1 Win
> > F2 ?
> > F3 FreeBSD
> >
> > F6 PXE
> > Boot: F3
> > ZFS: unsupported ZFS version 15 (should be 13)
> > No ZFS pools located, can't boot
> >
> > I've googled around, but I can't find anything relevant for
> > MBR/boot0 configurations, just GPT. I've ensured that the loaders
> > and boot0/boot1/boot2 are all new, and I rebuilt/reinstalled them
> > in a fixit environment just to be sure. I also ran 'boot0cfg
> > -B' (with an appropriate -b), but nothing has changed. How can I
> > get my pool booting again?
> >
>
> You need to re-install the zfsboot code similar to step 10 (Install
> ZFS boot) in
>
> http://wiki.freebsd.org/RootOnZFS/ZFSBootPartition
>
> Scot
Re: /etc/rc.d/ipfw can't deal with firewall_type?
On Wed, 4 May 2011, KIRIYAMA Kazuhiko wrote:
> Hi all,
>
> Recently I upgraded to 8.2-STABLE and reconfigured a natd + jailed box, but
> no packets could pass through the NAT box. I investigated and found that
> /etc/rc.firewall does not receive firewall_type as an argument, so ipfw does
> not divert and natd cannot work. The cause is that /etc/rc.d/ipfw is
> incorrect. I think the patch below should be applied to /etc/rc.d/ipfw. Is
> there any problem with doing this?

Yes. Assuming using the default firewall_script="/etc/rc.firewall",
then as it says early in /etc/rc.firewall, you just needed to:

# Define the firewall type in /etc/rc.conf. Valid values are:
[..]

Sure, /etc/rc.firewall can set firewall_type to a parameter if you pass
it one, but otherwise uses whatever $firewall_type is set to when you
start ipfw. I guess the code below allows you to use syntax like:

# /etc/rc.d/ipfw start client

to override the $firewall_type set in /etc/rc.conf, but it's not the
common usage, nor is it how ipfw is started normally by rc.

So just set firewall_type in rc.conf and you should be fine .. unless
you meant that you're trying to run ipfw & natd INSIDE a jail?

cheers, Ian

> --- /etc/rc.d/ipfw.org 2011-05-03 18:19:28.0 +0900
> +++ /etc/rc.d/ipfw 2011-05-03 22:08:14.0 +0900
> @@ -35,15 +35,11 @@
>
>  ipfw_start()
>  {
> -    local _firewall_type
> -
> -    _firewall_type=$1
> -
>      # set the firewall rules script if none was specified
>      [ -z "${firewall_script}" ] && firewall_script=/etc/rc.firewall
>
>      if [ -r "${firewall_script}" ]; then
> -        /bin/sh "${firewall_script}" "${_firewall_type}"
> +        /bin/sh "${firewall_script}" "${firewall_type}"
>          echo 'Firewall rules loaded.'
>      elif [ "`ipfw list 65535`" = "65535 deny ip from any to any" ]; then
>          echo 'Warning: kernel has firewall functionality, but' \
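For reference, the rc.conf route recommended above might look roughly like the fragment below. The variable names are standard rc.conf knobs, but the values are assumptions chosen for illustration: "open" matches the 'OPEN' type mentioned elsewhere in the thread, and bge0 is taken to be the external interface.

```sh
# /etc/rc.conf (illustrative fragment; the values are assumptions)
firewall_enable="YES"
firewall_type="open"      # one of the types listed in /etc/rc.firewall
natd_enable="YES"
natd_interface="bge0"     # assumed external interface
```

With firewall_type set this way, rc starts /etc/rc.d/ipfw with the single "start" argument and /etc/rc.firewall picks the type up from the environment, as described in the reply above.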
Re: mps driver instability under stable/8
On Tue, 3 May 2011, Kenneth D. Merry wrote:

KDM> Sorry you ran into all of those problems! Needless to say I haven't seen
KDM> that with the 9.0 firmware in my environment, but then again I've got a
KDM> different setup.

I just posted a comment on the LSI kb forum; we'll see how they respond.

KDM> If the firmware doesn't fix it, we'll go down the path of trying to see why
KDM> the IOC fault is happening.

I'm staying tuned, while conserver is writing logs ;-)

--
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***
Re: mps driver instability under stable/8
On Tue, May 03, 2011 at 21:28:27 +0400, Dmitry Morozovsky wrote:
>
> On Tue, 3 May 2011, Dmitry Morozovsky wrote:
>
> DM> DM> Well, I tried, and unfortunately I can not say that I'm happy after the
> DM> DM> upgrade. :(
> DM> DM>
> DM> DM> Particularly, adapter now takes *VERY* long time (>10 minutes) to initialize,
> DM> DM> and report as "ERROR" in BIOS utility (while seeing all 24 disks; however, it
> DM> DM> reports 8 x36 expanders instead of one).
> DM> DM>
> DM> DM> I can't boot the system off this array yet; will experiment further :(
> DM>
> DM> booted from USB stick, I have constantly repeating
> DM>
> DM> (ses3:mps0:0:25:0): lost device
> DM> (ses3:mps0:0:25:0): removing device entry
> DM> ses3 at mps0 bus 0 scbus0 target 25 lun 0
> DM> ses3: Fixed Enclosure Services SCSI-5 device
> DM> ses3: 600.000MB/s transfers
> DM> ses3: Command Queueing enabled
> DM> ses3: SCSI-3 SES Device
> DM>
> DM> for different sesN, which are detected many times:
> DM>
> DM> at scbus0 target 0 lun 0 (da0,pass0)
> DM> at scbus0 target 1 lun 0 (da1,pass1)
> DM> at scbus0 target 2 lun 0 (da2,pass2)
> DM> at scbus0 target 3 lun 0 (pass11,da4)
> DM> at scbus0 target 4 lun 0 (pass12,da5)
> DM> at scbus0 target 5 lun 0 (pass9,da3)
> DM> at scbus0 target 24 lun 0 (pass5,ses3)
> DM> at scbus0 target 25 lun 0 (pass19,ses5)
> DM> at scbus0 target 26 lun 0 (pass10,ses4)
> DM> at scbus0 target 27 lun 0 (pass14,ses7)
> DM> at scbus0 target 33 lun 0 (pass13,ses6)
> DM> at scbus0 target 39 lun 0 (pass3,ses0)
> DM> at scbus0 target 45 lun 0 (pass4,ses1)
> DM> at scbus0 target 51 lun 0 (pass8,ses2)
> DM> at scbus0 target 55 lun 0 (pass15,da7)
> DM> at scbus0 target 63 lun 0 (pass16,da8)
> DM> at scbus0 target 71 lun 0 (pass17,da9)
> DM> at scbus0 target 79 lun 0 (pass18,da10)
> DM> at scbus0 target 87 lun 0 (pass6,da11)
> DM> at scbus0 target 95 lun 0 (pass20,da12)
> DM> at scbus0 target 103 lun 0 (pass21,da13)
>
> Well, using
> http://kb.lsi.com/KnowledgebaseArticle16414.aspx
> I downgraded to version 8-fixed, and at least topology errors disappear.
>
> Just booted successfully (errm, it was a few nervous hours, to be honest :)
>
> Now I have in verbose kernel messages
>
> mps0: port 0xc000-0xc0ff mem 0xfb43c000-0xfb43,0xfb44-0xfb47 irq 16 at device 0.0 on pci2
> mps0: Reserved 0x4000 bytes for rid 0x14 type 3 at 0xfb43c000
> mps0: Firmware: 08.00.00.00
> mps0: IOCCapabilities: 185c
> mps0: attempting to allocate 1 MSI-X vectors (15 supported)
> msi: routing MSI-X IRQ 256 to local APIC 0 vector 49
> mps0: using IRQ 256 for MSI-X
> mps0: [MPSAFE]
> mps0: [ITHREAD]

Sorry you ran into all of those problems! Needless to say I haven't seen
that with the 9.0 firmware in my environment, but then again I've got a
different setup.

> Will see whether it helps.

Yes. I know the 8.0 firmware also works well. The only issue I ran into
there was the topology issues that I'm guessing they fixed in that build.

If the firmware doesn't fix it, we'll go down the path of trying to see why
the IOC fault is happening.

Ken
--
Kenneth Merry
k...@freebsd.org
Re: mps driver instability under stable/8
On Tue, 3 May 2011, Dmitry Morozovsky wrote:

DM> DM> Well, I tried, and unfortunately I can not say that I'm happy after the
DM> DM> upgrade. :(
DM> DM>
DM> DM> Particularly, adapter now takes *VERY* long time (>10 minutes) to initialize,
DM> DM> and report as "ERROR" in BIOS utility (while seeing all 24 disks; however, it
DM> DM> reports 8 x36 expanders instead of one).
DM> DM>
DM> DM> I can't boot the system off this array yet; will experiment further :(
DM>
DM> booted from USB stick, I have constantly repeating
DM>
DM> (ses3:mps0:0:25:0): lost device
DM> (ses3:mps0:0:25:0): removing device entry
DM> ses3 at mps0 bus 0 scbus0 target 25 lun 0
DM> ses3: Fixed Enclosure Services SCSI-5 device
DM> ses3: 600.000MB/s transfers
DM> ses3: Command Queueing enabled
DM> ses3: SCSI-3 SES Device
DM>
DM> for different sesN, which are detected many times:
DM>
DM> at scbus0 target 0 lun 0 (da0,pass0)
DM> at scbus0 target 1 lun 0 (da1,pass1)
DM> at scbus0 target 2 lun 0 (da2,pass2)
DM> at scbus0 target 3 lun 0 (pass11,da4)
DM> at scbus0 target 4 lun 0 (pass12,da5)
DM> at scbus0 target 5 lun 0 (pass9,da3)
DM> at scbus0 target 24 lun 0 (pass5,ses3)
DM> at scbus0 target 25 lun 0 (pass19,ses5)
DM> at scbus0 target 26 lun 0 (pass10,ses4)
DM> at scbus0 target 27 lun 0 (pass14,ses7)
DM> at scbus0 target 33 lun 0 (pass13,ses6)
DM> at scbus0 target 39 lun 0 (pass3,ses0)
DM> at scbus0 target 45 lun 0 (pass4,ses1)
DM> at scbus0 target 51 lun 0 (pass8,ses2)
DM> at scbus0 target 55 lun 0 (pass15,da7)
DM> at scbus0 target 63 lun 0 (pass16,da8)
DM> at scbus0 target 71 lun 0 (pass17,da9)
DM> at scbus0 target 79 lun 0 (pass18,da10)
DM> at scbus0 target 87 lun 0 (pass6,da11)
DM> at scbus0 target 95 lun 0 (pass20,da12)
DM> at scbus0 target 103 lun 0 (pass21,da13)

Well, using
http://kb.lsi.com/KnowledgebaseArticle16414.aspx
I downgraded to version 8-fixed, and at least topology errors disappear.

Just booted successfully (errm, it was a few nervous hours, to be honest :)

Now I have in verbose kernel messages

mps0: port 0xc000-0xc0ff mem 0xfb43c000-0xfb43,0xfb44-0xfb47 irq 16 at device 0.0 on pci2
mps0: Reserved 0x4000 bytes for rid 0x14 type 3 at 0xfb43c000
mps0: Firmware: 08.00.00.00
mps0: IOCCapabilities: 185c
mps0: attempting to allocate 1 MSI-X vectors (15 supported)
msi: routing MSI-X IRQ 256 to local APIC 0 vector 49
mps0: using IRQ 256 for MSI-X
mps0: [MPSAFE]
mps0: [ITHREAD]

Will see whether it helps.

--
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***
/etc/rc.d/ipfw can't deal with firewall_type?
Hi all,

Recently I upgraded to 8.2-STABLE and reconfigured a natd + jailed box, but
no packets could pass through the NAT box. I investigated and found that
/etc/rc.firewall does not receive firewall_type as an argument, so ipfw does
not divert and natd cannot work. The cause is that /etc/rc.d/ipfw is
incorrect. I think the patch below should be applied to /etc/rc.d/ipfw. Is
there any problem with doing this?

--- /etc/rc.d/ipfw.org 2011-05-03 18:19:28.0 +0900
+++ /etc/rc.d/ipfw 2011-05-03 22:08:14.0 +0900
@@ -35,15 +35,11 @@

 ipfw_start()
 {
-    local _firewall_type
-
-    _firewall_type=$1
-
     # set the firewall rules script if none was specified
     [ -z "${firewall_script}" ] && firewall_script=/etc/rc.firewall

     if [ -r "${firewall_script}" ]; then
-        /bin/sh "${firewall_script}" "${_firewall_type}"
+        /bin/sh "${firewall_script}" "${firewall_type}"
         echo 'Firewall rules loaded.'
     elif [ "`ipfw list 65535`" = "65535 deny ip from any to any" ]; then
         echo 'Warning: kernel has firewall functionality, but' \
Re: mps driver instability under stable/8
On Tue, 3 May 2011, Dmitry Morozovsky wrote:

DM> KDM> It looks like you have a SAS2008, with the 4.0 firmware. I think it would
DM> KDM> be worthwhile to upgrade to the 9.0 firmware. I know for sure there are
DM> KDM> issues with the 2.0 firmware, and I know the 9.0 firmware works fairly
DM> KDM> well. I don't know whether the 4.0 firmware has any severe issues, but it
DM> KDM> would be good to eliminate firmware bugs before we chase driver issues.
DM>
DM> [snip]
DM>
DM> KDM> Well, I think the first thing to do is upgrade the firmware and see if that
DM> KDM> fixes it.
DM>
DM> Well, I tried, and unfortunately I can not say that I'm happy after the
DM> upgrade. :(
DM>
DM> Particularly, adapter now takes *VERY* long time (>10 minutes) to initialize,
DM> and report as "ERROR" in BIOS utility (while seeing all 24 disks; however, it
DM> reports 8 x36 expanders instead of one).
DM>
DM> I can't boot the system off this array yet; will experiment further :(

booted from USB stick, I have constantly repeating

(ses3:mps0:0:25:0): lost device
(ses3:mps0:0:25:0): removing device entry
ses3 at mps0 bus 0 scbus0 target 25 lun 0
ses3: Fixed Enclosure Services SCSI-5 device
ses3: 600.000MB/s transfers
ses3: Command Queueing enabled
ses3: SCSI-3 SES Device

for different sesN, which are detected many times:

at scbus0 target 0 lun 0 (da0,pass0)
at scbus0 target 1 lun 0 (da1,pass1)
at scbus0 target 2 lun 0 (da2,pass2)
at scbus0 target 3 lun 0 (pass11,da4)
at scbus0 target 4 lun 0 (pass12,da5)
at scbus0 target 5 lun 0 (pass9,da3)
at scbus0 target 24 lun 0 (pass5,ses3)
at scbus0 target 25 lun 0 (pass19,ses5)
at scbus0 target 26 lun 0 (pass10,ses4)
at scbus0 target 27 lun 0 (pass14,ses7)
at scbus0 target 33 lun 0 (pass13,ses6)
at scbus0 target 39 lun 0 (pass3,ses0)
at scbus0 target 45 lun 0 (pass4,ses1)
at scbus0 target 51 lun 0 (pass8,ses2)
at scbus0 target 55 lun 0 (pass15,da7)
at scbus0 target 63 lun 0 (pass16,da8)
at scbus0 target 71 lun 0 (pass17,da9)
at scbus0 target 79 lun 0 (pass18,da10)
at scbus0 target 87 lun 0 (pass6,da11)
at scbus0 target 95 lun 0 (pass20,da12)
at scbus0 target 103 lun 0 (pass21,da13)

--
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***
Re: ZFS vs OSX Time Machine
On 29/04/2011, at 10:38, Jeremy Chadwick wrote:
>> The OSX box is connected via an Airport Express (11n).
>
> Can you connect something to it via Ethernet and attempt an FTP transfer
> (both PUT (store on server) and GET (retrieve from server)) from a
> client on the wired network? Make sure whatever you're PUT'ing and
> GET'ing are using the ZFS filesystem. Don't forget "binary" mode too.

I tried dd'ing /dev/zero over SMB and got 40MB/sec (although I'm not using AIO yet..)
FTP'ing a 300 MB file averages 60-70MB/sec (the speed of my laptop HD)
ttcp between the hosts hits wire speed (100MB/sec)

>> OK. I don't think TM can use CIFS, I will try ISCSI as someone else
>> suggested, perhaps it will help.
>
> Be aware there are all sorts of caveats/complexities with iSCSI on
> FreeBSD. There are past threads on -stable and -fs talking about them
> in great detail. I personally wouldn't go this route.
>
> Why can't OS X use CIFS? It has the ability to mount a SMB filesystem,
> right? Is there some reason you can't mount that, then tell TM to write
> its backups to /mountedcifs?

It looks like I had a dodgy disk which was being tickled by the time machine
backup (eg dodgy sector where the backup was located) so I have been chasing
a ghost :)

However, thanks to everyone for your helpful suggestions!

I still haven't tried iSCSI; given I can't do a bare metal restore from it,
it doesn't seem worth it (also I don't have the time..)

--
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
-- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C
Re: mps driver instability under stable/8
On Mon, 2 May 2011, Kenneth D. Merry wrote:

KDM> It looks like you have a SAS2008, with the 4.0 firmware. I think it would
KDM> be worthwhile to upgrade to the 9.0 firmware. I know for sure there are
KDM> issues with the 2.0 firmware, and I know the 9.0 firmware works fairly
KDM> well. I don't know whether the 4.0 firmware has any severe issues, but it
KDM> would be good to eliminate firmware bugs before we chase driver issues.

[snip]

KDM> Well, I think the first thing to do is upgrade the firmware and see if that
KDM> fixes it.

Well, I tried, and unfortunately I can not say that I'm happy after the
upgrade. :(

Particularly, adapter now takes *VERY* long time (>10 minutes) to initialize,
and report as "ERROR" in BIOS utility (while seeing all 24 disks; however, it
reports 8 x36 expanders instead of one).

I can't boot the system off this array yet; will experiment further :(

--
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***
Re: Automatic reboot doesn't reboot
On Tue, May 03, 2011 at 02:30:15PM +0200, Olaf Seibert wrote:
> On Tue 03 May 2011 at 05:20:52 -0700, Jeremy Chadwick wrote:
> > To be on the safe side, pick something that's small at first, then work
> > your way up. You'll need probably 1+ weeks of heavy ZFS I/O between
> > tests (e.g. don't change the tunable, reboot, then 4 hours later declare
> > the new (larger) value as stable).
>
> Ah, that's important: so far it seemed to me that a *too small* value
> (for all various tunables) would cause problems, but now you're saying
> that *too large* is the problem (at least for vfs.zfs.arc_max)!

Too small = not-so-great performance (less data in the ARC means more
reads from the disks. Disks are slower than RAM :-) ).

Too large = increased risk of kmem exhaustion panic.

> This machine has mixed loads; from time to time somebody starts a big
> job with lots of I/O, and in between it is much more modestly loaded.

I would recommend starting small (maybe 1/3rd of your physical RAM?) and
increase from there. You can try the opposite technique too -- start
large (e.g. 3/4ths of RAM) and wait for a panic. I'm of the opinion
that I'd rather have a stable system with less memory used for ARC than
a system which could panic and have more memory for ARC.

Sadly there's no 100% reliable way to calculate what's "ideal". For
example I might use a smaller value than 6144M on a machine where mysqld
is tuned to utilise lots of RAM. There's a balancing act that goes on
that takes some time to figure out.

For example, on our FreeBSD ZFS-backed NFS filer on our network, I ran
with a 3/4th amount for quite some time (we're talking 4-5 months).
Then suddenly one day I noticed the client machines were complaining
about NFS timeouts, etc... Got on the filer, lo and behold kmem
exhaustion. I decreased arc_max by about 1024M and it's been fine
since.

There's a lot of evolution that's occurred in the FreeBSD ZFS kernel
code over the years too. Originally arc_max was a "high-water mark" of
some sort, but code was changed to make it a hard limit as much as it
could be. Then some edge cases were found where it could still exceed
the maximum, so those were fixed, etc... Tracking all the changes is
very difficult (I became very frustrated/irate at having to do so,
wishing that there was more of a "state of ZFS" announcement sent out
every so often so users/admins would know what's changed and adjust
things appropriately), requiring an admin to follow commits. That's
just the nature of the beast.

> > So for example on an 8GB RAM machine, I might recommend starting with
> > vfs.zfs.arc_max="4096M" and let that run for a while. If you find your
> > "Wired" value in top(1) remains fairly constant after a week or so of
> > heavy I/O, consider bumping up the value a bit more (say 4608M).
>
> I'll do just that.

Let us know how things turn out. Follow-ups that indicate things are
working are just as important as initial mails stating things aren't,
especially if you're someone searching the Web to try and find an answer
to what this kmem thing is all about. :-)

--
| Jeremy Chadwick j...@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP 4BD6C0CB |
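As a rough aid for the "start small, then work up" approach described above, the sketch below computes a candidate starting value as a fraction of physical RAM and prints it in /boot/loader.conf form. arc_max_start is a hypothetical helper name, and the fractions are only the ones mentioned in this thread; this is not a recommendation for any particular machine.

```sh
#!/bin/sh
# Hypothetical helper: print a starting vfs.zfs.arc_max line for
# /boot/loader.conf, given physical RAM in megabytes and a fraction
# expressed as numerator and denominator (integer arithmetic).
arc_max_start()
{
    ram_mb=$1
    num=$2
    den=$3
    echo "vfs.zfs.arc_max=\"$(( $ram_mb * $num / $den ))M\""
}

arc_max_start 8192 1 2    # prints vfs.zfs.arc_max="4096M"
arc_max_start 8192 1 3    # prints vfs.zfs.arc_max="2730M"
```

On FreeBSD the RAM figure could be derived from `sysctl -n hw.physmem`, which reports bytes; whichever value is chosen, the advice above still applies: watch "Wired" in top(1) for a week or so of heavy I/O before raising it.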
Re: RELENG_8 pf stack issue (state count spiraling out of control)
On Tue, May 03, 2011 at 10:31:57AM +0100, Vincent Hoffman wrote: > On 03/05/2011 10:16, Jeremy Chadwick wrote: > > > > Sadly I don't see a way with bsnmpd(8) to monitor things like interrupt > > usage, etc. otherwise I'd be graphing that. The more monitoring the > > better; at least then I could say "wow, interrupts really did shoot > > through the roof -- the box went crazy!" and RMA the thing. :-) > > > you could use net-mgmt/bsnmp-regex although I dont know what the > overhead for that is like. Thanks for the tip. I've investigated that plugin before, and its implementation model seems like a very hackish way to accomplish something that should ultimately be done inside of bsnmpd(8) itself via native C. It's good for parsing a single log file via tail -F (not "tail -f" like the man page indicates), but it doesn't scale well. bsnmpd(8) just needs to be enhanced and fixed, and I know there's efforts underway by syrinx@ to do exactly that. I have chatted with her about some existing problems with bsnmpd(8) and its SNMP parser, and have chatted with philip@ about a pf-related bug with bsnmp(8) (but I can't remember what the details of that one is; I have a file with the info around here somewhere...) There was also a recent commit to net-mgmt/net-snmp that pertains to *properly* monitoring swap, which makes me wonder if net-mgmt/bsnmp-ucd (which a lot of people, myself included, rely on) also does the wrong thing. http://www.freebsd.org/cgi/query-pr.cgi?pr=153179 http://www.freebsd.org/cgi/cvsweb.cgi/ports/net-mgmt/net-snmp/files/patch-memory_freebsd.c Things like this make me question my graphs and my monitoring data pretty much every time I look at them. -- | Jeremy Chadwick j...@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. 
PGP 4BD6C0CB |
Re: Automatic reboot doesn't reboot
On Tue 03 May 2011 at 05:20:52 -0700, Jeremy Chadwick wrote: > To be on the safe side, pick something that's small at first, then work > your way up. You'll need probably 1+ weeks of heavy ZFS I/O between > tests (e.g. don't change the tunable, reboot, then 4 hours later declare > the new (larger) value as stable). Ah, that's important: so far it seemed to me that a *too small* value (for all various tunables) would cause problems, but now you're saying that *too large* is the problem (at least for vfs.zfs.arc_max)! This machine has mixed loads; from time to time somebody starts a big job with lots of I/O, and in between it is much more modestly loaded. > So for example on an 8GB RAM machine, I might recommend starting with > vfs.zfs.arc_max="4096M" and let that run for a while. If you find your > "Wired" value in top(1) remains fairly constant after a week or so of > heavy I/O, consider bumping up the value a bit more (say 4608M). I'll do just that. > Sorry to make this long-winded; bad habit of mine that I've never > managed to break. Oh no problem, it turns out to be eye-opening! > | Jeremy Chadwick j...@parodius.com | -Olaf. -- Pipe rene = new PipePicture(); assert(Not rene.GetType().Equals(Pipe)); ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Automatic reboot doesn't reboot
On Tue, May 03, 2011 at 12:08:54PM +0200, Olaf Seibert wrote:
> On Tue 03 May 2011 at 02:21:13 -0700, Jeremy Chadwick wrote:
> > There are two things you might try fiddling with. These are sysctls so
> > you can try them on the fly:
> >
> > hw.acpi.disable_on_reboot
> > hw.acpi.handle_reboot
>
> Thanks. For now I've set the second to 1 and we'll see if that affects
> matters.
>
> > Check out the thread Peter Jeremy provided. This is a near-sure
> > indicator of ZFS ARC exhaustion, and you seem to know of that. What's
> > very interesting to me is this part of your mail:
> ...
> >
> > Is this box running i386 or amd64? If amd64, I can't explain why your
>
> It's amd64. I double-checked just one, you never know what stupid
> mistakes one might make :-)
>
> > /boot/loader.conf settings aren't taking -- they should be for sure.
> > Maybe provide us a full dmesg and XXX out things you consider
> > sensitive. If i386, I'm not too surprised that some automatic defaults
> > get chosen instead of what you ask.
>
> Based on one of your mails where setting vm.kmem_size to twice the real
> RAM size had adverse effects, I've taken the setting out to see if that
> improves matters. I'll have to wait until the next crash (or opportunity
> to reboot without too much disturbance) to see the effect.

The ill-effects are a result of an underlying change that I had
forgotten about but others remembered -- vm.kmem_size_scale used to be
set to something like "2" by default, but it was changed to "1" prior to
8.2-RELEASE.

So basically here's the current situation and how all of our 8.2-STABLE
machines are tuned for ARC: we only set one single tunable for ARC
"management": vfs.zfs.arc_max. We don't touch vm.kmem_size. Here's
what we have literally in our /boot/loader.conf:

# Limit ZFS ARC maximum.
# NOTE #1: In 8.2-RELEASE and onward, vm.kmem_size_scale defaults to 1,
#          which means vm.kmem_size should match the amount of RAM installed
#          in the system. If using an earlier FreeBSD release, be sure to set
#          vm.kmem_size manually to the amount of RAM you have.
# NOTE #2: Do not set vm.kmem_size to 2x that of physical RAM, otherwise
#          vfs.zfs.arc_max effectively becomes halved.
#          http://lists.freebsd.org/pipermail/freebsd-fs/2011-March/010875.html
vfs.zfs.arc_max="6144M"

The value specified here (6144MBytes) is for a machine with 8GB of RAM.

Keep in mind that there is evidence that kmap/kmem exhaustion can still
happen even if you tune the ARC like this. Apparently memory
fragmentation plays a role, and there's some overhead as well, so
calculating a 100% stable value is a little difficult. I can point you
to that (very recent, as in last month) thread if you'd like.

To be on the safe side, pick something that's small at first, then work
your way up. You'll need probably 1+ weeks of heavy ZFS I/O between
tests (e.g. don't change the tunable, reboot, then 4 hours later declare
the new (larger) value as stable).

So for example on an 8GB RAM machine, I might recommend starting with
vfs.zfs.arc_max="4096M" and let that run for a while. If you find your
"Wired" value in top(1) remains fairly constant after a week or so of
heavy I/O, consider bumping up the value a bit more (say 4608M).

Sorry to make this long-winded; bad habit of mine that I've never
managed to break.

--
| Jeremy Chadwick                                   j...@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
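The "start small, then work your way up" schedule described above can be written down as a short list of values to try. The following is only a sketch, not a formula from FreeBSD or from the message: the function name, the 1/2-of-RAM starting point, the 3/4-of-RAM ceiling and the 512M step are assumptions read off the figures quoted in the mail (4096M first, 4608M next, 6144M in production on an 8GB machine).

```python
def arc_max_candidates(ram_mib, start_frac=0.5, ceiling_frac=0.75, step_mib=512):
    """Return loader.conf lines for vfs.zfs.arc_max, smallest value first.

    start_frac, ceiling_frac and step_mib are illustrative assumptions,
    chosen so that an 8GB machine reproduces the values in the mail.
    """
    start = int(ram_mib * start_frac)
    ceiling = int(ram_mib * ceiling_frac)
    values = range(start, ceiling + 1, step_mib)
    return ['vfs.zfs.arc_max="%dM"' % v for v in values]


if __name__ == "__main__":
    # 8GB of RAM expressed in MiB, as in the example machine.
    for line in arc_max_candidates(8192):
        print(line)
```

Each value would go into /boot/loader.conf one at a time, with a week or so of heavy ZFS I/O before moving to the next one, as the mail recommends.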
Re: RELENG_8 pf stack issue (state count spiraling out of control)
On Mon, May 02, 2011 at 06:58:54PM -0700, Jeremy Chadwick wrote:

> The next thing I tried was "/etc/rc.d/pf stop", which worked. Then I
> did "/etc/rc.d/pf start", which also worked. However, what I saw next
> surely indicated a bug in the pf layer somewhere -- "pfctl -s states"
> and "pfctl -s info" disagreed on the state count:

This can be explained. Note that "/etc/rc.d/pf start" does first flush
all states by calling pfctl -F all.

This calls pf_unlink_state() for every state in the kernel, which
marks each state with PFTM_UNLINKED, but doesn't free it yet.

Such states do not show up in pfctl -s state output, but are still
counted in pfctl -s info output.

Normally, they are freed the next time the pfpurge thread runs (once
per second).

It looks like the pfpurge thread was either

  a) sleeping indefinitely, not returning once a second from

        tsleep(pf_purge_thread, PWAIT, "pftm", 1 * hz);

  or

  b) constantly failing to acquire a lock with

        if (!sx_try_upgrade(&pf_consistency_lock))
                return (0);

Maybe a) is possible when CLOCK_MONOTONIC is decreasing?
And the "POKED TIMER" messages you get from BIND, too?

Kind regards,
Daniel
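The mismatch between the two pfctl views can be illustrated with a toy model. This is a sketch of the bookkeeping described above, not pf's actual code: the StateTable class and its method names are hypothetical, and the only behaviour taken from the message is that a flush marks states as unlinked, the listing skips unlinked states, the counter still includes them, and a separate purge step frees them.

```python
class StateTable:
    """Toy model of a state table with a two-step unlink/purge lifecycle."""

    def __init__(self):
        # Maps a state id to True once the state has been unlinked.
        self.unlinked = {}

    def insert(self, state_id):
        self.unlinked[state_id] = False

    def flush_all(self):
        # Analogous to "pfctl -F all": mark every state, free nothing yet.
        for state_id in self.unlinked:
            self.unlinked[state_id] = True

    def listed(self):
        # Analogous to "pfctl -s state": unlinked states are hidden.
        return [s for s, gone in self.unlinked.items() if not gone]

    def counted(self):
        # Analogous to the counter in "pfctl -s info": everything not yet freed.
        return len(self.unlinked)

    def purge(self):
        # Analogous to one run of the purge thread: free unlinked states.
        self.unlinked = {s: gone for s, gone in self.unlinked.items() if not gone}


if __name__ == "__main__":
    table = StateTable()
    for state_id in range(3):
        table.insert(state_id)
    table.flush_all()
    print(len(table.listed()), table.counted())  # the views disagree until purge runs
    table.purge()
    print(len(table.listed()), table.counted())
```

In this model the two numbers only agree again after purge() has run, which is the situation that would persist if the purge step never got scheduled.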
Re: RELENG_8 pf stack issue (state count spiraling out of control)
On Tue, May 3, 2011 at 12:12 PM, Vlad Galu wrote: > > > On Tue, May 3, 2011 at 11:31 AM, Vincent Hoffman wrote: > >> On 03/05/2011 10:16, Jeremy Chadwick wrote: >> >> >> > Sadly I don't see a way with bsnmpd(8) to monitor things like interrupt >> > usage, etc. otherwise I'd be graphing that. The more monitoring the >> > better; at least then I could say "wow, interrupts really did shoot >> > through the roof -- the box went crazy!" and RMA the thing. :-) >> > >> you could use net-mgmt/bsnmp-regex although I dont know what the >> overhead for that is like. >> > > I use munin for graphing, as it allows easy scripting without using SNMP. > > My case is a bit different from Jeremy's. Every once in a while there is a > sudden traffic spike which impacts pf performance as well. However, the > graphed figures are nowhere near what I'd consider alarming levels (this box > has withstood more in the past). I was able to coincidentally log in after > such a spike and noticed the pfpurge thread eating up about 30% of the CPU > while using the normal optimization policy. In my case, it could be related > to another issue I'm seeing on this box - mbuma allocation failures. 
Here > are my graphs: > > http://dl.dropbox.com/u/14650083/PF/bge_bits_1-week.png > http://dl.dropbox.com/u/14650083/PF/bge_packets_1-week.png > http://dl.dropbox.com/u/14650083/PF/bge_stats_1-week.png > http://dl.dropbox.com/u/14650083/PF/load-week.png > http://dl.dropbox.com/u/14650083/PF/mbuf_errors-week.png > http://dl.dropbox.com/u/14650083/PF/mbuf_usage-week.png > http://dl.dropbox.com/u/14650083/PF/pf_inserts-week.png > http://dl.dropbox.com/u/14650083/PF/pf_matches-week.png > http://dl.dropbox.com/u/14650083/PF/pf_removals-week.png > http://dl.dropbox.com/u/14650083/PF/pf_searches-week.png > http://dl.dropbox.com/u/14650083/PF/pf_src_limit-week.png > http://dl.dropbox.com/u/14650083/PF/pf_states-week.png > http://dl.dropbox.com/u/14650083/PF/pf_synproxy-week.png > > I'll wait for the next time the symptom occurs to switch to a stateless > configuration. > > I forgot to mention this is a UP box using TSC for timekeeping and running ntpd. -- /boot/loader.conf -- hint.p4tcc.0.disabled="1" hint.acpi_throttle.0.disabled="1" debug.acpi.disabled="timer" -- /boot/loader.conf -- -- sysctl output -- kern.timecounter.choice: TSC(800) i8254(0) dummy(-100) kern.timecounter.hardware: TSC -- sysctl output -- -- Good, fast & cheap. Pick any two. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: RELENG_8 pf stack issue (state count spiraling out of control)
On Tue, May 3, 2011 at 11:31 AM, Vincent Hoffman wrote: > On 03/05/2011 10:16, Jeremy Chadwick wrote: > > > > Sadly I don't see a way with bsnmpd(8) to monitor things like interrupt > > usage, etc. otherwise I'd be graphing that. The more monitoring the > > better; at least then I could say "wow, interrupts really did shoot > > through the roof -- the box went crazy!" and RMA the thing. :-) > > > you could use net-mgmt/bsnmp-regex although I dont know what the > overhead for that is like. > I use munin for graphing, as it allows easy scripting without using SNMP. My case is a bit different from Jeremy's. Every once in a while there is a sudden traffic spike which impacts pf performance as well. However, the graphed figures are nowhere near what I'd consider alarming levels (this box has withstood more in the past). I was able to coincidentally log in after such a spike and noticed the pfpurge thread eating up about 30% of the CPU while using the normal optimization policy. In my case, it could be related to another issue I'm seeing on this box - mbuma allocation failures. Here are my graphs: http://dl.dropbox.com/u/14650083/PF/bge_bits_1-week.png http://dl.dropbox.com/u/14650083/PF/bge_packets_1-week.png http://dl.dropbox.com/u/14650083/PF/bge_stats_1-week.png http://dl.dropbox.com/u/14650083/PF/load-week.png http://dl.dropbox.com/u/14650083/PF/mbuf_errors-week.png http://dl.dropbox.com/u/14650083/PF/mbuf_usage-week.png http://dl.dropbox.com/u/14650083/PF/pf_inserts-week.png http://dl.dropbox.com/u/14650083/PF/pf_matches-week.png http://dl.dropbox.com/u/14650083/PF/pf_removals-week.png http://dl.dropbox.com/u/14650083/PF/pf_searches-week.png http://dl.dropbox.com/u/14650083/PF/pf_src_limit-week.png http://dl.dropbox.com/u/14650083/PF/pf_states-week.png http://dl.dropbox.com/u/14650083/PF/pf_synproxy-week.png I'll wait for the next time the symptom occurs to switch to a stateless configuration. -- Good, fast & cheap. Pick any two. 
Re: Automatic reboot doesn't reboot
On Tue 03 May 2011 at 02:21:13 -0700, Jeremy Chadwick wrote: > There are two things you might try fiddling with. These are sysctls so > you can try them on the fly: > > hw.acpi.disable_on_reboot > hw.acpi.handle_reboot Thanks. For now I've set the second to 1 and we'll see if that affects matters. > Check out the thread Peter Jeremy provided. This is a near-sure > indicator of ZFS ARC exhaustion, and you seem to know of that. What's > very interesting to me is this part of your mail: ... > > Is this box running i386 or amd64? If amd64, I can't explain why your It's amd64. I double-checked just one, you never know what stupid mistakes one might make :-) > /boot/loader.conf settings aren't taking -- they should be for sure. > Maybe provide us a full dmesg and XXX out things you consider > sensitive. If i386, I'm not too surprised that some automatic defaults > get chosen instead of what you ask. Based on one of your mails where setting vm.kmem_size to twice the real RAM size had adverse effects, I've taken the setting out to see if that improves matters. I'll have to wait until the next crash (or opportunity to reboot without too much disturbance) to see the effect. I put dmesg.boot in my other reply. Thanks, > | Jeremy Chadwick j...@parodius.com | -Olaf. -- Pipe rene = new PipePicture(); assert(Not rene.GetType().Equals(Pipe)); ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Automatic reboot doesn't reboot
On Tue 03 May 2011 at 17:21:52 +1000, Peter Jeremy wrote: > On 2011-May-02 16:32:30 +0200, Olaf Seibert wrote: > >However, it doesn't automatically reboot in 15 seconds, as promised. > >It just sits there the whole weekend, until I log onto the IPMI console > >and press the virtual reset button. > > Your reference to IMPI indicates this is not a consumer PC. Can you > please provide some details of the hardware. It is a Supermicro H8DME-2 motherboard with 2 dual Opteron S-F 2000 series CPUs (according to the spec sheet I have here). The IPMI console (front-end processor as one would call it in the mainframe years ;-) is an AOC-SiM1U+ with KVM over a dedicated LAN port. I usually access it via its built-in webserver. I have appendend dmesg.boot at the end. > Are you running ipmitools or similar? Not so far. > Does "shutdown -r" or "reboot" work normally? Yes, when I last used it while upgrading from 8.1 to 8.2 "shutdown -r" worked fine, and on previous upgrades it worked too. I can possibly imagine that the IPMI console would press a key just at this inconvenient moment (so that the fault is entirely outside FreeBSD's domain), but since it doesn't seem to do this at other moments, it seems unlikely. Would pressing a key like "shift" stop a reboot? > >panic: kmem_alloc(131072): kmem_map too small: 3428782080 total allocated > > I suggest you have a read of the thread beginning > http://lists.freebsd.org/pipermail/freebsd-fs/2011-March/010862.html > (note that mailman has split it into at least 3 threads). Thanks for the link. There seem to be some contradictory advices there though: tune vm.kmem_size to twice the physical RAM size, or the same size, or even 1,5 times. 
Apparently it is supposed to default to 1 x RAM size, but for some reason on this machine it doesn't: $ sysctl hw.realmem hw.physmem hw.usermem vm.kmem_size hw.realmem: 9126805504 hw.physmem: 8580272128 hw.usermem: 3317899264 vm.kmem_size: 3739230208 $ sysctl vm.kmem_size_scale vm.kmem_size_scale: 1 despite even the tune to 2 x RAM size in /boot/loader.conf. I can imagine that since vfs.zfs.arc_max="4G" is larger than vm.kmem_size, this might present a problem. On the other hand the currently set value has apparently also been adjusted down: $ sysctl vfs.zfs.arc_max vfs.zfs.arc_max: 2665488384 This resembles the findings of Jeremy Chadwick in http://lists.freebsd.org/pipermail/freebsd-fs/2011-March/010880.html . I think, based on that, that I will simply take out these setting altogether, and after the next reboot we'll see how that affects matters. > -- > Peter Jeremy Copyright (c) 1992-2011 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. 
FreeBSD 8.2-RELEASE #3: Tue Apr 19 13:02:11 CEST 2011 r...@fourquid.cs.ru.nl:/usr/obj/usr/src/sys/FOURQUID amd64 Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: Dual-Core AMD Opteron(tm) Processor 2212 (2010.32-MHz K8-class CPU) Origin = "AuthenticAMD" Id = 0x40f13 Family = f Model = 41 Stepping = 3 Features=0x178bfbff Features2=0x2001 AMD Features=0xea500800 AMD Features2=0x1f real memory = 8589934592 (8192 MB) avail memory = 8267616256 (7884 MB) ACPI APIC Table: FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs FreeBSD/SMP: 2 package(s) x 2 core(s) cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 cpu2 (AP): APIC ID: 2 cpu3 (AP): APIC ID: 3 ioapic0 irqs 0-23 on motherboard kbd1 at kbdmux0 acpi0: on motherboard acpi0: [ITHREAD] acpi0: Power Button (fixed) acpi0: reservation of fec0, 1000 (3) failed acpi0: reservation of fee0, 1000 (3) failed acpi0: reservation of 0, a (3) failed acpi0: reservation of 10, dff0 (3) failed Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x2008-0x200b on acpi0 cpu0: on acpi0 cpu1: on acpi0 cpu2: on acpi0 cpu3: on acpi0 pcib0: port 0xcf8-0xcff on acpi0 pci0: on pcib0 pci0: at device 0.0 (no driver attached) isab0: at device 1.0 on pci0 isa0: on isab0 pci0: at device 1.1 (no driver attached) ohci0: mem 0xfe9bf000-0xfe9b irq 22 at device 2.0 on pci0 ohci0: [ITHREAD] usbus0: on ohci0 ehci0: mem 0xfe9bec00-0xfe9becff irq 23 at device 2.1 on pci0 ehci0: [ITHREAD] usbus1: EHCI version 1.0 usbus1: on ehci0 atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 4.0 on pci0 ata0: on atapci0 ata0: [ITHREAD] ata1: on atapci0 ata1: [ITHREAD] atapci1: port 0xd480-0xd487,0xd400-0xd403,0xd080-0xd087,0xd000-0xd003,0xcc00-0xcc0f mem 0xfe9bd000-0xfe9bdfff irq 21 at device 5.0 on pci0 atapci1: [ITHREAD] ata2: on atapci1 ata2: [ITHREAD] ata3: on atapci1 ata3: [ITHREAD] atapci2: port 0xc880-0xc887,0xc800-0xc803,0xc480-0xc487,0xc400-0xc403,0xc080-0xc08f mem 
0xfe9bc000-0xfe9bcfff irq 22 at device 5.1 on pci0 atapci2: [ITHREAD] ata4: on a
Re: RELENG_8 pf stack issue (state count spiraling out of control)
On 03/05/2011 10:16, Jeremy Chadwick wrote:
> Sadly I don't see a way with bsnmpd(8) to monitor things like interrupt
> usage, etc. otherwise I'd be graphing that. The more monitoring the
> better; at least then I could say "wow, interrupts really did shoot
> through the roof -- the box went crazy!" and RMA the thing. :-)
>
You could use net-mgmt/bsnmp-regex, although I don't know what the
overhead for that is like.

Vince
Re: RELENG_8 pf stack issue (state count spiraling out of control)
On Mon, May 02, 2011 at 06:58:54PM -0700, Jeremy Chadwick wrote:

> Here's one piece of core.0.txt which makes no sense to me -- the "rate"
> column. I have a very hard time believing that was the interrupt rate
> of all the relevant devices at the time (way too high). Maybe this data
> becomes wrong only during a coredump? The total column I could believe.
>
> vmstat -i
>
> interrupt                          total       rate
> irq4: uart0                        54768        912
> irq6: fdc0                             1          0
> irq17: uhci1+                        172          2
> irq23: uhci3 ehci1+                 2367         39
> cpu0: timer                  13183882632  219731377
> irq256: em0                    260491055    4341517
> irq257: em1                    127555036    2125917
> irq258: ahci0                  225923164    3765386
> cpu2: timer                  13183881837  219731363
> cpu1: timer                  13002196469  216703274
> cpu3: timer                  13183881783  219731363
> Total                        53167869284  886131154
>
> Here's what a normal "vmstat -i" shows from the command-line:
>
> # vmstat -i
> interrupt                          total       rate
> irq4: uart0                          518          0
> irq6: fdc0                             1          0
> irq23: uhci3 ehci1+                  145          0
> cpu0: timer                     19041199       1999
> irq256: em0                       614280         64
> irq257: em1                       168529         17
> irq258: ahci0                     355536         37
> cpu2: timer                     19040462       1999
> cpu1: timer                     19040458       1999
> cpu3: timer                     19040454       1999
> Total                           77301582       8119

The cpu0-3 timer totals seem consistent in the first output:
13183881783/1999/60/60/24 matches 76 days of uptime.

The high rate in the first output comes from vmstat.c dointr()'s
division of the total by the uptime:

        struct timespec sp;

        clock_gettime(CLOCK_MONOTONIC, &sp);
        uptime = sp.tv_sec;
        for (i = 0; i < nintr; i++) {
                printf("%-*s %20lu %10lu\n", istrnamlen, intrname,
                    *intrcnt, *intrcnt / uptime);
        }

From this we can deduce that the value of uptime must have been
13183881783/219731363 = 60 (seconds).

Since the uptime was 76 days (and not just 60 seconds), the
CLOCK_MONOTONIC clock must have reset, wrapped, or been overwritten.

I don't know how that's possible, but if this means that the kernel
variable time_second was possibly going back, that could very well
have messed up pf's state purging.
Daniel
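The two divisions in the message above can be re-checked in a few lines. This is only a worked version of that arithmetic; the variable names are made up for illustration, and the constants are copied from the quoted vmstat -i outputs (the cpu3 timer line of the crash dump, and the 1999/s timer rate of the healthy output).

```python
SECONDS_PER_DAY = 24 * 60 * 60

# cpu3 timer line from the vmstat -i output in core.0.txt
timer_total = 13183881783
reported_rate = 219731363

# timer interrupt rate seen in the normal vmstat -i output
normal_rate = 1999

# How long the box must really have been up to accumulate that total.
real_uptime_days = timer_total / normal_rate / SECONDS_PER_DAY

# What vmstat must have used as "uptime" to print the reported rate
# (integer division, as in the printf in dointr()).
implied_uptime_seconds = timer_total // reported_rate

if __name__ == "__main__":
    print("uptime implied by the total: %.1f days" % real_uptime_days)
    print("uptime used for the rate:    %d seconds" % implied_uptime_seconds)
```

The same integer division gives 60 for the cpu0 and cpu1 timer lines as well, so the whole rate column is consistent with one bogus uptime value rather than with individually corrupted counters.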
Re: Automatic reboot doesn't reboot
On Mon, May 02, 2011 at 04:32:30PM +0200, Olaf Seibert wrote: > I have a FreeBSD/amd64 8.2 server that has a few ZFS file systems served > over NFS. It has 8 GB of memory. There are 6 disks of 1,5 TB each > forming a pool with raidz2. > > >From time to time it crashes with some stack backtrace (included below). > This already happened before the upgrade to 8.2. > > Now a crash of a file server is annoying, but if it reboots > automatically, there is just a few minutes of downtime (most of it is > even spent by the BIOS before it gets to boot the OS). > > However, it doesn't automatically reboot in 15 seconds, as promised. > It just sits there the whole weekend, until I log onto the IPMI console > and press the virtual reset button. There are two things you might try fiddling with. These are sysctls so you can try them on the fly: hw.acpi.disable_on_reboot hw.acpi.handle_reboot On our systems we set hw.acpi.handle_reboot=1 to speed up the reboot process. I remember hearing long ago how some people had issues getting their machines to reboot (sometimes 100% of the time, other times occasionally); using ACPI to reboot the machine fixed their issues. > This was visible before I did that (4-finger copy): > > panic: kmem_alloc(131072): kmem_map too small: 3428782080 total allocated > cpuid = 0 Check out the thread Peter Jeremy provided. This is a near-sure indicator of ZFS ARC exhaustion, and you seem to know of that. What's very interesting to me is this part of your mail: > There is some tuning in /boot/loader.conf from previous attempts tune to > avoid crashes. > > vm.kmem_size="16G" > vfs.zfs.arc_max="4G" > > Is that still useful, or does it harm by now? Real memory is 8 GB. > I note that if I look with sysctl, I see > > vm.kmem_size: 3739230208 > vfs.zfs.arc_max: 2665488384 > > which doesn't seem to match these attempted settings. Is this box running i386 or amd64? 
If amd64, I can't explain why your /boot/loader.conf settings aren't taking -- they should be for sure. Maybe provide us a full dmesg and XXX out things you consider sensitive. If i386, I'm not too surprised that some automatic defaults get chosen instead of what you ask. -- | Jeremy Chadwick j...@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: RELENG_8 pf stack issue (state count spiraling out of control)
On Tue, May 03, 2011 at 10:48:00AM +0200, Daniel Hartmeier wrote: > On Mon, May 02, 2011 at 06:58:54PM -0700, Jeremy Chadwick wrote: > > > Status: Enabled for 76 days 06:49:10 Debug: Urgent > > > The "pf uptime" shown above, by the way, matches system uptime. > > > ps -axl > > > > UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND > > 0 422 0 0 -16 0 0 0 pftm DL?? 1362773081:04.00 > > [pfpurge] > > This looks weird, too. 1362773081 minutes would be >2500 years. > > Usually, you should see [idle] with almost uptime in minutes, and > [pfpurge] with much less, like in > > # uptime > 10:22AM up 87 days, 19:36, 1 user, load averages: 0.00, 0.03, 0.05 > # echo "((87*24)+19)*60+36" | bc > 126456 > > # ps -axl > UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND > 0 7 0 0 44 0 0 8 pftm DL??0:13.16 [pfpurge] > 011 0 0 171 0 0 8 - RL?? 124311:23.04 [idle] Agreed -- and that's exactly how things look on the same box right now: $ ps -axl | egrep 'UID|pfpurge|idle' UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND 011 0 0 171 0 064 - RL?? 2375:15.91 [idle] 0 422 0 0 -16 0 016 pftm DL??0:00.28 [pfpurge] The ps -axl output I provided earlier came from /var/crash/core.0.txt. So it's interesting that ps -axl as well as vmstat -i both showed something off-the-wall. I wonder if this can happen when within ddb? Unsure. I do have the core from "call doadump", so I should be able to go back and re-examine it with kgdb. I just wish I knew what to poke around looking for in there. Sadly I don't see a way with bsnmpd(8) to monitor things like interrupt usage, etc. otherwise I'd be graphing that. The more monitoring the better; at least then I could say "wow, interrupts really did shoot through the roof -- the box went crazy!" and RMA the thing. :-) > How is time handled on your machine? ntpdate on boot and then ntpd? 
Yep, you got it: ntpdate_enable="yes" ntpdate_config="/conf/ME/ntp.conf" ntpd_enable="yes" ntpd_config="/conf/ME/ntp.conf" I don't use ntpd_sync_on_start because I've never had reason to. I always set the system/BIOS clock to UTC time when building a system. I use ntpd's complaint about excessive offset as an indicator that something bad happened. /conf/ME/ntp.conf on this machine syncs from another on the private network (em1) only, and that machine syncs from a series of geographically-diverse stratum 2 servers and one stratum 1 server. I've never seen high delays, offsets, or jitter using "ntpq -c peers" on any box we have. Actual timecounters (not time itself) are handled by ACPI-safe or ACPI-fast (varies per boot; I've talked to jhb@ about this before and it's normal). powerd is in use on all our systems, and on this box use of processor sleep states (lowest state = C2; physical CPU only supports C0-C2 and I wouldn't go any lower than that anyway :-) ). Appropriate /boot/loader.conf entries that pertain to it: # Enable use of P-state CPU frequency throttling. # http://wiki.freebsd.org/TuningPowerConsumption hint.p4tcc.0.disabled="1" hint.acpi_throttle.0.disabled="1" There are numerous other systems exactly like this one (literally same model of hardware, RAM amount, CPU model, BIOS version and settings, and system configuration, including pf) that have much higher load and fire many more interrupts (particularly the NFS server!) that haven't exhibited any problems. This box had an uptime of 72 days, and prior to that around 100 (before being taken down for world/kernel upgrades). All machines have ECC RAM too, and MCA/MCE is in use. You don't know how bad I'd love to blame this on a hardware issue (it's always possible in some way or another), but the way this manifest itself was extremely specific. The problem could be super rare and something triggered it that hasn't been seen before by developers. 
So far there's only 1 other user who has seen this behaviour but his was attributed to use of "reassemble tcp" which I wasn't using; so the true problem could still be out there. I feel better knowing I'm not the only one who's seen this oddity. Since his post, I've removed all scrub rules from all of our machines as a precaution. If it ever happens again we'll have one more thing to safely rule out. We have other machines (different hardware, running RELENG_7 i386) which have had 1+ year uptimes also using pf, so the possibility of just some "crazy fluke" is plausible to me. > Any manual time changes since the last boot? None unless adjkerntz did something during the PST->PDT switchover, but that would manifest itself as a +1 hour offset difference. Since the machine rebooted the system synced its time without issue and well within acceptable delta (1.075993 sec). I did not power-cycle the box during any of this; pure soft
Re: RELENG_8 pf stack issue (state count spiraling out of control)
On Mon, May 02, 2011 at 06:58:54PM -0700, Jeremy Chadwick wrote:

> Status: Enabled for 76 days 06:49:10          Debug: Urgent

> The "pf uptime" shown above, by the way, matches system uptime.

> ps -axl
>
>   UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT              TIME COMMAND
>     0   422     0   0 -16  0     0     0 pftm   DL    ??  1362773081:04.00 [pfpurge]

This looks weird, too. 1362773081 minutes would be >2500 years.

Usually, you should see [idle] with almost uptime in minutes, and
[pfpurge] with much less, like in

  # uptime
  10:22AM  up 87 days, 19:36, 1 user, load averages: 0.00, 0.03, 0.05
  # echo "((87*24)+19)*60+36" | bc
  126456

  # ps -axl
  UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT          TIME COMMAND
    0     7     0   0  44  0     0     8 pftm   DL    ??       0:13.16 [pfpurge]
    0    11     0   0 171  0     0     8 -      RL    ??  124311:23.04 [idle]

How is time handled on your machine? ntpdate on boot and then ntpd?

Any manual time changes since the last boot?

Daniel
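For anyone who wants to redo the numbers in the message above, here is a small sketch. The variable names are illustrative, the inputs are copied from the quoted ps and uptime output, and the 365.25-day year is an assumption used only to get a rough figure.

```python
MINUTES_PER_DAY = 24 * 60
DAYS_PER_YEAR = 365.25  # rough average, good enough for an order of magnitude

# TIME column of [pfpurge] in the crash dump, in whole minutes
pfpurge_minutes = 1362773081
pfpurge_years = pfpurge_minutes / MINUTES_PER_DAY / DAYS_PER_YEAR

# "up 87 days, 19:36" from the healthy machine, same formula as the bc line
uptime_minutes = ((87 * 24) + 19) * 60 + 36

# TIME column of [idle] on the healthy machine, in whole minutes
idle_minutes = 124311
idle_fraction = idle_minutes / uptime_minutes

if __name__ == "__main__":
    print("pfpurge CPU time: about %.0f years" % pfpurge_years)
    print("uptime: %d minutes" % uptime_minutes)
    print("idle share of uptime: %.1f%%" % (100 * idle_fraction))
```

On a healthy box the idle thread's time stays just below the uptime, so a kernel thread showing far more CPU minutes than the machine has been up points at broken timekeeping rather than real CPU usage.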
Re: Automatic reboot doesn't reboot
On 2011-May-02 16:32:30 +0200, Olaf Seibert wrote:
>However, it doesn't automatically reboot in 15 seconds, as promised.
>It just sits there the whole weekend, until I log onto the IPMI console
>and press the virtual reset button.

Your reference to IPMI indicates this is not a consumer PC. Can you
please provide some details of the hardware? Are you running ipmitools
or similar? Does "shutdown -r" or "reboot" work normally?

>panic: kmem_alloc(131072): kmem_map too small: 3428782080 total allocated

I suggest you have a read of the thread beginning
http://lists.freebsd.org/pipermail/freebsd-fs/2011-March/010862.html
(note that mailman has split it into at least 3 threads).

--
Peter Jeremy
Re: RELENG_8 pf stack issue (state count spiraling out of control)
On Tue, May 03, 2011 at 09:00:42AM +0200, Daniel Hartmeier wrote:

> I read those graphs differently: the problem doesn't arise slowly,
> but rather seems to start suddenly at 13:00.
>
> Right after 13:00, traffic on em0 drops, i.e. the firewall seems
> to stop forwarding packets completely.
>
> Yet, at the same time, the states start to increase, almost linearly
> at about one state every two seconds, until the limit of 10,000 is
> reached. Reaching the limit seems to be only a side-effect of a
> problem that started at 13:00.
>
> > Here's one piece of core.0.txt which makes no sense to me -- the "rate"
> > column. I have a very hard time believing that was the interrupt rate
> > of all the relevant devices at the time (way too high). Maybe this data
> > becomes wrong only during a coredump? The total column I could believe.
> >
> > vmstat -i
> >
> > interrupt                  total       rate
> > irq4: uart0                54768        912
> > irq6: fdc0                     1          0
> > irq17: uhci1+                172          2
> > irq23: uhci3 ehci1+         2367         39
> > cpu0: timer          13183882632  219731377
> > irq256: em0            260491055    4341517
> > irq257: em1            127555036    2125917
> > irq258: ahci0          225923164    3765386
> > cpu2: timer          13183881837  219731363
> > cpu1: timer          13002196469  216703274
> > cpu3: timer          13183881783  219731363
> > Total                53167869284  886131154
>
> I find this suspect as well, but I don't have an explanation yet.
>
> Are you using anything non-GENERIC related to timers, like change
> HZ or enable polling?

HZ is standard (1000 is the default I believe), and I do not use polling.

> Are you sure the problem didn't start right at 13:00, and cause complete
> packet loss for the entire period, and that it grew gradually worse
> instead?

It's hard to discern from the graphs, but I can tell you exactly what I saw TCP-wise since I did have some already-existing/established TCP connections to the box (e.g. connections which already had ESTABLISHED states according to pfctl -s state) when it began exhibiting issues.
Any packets which already had existing state entries in pf's state table continued to work, and bidirectionally. New inbound connections to the box via em0 would result in no response/timeout (and as indicated per pfctl, such packets were being dropped due to the state limit being reached).

Outbound connections from the box via em0 to the outside world also resulted in no response/timeout. I will show you evidence of the latter.

The first indication of a problem in syslog is the following message from named -- this is the first time in my entire life I've ever seen this message, but it seems to indicate some kind of internal watchdog was fired within named itself.

The log I'm looking at, by the way, is /var/log/all.log -- yes, I do turn that on (for reasons exactly like this). This box is a secondary nameserver (public), so keep that in mind too. Anyway:

May 1 12:50:14 isis named[728]: *** POKED TIMER ***

Seconds later, I see unexpected RCODE messages, lame server messages, etc. -- all of which indicate packets to some degree are working ("the usual" badly-configured nameservers on the Internet). A few minutes later:

May 1 12:53:15 isis named[728]: *** POKED TIMER ***
May 1 12:53:54 isis named[728]: *** POKED TIMER ***

With more of the usual unexpected RCODE/SERVFAIL messages after that. The next message:

May 1 13:28:55 isis named[728]: *** POKED TIMER ***
May 1 13:29:13 isis named[728]: *** POKED TIMER ***
May 1 13:30:11 isis last message repeated 3 times

Then more RCODE/SERVFAIL and something called "FORMERR" but that could be normal as well. Remember, all from named.

This "cycle" of behaviour continued, with the number of POKED TIMER messages gradually increasing more and more as time went on. By 16:07 on May 1st, these messages were arriving usually in "bursts" of 5 or 6.
Things finally "exploded", from named's perspective, here (with slaved zones X'd out):

May 1 19:23:21 isis named[728]: *** POKED TIMER ***
May 1 19:28:59 isis named[728]: zone /IN: refresh: failure trying master x.x.x.x#53 (source x.x.x.x#0): operation canceled
May 1 19:35:32 isis named[728]: host unreachable resolving 'dns2.djaweb.dz/A/IN': 213.179.160.66#53
May 1 19:35:32 isis named[728]: host unreachable resolving 'dns2.djaweb.dz/A/IN': 193.0.12.4#53
May 1 19:35:32 isis named[728]: host unreachable resolving 'dns2.djaweb.dz/A/IN': 193.194.64.242#53
May 1 19:35:32 isis named[728]: host unreachable resolving 'dns2.djaweb.dz/A/IN': 192.134.0.49#53

And many other slaved zones reporting the exact same error. The hostnam
Re: RELENG_8 pf stack issue (state count spiraling out of control)
I read those graphs differently: the problem doesn't arise slowly, but rather seems to start suddenly at 13:00.

Right after 13:00, traffic on em0 drops, i.e. the firewall seems to stop forwarding packets completely.

Yet, at the same time, the states start to increase, almost linearly at about one state every two seconds, until the limit of 10,000 is reached. Reaching the limit seems to be only a side-effect of a problem that started at 13:00.

> Here's one piece of core.0.txt which makes no sense to me -- the "rate"
> column. I have a very hard time believing that was the interrupt rate
> of all the relevant devices at the time (way too high). Maybe this data
> becomes wrong only during a coredump? The total column I could believe.
>
> vmstat -i
>
> interrupt                  total       rate
> irq4: uart0                54768        912
> irq6: fdc0                     1          0
> irq17: uhci1+                172          2
> irq23: uhci3 ehci1+         2367         39
> cpu0: timer          13183882632  219731377
> irq256: em0            260491055    4341517
> irq257: em1            127555036    2125917
> irq258: ahci0          225923164    3765386
> cpu2: timer          13183881837  219731363
> cpu1: timer          13002196469  216703274
> cpu3: timer          13183881783  219731363
> Total                53167869284  886131154

I find this suspect as well, but I don't have an explanation yet.

Are you using anything non-GENERIC related to timers, like change HZ or enable polling?

Are you sure the problem didn't start right at 13:00, and cause complete packet loss for the entire period, and that it grew gradually worse instead?

Daniel
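One way to probe the suspicious "rate" column is to ask what uptime each row implies. The sketch below assumes that vmstat -i derives rate as total divided by uptime in seconds; that is an assumption here, not something confirmed in the thread. The row values are this sketch's reading of the quoted table, and the names `ROWS` and `PF_UPTIME_SECONDS` are illustrative only.

```python
# Rows as (name, total, rate), read from the vmstat -i output quoted above.
# Where digits were fused in the archive, the split into total and rate is
# this sketch's reading; it is cross-checked against the Total row below.
ROWS = [
    ("irq4: uart0", 54768, 912),
    ("irq6: fdc0", 1, 0),
    ("irq17: uhci1+", 172, 2),
    ("irq23: uhci3 ehci1+", 2367, 39),
    ("cpu0: timer", 13183882632, 219731377),
    ("irq256: em0", 260491055, 4341517),
    ("irq257: em1", 127555036, 2125917),
    ("irq258: ahci0", 225923164, 3765386),
    ("cpu2: timer", 13183881837, 219731363),
    ("cpu1: timer", 13002196469, 216703274),
    ("cpu3: timer", 13183881783, 219731363),
]
REPORTED_TOTAL = 53167869284

# pf uptime from the thread: 76 days 06:49:10, converted to seconds.
PF_UPTIME_SECONDS = ((76 * 24 + 6) * 60 + 49) * 60 + 10

# Cross-check: the per-row totals should add up to the reported Total.
totals_sum = sum(total for _, total, _ in ROWS)

# Assumption: rate = total // uptime_seconds. Test whether a 60 second
# "uptime" reproduces every reported rate.
rates_match_60s = all(total // 60 == rate for _, total, rate in ROWS)

# For comparison, the per-second rate of cpu0's timer over the real uptime.
cpu0_total = dict((name, total) for name, total, _ in ROWS)["cpu0: timer"]
cpu0_per_second = cpu0_total / PF_UPTIME_SECONDS

print("totals add up to reported Total:", totals_sum == REPORTED_TOTAL)
print("every rate equals total // 60:  ", rates_match_60s)
print(f"cpu0 timer per second over 76 days: {cpu0_per_second:.1f}")
```

If the assumption holds, the totals are consistent with 76 days of uptime while the rates look as if they were computed against roughly 60 seconds. That would fit the "total column I could believe" remark and points at a wrong uptime value, not at a real interrupt storm.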