Re: cpu timer issues

2010-09-30 Thread Andriy Gapon
on 30/09/2010 01:27 Jurgen Weber said the following:
 Gentlemen
 
 Ah, OK. You learn something new every day. Fantastic. The first time the machine
 stopped during the boot process, but that is OK; the second time we had success.
 
 http://pastebin.com/r4UWdN7U
 
 I am not sure if ACPI is on. Jeremy, you mention below that it should be on
 just by booting with this option, so let me know if there are any problems there.

If you disabled it in BIOS, you have to re-enable it there.
There is no magic.

-- 
Andriy Gapon
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime

2010-09-30 Thread Andriy Gapon
on 30/09/2010 02:27 Don Lewis said the following:
 On 29 Sep, Andriy Gapon wrote:
 on 29/09/2010 11:56 Don Lewis said the following:
 I'm using the same kernel config as the one on a slower !SMP box which
 I'm trying to squeeze as much performance out of as possible.  My kernel
 config file contains these statements:
 nooptions   SMP
 nodevice    apic

 Testing with an SMP kernel is on my TODO list.

 SMP or not, it's really weird to see apic disabled nowadays.
 
 I tried enabling apic and got worse results.  I saw ping RTTs as high as
 67 seconds.  Here's the timer info with apic enabled:

I didn't expect anything to change in this output with APIC enabled.

 # sysctl kern.timecounter
 kern.timecounter.tick: 1
 kern.timecounter.choice: TSC(800) ACPI-fast(1000) i8254(0) dummy(-100)
 kern.timecounter.hardware: ACPI-fast
 kern.timecounter.stepwarnings: 0
 kern.timecounter.tc.i8254.mask: 65535
 kern.timecounter.tc.i8254.counter: 53633
 kern.timecounter.tc.i8254.frequency: 1193182
 kern.timecounter.tc.i8254.quality: 0
 kern.timecounter.tc.ACPI-fast.mask: 16777215
 kern.timecounter.tc.ACPI-fast.counter: 7988816
 kern.timecounter.tc.ACPI-fast.frequency: 3579545
 kern.timecounter.tc.ACPI-fast.quality: 1000
 kern.timecounter.tc.TSC.mask: 4294967295
 kern.timecounter.tc.TSC.counter: 1341917999
 kern.timecounter.tc.TSC.frequency: 2500014018
 kern.timecounter.tc.TSC.quality: 800
 kern.timecounter.invariant_tsc: 0
 
 Here's the verbose boot info with apic:
 http://people.freebsd.org/~truckman/AN-M2_HD-8.1-STABLE-apic-verbose.txt
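
For context on the sysctl figures quoted above, the rollover period of a counter can be estimated from its mask and frequency. The following is a minimal sketch, assuming each timecounter behaves as a free-running counter that wraps after mask + 1 ticks; the function name wrap_seconds is hypothetical and the values are copied from the output above.

```python
# Rollover period of each timecounter: (mask + 1) / frequency seconds.
# The mask and frequency values are copied from the sysctl output above.
TIMECOUNTERS = {
    "i8254": (65535, 1193182),
    "ACPI-fast": (16777215, 3579545),
    "TSC": (4294967295, 2500014018),
}


def wrap_seconds(mask: int, frequency_hz: int) -> float:
    """Seconds until a free-running counter with this mask rolls over (assumed model)."""
    return (mask + 1) / frequency_hz


for name, (mask, freq) in TIMECOUNTERS.items():
    print(f"{name:10s} wraps roughly every {wrap_seconds(mask, freq):.4f} s")
```

Under that assumption, all three counters wrap within a few seconds, so the counter has to be sampled more often than that for timekeeping to stay consistent.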

vmstat -i ?

 I've also experimented with SMP as well as SCHED_4BSD (all previous
 testing was with !SMP and SCHED_ULE).  I still see occasional problems
 with SCHED_4BSD and !SMP, but so far I have not seen any problems with
 SCHED_ULE and SMP.

Good!

 I did manage to catch the problem with lock profiling enabled:
 http://people.freebsd.org/~truckman/AN-M2_HD-8.1-STABLE_lock_profile_freeze.txt
 I'm currently testing SMP some more to verify if it really avoids this
 problem.

OK.

-- 
Andriy Gapon


MCA messages in dmesg

2010-09-30 Thread Adam Vande More
For a while now, my home server has been acting up.  It had a bad set of RAM
long ago; I replaced it and it worked fine.  It's been acting weird again
now, and I've found this in dmesg:

MCA: Bank 0, Status 0xf2000800
MCA: Global Cap 0x0806, Status 0x
MCA: Vendor GenuineIntel, ID 0x6fb, APIC ID 2
MCA: CPU 2 UNCOR PCC OVER BUSL0 Source ERR Memory
MCA: Bank 0, Status 0xf2000800
MCA: Global Cap 0x0806, Status 0x
MCA: Vendor GenuineIntel, ID 0x6fb, APIC ID 3
MCA: CPU 3 UNCOR PCC OVER BUSL0 Source ERR Memory

I really don't know what MCA is, but that looks like possibly bad RAM
again.  I have some other DIMMs I can try, but I was hoping someone had
some info on exactly what those messages mean.  One concern is that the
motherboard is bad and is hosing the memory.
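
The flag words in the "MCA: CPU 2 UNCOR PCC OVER ..." lines correspond to bits in the top byte of the machine-check bank status register. The sketch below assumes that the printed status 0xf2000800 has lost digits in this archive and that 0xf2 is the top byte (bits 63..56) of the 64-bit value; the function name decode_top_byte is hypothetical, and the flag names follow the words FreeBSD prints rather than any official mnemonic list.

```python
# Flag bits in the top byte of a 64-bit machine-check bank status value.
# Assumption: 0xf2 from the dmesg lines above is that top byte.
MCI_STATUS_FLAGS = {
    63: "VAL",    # the register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UNCOR",  # the error was not corrected by hardware
    60: "EN",     # reporting was enabled for this error
    59: "MISCV",  # the MISC register holds extra information
    58: "ADDRV",  # the ADDR register holds an address
    57: "PCC",    # processor context may be corrupt
}


def decode_top_byte(top_byte: int) -> list[str]:
    """Return the names of the flags set in bits 63..56 of the status."""
    status = top_byte << 56
    return [name for bit, name in MCI_STATUS_FLAGS.items() if (status >> bit) & 1]


print(decode_top_byte(0xF2))
```

The decoded set includes UNCOR, PCC and OVER, which agrees with the words in the dmesg lines. It does not by itself say whether the DIMMs or the motherboard are at fault.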

Some more info:

FreeBSD vbox.galacticdominator.com 8.1-STABLE FreeBSD 8.1-STABLE #0: Mon
Aug  2 11:19:16 CDT 2010
a...@vbox.galacticdominator.com:/usr/obj/usr/src/sys/GENERIC
amd64

smbios.bios.reldate=01/22/2008
smbios.bios.vendor=Phoenix Technologies, LTD
smbios.bios.version=6.00 PG
smbios.chassis.maker=NVIDIA
smbios.chassis.serial= 
smbios.chassis.tag= 
smbios.chassis.version=NFORCE 680i LT SLI
smbios.memory.enabled=4194304
smbios.planar.maker=NVIDIA
smbios.planar.product=NFORCE 680i LT SLI
smbios.planar.serial=1
smbios.planar.version=2
smbios.socket.enabled=1
smbios.socket.populated=1
smbios.system.maker=NVIDIA
smbios.system.product=NFORCE 680i LT SLI
smbios.system.serial=1
smbios.system.uuid=86fe600d-034b-0400--
smbios.system.version=2
smbios.version=2.4

Normal dmesg:

Copyright (c) 1992-2010 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.1-STABLE #0: Mon Aug  2 11:19:16 CDT 2010
a...@vbox.galacticdominator.com:/usr/obj/usr/src/sys/GENERIC amd64
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz (2700.03-MHz K8-class CPU)
  Origin = GenuineIntel  Id = 0x6fb  Family = 6  Model = f  Stepping = 11
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xe3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
real memory  = 4294967296 (4096 MB)
avail memory = 4073664512 (3884 MB)
ACPI APIC Table: Nvidia NVDAACPI
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
ioapic0: Changing APIC ID to 4
ioapic0 Version 1.1 irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: Nvidia NVDAACPI on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a (3) failed
acpi0: reservation of 10, afdf (3) failed
Timecounter ACPI-fast frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at 3.579545MHz port 0x1008-0x100b on acpi0
cpu0: ACPI CPU on acpi0
cpu1: ACPI CPU on acpi0
cpu2: ACPI CPU on acpi0
cpu3: ACPI CPU on acpi0
acpi_hpet0: High Precision Event Timer iomem 0xfeff-0xfeff03ff on
acpi0
device_attach: acpi_hpet0 attach returned 12
acpi_button0: Power Button on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
pci0: memory, RAM at device 0.1 (no driver attached)
pci0: memory, RAM at device 0.2 (no driver attached)
pci0: memory, RAM at device 0.3 (no driver attached)
pci0: memory, RAM at device 0.4 (no driver attached)
pci0: memory, RAM at device 0.5 (no driver attached)
pci0: memory, RAM at device 0.6 (no driver attached)
pci0: memory, RAM at device 0.7 (no driver attached)
pci0: memory, RAM at device 1.0 (no driver attached)
pci0: memory, RAM at device 1.1 (no driver attached)
pci0: memory, RAM at device 1.2 (no driver attached)
pci0: memory, RAM at device 1.3 (no driver attached)
pci0: memory, RAM at device 1.4 (no driver attached)
pci0: memory, RAM at device 1.5 (no driver attached)
pci0: memory, RAM at device 1.6 (no driver attached)
pci0: memory, RAM at device 2.0 (no driver attached)
pci0: memory, RAM at device 2.1 (no driver attached)
pci0: memory, RAM at device 2.2 (no driver attached)
pcib1: ACPI PCI-PCI bridge at device 3.0 on pci0
pci1: ACPI PCI bus on pcib1
vgapci0: VGA-compatible display port 0x8c00-0x8c7f mem
0xcc00-0xccff,0xb000-0xbfff,0xcd00-0xcdff irq 16 at
device 0.0 on pci1
nvidia0: GeForce 7600 GT on vgapci0
vgapci0: child nvidia0 requested pci_enable_busmaster
vgapci0: child nvidia0 requested pci_enable_io
vgapci0: child nvidia0 requested pci_enable_io
nvidia0: [ITHREAD]
pci0: memory, RAM at device 9.0 (no driver attached)
isab0: PCI-ISA bridge port 0xfc00-0xfc7f at device 10.0 on pci0
isa0: ISA 

mysqld_safe holding open a pty/tty on FreeBSD (7.x and 8.x)

2010-09-30 Thread Jeremy Chadwick
Something interesting I've come across which happens on both RELENG_7
and RELENG_8 (indicating it's not a problem with the older tty code or
the newer pty/pts code), and it's reproducible on Linux (sort of...).

mysqld_safe appears to hold a pty/tty open even after the process has
been backgrounded.  I can understand how/why this might occur, just not
in this particular case.

I had a colleague test the situation on his Linux machine.  He was able
to confirm that:

1) mysqld_safe > /dev/null 2>&1 &  never released the tty
2) nohup mysqld_safe > /dev/null 2>&1 &  did release the tty

With regards to test #1, looking in /proc/{pid}/fd showed that STDIN was
being held open.  I recommended he point STDIN to /dev/null as so:

mysqld_safe < /dev/null > /dev/null 2>&1 &

Which also solved the problem.
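
The effect of redirecting STDIN can be checked from inside the child. This is a minimal sketch, not the test the colleague ran: the helper name child_stdin_is_tty is hypothetical, and it starts a short Python child (instead of mysqld_safe) with stdout and stderr on /dev/null, then reports whether the child's fd 0 is still a terminal.

```python
import subprocess
import sys


def child_stdin_is_tty(redirect_stdin: bool) -> bool:
    """Run a child with stdout/stderr on /dev/null; report whether its fd 0 is a tty."""
    result = subprocess.run(
        # The child exits with status 1 if its standard input is a terminal.
        [sys.executable, "-c", "import os, sys; sys.exit(1 if os.isatty(0) else 0)"],
        stdin=subprocess.DEVNULL if redirect_stdin else None,  # None = inherit
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 1


print("stdin inherited :", child_stdin_is_tty(redirect_stdin=False))
print("stdin /dev/null :", child_stdin_is_tty(redirect_stdin=True))
```

When run from an interactive shell, the first line is expected to report True and the second False, which matches the observation that fd 0 was the descriptor still pointing at the tty.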

On FreeBSD it's a different story.  Below, mysql-server was started as
root on pts/1.  The open file descriptors all point to /dev/null, so I'm
not sure why the pty/tty is being held open.

icarus# ps -aux -U mysql
USER    PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
mysql 10078  0.2  0.3 35100 11032   1  S    11:38PM   0:00.02 [mysqld]
mysql  9997  0.0  0.0  8228  1592   1  S    11:38PM   0:00.01 /bin/sh /usr/local/bin/mysqld_safe --defaults-extra-file=/storage/mys
icarus# procstat -f 9997
  PID COMM      FD T V FLAGS REF OFFSET PRO NAME
 9997 sh       cwd v d         -      - -   /root
 9997 sh      root v d         -      - -   /
 9997 sh         0 v c r---    1      0 -   /dev/null
 9997 sh         1 v c -w--    2      0 -   /dev/null
 9997 sh         2 v c -w--    2      0 -   /dev/null
icarus# procstat -f 10078
  PID COMM      FD T V FLAGS REF OFFSET PRO NAME
10078 mysqld   cwd v d         -      - -   /storage/mysql
10078 mysqld  root v d         -      - -   /
10078 mysqld     0 v c r---    1      0 -   /dev/null
10078 mysqld     1 v r rwa-    1  32048 -   /storage/mysql/icarus.home.lan.err
10078 mysqld     2 v r rwa-    1  32380 -   /storage/mysql/icarus.home.lan.err

At this point I log out of pts/1 and log back in to the machine (which
sticks me on pts/2 as a result of the problem).  Looking again, we see:

icarus# ps -aux -U mysql
USER    PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
mysql  9997  0.0  0.0  8228  1592   1- I    11:38PM   0:00.01 /bin/sh /usr/local/bin/mysqld_safe --defaults-extra-file=/storage/mys
mysql 10078  0.0  0.3 35100 11032   1- I    11:38PM   0:00.02 [mysqld]

With absolutely no change in procstat output relevant to fds 0/1/2.
Yet pts/1 still appears held open by something:

icarus# ls -l /dev/pts
total 0
crw--w  1 jdc   tty  0, 116 Sep 29 23:44 0
crw-rw-rw-  1 root  wheel0, 115 Sep 29 23:41 1
crw--w  1 jdc   tty  0, 117 Sep 29 23:44 2

fstat also shows no indication of anything using pts/1:

icarus# fstat /dev/pts/1
USER CMD  PID   FD MOUNT  INUM MODE SZ|DV R/W NAME
icarus# fstat | grep pts/1
icarus#

Ideas?

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: mysqld_safe holding open a pty/tty on FreeBSD (7.x and 8.x)

2010-09-30 Thread Adam Vande More
On Thu, Sep 30, 2010 at 1:51 AM, Jeremy Chadwick
<free...@jdc.parodius.com> wrote:

 Something interesting I've come across which happens on both RELENG_7
 and RELENG_8 (indicating it's not a problem with the older tty code or
 the newer pty/pts code), and it's reproducible on Linux (sort of...).

 mysqld_safe appears to hold a pty/tty open even after the process has
 been backgrounded.  I can understand how/why this might occur, just not
 in this particular case.


Actually, I came across this the other day:

http://lists.freebsd.org/pipermail/freebsd-ports/2010-July/062417.html

It appears you aren't the only one to notice the issue.

-- 
Adam Vande More


Re: mysqld_safe holding open a pty/tty on FreeBSD (7.x and 8.x)

2010-09-30 Thread Ed Schouten
Hi Jeremy,

* Jeremy Chadwick free...@jdc.parodius.com wrote:
 1) mysqld_safe > /dev/null 2>&1 &  never released the tty
 2) nohup mysqld_safe > /dev/null 2>&1 &  did release the tty

What happens if you run the following command?

daemon -cf mysqld_safe

The point is that FreeBSD's pts(4) driver only deallocates a TTY when
it's really sure nothing uses it anymore. Even if there is not a single
file descriptor referring to the slave device, it has to wait until
no process has the TTY as its controlling TTY.

The `pstat -t' command is quite useful to figure out whether there is
still a session associated with the TTY.

See the following thread:

http://lists.freebsd.org/pipermail/freebsd-ports/2010-July/062417.html
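
The distinction between open file descriptors and the controlling TTY can be illustrated with a child that is moved into its own session. This is a minimal sketch and not what daemon(8) does internally; the helper name spawn_detached is hypothetical, and it relies on Python's start_new_session option, which makes the child call setsid() before exec.

```python
import os
import subprocess
import sys


def spawn_detached(argv: list[str]) -> subprocess.Popen:
    """Start argv in a new session with fds 0-2 pointing at /dev/null."""
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # child calls setsid(), so it has no controlling TTY
    )


child = spawn_detached([sys.executable, "-c", "import time; time.sleep(1)"])
print("parent session:", os.getsid(0))
print("child session :", os.getsid(child.pid))
child.wait()
```

Redirecting fds 0-2 alone leaves the child in the login session; it is the new session that removes the association with the terminal.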

-- 
 Ed Schouten e...@80386.nl
 WWW: http://80386.nl/




Re: mysqld_safe holding open a pty/tty on FreeBSD (7.x and 8.x)

2010-09-30 Thread Jeremy Chadwick
On Thu, Sep 30, 2010 at 09:03:33AM +0200, Ed Schouten wrote:
 Hi Jeremy,
 
 * Jeremy Chadwick free...@jdc.parodius.com wrote:
  1) mysqld_safe > /dev/null 2>&1 &  never released the tty
  2) nohup mysqld_safe > /dev/null 2>&1 &  did release the tty
 
 What happens if you run the following command?
 
   daemon -cf mysqld_safe

Let's try it and find out.  This is all being done from pts/2.

icarus# ps -auxwww -U mysql | grep mysqld_safe
mysql  9997  0.0  0.0  8228  1592   1- I    11:38PM   0:00.01 /bin/sh /usr/local/bin/mysqld_safe --defaults-extra-file=/storage/mysql/my.cnf --user=mysql --datadir=/storage/mysql --pid-file=/storage/mysql/icarus.home.lan.pid --skip-innodb

icarus# /usr/local/etc/rc.d/mysql-server stop
Stopping mysql.
Waiting for PIDS: 10078.

icarus# daemon -c -f -u mysql /usr/local/bin/mysqld_safe 
--defaults-extra-file=/storage/mysql/my.cnf --user=mysql 
--datadir=/storage/mysql --pid-file=/storage/mysql/icarus.home.lan.pid 
--skip-innodb

icarus# ps -auxwww -U mysql
USER    PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
mysql 11036  0.0  0.0  8228  1600  ??  Is   12:21AM   0:00.01 /bin/sh /usr/local/bin/mysqld_safe --defaults-extra-file=/storage/mysql/my.cnf --user=mysql --datadir=/storage/mysql --pid-file=/storage/mysql/icarus.home.lan.pid --skip-innodb
mysql 6  0.0  0.3 35100 11032  ??  I    12:21AM   0:00.02 [mysqld]

icarus# exit
$ exit

[another window, different tty]

icarus# pstat -t | grep pts/2
icarus#

Summary: looks good to me.

 The point is that FreeBSD's pts(4) driver only deallocates TTYs when
 it's really sure nothing uses it anymore. Even if there is not a single
 file descriptor referring to the slave device, it has to wait until
 there exist no processes which have the TTY as its controlling TTY.

Ah I see.  Well that would explain the difference between Linux and
FreeBSD then -- it sounds like Linux has a one-off with regards to fds
that point to /dev/null.

 The `pstat -t' command is quite useful to figure out whether there is
 still a session associated with the TTY.
 
 See the following thread:
 
   http://lists.freebsd.org/pipermail/freebsd-ports/2010-July/062417.html

Ahhh, two people pointing me to the same thread, sweet.  :-)  I wasn't
subscribed to -ports back in July, else I'd almost certainly have said
something then.

It's exactly as you stated in that thread -- the tty is in G state
(waiting to be freed/process to exit).  Please note the below output
was obtained *before* attempting the daemon -cf stuff you recommended.

icarus# pstat -t | grep pts/1
 pts/1 0000 000 0  9372 0 G

Until rc(8) can be updated to support daemon(8) natively, the ~76 ports
which Do The Wrong Thing(tm) should get updated to do it this way.  Ones
like mysqlXX-server should be placed high on the priority list given
their popularity/importance.

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: mysqld_safe holding open a pty/tty on FreeBSD (7.x and 8.x)

2010-09-30 Thread Jeremy Chadwick
On Thu, Sep 30, 2010 at 09:30:25AM +0200, Alex Dupre wrote:
 Jeremy Chadwick ha scritto:
  Until rc(8) can be updated to support daemon(8) natively,
 
 This would be the Right Thing IMHO.
 
  the ~76 ports
  which Do The Wrong Thing(tm) should get updated to do it this way.  Ones
  like mysqlXX-server should be placed high on the priority list given
  their popularity/importance.
 
 If you have an already tested patch for the mysql rc script, I'll commit
  it asap.

Just finished it for databases/mysql51-server.  Tested on RELENG_8 with
the below variables in use, and also tested with mysql_limits=yes.

mysql_enable=yes
mysql_dbdir=/storage/mysql
mysql_args=--skip-innodb

Should work fine on RELENG_7 since it has /usr/sbin/daemon too.

Tested using stop, start, and restart.  I can test a reboot if you'd
like, just let me know.  Validation:

icarus# /usr/local/etc/rc.d/mysql-server stop
Stopping mysql.
Waiting for PIDS: 12015.
icarus# /usr/local/etc/rc.d/mysql-server start
Starting mysql.
icarus# ps -auxwww -U mysql
USER    PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
mysql 12271  0.0  0.0  8228  1600  ??  Is   12:53AM   0:00.01 /bin/sh /usr/local/bin/mysqld_safe --defaults-extra-file=/storage/mysql/my.cnf --user=mysql --datadir=/storage/mysql --pid-file=/storage/mysql/icarus.home.lan.pid --skip-innodb
mysql 12352  0.0  0.3 35100 11032  ??  I    12:53AM   0:00.02 [mysqld]

I'll also take this opportunity to point this out, since I'm certain
someone will mention it: daemon's -u argument would be ideal except that
it breaks when using rc.subr's xxx_user variable (which uses su(1)
to change credentials/spawn $command).  With both in use, daemon then
fails on setusercontext(), which in turn fails because of initgroups()
returning EPERM -- and this does make sense.  So let's not use daemon -u
in rc.subr for the time being.

The diff is pretty obvious/simple (2 line change), so the other
databases/mysqlXX-server ports can be upgraded in the same manner.

--- files/mysql-server.sh.in.orig   2010-03-27 03:24:53.0 -0700
+++ files/mysql-server.sh.in    2010-09-30 00:45:38.0 -0700
@@ -35,8 +35,8 @@
 mysql_user="mysql"
 mysql_limits_args="-e -U ${mysql_user}"
 pidfile="${mysql_dbdir}/`/bin/hostname`.pid"
-command="%%PREFIX%%/bin/mysqld_safe"
-command_args="--defaults-extra-file=${mysql_dbdir}/my.cnf --user=${mysql_user} --datadir=${mysql_dbdir} --pid-file=${pidfile} ${mysql_args} > /dev/null 2>&1 &"
+command="/usr/sbin/daemon"
+command_args="-c -f /usr/local/bin/mysqld_safe --defaults-extra-file=${mysql_dbdir}/my.cnf --user=${mysql_user} --datadir=${mysql_dbdir} --pid-file=${pidfile} ${mysql_args}"
 procname="%%PREFIX%%/libexec/mysqld"
 start_precmd="${name}_prestart"
 start_postcmd="${name}_poststart"

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: mysqld_safe holding open a pty/tty on FreeBSD (7.x and 8.x)

2010-09-30 Thread Alex Dupre
Jeremy Chadwick ha scritto:
 Until rc(8) can be updated to support daemon(8) natively,

This would be the Right Thing IMHO.

 the ~76 ports
 which Do The Wrong Thing(tm) should get updated to do it this way.  Ones
 like mysqlXX-server should be placed high on the priority list given
 their popularity/importance.

If you have an already tested patch for the mysql rc script, I'll commit
 it asap.

-- 
Alex Dupre


Re: Diskless/readonly root booting issues

2010-09-30 Thread Daniel Braniss
 Hi all,
 
 I've been working on updating my semi-embedded images to
 7.3-stable of late (I generally wait for .3+ releases), it's been a
 few years since the last time I did one of these and I'm having some
 issues getting my netboot test environment to behave itself.
 
 I'm sure it's something simple but I've spent quite a bit of time
 looking for answers and poking the system but no joy yet.
 
 Basically I use a PXE booted NFS root to test my reduced footprint
 image builds, the boot is working but init is attempting to remount /
 rw (in spite of it being marked ro in fstab) which of course fails
 because the directory is exported ro from the NFS server at which
 point the system dumps me to single user mode;
 
 === OUTPUT ===
 
 Starting file system checks:
 udp: Netconfig database not found
 Mounting root filesystem rw failed, startup aborted
 ERROR: ABORTING BOOT (sending SIGTERM to parent)!
 Sep 30 09:60:02 init: /bin/sh on /etc/rc terminated abnormally, going
 to single user mode
 Enter full pathname of shell or RETURN for /bin/sh:
 
 
 
 Relevant configs from the diskless root
 
 == rc.conf ==
 
 ifconfig_le0=DHCP
 
 diskless_mount=/etc/rc.initdiskless
 
 varsize=8192
 varmfs=YES
 
 tmpsize=8192
 tmpmfs=YES
 
 nfs_client_enable=YES
 
 dumpdev=NO
 
 =
 
 rc.initdiskless is the version from /usr/share/examples/rc.initdiskless
 
 == fstab ==
 
 192.168.2.2:/usr/fbtest / nfs ro 0 0
 proc /proc procfs rw 0 0
 
 
 
 == loader.conf ==
 
 verbose_loading=YES
 
 autoboot_delay=2
 
 
 
 Kernel is (obviously) built with NFS_ROOT and NFSCLIENT, relatively
 minimalist otherwise, have also tested with GENERIC, same result.
 
 I must be forgetting something simple in all of this, I don't recall
 it being terribly difficult to get this stuff working when I was doing
 my original work with 6.3, though I don't recall the use of the
 initdiskless script, IIRC I was using rc.diskless2 which (again IIRC)
 was later replaced by /etc/rc.d/diskless but I've not been able to
 find this script anywhere.
 
 Any suggestions would be greatly appreciated at this point.
 
 Thanks,
 
 Morgan Reed

Firstly, you should be using the latest pxeboot: it passes the root file handle
to the kernel, so there is no need to remount it, and you can remove the line
from the fstab.
Secondly, try using /etc/rc.initdiskless, which is the default.
Use the KISS method :-)

danny




[releng_8_0 tinderbox] failure on ia64/ia64

2010-09-30 Thread FreeBSD Tinderbox
TB --- 2010-09-30 08:07:07 - tinderbox 2.6 running on freebsd-current.sentex.ca
TB --- 2010-09-30 08:07:07 - starting RELENG_8_0 tinderbox run for ia64/ia64
TB --- 2010-09-30 08:07:07 - cleaning the object tree
TB --- 2010-09-30 08:10:48 - cvsupping the source tree
TB --- 2010-09-30 08:10:48 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup.sentex.ca /tinderbox/RELENG_8_0/ia64/ia64/supfile
TB --- 2010-09-30 08:54:21 - WARNING: /usr/bin/csup returned exit code  1 
TB --- 2010-09-30 08:54:21 - ERROR: unable to cvsup the source tree
TB --- 2010-09-30 08:54:21 - 1.22 user 134.47 system 2833.60 real


http://tinderbox.freebsd.org/tinderbox-releng_8-RELENG_8_0-ia64-ia64.full


[releng_8_0 tinderbox] failure on mips/mips

2010-09-30 Thread FreeBSD Tinderbox
TB --- 2010-09-30 08:54:21 - tinderbox 2.6 running on freebsd-current.sentex.ca
TB --- 2010-09-30 08:54:21 - starting RELENG_8_0 tinderbox run for mips/mips
TB --- 2010-09-30 08:54:21 - cleaning the object tree
TB --- 2010-09-30 08:56:19 - cvsupping the source tree
TB --- 2010-09-30 08:56:19 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup.sentex.ca /tinderbox/RELENG_8_0/mips/mips/supfile
TB --- 2010-09-30 09:39:03 - WARNING: /usr/bin/csup returned exit code  1 
TB --- 2010-09-30 09:39:03 - ERROR: unable to cvsup the source tree
TB --- 2010-09-30 09:39:03 - 0.80 user 76.49 system 2681.91 real


http://tinderbox.freebsd.org/tinderbox-releng_8-RELENG_8_0-mips-mips.full


Re: mysqld_safe holding open a pty/tty on FreeBSD (7.x and 8.x)

2010-09-30 Thread Jeremy Chadwick
On Thu, Sep 30, 2010 at 08:53:07AM -0400, Paul Mather wrote:
 On Sep 30, 2010, at 3:56 AM, Jeremy Chadwick wrote:
 
  The diff is pretty obvious/simple (2 line change), so the other
  databases/mysqlXX-server ports can be upgraded in the same manner.
  
  --- files/mysql-server.sh.in.orig   2010-03-27 03:24:53.0 -0700
  +++ files/mysql-server.sh.in    2010-09-30 00:45:38.0 -0700
  @@ -35,8 +35,8 @@
   mysql_user="mysql"
   mysql_limits_args="-e -U ${mysql_user}"
   pidfile="${mysql_dbdir}/`/bin/hostname`.pid"
  -command="%%PREFIX%%/bin/mysqld_safe"
  -command_args="--defaults-extra-file=${mysql_dbdir}/my.cnf --user=${mysql_user} --datadir=${mysql_dbdir} --pid-file=${pidfile} ${mysql_args} > /dev/null 2>&1 &"
  +command="/usr/sbin/daemon"
  +command_args="-c -f /usr/local/bin/mysqld_safe --defaults-extra-file=${mysql_dbdir}/my.cnf --user=${mysql_user} --datadir=${mysql_dbdir} --pid-file=${pidfile} ${mysql_args}"
 
 Shouldn't this be -c -f %%PREFIX%%/bin/mysqld_safe ... rather than 
 hard-coding /usr/local?

Yes.

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime

2010-09-30 Thread Don Lewis
On 30 Sep, Andriy Gapon wrote:
 on 30/09/2010 02:27 Don Lewis said the following:

 vmstat -i ?

I didn't see anything odd in the vmstat -i output that I posted to the list
earlier.  It looked more or less normal as the ntp offset suddenly went
insane.
 
 I did manage to catch the problem with lock profiling enabled:
 http://people.freebsd.org/~truckman/AN-M2_HD-8.1-STABLE_lock_profile_freeze.txt
 I'm currently testing SMP some more to verify if it really avoids this
 problem.
 
 OK.

I wasn't able to cause SMP on stable to break.

The silent reboots that I was seeing with WITNESS go away if I add
WITNESS_SKIPSPIN.  Witness doesn't complain about anything.

I tested -CURRENT and !SMP seems to work ok.  One difference in terms of
hardware between the two tests is that I'm using a SATA drive when
testing -STABLE and a SCSI drive when testing -CURRENT.

At this point, I think the biggest clues are going to be in the lock
profile results.



Re: RELENG_7 em problems (and RELENG_8)

2010-09-30 Thread Mike Tancsa

At 08:00 PM 9/26/2010, Jack Vogel wrote:

The system I've had stress tests running on has 82574 LOMs, so I hope it
will solve the problem, will see tomorrow morning at how things have held
up...


I pulled a copy of sys/dev/e1000 from HEAD and copied it onto my
RELENG_8 box. I had another NIC lock up last night :(  Anyway, I am now
running with the driver from HEAD on RELENG_8 amd64.


em0: Intel(R) PRO/1000 Network Connection 7.0.8 port 0x4040-0x405f 
mem 0xb440-0xb441,0xb4425000-0xb4425fff irq 16 at device 25.0 on pci0

em0: Using an MSI interrupt
em0: [FILTER]
em0: Ethernet address: 00:15:17:ed:68:a5


em1: Intel(R) PRO/1000 Network Connection 7.0.8 port 0x2000-0x201f 
mem 0xb410-0xb411,0xb412-0xb4123fff irq 16 at device 0.0 on pci9

em1: Using MSIX interrupts with 3 vectors
em1: [ITHREAD]
em1: [ITHREAD]
em1: [ITHREAD]
em1: Ethernet address: 00:15:17:ed:68:a4

em0@pci0:0:25:0: class=0x02 card=0x34ec8086 chip=0x10ef8086 rev=0x05 hdr=0x00

vendor = 'Intel Corporation'
class  = network
subclass   = ethernet
cap 01[c8] = powerspec 2  supports D0 D3  current D0
cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
cap 13[e0] = PCI Advanced Features: FLR TP

em1@pci0:9:0:0: class=0x02 card=0x34ec8086 chip=0x10d38086 rev=0x00 hdr=0x00

vendor = 'Intel Corporation'
device = 'Intel 82574L Gigabit Ethernet Controller (82574L)'
class  = network
subclass   = ethernet
cap 01[c8] = powerspec 2  supports D0 D3  current D0
cap 05[d0] = MSI supports 1 message, 64 bit
cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
cap 11[a0] = MSI-X supports 5 messages in map 0x1c enabled
ecap 0001[100] = AER 1 0 fatal 0 non-fatal 0 corrected
ecap 0003[140] = Serial 1 001517ed68a4


interrupt                          total       rate
irq4: uart0                         2283          6
irq16: siis0                        4332         11
irq18: arcmsr0                    137175        372
irq19: twa0                        18805         51
irq21: ehci0                        2734          7
irq23: ehci1                         675          1
cpu0: timer                       733804       1994
irq256: em0                        73195        198
irq257: em1:rx 0                     238          0
irq258: em1:tx 0                      37          0
irq260: ahci0                       4328         11
cpu1: timer                       725637       1971
cpu3: timer                       725709       1972
cpu2: timer                       725688       1971
Total                            3154640       8572
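
Several rows of the vmstat -i output above have their total and rate columns run together in this archive. One way to check a proposed split is to see whether all rows imply the same uptime. The sketch below assumes that vmstat reports rate as total divided by uptime, truncated to an integer; the helper name uptime_bounds is hypothetical, and the rows are a sample from the table with the columns split.

```python
# (name, total, rate) rows sampled from the vmstat -i output above,
# with fused total/rate columns split. The split is an assumption being tested.
ROWS = [
    ("cpu0: timer", 733804, 1994),
    ("irq18: arcmsr0", 137175, 372),
    ("irq256: em0", 73195, 198),
    ("irq19: twa0", 18805, 51),
]


def uptime_bounds(total: int, rate: int) -> tuple[float, float]:
    """Uptime range in seconds for which total / uptime truncates to rate."""
    return total / (rate + 1), total / rate


low = max(uptime_bounds(total, rate)[0] for _, total, rate in ROWS)
high = min(uptime_bounds(total, rate)[1] for _, total, rate in ROWS)
print(f"uptime consistent with all rows: {low:.1f} .. {high:.1f} s")
```

If the ranges overlap (low below high), the split is at least self-consistent; a wrong split would normally produce an empty range.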


---Mike


Jack


On Sun, Sep 26, 2010 at 4:43 PM, Mike Tancsa <m...@sentex.net> wrote:

At 06:19 PM 9/26/2010, Jack Vogel wrote:
Your em1 is using MSI, not MSIX, and thus can't have multiple queues. I'm
not sure what's broken from what you show here. I will try to get the new
driver out shortly for you to try.


With this particular NIC, it will wedge under high load.  I tried 2
different motherboards and chipsets and saw the same behaviour.


   ---Mike


Jack



On Sun, Sep 26, 2010 at 2:57 PM, Mike Tancsa <m...@sentex.net> wrote:

At 06:36 PM 9/24/2010, Jack Vogel wrote:
There is a new revision of the em driver coming next week. It's going through
some stress pounding over the weekend; if no issues show up I'll put it into HEAD.

Yongari's changes in TX context handling, which affect checksum and TSO,
are added. I've also decided that multiple queues in the 82574 are just a source
of problems without a lot of benefit, so it still uses MSIX but with
only 3 vectors, meaning it separates TX and RX but has a single queue.


Thanks, looking forward to trying it out!  With respect to the 
multiple queues, I thought the driver already used just the one on 
RELENG_8 ?  If not, is there a way to force the existing driver to 
use just the one queue ?


On the box that has the NIC locking up, it shows

em1@pci0:9:0:0: class=0x02 card=0x34ec8086 chip=0x10d38086 rev=0x00 hdr=0x00


  vendor = 'Intel Corporation'
  device = 'Intel 82574L Gigabit Ethernet Controller (82574L)'
  class  = network
  subclass   = ethernet
  cap 01[c8] = powerspec 2  supports D0 D3  current D0
  cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
  cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)

and

vmstat -i shows

irq256: em0                      5129063        353
irq257: em1                       531251         36

in a wedged state, stats look like

dev.em.1.%desc: Intel(R) PRO/1000 Network Connection 7.0.5
dev.em.1.%driver: em
dev.em.1.%location: slot=0 function=0 handle=\_SB_.PCI0.PEX4.HART
dev.em.1.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x8086 
subdevice=0x34ec class=0x02

dev.em.1.%parent: pci9
dev.em.1.nvm: -1
dev.em.1.rx_int_delay: 0
dev.em.1.tx_int_delay: 66
dev.em.1.rx_abs_int_delay: 66

Re: fetch: Non-recoverable resolver failure

2010-09-30 Thread Miroslav Lachman

Jeremy Chadwick wrote:

On Tue, Sep 28, 2010 at 10:59:04PM +0200, Miroslav Lachman wrote:

Jeremy Chadwick wrote:

On Tue, Sep 28, 2010 at 08:12:00PM +0200, Miroslav Lachman wrote:

Hi,

we are using fetch command from cron to run PHP scripts periodically
and sometimes cron sends error e-mails like this:

fetch: https://hiden.example.com/cron/fiveminutes: Non-recoverable
resolver failure


[...]


Note: the target domains are hosted on the server itself, and named runs there too.

The system is FreeBSD 7.3-RELEASE-p2 i386 GENERIC

Can somebody help me to diagnose this random fetch+resolver issue?


[...]


There is PF with some basic rules, mostly blocking incoming
packets, allowing all outgoing and scrubbing:

scrub in on bge1 all fragment reassemble
scrub out on bge1 all no-df random-id min-ttl 24 max-mss 1492 fragment reassemble

pass out on bge1 inet proto udp all keep state
pass out on bge1 inet proto tcp from 1.2.3.40 to any flags S/SA modulate state
pass out on bge1 inet proto tcp from 1.2.3.41 to any flags S/SA modulate state
pass out on bge1 inet proto tcp from 1.2.3.42 to any flags S/SA modulate state

modified PF options:

set timeout { frag 15, interval 5 }
set limit { frags 2500, states 5000 }
set optimization aggressive
set block-policy drop
set loginterface bge1
# Let loopback and internal interface traffic flow without restrictions
set skip on lo0


Please also provide pfctl -s info output, in addition to uname -a
output (you can hide the hostname), since the pf stack differs depending
on what FreeBSD version you're using.


# pfctl -s info
No ALTQ support in kernel
ALTQ related functions disabled
Status: Enabled for 32 days 11:31:02          Debug: Urgent

Interface Stats for bge1              IPv4             IPv6
  Bytes In                     37064314787                0
  Bytes Out                   279633869976                0
  Packets In
    Passed                       214057477                0
    Blocked                        1180125                0
  Packets Out
    Passed                       272266744                0
    Blocked                         128777                0

State Table                          Total             Rate
  current entries                      181
  searches                       518860439          184.9/s
  inserts                         16608172            5.9/s
  removals                        16607991            5.9/s
Counters
  match                           17951131            6.4/s
  bad-offset                             0            0.0/s
  fragment                              23            0.0/s
  short                                  0            0.0/s
  normalize                              4            0.0/s
  memory                                 0            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                             0            0.0/s
  ip-option                              0            0.0/s
  proto-cksum                         3095            0.0/s
  state-mismatch                     16707            0.0/s
  state-insert                           0            0.0/s
  state-limit                            0            0.0/s
  src-limit                              0            0.0/s
  synproxy                               0            0.0/s


uname:
7.3-RELEASE-p2 FreeBSD 7.3-RELEASE-p2 #0: Mon Jul 12 19:04:04 UTC 2010 
   r...@i386-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  i386



Things that catch my eye as potential problems -- I don't have a way to
confirm these are responsible for your issue (DNS resolver lookups are
UDP-based, not TCP), but I want to point them out anyway.

1) modulate state is broken on FreeBSD.  Taken from our pf.conf notes:

# Filtering (public interface only; see set skip)
#
# NOTE: Do not use modulate state, as it's known to be broken on FreeBSD.
# http://lists.freebsd.org/pipermail/freebsd-pf/2008-March/004227.html

2) optimization aggressive sounds dangerous given what pf.conf(5) says
about it.  I'd like to know what it considers idle.

3) I would also remove many of the options you have set in your scrub
out rule.  Starting with a clean slate to see if things improve is
probably a good idea.  As you'll see below, sometimes pf does things
which may be correct per IP specification but don't work quite right
with other vendors' IP stacks.

4) Your set timeout values look to be extreme.  I would recommend
leaving these at their defaults given your situation.

5) This feature is not in use in your pf.conf, but I want to point out
regardless.  reassemble tcp is also broken in some way.  Again taken
from our pf.conf notes:

# Normalization -- resolve/reduce traffic ambiguities.
#
# NOTE: Do NOT use 'reassemble tcp' as it definitely causes breakage.
# Issue may be related to other vendors' IP stacks, so let's leave it
# disabled.
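
The points above can be turned into a quick check of a ruleset. The following is only a rough Python sketch: the function name, the pattern list, and the wording of each note are made up for illustration, and it matches text patterns rather than parsing pf.conf(5) grammar. The sample input reuses rules quoted earlier in this thread.

```python
import re

# Constructs flagged in the message above. The notes paraphrase that message;
# they are not authoritative statements about pf.
FLAGGED = [
    (re.compile(r"\bmodulate\s+state\b"), "modulate state (reported broken on FreeBSD)"),
    (re.compile(r"\breassemble\s+tcp\b"), "reassemble tcp (reported to cause breakage)"),
    (re.compile(r"^\s*set\s+optimization\s+aggressive\b"), "aggressive optimization"),
    (re.compile(r"^\s*set\s+timeout\b"), "non-default timeouts"),
]


def check_pf_conf(text):
    """Return (line_number, note) pairs for every flagged construct found."""
    findings = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0]  # ignore pf.conf comments
        for pattern, note in FLAGGED:
            if pattern.search(line):
                findings.append((lineno, note))
    return findings


SAMPLE = """\
scrub in on bge1 all fragment reassemble
pass out on bge1 inet proto udp all keep state
pass out on bge1 inet proto tcp from 1.2.3.40 to any flags S/SA modulate state
set timeout { frag 15, interval 5 }
set optimization aggressive
"""

if __name__ == "__main__":
    for lineno, note in check_pf_conf(SAMPLE):
        print(f"line {lineno}: {note}")
```

Note that "fragment reassemble" in a scrub rule is deliberately not matched; only "reassemble tcp" is.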


Thank you for all your hints about PF! Maybe it's time to consider 
refactoring our standard pf.conf which was 

Re: resume slow on Thinkpad T42 FreeBSD 8-STABLE

2010-09-30 Thread John Baldwin
On Tuesday, September 28, 2010 3:57:01 pm Vitaly Magerya wrote:
 Jung-uk Kim wrote:
  - the mouse doesn't work until I restart moused manually
  
  I always use hint.psm.0.flags=0x6000 in /boot/loader.conf, i.e., 
  turn on both HOOKRESUME and INITAFTERSUSPEND, to work around similar 
  problem on different laptop.
 
 Yes, that helps (after the stall period).
 
  Can you please report other problems in the appropriate ML?
  
  em -   freebsd-net@
  usb -  freebsd-usb@
  acpi_ec -  freebsd-acpi@
 
 I will try to do so.
 
 I'm not sure about acpi_ec issue though; it's only a warning, and it
 doesn't cause me any troubles.
 
 I also have this kernel message once in a few hours (seemingly random)
 if I used sleep/resume before:
 
   MCA: Bank 1, Status 0xe20001f5
   MCA: Global Cap 0x0005, Status 0x
   MCA: Vendor GenuineIntel, ID 0x695, APIC ID 0
   MCA: CPU 0 UNCOR PCC OVER DCACHE L1 ??? error
 
 But once again, it doesn't really cause any problems.

A true uncorrected machine check would trigger a MC# fault and panic.  I think 
this is just garbage in the MCx banks.  Are you running the latest 8-stable?  
The change to reset the banks on resume was MFC'd in r210509 on July 26.

-- 
John Baldwin


Re: MCA messages in dmesg

2010-09-30 Thread John Baldwin
On Thursday, September 30, 2010 2:49:24 am Adam Vande More wrote:
 For a while now, my home server has been acting up.  Actually it had a bad
 set of RAM long ago; I replaced it and it worked fine.  It's been weird again
 now, and I've found this in dmesg:
 
 MCA: Bank 0, Status 0xf2000800
 MCA: Global Cap 0x0806, Status 0x
 MCA: Vendor GenuineIntel, ID 0x6fb, APIC ID 2
 MCA: CPU 2 UNCOR PCC OVER BUSL0 Source ERR Memory
 MCA: Bank 0, Status 0xf2000800
 MCA: Global Cap 0x0806, Status 0x
 MCA: Vendor GenuineIntel, ID 0x6fb, APIC ID 3
 MCA: CPU 3 UNCOR PCC OVER BUSL0 Source ERR Memory

Are you getting a panic when this happens?

-- 
John Baldwin


Re: resume slow on Thinkpad T42 FreeBSD 8-STABLE

2010-09-30 Thread Vitaly Magerya
John Baldwin wrote:
 A true uncorrected machine check would trigger a MC# fault and panic.  I think
 this is just garbage in the MCx banks.  Are you running the latest 8-stable?

No, 8.1-RELEASE.


Re: MCA messages in dmesg

2010-09-30 Thread Adam Vande More
On Thu, Sep 30, 2010 at 8:40 AM, John Baldwin j...@freebsd.org wrote:

 On Thursday, September 30, 2010 2:49:24 am Adam Vande More wrote:
  For a while now, my home server has been acting up.  Actually it had a bad
  set of RAM long ago; I replaced it and it worked fine.  It's been weird again
  now, and I've found this in dmesg:
 
  MCA: Bank 0, Status 0xf2000800
  MCA: Global Cap 0x0806, Status 0x
  MCA: Vendor GenuineIntel, ID 0x6fb, APIC ID 2
  MCA: CPU 2 UNCOR PCC OVER BUSL0 Source ERR Memory
  MCA: Bank 0, Status 0xf2000800
  MCA: Global Cap 0x0806, Status 0x
  MCA: Vendor GenuineIntel, ID 0x6fb, APIC ID 3
  MCA: CPU 3 UNCOR PCC OVER BUSL0 Source ERR Memory

 Are you getting a panic when this happens?


Its symptoms vary, but yes I think so.  The box is headless, so I depend on
logs after boot to see what happens.  Sometimes the box panics and powers
off with no warning, and other times it just seems to hit a stall state
where everything becomes unresponsive and I have to manually power off.

-- 
Adam Vande More


Re: resume slow on Thinkpad T42 FreeBSD 8-STABLE

2010-09-30 Thread John Baldwin
On Thursday, September 30, 2010 10:53:14 am Vitaly Magerya wrote:
 John Baldwin wrote:
  A true uncorrected machine check would trigger a MC# fault and panic.  I think
  this is just garbage in the MCx banks.  Are you running the latest 8-stable?
 
 No, 8.1-RELEASE.

Ok, that almost certainly explains it then.

-- 
John Baldwin


Re: MCA messages in dmesg

2010-09-30 Thread John Baldwin
On Thursday, September 30, 2010 12:33:24 pm Adam Vande More wrote:
 On Thu, Sep 30, 2010 at 8:40 AM, John Baldwin j...@freebsd.org wrote:
 
  On Thursday, September 30, 2010 2:49:24 am Adam Vande More wrote:
   For a while now, my home server has been acting up.  Actually it had a bad
   set of RAM long ago; I replaced it and it worked fine.  It's been weird again
   now, and I've found this in dmesg:
  
   MCA: Bank 0, Status 0xf2000800
   MCA: Global Cap 0x0806, Status 0x
   MCA: Vendor GenuineIntel, ID 0x6fb, APIC ID 2
   MCA: CPU 2 UNCOR PCC OVER BUSL0 Source ERR Memory
   MCA: Bank 0, Status 0xf2000800
   MCA: Global Cap 0x0806, Status 0x
   MCA: Vendor GenuineIntel, ID 0x6fb, APIC ID 3
   MCA: CPU 3 UNCOR PCC OVER BUSL0 Source ERR Memory
 
  Are you getting a panic when this happens?
 
 
 Its symptoms vary, but yes I think so.  The box is headless, so I depend on
 logs after boot to see what happens.  Sometimes the box panics and powers
 off with no warning, and other times it just seems to hit a stall state
 where everything becomes unresponsive and I have to manually power off.

Ok, it is a memory error of some sort, but mcelog claims it is a transaction
timeout rather than an ECC error, per se:

HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 2 BANK 0 
MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Level-0 Local-CPU-originated-request Generic Memory-access 
Request-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
STATUS f2000800 MCGSTATUS 0
MCGCAP 806 APICID 2 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 15
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 3 BANK 0 
MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Level-0 Local-CPU-originated-request Generic Memory-access 
Request-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
STATUS f2000800 MCGSTATUS 0
MCGCAP 806 APICID 3 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 15

I've no idea what specific hardware is busted (memory or motherboard or CPU),
but I suspect something is likely broken.
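
As a rough illustration of how the flag lines in the mcelog output above relate to the status value, here is a minimal Python sketch. The bit positions follow the IA32_MCi_STATUS layout documented in the Intel SDM; the table and function names are made up for this example. Because the status values in this archive appear to be truncated, only the top byte (0xf2) is used, shifted into bits 63..56, and the lower bits are left at zero as an assumption.

```python
# Flag bits in the upper part of IA32_MCi_STATUS (Intel SDM layout).
STATUS_FLAGS = [
    (63, "VAL", "status register holds valid information"),
    (62, "OVER", "error overflow"),
    (61, "UC", "uncorrected error"),
    (60, "EN", "error enabled"),
    (59, "MISCV", "MISC register valid"),
    (58, "ADDRV", "ADDR register valid"),
    (57, "PCC", "processor context corrupt"),
]


def decode_status_flags(status):
    """Return the names of the flag bits set in a 64-bit MCi_STATUS value."""
    return [name for bit, name, _ in STATUS_FLAGS if status & (1 << bit)]


if __name__ == "__main__":
    # Assumption: only the top byte is taken from the report; low bits are zero.
    example = 0xF2 << 56
    for bit, name, meaning in STATUS_FLAGS:
        if example & (1 << bit):
            print(f"bit {bit}: {name} ({meaning})")
```

For a top byte of 0xf2 this reports VAL, OVER, UC, EN and PCC, which lines up with the "Error overflow", "Uncorrected error", "Error enabled" and "Processor context corrupt" lines printed by mcelog.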

-- 
John Baldwin


Re: MCA messages in dmesg

2010-09-30 Thread Adam Vande More
On Thu, Sep 30, 2010 at 12:25 PM, John Baldwin j...@freebsd.org wrote:

 Ok, it is a memory error of some sort, but mcelog claims it is a
 transaction
 timeout rather than an ECC error, per se:
 snip

 I've no idea what specific hardware is busted (memory or motherboard or
 CPU),
 but I suspect something is likely broken.


Thanks for looking into it, I'm going to play around with BIOS voltages to
see if I can achieve some stability since I don't have much to lose trying
that first.  The system may work fine for a week or more, then have a really
bad day.  I've raised the CPU voltage and we'll see how that goes.


-- 
Adam Vande More


Re: Problem running 8.1R on KVM with AMD hosts

2010-09-30 Thread Luke Marsden
Hi FreeBSD-stable,

  1. Please, build your kernel with debug symbols.
  2. Show kgdb output

I could not convince the kernel to dump (it was looping forever but not
panicking), but I have managed to compile a kernel with debugging
symbols and DDB which immediately drops into the debugger when the
problem occurs, see screenshot at:

http://lukemarsden.net/kvm-panic.png

Progress, I sense.

I tried typing 'panic' on the understanding that this should force a
panic and cause it to dump core to the configured swap device (I have
set dump* in /etc/rc.conf) so that I could get you the kgdb output, but
it just looped back into the debugger. This issue seems to occur very
early in the boot process.

I would like to invite anyone with the skills and the inclination to
have a poke around with this directly over VNC to email me off-list and
I will turn on the VM and send you the VNC credentials. My email address
is: luke [at] hybrid-logic.co.uk

Or you can catch me on Skype at luke.marsden. I'm in GMT+1.

I look forward to hearing from you ;-)

-- 
Best Regards,
Luke Marsden
Hybrid Logic Ltd.

Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting based on FreeBSD and ZFS

Mobile: +447791750420




Re: Problem running 8.1R on KVM with AMD hosts

2010-09-30 Thread Jung-uk Kim
On Thursday 30 September 2010 02:57 pm, Luke Marsden wrote:
 Hi FreeBSD-stable,

   1. Please, build your kernel with debug symbols.
   2. Show kgdb output

 I could not convince the kernel to dump (it was looping forever but
 not panicking), but I have managed to compile a kernel with
 debugging symbols and DDB which immediately drops into the debugger
 when the problem occurs, see screenshot at:

 http://lukemarsden.net/kvm-panic.png

It seems MCA capability is advertised by the CPUID translator but
writing to the MSRs causes a GPF.  In other words, it seems like a CPU
emulator bug.  A simple workaround is 'set hw.mca.enabled=0' from the
loader prompt.  If it works, add hw.mca.enabled=0 to /boot/loader.conf
to make it permanent.  MCA does not make any sense in emulation anyway.
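
For anyone scripting the permanent part of this workaround across several guests, a small Python sketch of an idempotent edit follows. The helper name is hypothetical, it works on the file contents as a string rather than touching /boot/loader.conf itself, and it writes the value in the quoted name="value" form conventionally used in loader.conf, which is an assumption beyond what the message above spells out.

```python
def set_tunable(conf_text, name, value):
    """Return conf_text with exactly one active name="value" line.

    An existing assignment to the same name is replaced in place, duplicate
    assignments are dropped, and commented-out lines are left untouched.
    """
    new_line = f'{name}="{value}"'
    out = []
    replaced = False
    for line in conf_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key == name:
            if not replaced:
                out.append(new_line)
                replaced = True
            continue  # drop any later duplicate assignment
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    before = 'autoboot_delay="3"\n'
    print(set_tunable(before, "hw.mca.enabled", 0), end="")
```

Running the helper a second time on its own output leaves the text unchanged, so it is safe to apply repeatedly.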

Jung-uk Kim


Re: Problem running 8.1R on KVM with AMD hosts

2010-09-30 Thread Jeremy Chadwick
On Thu, Sep 30, 2010 at 07:57:51PM +0100, Luke Marsden wrote:
 Hi FreeBSD-stable,
 
   1. Please, build your kernel with debug symbols.
   2. Show kgdb output
 
 I could not convince the kernel to dump (it was looping forever but not
 panicking), but I have managed to compile a kernel with debugging
 symbols and DDB which immediately drops into the debugger when the
 problem occurs, see screenshot at:
 
 http://lukemarsden.net/kvm-panic.png
 
 Progress, I sense.
 
 I tried typing 'panic' on the understanding that this should force a
 panic and cause it to dump core to the configured swap device (I have
 set dump* in /etc/rc.conf) so that I could get you the kgdb output, but
 it just looped back into the debugger.

Try call doadump instead.

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: mysqld_safe holding open a pty/tty on FreeBSD (7.x and 8.x)

2010-09-30 Thread Paul Mather
On Sep 30, 2010, at 3:56 AM, Jeremy Chadwick wrote:

 The diff is pretty obvious/simple (2 line change), so the other
 databases/mysqlXX-server ports can be upgraded in the same manner.
 
 --- files/mysql-server.sh.in.orig 2010-03-27 03:24:53.0 -0700
 +++ files/mysql-server.sh.in  2010-09-30 00:45:38.0 -0700
 @@ -35,8 +35,8 @@
 mysql_user=mysql
 mysql_limits_args=-e -U ${mysql_user}
 pidfile=${mysql_dbdir}/`/bin/hostname`.pid
 -command=%%PREFIX%%/bin/mysqld_safe
 -command_args=--defaults-extra-file=${mysql_dbdir}/my.cnf --user=${mysql_user} --datadir=${mysql_dbdir} --pid-file=${pidfile} ${mysql_args} > /dev/null 2>&1 &
 +command=/usr/sbin/daemon
 +command_args=-c -f /usr/local/bin/mysqld_safe --defaults-extra-file=${mysql_dbdir}/my.cnf --user=${mysql_user} --datadir=${mysql_dbdir} --pid-file=${pidfile} ${mysql_args}

Shouldn't this be -c -f %%PREFIX%%/bin/mysqld_safe ... rather than 
hard-coding /usr/local?

 procname=%%PREFIX%%/libexec/mysqld
 start_precmd=${name}_prestart
 start_postcmd=${name}_poststart
 
 -- 
 | Jeremy Chadwick   j...@parodius.com |
 | Parodius Networking   http://www.parodius.com/ |
 | UNIX Systems Administrator  Mountain View, CA, USA |
 | Making life hard for others since 1977.  PGP: 4BD6C0CB |

Cheers,

Paul.


Re: Problem running 8.1R on KVM with AMD hosts

2010-09-30 Thread Luke Marsden
On Thu, 2010-09-30 at 18:55 -0400, Jung-uk Kim wrote:
 It seems MCA capability is advertised by the CPUID translator but
 writing to the MSRs causes a GPF.  In other words, it seems like a CPU
 emulator bug.  A simple workaround is 'set hw.mca.enabled=0' from the
 loader prompt.  If it works, add hw.mca.enabled=0 to /boot/loader.conf
 to make it permanent.  MCA does not make any sense in emulation anyway.

Awesome, this allows us to boot 8.1R on Linux KVM with AMD hardware!

Thank you very much. This has just doubled our number of availability
zones.

-- 
Best Regards,
Luke Marsden
Hybrid Logic Ltd.

Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting based on FreeBSD and ZFS

Mobile: +447791750420



Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime

2010-09-30 Thread Don Lewis
On 30 Sep, Andriy Gapon wrote:
 on 30/09/2010 02:27 Don Lewis said the following:

 I tried enabling apic and got worse results.  I saw ping RTTs as high as
 67 seconds.  Here's the timer info with apic enabled:
[snip]
 Here's the verbose boot info with apic:
 http://people.freebsd.org/~truckman/AN-M2_HD-8.1-STABLE-apic-verbose.txt
 
 vmstat -i ?

Here's the vmstat -i output at the time the machine starts experiencing
freezes and ntp goes insane:

Thu Sep 30 11:38:57 PDT 2010
interrupt                      total       rate
irq1: atkbd0                       6          0
irq9: acpi0                       10          0
irq12: psm0                       18          0
irq14: ata0                     2845          1
irq17: ahc0                      310          0
irq19: fwohci0                     1          0
irq22: ehci0+                  74628         40
cpu0: timer                  3676399       1999
irq256: nfe0                    3915          2
Total                        3758132       2043
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*gw.catspoiler.o .GPS.            1 u  129  128  377    0.185   -0.307   0.020

Thu Sep 30 11:39:59 PDT 2010
interrupt                      total       rate
irq1: atkbd0                       6          0
irq9: acpi0                       10          0
irq12: psm0                       18          0
irq14: ata0                     2935          1
irq17: ahc0                      310          0
irq19: fwohci0                     1          0
irq22: ehci0+                  78954         41
cpu0: timer                  3796447       1998
irq256: nfe0                    4090          2
Total                        3882771       2043
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*gw.catspoiler.o .GPS.            1 u   61  128  377    0.185   -0.307   0.023

Thu Sep 30 11:40:59 PDT 2010
interrupt                      total       rate
irq1: atkbd0                       6          0
irq9: acpi0                       10          0
irq12: psm0                       18          0
irq14: ata0                     3025          1
irq17: ahc0                      310          0
irq19: fwohci0                     1          0
irq22: ehci0+                  85038         43
cpu0: timer                  3916483       1998
irq256: nfe0                    4247          2
Total                        4009138       2045
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*gw.catspoiler.o .GPS.            1 u  121  128  377    0.185   -0.307   0.023

Thu Sep 30 11:41:59 PDT 2010
interrupt                      total       rate
irq1: atkbd0                       6          0
irq9: acpi0                       10          0
irq12: psm0                       18          0
irq14: ata0                     3115          1
irq17: ahc0                      310          0
irq19: fwohci0                     1          0
irq22: ehci0+                  89099         44
cpu0: timer                  4036529       1998
irq256: nfe0                    4384          2
Total                        4133472       2046
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*gw.catspoiler.o .GPS.            1 u   54  128  377    0.185   -0.307 43008.9

Thu Sep 30 11:42:59 PDT 2010
interrupt                      total       rate
irq1: atkbd0                       6          0
irq9: acpi0                       11          0
irq12: psm0                       18          0
irq14: ata0                     3205          1
irq17: ahc0                      310          0
irq19: fwohci0                     1          0
irq22: ehci0+                  92111         44
cpu0: timer                  4156575       1998
irq256: nfe0                    4421          2
Total                        4256658       2046
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*gw.catspoiler.o .GPS.            1 u  114  128  377    0.185   -0.307 43008.9

Thu Sep 30 11:43:59 PDT 2010
interrupt                      total       rate
irq1: atkbd0                       6          0
irq9: acpi0                       12          0
irq12: psm0                       18          0
irq14: ata0                     3295          1
irq17: ahc0                      310          0
irq19: