[EMAIL PROTECTED] wrote:
> Hi all,
>
> I am seeking information about what this and other similar messages
> mean, and corrective action to take. At the time of the error
> message, the machine spontaneously rebooted (apparently without a
> panic) and came back with a corrupt /var filesystem (for which fsck
> required manual intervention to recover).
>
> The machine is a dual Xeon ASUS NCCH-DL board with 4 GB of RAM,
> running 6.0 STABLE Thu Dec 22 18:24:2005, and has otherwise been
> reliable. The machine was placed into test as a secondary mail
> server, seeded with dictionary-attack accounts and allowed to collect
> UCE and ratware at will, as a test for SpamAssassin and MIMEDefang.
> (Also makes a good test for a pf-spamd teergrube.)
>
> md2 is a 512 MB memory disk mounted on /var/spool/MIMEDefang, to allow
> quick scanning with less hardware disk I/O. The main hardware drive
> controller is a 3ware 4-port SATA controller in RAID mirror mode.
>
> Googling on this vfs_done() seems to show various similar requests
> for information related to other circumstances but no parseable
> responses. (I don't *think* md2 was ever *full*.) I can read code..
> but.. Geez, filesystem code... Echh. Clue-stick -> manpage welcome
> here. Thanks.
>
> Feb 8 13:48:59 testbed kernel: g_vfs_done():md2[WRITE(offset=434585600, length=131072)]error = 28
> Feb 8 13:48:59 testbed kernel: g_vfs_done():md2[WRITE(offset=434716672, length=131072)]error = 28
> Feb 8 13:48:59 testbed kernel: g_vfs_done():md2[WRITE(offset=434847744, length=131072)]error = 28
> Feb 8 13:48:59 testbed kernel: g_vfs_done():md2[WRITE(offset=434978816, length=131072)]error = 28
> Feb 8 13:48:59 testbed kernel: g_vfs_done():md2[WRITE(offset=435109888, length=131072)]error = 28
> Feb 8 13:48:59 testbed kernel: g_vfs_done():md2[WRITE(offset=435240960, length=131072)]error = 28
>
Did you get a kernel dump after the reboot? If you did, and you generated a
backtrace as described here:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html
I reckon you'd see that it panicked with 'kmem_map too small':
#0 doadump () at pcpu.h:165
#1 0xc063ce7f in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
#2 0xc063d1a5 in panic (fmt=0xc0888692 "kmem_malloc(%ld): kmem_map too small: %ld total allocated")
    at /usr/src/sys/kern/kern_shutdown.c:555
#3 0xc07aa349 in kmem_malloc (map=0xc10600c0, size=16384, flags=1026)
    at /usr/src/sys/vm/vm_kern.c:299
#4 0xc07a1c72 in page_alloc (zone=0x0, bytes=16384, pflag=0x0, wait=1026)
    at /usr/src/sys/vm/uma_core.c:957
[etc...]
It's a bug -- the VM system seems to starve the memory disk of pages, causing
a crash. See the example given at the end of
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/87255
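For what it's worth, error = 28 in those g_vfs_done() lines is ENOSPC
("No space left on device"), but g_vfs_done() reports an error handed back
by the md device itself, not by the filesystem's free-space accounting. A
rough sanity check in sh, assuming md2 really is the 512 MB described in the
original post (the variable names are mine), suggests the failed writes all
fall inside the device. That would fit md failing to get pages for its
backing store rather than the disk being full:

```sh
#!/bin/sh
# Offsets and length copied from the quoted g_vfs_done() log lines.
first_offset=434585600
last_offset=435240960
length=131072

# Assumption: md2 is exactly 512 MB, as stated in the original post.
disk_bytes=$((512 * 1024 * 1024))

# Byte position just past the end of the last failed write.
last_end=$((last_offset + length))

echo "first failed write starts at $((first_offset / 1048576)) MiB"
echo "last failed write ends at byte $last_end of $disk_bytes"

if [ "$last_end" -le "$disk_bytes" ]; then
    echo "all failed writes lie inside the device"
else
    echo "some failed writes run past the end of the device"
fi
```

This only checks the arithmetic on the logged offsets; it says nothing about
how full the filesystem on md2 was at the time.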
The ultimate cause would be running a bunch of memory-hungry programs and
running out of memory for both them and the malloc-backed memory
filesystem. See mdconfig(8) -- as it says:
    malloc   Storage for this type of memory disk is allocated with
             malloc(9). This limits the size to the malloc bucket
             limit in the kernel. If the -o reserve option is not
             set, creating and filling a large malloc-backed memory
             disk is a very easy way to panic a system.
Hence using '-o reserve' looks like a very good thing to try. Alternatively
use a swap-backed memory disk, or don't use a memory disk at all.
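A minimal sketch of both alternatives follows. The unit number, size and
mount point come from the original post; the newfs -U step and the run and
setup_md helper names are my assumptions, and the existing md2 would have to
be unmounted and detached (mdconfig -d -u 2) first. The script only prints
the commands unless DRY_RUN=0 is set:

```sh
#!/bin/sh
# Sketch: recreate the 512 MB memory disk with backing store that is
# either reserved up front (malloc + reserve) or swap-backed.
# Prints the commands by default; run with DRY_RUN=0 as root to execute.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 is the backing type: "malloc" (with -o reserve) or "swap".
setup_md() {
    case "$1" in
        malloc) run mdconfig -a -t malloc -o reserve -s 512m -u 2 || return 1 ;;
        swap)   run mdconfig -a -t swap -s 512m -u 2 || return 1 ;;
        *)      echo "usage: setup_md malloc|swap" >&2; return 1 ;;
    esac
    # Assumption: a UFS filesystem with soft updates, mounted where
    # the original post had it.
    run newfs -U /dev/md2 && run mount /dev/md2 /var/spool/MIMEDefang
}

setup_md "${1:-swap}"
```

With -o reserve the storage is allocated when the disk is created, so if the
kernel cannot supply 512 MB I would expect mdconfig to fail at that point
instead of the machine panicking later under load. I've defaulted the sketch
to the swap-backed type, which is not subject to the malloc bucket limit
quoted above.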
Cheers,
Matthew
--
Dr Matthew J Seaman MA, D.Phil. Flat 3
7 Priory Courtyard
PGP: http://www.infracaninophile.co.uk/pgpkey Ramsgate
Kent, CT11 9PW, UK