[EMAIL PROTECTED] wrote:
> Hi all,
>
> I am seeking information about what this and other similar messages
> mean, and corrective action to take. At the time of the error
> message, the machine spontaneously rebooted (apparently without a
> panic) and came back with a corrupt /var filesystem (for which fsck
> required manual intervention to recover).
>
> The machine is a dual Xeon ASUS NCCH-DL board with 4 GB of RAM,
> running 6.0 STABLE Thu Dec 22 18:24:2005, and has otherwise been
> reliable. The machine was placed into test as a secondary mail
> server, seeded with dictionary-attack accounts and allowed to collect
> UCE and ratware at will, as a test for SpamAssassin and MIMEDefang.
> (Also makes a good test for a pf-spamd teergrube.)
>
> md2 is a 512 MB memory disk mounted on /var/spool/MIMEDefang, to allow
> quick scanning with less hardware disk IO. The main hardware drive
> controller is a 3ware 4 port SATA controller in raid mirror mode.
>
> Googling on this vfs_done() seems to show various similar requests
> for information related to other circumstances but no parseable
> responses. (I don't *think* md2 was ever *full*.) I can read code..
> but.. Geez, filesystem code... Echh. Clue-stick -> manpage welcome
> here. Thanks.
>
> Feb 8 13:48:59 testbed kernel: g_vfs_done():md2[WRITE(offset=434585600, length=131072)]error = 28
> Feb 8 13:48:59 testbed kernel: g_vfs_done():md2[WRITE(offset=434716672, length=131072)]error = 28
> Feb 8 13:48:59 testbed kernel: g_vfs_done():md2[WRITE(offset=434847744, length=131072)]error = 28
> Feb 8 13:48:59 testbed kernel: g_vfs_done():md2[WRITE(offset=434978816, length=131072)]error = 28
> Feb 8 13:48:59 testbed kernel: g_vfs_done():md2[WRITE(offset=435109888, length=131072)]error = 28
> Feb 8 13:48:59 testbed kernel: g_vfs_done():md2[WRITE(offset=435240960, length=131072)]error = 28
>
Did you get a kernel dump after the reboot? If you did, and you
generated a backtrace as described here:

   http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html

I reckon you'd see that it panic'd with 'kmem_map too small':

   #0  doadump () at pcpu.h:165
   #1  0xc063ce7f in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
   #2  0xc063d1a5 in panic (fmt=0xc0888692 "kmem_malloc(%ld): kmem_map too small: %ld total allocated") at /usr/src/sys/kern/kern_shutdown.c:555
   #3  0xc07aa349 in kmem_malloc (map=0xc10600c0, size=16384, flags=1026) at /usr/src/sys/vm/vm_kern.c:299
   #4  0xc07a1c72 in page_alloc (zone=0x0, bytes=16384, pflag=0x0, wait=1026) at /usr/src/sys/vm/uma_core.c:957
   [etc...]

It's a bug -- the VM system seems to starve the memory disk of pages,
causing a crash. See the example given at the end of

   http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/87255

The ultimate cause would be running a bunch of memory-hungry programs
and running out of memory for both them and the malloc-backed memory
filesystem. See mdconfig(8) -- as it says:

   malloc  Storage for this type of memory disk is allocated with
           malloc(9). This limits the size to the malloc bucket limit
           in the kernel. If the -o reserve option is not set, creating
           and filling a large malloc-backed memory disk is a very easy
           way to panic a system.

Hence using '-o reserve' looks like a very good thing to try.
Alternatively use a swap-backed memory disk, or don't use a memory disk
at all.

        Cheers,

        Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.                       Flat 3
                                                      7 Priory Courtyard
PGP: http://www.infracaninophile.co.uk/pgpkey         Ramsgate
                                                      Kent, CT11 9PW, UK
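
As for what the quoted messages themselves say: each g_vfs_done() line names the device (md2), the operation, the byte offset and length of the request that failed, and an errno value. Error 28 is ENOSPC ("No space left on device"). For a malloc-backed md device it seems reasonable to read that as the kernel failing to find memory for the disk's sectors rather than the filesystem being full, though that reading is an assumption and has not been checked against the md(4) source. A minimal Python sketch follows; parse() is a hypothetical helper, and it is fed two of the log lines from the report:

```python
import errno
import os
import re

# Two of the kernel messages from the report, one message per line.
LOG = """\
Feb 8 13:48:59 testbed kernel: g_vfs_done():md2[WRITE(offset=434585600, length=131072)]error = 28
Feb 8 13:48:59 testbed kernel: g_vfs_done():md2[WRITE(offset=435240960, length=131072)]error = 28
"""

# Matches the device, operation, offset, length and errno of one message.
PATTERN = re.compile(
    r"g_vfs_done\(\):(?P<dev>\w+)\[(?P<op>\w+)\(offset=(?P<offset>\d+), "
    r"length=(?P<length>\d+)\)\]error = (?P<error>\d+)"
)


def parse(log):
    """Return one dict per g_vfs_done() message found in the text."""
    records = []
    for match in PATTERN.finditer(log):
        rec = match.groupdict()
        for key in ("offset", "length", "error"):
            rec[key] = int(rec[key])
        # Symbolic errno name as known to the machine running this script.
        rec["errname"] = errno.errorcode.get(rec["error"], "unknown")
        records.append(rec)
    return records


if __name__ == "__main__":
    for rec in parse(LOG):
        print(
            f'{rec["dev"]} {rec["op"]} at {rec["offset"] / 2**20:.1f} MiB, '
            f'{rec["length"] // 1024} KiB: '
            f'{rec["errname"]} ({os.strerror(rec["error"])})'
        )
```

The offsets put the failed writes well inside the 512 MB device, which fits with the observation that md2 was probably never full.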