Re: 7.1-RELEASE I/O hang
On Thu, Feb 05, 2009 at 12:46:23PM +, Matt Burke wrote:
> Kostik Belousov wrote:
> > > > Compile ddb into the kernel, and do ps from the ddb prompt.
> > > > If there are processes hung in the nbufkv state, then the
> > > > patch below might help.
> > >
> > > The bonnie++ processes are in state newbuf and other hung
> > > processes (bash, newly forked sshds, etc) appear to be in the
> > > ufs state.
> >
> > What is the state of the bufdaemon process ?
>
> qsleep

Please, increase the value that is assigned to the target variable in
the line 2193 of the patched sys/kern/vfs_bio.c from 1 to, say, 10 or
100.
Re: 7.1-RELEASE I/O hang
Kostik Belousov wrote:
> Compile ddb into the kernel, and do ps from the ddb prompt. If there
> are processes hung in the nbufkv state, then the patch below might
> help.

The bonnie++ processes are in state newbuf and other hung processes
(bash, newly forked sshds, etc) appear to be in the ufs state.

The patch appears to have no effect, although at the last hang I did
see one of the bonnie++ processes in nbufkv state. This could be
coincidental.

The problem also exhibits itself when running a parallel bonnie++ on a
single array, both with the onboard PERC6/i and the PERC6/e. I have no
access to other controllers.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: 7.1-RELEASE I/O hang
On Thu, Feb 05, 2009 at 11:26:58AM +, Matt Burke wrote:
> Kostik Belousov wrote:
> > Compile ddb into the kernel, and do ps from the ddb prompt. If
> > there are processes hung in the nbufkv state, then the patch below
> > might help.
>
> The bonnie++ processes are in state newbuf and other hung processes
> (bash, newly forked sshds, etc) appear to be in the ufs state.

What is the state of the bufdaemon process ?

> The patch appears to have no effect, although at the last hang I did
> see one of the bonnie++ processes in nbufkv state. This could be
> coincidental.
>
> The problem also exhibits itself when running a parallel bonnie++ on
> a single array, both with the onboard PERC6/i and the PERC6/e. I have
> no access to other controllers.
Re: 7.1-RELEASE I/O hang
Kostik Belousov wrote:
> > > Compile ddb into the kernel, and do ps from the ddb prompt. If
> > > there are processes hung in the nbufkv state, then the patch
> > > below might help.
> >
> > The bonnie++ processes are in state newbuf and other hung
> > processes (bash, newly forked sshds, etc) appear to be in the ufs
> > state.
>
> What is the state of the bufdaemon process ?

qsleep
Re: 7.1-RELEASE I/O hang
On Wed, Feb 04, 2009 at 12:46:53PM +, Matt Burke wrote:
> I have a machine with a PERC6/e controller. Attached to that are 3
> disk shelves, each configured as individual 14-disk RAID10 arrays
> (the PERC annoyingly only lets you use 8 spans per array)
>
> I can run bonnie++ on the arrays individually with no problem.
> I can also run it across a gstripe of the arrays with no problem.
>
> However running it over the 3 arrays in parallel causes something I/O
> related in the kernel to hang.
>
> To define 'hang' better:
>
> It appears anything which needs disk io, even on a different
> controller (albeit the same mfi driver), will hang. A command like
> 'ps' cached in ram will work but bash hangs after execution,
> presumably while trying to write ~/.bash_history
>
> 'sysctl -a' works but trying to run 'sysctl kern.msgbuf' also hangs
>
> I've done some research and it seems the usual cause of bonnie++
> crashing a system is due to overflowing TCQ. camcontrol doesn't see
> any disks, so I've tried setting hw.mfi.max_cmds=32 in
> /boot/loader.conf but it hadn't made any difference.
>
> The bonnie++ invocation is this:
>
> (newfs devices mfid[2-3], mount)
> bonnie++ -s 64g -u root -p3
> bonnie++ -d /data/2 -s 64g -u root -y s > b2 2>&1 &
> bonnie++ -d /data/3 -s 64g -u root -y s > b3 2>&1 &
> bonnie++ -d /data/4 -s 64g -u root -y s > b4 2>&1 &
>
> and it always hangs on Rewriting
>
> It's a fresh 7.1-RELEASE with nothing else running (devd, sshd,
> syslogd, etc)
>
> Any ideas?

Compile ddb into the kernel, and do ps from the ddb prompt. If there
are processes hung in the nbufkv state, then the patch below might
help.
Index: gnu/fs/xfs/FreeBSD/xfs_buf.c
===================================================================
--- gnu/fs/xfs/FreeBSD/xfs_buf.c	(revision 188080)
+++ gnu/fs/xfs/FreeBSD/xfs_buf.c	(working copy)
@@ -81,7 +81,7 @@
 {
 	struct buf *bp;
 
-	bp = geteblk(0);
+	bp = geteblk(0, 0);
 	if (bp != NULL) {
 		bp->b_bufsize = size;
 		bp->b_bcount = size;
@@ -101,7 +101,7 @@
 	if (len >= MAXPHYS)
 		return (NULL);
 
-	bp = geteblk(len);
+	bp = geteblk(len, 0);
 	if (bp != NULL) {
 		KASSERT(BUF_REFCNT(bp) == 1,
 			("xfs_buf_get_empty: bp %p not locked",bp));
Index: ufs/ffs/ffs_vfsops.c
===================================================================
--- ufs/ffs/ffs_vfsops.c	(revision 188080)
+++ ufs/ffs/ffs_vfsops.c	(working copy)
@@ -1747,7 +1747,9 @@
 		    ("bufwrite: needs chained iodone (%p)", bp->b_iodone));
 
 		/* get a new block */
-		newbp = geteblk(bp->b_bufsize);
+		newbp = geteblk(bp->b_bufsize, GB_NOWAIT_BD);
+		if (newbp == NULL)
+			goto normal_write;
 
 		/*
 		 * set it to be identical to the old block.  We have to
@@ -1787,6 +1789,7 @@
 	}
 
 	/* Let the normal bufwrite do the rest for us */
+normal_write:
 	return (bufwrite(bp));
 }
 
Index: kern/vfs_bio.c
===================================================================
--- kern/vfs_bio.c	(revision 188080)
+++ kern/vfs_bio.c	(working copy)
@@ -105,7 +105,8 @@
 static void vfs_vmio_release(struct buf *bp);
 static int vfs_bio_clcheck(struct vnode *vp, int size,
 		daddr_t lblkno, daddr_t blkno);
-static int flushbufqueues(int, int);
+static int buf_do_flush(struct vnode *vp);
+static int flushbufqueues(struct vnode *, int, int);
 static void buf_daemon(void);
 static void bremfreel(struct buf *bp);
 
@@ -258,6 +259,7 @@
 #define QUEUE_DIRTY_GIANT 3	/* B_DELWRI buffers that need giant */
 #define QUEUE_EMPTYKVA	4	/* empty buffer headers w/KVA assignment */
 #define QUEUE_EMPTY	5	/* empty buffer headers */
+#define QUEUE_SENTINEL	1024	/* not an queue index, but mark for sentinel */
 
 /* Queues for free buffers with various properties */
 static TAILQ_HEAD(bqueues, buf) bufqueues[BUFFER_QUEUES] = { { 0 } };
@@ -1703,21 +1705,23 @@
  */
 
 static struct buf *
-getnewbuf(int slpflag, int slptimeo, int size, int maxsize)
+getnewbuf(struct vnode *vp, int slpflag, int slptimeo, int size, int maxsize,
+    int gbflags)
 {
+	struct thread *td;
 	struct buf *bp;
 	struct buf *nbp;
 	int defrag = 0;
 	int nqindex;
 	static int flushingbufs;
 
+	td = curthread;
 	/*
 	 * We can't afford to block since we might be holding a vnode lock,
 	 * which may prevent system daemons from running.  We deal with
 	 * low-memory situations by proactively returning memory and running
 	 * async I/O rather then sync I/O.
 	 */
-
 	atomic_add_int(&getnewbufcalls, 1);
 	atomic_subtract_int(&getnewbufrestarts, 1);
 restart:
@@ -1949,8 +1953,9 @@
 	 */
 
 	if (bp == NULL) {
-		int flags;
+		int flags, norunbuf;
 		char *waitmsg;
+		int
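
For readers following the ffs_vfsops.c part of the patch: as that hunk
reads, geteblk() called with GB_NOWAIT_BD may return NULL, and the
caller then jumps to normal_write and lets plain bufwrite() handle the
buffer instead of waiting for a copy. Below is a minimal user-space
sketch of that try-then-fall-back pattern. It is only an illustration,
not kernel code: struct pool, try_get_buf() and write_block() are
hypothetical names invented here, and the pool counter merely stands in
for "a buffer is available without sleeping".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a buffer pool; not a kernel structure. */
struct pool {
	size_t free_bufs;	/* buffers obtainable without waiting */
};

/* Which path a write ended up taking. */
enum write_path { WRITE_VIA_COPY, WRITE_DIRECT };

/* Take a buffer without sleeping; NULL means none is available now. */
static void *
try_get_buf(struct pool *p, size_t size)
{
	if (p->free_bufs == 0)
		return (NULL);
	p->free_bufs--;
	return (malloc(size));
}

/*
 * Prefer writing through a private copy, but never wait for one:
 * when no buffer can be had immediately, write the original directly.
 */
static enum write_path
write_block(struct pool *p, const char *data, size_t size)
{
	void *copy;

	assert(data != NULL);
	copy = try_get_buf(p, size);
	if (copy == NULL)
		goto normal_write;

	memcpy(copy, data, size);
	/* A real implementation would queue the copy for async I/O here. */
	free(copy);
	return (WRITE_VIA_COPY);

normal_write:
	return (WRITE_DIRECT);
}
```

With a pool holding a single buffer, the first write_block() call takes
the copy path and the second one falls back to the direct path rather
than blocking, which is the behaviour the patch is aiming for when
buffers are scarce.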
7.1-RELEASE I/O hang
I have a machine with a PERC6/e controller. Attached to that are 3 disk
shelves, each configured as individual 14-disk RAID10 arrays (the PERC
annoyingly only lets you use 8 spans per array)

I can run bonnie++ on the arrays individually with no problem.
I can also run it across a gstripe of the arrays with no problem.

However running it over the 3 arrays in parallel causes something I/O
related in the kernel to hang.

To define 'hang' better:

It appears anything which needs disk io, even on a different controller
(albeit the same mfi driver), will hang. A command like 'ps' cached in
ram will work but bash hangs after execution, presumably while trying
to write ~/.bash_history

'sysctl -a' works but trying to run 'sysctl kern.msgbuf' also hangs

I've done some research and it seems the usual cause of bonnie++
crashing a system is due to overflowing TCQ. camcontrol doesn't see any
disks, so I've tried setting hw.mfi.max_cmds=32 in /boot/loader.conf
but it hadn't made any difference.

The bonnie++ invocation is this:

(newfs devices mfid[2-3], mount)
bonnie++ -s 64g -u root -p3
bonnie++ -d /data/2 -s 64g -u root -y s > b2 2>&1 &
bonnie++ -d /data/3 -s 64g -u root -y s > b3 2>&1 &
bonnie++ -d /data/4 -s 64g -u root -y s > b4 2>&1 &

and it always hangs on Rewriting

It's a fresh 7.1-RELEASE with nothing else running (devd, sshd,
syslogd, etc)

Any ideas?