Re: 7.1-RELEASE I/O hang

2009-02-06 Thread Kostik Belousov
On Thu, Feb 05, 2009 at 12:46:23PM +, Matt Burke wrote:
 Kostik Belousov wrote:
  Compile ddb into the kernel, and do ps from the ddb prompt. If there
  are processes hung in the nbufkv state, then the patch below might
  help.
  The bonnie++ processes are in state newbuf and other hung processes
  (bash, newly forked sshds, etc) appear to be in the ufs state.
  What is the state of the bufdaemon process?
 
 qsleep

Please increase the value that is assigned to the target variable at line
2193 of the patched sys/kern/vfs_bio.c from 1 to, say, 10 or 100.
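
For illustration, the kind of change being suggested would look roughly like
this. This is only a sketch, not the actual patched source: it assumes the
patched flushbufqueues() in vfs_bio.c assigns target = 1 near the cited line,
since the relevant part of the patched file is not shown in this thread.

--- sys/kern/vfs_bio.c	(patched, near line 2193)
+++ sys/kern/vfs_bio.c	(with the suggested tuning)
@@ flushbufqueues() @@
-	target = 1;
+	target = 10;	/* or 100 */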




Re: 7.1-RELEASE I/O hang

2009-02-05 Thread Matt Burke
Kostik Belousov wrote:
 Compile ddb into the kernel, and do ps from the ddb prompt. If there
 are processes hung in the nbufkv state, then the patch below might
 help.

The bonnie++ processes are in state newbuf and other hung processes
(bash, newly forked sshds, etc) appear to be in the ufs state.

The patch appears to have no effect, although at the last hang I did see
one of the bonnie++ processes in the nbufkv state. This could be coincidental.


The problem also exhibits itself when running a parallel bonnie++ on a
single array, both with the onboard PERC6/i and the PERC6/e. I have no
access to other controllers.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: 7.1-RELEASE I/O hang

2009-02-05 Thread Kostik Belousov
On Thu, Feb 05, 2009 at 11:26:58AM +, Matt Burke wrote:
 Kostik Belousov wrote:
  Compile ddb into the kernel, and do ps from the ddb prompt. If there
  are processes hung in the nbufkv state, then the patch below might
  help.
 
 The bonnie++ processes are in state newbuf and other hung processes
 (bash, newly forked sshds, etc) appear to be in the ufs state.
What is the state of the bufdaemon process?

 
 The patch appears to have no effect, although at the last hang I did see
 one of the bonnie++ processes in the nbufkv state. This could be coincidental.
 
 
 The problem also exhibits itself when running a parallel bonnie++ on a
 single array, both with the onboard PERC6/i and the PERC6/e. I have no
 access to other controllers.




Re: 7.1-RELEASE I/O hang

2009-02-05 Thread Matt Burke
Kostik Belousov wrote:
 Compile ddb into the kernel, and do ps from the ddb prompt. If there
 are processes hung in the nbufkv state, then the patch below might
 help.
 The bonnie++ processes are in state newbuf and other hung processes
 (bash, newly forked sshds, etc) appear to be in the ufs state.
 What is the state of the bufdaemon process?

qsleep




Re: 7.1-RELEASE I/O hang

2009-02-04 Thread Kostik Belousov
On Wed, Feb 04, 2009 at 12:46:53PM +, Matt Burke wrote:
 I have a machine with a PERC6/e controller. Attached to that are 3 disk
 shelves, each configured as an individual 14-disk RAID10 array (the PERC
 annoyingly only lets you use 8 spans per array).
 
 I can run bonnie++ on the arrays individually with no problem.
 I can also run it across a gstripe of the arrays with no problem.
 
 However, running it over the 3 arrays in parallel causes something
 I/O-related in the kernel to hang.
 
 To define 'hang' better:
 
 It appears anything which needs disk I/O, even on a different controller
 (albeit one using the same mfi driver), will hang. A command like 'ps' that
 is already cached in RAM will work, but bash hangs after it runs, presumably
 while trying to write ~/.bash_history.
 
 'sysctl -a' works, but trying to run 'sysctl kern.msgbuf' also hangs.
 
 I've done some research and it seems the usual cause of bonnie++ crashing a
 system is overflowing the TCQ (tagged command queue). camcontrol doesn't see
 any disks, so I've tried setting hw.mfi.max_cmds=32 in /boot/loader.conf,
 but it hasn't made any difference.
 
 The bonnie++ invocation is this:
 
 (newfs devices mfid[2-3], mount)
 bonnie++ -s 64g -u root -p3
 bonnie++ -d /data/2 -s 64g -u root -y s >b2 2>&1 &
 bonnie++ -d /data/3 -s 64g -u root -y s >b3 2>&1 &
 bonnie++ -d /data/4 -s 64g -u root -y s >b4 2>&1 &
 
 and it always hangs on "Rewriting...". It's a fresh 7.1-RELEASE with nothing
 else running beyond the stock daemons (devd, sshd, syslogd, etc.).
 
 
 Any ideas?

Compile ddb into the kernel, and do ps from the ddb prompt. If there
are processes hung in the nbufkv state, then the patch below might
help.
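
For reference, a minimal sketch of getting into ddb on 7.x, assuming a stock
kernel configuration (the patch itself follows below):

	# add to the kernel config, then rebuild and install the kernel:
	options KDB		# kernel debugger framework
	options DDB		# interactive kernel debugger

	# when the hang occurs, break into the debugger from the console
	# (Ctrl+Alt+Esc on syscons), or force entry with:
	sysctl debug.kdb.enter=1

	db> ps			# lists processes and their wait channels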

Index: gnu/fs/xfs/FreeBSD/xfs_buf.c
===================================================================
--- gnu/fs/xfs/FreeBSD/xfs_buf.c	(revision 188080)
+++ gnu/fs/xfs/FreeBSD/xfs_buf.c	(working copy)
@@ -81,7 +81,7 @@
 {
 	struct buf *bp;
 
-	bp = geteblk(0);
+	bp = geteblk(0, 0);
 	if (bp != NULL) {
 		bp->b_bufsize = size;
 		bp->b_bcount = size;
@@ -101,7 +101,7 @@
 	if (len >= MAXPHYS)
 		return (NULL);
 
-	bp = geteblk(len);
+	bp = geteblk(len, 0);
 	if (bp != NULL) {
 		KASSERT(BUF_REFCNT(bp) == 1,
 		    ("xfs_buf_get_empty: bp %p not locked", bp));
Index: ufs/ffs/ffs_vfsops.c
===================================================================
--- ufs/ffs/ffs_vfsops.c	(revision 188080)
+++ ufs/ffs/ffs_vfsops.c	(working copy)
@@ -1747,7 +1747,9 @@
 	    ("bufwrite: needs chained iodone (%p)", bp->b_iodone));
 
 	/* get a new block */
-	newbp = geteblk(bp->b_bufsize);
+	newbp = geteblk(bp->b_bufsize, GB_NOWAIT_BD);
+	if (newbp == NULL)
+		goto normal_write;
 
 	/*
 	 * set it to be identical to the old block.  We have to
@@ -1787,6 +1789,7 @@
 	}
 
 	/* Let the normal bufwrite do the rest for us */
+normal_write:
 	return (bufwrite(bp));
 }
 
Index: kern/vfs_bio.c
===================================================================
--- kern/vfs_bio.c	(revision 188080)
+++ kern/vfs_bio.c	(working copy)
@@ -105,7 +105,8 @@
 static void vfs_vmio_release(struct buf *bp);
 static int vfs_bio_clcheck(struct vnode *vp, int size,
 		daddr_t lblkno, daddr_t blkno);
-static int flushbufqueues(int, int);
+static int buf_do_flush(struct vnode *vp);
+static int flushbufqueues(struct vnode *, int, int);
 static void buf_daemon(void);
 static void bremfreel(struct buf *bp);
 
@@ -258,6 +259,7 @@
 #define QUEUE_DIRTY_GIANT	3	/* B_DELWRI buffers that need giant */
 #define QUEUE_EMPTYKVA	4	/* empty buffer headers w/KVA assignment */
 #define QUEUE_EMPTY	5	/* empty buffer headers */
+#define QUEUE_SENTINEL	1024	/* not an queue index, but mark for sentinel */
 
 /* Queues for free buffers with various properties */
 static TAILQ_HEAD(bqueues, buf) bufqueues[BUFFER_QUEUES] = { { 0 } };
@@ -1703,21 +1705,23 @@
  */
 
 static struct buf *
-getnewbuf(int slpflag, int slptimeo, int size, int maxsize)
+getnewbuf(struct vnode *vp, int slpflag, int slptimeo, int size, int maxsize,
+    int gbflags)
 {
+	struct thread *td;
 	struct buf *bp;
 	struct buf *nbp;
 	int defrag = 0;
 	int nqindex;
 	static int flushingbufs;
 
+	td = curthread;
 	/*
 	 * We can't afford to block since we might be holding a vnode lock,
 	 * which may prevent system daemons from running.  We deal with
 	 * low-memory situations by proactively returning memory and running
 	 * async I/O rather then sync I/O.
 	 */
-
 	atomic_add_int(&getnewbufcalls, 1);
 	atomic_subtract_int(&getnewbufrestarts, 1);
 restart:
@@ -1949,8 +1953,9 @@
 	 */
 
 	if (bp == NULL) {
-		int flags;
+		int flags, norunbuf;
 		char *waitmsg;
+		int 

7.1-RELEASE I/O hang

2009-02-04 Thread Matt Burke
I have a machine with a PERC6/e controller. Attached to that are 3 disk
shelves, each configured as an individual 14-disk RAID10 array (the PERC
annoyingly only lets you use 8 spans per array).

I can run bonnie++ on the arrays individually with no problem.
I can also run it across a gstripe of the arrays with no problem.

However, running it over the 3 arrays in parallel causes something
I/O-related in the kernel to hang.

To define 'hang' better:

It appears anything which needs disk I/O, even on a different controller
(albeit one using the same mfi driver), will hang. A command like 'ps' that
is already cached in RAM will work, but bash hangs after it runs, presumably
while trying to write ~/.bash_history.

'sysctl -a' works, but trying to run 'sysctl kern.msgbuf' also hangs.

I've done some research and it seems the usual cause of bonnie++ crashing a
system is overflowing the TCQ (tagged command queue). camcontrol doesn't see
any disks, so I've tried setting hw.mfi.max_cmds=32 in /boot/loader.conf,
but it hasn't made any difference.
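
(For reference, a sketch of that setting; reading it back through sysctl is
an assumption that the driver exposes the tunable there:)

	# /boot/loader.conf
	hw.mfi.max_cmds="32"

	# verify after reboot:
	sysctl hw.mfi.max_cmds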

The bonnie++ invocation is this:

(newfs devices mfid[2-3], mount)
bonnie++ -s 64g -u root -p3
bonnie++ -d /data/2 -s 64g -u root -y s >b2 2>&1 &
bonnie++ -d /data/3 -s 64g -u root -y s >b3 2>&1 &
bonnie++ -d /data/4 -s 64g -u root -y s >b4 2>&1 &

and it always hangs on "Rewriting...". It's a fresh 7.1-RELEASE with nothing
else running beyond the stock daemons (devd, sshd, syslogd, etc.).


Any ideas?

