On Mon, 29 Jan 2007 17:02:14 -0500 "Matthew Kirk" <[EMAIL PROTECTED]> wrote:
> Regarding the long fsyncs, here's a trace... > > I upgraded to a more recent kernel - 2.6.18.6 - and ran it on a workstation. > This particular box has In this case the elevator is CFQ. > > This sample came from a stall that lasted about 2.5 minutes(!) - the longest > one I've seen yet. The box is a bit more memory constrained than the > original system but exhibits similar behavior. It doesn't page. Also, > there is no raid card - simply striped PATA drives. Using your little test app, the longest fsync() stall I can demonstrate on 2.6.20-rc4-mm1 on plain-old-sata-disk is 1.2 seconds. What's the max stall you're able to see with the test app? Perhaps the file is just super-fragmented. If your production app does something like: for (a lot) { fd = open(name); write(fd, a little bit); close(fd); } in multiple threads, or against a lot of different files then you might be fragmenting the files a lot. This is because ext3 discards its in-core anti-fragmentation data structures on close(). So - Please check your app, see whether or not it is frequently opening and closing the output files. - Using `bmap' from http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz, determine whether the offending files are highly fragmented. - Tell us how much dirty data is being written out by these fsyncs? - Try mounting the filesystem with `-o data=writeback'. Probably won't help much if it's also demonstrable on ext2. - Can you reproduce the stalls on a plain-old-disk? Get RAID out of the picture? - You've seen the stalls with both CFQ and AS. I guess you could try deadline and no-op, but it sounds like that won't help. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/