On Wed, Dec 27, 2006 at 10:54:57PM +0000, Simon Riggs wrote:
> On Wed, 2006-12-27 at 23:26 +0100, Martijn van Oosterhout wrote:
> > On Wed, Dec 27, 2006 at 09:24:06PM +0000, Simon Riggs wrote:
> > > On Fri, 2006-12-22 at 13:53 -0500, Bruce Momjian wrote:
> > > >
> > > > I assume other kernels have similar I/O smoothing, so that data sent to
> > > > the kernel via write() gets to disk within 30 seconds.
> > > >
> > > > I assume write() is not our checkpoint performance problem, but the
> > > > transfer to disk via fsync().
> > >
> > > Well, it's correct to say that the transfer to disk is the source of the
> > > problem, but that doesn't only occur when we fsync(). There are actually
> > > two disk storms that occur, because of the way the fs cache works. [Ron
> > > referred to this effect uplist]
> >
> > As someone looking from the outside:
> >
> > fsync only works on one file, so presumably the checkpoint process is
> > opening each file one by one and fsyncing them.
>
> Yes
>
> > Does that make any
> > difference here? Could you adjust the timing here?
>
> That's the hard bit with io storm 2. When you fsync a file you don't
> actually know how many blocks you're writing, plus there's no way to
> slow down those writes by putting delays between them (although it's
> possible your controller might know how to do this, I've never heard of
> one that does).
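The quoted exchange describes a checkpoint as two phases. First come write() calls that land in the OS cache. Then comes one fsync() per file, and no caller can pace what a single fsync() flushes. A minimal Python sketch of that pattern follows. The `paced_checkpoint` helper and its dict-of-dirty-blocks input are hypothetical, not PostgreSQL code, and it assumes a Unix system because it uses `os.pwrite`. It shows where delays can be inserted (between write() batches) and where they cannot (inside each fsync()).

```python
import os
import tempfile
import time

def paced_checkpoint(dirty, batch_size=2, delay_s=0.0):
    """Hypothetical sketch, not PostgreSQL's checkpointer.

    dirty: dict mapping file path -> list of (offset, bytes) blocks.
    Returns the paths fsynced, in the order they were synced.
    """
    pending = [(path, off, data)
               for path, blocks in dirty.items()
               for off, data in blocks]

    # Phase 1 ("storm 1"): hand blocks to the OS cache with write(),
    # pausing between small batches so the kernel is not given the
    # whole checkpoint at once. Pacing is possible here.
    for i in range(0, len(pending), batch_size):
        for path, off, data in pending[i:i + batch_size]:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT)
            try:
                os.pwrite(fd, data, off)
            finally:
                os.close(fd)
        if i + batch_size < len(pending):
            time.sleep(delay_s)

    # Phase 2 ("storm 2"): fsync file by file. Each call flushes however
    # many of that file's blocks the kernel still holds; the caller can
    # neither see that count nor slow the flush down.
    synced = []
    for path in dirty:
        fd = os.open(path, os.O_WRONLY)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
        synced.append(path)
    return synced
```

The delay only spreads out the write() calls. Whatever the kernel has not yet written back by the time fsync() runs still goes out in one uncontrolled burst per file.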
Any controller that sophisticated would likely also have a BBU (battery backup unit) and write caching, which should greatly reduce the impact of at least the fsync storm... unless you fill the cache. I suspect we might need a way to control how much data we try to push out at a time to avoid that...

As for settings, I really like the simplicity of the Oracle system: "Just try to ensure recovery takes about X seconds." I like the idea of a creeping checkpoint even more; only writing a buffer out when we need to checkpoint it makes a lot more sense to me than trying to guess when we'll next dirty a buffer. Such a system would probably also be a lot easier to tune than the current bgwriter, even if we couldn't simplify it all the way to "seconds for recovery".

--
Jim Nasby                                            [EMAIL PROTECTED]
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
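The two ideas above can be combined: a recovery-time target ("recovery takes about X seconds") and a creeping checkpoint that writes a buffer only when it needs checkpointing. A toy Python model of that combination follows. Everything in it is a hypothetical illustration, not PostgreSQL's bgwriter and not Oracle's actual mechanism. That includes the class name, the parameters, and the assumption of a constant WAL replay rate. Each dirty buffer remembers the WAL position (LSN) at which it was first dirtied. It is written out only once replaying from that position would exceed the target.

```python
import heapq

class CreepingCheckpointer:
    """Hypothetical model: keep estimated recovery time under a target by
    writing out only the oldest-dirtied buffers, as the WAL advances."""

    def __init__(self, target_recovery_s, replay_bytes_per_s):
        # Assumption: replay speed is a known constant. A real system
        # would have to measure or estimate it.
        self.max_lag = int(target_recovery_s * replay_bytes_per_s)
        self.heap = []      # (first_dirty_lsn, buffer_id), oldest on top
        self.dirty = {}     # buffer_id -> first_dirty_lsn
        self.written = []   # buffers written out, in order

    def mark_dirty(self, buffer_id, lsn):
        # Only the first dirtying matters: recovery must start replaying
        # from there, and later changes to the same buffer come after it.
        if buffer_id not in self.dirty:
            self.dirty[buffer_id] = lsn
            heapq.heappush(self.heap, (lsn, buffer_id))

    def advance(self, insert_lsn):
        """Write just enough of the oldest buffers to keep the replay
        distance within max_lag; return the resulting redo point."""
        while self.heap and insert_lsn - self.heap[0][0] > self.max_lag:
            _, buf = heapq.heappop(self.heap)
            del self.dirty[buf]
            self.written.append(buf)  # stand-in for the actual write
        # Recovery would start at the oldest still-dirty buffer's LSN.
        return self.heap[0][0] if self.heap else insert_lsn
```

Suppose the target is 10 seconds and replay runs at 100 bytes/s. Then a buffer first dirtied at LSN 0 is written once the insert position passes 1000. A buffer dirtied at LSN 500 stays in cache until the insert position passes 1500, even if it is dirtied again in between. This is the "only write it when we need to checkpoint it" behaviour, and the single tuning knob is the recovery target.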