Re: [HACKERS] Load distributed checkpoint

Jim C. Nasby Thu, 28 Dec 2006 03:36:20 -0800

On Wed, Dec 27, 2006 at 10:54:57PM +0000, Simon Riggs wrote:
> On Wed, 2006-12-27 at 23:26 +0100, Martijn van Oosterhout wrote:
> > On Wed, Dec 27, 2006 at 09:24:06PM +0000, Simon Riggs wrote:
> > > On Fri, 2006-12-22 at 13:53 -0500, Bruce Momjian wrote:
> > > 
> > > > I assume other kernels have similar I/O smoothing, so that data sent to
> > > > the kernel via write() gets to disk within 30 seconds.  
> > > > 
> > > > I assume write() is not our checkpoint performance problem, but the
> > > > transfer to disk via fsync().  
> > > 
> > > Well, it's correct to say that the transfer to disk is the source of the
> > > problem, but that doesn't only occur when we fsync(). There are actually
> > > two disk storms that occur, because of the way the fs cache works. [Ron
> > > referred to this effect uplist]
> > 
> > As someone looking from the outside:
> > 
> > fsync only works on one file, so presumably the checkpoint process is
> > opening each file one by one and fsyncing them. 
> 
> Yes
> 
> > Does that make any
> > difference here? Could you adjust the timing here?
> 
> That's the hard bit with io storm 2. When you fsync a file you don't
> actually know how many blocks you're writing, plus there's no way to
> slow down those writes by putting delays between them (although it's
> possible your controller might know how to do this, I've never heard of
> one that does).


Any controller that sophisticated would likely also have a BBU and write
caching, which should greatly reduce the impact of at least the fsync
storm... unless you fill the cache. I suspect we might need a way to
control how much data we try to push out at a time to avoid that...
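
To make "control how much data we push out at a time" a bit more concrete,
here is a rough sketch in Python. It is purely illustrative and is not how
the backend does it: flush_in_batches, batch_bytes and pause_s are names I
made up, and it assumes a single file descriptor. The idea is just that each
fsync() only ever has to flush a bounded amount of new data.

```python
import os
import time

def flush_in_batches(fd, data, batch_bytes, pause_s=0.0):
    """Write data to fd in bounded batches, fsyncing after each batch.

    Hypothetical helper: batch_bytes caps how much unsynced data can pile
    up before an fsync, and pause_s optionally spaces the batches out.
    Returns the number of fsync calls made.
    """
    syncs = 0
    view = memoryview(data)
    for start in range(0, len(view), batch_bytes):
        chunk = view[start:start + batch_bytes]
        written = 0
        # os.write may write fewer bytes than requested, so loop.
        while written < len(chunk):
            written += os.write(fd, chunk[written:])
        os.fsync(fd)
        syncs += 1
        # Sleep between batches, but not after the last one.
        if pause_s > 0 and start + batch_bytes < len(view):
            time.sleep(pause_s)
    return syncs
```

Whether a given batch size actually stays under a controller's write cache
is something you would have to measure; the sketch only bounds what we hand
to the kernel between syncs.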

As for settings, I really like the simplicity of the Oracle system...
"Just try to ensure recovery takes about X seconds". I like
the idea of a creeping checkpoint even more; only writing a buffer out
when we need to checkpoint it makes a lot more sense to me than trying
to guess when we'll next dirty a buffer. Such a system would probably
also be a lot easier to tune than the current bgwriter, even if we
couldn't simplify it all the way to "seconds for recovery".
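
A toy model of what I mean by a creeping checkpoint, again in Python and
again only a sketch under assumptions: every name here is hypothetical, it
assumes we know the WAL position at which each buffer was first dirtied,
and it assumes recovery replays WAL at a roughly constant rate. Buffers get
written only when leaving them dirty would push recovery past the target.

```python
import heapq

def buffers_to_write(dirty, current_lsn, replay_bytes_per_s, target_recovery_s):
    """Pick the buffers a creeping checkpoint must write right now.

    dirty maps a buffer id to the WAL position (in bytes) at which that
    buffer was first dirtied. Recovery would have to replay from the
    oldest such position, so we write the oldest buffers until the
    remaining replay distance fits within the recovery-time budget.
    Returns buffer ids, oldest first.
    """
    max_lag = replay_bytes_per_s * target_recovery_s
    heap = [(lsn, buf) for buf, lsn in dirty.items()]
    heapq.heapify(heap)
    to_write = []
    while heap and current_lsn - heap[0][0] > max_lag:
        _, buf = heapq.heappop(heap)
        to_write.append(buf)
    return to_write

# Example: budget is 10 bytes/s * 30 s = 300 bytes of WAL to replay.
pending = buffers_to_write({"a": 100, "b": 500, "c": 900},
                           current_lsn=1000,
                           replay_bytes_per_s=10,
                           target_recovery_s=30)
print(pending)  # ['a', 'b']
```

The only knob in that model is the recovery target, which is the appeal.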
-- 
Jim Nasby                                            [EMAIL PROTECTED]
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
