Re: [HACKERS] Controlling Load Distributed Checkpoints

Jim C. Nasby Wed, 13 Jun 2007 11:09:01 -0700

On Sun, Jun 10, 2007 at 08:49:24PM +0100, Heikki Linnakangas wrote:
> Jim C. Nasby wrote:
> >On Thu, Jun 07, 2007 at 10:16:25AM -0400, Tom Lane wrote:
> >>Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> >>>Thinking about this whole idea a bit more, it occurred to me that the 
> >>>current approach to write all, then fsync all is really a historical 
> >>>artifact of the fact that we used to use the system-wide sync call 
> >>>instead of fsyncs to flush the pages to disk. That might not be the best 
> >>>way to do things in the new load-distributed-checkpoint world.
> >>>How about interleaving the writes with the fsyncs?
> >>I don't think it's a historical artifact at all: it's a valid reflection
> >>of the fact that we don't know enough about disk layout to do low-level
> >>I/O scheduling.  Issuing more fsyncs than necessary will do little
> >>except guarantee a less-than-optimal scheduling of the writes.
> >
> >If we extended relations by more than 8k at a time, we would know a lot
> >more about disk layout, at least on filesystems with a decent amount of
> >free space.
> 
> I doubt it makes that much difference. If there was a significant amount 
> of fragmentation, we'd hear more complaints about seq scan performance.
> 
> The issue here is that we don't know which relations are on which drives 
> and controllers, how they're striped, mirrored etc.


Actually, isn't pre-allocation one of the tricks that Greenplum uses to
get its seqscan performance?
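
To make the pre-allocation idea concrete, here is a minimal sketch of
extending a file by several 8k blocks in one call instead of one block at
a time. It is only an illustration: extend_relation and its nblocks
parameter are hypothetical names, and nothing here is a claim about how
PostgreSQL or Greenplum actually extend relations. Whether the filesystem
lays the new blocks out contiguously is still up to the filesystem.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # assumed 8k block size, matching the figure in the thread


def extend_relation(fd, nblocks):
    """Append nblocks zero-filled blocks to fd; return the new file size.

    Hypothetical helper: asking for many blocks at once gives the
    filesystem a chance to allocate them next to each other.
    """
    end = os.lseek(fd, 0, os.SEEK_END)   # position at the current end of file
    zeros = bytes(BLOCK_SIZE * nblocks)  # zero-filled buffer for the new blocks
    written = 0
    while written < len(zeros):          # os.write may write fewer bytes than asked
        written += os.write(fd, zeros[written:])
    return end + written


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        extend_relation(fd, 1)           # one block at a time
        size = extend_relation(fd, 16)   # sixteen blocks in one request
        print(size // BLOCK_SIZE, "blocks allocated")
    finally:
        os.close(fd)
        os.remove(path)
```

On systems that provide it, os.posix_fallocate(fd, offset, length) could
reserve the space without writing the zeros explicitly.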
-- 
Jim Nasby                                      [EMAIL PROTECTED]
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

