Re: [HACKERS] Controlling Load Distributed Checkpoints

Heikki Linnakangas Mon, 11 Jun 2007 02:31:24 -0700

ITAGAKI Takahiro wrote:

Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
True. On the other hand, if we issue writes in essentially random order, we might fill the kernel buffers with random blocks and the kernel needs to flush them to disk as almost random I/O. If we did the writes in groups, the kernel has a better chance at coalescing them.
If the kernel can treat sequential writes better than random writes, is it worth sorting dirty buffers in block order per file at the start
of checkpoints? Here is the pseudo code:

  buffers_to_be_written =
      SELECT buf_id, tag FROM BufferDescriptors
        WHERE (flags & BM_DIRTY) != 0 ORDER BY tag.rnode, tag.blockNum;
  for { buf_id, tag } in buffers_to_be_written:
      if BufferDescriptors[buf_id].tag == tag:
          FlushBuffer(&BufferDescriptors[buf_id])

With this method we can also avoid writing buffers that were newly dirtied
after the checkpoint started.
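
The pseudo code above can be sketched as runnable Python. This is only an illustration of the idea, not PostgreSQL code: `BufferDesc`, the `BM_DIRTY` value, the `(rnode, block_num)` tuple tag and the `flush` callback are all stand-ins assumed for the example. The list of dirty buffers is snapshotted and sorted once, and a buffer is skipped if its tag has changed by the time its turn comes.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

BM_DIRTY = 0x01  # hypothetical flag bit, chosen for this sketch only

Tag = Tuple[int, int]  # (rnode, block_num), a simplified buffer tag


@dataclass
class BufferDesc:
    tag: Tag
    flags: int


def checkpoint_sorted(buffers: List[BufferDesc],
                      flush: Callable[[BufferDesc], None]) -> List[Tag]:
    """Flush the buffers that are dirty right now, in (rnode, block_num) order."""
    # Snapshot (tag, buf_id) for every dirty buffer and sort by tag, so that
    # writes to the same file are issued in ascending block order.
    to_write = sorted(
        (buf.tag, buf_id)
        for buf_id, buf in enumerate(buffers)
        if buf.flags & BM_DIRTY
    )
    written = []
    for tag, buf_id in to_write:
        buf = buffers[buf_id]
        # Skip the buffer if it was reused for another page since the snapshot.
        if buf.tag == tag:
            flush(buf)
            buf.flags &= ~BM_DIRTY
            written.append(tag)
    return written
```

Buffers dirtied after the snapshot never enter `to_write`, which is how the sketch leaves newly dirtied buffers for the next checkpoint.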


That's worth testing, IMO. Probably won't happen for 8.3, though.

I tend to agree that if the goal is to finish the checkpoint as quickly as possible, the current approach is better. In the context of load distributed checkpoints, however, it's unlikely the kernel can do any significant overlapping since we're trickling the writes anyway.
Some kernels or storage subsystems treat all I/Os too fairly, so that user
transactions waiting for reads are blocked by checkpoint writes. That
behavior is unavoidable, but we can split the writes into small batches.
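
Splitting the writes into small batches could look roughly like the following Python sketch. It is not the throttling logic of the patch under discussion: `write_in_batches`, `batch_size`, `pause_seconds` and the injectable `sleep` are hypothetical names for this example, and the checkpoint_write_percent and checkpoint_write_min_rate parameters are not modeled.

```python
import time
from typing import Callable, Sequence


def write_in_batches(pages: Sequence[int],
                     write_page: Callable[[int], None],
                     batch_size: int,
                     pause_seconds: float,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Write pages in order, pausing between batches; return the batch count."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batches = 0
    for start in range(0, len(pages), batch_size):
        # Pause before every batch except the first, so that reads queued by
        # other backends get a chance to be serviced between bursts.
        if start > 0:
            sleep(pause_seconds)
        for page in pages[start:start + batch_size]:
            write_page(page)
        batches += 1
    return batches
```

A smaller `batch_size` or a longer pause trades checkpoint duration for shorter stalls seen by readers.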

That's really the heart of our problems. If the kernel had support for prioritizing the normal backend activity and LRU cleaning over the checkpoint I/O, we wouldn't need to throttle the I/O ourselves. The kernel has the best knowledge of what it can and can't do, and how busy the I/O subsystems are. Recent Linux kernels have some support for read I/O priorities, but not for writes.

I believe the best long-term solution is to add that support to the kernel, but it's going to take a long time until that's universally available, and we have a lot of platforms to support.

I'm starting to feel we should give up on smoothing the fsyncs and distribute the writes only, for 8.3. As we get more experience with that and its shortcomings, we can enhance our checkpoints further in 8.4.
I agree with distributing only the writes for 8.3. The new parameters
it introduces (checkpoint_write_percent and checkpoint_write_min_rate)
should survive without major changes in the future, but the other
parameters seem to be volatile.

I'm going to start testing with just distributing the writes. Let's see how far that gets us.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

