On 6/14/07, Simon Riggs <[EMAIL PROTECTED]> wrote:
On Thu, 2007-06-14 at 16:39 +0900, ITAGAKI Takahiro wrote:
> Greg Smith <[EMAIL PROTECTED]> wrote:
>
> > On Mon, 11 Jun 2007, ITAGAKI Takahiro wrote:
> > > If the kernel can treat sequential writes better than random writes, is
> > > it worth sorting dirty buffers in block order per file at the start of
> > > checkpoints?
>
> I wrote and tested the attached sorted-writes patch base on Heikki's
> ldc-justwrites-1.patch. There was obvious performance win on OLTP workload.
>
>   tests                    | pgbench | DBT-2 response time (avg/90%/max)
> ---------------------------+---------+-----------------------------------
>  LDC only                  | 181 tps | 1.12 / 4.38 / 12.13 s
>  + BM_CHECKPOINT_NEEDED(*) | 187 tps | 0.83 / 2.68 /  9.26 s
>  + Sorted writes           | 224 tps | 0.36 / 0.80 /  8.11 s
>
> (*) Don't write buffers that were dirtied after starting the checkpoint.
>
> machine : 2GB-ram, SCSI*4 RAID-5
> pgbench : -s400 -t40000 -c10  (about 5GB of database)
> DBT-2   : 60WH (about 6GB of database)

I'm very surprised by the BM_CHECKPOINT_NEEDED results. What percentage
of writes has been saved by doing that? We would expect a small
percentage of blocks only and so that shouldn't make a significant
difference. I thought we discussed this before, about a year ago. It
would be easy to get that wrong and to avoid writing a block that had
been re-dirtied after the start of checkpoint, but was already dirty
beforehand. How long was the write phase of the checkpoint, how long
between checkpoints?

I can see the sorted writes having an effect because the OS may not
receive blocks within a sufficient time window to fully optimise them.
That effect would grow with increasing sizes of shared_buffers and
decrease with size of controller cache. How big was the shared buffers
setting? What OS scheduler are you using? The effect would be greatest
when using Deadline.

Linux has some instrumentation that might be useful for this testing,

echo 1 > /proc/sys/vm/block_dump
Will have the kernel log all physical IO (disable syslog writing to
disk before turning it on if you don't want the system to blow up).

Certainly the OS elevator should be working well enough to not see
that much of an improvement. Perhaps frequent fsync behavior is having
unintended interaction with the elevator?  ... It might be worthwhile
to contact some Linux kernel developers and see if there is some
misunderstanding.

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to [EMAIL PROTECTED] so that your
      message can get through to the mailing list cleanly

Reply via email to