Tom Lane wrote:
Stefan Kaltenbrunner <ste...@kaltenbrunner.cc> writes:
ok after a bit of bisecting I'm happy to announce the winner of the contest:
http://archives.postgresql.org/pgsql-committers/2008-11/msg00054.php

this patch causes a 25-30% performance regression for WAL-logged COPY; in the WAL bypass case (maybe that was what got tested?), however, it results in a 20% performance increase.

Hmm.  What that patch actually changes is that it prevents a bulk insert
(ie COPY in) from trashing the entire shared-buffers arena.  I think the
reason for the WAL correlation is that once it's filled the ring buffer,
creating new pages requires writing out old ones, and the
WAL-before-data rule means that the copy process has to block waiting
for WAL to go down to disk before it can write.  When it's allowed to
use the whole arena there is more chance for some of that writing to be
done by the walwriter or bgwriter.  But the details are going to depend
on the platform's CPU vs I/O balance, which no doubt explains why some
of us don't see it.
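
To make the ring-buffer argument above concrete, here is a toy model in C. It is not PostgreSQL code: the function name count_blocking_flushes is hypothetical, and the model assumes one WAL record per new page (LSN equal to the page number) and that a forced flush makes all WAL generated so far durable. It ignores the walwriter and bgwriter entirely, so it only shows the trend, not real timings.

```c
#include <assert.h>

/*
 * Toy model (an assumption-laden sketch, not the real buffer manager):
 * count how often a bulk writer has to wait for a WAL flush while it
 * recycles buffers from a ring of ring_pages buffers.
 */
static long
count_blocking_flushes(long total_pages, long ring_pages)
{
    long flushed_upto = 0;  /* highest LSN assumed to be on disk */
    long flushes = 0;

    assert(ring_pages > 0);

    for (long page = 1; page <= total_pages; page++)
    {
        /* Page currently sitting in the ring slot we are about to reuse. */
        long victim = page - ring_pages;

        /*
         * WAL-before-data: the victim may only be written out once its
         * WAL record is on disk, so the writer blocks on a flush here.
         */
        if (victim >= 1 && victim > flushed_upto)
        {
            flushed_upto = page - 1;    /* flush everything generated so far */
            flushes++;
        }
    }
    return flushes;
}
```

In this model, writing 100 pages through a 10-page ring blocks 9 times, a 20-page ring blocks 4 times, and a ring as large as the whole load never blocks, which is the direction of the effect described above.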

hmm - in my case both the CPU (an Intel E5530 Nehalem) and the I/O subsystem (8GB Fibre Channel connected NetApp with 4GB cache) are pretty fast, and even with, say, fsync=off, 8.4RC1 is only slightly faster than 8.3 with the same config and fsync=on, so maybe there is a secondary effect at play too.


I don't think we want to revert that patch --- not trashing the whole
buffer arena seems like a Good Thing from a system-wide point of view,
even if it makes individual COPY operations go slower.  However, we
could maybe play around with the tradeoffs a bit.  In particular it
seems like it would be useful to experiment with different ring buffer
sizes.  Could you try increasing the ring size allowed in
src/backend/storage/buffer/freelist.c for the BULKWRITE case

***************
*** 384,389 ****
--- 384,392 ----
                case BAS_BULKREAD:
                        ring_size = 256 * 1024 / BLCKSZ;
                        break;
+               case BAS_BULKWRITE:
+                       ring_size = 256 * 1024 / BLCKSZ;
+                       break;
                case BAS_VACUUM:
                        ring_size = 256 * 1024 / BLCKSZ;
                        break;


and see if maybe we can buy back most of the loss with not too much
of a ring size increase?
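
For reference, the ring_size expression in the patch is a count of buffers, not bytes. A minimal sketch of the arithmetic, assuming the default BLCKSZ of 8192 bytes (BLCKSZ is a build-time setting, so other builds give other numbers); the helper name ring_buffers_for_kb is hypothetical:

```c
#include <assert.h>

/* Assumption: default PostgreSQL block size; real builds may differ. */
#define BLCKSZ 8192

/*
 * Number of buffers in a ring of ring_kb kilobytes, using the same
 * integer arithmetic as the "N * 1024 / BLCKSZ" expressions in the patch.
 */
static int
ring_buffers_for_kb(int ring_kb)
{
    assert(ring_kb > 0);
    return ring_kb * 1024 / BLCKSZ;
}
```

With that block size, the 256kB ring in the patch is 32 buffers, and a 4096kB ring would be 512 buffers.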

already started testing that once I found the offending commit.

ring_size                 run times
256 * 1024 / BLCKSZ       4min10s / 4min19s / 4min12s
512 * 1024 / BLCKSZ       3min27s / 3min32s
1024 * 1024 / BLCKSZ      3min14s / 3min12s
2048 * 1024 / BLCKSZ      3min02s / 3min02s
4096 * 1024 / BLCKSZ      2min59s / 2min58s
8192 * 1024 / BLCKSZ      2min59s / 2min59s

so 4096 * 1024 / BLCKSZ seems to be the sweet spot and also results in more or less the same performance that 8.3 had.
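
To put numbers on the comparison, here is a small C sketch that averages the run times reported above (converted to seconds by hand) and expresses each ring size as a percentage reduction against the 256kB baseline. The struct and function names are made up for illustration; only the timings come from the measurements in this mail.

```c
#include <assert.h>

struct ring_result
{
    int ring_kb;        /* ring size in kB, i.e. the N in N * 1024 / BLCKSZ */
    int runs;           /* number of timed runs */
    int seconds[3];     /* run times in seconds */
};

/* Timings from the list above, e.g. 4min10s = 250 seconds. */
static const struct ring_result results[] = {
    {256,  3, {250, 259, 252}},
    {512,  2, {207, 212}},
    {1024, 2, {194, 192}},
    {2048, 2, {182, 182}},
    {4096, 2, {179, 178}},
    {8192, 2, {179, 179}},
};

/* Mean run time of one entry, in seconds. */
static double
mean_seconds(const struct ring_result *r)
{
    double sum = 0.0;

    assert(r->runs > 0);
    for (int i = 0; i < r->runs; i++)
        sum += r->seconds[i];
    return sum / r->runs;
}

/* Percentage reduction in mean run time relative to the 256kB baseline. */
static double
percent_faster_than_baseline(const struct ring_result *r)
{
    return 100.0 * (1.0 - mean_seconds(r) / mean_seconds(&results[0]));
}
```

On these numbers the 4096kB ring comes out a bit under 30% faster than the 256kB ring, and going to 8192kB gains nothing further.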



Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)