* Greg Smith <[EMAIL PROTECTED]> [081001 00:00]:

> The overhead of clearing out the whole thing is just large enough that it 
> can be disruptive on systems generating lots of WAL traffic, so you don't 
> want the main database processes bothering with that.  A related fact is  
> that there is a noticeable slowdown to clients that need a segment switch  
> on a newly initialized and fast system that has to create all its WAL  
> segments, compared to one that has been active long enough to only be  
> recycling them.  That's why this sort of thing has been getting pushed  
> into the archive_command path; nothing performance-sensitive that can 
> slow down clients is happening there, so long as your server is powerful 
> enough to handle that in parallel with everything else going on.

> Now, it would be possible to have that less sensitive archive code path  
> zero things out, but you'd need to introduce a way to note when it's been 
> done (so you don't do it for a segment twice) and a way to turn it off so 
> everybody doesn't go through that overhead (which probably means another  
> GUC).  That's a bit much trouble to go through just for a feature with a  
> fairly limited use-case that can easily live outside of the engine  
> altogether.

Remember that the place where this benefit is big is on a generally idle
server. Is it possible to make the "time based WAL switch" zero the tail?  You
don't even need to fsync it for durability (although you may want to, to
hopefully prevent a larger fsync delay on the next commit).

<timid experience=none>
How about something like the attached?  It's been spun quickly, and has passed
the regression tests and some simple hand tests on REL8_3_STABLE.  It seems like
HEAD can't initdb on my machine (quad opteron with SW raid1); I tried a few
revisions from the last few days, and initdb dies on them all...

I'm no expert in the PG code; I just grepped around what looked like reasonable
functions in xlog.c until I (hopefully) figured out the basic flow of switching
to new xlog segments.  I *think* I'm using openLogFile and openLogOff
correctly.
 </timid>
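
Stripped of the xlog.c context, the zeroing step in the attached patch amounts
to roughly the following.  This is only a standalone POSIX sketch, not
PostgreSQL code: zero_tail is a hypothetical helper, and it assumes the file
position of fd already sits at the given offset (the role openLogOff plays in
the patch).  Unlike the WIP patch, it checks the result of write().

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of PostgreSQL: write zeros from the
 * current file position of fd (assumed to be `offset`) up to seg_size.
 * Returns 0 on success, -1 on failure with errno set.
 */
int
zero_tail(int fd, size_t offset, size_t seg_size)
{
    char    buf[8192];
    size_t  left;

    assert(offset <= seg_size);
    left = seg_size - offset;
    memset(buf, 0, sizeof(buf));

    while (left > 0)
    {
        size_t  len = (left > sizeof(buf)) ? sizeof(buf) : left;
        ssize_t n = write(fd, buf, len);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted before writing; retry */
            return -1;
        }
        if (n == 0)
        {
            errno = ENOSPC;     /* nothing written; assume no disk space */
            return -1;
        }
        left -= (size_t) n;     /* a short write just shrinks what is left */
    }
    return 0;
}
```

Whether to fsync afterwards is left to the caller, since the zeros are not
needed for durability.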

With archiving set up, an archive_timeout of 30s, and a few hand-run
pg_start_backup/pg_stop_backup calls, you can see it *really* does make things
very compressible...

Its output looks like:
        Archiving 000000010000000000000002
        Archiving 000000010000000000000003
        Archiving 000000010000000000000004
        Archiving 000000010000000000000005
        Archiving 000000010000000000000006
        LOG:  checkpoints are occurring too frequently (10 seconds apart)
        HINT:  Consider increasing the configuration parameter "checkpoint_segments".
        Archiving 000000010000000000000007
        Archiving 000000010000000000000008
        Archiving 000000010000000000000009
        LOG:  checkpoints are occurring too frequently (7 seconds apart)
        HINT:  Consider increasing the configuration parameter "checkpoint_segments".
        Archiving 00000001000000000000000A
        Archiving 00000001000000000000000B
        Archiving 00000001000000000000000C
        LOG:  checkpoints are occurring too frequently (6 seconds apart)
        HINT:  Consider increasing the configuration parameter "checkpoint_segments".
        Archiving 00000001000000000000000D
        LOG:  ZEROING xlog file 0 segment 14 from 12615680 - 16777216 [4161536 bytes]
        STATEMENT:  SELECT pg_stop_backup();
        Archiving 00000001000000000000000E
        Archiving 00000001000000000000000E.00C07098.backup
        LOG:  ZEROING xlog file 0 segment 15 from 8192 - 16777216 [16769024 bytes]
        STATEMENT:  SELECT pg_stop_backup();
        Archiving 00000001000000000000000F
        Archiving 00000001000000000000000F.00000C60.backup
        LOG:  ZEROING xlog file 0 segment 16 from 8192 - 16777216 [16769024 bytes]
        STATEMENT:  SELECT pg_stop_backup();
        Archiving 000000010000000000000010.00000F58.backup
        Archiving 000000010000000000000010
        LOG:  ZEROING xlog file 0 segment 17 from 8192 - 16777216 [16769024 bytes]
        STATEMENT:  SELECT pg_stop_backup();
        Archiving 000000010000000000000011
        Archiving 000000010000000000000011.00000020.backup
        LOG:  ZEROING xlog file 0 segment 18 from 6815744 - 16777216 [9961472 bytes]
        Archiving 000000010000000000000012
        LOG:  ZEROING xlog file 0 segment 19 from 8192 - 16777216 [16769024 bytes]
        Archiving 000000010000000000000013
        LOG:  ZEROING xlog file 0 segment 20 from 16384 - 16777216 [16760832 bytes]
        Archiving 000000010000000000000014
        LOG:  ZEROING xlog file 0 segment 23 from 8192 - 16777216 [16769024 bytes]
        STATEMENT:  SELECT pg_switch_xlog();
        Archiving 000000010000000000000017
        LOG:  ZEROING xlog file 0 segment 24 from 8192 - 16777216 [16769024 bytes]
        Archiving 000000010000000000000018
        LOG:  ZEROING xlog file 0 segment 25 from 8192 - 16777216 [16769024 bytes]
        Archiving 000000010000000000000019

You can see that when DB activity was heavy enough to fill an xlog segment
before the timeout (or an interactive forced switch), it didn't zero anything.
It only zeroed on a timeout switch, or a forced switch
(pg_switch_xlog/pg_stop_backup).
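
To line up the "file N segment M" numbers in the ZEROING messages with the
archived file names, and to check the byte counts, a small sketch may help.
The helper names below are hypothetical; the only assumption taken from
PostgreSQL is the segment file name layout of three 8-digit hex fields
(timeline, log file, segment).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the 24-character segment file name from the
 * timeline, log file number and segment number.
 */
void
wal_segment_name(char *out, size_t outlen,
                 unsigned int tli, unsigned int log, unsigned int seg)
{
    assert(outlen >= 25);       /* 24 hex digits plus the terminating NUL */
    snprintf(out, outlen, "%08X%08X%08X", tli, log, seg);
    assert(strlen(out) == 24);
}

/* Hypothetical helper: bytes left to zero after a switch at `offset`. */
unsigned int
tail_bytes(unsigned int seg_size, unsigned int offset)
{
    assert(offset <= seg_size);
    return seg_size - offset;
}
```

With timeline 1, "file 0 segment 14" gives 00000001000000000000000E, and a
switch at offset 12615680 in a 16777216-byte segment leaves 4161536 bytes to
zero, matching the log above.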

And compressed xlog segments:
        -rw-r--r-- 1 mountie mountie   18477 2008-10-31 14:44 000000010000000000000010.gz
        -rw-r--r-- 1 mountie mountie   16394 2008-10-31 14:44 000000010000000000000011.gz
        -rw-r--r-- 1 mountie mountie 2721615 2008-10-31 14:52 000000010000000000000012.gz
        -rw-r--r-- 1 mountie mountie   16588 2008-10-31 14:52 000000010000000000000013.gz
        -rw-r--r-- 1 mountie mountie   19230 2008-10-31 14:52 000000010000000000000014.gz
        -rw-r--r-- 1 mountie mountie 4920063 2008-10-31 14:52 000000010000000000000015.gz
        -rw-r--r-- 1 mountie mountie 5024705 2008-10-31 14:52 000000010000000000000016.gz
        -rw-r--r-- 1 mountie mountie   18082 2008-10-31 14:52 000000010000000000000017.gz
        -rw-r--r-- 1 mountie mountie   18477 2008-10-31 14:52 000000010000000000000018.gz
        -rw-r--r-- 1 mountie mountie   16394 2008-10-31 14:52 000000010000000000000019.gz
        -rw-r--r-- 1 mountie mountie 2721615 2008-10-31 15:02 00000001000000000000001A.gz
        -rw-r--r-- 1 mountie mountie   16588 2008-10-31 15:02 00000001000000000000001B.gz
        -rw-r--r-- 1 mountie mountie   19230 2008-10-31 15:02 00000001000000000000001C.gz

And yes, even the non-zeroed segments compress well here, because
my test load is pretty simple:
        CREATE TABLE TEST
        (
         a numeric,
         b numeric,
         c numeric,
         i bigint not null
        );


        INSERT INTO test (a,b,c,i)
          SELECT random(),random(),random(),s FROM generate_series(1,1000000) s;


a.


-- 
Aidan Van Dyk                                             Create like a god,
[EMAIL PROTECTED]                                       command like a king,
http://www.highrise.ca/                                   work like a slave.
commit 3916c54126ffade0baad4609467393d9a1b53e37
Author: Aidan Van Dyk <[EMAIL PROTECTED]>
Date:   Fri Oct 31 12:35:24 2008 -0400

    WIP: Zero xlog tail on a forced switch
    
    If XLogWrite is called with xlog_switch, an XLog switch has been forced, either
    by a timeout based switch (archive_timeout), or an interactive forced xlog
    switch (pg_switch_xlog/pg_stop_backup).  In those cases, we assume we can
    afford a little extra IO bandwidth to make xlogs so much more compressible.

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8bc46da..a8d945d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1548,6 +1548,30 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool xlog_switch)
 			 */
 			if (finishing_seg || (xlog_switch && last_iteration))
 			{
+				/*
+				 * If we've had an xlog switch forced, then we want to zero
+				 * out the rest of the segment.  We zero it out here because at the
+				 * force switch time, IO bandwidth isn't a problem.
+				 *   -- AIDAN
+				 */
+				if (xlog_switch)
+				{
+					char buf[1024];
+					uint32 left = (XLogSegSize - openLogOff);
+					ereport(LOG,
+						(errmsg("ZEROING xlog file %u segment %u from %u - %u [%u bytes]",
+								openLogId, openLogSeg,
+								openLogOff, XLogSegSize, left)
+						 ));
+					memset(buf, 0, sizeof(buf));
+					while (left > 0)
+					{
+						size_t len = (left > sizeof(buf)) ? sizeof(buf) : left;
+						write(openLogFile, buf, len);
+						left -= len;
+					}
+				}
+
 				issue_xlog_fsync();
 				LogwrtResult.Flush = LogwrtResult.Write;		/* end of page */
 

