Re: [HACKERS] Quick-and-dirty compression for WAL backup blocks

2005-07-12 Thread Junji TERAMOTO
Hi all,

I examined the effect of the block-hole compression patch.

I compared the amount of WAL data written by CVS (7/11) and by 8.0.3.
(The amount of WAL written was measured by counting the number of
calls to the write() function in XLogWrite().)
I also measured the size of the hole.

Environment;
IBM x206 P4 3.0GHz Mem 4GB
CentOS 4.0 (Linux 2.6.9-5.0.3.ELsmp)

Parameters;
shared_buffers = 65535
checkpoint_segments = 30
default_with_oids = false (8.0.3)
default_with_oids = off   (CVS)

Test procedure;
0) initdb --no-locale
1) pgbench -i -s 100 pgbench
2) pgbench -n -c 50 -t 5000 pgbench
3) vacuumdb -d pgbench
4) pgbench -n -c 50 -t 5000 pgbench

Results;
      |      8.0.3      |                  CVS (7/11)
 Exam |  write  | C.P |  write  | C.P |        block-hole (bytes)
      |         |     |         |     |   total   | min | max  |   avg
------+---------+-----+---------+-----+-----------+-----+------+---------
  1)  |  187505 |  3  |  187373 |  4  |    194056 |  36 | 8124 | 3881.12
  2)  |  509725 |  6  |  513489 |  5  | 115564476 |  12 | 8096 |  347.69
  3)  |  280456 |  2  |  172973 |  2  |  95923360 | 248 | 8156 |  614.08
  4)  |  533971 |  7  |  525135 |  6  | 171147256 |  12 | 8140 |  482.11

C.P = Checkpoint frequency

From these measurements, the patch appears to be effective for the VACUUM
case (exam 3). In the other cases, the effect was not very visible.



-- 
Junji Teramoto




Re: [HACKERS] Quick-and-dirty compression for WAL backup blocks

2005-06-09 Thread Jim C. Nasby
On Sat, Jun 04, 2005 at 11:46:07AM -0400, Tom Lane wrote:
 I've completed a test run for this (it's essentially MySQL's sql-bench
 done immediately after initdb).  What I get is:
 
 CVS tip of 6/1: ending WAL offset = 0/A364A780 = 2741282688 bytes written
 
 CVS tip of 6/2: ending WAL offset = 0/8BB091DC = 2343604700 bytes written
 
 or about a 15% savings.  This is with a checkpoint_segments setting of 30.
 One can presume that the savings would be larger at smaller checkpoint
 intervals and smaller at larger intervals, but I didn't try more than
 one set of test conditions.
 
 I'd say that's an improvement worth having, especially considering that
 it requires no net expenditure of CPU time.  But the table is certainly
 still open to discuss more complicated approaches.

If it's not hard to hack in as a test, it'd be interesting to see what
additional gains a more aggressive compression algorithm like LZW would get.
CPU is more of a concern in that case, but for databases generating a
lot of WAL it might still be a win.

BTW, is this the thread you referred to? I wasn't able to find it in the
TODO and had to go into the archives to find it...
http://archives.postgresql.org/pgsql-performance/2005-04/msg00264.php
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?



Re: [HACKERS] Quick-and-dirty compression for WAL backup blocks

2005-06-09 Thread Rod Taylor
  I'd say that's an improvement worth having, especially considering that
  it requires no net expenditure of CPU time.  But the table is certainly
  still open to discuss more complicated approaches.
 
 If it's not hard to hack in as a test, it'd be interesting to see what
 additional gains a more aggressive compression algorithm like LZW would get.
 CPU is more of a concern in that case, but for databases generating a
 lot of WAL it might still be a win.

I generate a fair amount of WAL segments (about 20GB per day), and I
have a CPU issue.

I implemented a cron job that compresses them with gzip on a different
machine.

Any chance we could have an official compression tool which is
independent of the server itself so we can distribute the load a little?

 BTW, is this the thread you referred to? I wasn't able to find it in the
 TODO and had to go into the archives to find it...
 http://archives.postgresql.org/pgsql-performance/2005-04/msg00264.php
-- 




Re: [HACKERS] Quick-and-dirty compression for WAL backup blocks

2005-06-07 Thread Junji TERAMOTO
Tom Lane wrote:

 In the XLogInsert(), it makes two kinds of logs, the "whole buffer (page)
 log" and the "partial buffer log", doesn't it?  Is it only the "whole
 buffer log" that generates a log with a "hole"?

 Right.

I see.
I think it is important to reduce the need to write whole pages
to WAL (as on the TODO list).
# It seems difficult to do so... Compressing WAL is the easier way.

 No, because the WAL code cannot assume that all pages follow the
 convention that pd_lower and pd_upper represent the boundaries of
 free space.  (As a counterexample: index metapages don't always
 do that.)

Oh, I had forgotten that.
And I think it is a good idea to modify the XLogInsert API as in CVS, too.

-- 
Junji Teramoto

Re: [HACKERS] Quick-and-dirty compression for WAL backup blocks

2005-06-06 Thread Junji TERAMOTO
Hello all,

I am interested in how to "Compress WAL entries".
I am studying the source now, and I have read this discussion.

There are some questions.

1.
In the XLogInsert(), it makes two kinds of logs, the "whole buffer (page)
log" and the "partial buffer log", doesn't it?  Is it only the "whole
buffer log" that generates a log with a "hole"?

2.
Tom Lane wrote:
 The overhead needed is only 2 bytes to show the number of
 bytes removed.

In the "whole buffer log", there is a page header that includes the offset
of the "hole" (lower and upper). If we use that information, we don't need
any overhead, do we?

# Sorry for my bad English..

-- 
Junji Teramoto

Re: [HACKERS] Quick-and-dirty compression for WAL backup blocks

2005-06-06 Thread Mark Cave-Ayland

 -----Original Message-----
 From: Tom Lane [mailto:[EMAIL PROTECTED] 
 Sent: 04 June 2005 16:46
 To: Mark Cave-Ayland (External)
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: Quick-and-dirty compression for WAL backup blocks

(cut)

 I've completed a test run for this (it's essentially MySQL's 
 sql-bench done immediately after initdb).  What I get is:
 
 CVS tip of 6/1: ending WAL offset = 0/A364A780 = 2741282688 
 bytes written
 
 CVS tip of 6/2: ending WAL offset = 0/8BB091DC = 2343604700 
 bytes written
 
 or about a 15% savings.  This is with a checkpoint_segments 
 setting of 30. One can presume that the savings would be 
 larger at smaller checkpoint intervals and smaller at larger 
 intervals, but I didn't try more than one set of test conditions.
 
 I'd say that's an improvement worth having, especially 
 considering that it requires no net expenditure of CPU time.  
 But the table is certainly still open to discuss more 
 complicated approaches.


Hi Tom,

Thanks for the numbers. I've just been across to the OSDL STP website, and it
appears that the automatic nightly PostgreSQL CVS builds set up by Mark have
been broken since the middle of April :( A brief look shows that compilation
of the PostgreSQL sources is failing with lots of errors and warnings. I
don't think I can justify the time at the moment to go and look at this
myself; however, I thought I'd post to -hackers as a heads-up in case anyone
else can pick this up.


Kind regards,

Mark.


WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth
PL6 8BT 

T: +44 (0)1752 797131
F: +44 (0)1752 791023
W: http://www.webbased.co.uk





Re: [HACKERS] Quick-and-dirty compression for WAL backup blocks

2005-06-06 Thread Tom Lane
Junji TERAMOTO [EMAIL PROTECTED] writes:
 In the XLogInsert(), it makes two kinds of logs, whole buffer (page)
 log and partial buffer log, doesn't it?  Is it only the whole buffer log
 that generates a log with a hole?

Right.

 Tom Lane wrote:
 The overhead needed is only 2 bytes to show the number of
 bytes removed.

 In whole buffer log, there is a page header that includes offset of
 hole (lower and upper). If we use that information, we don't need
 any overhead, do we?

No, because the WAL code cannot assume that all pages follow the
convention that pd_lower and pd_upper represent the boundaries of
free space.  (As a counterexample: index metapages don't always
do that.)  I think the transformation has to be guaranteed lossless,
which means that at a minimum you'd need to check whether the data
in between pd_lower and pd_upper really is zeroes.  So the irreducible
minimum overhead is 1 bit to tell whether you compressed or not.
Considering alignment requirements, you might as well expend a couple
of bytes and keep it simple.  (In the patch as committed, I ended up
using 4 bytes --- a uint16 hole start and a uint16 hole length ---
because it kept the code simple.  The alignment requirements mean the
extra 2 bytes are usually free anyway.)
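
To make that concrete, here is a rough sketch of the idea (names and layout
are illustrative only, not the exact committed structures):

#include <string.h>
#include <stdint.h>

typedef uint16_t uint16;            /* PostgreSQL-style typedef */
#define BLCKSZ 8192                 /* standard page size */

/* Backup-block "header": where the hole starts and how long it is. */
typedef struct BkpBlockSketch
{
    uint16      hole_offset;        /* bytes of page data before the hole */
    uint16      hole_length;        /* bytes of zeroes that were omitted */
    /* followed by BLCKSZ - hole_length bytes of page image */
} BkpBlockSketch;

/*
 * At redo time the page is rebuilt losslessly: copy the part before the
 * hole, zero-fill the hole, copy the part after the hole.
 */
static void
restore_page_with_hole(char *page, const char *data,
                       uint16 hole_offset, uint16 hole_length)
{
    memcpy(page, data, hole_offset);
    memset(page + hole_offset, 0, hole_length);
    memcpy(page + hole_offset + hole_length,
           data + hole_offset,
           (size_t) BLCKSZ - hole_offset - hole_length);
}

Restoring is nothing but memcpy/memset, so redo pays essentially nothing
for the compression.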

regards, tom lane



Re: [HACKERS] Quick-and-dirty compression for WAL backup blocks

2005-06-06 Thread Heikki Linnakangas

On Mon, 6 Jun 2005, Tom Lane wrote:


Junji TERAMOTO [EMAIL PROTECTED] writes:


In whole buffer log, there is a page header that includes offset of
hole (lower and upper). If we use that information, we don't need
any overhead, do we?


No, because the WAL code cannot assume that all pages follow the
convention that pd_lower and pd_upper represent the boundaries of
free space.  (As a counterexample: index metapages don't always
do that.)  I think the transformation has to be guaranteed lossless,
which means that at a minimum you'd need to check whether the data
in between pd_lower and pd_upper really is zeroes.  So the irreducible
minimum overhead is 1 bit to tell whether you compressed or not.


Vacuum doesn't zero out the free space between lower and upper; it's
just marked as unused. So a lossless compression scheme becomes less
efficient on tables that contain free space released by vacuum.


How about adding a flag to XLogRecData to indicate if the space between 
pd_lower and pd_upper is meaningful or not? The XLogInsert caller probably 
knows that. That way you could completely skip over the free space if 
it's not meaningful, saving even more cycles.
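
For illustration, the suggestion amounts to something like this (a sketch
only; the field name and final shape would be up to whoever implements it):

#include "postgres.h"               /* uint32, bool */
#include "storage/buf.h"            /* Buffer */

typedef struct XLogRecData
{
    char       *data;               /* start of rmgr data to include */
    uint32      len;                /* length of rmgr data to include */
    Buffer      buffer;             /* buffer to take a full-page image of,
                                     * or InvalidBuffer */
    bool        buffer_std;         /* NEW: page follows the standard layout,
                                     * so the pd_lower..pd_upper gap holds no
                                     * data and can be skipped outright */
    struct XLogRecData *next;       /* next struct in chain, or NULL */
} XLogRecData;

XLogInsert could then omit the gap whenever the flag is set, without having
to verify first that it is all zeroes.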


- Heikki



Re: [HACKERS] Quick-and-dirty compression for WAL backup blocks

2005-06-06 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Vacuum doesn't zero out the free space between lower and upper,

It does now ;-)

 How about adding a flag to XLogRecData to indicate if the space between 
 pd_lower and pd_upper is meaningful or not? The XLogInsert caller probably 
 knows that. That way you could completely skip over the free space if 
 it's not meaningful, saving even more cycles.

Hmm ... that might not be a bad idea.  As far as I can think offhand,
all the XLogInsert callers know very well what type of page they are
working with, so they would always be able to set such a flag correctly.

Would this be institutionalizing a particular approach to data
compression in the XLogInsert API, though?

regards, tom lane



Re: [HACKERS] Quick-and-dirty compression for WAL backup blocks

2005-06-06 Thread Heikki Linnakangas

On Mon, 6 Jun 2005, Tom Lane wrote:


Heikki Linnakangas [EMAIL PROTECTED] writes:

Vacuum doesn't zero out the free space between lower and upper,


It does now ;-)


Oh :). Does it affect vacuum performance?


How about adding a flag to XLogRecData to indicate if the space between
pd_lower and pd_upper is meaningful or not? The XLogInsert caller probably
knows that. That way you could completely skip over the free space if
it's not meaningful, saving even more cycles.


Hmm ... that might not be a bad idea.  As far as I can think offhand,
all the XLogInsert callers know very well what type of page they are
working with, so they would always be able to set such a flag correctly.

Would this be institutionalizing a particular approach to data
compression in the XLogInsert API, though?


The "skip the free space" optimization is still useful and worthwhile 
even if we have a more sophisticated compression method for the 
rest of the page.


- Heikki



Re: [HACKERS] Quick-and-dirty compression for WAL backup blocks

2005-06-06 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 On Mon, 6 Jun 2005, Tom Lane wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
 Vacuum doesn't zero out the free space between lower and upper,
 
 It does now ;-)

 Oh :). Does it affect vacuum performance?

I haven't tried to measure it ... but certainly it's not totally free.
I'd be happy to rip that change out again.

 Would this be institutionalizing a particular approach to data
 compression in the XLogInsert API, though?

 The "skip the free space" optimization is still useful and worthwhile 
 even if we have a more sophisticated compression method for the 
 rest of the page.

Good point.  OK, I'm hacking XLOG stuff now anyway so I'll see about
making that happen.

regards, tom lane



Re: [HACKERS] Quick-and-dirty compression for WAL backup blocks

2005-06-04 Thread Tom Lane
Mark Cave-Ayland [EMAIL PROTECTED] writes:
 A run-length compressor would be reasonably quick but I think that the
 omit-the-middle-hole approach gets most of the possible win with even
 less work.

 I can't think that a RLE scheme would be much more expensive than a 'count
 the hole' approach with more benefit, so I wouldn't like to discount this
 straight away...

RLE would require scanning the whole page with no certainty of win,
whereas count-the-hole is a certain win since you only examine bytes
that are potentially removable from the later CRC calculation.
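
Concretely, the insert-time check only ever looks at the candidate hole
bytes (a rough sketch with illustrative names, not the committed code):

#include <stdint.h>

typedef uint16_t uint16;
#define BLCKSZ 8192
#define SizeOfPageHeaderData 24     /* illustrative; the real value comes
                                     * from bufpage.h */

/*
 * Can the gap between pd_lower and pd_upper be omitted from the backup
 * block?  Every byte examined here would otherwise be fed to the CRC, so
 * the worst case only adds a cheap comparison per byte, and the good case
 * removes those bytes from the CRC entirely.
 */
static int
hole_is_removable(const char *page, uint16 pd_lower, uint16 pd_upper)
{
    uint16      i;

    if (pd_lower < SizeOfPageHeaderData || pd_lower >= pd_upper ||
        pd_upper > BLCKSZ)
        return 0;                   /* nonstandard layout: keep whole page */

    for (i = pd_lower; i < pd_upper; i++)
    {
        if (page[i] != 0)
            return 0;               /* live data in the gap: keep whole page */
    }
    return 1;
}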

 If you do manage to go ahead with the code, I'd be very interested to see
 some comparisons in bytes written to XLog for old and new approaches for
 some inserts/updates. Perhaps we could ask Mark to run another TPC benchmark
 at OSDL when this and the CRC changes have been completed.

I've completed a test run for this (it's essentially MySQL's sql-bench
done immediately after initdb).  What I get is:

CVS tip of 6/1: ending WAL offset = 0/A364A780 = 2741282688 bytes written

CVS tip of 6/2: ending WAL offset = 0/8BB091DC = 2343604700 bytes written

or about a 15% savings.  This is with a checkpoint_segments setting of 30.
One can presume that the savings would be larger at smaller checkpoint
intervals and smaller at larger intervals, but I didn't try more than
one set of test conditions.

I'd say that's an improvement worth having, especially considering that
it requires no net expenditure of CPU time.  But the table is certainly
still open to discuss more complicated approaches.

regards, tom lane



Re: [HACKERS] Quick-and-dirty compression for WAL backup blocks

2005-06-01 Thread Mark Cave-Ayland

 Date: Tue, 31 May 2005 16:26:20 -0400
 From: Tom Lane [EMAIL PROTECTED]
 To: pgsql-hackers@postgreSQL.org
 Subject: Quick-and-dirty compression for WAL backup blocks
 Message-ID: [EMAIL PROTECTED]

(cut)

 ... make the WAL writing logic aware of the layout
 of buffer pages --- specifically, to know that our pages generally
 contain an uninteresting hole in the middle, and not write the hole. 
 Optimistically this might reduce the WAL data volume by something 
 approaching 50%; though pessimistically (if most pages are near full) 
 it wouldn't help much.

 A more concrete version of this is: examine the page to see if the 
 pd_lower field is between SizeOfPageHeaderData and BLCKSZ, and if so 
 whether there is a run of consecutive zero bytes beginning at the 
 pd_lower position.  Omit any such bytes from what is written to WAL. 
 (This definition ensures that nothing goes wrong if the page does not 
 follow the normal page layout conventions: the transformation is 
 lossless no matter what, since we can always reconstruct the exact page
 contents.)  The overhead needed is only 2 bytes to show the number of
 bytes removed.

I can't say I'm familiar enough with the exact format of the XLog records
(wasn't sure where to find this), but from what you're saying (and by
definition I suppose) I'm guessing it is just a complete copy of the new
page just before it gets written to the heap?

I also noticed your comment above that mentioned that compression would be
less effective as the pages became more full. Would changing the loading
factor of database pages have an effect here, as I would have thought that
the WAL would be fsync'd more aggressively than the heap?

 A run-length compressor would be reasonably quick but I think that the 
 omit-the-middle-
 hole approach gets most of the possible win with even less work.

I can't think that a RLE scheme would be much more expensive than a 'count
the hole' approach with more benefit, so I wouldn't like to discount this
straight away...

If you do manage to go ahead with the code, I'd be very interested to see
some comparisons in bytes written to XLog for old and new approaches for
some inserts/updates. Perhaps we could ask Mark to run another TPC benchmark
at OSDL when this and the CRC changes have been completed.


Kind regards,

Mark.


WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth
PL6 8BT 

T: +44 (0)1752 797131
F: +44 (0)1752 791023
W: http://www.webbased.co.uk





Re: [HACKERS] Quick-and-dirty compression for WAL backup blocks

2005-06-01 Thread Tom Lane
Mark Cave-Ayland [EMAIL PROTECTED] writes:
 I also noticed your comment above that mentioned that compression would be
 less effective as the pages became more full. Would changing the loading
 factor of database pages have an effect here, as I would have thought that
 the WAL would be fsync'd more aggressively than the heap?

Yeah, it's predictable that leaving more free space per page would make
the optimization more effective.  I don't have anything quantitative to
say about it.

 If you do manage to go ahead with the code, I'd be very interested to see
 some comparisons in bytes written to XLog for old and new approaches for
 some inserts/updates. Perhaps we could ask Mark to run another TPC benchmark
 at OSDL when this and the CRC changes have been completed.

Good idea.  One point is that the larger the checkpoint interval, the
less this matters at all, because a backup block is only written for
a given page the first time it is modified after a checkpoint.  (This
is one of the big reasons frequent checkpoints are expensive.)  IIRC
Mark was using unrealistically large checkpoint intervals, which would
mean he'd underestimate the effects of optimizing backup blocks.
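
In other words, the test is roughly of this form (a simplified sketch of
the logic, not the literal xlog.c code):

#include "postgres.h"
#include "access/xlogdefs.h"        /* XLogRecPtr */

/*
 * A full-page image is attached only if the page has not been WAL-logged
 * since the current checkpoint's REDO pointer, i.e. this is the first
 * modification of the page since that checkpoint.  Later modifications
 * before the next checkpoint get only a normal (partial) record, which is
 * why longer checkpoint intervals produce fewer backup blocks.
 */
static bool
page_needs_backup_block(XLogRecPtr page_lsn, XLogRecPtr redo_ptr)
{
    return (page_lsn.xlogid < redo_ptr.xlogid ||
            (page_lsn.xlogid == redo_ptr.xlogid &&
             page_lsn.xrecoff <= redo_ptr.xrecoff));
}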

regards, tom lane



Re: [HACKERS] Quick-and-dirty compression for WAL backup blocks

2005-06-01 Thread Alvaro Herrera
On Wed, Jun 01, 2005 at 10:12:34AM -0400, Tom Lane wrote:
 Mark Cave-Ayland [EMAIL PROTECTED] writes:
  I also noticed your comment above that mentioned that compression would be
  less effective as the pages became more full. Would changing the loading
  factor of database pages have an effect here, as I would have thought that
  the WAL would be fsync'd more aggressively than the heap?
 
 Yeah, it's predictable that leaving more free space per page would make
 the optimization more effective.

But it would also increase the number of pages, so maybe there would
have to be block dumps more often, which may make the whole thing worse
on average.

-- 
Alvaro Herrera (alvherre[a]surnet.cl)
"Reality is made up of many dreams, all of them different,
but in a certain way alike..." (Me, talking about erotic dreams)



Re: [HACKERS] Quick-and-dirty compression for WAL backup blocks

2005-05-31 Thread Simon Riggs
On Tue, 2005-05-31 at 16:26 -0400, Tom Lane wrote:
 The TODO item that comes to mind immediately is Compress WAL entries.
 A more concrete version of this is: examine the page to see if the
 pd_lower field is between SizeOfPageHeaderData and BLCKSZ, and if so
 whether there is a run of consecutive zero bytes beginning at the
 pd_lower position.  Omit any such bytes from what is written to WAL.
 (This definition ensures that nothing goes wrong if the page does not
 follow the normal page layout conventions: the transformation is
 lossless no matter what, since we can always reconstruct the exact page
 contents.)  The overhead needed is only 2 bytes to show the number of
 bytes removed.
 
 The other alternatives that were suggested included running the page
 contents through the same compressor used for TOAST, and implementing
 a general-purpose run-length compressor that could get rid of runs of
 zeroes anywhere on the page.  However, considering that the compression
 work has to be done while holding WALInsertLock, it seems to me there
 is a strong premium on speed.  I think that lets out the TOAST
 compressor, which isn't amazingly speedy.  (Another objection to the
 TOAST compressor is that it certainly won't win on already-compressed
 toasted data.)  A run-length compressor would be reasonably quick but
 I think that the omit-the-middle-hole approach gets most of the possible
 win with even less work.  In particular, I think it can be proven that
 omit-the-hole will actually require less CPU than now, since counting
 zero bytes should be strictly faster than CRC'ing bytes, and we'll be
 able to save the CRC work on whatever bytes we omit.
 
 Any objections?

None: completely agree with your analysis. Sounds great.

 It seems we are more or less agreed that 32-bit CRC ought to be enough
 for WAL; and we also need to make a change to ensure that backup blocks
 are positively linked to their parent WAL record, as I noted earlier
 today.  So as long as we have to mess with the WAL record format, I was
 wondering what else we could get done in the same change.

Is this a change that would be backpatched as you suggested previously?
It seems a rather large patch to change three things at once. Can the
backpatch wait until 8.1 has gone through beta to allow the changes to
be proven?

Best Regards, Simon Riggs





Re: [HACKERS] Quick-and-dirty compression for WAL backup blocks

2005-05-31 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 Is this a change that would be backpatched as you suggested previously?

I don't think we can backpatch any of these items, since they involve
changes in the on-disk file format.  I was thinking of them as CVS HEAD
changes only.

regards, tom lane
