Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Mar 11, 2015 at 3:57 PM, Fujii Masao wrote:
> On Wed, Mar 11, 2015 at 7:08 AM, Rahila Syed wrote:
>> Hello,
>>
>>> I have some minor comments
>>
>> The comments have been implemented in the attached patch.
>
> Thanks for updating the patch! I just changed a bit and finally pushed it.
> Thanks everyone involved in this patch!

Woohoo! Thanks!
--
Michael
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Mar 11, 2015 at 7:08 AM, Rahila Syed wrote:
> Hello,
>
>> I have some minor comments
>
> The comments have been implemented in the attached patch.

Thanks for updating the patch! I just changed a bit and finally pushed it.
Thanks everyone involved in this patch!

Regards,
--
Fujii Masao
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Mar 9, 2015 at 9:08 PM, Michael Paquier wrote:
> On Mon, Mar 9, 2015 at 4:29 PM, Fujii Masao wrote:
>> On Thu, Mar 5, 2015 at 10:08 PM, Michael Paquier wrote:
>>> On Thu, Mar 5, 2015 at 9:14 PM, Syed, Rahila wrote:
>>>> Please find attached a patch. As discussed, a flag to denote compression
>>>> and the presence of a hole in the block image has been added in
>>>> XLogRecordImageHeader rather than the block header.
>>
>> Thanks for updating the patch! Attached is the refactored version of the patch.
>
> Cool. Thanks!
>
> I have some minor comments:

Thanks for the comments!

> +Turning this parameter on can reduce the WAL volume without
> "Turning on this parameter

That tag is not used in other places in config.sgml, so I'm not sure that
it's really necessary.

Regards,
--
Fujii Masao
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

> I have some minor comments

The comments have been implemented in the attached patch.

> I think that extra parentheses should be used for the first expression
> with BKPIMAGE_HAS_HOLE.

Parentheses have been added to improve code readability.

Thank you,
Rahila Syed

On Mon, Mar 9, 2015 at 5:38 PM, Michael Paquier wrote:
> On Mon, Mar 9, 2015 at 4:29 PM, Fujii Masao wrote:
>> On Thu, Mar 5, 2015 at 10:08 PM, Michael Paquier wrote:
>>> On Thu, Mar 5, 2015 at 9:14 PM, Syed, Rahila wrote:
>>>> Please find attached a patch. As discussed, a flag to denote compression
>>>> and the presence of a hole in the block image has been added in
>>>> XLogRecordImageHeader rather than the block header.
>>
>> Thanks for updating the patch! Attached is the refactored version of the patch.
>
> Cool. Thanks!
>
> I have some minor comments:
>
> +The default value is off
> Dot at the end of this sentence.
>
> +Turning this parameter on can reduce the WAL volume without
> "Turning on this parameter
>
> +but at the cost of some extra CPU time by the compression during
> +WAL logging and the decompression during WAL replay."
> Isn't a verb missing here, for something like that:
> "but at the cost of some extra CPU spent on the compression during WAL
> logging and on the decompression during WAL replay."
>
> + * This can reduce the WAL volume, but at some extra cost of CPU time
> + * by the compression during WAL logging.
> Er, similarly "some extra cost of CPU spent on the compression...".
>
> +	if (blk->bimg_info & BKPIMAGE_HAS_HOLE &&
> +		(blk->hole_offset == 0 ||
> +		 blk->hole_length == 0 ||
> I think that extra parentheses should be used for the first expression
> with BKPIMAGE_HAS_HOLE.
>
> +	if (blk->bimg_info & BKPIMAGE_IS_COMPRESSED &&
> +		blk->bimg_len == BLCKSZ)
> +	{
> Same here.
>
> +	/*
> +	 * cross-check that hole_offset == 0 and hole_length == 0
> +	 * if the HAS_HOLE flag is set.
> +	 */
> I think that you mean here that this happens when the flag is *not* set.
>
> +	/*
> +	 * If BKPIMAGE_HAS_HOLE and BKPIMAGE_IS_COMPRESSED,
> +	 * an XLogRecordBlockCompressHeader follows
> +	 */
> Maybe a "struct" should be added for "an XLogRecordBlockCompressHeader
> struct". And a dot at the end of the sentence should be added?
>
> Regards,
> --
> Michael

Support-compression-full-page-writes-in-WAL_v25.patch
Description: Binary data
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Mar 9, 2015 at 9:08 PM, Michael Paquier wrote:
> On Mon, Mar 9, 2015 at 4:29 PM, Fujii Masao wrote:
>> Thanks for updating the patch! Attached is the refactored version of the patch.

Fujii-san and I had a short chat about tuning the PGLZ strategy a bit, which
is now PGLZ_strategy_default in the patch (at least 25% compression, etc.).
In particular min_input_size, which is now set at 32B, is too low; knowing
that the minimum fillfactor of a relation page is 10%, this looks really too
low.

For example, using the extension attached to this email, which can compress
and decompress bytea strings and which I developed after pglz was moved to
libpqcommon (it contains as well a function able to get a relation page
without its hole, feel free to use it), I am seeing that we can gain quite a
lot of space even with some incompressible data like UUIDs or random float
data (pages are compressed without their hole):

1) Float table:
=# create table float_tab (id float);
CREATE TABLE
=# insert into float_tab select random() from generate_series(1, 20);
INSERT 0 20
=# SELECT bytea_size(compress_data(page)) AS compress_size,
          bytea_size(page) AS raw_size_no_hole
   FROM get_raw_page('float_tab'::regclass, 0, false);
-[ RECORD 1 ]----+----
compress_size    | 329
raw_size_no_hole | 744
=# SELECT bytea_size(compress_data(page)) AS compress_size,
          bytea_size(page) AS raw_size_no_hole
   FROM get_raw_page('float_tab'::regclass, 0, false);
-[ RECORD 1 ]----+-----
compress_size    | 1753
raw_size_no_hole | 4344
So that's more or less 60% saved...

2) UUID table:
=# SELECT bytea_size(compress_data(page)) AS compress_size,
          bytea_size(page) AS raw_size_no_hole
   FROM get_raw_page('uuid_tab'::regclass, 0, false);
-[ RECORD 1 ]----+----
compress_size    | 590
raw_size_no_hole | 904
=# insert into uuid_tab select gen_random_uuid() from generate_series(1, 100);
INSERT 0 100
=# SELECT bytea_size(compress_data(page)) AS compress_size,
          bytea_size(page) AS raw_size_no_hole
   FROM get_raw_page('uuid_tab'::regclass, 0, false);
-[ RECORD 1 ]----+-----
compress_size    | 3338
raw_size_no_hole | 5304
And in this case we are close to 40% saved...

At least, knowing that with the header there are at least 24B used on a page,
what about increasing min_input_size to something like 128B or 256B? I don't
think that this is a blocker for this patch, as most relation pages are going
to have far more data than that and so will be unconditionally compressed,
but there is definitely something we could do in this area later on; perhaps
we could even improve things with the other parameters, like the compression
rate. So that's something to keep in mind...
--
Michael

compress_test.tar.gz
Description: GNU Zip compressed data
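For concreteness, here is a minimal sketch of the kind of tuned strategy
being discussed. The PGLZ_Strategy fields and the default values echoed in
the comments come from PostgreSQL's pg_lzcompress code; the 128B minimum
input size is only the illustrative assumption floated above, not a tested
proposal:

#include <limits.h>
#include "common/pg_lzcompress.h"

/*
 * Hypothetical tuned strategy for FPW compression: identical to
 * PGLZ_strategy_default except that inputs smaller than 128 bytes are
 * never compressed, since a page always carries at least a 24B header
 * and tiny inputs rarely compress usefully.
 */
static const PGLZ_Strategy fpw_strategy_data = {
	128,		/* min_input_size: raised from the default 32 */
	INT_MAX,	/* max_input_size: no upper limit */
	25,		/* min_comp_rate: require at least 25% savings */
	1024,		/* first_success_by: give up early on hard data */
	128,		/* match_size_good */
	10		/* match_size_drop */
};

/* would then be used as: pglz_compress(source, slen, dest, &fpw_strategy_data) */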
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Mar 9, 2015 at 4:29 PM, Fujii Masao wrote:
> On Thu, Mar 5, 2015 at 10:08 PM, Michael Paquier wrote:
>> On Thu, Mar 5, 2015 at 9:14 PM, Syed, Rahila wrote:
>>> Please find attached a patch. As discussed, a flag to denote compression
>>> and the presence of a hole in the block image has been added in
>>> XLogRecordImageHeader rather than the block header.
>
> Thanks for updating the patch! Attached is the refactored version of the patch.

Cool. Thanks!

I have some minor comments:

+The default value is off
Dot at the end of this sentence.

+Turning this parameter on can reduce the WAL volume without
"Turning on this parameter

+but at the cost of some extra CPU time by the compression during
+WAL logging and the decompression during WAL replay."
Isn't a verb missing here, for something like that:
"but at the cost of some extra CPU spent on the compression during WAL
logging and on the decompression during WAL replay."

+ * This can reduce the WAL volume, but at some extra cost of CPU time
+ * by the compression during WAL logging.
Er, similarly "some extra cost of CPU spent on the compression...".

+	if (blk->bimg_info & BKPIMAGE_HAS_HOLE &&
+		(blk->hole_offset == 0 ||
+		 blk->hole_length == 0 ||
I think that extra parentheses should be used for the first expression
with BKPIMAGE_HAS_HOLE.

+	if (blk->bimg_info & BKPIMAGE_IS_COMPRESSED &&
+		blk->bimg_len == BLCKSZ)
+	{
Same here.

+	/*
+	 * cross-check that hole_offset == 0 and hole_length == 0
+	 * if the HAS_HOLE flag is set.
+	 */
I think that you mean here that this happens when the flag is *not* set.

+	/*
+	 * If BKPIMAGE_HAS_HOLE and BKPIMAGE_IS_COMPRESSED,
+	 * an XLogRecordBlockCompressHeader follows
+	 */
Maybe a "struct" should be added for "an XLogRecordBlockCompressHeader
struct". And a dot at the end of the sentence should be added?

Regards,
--
Michael
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Mar 5, 2015 at 10:08 PM, Michael Paquier wrote:
> On Thu, Mar 5, 2015 at 9:14 PM, Syed, Rahila wrote:
>> Please find attached a patch. As discussed, a flag to denote compression
>> and the presence of a hole in the block image has been added in
>> XLogRecordImageHeader rather than the block header.

Thanks for updating the patch! Attached is the refactored version of the patch.

Regards,
--
Fujii Masao

*** a/contrib/pg_xlogdump/pg_xlogdump.c
--- b/contrib/pg_xlogdump/pg_xlogdump.c
***************
*** 359,376 **** XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats,
  	rec_len = XLogRecGetDataLen(record) + SizeOfXLogRecord;
  
  	/*
! 	 * Calculate the amount of FPI data in the record. Each backup block
! 	 * takes up BLCKSZ bytes, minus the "hole" length.
  	 *
  	 * XXX: We peek into xlogreader's private decoded backup blocks for the
! 	 * hole_length. It doesn't seem worth it to add an accessor macro for
! 	 * this.
  	 */
  	fpi_len = 0;
  	for (block_id = 0; block_id <= record->max_block_id; block_id++)
  	{
  		if (XLogRecHasBlockImage(record, block_id))
! 			fpi_len += BLCKSZ - record->blocks[block_id].hole_length;
  	}
  
  	/* Update per-rmgr statistics */
--- 359,375 ----
  	rec_len = XLogRecGetDataLen(record) + SizeOfXLogRecord;
  
  	/*
! 	 * Calculate the amount of FPI data in the record.
  	 *
  	 * XXX: We peek into xlogreader's private decoded backup blocks for the
! 	 * bimg_len indicating the length of FPI data. It doesn't seem worth it to
! 	 * add an accessor macro for this.
  	 */
  	fpi_len = 0;
  	for (block_id = 0; block_id <= record->max_block_id; block_id++)
  	{
  		if (XLogRecHasBlockImage(record, block_id))
! 			fpi_len += record->blocks[block_id].bimg_len;
  	}
  
  	/* Update per-rmgr statistics */
***************
*** 465,473 **** XLogDumpDisplayRecord(XLogDumpConfig *config, XLogReaderState *record)
  				   blk);
  			if (XLogRecHasBlockImage(record, block_id))
  			{
! 				printf(" (FPW); hole: offset: %u, length: %u\n",
! 					   record->blocks[block_id].hole_offset,
! 					   record->blocks[block_id].hole_length);
  			}
  			putchar('\n');
  		}
--- 464,485 ----
  				   blk);
  			if (XLogRecHasBlockImage(record, block_id))
  			{
! 				if (record->blocks[block_id].bimg_info &
! 					BKPIMAGE_IS_COMPRESSED)
! 				{
! 					printf(" (FPW); hole: offset: %u, length: %u, compression saved: %u\n",
! 						   record->blocks[block_id].hole_offset,
! 						   record->blocks[block_id].hole_length,
! 						   BLCKSZ -
! 						   record->blocks[block_id].hole_length -
! 						   record->blocks[block_id].bimg_len);
! 				}
! 				else
! 				{
! 					printf(" (FPW); hole: offset: %u, length: %u\n",
! 						   record->blocks[block_id].hole_offset,
! 						   record->blocks[block_id].hole_length);
! 				}
  			}
  			putchar('\n');
  		}
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***************
*** 2282,2287 **** include_dir 'conf.d'
--- 2282,2311 ----
+ 
+      wal_compression (boolean)
+      
+       wal_compression configuration parameter
+      
+ 
+      
+       
+        When this parameter is on, the PostgreSQL
+        server compresses a full page image written to WAL when
+        is on or during a base backup.
+        A compressed page image will be decompressed during WAL replay.
+        The default value is off
+       
+ 
+       
+        Turning this parameter on can reduce the WAL volume without
+        increasing the risk of unrecoverable data corruption,
+        but at the cost of some extra CPU time by the compression during
+        WAL logging and the decompression during WAL replay.
+       
+      
+ 
  
       wal_buffers (integer)
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 89,94 ****
  char	   *XLogArchiveCommand = NULL;
--- 89,95 ----
  bool		EnableHotStandby = false;
  bool		fullPageWrites = true;
  bool		wal_log_hints = false;
+ bool		wal_compression = false;
  bool		log_checkpoints = false;
  int			sync_method = DEFAULT_SYNC_METHOD;
  int			wal_level = WAL_LEVEL_MINIMAL;
*** a/src/backend/access/transam/xloginsert.c
--- b/src/backend/access/transam/xloginsert.c
***************
*** 24,35 ****
--- 24,39 ----
  #include "access/xlog_internal.h"
  #include "access/xloginsert.h"
  #include "catalog/pg_control.h"
+ #include "common/pg_lzcompress.h"
  #include "miscadmin.h"
  #include "storage/bufmgr.h"
  #include "storage/proc.h"
  #include "utils/memutils.h"
  #include "pg_trace.h"
  
+ /* Buffer size required to store a compressed version of backup block image */
+ #define PGLZ_MAX_BLCKSZ	PGLZ_MAX_OUTPUT(BLCKSZ)
+ 
  /*
   * For each block reference registered with XLogRegisterBuffer, we fill in
   * a registered_buffer struct.
***************
*** 50,55 **** typedef struct
--- 54,62 ----
  	XLogRecData bkp_rdatas[2];	/* temporary rdatas used to hold references to
  								 * backup block data
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Feb 16, 2015 at 9:08 PM, Andres Freund wrote:
> On 2015-02-16 20:55:20 +0900, Michael Paquier wrote:
>> On Mon, Feb 16, 2015 at 8:30 PM, Syed, Rahila wrote:
>>>
>>>> Regarding the sanity checks that have been added recently. I think that
>>>> they are useful but I am suspecting as well that only a check on the
>>>> record CRC is done because that's reliable enough and not doing those
>>>> checks accelerates replay a bit. So I am thinking that we should simply
>>>> replace them by assertions.
>>>
>>> Removing the checks makes sense as the CRC ensures correctness. Moreover,
>>> as an error message for an invalid record length is present in the code,
>>> messages for an invalid block length can be redundant.
>>>
>>> Checks have been replaced by assertions in the attached patch.
>>
>> After more thinking, we may as well simply remove them, an error with CRC
>> having high chances to complain before reaching this point...
>
> Surely not. The existing code explicitly does it like
> 	if (blk->has_data && blk->data_len == 0)
> 		report_invalid_record(state,
> 			  "BKPBLOCK_HAS_DATA set, but no data included at %X/%X",
> 			  (uint32) (state->ReadRecPtr >> 32), (uint32) state->ReadRecPtr);
> these cross checks are important. And I see no reason to deviate from
> that. The CRC sum isn't foolproof - we intentionally do checks at
> several layers. And, as you can see from some other locations, we
> actually try to *not* fatally error out when hitting them at times - so
> an Assert also is wrong.
>
> Heikki:
> 	/* cross-check that the HAS_DATA flag is set iff data_length > 0 */
> 	if (blk->has_data && blk->data_len == 0)
> 		report_invalid_record(state,
> 			  "BKPBLOCK_HAS_DATA set, but no data included at %X/%X",
> 			  (uint32) (state->ReadRecPtr >> 32), (uint32) state->ReadRecPtr);
> 	if (!blk->has_data && blk->data_len != 0)
> 		report_invalid_record(state,
> 			  "BKPBLOCK_HAS_DATA not set, but data length is %u at %X/%X",
> 			  (unsigned int) blk->data_len,
> 			  (uint32) (state->ReadRecPtr >> 32), (uint32) state->ReadRecPtr);
> those look like they're missing a goto err; to me.

Yes. I pushed the fix. Thanks!

Regards,
--
Fujii Masao
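The fix presumably amounts to jumping to the error-cleanup label after each
report, along these lines (a sketch based on the xlogreader.c code quoted
above, not the exact committed diff):

	/* cross-check that the HAS_DATA flag is set iff data_length > 0 */
	if (blk->has_data && blk->data_len == 0)
	{
		report_invalid_record(state,
			  "BKPBLOCK_HAS_DATA set, but no data included at %X/%X",
			  (uint32) (state->ReadRecPtr >> 32),
			  (uint32) state->ReadRecPtr);
		goto err;			/* previously missing: take the error path */
	}
	if (!blk->has_data && blk->data_len != 0)
	{
		report_invalid_record(state,
			  "BKPBLOCK_HAS_DATA not set, but data length is %u at %X/%X",
			  (unsigned int) blk->data_len,
			  (uint32) (state->ReadRecPtr >> 32),
			  (uint32) state->ReadRecPtr);
		goto err;			/* previously missing */
	}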
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Mar 5, 2015 at 10:28 PM, Andres Freund wrote:
> On 2015-03-05 12:14:04 +, Syed, Rahila wrote:
>> Please find attached a patch. As discussed, a flag to denote compression
>> and the presence of a hole in the block image has been added in
>> XLogRecordImageHeader rather than the block header.
>
> FWIW, I personally won't commit it with things done that way. I think
> it's going the wrong way, leading to a harder to interpret and less
> flexible format. I'm not going to further protest if Fujii or Heikki
> commit it this way though.

I'm pretty sure that we can discuss the *better* WAL format even after
committing this patch.

Regards,
--
Fujii Masao
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 2015-03-05 12:14:04 +, Syed, Rahila wrote:
> Please find attached a patch. As discussed, a flag to denote compression
> and the presence of a hole in the block image has been added in
> XLogRecordImageHeader rather than the block header.

FWIW, I personally won't commit it with things done that way. I think
it's going the wrong way, leading to a harder to interpret and less
flexible format. I'm not going to further protest if Fujii or Heikki
commit it this way though.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Mar 5, 2015 at 9:14 PM, Syed, Rahila wrote:
> Please find attached a patch. As discussed, a flag to denote compression
> and the presence of a hole in the block image has been added in
> XLogRecordImageHeader rather than the block header.
>
> Following are WAL numbers based on the attached test script posted by
> Michael earlier in the thread.
>
>                       WAL generated
> FPW compression on    122.032 MB
> FPW compression off   155.223 MB
> HEAD                  155.236 MB
>
> Compression: 21%
> Number of block images generated in WAL: 63637

ISTM that we are getting a nice thing here. I tested the patch and WAL
replay is working correctly. Some nitpicky comments...

+ * bkp_info stores flags for information about the backup block image
+ * BKPIMAGE_IS_COMPRESSED is used to identify if a given block image is compressed.
+ * BKPIMAGE_WITH_HOLE is used to identify the presence of a hole in a block image.
+ * If the block image has no hole, it is ensured that the raw size of a compressed
+ * block image is equal to BLCKSZ, hence the contents of
+ * XLogRecordBlockImageCompressionInfo are not necessary.
Take care of the limit of 80 characters per line. (Perhaps you could run
pgindent on your code before sending a patch?) The first line of this
paragraph is a sentence in itself, no?

In xlogreader.c, blk->with_hole is a boolean, so you could remove the == 0
and == 1 it is compared with.

+	/*
+	 * Length of a block image must be less than BLCKSZ
+	 * if the block has hole
+	 */
"if the block has a hole." (End of the sentence needs a dot.)

+	/*
+	 * Length of a block image must be equal to BLCKSZ
+	 * if the block does not have hole
+	 */
"if the block does not have a hole."

Regards,
--
Michael
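As background for why a compressed image without a hole can skip the extra
compression-info field: on replay, the stored data plus the zeroed hole must
rebuild exactly BLCKSZ bytes, so a hole-less image is always exactly BLCKSZ
bytes once decompressed. A self-contained sketch of that reconstruction step
(plain C; the function name and buffer layout are illustrative, not the
patch's actual code):

#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192

/*
 * Rebuild a full page from an image stored without its hole. "src" holds
 * (BLCKSZ - hole_length) bytes of page data; the hole is re-inserted as
 * zeroes at hole_offset. With hole_length == 0 this degenerates to a
 * plain BLCKSZ-byte memcpy.
 */
static void
restore_block_image(char *page, const char *src,
					uint16_t hole_offset, uint16_t hole_length)
{
	memcpy(page, src, hole_offset);				/* data before the hole */
	memset(page + hole_offset, 0, hole_length);	/* the zeroed hole */
	memcpy(page + hole_offset + hole_length,	/* data after the hole */
		   src + hole_offset,
		   BLCKSZ - (hole_offset + hole_length));
}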
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

Please find attached a patch. As discussed, a flag to denote compression and
the presence of a hole in the block image has been added in
XLogRecordImageHeader rather than the block header.

Following are WAL numbers based on the attached test script posted by Michael
earlier in the thread.

                      WAL generated
FPW compression on    122.032 MB
FPW compression off   155.223 MB
HEAD                  155.236 MB

Compression: 21%
Number of block images generated in WAL: 63637

Thank you,
Rahila Syed

Support-compression-of-full-page-writes-in-WAL_v23.patch
Description: Support-compression-of-full-page-writes-in-WAL_v23.patch

compress_run.bash
Description: compress_run.bash
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

> Are there any other flag bits that we should or are planning to add into WAL
> header newly, except the above two? If yes and they are required by even a
> block which doesn't have an image, I will change my mind and agree to add
> something like chunk ID to a block header.
> But I guess the answer of the question is No. Since the flag bits now we are
> thinking to add are required only by a block having an image, adding them
> into a block header (instead of block image header) seems a waste of bytes
> in WAL. So I concur with Michael.

I agree. As per my understanding, this change of xlog format was to provide
for future enhancements which would need flags relevant to an entire block.
But as mentioned, the flags currently being added relate to the block image
only. Hence for this patch it makes sense to add a field to
XLogRecordImageHeader rather than the block header. This will also save bytes
per WAL record.

Thank you,
Rahila Syed
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Mar 4, 2015 at 12:41 AM, Syed, Rahila wrote:
> Please find attached an updated patch with the WAL replay error fixed. The
> patch follows the chunk ID approach of the xlog format.

(Review done independently of the chunk_id stuff being good or not; I already
gave my opinion on the matter.)

  * readRecordBufSize is set to the new buffer size.
- *
+
The patch has some noise diffs.

You may want to change the values of BKPBLOCK_WILL_INIT and BKPBLOCK_SAME_REL
to respectively 0x01 and 0x02.

+	uint8		chunk_id = 0;
+	chunk_id |= XLR_CHUNK_BLOCK_REFERENCE;
Why not simply that:
chunk_id = XLR_CHUNK_BLOCK_REFERENCE;

+#define XLR_CHUNK_ID_DATA_SHORT		255
+#define XLR_CHUNK_ID_DATA_LONG		254
Why aren't those just using one bit as well? This seems inconsistent with the
rest.

+	if ((blk->with_hole == 0 && blk->hole_offset != 0) ||
+		(blk->with_hole == 1 && blk->hole_offset <= 0))
In xlogreader.c blk->with_hole is defined as a boolean but compared with an
integer; could you remove the == 0 and == 1 portions for clarity?

-		goto err;
+			goto err;
 		}
 	}
-
 	if (remaining != datatotal)
This gathers incorrect code alignment and unnecessary diffs.

 typedef struct XLogRecordBlockHeader
 {
+	/* Chunk ID precedes */
+
 	uint8		id;
What prevents the declaration of chunk_id as an int8 here instead of this
comment? This is confusing.

> Following are brief measurement numbers.
>
>                       WAL
> FPW compression on    122.032 MB
> FPW compression off   155.239 MB
> HEAD                  155.236 MB

What is the test run in this case? How many block images have been generated
in WAL for each case? You could gather some of those numbers with
pg_xlogdump --stat, for example.
--
Michael
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Mar 3, 2015 at 9:34 AM, Michael Paquier wrote:
> On Tue, Mar 3, 2015 at 9:24 AM, Andres Freund wrote:
>> On 2015-03-03 08:59:30 +0900, Michael Paquier wrote:
>>> Already mentioned upthread, but I agree with Fujii-san here: adding
>>> information related to the state of a block image in
>>> XLogRecordBlockHeader makes little sense because we are not sure to
>>> have a block image, perhaps there is only data associated to it, and
>>> that we should control that exclusively in XLogRecordBlockImageHeader
>>> and let the block ID alone for now.
>>
>> This argument doesn't make much sense to me. The flag byte could very
>> well indicate 'block reference without image following' vs 'block
>> reference with data + hole following' vs 'block reference with
>> compressed data following'.
>
> Information about the state of a block is decoupled from its existence,
> aka in the block header, we should control if:
> - record has data
> - record has a block
> And in the block image header, we control if the block is:
> - compressed or not
> - has a hole or not.

Are there any other flag bits that we should or are planning to add into WAL
header newly, except the above two? If yes and they are required by even a
block which doesn't have an image, I will change my mind and agree to add
something like chunk ID to a block header.
But I guess the answer of the question is No. Since the flag bits now we are
thinking to add are required only by a block having an image, adding them
into a block header (instead of block image header) seems a waste of bytes
in WAL. So I concur with Michael.

Regards,
--
Fujii Masao
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

> It would be good to get those problems fixed first. Could you send an
> updated patch?

Please find attached an updated patch with the WAL replay error fixed. The
patch follows the chunk ID approach of the xlog format.

Following are brief measurement numbers.

                      WAL
FPW compression on    122.032 MB
FPW compression off   155.239 MB
HEAD                  155.236 MB

Thank you,
Rahila Syed

Support-compression-of-full-page-writes_v22.patch
Description: Support-compression-of-full-page-writes_v22.patch
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Mar 3, 2015 at 9:24 AM, Andres Freund wrote:
> On 2015-03-03 08:59:30 +0900, Michael Paquier wrote:
>> Already mentioned upthread, but I agree with Fujii-san here: adding
>> information related to the state of a block image in
>> XLogRecordBlockHeader makes little sense because we are not sure to
>> have a block image, perhaps there is only data associated to it, and
>> that we should control that exclusively in XLogRecordBlockImageHeader
>> and let the block ID alone for now.
>
> This argument doesn't make much sense to me. The flag byte could very
> well indicate 'block reference without image following' vs 'block
> reference with data + hole following' vs 'block reference with
> compressed data following'.

Information about the state of a block is decoupled from its existence,
aka in the block header, we should control if:
- record has data
- record has a block
And in the block image header, we control if the block is:
- compressed or not
- has a hole or not.
Are you willing to sacrifice bytes in the block header to control whether a
block is compressed or has a hole, even if the block has only data but no
image?
--
Michael
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 2015-03-03 08:59:30 +0900, Michael Paquier wrote:
> Already mentioned upthread, but I agree with Fujii-san here: adding
> information related to the state of a block image in
> XLogRecordBlockHeader makes little sense because we are not sure to
> have a block image, perhaps there is only data associated to it, and
> that we should control that exclusively in XLogRecordBlockImageHeader
> and let the block ID alone for now.

This argument doesn't make much sense to me. The flag byte could very
well indicate 'block reference without image following' vs 'block
reference with data + hole following' vs 'block reference with
compressed data following'.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Mar 3, 2015 at 5:17 AM, Rahila Syed wrote:
> Hello,
>
>> When I test the WAL or replication related features, I usually run
>> "make installcheck" and pgbench against the master at the same time
>> after setting up the replication environment.
>
> I will conduct these tests before sending the updated version.
>
>> Seems this increases the header size of WAL record even if no backup block
>> image is included. Right?
>
> Yes, this increases the header size of the WAL record by 1 byte for every
> block reference even if it has no backup block image.
>
>> Isn't it better to add the flag info about backup block image into
>> XLogRecordBlockImageHeader rather than XLogRecordBlockHeader
>
> Yes, this will make the code extensible and readable, and will save a couple
> of bytes per record.
> But the current approach is to provide a chunk ID identifying different
> xlog record fragments like main data, block references etc.
> Currently, block ID is used to identify record fragments, which can be
> either XLR_BLOCK_ID_DATA_SHORT, XLR_BLOCK_ID_DATA_LONG or an actual block
> ID. This can be replaced by a chunk ID to separate it from the block ID.
> The block ID can be used to number the block fragments whereas the chunk ID
> can be used to distinguish between main data fragments and block references.
> The chunk ID of block references can contain information about the presence
> of data, image, hole and compression.
> The chunk ID for main data fragments remains as it is. This approach
> provides for readability and extensibility.

Already mentioned upthread, but I agree with Fujii-san here: adding
information related to the state of a block image in XLogRecordBlockHeader
makes little sense because we are not sure to have a block image, perhaps
there is only data associated to it, and we should control that exclusively
in XLogRecordBlockImageHeader and let the block ID alone for now. Hence we'd
better have 1 extra int8 in XLogRecordBlockImageHeader with two flags for now:
- Is the block compressed or not?
- Does the block have a hole?
Perhaps this will not be considered as ugly, and it leaves plenty of room for
storing a version number for compression.
--
Michael
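A minimal sketch of the layout described above, with one dedicated flag byte
instead of stolen bits. The field and flag names follow those visible
elsewhere in this thread (bimg_info, BKPIMAGE_HAS_HOLE,
BKPIMAGE_IS_COMPRESSED); the exact flag values and field order are
illustrative assumptions:

#include <stdint.h>

/*
 * Hypothetical XLogRecordBlockImageHeader with a dedicated flag byte.
 * This costs one extra byte per block image compared with bit-stealing,
 * but leaves room for future flags (e.g. a compression version number).
 */
typedef struct XLogRecordBlockImageHeader
{
	uint16_t	length;			/* length of stored image data */
	uint16_t	hole_offset;	/* number of bytes before the "hole" */
	uint8_t		bimg_info;		/* flag bits, see below */
} XLogRecordBlockImageHeader;

#define BKPIMAGE_HAS_HOLE		0x01	/* page image has a "hole" */
#define BKPIMAGE_IS_COMPRESSED	0x02	/* page image is compressed */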
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

> When I test the WAL or replication related features, I usually run
> "make installcheck" and pgbench against the master at the same time
> after setting up the replication environment.

I will conduct these tests before sending the updated version.

> Seems this increases the header size of WAL record even if no backup block
> image is included. Right?

Yes, this increases the header size of the WAL record by 1 byte for every
block reference even if it has no backup block image.

> Isn't it better to add the flag info about backup block image into
> XLogRecordBlockImageHeader rather than XLogRecordBlockHeader

Yes, this will make the code extensible and readable, and will save a couple
of bytes per record.
But the current approach is to provide a chunk ID identifying different xlog
record fragments like main data, block references etc.
Currently, block ID is used to identify record fragments, which can be either
XLR_BLOCK_ID_DATA_SHORT, XLR_BLOCK_ID_DATA_LONG or an actual block ID. This
can be replaced by a chunk ID to separate it from the block ID. The block ID
can be used to number the block fragments whereas the chunk ID can be used to
distinguish between main data fragments and block references. The chunk ID of
block references can contain information about the presence of data, image,
hole and compression.
The chunk ID for main data fragments remains as it is. This approach provides
for readability and extensibility.

Thank you,
Rahila Syed

On Mon, Mar 2, 2015 at 3:43 PM, Fujii Masao wrote:
> On Fri, Feb 27, 2015 at 12:44 PM, Michael Paquier wrote:
>> On Fri, Feb 27, 2015 at 8:01 AM, Michael Paquier wrote:
>>> On Fri, Feb 27, 2015 at 6:54 AM, Rahila Syed wrote:
>>>>> Even this patch doesn't work fine. The standby emits the following
>>>>> error messages.
>>>>
>>>> Yes, this bug remains unsolved. I am still working on resolving this.
>>>>
>>>> Following chunk IDs have been added in the attached patch as suggested
>>>> upthread.
>>>> +#define XLR_CHUNK_BLOCK_REFERENCE	0x10
>>>> +#define XLR_CHUNK_BLOCK_HAS_IMAGE	0x04
>>>> +#define XLR_CHUNK_BLOCK_HAS_DATA	0x08
>>>>
>>>> XLR_CHUNK_BLOCK_REFERENCE denotes the chunk ID of block references.
>>>> XLR_CHUNK_BLOCK_HAS_IMAGE is a replacement of BKPBLOCK_HAS_IMAGE
>>>> and XLR_CHUNK_BLOCK_HAS_DATA a replacement of BKPBLOCK_HAS_DATA.
>>>
>>> Before sending a new version, be sure that this gets fixed by for
>>> example building up a master with a standby replaying WAL, and running
>>> make installcheck-world or similar. If the standby does not complain
>>> at all, you have good chances to not have bugs. You could also build
>>> with WAL_DEBUG to check record consistency.
>
> +1
>
> When I test the WAL or replication related features, I usually run
> "make installcheck" and pgbench against the master at the same time
> after setting up the replication environment.
>
>  typedef struct XLogRecordBlockHeader
>  {
> +	uint8		chunk_id;	/* xlog fragment id */
>  	uint8		id;			/* block reference ID */
>
> Seems this increases the header size of WAL record even if no backup block
> image is included. Right? Isn't it better to add the flag info about backup
> block image into XLogRecordBlockImageHeader rather than
> XLogRecordBlockHeader?
> Originally we borrowed one or two bits from its existing fields to minimize
> the header size, but we can just add a new flag field if we prefer
> the extensibility and readability of the code.
>
> Regards,
>
> --
> Fujii Masao
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Feb 27, 2015 at 12:44 PM, Michael Paquier wrote:
> On Fri, Feb 27, 2015 at 8:01 AM, Michael Paquier wrote:
>> On Fri, Feb 27, 2015 at 6:54 AM, Rahila Syed wrote:
>>>> Even this patch doesn't work fine. The standby emits the following
>>>> error messages.
>>>
>>> Yes, this bug remains unsolved. I am still working on resolving this.
>>>
>>> Following chunk IDs have been added in the attached patch as suggested
>>> upthread.
>>> +#define XLR_CHUNK_BLOCK_REFERENCE	0x10
>>> +#define XLR_CHUNK_BLOCK_HAS_IMAGE	0x04
>>> +#define XLR_CHUNK_BLOCK_HAS_DATA	0x08
>>>
>>> XLR_CHUNK_BLOCK_REFERENCE denotes the chunk ID of block references.
>>> XLR_CHUNK_BLOCK_HAS_IMAGE is a replacement of BKPBLOCK_HAS_IMAGE
>>> and XLR_CHUNK_BLOCK_HAS_DATA a replacement of BKPBLOCK_HAS_DATA.
>>
>> Before sending a new version, be sure that this gets fixed by for
>> example building up a master with a standby replaying WAL, and running
>> make installcheck-world or similar. If the standby does not complain
>> at all, you have good chances to not have bugs. You could also build
>> with WAL_DEBUG to check record consistency.

+1

When I test the WAL or replication related features, I usually run
"make installcheck" and pgbench against the master at the same time
after setting up the replication environment.

 typedef struct XLogRecordBlockHeader
 {
+	uint8		chunk_id;	/* xlog fragment id */
 	uint8		id;			/* block reference ID */

Seems this increases the header size of WAL record even if no backup block
image is included. Right? Isn't it better to add the flag info about backup
block image into XLogRecordBlockImageHeader rather than
XLogRecordBlockHeader?
Originally we borrowed one or two bits from its existing fields to minimize
the header size, but we can just add a new flag field if we prefer
the extensibility and readability of the code.

Regards,
--
Fujii Masao
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Feb 27, 2015 at 8:01 AM, Michael Paquier wrote:
> On Fri, Feb 27, 2015 at 6:54 AM, Rahila Syed wrote:
>>> Even this patch doesn't work fine. The standby emits the following
>>> error messages.
>>
>> Yes, this bug remains unsolved. I am still working on resolving this.
>>
>> Following chunk IDs have been added in the attached patch as suggested
>> upthread.
>> +#define XLR_CHUNK_BLOCK_REFERENCE	0x10
>> +#define XLR_CHUNK_BLOCK_HAS_IMAGE	0x04
>> +#define XLR_CHUNK_BLOCK_HAS_DATA	0x08
>>
>> XLR_CHUNK_BLOCK_REFERENCE denotes the chunk ID of block references.
>> XLR_CHUNK_BLOCK_HAS_IMAGE is a replacement of BKPBLOCK_HAS_IMAGE
>> and XLR_CHUNK_BLOCK_HAS_DATA a replacement of BKPBLOCK_HAS_DATA.
>
> Before sending a new version, be sure that this gets fixed by for
> example building up a master with a standby replaying WAL, and running
> make installcheck-world or similar. If the standby does not complain
> at all, you have good chances to not have bugs. You could also build
> with WAL_DEBUG to check record consistency.

It would be good to get those problems fixed first. Could you send an updated
patch? I'll look into it in more detail. For the time being I am switching
this patch to "Waiting on Author".
--
Michael
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Feb 27, 2015 at 6:54 AM, Rahila Syed wrote:
> Hello,
>
>> Even this patch doesn't work fine. The standby emits the following
>> error messages.
>
> Yes, this bug remains unsolved. I am still working on resolving this.
>
> Following chunk IDs have been added in the attached patch as suggested
> upthread.
> +#define XLR_CHUNK_BLOCK_REFERENCE	0x10
> +#define XLR_CHUNK_BLOCK_HAS_IMAGE	0x04
> +#define XLR_CHUNK_BLOCK_HAS_DATA	0x08
>
> XLR_CHUNK_BLOCK_REFERENCE denotes the chunk ID of block references.
> XLR_CHUNK_BLOCK_HAS_IMAGE is a replacement of BKPBLOCK_HAS_IMAGE
> and XLR_CHUNK_BLOCK_HAS_DATA a replacement of BKPBLOCK_HAS_DATA.

Before sending a new version, be sure that this gets fixed by for
example building up a master with a standby replaying WAL, and running
make installcheck-world or similar. If the standby does not complain
at all, you have good chances to not have bugs. You could also build
with WAL_DEBUG to check record consistency.
--
Michael
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

> Even this patch doesn't work fine. The standby emits the following
> error messages.

Yes, this bug remains unsolved. I am still working on resolving this.

Following chunk IDs have been added in the attached patch as suggested
upthread.
+#define XLR_CHUNK_BLOCK_REFERENCE	0x10
+#define XLR_CHUNK_BLOCK_HAS_IMAGE	0x04
+#define XLR_CHUNK_BLOCK_HAS_DATA	0x08

XLR_CHUNK_BLOCK_REFERENCE denotes the chunk ID of block references.
XLR_CHUNK_BLOCK_HAS_IMAGE is a replacement of BKPBLOCK_HAS_IMAGE
and XLR_CHUNK_BLOCK_HAS_DATA a replacement of BKPBLOCK_HAS_DATA.

Thank you,
Rahila Syed

Support-compression-of-full-page-writes-in-WAL_v21.patch
Description: Binary data
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Feb 24, 2015 at 6:46 PM, Syed, Rahila wrote:
> Hello,
>
>> I've not read this logic yet, but ISTM there is a bug in that new WAL
>> format because I got the following error and the startup process could not
>> replay any WAL records when I set up replication and enabled
>> wal_compression.
>>
>> LOG: record with invalid length at 0/3B0
>> LOG: record with invalid length at 0/3000518
>> LOG: Invalid block length in record 0/30005A0
>> LOG: Invalid block length in record 0/3000D60
>> ...
>
> Please find attached a patch which fixes the WAL replay errors.

Even this patch doesn't work fine. The standby emits the following
error messages.

LOG: invalid block_id 255 at 0/3B0
LOG: record with invalid length at 0/30017F0
LOG: invalid block_id 255 at 0/3001878
LOG: record with invalid length at 0/30027D0
LOG: record with invalid length at 0/3002E58
...

Regards,
--
Fujii Masao
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 2015-02-24 16:03:41 +0900, Michael Paquier wrote:
> Looking at this code, I think that it is really confusing to move the data
> related to the status of the backup block out of XLogRecordBlockImageHeader
> to the chunk ID itself that may *not* include a backup block at all as it
> is conditioned by the presence of BKPBLOCK_HAS_IMAGE.

What's the problem here? We could actually now easily remove
BKPBLOCK_HAS_IMAGE and replace it by a chunk id.

> the idea of having the backup block data in its dedicated header with bits
> stolen from the existing fields, perhaps by rewriting it to something like
> that:
> typedef struct XLogRecordBlockImageHeader {
>     uint32 length:15,
>            hole_length:15,
>            is_compressed:1,
>            is_hole:1;
> } XLogRecordBlockImageHeader;
> Now perhaps I am missing something and this is really "ugly" ;)

I think it's fantastically ugly. We'll also likely want different
compression formats and stuff in the not too far away future. This will
just end up being a pain.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

> I've not read this logic yet, but ISTM there is a bug in that new WAL format
> because I got the following error and the startup process could not replay
> any WAL records when I set up replication and enabled wal_compression.
>
> LOG: record with invalid length at 0/3B0
> LOG: record with invalid length at 0/3000518
> LOG: Invalid block length in record 0/30005A0
> LOG: Invalid block length in record 0/3000D60
> ...

Please find attached a patch which fixes the WAL replay errors.

Thank you,
Rahila Syed

-----Original Message-----
From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Fujii Masao
Sent: Monday, February 23, 2015 5:52 PM
To: Rahila Syed
Cc: PostgreSQL-development; Andres Freund; Michael Paquier
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On Mon, Feb 23, 2015 at 5:28 PM, Rahila Syed wrote:
> Hello,
>
> Attached is a patch which has the following changes,
>
> As suggested above, block ID in the xlog structs has been replaced by
> chunk ID. Chunk ID is used to distinguish between different types of xlog
> record fragments.
> Like,
> XLR_CHUNK_ID_DATA_SHORT
> XLR_CHUNK_ID_DATA_LONG
> XLR_CHUNK_BKP_COMPRESSED
> XLR_CHUNK_BKP_WITH_HOLE
>
> In block references, the block ID follows the chunk ID. Here the block ID
> retains its functionality.
> This approach increases data by 1 byte for each block reference in an
> xlog record. This approach separates the ID referring to different
> fragments of an xlog record from the actual block ID which is used to
> refer to block references in the xlog record.

I've not read this logic yet, but ISTM there is a bug in that new WAL format
because I got the following error and the startup process could not replay
any WAL records when I set up replication and enabled wal_compression.

LOG: record with invalid length at 0/3B0
LOG: record with invalid length at 0/3000518
LOG: Invalid block length in record 0/30005A0
LOG: Invalid block length in record 0/3000D60
...

Regards,
--
Fujii Masao

Support-compression-for-full-page-writes-in-WAL_v20.patch
Description: Support-compression-for-full-page-writes-in-WAL_v20.patch
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Feb 23, 2015 at 9:22 PM, Fujii Masao wrote:
> On Mon, Feb 23, 2015 at 5:28 PM, Rahila Syed wrote:
>> Hello,
>>
>> Attached is a patch which has the following changes,
>>
>> As suggested above, block ID in the xlog structs has been replaced by
>> chunk ID. Chunk ID is used to distinguish between different types of xlog
>> record fragments.
>> Like,
>> XLR_CHUNK_ID_DATA_SHORT
>> XLR_CHUNK_ID_DATA_LONG
>> XLR_CHUNK_BKP_COMPRESSED
>> XLR_CHUNK_BKP_WITH_HOLE
>>
>> In block references, the block ID follows the chunk ID. Here the block ID
>> retains its functionality.
>> This approach increases data by 1 byte for each block reference in an
>> xlog record. This approach separates the ID referring to different
>> fragments of an xlog record from the actual block ID which is used to
>> refer to block references in the xlog record.
>
> I've not read this logic yet, but ISTM there is a bug in that new WAL
> format because I got the following error and the startup process could not
> replay any WAL records when I set up replication and enabled
> wal_compression.
>
> LOG: record with invalid length at 0/3B0
> LOG: record with invalid length at 0/3000518
> LOG: Invalid block length in record 0/30005A0
> LOG: Invalid block length in record 0/3000D60
> ...

Looking at this code, I think that it is really confusing to move the data
related to the status of the backup block out of XLogRecordBlockImageHeader
to the chunk ID itself, which may *not* include a backup block at all, as
that is conditioned by the presence of BKPBLOCK_HAS_IMAGE. I would still
prefer the idea of having the backup block data in its dedicated header with
bits stolen from the existing fields, perhaps by rewriting it to something
like that:

typedef struct XLogRecordBlockImageHeader {
    uint32 length:15,
           hole_length:15,
           is_compressed:1,
           is_hole:1;
} XLogRecordBlockImageHeader;

Now perhaps I am missing something and this is really "ugly" ;)

+#define XLR_CHUNK_ID_DATA_SHORT		255
+#define XLR_CHUNK_ID_DATA_LONG		254
+#define XLR_CHUNK_BKP_COMPRESSED	0x01
+#define XLR_CHUNK_BKP_WITH_HOLE		0x02

Wouldn't we need a XLR_CHUNK_ID_BKP_HEADER or equivalent? The idea behind
this chunk_id stuff is to be able to tell the difference between a short
header, a long header and a backup block header by looking at the first byte.

The comments on top of XLogRecordBlockImageHeader are still mentioning the
old parameters like with_hole or is_compressed that you have removed. It
seems as well that there is some noise:
- lp_off and lp_len fields in ItemIdData (see include/storage/itemid.h).
+ lp_off and lp_len fields in ItemIdData (see include/storage/itemid.h)
--
Michael
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Feb 23, 2015 at 5:28 PM, Rahila Syed wrote:
> Hello,
>
> Attached is a patch which has the following changes,
>
> As suggested above, block ID in the xlog structs has been replaced by
> chunk ID. Chunk ID is used to distinguish between different types of xlog
> record fragments.
> Like,
> XLR_CHUNK_ID_DATA_SHORT
> XLR_CHUNK_ID_DATA_LONG
> XLR_CHUNK_BKP_COMPRESSED
> XLR_CHUNK_BKP_WITH_HOLE
>
> In block references, the block ID follows the chunk ID. Here the block ID
> retains its functionality.
> This approach increases data by 1 byte for each block reference in an xlog
> record. This approach separates the ID referring to different fragments of
> an xlog record from the actual block ID which is used to refer to block
> references in the xlog record.

I've not read this logic yet, but ISTM there is a bug in that new WAL format
because I got the following error and the startup process could not replay
any WAL records when I set up replication and enabled wal_compression.

LOG: record with invalid length at 0/3B0
LOG: record with invalid length at 0/3000518
LOG: Invalid block length in record 0/30005A0
LOG: Invalid block length in record 0/3000D60
...

Regards,
--
Fujii Masao
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

Attached is a patch which has the following changes,

As suggested above, block ID in the xlog structs has been replaced by chunk
ID. Chunk ID is used to distinguish between different types of xlog record
fragments.
Like,
XLR_CHUNK_ID_DATA_SHORT
XLR_CHUNK_ID_DATA_LONG
XLR_CHUNK_BKP_COMPRESSED
XLR_CHUNK_BKP_WITH_HOLE

In block references, the block ID follows the chunk ID. Here the block ID
retains its functionality.
This approach increases data by 1 byte for each block reference in an xlog
record. This approach separates the ID referring to different fragments of an
xlog record from the actual block ID which is used to refer to block
references in the xlog record.

Following are WAL numbers for each scenario,

                      WAL
FPW compression on    121.652 MB
FPW compression off   148.998 MB
HEAD                  148.764 MB

Compression remains nearly the same as before. There is some difference in
WAL between HEAD and HEAD+patch+compression OFF. This difference corresponds
to the 1-byte increase with each block reference of an xlog record.

Thank you,
Rahila Syed

On Wed, Feb 18, 2015 at 7:53 PM, Syed, Rahila wrote:
> Hello,
>
>> I think we should change the xlog format so that the block_id (which
>> currently is XLR_BLOCK_ID_DATA_SHORT/LONG or an actual block id) isn't the
>> block id but something like XLR_CHUNK_ID. Which is used as is for
>> XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to
>> XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED,
>> XLR_CHUNK_BKP_REFERENCE... The BKP blocks will then follow, storing the
>> block id following the chunk id.
>
>> Yes, that'll increase the amount of data for a backup block by 1 byte,
>> but I think that's worth it. I'm pretty sure we will be happy about the
>> added extensibility pretty soon.
>
> To clarify my understanding of the above change,
>
> Instead of a block id to reference different fragments of an xlog record,
> a single byte field "chunk_id" should be used. chunk_id will be the same as
> XLR_BLOCK_ID_DATA_SHORT/LONG for main data fragments.
> But for block references, it will store the following values in order to
> store information about the backup blocks.
> #define XLR_CHUNK_BKP_COMPRESSED 0x01
> #define XLR_CHUNK_BKP_WITH_HOLE 0x02
> ...
>
> The new xlog format should look like follows,
>
> Fixed-size header (XLogRecord struct)
> Chunk_id (add a field before the id field in the XLogRecordBlockHeader struct)
> XLogRecordBlockHeader
> Chunk_id
> XLogRecordBlockHeader
> ...
> ...
> Chunk_id (rename the id field of the XLogRecordDataHeader struct)
> XLogRecordDataHeader[Short|Long]
> block data
> block data
> ...
> main data
>
> I will post a patch based on this.
>
> Thank you,
> Rahila Syed
>
> -----Original Message-----
> From: Andres Freund [mailto:and...@2ndquadrant.com]
> Sent: Monday, February 16, 2015 5:26 PM
> To: Syed, Rahila
> Cc: Michael Paquier; Fujii Masao; PostgreSQL mailing lists
> Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
>
> On 2015-02-16 11:30:20 +, Syed, Rahila wrote:
>> - * As a trivial form of data compression, the XLOG code is aware that
>> - * PG data pages usually contain an unused "hole" in the middle, which
>> - * contains only zero bytes. If hole_length > 0 then we have removed
>> - * such a "hole" from the stored data (and it's not counted in the
>> - * XLOG record's CRC, either). Hence, the amount of block data actually
>> - * present is BLCKSZ - hole_length bytes.
>> + * Block images are able to do several types of compression:
>> + * - When wal_compression is off, as a trivial form of compression, the
>> + * XLOG code is aware that PG data pages usually contain an unused "hole"
>> + * in the middle, which contains only zero bytes. If length < BLCKSZ
>> + * then we have removed such a "hole" from the stored data (and it is
>> + * not counted in the XLOG record's CRC, either). Hence, the amount
>> + * of block data actually present is "length" bytes. The hole "offset"
>> + * on page is defined using "hole_offset".
>> + * - When wal_compression is on, block images are compressed using a
>> + * compression algorithm without their hole to improve compression
>> + * process of the page. "length" corresponds in this case to the length
>> + * of the compressed block. "hole_offset" is the hole offset of the page,
>> + * and the length of the uncompressed block is defined by "raw_length",
>> + * whose data is included in the record only when compression is enabled
>> + * and "with_hole" is set to true, see below.
>> + *
>> + * "is_compressed" is used to identify if a given block image is
>> + * compressed or not.
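To make the decoding side of this chunk ID format concrete, here is a hedged
sketch of the dispatch it implies. The constants and their values are taken
from the patch as quoted in the review upthread; the function itself is
illustrative and also shows the ambiguity that review points at (there is no
explicit backup-block marker):

#include <stdint.h>

/* Chunk IDs as listed in this patch version; values from the review. */
#define XLR_CHUNK_ID_DATA_SHORT		255
#define XLR_CHUNK_ID_DATA_LONG		254
#define XLR_CHUNK_BKP_COMPRESSED	0x01
#define XLR_CHUNK_BKP_WITH_HOLE		0x02

/*
 * Sketch of the reader-side dispatch: the first byte of each record
 * fragment is a chunk ID; anything that is not a main-data marker is
 * treated as a block reference whose low bits describe the image,
 * and the block ID proper follows the chunk ID.
 */
static const char *
classify_chunk(uint8_t chunk_id)
{
	if (chunk_id == XLR_CHUNK_ID_DATA_SHORT)
		return "main data (short header)";
	if (chunk_id == XLR_CHUNK_ID_DATA_LONG)
		return "main data (long header)";
	if (chunk_id & XLR_CHUNK_BKP_COMPRESSED)
		return "block reference: compressed image";
	if (chunk_id & XLR_CHUNK_BKP_WITH_HOLE)
		return "block reference: image with hole";
	return "block reference: plain";
}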
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Feb 16, 2015 at 8:55 PM, Andres Freund wrote: > On 2015-02-16 11:30:20 +, Syed, Rahila wrote: >> - * As a trivial form of data compression, the XLOG code is aware that >> - * PG data pages usually contain an unused "hole" in the middle, which >> - * contains only zero bytes. If hole_length > 0 then we have removed >> - * such a "hole" from the stored data (and it's not counted in the >> - * XLOG record's CRC, either). Hence, the amount of block data actually >> - * present is BLCKSZ - hole_length bytes. >> + * Block images are able to do several types of compression: >> + * - When wal_compression is off, as a trivial form of compression, the >> + * XLOG code is aware that PG data pages usually contain an unused "hole" >> + * in the middle, which contains only zero bytes. If length < BLCKSZ >> + * then we have removed such a "hole" from the stored data (and it is >> + * not counted in the XLOG record's CRC, either). Hence, the amount >> + * of block data actually present is "length" bytes. The hole "offset" >> + * on page is defined using "hole_offset". >> + * - When wal_compression is on, block images are compressed using a >> + * compression algorithm without their hole to improve compression >> + * process of the page. "length" corresponds in this case to the length >> + * of the compressed block. "hole_offset" is the hole offset of the page, >> + * and the length of the uncompressed block is defined by "raw_length", >> + * whose data is included in the record only when compression is enabled >> + * and "with_hole" is set to true, see below. >> + * >> + * "is_compressed" is used to identify if a given block image is compressed >> + * or not. Maximum page size allowed on the system being 32k, the hole >> + * offset cannot be more than 15-bit long so the last free bit is used to >> + * store the compression state of block image. If the maximum page size >> + * allowed is increased to a value higher than that, we should consider >> + * increasing this structure size as well, but this would increase the >> + * length of block header in WAL records with alignment. >> + * >> + * "with_hole" is used to identify the presence of a hole in a block image. >> + * As the length of a block cannot be more than 15-bit long, the extra bit >> in >> + * the length field is used for this identification purpose. If the block >> image >> + * has no hole, it is ensured that the raw size of a compressed block image >> is >> + * equal to BLCKSZ, hence the contents of >> XLogRecordBlockImageCompressionInfo >> + * are not necessary. >> */ >> typedef struct XLogRecordBlockImageHeader >> { >> - uint16 hole_offset;/* number of bytes before "hole" */ >> - uint16 hole_length;/* number of bytes in "hole" */ >> + uint16 length:15, /* length of block data in >> record */ >> + with_hole:1;/* status of hole in the block >> */ >> + >> + uint16 hole_offset:15, /* number of bytes before "hole" */ >> + is_compressed:1;/* compression status of image */ >> + >> + /* Followed by the data related to compression if block is compressed >> */ >> } XLogRecordBlockImageHeader; > > Yikes, this is ugly. > > I think we should change the xlog format so that the block_id (which > currently is XLR_BLOCK_ID_DATA_SHORT/LONG or a actual block id) isn't > the block id but something like XLR_CHUNK_ID. Which is used as is for > XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to to > XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED, > XLR_CHUNK_BKP_REFERENCE... 
> The BKP blocks will then follow, storing the block id following the chunk id. Yes, that'll increase the amount of data for a backup block by 1 byte, but I think that's worth it. I'm pretty sure we will be happy about the added extensibility pretty soon.

Yeah, that would help for readability and does not cost much compared to BLCKSZ. Still, could you explain what kind of extensibility you have in mind besides code readability? It is hard to make a nice picture with only paper and pencils, and the current patch approach has been taken to minimize the record length, particularly for users who do not care about WAL compression.
-- Michael
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

>I think we should change the xlog format so that the block_id (which currently is XLR_BLOCK_ID_DATA_SHORT/LONG or an actual block id) isn't the block id but something like XLR_CHUNK_ID. Which is used as is for XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED, XLR_CHUNK_BKP_REFERENCE... The BKP blocks will then follow, storing the block id following the chunk id. Yes, that'll increase the amount of data for a backup block by 1 byte, but I think that's worth it. I'm pretty sure we will be happy about the added extensibility pretty soon.

To clarify my understanding of the above change: instead of a block id to reference different fragments of an xlog record, a single-byte field "chunk_id" should be used. chunk_id will be the same as XLR_BLOCK_ID_DATA_SHORT/LONG for main data fragments. But for block references, it will store the following values in order to carry information about the backup blocks.

#define XLR_CHUNK_BKP_COMPRESSED 0x01
#define XLR_CHUNK_BKP_WITH_HOLE 0x02
...

The new xlog format will look as follows:

Fixed-size header (XLogRecord struct)
Chunk_id (add a field before the id field in the XLogRecordBlockHeader struct)
XLogRecordBlockHeader
Chunk_id
XLogRecordBlockHeader
...
...
Chunk_id (rename the id field of the XLogRecordDataHeader struct)
XLogRecordDataHeader[Short|Long]
block data
block data
...
main data

I will post a patch based on this.

Thank you,
Rahila Syed

-Original Message-
From: Andres Freund [mailto:and...@2ndquadrant.com]
Sent: Monday, February 16, 2015 5:26 PM
To: Syed, Rahila
Cc: Michael Paquier; Fujii Masao; PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On 2015-02-16 11:30:20 +0000, Syed, Rahila wrote:
> - * As a trivial form of data compression, the XLOG code is aware that
> - * PG data pages usually contain an unused "hole" in the middle, which
> - * contains only zero bytes. If hole_length > 0 then we have removed
> - * such a "hole" from the stored data (and it's not counted in the
> - * XLOG record's CRC, either). Hence, the amount of block data actually
> - * present is BLCKSZ - hole_length bytes.
> + * Block images are able to do several types of compression:
> + * - When wal_compression is off, as a trivial form of compression, the
> + * XLOG code is aware that PG data pages usually contain an unused "hole"
> + * in the middle, which contains only zero bytes. If length < BLCKSZ
> + * then we have removed such a "hole" from the stored data (and it is
> + * not counted in the XLOG record's CRC, either). Hence, the amount
> + * of block data actually present is "length" bytes. The hole "offset"
> + * on page is defined using "hole_offset".
> + * - When wal_compression is on, block images are compressed using a
> + * compression algorithm without their hole to improve compression
> + * process of the page. "length" corresponds in this case to the length
> + * of the compressed block. "hole_offset" is the hole offset of the page,
> + * and the length of the uncompressed block is defined by "raw_length",
> + * whose data is included in the record only when compression is enabled
> + * and "with_hole" is set to true, see below.
> + *
> + * "is_compressed" is used to identify if a given block image is compressed
> + * or not. Maximum page size allowed on the system being 32k, the hole
> + * offset cannot be more than 15-bit long so the last free bit is used to
> + * store the compression state of block image. If the maximum page size
> + * allowed is increased to a value higher than that, we should consider
> + * increasing this structure size as well, but this would increase the
> + * length of block header in WAL records with alignment.
> + *
> + * "with_hole" is used to identify the presence of a hole in a block image.
> + * As the length of a block cannot be more than 15-bit long, the extra bit in
> + * the length field is used for this identification purpose. If the block image
> + * has no hole, it is ensured that the raw size of a compressed block image is
> + * equal to BLCKSZ, hence the contents of XLogRecordBlockImageCompressionInfo
> + * are not necessary.
> */
> typedef struct XLogRecordBlockImageHeader
> {
> - uint16 hole_offset;/* number of bytes before "hole" */
> - uint16 hole_length;/* number of bytes in "hole" */
> + uint16 length:15, /* length of block data in record */
> + with_hole:1;/* status of hole in the block */
> +
> + uint16 hole_offset:15, /* number of bytes before "hole" */
> + is_compressed:1;/* compression status of image */
> +
> + /* Followed by the data related to compression if block is compressed */
> } XLogRecordBlockImageHeader;
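For illustration only, here is a minimal sketch of how a decoder could dispatch on the proposed chunk_id. The XLR_CHUNK_BKP_* values are the ones sketched above; the XLR_CHUNK_ID_DATA_* values and the decode_chunk() helper are assumptions for the sake of the example, not a committed WAL format.

    /* The two flag values come from the proposal above; the rest is assumed. */
    #define XLR_CHUNK_BKP_COMPRESSED  0x01
    #define XLR_CHUNK_BKP_WITH_HOLE   0x02
    #define XLR_CHUNK_ID_DATA_SHORT   0xA0   /* assumed value */
    #define XLR_CHUNK_ID_DATA_LONG    0xA1   /* assumed value */

    static void
    decode_chunk(const unsigned char *ptr)
    {
        unsigned char chunk_id = *ptr++;

        if (chunk_id == XLR_CHUNK_ID_DATA_SHORT ||
            chunk_id == XLR_CHUNK_ID_DATA_LONG)
        {
            /* main data fragment: XLogRecordDataHeader[Short|Long] follows */
        }
        else
        {
            /* backup block: the block id follows the chunk id */
            unsigned char block_id = *ptr++;
            int has_hole = (chunk_id & XLR_CHUNK_BKP_WITH_HOLE) != 0;
            int compressed = (chunk_id & XLR_CHUNK_BKP_COMPRESSED) != 0;

            (void) block_id;
            (void) has_hole;
            (void) compressed;
            /* ... then decode the XLogRecordBlockHeader for this block ... */
        }
    }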
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 2015-02-16 20:55:20 +0900, Michael Paquier wrote:
> On Mon, Feb 16, 2015 at 8:30 PM, Syed, Rahila wrote:
> > > Regarding the sanity checks that have been added recently. I think that they are useful but I am suspecting as well that only a check on the record CRC is done because that's reliable enough and not doing those checks accelerates replay a bit. So I am thinking that we should simply replace them by assertions.
> >
> > Removing the checks makes sense as CRC ensures correctness. Moreover, as an error message for invalid record length is present in the code, messages for invalid block length can be redundant.
> >
> > Checks have been replaced by assertions in the attached patch.
>
> After more thinking, we may as well simply remove them, an error with the CRC having high chances to complain before reaching this point...

Surely not. The existing code explicitly does it like

if (blk->has_data && blk->data_len == 0)
    report_invalid_record(state,
              "BKPBLOCK_HAS_DATA set, but no data included at %X/%X",
              (uint32) (state->ReadRecPtr >> 32), (uint32) state->ReadRecPtr);

these cross checks are important. And I see no reason to deviate from that. The CRC sum isn't foolproof - we intentionally do checks at several layers. And, as you can see from some other locations, we actually try to *not* fatally error out when hitting them at times - so an Assert also is wrong.

Heikki:

/* cross-check that the HAS_DATA flag is set iff data_length > 0 */
if (blk->has_data && blk->data_len == 0)
    report_invalid_record(state,
              "BKPBLOCK_HAS_DATA set, but no data included at %X/%X",
              (uint32) (state->ReadRecPtr >> 32), (uint32) state->ReadRecPtr);
if (!blk->has_data && blk->data_len != 0)
    report_invalid_record(state,
              "BKPBLOCK_HAS_DATA not set, but data length is %u at %X/%X",
              (unsigned int) blk->data_len,
              (uint32) (state->ReadRecPtr >> 32), (uint32) state->ReadRecPtr);

those look like they're missing a goto err; to me.

Greetings,
Andres Freund
-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
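Spelled out, the pattern Andres is pointing at keeps the cross-checks as reported errors rather than assertions and leaves through the error path, along these lines (a sketch against xlogreader.c; the err label is assumed from its existing structure):

    /* cross-check that the HAS_DATA flag is set iff data_length > 0 */
    if (blk->has_data && blk->data_len == 0)
    {
        report_invalid_record(state,
                   "BKPBLOCK_HAS_DATA set, but no data included at %X/%X",
                   (uint32) (state->ReadRecPtr >> 32),
                   (uint32) state->ReadRecPtr);
        goto err;               /* do not keep decoding a corrupt record */
    }
    if (!blk->has_data && blk->data_len != 0)
    {
        report_invalid_record(state,
                   "BKPBLOCK_HAS_DATA not set, but data length is %u at %X/%X",
                   (unsigned int) blk->data_len,
                   (uint32) (state->ReadRecPtr >> 32),
                   (uint32) state->ReadRecPtr);
        goto err;
    }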
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 2015-02-16 11:30:20 +0000, Syed, Rahila wrote:
> - * As a trivial form of data compression, the XLOG code is aware that
> - * PG data pages usually contain an unused "hole" in the middle, which
> - * contains only zero bytes. If hole_length > 0 then we have removed
> - * such a "hole" from the stored data (and it's not counted in the
> - * XLOG record's CRC, either). Hence, the amount of block data actually
> - * present is BLCKSZ - hole_length bytes.
> + * Block images are able to do several types of compression:
> + * - When wal_compression is off, as a trivial form of compression, the
> + * XLOG code is aware that PG data pages usually contain an unused "hole"
> + * in the middle, which contains only zero bytes. If length < BLCKSZ
> + * then we have removed such a "hole" from the stored data (and it is
> + * not counted in the XLOG record's CRC, either). Hence, the amount
> + * of block data actually present is "length" bytes. The hole "offset"
> + * on page is defined using "hole_offset".
> + * - When wal_compression is on, block images are compressed using a
> + * compression algorithm without their hole to improve compression
> + * process of the page. "length" corresponds in this case to the length
> + * of the compressed block. "hole_offset" is the hole offset of the page,
> + * and the length of the uncompressed block is defined by "raw_length",
> + * whose data is included in the record only when compression is enabled
> + * and "with_hole" is set to true, see below.
> + *
> + * "is_compressed" is used to identify if a given block image is compressed
> + * or not. Maximum page size allowed on the system being 32k, the hole
> + * offset cannot be more than 15-bit long so the last free bit is used to
> + * store the compression state of block image. If the maximum page size
> + * allowed is increased to a value higher than that, we should consider
> + * increasing this structure size as well, but this would increase the
> + * length of block header in WAL records with alignment.
> + *
> + * "with_hole" is used to identify the presence of a hole in a block image.
> + * As the length of a block cannot be more than 15-bit long, the extra bit in
> + * the length field is used for this identification purpose. If the block image
> + * has no hole, it is ensured that the raw size of a compressed block image is
> + * equal to BLCKSZ, hence the contents of XLogRecordBlockImageCompressionInfo
> + * are not necessary.
> */
> typedef struct XLogRecordBlockImageHeader
> {
> - uint16 hole_offset;/* number of bytes before "hole" */
> - uint16 hole_length;/* number of bytes in "hole" */
> + uint16 length:15, /* length of block data in record */
> + with_hole:1;/* status of hole in the block */
> +
> + uint16 hole_offset:15, /* number of bytes before "hole" */
> + is_compressed:1;/* compression status of image */
> +
> + /* Followed by the data related to compression if block is compressed */
> } XLogRecordBlockImageHeader;

Yikes, this is ugly.

I think we should change the xlog format so that the block_id (which currently is XLR_BLOCK_ID_DATA_SHORT/LONG or an actual block id) isn't the block id but something like XLR_CHUNK_ID. Which is used as is for XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED, XLR_CHUNK_BKP_REFERENCE... The BKP blocks will then follow, storing the block id following the chunk id. Yes, that'll increase the amount of data for a backup block by 1 byte, but I think that's worth it.
I'm pretty sure we will be happy about the added extensibility pretty soon. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Feb 16, 2015 at 8:30 PM, Syed, Rahila wrote:
> > Regarding the sanity checks that have been added recently. I think that they are useful but I am suspecting as well that only a check on the record CRC is done because that's reliable enough and not doing those checks accelerates replay a bit. So I am thinking that we should simply replace them by assertions.
>
> Removing the checks makes sense as CRC ensures correctness. Moreover, as an error message for invalid record length is present in the code, messages for invalid block length can be redundant.
>
> Checks have been replaced by assertions in the attached patch.

After more thinking, we may as well simply remove them, an error with the CRC having high chances to complain before reaching this point...

> Current
> if ((hole_length != 0) &&
> + (*len >= orig_len - SizeOfXLogRecordBlockImageCompressionInfo))
> + return false;
> + return true

This makes sense.

Nitpicking 1:
+Assert(!(blk->with_hole == 1 && blk->hole_offset <= 0));
Double-space here.

Nitpicking 2:
char * page
This should be rewritten as char *page, the "*" being attached to the variable name.
-- Michael
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

Thank you for reviewing and testing the patch.

>+ /* leave if data cannot be compressed */
>+ if (compressed_len == 0)
>+ return false;
>This should be < 0, pglz_compress returns -1 when compression fails.
>
>+ if (pglz_decompress(block_image, bkpb->bkp_len, record->uncompressBuf,
>+ bkpb->bkp_uncompress_len) == 0)
>Similarly, this should be < 0.

These have been corrected in the attached.

>Regarding the sanity checks that have been added recently. I think that they are useful but I am suspecting as well that only a check on the record CRC is done because that's reliable enough and not doing those checks accelerates replay a bit. So I am thinking that we should simply replace them by assertions.

Removing the checks makes sense as CRC ensures correctness. Moreover, as an error message for invalid record length is present in the code, messages for invalid block length can be redundant.

Checks have been replaced by assertions in the attached patch.

The following if condition in XLogCompressBackupBlock has been modified as follows:

Previous
+ /*
+ * We recheck the actual size even if pglz_compress() reports success and
+ * see if at least 2 bytes of length have been saved, as this corresponds
+ * to the additional amount of data stored in WAL record for a compressed
+ * block via raw_length when block contains hole.
+ */
+ *len = (uint16) compressed_len;
+ if (*len >= orig_len - SizeOfXLogRecordBlockImageCompressionInfo)
+ return false;
+ return true;

Current
if ((hole_length != 0) &&
+ (*len >= orig_len - SizeOfXLogRecordBlockImageCompressionInfo))
+ return false;
+ return true

This is because the extra information raw_length is included only if the compressed block has a hole in it.

>Once we get those small issues fixed, I think that it is worth having a committer look at this patch, presumably Fujii-san

Agreed. I will mark this patch as ready for committer.

Thank you,
Rahila Syed

__ Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding.

Support-compression-full-page-writes-in-WAL_v25.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
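In other words (a sketch using the patch's variable names, with the surrounding function omitted): the two raw_length bytes are only paid for when the page has a hole, so the savings threshold applies only in that case; without a hole the uncompressed length is implicitly BLCKSZ and no extra bytes are stored.

    *len = (uint16) compressed_len;
    if ((hole_length != 0) &&
        (*len >= orig_len - SizeOfXLogRecordBlockImageCompressionInfo))
        return false;       /* savings would be eaten by storing raw_length */
    return true;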
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Feb 12, 2015 at 8:08 PM, Syed, Rahila wrote:
> Thank you for the comments. Please find attached the updated patch.
>
> >This patch fails to compile:
> >xlogreader.c:1049:46: error: extraneous ')' after condition, expected a statement
> >blk->with_hole && blk->hole_offset <= 0))
> This has been rectified.
>
> >Note as well that at least clang does not like much how the sanity checks with with_hole are done. You should place parentheses around the '&&' expressions. Also, I would rather define with_hole == 0 or with_hole == 1 explicitly in those checks
> The expressions are modified accordingly.
>
> >There is a typo:
> >s/true,see/true, see/
> >[nitpicky]Be as well aware of the 80-character limit per line that is usually respected by comment blocks.[/]
> Have corrected the typos and changed the comments as mentioned. Also, realigned certain lines to meet the 80-char limit.

Thanks for the updated patch.

+ /* leave if data cannot be compressed */
+ if (compressed_len == 0)
+ return false;
This should be < 0, pglz_compress returns -1 when compression fails.

+ if (pglz_decompress(block_image, bkpb->bkp_len, record->uncompressBuf,
+ bkpb->bkp_uncompress_len) == 0)
Similarly, this should be < 0.

Regarding the sanity checks that have been added recently. I think that they are useful but I am suspecting as well that only a check on the record CRC is done because that's reliable enough and not doing those checks accelerates replay a bit. So I am thinking that we should simply replace them by assertions.

I have as well re-run my small test case, with the following results (scripts and results attached):

=# select test, user_diff, system_diff, pg_size_pretty(pre_update - pre_insert),
   pg_size_pretty(post_update - pre_update) from results;
  test   | user_diff | system_diff | pg_size_pretty | pg_size_pretty
---------+-----------+-------------+----------------+----------------
 FPW on  | 46.134564 |    0.823306 | 429 MB         | 566 MB
 FPW on  | 16.307575 |    0.798591 | 171 MB         | 229 MB
 FPW on  |  8.325136 |    0.848390 | 86 MB          | 116 MB
 FPW off | 29.992383 |    1.100458 | 440 MB         | 746 MB
 FPW off | 12.237578 |    1.027076 | 171 MB         | 293 MB
 FPW off |  6.814926 |    0.931624 | 86 MB          | 148 MB
 HEAD    | 26.590816 |    1.159255 | 440 MB         | 746 MB
 HEAD    | 11.620359 |    0.990851 | 171 MB         | 293 MB
 HEAD    |  6.300401 |    0.904311 | 86 MB          | 148 MB
(9 rows)

The level of compression reached is the same as the previous mark, 566MB for the case of fillfactor=50 (cab7npqsc97o-ue5paxfmukwcxe_jioyxo1m4a0pmnmyqane...@mail.gmail.com), with similar CPU usage. Once we get those small issues fixed, I think that it is worth having a committer look at this patch, presumably Fujii-san.

Regards,
-- Michael

compress_run.bash Description: Binary data results.sql Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
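Condensed, the corrected error handling looks like this (a sketch assuming the revised pglz API that returns -1 on failure; source, dest and orig_len are illustrative names):

    /* compression: a negative result means the data could not be compressed */
    compressed_len = pglz_compress(source, orig_len, dest,
                                   PGLZ_strategy_default);
    if (compressed_len < 0)
        return false;               /* fall back to the uncompressed image */

    /* decompression at replay: a negative result means a corrupt image */
    if (pglz_decompress(block_image, bkpb->bkp_len,
                        record->uncompressBuf,
                        bkpb->bkp_uncompress_len) < 0)
        return false;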
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Thank you for the comments. Please find attached the updated patch.

>This patch fails to compile:
>xlogreader.c:1049:46: error: extraneous ')' after condition, expected a statement
>blk->with_hole && blk->hole_offset <= 0))
This has been rectified.

>Note as well that at least clang does not like much how the sanity checks with with_hole are done. You should place parentheses around the '&&' expressions. Also, I would rather define with_hole == 0 or with_hole == 1 explicitly in those checks
The expressions are modified accordingly.

>There is a typo:
>s/true,see/true, see/
>[nitpicky]Be as well aware of the 80-character limit per line that is usually respected by comment blocks.[/]
Have corrected the typos and changed the comments as mentioned. Also, realigned certain lines to meet the 80-char limit.

Thank you,
Rahila Syed

__ Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding.

Support-compression-for-full-page-writes-in-WAL_v18.patch Description: Support-compression-for-full-page-writes-in-WAL_v18.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Feb 11, 2015 at 11:03 PM, Syed, Rahila wrote:
> >IMO, we should add details about how this new field is used in the comments on top of XLogRecordBlockImageHeader, meaning that when a page hole is present we use the compression info structure and when there is no hole, we are sure that the FPW raw length is BLCKSZ, meaning that the two bytes of the CompressionInfo stuff is unnecessary.
> This comment is included in the patch attached.
>
> > For correctness, with_hole should be set even for uncompressed pages. I think that we should as well use it for sanity checks in xlogreader.c when decoding records.
> This change is made in the attached patch. The following sanity checks have been added in xlogreader.c
>
> if (!(blk->with_hole) && blk->hole_offset != 0 || blk->with_hole && blk->hole_offset <= 0))
>
> if (blk->with_hole && blk->bkp_len >= BLCKSZ)
>
> if (!(blk->with_hole) && blk->bkp_len != BLCKSZ)

Cool, thanks!

This patch fails to compile:
xlogreader.c:1049:46: error: extraneous ')' after condition, expected a statement
blk->with_hole && blk->hole_offset <= 0))

Note as well that at least clang does not like much how the sanity checks with with_hole are done. You should place parentheses around the '&&' expressions. Also, I would rather define with_hole == 0 or with_hole == 1 explicitly in those checks.

There is a typo:
s/true,see/true, see/

[nitpicky]Be as well aware of the 80-character limit per line that is usually respected by comment blocks.[/]

+ * "with_hole" is used to identify the presence of hole in a block.
+ * As mentioned above, length of block cannnot be more than 15-bit long.
+ * So, the free bit in the length field is used by "with_hole" to identify presence of
+ * XLogRecordBlockImageCompressionInfo. If hole is not present ,the raw size of
+ * a compressed block is equal to BLCKSZ therefore XLogRecordBlockImageCompressionInfo
+ * for the corresponding compressed block need not be stored in header.
+ * If hole is present raw size is stored.

I would rewrite this paragraph as follows, fixing the multiple typos:
"with_hole" is used to identify the presence of a hole in a block image. As the length of a block cannot be more than 15-bit long, the extra bit in the length field is used for this identification purpose. If the block image has no hole, it is ensured that the raw size of a compressed block image is equal to BLCKSZ, hence the contents of XLogRecordBlockImageCompressionInfo are not necessary.

+ /* Followed by the data related to compression if block is compressed */
This comment needs to be updated to "if block image is compressed and has a hole".

+ lp_off and lp_len fields in ItemIdData (see include/storage/itemid.h) and
+ XLogRecordBlockImageHeader where page hole offset and length is limited to 15-bit
+ length (see src/include/access/xlogrecord.h).
80-character limit...

Regards
-- Michael
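Applying both remarks, the three checks quoted above would read roughly as follows (a sketch with explicit comparisons and balanced parentheses; goto err stands in for whatever error handling the patch settles on):

    if ((blk->with_hole == 0 && blk->hole_offset != 0) ||
        (blk->with_hole == 1 && blk->hole_offset <= 0))
        goto err;               /* hole flag and hole offset disagree */

    if (blk->with_hole == 1 && blk->bkp_len >= BLCKSZ)
        goto err;               /* an image with a hole must be shorter */

    if (blk->with_hole == 0 && blk->bkp_len != BLCKSZ)
        goto err;               /* without a hole the image is a full page */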
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
>IMO, we should add details about how this new field is used in the comments on top of XLogRecordBlockImageHeader, meaning that when a page hole is present we use the compression info structure and when there is no hole, we are sure that the FPW raw length is BLCKSZ, meaning that the two bytes of the CompressionInfo stuff is unnecessary.

This comment is included in the patch attached.

> For correctness, with_hole should be set even for uncompressed pages. I think that we should as well use it for sanity checks in xlogreader.c when decoding records.

This change is made in the attached patch. The following sanity checks have been added in xlogreader.c

if (!(blk->with_hole) && blk->hole_offset != 0 || blk->with_hole && blk->hole_offset <= 0))

if (blk->with_hole && blk->bkp_len >= BLCKSZ)

if (!(blk->with_hole) && blk->bkp_len != BLCKSZ)

Thank you,
Rahila Syed

__ Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding.

Support-compression-for-full-page-writes-in-WAL_v18.patch Description: Support-compression-for-full-page-writes-in-WAL_v18.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

A bug had been introduced in the latest versions of the patch. The order of parameters passed to pglz_decompress was wrong. Please find attached a patch with the following correction:

Original code,
+ if (pglz_decompress(block_image, record->uncompressBuf,
+ bkpb->bkp_len, bkpb->bkp_uncompress_len) == 0)

Correction
+ if (pglz_decompress(block_image, bkpb->bkp_len,
+ record->uncompressBuf, bkpb->bkp_uncompress_len) == 0)

>For example here you should not have a space between the function definition and the variable declarations:
>+{
>+
>+int orig_len = BLCKSZ - hole_length;
>This is as well incorrect in two places:
>if(hole_length != 0)
>There should be a space between the if and its condition in parentheses.

Also corrected the above code formatting mistakes.

Thank you,
Rahila Syed

-Original Message-
From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Syed, Rahila
Sent: Monday, February 09, 2015 6:58 PM
To: Michael Paquier; Fujii Masao
Cc: PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

Hello,

>> Do we always need extra two bytes for compressed backup block?
>> ISTM that extra bytes are not necessary when the hole length is zero.
>> In this case the length of the original backup block (i.e., uncompressed) must be BLCKSZ, so we don't need to save the original size in the extra bytes.

>Yes, we would need an additional bit to identify that. We could steal it from length in XLogRecordBlockImageHeader.

This is implemented in the attached patch by dividing the length field as follows,
uint16 length:15,
with_hole:1;

>"2" should be replaced with the macro variable indicating the size of extra header for compressed backup block.
Macro SizeOfXLogRecordBlockImageCompressionInfo is used instead of 2

>You can refactor XLogCompressBackupBlock() and move all the above code to it for more simplicity
This is also implemented in the patch attached.

Thank you,
Rahila Syed

-Original Message-
From: Michael Paquier [mailto:michael.paqu...@gmail.com]
Sent: Friday, February 06, 2015 6:00 PM
To: Fujii Masao
Cc: Syed, Rahila; PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On Fri, Feb 6, 2015 at 3:03 PM, Fujii Masao wrote:
> Do we always need extra two bytes for compressed backup block?
> ISTM that extra bytes are not necessary when the hole length is zero.
> In this case the length of the original backup block (i.e., uncompressed) must be BLCKSZ, so we don't need to save the original size in the extra bytes.

Yes, we would need an additional bit to identify that. We could steal it from length in XLogRecordBlockImageHeader.

> Furthermore, when fpw compression is disabled and the hole length is zero, we seem to be able to save one byte from the header of backup block. Currently we use 4 bytes for the header, 2 bytes for the length of backup block, 15 bits for the hole offset and 1 bit for the flag indicating whether block is compressed or not. But in that case, the length of backup block doesn't need to be stored because it must be BLCKSZ. Shouldn't we optimize the header in this way? Thought?

If we do it, that's something to tackle even before this patch on HEAD, because you could use the 16th bit of the first 2 bytes of XLogRecordBlockImageHeader to do necessary sanity checks, to actually not reduce record by 1 byte, but 2 bytes as hole-related data is not necessary. I imagine that a patch optimizing that wouldn't be that hard to write as well.
> +int page_len = BLCKSZ - hole_length;
> +char *scratch_buf;
> +if (hole_length != 0)
> +{
> +scratch_buf = compression_scratch;
> +memcpy(scratch_buf, page, hole_offset);
> +memcpy(scratch_buf + hole_offset,
> + page + (hole_offset + hole_length),
> + BLCKSZ - (hole_length + hole_offset));
> +}
> +else
> +scratch_buf = page;
> +
> +/* Perform compression of block */
> +if (XLogCompressBackupBlock(scratch_buf,
> +page_len,
> +regbuf->compressed_page,
> +&compress_len))
> +{
> +/* compression is done, add record */
> +is_compressed = true;
> +}
>
> You can refactor XLogCompressBackupBlock() and move all the above code to it for more simplicity.

Sure.
-- Michael
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Feb 9, 2015 at 10:27 PM, Syed, Rahila wrote:
> (snip)

Thanks for showing up here! I have not tested the patch; those comments are based on what I read from v17.

>>> Do we always need extra two bytes for compressed backup block?
>>> ISTM that extra bytes are not necessary when the hole length is zero.
>>> In this case the length of the original backup block (i.e., uncompressed) must be BLCKSZ, so we don't need to save the original size in the extra bytes.
>
>>Yes, we would need an additional bit to identify that. We could steal it from length in XLogRecordBlockImageHeader.
>
> This is implemented in the attached patch by dividing the length field as follows,
> uint16 length:15,
> with_hole:1;

IMO, we should add details about how this new field is used in the comments on top of XLogRecordBlockImageHeader, meaning that when a page hole is present we use the compression info structure and when there is no hole, we are sure that the FPW raw length is BLCKSZ, meaning that the two bytes of the CompressionInfo stuff is unnecessary.

>>"2" should be replaced with the macro variable indicating the size of extra header for compressed backup block.
> Macro SizeOfXLogRecordBlockImageCompressionInfo is used instead of 2
>
>>You can refactor XLogCompressBackupBlock() and move all the above code to it for more simplicity
> This is also implemented in the patch attached.

This portion looks correct to me. A couple of other comments:

1) Nitpicky, but code format is sometimes strange. For example here you should not have a space between the function definition and the variable declarations:
+{
+
+int orig_len = BLCKSZ - hole_length;
This is as well incorrect in two places:
if(hole_length != 0)
There should be a space between the if and its condition in parentheses.

2) For correctness, with_hole should be set even for uncompressed pages. I think that we should as well use it for sanity checks in xlogreader.c when decoding records.

Regards,
-- Michael
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
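The comment Michael asks for in point 1 encodes this reader-side rule, sketched here with the patch's field names (cinfo and ptr are illustrative):

    if (blk->is_compressed)
    {
        if (blk->with_hole)
        {
            /* a hole exists: raw_length is stored explicitly in the
             * XLogRecordBlockImageCompressionInfo following the header */
            memcpy(&cinfo, ptr, SizeOfXLogRecordBlockImageCompressionInfo);
            ptr += SizeOfXLogRecordBlockImageCompressionInfo;
            blk->bkp_uncompress_len = cinfo.raw_length;
        }
        else
        {
            /* no hole: the raw length of the FPW is known to be BLCKSZ,
             * so the two extra bytes are not stored at all */
            blk->bkp_uncompress_len = BLCKSZ;
        }
    }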
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

>> Do we always need extra two bytes for compressed backup block?
>> ISTM that extra bytes are not necessary when the hole length is zero.
>> In this case the length of the original backup block (i.e., uncompressed) must be BLCKSZ, so we don't need to save the original size in the extra bytes.

>Yes, we would need an additional bit to identify that. We could steal it from length in XLogRecordBlockImageHeader.

This is implemented in the attached patch by dividing the length field as follows,
uint16 length:15,
with_hole:1;

>"2" should be replaced with the macro variable indicating the size of extra header for compressed backup block.
Macro SizeOfXLogRecordBlockImageCompressionInfo is used instead of 2

>You can refactor XLogCompressBackupBlock() and move all the above code to it for more simplicity
This is also implemented in the patch attached.

Thank you,
Rahila Syed

-Original Message-
From: Michael Paquier [mailto:michael.paqu...@gmail.com]
Sent: Friday, February 06, 2015 6:00 PM
To: Fujii Masao
Cc: Syed, Rahila; PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On Fri, Feb 6, 2015 at 3:03 PM, Fujii Masao wrote:
> Do we always need extra two bytes for compressed backup block?
> ISTM that extra bytes are not necessary when the hole length is zero.
> In this case the length of the original backup block (i.e., uncompressed) must be BLCKSZ, so we don't need to save the original size in the extra bytes.

Yes, we would need an additional bit to identify that. We could steal it from length in XLogRecordBlockImageHeader.

> Furthermore, when fpw compression is disabled and the hole length is zero, we seem to be able to save one byte from the header of backup block. Currently we use 4 bytes for the header, 2 bytes for the length of backup block, 15 bits for the hole offset and 1 bit for the flag indicating whether block is compressed or not. But in that case, the length of backup block doesn't need to be stored because it must be BLCKSZ. Shouldn't we optimize the header in this way? Thought?

If we do it, that's something to tackle even before this patch on HEAD, because you could use the 16th bit of the first 2 bytes of XLogRecordBlockImageHeader to do necessary sanity checks, to actually not reduce record by 1 byte, but 2 bytes as hole-related data is not necessary. I imagine that a patch optimizing that wouldn't be that hard to write as well.

> +int page_len = BLCKSZ - hole_length;
> +char *scratch_buf;
> +if (hole_length != 0)
> +{
> +scratch_buf = compression_scratch;
> +memcpy(scratch_buf, page, hole_offset);
> +memcpy(scratch_buf + hole_offset,
> + page + (hole_offset + hole_length),
> + BLCKSZ - (hole_length + hole_offset));
> +}
> +else
> +scratch_buf = page;
> +
> +/* Perform compression of block */
> +if (XLogCompressBackupBlock(scratch_buf,
> +page_len,
> +regbuf->compressed_page,
> +&compress_len))
> +{
> +/* compression is done, add record */
> +is_compressed = true;
> +}
>
> You can refactor XLogCompressBackupBlock() and move all the above code to it for more simplicity.

Sure.
-- Michael

__ Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding.
Support-compression-for-full-page-writes-in-WAL_v17.patch Description: Support-compression-for-full-page-writes-in-WAL_v17.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
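For reference, the bit-field split described in this message yields a header along these lines (restated from the patch text quoted earlier in the thread):

    typedef struct XLogRecordBlockImageHeader
    {
        uint16      length:15,      /* length of block data in record */
                    with_hole:1;    /* status of hole in the block */

        uint16      hole_offset:15, /* number of bytes before "hole" */
                    is_compressed:1;    /* compression status of image */

        /* Followed by the data related to compression if block is compressed */
    } XLogRecordBlockImageHeader;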
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Feb 6, 2015 at 3:42 AM, Michael Paquier wrote:
> Fujii Masao wrote:
>> I wrote
>>> This is an inspiration from lz4 APIs. Wouldn't it be buggy for a compression algorithm to return a size of 0 bytes as compressed or decompressed length btw? We could as well make it return a negative value when a failure occurs if you feel more comfortable with it.
>>
>> I feel that's better. Attached is the updated version of the patch. I changed the pg_lzcompress and pg_lzdecompress so that they return -1 when failure happens. Also I applied some cosmetic changes to the patch (e.g., shorten the long name of the newly-added macros). Barring any objection, I will commit this.
>
> I just had a look at your updated version, ran some sanity tests, and things look good to me. The new names of the macros at the top of tuptoaster.c are clearer as well.

Thanks for the review! Pushed!

Regards,
-- Fujii Masao
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Feb 6, 2015 at 6:35 PM, Syed, Rahila wrote:
> The compression patch can use the latest interface MemoryContextAllocExtended to proceed without compression when sufficient memory is not available for the scratch buffer.
> The attached patch introduces an OutOfMem flag which is set when MemoryContextAllocExtended returns NULL.

TBH, I don't think that brings much, as this allocation is done once and the process would surely fail before reaching the first code path doing a WAL record insertion. In any case, OutOfMem is useless; you could simply check whether compression_scratch is NULL when assembling a record.
-- Michael
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
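In code, Michael's suggestion amounts to the following (a sketch; MCXT_ALLOC_NO_OOM makes MemoryContextAllocExtended return NULL on allocation failure instead of erroring out, and the XLogCompressBackupBlock call reuses names from Fujii's snippet below for illustration):

    /* at scratch-buffer setup: do not fail hard if memory is short */
    if (compression_scratch == NULL)
        compression_scratch = MemoryContextAllocExtended(xloginsert_cxt,
                                                         BLCKSZ,
                                                         MCXT_ALLOC_NO_OOM);

    /* later, when assembling a record: simply skip compression */
    if (wal_compression && compression_scratch != NULL)
        is_compressed = XLogCompressBackupBlock(scratch_buf, page_len,
                                                regbuf->compressed_page,
                                                &compress_len);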
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
>In any case, those things have been introduced by what I did in previous versions... And attached is a new patch.

Thank you for the feedback.

> /* allocate scratch buffer used for compression of block images */
>+ if (compression_scratch == NULL)
>+ compression_scratch = MemoryContextAllocZero(xloginsert_cxt,
>+ BLCKSZ);
>}

The compression patch can use the latest interface MemoryContextAllocExtended to proceed without compression when sufficient memory is not available for the scratch buffer. The attached patch introduces an OutOfMem flag which is set when MemoryContextAllocExtended returns NULL.

Thank you,
Rahila Syed

-Original Message-
From: Michael Paquier [mailto:michael.paqu...@gmail.com]
Sent: Friday, February 06, 2015 12:46 AM
To: Syed, Rahila
Cc: PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On Thu, Feb 5, 2015 at 11:06 PM, Syed, Rahila wrote:
>>/*
>>+* We recheck the actual size even if pglz_compress() report success,
>>+* because it might be satisfied with having saved as little as one byte
>>+* in the compressed data.
>>+*/
>>+ *len = (uint16) compressed_len;
>>+ if (*len >= orig_len - 1)
>>+ return false;
>>+ return true;
>>+}
>
> As per latest code, when compression is 'on' we introduce additional 2 bytes in the header of each block image for storing raw_length of the compressed block.
> In order to achieve compression while accounting for these two additional bytes, we must ensure that compressed length is less than original length - 2.
> So, IIUC the above condition should rather be
>
> If (*len >= orig_len -2 )
> return false;
> return true;
> The attached patch contains this. It also has a cosmetic change: renaming compressBuf to uncompressBuf as it is used to store the uncompressed page.

Agreed on both things. Just looking at your latest patch after some time to let it cool down, I noticed a couple of things.

#define MaxSizeOfXLogRecordBlockHeader \
(SizeOfXLogRecordBlockHeader + \
- SizeOfXLogRecordBlockImageHeader + \
+ SizeOfXLogRecordBlockImageHeader, \
+ SizeOfXLogRecordBlockImageCompressionInfo + \
There is a comma here instead of a sum sign. We should really sum up all those sizes to evaluate the maximum size of a block header.

+ * Permanently allocate readBuf uncompressBuf. We do it this way,
+ * rather than just making a static array, for two reasons:
This comment is just a bit weird, "readBuf AND uncompressBuf" is more appropriate.

+ * We recheck the actual size even if pglz_compress() report success,
+ * because it might be satisfied with having saved as little as one byte
+ * in the compressed data. We add two bytes to store raw_length with the
+ * compressed image. So for compression to be effective compressed_len should
+ * be atleast < orig_len - 2
This comment block should be reworked, and misses a dot at its end. I rewrote it like that, hopefully that's clearer:
+ /*
+* We recheck the actual size even if pglz_compress() reports success and see
+* if at least 2 bytes of length have been saved, as this corresponds to the
+* additional amount of data stored in WAL record for a compressed block
+* via raw_length.
+*/

In any case, those things have been introduced by what I did in previous versions... And attached is a new patch.
-- Michael

__ Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data.
If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding. Support-compression-for-full-page-writes-in-WAL_v17.patch Description: Support-compression-for-full-page-writes-in-WAL_v17.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Feb 6, 2015 at 4:30 PM, Michael Paquier wrote:
> On Fri, Feb 6, 2015 at 3:03 PM, Fujii Masao wrote:
>> Do we always need extra two bytes for compressed backup block?
>> ISTM that extra bytes are not necessary when the hole length is zero.
>> In this case the length of the original backup block (i.e., uncompressed) must be BLCKSZ, so we don't need to save the original size in the extra bytes.
>
> Yes, we would need an additional bit to identify that. We could steal it from length in XLogRecordBlockImageHeader.
>
>> Furthermore, when fpw compression is disabled and the hole length is zero, we seem to be able to save one byte from the header of backup block. Currently we use 4 bytes for the header, 2 bytes for the length of backup block, 15 bits for the hole offset and 1 bit for the flag indicating whether block is compressed or not. But in that case, the length of backup block doesn't need to be stored because it must be BLCKSZ. Shouldn't we optimize the header in this way? Thought?
>
> If we do it, that's something to tackle even before this patch on HEAD, because you could use the 16th bit of the first 2 bytes of XLogRecordBlockImageHeader to do necessary sanity checks, to actually not reduce record by 1 byte, but 2 bytes as hole-related data is not necessary. I imagine that a patch optimizing that wouldn't be that hard to write as well.

Actually, as Heikki pointed out to me... A block image is 8k and pages without holes are rare, so it may not be worth sacrificing code simplicity for a record reduction on the order of 0.1% or so, and the current patch is light because it keeps things simple.
-- Michael
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Feb 6, 2015 at 3:03 PM, Fujii Masao wrote:
> Do we always need extra two bytes for compressed backup block?
> ISTM that extra bytes are not necessary when the hole length is zero.
> In this case the length of the original backup block (i.e., uncompressed) must be BLCKSZ, so we don't need to save the original size in the extra bytes.

Yes, we would need an additional bit to identify that. We could steal it from length in XLogRecordBlockImageHeader.

> Furthermore, when fpw compression is disabled and the hole length is zero, we seem to be able to save one byte from the header of backup block. Currently we use 4 bytes for the header, 2 bytes for the length of backup block, 15 bits for the hole offset and 1 bit for the flag indicating whether block is compressed or not. But in that case, the length of backup block doesn't need to be stored because it must be BLCKSZ. Shouldn't we optimize the header in this way? Thought?

If we do it, that's something to tackle even before this patch on HEAD, because you could use the 16th bit of the first 2 bytes of XLogRecordBlockImageHeader to do necessary sanity checks, to actually not reduce record by 1 byte, but 2 bytes as hole-related data is not necessary. I imagine that a patch optimizing that wouldn't be that hard to write as well.

> +int page_len = BLCKSZ - hole_length;
> +char *scratch_buf;
> +if (hole_length != 0)
> +{
> +scratch_buf = compression_scratch;
> +memcpy(scratch_buf, page, hole_offset);
> +memcpy(scratch_buf + hole_offset,
> + page + (hole_offset + hole_length),
> + BLCKSZ - (hole_length + hole_offset));
> +}
> +else
> +scratch_buf = page;
> +
> +/* Perform compression of block */
> +if (XLogCompressBackupBlock(scratch_buf,
> +page_len,
> +regbuf->compressed_page,
> +&compress_len))
> +{
> +/* compression is done, add record */
> +is_compressed = true;
> +}
>
> You can refactor XLogCompressBackupBlock() and move all the above code to it for more simplicity.

Sure.
-- Michael
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Feb 6, 2015 at 4:15 AM, Michael Paquier wrote:
> On Thu, Feb 5, 2015 at 11:06 PM, Syed, Rahila wrote:
>>>/*
>>>+* We recheck the actual size even if pglz_compress() report success,
>>>+* because it might be satisfied with having saved as little as one byte
>>>+* in the compressed data.
>>>+*/
>>>+ *len = (uint16) compressed_len;
>>>+ if (*len >= orig_len - 1)
>>>+ return false;
>>>+ return true;
>>>+}
>>
>> As per latest code, when compression is 'on' we introduce additional 2 bytes in the header of each block image for storing raw_length of the compressed block.
>> In order to achieve compression while accounting for these two additional bytes, we must ensure that compressed length is less than original length - 2.
>> So, IIUC the above condition should rather be
>>
>> If (*len >= orig_len -2 )
>> return false;

"2" should be replaced with the macro variable indicating the size of extra header for compressed backup block.

Do we always need extra two bytes for compressed backup block? ISTM that extra bytes are not necessary when the hole length is zero. In this case the length of the original backup block (i.e., uncompressed) must be BLCKSZ, so we don't need to save the original size in the extra bytes.

Furthermore, when fpw compression is disabled and the hole length is zero, we seem to be able to save one byte from the header of backup block. Currently we use 4 bytes for the header, 2 bytes for the length of backup block, 15 bits for the hole offset and 1 bit for the flag indicating whether block is compressed or not. But in that case, the length of backup block doesn't need to be stored because it must be BLCKSZ. Shouldn't we optimize the header in this way? Thought?

+int page_len = BLCKSZ - hole_length;
+char *scratch_buf;
+if (hole_length != 0)
+{
+scratch_buf = compression_scratch;
+memcpy(scratch_buf, page, hole_offset);
+memcpy(scratch_buf + hole_offset,
+ page + (hole_offset + hole_length),
+ BLCKSZ - (hole_length + hole_offset));
+}
+else
+scratch_buf = page;
+
+/* Perform compression of block */
+if (XLogCompressBackupBlock(scratch_buf,
+page_len,
+regbuf->compressed_page,
+&compress_len))
+{
+/* compression is done, add record */
+is_compressed = true;
+}

You can refactor XLogCompressBackupBlock() and move all the above code to it for more simplicity.

Regards,
-- Fujii Masao
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
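A sketch of the refactoring Fujii suggests, folding the hole removal into XLogCompressBackupBlock() itself; the signature and surrounding names are assumed for illustration, not taken from the committed patch:

    static bool
    XLogCompressBackupBlock(char *page, uint16 hole_offset, uint16 hole_length,
                            char *dest, uint16 *dlen)
    {
        int32   orig_len = BLCKSZ - hole_length;
        int32   len;
        char   *source = page;

        if (hole_length != 0)
        {
            /* copy the page into the scratch buffer, skipping its hole */
            memcpy(compression_scratch, page, hole_offset);
            memcpy(compression_scratch + hole_offset,
                   page + (hole_offset + hole_length),
                   BLCKSZ - (hole_length + hole_offset));
            source = compression_scratch;
        }

        /* pglz_compress() returns -1 if no savings are possible */
        len = pglz_compress(source, orig_len, dest, PGLZ_strategy_default);
        if (len < 0)
            return false;           /* block is not compressible */

        *dlen = (uint16) len;       /* number of compressed bytes written */
        return true;
    }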
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Feb 5, 2015 at 11:06 PM, Syed, Rahila wrote:
>>/*
>>+* We recheck the actual size even if pglz_compress() report success,
>>+* because it might be satisfied with having saved as little as one byte
>>+* in the compressed data.
>>+*/
>>+ *len = (uint16) compressed_len;
>>+ if (*len >= orig_len - 1)
>>+ return false;
>>+ return true;
>>+}
>
> As per latest code, when compression is 'on' we introduce additional 2 bytes in the header of each block image for storing raw_length of the compressed block.
> In order to achieve compression while accounting for these two additional bytes, we must ensure that compressed length is less than original length - 2.
> So, IIUC the above condition should rather be
>
> If (*len >= orig_len -2 )
> return false;
> return true;
> The attached patch contains this. It also has a cosmetic change: renaming compressBuf to uncompressBuf as it is used to store the uncompressed page.

Agreed on both things. Just looking at your latest patch after some time to let it cool down, I noticed a couple of things.

#define MaxSizeOfXLogRecordBlockHeader \
(SizeOfXLogRecordBlockHeader + \
- SizeOfXLogRecordBlockImageHeader + \
+ SizeOfXLogRecordBlockImageHeader, \
+ SizeOfXLogRecordBlockImageCompressionInfo + \
There is a comma here instead of a sum sign. We should really sum up all those sizes to evaluate the maximum size of a block header.

+ * Permanently allocate readBuf uncompressBuf. We do it this way,
+ * rather than just making a static array, for two reasons:
This comment is just a bit weird, "readBuf AND uncompressBuf" is more appropriate.

+ * We recheck the actual size even if pglz_compress() report success,
+ * because it might be satisfied with having saved as little as one byte
+ * in the compressed data. We add two bytes to store raw_length with the
+ * compressed image. So for compression to be effective compressed_len should
+ * be atleast < orig_len - 2
This comment block should be reworked, and misses a dot at its end. I rewrote it like that, hopefully that's clearer:
+ /*
+* We recheck the actual size even if pglz_compress() reports success and see
+* if at least 2 bytes of length have been saved, as this corresponds to the
+* additional amount of data stored in WAL record for a compressed block
+* via raw_length.
+*/

In any case, those things have been introduced by what I did in previous versions... And attached is a new patch.
-- Michael

From 2b0c54bc7c4bfb2494cf8b0394d56635b01d7c5a Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Tue, 25 Nov 2014 14:24:26 +0900
Subject: [PATCH] Support compression for full-page writes in WAL

Compression is controlled with a new parameter called wal_compression. This parameter can be changed at session level to control WAL compression.
---
 contrib/pg_xlogdump/pg_xlogdump.c             |  17 ++-
 doc/src/sgml/config.sgml                      |  29 +
 src/backend/access/transam/xlog.c             |   1 +
 src/backend/access/transam/xloginsert.c       | 161 ++
 src/backend/access/transam/xlogreader.c       |  71 +---
 src/backend/utils/misc/guc.c                  |   9 ++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/access/xlog.h                     |   1 +
 src/include/access/xlogreader.h               |   7 +-
 src/include/access/xlogrecord.h               |  49 ++--
 src/include/pg_config.h.in                    |   4 +-
 11 files changed, 297 insertions(+), 53 deletions(-)
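For reference, the comma-for-plus slip called out above matters because this macro must bound the worst-case block header size; summed correctly it would read along these lines (the exact member list is assumed for the patch at this stage):

    #define MaxSizeOfXLogRecordBlockHeader \
        (SizeOfXLogRecordBlockHeader + \
         SizeOfXLogRecordBlockImageHeader + \
         SizeOfXLogRecordBlockImageCompressionInfo + \
         sizeof(RelFileNode) + \
         sizeof(BlockNumber))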
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Fujii Masao wrote:
> I wrote
>> This is an inspiration from lz4 APIs. Wouldn't it be buggy for a compression algorithm to return a size of 0 bytes as compressed or decompressed length btw? We could as well make it return a negative value when a failure occurs if you feel more comfortable with it.
>
> I feel that's better. Attached is the updated version of the patch. I changed the pg_lzcompress and pg_lzdecompress so that they return -1 when failure happens. Also I applied some cosmetic changes to the patch (e.g., shorten the long name of the newly-added macros). Barring any objection, I will commit this.

I just had a look at your updated version, ran some sanity tests, and things look good to me. The new names of the macros at the top of tuptoaster.c are clearer as well.
-- Michael
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

>/*
>+* We recheck the actual size even if pglz_compress() report success,
>+* because it might be satisfied with having saved as little as one byte
>+* in the compressed data.
>+*/
>+ *len = (uint16) compressed_len;
>+ if (*len >= orig_len - 1)
>+ return false;
>+ return true;
>+}

As per latest code, when compression is 'on' we introduce additional 2 bytes in the header of each block image for storing raw_length of the compressed block. In order to achieve compression while accounting for these two additional bytes, we must ensure that compressed length is less than original length - 2. So, IIUC the above condition should rather be

If (*len >= orig_len -2 )
return false;
return true;

The attached patch contains this. It also has a cosmetic change: renaming compressBuf to uncompressBuf as it is used to store the uncompressed page.

Thank you,
Rahila Syed

-Original Message-
From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
Sent: Wednesday, January 07, 2015 9:32 AM
To: Rahila Syed
Cc: PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On Wed, Jan 7, 2015 at 12:51 AM, Rahila Syed wrote:
> Following are some comments,
Thanks for the feedback.

>>uint16 hole_offset:15, /* number of bytes in "hole" */
> Typo in description of hole_offset
Fixed. That's "before hole".

>>for (block_id = 0; block_id <= record->max_block_id; block_id++)
>>- {
>>- if (XLogRecHasBlockImage(record, block_id))
>>- fpi_len += BLCKSZ - record->blocks[block_id].hole_length;
>>- }
>>+ fpi_len += record->blocks[block_id].bkp_len;
>
> IIUC, if condition, /if(XLogRecHasBlockImage(record, block_id))/ is incorrectly removed from the above for loop.
Fixed.

>>typedef struct XLogRecordCompressedBlockImageHeader
> I am trying to understand the purpose behind declaration of the above struct. IIUC, it is defined in order to introduce new field uint16 raw_length and it has been declared as a separate struct from XLogRecordBlockImageHeader to not affect the size of WAL record when compression is off.
> I wonder if it is ok to simply memcpy the uint16 raw_length in the hdr_scratch when compression is on and not have a separate header struct for it nor declare it in the existing header. raw_length can be a locally defined variable in XLogRecordAssemble or it can be a field in registered_buffer struct like compressed_page.
> I think this can simplify the code.
> Am I missing something obvious?

You are missing nothing. I just introduced this structure for a matter of readability to show the two-byte difference between non-compressed and compressed header information. It is true that doing it my way makes the structures duplicated, so let's simply add the compression-related information as an extra structure added after XLogRecordBlockImageHeader if the block is compressed. I hope this addresses your concerns.

>> /*
>> * Fill in the remaining fields in the XLogRecordBlockImageHeader
>> * struct and add new entries in the record chain.
>> */
>
>> bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;
>
> This code line seems to be misplaced with respect to the above comment. Comment indicates filling of XLogRecordBlockImageHeader fields while fork_flags is a field of XLogRecordBlockHeader.
> Is it better to place the code close to the following condition?
> if (needs_backup)
> {
Yes, this comment should not be here. I replaced it with the comment in HEAD.

>>+ *the original length of the
>>+ * block without its page hole being deducible from the compressed data
>>+ * itself.
> IIUC, this comment before XLogRecordBlockImageHeader seems to be no longer valid as original length is not deducible from compressed data and rather stored in header.
Aah, true. This was originally present in the header of PGLZ that has been removed to make it available for frontends.

Updated patches are attached.

Regards,
-- Michael

__ Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding.

Support-compression-for-full-page-writes-in-WAL_v15.patch Description: Support-compression-for-full-page-writes-in-WAL_v15.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
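A sketch of the approach settled on in this exchange: rather than a duplicated header struct, the two-byte raw_length is simply appended to the header scratch area when, and only when, the block image is compressed (buffer and variable names here are assumed for illustration):

    /* write the image header, then the optional compression info */
    memcpy(scratch, &bimg, SizeOfXLogRecordBlockImageHeader);
    scratch += SizeOfXLogRecordBlockImageHeader;

    if (is_compressed)
    {
        /* raw_length: uncompressed size of the image, hole removed */
        cimg.raw_length = BLCKSZ - hole_length;
        memcpy(scratch, &cimg, SizeOfXLogRecordBlockImageCompressionInfo);
        scratch += SizeOfXLogRecordBlockImageCompressionInfo;
    }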
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Jan 6, 2015 at 11:09 AM, Michael Paquier wrote: > On Mon, Jan 5, 2015 at 10:29 PM, Fujii Masao wrote: >> On Sun, Dec 28, 2014 at 10:57 PM, Michael Paquier wrote: >> The patch 1 cannot be applied to the master successfully because of >> recent change. > Yes, that's caused by ccb161b. Attached are rebased versions. > >>> - The real stuff comes with patch 2, that implements the removal of >>> PGLZ_Header, changing the APIs of compression and decompression to pglz to >>> not have anymore toast metadata, this metadata being now localized in >>> tuptoaster.c. Note that this patch protects the on-disk format (tested with >>> pg_upgrade from 9.4 to a patched HEAD server). Here is how the APIs of >>> compression and decompression look like with this patch, simply performing >>> operations from a source to a destination: >>> extern int32 pglz_compress(const char *source, int32 slen, char *dest, >>> const PGLZ_Strategy *strategy); >>> extern int32 pglz_decompress(const char *source, char *dest, >>> int32 compressed_size, int32 raw_size); >>> The return value of those functions is the number of bytes written in the >>> destination buffer, and 0 if operation failed. >> >> So it's guaranteed that 0 is never returned in success case? I'm not sure >> if that case can really happen, though. > This is an inspiration from lz4 APIs. Wouldn't it be buggy for a > compression algorithm to return a size of 0 bytes as compressed or > decompressed length btw? We could as well make it return a negative > value when a failure occurs if you feel more comfortable with it. I feel that's better. Attached is the updated version of the patch. I changed the pg_lzcompress and pg_lzdecompress so that they return -1 when failure happens. Also I applied some cosmetic changes to the patch (e.g., shorten the long name of the newly-added macros). Barring any objection, I will commit this. Regards, -- Fujii Masao *** a/src/backend/access/heap/tuptoaster.c --- b/src/backend/access/heap/tuptoaster.c *** *** 35,43 #include "access/tuptoaster.h" #include "access/xact.h" #include "catalog/catalog.h" #include "miscadmin.h" #include "utils/fmgroids.h" - #include "utils/pg_lzcompress.h" #include "utils/rel.h" #include "utils/typcache.h" #include "utils/tqual.h" --- 35,43 #include "access/tuptoaster.h" #include "access/xact.h" #include "catalog/catalog.h" + #include "common/pg_lzcompress.h" #include "miscadmin.h" #include "utils/fmgroids.h" #include "utils/rel.h" #include "utils/typcache.h" #include "utils/tqual.h" *** *** 45,50 --- 45,70 #undef TOAST_DEBUG + /* + * The information at the start of the compressed toast data. + */ + typedef struct toast_compress_header + { + int32 vl_len_; /* varlena header (do not touch directly!) */ + int32 rawsize; + } toast_compress_header; + + /* + * Utilities for manipulation of header information for compressed + * toast entries. 
+ */ + #define TOAST_COMPRESS_HDRSZ ((int32) sizeof(toast_compress_header)) + #define TOAST_COMPRESS_RAWSIZE(ptr) (((toast_compress_header *) ptr)->rawsize) + #define TOAST_COMPRESS_RAWDATA(ptr) \ + (((char *) ptr) + TOAST_COMPRESS_HDRSZ) + #define TOAST_COMPRESS_SET_RAWSIZE(ptr, len) \ + (((toast_compress_header *) ptr)->rawsize = len) + static void toast_delete_datum(Relation rel, Datum value); static Datum toast_save_datum(Relation rel, Datum value, struct varlena * oldexternal, int options); *** *** 53,58 static bool toastid_valueid_exists(Oid toastrelid, Oid valueid); --- 73,79 static struct varlena *toast_fetch_datum(struct varlena * attr); static struct varlena *toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length); + static struct varlena *toast_decompress_datum(struct varlena * attr); static int toast_open_indexes(Relation toastrel, LOCKMODE lock, Relation **toastidxs, *** *** 138,148 heap_tuple_untoast_attr(struct varlena * attr) /* If it's compressed, decompress it */ if (VARATT_IS_COMPRESSED(attr)) { ! PGLZ_Header *tmp = (PGLZ_Header *) attr; ! ! attr = (struct varlena *) palloc(PGLZ_RAW_SIZE(tmp) + VARHDRSZ); ! SET_VARSIZE(attr, PGLZ_RAW_SIZE(tmp) + VARHDRSZ); ! pglz_decompress(tmp, VARDATA(attr)); pfree(tmp); } } --- 159,166 /* If it's compressed, decompress it */ if (VARATT_IS_COMPRESSED(attr)) { ! struct varlena *tmp = attr; ! attr = toast_decompress_datum(tmp); pfree(tmp); } } *** *** 163,173 heap_tuple_untoast_attr(struct varlena * attr) /* * This is a compressed value inside of the main tuple */ ! PGLZ_Header *tmp = (PGLZ_Header *) attr; ! ! attr = (struct varlena *) palloc(PGLZ_RAW_SIZE(tmp) + VARHDRSZ); ! SET_VARSIZE(attr, PGLZ_RAW_SIZE(tmp) + VARHDRSZ); ! pglz_decompress(tmp, VARDATA(attr));
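As a hedged illustration of how the header in the diff above is meant to be used, here is a simplified sketch of the compression path in tuptoaster.c after this patch (assuming the usual backend headers and the reworked pglz_compress() that returns -1 on failure; this is not the committed code):

    /*
     * Sketch only: compress a varlena datum, storing its raw size in the
     * new toast_compress_header through the macros defined above.
     */
    static struct varlena *
    toast_compress_datum_sketch(Datum value)
    {
        struct varlena *tmp;
        int32       valsize = VARSIZE_ANY_EXHDR(DatumGetPointer(value));
        int32       len;

        /* leave room for the raw-size header in front of the payload */
        tmp = (struct varlena *) palloc(PGLZ_MAX_OUTPUT(valsize) +
                                        TOAST_COMPRESS_HDRSZ);
        len = pglz_compress(VARDATA_ANY(DatumGetPointer(value)), valsize,
                            TOAST_COMPRESS_RAWDATA(tmp),
                            PGLZ_strategy_default);
        if (len < 0)            /* incompressible: caller keeps the raw datum */
        {
            pfree(tmp);
            return NULL;
        }
        TOAST_COMPRESS_SET_RAWSIZE(tmp, valsize);
        SET_VARSIZE_COMPRESSED(tmp, len + TOAST_COMPRESS_HDRSZ);
        return tmp;
    }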
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Jan 7, 2015 at 12:51 AM, Rahila Syed wrote: > Following are some comments, Thanks for the feedback. >>uint16 hole_offset:15, /* number of bytes in "hole" */ > Typo in description of hole_offset Fixed. That's "before hole". >>for (block_id = 0; block_id <= record->max_block_id; block_id++) >>- { >>- if (XLogRecHasBlockImage(record, block_id)) >>- fpi_len += BLCKSZ - > record->blocks[block_id].hole_length; >>- } >>+ fpi_len += record->blocks[block_id].bkp_len; > > IIUC, if condition, /if(XLogRecHasBlockImage(record, block_id))/ is > incorrectly removed from the above for loop. Fixed. >>typedef struct XLogRecordCompressedBlockImageHeader > I am trying to understand the purpose behind declaration of the above > struct. IIUC, it is defined in order to introduce new field uint16 > raw_length and it has been declared as a separate struct from > XLogRecordBlockImageHeader to not affect the size of WAL record when > compression is off. > I wonder if it is ok to simply memcpy the uint16 raw_length in the > hdr_scratch when compression is on > and not have a separate header struct for it neither declare it in existing > header. raw_length can be a locally defined variable is XLogRecordAssemble > or it can be a field in registered_buffer struct like compressed_page. > I think this can simplify the code. > Am I missing something obvious? You are missing nothing. I just introduced this structure for a matter of readability to show the two-byte difference between non-compressed and compressed header information. It is true that doing it my way makes the structures duplicated, so let's simply add the compression-related information as an extra structure added after XLogRecordBlockImageHeader if the block is compressed. I hope this addresses your concerns. >> /* >> * Fill in the remaining fields in the XLogRecordBlockImageHeader >> * struct and add new entries in the record chain. >> */ > >> bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE; > > This code line seems to be misplaced with respect to the above comment. > Comment indicates filling of XLogRecordBlockImageHeader fields while > fork_flags is a field of XLogRecordBlockHeader. > Is it better to place the code close to following condition? > if (needs_backup) > { Yes, this comment should not be here. I replaced it with the comment in HEAD. >>+ *the original length of the >>+ * block without its page hole being deducible from the compressed data >>+ * itself. > IIUC, this comment before XLogRecordBlockImageHeader seems to be no longer > valid as original length is not deducible from compressed data and rather > stored in header. Aah, true. This was originally present in the header of PGLZ that has been removed to make it available for frontends. Updated patches are attached. Regards, -- Michael 20150107_fpw_compression_v14.tar.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello, >Yes, that's caused by ccb161b. Attached are rebased versions. Following are some comments, >uint16 hole_offset:15, /* number of bytes in "hole" */ Typo in description of hole_offset >for (block_id = 0; block_id <= record->max_block_id; block_id++) >- { >- if (XLogRecHasBlockImage(record, block_id)) >- fpi_len += BLCKSZ - record->blocks[block_id].hole_length; >- } >+ fpi_len += record->blocks[block_id].bkp_len; IIUC, the if condition, /if(XLogRecHasBlockImage(record, block_id))/, is incorrectly removed from the above for loop. >typedef struct XLogRecordCompressedBlockImageHeader I am trying to understand the purpose behind the declaration of the above struct. IIUC, it is defined in order to introduce the new field uint16 raw_length, and it has been declared as a separate struct from XLogRecordBlockImageHeader so as not to affect the size of the WAL record when compression is off. I wonder if it is ok to simply memcpy the uint16 raw_length in the hdr_scratch when compression is on and not have a separate header struct for it nor declare it in the existing header. raw_length can be a locally defined variable in XLogRecordAssemble, or it can be a field in the registered_buffer struct like compressed_page. I think this can simplify the code. Am I missing something obvious? > /* > * Fill in the remaining fields in the XLogRecordBlockImageHeader > * struct and add new entries in the record chain. > */ > bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE; This code line seems to be misplaced with respect to the above comment. The comment indicates filling of XLogRecordBlockImageHeader fields, while fork_flags is a field of XLogRecordBlockHeader. Is it better to place the code close to the following condition? if (needs_backup) { >+ *the original length of the >+ * block without its page hole being deducible from the compressed data >+ * itself. IIUC, this comment before XLogRecordBlockImageHeader seems to be no longer valid, as the original length is not deducible from the compressed data and is rather stored in the header. Thank you, Rahila Syed -- View this message in context: http://postgresql.nabble.com/Compression-of-full-page-writes-tp5769039p5833025.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
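The simplification proposed in this review could look roughly like the following (a sketch with hypothetical names, assuming PostgreSQL's c.h types and string.h; it is not the actual patch code):

    /*
     * Append the two-byte raw length straight into the header scratch
     * area when the block image is compressed, instead of declaring a
     * separate XLogRecordCompressedBlockImageHeader struct.
     */
    static char *
    append_raw_length(char *scratch, uint16 raw_length, bool is_compressed)
    {
        if (is_compressed)
        {
            memcpy(scratch, &raw_length, sizeof(uint16));
            scratch += sizeof(uint16);
        }
        return scratch;
    }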
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Jan 5, 2015 at 10:29 PM, Fujii Masao wrote: > On Sun, Dec 28, 2014 at 10:57 PM, Michael Paquier wrote: > The patch 1 cannot be applied to the master successfully because of > recent change. Yes, that's caused by ccb161b. Attached are rebased versions. >> - The real stuff comes with patch 2, that implements the removal of >> PGLZ_Header, changing the APIs of compression and decompression to pglz to >> not have anymore toast metadata, this metadata being now localized in >> tuptoaster.c. Note that this patch protects the on-disk format (tested with >> pg_upgrade from 9.4 to a patched HEAD server). Here is how the APIs of >> compression and decompression look like with this patch, simply performing >> operations from a source to a destination: >> extern int32 pglz_compress(const char *source, int32 slen, char *dest, >> const PGLZ_Strategy *strategy); >> extern int32 pglz_decompress(const char *source, char *dest, >> int32 compressed_size, int32 raw_size); >> The return value of those functions is the number of bytes written in the >> destination buffer, and 0 if operation failed. > > So it's guaranteed that 0 is never returned in success case? I'm not sure > if that case can really happen, though. This is an inspiration from lz4 APIs. Wouldn't it be buggy for a compression algorithm to return a size of 0 bytes as compressed or decompressed length btw? We could as well make it return a negative value when a failure occurs if you feel more comfortable with it. -- Michael 20150105_fpw_compression_v13.tar.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
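For reference, a sketch of a full round trip through the two APIs quoted above, under the patch-2 semantics (number of bytes written on success, 0 on failure); buffer names are illustrative, and the PGLZ_MAX_OUTPUT sizing is assumed from the pglz header:

    static bool
    pglz_roundtrip(const char *page)
    {
        char        compressed[PGLZ_MAX_OUTPUT(BLCKSZ)];
        char        restored[BLCKSZ];
        int32       clen;

        clen = pglz_compress(page, BLCKSZ, compressed, PGLZ_strategy_default);
        if (clen == 0)
            return false;       /* incompressible: keep the raw page */

        /* without PGLZ_Header, the caller must carry both sizes itself */
        return pglz_decompress(compressed, restored, clen, BLCKSZ) == BLCKSZ;
    }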
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Sun, Dec 28, 2014 at 10:57 PM, Michael Paquier wrote: > > > On Fri, Dec 26, 2014 at 4:16 PM, Michael Paquier > wrote: >> On Fri, Dec 26, 2014 at 3:24 PM, Fujii Masao >> wrote: >>> pglz_compress() and pglz_decompress() still use PGLZ_Header, so the >>> frontend >>> which uses those functions needs to handle PGLZ_Header. But it basically >>> should >>> be handled via the varlena macros. That is, the frontend still seems to >>> need to >>> understand the varlena datatype. I think we should avoid that. Thought? >> Hm, yes it may be wiser to remove it and make the data passed to pglz >> for varlena 8 bytes shorter.. > > OK, here is the result of this work, made of 3 patches. Thanks for updating the patches! > The first two patches move pglz stuff to src/common and make it a frontend > utility entirely independent on varlena and its related metadata. > - Patch 1 is a simple move of pglz to src/common, with PGLZ_Header still > present. There is nothing amazing here, and that's the broken version that > has been reverted in 966115c. The patch 1 cannot be applied to the master successfully because of recent change. > - The real stuff comes with patch 2, that implements the removal of > PGLZ_Header, changing the APIs of compression and decompression to pglz to > not have anymore toast metadata, this metadata being now localized in > tuptoaster.c. Note that this patch protects the on-disk format (tested with > pg_upgrade from 9.4 to a patched HEAD server). Here is how the APIs of > compression and decompression look like with this patch, simply performing > operations from a source to a destination: > extern int32 pglz_compress(const char *source, int32 slen, char *dest, > const PGLZ_Strategy *strategy); > extern int32 pglz_decompress(const char *source, char *dest, > int32 compressed_size, int32 raw_size); > The return value of those functions is the number of bytes written in the > destination buffer, and 0 if operation failed. So it's guaranteed that 0 is never returned in success case? I'm not sure if that case can really happen, though. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Dec 26, 2014 at 4:16 PM, Michael Paquier wrote: > On Fri, Dec 26, 2014 at 3:24 PM, Fujii Masao wrote: >> pglz_compress() and pglz_decompress() still use PGLZ_Header, so the frontend >> which uses those functions needs to handle PGLZ_Header. But it basically should >> be handled via the varlena macros. That is, the frontend still seems to need to >> understand the varlena datatype. I think we should avoid that. Thought? > Hm, yes it may be wiser to remove it and make the data passed to pglz > for varlena 8 bytes shorter.. OK, here is the result of this work, made of 3 patches. The first two patches move pglz stuff to src/common and make it a frontend utility entirely independent of varlena and its related metadata.
- Patch 1 is a simple move of pglz to src/common, with PGLZ_Header still present. There is nothing amazing here, and that's the broken version that has been reverted in 966115c.
- The real stuff comes with patch 2, which implements the removal of PGLZ_Header, changing the pglz compression and decompression APIs so that they no longer carry toast metadata, this metadata being now localized in tuptoaster.c. Note that this patch preserves the on-disk format (tested with pg_upgrade from 9.4 to a patched HEAD server). Here is how the compression and decompression APIs look with this patch, simply performing operations from a source to a destination: extern int32 pglz_compress(const char *source, int32 slen, char *dest, const PGLZ_Strategy *strategy); extern int32 pglz_decompress(const char *source, char *dest, int32 compressed_size, int32 raw_size); The return value of those functions is the number of bytes written in the destination buffer, and 0 if the operation failed. This is aimed at making the backend more pluggable as well. The reason patch 2 exists (it could be merged with patch 1) is to facilitate review of the changes made to pglz to make it an entirely independent facility.
- Patch 3 is the FPW compression, changed to fit with those changes. Note that as PGLZ_Header contained the raw size of the compressed data and no longer exists, it is necessary to store the raw length of the block image directly in the block image header with 2 additional bytes. Those 2 bytes are used only if wal_compression is set to true, thanks to a boolean flag, so if wal_compression is disabled, the WAL record length is exactly the same as HEAD, and there is no penalty in the default case. Similarly to previous patches, the block image is compressed without its hole.
To finish, here are some results using the same test as before, with the hack on getrusage to get the system and user CPU diff on a single backend execution: http://www.postgresql.org/message-id/cab7npqsc97o-ue5paxfmukwcxe_jioyxo1m4a0pmnmyqane...@mail.gmail.com Just as a reminder, this test generated a fixed number of FPWs on a single backend with fsync and autovacuum disabled, with several values of fillfactor to see the effect of page holes.
   test   | ffactor | user_diff | system_diff | pg_size_pretty
 ---------+---------+-----------+-------------+----------------
 FPW on   |      50 | 48.823907 |    0.737649 | 582 MB
 FPW on   |      20 | 16.135000 |    0.764682 | 229 MB
 FPW on   |      10 |  8.521099 |    0.751947 | 116 MB
 FPW off  |      50 | 29.722793 |    1.045577 | 746 MB
 FPW off  |      20 | 12.673375 |    0.905422 | 293 MB
 FPW off  |      10 |  6.723120 |    0.779936 | 148 MB
 HEAD     |      50 | 30.763136 |    1.129822 | 746 MB
 HEAD     |      20 | 13.340823 |    0.893365 | 293 MB
 HEAD     |      10 |  7.267311 |    0.909057 | 148 MB
(9 rows)
Results are similar to what has been measured previously (it doesn't hurt to check again): roughly, the CPU cost is balanced by the WAL record reduction. There is zero difference in terms of WAL record length between HEAD and this patch when wal_compression = off. Patches, as well as the test script and the results, are attached. Regards, -- Michael results.sql Description: Binary data test_compress Description: Binary data 20141228_fpw_compression_v12.tar.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
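A sketch of the "block image compressed without its hole" scheme described above (illustrative names, not the patch's exact code; dest is assumed to have PGLZ_MAX_OUTPUT(BLCKSZ) bytes available):

    static int32
    compress_block_image(const char *page, uint16 hole_offset,
                         uint16 hole_length, char *dest)
    {
        char        scratch[BLCKSZ];
        int32       raw_len = BLCKSZ - hole_length;

        /* copy the page while skipping over the unused hole */
        memcpy(scratch, page, hole_offset);
        memcpy(scratch + hole_offset,
               page + hole_offset + hole_length,
               BLCKSZ - (hole_offset + hole_length));

        /* bytes written on success, 0 if the image is incompressible */
        return pglz_compress(scratch, raw_len, dest, PGLZ_strategy_default);
    }

The raw length (BLCKSZ - hole_length) is what goes into the two extra header bytes when the compressed form is kept.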
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Dec 26, 2014 at 3:24 PM, Fujii Masao wrote: > pglz_compress() and pglz_decompress() still use PGLZ_Header, so the frontend > which uses those functions needs to handle PGLZ_Header. But it basically > should > be handled via the varlena macros. That is, the frontend still seems to need > to > understand the varlena datatype. I think we should avoid that. Thought? Hm, yes it may be wiser to remove it and make the data passed to pglz for varlena 8 bytes shorter.. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Dec 26, 2014 at 12:31 PM, Michael Paquier wrote: > On Thu, Dec 25, 2014 at 10:10 PM, Michael Paquier > wrote: >> On Wed, Dec 24, 2014 at 9:03 PM, Michael Paquier >> wrote: >>> Returning only a boolean is fine for me (that's what my first patch >>> did), especially if we add at some point hooks for compression and >>> decompression calls. >> Here is a patch rebased on current HEAD (60838df) for the core feature >> with the APIs of pglz using booleans as return values. > After the revert of 1st patch moving pglz to src/common, I have > reworked both patches, resulting in the attached. > > For pglz, the dependency to varlena has been removed to make the code > able to run independently on both frontend and backend sides. In order > to do that the APIs of pglz_compress and pglz_decompress have been > changed a bit: > - pglz_compress returns the number of bytes compressed. > - pglz_decompress takes as additional argument the compressed length > of the buffer, and returns the number of bytes decompressed instead of > a simple boolean for consistency with the compression API. > PGLZ_Header is not modified to keep the on-disk format intact. pglz_compress() and pglz_decompress() still use PGLZ_Header, so the frontend which uses those functions needs to handle PGLZ_Header. But it basically should be handled via the varlena macros. That is, the frontend still seems to need to understand the varlena datatype. I think we should avoid that. Thought? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Dec 25, 2014 at 10:10 PM, Michael Paquier wrote: > On Wed, Dec 24, 2014 at 9:03 PM, Michael Paquier > wrote: >> Returning only a boolean is fine for me (that's what my first patch >> did), especially if we add at some point hooks for compression and >> decompression calls. > Here is a patch rebased on current HEAD (60838df) for the core feature > with the APIs of pglz using booleans as return values. After the revert of the 1st patch moving pglz to src/common, I have reworked both patches, resulting in the attached. For pglz, the dependency on varlena has been removed to make the code able to run independently on both frontend and backend sides. In order to do that, the APIs of pglz_compress and pglz_decompress have been changed a bit:
- pglz_compress returns the number of bytes compressed.
- pglz_decompress takes as an additional argument the compressed length of the buffer, and returns the number of bytes decompressed instead of a simple boolean, for consistency with the compression API.
PGLZ_Header is not modified, to keep the on-disk format intact. The WAL compression patch is realigned based on those changes. Regards, -- Michael 20141226_fpw_compression_v11.tar.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Dec 24, 2014 at 9:03 PM, Michael Paquier wrote: > Returning only a boolean is fine for me (that's what my first patch > did), especially if we add at some point hooks for compression and > decompression calls. Here is a patch rebased on current HEAD (60838df) for the core feature with the APIs of pglz using booleans as return values. -- Michael From c8891e6086682d4e5bc197ef3047068b3a3875c5 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Tue, 25 Nov 2014 14:24:26 +0900 Subject: [PATCH] Support compression for full-page writes in WAL Compression is controlled with a new parameter called wal_compression. This parameter can be changed at session level to control WAL compression. --- contrib/pg_xlogdump/pg_xlogdump.c | 20 ++-- doc/src/sgml/config.sgml | 29 + src/backend/access/transam/xlog.c | 1 + src/backend/access/transam/xloginsert.c | 155 ++ src/backend/access/transam/xlogreader.c | 77 ++--- src/backend/utils/misc/guc.c | 9 ++ src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/access/xlog.h | 1 + src/include/access/xlogreader.h | 7 +- src/include/access/xlogrecord.h | 36 -- src/include/pg_config.h.in| 4 +- 11 files changed, 283 insertions(+), 57 deletions(-) diff --git a/contrib/pg_xlogdump/pg_xlogdump.c b/contrib/pg_xlogdump/pg_xlogdump.c index 9f05e25..3ba1d8f 100644 --- a/contrib/pg_xlogdump/pg_xlogdump.c +++ b/contrib/pg_xlogdump/pg_xlogdump.c @@ -363,15 +363,12 @@ XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats, * takes up BLCKSZ bytes, minus the "hole" length. * * XXX: We peek into xlogreader's private decoded backup blocks for the - * hole_length. It doesn't seem worth it to add an accessor macro for + * length of block. It doesn't seem worth it to add an accessor macro for * this. */ fpi_len = 0; for (block_id = 0; block_id <= record->max_block_id; block_id++) - { - if (XLogRecHasBlockImage(record, block_id)) - fpi_len += BLCKSZ - record->blocks[block_id].hole_length; - } + fpi_len += record->blocks[block_id].bkp_len; /* Update per-rmgr statistics */ @@ -465,9 +462,16 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogReaderState *record) blk); if (XLogRecHasBlockImage(record, block_id)) { -printf(" (FPW); hole: offset: %u, length: %u\n", - record->blocks[block_id].hole_offset, - record->blocks[block_id].hole_length); +if (record->blocks[block_id].is_compressed) + printf(" (FPW compressed); hole offset: %u, " + "compressed length: %u, original length: %u\n", + record->blocks[block_id].hole_offset, + record->blocks[block_id].bkp_len, + record->blocks[block_id].bkp_uncompress_len); +else + printf(" (FPW); hole offset: %u, length: %u\n", + record->blocks[block_id].hole_offset, + record->blocks[block_id].bkp_len); } putchar('\n'); } diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 6bcb106..acbbd20 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2282,6 +2282,35 @@ include_dir 'conf.d' + + wal_compression (boolean) + + wal_compression configuration parameter + + + + +When this parameter is on, the PostgreSQL +server compresses the content of full-page writes when necessary and +inserts in WAL records with smaller sizes, reducing the amount of +WAL stored on disk. + + + +Compression has the advantage of reducing the amount of disk I/O when +doing WAL-logging, at the cost of some extra CPU to perform the +compression of a block image. 
At WAL replay, compressed block images +need extra CPU cycles to perform the decompression of each block +image, but it can reduce as well replay time in I/O bounded +environments. + + + +The default value is off. + + + + wal_buffers (integer) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index e54..d68d9e3 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -88,6 +88,7 @@ char *XLogArchiveCommand = NULL; bool EnableHotStandby = false; bool fullPageWrites = true; bool wal_log_hints = false; +bool wal_compression = false; bool log_checkpoints = false; int sync_method = DEFAULT_SYNC_METHOD; int wal_level = WAL_LEVEL_MINIMAL; diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c index f3d610f..a1496aa 100644 --- a/src/backend/access/transam/xloginsert.c +++ b/src/backend/access/transam/xloginsert.c @@ -24,12 +24,16 @@ #include "access/xlog_internal.h" #include "access/xloginsert.h" #inc
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Dec 24, 2014 at 8:44 PM, Fujii Masao wrote: > On Fri, Dec 19, 2014 at 12:19 AM, Michael Paquier > wrote: > Firstly I'm thinking to commit the > 0001-Move-pg_lzcompress.c-to-src-common.patch. > > pg_lzcompress.h still exists in include/utils, but it should be moved to > include/common? You are right. This is a remnant of the first version of this patch, where pglz was added in port/ and not common/. > Do we really need PGLZ_Status? I'm not sure whether your categorization of > the result status of compress/decompress functions is right or not. For > example, > pglz_decompress() can return PGLZ_INCOMPRESSIBLE status, but which seems > invalid logically... Maybe this needs to be revisited when we introduce other > compression algorithms and create the wrapper function for those compression > and decompression functions. Anyway making pg_lzdecompress return > the boolean value seems enough. Returning only a boolean is fine for me (that's what my first patch did), especially if at some point we add hooks for compression and decompression calls. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Dec 19, 2014 at 12:19 AM, Michael Paquier wrote: > On Thu, Dec 18, 2014 at 5:27 PM, Fujii Masao wrote: >> Thanks! > Thanks for your input. > >> +else >> +memcpy(compression_scratch, page, page_len); >> >> I don't think the block image needs to be copied to scratch buffer here. >> We can try to compress the "page" directly. > Check. > >> +#include "utils/pg_lzcompress.h" >> #include "utils/memutils.h" >> >> pg_lzcompress.h should be after meutils.h. > Oops. > >> +/* Scratch buffer used to store block image to-be-compressed */ >> +static char compression_scratch[PGLZ_MAX_BLCKSZ]; >> >> Isn't it better to allocate the memory for compression_scratch in >> InitXLogInsert() >> like hdr_scratch? > Because the OS would not touch it if wal_compression is never used, > but now that you mention it, it may be better to get that in the > context of xlog_insert.. > >> +uncompressed_page = (char *) palloc(PGLZ_RAW_SIZE(header)); >> >> Why don't we allocate the buffer for uncompressed page only once and >> keep reusing it like XLogReaderState->readBuf? The size of uncompressed >> page is at most BLCKSZ, so we can allocate the memory for it even before >> knowing the real size of each block image. > OK, this would save some cycles. I was trying to make process allocate > a minimum of memory only when necessary. > >> -printf(" (FPW); hole: offset: %u, length: %u\n", >> - record->blocks[block_id].hole_offset, >> - record->blocks[block_id].hole_length); >> +if (record->blocks[block_id].is_compressed) >> +printf(" (FPW); hole offset: %u, compressed length >> %u\n", >> + record->blocks[block_id].hole_offset, >> + record->blocks[block_id].bkp_len); >> +else >> +printf(" (FPW); hole offset: %u, length: %u\n", >> + record->blocks[block_id].hole_offset, >> + record->blocks[block_id].bkp_len); >> >> We need to consider what info about FPW we want pg_xlogdump to report. >> I'd like to calculate how much bytes FPW was compressed, from the report >> of pg_xlogdump. So I'd like to see also the both length of uncompressed FPW >> and that of compressed one in the report. > OK, so let's add a parameter in the decoder for the uncompressed > length. Sounds fine? > >> In pg_config.h, the comment of BLCKSZ needs to be updated? Because >> the maximum size of BLCKSZ can be affected by not only itemid but also >> XLogRecordBlockImageHeader. > Check. > >> boolhas_image; >> +boolis_compressed; >> >> Doesn't ResetDecoder need to reset is_compressed? > Check. > >> +#wal_compression = off# enable compression of full-page writes >> Currently wal_compression compresses only FPW, so isn't it better to place >> it after full_page_writes in postgresql.conf.sample? > Check. > >> +uint16extra_data;/* used to store offset of bytes in >> "hole", with >> + * last free bit used to check if block is >> + * compressed */ >> At least to me, defining something like the following seems more easy to >> read. >> uint16hole_offset:15, >> is_compressed:1 > Check++. > > Updated patches addressing all those things are attached. Thanks for updating the patch! Firstly I'm thinking to commit the 0001-Move-pg_lzcompress.c-to-src-common.patch. pg_lzcompress.h still exists in include/utils, but it should be moved to include/common? Do we really need PGLZ_Status? I'm not sure whether your categorization of the result status of compress/decompress functions is right or not. For example, pglz_decompress() can return PGLZ_INCOMPRESSIBLE status, but which seems invalid logically... 
Maybe this needs to be revisited when we introduce other compression algorithms and create the wrapper function for those compression and decompression functions. Anyway, making pg_lzdecompress return a boolean value seems enough. I updated 0001-Move-pg_lzcompress.c-to-src-common.patch accordingly. Barring objections, I will push the attached patch first. Regards, -- Fujii Masao From 92a7c2ac8a6a8ec6ae84c6713f8ad30b7958de94 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Tue, 25 Nov 2014 14:05:59 +0900 Subject: [PATCH] Move pg_lzcompress.c to src/common Exposing compression and decompression APIs of pglz makes possible its use by extensions and contrib modules. pglz_decompress contained a call to elog to emit an error message in case of corrupted data. This function is changed to return a status code to let its callers return an error instead. The compression function is changed similarly to make the whole set consistent. --- src/backend/access/heap/tuptoaster.c | 11 +- src/backend/utils/adt/Makefile | 4 +- src/backend/utils/adt/pg_lzcompress.c | 779
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello, >Updated patches addressing all those things are attached. Below are some performance numbers using the latest patch, 20141219_fpw_compression_v9. Compression looks promising, with reduced impact on CPU usage, tps and runtime. pgbench command: pgbench -c 16 -j 16 -r -t 25 -M prepared To ensure that the data is not highly compressible, the empty filler columns were altered using: alter table pgbench_accounts alter column filler type text using gen_random_uuid()::text
checkpoint_segments = 1024
checkpoint_timeout = 5min
fsync = on

                              Compression on            Compression off
 WAL generated                24558983188 (~24.56 GB)   35931217248 (~35.93 GB)
 Runtime                      5987.0 s                  5825.0 s
 TPS                          668.05                    686.69
 Latency average              23.935 ms                 23.211 ms
 Latency stddev               80.619 ms                 80.141 ms
 CPU usage (user)             916.64 s                  614.76 s
 CPU usage (system)           54.96 s                   64.14 s
 IO (average writes to disk)  10.43 MB                  12.5 MB
 IO (total writes to disk)    64268.94 MB               72920 MB

Reduction in WAL is around 32%. Reduction in total IO writes to disk is around 12%. Impact on runtime, tps and latency is minimal. CPU usage with compression on is increased by 49%, which is less than in earlier measurements. Server specifications: Processors: Intel® Xeon® Processor E5-2650 (2 GHz, 8C/16T, 20 MB) * 2 nos RAM: 32GB Disk: HDD 450GB 10K Hot Plug 2.5-inch SAS HDD * 8 nos 1 x 450 GB SAS HDD, 2.5-inch, 6Gb/s, 10,000 rpm Thank you, Rahila Syed
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Dec 18, 2014 at 5:27 PM, Fujii Masao wrote: > Thanks! Thanks for your input. > +else > +memcpy(compression_scratch, page, page_len); > > I don't think the block image needs to be copied to scratch buffer here. > We can try to compress the "page" directly. Check. > +#include "utils/pg_lzcompress.h" > #include "utils/memutils.h" > > pg_lzcompress.h should be after meutils.h. Oops. > +/* Scratch buffer used to store block image to-be-compressed */ > +static char compression_scratch[PGLZ_MAX_BLCKSZ]; > > Isn't it better to allocate the memory for compression_scratch in > InitXLogInsert() > like hdr_scratch? Because the OS would not touch it if wal_compression is never used, but now that you mention it, it may be better to get that in the context of xlog_insert.. > +uncompressed_page = (char *) palloc(PGLZ_RAW_SIZE(header)); > > Why don't we allocate the buffer for uncompressed page only once and > keep reusing it like XLogReaderState->readBuf? The size of uncompressed > page is at most BLCKSZ, so we can allocate the memory for it even before > knowing the real size of each block image. OK, this would save some cycles. I was trying to make process allocate a minimum of memory only when necessary. > -printf(" (FPW); hole: offset: %u, length: %u\n", > - record->blocks[block_id].hole_offset, > - record->blocks[block_id].hole_length); > +if (record->blocks[block_id].is_compressed) > +printf(" (FPW); hole offset: %u, compressed length %u\n", > + record->blocks[block_id].hole_offset, > + record->blocks[block_id].bkp_len); > +else > +printf(" (FPW); hole offset: %u, length: %u\n", > + record->blocks[block_id].hole_offset, > + record->blocks[block_id].bkp_len); > > We need to consider what info about FPW we want pg_xlogdump to report. > I'd like to calculate how much bytes FPW was compressed, from the report > of pg_xlogdump. So I'd like to see also the both length of uncompressed FPW > and that of compressed one in the report. OK, so let's add a parameter in the decoder for the uncompressed length. Sounds fine? > In pg_config.h, the comment of BLCKSZ needs to be updated? Because > the maximum size of BLCKSZ can be affected by not only itemid but also > XLogRecordBlockImageHeader. Check. > boolhas_image; > +boolis_compressed; > > Doesn't ResetDecoder need to reset is_compressed? Check. > +#wal_compression = off# enable compression of full-page writes > Currently wal_compression compresses only FPW, so isn't it better to place > it after full_page_writes in postgresql.conf.sample? Check. > +uint16extra_data;/* used to store offset of bytes in > "hole", with > + * last free bit used to check if block is > + * compressed */ > At least to me, defining something like the following seems more easy to > read. > uint16hole_offset:15, > is_compressed:1 Check++. Updated patches addressing all those things are attached. Regards, -- Michael 20141219_fpw_compression_v9.tar.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Dec 18, 2014 at 7:31 PM, Rahila Syed wrote: >>Isn't it better to allocate the memory for compression_scratch in >>InitXLogInsert() >>like hdr_scratch? > > I think making compression_scratch a statically allocated global variable > is the result of following discussion earlier, > http://www.postgresql.org/message-id/ca+tgmoaznbuwnls4bpwyqgqteeznoavy7kwdbm0a2-tbarn...@mail.gmail.com Yep, in this case the OS does not request this memory as long as it is not touched, like when wal_compression is off all the time in the backend. Robert mentioned that upthread. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Dec 18, 2014 at 7:31 PM, Rahila Syed wrote: >>Isn't it better to allocate the memory for compression_scratch in >>InitXLogInsert() >>like hdr_scratch? > > I think making compression_scratch a statically allocated global variable > is the result of following discussion earlier, > > http://www.postgresql.org/message-id/ca+tgmoaznbuwnls4bpwyqgqteeznoavy7kwdbm0a2-tbarn...@mail.gmail.com /* * Permanently allocate readBuf. We do it this way, rather than just * making a static array, for two reasons: (1) no need to waste the * storage in most instantiations of the backend; (2) a static char array * isn't guaranteed to have any particular alignment, whereas palloc() * will provide MAXALIGN'd storage. */ The above source code comment in XLogReaderAllocate() makes me think that it's better to avoid using a static array. The point (1) seems less important in this case because most processes need the buffer for WAL compression, though. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
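Following the XLogReaderAllocate() comment quoted above, the alternative under discussion would look roughly like this (a sketch, not the committed code; in practice the allocation would live in a permanent memory context, as InitXLogInsert() does for hdr_scratch, and PGLZ_MAX_BLCKSZ is the constant from the patch):

    static char *compression_scratch = NULL;

    void
    InitXLogInsertSketch(void)
    {
        /* palloc guarantees MAXALIGN'd storage, unlike a bare static array */
        if (compression_scratch == NULL)
            compression_scratch = palloc(PGLZ_MAX_BLCKSZ);
    }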
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
>Isn't it better to allocate the memory for compression_scratch in >InitXLogInsert() >like hdr_scratch? I think making compression_scratch a statically allocated global variable is the result of following discussion earlier, http://www.postgresql.org/message-id/ca+tgmoaznbuwnls4bpwyqgqteeznoavy7kwdbm0a2-tbarn...@mail.gmail.com Thank you, Rahila Syed On Thu, Dec 18, 2014 at 1:57 PM, Fujii Masao wrote: > > On Thu, Dec 18, 2014 at 2:21 PM, Michael Paquier > wrote: > > > > > > On Wed, Dec 17, 2014 at 11:33 PM, Rahila Syed > > wrote: > >> > >> I had a look at code. I have few minor points, > > > > Thanks! > > > >> + bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE; > >> + > >> + if (is_compressed) > >> { > >> - rdt_datas_last->data = page; > >> - rdt_datas_last->len = BLCKSZ; > >> + /* compressed block information */ > >> + bimg.length = compress_len; > >> + bimg.extra_data = hole_offset; > >> + bimg.extra_data |= XLR_BLCK_COMPRESSED_MASK; > >> > >> For consistency with the existing code , how about renaming the macro > >> XLR_BLCK_COMPRESSED_MASK as BKPBLOCK_HAS_COMPRESSED_IMAGE on the lines > of > >> BKPBLOCK_HAS_IMAGE. > > > > OK, why not... > > > >> > >> + blk->hole_offset = extra_data & > ~XLR_BLCK_COMPRESSED_MASK; > >> Here , I think that having the mask as BKPBLOCK_HOLE_OFFSET_MASK will be > >> more indicative of the fact that lower 15 bits of extra_data field > comprises > >> of hole_offset value. This suggestion is also just to achieve > consistency > >> with the existing BKPBLOCK_FORK_MASK for fork_flags field. > > > > Yeah that seems clearer, let's define it as ~XLR_BLCK_COMPRESSED_MASK > > though. > > > >> And comment typo > >> +* First try to compress block, filling in the page hole > with > >> zeros > >> +* to improve the compression of the whole. If the block is > >> considered > >> +* as incompressible, complete the block header information > as > >> if > >> +* nothing happened. > >> > >> As hole is no longer being compressed, this needs to be changed. > > > > Fixed. As well as an additional comment block down. > > > > A couple of things noticed on the fly: > > - Fixed pg_xlogdump being not completely correct to report the FPW > > information > > - A couple of typos and malformed sentences fixed > > - Added an assertion to check that the hole offset value does not the bit > > used for compression status > > - Reworked docs, mentioning as well that wal_compression is off by > default. > > - Removed stuff in pg_controldata and XLOG_PARAMETER_CHANGE (mentioned by > > Fujii-san) > > Thanks! > > +else > +memcpy(compression_scratch, page, page_len); > > I don't think the block image needs to be copied to scratch buffer here. > We can try to compress the "page" directly. > > +#include "utils/pg_lzcompress.h" > #include "utils/memutils.h" > > pg_lzcompress.h should be after meutils.h. > > +/* Scratch buffer used to store block image to-be-compressed */ > +static char compression_scratch[PGLZ_MAX_BLCKSZ]; > > Isn't it better to allocate the memory for compression_scratch in > InitXLogInsert() > like hdr_scratch? > > +uncompressed_page = (char *) palloc(PGLZ_RAW_SIZE(header)); > > Why don't we allocate the buffer for uncompressed page only once and > keep reusing it like XLogReaderState->readBuf? The size of uncompressed > page is at most BLCKSZ, so we can allocate the memory for it even before > knowing the real size of each block image. 
> > -printf(" (FPW); hole: offset: %u, length: %u\n", > - record->blocks[block_id].hole_offset, > - record->blocks[block_id].hole_length); > +if (record->blocks[block_id].is_compressed) > +printf(" (FPW); hole offset: %u, compressed length > %u\n", > + record->blocks[block_id].hole_offset, > + record->blocks[block_id].bkp_len); > +else > +printf(" (FPW); hole offset: %u, length: %u\n", > + record->blocks[block_id].hole_offset, > + record->blocks[block_id].bkp_len); > > We need to consider what info about FPW we want pg_xlogdump to report. > I'd like to calculate how much bytes FPW was compressed, from the report > of pg_xlogdump. So I'd like to see also the both length of uncompressed FPW > and that of compressed one in the report. > > In pg_config.h, the comment of BLCKSZ needs to be updated? Because > the maximum size of BLCKSZ can be affected by not only itemid but also > XLogRecordBlockImageHeader. > > boolhas_image; > +boolis_compressed; > > Doesn't ResetDecoder need to reset is_compressed? > > +#wal_compression = off# enable compression of full-page writes > > Currently wal_com
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Dec 18, 2014 at 2:21 PM, Michael Paquier wrote: > > > On Wed, Dec 17, 2014 at 11:33 PM, Rahila Syed > wrote: >> >> I had a look at code. I have few minor points, > > Thanks! > >> + bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE; >> + >> + if (is_compressed) >> { >> - rdt_datas_last->data = page; >> - rdt_datas_last->len = BLCKSZ; >> + /* compressed block information */ >> + bimg.length = compress_len; >> + bimg.extra_data = hole_offset; >> + bimg.extra_data |= XLR_BLCK_COMPRESSED_MASK; >> >> For consistency with the existing code , how about renaming the macro >> XLR_BLCK_COMPRESSED_MASK as BKPBLOCK_HAS_COMPRESSED_IMAGE on the lines of >> BKPBLOCK_HAS_IMAGE. > > OK, why not... > >> >> + blk->hole_offset = extra_data & ~XLR_BLCK_COMPRESSED_MASK; >> Here , I think that having the mask as BKPBLOCK_HOLE_OFFSET_MASK will be >> more indicative of the fact that lower 15 bits of extra_data field comprises >> of hole_offset value. This suggestion is also just to achieve consistency >> with the existing BKPBLOCK_FORK_MASK for fork_flags field. > > Yeah that seems clearer, let's define it as ~XLR_BLCK_COMPRESSED_MASK > though. > >> And comment typo >> +* First try to compress block, filling in the page hole with >> zeros >> +* to improve the compression of the whole. If the block is >> considered >> +* as incompressible, complete the block header information as >> if >> +* nothing happened. >> >> As hole is no longer being compressed, this needs to be changed. > > Fixed. As well as an additional comment block down. > > A couple of things noticed on the fly: > - Fixed pg_xlogdump being not completely correct to report the FPW > information > - A couple of typos and malformed sentences fixed > - Added an assertion to check that the hole offset value does not use the bit > used for compression status > - Reworked docs, mentioning as well that wal_compression is off by > default. > - Removed stuff in pg_controldata and XLOG_PARAMETER_CHANGE (mentioned by > Fujii-san) Thanks! +else +memcpy(compression_scratch, page, page_len); I don't think the block image needs to be copied to the scratch buffer here. We can try to compress the "page" directly. +#include "utils/pg_lzcompress.h" #include "utils/memutils.h" pg_lzcompress.h should be after memutils.h. +/* Scratch buffer used to store block image to-be-compressed */ +static char compression_scratch[PGLZ_MAX_BLCKSZ]; Isn't it better to allocate the memory for compression_scratch in InitXLogInsert() like hdr_scratch? +uncompressed_page = (char *) palloc(PGLZ_RAW_SIZE(header)); Why don't we allocate the buffer for the uncompressed page only once and keep reusing it like XLogReaderState->readBuf? The size of the uncompressed page is at most BLCKSZ, so we can allocate the memory for it even before knowing the real size of each block image. -printf(" (FPW); hole: offset: %u, length: %u\n", - record->blocks[block_id].hole_offset, - record->blocks[block_id].hole_length); +if (record->blocks[block_id].is_compressed) +printf(" (FPW); hole offset: %u, compressed length %u\n", + record->blocks[block_id].hole_offset, + record->blocks[block_id].bkp_len); +else +printf(" (FPW); hole offset: %u, length: %u\n", + record->blocks[block_id].hole_offset, + record->blocks[block_id].bkp_len); We need to consider what info about FPW we want pg_xlogdump to report. I'd like to calculate how many bytes FPW was compressed, from the report of pg_xlogdump. So I'd like to see both the length of the uncompressed FPW and that of the compressed one in the report.
In pg_config.h, doesn't the comment of BLCKSZ need to be updated? Because the maximum size of BLCKSZ can be affected by not only itemid but also XLogRecordBlockImageHeader. bool has_image; +bool is_compressed; Doesn't ResetDecoder need to reset is_compressed? +#wal_compression = off # enable compression of full-page writes Currently wal_compression compresses only FPW, so isn't it better to place it after full_page_writes in postgresql.conf.sample? +uint16 extra_data; /* used to store offset of bytes in "hole", with + * last free bit used to check if block is + * compressed */ At least to me, defining something like the following seems easier to read. uint16 hole_offset:15, is_compressed:1 Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
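A sketch of the suggested declaration (the struct name is illustrative; the patch's actual header carries more fields):

    typedef struct XLogRecordBlockImageHeaderSketch
    {
        uint16      length;             /* length of the stored block image */
        uint16      hole_offset:15,     /* number of bytes before "hole" */
                    is_compressed:1;    /* is the image pglz-compressed? */
    } XLogRecordBlockImageHeaderSketch;

One caveat worth noting: bit-field placement is left to the compiler by the C standard, which is why explicit masking over a plain uint16 (as the patch currently does) is the more defensive choice for data that goes to disk.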
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Dec 17, 2014 at 11:33 PM, Rahila Syed wrote:
> I had a look at code. I have few minor points,
Thanks!
> + bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;
> +
> + if (is_compressed)
> {
> - rdt_datas_last->data = page;
> - rdt_datas_last->len = BLCKSZ;
> + /* compressed block information */
> + bimg.length = compress_len;
> + bimg.extra_data = hole_offset;
> + bimg.extra_data |= XLR_BLCK_COMPRESSED_MASK;
>
> For consistency with the existing code , how about renaming the macro
> XLR_BLCK_COMPRESSED_MASK as BKPBLOCK_HAS_COMPRESSED_IMAGE on the lines of
> BKPBLOCK_HAS_IMAGE.
OK, why not...
> + blk->hole_offset = extra_data & ~XLR_BLCK_COMPRESSED_MASK;
> Here , I think that having the mask as BKPBLOCK_HOLE_OFFSET_MASK will be
> more indicative of the fact that lower 15 bits of extra_data field
> comprises of hole_offset value. This suggestion is also just to achieve
> consistency with the existing BKPBLOCK_FORK_MASK for fork_flags field.
Yeah that seems clearer, let's define it as ~XLR_BLCK_COMPRESSED_MASK though.
> And comment typo
> +* First try to compress block, filling in the page hole with zeros
> +* to improve the compression of the whole. If the block is considered
> +* as incompressible, complete the block header information as if
> +* nothing happened.
>
> As hole is no longer being compressed, this needs to be changed.
Fixed. As well as an additional comment block down.
A couple of things noticed on the fly:
- Fixed pg_xlogdump being not completely correct to report the FPW information
- A couple of typos and malformed sentences fixed
- Added an assertion to check that the hole offset value does not use the bit reserved for the compression status
- Reworked docs, mentioning as well that wal_compression is off by default.
- Removed stuff in pg_controldata and XLOG_PARAMETER_CHANGE (mentioned by Fujii-san)
Regards, -- Michael 20141218_fpw_compression_v8.tar.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
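To make the extra_data layout concrete, here is a sketch of the masking scheme settled on above (the 0x8000 mask value is an assumption; the committed names may differ):

    #define XLR_BLCK_COMPRESSED_MASK    0x8000  /* assumed: the spare high bit */

    static inline uint16
    pack_extra_data(uint16 hole_offset, bool is_compressed)
    {
        /* the assertion mentioned above: the offset must fit in 15 bits */
        Assert((hole_offset & XLR_BLCK_COMPRESSED_MASK) == 0);
        return hole_offset | (is_compressed ? XLR_BLCK_COMPRESSED_MASK : 0);
    }

    static inline uint16
    extra_data_hole_offset(uint16 extra_data)
    {
        return extra_data & ~XLR_BLCK_COMPRESSED_MASK;
    }

This hand-rolled masking keeps the on-disk layout explicit, in contrast to the bitfield layout suggested in the review above, whose exact bit placement is left to the compiler.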
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Dec 18, 2014 at 1:05 PM, Fujii Masao wrote: > On Wed, Dec 17, 2014 at 1:34 AM, Michael Paquier > wrote: > I think that neither pg_control nor xl_parameter_change need to have the info > about WAL compression because each backup block has that entry. > > Will review the remaining part later. I started wondering about the utility of this part earlier this morning, as it's a remnant of when wal_compression was set as PGC_POSTMASTER. Will remove. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Dec 17, 2014 at 1:34 AM, Michael Paquier wrote:
> On Wed, Dec 17, 2014 at 12:00 AM, Michael Paquier wrote:
>> Actually, the original length of the compressed block is saved in
>> PGLZ_Header, so if we are fine to not check the size of the block
>> decompressed when decoding WAL we can do without the hole filled with zeros,
>> and use only 1 bit to see if the block is compressed or not.
> And.. After some more hacking, I have been able to come up with a patch that
> is able to compress blocks without the page hole, and that keeps the WAL
> record length the same as HEAD when compression switch is off. The numbers
> are pretty good, CPU is saved in the same proportions as previous patches
> when compression is enabled, and there is zero delta with HEAD when
> compression switch is off.
>
> Here are the actual numbers:
>            test_name          | pg_size_pretty | user_diff | system_diff
> ------------------------------+----------------+-----------+-------------
> FPW on + 2 bytes, ffactor 50  | 582 MB         | 42.391894 |    0.807444
> FPW on + 2 bytes, ffactor 20  | 229 MB         | 14.330304 |    0.729626
> FPW on + 2 bytes, ffactor 10  | 117 MB         |  7.335442 |    0.570996
> FPW off + 2 bytes, ffactor 50 | 746 MB         | 25.330391 |    1.248503
> FPW off + 2 bytes, ffactor 20 | 293 MB         | 10.537475 |    0.755448
> FPW off + 2 bytes, ffactor 10 | 148 MB         |  5.762775 |    0.763761
> FPW on + 0 bytes, ffactor 50  | 582 MB         | 42.174297 |    0.790596
> FPW on + 0 bytes, ffactor 20  | 229 MB         | 14.424233 |    0.770459
> FPW on + 0 bytes, ffactor 10  | 117 MB         |  7.057195 |    0.584806
> FPW off + 0 bytes, ffactor 50 | 746 MB         | 25.261998 |    1.054516
> FPW off + 0 bytes, ffactor 20 | 293 MB         | 10.589888 |    0.860207
> FPW off + 0 bytes, ffactor 10 | 148 MB         |  5.827191 |    0.874285
> HEAD, ffactor 50              | 746 MB         | 25.181729 |    1.133433
> HEAD, ffactor 20              | 293 MB         |  9.962242 |    0.765970
> HEAD, ffactor 10              | 148 MB         |  5.693426 |    0.775371
> Record, ffactor 50            | 582 MB         | 54.904374 |    0.678204
> Record, ffactor 20            | 229 MB         | 19.798268 |    0.807220
> Record, ffactor 10            | 116 MB         |  9.401877 |    0.668454
> (18 rows)
>
> The new tests of this patch are "FPW off + 0 bytes". Patches as well as
> results are attached.
I think that neither pg_control nor xl_parameter_change need to have the info about WAL compression because each backup block has that entry. Will review the remaining part later. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello, >Patches as well as results are attached. I had a look at the code. I have a few minor points, + bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE; + + if (is_compressed) { - rdt_datas_last->data = page; - rdt_datas_last->len = BLCKSZ; + /* compressed block information */ + bimg.length = compress_len; + bimg.extra_data = hole_offset; + bimg.extra_data |= XLR_BLCK_COMPRESSED_MASK; For consistency with the existing code, how about renaming the macro XLR_BLCK_COMPRESSED_MASK as BKPBLOCK_HAS_COMPRESSED_IMAGE, on the lines of BKPBLOCK_HAS_IMAGE. + blk->hole_offset = extra_data & ~XLR_BLCK_COMPRESSED_MASK; Here, I think that having the mask as BKPBLOCK_HOLE_OFFSET_MASK will be more indicative of the fact that the lower 15 bits of the extra_data field comprise the hole_offset value. This suggestion is also just to achieve consistency with the existing BKPBLOCK_FORK_MASK for the fork_flags field. And a comment typo: +* First try to compress block, filling in the page hole with zeros +* to improve the compression of the whole. If the block is considered +* as incompressible, complete the block header information as if +* nothing happened. As the hole is no longer being compressed, this needs to be changed. Thank you, Rahila Syed On Tue, Dec 16, 2014 at 10:04 PM, Michael Paquier wrote:
> On Wed, Dec 17, 2014 at 12:00 AM, Michael Paquier < michael.paqu...@gmail.com> wrote:
>> Actually, the original length of the compressed block is saved in
>> PGLZ_Header, so if we are fine to not check the size of the block
>> decompressed when decoding WAL we can do without the hole filled with
>> zeros, and use only 1 bit to see if the block is compressed or not.
> And.. After some more hacking, I have been able to come up with a patch
> that is able to compress blocks without the page hole, and that keeps the
> WAL record length the same as HEAD when compression switch is off. The
> numbers are pretty good, CPU is saved in the same proportions as previous
> patches when compression is enabled, and there is zero delta with HEAD when
> compression switch is off.
>
> Here are the actual numbers:
>            test_name          | pg_size_pretty | user_diff | system_diff
> ------------------------------+----------------+-----------+-------------
> FPW on + 2 bytes, ffactor 50  | 582 MB         | 42.391894 |    0.807444
> FPW on + 2 bytes, ffactor 20  | 229 MB         | 14.330304 |    0.729626
> FPW on + 2 bytes, ffactor 10  | 117 MB         |  7.335442 |    0.570996
> FPW off + 2 bytes, ffactor 50 | 746 MB         | 25.330391 |    1.248503
> FPW off + 2 bytes, ffactor 20 | 293 MB         | 10.537475 |    0.755448
> FPW off + 2 bytes, ffactor 10 | 148 MB         |  5.762775 |    0.763761
> FPW on + 0 bytes, ffactor 50  | 582 MB         | 42.174297 |    0.790596
> FPW on + 0 bytes, ffactor 20  | 229 MB         | 14.424233 |    0.770459
> FPW on + 0 bytes, ffactor 10  | 117 MB         |  7.057195 |    0.584806
> FPW off + 0 bytes, ffactor 50 | 746 MB         | 25.261998 |    1.054516
> FPW off + 0 bytes, ffactor 20 | 293 MB         | 10.589888 |    0.860207
> FPW off + 0 bytes, ffactor 10 | 148 MB         |  5.827191 |    0.874285
> HEAD, ffactor 50              | 746 MB         | 25.181729 |    1.133433
> HEAD, ffactor 20              | 293 MB         |  9.962242 |    0.765970
> HEAD, ffactor 10              | 148 MB         |  5.693426 |    0.775371
> Record, ffactor 50            | 582 MB         | 54.904374 |    0.678204
> Record, ffactor 20            | 229 MB         | 19.798268 |    0.807220
> Record, ffactor 10            | 116 MB         |  9.401877 |    0.668454
> (18 rows)
>
> The new tests of this patch are "FPW off + 0 bytes". Patches as well as
> results are attached.
> Regards,
> --
> Michael
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Dec 17, 2014 at 12:12 AM, Merlin Moncure wrote:
> Compressing *one* file with lz4 and a quick/n/dirty pglz I hacked out
> of the source (borrowing heavily from
> https://github.com/maropu/pglz_bench/blob/master/pglz_bench.cpp), I
> tested the results:
>
> lz4 real time:  0m0.032s
> pglz real time: 0m0.281s
>
> mmoncure@mernix2 ~/src/lz4/lz4-r125 $ ls -lh test.*
> -rw-r--r-- 1 mmoncure mmoncure 2.7M Dec 16 09:04 test.lz4
> -rw-r--r-- 1 mmoncure mmoncure 2.5M Dec 16 09:01 test.pglz
>
> A better test would examine all manner of different xlog files in a
> fashion closer to how your patch would need to compress them, but the
> numbers here tell a fairly compelling story: similar compression
> results for around 9x the CPU usage.

Yes, that could be better... Thanks for the real numbers, that's really
informative.

> Be advised that compression alg selection is one of those types of
> discussions that tends to spin off into outer space; that's not something
> you have to solve today. Just try and make things so that they can be
> switched out if things change.

One way to get around that would be a set of hooks to allow people to set
up the compression algorithm they want:
- One for buffer compression
- One for buffer decompression
- Perhaps one to set up the size of the scratch buffer used for
compression/decompression. In the case of pglz, this needs for example to
be PGLZ_MAX_OUTPUT(buffer_size).
The actually tricky part is that as xlogreader.c is used by pg_xlogdump,
we cannot directly use a hook in it, but we may as well be able to live
with a fixed maximum size like BLCKSZ * 2 by default; this would leave
largely enough room for all kinds of compression algos IMO (see the sketch
after this message).

Regards,
--
Michael
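To give the hook idea above a concrete shape, here is a minimal sketch; all of these names and signatures are hypothetical and none of them exist in PostgreSQL:

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical hook points for pluggable block compression.
     * A negative return value could signal "incompressible, give up". */
    typedef int32_t (*wal_compress_hook_type) (const char *source, int32_t slen,
                                               char *dest, int32_t dest_size);
    typedef int32_t (*wal_decompress_hook_type) (const char *source, int32_t slen,
                                                 char *dest, int32_t rawsize);
    typedef size_t (*wal_scratch_size_hook_type) (size_t block_size);

    /* NULL would mean "fall back to the built-in pglz routines". */
    static wal_compress_hook_type wal_compress_hook = NULL;
    static wal_decompress_hook_type wal_decompress_hook = NULL;
    static wal_scratch_size_hook_type wal_scratch_size_hook = NULL;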
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Dec 17, 2014 at 12:00 AM, Michael Paquier wrote:
> Actually, the original length of the compressed block is saved in
> PGLZ_Header, so if we are fine with not checking the size of the block
> decompressed when decoding WAL, we can do without the hole filled with
> zeros, and use only 1 bit to see if the block is compressed or not.

And... after some more hacking, I have been able to come up with a patch
that is able to compress blocks without the page hole, and that keeps the
WAL record length the same as HEAD when the compression switch is off. The
numbers are pretty good: CPU is saved in the same proportions as with
previous patches when compression is enabled, and there is zero delta with
HEAD when the compression switch is off.

Here are the actual numbers:
           test_name           | pg_size_pretty | user_diff | system_diff
-------------------------------+----------------+-----------+-------------
 FPW on + 2 bytes, ffactor 50  | 582 MB         | 42.391894 |    0.807444
 FPW on + 2 bytes, ffactor 20  | 229 MB         | 14.330304 |    0.729626
 FPW on + 2 bytes, ffactor 10  | 117 MB         |  7.335442 |    0.570996
 FPW off + 2 bytes, ffactor 50 | 746 MB         | 25.330391 |    1.248503
 FPW off + 2 bytes, ffactor 20 | 293 MB         | 10.537475 |    0.755448
 FPW off + 2 bytes, ffactor 10 | 148 MB         |  5.762775 |    0.763761
 FPW on + 0 bytes, ffactor 50  | 582 MB         | 42.174297 |    0.790596
 FPW on + 0 bytes, ffactor 20  | 229 MB         | 14.424233 |    0.770459
 FPW on + 0 bytes, ffactor 10  | 117 MB         |  7.057195 |    0.584806
 FPW off + 0 bytes, ffactor 50 | 746 MB         | 25.261998 |    1.054516
 FPW off + 0 bytes, ffactor 20 | 293 MB         | 10.589888 |    0.860207
 FPW off + 0 bytes, ffactor 10 | 148 MB         |  5.827191 |    0.874285
 HEAD, ffactor 50              | 746 MB         | 25.181729 |    1.133433
 HEAD, ffactor 20              | 293 MB         |  9.962242 |    0.765970
 HEAD, ffactor 10              | 148 MB         |  5.693426 |    0.775371
 Record, ffactor 50            | 582 MB         | 54.904374 |    0.678204
 Record, ffactor 20            | 229 MB         | 19.798268 |    0.807220
 Record, ffactor 10            | 116 MB         |  9.401877 |    0.668454
(18 rows)

The new tests of this patch are "FPW off + 0 bytes". Patches as well as
results are attached.

Regards,
--
Michael

results.sql
Description: Binary data

20141216_fpw_compression_v7.tar.gz
Description: GNU Zip compressed data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Dec 15, 2014 at 5:37 PM, Michael Paquier wrote:
> On Tue, Dec 16, 2014 at 5:14 AM, Merlin Moncure wrote:
>> OTOH, our built-in compressor as we all know is a complete dog in
>> terms of CPU when stacked up against some more modern implementations.
>> All that said, as long as there is a clean path to migrating to
>> another compression alg should one materialize, that problem can be
>> nicely decoupled from this patch as Robert pointed out.
> I am curious to see some numbers about that. Has anyone done such
> comparison measurements?

I don't have any, but I can make some. There are some numbers on the web,
but it's better to make new ones because IIRC some light optimization has
gone into pglz of late.

Compressing *one* file with lz4 and a quick/n/dirty pglz I hacked out
of the source (borrowing heavily from
https://github.com/maropu/pglz_bench/blob/master/pglz_bench.cpp), I
tested the results:

lz4 real time:  0m0.032s
pglz real time: 0m0.281s

mmoncure@mernix2 ~/src/lz4/lz4-r125 $ ls -lh test.*
-rw-r--r-- 1 mmoncure mmoncure 2.7M Dec 16 09:04 test.lz4
-rw-r--r-- 1 mmoncure mmoncure 2.5M Dec 16 09:01 test.pglz

A better test would examine all manner of different xlog files in a
fashion closer to how your patch would need to compress them, but the
numbers here tell a fairly compelling story: similar compression results
for around 9x the CPU usage. Be advised that compression alg selection is
one of those types of discussions that tends to spin off into outer space;
that's not something you have to solve today. Just try and make things so
that they can be switched out if things change.

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Dec 16, 2014 at 11:30 PM, Michael Paquier wrote:
> On Tue, Dec 16, 2014 at 11:24 PM, Alvaro Herrera wrote:
>> Michael Paquier wrote:
>>> And here are attached fresh patches reducing the WAL record size to
>>> what it is in head when the compression switch is off. Looking at the
>>> logic in xlogrecord.h, the block header stores the hole length and hole
>>> offset. I changed that a bit to store the length of the raw block, with
>>> hole or compressed, as the 1st uint16. The second uint16 is used to
>>> store the hole offset, same as HEAD when the compression switch is off.
>>> When compression is on, a special value 0xFFFF is saved (actually only
>>> filling 1 in the 16th bit is fine...). Note that this forces to fill in
>>> the hole with zeros and to compress always BLCKSZ worth of data.
>>
>> Why do we compress the hole? This seems pointless, considering that we
>> know it's all zeroes. Is it possible to compress the head and tail of
>> the page separately?
>
> This would take 2 additional bytes at minimum in the block header,
> resulting in 8 additional bytes in the record each time an FPW shows up.
> IMO it is important to check the length of things obtained when replaying
> WAL; that's something the current code of HEAD does quite well.

Actually, the original length of the compressed block is saved in
PGLZ_Header, so if we are fine with not checking the size of the block
decompressed when decoding WAL, we can do without the hole filled with
zeros, and use only 1 bit to see if the block is compressed or not.
--
Michael
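The "do without the hole filled with zeros" idea above amounts to handing the compressor only the head and tail of the page. A minimal sketch of such a helper, assuming the default BLCKSZ of 8192 and a purely illustrative function name:

    #include <stdint.h>
    #include <string.h>

    #define BLCKSZ 8192    /* assumed default page size */

    /* Copy the page head and tail around the hole into a scratch buffer,
     * so only BLCKSZ - hole_length bytes are handed to the compressor
     * instead of a full page padded with zeros. Returns the length of the
     * assembled data. */
    static int32_t
    copy_page_without_hole(const char *page, uint16_t hole_offset,
                           uint16_t hole_length, char *scratch)
    {
        memcpy(scratch, page, hole_offset);
        memcpy(scratch + hole_offset,
               page + hole_offset + hole_length,
               BLCKSZ - (hole_offset + hole_length));
        return (int32_t) (BLCKSZ - hole_length);
    }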
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Dec 16, 2014 at 11:24 PM, Alvaro Herrera wrote:
> Michael Paquier wrote:
>> And here are attached fresh patches reducing the WAL record size to what
>> it is in head when the compression switch is off. Looking at the logic
>> in xlogrecord.h, the block header stores the hole length and hole
>> offset. I changed that a bit to store the length of the raw block, with
>> hole or compressed, as the 1st uint16. The second uint16 is used to
>> store the hole offset, same as HEAD when the compression switch is off.
>> When compression is on, a special value 0xFFFF is saved (actually only
>> filling 1 in the 16th bit is fine...). Note that this forces to fill in
>> the hole with zeros and to compress always BLCKSZ worth of data.
>
> Why do we compress the hole? This seems pointless, considering that we
> know it's all zeroes. Is it possible to compress the head and tail of
> the page separately?

This would take 2 additional bytes at minimum in the block header,
resulting in 8 additional bytes in the record each time an FPW shows up.
IMO it is important to check the length of things obtained when replaying
WAL; that's something the current code of HEAD does quite well.
--
Michael
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Michael Paquier wrote:
> And here are attached fresh patches reducing the WAL record size to what
> it is in head when the compression switch is off. Looking at the logic
> in xlogrecord.h, the block header stores the hole length and hole
> offset. I changed that a bit to store the length of the raw block, with
> hole or compressed, as the 1st uint16. The second uint16 is used to
> store the hole offset, same as HEAD when the compression switch is off.
> When compression is on, a special value 0xFFFF is saved (actually only
> filling 1 in the 16th bit is fine...). Note that this forces to fill in
> the hole with zeros and to compress always BLCKSZ worth of data.

Why do we compress the hole? This seems pointless, considering that we
know it's all zeroes. Is it possible to compress the head and tail of
the page separately?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Dec 16, 2014 at 8:35 AM, Michael Paquier wrote:
> On Tue, Dec 16, 2014 at 3:46 AM, Robert Haas wrote:
>> On Sat, Dec 13, 2014 at 9:36 AM, Michael Paquier wrote:
>>> Something to be aware of btw is that this patch introduces an
>>> additional 8 bytes per block image in WAL as it contains additional
>>> information to control the compression. In this case this is the
>>> uint16 compress_len present in XLogRecordBlockImageHeader. In the case
>>> of the measurements done, knowing that 63638 FPWs have been written,
>>> there is a difference of a bit less than 500k in WAL between HEAD and
>>> "FPW off" in favor of HEAD. The gain with compression is welcome,
>>> still for the default there is a small price to track down if a block
>>> is compressed or not. This patch still takes advantage of it by not
>>> compressing the hole present in the page and reducing CPU work a bit.
>>
>> That sounds like a pretty serious problem to me.
> OK. If that's so much of a problem, I'll switch back to the version using
> 1 bit in the block header to identify whether a block is compressed or
> not. This way, when the switch is off the record length will be the same
> as in HEAD.

And here are attached fresh patches reducing the WAL record size to what it
is in head when the compression switch is off. Looking at the logic in
xlogrecord.h, the block header stores the hole length and hole offset. I
changed that a bit to store the length of the raw block, with hole or
compressed, as the 1st uint16. The second uint16 is used to store the hole
offset, same as HEAD when the compression switch is off. When compression
is on, a special value 0xFFFF is saved (actually only filling 1 in the 16th
bit is fine...). Note that this forces to fill in the hole with zeros and
to compress always BLCKSZ worth of data.

Those patches pass make check-world, even WAL replay on standbys. I have
done measurements using this patch set as well, with the following things
worth noticing:
- When the compression switch is off, the same quantity of WAL as HEAD is
produced.
- pglz is very bad at compressing the page hole. I mean, really bad. Have a
look at the user CPU, particularly when pages are empty, and you'll
understand... Other compression algorithms would be better here.

Tests are done with various values of fillfactor: 10 means that after the
update 80% of the page is empty, while at 50 the page is more or less
completely full. Here are the results, with the following test cases:
- "FPW on + 2 bytes": compression switch on, using 2 additional bytes in
the block header, resulting in longer WAL records, as 8 more bytes are used
per block, but with lower CPU usage, as page holes are not compressed by
pglz.
- "FPW off + 2 bytes": same as previous, with the compression switch set
to off.
- "FPW on + 0 bytes": compression switch on, with the same block header
size as HEAD, at the cost of compressing page holes filled with zeros.
- "FPW off + 0 bytes": compression switch off, same as previous.
- "HEAD": unpatched master (except with the hack to calculate user and
system CPU).
- "Record": the record-level compression, with the compression lower bound
set at 0.
=# select test || ', ffactor ' || ffactor, pg_size_pretty(post_update - pre_update),
   user_diff, system_diff from results;
           ?column?            | pg_size_pretty | user_diff | system_diff
-------------------------------+----------------+-----------+-------------
 FPW on + 2 bytes, ffactor 50  | 582 MB         | 42.391894 |    0.807444
 FPW on + 2 bytes, ffactor 20  | 229 MB         | 14.330304 |    0.729626
 FPW on + 2 bytes, ffactor 10  | 117 MB         |  7.335442 |    0.570996
 FPW off + 2 bytes, ffactor 50 | 746 MB         | 25.330391 |    1.248503
 FPW off + 2 bytes, ffactor 20 | 293 MB         | 10.537475 |    0.755448
 FPW off + 2 bytes, ffactor 10 | 148 MB         |  5.762775 |    0.763761
 FPW on + 0 bytes, ffactor 50  | 585 MB         | 54.115496 |    0.924891
 FPW on + 0 bytes, ffactor 20  | 234 MB         | 26.270404 |    0.755862
 FPW on + 0 bytes, ffactor 10  | 122 MB         | 19.540131 |    0.800981
 FPW off + 0 bytes, ffactor 50 | 746 MB         | 25.102241 |    1.110677
 FPW off + 0 bytes, ffactor 20 | 293 MB         |  9.889374 |    0.749884
 FPW off + 0 bytes, ffactor 10 | 148 MB         |  5.286767 |    0.682746
 HEAD, ffactor 50              | 746 MB         | 25.181729 |    1.133433
 HEAD, ffactor 20              | 293 MB         |  9.962242 |    0.765970
 HEAD, ffactor 10              | 148 MB         |  5.693426 |    0.775371
 Record, ffactor 50            | 582 MB         | 54.904374 |    0.678204
 Record, ffactor 20            | 229 MB         | 19.798268 |    0.807220
 Record, ffactor 10            | 116 MB         |  9.401877 |    0.668454
(18 rows)

Attached are as well the results of the measurements, and the test case used.

Regards,
--
Michael

20141216_fpw_compression_v7.tar.gz
Description: GNU Zip compressed data

results.sql
Description: Binary data

test_compress
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
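As a reading aid for the two-uint16 header encoding described in this message, here is a minimal sketch of the layout with the 0xFFFF sentinel; the struct and function names are illustrative assumptions, not the patch's actual definitions:

    #include <stdint.h>

    /* Assumed decoding of the two uint16 header fields described above: the
     * first carries the stored length (raw block with hole, or compressed),
     * the second carries the hole offset, with 0xFFFF as a sentinel meaning
     * "the image is compressed". */
    typedef struct BlockImageInfo
    {
        uint16_t stored_len;   /* length of the data following the header */
        uint16_t hole_offset;  /* offset of the page hole, or the sentinel */
    } BlockImageInfo;

    static int
    block_image_is_compressed(const BlockImageInfo *info)
    {
        return info->hole_offset == 0xFFFF;
    }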
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Dec 16, 2014 at 5:14 AM, Merlin Moncure wrote:
> OTOH, our built-in compressor as we all know is a complete dog in
> terms of CPU when stacked up against some more modern implementations.
> All that said, as long as there is a clean path to migrating to
> another compression alg should one materialize, that problem can be
> nicely decoupled from this patch as Robert pointed out.

I am curious to see some numbers about that. Has anyone done such
comparison measurements?
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Dec 16, 2014 at 3:46 AM, Robert Haas wrote:
> On Sat, Dec 13, 2014 at 9:36 AM, Michael Paquier wrote:
>> Something to be aware of btw is that this patch introduces an
>> additional 8 bytes per block image in WAL as it contains additional
>> information to control the compression. In this case this is the
>> uint16 compress_len present in XLogRecordBlockImageHeader. In the case
>> of the measurements done, knowing that 63638 FPWs have been written,
>> there is a difference of a bit less than 500k in WAL between HEAD and
>> "FPW off" in favor of HEAD. The gain with compression is welcome,
>> still for the default there is a small price to track down if a block
>> is compressed or not. This patch still takes advantage of it by not
>> compressing the hole present in the page and reducing CPU work a bit.
>
> That sounds like a pretty serious problem to me.

OK. If that's so much of a problem, I'll switch back to the version using
1 bit in the block header to identify whether a block is compressed or not.
This way, when the switch is off the record length will be the same as in
HEAD.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Dec 12, 2014 at 8:27 AM, Andres Freund wrote:
> On 2014-12-12 09:24:27 -0500, Bruce Momjian wrote:
>> On Fri, Dec 12, 2014 at 03:22:24PM +0100, Andres Freund wrote:
>>>> Well, the larger question is why wouldn't we just have the user
>>>> compress the entire WAL file before archiving --- why have each
>>>> backend do it? Is it the write volume we are saving? I thought this
>>>> WAL compression gave better performance in some cases.
>>>
>>> Err. Streaming?
>>
>> Well, you can already set up SSL for compression while streaming. In
>> fact, I assume many are already using SSL for streaming as the majority
>> of SSL overhead is from connection start.
>
> That's not really true. The overhead of SSL during streaming is
> *significant*. Both the kind of compression it does (which is far more
> expensive than pglz or lz4) and the encryption itself. In many cases
> it's prohibitively expensive - there are even a fair number of on-list
> reports about this.

(late to the party)

That may be true, but there are a number of ways to work around SSL
performance issues, such as hardware acceleration (perhaps deferring
encryption to another point in the network), weakening the protocol, or
not using it at all. OTOH, our built-in compressor as we all know is a
complete dog in terms of CPU when stacked up against some more modern
implementations. All that said, as long as there is a clean path to
migrating to another compression alg should one materialize, that problem
can be nicely decoupled from this patch, as Robert pointed out.

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Sat, Dec 13, 2014 at 9:36 AM, Michael Paquier wrote:
> Something to be aware of btw is that this patch introduces an
> additional 8 bytes per block image in WAL as it contains additional
> information to control the compression. In this case this is the
> uint16 compress_len present in XLogRecordBlockImageHeader. In the case
> of the measurements done, knowing that 63638 FPWs have been written,
> there is a difference of a bit less than 500k in WAL between HEAD and
> "FPW off" in favor of HEAD. The gain with compression is welcome,
> still for the default there is a small price to track down if a block
> is compressed or not. This patch still takes advantage of it by not
> compressing the hole present in the page and reducing CPU work a bit.

That sounds like a pretty serious problem to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Note: this patch has been moved to CF 2014-12, and I have marked myself as
an author, if that's fine... I've ended up being quite involved in it.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Sun, Dec 14, 2014 at 1:16 PM, Andres Freund wrote:
> On 2014-12-14 09:56:59 +0900, Michael Paquier wrote:
>> On Sun, Dec 14, 2014 at 5:45 AM, Simon Riggs wrote:
>>> On 13 December 2014 at 14:36, Michael Paquier wrote:
>>>> Something to be aware of btw is that this patch introduces an
>>>> additional 8 bytes per block image in WAL as it contains additional
>>>> information to control the compression. In this case this is the
>>>> uint16 compress_len present in XLogRecordBlockImageHeader.
>>>
>>> So we add 8 bytes to all FPWs, or only for compressed FPWs?
>> In this case that was all. We could still use xl_info to put a flag
>> telling that blocks are compressed, but it feels more consistent to
>> have a way to identify if a block is compressed inside its own header.
>
> Your 'consistency' argument doesn't convince me.

Could you be more precise (perhaps my use of the word "consistent" was
incorrect here)? Isn't it most natural to have the compression information
of each block in its own header? There may be blocks that are marked as
incompressible in a whole set, so we need to track for each block
individually whether it is compressed.

Now, instead of an additional uint16 to store the compressed length of the
block, we can take 1 bit from hole_length and 1 bit from hole_offset to
store a status flag deciding if a block is compressed. If we do so, the
tradeoff is to fill in the block hole with zeros and compress BLCKSZ worth
of data all the time, costing more CPU. But doing so we would still use
only 4 bytes for the block information, making the default case, aka
compression switch off, behave like HEAD in terms of pure record quantity.
This second method has been mentioned upthread a couple of times as well.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
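To make the bit-stealing alternative above concrete, here is a minimal sketch under the assumption that both hole_offset and hole_length fit in 15 bits for the default BLCKSZ of 8192; the names are illustrative only and do not come from any patch:

    #include <stdint.h>

    /* Assumed variant: steal the high bit of each uint16 field, since both
     * values fit in 15 bits when BLCKSZ is 8192. */
    #define BLK_FIELD_FLAG  0x8000
    #define BLK_FIELD_MASK  0x7FFF

    static void
    decode_hole_fields(uint16_t raw_offset, uint16_t raw_length,
                       uint16_t *hole_offset, uint16_t *hole_length,
                       int *is_compressed)
    {
        /* The stolen bit of raw_offset flags compression; the stolen bit of
         * raw_length stays free for other uses. */
        *is_compressed = (raw_offset & BLK_FIELD_FLAG) != 0;
        *hole_offset = (uint16_t) (raw_offset & BLK_FIELD_MASK);
        *hole_length = (uint16_t) (raw_length & BLK_FIELD_MASK);
    }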
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 2014-12-14 09:56:59 +0900, Michael Paquier wrote: > On Sun, Dec 14, 2014 at 5:45 AM, Simon Riggs wrote: > > On 13 December 2014 at 14:36, Michael Paquier > > wrote: > > > >> Something to be aware of btw is that this patch introduces an > >> additional 8 bytes per block image in WAL as it contains additional > >> information to control the compression. In this case this is the > >> uint16 compress_len present in XLogRecordBlockImageHeader. > > > > So we add 8 bytes to all FPWs, or only for compressed FPWs? > In this case that was all. We could still use xl_info to put a flag > telling that blocks are compressed, but it feels more consistent to > have a way to identify if a block is compressed inside its own header. Your 'consistency' argument doesn't convince me. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Sun, Dec 14, 2014 at 5:45 AM, Simon Riggs wrote: > On 13 December 2014 at 14:36, Michael Paquier > wrote: > >> Something to be aware of btw is that this patch introduces an >> additional 8 bytes per block image in WAL as it contains additional >> information to control the compression. In this case this is the >> uint16 compress_len present in XLogRecordBlockImageHeader. > > So we add 8 bytes to all FPWs, or only for compressed FPWs? In this case that was all. We could still use xl_info to put a flag telling that blocks are compressed, but it feels more consistent to have a way to identify if a block is compressed inside its own header. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 13 December 2014 at 14:36, Michael Paquier wrote: > Something to be aware of btw is that this patch introduces an > additional 8 bytes per block image in WAL as it contains additional > information to control the compression. In this case this is the > uint16 compress_len present in XLogRecordBlockImageHeader. So we add 8 bytes to all FPWs, or only for compressed FPWs? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Dec 12, 2014 at 11:50 PM, Michael Paquier wrote:
> On Wed, Dec 10, 2014 at 11:25 PM, Bruce Momjian wrote:
>> On Wed, Dec 10, 2014 at 07:40:46PM +0530, Rahila Syed wrote:
>>> The tests ran for around 30 mins. A manual checkpoint was run before
>>> each test.
>>>
>>> Compression   WAL generated   %compression   Latency-avg   CPU usage (seconds)                       TPS      Latency stddev
>>> on            1531.4 MB       ~35 %          7.351 ms      user diff: 562.67s  system diff: 41.40s   135.96   13.759 ms
>>> off           2373.1 MB                      6.781 ms      user diff: 354.20s  system diff: 39.67s   147.40   14.152 ms
>>>
>>> The compression obtained is quite high, close to 35%.
>>> CPU usage at user level when compression is on is quite noticeably
>>> high as compared to that when compression is off. But the gain in
>>> terms of reduction of WAL is also high.
>>
>> I am sorry but I can't understand the above results due to wrapping.
>> Are you saying compression was twice as slow?
>
> I got curious to see how the compression of an entire record would
> perform and how it compares for small WAL records, and here are some
> numbers based on the patch attached. This patch compresses the whole
> record, including the block headers, letting only XLogRecord out of it,
> with a flag indicating that the record is compressed (note that this
> patch contains a portion for replay that is untested; still, it gives an
> idea of how much compression of the whole record affects user CPU in
> this test case). It uses a buffer of 4 * BLCKSZ; if the record is longer
> than that, compression is simply given up. Those tests use the hack
> upthread calculating the user and system CPU spent by a backend with
> getrusage().
>
> Here is the simple test case I used, with 512MB of shared_buffers and
> small records, filling up a bunch of buffers, dirtying them and then
> compressing FPWs with a checkpoint.
> #!/bin/bash
> psql <<EOF
> SELECT pg_backend_pid();
> CREATE TABLE aa (a int);
> CREATE TABLE results (phase text, position pg_lsn);
> CREATE EXTENSION IF NOT EXISTS pg_prewarm;
> ALTER TABLE aa SET (FILLFACTOR = 50);
> INSERT INTO results VALUES ('pre-insert', pg_current_xlog_location());
> INSERT INTO aa VALUES (generate_series(1,700)); -- 484MB
> SELECT pg_size_pretty(pg_relation_size('aa'::regclass));
> SELECT pg_prewarm('aa'::regclass);
> CHECKPOINT;
> INSERT INTO results VALUES ('pre-update', pg_current_xlog_location());
> UPDATE aa SET a = 700 + a;
> CHECKPOINT;
> INSERT INTO results VALUES ('post-update', pg_current_xlog_location());
> SELECT * FROM results;
> EOF

Re-using this test case, I have produced more results by changing the
fillfactor of the table:
=# select test || ', ffactor ' || ffactor, pg_size_pretty(post_update - pre_update),
   user_diff, system_diff from results;
           ?column?            | pg_size_pretty | user_diff | system_diff
-------------------------------+----------------+-----------+-------------
 FPW on + 2 bytes, ffactor 50  | 582 MB         | 42.391894 |    0.807444
 FPW on + 2 bytes, ffactor 20  | 229 MB         | 14.330304 |    0.729626
 FPW on + 2 bytes, ffactor 10  | 117 MB         |  7.335442 |    0.570996
 FPW off + 2 bytes, ffactor 50 | 746 MB         | 25.330391 |    1.248503
 FPW off + 2 bytes, ffactor 20 | 293 MB         | 10.537475 |    0.755448
 FPW off + 2 bytes, ffactor 10 | 148 MB         |  5.762775 |    0.763761
 HEAD, ffactor 50              | 746 MB         | 25.181729 |    1.133433
 HEAD, ffactor 20              | 293 MB         |  9.962242 |    0.765970
 HEAD, ffactor 10              | 148 MB         |  5.693426 |    0.775371
 Record, ffactor 50            | 582 MB         | 54.904374 |    0.678204
 Record, ffactor 20            | 229 MB         | 19.798268 |    0.807220
 Record, ffactor 10            | 116 MB         |  9.401877 |    0.668454
(12 rows)

The following tests are run:
- "Record" means the record-level compression.
- "HEAD" is postgres at 1c5c70df.
- "FPW off" is HEAD + patch with the switch set to off.
- "FPW on" is HEAD + patch with the switch set to on.
The gain in compression has a linear profile with the length of the page
hole. There was visibly some noise in the tests: you can see that the CPU
of "FPW off" is a bit higher than HEAD.

Something to be aware of btw is that this patch introduces an additional
8 bytes per block image in WAL as it contains additional information to
control the compression. In this case this is the uint16 compress_len
present in XLogRecordBlockImageHeader. In the case of the measurements
done, knowing that 63638 FPWs have been written, there is a difference of
a bit less than 500k in WAL between HEAD and "FPW off" in favor of HEAD.
The gain with compression is welcome, still for the default there is a
small price to track down if a block is compressed or not. This patch
still takes advantage of it by not compressing the hole present in the
page and reducing CPU work a bit.
--
Michael
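For reference, the getrusage() hack mentioned in this message could look roughly like the following helper; this is a hypothetical sketch, not the actual code used for these measurements:

    #include <stdio.h>
    #include <sys/resource.h>

    /* Print the user/system CPU consumed so far by the calling process;
     * diffing two snapshots taken around a workload yields numbers like the
     * user_diff/system_diff columns above. */
    static void
    report_cpu_usage(const char *phase)
    {
        struct rusage ru;

        if (getrusage(RUSAGE_SELF, &ru) == 0)
            printf("%s: user %ld.%06lds, system %ld.%06lds\n", phase,
                   (long) ru.ru_utime.tv_sec, (long) ru.ru_utime.tv_usec,
                   (long) ru.ru_stime.tv_sec, (long) ru.ru_stime.tv_usec);
    }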
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 12 December 2014 at 21:40, Robert Haas wrote:
> On Fri, Dec 12, 2014 at 1:51 PM, Simon Riggs wrote:
>> What I don't understand is why we aren't working on double buffering,
>> since that cost would be paid in a background process and would be
>> evenly spread out across a checkpoint. Plus we'd be able to remove
>> FPWs altogether, which is like 100% compression.
>
> The previous patch to implement that - by somebody at vmware - was an
> epic fail. I'm not opposed to seeing somebody try again, but it's a
> tricky problem. When the double buffer fills up, then you've got to
> finish flushing the pages whose images are stored in the buffer to
> disk before you can overwrite it, which acts like a kind of
> mini-checkpoint. That problem might be solvable, but let's use this
> thread to discuss this patch, not some other patch that someone might
> have chosen to write but didn't.

No, I think it's relevant. WAL compression looks to me like a short-term
tweak, not the end game. On that basis, we should go for simple and
effective, user-settable compression of FPWs and not spend too much
Valuable Committer Time on it.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Dec 12, 2014 at 7:25 PM, Michael Paquier wrote:
> On Sat, Dec 13, 2014 at 1:08 AM, Robert Haas wrote:
>> On Fri, Dec 12, 2014 at 10:04 AM, Andres Freund wrote:
>>>> Note that autovacuum and fsync are off.
>>>> =# select phase, user_diff, system_diff,
>>>> pg_size_pretty(pre_update - pre_insert),
>>>> pg_size_pretty(post_update - pre_update) from results;
>>>>        phase        | user_diff | system_diff | pg_size_pretty | pg_size_pretty
>>>> --------------------+-----------+-------------+----------------+----------------
>>>>  Compression FPW    | 42.990799 |    0.868179 | 429 MB         | 567 MB
>>>>  No compression     | 25.688731 |    1.236551 | 429 MB         | 727 MB
>>>>  Compression record | 56.376750 |    0.769603 | 429 MB         | 566 MB
>>>> (3 rows)
>>>> If we do record-level compression, we'll need to be very careful in
>>>> defining a lower bound to not eat unnecessary CPU resources, perhaps
>>>> something that should be controlled with a GUC. I presume that this
>>>> stands true as well for the upper bound.
>>>
>>> Record-level compression pretty obviously would need a lower boundary
>>> for when to use compression. It won't be useful for small heapam/btree
>>> records, but it'll be rather useful for large multi_insert, clean or
>>> similar records...
>>
>> Unless I'm missing something, this test is showing that FPW
>> compression saves 298MB of WAL for 17.3 seconds of CPU time, as
>> against master. And compressing the whole record saves a further 1MB
>> of WAL for a further 13.39 seconds of CPU time. That makes
>> compressing the whole record sound like a pretty terrible idea - even
>> if you get more benefit by reducing the lower boundary, you're still
>> burning a ton of extra CPU time for almost no gain on the larger
>> records. Ouch!
>>
>> (Of course, I'm assuming that Michael's patch is reasonably efficient,
>> which might not be true.)
> Note that I was curious about the worst case ever, aka how much CPU
> pg_lzcompress would use if everything is compressed, even the smallest
> records. So we'll surely need a lower bound. I think that doing some
> tests with a lower bound set as a multiple of SizeOfXLogRecord would
> be fine, but in this case what we'll see is a result similar to what
> FPW compression does.

In general, lz4 (and pg_lz is similar to lz4) compresses very poorly
anything below about 128b in length. Of course there are outliers, with
some very compressible stuff, but with regular text or JSON data it's
quite unlikely to compress at all with smaller input. Compression is
modest up to about 1k, when it starts to really pay off. That's at least
my experience with lots of JSON-ish, text-ish and CSV data sets,
compressible but not so much in small bits.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Sat, Dec 13, 2014 at 1:08 AM, Robert Haas wrote:
> On Fri, Dec 12, 2014 at 10:04 AM, Andres Freund wrote:
>>> Note that autovacuum and fsync are off.
>>> =# select phase, user_diff, system_diff,
>>> pg_size_pretty(pre_update - pre_insert),
>>> pg_size_pretty(post_update - pre_update) from results;
>>>        phase        | user_diff | system_diff | pg_size_pretty | pg_size_pretty
>>> --------------------+-----------+-------------+----------------+----------------
>>>  Compression FPW    | 42.990799 |    0.868179 | 429 MB         | 567 MB
>>>  No compression     | 25.688731 |    1.236551 | 429 MB         | 727 MB
>>>  Compression record | 56.376750 |    0.769603 | 429 MB         | 566 MB
>>> (3 rows)
>>> If we do record-level compression, we'll need to be very careful in
>>> defining a lower bound to not eat unnecessary CPU resources, perhaps
>>> something that should be controlled with a GUC. I presume that this
>>> stands true as well for the upper bound.
>>
>> Record-level compression pretty obviously would need a lower boundary
>> for when to use compression. It won't be useful for small heapam/btree
>> records, but it'll be rather useful for large multi_insert, clean or
>> similar records...
>
> Unless I'm missing something, this test is showing that FPW
> compression saves 298MB of WAL for 17.3 seconds of CPU time, as
> against master. And compressing the whole record saves a further 1MB
> of WAL for a further 13.39 seconds of CPU time. That makes
> compressing the whole record sound like a pretty terrible idea - even
> if you get more benefit by reducing the lower boundary, you're still
> burning a ton of extra CPU time for almost no gain on the larger
> records. Ouch!
>
> (Of course, I'm assuming that Michael's patch is reasonably efficient,
> which might not be true.)

Note that I was curious about the worst case ever, aka how much CPU
pg_lzcompress would use if everything is compressed, even the smallest
records. So we'll surely need a lower bound. I think that doing some tests
with a lower bound set as a multiple of SizeOfXLogRecord would be fine,
but in this case what we'll see is a result similar to what FPW
compression does.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
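A lower bound like the one discussed here could be as simple as the following guard. SizeOfXLogRecord is a real PostgreSQL symbol, but the 24-byte value and the multiplier below are placeholders for illustration, the latter being the kind of knob a GUC could expose:

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative values only: skip compression for records too small to
     * plausibly repay the CPU spent on pglz. */
    #define SIZE_OF_XLOG_RECORD   24
    #define COMPRESS_LOWER_BOUND  (4 * SIZE_OF_XLOG_RECORD)

    static bool
    record_worth_compressing(uint32_t rec_len)
    {
        return rec_len >= COMPRESS_LOWER_BOUND;
    }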
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Dec 12, 2014 at 1:51 PM, Simon Riggs wrote: > What I don't understand is why we aren't working on double buffering, > since that cost would be paid in a background process and would be > evenly spread out across a checkpoint. Plus we'd be able to remove > FPWs altogether, which is like 100% compression. The previous patch to implement that - by somebody at vmware - was an epic fail. I'm not opposed to seeing somebody try again, but it's a tricky problem. When the double buffer fills up, then you've got to finish flushing the pages whose images are stored in the buffer to disk before you can overwrite it, which acts like a kind of mini-checkpoint. That problem might be solvable, but let's use this thread to discuss this patch, not some other patch that someone might have chosen to write but didn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 12 December 2014 at 18:04, Bruce Momjian wrote:
> Well, it seems we need to see some actual cases where compression does
> help before moving forward. I thought Amit had some amazing numbers for
> WAL compression --- has that changed?

For background processes like VACUUM, WAL compression will be helpful. The
numbers show that only applies to FPWs. I remain concerned about the cost
in foreground processes, especially since the cost will be paid
immediately after a checkpoint, making our spikes worse.

What I don't understand is why we aren't working on double buffering,
since that cost would be paid in a background process and would be
evenly spread out across a checkpoint. Plus we'd be able to remove
FPWs altogether, which is like 100% compression.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Dec 12, 2014 at 05:19:42PM +0100, Andres Freund wrote:
> On 2014-12-12 11:15:46 -0500, Robert Haas wrote:
>> On Fri, Dec 12, 2014 at 11:12 AM, Andres Freund wrote:
>>> On 2014-12-12 11:08:52 -0500, Robert Haas wrote:
>>>> Unless I'm missing something, this test is showing that FPW
>>>> compression saves 298MB of WAL for 17.3 seconds of CPU time, as
>>>> against master. And compressing the whole record saves a further 1MB
>>>> of WAL for a further 13.39 seconds of CPU time. That makes
>>>> compressing the whole record sound like a pretty terrible idea - even
>>>> if you get more benefit by reducing the lower boundary, you're still
>>>> burning a ton of extra CPU time for almost no gain on the larger
>>>> records. Ouch!
>>>
>>> Well, that test pretty much doesn't have any large records besides
>>> FPWs afaics. So it's unsurprising that it's not beneficial.
>>
>> "Not beneficial" is rather an understatement. It's actively harmful,
>> and not by a small margin.
>
> Sure, but that's just because it's too simplistic. I don't think it
> makes sense to make any inference about the worthiness of the general
> approach from the, nearly obvious, fact that compressing every tiny
> record is a bad idea.

Well, it seems we need to see some actual cases where compression does
help before moving forward. I thought Amit had some amazing numbers for
WAL compression --- has that changed?

--
  Bruce Momjian  http://momjian.us
  EnterpriseDB   http://enterprisedb.com

  + Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 2014-12-12 11:15:46 -0500, Robert Haas wrote:
> On Fri, Dec 12, 2014 at 11:12 AM, Andres Freund wrote:
>> On 2014-12-12 11:08:52 -0500, Robert Haas wrote:
>>> Unless I'm missing something, this test is showing that FPW
>>> compression saves 298MB of WAL for 17.3 seconds of CPU time, as
>>> against master. And compressing the whole record saves a further 1MB
>>> of WAL for a further 13.39 seconds of CPU time. That makes
>>> compressing the whole record sound like a pretty terrible idea - even
>>> if you get more benefit by reducing the lower boundary, you're still
>>> burning a ton of extra CPU time for almost no gain on the larger
>>> records. Ouch!
>>
>> Well, that test pretty much doesn't have any large records besides
>> FPWs afaics. So it's unsurprising that it's not beneficial.
>
> "Not beneficial" is rather an understatement. It's actively harmful,
> and not by a small margin.

Sure, but that's just because it's too simplistic. I don't think it
makes sense to make any inference about the worthiness of the general
approach from the, nearly obvious, fact that compressing every tiny
record is a bad idea.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Dec 12, 2014 at 11:12 AM, Andres Freund wrote: > On 2014-12-12 11:08:52 -0500, Robert Haas wrote: >> Unless I'm missing something, this test is showing that FPW >> compression saves 298MB of WAL for 17.3 seconds of CPU time, as >> against master. And compressing the whole record saves a further 1MB >> of WAL for a further 13.39 seconds of CPU time. That makes >> compressing the whole record sound like a pretty terrible idea - even >> if you get more benefit by reducing the lower boundary, you're still >> burning a ton of extra CPU time for almost no gain on the larger >> records. Ouch! > > Well, that test pretty much doesn't have any large records besides FPWs > afaics. So it's unsurprising that it's not beneficial. "Not beneficial" is rather an understatement. It's actively harmful, and not by a small margin. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 2014-12-12 11:08:52 -0500, Robert Haas wrote: > Unless I'm missing something, this test is showing that FPW > compression saves 298MB of WAL for 17.3 seconds of CPU time, as > against master. And compressing the whole record saves a further 1MB > of WAL for a further 13.39 seconds of CPU time. That makes > compressing the whole record sound like a pretty terrible idea - even > if you get more benefit by reducing the lower boundary, you're still > burning a ton of extra CPU time for almost no gain on the larger > records. Ouch! Well, that test pretty much doesn't have any large records besides FPWs afaics. So it's unsurprising that it's not beneficial. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers