Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Mar 11, 2015 at 3:57 PM, Fujii Masao wrote:
> On Wed, Mar 11, 2015 at 7:08 AM, Rahila Syed wrote:
>> Hello,
>>
>>> I have some minor comments
>>
>> The comments have been implemented in the attached patch.
>
> Thanks for updating the patch! I just changed a bit and finally pushed it.
> Thanks everyone involved in this patch!

Woohoo! Thanks!
--
Michael
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Mar 11, 2015 at 7:08 AM, Rahila Syed wrote:
> Hello,
>
>> I have some minor comments
>
> The comments have been implemented in the attached patch.

Thanks for updating the patch! I just changed a bit and finally pushed it.
Thanks everyone involved in this patch!

Regards,
--
Fujii Masao
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Mar 9, 2015 at 9:08 PM, Michael Paquier wrote:
> On Mon, Mar 9, 2015 at 4:29 PM, Fujii Masao wrote:
>> On Thu, Mar 5, 2015 at 10:08 PM, Michael Paquier wrote:
>>> On Thu, Mar 5, 2015 at 9:14 PM, Syed, Rahila wrote:
>>>> Please find attached a patch. As discussed, a flag to denote compression
>>>> and the presence of a hole in the block image has been added in
>>>> XLogRecordImageHeader rather than the block header.
>>
>> Thanks for updating the patch! Attached is the refactored version of the patch.
>
> Cool. Thanks!
>
> I have some minor comments:

Thanks for the comments!

> +Turning this parameter on can reduce the WAL volume without
> "Turning on this parameter

That tag is not used in other places in config.sgml, so I'm not sure that
it's really necessary.

Regards,
--
Fujii Masao
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

> I have some minor comments

The comments have been implemented in the attached patch.

> I think that extra parentheses should be used for the first expression
> with BKPIMAGE_HAS_HOLE.

Parentheses have been added to improve code readability.

Thank you,
Rahila Syed

On Mon, Mar 9, 2015 at 5:38 PM, Michael Paquier wrote:
> On Mon, Mar 9, 2015 at 4:29 PM, Fujii Masao wrote:
>> On Thu, Mar 5, 2015 at 10:08 PM, Michael Paquier wrote:
>>> On Thu, Mar 5, 2015 at 9:14 PM, Syed, Rahila wrote:
>>>> Please find attached a patch. As discussed, a flag to denote compression
>>>> and the presence of a hole in the block image has been added in
>>>> XLogRecordImageHeader rather than the block header.
>>
>> Thanks for updating the patch! Attached is the refactored version of the patch.
>
> Cool. Thanks!
>
> I have some minor comments:
>
> +The default value is off
> Dot at the end of this sentence.
>
> +Turning this parameter on can reduce the WAL volume without
> "Turning on this parameter
>
> +but at the cost of some extra CPU time by the compression during
> +WAL logging and the decompression during WAL replay."
> Isn't a verb missing here, for something like that:
> "but at the cost of some extra CPU spent on the compression during WAL
> logging and on the decompression during WAL replay."
>
> + * This can reduce the WAL volume, but at some extra cost of CPU time
> + * by the compression during WAL logging.
> Er, similarly "some extra cost of CPU spent on the compression...".
>
> +	if (blk->bimg_info & BKPIMAGE_HAS_HOLE &&
> +		(blk->hole_offset == 0 ||
> +		 blk->hole_length == 0 ||
> I think that extra parentheses should be used for the first expression
> with BKPIMAGE_HAS_HOLE.
>
> +	if (blk->bimg_info & BKPIMAGE_IS_COMPRESSED &&
> +		blk->bimg_len == BLCKSZ)
> +	{
> Same here.
>
> +	/*
> +	 * cross-check that hole_offset == 0 and hole_length == 0
> +	 * if the HAS_HOLE flag is set.
> +	 */
> I think that you mean here that this happens when the flag is *not* set.
>
> +	/*
> +	 * If BKPIMAGE_HAS_HOLE and BKPIMAGE_IS_COMPRESSED,
> +	 * an XLogRecordBlockCompressHeader follows
> +	 */
> Maybe a "struct" should be added for "an XLogRecordBlockCompressHeader
> struct". And a dot at the end of the sentence should be added?
>
> Regards,
> --
> Michael

Support-compression-full-page-writes-in-WAL_v25.patch
Description: Binary data
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Mar 9, 2015 at 9:08 PM, Michael Paquier wrote:
> On Mon, Mar 9, 2015 at 4:29 PM, Fujii Masao wrote:
>> Thanks for updating the patch! Attached is the refactored version of the patch.

Fujii-san and I had a short chat about tuning the PGLZ strategy a bit, which
is now PGLZ_strategy_default in the patch (at least 25% compression, etc.).
In particular min_input_size, which is now set at 32B, is too low; knowing
that the minimum fillfactor of a relation page is 10%, this looks really too
low.

For example, using the extension attached to this email, which can compress
and decompress bytea strings and which I developed after pglz was moved to
libpqcommon (it contains as well a function able to get a relation page
without its hole, feel free to use it), I am seeing that we can gain quite a
lot of space even with some incompressible data like UUIDs or random float
data (pages are compressed without their hole):

1) Float table:
=# create table float_tab (id float);
CREATE TABLE
=# insert into float_tab select random() from generate_series(1, 20);
INSERT 0 20
=# SELECT bytea_size(compress_data(page)) AS compress_size,
          bytea_size(page) AS raw_size_no_hole
   FROM get_raw_page('float_tab'::regclass, 0, false);
-[ RECORD 1 ]----+----
compress_size    | 329
raw_size_no_hole | 744
=# SELECT bytea_size(compress_data(page)) AS compress_size,
          bytea_size(page) AS raw_size_no_hole
   FROM get_raw_page('float_tab'::regclass, 0, false);
-[ RECORD 1 ]----+-----
compress_size    | 1753
raw_size_no_hole | 4344
So that's more or less 60% saved...

2) UUID table:
=# SELECT bytea_size(compress_data(page)) AS compress_size,
          bytea_size(page) AS raw_size_no_hole
   FROM get_raw_page('uuid_tab'::regclass, 0, false);
-[ RECORD 1 ]----+----
compress_size    | 590
raw_size_no_hole | 904
=# insert into uuid_tab select gen_random_uuid() from generate_series(1, 100);
INSERT 0 100
=# SELECT bytea_size(compress_data(page)) AS compress_size,
          bytea_size(page) AS raw_size_no_hole
   FROM get_raw_page('uuid_tab'::regclass, 0, false);
-[ RECORD 1 ]----+-----
compress_size    | 3338
raw_size_no_hole | 5304
And in this case we are close to 40% saved...

At least, knowing that with the header there are at least 24B used on a page,
what about increasing min_input_size to something like 128B or 256B? I don't
think that this is a blocker for this patch, as most relation pages are going
to have far more data than that and so will be unconditionally compressed,
but there is definitely something we could do in this area later on; perhaps
we could even improve things with the other parameters, like the compression
rate. So that's something to keep in mind...
--
Michael

compress_test.tar.gz
Description: GNU Zip compressed data
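For concreteness, here is a minimal sketch of the kind of tuned strategy
being discussed. The PGLZ_Strategy fields and the default values echoed in
the comments come from PostgreSQL's pg_lzcompress code; the 128B minimum
input size is only the illustrative assumption floated above, not a tested
proposal:

#include <limits.h>
#include "common/pg_lzcompress.h"

/*
 * Hypothetical tuned strategy for FPW compression: identical to
 * PGLZ_strategy_default except that inputs smaller than 128 bytes are
 * never compressed, since a page always carries at least a 24B header
 * and tiny inputs rarely compress usefully.
 */
static const PGLZ_Strategy fpw_strategy_data = {
	128,		/* min_input_size: raised from the default 32 */
	INT_MAX,	/* max_input_size: no upper limit */
	25,		/* min_comp_rate: require at least 25% savings */
	1024,		/* first_success_by: give up early on hard data */
	128,		/* match_size_good */
	10		/* match_size_drop */
};

/* would then be used as: pglz_compress(source, slen, dest, &fpw_strategy_data) */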
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Mar 9, 2015 at 4:29 PM, Fujii Masao wrote:
> On Thu, Mar 5, 2015 at 10:08 PM, Michael Paquier wrote:
>> On Thu, Mar 5, 2015 at 9:14 PM, Syed, Rahila wrote:
>>> Please find attached a patch. As discussed, a flag to denote compression
>>> and the presence of a hole in the block image has been added in
>>> XLogRecordImageHeader rather than the block header.
>
> Thanks for updating the patch! Attached is the refactored version of the patch.

Cool. Thanks!

I have some minor comments:

+The default value is off
Dot at the end of this sentence.

+Turning this parameter on can reduce the WAL volume without
"Turning on this parameter

+but at the cost of some extra CPU time by the compression during
+WAL logging and the decompression during WAL replay."
Isn't a verb missing here, for something like that:
"but at the cost of some extra CPU spent on the compression during WAL
logging and on the decompression during WAL replay."

+ * This can reduce the WAL volume, but at some extra cost of CPU time
+ * by the compression during WAL logging.
Er, similarly "some extra cost of CPU spent on the compression...".

+	if (blk->bimg_info & BKPIMAGE_HAS_HOLE &&
+		(blk->hole_offset == 0 ||
+		 blk->hole_length == 0 ||
I think that extra parentheses should be used for the first expression
with BKPIMAGE_HAS_HOLE.

+	if (blk->bimg_info & BKPIMAGE_IS_COMPRESSED &&
+		blk->bimg_len == BLCKSZ)
+	{
Same here.

+	/*
+	 * cross-check that hole_offset == 0 and hole_length == 0
+	 * if the HAS_HOLE flag is set.
+	 */
I think that you mean here that this happens when the flag is *not* set.

+	/*
+	 * If BKPIMAGE_HAS_HOLE and BKPIMAGE_IS_COMPRESSED,
+	 * an XLogRecordBlockCompressHeader follows
+	 */
Maybe a "struct" should be added for "an XLogRecordBlockCompressHeader
struct". And a dot at the end of the sentence should be added?

Regards,
--
Michael
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Mar 5, 2015 at 10:08 PM, Michael Paquier wrote:
> On Thu, Mar 5, 2015 at 9:14 PM, Syed, Rahila wrote:
>> Please find attached a patch. As discussed, a flag to denote compression
>> and the presence of a hole in the block image has been added in
>> XLogRecordImageHeader rather than the block header.

Thanks for updating the patch! Attached is the refactored version of the patch.

Regards,
--
Fujii Masao

*** a/contrib/pg_xlogdump/pg_xlogdump.c
--- b/contrib/pg_xlogdump/pg_xlogdump.c
***************
*** 359,376 **** XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats,
  	rec_len = XLogRecGetDataLen(record) + SizeOfXLogRecord;
  
  	/*
! 	 * Calculate the amount of FPI data in the record. Each backup block
! 	 * takes up BLCKSZ bytes, minus the "hole" length.
  	 *
  	 * XXX: We peek into xlogreader's private decoded backup blocks for the
! 	 * hole_length. It doesn't seem worth it to add an accessor macro for
! 	 * this.
  	 */
  	fpi_len = 0;
  	for (block_id = 0; block_id <= record->max_block_id; block_id++)
  	{
  		if (XLogRecHasBlockImage(record, block_id))
! 			fpi_len += BLCKSZ - record->blocks[block_id].hole_length;
  	}
  
  	/* Update per-rmgr statistics */
--- 359,375 ----
  	rec_len = XLogRecGetDataLen(record) + SizeOfXLogRecord;
  
  	/*
! 	 * Calculate the amount of FPI data in the record.
  	 *
  	 * XXX: We peek into xlogreader's private decoded backup blocks for the
! 	 * bimg_len indicating the length of FPI data. It doesn't seem worth it to
! 	 * add an accessor macro for this.
  	 */
  	fpi_len = 0;
  	for (block_id = 0; block_id <= record->max_block_id; block_id++)
  	{
  		if (XLogRecHasBlockImage(record, block_id))
! 			fpi_len += record->blocks[block_id].bimg_len;
  	}
  
  	/* Update per-rmgr statistics */
***************
*** 465,473 **** XLogDumpDisplayRecord(XLogDumpConfig *config, XLogReaderState *record)
  				   blk);
  			if (XLogRecHasBlockImage(record, block_id))
  			{
! 				printf(" (FPW); hole: offset: %u, length: %u\n",
! 					   record->blocks[block_id].hole_offset,
! 					   record->blocks[block_id].hole_length);
  			}
  			putchar('\n');
  		}
--- 464,485 ----
  				   blk);
  			if (XLogRecHasBlockImage(record, block_id))
  			{
! 				if (record->blocks[block_id].bimg_info &
! 					BKPIMAGE_IS_COMPRESSED)
! 				{
! 					printf(" (FPW); hole: offset: %u, length: %u, compression saved: %u\n",
! 						   record->blocks[block_id].hole_offset,
! 						   record->blocks[block_id].hole_length,
! 						   BLCKSZ -
! 						   record->blocks[block_id].hole_length -
! 						   record->blocks[block_id].bimg_len);
! 				}
! 				else
! 				{
! 					printf(" (FPW); hole: offset: %u, length: %u\n",
! 						   record->blocks[block_id].hole_offset,
! 						   record->blocks[block_id].hole_length);
! 				}
  			}
  			putchar('\n');
  		}
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***************
*** 2282,2287 **** include_dir 'conf.d'
--- 2282,2311 ----
+ 
+      wal_compression (boolean)
+      
+       wal_compression configuration parameter
+      
+ 
+      
+       
+        When this parameter is on, the PostgreSQL
+        server compresses a full page image written to WAL when
+        is on or during a base backup.
+        A compressed page image will be decompressed during WAL replay.
+        The default value is off
+       
+ 
+       
+        Turning this parameter on can reduce the WAL volume without
+        increasing the risk of unrecoverable data corruption,
+        but at the cost of some extra CPU time by the compression during
+        WAL logging and the decompression during WAL replay.
+       
+      
+ 
  
       wal_buffers (integer)
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 89,94 ****
  char	   *XLogArchiveCommand = NULL;
--- 89,95 ----
  bool		EnableHotStandby = false;
  bool		fullPageWrites = true;
  bool		wal_log_hints = false;
+ bool		wal_compression = false;
  bool		log_checkpoints = false;
  int			sync_method = DEFAULT_SYNC_METHOD;
  int			wal_level = WAL_LEVEL_MINIMAL;
*** a/src/backend/access/transam/xloginsert.c
--- b/src/backend/access/transam/xloginsert.c
***************
*** 24,35 ****
--- 24,39 ----
  #include "access/xlog_internal.h"
  #include "access/xloginsert.h"
  #include "catalog/pg_control.h"
+ #include "common/pg_lzcompress.h"
  #include "miscadmin.h"
  #include "storage/bufmgr.h"
  #include "storage/proc.h"
  #include "utils/memutils.h"
  #include "pg_trace.h"
  
+ /* Buffer size required to store a compressed version of backup block image */
+ #define PGLZ_MAX_BLCKSZ	PGLZ_MAX_OUTPUT(BLCKSZ)
+ 
  /*
   * For each block reference registered with XLogRegisterBuffer, we fill in
   * a registered_buffer struct.
***************
*** 50,55 **** typedef struct
--- 54,62 ----
  	XLogRecData bkp_rdatas[2];	/* temporary rdatas used to hold references to
  								 * backup block data
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Feb 16, 2015 at 9:08 PM, Andres Freund wrote:
> On 2015-02-16 20:55:20 +0900, Michael Paquier wrote:
>> On Mon, Feb 16, 2015 at 8:30 PM, Syed, Rahila wrote:
>>>
>>>> Regarding the sanity checks that have been added recently. I think that
>>>> they are useful but I am suspecting as well that only a check on the
>>>> record CRC is done because that's reliable enough and not doing those
>>>> checks accelerates replay a bit. So I am thinking that we should simply
>>>> replace them by assertions.
>>>
>>> Removing the checks makes sense as the CRC ensures correctness. Moreover,
>>> as an error message for an invalid record length is present in the code,
>>> messages for an invalid block length can be redundant.
>>>
>>> Checks have been replaced by assertions in the attached patch.
>>
>> After more thinking, we may as well simply remove them, an error with CRC
>> having high chances to complain before reaching this point...
>
> Surely not. The existing code explicitly does it like
> 	if (blk->has_data && blk->data_len == 0)
> 		report_invalid_record(state,
> 			  "BKPBLOCK_HAS_DATA set, but no data included at %X/%X",
> 			  (uint32) (state->ReadRecPtr >> 32), (uint32) state->ReadRecPtr);
> these cross checks are important. And I see no reason to deviate from
> that. The CRC sum isn't foolproof - we intentionally do checks at
> several layers. And, as you can see from some other locations, we
> actually try to *not* fatally error out when hitting them at times - so
> an Assert also is wrong.
>
> Heikki:
> 	/* cross-check that the HAS_DATA flag is set iff data_length > 0 */
> 	if (blk->has_data && blk->data_len == 0)
> 		report_invalid_record(state,
> 			  "BKPBLOCK_HAS_DATA set, but no data included at %X/%X",
> 			  (uint32) (state->ReadRecPtr >> 32), (uint32) state->ReadRecPtr);
> 	if (!blk->has_data && blk->data_len != 0)
> 		report_invalid_record(state,
> 			  "BKPBLOCK_HAS_DATA not set, but data length is %u at %X/%X",
> 			  (unsigned int) blk->data_len,
> 			  (uint32) (state->ReadRecPtr >> 32), (uint32) state->ReadRecPtr);
> those look like they're missing a goto err; to me.

Yes. I pushed the fix. Thanks!

Regards,
--
Fujii Masao
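The fix presumably amounts to jumping to the error-cleanup label after each
report, along these lines (a sketch based on the xlogreader.c code quoted
above, not the exact committed diff):

	/* cross-check that the HAS_DATA flag is set iff data_length > 0 */
	if (blk->has_data && blk->data_len == 0)
	{
		report_invalid_record(state,
			  "BKPBLOCK_HAS_DATA set, but no data included at %X/%X",
			  (uint32) (state->ReadRecPtr >> 32),
			  (uint32) state->ReadRecPtr);
		goto err;			/* previously missing: take the error path */
	}
	if (!blk->has_data && blk->data_len != 0)
	{
		report_invalid_record(state,
			  "BKPBLOCK_HAS_DATA not set, but data length is %u at %X/%X",
			  (unsigned int) blk->data_len,
			  (uint32) (state->ReadRecPtr >> 32),
			  (uint32) state->ReadRecPtr);
		goto err;			/* previously missing */
	}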
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Mar 5, 2015 at 10:28 PM, Andres Freund wrote:
> On 2015-03-05 12:14:04 +, Syed, Rahila wrote:
>> Please find attached a patch. As discussed, a flag to denote compression
>> and the presence of a hole in the block image has been added in
>> XLogRecordImageHeader rather than the block header.
>
> FWIW, I personally won't commit it with things done that way. I think
> it's going the wrong way, leading to a harder to interpret and less
> flexible format. I'm not going to further protest if Fujii or Heikki
> commit it this way though.

I'm pretty sure that we can discuss the *better* WAL format even after
committing this patch.

Regards,
--
Fujii Masao
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 2015-03-05 12:14:04 +, Syed, Rahila wrote:
> Please find attached a patch. As discussed, a flag to denote compression
> and the presence of a hole in the block image has been added in
> XLogRecordImageHeader rather than the block header.

FWIW, I personally won't commit it with things done that way. I think
it's going the wrong way, leading to a harder to interpret and less
flexible format. I'm not going to further protest if Fujii or Heikki
commit it this way though.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Mar 5, 2015 at 9:14 PM, Syed, Rahila wrote:
> Please find attached a patch. As discussed, a flag to denote compression
> and the presence of a hole in the block image has been added in
> XLogRecordImageHeader rather than the block header.
>
> Following are WAL numbers based on the attached test script posted by
> Michael earlier in the thread.
>
>                       WAL generated
> FPW compression on    122.032 MB
> FPW compression off   155.223 MB
> HEAD                  155.236 MB
>
> Compression: 21%
> Number of block images generated in WAL: 63637

ISTM that we are getting a nice thing here. I tested the patch and WAL
replay is working correctly. Some nitpicky comments...

+ * bkp_info stores flags for information about the backup block image
+ * BKPIMAGE_IS_COMPRESSED is used to identify if a given block image is compressed.
+ * BKPIMAGE_WITH_HOLE is used to identify the presence of a hole in a block image.
+ * If the block image has no hole, it is ensured that the raw size of a compressed
+ * block image is equal to BLCKSZ, hence the contents of
+ * XLogRecordBlockImageCompressionInfo are not necessary.
Take care of the limit of 80 characters per line. (Perhaps you could run
pgindent on your code before sending a patch?) The first line of this
paragraph is a sentence in itself, no?

In xlogreader.c, blk->with_hole is a boolean, so you could remove the == 0
and == 1 it is compared with.

+	/*
+	 * Length of a block image must be less than BLCKSZ
+	 * if the block has hole
+	 */
"if the block has a hole." (End of the sentence needs a dot.)

+	/*
+	 * Length of a block image must be equal to BLCKSZ
+	 * if the block does not have hole
+	 */
"if the block does not have a hole."

Regards,
--
Michael
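As background for why a compressed image without a hole can skip the extra
compression-info field: on replay, the stored data plus the zeroed hole must
rebuild exactly BLCKSZ bytes, so a hole-less image is always exactly BLCKSZ
bytes once decompressed. A self-contained sketch of that reconstruction step
(plain C; the function name and buffer layout are illustrative, not the
patch's actual code):

#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192

/*
 * Rebuild a full page from an image stored without its hole. "src" holds
 * (BLCKSZ - hole_length) bytes of page data; the hole is re-inserted as
 * zeroes at hole_offset. With hole_length == 0 this degenerates to a
 * plain BLCKSZ-byte memcpy.
 */
static void
restore_block_image(char *page, const char *src,
					uint16_t hole_offset, uint16_t hole_length)
{
	memcpy(page, src, hole_offset);				/* data before the hole */
	memset(page + hole_offset, 0, hole_length);	/* the zeroed hole */
	memcpy(page + hole_offset + hole_length,	/* data after the hole */
		   src + hole_offset,
		   BLCKSZ - (hole_offset + hole_length));
}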
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

Please find attached a patch. As discussed, a flag to denote compression and
the presence of a hole in the block image has been added in
XLogRecordImageHeader rather than the block header.

Following are WAL numbers based on the attached test script posted by Michael
earlier in the thread.

                      WAL generated
FPW compression on    122.032 MB
FPW compression off   155.223 MB
HEAD                  155.236 MB

Compression: 21%
Number of block images generated in WAL: 63637

Thank you,
Rahila Syed

Support-compression-of-full-page-writes-in-WAL_v23.patch
Description: Support-compression-of-full-page-writes-in-WAL_v23.patch

compress_run.bash
Description: compress_run.bash
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

> Are there any other flag bits that we should or are planning to add into WAL
> header newly, except the above two? If yes and they are required by even a
> block which doesn't have an image, I will change my mind and agree to add
> something like chunk ID to a block header.
> But I guess the answer of the question is No. Since the flag bits now we are
> thinking to add are required only by a block having an image, adding them
> into a block header (instead of block image header) seems a waste of bytes
> in WAL. So I concur with Michael.

I agree. As per my understanding, this change of xlog format was to provide
for future enhancements which would need flags relevant to an entire block.
But as mentioned, the flags currently being added relate to the block image
only. Hence for this patch it makes sense to add a field to
XLogRecordImageHeader rather than the block header. This will also save bytes
per WAL record.

Thank you,
Rahila Syed
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Mar 4, 2015 at 12:41 AM, Syed, Rahila wrote:
> Please find attached an updated patch with the WAL replay error fixed. The
> patch follows the chunk ID approach of the xlog format.

(Review done independently of the chunk_id stuff being good or not; I already
gave my opinion on the matter.)

  * readRecordBufSize is set to the new buffer size.
- *
+
The patch has some noise diffs.

You may want to change the values of BKPBLOCK_WILL_INIT and BKPBLOCK_SAME_REL
to respectively 0x01 and 0x02.

+	uint8		chunk_id = 0;
+	chunk_id |= XLR_CHUNK_BLOCK_REFERENCE;
Why not simply that:
chunk_id = XLR_CHUNK_BLOCK_REFERENCE;

+#define XLR_CHUNK_ID_DATA_SHORT		255
+#define XLR_CHUNK_ID_DATA_LONG		254
Why aren't those just using one bit as well? This seems inconsistent with the
rest.

+	if ((blk->with_hole == 0 && blk->hole_offset != 0) ||
+		(blk->with_hole == 1 && blk->hole_offset <= 0))
In xlogreader.c blk->with_hole is defined as a boolean but compared with an
integer; could you remove the == 0 and == 1 portions for clarity?

-		goto err;
+			goto err;
 		}
 	}
-
 	if (remaining != datatotal)
This gathers incorrect code alignment and unnecessary diffs.

 typedef struct XLogRecordBlockHeader
 {
+	/* Chunk ID precedes */
+
 	uint8		id;
What prevents the declaration of chunk_id as an int8 here instead of this
comment? This is confusing.

> Following are brief measurement numbers.
>
>                       WAL
> FPW compression on    122.032 MB
> FPW compression off   155.239 MB
> HEAD                  155.236 MB

What is the test run in this case? How many block images have been generated
in WAL for each case? You could gather some of those numbers with
pg_xlogdump --stat, for example.
--
Michael
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Mar 3, 2015 at 9:34 AM, Michael Paquier wrote:
> On Tue, Mar 3, 2015 at 9:24 AM, Andres Freund wrote:
>> On 2015-03-03 08:59:30 +0900, Michael Paquier wrote:
>>> Already mentioned upthread, but I agree with Fujii-san here: adding
>>> information related to the state of a block image in
>>> XLogRecordBlockHeader makes little sense because we are not sure to
>>> have a block image, perhaps there is only data associated to it, and
>>> that we should control that exclusively in XLogRecordBlockImageHeader
>>> and let the block ID alone for now.
>>
>> This argument doesn't make much sense to me. The flag byte could very
>> well indicate 'block reference without image following' vs 'block
>> reference with data + hole following' vs 'block reference with
>> compressed data following'.
>
> Information about the state of a block is decoupled from its existence,
> aka in the block header, we should control if:
> - record has data
> - record has a block
> And in the block image header, we control if the block is:
> - compressed or not
> - has a hole or not.

Are there any other flag bits that we should or are planning to add into WAL
header newly, except the above two? If yes and they are required by even a
block which doesn't have an image, I will change my mind and agree to add
something like chunk ID to a block header.
But I guess the answer of the question is No. Since the flag bits now we are
thinking to add are required only by a block having an image, adding them
into a block header (instead of block image header) seems a waste of bytes
in WAL. So I concur with Michael.

Regards,
--
Fujii Masao
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

> It would be good to get those problems fixed first. Could you send an
> updated patch?

Please find attached an updated patch with the WAL replay error fixed. The
patch follows the chunk ID approach of the xlog format.

Following are brief measurement numbers.

                      WAL
FPW compression on    122.032 MB
FPW compression off   155.239 MB
HEAD                  155.236 MB

Thank you,
Rahila Syed

Support-compression-of-full-page-writes_v22.patch
Description: Support-compression-of-full-page-writes_v22.patch
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Mar 3, 2015 at 9:24 AM, Andres Freund wrote:
> On 2015-03-03 08:59:30 +0900, Michael Paquier wrote:
>> Already mentioned upthread, but I agree with Fujii-san here: adding
>> information related to the state of a block image in
>> XLogRecordBlockHeader makes little sense because we are not sure to
>> have a block image, perhaps there is only data associated to it, and
>> that we should control that exclusively in XLogRecordBlockImageHeader
>> and let the block ID alone for now.
>
> This argument doesn't make much sense to me. The flag byte could very
> well indicate 'block reference without image following' vs 'block
> reference with data + hole following' vs 'block reference with
> compressed data following'.

Information about the state of a block is decoupled from its existence,
aka in the block header, we should control if:
- record has data
- record has a block
And in the block image header, we control if the block is:
- compressed or not
- has a hole or not.
Are you willing to sacrifice bytes in the block header to control whether a
block is compressed or has a hole, even if the block has only data but no
image?
--
Michael
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 2015-03-03 08:59:30 +0900, Michael Paquier wrote:
> Already mentioned upthread, but I agree with Fujii-san here: adding
> information related to the state of a block image in
> XLogRecordBlockHeader makes little sense because we are not sure to
> have a block image, perhaps there is only data associated to it, and
> that we should control that exclusively in XLogRecordBlockImageHeader
> and let the block ID alone for now.

This argument doesn't make much sense to me. The flag byte could very
well indicate 'block reference without image following' vs 'block
reference with data + hole following' vs 'block reference with
compressed data following'.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Mar 3, 2015 at 5:17 AM, Rahila Syed wrote:
> Hello,
>
>> When I test the WAL or replication related features, I usually run
>> "make installcheck" and pgbench against the master at the same time
>> after setting up the replication environment.
>
> I will conduct these tests before sending the updated version.
>
>> Seems this increases the header size of WAL record even if no backup block
>> image is included. Right?
>
> Yes, this increases the header size of the WAL record by 1 byte for every
> block reference even if it has no backup block image.
>
>> Isn't it better to add the flag info about backup block image into
>> XLogRecordBlockImageHeader rather than XLogRecordBlockHeader
>
> Yes, this will make the code extensible and readable, and will save a couple
> of bytes per record.
> But the current approach is to provide a chunk ID identifying different
> xlog record fragments like main data, block references etc.
> Currently, block ID is used to identify record fragments, which can be
> either XLR_BLOCK_ID_DATA_SHORT, XLR_BLOCK_ID_DATA_LONG or an actual block
> ID. This can be replaced by a chunk ID to separate it from the block ID.
> The block ID can be used to number the block fragments whereas the chunk ID
> can be used to distinguish between main data fragments and block references.
> The chunk ID of block references can contain information about the presence
> of data, image, hole and compression.
> The chunk ID for main data fragments remains as it is. This approach
> provides for readability and extensibility.

Already mentioned upthread, but I agree with Fujii-san here: adding
information related to the state of a block image in XLogRecordBlockHeader
makes little sense because we are not sure to have a block image, perhaps
there is only data associated to it, and we should control that exclusively
in XLogRecordBlockImageHeader and let the block ID alone for now. Hence we'd
better have 1 extra int8 in XLogRecordBlockImageHeader with two flags for now:
- Is the block compressed or not?
- Does the block have a hole?
Perhaps this will not be considered as ugly, and it leaves plenty of room for
storing a version number for compression.
--
Michael
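A minimal sketch of the layout described above, with one dedicated flag byte
instead of stolen bits. The field and flag names follow those visible
elsewhere in this thread (bimg_info, BKPIMAGE_HAS_HOLE,
BKPIMAGE_IS_COMPRESSED); the exact flag values and field order are
illustrative assumptions:

#include <stdint.h>

/*
 * Hypothetical XLogRecordBlockImageHeader with a dedicated flag byte.
 * This costs one extra byte per block image compared with bit-stealing,
 * but leaves room for future flags (e.g. a compression version number).
 */
typedef struct XLogRecordBlockImageHeader
{
	uint16_t	length;			/* length of stored image data */
	uint16_t	hole_offset;	/* number of bytes before the "hole" */
	uint8_t		bimg_info;		/* flag bits, see below */
} XLogRecordBlockImageHeader;

#define BKPIMAGE_HAS_HOLE		0x01	/* page image has a "hole" */
#define BKPIMAGE_IS_COMPRESSED	0x02	/* page image is compressed */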
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

> When I test the WAL or replication related features, I usually run
> "make installcheck" and pgbench against the master at the same time
> after setting up the replication environment.

I will conduct these tests before sending the updated version.

> Seems this increases the header size of WAL record even if no backup block
> image is included. Right?

Yes, this increases the header size of the WAL record by 1 byte for every
block reference even if it has no backup block image.

> Isn't it better to add the flag info about backup block image into
> XLogRecordBlockImageHeader rather than XLogRecordBlockHeader

Yes, this will make the code extensible and readable, and will save a couple
of bytes per record.
But the current approach is to provide a chunk ID identifying different xlog
record fragments like main data, block references etc.
Currently, block ID is used to identify record fragments, which can be either
XLR_BLOCK_ID_DATA_SHORT, XLR_BLOCK_ID_DATA_LONG or an actual block ID. This
can be replaced by a chunk ID to separate it from the block ID. The block ID
can be used to number the block fragments whereas the chunk ID can be used to
distinguish between main data fragments and block references. The chunk ID of
block references can contain information about the presence of data, image,
hole and compression.
The chunk ID for main data fragments remains as it is. This approach provides
for readability and extensibility.

Thank you,
Rahila Syed

On Mon, Mar 2, 2015 at 3:43 PM, Fujii Masao wrote:
> On Fri, Feb 27, 2015 at 12:44 PM, Michael Paquier wrote:
>> On Fri, Feb 27, 2015 at 8:01 AM, Michael Paquier wrote:
>>> On Fri, Feb 27, 2015 at 6:54 AM, Rahila Syed wrote:
>>>>> Even this patch doesn't work fine. The standby emits the following
>>>>> error messages.
>>>>
>>>> Yes, this bug remains unsolved. I am still working on resolving this.
>>>>
>>>> Following chunk IDs have been added in the attached patch as suggested
>>>> upthread.
>>>> +#define XLR_CHUNK_BLOCK_REFERENCE	0x10
>>>> +#define XLR_CHUNK_BLOCK_HAS_IMAGE	0x04
>>>> +#define XLR_CHUNK_BLOCK_HAS_DATA	0x08
>>>>
>>>> XLR_CHUNK_BLOCK_REFERENCE denotes the chunk ID of block references.
>>>> XLR_CHUNK_BLOCK_HAS_IMAGE is a replacement of BKPBLOCK_HAS_IMAGE
>>>> and XLR_CHUNK_BLOCK_HAS_DATA a replacement of BKPBLOCK_HAS_DATA.
>>>
>>> Before sending a new version, be sure that this gets fixed by for
>>> example building up a master with a standby replaying WAL, and running
>>> make installcheck-world or similar. If the standby does not complain
>>> at all, you have good chances to not have bugs. You could also build
>>> with WAL_DEBUG to check record consistency.
>
> +1
>
> When I test the WAL or replication related features, I usually run
> "make installcheck" and pgbench against the master at the same time
> after setting up the replication environment.
>
>  typedef struct XLogRecordBlockHeader
>  {
> +	uint8		chunk_id;	/* xlog fragment id */
>  	uint8		id;			/* block reference ID */
>
> Seems this increases the header size of WAL record even if no backup block
> image is included. Right? Isn't it better to add the flag info about backup
> block image into XLogRecordBlockImageHeader rather than
> XLogRecordBlockHeader?
> Originally we borrowed one or two bits from its existing fields to minimize
> the header size, but we can just add a new flag field if we prefer
> the extensibility and readability of the code.
>
> Regards,
>
> --
> Fujii Masao
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Feb 27, 2015 at 12:44 PM, Michael Paquier wrote:
> On Fri, Feb 27, 2015 at 8:01 AM, Michael Paquier wrote:
>> On Fri, Feb 27, 2015 at 6:54 AM, Rahila Syed wrote:
>>>> Even this patch doesn't work fine. The standby emits the following
>>>> error messages.
>>>
>>> Yes, this bug remains unsolved. I am still working on resolving this.
>>>
>>> Following chunk IDs have been added in the attached patch as suggested
>>> upthread.
>>> +#define XLR_CHUNK_BLOCK_REFERENCE	0x10
>>> +#define XLR_CHUNK_BLOCK_HAS_IMAGE	0x04
>>> +#define XLR_CHUNK_BLOCK_HAS_DATA	0x08
>>>
>>> XLR_CHUNK_BLOCK_REFERENCE denotes the chunk ID of block references.
>>> XLR_CHUNK_BLOCK_HAS_IMAGE is a replacement of BKPBLOCK_HAS_IMAGE
>>> and XLR_CHUNK_BLOCK_HAS_DATA a replacement of BKPBLOCK_HAS_DATA.
>>
>> Before sending a new version, be sure that this gets fixed by for
>> example building up a master with a standby replaying WAL, and running
>> make installcheck-world or similar. If the standby does not complain
>> at all, you have good chances to not have bugs. You could also build
>> with WAL_DEBUG to check record consistency.

+1

When I test the WAL or replication related features, I usually run
"make installcheck" and pgbench against the master at the same time
after setting up the replication environment.

 typedef struct XLogRecordBlockHeader
 {
+	uint8		chunk_id;	/* xlog fragment id */
 	uint8		id;			/* block reference ID */

Seems this increases the header size of WAL record even if no backup block
image is included. Right? Isn't it better to add the flag info about backup
block image into XLogRecordBlockImageHeader rather than
XLogRecordBlockHeader?
Originally we borrowed one or two bits from its existing fields to minimize
the header size, but we can just add a new flag field if we prefer
the extensibility and readability of the code.

Regards,
--
Fujii Masao
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Feb 27, 2015 at 8:01 AM, Michael Paquier wrote:
> On Fri, Feb 27, 2015 at 6:54 AM, Rahila Syed wrote:
>>> Even this patch doesn't work fine. The standby emits the following
>>> error messages.
>>
>> Yes, this bug remains unsolved. I am still working on resolving this.
>>
>> Following chunk IDs have been added in the attached patch as suggested
>> upthread.
>> +#define XLR_CHUNK_BLOCK_REFERENCE	0x10
>> +#define XLR_CHUNK_BLOCK_HAS_IMAGE	0x04
>> +#define XLR_CHUNK_BLOCK_HAS_DATA	0x08
>>
>> XLR_CHUNK_BLOCK_REFERENCE denotes the chunk ID of block references.
>> XLR_CHUNK_BLOCK_HAS_IMAGE is a replacement of BKPBLOCK_HAS_IMAGE
>> and XLR_CHUNK_BLOCK_HAS_DATA a replacement of BKPBLOCK_HAS_DATA.
>
> Before sending a new version, be sure that this gets fixed by for
> example building up a master with a standby replaying WAL, and running
> make installcheck-world or similar. If the standby does not complain
> at all, you have good chances to not have bugs. You could also build
> with WAL_DEBUG to check record consistency.

It would be good to get those problems fixed first. Could you send an updated
patch? I'll look into it in more detail. For the time being I am switching
this patch to "Waiting on Author".
--
Michael
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Feb 27, 2015 at 6:54 AM, Rahila Syed wrote:
> Hello,
>
>> Even this patch doesn't work fine. The standby emits the following
>> error messages.
>
> Yes, this bug remains unsolved. I am still working on resolving this.
>
> Following chunk IDs have been added in the attached patch as suggested
> upthread.
> +#define XLR_CHUNK_BLOCK_REFERENCE	0x10
> +#define XLR_CHUNK_BLOCK_HAS_IMAGE	0x04
> +#define XLR_CHUNK_BLOCK_HAS_DATA	0x08
>
> XLR_CHUNK_BLOCK_REFERENCE denotes the chunk ID of block references.
> XLR_CHUNK_BLOCK_HAS_IMAGE is a replacement of BKPBLOCK_HAS_IMAGE
> and XLR_CHUNK_BLOCK_HAS_DATA a replacement of BKPBLOCK_HAS_DATA.

Before sending a new version, be sure that this gets fixed by for
example building up a master with a standby replaying WAL, and running
make installcheck-world or similar. If the standby does not complain
at all, you have good chances to not have bugs. You could also build
with WAL_DEBUG to check record consistency.
--
Michael
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

> Even this patch doesn't work fine. The standby emits the following
> error messages.

Yes, this bug remains unsolved. I am still working on resolving this.

Following chunk IDs have been added in the attached patch as suggested
upthread.
+#define XLR_CHUNK_BLOCK_REFERENCE	0x10
+#define XLR_CHUNK_BLOCK_HAS_IMAGE	0x04
+#define XLR_CHUNK_BLOCK_HAS_DATA	0x08

XLR_CHUNK_BLOCK_REFERENCE denotes the chunk ID of block references.
XLR_CHUNK_BLOCK_HAS_IMAGE is a replacement of BKPBLOCK_HAS_IMAGE
and XLR_CHUNK_BLOCK_HAS_DATA a replacement of BKPBLOCK_HAS_DATA.

Thank you,
Rahila Syed

Support-compression-of-full-page-writes-in-WAL_v21.patch
Description: Binary data
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Feb 24, 2015 at 6:46 PM, Syed, Rahila wrote:
> Hello,
>
>> I've not read this logic yet, but ISTM there is a bug in that new WAL
>> format because I got the following error and the startup process could not
>> replay any WAL records when I set up replication and enabled
>> wal_compression.
>>
>> LOG: record with invalid length at 0/3B0
>> LOG: record with invalid length at 0/3000518
>> LOG: Invalid block length in record 0/30005A0
>> LOG: Invalid block length in record 0/3000D60
>> ...
>
> Please find attached a patch which fixes the WAL replay errors.

Even this patch doesn't work fine. The standby emits the following
error messages.

LOG: invalid block_id 255 at 0/3B0
LOG: record with invalid length at 0/30017F0
LOG: invalid block_id 255 at 0/3001878
LOG: record with invalid length at 0/30027D0
LOG: record with invalid length at 0/3002E58
...

Regards,
--
Fujii Masao
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 2015-02-24 16:03:41 +0900, Michael Paquier wrote:
> Looking at this code, I think that it is really confusing to move the data
> related to the status of the backup block out of XLogRecordBlockImageHeader
> to the chunk ID itself that may *not* include a backup block at all as it
> is conditioned by the presence of BKPBLOCK_HAS_IMAGE.

What's the problem here? We could actually now easily remove
BKPBLOCK_HAS_IMAGE and replace it by a chunk id.

> the idea of having the backup block data in its dedicated header with bits
> stolen from the existing fields, perhaps by rewriting it to something like
> that:
> typedef struct XLogRecordBlockImageHeader {
>     uint32 length:15,
>            hole_length:15,
>            is_compressed:1,
>            is_hole:1;
> } XLogRecordBlockImageHeader;
> Now perhaps I am missing something and this is really "ugly" ;)

I think it's fantastically ugly. We'll also likely want different
compression formats and stuff in the not too far away future. This will
just end up being a pain.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

> I've not read this logic yet, but ISTM there is a bug in that new WAL format
> because I got the following error and the startup process could not replay
> any WAL records when I set up replication and enabled wal_compression.
>
> LOG: record with invalid length at 0/3B0
> LOG: record with invalid length at 0/3000518
> LOG: Invalid block length in record 0/30005A0
> LOG: Invalid block length in record 0/3000D60
> ...

Please find attached a patch which fixes the WAL replay errors.

Thank you,
Rahila Syed

-----Original Message-----
From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Fujii Masao
Sent: Monday, February 23, 2015 5:52 PM
To: Rahila Syed
Cc: PostgreSQL-development; Andres Freund; Michael Paquier
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On Mon, Feb 23, 2015 at 5:28 PM, Rahila Syed wrote:
> Hello,
>
> Attached is a patch which has the following changes,
>
> As suggested above, block ID in the xlog structs has been replaced by
> chunk ID. Chunk ID is used to distinguish between different types of xlog
> record fragments.
> Like,
> XLR_CHUNK_ID_DATA_SHORT
> XLR_CHUNK_ID_DATA_LONG
> XLR_CHUNK_BKP_COMPRESSED
> XLR_CHUNK_BKP_WITH_HOLE
>
> In block references, the block ID follows the chunk ID. Here the block ID
> retains its functionality.
> This approach increases data by 1 byte for each block reference in an
> xlog record. This approach separates the ID referring to different
> fragments of an xlog record from the actual block ID which is used to
> refer to block references in the xlog record.

I've not read this logic yet, but ISTM there is a bug in that new WAL format
because I got the following error and the startup process could not replay
any WAL records when I set up replication and enabled wal_compression.

LOG: record with invalid length at 0/3B0
LOG: record with invalid length at 0/3000518
LOG: Invalid block length in record 0/30005A0
LOG: Invalid block length in record 0/3000D60
...

Regards,
--
Fujii Masao

Support-compression-for-full-page-writes-in-WAL_v20.patch
Description: Support-compression-for-full-page-writes-in-WAL_v20.patch
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Feb 23, 2015 at 9:22 PM, Fujii Masao wrote:
> On Mon, Feb 23, 2015 at 5:28 PM, Rahila Syed wrote:
>> Hello,
>>
>> Attached is a patch which has the following changes,
>>
>> As suggested above, block ID in the xlog structs has been replaced by
>> chunk ID. Chunk ID is used to distinguish between different types of xlog
>> record fragments.
>> Like,
>> XLR_CHUNK_ID_DATA_SHORT
>> XLR_CHUNK_ID_DATA_LONG
>> XLR_CHUNK_BKP_COMPRESSED
>> XLR_CHUNK_BKP_WITH_HOLE
>>
>> In block references, the block ID follows the chunk ID. Here the block ID
>> retains its functionality.
>> This approach increases data by 1 byte for each block reference in an
>> xlog record. This approach separates the ID referring to different
>> fragments of an xlog record from the actual block ID which is used to
>> refer to block references in the xlog record.
>
> I've not read this logic yet, but ISTM there is a bug in that new WAL
> format because I got the following error and the startup process could not
> replay any WAL records when I set up replication and enabled
> wal_compression.
>
> LOG: record with invalid length at 0/3B0
> LOG: record with invalid length at 0/3000518
> LOG: Invalid block length in record 0/30005A0
> LOG: Invalid block length in record 0/3000D60
> ...

Looking at this code, I think that it is really confusing to move the data
related to the status of the backup block out of XLogRecordBlockImageHeader
to the chunk ID itself, which may *not* include a backup block at all, as
that is conditioned by the presence of BKPBLOCK_HAS_IMAGE. I would still
prefer the idea of having the backup block data in its dedicated header with
bits stolen from the existing fields, perhaps by rewriting it to something
like that:

typedef struct XLogRecordBlockImageHeader {
    uint32 length:15,
           hole_length:15,
           is_compressed:1,
           is_hole:1;
} XLogRecordBlockImageHeader;

Now perhaps I am missing something and this is really "ugly" ;)

+#define XLR_CHUNK_ID_DATA_SHORT		255
+#define XLR_CHUNK_ID_DATA_LONG		254
+#define XLR_CHUNK_BKP_COMPRESSED	0x01
+#define XLR_CHUNK_BKP_WITH_HOLE		0x02

Wouldn't we need a XLR_CHUNK_ID_BKP_HEADER or equivalent? The idea behind
this chunk_id stuff is to be able to tell the difference between a short
header, a long header and a backup block header by looking at the first byte.

The comments on top of XLogRecordBlockImageHeader are still mentioning the
old parameters like with_hole or is_compressed that you have removed. It
seems as well that there is some noise:
- lp_off and lp_len fields in ItemIdData (see include/storage/itemid.h).
+ lp_off and lp_len fields in ItemIdData (see include/storage/itemid.h)
--
Michael
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Feb 23, 2015 at 5:28 PM, Rahila Syed wrote:
> Hello,
>
> Attached is a patch which has the following changes,
>
> As suggested above, block ID in the xlog structs has been replaced by
> chunk ID. Chunk ID is used to distinguish between different types of xlog
> record fragments.
> Like,
> XLR_CHUNK_ID_DATA_SHORT
> XLR_CHUNK_ID_DATA_LONG
> XLR_CHUNK_BKP_COMPRESSED
> XLR_CHUNK_BKP_WITH_HOLE
>
> In block references, the block ID follows the chunk ID. Here the block ID
> retains its functionality.
> This approach increases data by 1 byte for each block reference in an xlog
> record. This approach separates the ID referring to different fragments of
> an xlog record from the actual block ID which is used to refer to block
> references in the xlog record.

I've not read this logic yet, but ISTM there is a bug in that new WAL format
because I got the following error and the startup process could not replay
any WAL records when I set up replication and enabled wal_compression.

LOG: record with invalid length at 0/3B0
LOG: record with invalid length at 0/3000518
LOG: Invalid block length in record 0/30005A0
LOG: Invalid block length in record 0/3000D60
...

Regards,
--
Fujii Masao
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

Attached is a patch which has the following changes,

As suggested above, block ID in the xlog structs has been replaced by chunk
ID. Chunk ID is used to distinguish between different types of xlog record
fragments.
Like,
XLR_CHUNK_ID_DATA_SHORT
XLR_CHUNK_ID_DATA_LONG
XLR_CHUNK_BKP_COMPRESSED
XLR_CHUNK_BKP_WITH_HOLE

In block references, the block ID follows the chunk ID. Here the block ID
retains its functionality.
This approach increases data by 1 byte for each block reference in an xlog
record. This approach separates the ID referring to different fragments of an
xlog record from the actual block ID which is used to refer to block
references in the xlog record.

Following are WAL numbers for each scenario,

                      WAL
FPW compression on    121.652 MB
FPW compression off   148.998 MB
HEAD                  148.764 MB

Compression remains nearly the same as before. There is some difference in
WAL between HEAD and HEAD+patch+compression OFF. This difference corresponds
to the 1-byte increase with each block reference of an xlog record.

Thank you,
Rahila Syed

On Wed, Feb 18, 2015 at 7:53 PM, Syed, Rahila wrote:
> Hello,
>
>> I think we should change the xlog format so that the block_id (which
>> currently is XLR_BLOCK_ID_DATA_SHORT/LONG or an actual block id) isn't the
>> block id but something like XLR_CHUNK_ID. Which is used as is for
>> XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to
>> XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED,
>> XLR_CHUNK_BKP_REFERENCE... The BKP blocks will then follow, storing the
>> block id following the chunk id.
>
>> Yes, that'll increase the amount of data for a backup block by 1 byte,
>> but I think that's worth it. I'm pretty sure we will be happy about the
>> added extensibility pretty soon.
>
> To clarify my understanding of the above change,
>
> Instead of a block id to reference different fragments of an xlog record,
> a single byte field "chunk_id" should be used. chunk_id will be the same as
> XLR_BLOCK_ID_DATA_SHORT/LONG for main data fragments.
> But for block references, it will store the following values in order to
> store information about the backup blocks.
> #define XLR_CHUNK_BKP_COMPRESSED 0x01
> #define XLR_CHUNK_BKP_WITH_HOLE 0x02
> ...
>
> The new xlog format should look like follows,
>
> Fixed-size header (XLogRecord struct)
> Chunk_id (add a field before the id field in the XLogRecordBlockHeader struct)
> XLogRecordBlockHeader
> Chunk_id
> XLogRecordBlockHeader
> ...
> ...
> Chunk_id (rename the id field of the XLogRecordDataHeader struct)
> XLogRecordDataHeader[Short|Long]
> block data
> block data
> ...
> main data
>
> I will post a patch based on this.
>
> Thank you,
> Rahila Syed
>
> -----Original Message-----
> From: Andres Freund [mailto:and...@2ndquadrant.com]
> Sent: Monday, February 16, 2015 5:26 PM
> To: Syed, Rahila
> Cc: Michael Paquier; Fujii Masao; PostgreSQL mailing lists
> Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
>
> On 2015-02-16 11:30:20 +, Syed, Rahila wrote:
>> - * As a trivial form of data compression, the XLOG code is aware that
>> - * PG data pages usually contain an unused "hole" in the middle, which
>> - * contains only zero bytes. If hole_length > 0 then we have removed
>> - * such a "hole" from the stored data (and it's not counted in the
>> - * XLOG record's CRC, either). Hence, the amount of block data actually
>> - * present is BLCKSZ - hole_length bytes.
>> + * Block images are able to do several types of compression:
>> + * - When wal_compression is off, as a trivial form of compression, the
>> + * XLOG code is aware that PG data pages usually contain an unused "hole"
>> + * in the middle, which contains only zero bytes. If length < BLCKSZ
>> + * then we have removed such a "hole" from the stored data (and it is
>> + * not counted in the XLOG record's CRC, either). Hence, the amount
>> + * of block data actually present is "length" bytes. The hole "offset"
>> + * on page is defined using "hole_offset".
>> + * - When wal_compression is on, block images are compressed using a
>> + * compression algorithm without their hole to improve compression
>> + * process of the page. "length" corresponds in this case to the length
>> + * of the compressed block. "hole_offset" is the hole offset of the page,
>> + * and the length of the uncompressed block is defined by "raw_length",
>> + * whose data is included in the record only when compression is enabled
>> + * and "with_hole" is set to true, see below.
>> + *
>> + * "is_compressed" is used to identify if a given block image is
>> + * compressed or not.
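To make the decoding side of this chunk ID format concrete, here is a hedged
sketch of the dispatch it implies. The constants and their values are taken
from the patch as quoted in the review upthread; the function itself is
illustrative and also shows the ambiguity that review points at (there is no
explicit backup-block marker):

#include <stdint.h>

/* Chunk IDs as listed in this patch version; values from the review. */
#define XLR_CHUNK_ID_DATA_SHORT		255
#define XLR_CHUNK_ID_DATA_LONG		254
#define XLR_CHUNK_BKP_COMPRESSED	0x01
#define XLR_CHUNK_BKP_WITH_HOLE		0x02

/*
 * Sketch of the reader-side dispatch: the first byte of each record
 * fragment is a chunk ID; anything that is not a main-data marker is
 * treated as a block reference whose low bits describe the image,
 * and the block ID proper follows the chunk ID.
 */
static const char *
classify_chunk(uint8_t chunk_id)
{
	if (chunk_id == XLR_CHUNK_ID_DATA_SHORT)
		return "main data (short header)";
	if (chunk_id == XLR_CHUNK_ID_DATA_LONG)
		return "main data (long header)";
	if (chunk_id & XLR_CHUNK_BKP_COMPRESSED)
		return "block reference: compressed image";
	if (chunk_id & XLR_CHUNK_BKP_WITH_HOLE)
		return "block reference: image with hole";
	return "block reference: plain";
}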
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Feb 16, 2015 at 8:55 PM, Andres Freund wrote: > On 2015-02-16 11:30:20 +, Syed, Rahila wrote: >> - * As a trivial form of data compression, the XLOG code is aware that >> - * PG data pages usually contain an unused "hole" in the middle, which >> - * contains only zero bytes. If hole_length > 0 then we have removed >> - * such a "hole" from the stored data (and it's not counted in the >> - * XLOG record's CRC, either). Hence, the amount of block data actually >> - * present is BLCKSZ - hole_length bytes. >> + * Block images are able to do several types of compression: >> + * - When wal_compression is off, as a trivial form of compression, the >> + * XLOG code is aware that PG data pages usually contain an unused "hole" >> + * in the middle, which contains only zero bytes. If length < BLCKSZ >> + * then we have removed such a "hole" from the stored data (and it is >> + * not counted in the XLOG record's CRC, either). Hence, the amount >> + * of block data actually present is "length" bytes. The hole "offset" >> + * on page is defined using "hole_offset". >> + * - When wal_compression is on, block images are compressed using a >> + * compression algorithm without their hole to improve compression >> + * process of the page. "length" corresponds in this case to the length >> + * of the compressed block. "hole_offset" is the hole offset of the page, >> + * and the length of the uncompressed block is defined by "raw_length", >> + * whose data is included in the record only when compression is enabled >> + * and "with_hole" is set to true, see below. >> + * >> + * "is_compressed" is used to identify if a given block image is compressed >> + * or not. Maximum page size allowed on the system being 32k, the hole >> + * offset cannot be more than 15-bit long so the last free bit is used to >> + * store the compression state of block image. If the maximum page size >> + * allowed is increased to a value higher than that, we should consider >> + * increasing this structure size as well, but this would increase the >> + * length of block header in WAL records with alignment. >> + * >> + * "with_hole" is used to identify the presence of a hole in a block image. >> + * As the length of a block cannot be more than 15-bit long, the extra bit >> in >> + * the length field is used for this identification purpose. If the block >> image >> + * has no hole, it is ensured that the raw size of a compressed block image >> is >> + * equal to BLCKSZ, hence the contents of >> XLogRecordBlockImageCompressionInfo >> + * are not necessary. >> */ >> typedef struct XLogRecordBlockImageHeader >> { >> - uint16 hole_offset;/* number of bytes before "hole" */ >> - uint16 hole_length;/* number of bytes in "hole" */ >> + uint16 length:15, /* length of block data in >> record */ >> + with_hole:1;/* status of hole in the block >> */ >> + >> + uint16 hole_offset:15, /* number of bytes before "hole" */ >> + is_compressed:1;/* compression status of image */ >> + >> + /* Followed by the data related to compression if block is compressed >> */ >> } XLogRecordBlockImageHeader; > > Yikes, this is ugly. > > I think we should change the xlog format so that the block_id (which > currently is XLR_BLOCK_ID_DATA_SHORT/LONG or a actual block id) isn't > the block id but something like XLR_CHUNK_ID. Which is used as is for > XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to to > XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED, > XLR_CHUNK_BKP_REFERENCE... 
> The BKP blocks will then follow, storing the block id following the chunk id. Yes, that'll increase the amount of data for a backup block by 1 byte, but I think that's worth it. I'm pretty sure we will be happy about the added extensibility pretty soon.

Yeah, that would help for readability and does not cost much compared to BLCKSZ. Still, could you explain what kind of extensibility you have in mind besides code readability? It is hard to make a nice picture with only paper and pencils, and the current patch approach has been taken to minimize the record length, particularly for users who do not care about WAL compression.
-- Michael
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

>I think we should change the xlog format so that the block_id (which currently is XLR_BLOCK_ID_DATA_SHORT/LONG or an actual block id) isn't the block id but something like XLR_CHUNK_ID. Which is used as is for XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED, XLR_CHUNK_BKP_REFERENCE... The BKP blocks will then follow, storing the block id following the chunk id. Yes, that'll increase the amount of data for a backup block by 1 byte, but I think that's worth it. I'm pretty sure we will be happy about the added extensibility pretty soon.

To clarify my understanding of the above change: instead of a block id to reference different fragments of an xlog record, a single-byte field "chunk_id" should be used. chunk_id will be the same as XLR_BLOCK_ID_DATA_SHORT/LONG for main data fragments. But for block references, it will store the following values in order to carry information about the backup blocks.

#define XLR_CHUNK_BKP_COMPRESSED 0x01
#define XLR_CHUNK_BKP_WITH_HOLE 0x02
...

The new xlog format will look as follows:

Fixed-size header (XLogRecord struct)
Chunk_id (add a field before the id field in the XLogRecordBlockHeader struct)
XLogRecordBlockHeader
Chunk_id
XLogRecordBlockHeader
...
...
Chunk_id (rename the id field of the XLogRecordDataHeader struct)
XLogRecordDataHeader[Short|Long]
block data
block data
...
main data

I will post a patch based on this.

Thank you,
Rahila Syed

-Original Message-
From: Andres Freund [mailto:and...@2ndquadrant.com]
Sent: Monday, February 16, 2015 5:26 PM
To: Syed, Rahila
Cc: Michael Paquier; Fujii Masao; PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On 2015-02-16 11:30:20 +0000, Syed, Rahila wrote:
> - * As a trivial form of data compression, the XLOG code is aware that
> - * PG data pages usually contain an unused "hole" in the middle, which
> - * contains only zero bytes. If hole_length > 0 then we have removed
> - * such a "hole" from the stored data (and it's not counted in the
> - * XLOG record's CRC, either). Hence, the amount of block data actually
> - * present is BLCKSZ - hole_length bytes.
> + * Block images are able to do several types of compression:
> + * - When wal_compression is off, as a trivial form of compression, the
> + * XLOG code is aware that PG data pages usually contain an unused "hole"
> + * in the middle, which contains only zero bytes. If length < BLCKSZ
> + * then we have removed such a "hole" from the stored data (and it is
> + * not counted in the XLOG record's CRC, either). Hence, the amount
> + * of block data actually present is "length" bytes. The hole "offset"
> + * on page is defined using "hole_offset".
> + * - When wal_compression is on, block images are compressed using a
> + * compression algorithm without their hole to improve compression
> + * process of the page. "length" corresponds in this case to the length
> + * of the compressed block. "hole_offset" is the hole offset of the page,
> + * and the length of the uncompressed block is defined by "raw_length",
> + * whose data is included in the record only when compression is enabled
> + * and "with_hole" is set to true, see below.
> + *
> + * "is_compressed" is used to identify if a given block image is compressed
> + * or not. Maximum page size allowed on the system being 32k, the hole
> + * offset cannot be more than 15-bit long so the last free bit is used to
> + * store the compression state of block image. If the maximum page size
> + * allowed is increased to a value higher than that, we should consider
> + * increasing this structure size as well, but this would increase the
> + * length of block header in WAL records with alignment.
> + *
> + * "with_hole" is used to identify the presence of a hole in a block image.
> + * As the length of a block cannot be more than 15-bit long, the extra bit in
> + * the length field is used for this identification purpose. If the block image
> + * has no hole, it is ensured that the raw size of a compressed block image is
> + * equal to BLCKSZ, hence the contents of XLogRecordBlockImageCompressionInfo
> + * are not necessary.
> */
> typedef struct XLogRecordBlockImageHeader
> {
> - uint16 hole_offset;/* number of bytes before "hole" */
> - uint16 hole_length;/* number of bytes in "hole" */
> + uint16 length:15, /* length of block data in record */
> + with_hole:1;/* status of hole in the block */
> +
> + uint16 hole_offset:15, /* number of bytes before "hole" */
> + is_compressed:1;/* compression status of image */
> +
> + /* Followed by the data related to compression if block is compressed */
> } XLogRecordBlockImageHeader;
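For illustration only, here is a minimal sketch of how a decoder could dispatch on the proposed chunk_id. The XLR_CHUNK_BKP_* values are the ones sketched above; the XLR_CHUNK_ID_DATA_* values and the decode_chunk() helper are assumptions for the sake of the example, not a committed WAL format.

    /* The two flag values come from the proposal above; the rest is assumed. */
    #define XLR_CHUNK_BKP_COMPRESSED  0x01
    #define XLR_CHUNK_BKP_WITH_HOLE   0x02
    #define XLR_CHUNK_ID_DATA_SHORT   0xA0   /* assumed value */
    #define XLR_CHUNK_ID_DATA_LONG    0xA1   /* assumed value */

    static void
    decode_chunk(const unsigned char *ptr)
    {
        unsigned char chunk_id = *ptr++;

        if (chunk_id == XLR_CHUNK_ID_DATA_SHORT ||
            chunk_id == XLR_CHUNK_ID_DATA_LONG)
        {
            /* main data fragment: XLogRecordDataHeader[Short|Long] follows */
        }
        else
        {
            /* backup block: the block id follows the chunk id */
            unsigned char block_id = *ptr++;
            int has_hole = (chunk_id & XLR_CHUNK_BKP_WITH_HOLE) != 0;
            int compressed = (chunk_id & XLR_CHUNK_BKP_COMPRESSED) != 0;

            (void) block_id;
            (void) has_hole;
            (void) compressed;
            /* ... then decode the XLogRecordBlockHeader for this block ... */
        }
    }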
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 2015-02-16 20:55:20 +0900, Michael Paquier wrote:
> On Mon, Feb 16, 2015 at 8:30 PM, Syed, Rahila wrote:
> > > Regarding the sanity checks that have been added recently. I think that they are useful but I am suspecting as well that only a check on the record CRC is done because that's reliable enough and not doing those checks accelerates replay a bit. So I am thinking that we should simply replace them by assertions.
> >
> > Removing the checks makes sense as CRC ensures correctness. Moreover, as an error message for invalid record length is present in the code, messages for invalid block length can be redundant.
> >
> > Checks have been replaced by assertions in the attached patch.
>
> After more thinking, we may as well simply remove them, an error with the CRC having high chances to complain before reaching this point...

Surely not. The existing code explicitly does it like

if (blk->has_data && blk->data_len == 0)
    report_invalid_record(state,
              "BKPBLOCK_HAS_DATA set, but no data included at %X/%X",
              (uint32) (state->ReadRecPtr >> 32), (uint32) state->ReadRecPtr);

these cross checks are important. And I see no reason to deviate from that. The CRC sum isn't foolproof - we intentionally do checks at several layers. And, as you can see from some other locations, we actually try to *not* fatally error out when hitting them at times - so an Assert also is wrong.

Heikki:

/* cross-check that the HAS_DATA flag is set iff data_length > 0 */
if (blk->has_data && blk->data_len == 0)
    report_invalid_record(state,
              "BKPBLOCK_HAS_DATA set, but no data included at %X/%X",
              (uint32) (state->ReadRecPtr >> 32), (uint32) state->ReadRecPtr);
if (!blk->has_data && blk->data_len != 0)
    report_invalid_record(state,
              "BKPBLOCK_HAS_DATA not set, but data length is %u at %X/%X",
              (unsigned int) blk->data_len,
              (uint32) (state->ReadRecPtr >> 32), (uint32) state->ReadRecPtr);

those look like they're missing a goto err; to me.

Greetings,
Andres Freund
-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
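Spelled out, the pattern Andres is pointing at keeps the cross-checks as reported errors rather than assertions and leaves through the error path, along these lines (a sketch against xlogreader.c; the err label is assumed from its existing structure):

    /* cross-check that the HAS_DATA flag is set iff data_length > 0 */
    if (blk->has_data && blk->data_len == 0)
    {
        report_invalid_record(state,
                   "BKPBLOCK_HAS_DATA set, but no data included at %X/%X",
                   (uint32) (state->ReadRecPtr >> 32),
                   (uint32) state->ReadRecPtr);
        goto err;               /* do not keep decoding a corrupt record */
    }
    if (!blk->has_data && blk->data_len != 0)
    {
        report_invalid_record(state,
                   "BKPBLOCK_HAS_DATA not set, but data length is %u at %X/%X",
                   (unsigned int) blk->data_len,
                   (uint32) (state->ReadRecPtr >> 32),
                   (uint32) state->ReadRecPtr);
        goto err;
    }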
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 2015-02-16 11:30:20 +0000, Syed, Rahila wrote:
> - * As a trivial form of data compression, the XLOG code is aware that
> - * PG data pages usually contain an unused "hole" in the middle, which
> - * contains only zero bytes. If hole_length > 0 then we have removed
> - * such a "hole" from the stored data (and it's not counted in the
> - * XLOG record's CRC, either). Hence, the amount of block data actually
> - * present is BLCKSZ - hole_length bytes.
> + * Block images are able to do several types of compression:
> + * - When wal_compression is off, as a trivial form of compression, the
> + * XLOG code is aware that PG data pages usually contain an unused "hole"
> + * in the middle, which contains only zero bytes. If length < BLCKSZ
> + * then we have removed such a "hole" from the stored data (and it is
> + * not counted in the XLOG record's CRC, either). Hence, the amount
> + * of block data actually present is "length" bytes. The hole "offset"
> + * on page is defined using "hole_offset".
> + * - When wal_compression is on, block images are compressed using a
> + * compression algorithm without their hole to improve compression
> + * process of the page. "length" corresponds in this case to the length
> + * of the compressed block. "hole_offset" is the hole offset of the page,
> + * and the length of the uncompressed block is defined by "raw_length",
> + * whose data is included in the record only when compression is enabled
> + * and "with_hole" is set to true, see below.
> + *
> + * "is_compressed" is used to identify if a given block image is compressed
> + * or not. Maximum page size allowed on the system being 32k, the hole
> + * offset cannot be more than 15-bit long so the last free bit is used to
> + * store the compression state of block image. If the maximum page size
> + * allowed is increased to a value higher than that, we should consider
> + * increasing this structure size as well, but this would increase the
> + * length of block header in WAL records with alignment.
> + *
> + * "with_hole" is used to identify the presence of a hole in a block image.
> + * As the length of a block cannot be more than 15-bit long, the extra bit in
> + * the length field is used for this identification purpose. If the block image
> + * has no hole, it is ensured that the raw size of a compressed block image is
> + * equal to BLCKSZ, hence the contents of XLogRecordBlockImageCompressionInfo
> + * are not necessary.
> */
> typedef struct XLogRecordBlockImageHeader
> {
> - uint16 hole_offset;/* number of bytes before "hole" */
> - uint16 hole_length;/* number of bytes in "hole" */
> + uint16 length:15, /* length of block data in record */
> + with_hole:1;/* status of hole in the block */
> +
> + uint16 hole_offset:15, /* number of bytes before "hole" */
> + is_compressed:1;/* compression status of image */
> +
> + /* Followed by the data related to compression if block is compressed */
> } XLogRecordBlockImageHeader;

Yikes, this is ugly.

I think we should change the xlog format so that the block_id (which currently is XLR_BLOCK_ID_DATA_SHORT/LONG or an actual block id) isn't the block id but something like XLR_CHUNK_ID. Which is used as is for XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED, XLR_CHUNK_BKP_REFERENCE... The BKP blocks will then follow, storing the block id following the chunk id. Yes, that'll increase the amount of data for a backup block by 1 byte, but I think that's worth it.
I'm pretty sure we will be happy about the added extensibility pretty soon. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Feb 16, 2015 at 8:30 PM, Syed, Rahila wrote:
> > Regarding the sanity checks that have been added recently. I think that they are useful but I am suspecting as well that only a check on the record CRC is done because that's reliable enough and not doing those checks accelerates replay a bit. So I am thinking that we should simply replace them by assertions.
>
> Removing the checks makes sense as CRC ensures correctness. Moreover, as an error message for invalid record length is present in the code, messages for invalid block length can be redundant.
>
> Checks have been replaced by assertions in the attached patch.

After more thinking, we may as well simply remove them, an error with the CRC having high chances to complain before reaching this point...

> Current
> if ((hole_length != 0) &&
> + (*len >= orig_len - SizeOfXLogRecordBlockImageCompressionInfo))
> + return false;
> + return true

This makes sense.

Nitpicking 1:
+Assert(!(blk->with_hole == 1 && blk->hole_offset <= 0));
Double-space here.

Nitpicking 2:
char * page
This should be rewritten as char *page, the "*" being attached to the variable name.
-- Michael
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

Thank you for reviewing and testing the patch.

>+ /* leave if data cannot be compressed */
>+ if (compressed_len == 0)
>+ return false;
>This should be < 0, pglz_compress returns -1 when compression fails.
>
>+ if (pglz_decompress(block_image, bkpb->bkp_len, record->uncompressBuf,
>+ bkpb->bkp_uncompress_len) == 0)
>Similarly, this should be < 0.

These have been corrected in the attached.

>Regarding the sanity checks that have been added recently. I think that they are useful but I am suspecting as well that only a check on the record CRC is done because that's reliable enough and not doing those checks accelerates replay a bit. So I am thinking that we should simply replace them by assertions.

Removing the checks makes sense as CRC ensures correctness. Moreover, as an error message for invalid record length is present in the code, messages for invalid block length can be redundant.

Checks have been replaced by assertions in the attached patch.

The following if condition in XLogCompressBackupBlock has been modified as follows:

Previous
+ /*
+ * We recheck the actual size even if pglz_compress() reports success and
+ * see if at least 2 bytes of length have been saved, as this corresponds
+ * to the additional amount of data stored in WAL record for a compressed
+ * block via raw_length when block contains hole.
+ */
+ *len = (uint16) compressed_len;
+ if (*len >= orig_len - SizeOfXLogRecordBlockImageCompressionInfo)
+ return false;
+ return true;

Current
if ((hole_length != 0) &&
+ (*len >= orig_len - SizeOfXLogRecordBlockImageCompressionInfo))
+ return false;
+ return true

This is because the extra information raw_length is included only if the compressed block has a hole in it.

>Once we get those small issues fixed, I think that it is worth having a committer look at this patch, presumably Fujii-san

Agreed. I will mark this patch as ready for committer.

Thank you,
Rahila Syed

__ Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding.

Support-compression-full-page-writes-in-WAL_v25.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
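In other words (a sketch using the patch's variable names, with the surrounding function omitted): the two raw_length bytes are only paid for when the page has a hole, so the savings threshold applies only in that case; without a hole the uncompressed length is implicitly BLCKSZ and no extra bytes are stored.

    *len = (uint16) compressed_len;
    if ((hole_length != 0) &&
        (*len >= orig_len - SizeOfXLogRecordBlockImageCompressionInfo))
        return false;       /* savings would be eaten by storing raw_length */
    return true;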
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Feb 12, 2015 at 8:08 PM, Syed, Rahila wrote:
> Thank you for the comments. Please find attached the updated patch.
>
> >This patch fails to compile:
> >xlogreader.c:1049:46: error: extraneous ')' after condition, expected a statement
> >blk->with_hole && blk->hole_offset <= 0))
> This has been rectified.
>
> >Note as well that at least clang does not like much how the sanity checks with with_hole are done. You should place parentheses around the '&&' expressions. Also, I would rather define with_hole == 0 or with_hole == 1 explicitly in those checks
> The expressions are modified accordingly.
>
> >There is a typo:
> >s/true,see/true, see/
> >[nitpicky]Be as well aware of the 80-character limit per line that is usually respected by comment blocks.[/]
> Have corrected the typos and changed the comments as mentioned. Also, realigned certain lines to meet the 80-char limit.

Thanks for the updated patch.

+ /* leave if data cannot be compressed */
+ if (compressed_len == 0)
+ return false;
This should be < 0, pglz_compress returns -1 when compression fails.

+ if (pglz_decompress(block_image, bkpb->bkp_len, record->uncompressBuf,
+ bkpb->bkp_uncompress_len) == 0)
Similarly, this should be < 0.

Regarding the sanity checks that have been added recently. I think that they are useful but I am suspecting as well that only a check on the record CRC is done because that's reliable enough and not doing those checks accelerates replay a bit. So I am thinking that we should simply replace them by assertions.

I have as well re-run my small test case, with the following results (scripts and results attached):

=# select test, user_diff, system_diff, pg_size_pretty(pre_update - pre_insert),
   pg_size_pretty(post_update - pre_update) from results;
  test   | user_diff | system_diff | pg_size_pretty | pg_size_pretty
---------+-----------+-------------+----------------+----------------
 FPW on  | 46.134564 |    0.823306 | 429 MB         | 566 MB
 FPW on  | 16.307575 |    0.798591 | 171 MB         | 229 MB
 FPW on  |  8.325136 |    0.848390 | 86 MB          | 116 MB
 FPW off | 29.992383 |    1.100458 | 440 MB         | 746 MB
 FPW off | 12.237578 |    1.027076 | 171 MB         | 293 MB
 FPW off |  6.814926 |    0.931624 | 86 MB          | 148 MB
 HEAD    | 26.590816 |    1.159255 | 440 MB         | 746 MB
 HEAD    | 11.620359 |    0.990851 | 171 MB         | 293 MB
 HEAD    |  6.300401 |    0.904311 | 86 MB          | 148 MB
(9 rows)

The level of compression reached is the same as the previous mark, 566MB for the case of fillfactor=50 (cab7npqsc97o-ue5paxfmukwcxe_jioyxo1m4a0pmnmyqane...@mail.gmail.com), with similar CPU usage. Once we get those small issues fixed, I think that it is worth having a committer look at this patch, presumably Fujii-san.

Regards,
-- Michael

compress_run.bash Description: Binary data results.sql Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
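Condensed, the corrected error handling looks like this (a sketch assuming the revised pglz API that returns -1 on failure; source, dest and orig_len are illustrative names):

    /* compression: a negative result means the data could not be compressed */
    compressed_len = pglz_compress(source, orig_len, dest,
                                   PGLZ_strategy_default);
    if (compressed_len < 0)
        return false;               /* fall back to the uncompressed image */

    /* decompression at replay: a negative result means a corrupt image */
    if (pglz_decompress(block_image, bkpb->bkp_len,
                        record->uncompressBuf,
                        bkpb->bkp_uncompress_len) < 0)
        return false;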
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Thank you for the comments. Please find attached the updated patch.

>This patch fails to compile:
>xlogreader.c:1049:46: error: extraneous ')' after condition, expected a statement
>blk->with_hole && blk->hole_offset <= 0))
This has been rectified.

>Note as well that at least clang does not like much how the sanity checks with with_hole are done. You should place parentheses around the '&&' expressions. Also, I would rather define with_hole == 0 or with_hole == 1 explicitly in those checks
The expressions are modified accordingly.

>There is a typo:
>s/true,see/true, see/
>[nitpicky]Be as well aware of the 80-character limit per line that is usually respected by comment blocks.[/]
Have corrected the typos and changed the comments as mentioned. Also, realigned certain lines to meet the 80-char limit.

Thank you,
Rahila Syed

__ Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding.

Support-compression-for-full-page-writes-in-WAL_v18.patch Description: Support-compression-for-full-page-writes-in-WAL_v18.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Feb 11, 2015 at 11:03 PM, Syed, Rahila wrote:
> >IMO, we should add details about how this new field is used in the comments on top of XLogRecordBlockImageHeader, meaning that when a page hole is present we use the compression info structure and when there is no hole, we are sure that the FPW raw length is BLCKSZ, meaning that the two bytes of the CompressionInfo stuff is unnecessary.
> This comment is included in the patch attached.
>
> > For correctness, with_hole should be set even for uncompressed pages. I think that we should as well use it for sanity checks in xlogreader.c when decoding records.
> This change is made in the attached patch. The following sanity checks have been added in xlogreader.c
>
> if (!(blk->with_hole) && blk->hole_offset != 0 || blk->with_hole && blk->hole_offset <= 0))
>
> if (blk->with_hole && blk->bkp_len >= BLCKSZ)
>
> if (!(blk->with_hole) && blk->bkp_len != BLCKSZ)

Cool, thanks!

This patch fails to compile:
xlogreader.c:1049:46: error: extraneous ')' after condition, expected a statement
blk->with_hole && blk->hole_offset <= 0))

Note as well that at least clang does not like much how the sanity checks with with_hole are done. You should place parentheses around the '&&' expressions. Also, I would rather define with_hole == 0 or with_hole == 1 explicitly in those checks.

There is a typo:
s/true,see/true, see/

[nitpicky]Be as well aware of the 80-character limit per line that is usually respected by comment blocks.[/]

+ * "with_hole" is used to identify the presence of hole in a block.
+ * As mentioned above, length of block cannnot be more than 15-bit long.
+ * So, the free bit in the length field is used by "with_hole" to identify presence of
+ * XLogRecordBlockImageCompressionInfo. If hole is not present ,the raw size of
+ * a compressed block is equal to BLCKSZ therefore XLogRecordBlockImageCompressionInfo
+ * for the corresponding compressed block need not be stored in header.
+ * If hole is present raw size is stored.

I would rewrite this paragraph as follows, fixing the multiple typos:
"with_hole" is used to identify the presence of a hole in a block image. As the length of a block cannot be more than 15-bit long, the extra bit in the length field is used for this identification purpose. If the block image has no hole, it is ensured that the raw size of a compressed block image is equal to BLCKSZ, hence the contents of XLogRecordBlockImageCompressionInfo are not necessary.

+ /* Followed by the data related to compression if block is compressed */
This comment needs to be updated to "if block image is compressed and has a hole".

+ lp_off and lp_len fields in ItemIdData (see include/storage/itemid.h) and
+ XLogRecordBlockImageHeader where page hole offset and length is limited to 15-bit
+ length (see src/include/access/xlogrecord.h).
80-character limit...

Regards
-- Michael
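Applying both remarks, the three checks quoted above would read roughly as follows (a sketch with explicit comparisons and balanced parentheses; goto err stands in for whatever error handling the patch settles on):

    if ((blk->with_hole == 0 && blk->hole_offset != 0) ||
        (blk->with_hole == 1 && blk->hole_offset <= 0))
        goto err;               /* hole flag and hole offset disagree */

    if (blk->with_hole == 1 && blk->bkp_len >= BLCKSZ)
        goto err;               /* an image with a hole must be shorter */

    if (blk->with_hole == 0 && blk->bkp_len != BLCKSZ)
        goto err;               /* without a hole the image is a full page */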
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
>IMO, we should add details about how this new field is used in the comments on top of XLogRecordBlockImageHeader, meaning that when a page hole is present we use the compression info structure and when there is no hole, we are sure that the FPW raw length is BLCKSZ, meaning that the two bytes of the CompressionInfo stuff is unnecessary.

This comment is included in the patch attached.

> For correctness, with_hole should be set even for uncompressed pages. I think that we should as well use it for sanity checks in xlogreader.c when decoding records.

This change is made in the attached patch. The following sanity checks have been added in xlogreader.c

if (!(blk->with_hole) && blk->hole_offset != 0 || blk->with_hole && blk->hole_offset <= 0))

if (blk->with_hole && blk->bkp_len >= BLCKSZ)

if (!(blk->with_hole) && blk->bkp_len != BLCKSZ)

Thank you,
Rahila Syed

__ Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding.

Support-compression-for-full-page-writes-in-WAL_v18.patch Description: Support-compression-for-full-page-writes-in-WAL_v18.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

A bug had been introduced in the latest versions of the patch. The order of parameters passed to pglz_decompress was wrong. Please find attached a patch with the following correction:

Original code,
+ if (pglz_decompress(block_image, record->uncompressBuf,
+ bkpb->bkp_len, bkpb->bkp_uncompress_len) == 0)

Correction
+ if (pglz_decompress(block_image, bkpb->bkp_len,
+ record->uncompressBuf, bkpb->bkp_uncompress_len) == 0)

>For example here you should not have a space between the function definition and the variable declarations:
>+{
>+
>+int orig_len = BLCKSZ - hole_length;
>This is as well incorrect in two places:
>if(hole_length != 0)
>There should be a space between the if and its condition in parentheses.

Also corrected the above code formatting mistakes.

Thank you,
Rahila Syed

-Original Message-
From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Syed, Rahila
Sent: Monday, February 09, 2015 6:58 PM
To: Michael Paquier; Fujii Masao
Cc: PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

Hello,

>> Do we always need extra two bytes for compressed backup block?
>> ISTM that extra bytes are not necessary when the hole length is zero.
>> In this case the length of the original backup block (i.e., uncompressed) must be BLCKSZ, so we don't need to save the original size in the extra bytes.

>Yes, we would need an additional bit to identify that. We could steal it from length in XLogRecordBlockImageHeader.

This is implemented in the attached patch by dividing the length field as follows,
uint16 length:15,
with_hole:1;

>"2" should be replaced with the macro variable indicating the size of extra header for compressed backup block.
Macro SizeOfXLogRecordBlockImageCompressionInfo is used instead of 2

>You can refactor XLogCompressBackupBlock() and move all the above code to it for more simplicity
This is also implemented in the patch attached.

Thank you,
Rahila Syed

-Original Message-
From: Michael Paquier [mailto:michael.paqu...@gmail.com]
Sent: Friday, February 06, 2015 6:00 PM
To: Fujii Masao
Cc: Syed, Rahila; PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On Fri, Feb 6, 2015 at 3:03 PM, Fujii Masao wrote:
> Do we always need extra two bytes for compressed backup block?
> ISTM that extra bytes are not necessary when the hole length is zero.
> In this case the length of the original backup block (i.e., uncompressed) must be BLCKSZ, so we don't need to save the original size in the extra bytes.

Yes, we would need an additional bit to identify that. We could steal it from length in XLogRecordBlockImageHeader.

> Furthermore, when fpw compression is disabled and the hole length is zero, we seem to be able to save one byte from the header of backup block. Currently we use 4 bytes for the header, 2 bytes for the length of backup block, 15 bits for the hole offset and 1 bit for the flag indicating whether block is compressed or not. But in that case, the length of backup block doesn't need to be stored because it must be BLCKSZ. Shouldn't we optimize the header in this way? Thought?

If we do it, that's something to tackle even before this patch on HEAD, because you could use the 16th bit of the first 2 bytes of XLogRecordBlockImageHeader to do necessary sanity checks, to actually not reduce record by 1 byte, but 2 bytes as hole-related data is not necessary. I imagine that a patch optimizing that wouldn't be that hard to write as well.
> +int page_len = BLCKSZ - hole_length;
> +char *scratch_buf;
> +if (hole_length != 0)
> +{
> +scratch_buf = compression_scratch;
> +memcpy(scratch_buf, page, hole_offset);
> +memcpy(scratch_buf + hole_offset,
> + page + (hole_offset + hole_length),
> + BLCKSZ - (hole_length + hole_offset));
> +}
> +else
> +scratch_buf = page;
> +
> +/* Perform compression of block */
> +if (XLogCompressBackupBlock(scratch_buf,
> +page_len,
> +regbuf->compressed_page,
> +&compress_len))
> +{
> +/* compression is done, add record */
> +is_compressed = true;
> +}
>
> You can refactor XLogCompressBackupBlock() and move all the above code to it for more simplicity.

Sure.
-- Michael
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Feb 9, 2015 at 10:27 PM, Syed, Rahila wrote:
> (snip)

Thanks for showing up here! I have not tested the patch; those comments are based on what I read from v17.

>>> Do we always need extra two bytes for compressed backup block?
>>> ISTM that extra bytes are not necessary when the hole length is zero.
>>> In this case the length of the original backup block (i.e., uncompressed) must be BLCKSZ, so we don't need to save the original size in the extra bytes.
>
>>Yes, we would need an additional bit to identify that. We could steal it from length in XLogRecordBlockImageHeader.
>
> This is implemented in the attached patch by dividing the length field as follows,
> uint16 length:15,
> with_hole:1;

IMO, we should add details about how this new field is used in the comments on top of XLogRecordBlockImageHeader, meaning that when a page hole is present we use the compression info structure and when there is no hole, we are sure that the FPW raw length is BLCKSZ, meaning that the two bytes of the CompressionInfo stuff is unnecessary.

>>"2" should be replaced with the macro variable indicating the size of extra header for compressed backup block.
> Macro SizeOfXLogRecordBlockImageCompressionInfo is used instead of 2
>
>>You can refactor XLogCompressBackupBlock() and move all the above code to it for more simplicity
> This is also implemented in the patch attached.

This portion looks correct to me. A couple of other comments:

1) Nitpicky, but code format is sometimes strange. For example here you should not have a space between the function definition and the variable declarations:
+{
+
+int orig_len = BLCKSZ - hole_length;
This is as well incorrect in two places:
if(hole_length != 0)
There should be a space between the if and its condition in parentheses.

2) For correctness, with_hole should be set even for uncompressed pages. I think that we should as well use it for sanity checks in xlogreader.c when decoding records.

Regards,
-- Michael
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
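The comment Michael asks for in point 1 encodes this reader-side rule, sketched here with the patch's field names (cinfo and ptr are illustrative):

    if (blk->is_compressed)
    {
        if (blk->with_hole)
        {
            /* a hole exists: raw_length is stored explicitly in the
             * XLogRecordBlockImageCompressionInfo following the header */
            memcpy(&cinfo, ptr, SizeOfXLogRecordBlockImageCompressionInfo);
            ptr += SizeOfXLogRecordBlockImageCompressionInfo;
            blk->bkp_uncompress_len = cinfo.raw_length;
        }
        else
        {
            /* no hole: the raw length of the FPW is known to be BLCKSZ,
             * so the two extra bytes are not stored at all */
            blk->bkp_uncompress_len = BLCKSZ;
        }
    }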
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

>> Do we always need extra two bytes for compressed backup block?
>> ISTM that extra bytes are not necessary when the hole length is zero.
>> In this case the length of the original backup block (i.e., uncompressed) must be BLCKSZ, so we don't need to save the original size in the extra bytes.

>Yes, we would need an additional bit to identify that. We could steal it from length in XLogRecordBlockImageHeader.

This is implemented in the attached patch by dividing the length field as follows,
uint16 length:15,
with_hole:1;

>"2" should be replaced with the macro variable indicating the size of extra header for compressed backup block.
Macro SizeOfXLogRecordBlockImageCompressionInfo is used instead of 2

>You can refactor XLogCompressBackupBlock() and move all the above code to it for more simplicity
This is also implemented in the patch attached.

Thank you,
Rahila Syed

-Original Message-
From: Michael Paquier [mailto:michael.paqu...@gmail.com]
Sent: Friday, February 06, 2015 6:00 PM
To: Fujii Masao
Cc: Syed, Rahila; PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On Fri, Feb 6, 2015 at 3:03 PM, Fujii Masao wrote:
> Do we always need extra two bytes for compressed backup block?
> ISTM that extra bytes are not necessary when the hole length is zero.
> In this case the length of the original backup block (i.e., uncompressed) must be BLCKSZ, so we don't need to save the original size in the extra bytes.

Yes, we would need an additional bit to identify that. We could steal it from length in XLogRecordBlockImageHeader.

> Furthermore, when fpw compression is disabled and the hole length is zero, we seem to be able to save one byte from the header of backup block. Currently we use 4 bytes for the header, 2 bytes for the length of backup block, 15 bits for the hole offset and 1 bit for the flag indicating whether block is compressed or not. But in that case, the length of backup block doesn't need to be stored because it must be BLCKSZ. Shouldn't we optimize the header in this way? Thought?

If we do it, that's something to tackle even before this patch on HEAD, because you could use the 16th bit of the first 2 bytes of XLogRecordBlockImageHeader to do necessary sanity checks, to actually not reduce record by 1 byte, but 2 bytes as hole-related data is not necessary. I imagine that a patch optimizing that wouldn't be that hard to write as well.

> +int page_len = BLCKSZ - hole_length;
> +char *scratch_buf;
> +if (hole_length != 0)
> +{
> +scratch_buf = compression_scratch;
> +memcpy(scratch_buf, page, hole_offset);
> +memcpy(scratch_buf + hole_offset,
> + page + (hole_offset + hole_length),
> + BLCKSZ - (hole_length + hole_offset));
> +}
> +else
> +scratch_buf = page;
> +
> +/* Perform compression of block */
> +if (XLogCompressBackupBlock(scratch_buf,
> +page_len,
> +regbuf->compressed_page,
> +&compress_len))
> +{
> +/* compression is done, add record */
> +is_compressed = true;
> +}
>
> You can refactor XLogCompressBackupBlock() and move all the above code to it for more simplicity.

Sure.
-- Michael

__ Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding.
Support-compression-for-full-page-writes-in-WAL_v17.patch Description: Support-compression-for-full-page-writes-in-WAL_v17.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
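For reference, the bit-field split described in this message yields a header along these lines (restated from the patch text quoted earlier in the thread):

    typedef struct XLogRecordBlockImageHeader
    {
        uint16      length:15,      /* length of block data in record */
                    with_hole:1;    /* status of hole in the block */

        uint16      hole_offset:15, /* number of bytes before "hole" */
                    is_compressed:1;    /* compression status of image */

        /* Followed by the data related to compression if block is compressed */
    } XLogRecordBlockImageHeader;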
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Feb 6, 2015 at 3:42 AM, Michael Paquier wrote:
> Fujii Masao wrote:
>> I wrote
>>> This is an inspiration from lz4 APIs. Wouldn't it be buggy for a compression algorithm to return a size of 0 bytes as compressed or decompressed length btw? We could as well make it return a negative value when a failure occurs if you feel more comfortable with it.
>>
>> I feel that's better. Attached is the updated version of the patch. I changed the pg_lzcompress and pg_lzdecompress so that they return -1 when failure happens. Also I applied some cosmetic changes to the patch (e.g., shorten the long name of the newly-added macros). Barring any objection, I will commit this.
>
> I just had a look at your updated version, ran some sanity tests, and things look good to me. The new names of the macros at the top of tuptoaster.c are clearer as well.

Thanks for the review! Pushed!

Regards,
-- Fujii Masao
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Feb 6, 2015 at 6:35 PM, Syed, Rahila wrote:
> The compression patch can use the latest interface MemoryContextAllocExtended to proceed without compression when sufficient memory is not available for the scratch buffer.
> The attached patch introduces an OutOfMem flag which is set when MemoryContextAllocExtended returns NULL.

TBH, I don't think that brings much, as this allocation is done once and the process would surely fail before reaching the first code path doing a WAL record insertion. In any case, OutOfMem is useless; you could simply check whether compression_scratch is NULL when assembling a record.
-- Michael
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
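In code, Michael's suggestion amounts to the following (a sketch; MCXT_ALLOC_NO_OOM makes MemoryContextAllocExtended return NULL on allocation failure instead of erroring out, and the XLogCompressBackupBlock call reuses names from Fujii's snippet below for illustration):

    /* at scratch-buffer setup: do not fail hard if memory is short */
    if (compression_scratch == NULL)
        compression_scratch = MemoryContextAllocExtended(xloginsert_cxt,
                                                         BLCKSZ,
                                                         MCXT_ALLOC_NO_OOM);

    /* later, when assembling a record: simply skip compression */
    if (wal_compression && compression_scratch != NULL)
        is_compressed = XLogCompressBackupBlock(scratch_buf, page_len,
                                                regbuf->compressed_page,
                                                &compress_len);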
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
>In any case, those things have been introduced by what I did in previous versions... And attached is a new patch.

Thank you for the feedback.

> /* allocate scratch buffer used for compression of block images */
>+ if (compression_scratch == NULL)
>+ compression_scratch = MemoryContextAllocZero(xloginsert_cxt,
>+ BLCKSZ);
>}

The compression patch can use the latest interface MemoryContextAllocExtended to proceed without compression when sufficient memory is not available for the scratch buffer. The attached patch introduces an OutOfMem flag which is set when MemoryContextAllocExtended returns NULL.

Thank you,
Rahila Syed

-Original Message-
From: Michael Paquier [mailto:michael.paqu...@gmail.com]
Sent: Friday, February 06, 2015 12:46 AM
To: Syed, Rahila
Cc: PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On Thu, Feb 5, 2015 at 11:06 PM, Syed, Rahila wrote:
>>/*
>>+* We recheck the actual size even if pglz_compress() report success,
>>+* because it might be satisfied with having saved as little as one byte
>>+* in the compressed data.
>>+*/
>>+ *len = (uint16) compressed_len;
>>+ if (*len >= orig_len - 1)
>>+ return false;
>>+ return true;
>>+}
>
> As per latest code, when compression is 'on' we introduce additional 2 bytes in the header of each block image for storing raw_length of the compressed block.
> In order to achieve compression while accounting for these two additional bytes, we must ensure that compressed length is less than original length - 2.
> So, IIUC the above condition should rather be
>
> If (*len >= orig_len -2 )
> return false;
> return true;
> The attached patch contains this. It also has a cosmetic change: renaming compressBuf to uncompressBuf as it is used to store the uncompressed page.

Agreed on both things. Just looking at your latest patch after some time to let it cool down, I noticed a couple of things.

#define MaxSizeOfXLogRecordBlockHeader \
(SizeOfXLogRecordBlockHeader + \
- SizeOfXLogRecordBlockImageHeader + \
+ SizeOfXLogRecordBlockImageHeader, \
+ SizeOfXLogRecordBlockImageCompressionInfo + \
There is a comma here instead of a sum sign. We should really sum up all those sizes to evaluate the maximum size of a block header.

+ * Permanently allocate readBuf uncompressBuf. We do it this way,
+ * rather than just making a static array, for two reasons:
This comment is just a bit weird, "readBuf AND uncompressBuf" is more appropriate.

+ * We recheck the actual size even if pglz_compress() report success,
+ * because it might be satisfied with having saved as little as one byte
+ * in the compressed data. We add two bytes to store raw_length with the
+ * compressed image. So for compression to be effective compressed_len should
+ * be atleast < orig_len - 2
This comment block should be reworked, and misses a dot at its end. I rewrote it like that, hopefully that's clearer:
+ /*
+* We recheck the actual size even if pglz_compress() reports success and see
+* if at least 2 bytes of length have been saved, as this corresponds to the
+* additional amount of data stored in WAL record for a compressed block
+* via raw_length.
+*/

In any case, those things have been introduced by what I did in previous versions... And attached is a new patch.
-- Michael

__ Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data.
If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding. Support-compression-for-full-page-writes-in-WAL_v17.patch Description: Support-compression-for-full-page-writes-in-WAL_v17.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Feb 6, 2015 at 4:30 PM, Michael Paquier wrote:
> On Fri, Feb 6, 2015 at 3:03 PM, Fujii Masao wrote:
>> Do we always need extra two bytes for compressed backup block?
>> ISTM that extra bytes are not necessary when the hole length is zero.
>> In this case the length of the original backup block (i.e., uncompressed) must be BLCKSZ, so we don't need to save the original size in the extra bytes.
>
> Yes, we would need an additional bit to identify that. We could steal it from length in XLogRecordBlockImageHeader.
>
>> Furthermore, when fpw compression is disabled and the hole length is zero, we seem to be able to save one byte from the header of backup block. Currently we use 4 bytes for the header, 2 bytes for the length of backup block, 15 bits for the hole offset and 1 bit for the flag indicating whether block is compressed or not. But in that case, the length of backup block doesn't need to be stored because it must be BLCKSZ. Shouldn't we optimize the header in this way? Thought?
>
> If we do it, that's something to tackle even before this patch on HEAD, because you could use the 16th bit of the first 2 bytes of XLogRecordBlockImageHeader to do necessary sanity checks, to actually not reduce record by 1 byte, but 2 bytes as hole-related data is not necessary. I imagine that a patch optimizing that wouldn't be that hard to write as well.

Actually, as Heikki pointed out to me... A block image is 8k and pages without holes are rare, so it may not be worth sacrificing code simplicity for a record reduction on the order of 0.1% or so, and the current patch is light because it keeps things simple.
-- Michael
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Feb 6, 2015 at 3:03 PM, Fujii Masao wrote:
> Do we always need extra two bytes for compressed backup block?
> ISTM that extra bytes are not necessary when the hole length is zero.
> In this case the length of the original backup block (i.e., uncompressed) must be BLCKSZ, so we don't need to save the original size in the extra bytes.

Yes, we would need an additional bit to identify that. We could steal it from length in XLogRecordBlockImageHeader.

> Furthermore, when fpw compression is disabled and the hole length is zero, we seem to be able to save one byte from the header of backup block. Currently we use 4 bytes for the header, 2 bytes for the length of backup block, 15 bits for the hole offset and 1 bit for the flag indicating whether block is compressed or not. But in that case, the length of backup block doesn't need to be stored because it must be BLCKSZ. Shouldn't we optimize the header in this way? Thought?

If we do it, that's something to tackle even before this patch on HEAD, because you could use the 16th bit of the first 2 bytes of XLogRecordBlockImageHeader to do necessary sanity checks, to actually not reduce record by 1 byte, but 2 bytes as hole-related data is not necessary. I imagine that a patch optimizing that wouldn't be that hard to write as well.

> +int page_len = BLCKSZ - hole_length;
> +char *scratch_buf;
> +if (hole_length != 0)
> +{
> +scratch_buf = compression_scratch;
> +memcpy(scratch_buf, page, hole_offset);
> +memcpy(scratch_buf + hole_offset,
> + page + (hole_offset + hole_length),
> + BLCKSZ - (hole_length + hole_offset));
> +}
> +else
> +scratch_buf = page;
> +
> +/* Perform compression of block */
> +if (XLogCompressBackupBlock(scratch_buf,
> +page_len,
> +regbuf->compressed_page,
> +&compress_len))
> +{
> +/* compression is done, add record */
> +is_compressed = true;
> +}
>
> You can refactor XLogCompressBackupBlock() and move all the above code to it for more simplicity.

Sure.
-- Michael
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Feb 6, 2015 at 4:15 AM, Michael Paquier wrote:
> On Thu, Feb 5, 2015 at 11:06 PM, Syed, Rahila wrote:
>>>/*
>>>+* We recheck the actual size even if pglz_compress() report success,
>>>+* because it might be satisfied with having saved as little as one byte
>>>+* in the compressed data.
>>>+*/
>>>+ *len = (uint16) compressed_len;
>>>+ if (*len >= orig_len - 1)
>>>+ return false;
>>>+ return true;
>>>+}
>>
>> As per latest code, when compression is 'on' we introduce additional 2 bytes in the header of each block image for storing raw_length of the compressed block.
>> In order to achieve compression while accounting for these two additional bytes, we must ensure that compressed length is less than original length - 2.
>> So, IIUC the above condition should rather be
>>
>> If (*len >= orig_len -2 )
>> return false;

"2" should be replaced with the macro variable indicating the size of extra header for compressed backup block.

Do we always need extra two bytes for compressed backup block? ISTM that extra bytes are not necessary when the hole length is zero. In this case the length of the original backup block (i.e., uncompressed) must be BLCKSZ, so we don't need to save the original size in the extra bytes.

Furthermore, when fpw compression is disabled and the hole length is zero, we seem to be able to save one byte from the header of backup block. Currently we use 4 bytes for the header, 2 bytes for the length of backup block, 15 bits for the hole offset and 1 bit for the flag indicating whether block is compressed or not. But in that case, the length of backup block doesn't need to be stored because it must be BLCKSZ. Shouldn't we optimize the header in this way? Thought?

+int page_len = BLCKSZ - hole_length;
+char *scratch_buf;
+if (hole_length != 0)
+{
+scratch_buf = compression_scratch;
+memcpy(scratch_buf, page, hole_offset);
+memcpy(scratch_buf + hole_offset,
+ page + (hole_offset + hole_length),
+ BLCKSZ - (hole_length + hole_offset));
+}
+else
+scratch_buf = page;
+
+/* Perform compression of block */
+if (XLogCompressBackupBlock(scratch_buf,
+page_len,
+regbuf->compressed_page,
+&compress_len))
+{
+/* compression is done, add record */
+is_compressed = true;
+}

You can refactor XLogCompressBackupBlock() and move all the above code to it for more simplicity.

Regards,
-- Fujii Masao
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
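A sketch of the refactoring Fujii suggests, folding the hole removal into XLogCompressBackupBlock() itself; the signature and surrounding names are assumed for illustration, not taken from the committed patch:

    static bool
    XLogCompressBackupBlock(char *page, uint16 hole_offset, uint16 hole_length,
                            char *dest, uint16 *dlen)
    {
        int32   orig_len = BLCKSZ - hole_length;
        int32   len;
        char   *source = page;

        if (hole_length != 0)
        {
            /* copy the page into the scratch buffer, skipping its hole */
            memcpy(compression_scratch, page, hole_offset);
            memcpy(compression_scratch + hole_offset,
                   page + (hole_offset + hole_length),
                   BLCKSZ - (hole_length + hole_offset));
            source = compression_scratch;
        }

        /* pglz_compress() returns -1 if no savings are possible */
        len = pglz_compress(source, orig_len, dest, PGLZ_strategy_default);
        if (len < 0)
            return false;           /* block is not compressible */

        *dlen = (uint16) len;       /* number of compressed bytes written */
        return true;
    }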
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Feb 5, 2015 at 11:06 PM, Syed, Rahila wrote:
>>/*
>>+* We recheck the actual size even if pglz_compress() report success,
>>+* because it might be satisfied with having saved as little as one byte
>>+* in the compressed data.
>>+*/
>>+ *len = (uint16) compressed_len;
>>+ if (*len >= orig_len - 1)
>>+ return false;
>>+ return true;
>>+}
>
> As per latest code, when compression is 'on' we introduce additional 2 bytes in the header of each block image for storing raw_length of the compressed block.
> In order to achieve compression while accounting for these two additional bytes, we must ensure that compressed length is less than original length - 2.
> So, IIUC the above condition should rather be
>
> If (*len >= orig_len -2 )
> return false;
> return true;
> The attached patch contains this. It also has a cosmetic change: renaming compressBuf to uncompressBuf as it is used to store the uncompressed page.

Agreed on both things. Just looking at your latest patch after some time to let it cool down, I noticed a couple of things.

#define MaxSizeOfXLogRecordBlockHeader \
(SizeOfXLogRecordBlockHeader + \
- SizeOfXLogRecordBlockImageHeader + \
+ SizeOfXLogRecordBlockImageHeader, \
+ SizeOfXLogRecordBlockImageCompressionInfo + \
There is a comma here instead of a sum sign. We should really sum up all those sizes to evaluate the maximum size of a block header.

+ * Permanently allocate readBuf uncompressBuf. We do it this way,
+ * rather than just making a static array, for two reasons:
This comment is just a bit weird, "readBuf AND uncompressBuf" is more appropriate.

+ * We recheck the actual size even if pglz_compress() report success,
+ * because it might be satisfied with having saved as little as one byte
+ * in the compressed data. We add two bytes to store raw_length with the
+ * compressed image. So for compression to be effective compressed_len should
+ * be atleast < orig_len - 2
This comment block should be reworked, and misses a dot at its end. I rewrote it like that, hopefully that's clearer:
+ /*
+* We recheck the actual size even if pglz_compress() reports success and see
+* if at least 2 bytes of length have been saved, as this corresponds to the
+* additional amount of data stored in WAL record for a compressed block
+* via raw_length.
+*/

In any case, those things have been introduced by what I did in previous versions... And attached is a new patch.
-- Michael

From 2b0c54bc7c4bfb2494cf8b0394d56635b01d7c5a Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Tue, 25 Nov 2014 14:24:26 +0900
Subject: [PATCH] Support compression for full-page writes in WAL

Compression is controlled with a new parameter called wal_compression. This parameter can be changed at session level to control WAL compression.
---
 contrib/pg_xlogdump/pg_xlogdump.c             |  17 ++-
 doc/src/sgml/config.sgml                      |  29 +
 src/backend/access/transam/xlog.c             |   1 +
 src/backend/access/transam/xloginsert.c       | 161 ++
 src/backend/access/transam/xlogreader.c       |  71 +---
 src/backend/utils/misc/guc.c                  |   9 ++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/access/xlog.h                     |   1 +
 src/include/access/xlogreader.h               |   7 +-
 src/include/access/xlogrecord.h               |  49 ++--
 src/include/pg_config.h.in                    |   4 +-
 11 files changed, 297 insertions(+), 53 deletions(-)
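For reference, the comma-for-plus slip called out above matters because this macro must bound the worst-case block header size; summed correctly it would read along these lines (the exact member list is assumed for the patch at this stage):

    #define MaxSizeOfXLogRecordBlockHeader \
        (SizeOfXLogRecordBlockHeader + \
         SizeOfXLogRecordBlockImageHeader + \
         SizeOfXLogRecordBlockImageCompressionInfo + \
         sizeof(RelFileNode) + \
         sizeof(BlockNumber))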
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Fujii Masao wrote:
> I wrote
>> This is an inspiration from lz4 APIs. Wouldn't it be buggy for a compression algorithm to return a size of 0 bytes as compressed or decompressed length btw? We could as well make it return a negative value when a failure occurs if you feel more comfortable with it.
>
> I feel that's better. Attached is the updated version of the patch. I changed the pg_lzcompress and pg_lzdecompress so that they return -1 when failure happens. Also I applied some cosmetic changes to the patch (e.g., shorten the long name of the newly-added macros). Barring any objection, I will commit this.

I just had a look at your updated version, ran some sanity tests, and things look good to me. The new names of the macros at the top of tuptoaster.c are clearer as well.
-- Michael
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

>/*
>+* We recheck the actual size even if pglz_compress() report success,
>+* because it might be satisfied with having saved as little as one byte
>+* in the compressed data.
>+*/
>+ *len = (uint16) compressed_len;
>+ if (*len >= orig_len - 1)
>+ return false;
>+ return true;
>+}

As per latest code, when compression is 'on' we introduce additional 2 bytes in the header of each block image for storing raw_length of the compressed block. In order to achieve compression while accounting for these two additional bytes, we must ensure that compressed length is less than original length - 2. So, IIUC the above condition should rather be

If (*len >= orig_len -2 )
return false;
return true;

The attached patch contains this. It also has a cosmetic change: renaming compressBuf to uncompressBuf as it is used to store the uncompressed page.

Thank you,
Rahila Syed

-Original Message-
From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
Sent: Wednesday, January 07, 2015 9:32 AM
To: Rahila Syed
Cc: PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On Wed, Jan 7, 2015 at 12:51 AM, Rahila Syed wrote:
> Following are some comments,
Thanks for the feedback.

>>uint16 hole_offset:15, /* number of bytes in "hole" */
> Typo in description of hole_offset
Fixed. That's "before hole".

>>for (block_id = 0; block_id <= record->max_block_id; block_id++)
>>- {
>>- if (XLogRecHasBlockImage(record, block_id))
>>- fpi_len += BLCKSZ - record->blocks[block_id].hole_length;
>>- }
>>+ fpi_len += record->blocks[block_id].bkp_len;
>
> IIUC, if condition, /if(XLogRecHasBlockImage(record, block_id))/ is incorrectly removed from the above for loop.
Fixed.

>>typedef struct XLogRecordCompressedBlockImageHeader
> I am trying to understand the purpose behind declaration of the above struct. IIUC, it is defined in order to introduce new field uint16 raw_length and it has been declared as a separate struct from XLogRecordBlockImageHeader to not affect the size of WAL record when compression is off.
> I wonder if it is ok to simply memcpy the uint16 raw_length in the hdr_scratch when compression is on and not have a separate header struct for it nor declare it in the existing header. raw_length can be a locally defined variable in XLogRecordAssemble or it can be a field in registered_buffer struct like compressed_page.
> I think this can simplify the code.
> Am I missing something obvious?

You are missing nothing. I just introduced this structure for a matter of readability to show the two-byte difference between non-compressed and compressed header information. It is true that doing it my way makes the structures duplicated, so let's simply add the compression-related information as an extra structure added after XLogRecordBlockImageHeader if the block is compressed. I hope this addresses your concerns.

>> /*
>> * Fill in the remaining fields in the XLogRecordBlockImageHeader
>> * struct and add new entries in the record chain.
>> */
>
>> bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;
>
> This code line seems to be misplaced with respect to the above comment. Comment indicates filling of XLogRecordBlockImageHeader fields while fork_flags is a field of XLogRecordBlockHeader.
> Is it better to place the code close to the following condition?
> if (needs_backup)
> {
Yes, this comment should not be here. I replaced it with the comment in HEAD.

>>+ *the original length of the
>>+ * block without its page hole being deducible from the compressed data
>>+ * itself.
> IIUC, this comment before XLogRecordBlockImageHeader seems to be no longer valid as original length is not deducible from compressed data and rather stored in header.
Aah, true. This was originally present in the header of PGLZ that has been removed to make it available for frontends.

Updated patches are attached.

Regards,
-- Michael

__ Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding.

Support-compression-for-full-page-writes-in-WAL_v15.patch Description: Support-compression-for-full-page-writes-in-WAL_v15.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
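A sketch of the approach settled on in this exchange: rather than a duplicated header struct, the two-byte raw_length is simply appended to the header scratch area when, and only when, the block image is compressed (buffer and variable names here are assumed for illustration):

    /* write the image header, then the optional compression info */
    memcpy(scratch, &bimg, SizeOfXLogRecordBlockImageHeader);
    scratch += SizeOfXLogRecordBlockImageHeader;

    if (is_compressed)
    {
        /* raw_length: uncompressed size of the image, hole removed */
        cimg.raw_length = BLCKSZ - hole_length;
        memcpy(scratch, &cimg, SizeOfXLogRecordBlockImageCompressionInfo);
        scratch += SizeOfXLogRecordBlockImageCompressionInfo;
    }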
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Jan 6, 2015 at 11:09 AM, Michael Paquier wrote: > On Mon, Jan 5, 2015 at 10:29 PM, Fujii Masao wrote: >> On Sun, Dec 28, 2014 at 10:57 PM, Michael Paquier wrote: >> The patch 1 cannot be applied to the master successfully because of >> recent change. > Yes, that's caused by ccb161b. Attached are rebased versions. > >>> - The real stuff comes with patch 2, that implements the removal of >>> PGLZ_Header, changing the APIs of compression and decompression to pglz to >>> not have anymore toast metadata, this metadata being now localized in >>> tuptoaster.c. Note that this patch protects the on-disk format (tested with >>> pg_upgrade from 9.4 to a patched HEAD server). Here is how the APIs of >>> compression and decompression look like with this patch, simply performing >>> operations from a source to a destination: >>> extern int32 pglz_compress(const char *source, int32 slen, char *dest, >>> const PGLZ_Strategy *strategy); >>> extern int32 pglz_decompress(const char *source, char *dest, >>> int32 compressed_size, int32 raw_size); >>> The return value of those functions is the number of bytes written in the >>> destination buffer, and 0 if operation failed. >> >> So it's guaranteed that 0 is never returned in success case? I'm not sure >> if that case can really happen, though. > This is an inspiration from lz4 APIs. Wouldn't it be buggy for a > compression algorithm to return a size of 0 bytes as compressed or > decompressed length btw? We could as well make it return a negative > value when a failure occurs if you feel more comfortable with it. I feel that's better. Attached is the updated version of the patch. I changed the pg_lzcompress and pg_lzdecompress so that they return -1 when failure happens. Also I applied some cosmetic changes to the patch (e.g., shorten the long name of the newly-added macros). Barring any objection, I will commit this. Regards, -- Fujii Masao *** a/src/backend/access/heap/tuptoaster.c --- b/src/backend/access/heap/tuptoaster.c *** *** 35,43 #include "access/tuptoaster.h" #include "access/xact.h" #include "catalog/catalog.h" #include "miscadmin.h" #include "utils/fmgroids.h" - #include "utils/pg_lzcompress.h" #include "utils/rel.h" #include "utils/typcache.h" #include "utils/tqual.h" --- 35,43 #include "access/tuptoaster.h" #include "access/xact.h" #include "catalog/catalog.h" + #include "common/pg_lzcompress.h" #include "miscadmin.h" #include "utils/fmgroids.h" #include "utils/rel.h" #include "utils/typcache.h" #include "utils/tqual.h" *** *** 45,50 --- 45,70 #undef TOAST_DEBUG + /* + * The information at the start of the compressed toast data. + */ + typedef struct toast_compress_header + { + int32 vl_len_; /* varlena header (do not touch directly!) */ + int32 rawsize; + } toast_compress_header; + + /* + * Utilities for manipulation of header information for compressed + * toast entries. 
+ */ + #define TOAST_COMPRESS_HDRSZ ((int32) sizeof(toast_compress_header)) + #define TOAST_COMPRESS_RAWSIZE(ptr) (((toast_compress_header *) ptr)->rawsize) + #define TOAST_COMPRESS_RAWDATA(ptr) \ + (((char *) ptr) + TOAST_COMPRESS_HDRSZ) + #define TOAST_COMPRESS_SET_RAWSIZE(ptr, len) \ + (((toast_compress_header *) ptr)->rawsize = len) + static void toast_delete_datum(Relation rel, Datum value); static Datum toast_save_datum(Relation rel, Datum value, struct varlena * oldexternal, int options); *** *** 53,58 static bool toastid_valueid_exists(Oid toastrelid, Oid valueid); --- 73,79 static struct varlena *toast_fetch_datum(struct varlena * attr); static struct varlena *toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length); + static struct varlena *toast_decompress_datum(struct varlena * attr); static int toast_open_indexes(Relation toastrel, LOCKMODE lock, Relation **toastidxs, *** *** 138,148 heap_tuple_untoast_attr(struct varlena * attr) /* If it's compressed, decompress it */ if (VARATT_IS_COMPRESSED(attr)) { ! PGLZ_Header *tmp = (PGLZ_Header *) attr; ! ! attr = (struct varlena *) palloc(PGLZ_RAW_SIZE(tmp) + VARHDRSZ); ! SET_VARSIZE(attr, PGLZ_RAW_SIZE(tmp) + VARHDRSZ); ! pglz_decompress(tmp, VARDATA(attr)); pfree(tmp); } } --- 159,166 /* If it's compressed, decompress it */ if (VARATT_IS_COMPRESSED(attr)) { ! struct varlena *tmp = attr; ! attr = toast_decompress_datum(tmp); pfree(tmp); } } *** *** 163,173 heap_tuple_untoast_attr(struct varlena * attr) /* * This is a compressed value inside of the main tuple */ ! PGLZ_Header *tmp = (PGLZ_Header *) attr; ! ! attr = (struct varlena *) palloc(PGLZ_RAW_SIZE(tmp) + VARHDRSZ); ! SET_VARSIZE(attr, PGLZ_RAW_SIZE(tmp) + VARHDRSZ); ! pglz_decompress(tmp, VARDATA(attr));
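As a hedged illustration of how the header in the diff above is meant to be used, here is a simplified sketch of the compression path in tuptoaster.c after this patch (assuming the usual backend headers and the reworked pglz_compress() that returns -1 on failure; this is not the committed code):

    /*
     * Sketch only: compress a varlena datum, storing its raw size in the
     * new toast_compress_header through the macros defined above.
     */
    static struct varlena *
    toast_compress_datum_sketch(Datum value)
    {
        struct varlena *tmp;
        int32       valsize = VARSIZE_ANY_EXHDR(DatumGetPointer(value));
        int32       len;

        /* leave room for the raw-size header in front of the payload */
        tmp = (struct varlena *) palloc(PGLZ_MAX_OUTPUT(valsize) +
                                        TOAST_COMPRESS_HDRSZ);
        len = pglz_compress(VARDATA_ANY(DatumGetPointer(value)), valsize,
                            TOAST_COMPRESS_RAWDATA(tmp),
                            PGLZ_strategy_default);
        if (len < 0)            /* incompressible: caller keeps the raw datum */
        {
            pfree(tmp);
            return NULL;
        }
        TOAST_COMPRESS_SET_RAWSIZE(tmp, valsize);
        SET_VARSIZE_COMPRESSED(tmp, len + TOAST_COMPRESS_HDRSZ);
        return tmp;
    }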
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Jan 7, 2015 at 12:51 AM, Rahila Syed wrote: > Following are some comments, Thanks for the feedback. >>uint16 hole_offset:15, /* number of bytes in "hole" */ > Typo in description of hole_offset Fixed. That's "before hole". >>for (block_id = 0; block_id <= record->max_block_id; block_id++) >>- { >>- if (XLogRecHasBlockImage(record, block_id)) >>- fpi_len += BLCKSZ - > record->blocks[block_id].hole_length; >>- } >>+ fpi_len += record->blocks[block_id].bkp_len; > > IIUC, if condition, /if(XLogRecHasBlockImage(record, block_id))/ is > incorrectly removed from the above for loop. Fixed. >>typedef struct XLogRecordCompressedBlockImageHeader > I am trying to understand the purpose behind declaration of the above > struct. IIUC, it is defined in order to introduce new field uint16 > raw_length and it has been declared as a separate struct from > XLogRecordBlockImageHeader to not affect the size of WAL record when > compression is off. > I wonder if it is ok to simply memcpy the uint16 raw_length in the > hdr_scratch when compression is on > and not have a separate header struct for it neither declare it in existing > header. raw_length can be a locally defined variable is XLogRecordAssemble > or it can be a field in registered_buffer struct like compressed_page. > I think this can simplify the code. > Am I missing something obvious? You are missing nothing. I just introduced this structure for a matter of readability to show the two-byte difference between non-compressed and compressed header information. It is true that doing it my way makes the structures duplicated, so let's simply add the compression-related information as an extra structure added after XLogRecordBlockImageHeader if the block is compressed. I hope this addresses your concerns. >> /* >> * Fill in the remaining fields in the XLogRecordBlockImageHeader >> * struct and add new entries in the record chain. >> */ > >> bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE; > > This code line seems to be misplaced with respect to the above comment. > Comment indicates filling of XLogRecordBlockImageHeader fields while > fork_flags is a field of XLogRecordBlockHeader. > Is it better to place the code close to following condition? > if (needs_backup) > { Yes, this comment should not be here. I replaced it with the comment in HEAD. >>+ *the original length of the >>+ * block without its page hole being deducible from the compressed data >>+ * itself. > IIUC, this comment before XLogRecordBlockImageHeader seems to be no longer > valid as original length is not deducible from compressed data and rather > stored in header. Aah, true. This was originally present in the header of PGLZ that has been removed to make it available for frontends. Updated patches are attached. Regards, -- Michael 20150107_fpw_compression_v14.tar.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello, >Yes, that's caused by ccb161b. Attached are rebased versions. Following are some comments, >uint16 hole_offset:15, /* number of bytes in "hole" */ Typo in description of hole_offset >for (block_id = 0; block_id <= record->max_block_id; block_id++) >- { >- if (XLogRecHasBlockImage(record, block_id)) >- fpi_len += BLCKSZ - record->blocks[block_id].hole_length; >- } >+ fpi_len += record->blocks[block_id].bkp_len; IIUC, the if condition, /if(XLogRecHasBlockImage(record, block_id))/, is incorrectly removed from the above for loop. >typedef struct XLogRecordCompressedBlockImageHeader I am trying to understand the purpose behind the declaration of the above struct. IIUC, it is defined in order to introduce the new field uint16 raw_length, and it has been declared as a separate struct from XLogRecordBlockImageHeader so as not to affect the size of the WAL record when compression is off. I wonder if it is ok to simply memcpy the uint16 raw_length in the hdr_scratch when compression is on and not have a separate header struct for it nor declare it in the existing header. raw_length can be a locally defined variable in XLogRecordAssemble, or it can be a field in the registered_buffer struct like compressed_page. I think this can simplify the code. Am I missing something obvious? > /* > * Fill in the remaining fields in the XLogRecordBlockImageHeader > * struct and add new entries in the record chain. > */ > bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE; This code line seems to be misplaced with respect to the above comment. The comment indicates filling of XLogRecordBlockImageHeader fields, while fork_flags is a field of XLogRecordBlockHeader. Is it better to place the code close to the following condition? if (needs_backup) { >+ *the original length of the >+ * block without its page hole being deducible from the compressed data >+ * itself. IIUC, this comment before XLogRecordBlockImageHeader seems to be no longer valid, as the original length is not deducible from the compressed data and is rather stored in the header. Thank you, Rahila Syed -- View this message in context: http://postgresql.nabble.com/Compression-of-full-page-writes-tp5769039p5833025.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
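The simplification proposed in this review could look roughly like the following (a sketch with hypothetical names, assuming PostgreSQL's c.h types and string.h; it is not the actual patch code):

    /*
     * Append the two-byte raw length straight into the header scratch
     * area when the block image is compressed, instead of declaring a
     * separate XLogRecordCompressedBlockImageHeader struct.
     */
    static char *
    append_raw_length(char *scratch, uint16 raw_length, bool is_compressed)
    {
        if (is_compressed)
        {
            memcpy(scratch, &raw_length, sizeof(uint16));
            scratch += sizeof(uint16);
        }
        return scratch;
    }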
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Jan 5, 2015 at 10:29 PM, Fujii Masao wrote: > On Sun, Dec 28, 2014 at 10:57 PM, Michael Paquier wrote: > The patch 1 cannot be applied to the master successfully because of > recent change. Yes, that's caused by ccb161b. Attached are rebased versions. >> - The real stuff comes with patch 2, that implements the removal of >> PGLZ_Header, changing the APIs of compression and decompression to pglz to >> not have anymore toast metadata, this metadata being now localized in >> tuptoaster.c. Note that this patch protects the on-disk format (tested with >> pg_upgrade from 9.4 to a patched HEAD server). Here is how the APIs of >> compression and decompression look like with this patch, simply performing >> operations from a source to a destination: >> extern int32 pglz_compress(const char *source, int32 slen, char *dest, >> const PGLZ_Strategy *strategy); >> extern int32 pglz_decompress(const char *source, char *dest, >> int32 compressed_size, int32 raw_size); >> The return value of those functions is the number of bytes written in the >> destination buffer, and 0 if operation failed. > > So it's guaranteed that 0 is never returned in success case? I'm not sure > if that case can really happen, though. This is an inspiration from lz4 APIs. Wouldn't it be buggy for a compression algorithm to return a size of 0 bytes as compressed or decompressed length btw? We could as well make it return a negative value when a failure occurs if you feel more comfortable with it. -- Michael 20150105_fpw_compression_v13.tar.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
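For reference, a sketch of a full round trip through the two APIs quoted above, under the patch-2 semantics (number of bytes written on success, 0 on failure); buffer names are illustrative, and the PGLZ_MAX_OUTPUT sizing is assumed from the pglz header:

    static bool
    pglz_roundtrip(const char *page)
    {
        char        compressed[PGLZ_MAX_OUTPUT(BLCKSZ)];
        char        restored[BLCKSZ];
        int32       clen;

        clen = pglz_compress(page, BLCKSZ, compressed, PGLZ_strategy_default);
        if (clen == 0)
            return false;       /* incompressible: keep the raw page */

        /* without PGLZ_Header, the caller must carry both sizes itself */
        return pglz_decompress(compressed, restored, clen, BLCKSZ) == BLCKSZ;
    }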
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Sun, Dec 28, 2014 at 10:57 PM, Michael Paquier wrote: > > > On Fri, Dec 26, 2014 at 4:16 PM, Michael Paquier > wrote: >> On Fri, Dec 26, 2014 at 3:24 PM, Fujii Masao >> wrote: >>> pglz_compress() and pglz_decompress() still use PGLZ_Header, so the >>> frontend >>> which uses those functions needs to handle PGLZ_Header. But it basically >>> should >>> be handled via the varlena macros. That is, the frontend still seems to >>> need to >>> understand the varlena datatype. I think we should avoid that. Thought? >> Hm, yes it may be wiser to remove it and make the data passed to pglz >> for varlena 8 bytes shorter.. > > OK, here is the result of this work, made of 3 patches. Thanks for updating the patches! > The first two patches move pglz stuff to src/common and make it a frontend > utility entirely independent on varlena and its related metadata. > - Patch 1 is a simple move of pglz to src/common, with PGLZ_Header still > present. There is nothing amazing here, and that's the broken version that > has been reverted in 966115c. The patch 1 cannot be applied to the master successfully because of recent change. > - The real stuff comes with patch 2, that implements the removal of > PGLZ_Header, changing the APIs of compression and decompression to pglz to > not have anymore toast metadata, this metadata being now localized in > tuptoaster.c. Note that this patch protects the on-disk format (tested with > pg_upgrade from 9.4 to a patched HEAD server). Here is how the APIs of > compression and decompression look like with this patch, simply performing > operations from a source to a destination: > extern int32 pglz_compress(const char *source, int32 slen, char *dest, > const PGLZ_Strategy *strategy); > extern int32 pglz_decompress(const char *source, char *dest, > int32 compressed_size, int32 raw_size); > The return value of those functions is the number of bytes written in the > destination buffer, and 0 if operation failed. So it's guaranteed that 0 is never returned in success case? I'm not sure if that case can really happen, though. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Dec 26, 2014 at 4:16 PM, Michael Paquier wrote: > On Fri, Dec 26, 2014 at 3:24 PM, Fujii Masao wrote: >> pglz_compress() and pglz_decompress() still use PGLZ_Header, so the frontend >> which uses those functions needs to handle PGLZ_Header. But it basically should >> be handled via the varlena macros. That is, the frontend still seems to need to >> understand the varlena datatype. I think we should avoid that. Thought? > Hm, yes it may be wiser to remove it and make the data passed to pglz > for varlena 8 bytes shorter.. OK, here is the result of this work, made of 3 patches. The first two patches move pglz stuff to src/common and make it a frontend utility entirely independent of varlena and its related metadata.
- Patch 1 is a simple move of pglz to src/common, with PGLZ_Header still present. There is nothing amazing here, and that's the broken version that has been reverted in 966115c.
- The real stuff comes with patch 2, which implements the removal of PGLZ_Header, changing the pglz compression and decompression APIs so that they no longer carry toast metadata, this metadata being now localized in tuptoaster.c. Note that this patch preserves the on-disk format (tested with pg_upgrade from 9.4 to a patched HEAD server). Here is how the compression and decompression APIs look with this patch, simply performing operations from a source to a destination: extern int32 pglz_compress(const char *source, int32 slen, char *dest, const PGLZ_Strategy *strategy); extern int32 pglz_decompress(const char *source, char *dest, int32 compressed_size, int32 raw_size); The return value of those functions is the number of bytes written in the destination buffer, and 0 if the operation failed. This is aimed at making the backend more pluggable as well. The reason patch 2 exists (it could be merged with patch 1) is to facilitate review of the changes made to pglz to make it an entirely independent facility.
- Patch 3 is the FPW compression, changed to fit with those changes. Note that as PGLZ_Header contained the raw size of the compressed data and no longer exists, it is necessary to store the raw length of the block image directly in the block image header with 2 additional bytes. Those 2 bytes are used only if wal_compression is set to true, thanks to a boolean flag, so if wal_compression is disabled, the WAL record length is exactly the same as HEAD, and there is no penalty in the default case. Similarly to previous patches, the block image is compressed without its hole.
To finish, here are some results using the same test as before, with the hack on getrusage to get the system and user CPU diff on a single backend execution: http://www.postgresql.org/message-id/cab7npqsc97o-ue5paxfmukwcxe_jioyxo1m4a0pmnmyqane...@mail.gmail.com Just as a reminder, this test generated a fixed number of FPWs on a single backend with fsync and autovacuum disabled, with several values of fillfactor to see the effect of page holes.
   test   | ffactor | user_diff | system_diff | pg_size_pretty
 ---------+---------+-----------+-------------+----------------
 FPW on   |      50 | 48.823907 |    0.737649 | 582 MB
 FPW on   |      20 | 16.135000 |    0.764682 | 229 MB
 FPW on   |      10 |  8.521099 |    0.751947 | 116 MB
 FPW off  |      50 | 29.722793 |    1.045577 | 746 MB
 FPW off  |      20 | 12.673375 |    0.905422 | 293 MB
 FPW off  |      10 |  6.723120 |    0.779936 | 148 MB
 HEAD     |      50 | 30.763136 |    1.129822 | 746 MB
 HEAD     |      20 | 13.340823 |    0.893365 | 293 MB
 HEAD     |      10 |  7.267311 |    0.909057 | 148 MB
(9 rows)
Results are similar to what has been measured previously (it doesn't hurt to check again): roughly, the CPU cost is balanced by the WAL record reduction. There is zero difference in terms of WAL record length between HEAD and this patch when wal_compression = off. Patches, as well as the test script and the results, are attached. Regards, -- Michael results.sql Description: Binary data test_compress Description: Binary data 20141228_fpw_compression_v12.tar.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
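A sketch of the "block image compressed without its hole" scheme described above (illustrative names, not the patch's exact code; dest is assumed to have PGLZ_MAX_OUTPUT(BLCKSZ) bytes available):

    static int32
    compress_block_image(const char *page, uint16 hole_offset,
                         uint16 hole_length, char *dest)
    {
        char        scratch[BLCKSZ];
        int32       raw_len = BLCKSZ - hole_length;

        /* copy the page while skipping over the unused hole */
        memcpy(scratch, page, hole_offset);
        memcpy(scratch + hole_offset,
               page + hole_offset + hole_length,
               BLCKSZ - (hole_offset + hole_length));

        /* bytes written on success, 0 if the image is incompressible */
        return pglz_compress(scratch, raw_len, dest, PGLZ_strategy_default);
    }

The raw length (BLCKSZ - hole_length) is what goes into the two extra header bytes when the compressed form is kept.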
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Dec 26, 2014 at 3:24 PM, Fujii Masao wrote: > pglz_compress() and pglz_decompress() still use PGLZ_Header, so the frontend > which uses those functions needs to handle PGLZ_Header. But it basically > should > be handled via the varlena macros. That is, the frontend still seems to need > to > understand the varlena datatype. I think we should avoid that. Thought? Hm, yes it may be wiser to remove it and make the data passed to pglz for varlena 8 bytes shorter.. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Dec 26, 2014 at 12:31 PM, Michael Paquier wrote: > On Thu, Dec 25, 2014 at 10:10 PM, Michael Paquier > wrote: >> On Wed, Dec 24, 2014 at 9:03 PM, Michael Paquier >> wrote: >>> Returning only a boolean is fine for me (that's what my first patch >>> did), especially if we add at some point hooks for compression and >>> decompression calls. >> Here is a patch rebased on current HEAD (60838df) for the core feature >> with the APIs of pglz using booleans as return values. > After the revert of 1st patch moving pglz to src/common, I have > reworked both patches, resulting in the attached. > > For pglz, the dependency to varlena has been removed to make the code > able to run independently on both frontend and backend sides. In order > to do that the APIs of pglz_compress and pglz_decompress have been > changed a bit: > - pglz_compress returns the number of bytes compressed. > - pglz_decompress takes as additional argument the compressed length > of the buffer, and returns the number of bytes decompressed instead of > a simple boolean for consistency with the compression API. > PGLZ_Header is not modified to keep the on-disk format intact. pglz_compress() and pglz_decompress() still use PGLZ_Header, so the frontend which uses those functions needs to handle PGLZ_Header. But it basically should be handled via the varlena macros. That is, the frontend still seems to need to understand the varlena datatype. I think we should avoid that. Thought? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Dec 25, 2014 at 10:10 PM, Michael Paquier wrote: > On Wed, Dec 24, 2014 at 9:03 PM, Michael Paquier > wrote: >> Returning only a boolean is fine for me (that's what my first patch >> did), especially if we add at some point hooks for compression and >> decompression calls. > Here is a patch rebased on current HEAD (60838df) for the core feature > with the APIs of pglz using booleans as return values. After the revert of the 1st patch moving pglz to src/common, I have reworked both patches, resulting in the attached. For pglz, the dependency on varlena has been removed to make the code able to run independently on both frontend and backend sides. In order to do that, the APIs of pglz_compress and pglz_decompress have been changed a bit:
- pglz_compress returns the number of bytes compressed.
- pglz_decompress takes as an additional argument the compressed length of the buffer, and returns the number of bytes decompressed instead of a simple boolean, for consistency with the compression API.
PGLZ_Header is not modified, to keep the on-disk format intact. The WAL compression patch is realigned based on those changes. Regards, -- Michael 20141226_fpw_compression_v11.tar.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Dec 24, 2014 at 9:03 PM, Michael Paquier wrote: > Returning only a boolean is fine for me (that's what my first patch > did), especially if we add at some point hooks for compression and > decompression calls. Here is a patch rebased on current HEAD (60838df) for the core feature with the APIs of pglz using booleans as return values. -- Michael From c8891e6086682d4e5bc197ef3047068b3a3875c5 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Tue, 25 Nov 2014 14:24:26 +0900 Subject: [PATCH] Support compression for full-page writes in WAL Compression is controlled with a new parameter called wal_compression. This parameter can be changed at session level to control WAL compression. --- contrib/pg_xlogdump/pg_xlogdump.c | 20 ++-- doc/src/sgml/config.sgml | 29 + src/backend/access/transam/xlog.c | 1 + src/backend/access/transam/xloginsert.c | 155 ++ src/backend/access/transam/xlogreader.c | 77 ++--- src/backend/utils/misc/guc.c | 9 ++ src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/access/xlog.h | 1 + src/include/access/xlogreader.h | 7 +- src/include/access/xlogrecord.h | 36 -- src/include/pg_config.h.in| 4 +- 11 files changed, 283 insertions(+), 57 deletions(-) diff --git a/contrib/pg_xlogdump/pg_xlogdump.c b/contrib/pg_xlogdump/pg_xlogdump.c index 9f05e25..3ba1d8f 100644 --- a/contrib/pg_xlogdump/pg_xlogdump.c +++ b/contrib/pg_xlogdump/pg_xlogdump.c @@ -363,15 +363,12 @@ XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats, * takes up BLCKSZ bytes, minus the "hole" length. * * XXX: We peek into xlogreader's private decoded backup blocks for the - * hole_length. It doesn't seem worth it to add an accessor macro for + * length of block. It doesn't seem worth it to add an accessor macro for * this. */ fpi_len = 0; for (block_id = 0; block_id <= record->max_block_id; block_id++) - { - if (XLogRecHasBlockImage(record, block_id)) - fpi_len += BLCKSZ - record->blocks[block_id].hole_length; - } + fpi_len += record->blocks[block_id].bkp_len; /* Update per-rmgr statistics */ @@ -465,9 +462,16 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogReaderState *record) blk); if (XLogRecHasBlockImage(record, block_id)) { -printf(" (FPW); hole: offset: %u, length: %u\n", - record->blocks[block_id].hole_offset, - record->blocks[block_id].hole_length); +if (record->blocks[block_id].is_compressed) + printf(" (FPW compressed); hole offset: %u, " + "compressed length: %u, original length: %u\n", + record->blocks[block_id].hole_offset, + record->blocks[block_id].bkp_len, + record->blocks[block_id].bkp_uncompress_len); +else + printf(" (FPW); hole offset: %u, length: %u\n", + record->blocks[block_id].hole_offset, + record->blocks[block_id].bkp_len); } putchar('\n'); } diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 6bcb106..acbbd20 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2282,6 +2282,35 @@ include_dir 'conf.d' + + wal_compression (boolean) + + wal_compression configuration parameter + + + + +When this parameter is on, the PostgreSQL +server compresses the content of full-page writes when necessary and +inserts in WAL records with smaller sizes, reducing the amount of +WAL stored on disk. + + + +Compression has the advantage of reducing the amount of disk I/O when +doing WAL-logging, at the cost of some extra CPU to perform the +compression of a block image. 
At WAL replay, compressed block images +need extra CPU cycles to perform the decompression of each block +image, but it can reduce as well replay time in I/O bounded +environments. + + + +The default value is off. + + + + wal_buffers (integer) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index e54..d68d9e3 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -88,6 +88,7 @@ char *XLogArchiveCommand = NULL; bool EnableHotStandby = false; bool fullPageWrites = true; bool wal_log_hints = false; +bool wal_compression = false; bool log_checkpoints = false; int sync_method = DEFAULT_SYNC_METHOD; int wal_level = WAL_LEVEL_MINIMAL; diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c index f3d610f..a1496aa 100644 --- a/src/backend/access/transam/xloginsert.c +++ b/src/backend/access/transam/xloginsert.c @@ -24,12 +24,16 @@ #include "access/xlog_internal.h" #include "access/xloginsert.h" #inc
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Dec 24, 2014 at 8:44 PM, Fujii Masao wrote: > On Fri, Dec 19, 2014 at 12:19 AM, Michael Paquier > wrote: > Firstly I'm thinking to commit the > 0001-Move-pg_lzcompress.c-to-src-common.patch. > > pg_lzcompress.h still exists in include/utils, but it should be moved to > include/common? You are right. This is a remnant of the first version of this patch, where pglz was added in port/ and not common/. > Do we really need PGLZ_Status? I'm not sure whether your categorization of > the result status of compress/decompress functions is right or not. For > example, > pglz_decompress() can return PGLZ_INCOMPRESSIBLE status, but which seems > invalid logically... Maybe this needs to be revisited when we introduce other > compression algorithms and create the wrapper function for those compression > and decompression functions. Anyway making pg_lzdecompress return > the boolean value seems enough. Returning only a boolean is fine for me (that's what my first patch did), especially if at some point we add hooks for compression and decompression calls. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Dec 19, 2014 at 12:19 AM, Michael Paquier wrote: > On Thu, Dec 18, 2014 at 5:27 PM, Fujii Masao wrote: >> Thanks! > Thanks for your input. > >> +else >> +memcpy(compression_scratch, page, page_len); >> >> I don't think the block image needs to be copied to scratch buffer here. >> We can try to compress the "page" directly. > Check. > >> +#include "utils/pg_lzcompress.h" >> #include "utils/memutils.h" >> >> pg_lzcompress.h should be after meutils.h. > Oops. > >> +/* Scratch buffer used to store block image to-be-compressed */ >> +static char compression_scratch[PGLZ_MAX_BLCKSZ]; >> >> Isn't it better to allocate the memory for compression_scratch in >> InitXLogInsert() >> like hdr_scratch? > Because the OS would not touch it if wal_compression is never used, > but now that you mention it, it may be better to get that in the > context of xlog_insert.. > >> +uncompressed_page = (char *) palloc(PGLZ_RAW_SIZE(header)); >> >> Why don't we allocate the buffer for uncompressed page only once and >> keep reusing it like XLogReaderState->readBuf? The size of uncompressed >> page is at most BLCKSZ, so we can allocate the memory for it even before >> knowing the real size of each block image. > OK, this would save some cycles. I was trying to make process allocate > a minimum of memory only when necessary. > >> -printf(" (FPW); hole: offset: %u, length: %u\n", >> - record->blocks[block_id].hole_offset, >> - record->blocks[block_id].hole_length); >> +if (record->blocks[block_id].is_compressed) >> +printf(" (FPW); hole offset: %u, compressed length >> %u\n", >> + record->blocks[block_id].hole_offset, >> + record->blocks[block_id].bkp_len); >> +else >> +printf(" (FPW); hole offset: %u, length: %u\n", >> + record->blocks[block_id].hole_offset, >> + record->blocks[block_id].bkp_len); >> >> We need to consider what info about FPW we want pg_xlogdump to report. >> I'd like to calculate how much bytes FPW was compressed, from the report >> of pg_xlogdump. So I'd like to see also the both length of uncompressed FPW >> and that of compressed one in the report. > OK, so let's add a parameter in the decoder for the uncompressed > length. Sounds fine? > >> In pg_config.h, the comment of BLCKSZ needs to be updated? Because >> the maximum size of BLCKSZ can be affected by not only itemid but also >> XLogRecordBlockImageHeader. > Check. > >> boolhas_image; >> +boolis_compressed; >> >> Doesn't ResetDecoder need to reset is_compressed? > Check. > >> +#wal_compression = off# enable compression of full-page writes >> Currently wal_compression compresses only FPW, so isn't it better to place >> it after full_page_writes in postgresql.conf.sample? > Check. > >> +uint16extra_data;/* used to store offset of bytes in >> "hole", with >> + * last free bit used to check if block is >> + * compressed */ >> At least to me, defining something like the following seems more easy to >> read. >> uint16hole_offset:15, >> is_compressed:1 > Check++. > > Updated patches addressing all those things are attached. Thanks for updating the patch! Firstly I'm thinking to commit the 0001-Move-pg_lzcompress.c-to-src-common.patch. pg_lzcompress.h still exists in include/utils, but it should be moved to include/common? Do we really need PGLZ_Status? I'm not sure whether your categorization of the result status of compress/decompress functions is right or not. For example, pglz_decompress() can return PGLZ_INCOMPRESSIBLE status, but which seems invalid logically... 
Maybe this needs to be revisited when we introduce other compression algorithms and create the wrapper function for those compression and decompression functions. Anyway, making pg_lzdecompress return a boolean value seems enough. I updated 0001-Move-pg_lzcompress.c-to-src-common.patch accordingly. Barring objections, I will push the attached patch first. Regards, -- Fujii Masao From 92a7c2ac8a6a8ec6ae84c6713f8ad30b7958de94 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Tue, 25 Nov 2014 14:05:59 +0900 Subject: [PATCH] Move pg_lzcompress.c to src/common Exposing compression and decompression APIs of pglz makes possible its use by extensions and contrib modules. pglz_decompress contained a call to elog to emit an error message in case of corrupted data. This function is changed to return a status code to let its callers return an error instead. The compression function is changed similarly to make the whole set consistent. --- src/backend/access/heap/tuptoaster.c | 11 +- src/backend/utils/adt/Makefile | 4 +- src/backend/utils/adt/pg_lzcompress.c | 779
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello, >Updated patches addressing all those things are attached. Below are some performance numbers using the latest patch, 20141219_fpw_compression_v9. Compression looks promising, with reduced impact on CPU usage, tps and runtime. pgbench command: pgbench -c 16 -j 16 -r -t 25 -M prepared To ensure that the data is not highly compressible, the empty filler columns were altered using: alter table pgbench_accounts alter column filler type text using gen_random_uuid()::text
checkpoint_segments = 1024
checkpoint_timeout = 5min
fsync = on

                              Compression on            Compression off
 WAL generated                24558983188 (~24.56 GB)   35931217248 (~35.93 GB)
 Runtime                      5987.0 s                  5825.0 s
 TPS                          668.05                    686.69
 Latency average              23.935 ms                 23.211 ms
 Latency stddev               80.619 ms                 80.141 ms
 CPU usage (user)             916.64 s                  614.76 s
 CPU usage (system)           54.96 s                   64.14 s
 IO (average writes to disk)  10.43 MB                  12.5 MB
 IO (total writes to disk)    64268.94 MB               72920 MB

Reduction in WAL is around 32%. Reduction in total IO writes to disk is around 12%. Impact on runtime, tps and latency is minimal. CPU usage with compression on is increased by 49%, which is less than in earlier measurements. Server specifications: Processors: Intel® Xeon® Processor E5-2650 (2 GHz, 8C/16T, 20 MB) * 2 nos RAM: 32GB Disk: HDD 450GB 10K Hot Plug 2.5-inch SAS HDD * 8 nos 1 x 450 GB SAS HDD, 2.5-inch, 6Gb/s, 10,000 rpm Thank you, Rahila Syed
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Dec 18, 2014 at 5:27 PM, Fujii Masao wrote: > Thanks! Thanks for your input. > +else > +memcpy(compression_scratch, page, page_len); > > I don't think the block image needs to be copied to scratch buffer here. > We can try to compress the "page" directly. Check. > +#include "utils/pg_lzcompress.h" > #include "utils/memutils.h" > > pg_lzcompress.h should be after meutils.h. Oops. > +/* Scratch buffer used to store block image to-be-compressed */ > +static char compression_scratch[PGLZ_MAX_BLCKSZ]; > > Isn't it better to allocate the memory for compression_scratch in > InitXLogInsert() > like hdr_scratch? Because the OS would not touch it if wal_compression is never used, but now that you mention it, it may be better to get that in the context of xlog_insert.. > +uncompressed_page = (char *) palloc(PGLZ_RAW_SIZE(header)); > > Why don't we allocate the buffer for uncompressed page only once and > keep reusing it like XLogReaderState->readBuf? The size of uncompressed > page is at most BLCKSZ, so we can allocate the memory for it even before > knowing the real size of each block image. OK, this would save some cycles. I was trying to make process allocate a minimum of memory only when necessary. > -printf(" (FPW); hole: offset: %u, length: %u\n", > - record->blocks[block_id].hole_offset, > - record->blocks[block_id].hole_length); > +if (record->blocks[block_id].is_compressed) > +printf(" (FPW); hole offset: %u, compressed length %u\n", > + record->blocks[block_id].hole_offset, > + record->blocks[block_id].bkp_len); > +else > +printf(" (FPW); hole offset: %u, length: %u\n", > + record->blocks[block_id].hole_offset, > + record->blocks[block_id].bkp_len); > > We need to consider what info about FPW we want pg_xlogdump to report. > I'd like to calculate how much bytes FPW was compressed, from the report > of pg_xlogdump. So I'd like to see also the both length of uncompressed FPW > and that of compressed one in the report. OK, so let's add a parameter in the decoder for the uncompressed length. Sounds fine? > In pg_config.h, the comment of BLCKSZ needs to be updated? Because > the maximum size of BLCKSZ can be affected by not only itemid but also > XLogRecordBlockImageHeader. Check. > boolhas_image; > +boolis_compressed; > > Doesn't ResetDecoder need to reset is_compressed? Check. > +#wal_compression = off# enable compression of full-page writes > Currently wal_compression compresses only FPW, so isn't it better to place > it after full_page_writes in postgresql.conf.sample? Check. > +uint16extra_data;/* used to store offset of bytes in > "hole", with > + * last free bit used to check if block is > + * compressed */ > At least to me, defining something like the following seems more easy to > read. > uint16hole_offset:15, > is_compressed:1 Check++. Updated patches addressing all those things are attached. Regards, -- Michael 20141219_fpw_compression_v9.tar.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Dec 18, 2014 at 7:31 PM, Rahila Syed wrote: >>Isn't it better to allocate the memory for compression_scratch in >>InitXLogInsert() >>like hdr_scratch? > > I think making compression_scratch a statically allocated global variable > is the result of following discussion earlier, > http://www.postgresql.org/message-id/ca+tgmoaznbuwnls4bpwyqgqteeznoavy7kwdbm0a2-tbarn...@mail.gmail.com Yep, in this case the OS does not request this memory as long as it is not touched, like when wal_compression is off all the time in the backend. Robert mentioned that upthread. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Dec 18, 2014 at 7:31 PM, Rahila Syed wrote: >>Isn't it better to allocate the memory for compression_scratch in >>InitXLogInsert() >>like hdr_scratch? > > I think making compression_scratch a statically allocated global variable > is the result of following discussion earlier, > > http://www.postgresql.org/message-id/ca+tgmoaznbuwnls4bpwyqgqteeznoavy7kwdbm0a2-tbarn...@mail.gmail.com /* * Permanently allocate readBuf. We do it this way, rather than just * making a static array, for two reasons: (1) no need to waste the * storage in most instantiations of the backend; (2) a static char array * isn't guaranteed to have any particular alignment, whereas palloc() * will provide MAXALIGN'd storage. */ The above source code comment in XLogReaderAllocate() makes me think that it's better to avoid using a static array. The point (1) seems less important in this case because most processes need the buffer for WAL compression, though. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
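Following the XLogReaderAllocate() comment quoted above, the alternative under discussion would look roughly like this (a sketch, not the committed code; in practice the allocation would live in a permanent memory context, as InitXLogInsert() does for hdr_scratch, and PGLZ_MAX_BLCKSZ is the constant from the patch):

    static char *compression_scratch = NULL;

    void
    InitXLogInsertSketch(void)
    {
        /* palloc guarantees MAXALIGN'd storage, unlike a bare static array */
        if (compression_scratch == NULL)
            compression_scratch = palloc(PGLZ_MAX_BLCKSZ);
    }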
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
>Isn't it better to allocate the memory for compression_scratch in >InitXLogInsert() >like hdr_scratch? I think making compression_scratch a statically allocated global variable is the result of following discussion earlier, http://www.postgresql.org/message-id/ca+tgmoaznbuwnls4bpwyqgqteeznoavy7kwdbm0a2-tbarn...@mail.gmail.com Thank you, Rahila Syed On Thu, Dec 18, 2014 at 1:57 PM, Fujii Masao wrote: > > On Thu, Dec 18, 2014 at 2:21 PM, Michael Paquier > wrote: > > > > > > On Wed, Dec 17, 2014 at 11:33 PM, Rahila Syed > > wrote: > >> > >> I had a look at code. I have few minor points, > > > > Thanks! > > > >> + bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE; > >> + > >> + if (is_compressed) > >> { > >> - rdt_datas_last->data = page; > >> - rdt_datas_last->len = BLCKSZ; > >> + /* compressed block information */ > >> + bimg.length = compress_len; > >> + bimg.extra_data = hole_offset; > >> + bimg.extra_data |= XLR_BLCK_COMPRESSED_MASK; > >> > >> For consistency with the existing code , how about renaming the macro > >> XLR_BLCK_COMPRESSED_MASK as BKPBLOCK_HAS_COMPRESSED_IMAGE on the lines > of > >> BKPBLOCK_HAS_IMAGE. > > > > OK, why not... > > > >> > >> + blk->hole_offset = extra_data & > ~XLR_BLCK_COMPRESSED_MASK; > >> Here , I think that having the mask as BKPBLOCK_HOLE_OFFSET_MASK will be > >> more indicative of the fact that lower 15 bits of extra_data field > comprises > >> of hole_offset value. This suggestion is also just to achieve > consistency > >> with the existing BKPBLOCK_FORK_MASK for fork_flags field. > > > > Yeah that seems clearer, let's define it as ~XLR_BLCK_COMPRESSED_MASK > > though. > > > >> And comment typo > >> +* First try to compress block, filling in the page hole > with > >> zeros > >> +* to improve the compression of the whole. If the block is > >> considered > >> +* as incompressible, complete the block header information > as > >> if > >> +* nothing happened. > >> > >> As hole is no longer being compressed, this needs to be changed. > > > > Fixed. As well as an additional comment block down. > > > > A couple of things noticed on the fly: > > - Fixed pg_xlogdump being not completely correct to report the FPW > > information > > - A couple of typos and malformed sentences fixed > > - Added an assertion to check that the hole offset value does not the bit > > used for compression status > > - Reworked docs, mentioning as well that wal_compression is off by > default. > > - Removed stuff in pg_controldata and XLOG_PARAMETER_CHANGE (mentioned by > > Fujii-san) > > Thanks! > > +else > +memcpy(compression_scratch, page, page_len); > > I don't think the block image needs to be copied to scratch buffer here. > We can try to compress the "page" directly. > > +#include "utils/pg_lzcompress.h" > #include "utils/memutils.h" > > pg_lzcompress.h should be after meutils.h. > > +/* Scratch buffer used to store block image to-be-compressed */ > +static char compression_scratch[PGLZ_MAX_BLCKSZ]; > > Isn't it better to allocate the memory for compression_scratch in > InitXLogInsert() > like hdr_scratch? > > +uncompressed_page = (char *) palloc(PGLZ_RAW_SIZE(header)); > > Why don't we allocate the buffer for uncompressed page only once and > keep reusing it like XLogReaderState->readBuf? The size of uncompressed > page is at most BLCKSZ, so we can allocate the memory for it even before > knowing the real size of each block image. 
> > -printf(" (FPW); hole: offset: %u, length: %u\n", > - record->blocks[block_id].hole_offset, > - record->blocks[block_id].hole_length); > +if (record->blocks[block_id].is_compressed) > +printf(" (FPW); hole offset: %u, compressed length > %u\n", > + record->blocks[block_id].hole_offset, > + record->blocks[block_id].bkp_len); > +else > +printf(" (FPW); hole offset: %u, length: %u\n", > + record->blocks[block_id].hole_offset, > + record->blocks[block_id].bkp_len); > > We need to consider what info about FPW we want pg_xlogdump to report. > I'd like to calculate how much bytes FPW was compressed, from the report > of pg_xlogdump. So I'd like to see also the both length of uncompressed FPW > and that of compressed one in the report. > > In pg_config.h, the comment of BLCKSZ needs to be updated? Because > the maximum size of BLCKSZ can be affected by not only itemid but also > XLogRecordBlockImageHeader. > > boolhas_image; > +boolis_compressed; > > Doesn't ResetDecoder need to reset is_compressed? > > +#wal_compression = off# enable compression of full-page writes > > Currently wal_com
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Dec 18, 2014 at 2:21 PM, Michael Paquier wrote: > > > On Wed, Dec 17, 2014 at 11:33 PM, Rahila Syed > wrote: >> >> I had a look at code. I have few minor points, > > Thanks! > >> + bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE; >> + >> + if (is_compressed) >> { >> - rdt_datas_last->data = page; >> - rdt_datas_last->len = BLCKSZ; >> + /* compressed block information */ >> + bimg.length = compress_len; >> + bimg.extra_data = hole_offset; >> + bimg.extra_data |= XLR_BLCK_COMPRESSED_MASK; >> >> For consistency with the existing code , how about renaming the macro >> XLR_BLCK_COMPRESSED_MASK as BKPBLOCK_HAS_COMPRESSED_IMAGE on the lines of >> BKPBLOCK_HAS_IMAGE. > > OK, why not... > >> >> + blk->hole_offset = extra_data & ~XLR_BLCK_COMPRESSED_MASK; >> Here , I think that having the mask as BKPBLOCK_HOLE_OFFSET_MASK will be >> more indicative of the fact that lower 15 bits of extra_data field comprises >> of hole_offset value. This suggestion is also just to achieve consistency >> with the existing BKPBLOCK_FORK_MASK for fork_flags field. > > Yeah that seems clearer, let's define it as ~XLR_BLCK_COMPRESSED_MASK > though. > >> And comment typo >> +* First try to compress block, filling in the page hole with >> zeros >> +* to improve the compression of the whole. If the block is >> considered >> +* as incompressible, complete the block header information as >> if >> +* nothing happened. >> >> As hole is no longer being compressed, this needs to be changed. > > Fixed. As well as an additional comment block down. > > A couple of things noticed on the fly: > - Fixed pg_xlogdump being not completely correct to report the FPW > information > - A couple of typos and malformed sentences fixed > - Added an assertion to check that the hole offset value does not use the bit > used for compression status > - Reworked docs, mentioning as well that wal_compression is off by > default. > - Removed stuff in pg_controldata and XLOG_PARAMETER_CHANGE (mentioned by > Fujii-san) Thanks! +else +memcpy(compression_scratch, page, page_len); I don't think the block image needs to be copied to the scratch buffer here. We can try to compress the "page" directly. +#include "utils/pg_lzcompress.h" #include "utils/memutils.h" pg_lzcompress.h should be after memutils.h. +/* Scratch buffer used to store block image to-be-compressed */ +static char compression_scratch[PGLZ_MAX_BLCKSZ]; Isn't it better to allocate the memory for compression_scratch in InitXLogInsert() like hdr_scratch? +uncompressed_page = (char *) palloc(PGLZ_RAW_SIZE(header)); Why don't we allocate the buffer for the uncompressed page only once and keep reusing it like XLogReaderState->readBuf? The size of the uncompressed page is at most BLCKSZ, so we can allocate the memory for it even before knowing the real size of each block image. -printf(" (FPW); hole: offset: %u, length: %u\n", - record->blocks[block_id].hole_offset, - record->blocks[block_id].hole_length); +if (record->blocks[block_id].is_compressed) +printf(" (FPW); hole offset: %u, compressed length %u\n", + record->blocks[block_id].hole_offset, + record->blocks[block_id].bkp_len); +else +printf(" (FPW); hole offset: %u, length: %u\n", + record->blocks[block_id].hole_offset, + record->blocks[block_id].bkp_len); We need to consider what info about FPW we want pg_xlogdump to report. I'd like to calculate how many bytes FPW was compressed, from the report of pg_xlogdump. So I'd like to see both the length of the uncompressed FPW and that of the compressed one in the report.
In pg_config.h, doesn't the comment of BLCKSZ need to be updated? Because the maximum size of BLCKSZ can be affected by not only itemid but also XLogRecordBlockImageHeader. bool has_image; +bool is_compressed; Doesn't ResetDecoder need to reset is_compressed? +#wal_compression = off # enable compression of full-page writes Currently wal_compression compresses only FPW, so isn't it better to place it after full_page_writes in postgresql.conf.sample? +uint16 extra_data; /* used to store offset of bytes in "hole", with + * last free bit used to check if block is + * compressed */ At least to me, defining something like the following seems easier to read. uint16 hole_offset:15, is_compressed:1 Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
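A sketch of the suggested declaration (the struct name is illustrative; the patch's actual header carries more fields):

    typedef struct XLogRecordBlockImageHeaderSketch
    {
        uint16      length;             /* length of the stored block image */
        uint16      hole_offset:15,     /* number of bytes before "hole" */
                    is_compressed:1;    /* is the image pglz-compressed? */
    } XLogRecordBlockImageHeaderSketch;

One caveat worth noting: bit-field placement is left to the compiler by the C standard, which is why explicit masking over a plain uint16 (as the patch currently does) is the more defensive choice for data that goes to disk.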
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Dec 17, 2014 at 11:33 PM, Rahila Syed wrote:
> I had a look at code. I have few minor points,
Thanks!
> + bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;
> +
> + if (is_compressed)
> {
> - rdt_datas_last->data = page;
> - rdt_datas_last->len = BLCKSZ;
> + /* compressed block information */
> + bimg.length = compress_len;
> + bimg.extra_data = hole_offset;
> + bimg.extra_data |= XLR_BLCK_COMPRESSED_MASK;
>
> For consistency with the existing code , how about renaming the macro
> XLR_BLCK_COMPRESSED_MASK as BKPBLOCK_HAS_COMPRESSED_IMAGE on the lines of
> BKPBLOCK_HAS_IMAGE.
OK, why not...
> + blk->hole_offset = extra_data & ~XLR_BLCK_COMPRESSED_MASK;
> Here , I think that having the mask as BKPBLOCK_HOLE_OFFSET_MASK will be
> more indicative of the fact that lower 15 bits of extra_data field
> comprises of hole_offset value. This suggestion is also just to achieve
> consistency with the existing BKPBLOCK_FORK_MASK for fork_flags field.
Yeah that seems clearer, let's define it as ~XLR_BLCK_COMPRESSED_MASK though.
> And comment typo
> +* First try to compress block, filling in the page hole with zeros
> +* to improve the compression of the whole. If the block is considered
> +* as incompressible, complete the block header information as if
> +* nothing happened.
>
> As hole is no longer being compressed, this needs to be changed.
Fixed. As well as an additional comment block down.
A couple of things noticed on the fly:
- Fixed pg_xlogdump being not completely correct to report the FPW information
- A couple of typos and malformed sentences fixed
- Added an assertion to check that the hole offset value does not use the bit reserved for the compression status
- Reworked docs, mentioning as well that wal_compression is off by default.
- Removed stuff in pg_controldata and XLOG_PARAMETER_CHANGE (mentioned by Fujii-san)
Regards, -- Michael 20141218_fpw_compression_v8.tar.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
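To make the extra_data layout concrete, here is a sketch of the masking scheme settled on above (the 0x8000 mask value is an assumption; the committed names may differ):

    #define XLR_BLCK_COMPRESSED_MASK    0x8000  /* assumed: the spare high bit */

    static inline uint16
    pack_extra_data(uint16 hole_offset, bool is_compressed)
    {
        /* the assertion mentioned above: the offset must fit in 15 bits */
        Assert((hole_offset & XLR_BLCK_COMPRESSED_MASK) == 0);
        return hole_offset | (is_compressed ? XLR_BLCK_COMPRESSED_MASK : 0);
    }

    static inline uint16
    extra_data_hole_offset(uint16 extra_data)
    {
        return extra_data & ~XLR_BLCK_COMPRESSED_MASK;
    }

This hand-rolled masking keeps the on-disk layout explicit, in contrast to the bitfield layout suggested in the review above, whose exact bit placement is left to the compiler.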
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Dec 18, 2014 at 1:05 PM, Fujii Masao wrote: > On Wed, Dec 17, 2014 at 1:34 AM, Michael Paquier > wrote: > I think that neither pg_control nor xl_parameter_change need to have the info > about WAL compression because each backup block has that entry. > > Will review the remaining part later. I started wondering about the utility of this part earlier this morning, as it's a remnant of when wal_compression was set as PGC_POSTMASTER. Will remove. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Dec 17, 2014 at 1:34 AM, Michael Paquier wrote:
> On Wed, Dec 17, 2014 at 12:00 AM, Michael Paquier wrote:
>> Actually, the original length of the compressed block is saved in
>> PGLZ_Header, so if we are fine to not check the size of the block
>> decompressed when decoding WAL we can do without the hole filled with zeros,
>> and use only 1 bit to see if the block is compressed or not.
> And.. After some more hacking, I have been able to come up with a patch that
> is able to compress blocks without the page hole, and that keeps the WAL
> record length the same as HEAD when compression switch is off. The numbers
> are pretty good, CPU is saved in the same proportions as previous patches
> when compression is enabled, and there is zero delta with HEAD when
> compression switch is off.
>
> Here are the actual numbers:
>            test_name          | pg_size_pretty | user_diff | system_diff
> ------------------------------+----------------+-----------+-------------
> FPW on + 2 bytes, ffactor 50  | 582 MB         | 42.391894 |    0.807444
> FPW on + 2 bytes, ffactor 20  | 229 MB         | 14.330304 |    0.729626
> FPW on + 2 bytes, ffactor 10  | 117 MB         |  7.335442 |    0.570996
> FPW off + 2 bytes, ffactor 50 | 746 MB         | 25.330391 |    1.248503
> FPW off + 2 bytes, ffactor 20 | 293 MB         | 10.537475 |    0.755448
> FPW off + 2 bytes, ffactor 10 | 148 MB         |  5.762775 |    0.763761
> FPW on + 0 bytes, ffactor 50  | 582 MB         | 42.174297 |    0.790596
> FPW on + 0 bytes, ffactor 20  | 229 MB         | 14.424233 |    0.770459
> FPW on + 0 bytes, ffactor 10  | 117 MB         |  7.057195 |    0.584806
> FPW off + 0 bytes, ffactor 50 | 746 MB         | 25.261998 |    1.054516
> FPW off + 0 bytes, ffactor 20 | 293 MB         | 10.589888 |    0.860207
> FPW off + 0 bytes, ffactor 10 | 148 MB         |  5.827191 |    0.874285
> HEAD, ffactor 50              | 746 MB         | 25.181729 |    1.133433
> HEAD, ffactor 20              | 293 MB         |  9.962242 |    0.765970
> HEAD, ffactor 10              | 148 MB         |  5.693426 |    0.775371
> Record, ffactor 50            | 582 MB         | 54.904374 |    0.678204
> Record, ffactor 20            | 229 MB         | 19.798268 |    0.807220
> Record, ffactor 10            | 116 MB         |  9.401877 |    0.668454
> (18 rows)
>
> The new tests of this patch are "FPW off + 0 bytes". Patches as well as
> results are attached.
I think that neither pg_control nor xl_parameter_change need to have the info about WAL compression because each backup block has that entry. Will review the remaining part later. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello, >Patches as well as results are attached. I had a look at the code. I have a few minor points, + bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE; + + if (is_compressed) { - rdt_datas_last->data = page; - rdt_datas_last->len = BLCKSZ; + /* compressed block information */ + bimg.length = compress_len; + bimg.extra_data = hole_offset; + bimg.extra_data |= XLR_BLCK_COMPRESSED_MASK; For consistency with the existing code, how about renaming the macro XLR_BLCK_COMPRESSED_MASK as BKPBLOCK_HAS_COMPRESSED_IMAGE, on the lines of BKPBLOCK_HAS_IMAGE. + blk->hole_offset = extra_data & ~XLR_BLCK_COMPRESSED_MASK; Here, I think that having the mask as BKPBLOCK_HOLE_OFFSET_MASK will be more indicative of the fact that the lower 15 bits of the extra_data field comprise the hole_offset value. This suggestion is also just to achieve consistency with the existing BKPBLOCK_FORK_MASK for the fork_flags field. And a comment typo: +* First try to compress block, filling in the page hole with zeros +* to improve the compression of the whole. If the block is considered +* as incompressible, complete the block header information as if +* nothing happened. As the hole is no longer being compressed, this needs to be changed. Thank you, Rahila Syed On Tue, Dec 16, 2014 at 10:04 PM, Michael Paquier wrote:
> On Wed, Dec 17, 2014 at 12:00 AM, Michael Paquier < michael.paqu...@gmail.com> wrote:
>> Actually, the original length of the compressed block is saved in
>> PGLZ_Header, so if we are fine to not check the size of the block
>> decompressed when decoding WAL we can do without the hole filled with
>> zeros, and use only 1 bit to see if the block is compressed or not.
> And.. After some more hacking, I have been able to come up with a patch
> that is able to compress blocks without the page hole, and that keeps the
> WAL record length the same as HEAD when compression switch is off. The
> numbers are pretty good, CPU is saved in the same proportions as previous
> patches when compression is enabled, and there is zero delta with HEAD when
> compression switch is off.
>
> Here are the actual numbers:
>            test_name          | pg_size_pretty | user_diff | system_diff
> ------------------------------+----------------+-----------+-------------
> FPW on + 2 bytes, ffactor 50  | 582 MB         | 42.391894 |    0.807444
> FPW on + 2 bytes, ffactor 20  | 229 MB         | 14.330304 |    0.729626
> FPW on + 2 bytes, ffactor 10  | 117 MB         |  7.335442 |    0.570996
> FPW off + 2 bytes, ffactor 50 | 746 MB         | 25.330391 |    1.248503
> FPW off + 2 bytes, ffactor 20 | 293 MB         | 10.537475 |    0.755448
> FPW off + 2 bytes, ffactor 10 | 148 MB         |  5.762775 |    0.763761
> FPW on + 0 bytes, ffactor 50  | 582 MB         | 42.174297 |    0.790596
> FPW on + 0 bytes, ffactor 20  | 229 MB         | 14.424233 |    0.770459
> FPW on + 0 bytes, ffactor 10  | 117 MB         |  7.057195 |    0.584806
> FPW off + 0 bytes, ffactor 50 | 746 MB         | 25.261998 |    1.054516
> FPW off + 0 bytes, ffactor 20 | 293 MB         | 10.589888 |    0.860207
> FPW off + 0 bytes, ffactor 10 | 148 MB         |  5.827191 |    0.874285
> HEAD, ffactor 50              | 746 MB         | 25.181729 |    1.133433
> HEAD, ffactor 20              | 293 MB         |  9.962242 |    0.765970
> HEAD, ffactor 10              | 148 MB         |  5.693426 |    0.775371
> Record, ffactor 50            | 582 MB         | 54.904374 |    0.678204
> Record, ffactor 20            | 229 MB         | 19.798268 |    0.807220
> Record, ffactor 10            | 116 MB         |  9.401877 |    0.668454
> (18 rows)
>
> The new tests of this patch are "FPW off + 0 bytes". Patches as well as
> results are attached.
> Regards,
> --
> Michael
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Dec 17, 2014 at 12:12 AM, Merlin Moncure wrote:
> Compressing *one* file with lz4 and a quick/n/dirty pglz I hacked out
> of the source (borrowing heavily from
> https://github.com/maropu/pglz_bench/blob/master/pglz_bench.cpp), I
> tested the results:
>
> lz4 real time:  0m0.032s
> pglz real time: 0m0.281s
>
> mmoncure@mernix2 ~/src/lz4/lz4-r125 $ ls -lh test.*
> -rw-r--r-- 1 mmoncure mmoncure 2.7M Dec 16 09:04 test.lz4
> -rw-r--r-- 1 mmoncure mmoncure 2.5M Dec 16 09:01 test.pglz
>
> A better test would examine all manner of different xlog files in a
> fashion closer to how your patch would need to compress them, but the
> numbers here tell a fairly compelling story: similar compression
> results for around 9x the CPU usage.

Yes, that could be better... Thanks for the real numbers, that's really
informative.

> Be advised that compression alg selection is one of those types of
> discussions that tends to spin off into outer space; that's not something
> you have to solve today. Just try and make things so that they can be
> switched out if things change.

One way to get around that would be a set of hooks to allow people to set
up the compression algorithm they want:
- One for buffer compression
- One for buffer decompression
- Perhaps one to set up the size of the scratch buffer used for
compression/decompression. In the case of pglz, this needs for example to
be PGLZ_MAX_OUTPUT(buffer_size).
The actually tricky part is that as xlogreader.c is used by pg_xlogdump,
we cannot directly use a hook in it, but we may as well be able to live
with a fixed maximum size like BLCKSZ * 2 by default; this would leave
largely enough room for all kinds of compression algos IMO (see the sketch
after this message).

Regards,
--
Michael
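To give the hook idea above a concrete shape, here is a minimal sketch; all of these names and signatures are hypothetical and none of them exist in PostgreSQL:

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical hook points for pluggable block compression.
     * A negative return value could signal "incompressible, give up". */
    typedef int32_t (*wal_compress_hook_type) (const char *source, int32_t slen,
                                               char *dest, int32_t dest_size);
    typedef int32_t (*wal_decompress_hook_type) (const char *source, int32_t slen,
                                                 char *dest, int32_t rawsize);
    typedef size_t (*wal_scratch_size_hook_type) (size_t block_size);

    /* NULL would mean "fall back to the built-in pglz routines". */
    static wal_compress_hook_type wal_compress_hook = NULL;
    static wal_decompress_hook_type wal_decompress_hook = NULL;
    static wal_scratch_size_hook_type wal_scratch_size_hook = NULL;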
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Dec 17, 2014 at 12:00 AM, Michael Paquier wrote:
> Actually, the original length of the compressed block is saved in
> PGLZ_Header, so if we are fine with not checking the size of the block
> decompressed when decoding WAL, we can do without the hole filled with
> zeros, and use only 1 bit to see if the block is compressed or not.

And... after some more hacking, I have been able to come up with a patch
that is able to compress blocks without the page hole, and that keeps the
WAL record length the same as HEAD when the compression switch is off. The
numbers are pretty good: CPU is saved in the same proportions as with
previous patches when compression is enabled, and there is zero delta with
HEAD when the compression switch is off.

Here are the actual numbers:
           test_name           | pg_size_pretty | user_diff | system_diff
-------------------------------+----------------+-----------+-------------
 FPW on + 2 bytes, ffactor 50  | 582 MB         | 42.391894 |    0.807444
 FPW on + 2 bytes, ffactor 20  | 229 MB         | 14.330304 |    0.729626
 FPW on + 2 bytes, ffactor 10  | 117 MB         |  7.335442 |    0.570996
 FPW off + 2 bytes, ffactor 50 | 746 MB         | 25.330391 |    1.248503
 FPW off + 2 bytes, ffactor 20 | 293 MB         | 10.537475 |    0.755448
 FPW off + 2 bytes, ffactor 10 | 148 MB         |  5.762775 |    0.763761
 FPW on + 0 bytes, ffactor 50  | 582 MB         | 42.174297 |    0.790596
 FPW on + 0 bytes, ffactor 20  | 229 MB         | 14.424233 |    0.770459
 FPW on + 0 bytes, ffactor 10  | 117 MB         |  7.057195 |    0.584806
 FPW off + 0 bytes, ffactor 50 | 746 MB         | 25.261998 |    1.054516
 FPW off + 0 bytes, ffactor 20 | 293 MB         | 10.589888 |    0.860207
 FPW off + 0 bytes, ffactor 10 | 148 MB         |  5.827191 |    0.874285
 HEAD, ffactor 50              | 746 MB         | 25.181729 |    1.133433
 HEAD, ffactor 20              | 293 MB         |  9.962242 |    0.765970
 HEAD, ffactor 10              | 148 MB         |  5.693426 |    0.775371
 Record, ffactor 50            | 582 MB         | 54.904374 |    0.678204
 Record, ffactor 20            | 229 MB         | 19.798268 |    0.807220
 Record, ffactor 10            | 116 MB         |  9.401877 |    0.668454
(18 rows)

The new tests of this patch are "FPW off + 0 bytes". Patches as well as
results are attached.

Regards,
--
Michael

results.sql
Description: Binary data

20141216_fpw_compression_v7.tar.gz
Description: GNU Zip compressed data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Dec 15, 2014 at 5:37 PM, Michael Paquier wrote:
> On Tue, Dec 16, 2014 at 5:14 AM, Merlin Moncure wrote:
>> OTOH, our built-in compressor as we all know is a complete dog in
>> terms of CPU when stacked up against some more modern implementations.
>> All that said, as long as there is a clean path to migrating to
>> another compression alg should one materialize, that problem can be
>> nicely decoupled from this patch as Robert pointed out.
> I am curious to see some numbers about that. Has anyone done such
> comparison measurements?

I don't have any, but I can make some. There are some numbers on the web,
but it's better to make new ones because IIRC some light optimization has
gone into pglz of late.

Compressing *one* file with lz4 and a quick/n/dirty pglz I hacked out
of the source (borrowing heavily from
https://github.com/maropu/pglz_bench/blob/master/pglz_bench.cpp), I
tested the results:

lz4 real time:  0m0.032s
pglz real time: 0m0.281s

mmoncure@mernix2 ~/src/lz4/lz4-r125 $ ls -lh test.*
-rw-r--r-- 1 mmoncure mmoncure 2.7M Dec 16 09:04 test.lz4
-rw-r--r-- 1 mmoncure mmoncure 2.5M Dec 16 09:01 test.pglz

A better test would examine all manner of different xlog files in a
fashion closer to how your patch would need to compress them, but the
numbers here tell a fairly compelling story: similar compression results
for around 9x the CPU usage. Be advised that compression alg selection is
one of those types of discussions that tends to spin off into outer space;
that's not something you have to solve today. Just try and make things so
that they can be switched out if things change.

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Dec 16, 2014 at 11:30 PM, Michael Paquier wrote:
> On Tue, Dec 16, 2014 at 11:24 PM, Alvaro Herrera wrote:
>> Michael Paquier wrote:
>>> And here are attached fresh patches reducing the WAL record size to
>>> what it is in head when the compression switch is off. Looking at the
>>> logic in xlogrecord.h, the block header stores the hole length and hole
>>> offset. I changed that a bit to store the length of the raw block, with
>>> hole or compressed, as the 1st uint16. The second uint16 is used to
>>> store the hole offset, same as HEAD when the compression switch is off.
>>> When compression is on, a special value 0xFFFF is saved (actually only
>>> filling 1 in the 16th bit is fine...). Note that this forces to fill in
>>> the hole with zeros and to compress always BLCKSZ worth of data.
>>
>> Why do we compress the hole? This seems pointless, considering that we
>> know it's all zeroes. Is it possible to compress the head and tail of
>> the page separately?
>
> This would take 2 additional bytes at minimum in the block header,
> resulting in 8 additional bytes in the record each time an FPW shows up.
> IMO it is important to check the length of things obtained when replaying
> WAL; that's something the current code of HEAD does quite well.

Actually, the original length of the compressed block is saved in
PGLZ_Header, so if we are fine with not checking the size of the block
decompressed when decoding WAL, we can do without the hole filled with
zeros, and use only 1 bit to see if the block is compressed or not.
--
Michael
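The "do without the hole filled with zeros" idea above amounts to handing the compressor only the head and tail of the page. A minimal sketch of such a helper, assuming the default BLCKSZ of 8192 and a purely illustrative function name:

    #include <stdint.h>
    #include <string.h>

    #define BLCKSZ 8192    /* assumed default page size */

    /* Copy the page head and tail around the hole into a scratch buffer,
     * so only BLCKSZ - hole_length bytes are handed to the compressor
     * instead of a full page padded with zeros. Returns the length of the
     * assembled data. */
    static int32_t
    copy_page_without_hole(const char *page, uint16_t hole_offset,
                           uint16_t hole_length, char *scratch)
    {
        memcpy(scratch, page, hole_offset);
        memcpy(scratch + hole_offset,
               page + hole_offset + hole_length,
               BLCKSZ - (hole_offset + hole_length));
        return (int32_t) (BLCKSZ - hole_length);
    }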
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Dec 16, 2014 at 11:24 PM, Alvaro Herrera wrote:
> Michael Paquier wrote:
>> And here are attached fresh patches reducing the WAL record size to what
>> it is in head when the compression switch is off. Looking at the logic
>> in xlogrecord.h, the block header stores the hole length and hole
>> offset. I changed that a bit to store the length of the raw block, with
>> hole or compressed, as the 1st uint16. The second uint16 is used to
>> store the hole offset, same as HEAD when the compression switch is off.
>> When compression is on, a special value 0xFFFF is saved (actually only
>> filling 1 in the 16th bit is fine...). Note that this forces to fill in
>> the hole with zeros and to compress always BLCKSZ worth of data.
>
> Why do we compress the hole? This seems pointless, considering that we
> know it's all zeroes. Is it possible to compress the head and tail of
> the page separately?

This would take 2 additional bytes at minimum in the block header,
resulting in 8 additional bytes in the record each time an FPW shows up.
IMO it is important to check the length of things obtained when replaying
WAL; that's something the current code of HEAD does quite well.
--
Michael
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Michael Paquier wrote:
> And here are attached fresh patches reducing the WAL record size to what
> it is in head when the compression switch is off. Looking at the logic
> in xlogrecord.h, the block header stores the hole length and hole
> offset. I changed that a bit to store the length of the raw block, with
> hole or compressed, as the 1st uint16. The second uint16 is used to
> store the hole offset, same as HEAD when the compression switch is off.
> When compression is on, a special value 0xFFFF is saved (actually only
> filling 1 in the 16th bit is fine...). Note that this forces to fill in
> the hole with zeros and to compress always BLCKSZ worth of data.

Why do we compress the hole? This seems pointless, considering that we
know it's all zeroes. Is it possible to compress the head and tail of
the page separately?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Dec 16, 2014 at 8:35 AM, Michael Paquier wrote:
> On Tue, Dec 16, 2014 at 3:46 AM, Robert Haas wrote:
>> On Sat, Dec 13, 2014 at 9:36 AM, Michael Paquier wrote:
>>> Something to be aware of btw is that this patch introduces an
>>> additional 8 bytes per block image in WAL as it contains additional
>>> information to control the compression. In this case this is the
>>> uint16 compress_len present in XLogRecordBlockImageHeader. In the case
>>> of the measurements done, knowing that 63638 FPWs have been written,
>>> there is a difference of a bit less than 500k in WAL between HEAD and
>>> "FPW off" in favor of HEAD. The gain with compression is welcome,
>>> still for the default there is a small price to track down if a block
>>> is compressed or not. This patch still takes advantage of it by not
>>> compressing the hole present in the page and reducing CPU work a bit.
>>
>> That sounds like a pretty serious problem to me.
> OK. If that's so much of a problem, I'll switch back to the version using
> 1 bit in the block header to identify whether a block is compressed or
> not. This way, when the switch is off the record length will be the same
> as in HEAD.

And here are attached fresh patches reducing the WAL record size to what it
is in head when the compression switch is off. Looking at the logic in
xlogrecord.h, the block header stores the hole length and hole offset. I
changed that a bit to store the length of the raw block, with hole or
compressed, as the 1st uint16. The second uint16 is used to store the hole
offset, same as HEAD when the compression switch is off. When compression
is on, a special value 0xFFFF is saved (actually only filling 1 in the 16th
bit is fine...). Note that this forces to fill in the hole with zeros and
to compress always BLCKSZ worth of data.

Those patches pass make check-world, even WAL replay on standbys. I have
done measurements using this patch set as well, with the following things
worth noticing:
- When the compression switch is off, the same quantity of WAL as HEAD is
produced.
- pglz is very bad at compressing the page hole. I mean, really bad. Have a
look at the user CPU, particularly when pages are empty, and you'll
understand... Other compression algorithms would be better here.

Tests are done with various values of fillfactor: 10 means that after the
update 80% of the page is empty, while at 50 the page is more or less
completely full. Here are the results, with the following test cases:
- "FPW on + 2 bytes": compression switch on, using 2 additional bytes in
the block header, resulting in longer WAL records, as 8 more bytes are used
per block, but with lower CPU usage, as page holes are not compressed by
pglz.
- "FPW off + 2 bytes": same as previous, with the compression switch set
to off.
- "FPW on + 0 bytes": compression switch on, with the same block header
size as HEAD, at the cost of compressing page holes filled with zeros.
- "FPW off + 0 bytes": compression switch off, same as previous.
- "HEAD": unpatched master (except with the hack to calculate user and
system CPU).
- "Record": the record-level compression, with the compression lower bound
set at 0.
=# select test || ', ffactor ' || ffactor, pg_size_pretty(post_update - pre_update),
   user_diff, system_diff from results;
           ?column?            | pg_size_pretty | user_diff | system_diff
-------------------------------+----------------+-----------+-------------
 FPW on + 2 bytes, ffactor 50  | 582 MB         | 42.391894 |    0.807444
 FPW on + 2 bytes, ffactor 20  | 229 MB         | 14.330304 |    0.729626
 FPW on + 2 bytes, ffactor 10  | 117 MB         |  7.335442 |    0.570996
 FPW off + 2 bytes, ffactor 50 | 746 MB         | 25.330391 |    1.248503
 FPW off + 2 bytes, ffactor 20 | 293 MB         | 10.537475 |    0.755448
 FPW off + 2 bytes, ffactor 10 | 148 MB         |  5.762775 |    0.763761
 FPW on + 0 bytes, ffactor 50  | 585 MB         | 54.115496 |    0.924891
 FPW on + 0 bytes, ffactor 20  | 234 MB         | 26.270404 |    0.755862
 FPW on + 0 bytes, ffactor 10  | 122 MB         | 19.540131 |    0.800981
 FPW off + 0 bytes, ffactor 50 | 746 MB         | 25.102241 |    1.110677
 FPW off + 0 bytes, ffactor 20 | 293 MB         |  9.889374 |    0.749884
 FPW off + 0 bytes, ffactor 10 | 148 MB         |  5.286767 |    0.682746
 HEAD, ffactor 50              | 746 MB         | 25.181729 |    1.133433
 HEAD, ffactor 20              | 293 MB         |  9.962242 |    0.765970
 HEAD, ffactor 10              | 148 MB         |  5.693426 |    0.775371
 Record, ffactor 50            | 582 MB         | 54.904374 |    0.678204
 Record, ffactor 20            | 229 MB         | 19.798268 |    0.807220
 Record, ffactor 10            | 116 MB         |  9.401877 |    0.668454
(18 rows)

Attached are as well the results of the measurements, and the test case used.

Regards,
--
Michael

20141216_fpw_compression_v7.tar.gz
Description: GNU Zip compressed data

results.sql
Description: Binary data

test_compress
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
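As a reading aid for the two-uint16 header encoding described in this message, here is a minimal sketch of the layout with the 0xFFFF sentinel; the struct and function names are illustrative assumptions, not the patch's actual definitions:

    #include <stdint.h>

    /* Assumed decoding of the two uint16 header fields described above: the
     * first carries the stored length (raw block with hole, or compressed),
     * the second carries the hole offset, with 0xFFFF as a sentinel meaning
     * "the image is compressed". */
    typedef struct BlockImageInfo
    {
        uint16_t stored_len;   /* length of the data following the header */
        uint16_t hole_offset;  /* offset of the page hole, or the sentinel */
    } BlockImageInfo;

    static int
    block_image_is_compressed(const BlockImageInfo *info)
    {
        return info->hole_offset == 0xFFFF;
    }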
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Dec 16, 2014 at 5:14 AM, Merlin Moncure wrote:
> OTOH, our built-in compressor as we all know is a complete dog in
> terms of CPU when stacked up against some more modern implementations.
> All that said, as long as there is a clean path to migrating to
> another compression alg should one materialize, that problem can be
> nicely decoupled from this patch as Robert pointed out.

I am curious to see some numbers about that. Has anyone done such
comparison measurements?
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Dec 16, 2014 at 3:46 AM, Robert Haas wrote:
> On Sat, Dec 13, 2014 at 9:36 AM, Michael Paquier wrote:
>> Something to be aware of btw is that this patch introduces an
>> additional 8 bytes per block image in WAL as it contains additional
>> information to control the compression. In this case this is the
>> uint16 compress_len present in XLogRecordBlockImageHeader. In the case
>> of the measurements done, knowing that 63638 FPWs have been written,
>> there is a difference of a bit less than 500k in WAL between HEAD and
>> "FPW off" in favor of HEAD. The gain with compression is welcome,
>> still for the default there is a small price to track down if a block
>> is compressed or not. This patch still takes advantage of it by not
>> compressing the hole present in the page and reducing CPU work a bit.
>
> That sounds like a pretty serious problem to me.

OK. If that's so much of a problem, I'll switch back to the version using
1 bit in the block header to identify whether a block is compressed or not.
This way, when the switch is off the record length will be the same as in
HEAD.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Dec 12, 2014 at 8:27 AM, Andres Freund wrote:
> On 2014-12-12 09:24:27 -0500, Bruce Momjian wrote:
>> On Fri, Dec 12, 2014 at 03:22:24PM +0100, Andres Freund wrote:
>>>> Well, the larger question is why wouldn't we just have the user
>>>> compress the entire WAL file before archiving --- why have each
>>>> backend do it? Is it the write volume we are saving? I thought this
>>>> WAL compression gave better performance in some cases.
>>>
>>> Err. Streaming?
>>
>> Well, you can already set up SSL for compression while streaming. In
>> fact, I assume many are already using SSL for streaming as the majority
>> of SSL overhead is from connection start.
>
> That's not really true. The overhead of SSL during streaming is
> *significant*. Both the kind of compression it does (which is far more
> expensive than pglz or lz4) and the encryption itself. In many cases
> it's prohibitively expensive - there are even a fair number of on-list
> reports about this.

(late to the party)

That may be true, but there are a number of ways to work around SSL
performance issues, such as hardware acceleration (perhaps deferring
encryption to another point in the network), weakening the protocol, or
not using it at all. OTOH, our built-in compressor as we all know is a
complete dog in terms of CPU when stacked up against some more modern
implementations. All that said, as long as there is a clean path to
migrating to another compression alg should one materialize, that problem
can be nicely decoupled from this patch, as Robert pointed out.

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Sat, Dec 13, 2014 at 9:36 AM, Michael Paquier wrote:
> Something to be aware of btw is that this patch introduces an
> additional 8 bytes per block image in WAL as it contains additional
> information to control the compression. In this case this is the
> uint16 compress_len present in XLogRecordBlockImageHeader. In the case
> of the measurements done, knowing that 63638 FPWs have been written,
> there is a difference of a bit less than 500k in WAL between HEAD and
> "FPW off" in favor of HEAD. The gain with compression is welcome,
> still for the default there is a small price to track down if a block
> is compressed or not. This patch still takes advantage of it by not
> compressing the hole present in the page and reducing CPU work a bit.

That sounds like a pretty serious problem to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Note: this patch has been moved to CF 2014-12, and I have marked myself as
an author, if that's fine... I've ended up being quite involved in it.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Sun, Dec 14, 2014 at 1:16 PM, Andres Freund wrote:
> On 2014-12-14 09:56:59 +0900, Michael Paquier wrote:
>> On Sun, Dec 14, 2014 at 5:45 AM, Simon Riggs wrote:
>>> On 13 December 2014 at 14:36, Michael Paquier wrote:
>>>> Something to be aware of btw is that this patch introduces an
>>>> additional 8 bytes per block image in WAL as it contains additional
>>>> information to control the compression. In this case this is the
>>>> uint16 compress_len present in XLogRecordBlockImageHeader.
>>>
>>> So we add 8 bytes to all FPWs, or only for compressed FPWs?
>> In this case that was all. We could still use xl_info to put a flag
>> telling that blocks are compressed, but it feels more consistent to
>> have a way to identify if a block is compressed inside its own header.
>
> Your 'consistency' argument doesn't convince me.

Could you be more precise (perhaps my use of the word "consistent" was
incorrect here)? Isn't it most natural to have the compression information
of each block in its own header? There may be blocks that are marked as
incompressible in a whole set, so we need to track for each block
individually whether it is compressed.

Now, instead of an additional uint16 to store the compressed length of the
block, we can take 1 bit from hole_length and 1 bit from hole_offset to
store a status flag deciding if a block is compressed. If we do so, the
tradeoff is to fill in the block hole with zeros and compress BLCKSZ worth
of data all the time, costing more CPU. But doing so we would still use
only 4 bytes for the block information, making the default case, aka
compression switch off, behave like HEAD in terms of pure record quantity.
This second method has been mentioned upthread a couple of times as well.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
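To make the bit-stealing alternative above concrete, here is a minimal sketch under the assumption that both hole_offset and hole_length fit in 15 bits for the default BLCKSZ of 8192; the names are illustrative only and do not come from any patch:

    #include <stdint.h>

    /* Assumed variant: steal the high bit of each uint16 field, since both
     * values fit in 15 bits when BLCKSZ is 8192. */
    #define BLK_FIELD_FLAG  0x8000
    #define BLK_FIELD_MASK  0x7FFF

    static void
    decode_hole_fields(uint16_t raw_offset, uint16_t raw_length,
                       uint16_t *hole_offset, uint16_t *hole_length,
                       int *is_compressed)
    {
        /* The stolen bit of raw_offset flags compression; the stolen bit of
         * raw_length stays free for other uses. */
        *is_compressed = (raw_offset & BLK_FIELD_FLAG) != 0;
        *hole_offset = (uint16_t) (raw_offset & BLK_FIELD_MASK);
        *hole_length = (uint16_t) (raw_length & BLK_FIELD_MASK);
    }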
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 2014-12-14 09:56:59 +0900, Michael Paquier wrote: > On Sun, Dec 14, 2014 at 5:45 AM, Simon Riggs wrote: > > On 13 December 2014 at 14:36, Michael Paquier > > wrote: > > > >> Something to be aware of btw is that this patch introduces an > >> additional 8 bytes per block image in WAL as it contains additional > >> information to control the compression. In this case this is the > >> uint16 compress_len present in XLogRecordBlockImageHeader. > > > > So we add 8 bytes to all FPWs, or only for compressed FPWs? > In this case that was all. We could still use xl_info to put a flag > telling that blocks are compressed, but it feels more consistent to > have a way to identify if a block is compressed inside its own header. Your 'consistency' argument doesn't convince me. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Sun, Dec 14, 2014 at 5:45 AM, Simon Riggs wrote: > On 13 December 2014 at 14:36, Michael Paquier > wrote: > >> Something to be aware of btw is that this patch introduces an >> additional 8 bytes per block image in WAL as it contains additional >> information to control the compression. In this case this is the >> uint16 compress_len present in XLogRecordBlockImageHeader. > > So we add 8 bytes to all FPWs, or only for compressed FPWs? In this case that was all. We could still use xl_info to put a flag telling that blocks are compressed, but it feels more consistent to have a way to identify if a block is compressed inside its own header. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 13 December 2014 at 14:36, Michael Paquier wrote: > Something to be aware of btw is that this patch introduces an > additional 8 bytes per block image in WAL as it contains additional > information to control the compression. In this case this is the > uint16 compress_len present in XLogRecordBlockImageHeader. So we add 8 bytes to all FPWs, or only for compressed FPWs? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Dec 12, 2014 at 11:50 PM, Michael Paquier wrote:
> On Wed, Dec 10, 2014 at 11:25 PM, Bruce Momjian wrote:
>> On Wed, Dec 10, 2014 at 07:40:46PM +0530, Rahila Syed wrote:
>>> The tests ran for around 30 mins. A manual checkpoint was run before
>>> each test.
>>>
>>> Compression   WAL generated   %compression   Latency-avg   CPU usage (seconds)                       TPS      Latency stddev
>>> on            1531.4 MB       ~35 %          7.351 ms      user diff: 562.67s  system diff: 41.40s   135.96   13.759 ms
>>> off           2373.1 MB                      6.781 ms      user diff: 354.20s  system diff: 39.67s   147.40   14.152 ms
>>>
>>> The compression obtained is quite high, close to 35%.
>>> CPU usage at user level when compression is on is quite noticeably
>>> high as compared to that when compression is off. But the gain in
>>> terms of reduction of WAL is also high.
>>
>> I am sorry but I can't understand the above results due to wrapping.
>> Are you saying compression was twice as slow?
>
> I got curious to see how the compression of an entire record would
> perform and how it compares for small WAL records, and here are some
> numbers based on the patch attached. This patch compresses the whole
> record, including the block headers, letting only XLogRecord out of it,
> with a flag indicating that the record is compressed (note that this
> patch contains a portion for replay that is untested; still, it gives an
> idea of how much compression of the whole record affects user CPU in
> this test case). It uses a buffer of 4 * BLCKSZ; if the record is longer
> than that, compression is simply given up. Those tests use the hack
> upthread calculating the user and system CPU spent by a backend with
> getrusage().
>
> Here is the simple test case I used, with 512MB of shared_buffers and
> small records, filling up a bunch of buffers, dirtying them and then
> compressing FPWs with a checkpoint.
> #!/bin/bash
> psql <<EOF
> SELECT pg_backend_pid();
> CREATE TABLE aa (a int);
> CREATE TABLE results (phase text, position pg_lsn);
> CREATE EXTENSION IF NOT EXISTS pg_prewarm;
> ALTER TABLE aa SET (FILLFACTOR = 50);
> INSERT INTO results VALUES ('pre-insert', pg_current_xlog_location());
> INSERT INTO aa VALUES (generate_series(1,700)); -- 484MB
> SELECT pg_size_pretty(pg_relation_size('aa'::regclass));
> SELECT pg_prewarm('aa'::regclass);
> CHECKPOINT;
> INSERT INTO results VALUES ('pre-update', pg_current_xlog_location());
> UPDATE aa SET a = 700 + a;
> CHECKPOINT;
> INSERT INTO results VALUES ('post-update', pg_current_xlog_location());
> SELECT * FROM results;
> EOF

Re-using this test case, I have produced more results by changing the
fillfactor of the table:
=# select test || ', ffactor ' || ffactor, pg_size_pretty(post_update - pre_update),
   user_diff, system_diff from results;
           ?column?            | pg_size_pretty | user_diff | system_diff
-------------------------------+----------------+-----------+-------------
 FPW on + 2 bytes, ffactor 50  | 582 MB         | 42.391894 |    0.807444
 FPW on + 2 bytes, ffactor 20  | 229 MB         | 14.330304 |    0.729626
 FPW on + 2 bytes, ffactor 10  | 117 MB         |  7.335442 |    0.570996
 FPW off + 2 bytes, ffactor 50 | 746 MB         | 25.330391 |    1.248503
 FPW off + 2 bytes, ffactor 20 | 293 MB         | 10.537475 |    0.755448
 FPW off + 2 bytes, ffactor 10 | 148 MB         |  5.762775 |    0.763761
 HEAD, ffactor 50              | 746 MB         | 25.181729 |    1.133433
 HEAD, ffactor 20              | 293 MB         |  9.962242 |    0.765970
 HEAD, ffactor 10              | 148 MB         |  5.693426 |    0.775371
 Record, ffactor 50            | 582 MB         | 54.904374 |    0.678204
 Record, ffactor 20            | 229 MB         | 19.798268 |    0.807220
 Record, ffactor 10            | 116 MB         |  9.401877 |    0.668454
(12 rows)

The following tests are run:
- "Record" means the record-level compression.
- "HEAD" is postgres at 1c5c70df.
- "FPW off" is HEAD + patch with the switch set to off.
- "FPW on" is HEAD + patch with the switch set to on.
The gain in compression has a linear profile with the length of the page
hole. There was visibly some noise in the tests: you can see that the CPU
of "FPW off" is a bit higher than HEAD.

Something to be aware of btw is that this patch introduces an additional
8 bytes per block image in WAL as it contains additional information to
control the compression. In this case this is the uint16 compress_len
present in XLogRecordBlockImageHeader. In the case of the measurements
done, knowing that 63638 FPWs have been written, there is a difference of
a bit less than 500k in WAL between HEAD and "FPW off" in favor of HEAD.
The gain with compression is welcome, still for the default there is a
small price to track down if a block is compressed or not. This patch
still takes advantage of it by not compressing the hole present in the
page and reducing CPU work a bit.
--
Michael
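For reference, the getrusage() hack mentioned in this message could look roughly like the following helper; this is a hypothetical sketch, not the actual code used for these measurements:

    #include <stdio.h>
    #include <sys/resource.h>

    /* Print the user/system CPU consumed so far by the calling process;
     * diffing two snapshots taken around a workload yields numbers like the
     * user_diff/system_diff columns above. */
    static void
    report_cpu_usage(const char *phase)
    {
        struct rusage ru;

        if (getrusage(RUSAGE_SELF, &ru) == 0)
            printf("%s: user %ld.%06lds, system %ld.%06lds\n", phase,
                   (long) ru.ru_utime.tv_sec, (long) ru.ru_utime.tv_usec,
                   (long) ru.ru_stime.tv_sec, (long) ru.ru_stime.tv_usec);
    }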
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 12 December 2014 at 21:40, Robert Haas wrote:
> On Fri, Dec 12, 2014 at 1:51 PM, Simon Riggs wrote:
>> What I don't understand is why we aren't working on double buffering,
>> since that cost would be paid in a background process and would be
>> evenly spread out across a checkpoint. Plus we'd be able to remove
>> FPWs altogether, which is like 100% compression.
>
> The previous patch to implement that - by somebody at vmware - was an
> epic fail. I'm not opposed to seeing somebody try again, but it's a
> tricky problem. When the double buffer fills up, then you've got to
> finish flushing the pages whose images are stored in the buffer to
> disk before you can overwrite it, which acts like a kind of
> mini-checkpoint. That problem might be solvable, but let's use this
> thread to discuss this patch, not some other patch that someone might
> have chosen to write but didn't.

No, I think it's relevant. WAL compression looks to me like a short-term
tweak, not the end game. On that basis, we should go for simple and
effective, user-settable compression of FPWs and not spend too much
Valuable Committer Time on it.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Dec 12, 2014 at 7:25 PM, Michael Paquier wrote:
> On Sat, Dec 13, 2014 at 1:08 AM, Robert Haas wrote:
>> On Fri, Dec 12, 2014 at 10:04 AM, Andres Freund wrote:
>>>> Note that autovacuum and fsync are off.
>>>> =# select phase, user_diff, system_diff,
>>>> pg_size_pretty(pre_update - pre_insert),
>>>> pg_size_pretty(post_update - pre_update) from results;
>>>>        phase        | user_diff | system_diff | pg_size_pretty | pg_size_pretty
>>>> --------------------+-----------+-------------+----------------+----------------
>>>>  Compression FPW    | 42.990799 |    0.868179 | 429 MB         | 567 MB
>>>>  No compression     | 25.688731 |    1.236551 | 429 MB         | 727 MB
>>>>  Compression record | 56.376750 |    0.769603 | 429 MB         | 566 MB
>>>> (3 rows)
>>>> If we do record-level compression, we'll need to be very careful in
>>>> defining a lower bound to not eat unnecessary CPU resources, perhaps
>>>> something that should be controlled with a GUC. I presume that this
>>>> stands true as well for the upper bound.
>>>
>>> Record-level compression pretty obviously would need a lower boundary
>>> for when to use compression. It won't be useful for small heapam/btree
>>> records, but it'll be rather useful for large multi_insert, clean or
>>> similar records...
>>
>> Unless I'm missing something, this test is showing that FPW
>> compression saves 298MB of WAL for 17.3 seconds of CPU time, as
>> against master. And compressing the whole record saves a further 1MB
>> of WAL for a further 13.39 seconds of CPU time. That makes
>> compressing the whole record sound like a pretty terrible idea - even
>> if you get more benefit by reducing the lower boundary, you're still
>> burning a ton of extra CPU time for almost no gain on the larger
>> records. Ouch!
>>
>> (Of course, I'm assuming that Michael's patch is reasonably efficient,
>> which might not be true.)
> Note that I was curious about the worst case ever, aka how much CPU
> pg_lzcompress would use if everything is compressed, even the smallest
> records. So we'll surely need a lower bound. I think that doing some
> tests with a lower bound set as a multiple of SizeOfXLogRecord would
> be fine, but in this case what we'll see is a result similar to what
> FPW compression does.

In general, lz4 (and pg_lz is similar to lz4) compresses very poorly
anything below about 128b in length. Of course there are outliers, with
some very compressible stuff, but with regular text or JSON data it's
quite unlikely to compress at all with smaller input. Compression is
modest up to about 1k, when it starts to really pay off. That's at least
my experience with lots of JSON-ish, text-ish and CSV data sets,
compressible but not so much in small bits.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Sat, Dec 13, 2014 at 1:08 AM, Robert Haas wrote:
> On Fri, Dec 12, 2014 at 10:04 AM, Andres Freund wrote:
>>> Note that autovacuum and fsync are off.
>>> =# select phase, user_diff, system_diff,
>>> pg_size_pretty(pre_update - pre_insert),
>>> pg_size_pretty(post_update - pre_update) from results;
>>>        phase        | user_diff | system_diff | pg_size_pretty | pg_size_pretty
>>> --------------------+-----------+-------------+----------------+----------------
>>>  Compression FPW    | 42.990799 |    0.868179 | 429 MB         | 567 MB
>>>  No compression     | 25.688731 |    1.236551 | 429 MB         | 727 MB
>>>  Compression record | 56.376750 |    0.769603 | 429 MB         | 566 MB
>>> (3 rows)
>>> If we do record-level compression, we'll need to be very careful in
>>> defining a lower bound to not eat unnecessary CPU resources, perhaps
>>> something that should be controlled with a GUC. I presume that this
>>> stands true as well for the upper bound.
>>
>> Record-level compression pretty obviously would need a lower boundary
>> for when to use compression. It won't be useful for small heapam/btree
>> records, but it'll be rather useful for large multi_insert, clean or
>> similar records...
>
> Unless I'm missing something, this test is showing that FPW
> compression saves 298MB of WAL for 17.3 seconds of CPU time, as
> against master. And compressing the whole record saves a further 1MB
> of WAL for a further 13.39 seconds of CPU time. That makes
> compressing the whole record sound like a pretty terrible idea - even
> if you get more benefit by reducing the lower boundary, you're still
> burning a ton of extra CPU time for almost no gain on the larger
> records. Ouch!
>
> (Of course, I'm assuming that Michael's patch is reasonably efficient,
> which might not be true.)

Note that I was curious about the worst case ever, aka how much CPU
pg_lzcompress would use if everything is compressed, even the smallest
records. So we'll surely need a lower bound. I think that doing some tests
with a lower bound set as a multiple of SizeOfXLogRecord would be fine,
but in this case what we'll see is a result similar to what FPW
compression does.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
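A lower bound like the one discussed here could be as simple as the following guard. SizeOfXLogRecord is a real PostgreSQL symbol, but the 24-byte value and the multiplier below are placeholders for illustration, the latter being the kind of knob a GUC could expose:

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative values only: skip compression for records too small to
     * plausibly repay the CPU spent on pglz. */
    #define SIZE_OF_XLOG_RECORD   24
    #define COMPRESS_LOWER_BOUND  (4 * SIZE_OF_XLOG_RECORD)

    static bool
    record_worth_compressing(uint32_t rec_len)
    {
        return rec_len >= COMPRESS_LOWER_BOUND;
    }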
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Dec 12, 2014 at 1:51 PM, Simon Riggs wrote: > What I don't understand is why we aren't working on double buffering, > since that cost would be paid in a background process and would be > evenly spread out across a checkpoint. Plus we'd be able to remove > FPWs altogether, which is like 100% compression. The previous patch to implement that - by somebody at vmware - was an epic fail. I'm not opposed to seeing somebody try again, but it's a tricky problem. When the double buffer fills up, then you've got to finish flushing the pages whose images are stored in the buffer to disk before you can overwrite it, which acts like a kind of mini-checkpoint. That problem might be solvable, but let's use this thread to discuss this patch, not some other patch that someone might have chosen to write but didn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 12 December 2014 at 18:04, Bruce Momjian wrote:
> Well, it seems we need to see some actual cases where compression does
> help before moving forward. I thought Amit had some amazing numbers for
> WAL compression --- has that changed?

For background processes like VACUUM, WAL compression will be helpful. The
numbers show that only applies to FPWs. I remain concerned about the cost
in foreground processes, especially since the cost will be paid
immediately after a checkpoint, making our spikes worse.

What I don't understand is why we aren't working on double buffering,
since that cost would be paid in a background process and would be
evenly spread out across a checkpoint. Plus we'd be able to remove
FPWs altogether, which is like 100% compression.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Dec 12, 2014 at 05:19:42PM +0100, Andres Freund wrote:
> On 2014-12-12 11:15:46 -0500, Robert Haas wrote:
>> On Fri, Dec 12, 2014 at 11:12 AM, Andres Freund wrote:
>>> On 2014-12-12 11:08:52 -0500, Robert Haas wrote:
>>>> Unless I'm missing something, this test is showing that FPW
>>>> compression saves 298MB of WAL for 17.3 seconds of CPU time, as
>>>> against master. And compressing the whole record saves a further 1MB
>>>> of WAL for a further 13.39 seconds of CPU time. That makes
>>>> compressing the whole record sound like a pretty terrible idea - even
>>>> if you get more benefit by reducing the lower boundary, you're still
>>>> burning a ton of extra CPU time for almost no gain on the larger
>>>> records. Ouch!
>>>
>>> Well, that test pretty much doesn't have any large records besides
>>> FPWs afaics. So it's unsurprising that it's not beneficial.
>>
>> "Not beneficial" is rather an understatement. It's actively harmful,
>> and not by a small margin.
>
> Sure, but that's just because it's too simplistic. I don't think it
> makes sense to make any inference about the worthiness of the general
> approach from the, nearly obvious, fact that compressing every tiny
> record is a bad idea.

Well, it seems we need to see some actual cases where compression does
help before moving forward. I thought Amit had some amazing numbers for
WAL compression --- has that changed?

--
  Bruce Momjian  http://momjian.us
  EnterpriseDB   http://enterprisedb.com

  + Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 2014-12-12 11:15:46 -0500, Robert Haas wrote:
> On Fri, Dec 12, 2014 at 11:12 AM, Andres Freund wrote:
>> On 2014-12-12 11:08:52 -0500, Robert Haas wrote:
>>> Unless I'm missing something, this test is showing that FPW
>>> compression saves 298MB of WAL for 17.3 seconds of CPU time, as
>>> against master. And compressing the whole record saves a further 1MB
>>> of WAL for a further 13.39 seconds of CPU time. That makes
>>> compressing the whole record sound like a pretty terrible idea - even
>>> if you get more benefit by reducing the lower boundary, you're still
>>> burning a ton of extra CPU time for almost no gain on the larger
>>> records. Ouch!
>>
>> Well, that test pretty much doesn't have any large records besides
>> FPWs afaics. So it's unsurprising that it's not beneficial.
>
> "Not beneficial" is rather an understatement. It's actively harmful,
> and not by a small margin.

Sure, but that's just because it's too simplistic. I don't think it
makes sense to make any inference about the worthiness of the general
approach from the, nearly obvious, fact that compressing every tiny
record is a bad idea.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Dec 12, 2014 at 11:12 AM, Andres Freund wrote: > On 2014-12-12 11:08:52 -0500, Robert Haas wrote: >> Unless I'm missing something, this test is showing that FPW >> compression saves 298MB of WAL for 17.3 seconds of CPU time, as >> against master. And compressing the whole record saves a further 1MB >> of WAL for a further 13.39 seconds of CPU time. That makes >> compressing the whole record sound like a pretty terrible idea - even >> if you get more benefit by reducing the lower boundary, you're still >> burning a ton of extra CPU time for almost no gain on the larger >> records. Ouch! > > Well, that test pretty much doesn't have any large records besides FPWs > afaics. So it's unsurprising that it's not beneficial. "Not beneficial" is rather an understatement. It's actively harmful, and not by a small margin. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 2014-12-12 11:08:52 -0500, Robert Haas wrote: > Unless I'm missing something, this test is showing that FPW > compression saves 298MB of WAL for 17.3 seconds of CPU time, as > against master. And compressing the whole record saves a further 1MB > of WAL for a further 13.39 seconds of CPU time. That makes > compressing the whole record sound like a pretty terrible idea - even > if you get more benefit by reducing the lower boundary, you're still > burning a ton of extra CPU time for almost no gain on the larger > records. Ouch! Well, that test pretty much doesn't have any large records besides FPWs afaics. So it's unsurprising that it's not beneficial. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers