On 04/17/2015 12:04 PM, Simon Riggs wrote:
> On 17 April 2015 at 09:54, Andres Freund <and...@anarazel.de> wrote:
>
> > Hrmpf. Says the person that used a lot of padding, without much
> > discussion, for the WAL level infrastructure making pg_rewind more
> > maintainable.
>
> Sounds bad. What padding are we talking about?

In the new WAL format, the data chunks are stored unaligned, without padding, to save space. The new format is quite different to the old one, so it's not straightforward to compare how much that saved. The fixed-size XLogRecord header is 8 bytes shorter in the new format, because it doesn't have the xl_len field anymore. But the same information is stored elsewhere in the record, where it takes 2 or 5 bytes (XLogRecordDataHeaderShort/Long).

But it's a fair point that we could've just made small adjustments to the old format, without revamping every record type and the way the block information is stored, and that the space saving of the new format should be measured against that alternative instead.

As an example, here is one simple thing we could've done with the old format: remove xl_len, and store the length in the two unused padding bytes instead, as long as it fits in 16 bits. For longer records, set a flag and store the length right after the XLogRecord header. For practically all WAL records, that would've shrunk XLogRecord from 32 to 24 bytes and made each record 8 bytes shorter.

I ran the same pgbench test Andres used, with scale factor 10 and 50000 transactions, and compared the WAL size between master and 9.4:

master: 20738352 bytes
9.4:    23915800 bytes

According to pg_xlogdump, there were 301153 WAL records. If you take the 9.4 figure and imagine that we had saved those 8 bytes on each WAL record, 9.4 would've been 21506576 bytes instead. So yeah, we could've achieved much (roughly three quarters) of the WAL savings with that much smaller change. That's a useful thing to compare with.

BTW, those numbers are with wal_level=minimal. With wal_level=logical, the WAL size from the same test on master was 26503520 bytes, about 28% more. That's quite a bump. Looking at the pg_xlogdump output, it seems that it's all because the commit records are wider.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)