Hi hackers! I propose a slight change to WAL compression: compress the body of a big record if it is larger than some threshold.
===Rationale===
0. Better compression ratio for full-page images (FPIs) when pages are
compressed together.
Consider the following test:
set wal_compression to 'zstd';
create table a as select random() from generate_series(1,1e7);
create index on a(random); -- warmup to avoid FPI for hint on the heap
select pg_stat_reset_shared('wal');
create index on a(random);
select pg_size_pretty(wal_bytes) from pg_stat_wal;
With the patch, building the B-tree index emits 97 MB of WAL, compared with
125 MB when FPIs are compressed independently.
1. Compression of big records that are not FPIs. E.g. 2PC records might be big
enough to cross the threshold.
2. This might be a path to full WAL compression. In the future I plan to
propose a compression context: retaining the compression dictionary between
records. Obviously, the context cannot cross checkpoint boundaries. And a pool
of contexts would be needed to fully utilize the efficiency of compression
codecs. Anyway, it's too early to theorize.
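As a toy illustration of items 0 and 2 (not taken from the patch), the sketch below compares three ways of compressing the same pages: each page on its own, all pages in one stream, and one long-lived compressor flushed after every page. Everything in it is an assumption made for illustration: the pages are synthetic 8 kB buffers built from a shared set of chunks rather than real index pages, and Python's zlib stands in for zstd.

```python
import random
import zlib

PAGE_SIZE = 8192                      # default PostgreSQL block size
CHUNK = 32
rng = random.Random(42)

# Synthetic stand-in for similar pages: every page is assembled from the
# same 64 random chunks, so most redundancy is between pages, not inside one.
vocab = [bytes(rng.getrandbits(8) for _ in range(CHUNK)) for _ in range(64)]
pages = [b"".join(rng.choice(vocab) for _ in range(PAGE_SIZE // CHUNK))
         for _ in range(8)]

# Item 0: each page compressed on its own vs. all pages in one stream.
independent = sum(len(zlib.compress(p)) for p in pages)
together = len(zlib.compress(b"".join(pages)))

# Item 2: one compressor kept alive across "records". Z_SYNC_FLUSH ends each
# record on a byte boundary but keeps the history window for the next one.
ctx = zlib.compressobj()
chunks = [ctx.compress(p) + ctx.flush(zlib.Z_SYNC_FLUSH) for p in pages]
with_context = sum(len(c) for c in chunks)

# The reader needs a matching long-lived decompressor, fed in the same order.
reader = zlib.decompressobj()
restored = [reader.decompress(c) for c in chunks]

print(f"independent:  {independent} bytes")
print(f"together:     {together} bytes")
print(f"with context: {with_context} bytes")
```

The retained context only helps if the reader replays records in the same order from a known starting point, which is why such a context could not survive across a checkpoint.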
===Prototype===
I attach a prototype patch. It is functional, but some world tests fail,
probably because they expect to generate more WAL without putting much entropy
into it. Or perhaps I missed some bugs. In the present version WAL_DEBUG does
not indicate any problems, but a lot of quality assurance and commenting work
is still needed. It's a prototype.
To indicate that a WAL record is compressed I use a bit in record->xl_info
(XLR_COMPRESSED == 0x04). I found no places that use this bit...
If the record is compressed, the record header is followed by information
about the compression: a codec byte and a uint32 holding the uncompressed
xl_tot_len.
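To make the layout concrete, here is a rough model of such a record in Python. It is a sketch under several assumptions, not the patch's actual code: the fixed header mirrors the XLogRecord fields (xl_tot_len, xl_xid, xl_prev, xl_info, xl_rmid, padding, xl_crc), the extension is packed without padding, the codec id and the 512-byte threshold are made-up values, zlib stands in for the real codec, and zlib.crc32 stands in for the real CRC-32C computation.

```python
import struct
import zlib

XLR_COMPRESSED = 0x04   # flag bit in xl_info, as described above
CODEC_ZLIB = 1          # hypothetical codec id, not taken from the patch

# xl_tot_len, xl_xid, xl_prev, xl_info, xl_rmid, 2 padding bytes, xl_crc
HEADER = struct.Struct("<IIQBB2xI")
# Assumed extension layout: codec byte + uint32 uncompressed xl_tot_len
EXT = struct.Struct("<BI")


def encode(xid, prev, info, rmid, body, threshold=512):
    """Build a record, compressing the body only if it is big and it pays off."""
    ext, payload = b"", body
    if len(body) >= threshold:
        packed = zlib.compress(body)
        if len(packed) + EXT.size < len(body):
            info |= XLR_COMPRESSED
            ext = EXT.pack(CODEC_ZLIB, HEADER.size + len(body))
            payload = packed
    tot_len = HEADER.size + len(ext) + len(payload)
    crc = zlib.crc32(ext + payload)   # stand-in, not the real WAL CRC
    return HEADER.pack(tot_len, xid, prev, info, rmid, crc) + ext + payload


def decode(record):
    """Return (xl_info, uncompressed body) for a record built by encode()."""
    tot_len, _xid, _prev, info, _rmid, _crc = HEADER.unpack_from(record)
    rest = record[HEADER.size:tot_len]
    if not info & XLR_COMPRESSED:
        return info, rest
    codec, raw_tot_len = EXT.unpack_from(rest)
    if codec != CODEC_ZLIB:
        raise ValueError(f"unknown codec {codec}")
    body = zlib.decompress(rest[EXT.size:])
    if HEADER.size + len(body) != raw_tot_len:
        raise ValueError("uncompressed length mismatch")
    return info, body
```

In this model a short record passes through untouched, so readers that see no XLR_COMPRESSED bit parse it exactly as before; only records above the threshold carry the five extra bytes.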
Currently, compression is done on StringInfo buffers that are expanded before
the actual WALInsert() happens. If palloc() would be needed during a critical
section, the compression is canceled. I do not like doing memory accounting
before WALInsert; probably something cleverer can be done about it.
WAL_DEBUG and wal_compression are enabled in the prototype for debugging
purposes. Of course, I do not propose turning them on by default.
What do you think? Does this approach seem viable?
Best regards, Andrey Borodin.
v0-0001-Compress-big-WAL-records.patch
Description: Binary data
