Hi hackers,

I raised this topic a while back [1] but didn't get much traction, so
I went ahead and implemented it: a doublewrite buffer (DWB) mechanism
for PostgreSQL as an alternative to full_page_writes.

The core argument is straightforward. FPW and checkpoint frequency are
fundamentally at odds:

- FPW wants fewer checkpoints -- each checkpoint triggers a wave of
full-page WAL writes for every page dirtied for the first time,
bloating WAL and tanking write throughput.
- Fast crash recovery wants more checkpoints -- less WAL to replay
means the database comes back sooner.

DWB resolves this tension by moving torn page protection out of the
WAL path entirely. Instead of writing full pages into WAL (foreground,
latency-sensitive), dirty pages are sequentially written to a
dedicated doublewrite buffer area on disk before being flushed to
their actual locations. The buffer is fsync'd once when full, then
pages are scatter-written to their final positions. On crash recovery,
intact copies from the DWB repair any torn pages.

Key design differences:

- FPW: 1 WAL write (foreground) + 1 page write = directly impacts SQL latency
- DWB: 2 page writes (background flush path) = minimal user-visible impact
- DWB batches fsync() across multiple pages; WAL fsync batching is
limited by foreground latency constraints
- DWB decouples torn page protection from checkpoint frequency, so you
can checkpoint as often as you want without write amplification

I ran sysbench benchmarks (io-bound, --tables=10
--table_size=10000000) with checkpoint_timeout=30s,
shared_buffers=4GB, synchronous_commit=on. Each scenario uses a fresh
database, VACUUM FULL, 60s warmup, 300s run.

Results (TPS):

                     FPW OFF    FPW ON     DWB ON
read_write/32        18,038      7,943     13,009
read_write/64        24,249      9,533     15,387
read_write/128       27,801      9,715     15,387
write_only/32        53,146     18,116     31,460
write_only/64        57,628     19,589     32,875
write_only/128       59,454     14,857     33,814

Avg latency (ms):

                     FPW OFF    FPW ON     DWB ON
read_write/32          1.77       4.03       2.46
read_write/64          2.64       6.71       4.16
read_write/128         4.60      13.17       9.81
write_only/32          0.60       1.77       1.02
write_only/64          1.11       3.27       1.95
write_only/128         2.15       8.61       3.78

FPW ON drops to ~25% of baseline (FPW OFF). DWB ON holds at ~57%. In
write-heavy scenarios DWB delivers over 2x the throughput of FPW with
significantly better latency.

The implementation is here: https://github.com/baotiao/postgres

I'd appreciate any feedback on the approach. Would be great if the
community could take a look and see if this direction is worth
pursuing upstream.

Thanks,
Baotiao

[1] 
https://www.postgresql.org/message-id/CAGbZs7hbJeUe7xY4QD25QW6VSnNFk1e3cwbCa8_R%2B2%2BYnoYRKw%40mail.gmail.com


Reply via email to