A colleague of mine brought to my attention that pg_rewind is not crash
safe. If it is interrupted for any reason, it leaves behind a data
directory with a mix of data from the source and target images. If
you're "lucky", the server will start up, but it can be in an
inconsistent state. That's obviously not good. It would be nice to:
1. Detect the situation, and refuse to start up.
Or even better:
2. Make pg_rewind crash safe, so that you could safely restart it if
it's interrupted.
Has anyone else run into this? How did you work around it?
It doesn't seem hard to detect this: pg_rewind could "poison" the data
directory just before it starts making irreversible changes, so that
the server refuses to start on it. I'm thinking of setting the 'state'
in the control file to a new PG_IN_REWIND value.
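A minimal sketch of the detection idea, modelling the control file as an in-memory struct. PG_IN_REWIND is the value proposed above. The other state names mirror some of PostgreSQL's existing states, but the `ControlFile` struct and the `rewind_begin`, `rewind_finish` and `startup_allowed` helpers are hypothetical. The sketch also skips the real file I/O, CRC update and fsync that pg_control needs.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified subset of the cluster states kept in pg_control.
 * PG_IN_REWIND is the proposed (hypothetical) new value. */
typedef enum
{
    DB_SHUTDOWNED,
    DB_IN_ARCHIVE_RECOVERY,
    DB_IN_PRODUCTION,
    PG_IN_REWIND
} ClusterState;

/* Hypothetical stand-in for the control file. A real implementation
 * would write pg_control to disk, update its CRC and fsync it. */
typedef struct
{
    ClusterState state;
} ControlFile;

/* Step 1: "poison" the data directory before any irreversible change. */
void
rewind_begin(ControlFile *cf)
{
    cf->state = PG_IN_REWIND;
    /* ... persist and fsync the control file here ... */
}

/* Last step: only after all files have been copied and synced, write
 * the final control file (here: ready for archive recovery). */
void
rewind_finish(ControlFile *cf)
{
    assert(cf->state == PG_IN_REWIND);
    cf->state = DB_IN_ARCHIVE_RECOVERY;
}

/* Startup check: refuse to start on a half-rewound data directory. */
bool
startup_allowed(const ControlFile *cf)
{
    return cf->state != PG_IN_REWIND;
}
```

If pg_rewind is interrupted anywhere between `rewind_begin()` and `rewind_finish()`, the control file still says PG_IN_REWIND. Startup is then refused, instead of the server coming up on a mix of source and target files.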
It also doesn't seem too hard to make it restartable. As long as you
point it at the same source server, rerunning pg_rewind is already
almost safe. If we reorder the steps in which it writes the control
file and backup label file and makes its other changes, pg_rewind could
verify on a rerun that you pointed it at the same, or a compatible,
primary as before.
I think there's one corner case with truncated files: pg_rewind may
have extended a file by copying the missing "tail" from the source
system, and then the system crashed before the file was fsynced to
disk. I think we can fix that too, by paying attention to SMGR_TRUNCATE
records when scanning the source WAL.
- Heikki