A colleague of mine brought to my attention that pg_rewind is not crash safe. If it is interrupted for any reason, it leaves behind a data directory with a mix of data from the source and target images. If you're "lucky", the server will start up, but it can be in an inconsistent state. That's obviously not good. It would be nice to:

1. Detect the situation, and refuse to start up.

Or even better:

2. Make pg_rewind crash safe, so that you could safely restart it if it's interrupted.

Has anyone else run into this? How did you work around it?

It doesn't seem hard to detect this. pg_rewind could "poison" the data directory just before it starts making irreversible changes, so that the server refuses to start up until the rewind has completed. I'm thinking of updating the 'state' in the control file to a new PG_IN_REWIND value.
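To make the ordering concrete, here is a rough toy model of the idea in Python. It is not PostgreSQL code: the JSON "control file", the file name, the helper names and the state names other than PG_IN_REWIND are all made up for illustration, and the real control file is not written this way. The only point is the sequence: set the marker durably first, make the changes, and clear the marker last.

```python
import json
import os

# Toy state names. PG_IN_REWIND is the value proposed above; SHUTDOWNED is
# just a placeholder for "cleanly stopped" in this model.
SHUTDOWNED = "SHUTDOWNED"
PG_IN_REWIND = "PG_IN_REWIND"

CONTROL_NAME = "toy_control.json"  # hypothetical, not the real pg_control


def write_control(datadir, state):
    """Durably replace the toy control file with a new state."""
    path = os.path.join(datadir, CONTROL_NAME)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"state": state}, f)
        f.flush()
        os.fsync(f.fileno())  # the marker must reach disk before data changes
    os.replace(tmp, path)     # atomically swap in the new file


def read_control(datadir):
    """Return the state recorded in the toy control file."""
    with open(os.path.join(datadir, CONTROL_NAME)) as f:
        return json.load(f)["state"]


def can_start(datadir):
    """Startup check: refuse a directory that a rewind left half-done."""
    return read_control(datadir) != PG_IN_REWIND


def rewind(datadir, copy_steps):
    """Poison first, run the file changes, clear the marker only at the end."""
    write_control(datadir, PG_IN_REWIND)
    for step in copy_steps:
        step()                # may raise, simulating an interruption
    write_control(datadir, SHUTDOWNED)
```

If any step raises, the marker stays in place and can_start() keeps returning False until a later rewind() runs to completion.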

It also doesn't seem too hard to make it restartable. As long as you point it to the same source server, it is already almost safe to run pg_rewind again. If we re-order the way it writes the control or backup files and makes other changes, pg_rewind can verify that you pointed it at the same or compatible primary as before.
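As a sketch of the "same source as before" check, again a toy model in Python rather than anything pg_rewind does today: the marker file name, the source_id string and both function names are hypothetical, and "same or compatible" is reduced to a plain equality test here. The idea is to record which source was used before touching anything, so that a re-run can compare against it.

```python
import json
import os

MARKER = "toy_rewind_in_progress.json"  # hypothetical marker file name


def begin_or_resume(datadir, source_id):
    """Return "fresh" or "resume"; raise if a different source is given."""
    path = os.path.join(datadir, MARKER)
    if os.path.exists(path):
        # An earlier run was interrupted: only continue with the same source.
        with open(path) as f:
            recorded = json.load(f)["source_id"]
        if recorded != source_id:
            raise ValueError(
                "interrupted rewind used a different source: %s" % recorded
            )
        return "resume"
    # First run: record the source durably before any other change is made.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"source_id": source_id}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)
    return "fresh"


def finish(datadir):
    """Remove the marker once every change has been made and synced."""
    os.remove(os.path.join(datadir, MARKER))
```

A real implementation would presumably compare something like the source's system identifier and the divergence point rather than an opaque string, but that is a guess on my part.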

I think there's one corner case with truncated files: pg_rewind may have extended a file by copying the missing "tail" from the source system, and the system then crashes before that is fsynced to disk. But I think we can fix that too, by paying attention to SMGR_TRUNCATE records when scanning the source WAL.

- Heikki

