On Mon, Oct 13, 2014 at 12:11 PM, Bruce Momjian <br...@momjian.us> wrote:
>
> I looked into this, and came up with more questions.  Why is
> checkpoint_completion_target involved in the total number of WAL
> segments?  If checkpoint_completion_target is 0.5 (the default), the
> calculation is:
>
>         (2 + 0.5) * checkpoint_segments + 1
>
> while if it is 0.9, it is:
>
>         (2 + 0.9) * checkpoint_segments + 1
>
> Is this trying to estimate how many WAL files are going to be created
> during the checkpoint?  If so, wouldn't it be (1 +
> checkpoint_completion_target), not "2 +".  My logic is you have the old
> WAL files being checkpointed (that's the "1"), plus you have new WAL
> files being created during the checkpoint, which would be
> checkpoint_completion_target * checkpoint_segments, plus one for the
> current WAL file.
>

WAL is not eligible to be recycled until there have been 2 successful
checkpoints.

So at the end of a checkpoint, you have 1 cycle of WAL which has just
become eligible for recycling, 1 cycle of WAL which is now expendable
but which is kept anyway, and checkpoint_completion_target worth of a
cycle of WAL which was written while the checkpoint was in progress and
is still needed for crash recovery.

I don't really understand the point of this way of doing things.  I
guess it is because the control file contains two redo pointers, one
for the last checkpoint and one for the checkpoint before that, and if
recovery finds that it can't use the most recent one, it tries the one
before that.

Why?  Beats me.  If we are worried about the control file getting a
corrupt redo pointer, it seems like we would record the last one twice,
rather than recording two different ones once each.  And if the
in-memory version got corrupted before being written to the file, I
really doubt anything is going to save your bacon at that point.

I've never seen a case where recovery couldn't use the last recorded
good checkpoint, fell back to the previous one, and succeeded.  But
then again, I haven't seen all possible crashes.

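
To make the accounting concrete, here is a rough Python sketch that
splits the quoted formula, (2 + checkpoint_completion_target) *
checkpoint_segments + 1, into the terms described above.  The function
name, the dictionary keys, and the value checkpoint_segments = 10 are
made up for illustration; this is not server code, just the arithmetic.

```python
def wal_segment_estimate(checkpoint_segments, checkpoint_completion_target=0.5):
    """Return the terms of (2 + target) * checkpoint_segments + 1.

    Names are illustrative only; this mirrors the formula quoted in the
    thread, not any actual server code.
    """
    terms = {
        # One cycle that has just become eligible for recycling.
        "newly_recyclable_cycle": checkpoint_segments,
        # One cycle that is expendable but kept anyway.
        "kept_expendable_cycle": checkpoint_segments,
        # WAL written while the checkpoint itself was in progress.
        "during_checkpoint": checkpoint_completion_target * checkpoint_segments,
        # The segment currently being written.
        "current_segment": 1,
    }
    # The right-hand side is evaluated before "total" is added to the dict.
    terms["total"] = sum(terms.values())
    return terms


# Hypothetical setting of checkpoint_segments = 10, at both targets
# mentioned in the thread.
for target in (0.5, 0.9):
    estimate = wal_segment_estimate(10, target)
    print(target, estimate["total"])
```

Dropping the "kept_expendable_cycle" term gives the "1 +" version
proposed in the quoted message, so the gap between the two readings is
exactly one cycle of checkpoint_segments.
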
This is based on memory from the last time I looked into this.  I
haven't re-verified it, so it could be wrong or obsolete.

Cheers,

Jeff