On Wed, Oct 15, 2014 at 1:11 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Fri, Aug 8, 2014 at 12:08 AM, Guillaume Lelarge <guilla...@lelarge.info > > wrote: > >> Hi, >> >> As part of our monitoring work for our customers, we stumbled upon an >> issue with our customers' servers who have a wal_keep_segments setting >> higher than 0. >> >> We have a monitoring script that checks the number of WAL files in the >> pg_xlog directory, according to the setting of three parameters >> (checkpoint_completion_target, checkpoint_segments, and wal_keep_segments). >> We usually add a percentage to the usual formula: >> >> greatest( >> (2 + checkpoint_completion_target) * checkpoint_segments + 1, >> checkpoint_segments + wal_keep_segments + 1 >> ) >> > > I think the first bug is even having this formula in the documentation to > start with, and in trying to use it. > > "and will normally not be more than..." > > This may be "normal" for a toy system. I think that the normal state for > any system worth monitoring is that it has had load spikes at some point in > the past. > > So it is the next part of the doc, which describes how many segments it > climbs back down to upon recovering from a spike, which is the important > one. And that doesn't mention wal_keep_segments at all, which surely > cannot be correct. > > I will try to independently derive the correct formula from the code, as > you did, without looking too much at your derivation first, and see if we > get the same answer. > It looked to me that the formula, when descending from a previously stressed state, would be: greatest(1 + checkpoint_completion_target) * checkpoint_segments, wal_keep_segments) + 1 + 2 * checkpoint_segments + 1 This assumes logs are filled evenly over a checkpoint cycle, which is probably not true because there is a spike in full page writes right after a checkpoint starts. But I didn't have a great deal of confidence in my analysis. The first line reflects the number of WAL that will be retained as-is, the second is the number that will be recycled for future use before starting to delete them. My reading of the code is that wal_keep_segments is computed from the current end of WAL (i.e the checkpoint record), not from the checkpoint redo point. If I distribute the part outside the 'greatest' into both branches of the 'greatest', I don't get the same answer as you do for either branch. Then I started wondering if the number we keep for recycling is a good choice, anyway. 2 * checkpoint_segments + 1 seems pretty large. But then again, given that we've reached the high-water-mark once, how unlikely are we to reach it again? Cheers, Jeff