On Fri, Jun 8, 2012 at 12:24 PM, Kevin Grittner <kevin.gritt...@wicourts.gov> wrote:
> I haven't been exactly clear on the risks about which Tom and Robert
> have been concerned; is it a question about whether we change the
> meaning of these settings to something more complicated?:
>
> checkpoint_segments (integer)
>    Maximum number of log file segments between automatic WAL
>    checkpoints
>
> checkpoint_timeout (integer)
>    Maximum time between automatic WAL checkpoints

The issue is that, in the tip of the 9.2 branch, checkpoint_timeout is no longer the maximum time between automatic WAL checkpoints. Instead, the checkpoint is skipped if we're still in the same WAL segment that we were in when we did the last checkpoint. Therefore, there is absolutely no upper bound on the amount of time that can pass between checkpoints. If someone does one transaction, which happens not to cross a WAL segment boundary, we will never automatically checkpoint that transaction. A checkpoint will ONLY be triggered when we have enough write-ahead log volume to get us into the next segment. I am arguing (and Tom is now agreeing) that this is bad, and that the patch which made this change needs either some kind of fix, or to be reverted completely.

The original motivation for the patch was that the code to suppress duplicate checkpoints stopped working correctly when Hot Standby was committed. The previous coding (before the commit at issue) skips a checkpoint if no write-ahead log records at all have been emitted since the start of the preceding checkpoint. I believe this is the correct behavior, but there's a problem: when wal_level = hot_standby, we emit an XLOG_RUNNING_XACTS record during every checkpoint cycle. So, if wal_level = hot_standby, the test for whether anything has happened always returns false, and so the system never quiesces: every checkpoint cycle contains at least the XLOG_RUNNING_XACTS record, even if nothing else, so we never get to skip any checkpoints. When wal_level < hot_standby, the problem does not exist and redundant checkpoints are suppressed just as we would hope.

While Simon's patch does fix the problem, I believe that making checkpoint_timeout anything less than a hard timeout is unwise.
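
The two skip rules described above can be put side by side in a toy model. This is only a sketch in Python, not the actual xlog.c logic: the function names, the list-of-record-names representation, and the byte positions are made up for illustration, and the 16 MB segment size is simply the default.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # 16 MB, the default WAL segment size


def records_in_idle_cycle(wal_level):
    """WAL records emitted during a checkpoint cycle with no user activity."""
    # With wal_level = hot_standby, every cycle logs XLOG_RUNNING_XACTS.
    return ["XLOG_RUNNING_XACTS"] if wal_level == "hot_standby" else []


def old_rule_skips(records_since_last_checkpoint):
    """Previous coding: skip only if no WAL records at all were emitted."""
    return len(records_since_last_checkpoint) == 0


def segment_rule_skips(last_checkpoint_pos, current_pos):
    """9.2-tip coding: skip while still in the same WAL segment."""
    return last_checkpoint_pos // WAL_SEGMENT_SIZE == current_pos // WAL_SEGMENT_SIZE


# Idle system: the old rule quiesces below hot_standby but not at hot_standby.
print(old_rule_skips(records_in_idle_cycle("archive")))      # True
print(old_rule_skips(records_in_idle_cycle("hot_standby")))  # False

# One small transaction (8 kB of WAL here) that stays inside the segment:
# the segment rule keeps skipping, however much time passes.
print(segment_rule_skips(1000, 1000 + 8192))                 # True
```

Note that elapsed time appears nowhere in segment_rule_skips, which is exactly why checkpoint_timeout stops being an upper bound under that rule.
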
The previous behavior - at least one checkpoint per checkpoint_timeout - is easy to understand and plan for; I believe the new behavior will be an unpleasant surprise for users who care about checkpointing regularly, which I think most do, whether they are here to be represented in this conversation or not. So I think we need a different fix for the problem that wal_level = hot_standby defeats the redundant-checkpoint-detection code.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company