Alvaro Herrera wrote:
Heikki Linnakangas wrote:

- The signaling between RequestCheckpoint and bgwriter is a bit tricky. Bgwriter now needs to deal immediate checkpoint requests, like those coming from explicit CHECKPOINT or CREATE DATABASE commands, differently from those triggered by checkpoint_segments. I'm afraid there might be race conditions when a CHECKPOINT is issued at the same instant as checkpoint_segments triggers one. What might happen then is that the checkpoint is performed lazily, spreading the writes, and the CHECKPOINT command has to wait for that to finish which might take a long time. I have not been able to convince myself neither that the race condition exists or that it doesn't.

Isn't it just a matter of having a flag to tell whether the checkpoint
should be quick or spread out, and have a command set the flag if a
checkpoint is already running?

Hmm. Thinking about this some more, the core problem is that when starting the checkpoint, bgwriter needs to read and clear the flag. Which is not atomic, as the patch stands.

I think we already have a race condition with ckpt_time_warn. The code to test and clear the flag does this:

        if (BgWriterShmem->ckpt_time_warn && elapsed_secs < CheckPointWarning)
                ereport(LOG,
                                (errmsg("checkpoints are occurring too frequently 
(%d seconds apart)",
                                                elapsed_secs),
                                 errhint("Consider increasing the configuration parameter 
\"checkpoint_segments\".")));
        BgWriterShmem->ckpt_time_warn = false;

In the extremely unlikely event that RequestCheckpoint sets ckpt_time_warn right before it's cleared, after the test in the if-statement, the warning is missed. That's a very harmless and theoretical event, you'd have to run CHECKPOINT (or another command that triggers a checkpoint) at the same instant that an xlog switch triggers one, and all that happens is that you don't get the message in the log while you should. So this is not something to worry about in this case, but it would be more severe if we had the same problem in deciding if a checkpoint should be spread out or not.

I think we just have to protect those signaling flags with a lock. It's not like it's on a critical path, and though we don't know what locks the callers to RequestCheckpoint hold, as long as we don't acquire any other locks while holding the new proposed lock, there's no danger of deadlocks.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
      choose an index scan if your joining column's datatypes do not
      match

Reply via email to