AFAICR with xlog-triggered checkpoints, the checkpointer progress is
measured with respect to the amount of WAL written, which does not grow
linearly in time for the reason you pointed out above (a lot of FPW at the
beginning, less at the end). As the WAL is growing quickly, the
checkpointer thinks that it is late and that it has some catching up to do,
so it will start to try writing quickly as well. This is a double whammy:
both are writing more, and probably neither is succeeding.

For time-triggered checkpoints, the WAL gets filled up *but* the
checkpointer load is balanced against time. This is a "single" whammy, where
the checkpointer uses IO bandwidth which is needed for the WAL, and it could
wait a little bit because the WAL will need less later, but it is not trying
to catch up by writing even more, so the load shifting needed in this case
is not the same as in the previous case.

I see your point, but this isn't a function of what triggered the
checkpoint.  It's a function of how we measure whether the
already-triggered checkpoint is on schedule - we may be behind either
because of time, or because of xlog, or both.
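
A minimal sketch of the check described above, to make the "time, xlog, or
both" point concrete. This is an illustration under stated assumptions, not
the actual checkpointer code: the function name, its arguments, and the
numbers in the example are hypothetical, and all three inputs are assumed to
be already normalized to fractions in [0, 1].

```python
def is_checkpoint_on_schedule(progress, elapsed_time_frac, elapsed_xlog_frac):
    """Return True if the checkpoint is keeping up with both clocks.

    progress          -- fraction of the checkpoint's writes already done
    elapsed_time_frac -- time since checkpoint start / checkpoint interval
    elapsed_xlog_frac -- WAL written since checkpoint start / WAL budget
                         between checkpoints

    All names are hypothetical; any scaling of the schedule by a
    completion target is deliberately left out of this sketch.
    """
    # Behind with respect to WAL consumption.
    if progress < elapsed_xlog_frac:
        return False
    # Behind with respect to elapsed time.
    if progress < elapsed_time_frac:
        return False
    return True


# Illustrative numbers only: early in a checkpoint, full-page writes
# front-load the WAL, so the xlog fraction runs ahead of the time fraction.
early = is_checkpoint_on_schedule(progress=0.12,
                                  elapsed_time_frac=0.10,
                                  elapsed_xlog_frac=0.30)
```

With these made-up inputs the checkpoint is ahead of the clock but is still
judged late because of the xlog measure, whatever triggered it.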

Yes. Indeed the current implementation uses some combination of both time & xlog.

My reasoning was that for time-triggered checkpoints (probably average to low load) time is likely to drive the checkpoint schedule, while for xlog-triggered checkpoints (probably higher load) it is more likely to be the xlog, which is skewed.

Anyway, careful thinking is needed to balance WAL and checkpointer IO only when needed, rather than applying a rough formula blindly.

--
Fabien.

