On 06.06.2013 11:42, Joshua D. Drake wrote:
> On 6/6/2013 1:11 AM, Heikki Linnakangas wrote:
>>> Yes, checkpoint_segments is awkward. We shouldn't have to set it at all.
>>> It should be gone.
>> The point of having checkpoint_segments or max_wal_size is to put a
>> limit (albeit a soft one) on the amount of disk space used. If you
>> don't care about that, I guess we could allow max_wal_size=-1 to mean
>> infinite, and checkpoints would be driven off purely based on time,
>> not WAL consumption.
> I would not only agree with that, I would argue that max_wal_size
> doesn't need to be there at least as a default. Perhaps as an "advanced"
> configuration option that only those in the know see.
Well, we have checkpoint_segments=3 as the default currently, which in
the proposed scheme would be about equal to max_wal_size=120MB. For
better or worse, our defaults are generally geared towards small
systems, and that sounds about right for that.
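
For reference, the 120MB figure presumably comes from the usual rule of
thumb for peak WAL volume between checkpoints, assuming 16 MB segments
and the default checkpoint_completion_target = 0.5:

    (2 + checkpoint_completion_target) * checkpoint_segments * 16 MB
        = 2.5 * 3 * 16 MB = 120 MB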
>>> Basically we start with X amount, perhaps to be set at
>>> initdb time. That X amount changes dynamically based on the amount of
>>> data being written. In order to not suffer from recycling and creation
>>> penalties we always keep X+N, where N is enough to keep up with new data.
>> To clarify, here you're referring to controlling the number of WAL
>> segments preallocated/recycled, rather than how often checkpoints are
>> triggered. Currently, both are derived from checkpoint_segments, but I
>> proposed to separate them. The above is exactly what I proposed to do
>> for the preallocation/recycling: it would be tuned automatically. But
>> you still need something like max_wal_size for the other thing, to
>> trigger a checkpoint if too much WAL is being consumed.
> You think so? I agree with 90% of this paragraph, but it seems to me that
> we can find an algorithm that manages this without the idea of
> max_wal_size (at least not as a user-settable setting).
We are in a violent agreement :-). max_wal_size would not directly
affect the preallocation of segments. The preallocation would be driven
off the actual number of segments used in previous checkpoint cycles,
not on max_wal_size.
Now, max_wal_size would affect when checkpoints happen (i.e., if you're
about to reach max_wal_size, a checkpoint would be triggered), which
would in turn affect the number of segments used between cycles. But
there would be no direct connection between the two; the code to
calculate how much to preallocate would not refer to max_wal_size.
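
To make that separation concrete, here is a rough sketch in C
(illustrative only; the names and constants are made up for this example,
not taken from xlog.c or from any actual patch). The preallocation target
comes purely from a moving average of segments used in recent cycles, and
only the checkpoint-trigger check looks at max_wal_size:

#include <stdbool.h>
#include <stdint.h>

#define WAL_SEGMENT_SIZE ((uint64_t) 16 * 1024 * 1024)  /* assume 16 MB segments */

/* Illustrative knob: -1 would mean "no WAL-based checkpoint trigger". */
static int max_wal_size_mb = 120;

/* Moving average of WAL segments written per checkpoint cycle. */
static double avg_segments_per_cycle = 3.0;

/*
 * Called at the end of each checkpoint cycle.  max_wal_size is
 * deliberately not consulted here; the estimate is driven only by the
 * actual usage observed in previous cycles.
 */
static void
update_prealloc_estimate(uint64_t segments_used_this_cycle)
{
    /* Exponential moving average; the smoothing factor is arbitrary. */
    avg_segments_per_cycle = 0.75 * avg_segments_per_cycle +
        0.25 * (double) segments_used_this_cycle;
}

/* How many segments to keep preallocated/recycled for the next cycle. */
static uint64_t
prealloc_target_segments(void)
{
    /* Keep a little slack on top of the average (the "X + N" idea). */
    return (uint64_t) (avg_segments_per_cycle * 1.1) + 1;
}

/*
 * Should WAL consumption force a checkpoint?  This is the only place
 * that looks at max_wal_size.
 */
static bool
wal_based_checkpoint_needed(uint64_t segments_since_last_checkpoint)
{
    if (max_wal_size_mb < 0)
        return false;       /* checkpoints driven purely by time */

    return segments_since_last_checkpoint >=
        ((uint64_t) max_wal_size_mb * 1024 * 1024) / WAL_SEGMENT_SIZE;
}

The only point of the sketch is that the two functions are independent:
one is driven by max_wal_size, the other by observed usage.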
Maybe max_wal_size should set an upper limit on how much to preallocate,
though. If you want to limit the WAL size, we probably shouldn't exceed
it on purpose by preallocating segments, even if the algorithm based on
previous cycles suggests we should. This situation would arise if
the checkpoints can't keep up, so that each checkpoint cycle is longer
than we'd want, and we'd exceed max_wal_size because of that.
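
Continuing the same illustrative sketch (again, hypothetical names), that
cap could be a simple clamp on the preallocation target:

/* Cap the preallocation target at max_wal_size, if one is set. */
static uint64_t
clamped_prealloc_target(void)
{
    uint64_t target = prealloc_target_segments();

    if (max_wal_size_mb >= 0)
    {
        uint64_t max_segments =
            ((uint64_t) max_wal_size_mb * 1024 * 1024) / WAL_SEGMENT_SIZE;

        if (target > max_segments)
            target = max_segments;
    }
    return target;
}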
>>> This makes sense except I don't see a need for the parameter. Why not
>>> just specify how the algorithm works and adhere to that without the need
>>> for another GUC?
>> Because you want to limit the amount of disk space used for WAL. It's
>> a soft limit, but still.
> Why? This is the point that confuses me. Why do we care? We don't care
> how much disk space PGDATA takes... why do we all of a sudden care about
> pg_xlog?
Hmm, dunno. We've always had the checkpoint_segments setting to limit
that; I was just thinking of retaining that functionality.
A few reasons spring to mind: First, running out of WAL space leads to a
PANIC, which is not nice (I know, we talked about fixing that).
Secondly, because we can. If a user inserts 10 GB of data into a table,
we'll have to just store it, but with WAL, we can always issue a
checkpoint to shrink it. People have asked for quotas for user data too,
so some people do want to limit disk usage.
Mind you, it's possible to have a tiny database with a high TPS rate,
such that the WAL grows really big compared to the size of the user
data. Something with a small hot table that's updated a lot. In such a
scenario, limiting the WAL size makes sense, and it won't affect
performance much either because checkpointing a small database is very
cheap.
- Heikki