On 06.06.2013 11:42, Joshua D. Drake wrote:
> On 6/6/2013 1:11 AM, Heikki Linnakangas wrote:
>>> Yes, checkpoint_segments is awkward. We shouldn't have to set it at all.
>>> It should be gone.
>> The point of having checkpoint_segments or max_wal_size is to put a
>> limit (albeit a soft one) on the amount of disk space used. If you
>> don't care about that, I guess we could allow max_wal_size=-1 to mean
>> infinite, and checkpoints would be driven off purely based on time,
>> not WAL consumption.
> I would not only agree with that, I would argue that max_wal_size
> doesn't need to be there at least as a default. Perhaps as an "advanced"
> configuration option that only those in the know see.
Well, we have checkpoint_segments=3 as the default currently, which in
the proposed scheme would be about equal to max_wal_size=120MB. For
better or worse, our defaults are generally geared towards small
systems, and that sounds about right for that.
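
For reference, the 120MB figure presumably comes from the usual rule of
thumb for peak WAL volume between checkpoints, assuming 16 MB segments
and the default checkpoint_completion_target = 0.5:

    (2 + checkpoint_completion_target) * checkpoint_segments * 16 MB
        = 2.5 * 3 * 16 MB = 120 MB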
>>> Basically we start with X amount, perhaps to be set at
>>> initdb time. That X amount changes dynamically based on the amount of
>>> data being written. In order to not suffer from recycling and creation
>>> penalties we always keep X+N, where N is enough to keep up with new data.
>> To clarify, here you're referring to controlling the number of WAL
>> segments preallocated/recycled, rather than how often checkpoints are
>> triggered. Currently, both are derived from checkpoint_segments, but I
>> proposed to separate them. The above is exactly what I proposed to do
>> for the preallocation/recycling: it would be tuned automatically. But
>> you still need something like max_wal_size for the other thing, to
>> trigger a checkpoint if too much WAL is being consumed.
> You think so? I agree with 90% of this paragraph, but it seems to me that
> we can find an algorithm that manages this without the idea of
> max_wal_size (at least not as a user-settable setting).
We are in a violent agreement :-). max_wal_size would not directly
affect the preallocation of segments. The preallocation would be driven
off the actual number of segments used in previous checkpoint cycles,
not on max_wal_size.
Now, max_wal_size would affect when checkpoints happen (i.e., if you're
about to reach max_wal_size, a checkpoint would be triggered), which
would in turn affect the number of segments used between cycles. But
there would be no direct connection between the two; the code to
calculate how much to preallocate would not refer to max_wal_size.
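
To make that separation concrete, here is a rough sketch in C
(illustrative only; the names and constants are made up for this example,
not taken from xlog.c or from any actual patch). The preallocation target
comes purely from a moving average of segments used in recent cycles, and
only the checkpoint-trigger check looks at max_wal_size:

#include <stdbool.h>
#include <stdint.h>

#define WAL_SEGMENT_SIZE ((uint64_t) 16 * 1024 * 1024)  /* assume 16 MB segments */

/* Illustrative knob: -1 would mean "no WAL-based checkpoint trigger". */
static int max_wal_size_mb = 120;

/* Moving average of WAL segments written per checkpoint cycle. */
static double avg_segments_per_cycle = 3.0;

/*
 * Called at the end of each checkpoint cycle.  max_wal_size is
 * deliberately not consulted here; the estimate is driven only by the
 * actual usage observed in previous cycles.
 */
static void
update_prealloc_estimate(uint64_t segments_used_this_cycle)
{
    /* Exponential moving average; the smoothing factor is arbitrary. */
    avg_segments_per_cycle = 0.75 * avg_segments_per_cycle +
        0.25 * (double) segments_used_this_cycle;
}

/* How many segments to keep preallocated/recycled for the next cycle. */
static uint64_t
prealloc_target_segments(void)
{
    /* Keep a little slack on top of the average (the "X + N" idea). */
    return (uint64_t) (avg_segments_per_cycle * 1.1) + 1;
}

/*
 * Should WAL consumption force a checkpoint?  This is the only place
 * that looks at max_wal_size.
 */
static bool
wal_based_checkpoint_needed(uint64_t segments_since_last_checkpoint)
{
    if (max_wal_size_mb < 0)
        return false;       /* checkpoints driven purely by time */

    return segments_since_last_checkpoint >=
        ((uint64_t) max_wal_size_mb * 1024 * 1024) / WAL_SEGMENT_SIZE;
}

The only point of the sketch is that the two functions are independent:
one is driven by max_wal_size, the other by observed usage.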
Maybe max_wal_size should set an upper limit on how much to preallocate,
though. If you want to limit the WAL size, we probably shouldn't exceed
it on purpose by preallocating segments, even if the algorithm based on
previous cycles suggests we should. This situation would arise if
the checkpoints can't keep up, so that each checkpoint cycle is longer
than we'd want, and we'd exceed max_wal_size because of that.
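
Continuing the same illustrative sketch (again, hypothetical names), that
cap could be a simple clamp on the preallocation target:

/* Cap the preallocation target at max_wal_size, if one is set. */
static uint64_t
clamped_prealloc_target(void)
{
    uint64_t target = prealloc_target_segments();

    if (max_wal_size_mb >= 0)
    {
        uint64_t max_segments =
            ((uint64_t) max_wal_size_mb * 1024 * 1024) / WAL_SEGMENT_SIZE;

        if (target > max_segments)
            target = max_segments;
    }
    return target;
}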
>>> This makes sense except I don't see a need for the parameter. Why not
>>> just specify how the algorithm works and adhere to that without the need
>>> for another GUC?
>> Because you want to limit the amount of disk space used for WAL. It's
>> a soft limit, but still.
> Why? This is the point that confuses me. Why do we care? We don't care
> how much disk space PGDATA takes... why do we all of a sudden care about
> pg_xlog?
Hmm, dunno. We've always had the checkpoint_segments setting to limit
that; I was just thinking of retaining that functionality.
A few reasons spring to mind: First, running out of WAL space leads to a
PANIC, which is not nice (I know, we talked about fixing that).
Secondly, because we can. If a user inserts 10 GB of data into a table,
we'll have to just store it, but with WAL, we can always issue a
checkpoint to shrink it. People have asked for quotas for user data too,
so some people do want to limit disk usage.
Mind you, it's possible to have a tiny database with a high TPS rate,
such that the WAL grows really big compared to the size of the user
data. Something with a small hot table that's updated a lot. In such a
scenario, limiting the WAL size makes sense, and it won't affect
performance much either because checkpointing a small database is very
cheap.
- Heikki