Re: [HACKERS] Redesigning checkpoint_segments

Joshua D. Drake Wed, 05 Jun 2013 20:22:09 -0700


On 06/05/2013 05:37 PM, Robert Haas wrote:

- If it looks like we're going to exceed limit #3 before the
checkpoint completes, we start exerting back-pressure on writers by
making them wait every time they write WAL, probably in proportion to
the number of bytes written.  We keep ratcheting up the wait until
we've slowed down writers enough that it will finish within limit #3.  As
we reach limit #3, the wait goes to infinity; only read-only
operations can proceed until the checkpoint finishes.
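
To make the quoted back-pressure idea concrete, here is a minimal sketch of one way the wait could scale. It is not from any patch: the function name, the soft_start threshold, and base_delay_per_byte are all assumptions for illustration, and it uses current WAL usage as a stand-in for the "looks like we're going to exceed limit #3" prediction.

```python
import math

def writer_delay(bytes_written, wal_used, limit, soft_start,
                 base_delay_per_byte=1e-9):
    """Seconds a writer waits after writing bytes_written bytes of WAL.

    wal_used, limit and soft_start are in the same unit (e.g. segments).
    All names and the delay curve are hypothetical, not PostgreSQL code.
    """
    if wal_used >= limit:
        # At the hard limit the wait is infinite: only reads proceed.
        return math.inf
    if wal_used <= soft_start:
        # Plenty of headroom: no back-pressure at all.
        return 0.0
    # Fraction (0..1) of the headroom between soft_start and limit used up.
    pressure = (wal_used - soft_start) / (limit - soft_start)
    # Proportional to bytes written, and unbounded as pressure approaches 1.
    return bytes_written * base_delay_per_byte * pressure / (1.0 - pressure)
```

With a soft_start of 60 and a limit of 100, a writer at 90 waits three times as long per byte as a writer at 80, and a writer at 100 waits forever.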

Alright, perhaps I am dense. I have read both this thread and the other one on better handling of archive command (http://www.postgresql.org/message-id/cam3swzqcynxvpaskr-pxm8deqh7_qevw7uqbhpcsg1fpsxk...@mail.gmail.com). I recognize there are brighter minds than mine on this thread but I just honestly don't get it.

1. WAL writes are already fast. They are the fastest writes we have because they are sequential.

2. We don't want them to be slow. We want data written to disk as quickly as possible without adversely affecting production. That's the point.

3. The spread checkpoints have always confused me. If anything we want a checkpoint to be fast and short because:

4. Bgwriter. We should be adjusting bgwriter so that it is writing everything in a manner that allows any checkpoint to be in the range of never noticed.


Now perhaps my customers' workloads are different, but for us:

1. Checkpoint timeout is set as high as reasonable, usually 30 minutes to an hour. I wish I could set them even further out.

2. Bgwriter is set to be aggressive but not obtrusive. Usually we adjust it to an actual amount of IO bandwidth it may take per second, based on their IO constraints. (Note I know that wal_writer comes into play here but I honestly don't remember where and am reading up on it to refresh my memory).

3. The biggest issue we see with checkpoint segments is not running out of space because really.... 10GB is how many checkpoint segments? It is with wal_keep_segments. If we don't want to fill up the pg_xlog directory, put the WAL files that are for keep_segments elsewhere.
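
For the "10GB is how many checkpoint segments?" question, the arithmetic is easy to sketch. The function names below are made up for illustration; the 16MB segment size is the default, and the file-count formula is the rule of thumb from the 9.x documentation as I recall it, so treat it as an assumption rather than gospel.

```python
SEGMENT_BYTES = 16 * 1024 * 1024  # default WAL segment size (16MB)

def segments_in(budget_bytes, segment_bytes=SEGMENT_BYTES):
    """How many whole WAL segments fit in a disk budget."""
    return budget_bytes // segment_bytes

def max_pg_xlog_files(checkpoint_segments, completion_target=0.5,
                      wal_keep_segments=0):
    """Approximate upper bound on files normally kept in pg_xlog.

    Assumed rule of thumb: the larger of
      (2 + checkpoint_completion_target) * checkpoint_segments + 1
      checkpoint_segments + wal_keep_segments + 1
    """
    normal = int((2 + completion_target) * checkpoint_segments) + 1
    keep = checkpoint_segments + wal_keep_segments + 1
    return max(normal, keep)

print(segments_in(10 * 1024**3))                     # segments in 10GB
print(max_pg_xlog_files(64))                         # no wal_keep_segments
print(max_pg_xlog_files(64, wal_keep_segments=500))  # keep_segments dominates
```

So 10GB is 640 segments, and in the third call it is wal_keep_segments, not checkpoint_segments, that decides how full pg_xlog gets.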


Other oddities:

Yes checkpoint_segments is awkward. We shouldn't have to set it at all. It should be gone. Basically we start with X amount perhaps to be set at initdb time. That X amount changes dynamically based on the amount of data being written. In order to not suffer from recycling and creation penalties we always keep X+N where N is enough to keep up with new data.

Along with the above, I don't see any reason for checkpoint_timeout. Because of bgwriter we should be able to rather indefinitely not worry about checkpoints (with a few exceptions such as pg_start_backup()). Perhaps a setting that causes a checkpoint to happen based on some non-artificial threshold (timeout) such as amount of data currently in need of a checkpoint?

Heikki said, "I propose that we do something similar, but not exactly the same. Let's have a setting, max_wal_size, to control the max. disk space reserved for WAL. Once that's reached (or you get close enough, so that there are still some segments left to consume while the checkpoint runs), a checkpoint is triggered.

In this proposal, the number of segments preallocated is controlled separately from max_wal_size, so that you can set max_wal_size high, without actually consuming that much space in normal operation. It's just a backstop, to avoid completely filling the disk, if there's a sudden burst of activity. The number of segments preallocated is auto-tuned, based on the number of segments used in previous checkpoint cycles."
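
A rough sketch of how the two knobs in that quote could fit together, purely for illustration. The class, its method names, the headroom of 20 segments, and the "largest of the last few cycles" tuning rule are all assumptions; the quote does not say how the auto-tuning is computed.

```python
from collections import deque

class WalBudget:
    """Toy model: a max_wal_size backstop plus auto-tuned preallocation."""

    def __init__(self, max_wal_segments, headroom_segments=20, history=4):
        self.max_wal_segments = max_wal_segments
        # Segments left to consume while the checkpoint itself runs.
        self.headroom_segments = headroom_segments
        # Segments used in the most recent checkpoint cycles.
        self.used_per_cycle = deque(maxlen=history)

    def should_checkpoint(self, segments_in_use):
        # Trigger once we get "close enough" to max_wal_size.
        return segments_in_use >= self.max_wal_segments - self.headroom_segments

    def record_cycle(self, segments_used):
        self.used_per_cycle.append(segments_used)

    def preallocate_target(self):
        # Assumed rule: keep as many segments as the busiest recent cycle
        # needed, never more than the backstop.
        if not self.used_per_cycle:
            return 1
        return min(max(self.used_per_cycle), self.max_wal_segments)
```

The point of the separation shows up in use: with max_wal_segments at 640 and recent cycles using 12, 30 and 18 segments, only 30 are kept preallocated, while the checkpoint trigger sits at 620.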

This makes sense except I don't see a need for the parameter. Why not just specify how the algorithm works and adhere to that without the need for another GUC? Perhaps at any given point we save 10% of available space (within a 16MB calculation) for pg_xlog, you hit it, we checkpoint and LOG EXACTLY WHY.
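
Something like the following is what I have in mind for the GUC-less version, as a sketch only. The function names and the log wording are invented here, and the 10% figure is just the example number from the paragraph above.

```python
SEGMENT_BYTES = 16 * 1024 * 1024  # default WAL segment size (16MB)

def wal_reserve_segments(available_bytes, percent=10,
                         segment_bytes=SEGMENT_BYTES):
    """Whole 16MB segments that fit in percent% of the available space."""
    return available_bytes * percent // 100 // segment_bytes

def checkpoint_reason(wal_segments, available_bytes, percent=10):
    """Return a log message if a checkpoint is due, else None."""
    reserve = wal_reserve_segments(available_bytes, percent)
    if wal_segments >= reserve:
        return (f"checkpoint starting: WAL uses {wal_segments} segments, "
                f"reserve is {reserve} segments "
                f"({percent}% of available space)")
    return None
```

On a volume with 100GB available the reserve works out to 640 segments, and the message says exactly which threshold was crossed.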

Instead of "running out of disk space PANIC" we should just write to an emergency location within PGDATA and log very loudly that the SA isn't paying attention. Perhaps if that area starts to get to an unhappy place we immediately bounce into read-only mode and log even more loudly that the SA should be fired. I would think read-only mode is safer and more polite than a PANIC crash.

I do not think we should worry about filling up the hard disk except to protect against data loss in the event. It is not user unfriendly to assume that a user will pay attention to disk space. Really?


Open to people telling me I am off in left field. Sorry if it is noise.

Sincerely,

JD



--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
   a rose in the deeps of my heart. - W.B. Yeats


