Re: [PERFORM] Checkpoint distribution

Borodin Vladimir Mon, 14 Apr 2014 09:43:34 -0700

14 апр. 2014 г., в 19:11, Jeff Janes <jeff.ja...@gmail.com> написал(а):


> On Mon, Apr 14, 2014 at 2:46 AM, Borodin Vladimir <r...@simply.name> wrote:
> Hi all.
> 
> I’m running PostgreSQL 9.3.4 and doing stress test of the database with 
> writing only load. The test plan does 1000 transactions per second (each of 
> them does several updates/inserts). The problem is that checkpoint is not 
> distributed over time well. When the checkpoint finishes, the db gets lots of 
> I/O operations and response timings grows strongly.
> 
> My checkpoint settings looks like that:
> 
> postgres=# select name, setting from pg_catalog.pg_settings where name like 
> 'checkpoint%' and boot_val != reset_val;
>              name             | setting 
> ------------------------------+---------
>  checkpoint_completion_target | 0.9
>  checkpoint_segments          | 100500
>  checkpoint_timeout           | 600
> (3 rows)
> 
> postgres=#
> 
> But in the log I see that checkpoint continues less than 600*0.9 = 540 
> seconds:
> 
> 2014-04-14 12:54:41.479 MSK,,,10517,,53468da6.2915,433,,2014-04-10 16:25:10 
> MSK,,0,LOG,00000,"checkpoint starting: time",,,,,,,,,""
> 2014-04-14 12:57:06.107 MSK,,,10517,,53468da6.2915,434,,2014-04-10 16:25:10 
> MSK,,0,LOG,00000,"checkpoint complete: wrote 65140 buffers (24.8%); 0 
> transaction log file(s) added, 0 removed, 327 recycled; write=134.217 s, 
> sync=10.292 s, total=144.627 s; sync files=31, longest=3.332 s, average=0.331 
> s",,,,,,,,,»"
> 
> When a checkpoint starts, the checkpointer process counts up all the buffers 
> that need to be written.  Then it goes through and writes them.  It paces 
> itself by comparing how many buffers it itself has written to how many need 
> to be written.  But if a buffer that needs to be checkpointed happens to get 
> written by some other process (the background writer, or a backend, because 
> they need a clean buffer to read different data into), the checkpointer is 
> not notified of this and doesn't count that buffer as being written when it 
> computes whether it is on track.  This causes it to finish early.  This can 
> be confusing, but probably doesn't cause any real problems.  The reason for 
> checkpoint_completion_target is to spread the IO out over a longer time, but 
> if much of the checkpoint IO is really being done by the background writer, 
> than it is already getting spread out fairly well.

I didn’t know that, thanks. Seems, that I have quite small shared buffers size. 
I will investigate this problem.

> 
> When the checkpoint starts (12:54:41.479) dstat says that I/O load increases:
>  
> ...
> 
> But when it finishes (12:57:06.107) the I/O load is much higher than the 
> hardware can do:
> 
> During the writing phase of the checkpoint, PostgreSQL passes the dirty data 
> to the OS.  At the end, it then tells the OS to make sure that that data has 
> actually reached disk.  If your OS stored up too much dirty data in memory 
> then it kind of freaks out once it is notified it needs to actually write 
> that data to disk.  The best solution for this may be to lower 
> dirty_background_bytes or dirty_background_ratio so the OS doesn't store up 
> so much trouble for itself.
> 

Actually, I have already tuned them to different values. Test results above 
have been obtained with such settings for page cache:

vm.dirty_background_ratio = 5
vm.dirty_ratio = 40
vm.dirty_expire_centisecs = 100
vm.dirty_writeback_centisecs = 100

Togethrer with previous point I will try to tune os and postgres settings. 
Thanks.

> Cheers
> 
> Jeff


--
Vladimir

Re: [PERFORM] Checkpoint distribution

Reply via email to