14 апр. 2014 г., в 19:11, Jeff Janes <jeff.ja...@gmail.com> написал(а):
> On Mon, Apr 14, 2014 at 2:46 AM, Borodin Vladimir <r...@simply.name> wrote: > Hi all. > > I’m running PostgreSQL 9.3.4 and doing stress test of the database with > writing only load. The test plan does 1000 transactions per second (each of > them does several updates/inserts). The problem is that checkpoint is not > distributed over time well. When the checkpoint finishes, the db gets lots of > I/O operations and response timings grows strongly. > > My checkpoint settings looks like that: > > postgres=# select name, setting from pg_catalog.pg_settings where name like > 'checkpoint%' and boot_val != reset_val; > name | setting > ------------------------------+--------- > checkpoint_completion_target | 0.9 > checkpoint_segments | 100500 > checkpoint_timeout | 600 > (3 rows) > > postgres=# > > But in the log I see that checkpoint continues less than 600*0.9 = 540 > seconds: > > 2014-04-14 12:54:41.479 MSK,,,10517,,53468da6.2915,433,,2014-04-10 16:25:10 > MSK,,0,LOG,00000,"checkpoint starting: time",,,,,,,,,"" > 2014-04-14 12:57:06.107 MSK,,,10517,,53468da6.2915,434,,2014-04-10 16:25:10 > MSK,,0,LOG,00000,"checkpoint complete: wrote 65140 buffers (24.8%); 0 > transaction log file(s) added, 0 removed, 327 recycled; write=134.217 s, > sync=10.292 s, total=144.627 s; sync files=31, longest=3.332 s, average=0.331 > s",,,,,,,,,»" > > When a checkpoint starts, the checkpointer process counts up all the buffers > that need to be written. Then it goes through and writes them. It paces > itself by comparing how many buffers it itself has written to how many need > to be written. But if a buffer that needs to be checkpointed happens to get > written by some other process (the background writer, or a backend, because > they need a clean buffer to read different data into), the checkpointer is > not notified of this and doesn't count that buffer as being written when it > computes whether it is on track. This causes it to finish early. This can > be confusing, but probably doesn't cause any real problems. The reason for > checkpoint_completion_target is to spread the IO out over a longer time, but > if much of the checkpoint IO is really being done by the background writer, > than it is already getting spread out fairly well. I didn’t know that, thanks. Seems, that I have quite small shared buffers size. I will investigate this problem. > > When the checkpoint starts (12:54:41.479) dstat says that I/O load increases: > > ... > > But when it finishes (12:57:06.107) the I/O load is much higher than the > hardware can do: > > During the writing phase of the checkpoint, PostgreSQL passes the dirty data > to the OS. At the end, it then tells the OS to make sure that that data has > actually reached disk. If your OS stored up too much dirty data in memory > then it kind of freaks out once it is notified it needs to actually write > that data to disk. The best solution for this may be to lower > dirty_background_bytes or dirty_background_ratio so the OS doesn't store up > so much trouble for itself. > Actually, I have already tuned them to different values. Test results above have been obtained with such settings for page cache: vm.dirty_background_ratio = 5 vm.dirty_ratio = 40 vm.dirty_expire_centisecs = 100 vm.dirty_writeback_centisecs = 100 Togethrer with previous point I will try to tune os and postgres settings. Thanks. > Cheers > > Jeff -- Vladimir