On Thu, 2010-04-22 at 23:45 +0100, Simon Riggs wrote:
> On Thu, 2010-04-22 at 20:39 +0200, Erik Rijkers wrote:
> > On Sun, April 18, 2010 13:01, Simon Riggs wrote:
> > > any comment is welcome...
> > Please can you re-run with -l and post me the file of times

Erik has sent me details of a test run. My analysis of that is:

I'm seeing the response time profile on the standby as
99%    <110us
99.9%  <639us
99.99% <615ms

0.052% (52 samples) are >5ms elapsed and account for 24 s, which is
about 45% of elapsed time.

Of the 52 samples >5ms, 50 of them are >100ms and 2 >1s.

99% of transactions happen in similar times between primary and standby;
everything is dragged down by rare but severe spikes.

We're looking for something that would delay something that normally
takes <0.1ms into something that takes >100ms, yet does eventually
return. That looks like a severe resource contention issue.

This effect happens when running just a single read-only session on
the standby from pgbench. No confirmation as yet as to whether recovery
is active or dormant, and what other activity, if any, occurs on the
standby server at the same time. So no other clues as yet as to what
the contention might be, except that we note the standby is writing
data and the database is large.

> Please also rebuild using --enable-profile so we can see what's
> happening.
>
> Can you also try the enclosed patch which implements prefetching during
> replay of btree delete records. (Need to set effective_io_concurrency)

As yet, no confirmation that the attached patch is even relevant. It was
just a wild guess at some tuning, while we wait for further info.

> Thanks for your further help.

"Some kind of contention" is the best we can say at present.

--
 Simon Riggs           www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
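
For anyone wanting to reproduce this kind of breakdown from a pgbench -l
log, here is a minimal Python sketch. It is not the script used for the
analysis above. It assumes each log line is whitespace-separated with
the per-transaction elapsed time in microseconds as the third field; the
function names (parse_latencies, percentile, tail_summary, report) and
the 5000us default threshold are illustrative only. The "time share" is
taken against the sum of all latencies, which only approximates elapsed
time when a single session is running.

```python
import math


def parse_latencies(lines):
    """Return per-transaction latencies in microseconds.

    Assumption: fields are whitespace-separated and the third field
    is the elapsed transaction time in microseconds.
    """
    latencies = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        latencies.append(int(fields[2]))
    return latencies


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending-sorted, non-empty list."""
    # round() guards against float noise pushing an exact rank upwards.
    rank = math.ceil(round(pct * len(sorted_values) / 100.0, 9))
    return sorted_values[max(1, rank) - 1]


def tail_summary(latencies, threshold_us):
    """Count samples above threshold_us and their share of total latency."""
    slow = [v for v in latencies if v > threshold_us]
    total = sum(latencies)
    return {
        "count": len(slow),
        "fraction": len(slow) / len(latencies),
        "time_share": sum(slow) / total if total else 0.0,
    }


def report(path, threshold_us=5000):
    """Print the percentile profile and slow-tail summary for one log file."""
    with open(path) as f:
        latencies = sorted(parse_latencies(f))
    for pct in (99, 99.9, 99.99):
        print(f"{pct}% <= {percentile(latencies, pct)} us")
    print(tail_summary(latencies, threshold_us))
```

Calling report() with the path to a log file prints the three
percentiles and the >5ms tail summary; comparing the output for a
primary run and a standby run should show whether the two differ only in
the tail.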