On Thu, Apr 5, 2012 at 7:05 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Thu, Apr 5, 2012 at 9:29 AM, Greg Stark <st...@mit.edu> wrote:
>> On Thu, Apr 5, 2012 at 2:24 PM, Robert Haas <robertmh...@gmail.com> wrote:
>>> Sorry, I don't understand specifically what you're looking for.  I
>>> provided latency percentiles in the last email; what else do you want?
>>
>> I think he wants how many waits there were between 0 and 1s, how many
>> between 1s and 2s, etc.  Mathematically it's equivalent, but I also
>> have trouble visualizing just how much improvement is represented by
>> the 90th percentile dropping from 1688 to 1620 (ms?)
>
> Yes, milliseconds.  Sorry for leaving out that detail.  I've run these
> scripts so many times that my eyes are crossing.  Here are the
> latencies, bucketized by seconds, first for master and then for the
> patch, on the same test run as before:
>
>  0 26179411
>  1     3642
>  2      660
>  3      374
>  4      166
>  5      356
>  6       41
>  7        8
>  8       56
>  9        0
> 10        0
> 11       21
> 12       11
>
>  0 26199130
>  1     4840
>  2      267
>  3      290
>  4       40
>  5       77
>  6       36
>  7        3
>  8        2
>  9       33
> 10       37
> 11        2
> 12        1
> 13        4
> 14        5
> 15        3
> 16        0
> 17        1
> 18        1
> 19        1
>
> I'm not sure I find those numbers all that helpful, but there they
> are.  There are a couple of outliers beyond 12 s on the patched run,
> but I wouldn't read anything into that; the absolute worst values
> bounce around a lot from test to test.  However, note that every
> bucket between 2s and 8s improves, sometimes dramatically.
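For what it's worth, the per-bucket figures above can be collapsed into a
simple tail count ("how many transactions took at least N seconds"), which
is the comparison that matters when asking whether a bucket improved or its
contents merely got pushed into a higher bucket.  A quick sketch, using the
numbers Robert posted (this is just a cumulative sum over the buckets, not
a real Kaplan-Meier fit; the function and variable names are made up here):

```python
# Latency histograms from the test run above: index = whole seconds,
# value = number of transactions whose latency fell in that bucket.
master = [26179411, 3642, 660, 374, 166, 356, 41, 8, 56, 0, 0, 21, 11]
patched = [26199130, 4840, 267, 290, 40, 77, 36, 3, 2, 33, 37, 2, 1,
           4, 5, 3, 0, 1, 1, 1]

def tail_counts(buckets):
    """For each threshold t, count transactions with latency >= t seconds."""
    out = []
    running = 0
    for count in reversed(buckets):
        running += count
        out.append(running)
    out.reverse()
    return out

m = tail_counts(master)
p = tail_counts(patched)
for sec in range(9):
    print(f">= {sec}s: master {m[sec]:>9}  patched {p[sec]:>9}")
# The >= 8s tail is 88 on master vs 90 with the patch, which is the
# "pushed into a higher bucket" effect discussed below.
```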
However, if it "improved" a bucket by pushing its contents out into a
higher bucket, that is not really an improvement.  At 8 seconds *or
higher*, for example, it goes from 88 transactions on master to 90 with
the patch.

Maybe something like a Kaplan-Meier survival curve analysis would be the
way to go (where long transaction "survival" is bad).  But that is
probably overkill.

What were full_page_writes and wal_buffers set to for these runs?

> It's worth
> keeping in mind here that the system is under extreme I/O strain on
> this test, and the kernel responds by forcing user processes to sleep
> when they try to do I/O.

Should the tests be dialed back a bit so that the I/O strain is less
extreme?  Analysis is probably best done just past the scalability knee,
not long after the point where the server has already collapsed into a
quivering mass.

Cheers,

Jeff

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers