Re: [HACKERS] buffer assertion tripping under repeat pgbench load

Greg Smith Sat, 12 Jan 2013 21:34:45 -0800

On 12/26/12 7:23 PM, Greg Stark wrote:

It's also possible it's a bad cpu, not bad memory. If it affects
decrement or increment in particular it's possible that the pattern of
usage on LocalRefCount is particularly prone to triggering it.

This looks to be the winning answer. It turns out that under extended multi-hour loads at high concurrency, something related to CPU overheating was occasionally flipping a bit. One round of compressed air for all the fans/vents, a little tweaking of the fan controls, and now the system goes >24 hours with no problems.

Sorry about all the noise over this. I do think the improved warning messages that came out of the diagnosis ideas are useful. The reworked code must slow down the checking by a few cycles, but if you care about performance these assertions are tacked onto the biggest pig around.

I added the patch to the January CF as "Improve buffer refcount leak warning messages". The sample I showed with the patch submission was a simulated one. Here's the output from the last crash before resolving the issue, where the assertion really triggered:

WARNING: buffer refcount leak: [170583] (rel=base/16384/16578,blockNum=302295, flags=0x106, refcount=0 1073741824)

WARNING:  buffers with non-zero refcount is 1

TRAP: FailedAssertion("!(RefCountErrors == 0)", File: "bufmgr.c", Line:1712)
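For anyone reading the numbers in that WARNING line: 1073741824 is exactly 2^30, which is what a single flipped bit in an otherwise zero counter would look like. The sketch below is only an illustration of that arithmetic, not part of the patch. It assumes the second value after "refcount=" is the backend-local count and that the expected value was 0; the helper name flipped_bits is made up for this example.

```python
def flipped_bits(expected: int, observed: int) -> list[int]:
    """Return the positions of the bits that differ between two 32-bit values."""
    diff = (expected ^ observed) & 0xFFFFFFFF
    return [bit for bit in range(32) if diff & (1 << bit)]


# Second refcount value from the WARNING line above.
observed = 1073741824
# Assumption: with no leak, the counter would have been zero at this point.
expected = 0

bits = flipped_bits(expected, observed)
print(bits)  # [30]: one differing bit, i.e. observed == 2**30
```

A result with exactly one bit position is consistent with a hardware bit flip; a genuine refcount leak would more plausibly show a small positive count such as 1 or 2.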


--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
