Re: [GENERAL] Autovacuum daemon terminated by signal 11

Justin Pasher Thu, 15 Jan 2009 13:47:32 -0800

Richard Huxton wrote:

Justin Pasher wrote:

Hello,


I have a server running PostgreSQL 8.1.15-0etch1 (Debian etch) that was
recently put into production. Last week a developer started having a problem
with his psql connection being terminated every couple of minutes while he
was running a query. When I looked through the logs, I noticed this message:

2009-01-09 08:09:46 CST LOG:  autovacuum process (PID 15012) was terminated
by signal 11


Segmentation fault - probably a bug or bad RAM.

It's a relatively new machine, but that's obviously a possibility with any hardware. I haven't seen any other programs experiencing problems on the box, but the Postgres daemon is the one that is primarily utilized, so it's a little biased toward that.

I looked through the logs some more and noticed that this was occurring
every minute or so. The database is a pretty heavily utilized system
(judging by age(datfrozenxid) from pg_database, the system had run
approximately 500 million transactions in less than a week). I also noticed
that right before every autovacuum termination, it tried to autovacuum a
database:

2009-01-09 08:09:46 CST LOG:  transaction ID wrap limit is 4563352, limited
by database "database_name"

It was always showing the same database, so I decided to manually vacuum the
database. Once that was done (it was successful the first time without
errors), the problem seemed to go away. I went ahead and manually vacuumed
the remaining databases just to take care of the potential xid wraparound
issue.
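The log correlation described above (matching each signal-11 termination to the database named just before it) can be automated. This is a minimal Python sketch, assuming server log lines in the same format as the excerpts quoted here; the function name `crashes_by_database` is hypothetical, and the result only shows correlation, not cause.

```python
import re
from collections import Counter

# Patterns built from the two log messages quoted in this thread. In the mail
# they are wrapped; in the real log each message sits on a single line.
WRAP_LIMIT_RE = re.compile(
    r'transaction ID wrap limit is \d+, limited by database "([^"]+)"'
)
SIGSEGV_RE = re.compile(
    r"autovacuum process \(PID \d+\) was terminated by signal 11"
)


def crashes_by_database(lines):
    """Count autovacuum signal-11 terminations, attributing each one to the
    database named in the most recent preceding wrap-limit line.

    This is a correlation heuristic, not proof that the named database
    caused the crash. Crashes seen before any wrap-limit line count as None.
    """
    last_db = None
    counts = Counter()
    for line in lines:
        match = WRAP_LIMIT_RE.search(line)
        if match:
            last_db = match.group(1)
        elif SIGSEGV_RE.search(line):
            counts[last_db] += 1
    return counts
```

For example, `with open(path) as log: print(crashes_by_database(log).most_common())` lists the databases that most often preceded a crash in the log file at `path`.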


I'd be suspicious of possible corruption in autovacuum's internal data.
Can you trace these problems back to a power-outage or system crash? It
doesn't look like "database_name" itself since you vacuumed that
successfully. If autovacuum is running normally now, that might indicate
it was something in the way autovacuum was keeping track of "database_name".

The server hasn't been rebooted since it was installed (about 9 months ago, though it has only been in use for the past month), so there haven't been any crashes or power outages. The only abnormal things I can find in the Postgres logs are the autovacuum segfaults. Today's logs show it is still happening (this time on a different database). I manually vacuumed that one database and the problem went away (for now).

Are there any internal Postgres tables I can look at that may shed some light on this? Any particular maintenance commands that could be run for repair?

It's also probably worth running some memory tests on the server
(memtest86 or similar) to see if that shows anything. Was it *always*
the autovacuum process getting sig11? If not, then it might just be a
pattern of usage that makes it more likely to use some bad RAM.

I might try the memtest if we can actually move the databases off the server to allow some downtime. None of the logs indicate anything else acting abnormally or being terminated abnormally, just the autovacuum daemon. From what I can tell, the segfaults occur only when a database passes the halfway point (when age(datfrozenxid) exceeds around 1500000000). When this is not the case, the logs show no segfaults.
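To spot databases approaching the age at which the crashes were seen, and vacuum them before autovacuum does, a minimal Python sketch follows. It assumes you fetch the `(datname, age)` rows yourself with the query shown; the helper names are hypothetical, and the 1500000000 threshold is the empirical figure reported above, not a documented PostgreSQL limit.

```python
# Run this in psql (or any client) and pass the result rows to the helpers
# below. pg_database, datallowconn and age(datfrozenxid) all exist in 8.1.
AGE_QUERY = (
    "SELECT datname, age(datfrozenxid) FROM pg_database "
    "WHERE datallowconn ORDER BY 2 DESC"
)

# The age at which the crashes were observed on this one server. It is an
# empirical figure from this thread, not a PostgreSQL constant.
OBSERVED_CRASH_AGE = 1_500_000_000


def databases_at_risk(rows, threshold=OBSERVED_CRASH_AGE):
    """Return names of databases whose age exceeds threshold, oldest first.

    rows is an iterable of (datname, age) pairs as returned by AGE_QUERY.
    """
    over = [(age, name) for name, age in rows if age > threshold]
    return [name for age, name in sorted(over, reverse=True)]


def vacuum_commands(dbnames):
    """Build (but do not run) one vacuumdb invocation per database."""
    return [["vacuumdb", "-d", name] for name in dbnames]
```

Each command could then be run with `subprocess.run(cmd, check=True)`, repeating the manual vacuum described in this thread before a database crosses the mark, if the observed pattern holds.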



Justin Pasher

--
Sent via pgsql-general mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
