[GENERAL] Effect of a kill -9 on postgres

2011-08-07 Thread Royce Ausburn
Hi all,

A few days ago one of our postgres (8.3.12) servers was a bit unhappy, and 
someone decided to try a kill -9 on a backend process after a kill (TERM) was 
ineffective.  I've read many times in the past that a kill -9 can be pretty 
hazardous to a postgres' health, and now it seems I get to see first hand how 
hazardous it really is :(

Fortunately postgres seems to have detected the -9 signal and brought the 
system down:

2011-08-05 17:17:53 EST redacted.com 10.3.0.3(39556) ad...@redacted.com WARNING:  terminating connection because of crash of another server process
2011-08-05 17:17:53 EST redacted.com 10.3.0.3(39556) ad...@redacted.com DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2011-08-05 17:17:53 EST redacted.com 10.3.0.3(39556) ad...@redacted.com HINT:  In a moment you should be able to reconnect to the database and repeat your command.

After the barrage of those messages, there was this:

2011-08-05 17:17:54 EST LOG:  all server processes terminated; reinitializing
2011-08-05 17:17:55 EST LOG:  database system was interrupted; last known up at 2011-08-05 17:15:33 EST
2011-08-05 17:17:55 EST LOG:  database system was not properly shut down; automatic recovery in progress
2011-08-05 17:17:55 EST LOG:  redo starts at 208/5013A758
2011-08-05 17:17:55 EST LOG:  record with zero length at 208/51497498
2011-08-05 17:17:55 EST LOG:  redo done at 208/51497468
2011-08-05 17:17:55 EST LOG:  last completed transaction was at log time 2011-08-05 17:17:52.709539+10
2011-08-05 17:18:03 EST LOG:  autovacuum launcher started
2011-08-05 17:18:03 EST LOG:  database system is ready to accept connections


(The first set of messages appeared once for each of the other backend processes.)

I'm a bit worried about corruption and would like to know:

- Is postgres 8.3.12 susceptible to corruption when a backend process is -9'd?

- How do we confirm that there has been no corruption?

We have nightly backups that dump every database in the cluster, and looking 
over postgres' logs I can't see any errors that might point to corruption... I 
guess that's a good sign - is there anything else I can look into?
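
The log check described above could be made repeatable with something like the
following. This is only a rough Python sketch: the list of message fragments is
an illustrative assumption, not an exhaustive or authoritative set of
corruption indicators, and the function names are made up for this example. A
clean result does not prove the data is intact; it only shows that recovery
finished and that none of the listed messages were logged.

```python
import re

# Illustrative, non-exhaustive message fragments (an assumption of this sketch).
SUSPECT_PATTERNS = [
    r"\bPANIC:",
    r"invalid page header",
    r"could not read block",
    r"missing chunk number",
]
SUSPECT_RE = re.compile("|".join(SUSPECT_PATTERNS))


def suspicious_lines(log_lines):
    """Return (line_number, text) pairs for lines matching any suspect pattern."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if SUSPECT_RE.search(line)
    ]


def recovery_completed(log_lines):
    """True if 'ready to accept connections' is logged after recovery started."""
    started = False
    for line in log_lines:
        if "automatic recovery in progress" in line:
            started = True
        elif started and "database system is ready to accept connections" in line:
            return True
    return False
```

It would be fed the lines of a server log file, for example
open("postgresql.log").readlines(), where the file name is hypothetical and
depends on the logging configuration.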

Thanks very much,

--Royce


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Effect of a kill -9 on postgres

2011-08-07 Thread Craig Ringer
The only time kill -9 should be a data corruption issue is if you kill the
postMASTER (not just a backend) then remove the postmaster.pid file from the
datadir and relaunch the postmaster without first making sure the worker
backends are all shut down.

You need to load the shotgun, aim it carefully at your foot, take the safety
off and pull the trigger. It's not easy.
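
To make the precaution concrete, here is a rough Python sketch of the checks
one might run before even thinking about removing postmaster.pid. It assumes a
Unix-like system, that the first line of postmaster.pid holds the postmaster's
PID, and that server processes show up in ps output with a "postgres:" or
"postmaster" title. The function names are invented for this example, and the
process-title match is a heuristic that can both miss processes and match
unrelated ones.

```python
import os


def read_postmaster_pid(pidfile_text):
    """Return the PID from the first line of postmaster.pid contents, or None."""
    lines = pidfile_text.splitlines()
    first = lines[0].strip() if lines else ""
    return int(first) if first.isdigit() else None


def pid_is_alive(pid):
    """True if a process with this PID exists (signal 0 delivers nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def leftover_backends(ps_lines):
    """Pick out ps output lines that look like postgres server processes."""
    return [
        line for line in ps_lines
        if "postgres:" in line or "postmaster" in line
    ]
```

If pid_is_alive() returns True for the recorded PID, or leftover_backends()
returns anything for the current ps output, the old server has not fully gone
away and the pid file should be left alone.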

A kill -9 shouldn't even cause problems if you're running on unsafe
write-cached storage or (afaik) with fsync=off. Though for other reasons you
should never be doing either without streaming replication, good backups,
and a willingness to lose some data.
