Anyone, I would not describe my situation as desprate (yet), but it is painful, and I am looking for some information and a degree of confidence. I have yet to communicate w/ our sysadmin about backup/restore.
We are using version 8.0 beta 3, on Red Hat Linux, to develop a product that we plan to released in several months. The problem began when I tried to make some table modifications. A CREATE statement hung. I canceled. I then noticed many backend processes hung "idle in transaction". I had everyone exit their database activities, but several hung transactions still remained. I then did a "pg_ctl stop" followed by a "pg_ctl start". Here is the log file activity: (I got lots of these. Probably people exiting their clients) LOG: could not receive data from client: Connection reset by peer LOG: unexpected EOF on client connection LOG: could not receive data from client: Connection reset by peer LOG: unexpected EOF on client connection (Then this; probably after "pg_ctl stop") LOG: received fast shutdown request LOG: aborting any active transactions FATAL: terminating connection due to administrator command FATAL: terminating connection due to administrator command FATAL: terminating connection due to administrator command FATAL: terminating connection due to administrator command FATAL: terminating connection due to administrator command FATAL: terminating connection due to administrator command LOG: shutting down LOG: database system is shut down (Now this after "pg_ctl start") LOG: could not create IPv6 socket: Address family not supported by protocol LOG: database system was shut down at 2004-12-21 16:30:25 EST LOG: checkpoint record is at 0/278B3430 LOG: redo record is at 0/278B3430; undo record is at 0/0; shutdown TRUE LOG: next transaction ID: 884736; next OID: 306834 PANIC: could not access status of transaction 884736 DETAIL: could not read from file "/usr/local/pgsql8b3/data/pg_clog/0000" at offset 221184: Success LOG: startup process (PID 17774) was terminated by signal 6 LOG: aborting startup due to startup process failure It turns out that "/usr/local/pgsql8b3/data/pg_clog/0000" exists, is read/writable by postgres, and is exactly 221184 bytes long. So I started pursuing recovery documents. Came across pg_resetxlog. First I tried without the force option. It complained that it would not work. Then, using the -f option; that changed nothing. Then I began using the -f, -x and -l options: as follows: First some useful information for you: $ ls -l /usr/local/pgsql8b3/data/pg_clog/ -rw------- 1 postgres postgres 221184 Dec 21 16:12 0000 and $ls -l /usr/local/pgsql8b3/data/pg_xlog -rw------- 1 postgres postgres 16777216 Dec 22 06:16 000000010000000100000068 drwx------ 2 postgres postgres 4096 Sep 28 04:57 archive_status Like the doc describes, I tried: $ postgres pg_resetxlog -f -x 0x100000 -l 0x1,0x1,0x65 /usr/local/pgsql8b3/data (Now I get this in the log) LOG: could not create IPv6 socket: Address family not supported by protocol LOG: database system was shut down at 2004-12-21 20:11:12 EST LOG: checkpoint record is at 1/6400001C LOG: redo record is at 1/6400001C; undo record is at 1/6400001C; shutdown TRUE LOG: next transaction ID: 1048576; next OID: 306834 PANIC: could not access status of transaction 1048576 DETAIL: could not open file "/usr/local/pgsql8b3/data/pg_clog/0001": No such file or directory LOG: startup process (PID 304) was terminated by signal 6 LOG: aborting startup due to startup process failure Which is correct, there is no file: "/usr/local/pgsql8b3/data/pg_clog/0001". Also, this is the output of pg_controldata after running pg_resetxlog: pg_control version number: 74 Catalog version number: 200408031 Database system identifier: 4708838347263258567 Database cluster state: in production pg_control last modified: Wed 22 Dec 2004 06:17:55 AM EST Current log file ID: 1 Next log file segment: 105 Latest checkpoint location: 1/6800001C Prior checkpoint location: 0/0 Latest checkpoint's REDO location: 1/6800001C Latest checkpoint's UNDO location: 1/6800001C Latest checkpoint's TimeLineID: 1 Latest checkpoint's NextXID: 1048576 Latest checkpoint's NextOID: 306834 Time of latest checkpoint: Wed 22 Dec 2004 06:16:27 AM EST Database block size: 8192 Blocks per segment of large relation: 131072 Bytes per WAL segment: 16777216 Maximum length of identifiers: 64 Maximum number of function arguments: 32 Date/time type storage: floating-point numbers Maximum length of locale name: 128 LC_COLLATE: en_US.UTF-8 LC_CTYPE: en_US.UTF-8 Can someone help me make sense of all this? Thanks, David ---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives? http://archives.postgresql.org