... pg_control last modified: Tue Dec 14 15:39:26 2004 ... Time of latest checkpoint: Tue Nov 2 17:05:32 2004
[ blink... ] That seems like an unreasonable gap between checkpoints, especially for a production server. Can you see an explanation?
Hmmm, this is even more scary. We have two database clusters on this server, one on /replica/pgdata, and one on /production/pgdata (ignore the names -- /replica is actually the "production" instance at the moment).
# pg_controldata /replica/pgdata
pg_control version number: 72
Catalog version number: 200310211
Database cluster state: shutting down
pg_control last modified: Tue Dec 14 15:39:26 2004
Current log file ID: 0
Next log file segment: 1
Latest checkpoint location: 0/9B0B8C
Prior checkpoint location: 0/9AA1B4
Latest checkpoint's REDO location: 0/9B0B8C
Latest checkpoint's UNDO location: 0/0
Latest checkpoint's StartUpID: 12
Latest checkpoint's NextXID: 536
Latest checkpoint's NextOID: 17142
Time of latest checkpoint: Tue Nov 2 17:05:32 2004
Database block size: 8192
Blocks per segment of large relation: 131072
Maximum length of identifiers: 64
Maximum number of function arguments: 32
Date/time type storage: 64-bit integers
Maximum length of locale name: 128
LC_COLLATE: C
LC_CTYPE: C

# pg_controldata /production/pgdata
pg_control version number: 72
Catalog version number: 200310211
Database cluster state: shutting down
pg_control last modified: Tue Nov 2 21:57:49 2004
Current log file ID: 0
Next log file segment: 1
Latest checkpoint location: 0/9B0B8C
Prior checkpoint location: 0/9AA1B4
Latest checkpoint's REDO location: 0/9B0B8C
Latest checkpoint's UNDO location: 0/0
Latest checkpoint's StartUpID: 12
Latest checkpoint's NextXID: 536
Latest checkpoint's NextOID: 17142
Time of latest checkpoint: Tue Nov 2 17:05:32 2004
Database block size: 8192
Blocks per segment of large relation: 131072
Maximum length of identifiers: 64
Maximum number of function arguments: 32
Date/time type storage: 64-bit integers
Maximum length of locale name: 128
LC_COLLATE: C
LC_CTYPE: C

I have no idea how this happened, but the two outputs are identical except for the "last modified" date. The space used is about what I'd expect:
# du -h --max-depth=1 /replica
403G /replica/pgdata
# du -h --max-depth=1 /production
201G /production/pgdata
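For anyone who wants to repeat the comparison without eyeballing it, here is a rough Python sketch that parses two pg_controldata outputs and reports only the fields that differ. The helper names (parse_controldata, differing_fields, read_controldata) are made up for illustration, and the sample strings are just a few fields abbreviated from the output above; it assumes the usual one "label: value" pair per line.

```python
import subprocess

def parse_controldata(text):
    """Parse pg_controldata output (one 'label: value' per line) into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only; timestamp values contain more colons.
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip()] = value.strip()
    return fields

def differing_fields(a, b):
    """Return {label: (value_in_a, value_in_b)} for every field that differs."""
    labels = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in labels if a.get(k) != b.get(k)}

def read_controldata(datadir):
    """Hypothetical helper: run pg_controldata against a data directory."""
    result = subprocess.run(["pg_controldata", datadir],
                            capture_output=True, text=True, check=True)
    return parse_controldata(result.stdout)

# A few fields abbreviated from the two outputs in this message.
REPLICA_SAMPLE = """\
pg_control last modified: Tue Dec 14 15:39:26 2004
Latest checkpoint location: 0/9B0B8C
Latest checkpoint's NextXID: 536
Time of latest checkpoint: Tue Nov 2 17:05:32 2004
"""

PRODUCTION_SAMPLE = """\
pg_control last modified: Tue Nov 2 21:57:49 2004
Latest checkpoint location: 0/9B0B8C
Latest checkpoint's NextXID: 536
Time of latest checkpoint: Tue Nov 2 17:05:32 2004
"""

if __name__ == "__main__":
    diff = differing_fields(parse_controldata(REPLICA_SAMPLE),
                            parse_controldata(PRODUCTION_SAMPLE))
    for label, (left, right) in diff.items():
        print(f"{label}: {left!r} vs {right!r}")
```

On the sample fields, the only label reported is "pg_control last modified". On the live machine, something like differing_fields(read_controldata("/replica/pgdata"), read_controldata("/production/pgdata")) should do the same against the real clusters, assuming pg_controldata is on the PATH.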
The "/production/pgdata" cluster has not been in use since Nov 2. But we've been loading data aggressively into "/replica/pgdata".
Any theories on how we screwed up?
Joe