Hi list! depressed me gets error messages like these:
2013-07-29 20:57:09 UTC <xaxos_mailer%xaxos_de> ERROR: could not access status of transaction 8393477 2013-07-29 20:57:09 UTC <xaxos_mailer%xaxos_de> DETAIL: Could not open file "pg_clog/0008": No such file or directory. combined with the error output of queries that do not work. I looked in pg_clog and correct, 0008 is missing. On this linux machine on (3.2.0-4-amd64 #1 SMP Debian 3.2.46-1 x86_64 GNU/Linux) I am using xfs on raid1 on a megacli raid controller with 16 disks, no battery, this is why write through is enabled, no cacheing. My feeling is, that 'something' got confused with hot-standby and wal_archiving leading to this situation, that seems to be partially xfs caused?????? xfs_repair mentionned some missing files, that pg did not expect to have (maybe truncated tables??). I quite extensively created indices in transactions and removed those within these transactions to do fast deletes (foreign key constraints) before i got the error??? * tried to get one of the warm standby's up but one complains about not being the same pg cluster as the 'wal files'. the other hot standby won't start for some locale reason. (it's not that I did not have backups ;) ). the cluster is 'working', i get the error around 1/sec but the other clients seem fine, so it's really only a few tables that are corrupted. I cannot really take down the machine as it's quite a busy few million queries a day cluster. before the current error, i got some error that XXXXX.1 was missing which was (luckily) an index file that i could recreate via 'reindex', but i fear we're now at a table / transaction corruption which i cannot just 'rewrite'. I would not at all mind just discarding all those transactions that have accumulated in pg_clog postgres@pgmaster:~/9.1/main/pg_clog$ ls -alrt | wc -l 180 quite desperate... postgres@[local]:5432 [postgres] # select version(); version ---------------------------------------------------------------------------------------------- PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 64-bit (1 row) Customized options: #------------------------------------------------------------------------------ # CUSTOMIZED OPTIONS #------------------------------------------------------------------------------ #custom_variable_classes = '' # list of custom variable class names listen_addresses = '*' # what IP address(es) to listen on; max_connections = 320 # (change requires restart) timezone = 'Etc/UTC' shared_buffers = 2GB # min 128kB maintenance_work_mem = 250MB checkpoint_completion_target = 0.9 effective_cache_size = 20GB effective_io_concurrency = 6 # 1-1000. 0 disables prefetching archive_mode = on wal_level = 'hot_standby' # http://www.postgresql.org/docs/9.1/static/runtime-config-wal.html#GUC-WAL-LEVEL archive_command = '/opt/postgres_archive_command.pl --file_path=%p --file_name=%f --work_dir=/var/tmp/ --destination_hosts= va-pg-back...@dx.ipv6.ex.net --destination_sftp_hosts=u671@ipv6.u71.y--destination_hosts= va-pg-back...@y7.ipv6.ex.net' max_wal_senders = 3 # max number of walsender processes wal_keep_segments = 50 # in logfile segments, 16MB each; 0 disables thx in advance, klaus