Good afternoon, everyone. This morning we hit "too many clients" on the database. I was not at the office, and the network admin went in and killed a bunch of postgres connections that he thought were old...
The database became inaccessible, so he restarted it, and it did not come back up. He had to delete the PID file by hand, and only then did the database start... After that, when I arrived, I noticed the database was crashing and restarting on its own, showing these messages:

    2013-09-26 12:09:25 BRT [18539]: [1-1] db=,user= LOG: server process (PID 23040) was terminated by signal 6
    2013-09-26 12:09:25 BRT [18539]: [2-1] db=,user= LOG: terminating any other active server processes
    10.11.0.2 2013-09-26 12:09:25 BRT [23043]: [3-1] db=cimed,user=postgres WARNING: terminating connection because of crash of another server process
    10.11.0.2 2013-09-26 12:09:25 BRT [23043]: [4-1] db=cimed,user=postgres DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
    10.11.0.2 2013-09-26 12:09:25 BRT [23043]: [5-1] db=cimed,user=postgres HINT: In a moment you should be able to reconnect to the database and repeat your command.

When it came back up, this was the log:

    2013-09-26 12:09:25 BRT [18539]: [3-1] db=,user= LOG: all server processes terminated; reinitializing
    2013-09-26 12:09:26 BRT [23047]: [1-1] db=,user= LOG: database system was interrupted at 2013-09-26 12:04:32 BRT
    2013-09-26 12:09:26 BRT [23047]: [2-1] db=,user= LOG: checkpoint record is at 160/370DEA58
    2013-09-26 12:09:26 BRT [23047]: [3-1] db=,user= LOG: redo record is at 160/370D18C8; undo record is at 0/0; shutdown FALSE
    2013-09-26 12:09:26 BRT [23047]: [4-1] db=,user= LOG: next transaction ID: 499844432; next OID: 572978777
    2013-09-26 12:09:26 BRT [23047]: [5-1] db=,user= LOG: next MultiXactId: 15762; next MultiXactOffset: 37493
    2013-09-26 12:09:26 BRT [23047]: [6-1] db=,user= LOG: database system was not properly shut down; automatic recovery in progress
    2013-09-26 12:09:26 BRT [23047]: [7-1] db=,user= LOG: redo starts at 160/370D18C8
    2013-09-26 12:09:26 BRT [23047]: [8-1] db=,user= LOG: record with zero length at 160/3768AD90
    2013-09-26 12:09:26 BRT [23047]: [9-1] db=,user= LOG: redo done at 160/3768AD60
    2013-09-26 12:09:33 BRT [23047]: [10-1] db=,user= LOG: database system is ready
    2013-09-26 12:09:33 BRT [23047]: [11-1] db=,user= LOG: transaction ID wrap limit is 1073777089, limited by database "cimed"

Each crash was always preceded by this message (note that it occurred many times):

    10.11.0.2 2013-09-26 12:09:24 BRT [23040]: [3-1] db=cimed,user=postgres PANIC: right sibling's left-link doesn't match
    10.11.0.2 2013-09-26 12:25:35 BRT [23843]: [1-1] db=cimed,user=postgres PANIC: right sibling's left-link doesn't match
    10.11.0.2 2013-09-26 12:29:26 BRT [24116]: [1-1] db=cimed,user=postgres PANIC: right sibling's left-link doesn't match
    10.11.0.2 2013-09-26 12:44:34 BRT [25066]: [1-1] db=cimed,user=postgres PANIC: right sibling's left-link doesn't match
    10.11.0.2 2013-09-26 13:34:21 BRT [28222]: [1-1] db=cimed,user=postgres PANIC: right sibling's left-link doesn't match
    10.11.0.2 2013-09-26 13:47:51 BRT [29590]: [1-1] db=cimed,user=postgres PANIC: right sibling's left-link doesn't match
    10.11.0.2 2013-09-26 14:03:48 BRT [30643]: [1-1] db=cimed,user=postgres PANIC: right sibling's left-link doesn't match
    10.11.0.2 2013-09-26 14:26:33 BRT [31689]: [206-1] db=cimed,user=postgres PANIC: right sibling's left-link doesn't match
    10.11.0.2 2013-09-26 14:30:27 BRT [31902]: [9-1] db=cimed,user=postgres PANIC: right sibling's left-link doesn't match
    10.11.0.2 2013-09-26 14:50:00 BRT [924]: [127-1] db=cimed,user=postgres PANIC: right sibling's left-link doesn't match
    10.11.0.2 2013-09-26 14:55:26 BRT [1985]: [5-1] db=cimed,user=postgres PANIC: right sibling's left-link doesn't match
    10.11.0.2 2013-09-26 15:05:46 BRT [3063]: [15-1] db=cimed,user=postgres PANIC: right sibling's left-link doesn't match

There were also the transaction rollback notices:

    10.11.0.2 2013-09-26 17:25:11 BRT [11831]: [4-1] db=nutracom,user=visao DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
    10.35.0.2 2013-09-26 17:25:11 BRT [11982]: [2-1] db=cimed,user=postgres DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
    10.11.0.2 2013-09-26 17:25:11 BRT [13011]: [2-1] db=cimed,user=postgres DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

Searching around, I saw that this could be index corruption... I shut the database down, restricted user access, and ran a reindex of all tables in batch, using a script. During that process I hit the same problem twice, when the reindex reached a particular table. Instead of running the script as a batch, I then went table by table, and it got past the point where it had been failing. The reindex of all tables finished, and I brought the database back up...
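For illustration only (this is not the script that was actually run), the table-by-table approach could be sketched as below: it emits one REINDEX TABLE statement per table, so that a failure points at a single table instead of aborting the whole batch. The table names and the helpers `quote_ident` and `reindex_statements` are hypothetical; the printed statements are assumed to be run one at a time through psql.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def reindex_statements(tables):
    """Return one REINDEX TABLE statement per (schema, table) pair."""
    return [
        f"REINDEX TABLE {quote_ident(schema)}.{quote_ident(table)};"
        for schema, table in tables
    ]


if __name__ == "__main__":
    # Hypothetical table names, for illustration only.
    example_tables = [("public", "pedidos"), ("public", "clientes")]
    for statement in reindex_statements(example_tables):
        print(statement)
```

Running each statement separately makes it easier to see exactly which table triggers the PANIC, at the cost of more round trips than a single batch.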
Two hours later, the same thing happened again as access increased:

    10.11.0.2 2013-09-26 17:10:37 BRT [11516]: [1-1] db=cimed,user=postgres PANIC: right sibling's left-link doesn't match
    10.11.0.2 2013-09-26 17:25:10 BRT [13163]: [1-1] db=cimed,user=postgres PANIC: right sibling's left-link doesn't match

This message in particular worried me, and I have no idea what it could be:

    10.35.0.2 2013-09-26 18:01:28 BRT [16124]: [1-1] db=nutracom,user=postgres WARNING: could not remove relation 1663/105809227/572937579: Arquivo ou diretório não encontrado
    10.35.0.2 2013-09-26 18:01:28 BRT [16124]: [2-1] db=nutracom,user=postgres WARNING: could not remove relation 1663/105809227/572937581: Arquivo ou diretório não encontrado
    10.35.0.2 2013-09-26 18:01:28 BRT [16124]: [3-1] db=nutracom,user=postgres WARNING: could not remove relation 1663/105809227/572937583: Arquivo ou diretório não encontrado
    10.35.0.2 2013-09-26 18:01:28 BRT [16124]: [4-1] db=nutracom,user=postgres WARNING: could not remove relation 1663/105809227/572937585: Arquivo ou diretório não encontrado
    10.35.0.2 2013-09-26 18:01:28 BRT [16124]: [5-1] db=nutracom,user=postgres WARNING: could not remove relation 1663/105809227/572937586: Arquivo ou diretório não encontrado
    10.35.0.2 2013-09-26 18:01:28 BRT [16124]: [6-1] db=nutracom,user=postgres WARNING: could not remove relation 1663/105809227/572937588: Arquivo ou diretório não encontrado
    10.35.0.2 2013-09-26 18:01:28 BRT [16124]: [7-1] db=nutracom,user=postgres WARNING: could not remove relation 1663/105809227/572937590: Arquivo ou diretório não encontrado
    10.35.0.2 2013-09-26 18:01:28 BRT [16124]: [8-1] db=nutracom,user=postgres WARNING: could not remove relation 1663/105809227/572937597: Arquivo ou diretório não encontrado

I am not sure how to proceed: dump/restore of the database, drop/create of the indexes, or some other approach.

Linux server, PostgreSQL 8.1.18. This is the production server, which is stable, with disk space and memory to spare.
I have a single postgres instance with several databases. Could you help me? Awaiting your reply,