Srs. Estou passando por um problema crítico e preciso de ajuda. Meu servidor de banco usa PostgreSQL 8.4 e contém vários banco de dados, pelo menos 50. Rodamos Processamentos de dados que agem em cima dos banco de forma aleatória e com simultaneidade variável. Os processos incluem deleções, inserts e updates em massa, tanto que como exemplo temos uma base de dados com 30GB e depois de um dia de processamentos (entre 3 e 7) a base chega a subir para 40GB a 45GB, quando rodamos a manutenção ela mantém a média do tamanho anterior. Bem depois dessa explanação, o servidor constantemente vem entrando em CRASH, hoje está ocorrendo com uma média de 1 a cada 2 semanas, segue uma parte do LOG:
2011-03-11 09:02:46.112 BRT,,,3130,,4d4b64f5.c3a,10,,2011-02-03 23:31:17 BRT,,0,LOG,00000,"server process (PID 7308) was terminated by signal 11: Segmentation fault",,,,,,,, 2011-03-11 09:02:46.112 BRT,,,3130,,4d4b64f5.c3a,11,,2011-02-03 23:31:17 BRT,,0,LOG,00000,"terminating any other active server processes",,,,,,,, 2011-03-11 09:02:46.112 BRT,"postgres","databases",7312,"192.168.5.40:44551",4d7a0f5f.1c90,1,"idle",2011-03-11 09:02:39 BRT,12/0,0,WARNING,57P02,"terminating connection because of crash of another server process","The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.","In a moment you should be able to reconnect to the database and repeat your command.",,,,,, ... 2011-03-11 09:02:46.170 BRT,"postgres","databases",7343,"192.168.5.40:44604",4d7a0f66.1caf,1,"/usr/bin/postmaster",2011-03-11 09:02:46 BRT,,0,FATAL,57P03,"the database system is in recovery mode",,,,,,,, ... 2011-03-11 09:03:20.588 BRT,,,7353,,4d7a0f66.1cb9,4,,2011-03-11 09:02:46 BRT,,0,LOG,00000,"record with zero length at 1C40/DF5576E0",,,,,,,, 2011-03-11 09:03:20.588 BRT,,,7353,,4d7a0f66.1cb9,5,,2011-03-11 09:02:46 BRT,,0,LOG,00000,"redo done at 1C40/DF5576B0",,,,,,,, 2011-03-11 09:03:20.589 BRT,,,7353,,4d7a0f66.1cb9,6,,2011-03-11 09:02:46 BRT,,0,LOG,00000,"last completed transaction was at log time 2011-03-11 09:02:46.100526-03",,,,,,,, 2011-03-11 09:03:20.589 BRT,,,7353,,4d7a0f66.1cb9,7,,2011-03-11 09:02:46 BRT,,0,LOG,00000,"checkpoint starting: end-of-recovery immediate",,,,,,,, ... 2011-03-11 09:03:25.803 BRT,,,7353,,4d7a0f66.1cb9,8,,2011-03-11 09:02:46 BRT,,0,LOG,00000,"checkpoint complete: wrote 43730 buffers (12.1%); 0 transaction log file(s) added, 0 removed, 77 recycled; write=0.246 s, sync=4.964 s, total=5.214 s",,,,,,,, ... 2011-03-11 09:03:28.801 BRT,,,3130,,4d4b64f5.c3a,13,,2011-02-03 23:31:17 BRT,,0,LOG,00000,"database system is ready to accept connections",,,,,,,, -- Ele demora uma média de 1 a 2 minutos para realizar o recover e os processamentos dos bancos que estavam rodando no exato momento tem que ser re-processados. Nós temos uma demanda de escrita muito forte, principalmente no LOG, isso é prejudicial para nosso ambiente? Segue Abaixo dados do Servidor. Processador:/Memoria processor : 15 vendor_id : GenuineIntel model name : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz cpu MHz : 2266.807 cache size : 8192 KB Memory: 12GB Config Postgresql.conf - Feito pelo pgtune => oltp, 200 conexões #custom_variable_classes = '' # list of custom variable class names default_statistics_target = 100 # pgtune wizard 2010-03-20 maintenance_work_mem = 1GB # pgtune wizard 2010-03-20 constraint_exclusion = on # pgtune wizard 2010-03-20 checkpoint_completion_target = 0.9 # pgtune wizard 2010-03-20 effective_cache_size = 8GB # pgtune wizard 2010-03-20 work_mem = 30MB # pgtune wizard 2010-03-20 shared_buffers = 2816MB # pgtune wizard 2010-03-20 max_connections = 200 # pgtune wizard 2010-03-20 # by autovacuum = off # by wal_buffers = 64MB # by checkpoint_segments = 128 # by listen_addresses = '*' # by log_destination = 'csvlog' # by log_checkpoints = on # by log_disconnections = on # by log_lock_waits = on # by log_line_prefix = '%d,%r,%u,' # by log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # by # log_duration = on # by # log_min_duration_statement = 100 # #log_temp_files = 0 OBS: Temos outros dois servidores com a mesma configuração que chegou a ocorrer esse problema mas não com a frequencia desse. Aguardo... -- *Atenciosamente, Emanuel Araújo* http://eacshm.wordpress.com/ * * *Linux Certified LPIC-1*
_______________________________________________ pgbr-geral mailing list [email protected] https://listas.postgresql.org.br/cgi-bin/mailman/listinfo/pgbr-geral
