I'm running pgBadger over the log, and will get some queries to explain analyze them.
The 9.2 cluster is running in the same server as the production, so I will try to compare some critical explains and publish on explain.depesz.com. 2013/4/21 Adrian Klaver <adrian.kla...@gmail.com> > On 04/21/2013 09:29 AM, Daniel Cristian Cruz wrote: > >> >> >> >> 2013/4/21 Adrian Klaver <adrian.kla...@gmail.com >> <mailto:adrian.klaver@gmail.**com <adrian.kla...@gmail.com>>> >> > > > >> 1) >> Major upgrade from 9.1.4 to 9.2.4. >> Used pg_upgrade >> Tested on VM with 9.2.4 and no problems. >> Same machine used for production server 9.1.4 and 9.2.4 >> When complex queries where run on production server under 9.2.4 >> memory usage climbed out of control. >> >> Unanswered questions: >> >> a) Data set sizes between test and production machines, how do they >> differ? >> >> >> It's the same on both; we do a dump/restore every day to the development >> / issue team work. >> > > Which begs the question, what is different about your test setup that > makes it not act up? > test environment: virtual server 8 cores 12 GB RAM 4GB SWAP max_connections = 200 shared_buffers = 800MB temp_buffers = 32MB work_mem = 256MB maintenance_work_mem = 768MB (no max_stack_depth set) shared_preload_libraries = '$libdir/plpgsql,$libdir/plpython2,$libdir/pgxml,$libdir/pg_stat_statements' wal_level = minimal checkpoint_segments = 15 archive_mode = off max_wal_senders = 0 (no effective_cache_size set) constraint_exclusion = partition log_min_duration_statement = 1000 (no log_temp_files set) (no statement_timeout set) max_locks_per_transaction = 1024 production: true server 24 cores 96GB RAM 50GB SWAP max_connections = 1000 shared_buffers = 6GB temp_buffers = 24MB work_mem = 24MB maintenance_work_mem = 128MB max_stack_depth = 8MB (no shared_preload_libraries, disabled after problems because pg_stat_statements was a new module) wal_level = hot_standby checkpoint_segments = 20 archive_mode = on (archive_command set) max_wal_senders = 1 effective_cache_size = 32GB (no constraint_exclusion set) log_min_duration_statement = 5000 log_temp_files = 0 statement_timeout = 300000 (no max_locks_per_transaction set) > We know that the test servers are running on VMs with fewer resources than > the production server. > > So: > > Are the VMs running the same OS and OS version as the production server? > test: Red Hat Enterprise Linux Server release 5.5, Linux 2.6.18-194.26.1.el5 #1 SMP Fri Oct 29 14:21:16 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux production: CentOS release 5.5, Linux 2.6.18-194.32.1.el5 #1 SMP Wed Jan 5 17:52:25 EST 2011 x86_64 x86_64 x86_64 GNU/Linux > > What are 'hardware differences' between the test VMs and the physical > server? > above. > > Are the Postgres configurations different for the test vs production > servers? > Yes, some of them, shown above. > I would guess the usage pattern is different, but in what way? > Number of connections/sessions? > 300 connections in production, 50 in test. > INSERT/UPDATE/DELETE pattern? > test: just test cases, development cases and issue cases production: for 1 minute (00:01:00.076343), 583 inserts, 306 updates and 13 deletes and 3300 transactions (xacts_commits) > Client software using the database? > > Sites using PHP with and without connection pool and with and without Doctrine; a huge system with Java and Hibernate using the JBoss pooler, Java aplication is the main user. > > > >> >> b) What are the EXPLAIN/ANALYZE results for a query on 9.1.4, 9.2.4 >> test and 9.2.4 production? >> Since there is no single query causing the problem, I don't know if it >> could help >> > > For a lack of anything else pick one and try it on the various servers to > see if something stands out. Yes , here it is: query1: EXPLAIN ANALYZE SELECT ced.id_evento, ced.inicio, ced.termino, ced.evento, ced.id_eventos IS NOT NULL AS aula_dividida, ac.titulo, ced.id_tipo_evento, ced.tipo_evento, ac.media_referencia, p.nome, ef.nome AS nomeEspacoFisico, ( SELECT count ( pre2.presente ) > 0 FROM turma.presenca pre2 WHERE pre2.id_aula = ac.id_aula AND pre2.id_evento = ac.id_evento AND pre2.id_diario = '64469' ) AS presenca, ced.id_aula FROM recurso.consulta_evento_diario ced LEFT JOIN recurso.evento e USING ( id_evento ) LEFT JOIN recurso.espaco_fisico ef USING ( id_espaco_fisico ) LEFT JOIN turma.aula_calendario ac USING ( id_aula, id_evento ) LEFT JOIN recurso.evento_participante ep USING ( id_evento ) LEFT JOIN senai.pessoa p USING ( id_pessoa ) WHERE id_diario = '64469' AND ced.id_evento NOT IN ( SELECT ec.id_evento_sobreposto FROM recurso.evento_conflito ec WHERE ec.id_evento_sobreposto = ced.id_evento AND ec.ignorado IS NULL ) AND ced.inicio BETWEEN '2013-04-14 00:00:00' AND '2013-04-20 23:59:59.999999' ORDER BY inicio; server 9.1: http://explain.depesz.com/s/fmM server 9.2: http://explain.depesz.com/s/wXm After run this one on server 9.2, RES memory reached 6.5GB, VIRT 15GB. query 2: EXPLAIN ANALYZE SELECT count ( consulta_diario.id_diario ) FROM turma.consulta_diario LEFT JOIN turma.turma_cancelamento ON consulta_diario.id_turma = turma_cancelamento.id_turma WHERE turma_cancelamento.id_turma IS NULL AND consulta_diario.id_unidade_curricular_tipo IN ( 1, 6, 7, 8 ) AND ( 8365 = ANY ( consulta_diario.id_colaborador_coordenadores ) OR 2252 = ANY ( consulta_diario.id_docentes ) ) AND consulta_diario.id_unidade_execucao IN ( 33, 33, 46, 46, 53, 53 ) AND consulta_diario.situacao_diario LIKE 'Em Andamento' LIMIT '2'; server 9.1: http://explain.depesz.com/s/qfC server 9.2: http://explain.depesz.com/s/mh2 No change in RES memory after this one (stayed at 6.5GB). query 3: EXPLAIN ANALYZE WITH justificativas AS ( SELECT justificativa_falta_aula.id_matricula, justificativa_falta_aula.id_diario, justificativa_falta_aula.id_aula, justificativa_falta_aula.id_evento FROM turma.presenca JOIN recurso.evento ON evento.id_evento = presenca.id_evento LEFT JOIN matricula.justificativa_falta_aula JOIN matricula.justificativa_falta ON justificativa_falta.id_justificativa_falta = justificativa_falta_aula.id_justificativa_falta JOIN matricula.justificativa_falta_tipo ON justificativa_falta_tipo.id_justificativa_falta_tipo = justificativa_falta.id_justificativa_falta_tipo LEFT JOIN matricula.parecer ON parecer.id_parecer = justificativa_falta.id_parecer ON justificativa_falta_tipo.tipo = 'Abono' AND justificativa_falta_aula.id_matricula = presenca.id_matricula AND justificativa_falta_aula.id_diario = presenca.id_diario AND justificativa_falta_aula.id_aula = presenca.id_aula AND justificativa_falta_aula.id_evento = presenca.id_evento AND ( NOT justificativa_falta.encaminhar OR parecer.aceito ) WHERE justificativa_falta_aula.id_matricula = 147124 ) SELECT id_diario, to_char ( ( contagem.carga_uc - COALESCE ( contagem.carga_ausencias, 0 ) ) / contagem.carga_uc * 100, '990D99' ) AS frequencia FROM ( SELECT estudante.id_matricula, estudante.id_diario, extract ( EPOCH FROM SUM ( evento.termino - evento.inicio ) ) AS carga_uc, extract ( EPOCH FROM SUM ( CASE WHEN aula_confirmacao.confirmada AND evento.inicio <= CURRENT_DATE AND NOT presenca.presente AND justificativas.id_evento IS NULL THEN evento.termino - evento.inicio END ) ) AS carga_ausencias FROM turma.estudante JOIN turma.presenca ON presenca.id_diario = estudante.id_diario AND presenca.id_matricula = estudante.id_matricula JOIN recurso.evento ON evento.id_evento = presenca.id_evento LEFT JOIN turma.aula_confirmacao ON aula_confirmacao.id_evento = presenca.id_evento AND aula_confirmacao.id_aula = presenca.id_aula LEFT JOIN matricula.matricula_cancelamento ON matricula_cancelamento.id_matricula = estudante.id_matricula LEFT JOIN justificativas ON justificativas.id_matricula = presenca.id_matricula AND justificativas.id_diario = presenca.id_diario AND justificativas.id_aula = presenca.id_aula AND justificativas.id_evento = presenca.id_evento WHERE matricula_cancelamento.id_matricula IS NULL AND estudante.id_matricula = 147124 GROUP BY estudante.id_matricula, estudante.id_diario ) AS contagem; server 9.1: http://explain.depesz.com/s/jD4 server 9.2: http://explain.depesz.com/s/hV9 100MB more in RES memory after this one. Thanks, -- Daniel Cristian Cruz クルズ クリスチアン ダニエル