I'm running pgBadger over the log, and will get some queries to explain
analyze them.

The 9.2 cluster is running in the same server as the production, so I will
try to compare some critical explains and publish on explain.depesz.com.

2013/4/21 Adrian Klaver <adrian.kla...@gmail.com>

> On 04/21/2013 09:29 AM, Daniel Cristian Cruz wrote:
>
>>
>>
>>
>> 2013/4/21 Adrian Klaver <adrian.kla...@gmail.com
>> <mailto:adrian.klaver@gmail.**com <adrian.kla...@gmail.com>>>
>>
>
>
>
>>     1)
>>     Major upgrade from 9.1.4 to 9.2.4.
>>     Used pg_upgrade
>>     Tested on VM with 9.2.4 and no problems.
>>     Same machine used for production server 9.1.4 and 9.2.4
>>     When complex queries where run on production server under 9.2.4
>>     memory usage climbed out of control.
>>
>>     Unanswered questions:
>>
>>     a) Data set sizes between test and production machines, how do they
>>     differ?
>>
>>
>> It's the same on both; we do a dump/restore every day to the development
>> / issue team work.
>>
>
> Which begs the question, what is different about your test setup that
> makes it not act up?
>

test environment:

virtual server
8 cores
12 GB RAM
4GB SWAP

max_connections = 200
shared_buffers = 800MB
temp_buffers = 32MB
work_mem = 256MB
maintenance_work_mem = 768MB
(no max_stack_depth set)
shared_preload_libraries =
'$libdir/plpgsql,$libdir/plpython2,$libdir/pgxml,$libdir/pg_stat_statements'
wal_level = minimal
checkpoint_segments = 15
archive_mode = off
max_wal_senders = 0
(no effective_cache_size set)
constraint_exclusion = partition
log_min_duration_statement = 1000
(no log_temp_files set)
(no statement_timeout set)
max_locks_per_transaction = 1024

production:

true server
24 cores
96GB RAM
50GB SWAP

max_connections = 1000
shared_buffers = 6GB
temp_buffers = 24MB
work_mem = 24MB
maintenance_work_mem = 128MB
max_stack_depth = 8MB
(no shared_preload_libraries, disabled after problems because
pg_stat_statements was a new module)
wal_level = hot_standby
checkpoint_segments = 20
archive_mode = on
(archive_command set)
max_wal_senders = 1
effective_cache_size = 32GB
(no constraint_exclusion set)
log_min_duration_statement = 5000
log_temp_files = 0
statement_timeout = 300000
(no max_locks_per_transaction set)


> We know that the test servers are running on VMs with fewer resources than
> the production server.
>
> So:
>
> Are the VMs running the same OS and OS version as the production server?
>

test: Red Hat Enterprise Linux Server release 5.5, Linux
2.6.18-194.26.1.el5 #1 SMP Fri Oct 29 14:21:16 EDT 2010 x86_64 x86_64
x86_64 GNU/Linux
production: CentOS release 5.5, Linux 2.6.18-194.32.1.el5 #1 SMP Wed Jan 5
17:52:25 EST 2011 x86_64 x86_64 x86_64 GNU/Linux


>
> What are 'hardware differences' between the test VMs and the physical
> server?
>

above.


>
> Are the Postgres configurations different for the test vs production
> servers?
>

Yes, some of them, shown above.


> I would guess the usage pattern is different, but in what way?
>    Number of connections/sessions?
>

300 connections in production, 50 in test.


>    INSERT/UPDATE/DELETE pattern?
>

test: just test cases, development cases and issue cases
production: for 1 minute (00:01:00.076343), 583 inserts, 306 updates and 13
deletes and 3300 transactions (xacts_commits)



>    Client software using the database?
>
>
Sites using PHP with and without connection pool and with and without
Doctrine; a huge system with Java and Hibernate using the JBoss pooler,
Java aplication is the main user.


>
>
>
>>
>>     b) What are the EXPLAIN/ANALYZE results for a query on 9.1.4, 9.2.4
>>     test and 9.2.4 production?
>> Since there is no single query causing  the problem, I don't know if it
>> could help
>>
>
> For a lack of anything else pick one and try it on the various servers to
> see if something stands out.


Yes , here it is:

query1:
EXPLAIN ANALYZE SELECT ced.id_evento, ced.inicio, ced.termino, ced.evento,
ced.id_eventos IS NOT NULL AS aula_dividida, ac.titulo, ced.id_tipo_evento,
ced.tipo_evento, ac.media_referencia, p.nome, ef.nome AS nomeEspacoFisico,
( SELECT count ( pre2.presente ) > 0 FROM turma.presenca pre2 WHERE
pre2.id_aula = ac.id_aula AND pre2.id_evento = ac.id_evento AND
pre2.id_diario = '64469' ) AS presenca, ced.id_aula FROM
recurso.consulta_evento_diario ced LEFT JOIN recurso.evento e USING (
id_evento ) LEFT JOIN recurso.espaco_fisico ef USING ( id_espaco_fisico )
LEFT JOIN turma.aula_calendario ac USING ( id_aula, id_evento ) LEFT JOIN
recurso.evento_participante ep USING ( id_evento ) LEFT JOIN senai.pessoa p
USING ( id_pessoa ) WHERE id_diario = '64469' AND ced.id_evento NOT IN (
SELECT ec.id_evento_sobreposto FROM recurso.evento_conflito ec WHERE
ec.id_evento_sobreposto = ced.id_evento AND ec.ignorado IS NULL ) AND
ced.inicio BETWEEN '2013-04-14 00:00:00' AND '2013-04-20 23:59:59.999999'
ORDER BY inicio;

server 9.1:
http://explain.depesz.com/s/fmM

server 9.2:
http://explain.depesz.com/s/wXm

After run this one on server 9.2, RES memory reached 6.5GB, VIRT 15GB.

query 2:
EXPLAIN ANALYZE SELECT count ( consulta_diario.id_diario ) FROM
turma.consulta_diario LEFT JOIN turma.turma_cancelamento ON
consulta_diario.id_turma = turma_cancelamento.id_turma WHERE
turma_cancelamento.id_turma IS NULL AND
consulta_diario.id_unidade_curricular_tipo IN ( 1, 6, 7, 8 ) AND ( 8365 =
ANY ( consulta_diario.id_colaborador_coordenadores ) OR 2252 = ANY (
consulta_diario.id_docentes ) ) AND consulta_diario.id_unidade_execucao IN
( 33, 33, 46, 46, 53, 53 ) AND consulta_diario.situacao_diario LIKE 'Em
Andamento' LIMIT '2';

server 9.1:
http://explain.depesz.com/s/qfC

server 9.2:
http://explain.depesz.com/s/mh2

No change in RES memory after this one (stayed at 6.5GB).

query 3:
EXPLAIN ANALYZE WITH justificativas AS ( SELECT
justificativa_falta_aula.id_matricula, justificativa_falta_aula.id_diario,
justificativa_falta_aula.id_aula, justificativa_falta_aula.id_evento FROM
turma.presenca JOIN recurso.evento ON evento.id_evento = presenca.id_evento
LEFT JOIN matricula.justificativa_falta_aula JOIN
matricula.justificativa_falta ON justificativa_falta.id_justificativa_falta
= justificativa_falta_aula.id_justificativa_falta JOIN
matricula.justificativa_falta_tipo ON
justificativa_falta_tipo.id_justificativa_falta_tipo =
justificativa_falta.id_justificativa_falta_tipo LEFT JOIN matricula.parecer
ON parecer.id_parecer = justificativa_falta.id_parecer ON
justificativa_falta_tipo.tipo = 'Abono' AND
justificativa_falta_aula.id_matricula = presenca.id_matricula AND
justificativa_falta_aula.id_diario = presenca.id_diario AND
justificativa_falta_aula.id_aula = presenca.id_aula AND
justificativa_falta_aula.id_evento = presenca.id_evento AND ( NOT
justificativa_falta.encaminhar OR parecer.aceito ) WHERE
justificativa_falta_aula.id_matricula = 147124 ) SELECT id_diario, to_char
( ( contagem.carga_uc - COALESCE ( contagem.carga_ausencias, 0 ) ) /
contagem.carga_uc * 100, '990D99' ) AS frequencia FROM ( SELECT
estudante.id_matricula, estudante.id_diario, extract ( EPOCH FROM SUM (
evento.termino - evento.inicio ) ) AS carga_uc, extract ( EPOCH FROM SUM (
CASE WHEN aula_confirmacao.confirmada AND evento.inicio <= CURRENT_DATE AND
NOT presenca.presente AND justificativas.id_evento IS NULL THEN
evento.termino - evento.inicio END ) ) AS carga_ausencias FROM
turma.estudante JOIN turma.presenca ON presenca.id_diario =
estudante.id_diario AND presenca.id_matricula = estudante.id_matricula JOIN
recurso.evento ON evento.id_evento = presenca.id_evento LEFT JOIN
turma.aula_confirmacao ON aula_confirmacao.id_evento = presenca.id_evento
AND aula_confirmacao.id_aula = presenca.id_aula LEFT JOIN
matricula.matricula_cancelamento ON matricula_cancelamento.id_matricula =
estudante.id_matricula LEFT JOIN justificativas ON
justificativas.id_matricula = presenca.id_matricula AND
justificativas.id_diario = presenca.id_diario AND justificativas.id_aula =
presenca.id_aula AND justificativas.id_evento = presenca.id_evento WHERE
matricula_cancelamento.id_matricula IS NULL AND estudante.id_matricula =
147124 GROUP BY estudante.id_matricula, estudante.id_diario ) AS contagem;

server 9.1:
http://explain.depesz.com/s/jD4

server 9.2:
http://explain.depesz.com/s/hV9

100MB more in RES memory after this one.

Thanks,
-- 
Daniel Cristian Cruz
クルズ クリスチアン ダニエル

Reply via email to