Re: [GENERAL] Bad plan on a huge table query

Daniel Cristian Cruz Thu, 21 Mar 2013 12:38:52 -0700

Ooops, no query... Now it goes (Jeff, types in each line):

SELECT
 aluno_mensal.id_matricula, --integer
 aluno_mensal.id_turma, --integer
 aluno_mensal.turma, --text
 aluno_mensal.id_produto_educacao, --integer
 aluno_mensal.produto_educacao, --text
 aluno_mensal.unidade, --text
 aluno_mensal.unidade_execucao, --text
 aluno_mensal.modalidade, --text
 aluno_mensal.id_pessoa, --integer
 aluno_mensal.nome_pessoa, --text
 presenca.id_diario, --integer
 aula_confirmacao.inicio::date AS data_aula, --timestamp to date
 presenca.justificativa_falta, --text
 SUM(aula_confirmacao.termino - aula_confirmacao.inicio) AS carga_faltas,
--interval
 mensal.ano AS ano_apuracao, --integer
 mensal.mes AS mes_apuracao --integer
FROM indicadores.aluno_mensal
JOIN indicadores.mensal
 ON mensal.id_mensal = aluno_mensal.id_mensal
JOIN turma.presenca
 ON presenca.id_matricula = aluno_mensal.id_matricula
JOIN turma.aula_confirmacao
 ON aula_confirmacao.id_evento = presenca.id_evento
JOIN turma.estudante_periodo
 ON
  estudante_periodo.id_matricula = presenca.id_matricula AND
  estudante_periodo.id_diario = presenca.id_diario AND
  aula_confirmacao.inicio::date BETWEEN estudante_periodo.inicio AND
estudante_periodo.termino -- timestamp, date, date
WHERE
 presenca.inicio::DATE BETWEEN DATE_TRUNC('YEAR', CURRENT_TIMESTAMP) AND
CURRENT_TIMESTAMP AND -- timestamp
 NOT presenca.presente AND --boolean
 mensal.ano = EXTRACT(YEAR FROM CURRENT_TIMESTAMP) AND -- integer
 aula_confirmacao.inicio::DATE BETWEEN DATE_TRUNC('YEAR',
CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP AND --timestamp to date
 aula_confirmacao.confirmada AND -- boolean
 aluno_mensal.id_medicao = 7
GROUP BY
 aluno_mensal.id_matricula,
 aluno_mensal.id_turma,
 aluno_mensal.turma,
 aluno_mensal.id_produto_educacao,
 aluno_mensal.produto_educacao,
 aluno_mensal.unidade,
 aluno_mensal.unidade_execucao,
 aluno_mensal.modalidade,
 aluno_mensal.id_pessoa,
 aluno_mensal.nome_pessoa,
 presenca.id_diario,
 aula_confirmacao.inicio::date,
 presenca.justificativa_falta,
 mensal.ano,
 mensal.mes;




2013/3/21 Daniel Cristian Cruz <[email protected]>

> 2013/3/21 Alban Hertroys <[email protected]>
>
>> On Mar 20, 2013, at 22:36, Daniel Cristian Cruz <[email protected]>
>> wrote:
>>
>> Hi,
>>
>> I'm trying to figure out why does the planner found 1 row estimate using
>> nested loops over a big table. There is no return from it:
>>
>> http://explain.depesz.com/s/GRs
>>
>>
>> That plan contains no actual statistics, which makes it difficult to say
>> anything about it. And you didn't provide any info on table definitions or
>> indexes whatsoever, we have to put that together from the generated query
>> plans. Not great...
>>
>
> My bad... I guess the plan could do it. And now I figured out that I lost
> the first query... Now the query looks like this:
>
>
>
>
>> It returns if disable nested loops, but the plan still poor:
>>
>> http://explain.depesz.com/s/fMY
>>
>>
>> You could probably gain some here by adding an index on
>> aluno_mensal.id_medicao. In step 14 the lack thereof causes a seqscan over
>> more than a million rows.
>>
>
> There is already an index on id_medicao. It used a hashjoin because I
> disable mergejoin which uses the index, instead there is no return.
>
>
>> What I also find a bit peculiar is that the filter in step 7 appears to
>> apply a function (date_part(text, date)) on every row in that heap. Do you
>> perhaps have a functional index on that table that makes that operation
>> efficient?
>>
>
> Yes, tried to improve performance creating a index on inicio using
> CAST(inicio TO DATE). The real filter here is aula_confirmacao.inicio::date
> BETWEEN DATE_TRUNC('YEAR', CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP
> (filtering all rows from year's beginning until now).
>
>
>> Besides, now() results in a timestamp, which in this query needs to be
>> converted to date; it's perhaps better to use CURRENT_DATE there, although
>> the benefits are probably immeasurable since it only needs to be calculated
>> once for all rows it's compared against.
>>
>
> DATE_TRUNC expect a text and a timestamp.
>
>
>>
>> I'm using PostgreSQL 9.2.3, default_statistics_target on 1000.
>>
>> I can't remember what to make PostgreSQL sees a better estimate in the
>> scan of aula_confirmacao and the join with presenca. I got rusty after a
>> long time just doing modeling.
>>
>> Does someone has some idea on that?
>>
>>
>> Are you asking about vacuum? You're definitely not vacuuming enough, your
>> statistics and your actual numbers of rows differ by enormous amounts
>> (that's why depesz colours them red).
>>
>
> autovacuum is running on production and the develop database. This is
> happening at develop database, fresh restore.
>
>
>> Are you using autovacuum? If so, you probably need to tune it more
>> aggressively. For the short term, running an ANALYSE on those tables should
>> at least get you more accurate query plans.
>>
>
> I've done it; with default_statistics_target on 1000, 100 and 200 (left it
> on 200, which was production config too).
>
> Thank you and sorry about the broken english, there was a long time since
> the last time I wrote...
>
> --
> Daniel Cristian Cruz
> クルズ クリスチアン ダニエル
>



-- 
Daniel Cristian Cruz
クルズ クリスチアン ダニエル

Re: [GENERAL] Bad plan on a huge table query

Reply via email to