Well that was a *lot* faster:

"HashAggregate  (cost=156301.82..156301.83 rows=2 width=26) (actual
time=2692.806..2692.807 rows=2 loops=1)"
"  ->  Bitmap Heap Scan on blocks  (cost=14810.54..155828.95 rows=472871
width=26) (actual time=289.828..1593.893 rows=575186 loops=1)"
"        Recheck Cond: (created > '2012-01-29 00:00:00+00'::timestamp with
time zone)"
"        Filter: (shared IS FALSE)"
"        ->  Bitmap Index Scan on blocks_created_idx  (cost=0.00..14786.89
rows=550404 width=0) (actual time=277.407..277.407 rows=706663 loops=1)"
"              Index Cond: (created > '2012-01-29 00:00:00+00'::timestamp
with time zone)"
"Total runtime: 2693.107 ms"

To answer your (non-)question about Heroku: it's a cloud service, so I
don't host PostgreSQL myself. I'm not sure how much I can mess with things
like GUCs, since I don't even have access to the "postgres" database on the
server. I'm a long-time SQL user but new to Postgres, so I welcome
suggestions on where to start with that sort of thing.

Setting enable_seqscan=false made a huge difference, so I think I'll start
there.
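For the curious, it was just something along these lines (the SET only
affects the current session, so it doesn't touch the server config):

    SET enable_seqscan = false;

    EXPLAIN ANALYZE
    SELECT private, COUNT(block_id)
    FROM blocks
    WHERE created > 'yesterday'
      AND shared IS FALSE
    GROUP BY private;

    RESET enable_seqscan;  -- restore the default for this session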

Thank you very much!
-Alessandro

On Mon, Jan 30, 2012 at 11:24 AM, Claudio Freire <klaussfre...@gmail.com> wrote:

> On Mon, Jan 30, 2012 at 4:13 PM, Alessandro Gagliardi
> <alessan...@path.com> wrote:
> > So, here's the query:
> >
> > SELECT private, COUNT(block_id) FROM blocks
> > WHERE created > 'yesterday' AND shared IS FALSE
> > GROUP BY private
> >
> > What confuses me is that though this is a largish table (millions of
> > rows) with constant writes, the query is over indexed columns of types
> > timestamp and boolean, so I would expect it to be very fast. The
> > created > 'yesterday' clause is there mostly to speed it up, but
> > apparently it doesn't help much.
>
> The number of rows touched is ~0.5M, and is correctly estimated, which
> would lead me to believe PG estimates the index plan to be slower.
>
> You could try executing "set enable_seqscan=false;" first, and then
> running your query with EXPLAIN ANALYZE again. You'll probably get an
> index scan, and you'll see both how it actually performs and how PG
> thought it would perform. Any mismatch between the two probably means
> you'll have to change the planner tunables (the *_tuple_cost ones,
> i.e. cpu_tuple_cost and cpu_index_tuple_cost) to better match your
> hardware.
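For instance (these GUC names are standard PostgreSQL planner settings,
but the values below are purely illustrative, not recommendations;
random_page_cost is a related cost knob Claudio doesn't name):

    -- if the forced index plan beats the planner's expectation,
    -- make index access look cheaper to the cost model:
    SET cpu_index_tuple_cost = 0.003;  -- default is 0.005
    SET random_page_cost = 2.0;        -- default is 4.0; often lowered
                                       -- when the data is mostly cached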
>
>
> > As for Hardware: I'm using Heroku's "Ronin" setup, which involves 1.7 GB
> > of cache. Beyond that I don't really know.
> snip
> > As for GUC Settings: Again, I don't know what these are. Whatever Heroku
> > defaults to is what I'm using.
>
> And there's your problem. Without knowing/understanding those, you
> won't get anywhere. I don't know what Heroku is, but you should find
> out both hardware details and PG configuration details.
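Even on a hosted service, the effective settings can usually be read from
any SQL session; SHOW and the pg_settings view are standard PostgreSQL and
need no superuser rights (the list of names below is just a starting point):

    SHOW effective_cache_size;  -- one setting at a time

    -- or several at once:
    SELECT name, setting, unit
    FROM pg_settings
    WHERE name IN ('shared_buffers', 'effective_cache_size',
                   'random_page_cost', 'cpu_tuple_cost',
                   'cpu_index_tuple_cost', 'work_mem');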
>
> > As for Maintenance Setup: I let Heroku handle that, so again, I don't
> > really know. FWIW, though, vacuuming should not really be an issue (as I
> > understand it) since I don't really do any updates or deletions. It's
> > pretty much all inserts and selects.
>
> Maintenance also includes analyzing the table, to gather stats that
> feed the optimizer, and it's very important to keep those stats
> accurate. You can do it manually: just run ANALYZE. However, the plan
> doesn't show any serious mismatch between expected and actual row
> counts, which suggests stats aren't your problem.
>
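A minimal sketch of that, using the table from the query above (ANALYZE and
the pg_stat_user_tables view are standard PostgreSQL):

    ANALYZE blocks;  -- refresh the planner's statistics for this table

    -- check when the table was last analyzed, manually or by autovacuum:
    SELECT relname, last_analyze, last_autoanalyze
    FROM pg_stat_user_tables
    WHERE relname = 'blocks';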
