hi, I'm sorry for not posting this first.
The server is the following and is being used exclusively for this PostgreSQL instance:

PostgreSQL 8.4.2 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.2.real (GCC) 4.2.4 (Ubuntu 4.2.4-1ubuntu4), 64-bit
Amazon EC2 Large Instance, 7.5GB memory, 64-bit

This is what is set in my postgresql.conf file:

max_connections = 100
ssl = true
shared_buffers = 24MB

ANALYZE VERBOSE EVENTS;
INFO:  analyzing "public.events"
INFO:  "events": scanned 30000 of 211312 pages, containing 1725088 live rows and 0 dead rows; 30000 rows in sample, 12151060 estimated total rows

Updating statistics did not affect the results -- it's still doing full table scans. (I had run ANALYZE before posting here as well, so this was expected.)

thank you

On Wed, Jun 2, 2010 at 8:49 PM, Bob Lunney <bob_lun...@yahoo.com> wrote:
> Jori,
>
> What is the PostgreSQL version/shared_buffers/work_mem/effective_cache_size/default_statistics_target? Are the statistics for the table up to date? (Run analyze verbose <tablename> to update them.) Table and index structure would be nice to know, too.
>
> If all else fails you can set enable_seqscan = off for the session, but that is a Big Hammer for what is probably a smaller problem.
>
> Bob Lunney
>
> --- On Wed, 6/2/10, Jori Jovanovich <j...@dimensiology.com> wrote:
>
> From: Jori Jovanovich <j...@dimensiology.com>
> Subject: [PERFORM] SELECT ignoring index even though ORDER BY and LIMIT present
> To: pgsql-performance@postgresql.org
> Date: Wednesday, June 2, 2010, 4:28 PM
>
> hi,
>
> I have a problem space where the main goal is to search backward in time for events. Time can go back very far into the past, and so the table can get quite large. However, the vast majority of queries are all satisfied by relatively recent data. I have an index on the row creation date, and I would like almost all of my queries to have a query plan looking something like:
>
>  Limit ...
>    ->  Index Scan Backward using server_timestamp_idx on events  (cost=0.00..623055.91 rows=8695 width=177)
>    ...
>
> However, PostgreSQL frequently tries to do a full table scan. Whether a scan is chosen often depends on the size of the LIMIT and on how detailed the WHERE clause is. In practice, the scan is always the wrong answer for my use cases (where "always" is defined to be >99.9%).
>
> Some examples:
>
> (1) A sample query that devolves to a full table scan:
>
> EXPLAIN
> SELECT events.id, events.client_duration, events.message, events.created_by,
>        events.source, events.type, events.event, events.environment,
>        events.server_timestamp, events.session_id, events.reference, events.client_uuid
> FROM events
> WHERE client_uuid ~* E'^foo bar so what'
> ORDER BY server_timestamp DESC
> LIMIT 20;
>
>                             QUERY PLAN (BAD!)
> --------------------------------------------------------------------------
>  Limit  (cost=363278.56..363278.61 rows=20 width=177)
>    ->  Sort  (cost=363278.56..363278.62 rows=24 width=177)
>          Sort Key: server_timestamp
>          ->  Seq Scan on events  (cost=0.00..363278.01 rows=24 width=177)
>                Filter: (client_uuid ~* '^foo bar so what'::text)
>
> (2) Making the query faster by making the string match LESS specific (odd, seems like it should be MORE):
>
> EXPLAIN
> SELECT events.id, events.client_duration, events.message, events.created_by,
>        events.source, events.type, events.event, events.environment,
>        events.server_timestamp, events.session_id, events.reference, events.client_uuid
> FROM events
> WHERE client_uuid ~* E'^foo'
> ORDER BY server_timestamp DESC
> LIMIT 20;
>
>                             QUERY PLAN (GOOD!)
> ------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..1433.14 rows=20 width=177)
>    ->  Index Scan Backward using server_timestamp_idx on events  (cost=0.00..623055.91 rows=8695 width=177)
>          Filter: (client_uuid ~* '^foo'::text)
>
> (3) Alternatively, making the query faster by using a smaller LIMIT:
>
> EXPLAIN
> SELECT events.id, events.client_duration, events.message, events.created_by,
>        events.source, events.type, events.event, events.environment,
>        events.server_timestamp, events.session_id, events.reference, events.client_uuid
> FROM events
> WHERE client_uuid ~* E'^foo bar so what'
> ORDER BY server_timestamp DESC
> LIMIT 10;
>
>                             QUERY PLAN (GOOD!)
> ----------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..259606.63 rows=10 width=177)
>    ->  Index Scan Backward using server_timestamp_idx on events  (cost=0.00..623055.91 rows=24 width=177)
>          Filter: (client_uuid ~* '^foo bar so what'::text)
>
> I find myself wishing I could just put a SQL HINT on the query to force the index to be used, but I understand that HINTs are considered harmful and are therefore not provided for PostgreSQL, so what is the recommended way to solve this?
>
> thank you very much

On Thu, Jun 3, 2010 at 5:15 AM, Matthew Wakeling <matt...@flymine.org> wrote:
> On Wed, 2 Jun 2010, Jori Jovanovich wrote:
>> (2) Making the query faster by making the string match LESS specific (odd,
>> seems like it should be MORE)
>
> No, that's the way round it should be. The LIMIT changes it all. Consider if you have a huge table, and half of the entries match your WHERE clause. To fetch the ORDER BY ... LIMIT 20 using an index scan would involve accessing on average only 40 entries from the table referenced by the index. Therefore, the index is quick. However, consider a huge table that has only twenty matching entries.
> The index scan would need to touch every single row in the table to return the matching rows, so a sequential scan, filter, and sort would be much faster. Of course, if you had an index capable of answering the WHERE clause, that would be even better for that case.

Okay, this makes sense, thank you -- I was thinking about it backwards.
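For anyone finding this thread later, Matthew's reasoning can be sketched as a tiny numeric model. This is only an illustration, not the planner's actual cost formula; the function name is made up, and the only figure taken from the thread is the ~12.15M estimated row count from the ANALYZE output (plus the 8695 and 24 row estimates visible in the plans):

```python
# Back-of-envelope model of the trade-off Matthew describes: an index
# scan in ORDER BY order must keep visiting entries until it has found
# LIMIT rows that pass the filter. Illustrative only -- not the real
# PostgreSQL cost model.

def rows_touched_by_index_scan(total_rows, matching_rows, limit):
    """Expected index entries visited before `limit` matches are found,
    assuming matches are spread uniformly through the index order."""
    # On average there is one match per (total_rows / matching_rows)
    # entries, so finding `limit` matches costs roughly
    # limit * total_rows / matching_rows, capped at the whole index.
    return min(total_rows, limit * total_rows / matching_rows)

total = 12_151_060  # estimated total rows in events (from ANALYZE VERBOSE)

# Matthew's two thought experiments:
print(rows_touched_by_index_scan(12_000_000, 6_000_000, 20))  # 40.0
print(rows_touched_by_index_scan(12_000_000, 20, 20))         # 12000000

# The planner's own estimates from the plans above:
# E'^foo' -> estimated 8695 matching rows, so the scan stops early.
print(round(rows_touched_by_index_scan(total, 8695, 20)))     # 27950
# E'^foo bar so what' -> estimated 24 matching rows, so nearly the
# whole index would be walked; the planner picks seq scan + sort.
print(round(rows_touched_by_index_scan(total, 24, 20)))       # 10125883
```

This also shows why Matthew's last remark matters: an index that can answer the WHERE clause directly sidesteps the trade-off entirely. Depending on version and query shape, that might be a text_pattern_ops btree index usable by a left-anchored LIKE, or (on PostgreSQL versions newer than the 8.4 in this thread) a pg_trgm GIN index that can serve regex matches.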