hi, I'm sorry for not posting this first.
The server is the following and is being used exclusively for this PostgreSQL instance:

PostgreSQL 8.4.2 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.2.real (GCC) 4.2.4 (Ubuntu 4.2.4-1ubuntu4), 64-bit
Amazon EC2 Large Instance, 7.5GB memory, 64-bit

This is what is set in my postgresql.conf file:

max_connections = 100
ssl = true
shared_buffers = 24MB

ANALYZE VERBOSE EVENTS;
INFO:  analyzing "public.events"
INFO:  "events": scanned 30000 of 211312 pages, containing 1725088 live rows and 0 dead rows; 30000 rows in sample, 12151060 estimated total rows

Updating statistics did not affect the results -- it's still doing full table scans. (I had run ANALYZE before posting here as well, so this was expected.)

thank you

On Wed, Jun 2, 2010 at 8:49 PM, Bob Lunney <bob_lun...@yahoo.com> wrote:
> Jori,
>
> What is the PostgreSQL version/shared_buffers/work_mem/effective_cache_size/default_statistics_target? Are the statistics for the table up to date? (Run analyze verbose <tablename> to update them.) Table and index structure would be nice to know, too.
>
> If all else fails you can set enable_seqscan = off for the session, but that is a Big Hammer for what is probably a smaller problem.
>
> Bob Lunney
>
> --- On Wed, 6/2/10, Jori Jovanovich <j...@dimensiology.com> wrote:
>
> From: Jori Jovanovich <j...@dimensiology.com>
> Subject: [PERFORM] SELECT ignoring index even though ORDER BY and LIMIT present
> To: pgsql-performance@postgresql.org
> Date: Wednesday, June 2, 2010, 4:28 PM
>
> hi,
>
> I have a problem space where the main goal is to search backward in time for events. Time can go back very far into the past, and so the table can get quite large. However, the vast majority of queries are all satisfied by relatively recent data. I have an index on the row creation date, and I would like almost all of my queries to have a query plan looking something like:
>
>  Limit ...
>    ->  Index Scan Backward using server_timestamp_idx on events  (cost=0.00..623055.91 rows=8695 width=177)
>    ...
>
> However, PostgreSQL frequently tries to do a full table scan. Whether a scan is chosen often depends on the size of the LIMIT and on how detailed the WHERE clause is. In practice, the scan is always the wrong answer for my use cases (where "always" is defined to be >99.9%).
>
> Some examples:
>
> (1) A sample query that devolves to a full table scan:
>
> EXPLAIN
> SELECT events.id, events.client_duration, events.message, events.created_by,
>        events.source, events.type, events.event, events.environment,
>        events.server_timestamp, events.session_id, events.reference, events.client_uuid
> FROM events
> WHERE client_uuid ~* E'^foo bar so what'
> ORDER BY server_timestamp DESC
> LIMIT 20;
>
>                             QUERY PLAN (BAD!)
> --------------------------------------------------------------------------
>  Limit  (cost=363278.56..363278.61 rows=20 width=177)
>    ->  Sort  (cost=363278.56..363278.62 rows=24 width=177)
>          Sort Key: server_timestamp
>          ->  Seq Scan on events  (cost=0.00..363278.01 rows=24 width=177)
>                Filter: (client_uuid ~* '^foo bar so what'::text)
>
> (2) Making the query faster by making the string match LESS specific (odd, seems like it should be MORE):
>
> EXPLAIN
> SELECT events.id, events.client_duration, events.message, events.created_by,
>        events.source, events.type, events.event, events.environment,
>        events.server_timestamp, events.session_id, events.reference, events.client_uuid
> FROM events
> WHERE client_uuid ~* E'^foo'
> ORDER BY server_timestamp DESC
> LIMIT 20;
>
>                             QUERY PLAN (GOOD!)
> ------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..1433.14 rows=20 width=177)
>    ->  Index Scan Backward using server_timestamp_idx on events  (cost=0.00..623055.91 rows=8695 width=177)
>          Filter: (client_uuid ~* '^foo'::text)
>
> (3) Alternatively, making the query faster by using a smaller LIMIT:
>
> EXPLAIN
> SELECT events.id, events.client_duration, events.message, events.created_by,
>        events.source, events.type, events.event, events.environment,
>        events.server_timestamp, events.session_id, events.reference, events.client_uuid
> FROM events
> WHERE client_uuid ~* E'^foo bar so what'
> ORDER BY server_timestamp DESC
> LIMIT 10;
>
>                             QUERY PLAN (GOOD!)
> ----------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..259606.63 rows=10 width=177)
>    ->  Index Scan Backward using server_timestamp_idx on events  (cost=0.00..623055.91 rows=24 width=177)
>          Filter: (client_uuid ~* '^foo bar so what'::text)
>
> I find myself wishing I could just put a SQL HINT on the query to force the index to be used, but I understand that HINTs are considered harmful and are therefore not provided for PostgreSQL, so what is the recommended way to solve this?
>
> thank you very much

On Thu, Jun 3, 2010 at 5:15 AM, Matthew Wakeling <matt...@flymine.org> wrote:
> On Wed, 2 Jun 2010, Jori Jovanovich wrote:
>> (2) Making the query faster by making the string match LESS specific (odd,
>> seems like it should be MORE)
>
> No, that's the way round it should be. The LIMIT changes it all. Consider if you have a huge table, and half of the entries match your WHERE clause. To fetch the ORDER BY ... LIMIT 20 using an index scan would involve accessing on average only 40 entries from the table referenced by the index. Therefore, the index is quick. However, consider a huge table that has only twenty matching entries.
> The index scan would need to touch every single row in the table to return the matching rows, so a sequential scan, filter, and sort would be much faster. Of course, if you had an index capable of answering the WHERE clause, that would be even better for that case.

Okay, this makes sense, thank you -- I was thinking about it backwards.
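For anyone finding this thread later, Matthew's reasoning can be sketched as a tiny numeric model. This is only an illustration, not the planner's actual cost formula; the function name is made up, and the only figure taken from the thread is the ~12.15M estimated row count from the ANALYZE output (plus the 8695 and 24 row estimates visible in the plans):

```python
# Back-of-envelope model of the trade-off Matthew describes: an index
# scan in ORDER BY order must keep visiting entries until it has found
# LIMIT rows that pass the filter. Illustrative only -- not the real
# PostgreSQL cost model.

def rows_touched_by_index_scan(total_rows, matching_rows, limit):
    """Expected index entries visited before `limit` matches are found,
    assuming matches are spread uniformly through the index order."""
    # On average there is one match per (total_rows / matching_rows)
    # entries, so finding `limit` matches costs roughly
    # limit * total_rows / matching_rows, capped at the whole index.
    return min(total_rows, limit * total_rows / matching_rows)

total = 12_151_060  # estimated total rows in events (from ANALYZE VERBOSE)

# Matthew's two thought experiments:
print(rows_touched_by_index_scan(12_000_000, 6_000_000, 20))  # 40.0
print(rows_touched_by_index_scan(12_000_000, 20, 20))         # 12000000

# The planner's own estimates from the plans above:
# E'^foo' -> estimated 8695 matching rows, so the scan stops early.
print(round(rows_touched_by_index_scan(total, 8695, 20)))     # 27950
# E'^foo bar so what' -> estimated 24 matching rows, so nearly the
# whole index would be walked; the planner picks seq scan + sort.
print(round(rows_touched_by_index_scan(total, 24, 20)))       # 10125883
```

This also shows why Matthew's last remark matters: an index that can answer the WHERE clause directly sidesteps the trade-off entirely. Depending on version and query shape, that might be a text_pattern_ops btree index usable by a left-anchored LIKE, or (on PostgreSQL versions newer than the 8.4 in this thread) a pg_trgm GIN index that can serve regex matches.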