Hi,

I've the problem my database is not using the 'right' queryplan in all
cases. Is there a way I can force that and/or how should I tuned the
table statistics? 

I'm doing a rsyslog database in PostgreSQL with millions of records
(firewall logging). The db scheme is the so called 'MonitorWare' scheme,
to wich I added two extra indexes.

syslog=# select version();
                                                     version                    
                                  
--------------------------------------------------------------------------------
----------------------------------
 PostgreSQL 8.4.4 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20
080704 (Red Hat 4.1.2-48), 64-bit


syslog=# \d systemevents
                                         Table "public.systemevents"
       Column       |            Type             |                         Modi
fiers                         
--------------------+-----------------------------+-----------------------------
------------------------------
 id                 | integer                     | not null default nextval('sy
stemevents_id_seq'::regclass)
 customerid         | bigint                      | 
 receivedat         | timestamp without time zone | 
 devicereportedtime | timestamp without time zone | 
 facility           | smallint                    | 
 priority           | smallint                    | 
 fromhost           | character varying(60)       | 
 message            | text                        | 
 ntseverity         | integer                     | 
 importance         | integer                     | 
 eventsource        | character varying(60)       | 
 eventuser          | character varying(60)       | 
 eventcategory      | integer                     | 
 eventid            | integer                     | 
 eventbinarydata    | text                        | 
 maxavailable       | integer                     | 
 currusage          | integer                     | 
 minusage           | integer                     | 
 maxusage           | integer                     | 
 infounitid         | integer                     | 
 syslogtag          | character varying(60)       | 
 eventlogtype       | character varying(60)       | 
 genericfilename    | character varying(60)       | 
 systemid           | integer                     | 
Indexes:
    "systemevents_pkey" PRIMARY KEY, btree (id)
    "fromhost_idx" btree (fromhost)
    "msgs_idx" gin (to_tsvector('english'::regconfig, message))

The GIN index is to do text searching (via LogAnalyzer).

Now there are two types of query plans:

syslog=# explain SELECT id, devicereportedtime, facility, priority, fromhost, 
syslogtag, infounitid, message FROM systemevents WHERE (  ( 
to_tsvector('english', message) @@ to_tsquery ( '131.211.112.9')) )  ORDER BY 
id DESC LIMIT 100;                                                   QUERY PLAN 
                  
                                 
--------------------------------------------------------------------------------
---------------------------------
 Limit  (cost=0.00..10177.22 rows=100 width=159)
   ->  Index Scan Backward using systemevents_pkey on systemevents  (cost=0.00..
1052934.86 rows=10346 width=159)
         Filter: (to_tsvector('english'::regconfig, message) @@ to_tsquery('131.
211.112.9'::text))
(3 rows)

This one is useless (takes very long). However this one:

syslog=# explain SELECT id, devicereportedtime, facility, priority, fromhost, 
syslogtag, infounitid, message FROM systemevents WHERE (  ( 
to_tsvector('english', message) @@ to_tsquery ( '131.211.112.9')) )  ORDER BY 
id DESC LIMIT 500;
                                                    QUERY PLAN                  
                                   
--------------------------------------------------------------------------------
-----------------------------------
 Limit  (cost=40928.89..40930.14 rows=500 width=159)
   ->  Sort  (cost=40928.89..40954.76 rows=10346 width=159)
         Sort Key: id
         ->  Bitmap Heap Scan on systemevents  (cost=2898.06..40413.36 rows=1034
6 width=159)
               Recheck Cond: (to_tsvector('english'::regconfig, message) @@ to_t
squery('131.211.112.9'::text))
               ->  Bitmap Index Scan on msgs_idx  (cost=0.00..2895.47 rows=10346
 width=0)
                     Index Cond: (to_tsvector('english'::regconfig, message) @@ 
to_tsquery('131.211.112.9'::text))
(7 rows)

works acceptable.

Stats:

syslog=# SELECT relname, relkind, reltuples, relpages
FROM pg_class                                                      WHERE 
relname LIKE 'systemevents%';                                                   
      relname            | relkind |  reltuples  | relpages 
-------------------------------+---------+-------------+----------
 systemevents_pkey             | i       | 2.06915e+06 |    71985
 systemeventsproperties        | r       |           0 |        0
 systemeventsproperties_pkey   | i       |           0 |        1
 systemevents_id_seq           | S       |           1 |        1
 systemeventsproperties_id_seq | S       |           1 |        1
 systemevents                  | r       | 2.06915e+06 |   694826
(6 rows)

syslog=# SELECT relname, relkind, reltuples, relpages
FROM pg_class
WHERE relname LIKE 'msg%';
 relname  | relkind |  reltuples  | relpages 
----------+---------+-------------+----------
 msgs_idx | i       | 2.06915e+06 |   128069
(1 row)

How to use the right plan regardless of the 'LIMIT-size'?

Cheers,
-- 
Henk van Lingen, ICT-SC Netwerk & Telefonie,                  (o-      -+
Universiteit Utrecht, Jenalaan 18a, room 0.12                 /\        |
phone: +31-30-2538453                                         v_/_      |
http://henk.vanlingen.net/             http://www.tuxtown.net/netiquette/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to