Sorry to bother everyone with yet another "my query isn't using an index" problem but I am over my head on this one.. I am open to ways of restructuring this query to perform better.

I have a table, 'ea', with 22 million rows in it. VACUUM ANALYZE has been just run on the table.

This is the result of:

explain analyze
select distinct
        em.incidentid,
        ea.recordtext as retdata,
        eg.long,
        eg.lat
from
        ea, em, eg
where
        em.incidentid = ea.incidentid and
        em.incidentid = eg.incidentid and
        em.entrydate >= '2005-1-1 00:00' and
        em.entrydate <= '2005-5-9 00:00'
        and ea.incidentid in (
                select
                        incidentid
                from
                        ea
                where
                        recordtext like '%RED%'
        )

and ea.incidentid in (
select
incidentid
from
ea
where
recordtext like '%CORVETTE%'
)
and ( recordtext like '%RED%' or recordtext like '%CORVETTE%' ) order by em.entrydate



--------------------- ANALYZE RESULTS ---------------------

Unique (cost=774693.72..774693.76 rows=1 width=159) (actual time=446787.056..446787.342 rows=72 loops=1)
-> Sort (cost=774693.72..774693.72 rows=1 width=159) (actual time=446787.053..446787.075 rows=72 loops=1)
Sort Key: em.incidentid, public.ea.recordtext, eg.long, eg.lat
-> Nested Loop (cost=771835.10..774693.71 rows=1 width=159) (actual time=444378.655..446786.746 rows=72 loops=1)
-> Nested Loop (cost=771835.10..774688.81 rows=1 width=148) (actual time=444378.532..446768.381 rows=72 loops=1)
-> Nested Loop IN Join (cost=771835.10..774678.88 rows=2 width=81) (actual time=444367.080..446191.864 rows=701 loops=1)
-> Nested Loop (cost=771835.10..774572.05 rows=42 width=64) (actual time=444366.859..445463.232 rows=1011 loops=1)
-> HashAggregate (cost=771835.10..771835.10 rows=1 width=17) (actual time=444366.702..444368.583 rows=473 loops=1)
-> Seq Scan on ea (cost=0.00..771834.26 rows=335 width=17) (actual time=259.746..444358.837 rows=592 loops=1)
Filter: ((recordtext)::text ~~ '%CORVETTE%'::text)
-> Index Scan using ea1 on ea (cost=0.00..2736.43 rows=42 width=47) (actual time=2.085..2.309 rows=2 loops=473)
Index Cond: ((ea.incidentid)::text = ("outer".incidentid)::text)
Filter: (((recordtext)::text ~~ '%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text))
-> Index Scan using ea1 on ea (cost=0.00..2733.81 rows=42 width=17) (actual time=0.703..0.703 rows=1 loops=1011)
Index Cond: (("outer".incidentid)::text = (ea.incidentid)::text)
Filter: ((recordtext)::text ~~ '%RED%'::text)
-> Index Scan using em_incidentid_idx on em (cost=0.00..4.95 rows=1 width=67) (actual time=0.820..0.821 rows=0 loops=701)
Index Cond: (("outer".incidentid)::text = (em.incidentid)::text)
Filter: ((entrydate >= '2005-01-01 00:00:00'::timestamp without time zone) AND (entrydate <= '2005-05-09 00:00:00'::timestamp without time zone))
-> Index Scan using eg_incidentid_idx on eg (cost=0.00..4.89 rows=1 width=79) (actual time=0.245..0.246 rows=1 loops=72)
Index Cond: (("outer".incidentid)::text = (eg.incidentid)::text)
Total runtime: 446871.880 ms
(22 rows)



-------------------------
EXPLANATION
-------------------------
The reason for the redundant LIKE clause is that first, I only want those "incidentid"s that contain the words 'RED' and 'CORVETTE'. BUT, those two words may exist across multiple records with the same incidentid. Then, I only want to actually work with the rows that contain one of the words. This query will repeat the same logic for however many keywords are entered by the user. I have investigated text searching options and have not found them to be congruous with my application.


Why is it choosing a sequential scan one part of the query when searching for the words, yet using an index scan for another part of it? Is there a better way to structure the query to give it better hints?

I'm using 8.0.1 on a 4-way Opteron with beefy RAID-10 and 12GB of RAM.

Thank you for any advice.

-Dan




---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match

Reply via email to