Re: [PERFORM] Planner doing seqscan before indexed join

2007-03-30 Thread Marc Mamin
You may try to change the planner's opinion using sub queries. Something
like:


select * from 
   
   eventactivity,   

   (select * from 
  keyword_incidents, 
  eventmain, 
  eventgeo 
where 
  eventmain.incidentid = keyword_incidents.incidentid 
  and eventgeo.incidentid = keyword_incidents.incidentid 
  and (  recordtext like '%JOSE CHAVEZ%'   )
   )foo
   
 where eventactivity.incidentid = foo.incidentid 
 order by foo.entrydate limit 1;


HTH,

Marc

 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Dan Harris
Sent: Thursday, March 29, 2007 4:22 AM
To: PostgreSQL Performance
Subject: [PERFORM] Planner doing seqscan before indexed join

8.0.3 - Linux 2.6.18..
Freshly vacuumed and analyzed

This database has been humming along fine for a while now, but I've got
one of those sticky queries that is taking much too long to finish.

After some digging, I've found that the planner is choosing to apply a
necessary seq scan to the table.  Unfortunately, it's scanning the whole
table, when it seems that it could have joined it to a smaller table
first and reduce the amount of rows it would have to scan dramatically (
70 million to about 5,000 ).

The table eventactivity has about 70million rows in it, index on
incidentid
The table keyword_incidents is a temporary table and has incidentid as
its primary key.  It contains
5125 rows. eventmain and eventgeo both have 2.1 million. My hope is that
I can convince the planner to do the
  join to keyword_incidents *first* and then do the seq scan for the
LIKE condition.  Instead, it seems that it's seqscanning the whole 70
million rows first and then doing the join, which takes a lot longer
than I'd like to wait for it.  Or, maybe I'm misreading the explain
output?

Thanks again

-Dan
-
Here's the query:

explain analyze 

select 
   * from 

   keyword_incidents, 

   eventactivity, 

   eventmain, 

   eventgeo 

  where 

   eventmain.incidentid = keyword_incidents.incidentid and 

   eventgeo.incidentid = keyword_incidents.incidentid and 

   eventactivity.incidentid = keyword_incidents.incidentid 

   and (  recordtext like '%JOSE CHAVEZ%'   )
order by eventmain.entrydate limit 1;



---
  Limit  (cost=2388918.07..2388918.08 rows=1 width=455) (actual
time=81771.186..81771.292 rows=26 loops=1)
-  Sort  (cost=2388918.07..2388918.08 rows=1 width=455) (actual
time=81771.180..81771.215 rows=26 loops=1)
  Sort Key: eventmain.entrydate
  -  Nested Loop  (cost=0.00..2388918.06 rows=1 width=455)
(actual time=357.389..81770.982 rows=26 loops=1)
-  Nested Loop  (cost=0.00..2388913.27 rows=1
width=230) (actual time=357.292..81767.385 rows=26 loops=1)
  -  Nested Loop  (cost=0.00..2388909.33 rows=1
width=122) (actual time=357.226..81764.501 rows=26 loops=1)
-  Seq Scan on eventactivity
(cost=0.00..2388874.46 rows=7 width=84) (actual time=357.147..81762.582
rows=27 loops=1)
  Filter: ((recordtext)::text ~~ '%JOSE
CHAVEZ%'::text)
-  Index Scan using keyword_incidentid_pkey
on keyword_incidents  (cost=0.00..4.97 rows=1 width=38) (actual
time=0.034..0.036 rows=1 loops=27)
  Index Cond:
((outer.incidentid)::text = (keyword_incidents.incidentid)::text)
  -  Index Scan using eventgeo_incidentid_idx on
eventgeo  (cost=0.00..3.93 rows=1 width=108) (actual
time=0.076..0.081 rows=1 loops=26)
Index Cond: ((outer.incidentid)::text =
(eventgeo.incidentid)::text)
-  Index Scan using eventmain_incidentid_idx on
eventmain  (cost=0.00..4.78 rows=1 width=225) (actual
time=0.069..0.075 rows=1 loops=26)
  Index Cond: ((outer.incidentid)::text =
(eventmain.incidentid)::text)
  Total runtime: 81771.529 ms
(15 rows)

---(end of broadcast)---
TIP 6: explain analyze is your friend

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Planner doing seqscan before indexed join

2007-03-29 Thread Dave Dutcher
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Dan Harris
 
 After some digging, I've found that the planner is choosing 
 to apply a necessary seq scan to the table.  Unfortunately,
 it's scanning the whole table, when it seems that it could 
 have joined it to a smaller table first and reduce the
 amount of rows it would have to scan dramatically ( 70 
 million to about 5,000 ).
 

Joining will reduce the amount of rows to scan for the filter, but
performing the join is non-trivial.  If postgres is going to join two tables
together without applying any filter first then it will have to do a seqscan
of one of the tables, and if it chooses the table with 5000 rows, then it
will have to do 5000 index scans on a table with 70 million records.  I
don't know which way would be faster. 

I wonder if you could find a way to use an index to do the text filter.
Maybe tsearch2?  I haven't used anything like that myself, maybe someone
else has more input.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[PERFORM] Planner doing seqscan before indexed join

2007-03-28 Thread Dan Harris

8.0.3 - Linux 2.6.18..
Freshly vacuumed and analyzed

This database has been humming along fine for a while now, but I've got one of 
those sticky queries that is taking
much too long to finish.

After some digging, I've found that the planner is choosing to apply a 
necessary seq scan to the table.  Unfortunately,
it's scanning the whole table, when it seems that it could have joined it to a 
smaller table first and reduce the
amount of rows it would have to scan dramatically ( 70 million to about 5,000 ).

The table eventactivity has about 70million rows in it, index on incidentid
The table keyword_incidents is a temporary table and has incidentid as its 
primary key.  It contains
5125 rows. eventmain and eventgeo both have 2.1 million. My hope is that I can 
convince the planner to do the
 join to keyword_incidents *first* and then do the seq scan for the LIKE condition.  Instead, it seems that it's seqscanning the 
whole 70 million rows first and then doing the join, which takes a lot longer than I'd like to wait for it.  Or, maybe I'm

misreading the explain output?

Thanks again

-Dan
-
Here's the query:

explain analyze 

select 
  *
from 

  keyword_incidents, 

  eventactivity, 

  eventmain, 

  eventgeo 

 where 

  eventmain.incidentid = keyword_incidents.incidentid and 

  eventgeo.incidentid = keyword_incidents.incidentid and 

  eventactivity.incidentid = keyword_incidents.incidentid 


  and (  recordtext like '%JOSE CHAVEZ%'   )
order by eventmain.entrydate limit 1;

---
 Limit  (cost=2388918.07..2388918.08 rows=1 width=455) (actual 
time=81771.186..81771.292 rows=26 loops=1)
   -  Sort  (cost=2388918.07..2388918.08 rows=1 width=455) (actual 
time=81771.180..81771.215 rows=26 loops=1)
 Sort Key: eventmain.entrydate
 -  Nested Loop  (cost=0.00..2388918.06 rows=1 width=455) (actual 
time=357.389..81770.982 rows=26 loops=1)
   -  Nested Loop  (cost=0.00..2388913.27 rows=1 width=230) 
(actual time=357.292..81767.385 rows=26 loops=1)
 -  Nested Loop  (cost=0.00..2388909.33 rows=1 width=122) 
(actual time=357.226..81764.501 rows=26 loops=1)
   -  Seq Scan on eventactivity  (cost=0.00..2388874.46 rows=7 width=84) (actual time=357.147..81762.582 
rows=27 loops=1)

 Filter: ((recordtext)::text ~~ '%JOSE 
CHAVEZ%'::text)
   -  Index Scan using keyword_incidentid_pkey on keyword_incidents  (cost=0.00..4.97 rows=1 width=38) 
(actual time=0.034..0.036 rows=1 loops=27)

 Index Cond: ((outer.incidentid)::text = 
(keyword_incidents.incidentid)::text)
 -  Index Scan using eventgeo_incidentid_idx on eventgeo  (cost=0.00..3.93 rows=1 width=108) (actual 
time=0.076..0.081 rows=1 loops=26)

   Index Cond: ((outer.incidentid)::text = 
(eventgeo.incidentid)::text)
   -  Index Scan using eventmain_incidentid_idx on eventmain  (cost=0.00..4.78 rows=1 width=225) (actual 
time=0.069..0.075 rows=1 loops=26)

 Index Cond: ((outer.incidentid)::text = 
(eventmain.incidentid)::text)
 Total runtime: 81771.529 ms
(15 rows)

---(end of broadcast)---
TIP 6: explain analyze is your friend