Aha!  "set enable_hashjoin=off" did the trick.
The PG version is 8.0.3.

NB: I removed that redundant "DISTINCT" after the SELECT.

EXPLAIN ANALYZE select userurltag0_.tag as x0_0_, COUNT(*) as x1_0_ from 
user_url_tag userurltag0_, user_url userurl1_ where (((userurl1_.user_id=1 
))AND((userurltag0_.user_url_id=userurl1_.id ))) group by  userurltag0_.tag 
order by  count(*)DESC;
                                                                                
                                   QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=155766.79..155774.81 rows=3207 width=10) (actual time=2387.756..2396.578 rows=2546 loops=1)
   Sort Key: count(*)
   ->  HashAggregate  (cost=155572.02..155580.03 rows=3207 width=10) (actual time=2365.643..2376.626 rows=2546 loops=1)
         ->  Nested Loop  (cost=0.00..155552.68 rows=3867 width=10) (actual time=0.135..2222.028 rows=8544 loops=1)
               ->  Index Scan using ix_user_url_user_id_url_id on user_url userurl1_  (cost=0.00..2798.12 rows=963 width=4) (actual time=0.067..9.744 rows=1666 loops=1)
                     Index Cond: (user_id = 1)
               ->  Index Scan using ix_user_url_tag_user_url_id on user_url_tag userurltag0_  (cost=0.00..157.34 rows=103 width=14) (actual time=1.223..1.281 rows=5 loops=1666)
                     Index Cond: (userurltag0_.user_url_id = "outer".id)
 Total runtime: 2405.691 ms
(9 rows)


Are you still interested in the output with "its second-choice join type"
disabled too?  If so, please tell me which join types those are; this is a
bit beyond me. :(

Is there a way to force PG to use the index automatically?  This query is 
executed from something called Hibernate, and I'm not sure if that will let me 
set enable_hashjoin=off through its API...
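
A minimal sketch of ways the setting could be applied without going through
Hibernate's query API, assuming plain SQL can be sent on the same connection
or that the application connects as a dedicated role (the role name app_user
below is hypothetical):

```sql
-- Option 1: session-wide, issued once on the connection.
SET enable_hashjoin = off;

-- Option 2: only for one transaction; reverts at COMMIT or ROLLBACK.
BEGIN;
SET LOCAL enable_hashjoin = off;
-- ... run the tag-count query here ...
COMMIT;

-- Option 3: default for every new session of one role
-- (app_user is a hypothetical role name).
ALTER USER app_user SET enable_hashjoin = off;

-- Undo option 1 for the current session.
RESET enable_hashjoin;
```

Each of these changes planning for every query in its scope, not just this
one, so the narrowest scope that works is probably the safest choice.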

Thanks,
Otis


----- Original Message ----
From: Tom Lane <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: pgsql-sql@postgresql.org
Sent: Wednesday, May 10, 2006 8:27:01 PM
Subject: Re: [SQL] Help with a seq scan on multi-million row table 

<[EMAIL PROTECTED]> writes:
>                ->  Hash Join  (cost=2797.65..140758.50 rows=3790 width=10) (actual time=248.530..380635.132 rows=8544 loops=1)
>                      Hash Cond: ("outer".user_url_id = "inner".id)
>                      ->  Seq Scan on user_url_tag userurltag0_  (cost=0.00..106650.30 rows=6254530 width=14) (actual time=0.017..212256.630 rows=6259553 loops=1)
>                      ->  Hash  (cost=2795.24..2795.24 rows=962 width=4) (actual time=199.840..199.840 rows=0 loops=1)
>                            ->  Index Scan using ix_user_url_user_id_url_id on user_url userurl1_  (cost=0.00..2795.24 rows=962 width=4) (actual time=0.048..193.707 rows=1666 loops=1)
>                                  Index Cond: (user_id = 1)

Hm, I'm not sure why it's choosing that join plan.  A nestloop indexscan
wouldn't be terribly cheap, but just counting on my fingers it seems
like it ought to come in at less than 100000 cost units.  What do you
get if you set enable_hashjoin off?  (Then try disabling its
second-choice join type too --- I'm interested to see EXPLAIN ANALYZE
output for all three join types.)
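
The three-way comparison can be sketched as below. The planner has three
join methods (hash join, merge join, nested loop), each with an enable_*
switch. Which one is the "second choice" is only known after the first
switch is turned off; in the reply above it turned out to be the nested
loop, so the sketch assumes merge join is what remains. Setting a switch to
off only makes that method look very expensive to the planner; it does not
forbid it outright.

```sql
-- Run 1: default settings (hash join in the original report).
EXPLAIN ANALYZE
SELECT userurltag0_.tag AS x0_0_, COUNT(*) AS x1_0_
FROM user_url_tag userurltag0_, user_url userurl1_
WHERE userurl1_.user_id = 1
  AND userurltag0_.user_url_id = userurl1_.id
GROUP BY userurltag0_.tag
ORDER BY COUNT(*) DESC;

-- Run 2: discourage hash joins; the planner falls back to its second choice.
SET enable_hashjoin = off;
EXPLAIN ANALYZE
SELECT userurltag0_.tag AS x0_0_, COUNT(*) AS x1_0_
FROM user_url_tag userurltag0_, user_url userurl1_
WHERE userurl1_.user_id = 1
  AND userurltag0_.user_url_id = userurl1_.id
GROUP BY userurltag0_.tag
ORDER BY COUNT(*) DESC;

-- Run 3: also discourage the second choice (assumed here to be nested loop),
-- which should leave merge join as the cheapest remaining option.
SET enable_nestloop = off;
EXPLAIN ANALYZE
SELECT userurltag0_.tag AS x0_0_, COUNT(*) AS x1_0_
FROM user_url_tag userurltag0_, user_url userurl1_
WHERE userurl1_.user_id = 1
  AND userurltag0_.user_url_id = userurl1_.id
GROUP BY userurltag0_.tag
ORDER BY COUNT(*) DESC;

-- Restore the defaults for the rest of the session.
RESET enable_hashjoin;
RESET enable_nestloop;
```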

What PG version is this exactly?

            regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend



