Re: [PERFORM] Query improvement

2011-05-13 Thread Robert Haas
On Mon, May 2, 2011 at 3:58 AM, Claudio Freire klaussfre...@gmail.com wrote:
 Hash joins are very inefficient if they require big temporary files.

Hmm, that's not been my experience.  What have you seen?

I've seen a 64-batch hash join beat out a
nested-loop-with-inner-indexscan, which I never woulda believed,
but...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query improvement

2011-05-09 Thread Mark
Thanks for replies. Finally I have used UNION and JOINS, which helped. Mainly
the UNION helped a lot. Now the query takes 1sec max. Thanks a lot. 

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Query-improvement-tp4362578p4378163.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query improvement

2011-05-09 Thread Mark
Thanks for reply both UNION and JOINS helped. Mainly the UNION helped a lot.
Now the query takes 1sec max. Thanks a lot. 

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Query-improvement-tp4362578p4378157.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query improvement

2011-05-09 Thread Mark
Thanks a lot for reply. Finally I have used UNION, but thanks for your help.

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Query-improvement-tp4362578p4378160.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query improvement

2011-05-03 Thread Mark
Here is EXPLAIN ANALYZE:

Limit  (cost=136568.00..136568.25 rows=100 width=185) (actual
time=1952.174..1952.215 rows=100 loops=1)
  -  Sort  (cost=136568.00..137152.26 rows=233703 width=185) (actual
time=1952.172..1952.188 rows=100 loops=1)
Sort Key: ((ts_rank(pc.textvector, to_tsquery('fotbal'::text)) +
(ts_rank(pa.titlevector, to_tsquery('fotbal'::text)) * 10::double
precision)))
Sort Method:  top-N heapsort  Memory: 23kB
-  Hash Join  (cost=61707.99..127636.04 rows=233703 width=185)
(actual time=1046.838..1947.815 rows=3278 loops=1)
  Hash Cond: (re.rev_id = pc.old_id)
  -  Merge Join  (cost=24098.90..71107.48 rows=233703
width=66) (actual time=200.884..859.453 rows=3278 loops=1)
Merge Cond: (pa.page_id = re.rev_page)
-  Merge Semi Join  (cost=24096.98..55665.69
rows=233703 width=66) (actual time=200.843..629.821 rows=3278 loops=1)
  Merge Cond: (pa.page_id =
mediawiki.page.page_id)
  -  Index Scan using page_btree_id on page pa 
(cost=0.00..13155.20 rows=311604 width=62) (actual time=0.027..145.989
rows=311175 loops=1)
  -  Index Scan using page_btree_id on page 
(cost=24096.98..38810.19 rows=233703 width=4) (actual time=200.779..429.219
rows=3278 loops=1)
Filter: ((hashed SubPlan 1) OR (hashed
SubPlan 2))
SubPlan 1
  -  Bitmap Heap Scan on page 
(cost=10.41..900.33 rows=270 width=4) (actual time=0.748..9.845 rows=280
loops=1)
Recheck Cond: (titlevector @@
to_tsquery('fotbal'::text))
-  Bitmap Index Scan on gin_index 
(cost=0.00..10.34 rows=270 width=0) (actual time=0.586..0.586 rows=280
loops=1)
  Index Cond: (titlevector @@
to_tsquery('fotbal'::text))
SubPlan 2
  -  Nested Loop  (cost=1499.29..23192.08
rows=1558 width=4) (actual time=2.032..185.743 rows=3250 loops=1)
-  Nested Loop 
(cost=1499.29..15967.11 rows=1558 width=4) (actual time=1.980..109.491
rows=3250 loops=1)
  -  Bitmap Heap Scan on
pagecontent  (cost=1499.29..6448.12 rows=1558 width=4) (actual
time=1.901..36.583 rows=3250 loops=1)
Recheck Cond:
(textvector @@ to_tsquery('fotbal'::text))
-  Bitmap Index Scan
on gin_index2  (cost=0.00..1498.90 rows=1558 width=0) (actual
time=1.405..1.405 rows=3250 loops=1)
  Index Cond:
(textvector @@ to_tsquery('fotbal'::text))
  -  Index Scan using
page_btree_rev_content_id on revision r  (cost=0.00..6.10 rows=1 width=8)
(actual time=0.020..0.021 rows=1 loops=3250)
Index Cond: (r.rev_id =
pagecontent.old_id)
-  Index Scan using page_btree_id
on page p  (cost=0.00..4.62 rows=1 width=4) (actual time=0.022..0.022 rows=1
loops=3250)
  Index Cond: (p.page_id =
r.rev_page)
-  Index Scan using page_btree_rev_page_id on revision
re  (cost=0.00..11850.52 rows=311604 width=8) (actual time=0.012..166.042
rows=311175 loops=1)
  -  Hash  (cost=27932.04..27932.04 rows=311604 width=127)
(actual time=801.000..801.000 rows=311604 loops=1)
Buckets: 1024  Batches: 64  Memory Usage: 744kB
-  Seq Scan on pagecontent pc  (cost=0.00..27932.04
rows=311604 width=127) (actual time=0.018..465.686 rows=311604 loops=1)
Total runtime: 1952.962 ms


I have tried 
set enable_hashjoin = false;
query
set enable_hashjoin = true; 

but the result have been worst than before. By the way is there a posibility
to create beeter query with same effect?
I have tried more queries, but this has got best performance yet. 

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Query-improvement-tp4362578p4365717.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query improvement

2011-05-03 Thread Claudio Freire
On Mon, May 2, 2011 at 10:54 PM, Mark marek.bal...@seznam.cz wrote:
 but the result have been worst than before. By the way is there a posibility
 to create beeter query with same effect?
 I have tried more queries, but this has got best performance yet.

Well, this seems to be the worst part:

   (SELECT page_id FROM mediawiki.page WHERE page_id IN
   (SELECT page_id FROM mediawiki.page
WHERE (titlevector @@ (to_tsquery('fotbal'
   OR page_id IN
   (SELECT p.page_id from mediawiki.page p,mediawiki.revision r,
   (SELECT old_id FROM mediawiki.pagecontent
   WHERE (textvector @@ (to_tsquery('fotbal' ss
   WHERE (p.page_id=r.rev_page AND r.rev_id=ss.old_id)))

If you're running a new enough pg (8.4+), you could try using CTEs for that.

I haven't used CTEs much, but I think it goes something like:

WITH someids AS (

   (SELECT page_id FROM mediawiki.page WHERE page_id IN
   (SELECT page_id FROM mediawiki.page
WHERE (titlevector @@ (to_tsquery('fotbal'
   OR page_id IN
   (SELECT p.page_id from mediawiki.page p,mediawiki.revision r,
   (SELECT old_id FROM mediawiki.pagecontent
   WHERE (textvector @@ (to_tsquery('fotbal' ss
   WHERE (p.page_id=r.rev_page AND r.rev_id=ss.old_id)))

)
SELECT pa.page_id, pa.page_title,
ts_rank(pc.textvector,(to_tsquery('fotbal')))+ts_rank(pa.titlevector,(to_tsquery('fotbal')))*10
as totalrank
   from mediawiki.page pa, mediawiki.revision re, mediawiki.pagecontent pc
   WHERE pa.page_id in someids
   AND (pa.page_id=re.rev_page AND re.rev_id=pc.old_id)
   ORDER BY totalrank LIMIT 100;

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query improvement

2011-05-03 Thread Marc Mamin

 On Mon, May 2, 2011 at 10:54 PM, Mark marek.bal...@seznam.cz wrote:
  but the result have been worst than before. By the way is there a
posibility
  to create beeter query with same effect?
  I have tried more queries, but this has got best performance yet.
 
 Well, this seems to be the worst part:
 
(SELECT page_id FROM mediawiki.page WHERE page_id IN
(SELECT page_id FROM mediawiki.page
 WHERE (titlevector @@ (to_tsquery('fotbal'
OR page_id IN
(SELECT p.page_id from mediawiki.page
p,mediawiki.revision r,
(SELECT old_id FROM mediawiki.pagecontent
WHERE (textvector @@ (to_tsquery('fotbal' ss
WHERE (p.page_id=r.rev_page AND r.rev_id=ss.old_id)))
 


'OR' statements often generate complicated plans. You should try to
rewrite your Query with a n UNION clause.
Using explicit joins may also help the planner:

SELECT page_id 
FROM mediawiki.page
WHERE (titlevector @@ (to_tsquery('fotbal')))

UNION 

SELECT p.page_id 
FROM mediawiki.page p 
  JOIN mediawiki.revision r on (p.page_id=r.rev_page)
  JOIN mediawiki.pagecontent ss on (r.rev_id=ss.old_id)
WHERE (ss.textvector @@ (to_tsquery('fotbal')))

HTH,

Marc Mamin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance