Re: [PERFORM] Performance issue using Tsearch2

2008-02-05 Thread Viviane Lestic
2008/2/5, Ansgar -59cobalt- Wiechers wrote:
> Your planner estimates are way off. Try increasing the statistics target
> for the columns used in this query and re-analyze the tables after doing
> so.

I first set STATISTICS to 1000 for column zoneindex_test and saw no
significant improvement (with a vacuum full analyze in between). Then
I set default_statistics_target to 1000: there is now an improvement,
but the overall time is still way too long... (and the estimated costs
didn't change...)
Here are the results with default_statistics_target set to 1000:

explain analyze SELECT idstruct, headline(zonetext, q),
rank(zoneindex_test, q) FROM tab_ocr, tab_chemin, to_tsquery('partir')
AS q WHERE tab_chemin.chemin like '%;2;%' AND tab_ocr.idstruct =
tab_chemin.label AND zoneindex_test @@ q ORDER BY rank(zoneindex_test,
q) DESC;

  QUERY PLAN
-
 Sort  (cost=2345.30..2345.32 rows=8 width=327) (actual
time=229913.715..229918.172 rows=7106 loops=1)
   Sort Key: rank(tab_ocr.zoneindex_test, q.q)
   ->  Nested Loop  (cost=80.04..2345.18 rows=8 width=327) (actual
time=28159.626..229892.957 rows=7106 loops=1)
 ->  Nested Loop  (cost=80.04..1465.76 rows=392 width=327)
(actual time=26084.558..130979.395 rows=272414 loops=1)
   ->  Function Scan on q  (cost=0.00..0.01 rows=1
width=32) (actual time=0.006..0.007 rows=1 loops=1)
   ->  Bitmap Heap Scan on tab_ocr  (cost=80.04..1460.85
rows=392 width=295) (actual time=26084.544..130562.220 rows=272414
loops=1)
 Filter: (tab_ocr.zoneindex_test @@ q.q)
 ->  Bitmap Index Scan on zoneindex_test_idx
(cost=0.00..79.94 rows=392 width=0) (actual time=26073.315..26073.315
rows=283606 loops=1)
   Index Cond: (tab_ocr.zoneindex_test @@ q.q)
 ->  Index Scan using tab_chemin_label_index on tab_chemin
(cost=0.00..2.23 rows=1 width=4) (actual time=0.040..0.040 rows=0
loops=272414)
   Index Cond: (tab_ocr.idstruct = tab_chemin.label)
   Filter: ((chemin)::text ~~ '%;2;%'::text)
 Total runtime: 229922.864 ms


2008/2/5, Guillaume Smet wrote:
> Can you try to replace zoneindex_test @@ q with zoneindex_test @@
> to_tsquery('partir')?

The improvement seems negligible (with default_statistics_target back
to 10, its default value):
explain analyze SELECT idstruct, headline(zonetext, q),
rank(zoneindex_test, q) FROM tab_ocr, tab_chemin, to_tsquery('partir')
AS q WHERE tab_chemin.chemin like '%;2;%' AND tab_ocr.idstruct =
tab_chemin.label AND zoneindex_test @@ to_tsquery('partir') ORDER BY
rank(zoneindex_test, q) DESC;

  QUERY PLAN
-
 Sort  (cost=4358.91..4358.95 rows=16 width=308) (actual
time=266489.667..266494.132 rows=7106 loops=1)
   Sort Key: rank(tab_ocr.zoneindex_test, q.q)
   ->  Nested Loop  (cost=80.04..4358.59 rows=16 width=308) (actual
time=42245.881..266469.644 rows=7106 loops=1)
 ->  Function Scan on q  (cost=0.00..0.01 rows=1 width=32)
(actual time=0.007..0.008 rows=1 loops=1)
 ->  Nested Loop  (cost=80.04..4358.34 rows=16 width=276)
(actual time=42239.570..178496.761 rows=7106 loops=1)
   ->  Bitmap Heap Scan on tab_ocr  (cost=80.04..1461.83
rows=392 width=276) (actual time=38317.423..174188.779 rows=272414
loops=1)
 Filter: (zoneindex_test @@ '''partir'''::tsquery)
 ->  Bitmap Index Scan on zoneindex_test_idx
(cost=0.00..79.94 rows=392 width=0) (actual time=38289.289..38289.289
rows=283606 loops=1)
   Index Cond: (zoneindex_test @@ '''partir'''::tsquery)
   ->  Index Scan using tab_chemin_label_index on
tab_chemin  (cost=0.00..7.38 rows=1 width=4) (actual time=0.014..0.014
rows=0 loops=272414)
 Index Cond: (tab_ocr.idstruct = tab_chemin.label)
 Filter: ((chemin)::text ~~ '%;2;%'::text)
 Total runtime: 266498.704 ms

> Increasing the statistics for zoneindex_test may
> be a good idea too (see ALTER TABLE ... ALTER COLUMN doc).

I posted the results above.

> I'm surprised you have the word "partir" in so many documents? Do you
> use real data?

I'm using real data. The indexed documents are extracted from
newspapers, and "partir" (and its derivates) is quite a common verb in
the French language, so I'm not that surprised to see it show up in
many documents.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Performance issue using Tsearch2

2008-02-05 Thread Guillaume Smet
On Feb 5, 2008 12:47 PM, Viviane Lestic <[EMAIL PROTECTED]> wrote:
> Could someone help me analyze this problem?
> I don't manage to see if the problem comes from bad tsearch tuning,
> postgresql configuration, or something else...

Can you try to replace zoneindex_test @@ q with zoneindex_test @@
to_tsquery('partir')? Increasing the statistics for zoneindex_test may
be a good idea too (see ALTER TABLE ... ALTER COLUMN doc).
I'm surprised you have the word "partir" in so many documents? Do you
use real data?

--
Guillaume

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Performance issue using Tsearch2

2008-02-05 Thread Ansgar -59cobalt- Wiechers
On 2008-02-05 Viviane Lestic wrote:
> QUERY PLAN
> -
> Sort (cost=2345.54..2345.58 rows=16 width=308) (actual
> time=270638.774..270643.142 rows=7106 loops=1)
> Sort Key: rank(tab_ocr.zoneindex_test, q.q)
> -> Nested Loop (cost=80.04..2345.22 rows=16 width=308) (actual
> time=40886.553..270619.730 rows=7106 loops=1)
> -> Nested Loop (cost=80.04..1465.76 rows=392 width=308) (actual
> time=38209.193..173932.313 rows=272414 loops=1)
> -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual
> time=0.006..0.007 rows=1 loops=1)
> -> Bitmap Heap Scan on tab_ocr (cost=80.04..1460.85 rows=392
> width=276) (actual time=38209.180..173507.052 rows=272414 loops=1)
> Filter: (tab_ocr.zoneindex_test @@ q.q)
> -> Bitmap Index Scan on zoneindex_test_idx (cost=0.00..79.94 rows=392
> width=0) (actual time=38204.261..38204.261 rows=283606 loops=1)
> Index Cond: (tab_ocr.zoneindex_test @@ q.q)
> -> Index Scan using tab_chemin_label_index on tab_chemin
> (cost=0.00..2.23 rows=1 width=4) (actual time=0.036..0.036 rows=0
> loops=272414)
> Index Cond: (tab_ocr.idstruct = tab_chemin.label)
> Filter: ((chemin)::text ~~ '%;2;%'::text)
> Total runtime: 270647.946 ms
> **
> 
> Could someone help me analyze this problem?

Your planner estimates are way off. Try increasing the statistics target
for the columns used in this query and re-analyze the tables after doing
so.

Regards
Ansgar Wiechers
-- 
"The Mac OS X kernel should never panic because, when it does, it
seriously inconveniences the user."
--http://developer.apple.com/technotes/tn2004/tn2118.html

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match