Re: [PERFORM] For full text indexing, which is better, tsearch2 or

2003-11-30 Thread Christopher Kings-Lynne
I have 7.2.4 running on a Sun box, so I tried that too, with similar
results. tsearch just doesn't seem to work very well on this dataset
(or any other large dataset I've tried).
Well, as I've shown - works fine for me...

I strongly suggest you repost your problem report to -hackers, since the 
fact that the tsearch developers haven't chimed in implies to me that 
they don't watch the performance list.

BTW, read this about Gist indexes:

http://www.postgresql.org/docs/current/static/limitations.html

(Note lack of concurrency)

Chris

---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] For full text indexing, which is better, tsearch2 or

2003-11-28 Thread Steve Atkins
On Fri, Nov 28, 2003 at 01:18:48PM +0800, Christopher Kings-Lynne wrote:
> 
> >Any thoughts?
> 
> Actually, I ran my tests using tsearch V1.  I wonder if there has been 
> some weird regression between tsearch 1 and 2?

Maybe. tsearch2 doesn't seem production ready in other respects
(untsearch2.sql barfs with 'aggregate stat(tsvector) does not exist'
and the openfts mailing list, where this would be more appropriate,
doesn't appear to exist according to sourceforge).

So, using the same data, modulo a few alter tables, I try tsearch, V1.
It's a little slower than V2, and again runs far faster without an
index than with it. Broken in the same way.

I have 7.2.4 running on a Sun box, so I tried that too, with similar
results. tsearch just doesn't seem to work very well on this dataset
(or any other large dataset I've tried).

Cheers,
  Steve

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] For full text indexing, which is better, tsearch2 or

2003-11-27 Thread Christopher Kings-Lynne
Any thoughts?


Actually, I ran my tests using tsearch V1.  I wonder if there has been 
some weird regression between tsearch 1 and 2?
I also ran my tests on 7.3.4 :(

Chris



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] For full text indexing, which is better, tsearch2 or

2003-11-27 Thread Steve Atkins
On Wed, Nov 26, 2003 at 09:12:30PM -0800, Steve Atkins wrote:
> On Thu, Nov 27, 2003 at 12:41:59PM +0800, Christopher Kings-Lynne wrote:
> > >Does anyone have any metrics on how fast tsearch2 actually is?
> > >
> > >I tried it on a synthetic dataset of a million documents of a hundred
> > >words each and while insertions were impressively fast I gave up on
> > >the search after 10 minutes.
> > >
> > >Broken? Unusable slow? This was on the last 7.4 release candidate.
> > 
> > I just created a 1.1million row dataset by copying one of our 3 row 
> > production tables and just taking out the txtidx column.  Then I 
> > inserted it into itself until it had 1.1 million rows.
> > 
> > Then I created the GiST index - THAT took forever - seriously like 20 
> > mins or half an hour or something.
> > 
> > Now, to find a word:
> > 
> > select * from tsearchtest where ftiidx ## 'curry';
> > Time: 9760.75 ms
> 
> > So, I have no idea why you think it's slow?  Perhaps you forgot the 
> > 'create index using gist' step?
> 
> No, it was indexed.
> 
> Thanks, that was the datapoint I was looking for. It _can_ run fast, so
> I just need to work out what's going on. (It's hard to diagnose a slow
> query when you've no idea whether it's really 'slow').

Looking at it further, something is very broken, possibly with GIST
indices, possibly with tsearch2s use of 'em.

This is on a newly built 7.4 installation, built with 64 bit
datetimes, but completely stock other than that. Stock gcc 3.3.2,
Linux, somewhat elderly 2.4.18 kernel. Running on a 1.5GHz single
processor Athlon with a half gig of RAM. Configuration set to use 20%
of RAM as shared buffers (amongst other settings, this was the last of
a range I tried looking for variation).

Software RAID0 across two 7200RPM SCSI drives, reiserfs (it's a
development box, not a production system). System completely idle
apart from postgresql.

269000 rows, each row having 400 words. Analyzed.

Running the select query given below appears to pause a process trying
to insert into the table completely (locking issue? I/O bandwidth?).

top shows the select below consuming <2% of CPU and iostat shows it reading
~2800 blocks/second from each of the two RAID drives.

Physical size of the database is under 3 gigs, including toast and index
tables.

The select query takes around 6 minutes (consistently, even if the same
identical query is repeated).

For entertainment, I turned off indexscan and the query takes 1
minute with a simple seqscan.

Any thoughts?

Cheers,
  Steve

=> select count(*) from ftstest;
 count  

 269000
(1 row)

=> \d ftstest
Table "public.ftstest"
 Column |   Type   |Modifiers 
+--+--
 idx| integer  | not null default nextval('public.ftstest_idx_seq'::text)
 words  | text | not null
 idxfti | tsvector | not null
Indexes:
"ftstest_idx" gist (idxfti)

=> explain analyze select idx from ftstest where idxfti @@ 'dominican'::tsquery;
QUERY PLAN 

---
 Index Scan using ftstest_idx on ftstest  (cost=0.00..515.90 rows=271 width=4) (actual 
time=219.694..376042.428 rows=4796 loops=1)
   Index Cond: (idxfti @@ '\'dominican\''::tsquery)
   Filter: (idxfti @@ '\'dominican\''::tsquery)
 Total runtime: 376061.541 ms
(4 rows)


((Set enable_indexscan=false))


=> explain analyze select idx from ftstest where idxfti @@ 'dominican'::tsquery;
  QUERY PLAN   
   
--
 Seq Scan on ftstest  (cost=0.00..5765.88 rows=271 width=4) (actual 
time=42.589..62158.285 rows=4796 loops=1)
   Filter: (idxfti @@ '\'dominican\''::tsquery)
 Total runtime: 62182.277 ms
(3 rows)

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] For full text indexing, which is better, tsearch2 or

2003-11-27 Thread Christopher Kings-Lynne

Any thoughts?
Actually, I ran my tests using tsearch V1.  I wonder if there has been 
some weird regression between tsearch 1 and 2?

hris



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] For full text indexing, which is better, tsearch2 or

2003-11-26 Thread Steve Atkins
On Thu, Nov 27, 2003 at 12:41:59PM +0800, Christopher Kings-Lynne wrote:
> >Does anyone have any metrics on how fast tsearch2 actually is?
> >
> >I tried it on a synthetic dataset of a million documents of a hundred
> >words each and while insertions were impressively fast I gave up on
> >the search after 10 minutes.
> >
> >Broken? Unusable slow? This was on the last 7.4 release candidate.
> 
> I just created a 1.1million row dataset by copying one of our 3 row 
> production tables and just taking out the txtidx column.  Then I 
> inserted it into itself until it had 1.1 million rows.
> 
> Then I created the GiST index - THAT took forever - seriously like 20 
> mins or half an hour or something.
> 
> Now, to find a word:
> 
> select * from tsearchtest where ftiidx ## 'curry';
> Time: 9760.75 ms

> So, I have no idea why you think it's slow?  Perhaps you forgot the 
> 'create index using gist' step?

No, it was indexed.

Thanks, that was the datapoint I was looking for. It _can_ run fast, so
I just need to work out what's going on. (It's hard to diagnose a slow
query when you've no idea whether it's really 'slow').

Cheers,
  Steve

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] For full text indexing, which is better, tsearch2 or

2003-11-26 Thread Christopher Kings-Lynne
Does anyone have any metrics on how fast tsearch2 actually is?

I tried it on a synthetic dataset of a million documents of a hundred
words each and while insertions were impressively fast I gave up on
the search after 10 minutes.
Broken? Unusable slow? This was on the last 7.4 release candidate.
I just created a 1.1million row dataset by copying one of our 3 row 
production tables and just taking out the txtidx column.  Then I 
inserted it into itself until it had 1.1 million rows.

Then I created the GiST index - THAT took forever - seriously like 20 
mins or half an hour or something.

Now, to find a word:

select * from tsearchtest where ftiidx ## 'curry';
Time: 9760.75 ms
The AND of two words:
Time: 103.61 ms
The AND of three words:
select * from tsearchtest where ftiidx ## 'curry&green&thai';
Time: 61.86 ms
And now a one word query now that buffers are cached:
select * from tsearchtest where ftiidx ## 'curry';
Time: 444.89 ms
So, I have no idea why you think it's slow?  Perhaps you forgot the 
'create index using gist' step?

Also, if you use the NOT (!) operand, you can get yourself into a really 
slow situation.

Chris



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] For full text indexing, which is better, tsearch2 or

2003-11-26 Thread Steve Atkins
On Thu, Nov 27, 2003 at 08:51:14AM +0800, Christopher Kings-Lynne wrote:
> >Which one is better (performance/easier to use),
> >tsearch2 or fulltextindex? 
> >there is an example how to use fulltextindex in the
> >techdocs, but I checked the contrib/fulltextindex
> >package, there is a WARNING that fulltextindex is
> >much slower than tsearch2. but tsearch2 seems
> >complex to use, and I can not find a good example.
> >Which one I should use? Any suggestions?  
> 
> I believe I wrote that warning :)
> 
> Tsearch2 is what you should use.  Yes, it's more complicated but it's 
> HEAPS faster and seriously powerful.

Does anyone have any metrics on how fast tsearch2 actually is?

I tried it on a synthetic dataset of a million documents of a hundred
words each and while insertions were impressively fast I gave up on
the search after 10 minutes.

Broken? Unusable slow? This was on the last 7.4 release candidate.

Cheers,
  Steve

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] For full text indexing, which is better, tsearch2 or

2003-11-26 Thread Dror Matalon

On Thu, Nov 27, 2003 at 08:51:14AM +0800, Christopher Kings-Lynne wrote:
> >Which one is better (performance/easier to use),
> >tsearch2 or fulltextindex? 
> >there is an example how to use fulltextindex in the
> >techdocs, but I checked the contrib/fulltextindex
> >package, there is a WARNING that fulltextindex is
> >much slower than tsearch2. but tsearch2 seems
> >complex to use, and I can not find a good example.
> >Which one I should use? Any suggestions?  
> 
> I believe I wrote that warning :)
> 
> Tsearch2 is what you should use.  Yes, it's more complicated but it's 
> HEAPS faster and seriously powerful.
> 

Can you provide some numbers please, both for creating full text indexes
as well as for searching them? I tried to use tsearch and it seemed like
just creating a full text index on million+ records took forever.

> Just read the README file.
> 
> You could also try out the original tsearch (V1), but that will probably 
> be superceded soon, now that tsearch2 is around.
> 
> Chris
> 
> 
> 
> ---(end of broadcast)---
> TIP 5: Have you checked our extensive FAQ?
> 
>   http://www.postgresql.org/docs/faqs/FAQ.html

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] For full text indexing, which is better, tsearch2 or

2003-11-26 Thread Christopher Kings-Lynne
Which one is better (performance/easier to use),
tsearch2 or fulltextindex? 
there is an example how to use fulltextindex in the
techdocs, but I checked the contrib/fulltextindex
package, there is a WARNING that fulltextindex is
much slower than tsearch2. but tsearch2 seems
complex to use, and I can not find a good example.
Which one I should use? Any suggestions?  
I believe I wrote that warning :)

Tsearch2 is what you should use.  Yes, it's more complicated but it's 
HEAPS faster and seriously powerful.

Just read the README file.

You could also try out the original tsearch (V1), but that will probably 
be superceded soon, now that tsearch2 is around.

Chris



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


[PERFORM] For full text indexing, which is better, tsearch2 or fulltextindex

2003-11-26 Thread LIANHE SHAO
Hi all, 
Which one is better (performance/easier to use),
tsearch2 or fulltextindex? 
there is an example how to use fulltextindex in the
techdocs, but I checked the contrib/fulltextindex
package, there is a WARNING that fulltextindex is
much slower than tsearch2. but tsearch2 seems
complex to use, and I can not find a good example.
Which one I should use? Any suggestions?  

thanks and Regards,
William

- Original Message -
From: Hannu Krosing <[EMAIL PROTECTED]>
Date: Wednesday, November 26, 2003 5:33 pm
Subject: Re: [PERFORM] why index scan not working
when using 'like'?

> Tom Lane kirjutas T, 25.11.2003 kell 23:29:
> > Josh Berkus <[EMAIL PROTECTED]> writes:
> > > In regular text fields containing words, your
problem is 
> solvable with full 
> > > text indexing (FTI).   Unfortunately, FTI is
not designed for 
> arbitrary 
> > > non-language strings.  It could be adapted,
but would require a 
> lot of 
> > > hacking.
> > 
> > I'm not sure why you say that FTI isn't a usable
solution.  As 
> long as
> > the gene symbols are separated by whitespace or
some other non-
> letters> (eg, "foo mif bar" not "foomifbar"), I'd
think FTI would 
> work.
> If he wants to search on arbitrary substring, he
could change 
> tokeniserin FTI to produce trigrams, so that
"foomifbar" would be 
> indexed as if
> it were text "foo oom omi mif ifb fba bar" and
search for things like
> %mifb% should first do a FTI search for "mif" AND
"ifb" and then 
> simpleLIKE %mifb% to weed out something like "mififb".
> 
> There are ways to use trigrams for 1 and 2 letter
matches as well.
> 
> -
> Hannu
> 
> 
> ---(end of
broadcast)---
> 
> TIP 3: if posting/reading through Usenet, please
send an appropriate
>  subscribe-nomail command to
[EMAIL PROTECTED] so that 
> your  message can get through to the mailing
list cleanly
> 


---(end of broadcast)---
TIP 8: explain analyze is your friend