Re: [GENERAL] full text indexing

2006-03-16 Thread Chris

Oleg Bartunov wrote:

On Wed, 15 Mar 2006, chris smith wrote:


Hi all,

Just wondering which full text module is better  what the differences
are between tsearch and fti ?


if you need online indexing and linguistic support (dictionaries, stop 
words, ranking) tsearch2 is fine. If your data are static and you need

only strict search, fti could works for you.
We hope to develop inverted index support for tsearch2 this year, though.



The table in question has roughly 80,000 rows.



how many unique words and how long are documents ?


I think I'll stick to tsearch, it seems to work pretty well.

I created the words list from one table (2 columns) for fti. 2.6G file 
and almost 200 million records, and took up almost 8.5G of space in the 
database.


I was going to see if I could do a speed comparison between the two but 
ran out of space before I could create the indexes on this table.


Hmm. Might try it on a different table and see what happens, I'm 
interested to see the differences (if any) between the results :)


--
Postgresql  php tutorials
http://www.designmagick.com/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] full text indexing

2006-03-16 Thread Oleg Bartunov

Chris,

try REL8_1_STABLE branch, because you save a lot of time of indexing

Oleg
On Fri, 17 Mar 2006, Chris wrote:


Oleg Bartunov wrote:

On Wed, 15 Mar 2006, chris smith wrote:


Hi all,

Just wondering which full text module is better  what the differences
are between tsearch and fti ?


if you need online indexing and linguistic support (dictionaries, stop 
words, ranking) tsearch2 is fine. If your data are static and you need

only strict search, fti could works for you.
We hope to develop inverted index support for tsearch2 this year, though.



The table in question has roughly 80,000 rows.



how many unique words and how long are documents ?


I think I'll stick to tsearch, it seems to work pretty well.

I created the words list from one table (2 columns) for fti. 2.6G file and 
almost 200 million records, and took up almost 8.5G of space in the database.


I was going to see if I could do a speed comparison between the two but ran 
out of space before I could create the indexes on this table.


Hmm. Might try it on a different table and see what happens, I'm interested 
to see the differences (if any) between the results :)





Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] full text indexing

2006-03-15 Thread Richard Huxton

chris smith wrote:

Hi all,

Just wondering which full text module is better  what the differences
are between tsearch and fti ?

The table in question has roughly 80,000 rows.


I've been very happy with tsearch2. Note that if you're running an old 
version of PostgreSQL (7.4?) there are some manual steps you'll need to 
take to dump + restore.


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [GENERAL] full text indexing

2006-03-15 Thread Oleg Bartunov

On Wed, 15 Mar 2006, chris smith wrote:


Hi all,

Just wondering which full text module is better  what the differences
are between tsearch and fti ?


if you need online indexing and linguistic support (dictionaries, stop 
words, ranking) tsearch2 is fine. If your data are static and you need

only strict search, fti could works for you.
We hope to develop inverted index support for tsearch2 this year, though.



The table in question has roughly 80,000 rows.


how many unique words and how long are documents ?




Thanks!
--
Postgresql  php tutorials
http://www.designmagick.com/

---(end of broadcast)---
TIP 6: explain analyze is your friend



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [GENERAL] full text indexing

2006-03-15 Thread Ian Harding
On 3/15/06, chris smith [EMAIL PROTECTED] wrote:
 Hi all,

 Just wondering which full text module is better  what the differences
 are between tsearch and fti ?

Having only used tsearch/tsearch2 all I can say that it works as
advertised and I am extremely happy with it.

- Ian

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Full Text Indexing and Syntax

2006-02-28 Thread Michael Glaesemann


On Mar 1, 2006, at 6:54 , flood wrote:


Unfortunately I can not seem to get my query to use PG's full text
indexing, it keeps doing a seq scan:




EXPLAIN
SELECT t1.id, t2.id
FROM test1 t1, test2 t2
WHERE lower( t1.keyword ) ~ ( lower ( '^' || t2.article ));


It doesn't appear that you're using tsearch2. PostgreSQL does not  
include full text search in the basic installation. Have you  
installed tsearch2?


Michael Glaesemann
grzm myrealbox com




---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Full Text Indexing and Syntax

2006-02-28 Thread Joshua D. Drake




EXPLAIN
SELECT t1.id, t2.id
FROM test1 t1, test2 t2
WHERE lower( t1.keyword ) ~ ( lower ( '^' || t2.article ));


It doesn't appear that you're using tsearch2. PostgreSQL does not 
include full text search in the basic installation. Have you installed 
tsearch2?

Which is included in the PostgreSQL contrib

Joshua D. Drake



Michael Glaesemann
grzm myrealbox com




---(end of broadcast)---
TIP 6: explain analyze is your friend




--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: PLphp, PLperl - http://www.commandprompt.com/


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] full text indexing

2000-10-03 Thread Poul L. Christiansen

Mitch Vincent wrote:
 
  Hi
 
  In my PostgreSQL database I have a lot of newspaper articles (size:
  100mb now, growing beyond 1gb within few months).
  I wan't to use full text indexing so that users can search the articles
  with a keyword and have the results in less than one second.
 
  How do I accomplish that?
 
 It's complicated :-)
 
  Does PostgreSQL have this feature?
 
 Nope.
 
  Which 3rd party indexing tools are available that easily interoperate
  with PostgreSQL?
 
 There is some code in contrib called FTI (Full Text Index - no doubt).. I
 re-wrote it for my uses but ended up not using because I was doing so many
 sorts and joins, it made a scan (even an index scan) of a table with 3
 million rows in it very slow. However id you were just searching the keyword
 table itself it was very, very fast.
 
 The FTI trigger in the contrib breaks the words down to 2 letter bits (for
 substring searching) -- mine doesn't, it only indexes whole words without
 duplicates and looks at a list of words not to index (words like a, an, the,
 anything else you want -- I think it has the 300 most used English words in
 there already)
 
 It's drawback is speed, it does take a few seconds to index on INSERT (and
 UPDATE) -- that's if your text fields are 30ish k (very close to the max PG
 can store)..
 
 I would say you're going to run into the 32k limit pretty quick with
 newspaper articles -- I index resumes and I've run into it many times (32k
 of text really isn't all that much)..

I split the articles up. When pg 7.1 is out, I will merge them together
again :)

 
 Anyway, I'll try and get that trigger together that I did and send it to the
 PG guys to see if it's worthy of being added to contrib -- I'll send you a
 copy in private if you'd like. Note: I think there have been more people to
 re-write that trigger, I haven't seen anything else though..
 

Where can i find the FTI code? 

I looked in ftp://ftp.postgresql.org/pub/contrib/ , but no luck :(

Thanks for the reply,
Poul L. Christiansen



Re: [GENERAL] full-text indexing

2000-04-18 Thread Brett W. McCoy

On Tue, 18 Apr 2000, Bruce Momjian wrote:

 I have one word for you:  CLUSTER.  Without it, index lookups are too
 slow.  With it, they are rapid.  I have done some work like this
 commerically with Ingres, which has an ISAM type that keeps the matching
 rows pretty close on a newly-created ISAM index.  In PostgreSQL, and
 regular CLUSTER will keep you good.

I agree!  The last bit of advice given in the full text README.  As I
said, I'd built full-text stuff for experimentation (I had maybe 30k of
raw text, which amounted to several 100,000 indexed entries), and I had
clustered it, and it was pretty darn fast, even on a Pentium 233 with only
48 megs of RAM.  I have significantly better hardware to run it on now.
The original project called MySQL, but it just didn't have what we needed
to put something like this together.

 If you find it slow, let me know.  I have done some benchmarking with
 the author and he found it pretty fast, usually a few seconds.  See the
 section in my book on CLUSTER for information on _why_ it helps.

Thanks, Bruce.

Brett W. McCoy
  http://www.chapelperilous.net
---
Twenty two thousand days.
Twenty two thousand days.
It's not a lot.
It's all you've got.
Twenty two thousand days.
-- Moody Blues, "Twenty Two Thousand Days"




Re: [GENERAL] full-text indexing

2000-04-18 Thread Brett W. McCoy

On Tue, 18 Apr 2000, Bruce Momjian wrote:

  I agree!  The last bit of advice given in the full text README.  As I
  said, I'd built full-text stuff for experimentation (I had maybe 30k of
  raw text, which amounted to several 100,000 indexed entries), and I had
  clustered it, and it was pretty darn fast, even on a Pentium 233 with only
  48 megs of RAM.  I have significantly better hardware to run it on now.
  The original project called MySQL, but it just didn't have what we needed
  to put something like this together.
 
 With the original author, testing was fast, but when he loaded all the
 data, it got very slow.  The problem was that as soon as his data
 exceeded the buffer cache, performance became terrible.

How much data are we talking here?  How can one get around this buffer
cache problem?

Brett W. McCoy
  http://www.chapelperilous.net
---
Twenty two thousand days.
Twenty two thousand days.
It's not a lot.
It's all you've got.
Twenty two thousand days.
-- Moody Blues, "Twenty Two Thousand Days"




Re: [GENERAL] full-text indexing

2000-04-18 Thread Bruce Momjian

 On Tue, 18 Apr 2000, Bruce Momjian wrote:
 
  I have one word for you:  CLUSTER.  Without it, index lookups are too
  slow.  With it, they are rapid.  I have done some work like this
  commerically with Ingres, which has an ISAM type that keeps the matching
  rows pretty close on a newly-created ISAM index.  In PostgreSQL, and
  regular CLUSTER will keep you good.
 
 I agree!  The last bit of advice given in the full text README.  As I
 said, I'd built full-text stuff for experimentation (I had maybe 30k of
 raw text, which amounted to several 100,000 indexed entries), and I had
 clustered it, and it was pretty darn fast, even on a Pentium 233 with only
 48 megs of RAM.  I have significantly better hardware to run it on now.
 The original project called MySQL, but it just didn't have what we needed
 to put something like this together.

With the original author, testing was fast, but when he loaded all the
data, it got very slow.  The problem was that as soon as his data
exceeded the buffer cache, performance became terrible.

-- 
  Bruce Momjian|  http://www.op.net/~candle
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026