Re: [GENERAL] full text indexing
Oleg Bartunov wrote: On Wed, 15 Mar 2006, chris smith wrote: Hi all, Just wondering which full text module is better what the differences are between tsearch and fti ? if you need online indexing and linguistic support (dictionaries, stop words, ranking) tsearch2 is fine. If your data are static and you need only strict search, fti could works for you. We hope to develop inverted index support for tsearch2 this year, though. The table in question has roughly 80,000 rows. how many unique words and how long are documents ? I think I'll stick to tsearch, it seems to work pretty well. I created the words list from one table (2 columns) for fti. 2.6G file and almost 200 million records, and took up almost 8.5G of space in the database. I was going to see if I could do a speed comparison between the two but ran out of space before I could create the indexes on this table. Hmm. Might try it on a different table and see what happens, I'm interested to see the differences (if any) between the results :) -- Postgresql php tutorials http://www.designmagick.com/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] full text indexing
Chris, try REL8_1_STABLE branch, because you save a lot of time of indexing Oleg On Fri, 17 Mar 2006, Chris wrote: Oleg Bartunov wrote: On Wed, 15 Mar 2006, chris smith wrote: Hi all, Just wondering which full text module is better what the differences are between tsearch and fti ? if you need online indexing and linguistic support (dictionaries, stop words, ranking) tsearch2 is fine. If your data are static and you need only strict search, fti could works for you. We hope to develop inverted index support for tsearch2 this year, though. The table in question has roughly 80,000 rows. how many unique words and how long are documents ? I think I'll stick to tsearch, it seems to work pretty well. I created the words list from one table (2 columns) for fti. 2.6G file and almost 200 million records, and took up almost 8.5G of space in the database. I was going to see if I could do a speed comparison between the two but ran out of space before I could create the indexes on this table. Hmm. Might try it on a different table and see what happens, I'm interested to see the differences (if any) between the results :) Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] full text indexing
chris smith wrote: Hi all, Just wondering which full text module is better what the differences are between tsearch and fti ? The table in question has roughly 80,000 rows. I've been very happy with tsearch2. Note that if you're running an old version of PostgreSQL (7.4?) there are some manual steps you'll need to take to dump + restore. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] full text indexing
On Wed, 15 Mar 2006, chris smith wrote: Hi all, Just wondering which full text module is better what the differences are between tsearch and fti ? if you need online indexing and linguistic support (dictionaries, stop words, ranking) tsearch2 is fine. If your data are static and you need only strict search, fti could works for you. We hope to develop inverted index support for tsearch2 this year, though. The table in question has roughly 80,000 rows. how many unique words and how long are documents ? Thanks! -- Postgresql php tutorials http://www.designmagick.com/ ---(end of broadcast)--- TIP 6: explain analyze is your friend Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] full text indexing
On 3/15/06, chris smith [EMAIL PROTECTED] wrote: Hi all, Just wondering which full text module is better what the differences are between tsearch and fti ? Having only used tsearch/tsearch2 all I can say that it works as advertised and I am extremely happy with it. - Ian ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Full Text Indexing and Syntax
On Mar 1, 2006, at 6:54 , flood wrote: Unfortunately I can not seem to get my query to use PG's full text indexing, it keeps doing a seq scan: EXPLAIN SELECT t1.id, t2.id FROM test1 t1, test2 t2 WHERE lower( t1.keyword ) ~ ( lower ( '^' || t2.article )); It doesn't appear that you're using tsearch2. PostgreSQL does not include full text search in the basic installation. Have you installed tsearch2? Michael Glaesemann grzm myrealbox com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Full Text Indexing and Syntax
EXPLAIN SELECT t1.id, t2.id FROM test1 t1, test2 t2 WHERE lower( t1.keyword ) ~ ( lower ( '^' || t2.article )); It doesn't appear that you're using tsearch2. PostgreSQL does not include full text search in the basic installation. Have you installed tsearch2? Which is included in the PostgreSQL contrib Joshua D. Drake Michael Glaesemann grzm myrealbox com ---(end of broadcast)--- TIP 6: explain analyze is your friend -- The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564 PostgreSQL Replication, Consulting, Custom Development, 24x7 support Managed Services, Shared and Dedicated Hosting Co-Authors: PLphp, PLperl - http://www.commandprompt.com/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] full text indexing
Mitch Vincent wrote: Hi In my PostgreSQL database I have a lot of newspaper articles (size: 100mb now, growing beyond 1gb within few months). I wan't to use full text indexing so that users can search the articles with a keyword and have the results in less than one second. How do I accomplish that? It's complicated :-) Does PostgreSQL have this feature? Nope. Which 3rd party indexing tools are available that easily interoperate with PostgreSQL? There is some code in contrib called FTI (Full Text Index - no doubt).. I re-wrote it for my uses but ended up not using because I was doing so many sorts and joins, it made a scan (even an index scan) of a table with 3 million rows in it very slow. However id you were just searching the keyword table itself it was very, very fast. The FTI trigger in the contrib breaks the words down to 2 letter bits (for substring searching) -- mine doesn't, it only indexes whole words without duplicates and looks at a list of words not to index (words like a, an, the, anything else you want -- I think it has the 300 most used English words in there already) It's drawback is speed, it does take a few seconds to index on INSERT (and UPDATE) -- that's if your text fields are 30ish k (very close to the max PG can store).. I would say you're going to run into the 32k limit pretty quick with newspaper articles -- I index resumes and I've run into it many times (32k of text really isn't all that much).. I split the articles up. When pg 7.1 is out, I will merge them together again :) Anyway, I'll try and get that trigger together that I did and send it to the PG guys to see if it's worthy of being added to contrib -- I'll send you a copy in private if you'd like. Note: I think there have been more people to re-write that trigger, I haven't seen anything else though.. Where can i find the FTI code? I looked in ftp://ftp.postgresql.org/pub/contrib/ , but no luck :( Thanks for the reply, Poul L. Christiansen
Re: [GENERAL] full-text indexing
On Tue, 18 Apr 2000, Bruce Momjian wrote: I have one word for you: CLUSTER. Without it, index lookups are too slow. With it, they are rapid. I have done some work like this commerically with Ingres, which has an ISAM type that keeps the matching rows pretty close on a newly-created ISAM index. In PostgreSQL, and regular CLUSTER will keep you good. I agree! The last bit of advice given in the full text README. As I said, I'd built full-text stuff for experimentation (I had maybe 30k of raw text, which amounted to several 100,000 indexed entries), and I had clustered it, and it was pretty darn fast, even on a Pentium 233 with only 48 megs of RAM. I have significantly better hardware to run it on now. The original project called MySQL, but it just didn't have what we needed to put something like this together. If you find it slow, let me know. I have done some benchmarking with the author and he found it pretty fast, usually a few seconds. See the section in my book on CLUSTER for information on _why_ it helps. Thanks, Bruce. Brett W. McCoy http://www.chapelperilous.net --- Twenty two thousand days. Twenty two thousand days. It's not a lot. It's all you've got. Twenty two thousand days. -- Moody Blues, "Twenty Two Thousand Days"
Re: [GENERAL] full-text indexing
On Tue, 18 Apr 2000, Bruce Momjian wrote: I agree! The last bit of advice given in the full text README. As I said, I'd built full-text stuff for experimentation (I had maybe 30k of raw text, which amounted to several 100,000 indexed entries), and I had clustered it, and it was pretty darn fast, even on a Pentium 233 with only 48 megs of RAM. I have significantly better hardware to run it on now. The original project called MySQL, but it just didn't have what we needed to put something like this together. With the original author, testing was fast, but when he loaded all the data, it got very slow. The problem was that as soon as his data exceeded the buffer cache, performance became terrible. How much data are we talking here? How can one get around this buffer cache problem? Brett W. McCoy http://www.chapelperilous.net --- Twenty two thousand days. Twenty two thousand days. It's not a lot. It's all you've got. Twenty two thousand days. -- Moody Blues, "Twenty Two Thousand Days"
Re: [GENERAL] full-text indexing
On Tue, 18 Apr 2000, Bruce Momjian wrote: I have one word for you: CLUSTER. Without it, index lookups are too slow. With it, they are rapid. I have done some work like this commerically with Ingres, which has an ISAM type that keeps the matching rows pretty close on a newly-created ISAM index. In PostgreSQL, and regular CLUSTER will keep you good. I agree! The last bit of advice given in the full text README. As I said, I'd built full-text stuff for experimentation (I had maybe 30k of raw text, which amounted to several 100,000 indexed entries), and I had clustered it, and it was pretty darn fast, even on a Pentium 233 with only 48 megs of RAM. I have significantly better hardware to run it on now. The original project called MySQL, but it just didn't have what we needed to put something like this together. With the original author, testing was fast, but when he loaded all the data, it got very slow. The problem was that as soon as his data exceeded the buffer cache, performance became terrible. -- Bruce Momjian| http://www.op.net/~candle [EMAIL PROTECTED] | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup.| Drexel Hill, Pennsylvania 19026