I have tried contrib/tsearch, but I do fairly complex score computation based on word placement (headers,body,graphics,metatags, etc). And also do additional computations on the scores when the user does a search. It is very quick to use something like tsearch to figure out which pages match my criterion, but very slow to then compute a valid score for each page. Instead, in an attempt to speed things up, a score is computed for each distinct word on a page, and that score is stored in this table. Then when a search is requested, I use previously existing search information and data from this table to compute the new results.
The upshot of all this computing is that I get fairly relevant results, but I have to do a lot of math to get there. I have been considering combining the two methods, doing something like a full text search to find pages that meet my criterion, and then using this table to actually compute a pages score. - brian On Sat, 11 May 2002, Rajesh Kumar Mallah. wrote: > > Hi Brian , > > are you performing full text search in any case? > > Apart from optimizing the TABLE/INDEXES (thru VACUUM measures) > > i feel using text indexes provided by contrib/tsearch can also > lead to significant improvement in search performance. > > regds > mallah. > > On Saturday 11 May 2002 05:48 am, Brian McCane wrote: > > Okay, I guess I misunderstood something about "VACUUM FULL". My > > understanding was that a VACUUM (without FULL), marked unused records for > > reuse. VACUUM FULL moved records from the end of a file to the holes > > where unused records were stored and truncated the file to free disk > > space. So far so good, but.... > > > > I have had continued loss of performance on one of my largest tables > > (600,000,000 records). There are 4 fields in a record like such: > > > > CREATE TABLE foo ( > > a int not null references bar(a) on delete cascade on update no action, > > b int not null references baz(b) on delete cascade on update no action, > > c int, > > d smallint, > > primary key(a, b)) ; > > CREATE INDEX foo_ac ON foo (a,c) ; > > CREATE INDEX foo_ad on foo (a,d) ; > > And there are 3 triggers which fire before insert/delete/update. > > > > I INSERT/UPDATE/DELETE approximately 300,000 records per day, but this > > number is increasing on a daily basis as I make changes which improve the > > performance of my data gathering tools (spiders ;). Two days ago, it had > > reached the point where a search for a 3-word term (ie. free news servers) > > took about 60 seconds. I have just spent 48 hours running a VACUUM FULL > > on my table, and now the same search takes < 10 seconds. I assume that > > the increase in performance is due to the decrease in table/index size > > which added up to approximate 1GB of freed space on the machine, which was > > approximately 4% of the original size of the table and all its indices. > > But, a 4% decrease in size should not add up to a 84% increase in > > performance (is that right? I always get the ratio confused :). > > > > If all that VACUUM FULL did was move records from file 12345678.6 to file > > 12345678, the database would still being doing a large number of random > > accesses on the table. However, if VACUUM FULL clusters the data > > according to the primary key, it would still be doing a large number of > > random access on the table, because the primary key has almost nothing to > > do with how I actually access the data in real life. So, is VACUUM FULL > > looking somewhere in pg_statistics (or pg_stat_user_indexes), to determine > > which index I actually use most (foo_ad), and then clustering the data > > that way, or is there some other agent at work here. > > > > - brian > > > > > > Wm. Brian McCane | Life is full of doors that won't open > > Search http://recall.maxbaud.net/ | when you knock, equally spaced amid > > those Usenet http://freenews.maxbaud.net/ | that open when you don't want > > them to. Auction http://www.sellit-here.com/ | - Roger Zelazny "Blood of > > Amber" > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 3: if posting/reading through Usenet, please send an appropriate > > subscribe-nomail command to [EMAIL PROTECTED] so that your > > message can get through to the mailing list cleanly > > -- > Rajesh Kumar Mallah, > Project Manager (Development) > Infocom Network Limited, New Delhi > phone: +91(11)6152172 (221) (L) ,9811255597 (M) > > Visit http://www.trade-india.com , > India's Leading B2B eMarketplace. > > Wm. Brian McCane | Life is full of doors that won't open Search http://recall.maxbaud.net/ | when you knock, equally spaced amid those Usenet http://freenews.maxbaud.net/ | that open when you don't want them to. Auction http://www.sellit-here.com/ | - Roger Zelazny "Blood of Amber" ---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives? http://archives.postgresql.org