I am investigating whether it is worthwhile to query a database containing a rather large text corpus directly (on the order of 100k to 1M newspaper articles, i.e. around 100 million words), or whether I should use a third-party text-indexing service. I want to answer questions such as: how often is a certain word (or pattern) mentioned in an article, and how often is it mentioned with the condition that another word is nearby (in the same article, or within n words)?
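To make the two kinds of question concrete, here is a minimal brute-force sketch in Python. It assumes a "word" is a lowercased run of `\w` characters, and it treats "nearby" as "within n token positions, on either side". The helper names (`tokenize`, `word_count`, `near_count`) and the sample articles are hypothetical. This scans every article linearly, which is exactly the cost a full-text index is meant to avoid at 100 million words, so it only illustrates the semantics of the queries. It is not a proposal for how to run them at scale.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: a word is a run of \w characters, compared case-insensitively.
    return re.findall(r"\w+", text.lower())

def word_count(tokens, word):
    # How often `word` occurs in one tokenized article.
    return Counter(tokens)[word]

def near_count(tokens, word, other, n):
    # How many occurrences of `word` have an occurrence of `other`
    # at most n token positions away (before or after it).
    other_positions = [j for j, t in enumerate(tokens) if t == other]
    hits = 0
    for i, t in enumerate(tokens):
        if t == word and any(i != j and abs(i - j) <= n for j in other_positions):
            hits += 1
    return hits

# Hypothetical sample corpus standing in for the newspaper articles.
articles = [
    "The central bank raised interest rates today.",
    "Rates fell; the bank was silent.",
    "Sports news only.",
]

if __name__ == "__main__":
    docs = [tokenize(a) for a in articles]
    print("bank, total:", sum(word_count(d, "bank") for d in docs))
    print("bank within 3 of rates:", sum(near_count(d, "bank", "rates", 3) for d in docs))
    print("articles with both:", sum(1 for d in docs if {"bank", "rates"} <= set(d)))
```

The "same article" condition is simply the case where n is at least the article length. An indexed approach answers the same questions from stored word positions rather than rescanning the text.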
You really want to use the contrib/tsearch2 module, which already ships with PostgreSQL.
cd contrib/tsearch2
gmake install
psql <mydb> < tsearch2.sql
more README.tsearch2
Chris