On Thu, Jan 16, 2003 at 10:56:35AM +0000, martin bower wrote:
>I'm writing a document management site, and am looking for pointers on how
>to index HTML, PDF (and maybe Word) docs, and then search against them.
>I've had a quick look and found xpdf, but wondered what the pitfalls are
>in writing my own indexer. Any advice would be welcome.

Searchable indexing is one of those interesting problems. Search for
"inverted index", which is the most common technique. It's possible to get
decent performance by rolling your own (see e.g.
http://lists.firedrake.org/), but there are a number of standard problems
that tools like ht://Dig have already solved for you.
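
To give a rough idea of what an inverted index looks like, here is a
minimal Python sketch. It assumes the documents have already been
converted to plain text (for PDFs, something like xpdf's pdftotext could
do that step). The function names and the two sample documents are made
up for illustration; this is not the code behind the site mentioned above.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Start from the first word's postings, then intersect with the rest.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample data: document id -> already-extracted plain text.
docs = {
    "a.html": "Indexing HTML and PDF documents",
    "b.pdf": "Searching PDF files with an inverted index",
}
index = build_index(docs)
```

Calling search(index, "pdf") returns both document ids, while
search(index, "inverted index") returns only "b.pdf". Note that "index"
does not match "Indexing" in the first document: word stemming is one of
the standard problems this sketch leaves unsolved.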

Roger
