I've pretty much decided to bite the bullet and fall back on Verity
for spidered indexing, despite its inability to natively parse DOCs
and PDFs.

Now, I'm wondering what strategies people are using to feed the text
versions of DOCs/PDFs to the spider.

The following seems like the best option to me. Does anyone have a
better idea?

Create a template that uses cfdirectory to build a list of links to
the text files produced by pdf2text. This template (and the link to it
from the start page) would be visible only to localhost (the VK2 spider).
Later, when displaying the search results, I would replace the URLs of
the text files with those of the corresponding PDF files.
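
To make the plan concrete, here is a rough sketch of the three pieces:
the link page, the localhost check, and the URL swap at display time.
It is written in Python only for brevity; the real thing would be a CFML
template built around cfdirectory. The URL prefixes /spider_text/ and
/docs/, the function names, and the assumption that each text file
shares its base name with its PDF are all made up for illustration.

```python
from html import escape
from pathlib import Path
from urllib.parse import quote

# Hypothetical layout: converted text is served from /spider_text/,
# the original PDFs from /docs/, with matching base names.
TEXT_URL_PREFIX = "/spider_text/"
PDF_URL_PREFIX = "/docs/"


def build_link_page(text_dir):
    """Return an HTML page linking to every .txt file in text_dir."""
    names = sorted(p.name for p in Path(text_dir).glob("*.txt"))
    links = [
        '<a href="{0}">{1}</a>'.format(TEXT_URL_PREFIX + quote(n), escape(n))
        for n in names
    ]
    return "<html><body>\n" + "<br>\n".join(links) + "\n</body></html>"


def is_spider_request(remote_addr):
    """Serve the link page only when the request comes from localhost."""
    return remote_addr in ("127.0.0.1", "::1")


def text_url_to_pdf_url(url):
    """Map a search-result URL for a text file back to its PDF."""
    if url.startswith(TEXT_URL_PREFIX) and url.endswith(".txt"):
        base = url[len(TEXT_URL_PREFIX):-len(".txt")]
        return PDF_URL_PREFIX + base + ".pdf"
    return url  # not one of the converted files; leave it alone
```

The URL swap only touches results under the text prefix, so ordinary
pages returned by the same search would pass through unchanged.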

Thanks,
Jamie