BTW, as of 6.1 beta 2, BlueDragon has built-in full-text search and
indexing via Lucene, as well as HTTP spidering.

-Matt

On Feb 17, 2004, at 5:01 PM, Jamie Jackson wrote:

> I've pretty much decided to bite the bullet and fall back on Verity
>  for spidered indexing, despite its inability to natively parse DOCs
>  and PDFs.
>
>  Now, I'm wondering what strategies people are using to feed the text
>  versions of DOCs/PDFs to the spider.
>
>  The following seems like the best option to me. Does anyone have a
>  better idea?
>
>  Create a template that is a (cfdirectory-populated) list of links to
>  the (pdf2text-converted) text files. This template (and its link from
>  the start page) would only be visible to localhost (the VK2 spider).
>  Later, when displaying the search results, I could replace the URLs of
>  the text files with the appropriate PDF files.
>
>  Thanks,
>  Jamie
>
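
For anyone following the thread, here is a rough sketch of the approach
Jamie describes, written in Python rather than CFML purely for
illustration. All names here (build_link_page, is_local_request,
original_url) are hypothetical, and it assumes each converted text file
keeps the original document's base name with a .txt extension, which the
post does not actually state.

```python
from html import escape
from pathlib import Path

# Addresses treated as "the spider running on localhost" (an assumption).
LOCAL_ADDRESSES = {"127.0.0.1", "::1"}


def is_local_request(remote_addr: str) -> bool:
    """Return True only for requests that come from the local machine."""
    return remote_addr in LOCAL_ADDRESSES


def build_link_page(text_dir: Path, base_url: str) -> str:
    """Build an HTML page linking to every converted .txt file in text_dir."""
    base = base_url.rstrip("/")
    links = [
        f'<a href="{escape(base + "/" + p.name)}">{escape(p.name)}</a><br>'
        for p in sorted(text_dir.glob("*.txt"))
    ]
    return "<html><body>\n" + "\n".join(links) + "\n</body></html>"


def original_url(text_url: str, text_base: str, doc_base: str,
                 ext: str = ".pdf") -> str:
    """Map a text-file URL from the search results back to the original file."""
    if not text_url.startswith(text_base):
        return text_url  # not one of the converted files; leave it alone
    name = text_url[len(text_base):].lstrip("/")
    if name.endswith(".txt"):
        name = name[: -len(".txt")]
    return doc_base.rstrip("/") + "/" + name + ext
```

The link page would be served only when is_local_request() is true for
the caller's address, and original_url() would be applied to each hit
before the results are displayed. For Word documents, pass ext=".doc".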