indexing via Lucene as well as HTTP spidering.
-Matt
On Feb 17, 2004, at 5:01 PM, Jamie Jackson wrote:
> I've pretty much decided to bite the bullet and fall back on Verity
> for spidered indexing, despite its inability to natively parse DOCs
> and PDFs.
>
> Now, I'm wondering what strategies people are using to feed the text
> versions of DOCs/PDFs to the spider.
>
> The following seems like the best option to me. Does anyone have a
> better idea?
>
> Create a template that is a (cfdirectory-populated) list of links to
> the (pdf2text-converted) text files. This template (and its link from
> the start page) would only be visible to localhost (the VK2 spider).
> Later, when displaying the search results, I could replace the URLs of
> the text files with the appropriate PDF files.
>
> Thanks,
> Jamie
>
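
A minimal sketch of the approach Jamie describes above, written in Python rather than CFML purely for illustration. It assumes each converted file is named after its source with ".txt" appended (for example report.pdf becomes report.pdf.txt), and the /textcache/ and /docs/ URL prefixes are hypothetical. Restricting the link page to localhost is left to the web server or template and is not shown.

```python
import os
from html import escape
from urllib.parse import quote


def build_link_page(text_dir):
    """Return an HTML page linking to every .txt file in text_dir.

    This plays the role of the cfdirectory-populated template: the
    spider starts here and follows each link to a converted text file.
    """
    names = sorted(
        name for name in os.listdir(text_dir)
        if name.lower().endswith(".txt")
    )
    links = "\n".join(
        f'<a href="{quote(name)}">{escape(name)}</a><br>' for name in names
    )
    return f"<html><body>\n{links}\n</body></html>"


def text_url_to_source_url(url, text_prefix="/textcache/", source_prefix="/docs/"):
    """Map a spidered text-file URL back to the original document's URL.

    Assumes the naming convention original-name + ".txt"; both prefixes
    are hypothetical. URLs that do not match are returned unchanged.
    """
    if url.startswith(text_prefix) and url.endswith(".txt"):
        return source_prefix + url[len(text_prefix):-len(".txt")]
    return url
```

When rendering search results, each hit's URL would be passed through text_url_to_source_url before display, so users land on the PDF or DOC rather than the text copy. Keeping the original extension in the text file's name means the same mapping works for both file types.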