BTW, as of 6.1 beta 2, BlueDragon has built-in full-text search and
indexing via Lucene, as well as HTTP spidering.
-Matt
On Feb 17, 2004, at 5:01 PM, Jamie Jackson wrote:
> I've pretty much decided to bite the bullet and fall back on Verity
> for spidered indexing, despite its inability to natively parse DOCs
> and PDFs.
>
> Now, I'm wondering what strategies people are using to feed the text
> versions of DOCs/PDFs to the spider.
>
> The following seems like the best option to me. Does anyone have a
> better idea?
>
> Create a template that is a (cfdirectory-populated) list of links to
> the (pdf2text-converted) text files. This template (and its link from
> the start page) would only be visible to localhost (the VK2 spider).
> Later, when displaying the search results, I could replace the URLs of
> the text files with the appropriate PDF files.
>
> Thanks,
> Jamie
>
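For anyone who wants to see the shape of the approach Jamie describes, here is a rough sketch in Python rather than CFML. It only illustrates the two moving parts: generating a link page from a directory of converted text files (the role cfdirectory plays in the original idea), and mapping a text-file URL in a search result back to its PDF. The function names, the /textcache/ and /docs/ URL prefixes, and the assumption that each text file shares its base name with the source PDF are all hypothetical. Restricting the page to localhost is not shown, and DOC files would need their own mapping.

```python
import html
from pathlib import Path
from urllib.parse import quote


def build_link_index(text_dir, base_url):
    """Return an HTML page that links to every .txt file in text_dir.

    base_url is the (assumed) URL prefix under which the text files
    are served to the spider.
    """
    names = sorted(p.name for p in Path(text_dir).glob("*.txt"))
    prefix = base_url.rstrip("/") + "/"
    items = "\n".join(
        '<li><a href="{0}">{1}</a></li>'.format(
            html.escape(prefix + quote(name), quote=True),
            html.escape(name),
        )
        for name in names
    )
    return "<html><body><ul>\n" + items + "\n</ul></body></html>"


def text_url_to_pdf_url(url, text_prefix="/textcache/", pdf_prefix="/docs/"):
    """Map a spidered text-file URL back to the original PDF's URL.

    Assumes report.txt was converted from report.pdf; both prefixes
    are placeholders for illustration.
    """
    if not (url.startswith(text_prefix) and url.endswith(".txt")):
        return url  # not one of the converted files; leave it untouched
    stem = url[len(text_prefix):-len(".txt")]
    return pdf_prefix + stem + ".pdf"
```

The link page would be regenerated whenever the conversion job runs, and the URL mapping would be applied to each result just before display, so users only ever see links to the PDFs.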
- Verity K2 Spidering on Linux/CFMX. Jamie Jackson
- Re: Verity K2 Spidering on Linux/CFMX. Matt Liotta
- Re: Verity K2 Spidering on Linux/CFMX. Jamie Jackson
- Re: Verity K2 Spidering on Linux/CFMX. Matt Liotta