Good Monday morning,
I have developed a spider for our site that collects links for indexing.
The spider then loops over the list of links and uses cfhttp to retrieve
each page's content. We strip the tags out of the HTML pages and store the
content both in a table, for later querying, and in a Verity collection.
I would also like to include our Adobe .pdf and Word .doc files in the
index, and I would like to index them during that same loop, because
their content is binary and cannot easily be parsed.
I believe Verity can index .doc and .pdf files, correct?
Does anyone have a code sample that indexes a .pdf or .doc file?
Currently I am using:
<cfindex action="" collection="Crawler" type="file" key="webpath"
extensions=".pdf">
However, it does not appear to index anything.
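A minimal sketch of one way to fold the binary files into the existing loop, assuming ColdFusion MX or later: save each .pdf/.doc response to disk with cfhttp's path and file attributes, then index that local file with cfindex type="file", whose key is a full file system path rather than a URL. The names linkList, tempDir, localName, and localPath are hypothetical, and keeping the source URL in custom1 is an assumption about how results would be linked back to the site. Note also that the snippet above leaves action empty, whereas cfindex expects a value such as "update", and that the extensions attribute is documented for type="path", not type="file".

```cfml
<!--- Sketch only. linkList is a hypothetical comma-delimited list of URLs
      produced by the spider; adjust to however the links are actually stored. --->
<cfset tempDir = GetTempDirectory()>
<cfloop list="#linkList#" index="link">
  <!--- Take the extension from the URL, ignoring any query string. --->
  <cfset ext = LCase(ListLast(ListFirst(link, "?"), "."))>
  <cfif ListFind("pdf,doc", ext)>
    <!--- cfindex reads files from disk, so save the binary response first.
          A UUID file name avoids collisions between same-named documents. --->
    <cfset localName = CreateUUID() & "." & ext>
    <cfset localPath = tempDir & localName>
    <cfhttp url="#link#" method="get" path="#tempDir#" file="#localName#">
    <cfif Left(cfhttp.statusCode, 3) EQ "200">
      <!--- For type="file", key is the full local path of one file.
            custom1 keeps the original URL so search results can link back. --->
      <cfindex action="update" collection="Crawler" type="file"
               key="#localPath#" custom1="#link#">
    </cfif>
  </cfif>
</cfloop>
```

A later cfsearch against the Crawler collection would then return the original URL in its CUSTOM1 column. If the documents already live on the web server's file system, indexing that directory directly with type="path", extensions=".pdf,.doc", and recurse="yes" may be simpler than downloading them again.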
TIA!
Doug James
IT Developer
Hollings Cancer Center
Medical University of SC
http://hcc.musc.edu