Hi

I am trying to use htdig 3.1.6-6 to index and search a hierarchy of .doc files stored on a server running Red Hat 7.3.

I am using doc2html, which runs fine on documents from the command line, but as far as I can see it never gets called by htdig.

I have run rundig with -vvvv and can see it accessing the .doc files (the HTML is indexed with ModIndex).
Here is an excerpt from the log (the file size it reports is correct):

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
<snip>
Tag: <td align="left" class="tbl_files">, matched -1
Tag: <a href="Aune_2392.doc" class ="files">, matched 2
Tag: <img src="/ModIndex_Files/file.gif" height="13" width="13" alt="file" />, matched 18
word: [EMAIL PROTECTED]
image: http://bjsserver/ModIndex_Files/file.gif
word: [EMAIL PROTECTED]
word part: [EMAIL PROTECTED]
word part: [EMAIL PROTECTED]
word part: [EMAIL PROTECTED]
word part: [EMAIL PROTECTED]
word part: [EMAIL PROTECTED]
Tag: </a>, matched 3
href: http://bjsserver/testdocs/files/A/Aune_2392.doc (file Aune_2392.doc)
resolving 'http://bjsserver/testdocs/files/A/Aune_2392.doc'

  pushing http://bjsserver/testdocs/files/A/Aune_2392.doc

<snip>

pick: bjsserver, # servers = 1
203:203:2:http://bjsserver/testdocs/files/A/Aune_2392.doc: Retrieval command for http://bjsserver/testdocs/files/A/Aune_2392.doc: GET /testdocs/files/A/Aune_2392.doc HTTP/1.0
User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
Referer: http://bjsserver/testdocs/files/A/
Host: bjsserver

Header line: HTTP/1.1 200 OK
Header line: Date: Tue, 03 Jan 2006 16:40:31 GMT
Header line: Server: Apache
Header line: Last-Modified: Thu, 18 Aug 2005 17:56:00 GMT
Converted Thu, 18 Aug 2005 17:56:00 GMT to Thu, 18 Aug 2005 17:56:00
Header line: ETag: "4400d9-15c00-4304cbb0"
Header line: Accept-Ranges: bytes
Header line: Content-Length: 89088
Header line: Connection: close
Header line: Content-Type: text/plain
Header line:
returnStatus = 0
Read 8192 from document
Read 8192 from document
Read 8192 from document
Read 8192 from document
Read 8192 from document
Read 8192 from document
Read 8192 from document
Read 8192 from document
Read 8192 from document
Read 8192 from document
Read 7168 from document
Read a total of 89088 bytes
size = 89088

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Here is my config file:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
start_url:              http://bjsserver/testdocs/files/
limit_urls_to:          http://bjsserver/testdocs/files/
local_urls:             http://bjsserver/testdocs/files/
#common_url_parts:       http://bjsserver/testdocs/files/ .html
database_dir:           /opt/mailman/htdig/db/uretek
bad_word_list:          /opt/mailman/htdig/common/bad_words
nothing_found_file:     /opt/mailman/htdig/common/nomatch.html
search_results_wrapper: /opt/mailman/htdig/common/wrapper.html
exclude_urls:           .dir .htaccess .mhonarc.db
max_head_length:        10000
remove_bad_urls:        true
use_star_image:         no
maintainer:             [EMAIL PROTECTED]
search_algorithm:       exact:1 synonyms:0.5 endings:0.1
allow_virtual_hosts:    true
allow_numbers:          true
no_next_page_text:
no_prev_page_text:
backlink_factor:        0
sort:                   date
maximum_pages:          30
next_page_text:         <img src=/icons/buttonr.gif border=0 align=middle width=30 height=30 alt=next>
prev_page_text:         <img src=/icons/buttonl.gif border=0 align=middle width=30 height=30 alt=prev>
page_number_text:       "<img src=/icons/button1.gif border=0 align=middle width=30 height=30 alt=1>" \
                        "<img src=/icons/button2.gif border=0 align=middle width=30 height=30 alt=2>" \
                        "<img src=/icons/button3.gif border=0 align=middle width=30 height=30 alt=3>" \
                        "<img src=/icons/button4.gif border=0 align=middle width=30 height=30 alt=4>" \
                        "<img src=/icons/button5.gif border=0 align=middle width=30 height=30 alt=5>" \
                        "<img src=/icons/button6.gif border=0 align=middle width=30 height=30 alt=6>" \
                        "<img src=/icons/button7.gif border=0 align=middle width=30 height=30 alt=7>" \
                        "<img src=/icons/button8.gif border=0 align=middle width=30 height=30 alt=8>" \
                        "<img src=/icons/button9.gif border=0 align=middle width=30 height=30 alt=9>" \
                        "<img src=/icons/button10.gif border=0 align=middle width=30 height=30 alt=10>"
no_page_number_text:    "<img src=/icons/button1.gif border=2 align=middle width=30 height=30 alt=1>" \
                        "<img src=/icons/button2.gif border=2 align=middle width=30 height=30 alt=2>" \
                        "<img src=/icons/button3.gif border=2 align=middle width=30 height=30 alt=3>" \
                        "<img src=/icons/button4.gif border=2 align=middle width=30 height=30 alt=4>" \
                        "<img src=/icons/button5.gif border=2 align=middle width=30 height=30 alt=5>" \
                        "<img src=/icons/button6.gif border=2 align=middle width=30 height=30 alt=6>" \
                        "<img src=/icons/button7.gif border=2 align=middle width=30 height=30 alt=7>" \
                        "<img src=/icons/button8.gif border=2 align=middle width=30 height=30 alt=8>" \
                        "<img src=/icons/button9.gif border=2 align=middle width=30 height=30 alt=9>" \
                        "<img src=/icons/button10.gif border=2 align=middle width=30 height=30 alt=10>"

external_parsers: application/rtf->text/html /opt/mailman/htdig/bin/doc2html.pl \
           text/rtf->text/html /opt/mailman/htdig/bin/doc2html.pl \
           application/pdf->text/html /opt/mailman/htdig/bin/doc2html.pl \
           application/postscript->text/html /opt/mailman/htdig/bin/doc2html.pl \
           application/msword->text/html /opt/mailman/htdig/bin/doc2html.pl \
           application/wordperfect5.1->text/html /opt/mailman/htdig/bin/doc2html.pl \
           application/wordperfect6.0->text/html /opt/mailman/htdig/bin/doc2html.pl \
           application/msexcel->text/html /opt/mailman/htdig/bin/doc2html.pl \
           application/vnd.ms-excel->text/html /opt/mailman/htdig/bin/doc2html.pl \
           application/powerpoint->text/html /opt/mailman/htdig/bin/doc2html.pl \
           application/vnd.ms-powerpoint->text/html /opt/mailman/htdig/bin/doc2html.pl \
           application/x-shockwave-flash->text/html /opt/mailman/htdig/bin/doc2html.pl \
           application/x-shockwave-flash2-preview->text/html /opt/mailman/htdig/bin/doc2html.pl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

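
Since the external_parsers entries are keyed on MIME type, one thing that may be worth checking is which Content-Type the server actually reports for each .doc file. Below is a rough Python sketch that pulls that out of the rundig -vvvv output. The regular expressions and the helper names (content_types, mismatches) are my own assumptions, based only on the log layout in the excerpt above, so they may need adjusting for other htdig versions.

```python
import re

# Patterns assume the log layout shown in the excerpt (htdig 3.1.6, -vvvv).
RETRIEVAL = re.compile(r"Retrieval command for (\S+): GET ")
CONTENT_TYPE = re.compile(r"^Header line: Content-Type:\s*([^;\s]+)", re.IGNORECASE)


def content_types(log_text):
    """Map each retrieved URL to the Content-Type header the server sent."""
    result = {}
    current_url = None
    for line in log_text.splitlines():
        m = RETRIEVAL.search(line)
        if m:
            # A new retrieval starts; later header lines belong to this URL.
            current_url = m.group(1)
            continue
        m = CONTENT_TYPE.match(line.strip())
        if m and current_url is not None:
            result[current_url] = m.group(1).lower()
    return result


def mismatches(types_by_url, suffix, expected_type):
    """Return URLs ending in `suffix` whose reported type is not `expected_type`."""
    return sorted(
        url
        for url, ctype in types_by_url.items()
        if url.lower().endswith(suffix) and ctype != expected_type
    )


# Two lines taken from the log excerpt in this post.
sample = (
    "203:203:2:http://bjsserver/testdocs/files/A/Aune_2392.doc: "
    "Retrieval command for http://bjsserver/testdocs/files/A/Aune_2392.doc: "
    "GET /testdocs/files/A/Aune_2392.doc HTTP/1.0\n"
    "Header line: Content-Type: text/plain\n"
)
print(mismatches(content_types(sample), ".doc", "application/msword"))
```

Run against the full log, this lists every .doc URL that was served with something other than application/msword, which is the type the external_parsers line above expects for Word files.
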
Any ideas?

--
Cheers

Brian

http://www.abandonmicrosoft.co.uk


