Dear Sir or Madam,

I have been assigned the task of acquainting myself with htdig in order to set up an intranet search engine that can be accessed only by password and that indexes every local document on our network. We decided in favor of ht://Dig.

I have been wrestling with the following problem every day for about two weeks. I'm using SUSE 9.1, now htdig-3.2.0b6, and Apache 2.0.53 (xampp). Everything works fine with HTML and text documents, but it fails with DOC and PDF files. I installed catdoc for parsing the MS Word files and Acrobat for PDF.

I have spent hours reading everything I could find in the forums and FAQs relating to my problem, but nothing seems to work. I'm sure that my htdig.conf is correct, and my doc2html.pl too. Invoking the Perl script at the command line works fine, and the verbose output of rundig tells me that Apache passes the correct MIME type for each file!

It seems to me that, for some reason I cannot identify, the external parser either doesn't return any text or isn't being called correctly. Here is what 'rundig -vvv' gives:

"
. . .
href: http://localhost/test/test.doc (___-= TEST - MSWORD-FILE =-___ )
resolving 'http://localhost/test/test.doc'
pushing http://localhost/test/test.doc
+Tag: <br>, matched -1
Tag: <br>, matched -1
Tag: </div>, matched -1
Tag: <hr noshade size="4">, matched -1
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
Tag: <br>, matched -1
. . .
Header line: HTTP/1.1 200 OK
Header line: Date: Tue, 05 Apr 2005 13:09:19 GMT
Header line: Server: Apache/2.0.53 (Unix) mod_ssl/2.0.53 OpenSSL/0.9.7d PHP/5.0.3 DAV/2 mod_perl/1.999.21 Perl/v5.8.6
Header line: Last-Modified: Fri, 13 Sep 2002 10:25:48 GMT
Converted Fri, 13 Sep 2002 10:25:48 GMT to Fri, 13 Sep 2002 10:25:48
Header line: ETag: "1c6e6-7e00-e8c8a300"
Header line: Accept-Ranges: bytes
Header line: Content-Length: 32256
Header line: Connection: close
Header line: Content-Type: application/msword
not HTML
pick: localhost, # servers = 1
. . .
htmerge: Sorting...
htmerge: Removing doc #1
htmerge: Removing doc #3
htmerge: Merging...
htmerge: Discarding docfile in doc #1
htmerge: Discarding mswordfile in doc #3
htmerge: Discarding test in doc #3
. . .
Deleted, no excerpt: 3/http://localhost/test/test.doc
"

As I'm inching towards despair, any help would be appreciated! Thank you!
Best regards,
Jeremy Prasetyo

