It looks as though you have done everything right in configuring doc2html.
The "! UNABLE to convert" message comes from doc2html.pl. It means
that doc2html.pl went through its list of converters but failed to find one
for PDF files. If you execute doc2html.pl on the command line, you need to
specify three arguments:
/usr/local/bin/doc2html.pl filename "application/pdf" url
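To illustrate why running it with no arguments fails, here is a toy model in shell of the lookup described above. It is not doc2html.pl's actual code: the converter table, the hypothetical Word entry and the function name find_converter are my own assumptions, and the real script may look at more than the MIME type.

```sh
#!/bin/sh
# Toy model of the converter lookup in doc2html.pl. NOT the real script:
# the table below and the Word entry are hypothetical, and the real
# doc2html.pl may also inspect the file itself, not just the MIME type.

# Hypothetical converter table, one "MIME-type converter" pair per line.
CONVERTERS='application/pdf /usr/local/bin/pdf2html.pl
application/msword /usr/local/bin/hypothetical-word2html'

# Arguments in the same order as doc2html.pl: file, MIME type, URL.
find_converter() {
  mime=$2
  # Walk the table; print the first converter whose MIME type matches.
  found=$(printf '%s\n' "$CONVERTERS" | while read -r type prog; do
    if [ "$type" = "$mime" ]; then
      echo "$prog"
      break
    fi
  done)
  if [ -n "$found" ]; then
    echo "$found"
  else
    # No match, e.g. $2 is empty because no arguments were given.
    echo "! UNABLE to convert"
    return 1
  fi
}
```

Called as find_converter test.pdf application/pdf http://1.0.20.78/test_pdfs/test.pdf it prints /usr/local/bin/pdf2html.pl. Called with no arguments, the MIME type is empty, nothing in the table matches, and it prints "! UNABLE to convert", the same symptom as in your item (c).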
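Your -vvv log includes:
Header line: HTTP/1.1 403 Forbidden
This would seem to be the problem: the web server refused htdig's request
for the start URL /test_pdfs/, so htdig never retrieved any of the PDF files.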
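If you want to check a longer -vvv log for the same thing, a small shell sketch like the following pairs each retrieval with the HTTP status code the server returned. The function name status_report is hypothetical, and it assumes the log was saved to a file with each "Retrieval command for ..." line kept on one line (the copy quoted below was wrapped by the mail client).

```sh
#!/bin/sh
# Summarise a saved "rundig -vvv" log: for each retrieval, print the HTTP
# status code returned and the URL that was requested.
# Usage: status_report logfile
status_report() {
  awk '
    /Retrieval command for / {
      # Pull out the URL that follows "Retrieval command for ".
      if (match($0, /Retrieval command for [^ ]+/)) {
        url = substr($0, RSTART + 22, RLENGTH - 22)
        sub(/:$/, "", url)   # drop the trailing colon htdig prints
      }
    }
    /^Header line: HTTP\// {
      # e.g. "Header line: HTTP/1.1 403 Forbidden": field 4 is the code
      print $4, url
    }
  ' "$1"
}
```

Run against your log, it should report 404 for robots.txt, which is harmless when the site has no robots.txt, and 403 for http://1.0.20.78/test_pdfs/, the start URL.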
--
David Adams
Information Systems Services
Southampton University
----- Original Message -----
From: "Jason Morse" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, January 17, 2003 7:29 PM
Subject: [htdig] Trouble with PDF setup
> Hi. I'm looking for some help regarding PDF files with htdig on an
> intranet server. I first set up htdig 3.1.6 on a Mac OS X Server running
> Apache 1.3.26, without any PDF support. Rundig indexed all files without
> problem. I then tried to add PDF file support:
>
> 1. Installed xpdf-2.00 to add the pdfinfo and pdftotext utilities to
> /usr/local/bin
>
> 2. Added doc2html.pl and pdf2html.pl to /usr/local/bin (and made them
> executable).
>
> 3. Modified the following line in doc2html.pl:
> my $PDF2HTML = '/usr/local/bin/pdf2html.pl';
>
> 4. Modified the following in pdf2html.pl:
> my $PDFTOTEXT = "/usr/local/bin/pdftotext";
> my $PDFINFO = "/usr/local/bin/pdfinfo";
>
> 5. Added the following to htdig.conf:
> external_parsers: application/pdf->text/html /usr/local/bin/doc2html.pl
>
> I've not had any luck getting PDF files indexed with this setup.
>
> So far in troubleshooting, I've found the following:
> a. Both pdfinfo and pdftotext seem to work when executed on a PDF file.
> b. Executing '/usr/local/bin/pdf2html.pl' on a PDF file generates
> appropriate output.
> c. Executing '/usr/local/bin/doc2html.pl' gives the following error, which
> I don't understand:
> ! UNABLE to convert
> d. After setting 'start_url' in htdig.conf to a directory containing only
> PDF files (10 of them), the following is the output of 'rundig -vvv':
> ----------------
> New server: 1.0.20.78, 80
> Retrieval command for http://1.0.20.78/robots.txt: GET /robots.txt HTTP/1.0
> User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
> Host: 1.0.20.78
>
> Header line: HTTP/1.1 404 Not Found
> Header line: Date: Fri, 17 Jan 2003 19:15:31 GMT
> Header line: Server: Apache/1.3.27 (Darwin) PHP/4.1.2
> Header line: Connection: close
> Header line: Content-Type: text/html; charset=iso-8859-1
> Header line:
> returnStatus = 1
> pushed
> pick: 1.0.20.78, # servers = 1
> 0:0:0:http://1.0.20.78/test_pdfs/: Retrieval command for http://1.0.20.78/test_pdfs/: GET /test_pdfs/ HTTP/1.0
> User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
> Host: 1.0.20.78
>
> Header line: HTTP/1.1 403 Forbidden
> Header line: Date: Fri, 17 Jan 2003 19:15:31 GMT
> Header line: Server: Apache/1.3.27 (Darwin) PHP/4.1.2
> Header line: Connection: close
> Header line: Content-Type: text/html; charset=iso-8859-1
> Header line:
> returnStatus = 1
> not found
> pick: 1.0.20.78, # servers = 1
> htmerge: Sorting...
> htmerge: Removing doc #0
> DB2 problem...: missing or empty key value specified
>
> Deleted, no excerpt: 0/http://1.0.20.78/test_pdfs/
> ----------------
>
> Thanks for any help!
> -Jason Morse
> [EMAIL PROTECTED]
>
> _______________________________________________
> htdig-general mailing list <[EMAIL PROTECTED]>
> To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
> FAQ: http://htdig.sourceforge.net/FAQ.html
>