> On Thu, 13 May 2004, Douglas Kline wrote:
>
> > Date: Thu, 13 May 2004 20:24:47 -0400
> > From: Douglas Kline <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]
> > Subject: [htdig] Interpreting pdf Files
> >
> > In an attempt to process pdf files with ht-Dig version 3.2.0b5, I've added the
> > lines
> >
> > external_parsers:
> > application/pdf->text/html <local directory>/xpdf-3.00/xpdf/pdftotext
>
> Add a back-slash at the end of the first line to join the two lines:
>
> external_parsers: \
> application/pdf->text/html <local directory>/xpdf-3.00/xpdf/pdftotext

Thanks. That worked. I had gotten the idea that back-slashes were needed only for continuation lines after the first line following the "external_parsers:" line. I think this came from a message on this list from Apr. 16 which had it that way, although other messages from the same day on the same thread had "external_parsers:" and the following text on the same line.

Now I'm getting a different error. It finds the pdftotext command but outputs the help text you get if you don't give it any arguments. Evidently it isn't being passed the file to be converted from pdf to text format.

The documentation on external_parsers in the "Configuration file format -- Attributes" Web page doesn't seem to deal with passing arguments that refer to the pages being indexed. Yet even if it is passing the file to be converted, the arguments that follow would vary from one converter to another, so there must be some way to indicate them in the htdig.conf file. The documentation says you can include arguments if you quote the whole command string, but how do I indicate the file to be converted, and where should the output of that command go?

The documentation also says, "Unless it is an external converter, which will output a document of a different content-type, then its output must follow the format described here." I'm guessing that my case is one of the external converters and that the output doesn't have to conform to that format.

The documentation also says, "If the second type is user-defined, then it's up to the converter script to put out a "Content-Type: type" header followed by a blank line, to indicate to htdig what type it should expect for the output, much like what a CGI script would do." Is this a user-defined second type? I'm guessing that it isn't, since it's plain text.

TIA.

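One way the argument question above might be handled is with a small wrapper script, sketched here in Python. This is only a sketch under assumptions, not a confirmed fix. It assumes that htdig appends four arguments to the configured command (the temporary input file, the content type, the URL, and the configuration file), that pdftotext accepts "-" as the output file name to write to standard output, and that htdig reads the converter's standard output. The script name pdf2text_wrapper.py and the PDFTOTEXT path are hypothetical. Since the output is plain text rather than HTML, the sketch would pair with a "application/pdf->text/plain" entry instead of "->text/html".

```python
#!/usr/bin/env python3
"""Hypothetical wrapper (pdf2text_wrapper.py) around pdftotext for htdig."""
import subprocess
import sys

# Assumed location of pdftotext; adjust to the local xpdf installation.
PDFTOTEXT = "pdftotext"


def build_command(args):
    """Build the pdftotext command line from the arguments htdig appends.

    Assumption: htdig calls the converter with four trailing arguments:
    input file, content type, URL, configuration file. Only the input
    file is used here; the others are ignored.
    """
    if len(args) < 1:
        raise ValueError("expected at least the input file argument")
    infile = args[0]
    # "-" as the output file asks pdftotext to write to standard output.
    return [PDFTOTEXT, infile, "-"]


# Run the converter only when arguments were supplied, as htdig would do.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(subprocess.run(build_command(sys.argv[1:])).returncode)
```

Under the same assumptions, the configuration entry would point at the wrapper rather than at pdftotext directly, for example "application/pdf->text/plain <local directory>/pdf2text_wrapper.py" (path hypothetical). Because text/plain is not a user-defined type, the wrapper does not print a "Content-Type:" header.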
Douglas
========
Douglas Kline
[EMAIL PROTECTED]

