On Sat, 15 May 2004, Douglas Kline wrote:

> Date: Sat, 15 May 2004 19:46:17 -0400
> From: Douglas Kline <[EMAIL PROTECTED]>
> To: Joe R. Jah <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED]
> Subject: Re: [htdig] Interpreting pdf Files 
> 
> > On Thu, 13 May 2004, Douglas Kline wrote:
> > 
> > > Date: Thu, 13 May 2004 20:24:47 -0400
> > > From: Douglas Kline <[EMAIL PROTECTED]>
> > > To: [EMAIL PROTECTED]
> > > Subject: [htdig] Interpreting pdf Files
> > > 
> > > 
> > > In an attempt to process pdf files with ht-Dig version 3.2.0b5, I've added the
> > > lines
> > > 
> > > external_parsers:
> > >  application/pdf->text/html   <local directory>/xpdf-3.00/xpdf/pdftotext
> > 
> > Add a back-slash at the end of the first line to join the two lines:
> > 
> > external_parsers: \
> >  application/pdf->text/html   <local directory>/xpdf-3.00/xpdf/pdftotext
> 
> 
> Thanks.  That worked.  I got the idea that the back-slashes were needed for
> continuation lines after the first line after the "external_parsers:" line.  I
> think this was from a message on this list of Apr. 16 which had it that way but
> other messages from the same day on the same thread had "external_parsers:" and
> the following text on the same line.

Putting everything on the same line works too, but with a back-slash at the
end of each line you can list as many parsers as you need; for example:

external_parsers: \
application/rtf->text/html /usr/local/bin/doc2html.pl \
text/rtf->text/html /usr/local/bin/doc2html.pl \
application/pdf->text/html /usr/local/bin/doc2html.pl \
application/postscript->text/html /usr/local/bin/doc2html.pl \
application/msword->text/html /usr/local/bin/doc2html.pl \
application/wordperfect5.1->text/html /usr/local/bin/doc2html.pl \
application/msexcel->text/html /usr/local/bin/doc2html.pl \
application/vnd.ms-excel->text/html /usr/local/bin/doc2html.pl \
application/vnd.ms-powerpoint->text/html /usr/local/bin/doc2html.pl \
application/x-shockwave-flash->text/html /usr/local/bin/doc2html.pl \
application/x-shockwave-flash2-preview->text/html /usr/local/bin/doc2html.pl
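
To see why the missing back-slash matters, here is a rough Python model of
how a line-oriented config reader might treat continuation lines. It is only
a sketch and not ht://Dig's actual parser: the function names and the
/opt/xpdf path are made up for illustration, and quoted command strings with
embedded spaces are not handled.

```python
def join_continuations(text):
    """Merge every line ending in a back-slash with the line that follows it."""
    logical, pending = [], ""
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.endswith("\\"):
            # Drop the back-slash and keep collecting.
            pending += line[:-1] + " "
        else:
            logical.append(pending + line)
            pending = ""
    if pending:
        logical.append(pending)
    return logical


def parse_external_parsers(text):
    """Return {source_type: (target_type, command)} for external_parsers."""
    for line in join_continuations(text):
        name, sep, value = line.partition(":")
        if sep and name.strip() == "external_parsers":
            tokens = value.split()
            result = {}
            # The value is read as pairs: "type->type" followed by a program.
            for i in range(0, len(tokens) - 1, 2):
                source, _, target = tokens[i].partition("->")
                result[source] = (target, tokens[i + 1])
            return result
    return {}


# Hypothetical path, for illustration only.
broken = "external_parsers:\n application/pdf->text/html /opt/xpdf/pdftotext\n"
fixed = "external_parsers: \\\n application/pdf->text/html /opt/xpdf/pdftotext\n"

print(parse_external_parsers(broken))
print(parse_external_parsers(fixed))
```

In this model the first form leaves external_parsers with an empty value,
because the attribute ends at the end of its own line, while the second form
yields the pdf entry.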

You may want to install doc2html.pl:

http://www.htdig.org/FAQ.html#q4.9

Regards,

Joe
-- 
     _/   _/_/_/       _/              ____________    __o
     _/   _/   _/      _/         ______________     _-\<,_
 _/  _/   _/_/_/   _/  _/                     ......(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah        [EMAIL PROTECTED]

> Now I'm getting a different error.  It's finding the pdftotext command but
> outputting the help text you get if you don't give it any arguments.  Evidently
> it isn't being passed the file to be converted from pdf to text format.  The
> documentation on external_parsers in the Configuration file format --
> Attributes Web page doesn't seem to deal with passing arguments that refer to
> the pages being indexed.  Yet even if it's passing the file to be converted the
> following arguments would vary from one converter to another and so there must
> be some way to indicate them in the htdig.conf file.  The documentation says
> you can include arguments if you quote the whole command string but how do I
> indicate the file to be converted and where should the output of that command
> go?  The documentation also says, "Unless it is an external converter, which
> will output a document of a different content-type, then its output must follow
> the format described here."  I'm guessing that my case here is one of the
> external converters and the output doesn't have to conform to that format.
> The documentation also says, "If the second type is user-defined, then it's up
> to the converter script to put out a "Content-Type: type" header followed by a
> blank line, to indicate to htdig what type it should expect for the output,
> much like what a CGI script would do."  Is this a user-defined second type?
> I'm guessing that it isn't since it's plain text?
> 
> TIA.
> 
> Douglas 
> 
> ========
> Douglas Kline
> [EMAIL PROTECTED]
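
On the question of arguments: as I read the external_parsers documentation,
htdig appends four arguments to whatever command is configured (input file,
content-type, URL, configuration file) and reads the result from the
program's standard output. pdftotext instead expects "PDF-file [text-file]",
with "-" as the text file meaning standard output, which is why it only
prints its help text when called directly. A small wrapper can bridge the
two. The following is a minimal sketch, not a tested replacement for
doc2html.pl; the script itself and the assumption that pdftotext is on the
PATH are mine. Since it emits plain text, it would presumably be registered
as application/pdf->text/plain rather than ->text/html.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper that lets htdig call pdftotext as an external converter."""
import subprocess
import sys

# Assumption: pdftotext is on the PATH; otherwise put the full path here.
PDFTOTEXT = "pdftotext"


def build_command(args):
    """Build the pdftotext command from the arguments htdig appends.

    args is expected to be [infile, content_type, url, config_file];
    only the input file is needed here.
    """
    if not args:
        raise ValueError("expected the input file as the first argument")
    infile = args[0]
    # "-" as the output file makes pdftotext write to standard output.
    return [PDFTOTEXT, infile, "-"]


def main(argv):
    try:
        command = build_command(argv[1:])
    except ValueError as err:
        print(err, file=sys.stderr)
        return 2
    # The child inherits our stdout, so the text goes straight back to htdig.
    return subprocess.run(command).returncode


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

The extra arguments are simply ignored, so the same pattern should carry over
to other converters that only need the input file.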


