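Try:
external_parsers: \
application/pdf->text/html /usr/local/bin/doc2html.pl
The "->text/html" part tells htdig that the program is a converter whose
output is HTML, to be handed on to the internal HTML parser. Without it,
htdig expects the program to emit its own external-parser record format,
which is why each line of HTML was rejected as an "unknown field".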
I have to agree with you that this is not easy. Only it isn't easy to make
it easy either!
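
For illustration only, here is a rough sketch of what a program in that
converter slot has to do: take the file htdig hands it and write HTML to
standard output. This is not doc2html.pl itself, just a hypothetical
stand-in written in Python. It assumes htdig calls the program with the
temporary file name, the content type, the URL and the config file name,
in that order (as I read the external_parsers documentation), and that
pdftotext is on the PATH. The function names are made up.

```python
#!/usr/bin/env python3
"""Hypothetical minimal PDF-to-HTML converter for an external_parsers entry."""
import html
import subprocess
import sys


def text_to_html(text, title):
    """Wrap plain text in a bare HTML page, escaping markup characters."""
    return (
        "<html><head><title>{}</title></head>\n"
        "<body><pre>\n{}\n</pre></body></html>\n"
    ).format(html.escape(title), html.escape(text))


def main(argv):
    # Assumed argument order: infile, content-type, URL, config-file.
    if len(argv) < 4:
        sys.stderr.write("usage: converter infile content-type url [config]\n")
        return 2
    infile, url = argv[1], argv[3]
    # "-" as the output file makes pdftotext write the text to stdout.
    result = subprocess.run(
        ["pdftotext", infile, "-"], capture_output=True, text=True
    )
    if result.returncode != 0:
        sys.stderr.write(result.stderr)
        return result.returncode
    # The URL stands in for a title here; a real converter would do better.
    sys.stdout.write(text_to_html(result.stdout, url))
    return 0


# Only run as a converter when called with arguments, as htdig would.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

doc2html.pl does a good deal more than this (several document types,
proper titles, error handling), so treat the above as a picture of the
interface rather than a replacement.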
--
David Adams
Information Systems Services
Southampton University
----- Original Message -----
From: "Michael Friendly" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: "Geoff Hutchison" <[EMAIL PROTECTED]>
Sent: Saturday, December 28, 2002 5:01 PM
Subject: Re: [htdig] can't index PDF files
> OK, so I re-read all the FAQ sections, configured doc2html and pdf2html,
> and used them as external parsers, with
>
> external_parsers: \
> application/pdf /usr/local/bin/doc2html
>
> now, I get hundreds of error messages,
>
> External parser error: unknown field in line <HTML>
> URL: .... vcdstory.pdf
> External parser error: unknown field in line <HEAD>
> URL: .... vcdstory.pdf
> ....
>
> It's not clear to me why this should be so hard.
>
> Geoff Hutchison wrote:
>
> >On Thu, 26 Dec 2002, Michael Friendly wrote:
> >
> >
> >
> >>I've read the FAQ on this topic, but still can't get rundig to index pdf
> >>files. I have set
> >>
> >>max_doc_size: 500000
> >>
> >>pdf_parser: /usr/bin/htdig-pdfparser
> >>debian_pdf_parser: xpdf
> >>
> >>and verified that pdftotext works from the command line on my debian
> >>
> >>
> >
> >No, I don't think this is what you want to do. The pdf_parser attribute
> >is now quite deprecated--it really, truly expects Acrobat-generated PS
> >files.
> >
> >I'd look at the FAQ again (specifically q4.9):
> >http://www.htdig.org/FAQ.html#q4.9
> >
> >--
> >-Geoff Hutchison
> >Williams Students Online
> >http://wso.williams.edu/
> >
> >
> >
>
> --
> Michael Friendly Email: [EMAIL PROTECTED]
> Professor, Psychology Dept.
> York University Voice: 416 736-5115 x66249 Fax: 416 736-5814
> 4700 Keele Street http://www.math.yorku.ca/SCS/friendly.html
> Toronto, ONT M3J 1P3 CANADA
>
> -------------------------------------------------------
> This sf.net email is sponsored by:ThinkGeek
> Welcome to geek heaven.
> http://thinkgeek.com/sf
> _______________________________________________
> htdig-general mailing list <[EMAIL PROTECTED]>
> To unsubscribe, send a message to <[EMAIL PROTECTED]> with a
> subject of unsubscribe
> FAQ: http://htdig.sourceforge.net/FAQ.html
>