OK, so I re-read all the FAQ sections, configured doc2html and pdf2html, and used them as external parsers, with

external_parsers: \
application/pdf /usr/local/bin/doc2html

Now I get hundreds of error messages:

External parser error: unknown field in line <HTML>
URL: .... vcdstory.pdf
External parser error: unknown field in line <HEAD>
URL: .... vcdstory.pdf
....



It's not clear to me why this should be so hard.
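One likely reading of the `unknown field in line <HTML>` errors above, assuming htdig's documented `external_parsers` syntax (check the attribute reference for your htdig version): when an entry gives only a content type, htdig expects the command to write htdig's own line-oriented record format, not HTML. doc2html emits HTML, so htdig rejects each line it cannot recognize, such as `<HTML>`. If your version supports external converters, the `type->output-type` form tells htdig to pass the command's output to its built-in parser for the named output type instead. A sketch, reusing the path from the configuration above:

```
# Assumes htdig supports the "->" converter syntax;
# doc2html's HTML output is then parsed as text/html.
external_parsers: \
    application/pdf->text/html /usr/local/bin/doc2html
```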

Geoff Hutchison wrote:

On Thu, 26 Dec 2002, Michael Friendly wrote:


I've read the FAQ on this topic, but still can't get rundig to index PDF files. I have set

max_doc_size: 500000

pdf_parser: /usr/bin/htdig-pdfparser
debian_pdf_parser: xpdf

and verified that pdftotext works from the command line on my Debian

No, I don't think this is what you want to do. The pdf_parser attribute is
now quite deprecated; it really, truly expects Acrobat-generated PS
files.

I'd look at the FAQ again (specifically q4.9):
http://www.htdig.org/FAQ.html#q4.9

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/


--
Michael Friendly, Professor, Psychology Dept.
York University, 4700 Keele Street
Toronto, ONT M3J 1P3 CANADA
Email: [EMAIL PROTECTED]
Voice: 416 736-5115 x66249   Fax: 416 736-5814
http://www.math.yorku.ca/SCS/friendly.html
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
FAQ: http://htdig.sourceforge.net/FAQ.html
