I am now using xpdf with doc2html.pl, and it works reasonably well.

Bob


At 11:54 14/01/2003 -0600, you wrote:
According to Martin Vorlaender:
> Robert Isaac <[EMAIL PROTECTED]> wrote (via email):
...
> > Thank you for your message. I had increased max_doc_size
> > to 5000000, and I am running version 3.1.6. I have over 100 PDF
> > files on the web site, and only 2 were indexed during rundig.
>
> Next I'd suggest you run rundig with multiple -v's and output
> redirection, and have a look at any error messages in the logfile
> generated.
>
> The simplest explanation, of course, would be that the PDFs are not
> linked to (i.e. reachable from) any of the start_url entries.

Good advice.  There are a number of things that could be going on
here, so you need more output from htdig to narrow things down.
See http://www.htdig.org/FAQ.html#q4.1 and the related questions to
which it refers.
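
As a rough sketch of the verbose run suggested above: the logfile name
rundig.log and the helper count_pdf_lines are made up for illustration, and
the sketch assumes rundig hands its -v options on to htdig. It only counts
the log lines that mention a PDF at all; it makes no assumption about the
exact wording of htdig's messages, so you still need to read the log itself.

```shell
#!/bin/sh
# Sketch only: LOG is a hypothetical logfile name, not an htdig default.
LOG="${LOG:-rundig.log}"

# Run the dig verbosely, sending stdout and stderr to the same logfile.
# The run is skipped if rundig is not on the PATH.
if command -v rundig >/dev/null 2>&1; then
    rundig -v -v -v > "$LOG" 2>&1
fi

# Hypothetical helper: count log lines that mention ".pdf" (any case).
count_pdf_lines() {
    grep -ci '\.pdf' "$1"
}

if [ -f "$LOG" ]; then
    echo "Lines mentioning PDFs in $LOG: $(count_pdf_lines "$LOG")"
fi
```

If that count is close to zero, htdig probably never reached the PDFs, which
points at the links from start_url rather than at the PDF conversion.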

> Also, I seem to remember a note (in the sources?) that xpdf wouldn't
> work. Could someone else please chime in here?

xpdf or its pdftops and pdftotext utilities can't be used as drop-in
replacements for acroread in the pdf_parser attribute in 3.1.x
releases of htdig.  However, pdftotext from the xpdf package can work
fine in an external parser or external converter like doc2html.pl.
See http://www.htdig.org/FAQ.html#q4.9
and http://www.htdig.org/FAQ.html#q1.13
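
For illustration, an external converter setup in htdig.conf might look
something like the fragment below. The /usr/local/bin path is an assumption
(use wherever you installed doc2html.pl), and doc2html.pl itself has to be
edited so that it knows where your pdftotext lives. The max_doc_size value
is the one mentioned earlier in this thread.

```
# Sketch only: the install path is an assumption, adjust for your system.
# Convert PDFs to HTML through doc2html.pl rather than using pdf_parser.
external_parsers: application/pdf->text/html /usr/local/bin/doc2html.pl

# Large PDFs are truncated unless max_doc_size is big enough.
max_doc_size: 5000000
```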

Indeed, an external converter is the preferred way of indexing PDF files,
and support for the pdf_parser attribute has been dropped in the 3.2 beta
releases.

--
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
VOLVO OWNERS CLUB ONLINE
Robert Isaac, Director, Volvo Owners Club Limited
All email messages are virus scanned before being sent
PLEASE INCLUDE ALL PREVIOUS MESSAGE TEXT WITH REPLY

Club web site: www.volvoclub.org.uk

Also visit: www.trisaac.com for
John Wayne Collectors Plates
Roil Products





