According to Martin Vorlaender:
> Robert Isaac <[EMAIL PROTECTED]> wrote (via email):
...
> > Thank you for your message. I had increased the max_doc_size 
> > to 5000000, and it is ver 3.1.6. I have over 100 pdf files on
> > the web site, and only 2 have been indexed during rundig.
> 
> Next I'd suggest you run rundig with multiple -v's and output
> redirection, and have a look at any error messages in the logfile
> generated.
> 
> The simplest error of course would be that the PDFs are not
> linked to (i.e. reachable from) any of the start_url's.

Good advice.  There are a number of things that could be going on
here, so you need more output from htdig to narrow things down.
See http://www.htdig.org/FAQ.html#q4.1 and the related questions to
which it refers.
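
As a rough sketch of what Martin suggests, something like the following
would capture the verbose output in a logfile and pull out the lines
that mention PDFs.  The logfile name and the pdf_lines helper are my own
placeholders, not part of htdig, and three -v's is just one reasonable
verbosity level:

```shell
#!/bin/sh
# Assumed logfile name; any writable path will do.
LOG="${LOG:-rundig.log}"

# Run the dig verbosely, sending stdout and stderr to the same logfile.
# The guard only makes the snippet harmless where rundig is not installed.
if command -v rundig >/dev/null 2>&1; then
    rundig -vvv > "$LOG" 2>&1
fi

# Hypothetical helper: print the lines of a logfile that mention a
# .pdf URL (case-insensitive), to see which PDFs htdig actually reached.
pdf_lines() {
    grep -i '\.pdf' "$1"
}
```

Running pdf_lines "$LOG" | less afterwards should show whether the
missing PDFs were ever retrieved, and any messages logged next to them.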

> Also, I seem to remember a note (in the sources?) that xpdf wouldn't
> work. Could someone else please chime in here?

xpdf or its pdftops and pdftotext utilities can't be used as drop-in
replacements for acroread in the pdf_parser attribute in 3.1.x
releases of htdig.  However, pdftotext from the xpdf package can work
fine in an external parser or external converter like doc2html.pl.
See http://www.htdig.org/FAQ.html#q4.9
and http://www.htdig.org/FAQ.html#q1.13

Indeed, an external converter is the preferred way of indexing PDF
files, and support for the pdf_parser attribute has been dropped in
the 3.2 beta releases.
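
For illustration only, an htdig.conf fragment along these lines would
route PDFs through doc2html.pl as an external converter.  The install
path is an assumption, and doc2html.pl has its own setup for locating
pdftotext, so check the FAQ entries above and the script's own
documentation before relying on it:

```
# htdig.conf fragment (sketch; the script path is an assumption)
max_doc_size:     5000000
external_parsers: application/pdf->text/html /usr/local/bin/doc2html.pl
```

With a converter entry like this in place, the pdf_parser attribute
should not be needed for PDF files.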

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)


