According to Martin Vorlaender:
> Robert Isaac <[EMAIL PROTECTED]> wrote (via email):
> ...
> > Thank you for your message. I had increased the max_doc_size
> > to 5000000, and it is ver 3.1.6. I have over 100 pdf files on
> > the web site, and only 2 have been indexed during rundig.
>
> Next I'd suggest you run rundig with multiple -v's and output
> redirection, and have a look at any error messages in the logfile
> generated.
>
> The most simplistic error of course would be that the PDFs are not
> linked to (i.e. reachable from) any of the start_url's.

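For reference, the verbose run suggested above might look roughly like the sketch below. The log path and the pdf_lines helper are only illustrative assumptions, not part of htdig, and the exact wording of the log lines will depend on your version, so the helper simply searches for ".pdf" rather than for any particular message.

```shell
#!/bin/sh
# Hypothetical log location; any writable path will do.
LOGFILE="${LOGFILE:-/tmp/rundig.log}"

# Illustrative helper (not part of htdig): print every log line
# that mentions a .pdf URL or file name, ignoring case.
pdf_lines() {
    grep -i '\.pdf' "$1"
}

# Only attempt the run where rundig is actually on the PATH.
if command -v rundig >/dev/null 2>&1; then
    # Several -v's for verbose output; stdout and stderr both go to the log.
    rundig -v -v -v > "$LOGFILE" 2>&1
    # If nothing matches, the PDFs were probably never reached at all.
    pdf_lines "$LOGFILE" || echo "no PDF URLs seen in $LOGFILE"
fi
```

If the PDF URLs never show up in the log, the problem is most likely one of reachability from the start_url's; if they show up but are not indexed, look at the messages right around them.
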
Good advice. There are a number of things that could be going on here,
so you need more output from htdig to narrow things down. See
http://www.htdig.org/FAQ.html#q4.1 and the related questions to which
it refers.

> Also, I seem to remember a note (in the sources?) that xpdf wouldn't
> work. Could someone else please chime in here?

xpdf or its pdftops and pdftotext utilities can't be used as drop-in
replacements for acroread in the pdf_parser attribute in 3.1.x releases
of htdig. However, pdftotext from the xpdf package can work fine in an
external parser or external converter like doc2html.pl. See
http://www.htdig.org/FAQ.html#q4.9 and http://www.htdig.org/FAQ.html#q1.13

Indeed, this is the preferred way of indexing files, and support for the
pdf_parser attribute has been dropped in the 3.2 beta releases.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
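
As a rough illustration of the external converter route mentioned above, the htdig configuration file might contain something along these lines. This is a sketch only: the install path is a made-up example, and doc2html.pl has to be set up separately so that it can find pdftotext (see the FAQ entries cited above and the script's own documentation).

```
# Hand PDF documents to an external converter that emits HTML.
# The path to doc2html.pl below is hypothetical; use wherever you installed it.
external_parsers: application/pdf->text/html /usr/local/bin/doc2html.pl
```

The "->" form marks the program as a converter rather than a full parser, so htdig parses the converted output itself. The max_doc_size setting still applies, so large PDFs may need the increased limit already mentioned in this thread.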

