According to Curtis J. Peredina:
> When I index PDF's the summary lines in the results pages are appearing
> like binary characters.
>
> Any good tips for removing this?

Well, that depends a whole lot on how you're indexing PDFs, and what kind of PDFs you're indexing. If you use doc2html.pl or conv_doc.pl, along with pdftotext from the xpdf 0.92 package, as I do, you're already using the most up-to-date technique to index PDFs. Some PDFs just use strange encodings for some fonts, which pdftotext can't decipher. We have 3 such PDFs on our SCRC web site (search for "presentation"), which I was unable to do anything about.

If you're experiencing this problem with most or all PDFs, or if you get meaningful text from these PDFs when you run pdftotext manually on them, then the problem may lie elsewhere. In this case, you should post more details about how you've configured htdig to index PDFs, after carefully trying out the suggestions in http://www.htdig.org/FAQ.html#q4.9

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
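
For anyone who wants to run the manual pdftotext check over many files, here is a minimal sketch in Python. It is not part of ht://Dig or xpdf. The function names and the 0.9 threshold are assumptions made up for illustration, and it assumes your pdftotext accepts "-" as the output file name to write to stdout. The heuristic only counts non-printable characters, so it will miss PDFs whose text comes out printable but meaningless.

```python
import subprocess
import sys


def printable_ratio(text: str) -> float:
    """Fraction of characters that are printable or ordinary whitespace."""
    if not text:
        return 0.0
    ok = sum(1 for ch in text if ch.isprintable() or ch in "\n\r\t\f")
    return ok / len(text)


def looks_garbled(text: str, threshold: float = 0.9) -> bool:
    """Heuristic: treat empty or mostly non-printable output as garbled."""
    return printable_ratio(text) < threshold


def extract_text(pdf_path: str) -> str:
    """Run pdftotext, asking it to write to stdout via the "-" output name."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    # Latin-1 maps every byte to a character, so decoding never fails.
    return result.stdout.decode("latin-1")


if __name__ == "__main__":
    # Usage: python check_pdfs.py file1.pdf file2.pdf ...
    for path in sys.argv[1:]:
        verdict = "garbled" if looks_garbled(extract_text(path)) else "ok"
        print(f"{path}: {verdict}")
```

If only a few files are flagged, they are probably the odd-encoding kind described above. If none are flagged but the search summaries still look binary, the problem more likely lies in the htdig configuration than in pdftotext.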

