Re: [htdig] Binary Characters in Summary

Gilles Detillieux Fri, 12 Oct 2001 10:02:04 -0700

According to Curtis J. Peredina:
> When I index PDF's the summary lines in the results pages are appearing
> like binary characters.
> 
> Any good tips for removing this?


Well, that depends a whole lot on how you're indexing PDFs, and what kind
of PDFs you're indexing.  If you use doc2html.pl or conv_doc.pl, along
with pdftotext from the xpdf 0.92 package, as I do, you're already using
the most up-to-date technique to index PDFs.  Some PDFs just use strange
encodings for some fonts, which pdftotext can't decipher.  We have three
such PDFs on our SCRC web site (search for "presentation"), which I was
unable to do anything about.

If you're experiencing this problem with most or all PDFs, or if you get
meaningful text from these PDFs when you run pdftotext manually on them,
then the problem may lie elsewhere.  In this case, you should post more
details about how you've configured htdig to index PDFs, after carefully
trying out the suggestions in http://www.htdig.org/FAQ.html#q4.9
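
To make the "run pdftotext manually" check above repeatable over a batch of
files, here is a rough sketch in Python.  It is not part of htdig or xpdf.
The function names, the printable-ASCII heuristic and the 90% threshold are
all assumptions made up for illustration, and it expects pdftotext to be on
your PATH.  It relies only on pdftotext writing to stdout when the output
file is given as "-".

```python
import subprocess
import sys


def printable_ratio(data: bytes) -> float:
    """Fraction of bytes that are printable ASCII or common whitespace.

    Hypothetical heuristic: tab, LF, FF and CR count as text, since
    pdftotext emits form feeds between pages.
    """
    if not data:
        return 0.0
    ok = sum(1 for b in data if 32 <= b < 127 or b in (9, 10, 12, 13))
    return ok / len(data)


def extract_text(pdf_path: str) -> bytes:
    """Run `pdftotext <pdf> -` and return whatever it wrote to stdout."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        check=True,  # raise if pdftotext exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    # Usage (script name is hypothetical): python check_pdfs.py a.pdf b.pdf
    for path in sys.argv[1:]:
        ratio = printable_ratio(extract_text(path))
        # 0.9 is an arbitrary cut-off chosen for this sketch.
        verdict = "looks like text" if ratio >= 0.9 else "looks garbled"
        print(f"{path}: {ratio:.0%} printable, {verdict}")
```

If a file scores well here but its summary is still garbled in the search
results, that points away from pdftotext and toward the htdig side of the
setup.  Note that legitimate accented or other non-ASCII text will lower the
score, so treat a low number as a hint rather than proof.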

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
