[htdig-dev] trouble indexing pdf - strange characters

Diane Tulpan Fri, 29 Aug 2003 14:02:31 +0000

Hello,

I'm having trouble indexing PDF files.

Following the FAQ, I installed the file "doc2html.pl" in a directory and added this line to the htdig.conf file:
external_parsers: application/pdf->text/html ../bin/doc2html.pl

When I search for a PDF document, the results contain strange characters like this:

[file.pdf]
%PDF-1.4 %�� 61 0 obj << /Linearized 1 /O 65 /H [ 2959 395 ] /L 39883 /E 14087 /N 9 /T 38545 >> endobj xref 61 111 0000000016 00000 n 0000002569 00000 n 0000002773 00000 n 0000002908 00000 n 0000003354 00000 n 0000003714 00000 n 0000003794 00000 n 0000003893 00000 n 0000004127 00000 n ..
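
The excerpt above begins with the `%PDF-` file header, which suggests that the raw file content was indexed rather than converted text. A minimal sketch in Python for flagging such excerpts; the helper name `looks_like_raw_pdf` is hypothetical and is not part of ht://Dig or doc2html.pl:

```python
import re

# Hypothetical helper for illustration only; not an ht://Dig or doc2html.pl API.
# A PDF file starts with a header of the form "%PDF-<major>.<minor>".
PDF_HEADER = re.compile(r"^\s*%PDF-\d\.\d")


def looks_like_raw_pdf(excerpt: str) -> bool:
    """Return True if the excerpt starts with a PDF header such as %PDF-1.4."""
    return bool(PDF_HEADER.match(excerpt))


if __name__ == "__main__":
    # First sample is shortened from the search result quoted above.
    samples = [
        "%PDF-1.4 61 0 obj << /Linearized 1 /O 65 /H [ 2959 395 ]",
        "Plain text extracted from a document",
    ]
    for sample in samples:
        print(looks_like_raw_pdf(sample), "-", sample[:30])
```

An excerpt that matches would point to the external parser not having been applied to that document, though this check alone cannot say why.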

Do you have any idea what I can do to resolve this?

For information, the website runs on Apache.

Thank you in advance for any help.

Diane Tulpan,

Developer
