Hi Gilles,
El Perfecto:
Thank you very much for your giant leap for PDF kind;)
I applied your second patch to parse_doc.pl and Derek's fix to
xpdf/TextOutputDev.cc; now all the PDF files in my search path are indexed
through the external_parsers directive in the config file:
external_parsers: application/msword /usr/local/bin/parse_doc.pl \
                  application/postscript /usr/local/bin/parse_doc.pl \
                  application/pdf /usr/local/bin/parse_doc.pl
-.0001:
One crappy PDF file creates a score of errors during the dig:
External parser error in line:w^@(Garbage)*
It also appears in the search results as:
Word Document prereg.pdf
instead of
PDF Document prereg.pdf
The file is:
http://www.ccsf.cc.ca.us/Resources/Title3/training/prereg.pdf
It can be searched with:
http://www.ccsf.cc.ca.us/cgi-bin/htsearch?config=htdig&restrict=\
&exclude=&words=pre-registration+form&method=and&format=builtin-short
No other word in that file gives a search result, so I guess the error
happened near the top of the file, right after the line "Pre-Registration Form".
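
A minimal sketch for tracking down the damaged records, assuming the parser
output has been saved to a file and that each record is a tab-separated line
whose first field is a type letter, with word records laid out as w, word,
location, heading. The function name and the control-character rule are
illustrative only; they are not part of ht://Dig or parse_doc.pl:

```python
import re

# Control characters other than tab. NUL (shown as ^@ in the error) is one of them.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0a-\x1f\x7f]")


def find_bad_word_records(lines):
    """Return (line_number, record) pairs for 'w' records that look damaged."""
    bad = []
    for number, line in enumerate(lines, start=1):
        record = line.rstrip("\n")
        if not record.startswith("w"):
            continue  # only word records are checked in this sketch
        fields = record.split("\t")
        # Assumed layout: type letter, word, location, heading level.
        if len(fields) != 4 or CONTROL_CHARS.search(record):
            bad.append((number, record))
    return bad
```

Run over the output saved for prereg.pdf, this should point at the first word
record that contains garbage, which would help confirm whether the trouble
starts right after the title line.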
P.S. I couldn't correspond during the week because I had a hectic one;
come to think of it, I have one every week;)
Best regards,
Joe