According to Stefan Nehlsen:
> maybe you will find this useful.
>
> This perl script may be used by htdig as external parser for pdf-files.
Thanks for the contribution. It certainly looks like an improvement
over parse_doc.pl, so I've put it up on
ftp://ftp.htdig.org/pub/htdig/contrib/parsers/
However, since 3.1.4 was released, external parsers generally aren't
recommended, because external converters do a better job. See
ftp://ftp.htdig.org/pub/htdig/contrib/parsers/doc2html.tar.gz
for the latest and fanciest incarnation of these.

The big problem with external parsers is that they don't parse words
the same way the internal parsers do, and they don't respond to changes
in the config file. E.g., if you drop minimum_word_length from 3 to
2, you still won't get 2-letter words from this external parser because
of the 3 hardcoded in the script. It also won't look at valid_punctuation,
extra_word_characters, or any other attribute that controls parsing.
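
To make the difference concrete, here is a rough sketch in Python (not
the contributed Perl script, and not htdig's actual word-parsing code).
The function names are made up for illustration, and the handling of
valid_punctuation (characters dropped from inside a word) and
extra_word_characters (extra characters allowed in a word) is a
simplified assumption about what those attributes do.

```python
import re


def hardcoded_words(text):
    """Hypothetical external parser: the minimum length of 3 is baked in."""
    return [w.lower() for w in re.findall(r"[A-Za-z0-9]+", text) if len(w) >= 3]


def configured_words(text, config):
    """Hypothetical parser that reads the same attributes as the config file.

    Simplified assumptions, not htdig's real algorithm:
      - valid_punctuation characters may appear inside a word and are removed
      - extra_word_characters are kept as part of the word
    """
    min_len = int(config.get("minimum_word_length", 3))
    punct = config.get("valid_punctuation", "")
    extra = config.get("extra_word_characters", "")

    # Build a character class of everything that may appear inside a word.
    word_chars = "A-Za-z0-9" + re.escape(punct + extra)
    pattern = re.compile("[" + word_chars + "]+")

    words = []
    for token in pattern.findall(text):
        # Drop the punctuation characters, keep everything else.
        cleaned = "".join(ch for ch in token if ch not in punct).lower()
        if len(cleaned) >= min_len:
            words.append(cleaned)
    return words


if __name__ == "__main__":
    sample = "Go to an I/O-bound job"
    print(hardcoded_words(sample))
    print(configured_words(sample, {"minimum_word_length": "2",
                                    "valid_punctuation": "-"}))
```

With the hardcoded version, lowering minimum_word_length in the config
changes nothing. The second version picks up 2-letter words as soon as
the setting changes, which is roughly what you get for free when a
converter hands its output to the internal parsers.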
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930