Greg Lepore wrote:
>
> Hyphenation example:
> On our site we are doing large scale conversion of previously published
> material to html via OCR. As we are reproducing format as well as text,
> this results in many hyphenations. For a page with several examples:
> http://mdsa.net/megafile/msa/speccol/sc2900/sc2908/000001/000138/html/am138--606.html
> The hyphens appear as regular (-). No special characters are inserted by
> the OCR programs.
For these documents, I would suggest post-processing the HTML
output of the OCR software with a simple filter program that removes
each "-<BR>" sequence, so that the two halves of a hyphenated word are
rejoined. This could be done quite easily using a lex(1) scanner.
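
A minimal sketch of such a filter, written in Python rather than lex(1)
purely for illustration. It assumes the OCR output marks each line-end
hyphenation as a letter, a hyphen, and a <BR> tag directly after it; the
names HYPHEN_BREAK and dehyphenate are made up for this example.

```python
import re
import sys

# Assumption: a line-end hyphenation looks like "<letter>-<BR>", with the
# tag in any case, optionally self-closed, and optional whitespace after it.
HYPHEN_BREAK = re.compile(r"(?<=[A-Za-z])-<br\s*/?>\s*", re.IGNORECASE)


def dehyphenate(html: str) -> str:
    """Drop every "-<BR>" so the two halves of a split word are rejoined."""
    return HYPHEN_BREAK.sub("", html)


if __name__ == "__main__":
    # Works as a plain filter: read HTML on stdin, write the result to stdout.
    sys.stdout.write(dehyphenate(sys.stdin.read()))
```

Saved under a name of your choice, it could be run as an ordinary
stdin-to-stdout filter over each page before indexing. Note that a
genuine compound that happens to break at the end of a line (for
example "large-<BR>scale") would lose its hyphen as well, so the output
is worth spot-checking.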
cheers,
Torsten
--
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14 Tel: +49-4101-403605
D-25474 Ellerbek Fax: +49-4101-403606
E-Mail: [EMAIL PROTECTED] Internet: http://www.inwise.de
_______________________________________________
htdig-general mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/htdig-general