On Tue, Sep 16, 2008 at 12:29 PM, René Rebe <[EMAIL PROTECTED]> wrote:
> as I wrote earlier, we worked on creating searchable PDFs from Cuneiform
> (or other) OCR results.
>
> ExactImage 0.6(.0) now comes with a revamped PDF writer and hocr2pdf
> front-end. Together with a patch to Cuneiform that annotates each
> recognized glyph with a hOCR-like bounding box, it allows the creation
> of quite precisely positioned, searchable PDF files:

This is very cool. Great work.

> Cuneiform annotated HTML patch (includes the already committed <>& fix),
> which is not yet conditional. For merging, it should probably only
> output the additional formatting based on some additional command line
> switch, e.g. --hocr instead of --html, but that probably requires
> changing some 20+ files to pass the information down to the point where
> the HTML is written:

I'll look into integrating this. Getting the hOCR/HTML switch should be
quite straightforward, since PUMA_TOHTML and ROUT_FMT_HTML are only used
in six different source files altogether.

> Have fun, patches and inspiration welcome,

I see that you added line feeds after HTML tags to make the output easier
to read. There is a preprocessor macro NEW_LINE for this. Yes, it is
slightly brain-dead, but it should probably be used for consistency.

_______________________________________________
Mailing list: https://launchpad.net/~cuneiform
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~cuneiform
More help   : https://help.launchpad.net/ListHelp
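To illustrate the idea behind the glyph-annotation patch and the proposed --hocr/--html switch, here is a minimal C sketch. It assumes a hypothetical Glyph record and a hypothetical format_line() helper; neither is Cuneiform's actual code. With the hocr flag set, each glyph is wrapped in a span whose title carries an hOCR-style "bbox x0 y0 x1 y1" property; without it, only the escaped text is written, as ordinary HTML output would be. The literal "\n" merely stands in for the NEW_LINE macro, whose definition is not shown in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical glyph record (not Cuneiform's real structure): one recognized
   character and its bounding box in image pixel coordinates. */
typedef struct {
    char ch;
    int x0, y0, x1, y1; /* left, top, right, bottom */
} Glyph;

/* Append src to buf (capacity cap, current length len), truncating safely
   so the result stays NUL-terminated; returns the new length. */
static size_t append(char *buf, size_t cap, size_t len, const char *src) {
    size_t n = strlen(src);
    if (len + n >= cap)
        n = (len + 1 < cap) ? cap - len - 1 : 0;
    memcpy(buf + len, src, n);
    buf[len + n] = '\0';
    return len + n;
}

/* Escape the characters HTML reserves (<, >, &); other characters are
   copied into tmp and returned as a one-character string. */
static const char *escape(char ch, char tmp[2]) {
    switch (ch) {
    case '<': return "&lt;";
    case '>': return "&gt;";
    case '&': return "&amp;";
    default:  tmp[0] = ch; tmp[1] = '\0'; return tmp;
    }
}

/* Format one text line into buf. If hocr is nonzero, every glyph gets a
   span with an hOCR-style bbox title; otherwise only escaped text is
   written. Returns the length of the formatted line. */
static size_t format_line(char *buf, size_t cap, const Glyph *g, size_t n,
                          int hocr) {
    char tmp[2], box[80];
    size_t len = 0;
    assert(buf != NULL && cap > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (hocr) {
            snprintf(box, sizeof box, "<span title=\"bbox %d %d %d %d\">",
                     g[i].x0, g[i].y0, g[i].x1, g[i].y1);
            len = append(buf, cap, len, box);
        }
        len = append(buf, cap, len, escape(g[i].ch, tmp));
        if (hocr)
            len = append(buf, cap, len, "</span>");
    }
    /* "\n" here stands in for Cuneiform's NEW_LINE macro. */
    return append(buf, cap, len, "<br>\n");
}
```

In this sketch, a --hocr switch would only need to set the hocr flag at the point where HTML output is chosen; how that flag is passed down through Cuneiform's layers is left open. Real hOCR output would also carry class attributes such as ocr_line or ocrx_word, which are omitted here for brevity.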

