Re: [Cuneiform] Creating searchable PDF from Cuneiform OCR results

Jussi Pakkanen Wed, 17 Sep 2008 02:22:12 -0700

On Tue, Sep 16, 2008 at 12:29 PM, René Rebe <[EMAIL PROTECTED]> wrote:


> as I wrote earlier we worked on creating searchable PDFs from Cuneiform
> (or other) OCR results.
>
> ExactImage 0.6(.0) now comes with a revamped PDF writer and hocr2pdf
> front-end. Together with a patch to cuneiform that annotates each recognized
> glyph with a hOCR-like bounding box, this allows the creation of quite
> exactly positioned, searchable PDF files:

This is very cool. Great work.

> Cuneiform annotated HTML patch (includes the already committed <>& fix), which
> is not yet conditional. Before merging, it should probably output the
> additional formatting only when some extra command line switch is given,
> e.g. --hocr instead of --html or so, but that probably requires changing
> some 20+ files to pass the information down to the point where the HTML
> is written:

I'll look into integrating this. Adding the hOCR/HTML switch should
be quite straightforward, since PUMA_TOHTML and ROUT_FMT_HTML are only
used in six different source files altogether.
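
To make the idea of the switch concrete, here is a rough sketch in Python rather than in the Cuneiform sources. It only illustrates the behaviour being discussed: plain HTML by default, and a per-glyph bounding box when hOCR-style output is requested. The function names, the `hocr` parameter and the exact span markup are assumptions made for illustration and are not taken from the patch or from the Cuneiform code.

```python
from html import escape


def render_glyph(ch, bbox, hocr=False):
    """Return the HTML for one recognized glyph.

    bbox is (x0, y0, x1, y1) in pixels. It is only written out when
    hocr is True; otherwise the glyph is emitted as plain escaped text.
    The span layout below is an assumed, hOCR-like form.
    """
    text = escape(ch)  # escapes <, > and & (and quotes)
    if not hocr:
        return text
    x0, y0, x1, y1 = bbox
    return f'<span title="bbox {x0} {y0} {x1} {y1}">{text}</span>'


def render_line(glyphs, hocr=False):
    """Join a sequence of (character, bbox) pairs into one output line."""
    return "".join(render_glyph(ch, bbox, hocr) for ch, bbox in glyphs)
```

With a single flag passed down like this, the plain HTML output stays unchanged unless the hOCR-style mode is explicitly selected.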

> Have fun, patches and inspiration welcome,

I see that you added line feeds after HTML tags to make the output
easier to read. There is a preprocessor macro NEW_LINE for this. Yes,
it is slightly brain-dead but should probably be used for consistency.

_______________________________________________
Mailing list: https://launchpad.net/~cuneiform
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~cuneiform
More help   : https://help.launchpad.net/ListHelp
