Hi Frank,

It's not so easy, especially given the dependency on native libraries.
It also depends on "trained" profiles, languages, and fonts.

The questions are: which platforms do we want to support, and which
languages and fonts?

BR,
Oleg


On Tue, Dec 24, 2013 at 9:48 AM, frank (JIRA) <[email protected]> wrote:

>
>     [
> https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856214#comment-13856214]
>
> frank commented on TIKA-93:
> ---------------------------
>
> This feature is really useful and helpful.
>
> > OCR support
> > -----------
> >
> >                 Key: TIKA-93
> >                 URL: https://issues.apache.org/jira/browse/TIKA-93
> >             Project: Tika
> >          Issue Type: New Feature
> >          Components: parser
> >            Reporter: Jukka Zitting
> >            Priority: Minor
> >
> > I don't know of any decent open source pure Java OCR libraries, but
> > there are command line OCR tools like Tesseract
> > (http://code.google.com/p/tesseract-ocr/) that could be invoked by Tika
> > to extract text content (where available) from image files.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.1.5#6160)
>
