[
https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895205#comment-13895205
]
Grant Ingersoll commented on TIKA-93:
-------------------------------------
Chris, are Parsers composable? If it is a Parser, how do I make it work w/ all
the different MIME types that have images? (It's been a while since I've
contributed to Tika, so please bear with me.) Wouldn't we end up with one-off
code that essentially hacks OCR into each of the different parsers? I'm thinking
there must be some way to normalize/simplify it. I'll take a poke through the
Parsers at a deeper level. Maybe a Parser takes in an OCR Engine, whose
implementation wraps something like Tesseract or JavaOCR.
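
To make the "Parser takes in an OCR Engine" idea concrete, here is a rough
sketch. Everything in it is hypothetical: OcrEngine, SimpleParser and OcrParser
are illustrative names, and SimpleParser is a simplified stand-in rather than
Tika's actual Parser interface. The point is only that one parser could cover
all image MIME types and delegate recognition to whichever engine is plugged in.

```java
import java.util.Set;

public class OcrSketch {

    /** Hypothetical abstraction over an OCR backend (e.g. Tesseract or JavaOCR). */
    interface OcrEngine {
        String recognize(byte[] imageBytes) throws Exception;
    }

    /** Hypothetical, simplified stand-in for a parser; NOT Tika's real Parser API. */
    interface SimpleParser {
        Set<String> supportedTypes();
        String parse(byte[] content, String mimeType) throws Exception;
    }

    /** A single parser for all image types; recognition is delegated to the engine. */
    static class OcrParser implements SimpleParser {
        private final OcrEngine engine;

        OcrParser(OcrEngine engine) {
            this.engine = engine;
        }

        @Override
        public Set<String> supportedTypes() {
            // Assumed list of image types, for illustration only.
            return Set.of("image/png", "image/jpeg", "image/tiff");
        }

        @Override
        public String parse(byte[] content, String mimeType) throws Exception {
            if (!supportedTypes().contains(mimeType)) {
                throw new IllegalArgumentException("Unsupported type: " + mimeType);
            }
            return engine.recognize(content);
        }
    }

    public static void main(String[] args) throws Exception {
        // A fake engine stands in for a real backend so the sketch runs anywhere.
        OcrEngine fake = bytes -> "recognized " + bytes.length + " bytes";
        SimpleParser parser = new OcrParser(fake);
        System.out.println(parser.parse(new byte[] {1, 2, 3}, "image/png"));
    }
}
```

With this shape, swapping Tesseract for JavaOCR would mean supplying a different
OcrEngine, and parsers for container formats with embedded images could hand
those images to the same OcrParser instead of each carrying its own OCR code.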
> OCR support
> -----------
>
> Key: TIKA-93
> URL: https://issues.apache.org/jira/browse/TIKA-93
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Reporter: Jukka Zitting
> Priority: Minor
>
> I don't know of any decent open source pure Java OCR libraries, but there are
> command line OCR tools like Tesseract
> (http://code.google.com/p/tesseract-ocr/) that could be invoked by Tika to
> extract text content (where available) from image files.