[jira] [Commented] (TIKA-93) OCR support

Grant Ingersoll (JIRA) Fri, 07 Feb 2014 15:04:25 -0800

    [ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895205#comment-13895205 ]


Grant Ingersoll commented on TIKA-93:
-------------------------------------

Chris, are Parsers composable?  If OCR is implemented as a Parser, how do I 
make it work with all the different MIME types that can contain images?  (It's 
been a while since I've contributed to Tika, so please bear with me.)  Wouldn't 
we end up with one-off code that essentially hacks OCR into each of the various 
parsers?  I'm thinking there must be some way to normalize/simplify it.  I'll 
take a deeper look through the Parsers.  Maybe a Parser takes in an OCR Engine, 
which is an implementation backed by something like Tesseract or JavaOCR.
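
To make that last idea concrete, here is a minimal sketch of a parser that is 
handed an OCR engine.  Every name in it (OcrEngine, OcrImageParser, 
FakeOcrEngine) is hypothetical and none of it is existing Tika API; a real 
version would presumably implement Tika's Parser interface, and a real engine 
would call out to Tesseract or JavaOCR instead of the stand-in used here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical abstraction over an OCR backend (Tesseract, JavaOCR, ...).
interface OcrEngine {
    // Returns the text recognized in the given image bytes.
    String recognize(byte[] imageBytes);
}

// Hypothetical parser: owns the MIME-type check, delegates recognition
// to whichever engine it was constructed with.
final class OcrImageParser {
    private final OcrEngine engine;
    private final Set<String> supportedTypes;

    OcrImageParser(OcrEngine engine, Set<String> supportedTypes) {
        this.engine = engine;
        this.supportedTypes =
                Collections.unmodifiableSet(new HashSet<>(supportedTypes));
    }

    boolean supports(String mimeType) {
        return supportedTypes.contains(mimeType);
    }

    String parse(String mimeType, byte[] data) {
        if (!supports(mimeType)) {
            throw new IllegalArgumentException("Unsupported type: " + mimeType);
        }
        return engine.recognize(data);
    }
}

// Stand-in engine for illustration only: it does no OCR and simply
// decodes the bytes as UTF-8 text.
final class FakeOcrEngine implements OcrEngine {
    @Override
    public String recognize(byte[] imageBytes) {
        return new String(imageBytes, StandardCharsets.UTF_8);
    }
}

final class OcrSketchDemo {
    public static void main(String[] args) {
        Set<String> types = new HashSet<>(Arrays.asList("image/png", "image/tiff"));
        OcrImageParser parser = new OcrImageParser(new FakeOcrEngine(), types);
        byte[] data = "scanned text".getBytes(StandardCharsets.UTF_8);
        System.out.println(parser.parse("image/png", data));
    }
}
```

With this shape, swapping Tesseract for JavaOCR only means passing a different 
OcrEngine.  It does not by itself answer how parsers for container formats 
would hand their embedded images to such a parser, which is the composability 
question above.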

> OCR support
> -----------
>
>                 Key: TIKA-93
>                 URL: https://issues.apache.org/jira/browse/TIKA-93
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Jukka Zitting
>            Priority: Minor
>
> I don't know of any decent open source pure Java OCR libraries, but there are 
> command line OCR tools like Tesseract 
> (http://code.google.com/p/tesseract-ocr/) that could be invoked by Tika to 
> extract text content (where available) from image files.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
