[jira] [Commented] (TIKA-93) OCR support

Grant Ingersoll (JIRA) Fri, 07 Feb 2014 16:02:15 -0800

    [ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895276#comment-13895276 ]


Grant Ingersoll commented on TIKA-93:
-------------------------------------

Well, Tesseract is out, at least as far as using Tess4J goes, since Tess4J has 
LGPL and BCL dependencies. Ugh, especially since Tesseract itself is ASL. And 
Tesseract looked so promising, at least in my initial tests (compared to 
JavaOCR, which requires a bunch of training work up front).

> OCR support
> -----------
>
>                 Key: TIKA-93
>                 URL: https://issues.apache.org/jira/browse/TIKA-93
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Jukka Zitting
>            Priority: Minor
>
> I don't know of any decent open source pure Java OCR libraries, but there are 
> command line OCR tools like Tesseract 
> (http://code.google.com/p/tesseract-ocr/) that could be invoked by Tika to 
> extract text content (where available) from image files.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
