[jira] [Commented] (TIKA-93) OCR support

Grant Ingersoll (JIRA) Sat, 08 Feb 2014 03:15:44 -0800

    [ 
https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895514#comment-13895514
 ]


Grant Ingersoll commented on TIKA-93:
-------------------------------------

It can, via some ancient JavaIO code that, in some cases, pulls in weird 
dependencies.  I'm still working this out, but the way it is shaping up, 
everything will have to be very pluggable so those cases can be avoided.  If 
anyone is up for lobbying the Tess4J team to remove the GPL/LGPL/viral 
dependencies, we'd be in much better shape.
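
To make "pluggable" concrete, here is a rough sketch of one plug-in that 
shells out to the Tesseract command line rather than linking against Tess4J. 
Everything named here (OcrEngine, TesseractCliOcr, buildCommand) is 
hypothetical and not part of Tika; it only assumes a tesseract binary on the 
PATH that follows the usual "tesseract <image> <outputbase> [-l <lang>]" 
invocation and writes <outputbase>.txt.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical plug-in point (not a Tika interface): any OCR backend
// can sit behind it, so optional dependencies stay out of the core.
interface OcrEngine {
    String recognize(Path image) throws IOException, InterruptedException;
}

// Hypothetical backend that runs the external tesseract binary.
class TesseractCliOcr implements OcrEngine {
    private final String executable; // e.g. "tesseract" (assumed on PATH)
    private final String language;   // e.g. "eng"; null or empty = default

    TesseractCliOcr(String executable, String language) {
        this.executable = executable;
        this.language = language;
    }

    // Builds: <executable> <image> <outputBase> [-l <language>]
    static List<String> buildCommand(String executable, Path image,
                                     Path outputBase, String language) {
        List<String> cmd = new ArrayList<>();
        cmd.add(executable);
        cmd.add(image.toString());
        cmd.add(outputBase.toString());
        if (language != null && !language.isEmpty()) {
            cmd.add("-l");
            cmd.add(language);
        }
        return cmd;
    }

    @Override
    public String recognize(Path image)
            throws IOException, InterruptedException {
        Path dir = Files.createTempDirectory("ocr");
        Path outputBase = dir.resolve("out");
        Path textFile = dir.resolve("out.txt"); // tesseract appends ".txt"
        try {
            Process process = new ProcessBuilder(
                    buildCommand(executable, image, outputBase, language))
                    .inheritIO() // avoid blocking on unread output pipes
                    .start();
            int exit = process.waitFor();
            if (exit != 0) {
                throw new IOException("tesseract exited with code " + exit);
            }
            return new String(Files.readAllBytes(textFile),
                              StandardCharsets.UTF_8);
        } finally {
            // Best-effort cleanup of the single expected output file.
            Files.deleteIfExists(textFile);
            Files.deleteIfExists(dir);
        }
    }
}
```

A caller would only ever see OcrEngine, e.g. 
new TesseractCliOcr("tesseract", "eng").recognize(path), so a different 
backend could be swapped in without touching the calling code.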

> OCR support
> -----------
>
>                 Key: TIKA-93
>                 URL: https://issues.apache.org/jira/browse/TIKA-93
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Jukka Zitting
>            Priority: Minor
>
> I don't know of any decent open source pure Java OCR libraries, but there are 
> command line OCR tools like Tesseract 
> (http://code.google.com/p/tesseract-ocr/) that could be invoked by Tika to 
> extract text content (where available) from image files.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)