[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895514#comment-13895514 ]
Grant Ingersoll commented on TIKA-93:
-------------------------------------
It can, via some ancient JavaIO stuff which, in some cases, has some weird
dependencies. I'm still working this out, but the way it is shaping up, it
will all have to be very pluggable to avoid any of these cases. If anyone is
up for lobbying the Tess4J team to remove its GPL/LGPL/viral dependencies,
we'd be in much better shape.
> OCR support
> -----------
>
> Key: TIKA-93
> URL: https://issues.apache.org/jira/browse/TIKA-93
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Reporter: Jukka Zitting
> Priority: Minor
>
> I don't know of any decent open source pure Java OCR libraries, but there are
> command line OCR tools like Tesseract
> (http://code.google.com/p/tesseract-ocr/) that could be invoked by Tika to
> extract text content (where available) from image files.