[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895276#comment-13895276 ]
Grant Ingersoll commented on TIKA-93:
-------------------------------------

Well, Tesseract is out, at least as far as using Tess4J goes, since Tess4J has LGPL and BCL dependencies. Ugh, especially since Tesseract itself is ASL. And Tesseract looks so promising here, at least in my initial tests (compared to JavaOCR, which requires a bunch of training work up front).

> OCR support
> -----------
>
>                 Key: TIKA-93
>                 URL: https://issues.apache.org/jira/browse/TIKA-93
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Jukka Zitting
>            Priority: Minor
>
> I don't know of any decent open source pure Java OCR libraries, but there are
> command line OCR tools like Tesseract
> (http://code.google.com/p/tesseract-ocr/) that could be invoked by Tika to
> extract text content (where available) from image files.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)