FW: [jira] [Commented] (TIKA-93) OCR support

Mattmann, Chris A (3980) Fri, 19 Sep 2014 09:05:42 -0700

I wanted to personally thank Grant for pushing this and getting
the initial code and idea started. Thank you, Grant, you da man.


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: "Grant Ingersoll (JIRA)" <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, September 19, 2014 7:34 AM
To: "[email protected]" <[email protected]>
Subject: [jira] [Commented] (TIKA-93) OCR support

>
>    [ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140649#comment-14140649 ]
>
>Grant Ingersoll commented on TIKA-93:
>-------------------------------------
>
>Very cool!  Thanks for following through on this!
>
>> OCR support
>> -----------
>>
>>                 Key: TIKA-93
>>                 URL: https://issues.apache.org/jira/browse/TIKA-93
>>             Project: Tika
>>          Issue Type: New Feature
>>          Components: parser
>>            Reporter: Jukka Zitting
>>            Assignee: Chris A. Mattmann
>>            Priority: Minor
>>             Fix For: 1.7
>>
>>         Attachments: Petr_tika-config.xml, TIKA-93.patch,
>>TIKA-93.patch, TIKA-93.patch, TIKA-93.patch, TesseractOCRParser.patch,
>>TesseractOCRParser.patch, TesseractOCR_Tyler.patch,
>>TesseractOCR_Tyler_v2.patch, TesseractOCR_Tyler_v3.patch,
>>TesseractOCR_Tyler_v4.patch, testOCR.docx, testOCR.pdf, testOCR.pptx
>>
>>
>> I don't know of any decent open source pure Java OCR libraries, but
>>there are command line OCR tools like Tesseract
>>(http://code.google.com/p/tesseract-ocr/) that could be invoked by Tika
>>to extract text content (where available) from image files.
>
>
>
>--
>This message was sent by Atlassian JIRA
>(v6.3.4#6332)
