Yep, you probably need to train Tesseract. See this link:

https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3


Once Tesseract is trained, e.g., on the type of handwritten note
you are dealing with, it will perform better when called through
Tika.
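
A rough sketch of how you might check a trained model directly with
the Tesseract command line before going through Tika. The language
name "hw" is made up for illustration (it stands for whatever you name
your trained data, e.g. hw.traineddata), and the file names are
placeholders. It also assumes the tesseract binary is on your PATH.

```python
import subprocess


def tesseract_command(image_path, output_base, lang="hw"):
    """Build the argument list for a Tesseract CLI call.

    "hw" is a hypothetical name for a custom-trained language; replace it
    with the name of your own traineddata file (without the extension).
    """
    return ["tesseract", image_path, output_base, "-l", lang]


def run_ocr(image_path, output_base, lang="hw"):
    """Run Tesseract; it writes the recognized text to <output_base>.txt."""
    subprocess.run(tesseract_command(image_path, output_base, lang), check=True)


# Show the command without running it (placeholder file names).
print(" ".join(tesseract_command("note.tif", "note")))
```

If the output looks reasonable there, the same language name would
presumably need to be set in Tika's Tesseract configuration (its
TesseractOCRConfig has a language setting) so that Tika picks up the
trained model.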

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++




-----Original Message-----
From: <Hari>, Sekhar <[email protected]>
Date: Tuesday, June 9, 2015 at 5:26 AM
To: jpluser <[email protected]>
Cc: "[email protected]" <[email protected]>,
"[email protected]" <[email protected]>, "[email protected]"
<[email protected]>
Subject: RE: Integration of Tika with cTAKES

>Hello Chris - 
>
>I tried the methods mentioned in the link you shared. It has an OCR
>feature, but I was unable to configure it to read a handwritten note. The
>software could not recognize anything handwritten, but it accurately
>recognized everything that was machine printed.
>
>Any idea how to train Tika so it can read and convert handwritten
>documents?
>
>Thanks,
>Sekhar H.
>
>-----Original Message-----
>From: Mattmann, Chris A (3980) [mailto:[email protected]]
>Sent: Monday, June 08, 2015 11:20 AM
>To: [email protected]; [email protected]
>Subject: Re: Integration of Tika with cTAKES
>
>Hi Sekhar,
>
>[BCC to [email protected] to keep them in the loop]
>
>Sure, you can do this with Tika and Tesseract. FYI:
>
>http://wiki.apache.org/tika/TikaOCR/
>
>Enjoy! :)
>
>(pro tip: then check out: http://wiki.apache.org/tika/cTAKESParser
>to see how to run cTAKES on the result with Tika)
>
>Cheers,
>Chris
>
>-----Original Message-----
>From: <Hari>, Sekhar <[email protected]>
>Reply-To: "[email protected]" <[email protected]>
>Date: Sunday, June 7, 2015 at 10:27 PM
>To: "[email protected]" <[email protected]>,
>"[email protected]" <[email protected]>
>Subject: RE: Integration of Tika with cTAKES
>
>>Hello Pei, all -
>>
>>I am looking to convert handwritten image documents (e.g., a physician's
>>handwritten medical prescription) into a text file. The image
>>documents can be in PDF, TIFF, GIF, etc. formats. Can Tika or
>>Tesseract do this? Can anybody share their experience with this? Also,
>>if it is possible to do with Tika, could you send me a step-by-step
>>guide?
>>
>>Many thanks,
>>Sekhar H.
>>
>>-----Original Message-----
>>From: Chen, Pei [mailto:[email protected]]
>>Sent: Sunday, June 07, 2015 10:34 PM
>>To: <[email protected]>
>>Subject: Re: Integration of Tika with cTAKES
>>
>>This looks awesome.
>>Perhaps we can reuse the Tika server on the ctakes demo VM.
>>
>>Sent from my iPhone
>>
>>> On Jun 6, 2015, at 8:40 PM, jay vyas <[email protected]> wrote:
>>> 
>>> This is awesome; thanks!
>>> 
>>> For some of the new ctakes projects where folks are aiming at
>>> using it with big data tooling, the Tika abstraction might be super
>>> useful.
>>> On Jun 6, 2015 8:19 PM, "Mattmann, Chris A (3980)" <
>>> [email protected]> wrote:
>>> 
>>>> Hey cTAKES peeps!
>>>> 
>>>> We went ahead and integrated Tika with cTAKES for a project I'm
>>>> working on at JPL. It will be part of the 1.9 release of Tika. You
>>>> can check it out here:
>>>> 
>>>> https://wiki.apache.org/tika/cTAKESParser
>>>> 
>>>> 
>>>> Feedback welcomed. cTAKES is rad!
>>>> 
>>>> Cheers,
>>>> Chris
>>>> 
>
