Yep, you probably need to train Tesseract. See this link: https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
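
A quick way to sanity-check a trained model before going through Tika is to call the tesseract command line directly. The sketch below is only an illustration under assumptions: `hand` is a hypothetical name for the trained language (it would need a matching `hand.traineddata` file in Tesseract's tessdata directory), `note.png` is a placeholder file, and the helper function names are made up for this example.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for `tesseract <image> <outputbase> -l <lang>`."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_image(image_path, output_base, lang="eng"):
    """Run Tesseract and return the recognized text, or None if it is not installed."""
    if shutil.which("tesseract") is None:
        return None
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    # Tesseract writes its plain-text result to <outputbase>.txt
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    # "hand" is a hypothetical trained language; "note.png" is a placeholder file.
    print(build_tesseract_command("note.png", "note", lang="hand"))
```

If the text that comes back looks reasonable, the same language name is presumably what Tika's Tesseract configuration should be pointed at; I believe its `TesseractOCRConfig` class has a language setting, but treat that as something to verify rather than a given.
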
Once Tesseract is trained, e.g., on the type of handwritten note you are dealing with, it will perform better when called through Tika.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: <Hari>, Sekhar <[email protected]>
Date: Tuesday, June 9, 2015 at 5:26 AM
To: jpluser <[email protected]>
Cc: "[email protected]" <[email protected]>, "[email protected]" <[email protected]>, "[email protected]" <[email protected]>
Subject: RE: Integration of Tika with cTAKES

>Hello Chris -
>
>I tried the methods mentioned in the link you shared. That has OCR
>feature; but I was unable to configure it to read a handwritten note. The
>software was just not able to recognize anything handwritten; but it was
>able to recognize everything accurately that are machine printed.
>
>Any idea how to train Tika so it can read and convert handwritten
>documents?
>
>Thanks,
>Sekhar H.
>
>-----Original Message-----
>From: Mattmann, Chris A (3980) [mailto:[email protected]]
>Sent: Monday, June 08, 2015 11:20 AM
>To: [email protected]; [email protected]
>Subject: Re: Integration of Tika with cTAKES
>
>Hi Sekhar,
>
>[BCC to [email protected] to keep them in the loop]
>
>Sure, you can do this with Tika and Tesseract. FYI:
>
>http://wiki.apache.org/tika/TikaOCR/
>
>Enjoy!
:)
>
>(pro tip: then check out: http://wiki.apache.org/tika/cTAKESParser
>to see how to run cTAKES on the result with Tika)
>
>Cheers,
>Chris
>
>-----Original Message-----
>From: <Hari>, Sekhar <[email protected]>
>Reply-To: "[email protected]" <[email protected]>
>Date: Sunday, June 7, 2015 at 10:27 PM
>To: "[email protected]" <[email protected]>,
>"[email protected]" <[email protected]>
>Subject: RE: Integration of Tika with cTAKES
>
>>Hello Pei, all -
>>
>>I am looking to convert handwritten image documents (Ex: a physician's
>>handwritten medical prescription) into a text format file. The image
>>documents can be in PDF, TIFF, GIF, etc. formats. Can Tika or
>>Tesseract do this? Can anybody share their experience about this? Also,
>>if it is possible to do with Tika, request you to send me a step-by-step
>>guide.
>>
>>Many thanks,
>>Sekhar H.
>>
>>-----Original Message-----
>>From: Chen, Pei [mailto:[email protected]]
>>Sent: Sunday, June 07, 2015 10:34 PM
>>To: <[email protected]>
>>Subject: Re: Integration of Tika with cTAKES
>>
>>This looks awesome.
>>Perhaps we can reuse the Tika server on the ctakes demo VM.
>>
>>Sent from my iPhone
>>
>>> On Jun 6, 2015, at 8:40 PM, jay vyas <[email protected]>
>>>wrote:
>>>
>>> This is awesome; thanks!
>>>
>>> For some of the new ctakes projects where folks are aiming at
>>> using it with big data tooling, the Tika abstraction might be super
>>>useful.
>>> On Jun 6, 2015 8:19 PM, "Mattmann, Chris A (3980)" <
>>> [email protected]> wrote:
>>>
>>>> Hey cTAKES peeps!
>>>>
>>>> We went ahead and integrated Tika with cTAKES for a project I'm
>>>> working on at JPL. It will be part of the 1.9 release of Tika. You
>>>> can check it out here:
>>>>
>>>> https://wiki.apache.org/tika/cTAKESParser
>>>>
>>>> Feedback welcomed. cTAKES is rad!
>>>>
>>>> Cheers,
>>>> Chris
