Hi Oleg,

Thanks for the FYI and the heads up on what needs to improve here.

Cheers,
Chris

On Nov 29, 2011, at 11:10 PM, Oleg Tikhonov wrote:

> Hi Chris,
> I was playing with it recently.
> One of the big issues with Tesseract is the tough process of preparing a
> training set for multiple fonts and languages.
> In addition, we also have to add an option for image preprocessing (skewing
> + filtering etc).
>
> BR,
> Oleg
>
> On Wed, Nov 30, 2011 at 8:59 AM, Mattmann, Chris A (388J) <
> chris.a.mattm...@jpl.nasa.gov> wrote:
>
>> Hey Guys,
>>
>> FYI: http://code.google.com/p/tesseract-ocr/
>>
>> I was pointed at this library by someone recently asking me if Tika
>> was interested in integrating with this library. It's ALv2 licensed, and
>> seems pretty interesting. I'm going to check it out, but just
>> wanted to give everyone a heads up.
>>
>> Cheers,
>> Chris
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattm...@nasa.gov
>> WWW: http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++