Is that a typo in your path to tesseract? /urs/bin/tesseract => /usr/bin/tesseract
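Beyond the spelling, PATH entries are directories rather than files, so the entry to add would normally be /usr/bin, not /usr/bin/tesseract. The printenv output quoted below also shows the value starting with a literal "PATH=", which would make the first entry invalid. Here is a minimal sketch for checking this, assuming a POSIX shell. `find_on_path` is a hypothetical helper written for this illustration; it is not part of Tika or Tesseract.

```shell
#!/bin/sh
# find_on_path NAME PATH_VALUE
# Hypothetical helper: prints the first "$dir/NAME" that is an executable
# file inside a directory listed in PATH_VALUE. Returns 1 if none is found.
find_on_path() {
    name=$1
    path_value=$2
    old_ifs=$IFS
    IFS=:
    # Unquoted on purpose so the value is split on ':' (entries containing
    # glob characters are not handled by this sketch).
    for dir in $path_value; do
        IFS=$old_ifs
        # Each entry must be a directory; an entry naming the binary
        # itself (e.g. /usr/bin/tesseract) never matches.
        if [ -d "$dir" ] && [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Usage: check the current environment.
find_on_path tesseract "$PATH" || echo "tesseract not found on PATH"
```

If this prints nothing but the "not found" message, the JVM that runs the Tika tests will most likely not find the binary either, since it inherits the same PATH.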
--Paul

> On Sep 30, 2014, at 1:48 PM, "kevin slote" <kslo...@gmail.com> wrote:
>
> Unfortunately, that did not do it either.
>
> I did:
>
> $ export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/urs/bin/tesseract
>
> Here is the output from printenv
>
> kslote@ubuntu:~/tika/tika$ printenv
> SHELL=/bin/bash
> USERNAME=kslote
> XDG_CONFIG_DIRS=/etc/xdg/xdg-gnome:/etc/xdg
> DESKTOP_SESSION=gnome
> PATH=PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/urs/bin/tesseract
> PWD=/home/kslote/tika/tika
> HOME=/home/kslote
> LOGNAME=kslote
> _=/usr/bin/printenv
>
> On Tue, Sep 30, 2014 at 4:13 PM, Tyler Palsulich <tpalsul...@gmail.com> wrote:
>
>> Hi,
>>
>> Hmm. Could you try adding tesseract to your PATH? How did you install
>> Tesseract? You should be able to do a straightforward
>> `sudo apt-get install tesseract-ocr`. After that, the OCR tests should
>> pass. We're still running into TIKA-1422, where a mail test fails. But,
>> you can run just the OCR tests with
>> `mvn test -Dtest=org.apache.tika.parser.ocr.TesseractOCRTest -DfailIfNoTests=false`.
>>
>> Let me know if that works for you!
>> Tyler
>>
>>> On Tue, Sep 30, 2014 at 4:00 PM, kevin slote <kslo...@gmail.com> wrote:
>>>
>>> I am working on ubuntu 10.4. and I am having some trouble.
>>> Tesseract is installed correctly, but just doing a clone from the repo and
>>> installing with maven, I am getting some errors.
>>>
>>> This is before I did anything with tesseract installed.
>>>
>>> Failed tests: testPPTXOCR(org.apache.tika.parser.ocr.TesseractOCRTest):
>>> Check for the image's text.
>>> testDOCXOCR(org.apache.tika.parser.ocr.TesseractOCRTest)
>>> testPDFOCR(org.apache.tika.parser.ocr.TesseractOCRTest)
>>>
>>> Next I hard coded the tesseractPath:
>>>
>>> I went into the TesseractOCRConfig.java and hard coded 'tesseractPath.'
>>> Then all tests passed and it built successfully, but then I went to post
>>> some TIFFs to the server.
>>> That didn't work. So I tried adding some System.out.println("hello world")
>>> (a little crude I know) inside the unit tests to confirm that tesseract
>>> was working correctly. It looks like something happens in the unit test in
>>> TesseractOCRTest.java
>>> on the line that says TesseractOCRConfig config = new TesseractOCRConfig();.
>>> Printing to stdout before works, but I get nothing
>>> after. That happens before the assumeTrue(canRun(config));. So an exception
>>> is not getting raised.
>>>
>>> Then once everything is built, ocr does not work. That was why I figured I
>>> would ask to see if I missed some sort of configuration step in building
>>> it.
>>>
>>> Thanks a ton.
>>>
>>> On Tue, Sep 30, 2014 at 2:57 PM, Mattmann, Chris A (3980) <
>>> chris.a.mattm...@jpl.nasa.gov> wrote:
>>>
>>>> Dear Kevin,
>>>>
>>>> Sure, it already works :) 1.7-SNAPSHOT.
>>>>
>>>> See this wiki page:
>>>>
>>>> https://wiki.apache.org/tika/TikaOCR
>>>>
>>>> I'd be happy to discuss more.
>>>>
>>>> Thanks!
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Chris Mattmann, Ph.D.
>>>> Chief Architect
>>>> Instrument Software and Science Data Systems Section (398)
>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>> Office: 168-519, Mailstop: 168-527
>>>> Email: chris.a.mattm...@nasa.gov
>>>> WWW: http://sunset.usc.edu/~mattmann/
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Adjunct Associate Professor, Computer Science Department
>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>
>>>> -----Original Message-----
>>>> From: kevin slote <kslo...@gmail.com>
>>>> Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
>>>> Date: Tuesday, September 30, 2014 at 8:52 AM
>>>> To: "dev@tika.apache.org" <dev@tika.apache.org>
>>>> Subject: OCR with tika-server
>>>>
>>>>> Hello all,
>>>>>
>>>>> I have been testing out the integration of tika with tesseract.
>>>>> I was wondering if there is a way to get tika-server to run with
>>>>> tesseract's OCR capabilities?
>>>>>
>>>>> Best
>>>>>
>>>>> Kevin Slote