Re: OCR with tika-server

Ramirez, Paul M (398J) Tue, 30 Sep 2014 16:29:57 -0700

Is that a typo in your path to tesseract?

/urs/bin/tesseract => /usr/bin/tesseract
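
Note that PATH entries are directories, not executables, so if the binary really is at /usr/bin/tesseract, the existing /usr/bin entry already covers it. Below is a minimal sketch for spotting bad entries, assuming a POSIX shell; check_path_entries is a hypothetical helper name used only for illustration, not something that ships with Tika or Tesseract.

```shell
#!/bin/sh
# Hypothetical helper: print every entry of a colon-separated
# PATH-style string that is not an existing directory.
check_path_entries() {
    old_ifs=$IFS
    IFS=:
    # Unquoted on purpose so the shell splits $1 on ':'
    for entry in $1; do
        if [ ! -d "$entry" ]; then
            echo "not a directory: $entry"
        fi
    done
    IFS=$old_ifs
}

# Check the PATH of the current shell.
check_path_entries "$PATH"

# command -v prints the resolved location if the shell can find tesseract.
command -v tesseract || echo "tesseract not found on PATH"
```

With the PATH from the message below, this should flag /urs/bin/tesseract, since that entry is not a directory.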


--Paul

> On Sep 30, 2014, at 1:48 PM, "kevin slote" <kslo...@gmail.com> wrote:
> 
> Unfortunately, that did not do it either.
> 
> I did:
> 
>   $ export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/urs/bin/tesseract
> 
> Here is the output from printenv
> 
> kslote@ubuntu:~/tika/tika$ printenv
> SHELL=/bin/bash
> USERNAME=kslote
> XDG_CONFIG_DIRS=/etc/xdg/xdg-gnome:/etc/xdg
> DESKTOP_SESSION=gnome
> PATH=PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/urs/bin/tesseract
> PWD=/home/kslote/tika/tika
> HOME=/home/kslote
> LOGNAME=kslote
> _=/usr/bin/printenv
> 
> 
> On Tue, Sep 30, 2014 at 4:13 PM, Tyler Palsulich <tpalsul...@gmail.com>
> wrote:
> 
>> Hi,
>> 
>> Hmm. Could you try adding tesseract to your PATH? How did you install
>> Tesseract? You should be able to do a straightforward `sudo apt-get install
>> tesseract-ocr`. After that, the OCR tests should pass. We're still running
>> into TIKA-1422, where a mail test fails. But, you can run just the OCR
>> tests with `mvn test -Dtest=org.apache.tika.parser.ocr.TesseractOCRTest
>> -DfailIfNoTests=false`.
>> 
>> Let me know if that works for you!
>> Tyler
>> 
>>> On Tue, Sep 30, 2014 at 4:00 PM, kevin slote <kslo...@gmail.com> wrote:
>>> 
>>> I am working on Ubuntu 10.4 and I am having some trouble.
>>> Tesseract is installed correctly, but after just cloning the repo and
>>> installing with Maven, I am getting some errors.
>>> 
>>> This is before I did anything with tesseract installed.
>>> 
>>> Failed tests:   testPPTXOCR(org.apache.tika.parser.ocr.TesseractOCRTest):
>>> Check for the image's text.
>>>  testDOCXOCR(org.apache.tika.parser.ocr.TesseractOCRTest)
>>>  testPDFOCR(org.apache.tika.parser.ocr.TesseractOCRTest)
>>> 
>>> Next I hard coded the tesseractPath:
>>> 
>>> I went into TesseractOCRConfig.java and hard-coded 'tesseractPath'.
>>> Then all tests passed and it built successfully, but then I went to post
>>> some TIFFs to the server.
>>> That didn't work. So I tried adding some System.out.println("hello world")
>>> (a little crude, I know) inside the unit tests to confirm that tesseract
>>> was working correctly.  It looks like something happens in the unit test in
>>> TesseractOCRTest.java
>>> on the line that says TesseractOCRConfig config = new
>>> TesseractOCRConfig();. Printing to stdout before works, but I get nothing
>>> after. That happens before the assumeTrue(canRun(config));. So an exception
>>> is not getting raised.
>>> 
>>> Then once everything is built, OCR does not work.  That was why I figured I
>>> would ask to see if I missed some sort of configuration step in building
>>> it.
>>> 
>>> Thanks a ton.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Tue, Sep 30, 2014 at 2:57 PM, Mattmann, Chris A (3980) <
>>> chris.a.mattm...@jpl.nasa.gov> wrote:
>>> 
>>>> Dear Kevin,
>>>> 
>>>> Sure, it already works :) 1.7-SNAPSHOT.
>>>> 
>>>> See this wiki page:
>>>> 
>>>> https://wiki.apache.org/tika/TikaOCR
>>>> 
>>>> I'd be happy to discuss more.
>>>> 
>>>> Thanks!
>>>> 
>>>> Cheers,
>>>> Chris
>>>> 
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Chris Mattmann, Ph.D.
>>>> Chief Architect
>>>> Instrument Software and Science Data Systems Section (398)
>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>> Office: 168-519, Mailstop: 168-527
>>>> Email: chris.a.mattm...@nasa.gov
>>>> WWW:  http://sunset.usc.edu/~mattmann/
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Adjunct Associate Professor, Computer Science Department
>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: kevin slote <kslo...@gmail.com>
>>>> Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
>>>> Date: Tuesday, September 30, 2014 at 8:52 AM
>>>> To: "dev@tika.apache.org" <dev@tika.apache.org>
>>>> Subject: OCR with tika-server
>>>> 
>>>>> Hello all,
>>>>> 
>>>>> I have been testing out the integration of tika with tesseract.
>>>>> I was wondering if there is  a way to get tika-server to run with
>>>>> tesseract's OCR capabilities?
>>>>> 
>>>>> Best
>>>>> 
>>>>> Kevin Slote
>>
