Re: Tess4j API for TIKA OCR parser

Thamme Gowda Sun, 05 Mar 2017 11:07:49 -0800

Thejan,

Welcome to the world of mysteries. I can't explain why you are seeing this,
since I am unable to reproduce it.

Try a few other images. Maybe the image you have chosen is corrupt, or maybe
an exception is being thrown and silently swallowed in the code.
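
One way to check for a hidden failure outside Tika is to run the tesseract
command from a small Java program and print the exit code and output for
each image. This is only a rough sketch, assuming tesseract is on the PATH.
The class OcrCommandCheck and its run helper are made-up names for
illustration and are not part of Tika:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, not part of Tika: runs an external command and
// reports its exit code and output instead of hiding failures.
class OcrCommandCheck {

    /** Exit code plus combined stdout/stderr of one command run. */
    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    /** Runs the command; start-up failures are returned with exit code -1. */
    static Result run(List<String> command) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true) // merge stderr into stdout
                    .start();
            String output = new String(
                    process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            return new Result(process.waitFor(), output);
        } catch (IOException e) {
            // Usually means the executable was not found or could not be started.
            return new Result(-1, e.toString());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new Result(-1, e.toString());
        }
    }

    public static void main(String[] args) {
        // Pass several image paths as arguments, e.g. test.jpg other.png
        for (String image : args) {
            // Using "stdout" as the output base makes tesseract print the text.
            Result r = run(List.of("tesseract", image, "stdout"));
            System.out.println(image + " -> exit code " + r.exitCode);
            System.out.println(r.output);
        }
    }
}
```

A non-zero exit code or an error message for one image but not for the
others would point at that image rather than at Tika.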

I suggest you do this:
   Use an IDE like IntelliJ or Eclipse and step through TesseractOCRParser
with a debugger to follow the call stack. It is a nice way to get to know
the internals of Tika :-)

Best,
TG

*--*
*Thamme Gowda*
TG | @thammegowda <https://twitter.com/thammegowda>
~Sent via somebody's Webmail server!

On Sat, Mar 4, 2017 at 9:04 AM, Thejan Wijesinghe <
thejan.k.wijesin...@gmail.com> wrote:

>
> Hi Thamme,
>
> Yes. I am using Ubuntu :) and I had both ImageMagick and Tesseract
> installed on my system using apt-get. Since I wasn't sure whether this is
> a problem with the APT packages, I built both ImageMagick and
> Tesseract from source.
>
> I also double-checked the availability of Tesseract and ImageMagick by
> running the CLI commands that you suggested, as well as the commands below:
>
> convert test.jpg -resize 64x64 resized_test.jpg
>
> tesseract test.jpg out
>
> and they worked.
>
> I can't find the exact reason why I am not getting metadata, but when I use
> the AutoDetectParser class instead of the TesseractOCRParser class, I can
> extract both content and metadata.
>
> P.S. I will add updating the wiki OCR page to my TODO list :)
>
