Tess4j API for TIKA OCR parser

Thejan Wijesinghe Sat, 04 Mar 2017 09:05:20 -0800

Hi Thamme,

Yes, I am using Ubuntu :) and I had both ImageMagick and Tesseract
installed on my system using apt-get. Since I wasn't sure whether the
problem was with the APT packages, I also built both ImageMagick and
Tesseract from source.


I also double-checked the availability of Tesseract and ImageMagick by
running the CLI commands that you suggested, as well as the commands below:

convert test.jpg -resize 64x64 resized_test.jpg

tesseract test.jpg out

and they worked.
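
Since an apt-get install and a source build can coexist on the same machine, it may also be worth checking which binary is actually resolved on the PATH. Below is a minimal sketch, assuming a POSIX shell; the helper name check_tool is made up for illustration and is not part of Tika, Tesseract or ImageMagick.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command can be resolved on PATH,
# and if so, which path the shell would run.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
        return 0
    else
        echo "$1: NOT found on PATH"
        return 1
    fi
}

# Check the two tools used above; keep going even if one is missing.
for tool in convert tesseract; do
    check_tool "$tool" || true
done
```

Note that this only reflects the PATH of the shell it runs in; a Java process started from an IDE or a service may see a different PATH.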

I can't find an exact reason why I am not getting metadata, but when I use
the AutoDetectParser class instead of the TesseractOCRParser class, I can
extract both content and metadata.

P.S. I will add updating the wiki OCR page to my TODO list :)
