Thejan,

Welcome to the world of mysteries. I cannot explain why you are seeing this, since I am unable to reproduce it.
Try a few other images; maybe the image you have chosen is corrupt, or maybe an exception is being thrown and silently swallowed in the code.

I suggest you do this: use an IDE like IntelliJ or Eclipse and step through with a debugger to understand the call stack inside TesseractOCRParser. It is indeed a nice way to get to know the internals of Tika :-)

Best,
TG

*--*
*Thamme Gowda*
TG | @thammegowda <https://twitter.com/thammegowda>
~Sent via somebody's Webmail server!

On Sat, Mar 4, 2017 at 9:04 AM, Thejan Wijesinghe <thejan.k.wijesin...@gmail.com> wrote:

> Hi Thamme,
>
> Yes. I am using Ubuntu :) and I had ImageMagick and Tesseract both
> installed in my system using apt-get. Since, I wasn't sure whether this is
> a problem with the APT software packages, I built both ImageMagick and
> Tesseract from sources.
>
> I also double checked the availability of Tesseract and ImageMagick by
> typing CLI commands that you suggested and the below commands as well,
>
> convert test.jpg -resize 64x64 resized_test.jpg
>
> tesseract test.jpg out
>
> and they worked.
>
> I can't find a exact reason why I am not getting metadata but when I used
> the AutoDetectParser class instead of the TesseractOCRParser class, I can
> extract both content and metadata.
>
> p.s. I will put updating the wiki OCR page in my TODO list :)
>