[PR] Added ImageMagick to support OCR image preprocessing [tika-docker]

via GitHub Fri, 30 May 2025 17:25:01 -0700


Flushot opened a new pull request, #27:
URL: https://github.com/apache/tika-docker/pull/27


   The Tesseract OCR image preprocessor is broken in the current Docker image 
because ImageMagick is not installed.
   
   You can reproduce this by setting `enableImagePreprocessing` to `true` in 
tika-config.xml:
   ```xml
     <?xml version="1.0" encoding="UTF-8"?>
     <properties>
       <parsers>
         <parser class="org.apache.tika.parser.DefaultParser">
           <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
         </parser>
         <parser class="org.apache.tika.parser.ocr.TesseractOCRParser">
           <params>
             <param name="enableImagePreprocessing" type="bool">true</param>
           </params>
         </parser>
       </parsers>
     </properties>
   ```
   
   When you try to process a document, you'll get this error:
   ```
   org.apache.tika.parser.ocr.TesseractOCRParser User has selected to preprocess images, but I can't find ImageMagick.Backing off to original file.
   ```
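
To confirm whether an ImageMagick executable is visible on `PATH` in a given environment, a small check along these lines may help. This is only a sketch, not how Tika itself performs the lookup: the helper name `find_imagemagick` is hypothetical, and the candidate names `magick` and `convert` are an assumption about how ImageMagick is typically installed on Linux.

```python
import shutil

# Assumption: ImageMagick is exposed as "magick" (7.x) or "convert" (6.x and earlier).
CANDIDATES = ("magick", "convert")


def find_imagemagick(candidates=CANDIDATES, which=shutil.which):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    path = find_imagemagick()
    print(path if path else "ImageMagick not found on PATH")
```

If this reports that nothing was found, the warning above is the expected outcome when image preprocessing is enabled.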


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
