Flushot opened a new pull request, #27:
URL: https://github.com/apache/tika-docker/pull/27
The Tesseract OCR image preprocessor is broken in the current Docker image
because ImageMagick is not installed.
You can reproduce this by setting `enableImagePreprocessing` to `true` in
tika-config.xml:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
    </parser>
    <parser class="org.apache.tika.parser.ocr.TesseractOCRParser">
      <params>
        <param name="enableImagePreprocessing" type="bool">true</param>
      </params>
    </parser>
  </parsers>
</properties>
```
When you try to process a document, you'll get this error:
```
org.apache.tika.parser.ocr.TesseractOCRParser User has selected to
preprocess images, but I can't find ImageMagick.Backing off to original file.
```
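To confirm the cause inside the container, one option is to check whether an ImageMagick executable is on the `PATH`. The snippet below is only a rough sketch: the helper name `find_imagemagick` is hypothetical, and the executable names `magick` and `convert` are assumptions about how ImageMagick is usually installed, not taken from Tika's own detection logic.

```python
import shutil


def find_imagemagick(candidates=("magick", "convert")):
    """Return the path of the first candidate executable found on PATH, or None.

    The candidate names are assumptions about common ImageMagick installs.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    path = find_imagemagick()
    print(path if path else "ImageMagick not found on PATH")
```

If this reports that nothing was found when run in the image, that would be consistent with the warning above.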