[ 
https://issues.apache.org/jira/browse/TIKA-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612695#comment-17612695
 ] 

Hudson commented on TIKA-3812:
------------------------------

FAILURE: Integrated in Jenkins build Tika » tika-main-jdk8 #830 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/830/])
TIKA-3812 -- add unit test to confirm plain png and jpeg work (tallison: 
[https://github.com/apache/tika/commit/f69c0ba5d976a72f8075fbd81d732d6ca74a188d])
* (add) 
tika-parsers/tika-parsers-extended/tika-parsers-extended-integration-tests/src/test/resources/test-documents/testOCR.jpg
* (edit) 
tika-integration-tests/tika-resource-loading-tests/src/test/java/org/apache/custom/parser/CustomParserTest.java
* (add) 
tika-parsers/tika-parsers-extended/tika-parsers-extended-integration-tests/src/test/resources/test-documents/testOCR.png
* (edit) 
tika-parsers/tika-parsers-extended/tika-parsers-extended-integration-tests/pom.xml
* (add) 
tika-parsers/tika-parsers-extended/tika-parsers-extended-integration-tests/src/test/java/org/apache/tika/parser/ocr/TestOCR.java


> Parser Order: image get parsed by GDALParser instead of TesseractOCRParser
> --------------------------------------------------------------------------
>
>                 Key: TIKA-3812
>                 URL: https://issues.apache.org/jira/browse/TIKA-3812
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.4.1
>            Reporter: Eugen Caruntu
>            Priority: Minor
>             Fix For: 2.5.0
>
>         Attachments: parser-diffs.tgz
>
>
> The selected parser seems to be different in 2.4.1. For example, an image 
> (jpg/png) that was previously (2.4.0) processed by TesseractOCRParser now 
> gets parsed by GDALParser.
> It seems that when multiple parsers support the same file types, the selected 
> parser depends on the order in which they are loaded.
> For example, GDALParser, ImageParser and TesseractOCRParser all support 
> image/jpeg, image/png, image/gif ...
> A recent change (TIKA-3750) reverses the parser order.
> Reconfiguring the GDALParser to exclude the image mime types might work, 
> but there could be other parsers with overlapping mime types.
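
The load-order dependence described in the report can be illustrated with a toy model. This is only a sketch and does not use Tika's API: the class ParserOrderSketch, its methods, and the "last loaded parser wins" rule are assumptions made for illustration, and the parser names are plain strings taken from the report.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: NOT Tika's real selection code.
class ParserOrderSketch {

    // Hypothetical table of which parser claims which mime types.
    static Map<String, Set<String>> supported() {
        Set<String> images = new HashSet<>(
                Arrays.asList("image/jpeg", "image/png", "image/gif"));
        Map<String, Set<String>> m = new HashMap<>();
        m.put("GDALParser", images);
        m.put("ImageParser", images);
        m.put("TesseractOCRParser", images);
        return m;
    }

    // Assumed rule: walk the parsers in load order and let each later
    // parser override earlier ones that claim the same mime type.
    static String select(List<String> loadOrder,
                         Map<String, Set<String>> supportedTypes,
                         String mimeType) {
        String chosen = null;
        for (String parser : loadOrder) {
            Set<String> types = supportedTypes.get(parser);
            if (types != null && types.contains(mimeType)) {
                chosen = parser; // later entries overwrite earlier ones
            }
        }
        return chosen; // null if no parser claims the type
    }

    public static void main(String[] args) {
        Map<String, Set<String>> s = supported();
        List<String> before = Arrays.asList(
                "GDALParser", "ImageParser", "TesseractOCRParser");
        List<String> after = Arrays.asList(
                "TesseractOCRParser", "ImageParser", "GDALParser");
        System.out.println(select(before, s, "image/png"));
        System.out.println(select(after, s, "image/png"));
    }
}
```

Under the assumed rule, the first call selects TesseractOCRParser and the second selects GDALParser: reversing the load order alone changes which parser handles image/png, with no change to the parsers themselves.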



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
