Ruairidh Williamson created TIKA-4711:
-----------------------------------------

             Summary: Tika OCR content type tests are failing for 3.3.0
                 Key: TIKA-4711
                 URL: https://issues.apache.org/jira/browse/TIKA-4711
             Project: Tika
          Issue Type: Bug
    Affects Versions: 3.3.0
            Reporter: Ruairidh Williamson


When the tests are run on branch_3x, they fail with content type errors. For 
example:
{code:java}
[ERROR] org.apache.tika.parser.AutoDetectParserTest.testImages -- Time elapsed: 0.095 s <<< FAILURE!
org.opentest4j.AssertionFailedError:
Bad content type: Test parameters:
  resourceRealName        = /test-documents/testBMP.bmp
  resourceStatedName      = /test-documents/testBMP.bmp
  realType                = image/bmp
  statedType              = image/bmp
  expectedContentFragment = null
 ==> expected: <image/bmp> but was: <image/ocr-bmp>
        at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
        at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
        at org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
        at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:182)
        at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1156)
        at org.apache.tika.parser.AutoDetectParserTest.assertAutoDetect(AutoDetectParserTest.java:106)
        at org.apache.tika.parser.AutoDetectParserTest.assertAutoDetect(AutoDetectParserTest.java:132)
        at org.apache.tika.parser.AutoDetectParserTest.assertAutoDetect(AutoDetectParserTest.java:147)
        at org.apache.tika.parser.AutoDetectParserTest.testImages(AutoDetectParserTest.java:254)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
{code}
The change that introduced this issue was:

[https://github.com/apache/tika/commit/65bf98d3c54e1500878398210ef64aafe2dcb589]

I believe this is fixed in main by:

[https://github.com/apache/tika/commit/b9a0d9889b999b496680fdca03db245fe8b62b73]

So I will submit a PR based on the fix in main.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
