kaiyaok2 opened a new pull request, #1754:
URL: https://github.com/apache/tika/pull/1754

   Fixes https://issues.apache.org/jira/projects/TIKA/issues/TIKA-4254
   
   ### Brief Description of the Bug
   
   The test `TestMimeTypes#testJavaRegex` is non-idempotent: it passes in the 
first run but fails in the second run in the same environment. The cause is 
that each test execution creates a new media type (`MimeType`) instance 
`testType` (and likewise `testType2`), and the instances from different 
executions all try to register the same name pattern 
`"rtg_sst_grb_0\\.5\\.\\d{8}"`. In the second execution, the line 
`this.repo.addPattern(testType, pattern, true);` therefore throws, because the 
name pattern is already owned by the `testType` instance created during the 
first execution. Specifically, the `addGlob()` method of the `Patterns` class 
detects the conflicting pattern and throws a `MimeTypeException` (line 123 in 
`Patterns.java`).
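
The failure mode described above can be modeled without Tika. The sketch below is a simplified stand-in, not Tika's actual code: `FakeType`, `GLOBS`, `addPattern`, and the `"test/type"` name are hypothetical, and the registry is assumed to outlive a single test run and to reject a pattern that a different instance already owns.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the reported behavior; NOT Tika's implementation.
public class GlobConflictSketch {

    // Hypothetical stand-in for a media type; instances are compared by identity.
    static final class FakeType {
        final String name;
        FakeType(String name) { this.name = name; }
    }

    // Hypothetical shared registry, assumed to survive across test runs.
    static final Map<String, FakeType> GLOBS = new HashMap<>();

    // Registers a pattern; rejects it if a different instance already owns it.
    static void addPattern(FakeType type, String pattern) {
        FakeType previous = GLOBS.get(pattern);
        if (previous == null) {
            GLOBS.put(pattern, type);
        } else if (previous != type) {
            throw new IllegalStateException("Conflicting glob pattern: " + pattern);
        }
    }

    // Same shape as the non-idempotent test: a fresh instance on every run.
    static void testBody() {
        FakeType testType = new FakeType("test/type");
        addPattern(testType, "rtg_sst_grb_0\\.5\\.\\d{8}");
    }

    public static void main(String[] args) {
        testBody(); // first run: the pattern is unclaimed, so this succeeds
        try {
            testBody(); // second run: a new instance collides with the first
            System.out.println("second run passed");
        } catch (IllegalStateException e) {
            System.out.println("second run failed: " + e.getMessage());
        }
    }
}
```

In this model the second call fails for the same reason as the second test run: the pattern is still registered to the instance created by the first call.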
   
   ### Failure Message in the 2nd Test Run:
   ```
   org.apache.tika.mime.MimeTypeException: Conflicting glob pattern: rtg_sst_grb_0\.5\.\d{8}
        at org.apache.tika.mime.Patterns.addGlob(Patterns.java:123)
        at org.apache.tika.mime.Patterns.add(Patterns.java:71)
        at org.apache.tika.mime.MimeTypes.addPattern(MimeTypes.java:450)
        at org.apache.tika.mime.TestMimeTypes.testJavaRegex(TestMimeTypes.java:851)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
   ```
   
   ### Reproduce
   
   Use the `NIOInspector` plugin that supports rerunning individual tests in 
the same environment:
   ```
   cd tika-parsers/tika-parsers-standard/tika-parsers-standard-package
   mvn edu.illinois:NIODetector:rerun -Dtest=org.apache.tika.mime.TestMimeTypes#testJavaRegex
   ```
   
   ### Proposed Fix
   
   Declare `testType` and `testType2` as static variables and initialize them 
at class loading time, so that repeated runs of `testJavaRegex()` reuse the same 
instances and no longer conflict with each other. All tests pass and are 
idempotent after the fix.
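
As a minimal sketch of why a static fixture makes the test repeatable, the model below reuses one instance across runs. It is not the actual patch: `FakeType`, `GLOBS`, `addPattern`, `TEST_TYPE`, and `"test/type"` are hypothetical names, and it assumes the registry accepts re-registration of a pattern by the instance that already owns it.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the proposed fix; NOT the code changed in this PR.
public class StaticFixtureSketch {

    // Hypothetical stand-in for a media type; instances are compared by identity.
    static final class FakeType {
        final String name;
        FakeType(String name) { this.name = name; }
    }

    // Hypothetical shared registry, assumed to survive across test runs.
    static final Map<String, FakeType> GLOBS = new HashMap<>();

    // Assumption: re-adding a pattern for the SAME instance is a no-op,
    // while a different instance claiming it is a conflict.
    static void addPattern(FakeType type, String pattern) {
        FakeType previous = GLOBS.get(pattern);
        if (previous == null) {
            GLOBS.put(pattern, type);
        } else if (previous != type) {
            throw new IllegalStateException("Conflicting glob pattern: " + pattern);
        }
    }

    // Created once, when the class is initialized, and shared by every run.
    static final FakeType TEST_TYPE = new FakeType("test/type");

    // Every run registers the pattern for the same instance.
    static void testBody() {
        addPattern(TEST_TYPE, "rtg_sst_grb_0\\.5\\.\\d{8}");
    }

    public static void main(String[] args) {
        testBody(); // first run registers the pattern
        testBody(); // second run finds the same owner, so nothing conflicts
        System.out.println("both runs passed");
    }
}
```

The trade-off in this design is that the fixture itself becomes shared state, so it only stays safe as long as no test mutates the shared instances in a way that affects other tests.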
   
   ### Necessity of Fix
   
   A fix is recommended because unit tests should be idempotent, and state 
pollution should be mitigated so that newly introduced tests do not fail in the 
future because of polluted shared state.
   
   
   

