[ https://issues.apache.org/jira/browse/TIKA-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845586#comment-17845586 ]
ASF GitHub Bot commented on TIKA-4254:
--------------------------------------

kaiyaok2 commented on PR #1754:
URL: https://github.com/apache/tika/pull/1754#issuecomment-2105675512

> The `repo` is refreshed with each unit test in the `@BeforeEach` call, though. Is NIODetector respecting that?

@tballison Yes, NIOInspector uses the JUnit Jupiter engine and takes all setup and teardown methods into account. Note that although the `MimeTypes` instance `repo` is refreshed, `MimeTypes.addPattern()` calls `Patterns.add()`, which in turn calls `addGlob()`:

```java
private void addGlob(String glob, MimeType type) throws MimeTypeException {
    MimeType previous = globs.get(glob);
    if (previous == null
            || registry.isSpecializationOf(previous.getType(), type.getType())) {
        globs.put(glob, type);
    } else if (previous == type
            || registry.isSpecializationOf(type.getType(), previous.getType())) {
        // do nothing
    } else {
        throw new MimeTypeException("Conflicting glob pattern: " + glob);
    }
}
```

In the second execution of the test, `previous` is the `testType` object constructed in the first run, while `type` is the `testType` object constructed in the second run (the two come from separate calls to `new MimeType(MediaType.parse("foo/bar"))`). Because `previous` and `type` are different objects, `previous == type` is false; with neither specialization check matching either, execution falls through to the `throw`. Ideally, repeated runs should reach the `// do nothing` branch; hence the fix.

> The test `TestMimeTypes#testJavaRegex` is not idempotent, as it passes in the first run and fails in repeated runs in the same environment.
> ----------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-4254
>                 URL: https://issues.apache.org/jira/browse/TIKA-4254
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Kaiyao Ke
>            Priority: Major
>
> ### Brief Description of the Bug
> The test `TestMimeTypes#testJavaRegex` is non-idempotent: it passes in the
> first run but fails in the second run in the same environment. The source of
> the problem is that each test execution initializes a new media type
> (`MimeType`) instance `testType` (the same problem applies to `testType2`),
> and the media types from all test executions attempt to register the same name
> pattern `"rtg_sst_grb_0\\.5\\.\\d{8}"`. Therefore, in the second execution of
> the test, the line `this.repo.addPattern(testType, pattern, true);` throws an
> exception, since the name pattern is already registered to the `testType`
> instance created in the first test execution. Specifically, in the second
> run, the `addGlob()` method of the `Patterns` class detects the conflicting
> pattern and throws a `MimeTypeException` (line 123 in `Patterns.java`).
> ### Failure Message in the 2nd Test Run:
> ```
> org.apache.tika.mime.MimeTypeException: Conflicting glob pattern: rtg_sst_grb_0\.5\.\d{8}
>     at org.apache.tika.mime.Patterns.addGlob(Patterns.java:123)
>     at org.apache.tika.mime.Patterns.add(Patterns.java:71)
>     at org.apache.tika.mime.MimeTypes.addPattern(MimeTypes.java:450)
>     at org.apache.tika.mime.TestMimeTypes.testJavaRegex(TestMimeTypes.java:851)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>     at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
>     at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
> ```
> ### Reproduce
> Use the `NIOInspector` plugin, which supports rerunning individual tests in the
> same environment:
> ```
> cd tika-parsers/tika-parsers-standard/tika-parsers-standard-package
> mvn edu.illinois:NIOInspector:rerun -Dtest=org.apache.tika.mime.TestMimeTypes#testJavaRegex
> ```
> ### Proposed Fix
> Declare `testType` and `testType2` as static variables and initialize them at
> class loading time, so that repeated runs of `testJavaRegex()` do not
> conflict with each other. All tests pass and are idempotent after the fix.
> ### Necessity of Fix
> A fix is recommended because unit tests should be idempotent, and state
> pollution should be mitigated so that newly introduced tests do not fail in
> the future due to polluted shared state.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
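To make the identity check in `addGlob()` concrete, here is a minimal, self-contained Java sketch of the behavior discussed in this thread. It is not Tika code: `GlobConflictSketch`, `FakeMimeType`, and `FakeGlobRegistry` are hypothetical names; the registry is simply kept alive across two simulated runs (mirroring the shared `globs` state described in the comment); and the `isSpecializationOf()` checks are omitted, on the assumption (as in the failing test) that neither type specializes the other. It contrasts a fresh object per run, which hits the conflict branch, with a single static instance created at class-loading time (the proposed fix), which falls through to the "do nothing" branch.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobConflictSketch {

    /** Hypothetical stand-in for Tika's MimeType; compared by reference, like the real check. */
    static final class FakeMimeType {
        final String name;

        FakeMimeType(String name) {
            this.name = name;
        }
    }

    /** Hypothetical stand-in for the glob table that survives between test runs. */
    static final class FakeGlobRegistry {
        private final Map<String, FakeMimeType> globs = new HashMap<>();

        void addGlob(String glob, FakeMimeType type) {
            FakeMimeType previous = globs.get(glob);
            if (previous == null) {
                globs.put(glob, type); // first registration is stored
            } else if (previous == type) {
                // same instance registered again: do nothing
            } else {
                // a different instance, even with the same name, is a conflict;
                // the real code's specialization checks are omitted here
                throw new IllegalStateException("Conflicting glob pattern: " + glob);
            }
        }
    }

    static final String PATTERN = "rtg_sst_grb_0\\.5\\.\\d{8}";

    // Fix sketch: one instance created at class-loading time and reused by every run.
    static final FakeMimeType SHARED_TYPE = new FakeMimeType("foo/bar");

    /** Mimics the original test body: builds a fresh type on every run. */
    static void runWithFreshType(FakeGlobRegistry registry) {
        registry.addGlob(PATTERN, new FakeMimeType("foo/bar"));
    }

    /** Mimics the fixed test body: reuses the static instance. */
    static void runWithStaticType(FakeGlobRegistry registry) {
        registry.addGlob(PATTERN, SHARED_TYPE);
    }

    /** Returns true if two consecutive runs against one shared registry both succeed. */
    static boolean twoRunsSucceed(boolean useStatic) {
        FakeGlobRegistry registry = new FakeGlobRegistry();
        try {
            for (int run = 0; run < 2; run++) {
                if (useStatic) {
                    runWithStaticType(registry);
                } else {
                    runWithFreshType(registry);
                }
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("fresh instance per run: " + twoRunsSucceed(false)); // false
        System.out.println("static shared instance: " + twoRunsSucceed(true)); // true
    }
}
```

Running `main` prints `false` for the fresh-instance case and `true` for the static-instance case. As in the real `addGlob()`, the comparison is by reference (`==`), so two objects describing the same media type are not treated as the same registration.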