[ https://issues.apache.org/jira/browse/TIKA-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845586#comment-17845586 ]

ASF GitHub Bot commented on TIKA-4254:
--------------------------------------

kaiyaok2 commented on PR #1754:
URL: https://github.com/apache/tika/pull/1754#issuecomment-2105675512

   > The `repo` is refreshed with each unit test in the `@BeforeEach` call, though. Is NIODetector respecting that?
   
   @tballison Yes, NIOInspector uses the JUnit Jupiter engine and takes all setup and teardown methods into account. Note that although the `MimeTypes` instance `repo` is refreshed, `MimeTypes.addPattern()` calls `Patterns.add()`, which then calls `addGlob()`:
   ```
   private void addGlob(String glob, MimeType type) throws MimeTypeException {
       MimeType previous = globs.get(glob);
       if (previous == null ||
               registry.isSpecializationOf(previous.getType(), type.getType())) {
           globs.put(glob, type);
       } else if (previous == type || // reference (identity) comparison
               registry.isSpecializationOf(type.getType(), previous.getType())) {
           // do nothing
       } else {
           throw new MimeTypeException("Conflicting glob pattern: " + glob);
       }
   }
   ```
   In the second execution of the test, `previous` is the `testType` object constructed in the first run, while `type` is the `testType` object constructed in the second run (they come from two separate calls to `new MimeType(MediaType.parse("foo/bar"))`). Because they are distinct objects, `previous == type` is false, and the exception is thrown.
   
   Ideally, repeated runs should reach the `// do nothing` branch instead, hence the fix.
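To make the identity check concrete, here is a simplified, self-contained model of the conflict logic in `addGlob()`. It is not Tika's code: `GlobIdentityDemo` and `FakeMimeType` are hypothetical names, and the `isSpecializationOf()` checks are left out on the assumption that they are false for two `foo/bar` types, as the failure in the second run suggests.

```java
import java.util.HashMap;
import java.util.Map;

class GlobIdentityDemo {

    // Hypothetical stand-in for MimeType: it holds only a name and does not
    // override equals(), so == compares object identity.
    static final class FakeMimeType {
        final String name;

        FakeMimeType(String name) {
            this.name = name;
        }
    }

    // Glob -> type registry, modeled after the globs map used by addGlob().
    static final Map<String, FakeMimeType> globs = new HashMap<>();

    // Simplified model of the conflict check; specialization checks omitted (assumed false).
    static String addGlob(String glob, FakeMimeType type) {
        FakeMimeType previous = globs.get(glob);
        if (previous == null) {
            globs.put(glob, type);
            return "added";
        } else if (previous == type) { // reference comparison, not name comparison
            return "unchanged";
        } else {
            throw new IllegalStateException("Conflicting glob pattern: " + glob);
        }
    }

    public static void main(String[] args) {
        String glob = "rtg_sst_grb_0\\.5\\.\\d{8}";
        FakeMimeType first = new FakeMimeType("foo/bar");
        FakeMimeType second = new FakeMimeType("foo/bar"); // same name, new object

        System.out.println(addGlob(glob, first)); // first registration
        System.out.println(addGlob(glob, first)); // same instance: no conflict
        try {
            addGlob(glob, second); // different instance, same pattern
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints `added`, then `unchanged`, then `Conflicting glob pattern: rtg_sst_grb_0\.5\.\d{8}`, mirroring a first run, a rerun with the same instance, and a rerun with a freshly constructed instance.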




> The test `TestMimeTypes#testJavaRegex` is not idempotent, as it passes in the 
> first run and fails in repeated runs in the same environment. 
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-4254
>                 URL: https://issues.apache.org/jira/browse/TIKA-4254
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Kaiyao Ke
>            Priority: Major
>
> ### Brief Description of the Bug
> The test `TestMimeTypes#testJavaRegex` is non-idempotent: it passes in the
> first run but fails in the second run in the same environment. The source of
> the problem is that each test execution creates a new media type (`MimeType`)
> instance `testType` (the same applies to `testType2`), while the instances
> from all test executions use the same name pattern
> `"rtg_sst_grb_0\\.5\\.\\d{8}"`. Therefore, in the second execution of the
> test, the line `this.repo.addPattern(testType, pattern, true);` throws an
> exception, since the name pattern is already registered to the `testType`
> instance created in the first execution. Specifically, in the second run, the
> `addGlob()` method of the `Patterns` class detects the conflicting pattern and
> throws a `MimeTypeException` (line 123 in `Patterns.java`).
> ### Failure Message in the 2nd Test Run:
> ```
> org.apache.tika.mime.MimeTypeException: Conflicting glob pattern: rtg_sst_grb_0\.5\.\d{8}
>       at org.apache.tika.mime.Patterns.addGlob(Patterns.java:123)
>       at org.apache.tika.mime.Patterns.add(Patterns.java:71)
>       at org.apache.tika.mime.MimeTypes.addPattern(MimeTypes.java:450)
>       at org.apache.tika.mime.TestMimeTypes.testJavaRegex(TestMimeTypes.java:851)
>       at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>       at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
>       at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
> ```
> ### Reproduce
> Use the `NIOInspector` plugin, which supports rerunning individual tests in
> the same environment:
> ```
> cd tika-parsers/tika-parsers-standard/tika-parsers-standard-package
> mvn edu.illinois:NIOInspector:rerun -Dtest=org.apache.tika.mime.TestMimeTypes#testJavaRegex
> ```
> ### Proposed Fix
> Declare `testType` and `testType2` as static fields and initialize them when
> the class is loaded. Repeated runs of `testJavaRegex()` then pass the same
> `MimeType` instances to `addPattern()`, so they no longer conflict with each
> other. All tests pass and are idempotent after the fix.
> ### Necessity of Fix
> A fix is recommended because unit tests should be idempotent, and state
> pollution should be mitigated so that newly introduced tests do not fail in
> the future because of polluted shared state.
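The proposed fix can be illustrated with a similar simplified model. This is a hypothetical sketch, not Tika's test code: `StaticFixtureDemo`, `FakeMimeType`, and the two `run...` methods are illustrative names, and the static `sharedGlobs` map stands in for pattern state that, per the report, survives between reruns in the same JVM. One method models the original test body (a fresh instance per run), the other models the fix (a static instance created once at class loading).

```java
import java.util.HashMap;
import java.util.Map;

class StaticFixtureDemo {

    // Hypothetical stand-in for MimeType; only identity matters here.
    static final class FakeMimeType {
        final String name;

        FakeMimeType(String name) {
            this.name = name;
        }
    }

    // Assumed shared state that outlives a single test run.
    static final Map<String, FakeMimeType> sharedGlobs = new HashMap<>();

    static final String PATTERN = "rtg_sst_grb_0\\.5\\.\\d{8}";

    // Fixed version: one instance per class load, reused by every run.
    static final FakeMimeType STATIC_TEST_TYPE = new FakeMimeType("foo/bar");

    // Simplified conflict check (specialization checks omitted).
    static void addGlob(String glob, FakeMimeType type) {
        FakeMimeType previous = sharedGlobs.get(glob);
        if (previous == null) {
            sharedGlobs.put(glob, type);
        } else if (previous != type) {
            throw new IllegalStateException("Conflicting glob pattern: " + glob);
        }
        // previous == type: nothing to do, like the "do nothing" branch
    }

    // Models the original test body: constructs a new instance on every run.
    static void runWithFreshInstance() {
        addGlob(PATTERN, new FakeMimeType("foo/bar"));
    }

    // Models the proposed fix: reuses the static instance on every run.
    static void runWithStaticInstance() {
        addGlob(PATTERN, STATIC_TEST_TYPE);
    }

    public static void main(String[] args) {
        runWithStaticInstance();
        runWithStaticInstance(); // rerun reuses the same object, so no conflict
        System.out.println("static fixture: two runs passed");

        sharedGlobs.clear();
        runWithFreshInstance();
        try {
            runWithFreshInstance(); // rerun registers a new object for the same pattern
        } catch (IllegalStateException e) {
            System.out.println("fresh instance: " + e.getMessage());
        }
    }
}
```

The design point is that the static field moves object creation from "once per run" to "once per class load", so in this model every rerun in the same JVM presents the identical object to the identity check.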



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
