[ 
https://issues.apache.org/jira/browse/TIKA-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845590#comment-17845590
 ] 

Tilman Hausherr edited comment on TIKA-4254 at 5/12/24 9:40 AM:
----------------------------------------------------------------

THausherr commented on PR #1754:
URL: https://github.com/apache/tika/pull/1754#issuecomment-2105679546

   Maybe I get it: {{repo = config.getMimeRepository();}} isn't creating 
anything new; it's retrieving something that is changed later by the test? If 
my understanding is correct, then it's a deeper problem.









> The test `TestMimeTypes#testJavaRegex` is not idempotent, as it passes in the 
> first run and fails in repeated runs in the same environment. 
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-4254
>                 URL: https://issues.apache.org/jira/browse/TIKA-4254
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Kaiyao Ke
>            Priority: Major
>
> ### Brief Description of the Bug
> The test `TestMimeTypes#testJavaRegex` is non-idempotent, as it passes in the 
> first run but fails in the second run in the same environment. The source of 
> the problem is that each test execution initializes a new media type 
> (`MimeType`) instance `testType` (same problem for `testType2`), and all 
> media types across different test executions attempt to use the same name 
> pattern `"rtg_sst_grb_0\\.5\\.\\d{8}"`. Therefore, in the second execution of 
> the test, the line `this.repo.addPattern(testType, pattern, true);` 
> throws an exception, since the name pattern is already registered to the 
> `testType` instance created during the first test execution. Specifically, 
> in the second run, the `addGlob()` method of the `Patterns` class detects the 
> conflicting pattern and throws a `MimeTypeException` (line 123 in `Patterns.java`).
> ### Failure Message in the 2nd Test Run:
> ```
> org.apache.tika.mime.MimeTypeException: Conflicting glob pattern: rtg_sst_grb_0\.5\.\d{8}
>       at org.apache.tika.mime.Patterns.addGlob(Patterns.java:123)
>       at org.apache.tika.mime.Patterns.add(Patterns.java:71)
>       at org.apache.tika.mime.MimeTypes.addPattern(MimeTypes.java:450)
>       at org.apache.tika.mime.TestMimeTypes.testJavaRegex(TestMimeTypes.java:851)
>       at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>       at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
>       at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
> ```
> ### Reproduce
> Use the `NIOInspector` plugin that supports rerunning individual tests in the 
> same environment:
> ```
> cd tika-parsers/tika-parsers-standard/tika-parsers-standard-package
> mvn edu.illinois:NIOInspector:rerun -Dtest=org.apache.tika.mime.TestMimeTypes#testJavaRegex
> ```
> ### Proposed Fix
> Declare `testType` and `testType2` as static variables and initialize them at 
> class loading time, so that repeated runs of `testJavaRegex()` reuse the same 
> instances and do not conflict with each other. All tests pass and are 
> idempotent after the fix.
> ### Necessity of Fix
> A fix is recommended because unit tests should be idempotent, and state 
> pollution should be mitigated so that newly introduced tests do not fail in 
> the future due to polluted shared state.
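
The failure mode and the proposed fix described in the issue can be sketched without Tika on the classpath. The block below is only a simplified model, not Tika code: `SharedRegistryDemo`, `PatternRegistry`, `register`, and `STATIC_TYPE` are hypothetical names, plain `Object` instances stand in for `MimeType`, and `IllegalStateException` stands in for `MimeTypeException`. It assumes the registry tolerates a repeat registration only when the very same object re-registers the pattern, which is what the issue description implies.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedRegistryDemo {

    /** Hypothetical stand-in for a pattern registry that outlives one test run. */
    static final class PatternRegistry {
        // Maps each pattern to the type object that first registered it.
        private final Map<String, Object> owners = new HashMap<>();

        void addPattern(Object type, String pattern) {
            Object existing = owners.putIfAbsent(pattern, type);
            // Assumption: only a *different* owner counts as a conflict.
            if (existing != null && existing != type) {
                throw new IllegalStateException("Conflicting glob pattern: " + pattern);
            }
        }
    }

    static final String PATTERN = "rtg_sst_grb_0\\.5\\.\\d{8}";

    // Created once at class-loading time, mirroring the proposed fix.
    static final Object STATIC_TYPE = new Object();

    /** Returns true if the registration succeeded, false on a conflict. */
    static boolean register(PatternRegistry registry, Object type) {
        try {
            registry.addPattern(type, PATTERN);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Original behavior: every run creates a fresh type object.
        PatternRegistry polluted = new PatternRegistry();
        System.out.println(register(polluted, new Object())); // first run
        System.out.println(register(polluted, new Object())); // second run

        // Proposed fix: every run reuses the same static type object.
        PatternRegistry fixed = new PatternRegistry();
        System.out.println(register(fixed, STATIC_TYPE)); // first run
        System.out.println(register(fixed, STATIC_TYPE)); // second run
    }
}
```

In this model the second registration with a fresh object is rejected, while re-registering the static object is accepted. Whether the same holds in Tika depends on how `Patterns.addGlob()` compares the previous and new types, so the real fix still needs to be checked against the actual test.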



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
