kaiyaok2 commented on PR #1754:
URL: https://github.com/apache/tika/pull/1754#issuecomment-2106037067

   @THausherr @tballison  I confirmed that the two lines in `@BeforeEach` 
**do not** create a new repo if one already exists from a previous test run:
   ```
   TikaConfig config = TikaConfig.getDefaultConfig();
   repo = config.getMimeRepository();
   ```
   
   
   `TikaConfig.getDefaultConfig()` simply calls the default `TikaConfig()` 
constructor 
(https://github.com/apache/tika/blob/b068e4290ad311b1e5f1ddaa6afa40be9e7bd797/tika-core/src/main/java/org/apache/tika/config/TikaConfig.java#L390).
 
   
   When the system property `'tika.config'` and the environment variable 
`'TIKA_CONFIG'` are both unset, the `mimeTypes` field of the constructed 
config (accessible via `getMimeRepository()`, which is `repo` in our context) 
will be 
`getDefaultMimeTypes(getContextClassLoader())` (https://github.com/apache/tika/blob/b068e4290ad311b1e5f1ddaa6afa40be9e7bd797/tika-core/src/main/java/org/apache/tika/config/TikaConfig.java#L246).
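
To make the condition above concrete, here is a minimal sketch of that kind of lookup. It is not the actual `TikaConfig` code: the class `ConfigLookupSketch`, the helper `resolveConfigLocation`, and the precedence of the system property over the environment variable are all assumptions made for illustration.

```java
public class ConfigLookupSketch {

    // Hypothetical helper: returns the explicitly configured location,
    // or null when neither source provides a non-blank value.
    static String resolveConfigLocation(String sysProp, String envVar) {
        if (sysProp != null && !sysProp.trim().isEmpty()) {
            return sysProp; // assumption: the system property is consulted first
        }
        if (envVar != null && !envVar.trim().isEmpty()) {
            return envVar;
        }
        return null;
    }

    public static void main(String[] args) {
        String location = resolveConfigLocation(
                System.getProperty("tika.config"),
                System.getenv("TIKA_CONFIG"));
        // With neither value set, the default mime types path is the one taken.
        System.out.println(location == null
                ? "no explicit config: fall back to default mime types"
                : "explicit config: " + location);
    }
}
```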
   
   Now take a look at `getDefaultMimeTypes()`: when a classloader is given 
(`getContextClassLoader()` in our context), it first tries to retrieve the 
types from a HashMap via `CLASSLOADER_SPECIFIC_DEFAULT_TYPES.get(classLoader);` 
(https://github.com/apache/tika/blob/b068e4290ad311b1e5f1ddaa6afa40be9e7bd797/tika-core/src/main/java/org/apache/tika/mime/MimeTypes.java#L150).
 Notice that `CLASSLOADER_SPECIFIC_DEFAULT_TYPES` is not an instance variable, 
but a **static** `HashMap`. 
   
   In the first test execution, `CLASSLOADER_SPECIFIC_DEFAULT_TYPES` is 
empty, so `types` will be `null` after the line `types = 
CLASSLOADER_SPECIFIC_DEFAULT_TYPES.get(classLoader);` and is then 
initialized by `MimeTypesFactory.create()` as desired. After this, the 
initialized `types` is put into the static `CLASSLOADER_SPECIFIC_DEFAULT_TYPES` 
map 
(https://github.com/apache/tika/blob/b068e4290ad311b1e5f1ddaa6afa40be9e7bd797/tika-core/src/main/java/org/apache/tika/mime/MimeTypes.java#L166).
 
   
   In the second test execution, `CLASSLOADER_SPECIFIC_DEFAULT_TYPES` 
already contains the context class loader as a key, with the corresponding 
`types` initialized during the previous run. So 
`CLASSLOADER_SPECIFIC_DEFAULT_TYPES.get(classLoader)` will return that 
initialized object directly. In other words, `repo` **would be the same object 
across repeated test runs**.
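
The caching behaviour described above can be reproduced in isolation. The following is only a sketch of the pattern, not the `MimeTypes` source: `DefaultTypesCacheSketch`, the nested `Types` class, `CACHE`, and `createCount` are hypothetical stand-ins, and `new Types()` stands in for the call to `MimeTypesFactory.create()`.

```java
import java.util.HashMap;
import java.util.Map;

public class DefaultTypesCacheSketch {

    /** Hypothetical stand-in for the real MimeTypes class. */
    static final class Types {}

    // Static, so the map outlives any single test instance in the same JVM.
    private static final Map<ClassLoader, Types> CACHE = new HashMap<>();

    // Counts how many times a fresh Types object was actually built.
    static int createCount = 0;

    static synchronized Types getDefaultTypes(ClassLoader classLoader) {
        Types types = CACHE.get(classLoader);
        if (types == null) {
            // First call for this classloader: build and remember the object.
            types = new Types(); // stands in for MimeTypesFactory.create(...)
            createCount++;
            CACHE.put(classLoader, types);
        }
        return types;
    }

    public static void main(String[] args) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        Types firstRun = getDefaultTypes(cl);  // simulates the first @BeforeEach
        Types secondRun = getDefaultTypes(cl); // simulates the repeated run
        System.out.println(firstRun == secondRun); // prints "true"
        System.out.println(createCount);           // prints "1"
    }
}
```

Because the map is static, this reuse only applies when the repeated run happens in the same JVM with the same context class loader; a fresh JVM starts with an empty map.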
   
   I think the essential idea of `CLASSLOADER_SPECIFIC_DEFAULT_TYPES` is a 
1-to-1 map between classloaders and default types, so this implementation does 
not seem buggy to me, but please confirm.  
   

