kaiyaok2 commented on PR #1754: URL: https://github.com/apache/tika/pull/1754#issuecomment-2106037067
@THausherr @tballison I confirmed that the two lines in `@BeforeEach` **do not** create a new repo if one exists from a previous test run:

```
TikaConfig config = TikaConfig.getDefaultConfig();
repo = config.getMimeRepository();
```

`TikaConfig.getDefaultConfig()` simply calls the default `TikaConfig()` constructor (https://github.com/apache/tika/blob/b068e4290ad311b1e5f1ddaa6afa40be9e7bd797/tika-core/src/main/java/org/apache/tika/config/TikaConfig.java#L390). When neither the system property `'tika.config'` nor the environment variable `'TIKA_CONFIG'` is set, the `mimeTypes` field of the constructed config (returned by `getMimeRepository()`, which is `repo` in our context) will be `getDefaultMimeTypes(getContextClassLoader())` (https://github.com/apache/tika/blob/b068e4290ad311b1e5f1ddaa6afa40be9e7bd797/tika-core/src/main/java/org/apache/tika/config/TikaConfig.java#L246).

Now take a look at `getDefaultMimeTypes()`: when a classloader is given (`getContextClassLoader()` in our context), it first tries to retrieve the types from a HashMap via `CLASSLOADER_SPECIFIC_DEFAULT_TYPES.get(classLoader);` (https://github.com/apache/tika/blob/b068e4290ad311b1e5f1ddaa6afa40be9e7bd797/tika-core/src/main/java/org/apache/tika/mime/MimeTypes.java#L150). Notice that `CLASSLOADER_SPECIFIC_DEFAULT_TYPES` is not an instance variable, but a **static** `HashMap`.

In the first test execution, `CLASSLOADER_SPECIFIC_DEFAULT_TYPES` is empty, so `types` will be `null` after the line `types = CLASSLOADER_SPECIFIC_DEFAULT_TYPES.get(classLoader);`, and it is then initialized by `MimeTypesFactory.create()` as desired. After this, the initialized `types` is put into the static `CLASSLOADER_SPECIFIC_DEFAULT_TYPES` map (https://github.com/apache/tika/blob/b068e4290ad311b1e5f1ddaa6afa40be9e7bd797/tika-core/src/main/java/org/apache/tika/mime/MimeTypes.java#L166).

Now in the second test execution, `CLASSLOADER_SPECIFIC_DEFAULT_TYPES` already contains the context class loader as a key, mapped to the `types` object initialized in the previous run. So `CLASSLOADER_SPECIFIC_DEFAULT_TYPES.get(classLoader)` will return that initialized object directly. In other words, `repo` **would be the same object across repeated test runs**.

I think the essential idea of `CLASSLOADER_SPECIFIC_DEFAULT_TYPES` is a 1-to-1 mapping between classloaders and default types, so this implementation does not seem buggy to me, but please confirm.
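
For illustration, here is a minimal, self-contained sketch of the caching pattern described above. It is not the Tika source: `ClassLoaderCacheSketch`, `Registry`, `CACHE`, `getDefault()` and `setUp()` are hypothetical stand-ins for `MimeTypes`, `CLASSLOADER_SPECIFIC_DEFAULT_TYPES`, `getDefaultMimeTypes()` and the `@BeforeEach` method, and the `synchronized` keyword is just a choice made for this sketch. It only shows why two consecutive "test runs" in the same JVM would end up with the same object.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ClassLoaderCacheSketch {
    /** Hypothetical stand-in for the MIME repository; holds mutable state. */
    static final class Registry {
        final List<String> customTypes = new ArrayList<>();
    }

    /** Static map: shared by every caller in the JVM, like the map discussed above. */
    private static final Map<ClassLoader, Registry> CACHE = new HashMap<>();

    /** Counts how many times a Registry was actually constructed. */
    static int creations = 0;

    /** Look up the registry for this class loader; create and cache it on a miss. */
    static synchronized Registry getDefault(ClassLoader loader) {
        Registry types = CACHE.get(loader);
        if (types == null) {          // first call for this loader: cache miss
            types = new Registry();
            creations++;
            CACHE.put(loader, types); // later calls will hit this entry
        }
        return types;
    }

    /** Hypothetical equivalent of the two @BeforeEach lines. */
    static Registry setUp() {
        return getDefault(Thread.currentThread().getContextClassLoader());
    }

    public static void main(String[] args) {
        Registry firstRun = setUp();
        firstRun.customTypes.add("foo/bar"); // state changed during "run 1"

        Registry secondRun = setUp();        // "run 2" in the same JVM
        System.out.println(firstRun == secondRun);  // prints true
        System.out.println(secondRun.customTypes);  // prints [foo/bar]
        System.out.println(creations);              // prints 1
    }
}
```

In this sketch, anything the first run registers on the object is still visible in the second run, because both runs receive the single cached instance for the context class loader.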