[ https://issues.apache.org/jira/browse/TIKA-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562596#comment-17562596 ]
Giorgiana Ciobanu commented on TIKA-3811:
-----------------------------------------

[~nick] I understand what you are saying, and I appreciate the clarification of the security aspects of the Tika implementation.

For what I need, disabling MIME type detection by file name/extension is enough for now. I suppose it is the NameDetector that needs to be excluded from the DefaultDetector in the Tika config, right? The Tika documentation says it is possible to exclude a detector by configuration, and in this case that would be org.apache.tika.detect.NameDetector.

So I was expecting that, after excluding NameDetector and calling the detect method with a File as the input parameter, guessing the MIME type from the file extension would be skipped.

> Exclude NameDetector not working for Tika.detect(file)
> ------------------------------------------------------
>
>                 Key: TIKA-3811
>                 URL: https://issues.apache.org/jira/browse/TIKA-3811
>             Project: Tika
>          Issue Type: Bug
>          Components: config, core, detector
>    Affects Versions: 2.3.0
>            Reporter: Giorgiana Ciobanu
>            Priority: Major
>         Attachments: invalid_format.vtt, tika-config_test.xml
>
>
> I need to detect the MIME type of a file, but for security reasons I want to
> exclude detection by file name extension.
> I added a tika-config_test.xml (see attached) to my unit test, but it still
> detects the file by its name extension.
> I attached a test file that is wrongly detected as text/vtt because of its
> file extension; it should be text/plain in this case.
>
> The code of my unit test:
> {code:java}
> File file = new File(
>         getClass().getClassLoader().getResource("invalid_format.vtt").getFile());
> TikaConfig tikaConfig = new TikaConfig(
>         getClass().getClassLoader().getResourceAsStream("tika-config_test.xml"));
>
> // returns text/vtt but should be text/plain
> String mimeType = new Tika(tikaConfig).detect(file);
> {code}
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)