[ https://issues.apache.org/jira/browse/NUTCH-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Iain Lopata updated NUTCH-1991:
-------------------------------
    Attachment: NUTCH-1991-1.6.patch

> Tika mime detection not using Nutch supplied tika-mimetypes.xml for content based detection
> -------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-1991
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1991
>             Project: Nutch
>          Issue Type: Bug
>          Components: util
>            Reporter: Iain Lopata
>            Priority: Minor
>         Attachments: NUTCH-1991-1.6.patch
>
>
> From Nutch Version 1.5 onwards, the MimeUtil.java class, which acts as a
> facade to Tika for mime type detection, uses a process that attempts a match
> using the mime type returned by the server, the filename and the content.
> NUTCH-1045 provided for the use of an external tika-mimetypes.xml file which
> provides the configuration for this process. However, the content based
> detection did not use this file, but instead reverted to using the
> configuration included in the tika library. Consequently, any content based
> match rules added to the nutch version of the configuration file were not
> used.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
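
To make the reported behaviour concrete, here is a minimal, self-contained sketch. It does not use Tika or Nutch code. The class `MimeDetectionSketch`, the `MagicRules` table, the `EXMP` magic prefix and the `application/x-example` type are all hypothetical stand-ins: one table plays the role of the configuration bundled with the Tika library, the other the Nutch-supplied tika-mimetypes.xml with one extra content-based rule.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not Nutch's MimeUtil and not Tika's API. */
public class MimeDetectionSketch {

    /** A tiny rule table mapping leading "magic" bytes to a mime type. */
    static final class MagicRules {
        private final Map<String, String> rules = new LinkedHashMap<>();

        MagicRules add(String magicPrefix, String mimeType) {
            rules.put(magicPrefix, mimeType);
            return this;
        }

        /** Returns the type of the first rule whose prefix starts the content. */
        String detect(byte[] content) {
            String head = new String(content, StandardCharsets.ISO_8859_1);
            for (Map.Entry<String, String> rule : rules.entrySet()) {
                if (head.startsWith(rule.getKey())) {
                    return rule.getValue();
                }
            }
            return "application/octet-stream"; // no content rule matched
        }
    }

    // Stand-in for the rules bundled inside the library (assumption).
    static final MagicRules BUNDLED = new MagicRules()
            .add("%PDF-", "application/pdf");

    // Stand-in for the externally supplied file: same rules plus one custom rule.
    static final MagicRules NUTCH_SUPPLIED = new MagicRules()
            .add("%PDF-", "application/pdf")
            .add("EXMP", "application/x-example");

    /** Behaviour described in the report: content detection uses the bundled rules. */
    static String detectReported(byte[] content) {
        return BUNDLED.detect(content);
    }

    /** Intended behaviour: content detection consults the supplied rules. */
    static String detectIntended(byte[] content) {
        return NUTCH_SUPPLIED.detect(content);
    }

    public static void main(String[] args) {
        byte[] content = "EXMP payload".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("reported: " + detectReported(content));
        System.out.println("intended: " + detectIntended(content));
    }
}
```

In this sketch the custom rule only takes effect when the content-based step reads the supplied table, which is the gap the report describes. How the attached patch closes that gap in MimeUtil.java is not shown here.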