[ https://issues.apache.org/jira/browse/NUTCH-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1041:
---------------------------------

    Affects Version/s:     (was: 1.4)
                       1.3
        Fix Version/s:     (was: 1.4)
                           (was: nutchgora)
                       1.5

> Not reading mime-type correctly
> -------------------------------
>
>                 Key: NUTCH-1041
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1041
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.3
>            Reporter: Markus Jelsma
>             Fix For: 1.5
>
>
> Another issue with mime-types and test URLs. Below are two log lines from
> MimeUtil. The mime-type is still correct at the start of the
> autoResolveContentType method:
> {code}
> Jul 11, 2011 6:46:15 PM org.apache.nutch.util.MimeUtil autoResolveContentType
> INFO: Type: text/html; charset=ISO-8859-1 from: http://www.taxipoll.nl/taxipol.htm
> Jul 11, 2011 6:46:15 PM org.apache.nutch.util.MimeUtil autoResolveContentType
> INFO: Type: text/html from: http://archief.hoofdklassehockey.nl/hschema2009.html
> {code}
> The MIME-type correctness has been confirmed with curl. The documents, however,
> do not end up in the index with the correct mime-type, as this output from
> IndexingFiltersChecker shows. ParserChecker, by contrast, does output the
> correct Content-Type.
> {code}
> http://www.taxipoll.nl/taxipol.htm --> taxipoll/htm
> http://archief.hoofdklassehockey.nl/hschema2009.html --> tet/html
> {code}
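For comparison, the sketch below reduces the two Content-Type values from the MimeUtil log lines to their base MIME type by dropping parameters such as `; charset=ISO-8859-1`. Both reduce to `text/html`, which is presumably what IndexingFiltersChecker should report for these URLs. This is not Nutch's MimeUtil logic: the class `ContentTypeSketch` and the method `baseType` are hypothetical names used only for illustration.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not part of Nutch): reduces a Content-Type value such as
 * "text/html; charset=ISO-8859-1" to its base MIME type "text/html".
 */
public class ContentTypeSketch {

    /** Returns the lower-cased "type/subtype" part of a Content-Type value, or null if absent. */
    static String baseType(String contentType) {
        if (contentType == null) {
            return null;
        }
        int semicolon = contentType.indexOf(';');
        // Drop parameters such as "; charset=ISO-8859-1" and surrounding whitespace.
        String base = (semicolon >= 0 ? contentType.substring(0, semicolon) : contentType).trim();
        // MIME type names are case-insensitive, so normalize to lower case.
        return base.isEmpty() ? null : base.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // URL / Content-Type pairs taken from the MimeUtil log lines in the report.
        String[][] cases = {
            {"http://www.taxipoll.nl/taxipol.htm", "text/html; charset=ISO-8859-1"},
            {"http://archief.hoofdklassehockey.nl/hschema2009.html", "text/html"},
        };
        for (String[] c : cases) {
            System.out.println(c[0] + " --> " + baseType(c[1]));
        }
    }
}
```

Running `main` prints both URLs mapped to `text/html`, in the same `URL --> type` layout as the IndexingFiltersChecker output, which makes the mismatch with the indexed values easy to spot.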