[ https://issues.apache.org/jira/browse/NUTCH-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056564#comment-13056564 ]
Markus Jelsma commented on NUTCH-1017:
--------------------------------------

Another curiosity: this error is not to be found in the hadoop.log file! Does anyone know where it is coming from?

> Exception getting mime type by name
> -----------------------------------
>
>                 Key: NUTCH-1017
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1017
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.4
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4, 2.0
>
>
> Large crawls of `bad` websites tend to produce a lot of parsing errors. One
> of them is related to retrieving mime types, so it seems:
> {code}
> WARNING: Exception getting mime type by name: [<WEBSITE_CONTENT>]: Message:
> Invalid media type name: <WEBSITE_CONTENT>
> Jun 27, 2011 9:23:27 PM org.apache.nutch.util.MimeUtil forName
> WARNING: Exception getting mime type by name: [<WEBSITE_CONTENT>]: Message:
> Invalid media type name: <WEBSITE_CONTENT>
> Jun 27, 2011 9:23:27 PM org.apache.nutch.util.MimeUtil forName
> WARNING: Exception getting mime type by name: [Mime-Type]: Message: Invalid
> media type name: Mime-Type
> Jun 27, 2011 9:23:27 PM org.apache.nutch.util.MimeUtil forName
> WARNING: Exception getting mime type by name: [<WEBSITE_CONTENT>]: Message:
> Invalid media type name: <WEBSITE_CONTENT>
> Jun 27, 2011 9:23:27 PM org.apache.nutch.util.MimeUtil forName
> WARNING: Exception getting mime type by name: [text/html charset=utf-8]:
> Message: Invalid media type name: text/html charset=utf-8
> {code}
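
For illustration only, here is a minimal Java sketch of how the names in the log above could be screened before a mime type lookup. The class `MediaTypeNameCheck` and the method `cleanName` are hypothetical and not part of Nutch or Tika, and the set of allowed characters is a simplification of the media type token syntax. It assumes that dropping everything after the first `;` or whitespace is acceptable, which would turn `text/html charset=utf-8` into `text/html` and reject values such as `Mime-Type` outright.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class MediaTypeNameCheck {

    // Simplified "type/subtype" pattern (an assumption, not the exact grammar):
    // letters, digits and a few punctuation characters on both sides of one '/'.
    private static final Pattern TYPE_SUBTYPE =
        Pattern.compile("[A-Za-z0-9!#$&^_.+-]+/[A-Za-z0-9!#$&^_.+-]+");

    /**
     * Returns the bare, lowercased "type/subtype" part of a raw header value,
     * or null when no plausible media type name can be recovered from it.
     */
    public static String cleanName(String raw) {
        if (raw == null) {
            return null;
        }
        String s = raw.trim();
        // Cut at the first ';' or whitespace so that parameters such as
        // "charset=utf-8" (with or without the ';') are dropped.
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ';' || Character.isWhitespace(c)) {
                s = s.substring(0, i);
                break;
            }
        }
        if (!TYPE_SUBTYPE.matcher(s).matches()) {
            return null;
        }
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Values taken from the log excerpt in the issue description.
        System.out.println(cleanName("text/html charset=utf-8"));
        System.out.println(cleanName("Mime-Type"));
    }
}
```

A caller could skip the lookup when `cleanName` returns null, which would avoid one warning per malformed value; whether silently discarding such values is desirable for a crawl is a separate decision.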