Check your parse-plugins.xml file. It needs to map each content type to a parse plugin. That parse plugin must also be enabled in the plugin.includes property of your nutch-site.xml, otherwise it is never loaded.
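As a rough sketch of those two settings (not your actual files): the plugin id parse-myregex and the class org.example.nutch.parse.MyRegexParser below are hypothetical stand-ins for your custom parser, and the plugin.includes value is only an illustrative list, so compare it with the default in your own nutch-default.xml before copying anything.

```xml
<!-- parse-plugins.xml: map a content type to a plugin id, and give
     that plugin id an alias pointing at its parser extension.
     "parse-myregex" and its class name are hypothetical. -->
<parse-plugins>
  <mimeType name="application/xhtml+xml">
    <plugin id="parse-myregex" />
  </mimeType>
  <aliases>
    <alias name="parse-myregex"
           extension-id="org.example.nutch.parse.MyRegexParser" />
  </aliases>
</parse-plugins>
```

```xml
<!-- nutch-site.xml: the plugin id must match the plugin.includes regex,
     or the plugin is not loaded. The rest of the value is illustrative. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(html|tika|myregex)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

If a content type is mapped to a plugin id that plugin.includes does not match, the mapping has nothing to resolve to.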
The plugin's plugin.xml file must also map to the content type. See examples such as parse-html or parse-tika.

> Hi,
> I'm testing my custom parser plugin for nutch 1.2, which match some regular
> expression in the content and store these matched text into my database.
> When I test it in eclipse, everything worked well. But if I use it in my
> production environment. Some warnings were logged in hadoop.log like
> following:
>
> 2011-06-11 00:33:06,760 WARN parse.ParserFactory - ParserFactory:Plugin: org.apache.nutch.parse.html.HtmlParser mapped to contentType application/xhtml+xml via parse-plugins.xml, but its plugin.xml file does not claim to support contentType: application/xhtml+xml
> 2011-06-11 00:33:07,302 INFO fetcher.Fetcher - -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> 2011-06-11 00:33:08,303 INFO fetcher.Fetcher - -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> 2011-06-11 00:33:09,303 INFO fetcher.Fetcher - -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> 2011-06-11 00:33:09,940 WARN parse.ParseUtil - Unable to successfully parse content http://www.eccom.com.cn/EN/ of type application/xhtml+xml
> 2011-06-11 00:33:09,943 WARN fetcher.Fetcher - Error parsing: http://www.eccom.com.cn/EN/: failed(2,200): org.apache.nutch.parse.ParseException: Unable to successfully parse content
>
> When I remove the plugin in nutch-site.xml, crawling worked correctly. Any
> idea? Thanks.
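The first warning in the quoted log says that the plugin.xml of the plugin providing org.apache.nutch.parse.html.HtmlParser does not claim application/xhtml+xml. A minimal sketch of what such a claim can look like is below. It is modeled on the general shape of a parser extension in plugin.xml. The extension id and name attributes and the empty pathSuffix are assumptions here, so check them against the plugin.xml that ships with your Nutch version.

```xml
<!-- plugin.xml (fragment): the contentType parameter lists the content
     types this parser claims, separated by "|". The extension id/name
     values and pathSuffix are assumed, not copied from a release. -->
<extension id="org.apache.nutch.parse.html"
           name="HtmlParse"
           point="org.apache.nutch.parse.Parser">
  <implementation id="org.apache.nutch.parse.html.HtmlParser"
                  class="org.apache.nutch.parse.html.HtmlParser">
    <parameter name="contentType" value="text/html|application/xhtml+xml" />
    <parameter name="pathSuffix" value="" />
  </implementation>
</extension>
```

The same pattern would apply to a custom parser: whatever content type parse-plugins.xml maps to it should also appear in its own contentType parameter.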