Hi,
I'm testing my custom parser plugin for Nutch 1.2, which matches some regular
expressions against the content and stores the matched text in my database.
When I test it in Eclipse, everything works well. But when I use it in my
production environment, warnings like the following are logged in hadoop.log:

> 2011-06-11 00:33:06,760 WARN  parse.ParserFactory - ParserFactory:Plugin: org.apache.nutch.parse.html.HtmlParser mapped to contentType application/xhtml+xml via parse-plugins.xml, but its plugin.xml file does not claim to support contentType: application/xhtml+xml
> 2011-06-11 00:33:07,302 INFO  fetcher.Fetcher - -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> 2011-06-11 00:33:08,303 INFO  fetcher.Fetcher - -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> 2011-06-11 00:33:09,303 INFO  fetcher.Fetcher - -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> 2011-06-11 00:33:09,940 WARN  parse.ParseUtil - Unable to successfully parse content http://www.eccom.com.cn/EN/ of type application/xhtml+xml
> 2011-06-11 00:33:09,943 WARN  fetcher.Fetcher - Error parsing: http://www.eccom.com.cn/EN/: failed(2,200): org.apache.nutch.parse.ParseException: Unable to successfully parse content

When I remove the plugin from nutch-site.xml, crawling works correctly. Any
ideas? Thanks.
