MyHtmlParser getParse returns non-null, so all Analyzer-(zh|fr) cannot run
--------------------------------------------------------------------------

                 Key: NUTCH-745
                 URL: https://issues.apache.org/jira/browse/NUTCH-745
             Project: Nutch
          Issue Type: Bug
    Affects Versions: 1.0.0
         Environment: JDK1.6 + tomcat 6 + Eclipse3.3 + nutch 1.0
            Reporter: jcore_XiaTian


MyHtmlParser.getParse() returns a non-null ParseResult, so none of the Analyzer-(zh|fr) plugins can run:

        public ParseResult getParse(Content content) {
                return ParseResult.createParseResult(content.getUrl(),
                        new ParseStatus(ParseStatus.FAILED,
                                ParseStatus.FAILED_MISSING_CONTENT,
                                "No textual content available").getEmptyParse(conf));

                // return null;
        }

========nutch-site.xml=======
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(myHtml|html|text|js)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|language-identifier|analysis-(zh)</value>
  <description><![CDATA[
  
  ]]>  </description>
</property>
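
As a quick sanity check on the value above: plugin.includes is a regular expression over plugin ids. The sketch below assumes whole-string matching and uses a hypothetical isIncluded helper rather than any Nutch class; it only shows which ids that expression admits. Under that assumption, analysis-zh is included but analysis-fr is not, because the expression ends in analysis-(zh).

```java
import java.util.regex.Pattern;

class PluginIncludesCheck {
    // The plugin.includes value from nutch-site.xml above, copied verbatim.
    static final String INCLUDES =
        "protocol-http|urlfilter-regex|parse-(myHtml|html|text|js)|index-(basic|anchor)"
        + "|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic"
        + "|urlnormalizer-(pass|regex|basic)|language-identifier|analysis-(zh)";

    // Hypothetical helper (not a Nutch API): assumes the whole plugin id
    // must match the expression for the plugin to be loaded.
    public static boolean isIncluded(String pluginId) {
        return Pattern.compile(INCLUDES).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        String[] ids = {"parse-myHtml", "parse-html", "analysis-zh", "analysis-fr"};
        for (String id : ids) {
            System.out.println(id + " -> " + isIncluded(id));
        }
    }
}
```
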
==========parse-plugins.xml============
<mimeType name="text/html">
        <plugin id="parse-myHtml" />
        <plugin id="parse-html" />
</mimeType>
<alias name="parse-myHtml"
        extension-id="org.apache.nutch.parse.html.MyHtmlParser" />

===src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java========
 public ParseResult getParse(Content content) {
.....
// this code is never reached:
  ParseResult filteredParse = this.htmlParseFilters.filter(content,
                                                           parseResult,
                                                           metaTags, root);
.......
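
The behaviour described in this report can be modelled without Nutch at all. The sketch below assumes that the parsers listed for a MIME type are tried in order and that the first non-null result wins, which is what the report implies. The firstResult method, the lambda "parsers", and the string results are hypothetical stand-ins, not Nutch's API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class ParserChainSketch {
    // Assumption: try each parser in order and stop at the first non-null result.
    public static String firstResult(List<Function<String, String>> parsers, String content) {
        for (Function<String, String> parser : parsers) {
            String result = parser.apply(content);
            if (result != null) {
                return result;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Stand-in for MyHtmlParser as reported: always returns a (failed) non-null result.
        Function<String, String> myHtmlFailed = c -> "FAILED: No textual content available";
        // Stand-in for MyHtmlParser with the commented-out "return null;" enabled.
        Function<String, String> myHtmlNull = c -> null;
        // Stand-in for parse-html's HtmlParser.
        Function<String, String> html = c -> "SUCCESS: parsed by parse-html";

        // prints "FAILED: No textual content available"; the second parser is never called
        System.out.println(firstResult(Arrays.asList(myHtmlFailed, html), "<html/>"));
        // prints "SUCCESS: parsed by parse-html"
        System.out.println(firstResult(Arrays.asList(myHtmlNull, html), "<html/>"));
    }
}
```

Under that assumption, returning null from the first parser is what lets the chain fall through to parse-html and its filters.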



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
