Hi everybody,
Sorry to bring this issue up again with such a long mail, but I really can't get my plugin loaded. I have read and applied the suggestions given in various previous postings on this list,
but I still have not got any results.

Basically, I have used part of the code written for the "recommended"
plugin example from the Nutch wiki, keeping only the Parse extension.
I have ported it to Nutch 0.9 and run the inject/generate/fetch cycle.
The plugin is compiled and correctly installed in the $NUTCH_HOME/plugins/parse-rec directory.

My problem is that my plugin never seems to be executed, even though it appears to be correctly registered. Another problem is that I cannot get the plugin system to produce any logs unless I invoke it directly (see below).

I include all my code, config, etc. below, hoping someone can point out my mistakes or misunderstandings.

-Corrado

I took the code from the latest nightly ("At revision 472436") and
put my plugin code in trunk/src/plugin/parse-rec/src/java/org/apache/nutch/parse/rec/RecParseFilter.java

Here are the code and config files:
__________________________ RecParseFilter.java ______________________________________
package org.apache.nutch.parse.rec;

// JDK imports
import java.util.Enumeration;
import java.util.Properties;
import java.util.logging.Logger;

// Nutch imports
import org.apache.nutch.parse.HTMLMetaTags;
import org.apache.nutch.parse.Parse;
import org.apache.nutch.parse.HtmlParseFilter;
import org.apache.nutch.protocol.Content;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import org.apache.nutch.util.NutchConfiguration;
import org.apache.hadoop.conf.Configuration;

import org.w3c.dom.DocumentFragment;

public class RecParseFilter implements HtmlParseFilter {

 /** Configuration  */
 private Configuration conf;

 public static final Log LOG = LogFactory.getLog("RecParseFilter.class");

 /** The Recommended meta data attribute name */
 public static final String META_RECOMMENDED_NAME="Recommended";

 /** Scan the HTML document looking for a recommended meta tag.  */
 public Parse filter(Content content, Parse parse, HTMLMetaTags metaTags, DocumentFragment doc) {

   LOG.debug("RecParseFilter::filter() --->");
   // Try to find the document's recommended term
   String recommendation = null;
   Properties generalMetaTags = metaTags.getGeneralTags();
   String title = parse.getData().getTitle();
   LOG.debug("RecParseFilter::filter() - Document Title : " + title);

   for (Enumeration tagNames = generalMetaTags.propertyNames(); tagNames.hasMoreElements(); ) {
     if (tagNames.nextElement().equals("recommended")) {
       recommendation = generalMetaTags.getProperty("recommended");
       LOG.debug("RecParseFilter::filter() - Found a Recommendation for " + recommendation);
     }
   }

   if (recommendation == null) {
     LOG.debug("RecParseFilter::filter() - No Recommendation");
   } else {
     LOG.debug("RecParseFilter::filter() - Adding Recommendation for " + recommendation);
     parse.getData().getContentMeta().set(META_RECOMMENDED_NAME, recommendation);
   }
   LOG.debug("RecParseFilter::filter() <--");
   return parse;
 }

 public Configuration getConf() {
   LOG.debug("RecParseFilter::getConf() -->");
   LOG.debug("RecParseFilter::getConf() <--");
   return this.conf;
 }

 public void setConf(Configuration conf) {
   LOG.debug("RecParseFilter::setConf() -->");
   LOG.debug("RecParseFilter::setConf() <--");
   this.conf = conf;
 }
}
________________________________________________________________
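
For reference, the tag search in filter() boils down to a single lookup in the Properties object. Here is a minimal standalone sketch of that step, using only java.util.Properties. It assumes that metaTags.getGeneralTags() stores the tag names in lower case (I have not verified this in the Nutch source); the class name RecommendedTagLookup and the sample tag values are made up for illustration.

```java
import java.util.Properties;

class RecommendedTagLookup {

    // Name of the meta tag, assumed to be stored in lower case.
    static final String TAG = "recommended";

    // Returns the content of the "recommended" tag, or null when it is absent.
    static String findRecommendation(Properties generalMetaTags) {
        return generalMetaTags.getProperty(TAG);
    }

    public static void main(String[] args) {
        Properties tags = new Properties();
        tags.setProperty("recommended", "plugins"); // hypothetical tag value
        tags.setProperty("keywords", "nutch");      // hypothetical tag value

        System.out.println("with tag:    " + findRecommendation(tags));
        System.out.println("without tag: " + findRecommendation(new Properties()));
    }
}
```

If the tag names were not lower-cased by the parser, a page using <meta name="Recommended" ...> would not be found by either version.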

_________________________plugin.xml_______________________________

<?xml version="1.0" encoding="UTF-8"?>
<plugin
  id="parse-rec"
  name="Recommended Parser/Filter"
  version="0.0.1"
  provider-name="nutch.org">

  <runtime>
     <!-- As defined in build.xml this plugin will end up bundled as parse-rec.jar -->
     <library name="parse-rec.jar">
        <export name="*"/>
     </library>
  </runtime>

  <requires>
   <import plugin="nutch-extensionpoints"/>
  </requires>

  <!-- RecParseFilter implements HtmlParseFilter to grab the contents of any recommended meta tags -->
  <extension id="org.apache.nutch.parse.rec.RecParseFilter"
             name="Recommended Parser"
             point="org.apache.nutch.parse.HtmlParseFilter">
     <implementation id="RecParseFilter" class="org.apache.nutch.parse.rec.RecParseFilter">
        <parameter name="contentType" value="text/html"/>
        <parameter name="pathSuffix"  value=""/>
     </implementation>
  </extension>
</plugin>
________________________________________________________________

I have added this property to nutch-site.xml:

___________________________nutch-site.xml__________________________
     <property>
       <name>plugin.includes</name>
       <value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html|js|rec)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
     </property>
________________________________________________________________
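
As a sanity check on the plugin.includes value, here is a minimal sketch that tests plugin ids against the expression with java.util.regex. It assumes that Nutch enables a plugin when its id matches the whole expression; that is my understanding of the plugin system, not something confirmed here. The value in the code is the plain alternation, with no `*` characters around the names. The class name PluginIncludesCheck is made up, and parse-pdf is included only as an example of an id that is not listed.

```java
import java.util.regex.Pattern;

class PluginIncludesCheck {

    // The plugin.includes value from nutch-site.xml, written as a plain alternation.
    static final String INCLUDES =
        "nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html|js|rec)"
        + "|index-basic|query-(basic|site|url)|summary-basic|scoring-opic"
        + "|urlnormalizer-(pass|regex|basic)";

    // Assumption: a plugin is enabled when its id matches the whole expression.
    static boolean isIncluded(String pluginId) {
        return Pattern.compile(INCLUDES).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        String[] ids = {"parse-rec", "parse-html", "parse-pdf"};
        for (String id : ids) {
            System.out.println(id + " -> " + isIncluded(id));
        }
    }
}
```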

I have added these lines to parse-plugins.xml.
I also tried listing only my plugin, with the same results.

___________________________parse-plugins.xml__________________________
       <mimeType name="text/html">
               <plugin id="parse-rec" />
               <plugin id="parse-html" />
       </mimeType>
________________________________________________________________

Finally, I added a line to log4j.properties to make the plugin system log.
But despite this line I get no plugin logs at all.
___________________________log4j.properties__________________________
log4j.logger.org.apache.nutch.plugin=DEBUG
________________________________________________________________
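
In log4j, a level set with log4j.logger.<name> applies to the logger with that name and to loggers whose names extend it with a dot-separated suffix. Below is a small sketch of which logger names the line above would cover. It is a simplified model of that naming rule, not log4j's own code. The helper isCoveredBy and the class name LoggerNameCheck are made up; the first logger name is presumably the one used by PluginRepository, the second is the fully qualified name of my class, and the third is the literal string passed to LogFactory.getLog() in RecParseFilter.java.

```java
class LoggerNameCheck {

    // True when a level configured for 'category' applies to 'loggerName':
    // either the names are equal, or 'category' plus a dot is a prefix of the logger name.
    static boolean isCoveredBy(String loggerName, String category) {
        return loggerName.equals(category) || loggerName.startsWith(category + ".");
    }

    public static void main(String[] args) {
        String category = "org.apache.nutch.plugin";
        String[] loggerNames = {
            "org.apache.nutch.plugin.PluginRepository",  // presumably the plugin system's logger
            "org.apache.nutch.parse.rec.RecParseFilter", // fully qualified name of the filter class
            "RecParseFilter.class"                       // string literal used in RecParseFilter.java
        };
        for (String name : loggerNames) {
            System.out.println(name + " covered: " + isCoveredBy(name, category));
        }
    }
}
```

Under this model, only the PluginRepository logger falls under org.apache.nutch.plugin; the other two names would take their level from whatever else is configured, such as the root logger.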

After running the fetcher I was expecting to find the "recommended" meta tag in my segment:

nutch readseg -get test/segments/20061108110142 "http://testmachine.toto.net/index.html";
SegmentReader: get 'http://testmachine.toto.net/index.html'
Content::
Version: 2
url: http://testmachine.toto.net/index.html
base: http://testmachine.toto.net/index.html
contentType: text/html
metadata: nutch.segment.name=20061108110142 nutch.crawl.score=1.0
Content:

Crawl Generate::
Version: 4
Status: 1 (DB_unfetched)
Fetch time: Wed Nov 08 10:54:39 CET 2006
Modified time: Thu Jan 01 01:00:00 CET 1970
Retries since fetch: 0
Retry interval: 30.0 days
Score: 1.0
Signature: null
Metadata: null

Crawl Fetch::
Version: 4
Status: 6 (fetch_retry)
Fetch time: Wed Nov 08 11:02:46 CET 2006
Modified time: Thu Jan 01 01:00:00 CET 1970
Retries since fetch: 1
Retry interval: 30.0 days
Score: 1.0
Signature: null
Metadata: null

I then tried to invoke the plugin directly:
nutch  plugin parse-rec  org.apache.nutch.parse.rec.RecParseFilter

This way I got the plugin logs I wanted in hadoop.log, showing that the plugin is registered:


.....
2006-11-08 11:07:33,520 DEBUG plugin.PluginRepository - parsing: /home/opt/nutch-0.9-dev/plugins/parse-rec/plugin.xml
2006-11-08 11:07:33,526 DEBUG plugin.PluginRepository - plugin: id=parse-rec name=Recommended Parser/Filter version=0.0.1 provider=nutch.orgclass=null
2006-11-08 11:07:33,527 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.parse.HtmlParseFilter class=org.apache.nutch.parse.rec.RecParseFilter
2006-11-08 11:07:33,528 DEBUG plugin.PluginRepository - parsing: /home/opt/nutch-0.9-dev/plugins/parse-text/plugin.xml
.....
Registered Plugins:
....
2006-11-08 11:07:34,014 INFO plugin.PluginRepository - Recommended Parser/Filter (parse-rec)
....
2006-11-08 11:07:51,827 DEBUG plugin.PluginRepository - parsing: /home/opt/nutch-0.9-dev/plugins/parse-rec/plugin.xml
2006-11-08 11:07:51,837 DEBUG plugin.PluginRepository - plugin: id=parse-rec name=Recommended Parser/Filter version=0.0.1 provider=nutch.orgclass=null
...


