Hi everybody,
Sorry to bring this issue up again with such a long mail, but I really
can't get my plugin loaded.
I have read and applied the suggestions given in various previous
postings on this list,
but I still have not gotten any results.
Basically, I have used part of the code written for the recommended
plugin example from the Nutch wiki, and kept only the Parse extension.
I have ported it to Nutch 0.9 and run the inject/generate/fetch cycle.
The plugin is compiled and correctly installed in
$NUTCH_HOME/plugins/parse-rec directory.
My problem is that my plugin never seems to be executed, even though
it appears to be correctly registered.
Another problem is that I cannot get the plugin system to produce any
logs unless I invoke it directly (see below).
I include here all my code, config, etc., hoping someone can point out my
mistakes or misunderstandings.
-Corrado
I took the code from the latest nightly, at revision 472436, and
put my plugin code in
trunk/src/plugin/parse-rec/src/java/org/apache/nutch/parse/rec/RecParseFilter.java
Here are the code and config files:
__ RecParseFilter.java __
package org.apache.nutch.parse.rec;

// JDK imports
import java.util.Enumeration;
import java.util.Properties;
import java.util.logging.Logger;

// Nutch imports
import org.apache.nutch.parse.HTMLMetaTags;
import org.apache.nutch.parse.Parse;
import org.apache.nutch.parse.HtmlParseFilter;
import org.apache.nutch.protocol.Content;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.nutch.util.NutchConfiguration;
import org.apache.hadoop.conf.Configuration;
import org.w3c.dom.DocumentFragment;

public class RecParseFilter implements HtmlParseFilter {

    /** Configuration */
    private Configuration conf;

    public static final Log LOG = LogFactory.getLog(RecParseFilter.class);

    /** The Recommended meta data attribute name */
    public static final String META_RECOMMENDED_NAME = "Recommended";

    /** Scan the HTML document looking for a recommended meta tag. */
    public Parse filter(Content content, Parse parse, HTMLMetaTags metaTags,
                        DocumentFragment doc) {
        LOG.debug("RecParseFilter::filter() ---");

        // Trying to find the document's recommended term
        String recommendation = null;
        Properties generalMetaTags = metaTags.getGeneralTags();
        String title = parse.getData().getTitle();
        LOG.debug("RecParseFilter::filter() - Document Title : " + title);

        for (Enumeration tagNames = generalMetaTags.propertyNames();
             tagNames.hasMoreElements(); ) {
            if (tagNames.nextElement().equals("recommended")) {
                recommendation = generalMetaTags.getProperty("recommended");
                LOG.debug("RecParseFilter::filter() - Found a Recommendation for "
                          + recommendation);
            }
        }

        if (recommendation == null)
            LOG.debug("RecParseFilter::filter() - No Recommendation");
        else {
            LOG.debug("RecParseFilter::filter() - Adding Recommendation for "
                      + recommendation);
            parse.getData().getContentMeta().set(META_RECOMMENDED_NAME,
                                                 recommendation);
        }

        LOG.debug("RecParseFilter::filter() --");
        return parse;
    }

    public Configuration getConf() {
        LOG.debug("RecParseFilter::getConf() --");
        LOG.debug("RecParseFilter::getConf() --");
        return this.conf;
    }

    public void setConf(Configuration conf) {
        LOG.debug("RecParseFilter::setConf() --");
        LOG.debug("RecParseFilter::setConf() --");
        this.conf = conf;
    }
}
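
To make the lookup inside filter() easier to check on its own, here is a minimal standalone sketch of that step using only java.util.Properties. The class name RecommendedLookup and the method findRecommended are hypothetical and not part of Nutch. It assumes the general meta tags are stored under lowercased names, which I have not verified. The enumeration loop above should be equivalent to a single getProperty call:

```java
import java.util.Properties;

// Hypothetical helper, not part of Nutch: isolates the meta tag lookup
// that RecParseFilter.filter() performs on the general meta tags.
public class RecommendedLookup {

    // Assumption: tag names are stored lowercased, so the key is "recommended".
    static final String TAG_NAME = "recommended";

    /** Returns the value of the "recommended" tag, or null if it is absent. */
    static String findRecommended(Properties generalMetaTags) {
        return generalMetaTags.getProperty(TAG_NAME);
    }

    public static void main(String[] args) {
        Properties tags = new Properties();
        tags.setProperty("recommended", "plugins");
        tags.setProperty("description", "some page");

        // Present tag: the value is returned as stored.
        System.out.println("with tag:    " + findRecommended(tags));

        // Absent tag: getProperty returns null, which matches the
        // "No Recommendation" branch in the filter.
        System.out.println("without tag: " + findRecommended(new Properties()));
    }
}
```

If this behaves as expected when run by itself, then the lookup is probably not the reason the filter appears to do nothing.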
_plugin.xml___
<?xml version="1.0" encoding="UTF-8"?>
<plugin
   id="parse-rec"
   name="Recommended Parser/Filter"
   version="0.0.1"
   provider-name="nutch.org">
   <runtime>
      <!-- As defined in build.xml this plugin will end up bundled as
           recommended.jar -->
      <library name="parse-rec.jar">
         <export name="*"/>
      </library>
   </runtime>
   <requires>
      <import plugin="nutch-extensionpoints"/>
   </requires>
   <!-- The RecommendedParser extends the HtmlParseFilter to grab the
        contents of any recommended meta tags -->
   <extension id="org.apache.nutch.parse.rec.RecParseFilter"
              name="Recommended Parser"
              point="org.apache.nutch.parse.HtmlParseFilter">
      <implementation id="RecParseFilter"
                      class="org.apache.nutch.parse.rec.RecParseFilter">
         <parameter name="contentType" value="text/html"/>
         <parameter name="pathSuffix" value=""/>
      </implementation>
   </extension>
</plugin>
I have added this property to nutch-site.xml:
___nutch-site.xml__
<property>
<name>plugin.includes</name>
<value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html|js|rec)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>