Hello,
I really need your help here, please. I tried a few more things: I
deleted my two plugins and, instead of creating new ones, modified the
existing index-more and query-more plugins.
The index-more modification is working. Here's what I added:
private Document addScope(Document doc, ParseData data, String url) {
    doc.add(new Field("scope", "aScope", Field.Store.YES,
            Field.Index.UN_TOKENIZED));
    return doc;
}
I made sure that the method is called by adding this to the filter
method:
addScope(doc, parse.getData(), url_s);
Using the Nutch API, when I check the details of a hit, I look for:
String scope = detail.getValue("scope");
And, as expected, it always returns "aScope".
The problem arises when I try to filter a query using my modified
query-more plugin. When I execute the query "aKeyword scope:aScope" (the
double quotes are there only for legibility in this email), the index
always returns 0 results.
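To make explicit what I expect from the UN_TOKENIZED field, here is a
minimal stand-alone sketch in plain Java. It is not Nutch or Lucene code:
the class UntokenizedFieldSketch and its add/search methods are
hypothetical names of my own, and the sketch only assumes that an
untokenized value is indexed verbatim as a single term and that a term
lookup is an exact, case-sensitive match.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an index of untokenized fields; not Nutch or Lucene code. */
final class UntokenizedFieldSketch {
    // Maps "field:value" (value stored verbatim, case preserved) to matching document ids.
    private final Map<String, List<Integer>> postings = new HashMap<>();

    /** Indexes the whole value as one term, which is what I assume UN_TOKENIZED does. */
    void add(int docId, String field, String value) {
        postings.computeIfAbsent(field + ":" + value, k -> new ArrayList<>()).add(docId);
    }

    /** Exact, case-sensitive term lookup; returns an empty list when nothing matches. */
    List<Integer> search(String field, String value) {
        return postings.getOrDefault(field + ":" + value, new ArrayList<>());
    }

    public static void main(String[] args) {
        UntokenizedFieldSketch index = new UntokenizedFieldSketch();
        index.add(1, "scope", "aScope");
        index.add(2, "scope", "aScope");
        System.out.println(index.search("scope", "aScope")); // term exactly as indexed
        System.out.println(index.search("scope", "ascope")); // same term, different case
    }
}
```

In this sketch the first lookup finds both documents and the second finds
none. So if anything on the query side altered the term before the lookup
(its case, for example), I would expect 0 results, but I have not verified
whether that is what happens in my setup.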
Here's the class I added to the org.apache.nutch.indexer.more
package:
import org.apache.nutch.searcher.RawFieldQueryFilter;
import org.apache.hadoop.conf.Configuration;

/**
 * Handles "scope:" query clauses, causing them to search the field
 * indexed by MoreIndexingFilter.
 *
 * @author John Xing / David Poirier
 */
public class ScopeQueryFilter extends RawFieldQueryFilter {
    private Configuration conf;

    public ScopeQueryFilter() {
        super("scope");
    }

    public void setConf(Configuration conf) {
        this.conf = conf;
        setBoost(conf.getFloat("query.scope.boost", 0.0f));
    }

    public Configuration getConf() {
        return this.conf;
    }
}
And the plugin.xml file associated with it:
<plugin
   id="query-more"
   name="More Query Filter"
   version="1.0.0"
   provider-name="nutch.org">
   <runtime>
      <library name="query-more.jar">
         <export name="*"/>
      </library>
   </runtime>
   <requires>
      <import plugin="nutch-extensionpoints"/>
   </requires>
   <extension id="org.apache.nutch.searcher.more"
              name="Nutch More Query Filter"
              point="org.apache.nutch.searcher.QueryFilter">
      <implementation id="TypeQueryFilter"
                      class="org.apache.nutch.searcher.more.TypeQueryFilter">
         <parameter name="raw-fields" value="type"/>
      </implementation>
   </extension>
   <extension id="org.apache.nutch.searcher.more"
              name="Nutch More Query Filter"
              point="org.apache.nutch.searcher.QueryFilter">
      <implementation id="DateQueryFilter"
                      class="org.apache.nutch.searcher.more.DateQueryFilter">
         <parameter name="raw-fields" value="date"/>
      </implementation>
   </extension>
   <extension id="org.apache.nutch.searcher.more"
              name="Nutch More Query Filter"
              point="org.apache.nutch.searcher.QueryFilter">
      <implementation id="ScopeQueryFilter"
                      class="org.apache.nutch.searcher.more.ScopeQueryFilter">
         <parameter name="raw-fields" value="scope"/>
      </implementation>
   </extension>
</plugin>
If this means anything to anybody, please let me know.
Thank you in advance,
David
-----------------------------------------
David Poirier
E-business Consultant - Software Engineer
-----Original Message-----
From: POIRIER David [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 25 March 2008 18:09
To: [email protected]
Subject: nutch: creating new plugins: query plugin
Hello,
Following the info available on the wiki
(http://wiki.apache.org/nutch/CreateNewFilter), I have created two new
plugins:
- index-scope (based on index-more)
- query-scope (based on query-site)
As you can guess, the first plugin simply adds the "scope" metadata to
every parsed document, giving them, as a test, a fixed value, while the
second plugin adds the possibility to search for a "scope" using the
Lucene syntax.
I have deployed the two new plugins, as JARs, in my plugins repository and
modified my nutch-site.xml file to look for them. To be sure of
everything, I have performed a complete crawl of a "virgin" source. I
have also modified both plugin.xml files so that the system can find the
right Java classes.
Looking at a resultset everything looks fine: every hit in the set
possesses the metadata scope=aScope, which is exactly what I am looking
for. Things stop working though when I try to search for the metadata
using the Lucene syntax. The query "aWord scope:aScope" returns
nothing...
When I check my log files I can see that the query-scope plugin is
available:
[...]
2008-03-25 16:02:55,015 [http-8080-Processor23] INFO org.apache.nutch.plugin.PluginRepository - Scope Query Filter (query-scope)
[...]
And that the proper extension point is registered:
[...]
2008-03-25 16:02:55,015 [http-8080-Processor23] INFO org.apache.nutch.plugin.PluginRepository - Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
[...]
Here is the plugin.xml file associated with the plugin:
<plugin
   id="query-scope"
   name="a description"
   version="1.0.0"
   provider-name="myName.xyz">
   <runtime>
      <library name="query-scope.jar">
         <export name="*"/>
      </library>
   </runtime>
   <requires>
      <import plugin="nutch-extensionpoints"/>
   </requires>
   <extension
      id="org.apache.nutch.searcher.site.modified.SiteQueryFilterModified"
      name="Scope Query Filter"
      point="org.apache.nutch.searcher.QueryFilter">
      <implementation id="SiteQueryFilterModified"
         class="org.apache.nutch.searcher.site.modified.SiteQueryFilterModified">
         <parameter name="raw-fields" value="scope"/>
      </implementation>
   </extension>
</plugin>
If somebody has any idea... please let me know! Thank you in advance!
David