Hello,

I really need your help here, please. I tried a few more things: I
deleted my two plugins and, instead of creating new ones, modified the
existing index-more and query-more plugins.

The index-more modification is working. Here's what I added:
private Document addScope(Document doc, ParseData data, String url) {
    // Store a fixed test value in an untokenized "scope" field.
    doc.add(new Field("scope", "aScope", Field.Store.YES,
                      Field.Index.UN_TOKENIZED));
    return doc;
}

I made sure that the method is called by adding this to the filter
method:
addScope(doc, parse.getData(), url_s);

Using the Nutch API, when I check the details of a hit, I look up:
String scope = detail.getValue("scope");

And as expected, it always returns "aScope".

The problem is when I try to filter a query using my modified query-more
plugin. When executing the query "aKeyword scope:aScope" (the double
quotes are there only for legibility in this email), the search always
returns 0 results.

Here's the class I added to the org.apache.nutch.indexer.more
package:
import org.apache.nutch.searcher.RawFieldQueryFilter;
import org.apache.hadoop.conf.Configuration;

/**
 * Handles "scope:" query clauses, causing them to search the field
 * indexed by MoreIndexingFilter.
 *
 * @author John Xing / David Poirier
 */

public class ScopeQueryFilter extends RawFieldQueryFilter {
  private Configuration conf;

  public ScopeQueryFilter() {
    super("scope");
  }

  public void setConf(Configuration conf) {
    this.conf = conf;
    setBoost(conf.getFloat("query.scope.boost", 0.0f));
  }

  public Configuration getConf() {
    return this.conf;
  }
}

And the plugin.xml file associated with it:
<plugin
   id="query-more"
   name="More Query Filter"
   version="1.0.0"
   provider-name="nutch.org">

   <runtime>
      <library name="query-more.jar">
         <export name="*"/>
      </library>
   </runtime>

   <requires>
      <import plugin="nutch-extensionpoints"/>
   </requires>

   <extension id="org.apache.nutch.searcher.more"
              name="Nutch More Query Filter"
              point="org.apache.nutch.searcher.QueryFilter">
      <implementation id="TypeQueryFilter"
                      class="org.apache.nutch.searcher.more.TypeQueryFilter">
        <parameter name="raw-fields" value="type"/>
      </implementation>
      
   </extension>

   <extension id="org.apache.nutch.searcher.more"
              name="Nutch More Query Filter"
              point="org.apache.nutch.searcher.QueryFilter">
      <implementation id="DateQueryFilter"
                      class="org.apache.nutch.searcher.more.DateQueryFilter">
        <parameter name="raw-fields" value="date"/>
      </implementation>
      
   </extension>
   
   <extension id="org.apache.nutch.searcher.more"
              name="Nutch More Query Filter"
              point="org.apache.nutch.searcher.QueryFilter">
      <implementation id="ScopeQueryFilter"
                      class="org.apache.nutch.searcher.more.ScopeQueryFilter">
        <parameter name="raw-fields" value="scope"/>
      </implementation>
      
   </extension>

</plugin>
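
One possibility that may be worth ruling out is case handling. The
indexed value "aScope" is mixed-case and stored untokenized, so a lookup
only matches the exact same string. The stand-alone sketch below (plain
Java, no Nutch or Lucene classes; the class ScopeLookupSketch and its
methods are hypothetical) models that behaviour under the ASSUMPTION,
not verified here, that the query side lowercases the clause value
before the lookup:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Stand-alone model of exact-term lookup on an untokenized field.
 * Hypothetical names throughout; no Nutch or Lucene classes are used.
 */
class ScopeLookupSketch {
    // term -> ids of documents whose "scope" field holds exactly that term
    private final Map<String, List<Integer>> scopeTerms = new HashMap<>();

    /** Index the value verbatim, as an untokenized field would. */
    public void index(int docId, String scope) {
        scopeTerms.computeIfAbsent(scope, k -> new ArrayList<>()).add(docId);
    }

    /**
     * Look up a "scope:" clause. ASSUMPTION: when lowerCase is true, the
     * query side lowercases the value before the exact-match lookup.
     */
    public List<Integer> search(String value, boolean lowerCase) {
        String term = lowerCase ? value.toLowerCase(Locale.ROOT) : value;
        return scopeTerms.getOrDefault(term, new ArrayList<>());
    }

    public static void main(String[] args) {
        ScopeLookupSketch idx = new ScopeLookupSketch();
        idx.index(1, "aScope");
        // Lowercased query term "ascope" does not match "aScope": prints 0
        System.out.println(idx.search("aScope", true).size());
        // Verbatim query term matches the stored value: prints 1
        System.out.println(idx.search("aScope", false).size());
    }
}
```

If that assumption holds for the RawFieldQueryFilter in use, indexing an
all-lowercase test value (for example "ascope") and querying
"aKeyword scope:ascope" should show quickly whether case is the cause.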

If this rings a bell for anybody, please let me know.

Thank you in advance,

David


-----------------------------------------
David Poirier
E-business Consultant - Software Engineer
 


-----Original Message-----
From: POIRIER David [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, 25 March 2008 18:09
To: [email protected]
Subject: nutch: creating new plugins: query plugin

Hello,

Following the info available on the wiki
(http://wiki.apache.org/nutch/CreateNewFilter), I have created two new
plugins:
- index-scope (based on index-more)
- query-scope (based on query-site)

As you can guess, the first plugin simply adds the "scope" metadata to
every parsed document, giving it, as a test, a fixed value, while the
second plugin adds the ability to search for a "scope" using the
Lucene syntax.

I have deployed the two new plugins, as JARs, in my plugins repository and
modified my nutch-site.xml file to look for them. To be sure of
everything, I performed a complete crawl of a fresh source. I
have also modified both plugin.xml files so that the system can find the
right Java classes.

Looking at a result set, everything looks fine: every hit in the set
has the metadata scope=aScope, which is exactly what I am looking
for. Things stop working, though, when I try to search on the metadata
using the Lucene syntax. The query "aWord scope:aScope" returns
nothing...

When I check my log files, I can see that the query-scope plugin is
available:
[...]
2008-03-25 16:02:55,015 [http-8080-Processor23] INFO org.apache.nutch.plugin.PluginRepository  -     Scope Query Filter (query-scope)
[...]
And that the proper extension point is registered:
[...]
2008-03-25 16:02:55,015 [http-8080-Processor23] INFO org.apache.nutch.plugin.PluginRepository  -     Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
[...]


Here is the plugin.xml file associated with the plugin:

<plugin
   id="query-scope"
   name="a description"
   version="1.0.0"
   provider-name="myName.xyz">

   <runtime>
      <library name="query-scope.jar">
         <export name="*"/>
      </library>
   </runtime>

   <requires>
      <import plugin="nutch-extensionpoints"/>
   </requires>

   <extension id="org.apache.nutch.searcher.site.modified.SiteQueryFilterModified"
              name="Scope Query Filter"
              point="org.apache.nutch.searcher.QueryFilter">
      <implementation id="SiteQueryFilterModified"
                      class="org.apache.nutch.searcher.site.modified.SiteQueryFilterModified">
        <parameter name="raw-fields" value="scope"/>
      </implementation>    

   </extension>
</plugin>



If somebody has any idea... please let me know! Thank you in advance!

David

