What happens when you run the keyword query on its own, without the
scope: clause?

Where are you executing the query from?  Using the NutchBean?  If so,
then the double-quotes would be necessary.

Why don't you try searching directly for the URLs that you think should
be returned, using the url: syntax, to make sure they were indexed and
that you are pointing at the right index?

Brian Ulicny




On Wed, 26 Mar 2008 11:50:43 +0100, "POIRIER David"
<[EMAIL PROTECTED]> said:
> Hello,
> 
> I really need your help here please. I tried a few more things; I
> deleted my two plugins and instead of creating new ones I modified the
> existing index-more and query-more plugins.
> 
> The index-more modification is working. Here's what I added:
> private Document addScope(Document doc, ParseData data, String url) {
>       // Test value: stored, and indexed as a single untokenized term.
>       doc.add(new Field("scope", "aScope",
>                         Field.Store.YES, Field.Index.UN_TOKENIZED));
>       return doc;
> }
> 
> And made sure that the method is called by adding this in the filter
> method:
> addScope(doc, parse.getData(), url_s);
> 
> Using the Nutch API, when I check for the details of a hit, I look for:
> String scope = detail.getValue("scope");
> 
> And as expected it always returns "aScope".
> 
> The problem is when I try to filter a query using my modified query-more
> plugin. When executing the query "aKeyword scope:aScope" (the double
> quotes are there only for readability in this email), the index always
> returns 0 results.
> 
> Here's the additional class to the org.apache.nutch.indexer.more
> package:
> import org.apache.nutch.searcher.RawFieldQueryFilter;
> import org.apache.hadoop.conf.Configuration;
> 
> /**
>  * Handles "scope:" query clauses, causing them to search the field
>  * indexed by MoreIndexingFilter.
>  *
>  * @author John Xing / David Poirier
>  */
> 
> public class ScopeQueryFilter extends RawFieldQueryFilter {
>   private Configuration conf;
> 
>   public ScopeQueryFilter() {
>     super("scope");
>   }
> 
>   public void setConf(Configuration conf) {
>     this.conf = conf;
>     setBoost(conf.getFloat("query.scope.boost", 0.0f));
>   }
> 
>   public Configuration getConf() {
>     return this.conf;
>   }
> }
> 
> And the plugin.xml file associated with it:
> <plugin
>    id="query-more"
>    name="More Query Filter"
>    version="1.0.0"
>    provider-name="nutch.org">
> 
>    <runtime>
>       <library name="query-more.jar">
>          <export name="*"/>
>       </library>
>    </runtime>
> 
>    <requires>
>       <import plugin="nutch-extensionpoints"/>
>    </requires>
> 
>    <extension id="org.apache.nutch.searcher.more"
>               name="Nutch More Query Filter"
>               point="org.apache.nutch.searcher.QueryFilter">
>       <implementation id="TypeQueryFilter"
>                       class="org.apache.nutch.searcher.more.TypeQueryFilter">
>         <parameter name="raw-fields" value="type"/>
>       </implementation>
>       
>    </extension>
> 
>    <extension id="org.apache.nutch.searcher.more"
>               name="Nutch More Query Filter"
>               point="org.apache.nutch.searcher.QueryFilter">
>       <implementation id="DateQueryFilter"
>                       class="org.apache.nutch.searcher.more.DateQueryFilter">
>         <parameter name="raw-fields" value="date"/>
>       </implementation>
>       
>    </extension>
>    
>    <extension id="org.apache.nutch.searcher.more"
>               name="Nutch More Query Filter"
>               point="org.apache.nutch.searcher.QueryFilter">
>       <implementation id="ScopeQueryFilter"
>                       class="org.apache.nutch.searcher.more.ScopeQueryFilter">
>         <parameter name="raw-fields" value="scope"/>
>       </implementation>
>       
>    </extension>
> 
> </plugin>
> 
> If this means anything to anybody, please let me know.
> 
> Thank you in advance,
> 
> David
> 
> 
> -----------------------------------------
> David Poirier
> E-business Consultant - Software Engineer
>  
> 
> 
> -----Original Message-----
> From: POIRIER David [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, 25 March 2008 18:09
> To: [email protected]
> Subject: nutch: creating new plugins: query plugin
> 
> Hello,
> 
> Following the info available on the wiki
> (http://wiki.apache.org/nutch/CreateNewFilter), I have created two new
> plugins:
> - index-scope (based on index-more)
> - query-scope (based on query-site)
> 
> As you can guess, the first plugin simply adds the "scope" metadata to
> every parsed document, giving them, as a test, a fixed value, while the
> second plugin adds the ability to search for a "scope" using the
> Lucene syntax.
> 
> I have deployed the two new plugins, as JARs, in my plugins repository and
> modified my nutch-site.xml file to look for them. To be sure of
> everything I have performed a complete crawl of a "virgin" source. I
> have also modified both plugin.xml files so that the system can find the
> right Java classes.
> 
> Looking at a resultset everything looks fine: every hit in the set
> possesses the metadata scope=aScope, which is exactly what I am looking
> for. Things stop working though when I try to search for the metadata
> using the Lucene syntax. The query "aWord scope:aScope" returns
> nothing...
> 
> When I check my log files I can see that the query-scope plugin is
> available:
> [...]
> 2008-03-25 16:02:55,015 [http-8080-Processor23] INFO
> org.apache.nutch.plugin.PluginRepository  -   Scope Query Filter
> (query-scope)
> [...]
> And that the proper extension point is registered:
> [...]
> 2008-03-25 16:02:55,015 [http-8080-Processor23] INFO
> org.apache.nutch.plugin.PluginRepository  -   Nutch Query Filter
> (org.apache.nutch.searcher.QueryFilter)
> [...]
> 
> 
> Here is the plugin.xml file associated with the plugin:
> 
> <plugin
>    id="query-scope"
>    name="a description"
>    version="1.0.0"
>    provider-name="myName.xyz">
> 
>    <runtime>
>       <library name="query-scope.jar">
>          <export name="*"/>
>       </library>
>    </runtime>
> 
>    <requires>
>       <import plugin="nutch-extensionpoints"/>
>    </requires>
> 
>    <extension id="org.apache.nutch.searcher.site.modified.SiteQueryFilterModified"
>               name="Scope Query Filter"
>               point="org.apache.nutch.searcher.QueryFilter">
>       <implementation id="SiteQueryFilterModified"
>                       class="org.apache.nutch.searcher.site.modified.SiteQueryFilterModified">
>         <parameter name="raw-fields" value="scope"/>
>       </implementation>    
> 
>    </extension>
> </plugin>
> 
> 
> 
> If somebody has any idea... please let me know! Thank you in advance!
> 
> David
> 
> 
-- 
  Brian Ulicny
  bulicny at alum dot mit dot edu
  home: 781-721-5746
  fax: 360-361-5746

