Brian,

Thank you for your answer.

Q: What happens when you do the keyword query only?
A: It works. Example: 
        query: cancer
        results: yes

Q: Where are you executing the query from?  Using the NutchBean?
A: From the NutchBean. Here are a few tests I ran:
        query: cancer scope:aScope
        results: no

        query: cancer scope:"aScope"
        results: no

        query: cancer "scope:aScope"
        results: no

        query: "cancer scope:aScope"
        results: no
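
To make these tests repeatable, the four quoting variants can be generated
rather than typed by hand. This is only a minimal sketch: the class
QueryVariants and its variants method are hypothetical names of mine, not part
of Nutch, and the code only builds the query strings. Passing each string to
the NutchBean is left out.

```java
import java.util.Arrays;
import java.util.List;

public class QueryVariants {
    /** Builds the four quoting variants of "keyword field:value" tried above. */
    static List<String> variants(String keyword, String field, String value) {
        String clause = field + ":" + value;
        return Arrays.asList(
            keyword + " " + clause,                        // cancer scope:aScope
            keyword + " " + field + ":\"" + value + "\"",  // cancer scope:"aScope"
            keyword + " \"" + clause + "\"",               // cancer "scope:aScope"
            "\"" + keyword + " " + clause + "\""           // "cancer scope:aScope"
        );
    }

    public static void main(String[] args) {
        // Print each variant in the same "query: ..." form used in this email.
        for (String q : variants("cancer", "scope", "aScope")) {
            System.out.println("query: " + q);
        }
    }
}
```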

Q: Why don't you try searching for the urls directly that you think should
be returned using the url: syntax to make sure they got indexed and you
are pointing at the right index.
A: Thanks for the tip. I am 100% certain that an index metadata field named 
scope with the value aScope exists for ALL the documents in my index. 

Example:
Query: cancer
Results:
    * segment = 20080326104113
    * digest = 678e47f1a52ce036b89e2dc4c6f3571c
    * url = http://www.aWebsite.com/article/511833.aspx
    * title = Arimidex with Tamoxifen efficacy and safety trial for advanced breast cancer (1033IL/0027)
    * tstamp = 20080326094143023
    * contentLength = 45167
    * primaryType = text
    * subType = html
    * scope = aScope
    * boost = 0.028375218
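
Seeing scope = aScope in the hit details only shows that the value is stored.
Whether a scope: clause matches depends on the term that was indexed, and an
untokenized field is indexed as one exact, case-sensitive term. The sketch
below is a toy model of that behaviour in plain Java, not Nutch or Lucene code.
ExactTermIndex is a hypothetical name, and the idea that the query side might
lowercase the term before the lookup is an assumption I have not verified.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExactTermIndex {
    // Toy inverted index: exact term -> ids of the documents containing it.
    private final Map<String, List<Integer>> postings = new HashMap<>();

    /** Indexes the value as one unmodified term, like an untokenized field. */
    void add(int docId, String value) {
        postings.computeIfAbsent(value, k -> new ArrayList<>()).add(docId);
    }

    /** Returns the documents whose indexed term equals the query term exactly. */
    List<Integer> search(String term) {
        return postings.getOrDefault(term, new ArrayList<>());
    }

    public static void main(String[] args) {
        ExactTermIndex scope = new ExactTermIndex();
        scope.add(1, "aScope");
        scope.add(2, "aScope");

        // Same case as the indexed term: both documents match.
        System.out.println(scope.search("aScope").size());
        // ASSUMPTION: if the query term were lowercased first, nothing matches.
        System.out.println(scope.search("aScope".toLowerCase()).size());
    }
}
```

If this model applies, indexing a lowercase value (or querying with the exact
indexed case) would be a cheap thing to test.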

I am going around in circles (if that is how you say it in English)... I went back to my 
first plugin, which is a modification of the query-site plugin, without success.

If you, or anybody else, can think of something else, please let me know. 

David





-----------------------------------------
David Poirier
E-business Consultant - Software Engineer
 
Direct: +41 (0)22 596 10 35
 
Cross Systems - Groupe Micropole Univers
Route des Acacias 45 B
1227 Carouge / Genève
Tél: +41 (0)22 308 48 60
Fax: +41 (0)22 308 48 68
 





-----Original Message-----
From: Brian Ulicny [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, 26 March 2008 15:30
To: [email protected]; [email protected]
Subject: RE: nutch: creating new plugins: query plugin

What happens when you do the keyword query only?

Where are you executing the query from?  Using the NutchBean?  If so,
then the double-quotes would be necessary.

Why don't you try searching for the urls directly that you think should
be returned using the url: syntax to make sure they got indexed and you
are pointing at the right index.

Brian Ulicny




On Wed, 26 Mar 2008 11:50:43 +0100, "POIRIER David"
<[EMAIL PROTECTED]> said:
> Hello,
> 
> I really need your help here please. I tried a few more things; I
> deleted my two plugins and instead of creating new ones I modified the
> existing index-more and query-more plugins.
> 
> The index-more modification is working. Here's what I added:
> private Document addScope(Document doc, ParseData data, String url) {
>   // Store the fixed test value "aScope" as a single untokenized term.
>   doc.add(new Field("scope", "aScope",
>                     Field.Store.YES, Field.Index.UN_TOKENIZED));
>   return doc;
> }
> 
> I made sure that the method is called by adding this to the filter
> method:
> addScope(doc, parse.getData(), url_s);
> 
> Using the Nutch API, when I check for the details of a hit, I look for:
> String scope = detail.getValue("scope");
> 
> And as expected it always returns "aScope".
> 
> The problem is when I try to filter a query using my modified query-more
> plugin. When executing the query "aKeyword scope:aScope" (the double
> quotes are there only for readability in this email), the index always
> returns 0 results.
> 
> Here's the additional class to the org.apache.nutch.indexer.more
> package:
> import org.apache.nutch.searcher.RawFieldQueryFilter;
> import org.apache.hadoop.conf.Configuration;
> 
> /**
>  * Handles "scope:" query clauses, causing them to search the field
>  * indexed by MoreIndexingFilter.
>  *
>  * @author John Xing / David Poirier
>  */
> 
> public class ScopeQueryFilter extends RawFieldQueryFilter {
>   private Configuration conf;
> 
>   public ScopeQueryFilter() {
>     super("scope");
>   }
> 
>   public void setConf(Configuration conf) {
>     this.conf = conf;
>     setBoost(conf.getFloat("query.scope.boost", 0.0f));
>   }
> 
>   public Configuration getConf() {
>     return this.conf;
>   }
> }
> 
> And the plugin.xml file associated with it:
> <plugin
>    id="query-more"
>    name="More Query Filter"
>    version="1.0.0"
>    provider-name="nutch.org">
> 
>    <runtime>
>       <library name="query-more.jar">
>          <export name="*"/>
>       </library>
>    </runtime>
> 
>    <requires>
>       <import plugin="nutch-extensionpoints"/>
>    </requires>
> 
>    <extension id="org.apache.nutch.searcher.more"
>               name="Nutch More Query Filter"
>               point="org.apache.nutch.searcher.QueryFilter">
>       <implementation id="TypeQueryFilter"
>                       class="org.apache.nutch.searcher.more.TypeQueryFilter">
>         <parameter name="raw-fields" value="type"/>
>       </implementation>
>       
>    </extension>
> 
>    <extension id="org.apache.nutch.searcher.more"
>               name="Nutch More Query Filter"
>               point="org.apache.nutch.searcher.QueryFilter">
>       <implementation id="DateQueryFilter"
>                       class="org.apache.nutch.searcher.more.DateQueryFilter">
>         <parameter name="raw-fields" value="date"/>
>       </implementation>
>       
>    </extension>
>    
>    <extension id="org.apache.nutch.searcher.more"
>               name="Nutch More Query Filter"
>               point="org.apache.nutch.searcher.QueryFilter">
>       <implementation id="ScopeQueryFilter"
>                       class="org.apache.nutch.searcher.more.ScopeQueryFilter">
>         <parameter name="raw-fields" value="scope"/>
>       </implementation>
>       
>    </extension>
> 
> </plugin>
> 
> If this suggests anything to anybody, please let me know.
> 
> Thank you in advance,
> 
> David
> 
> 
> -----------------------------------------
> David Poirier
> E-business Consultant - Software Engineer
>  
> 
> 
> -----Original Message-----
> From: POIRIER David [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, 25 March 2008 18:09
> To: [email protected]
> Subject: nutch: creating new plugins: query plugin
> 
> Hello,
> 
> Following the info available on the wiki
> (http://wiki.apache.org/nutch/CreateNewFilter), I have created two new
> plugins:
> - index-scope (based on index-more)
> - query-scope (based on query-site)
> 
> As you can guess, the first plugin simply adds the "scope" metadata to
> every parsed document, giving them, as a test, a fixed value, while the
> second plugin adds the ability to search for a "scope" using the
> Lucene syntax.
> 
> I have deployed the two new plugins, as JARs, in my plugins repository and
> modified my nutch-site.xml file to look for them. To be sure of
> everything, I performed a complete crawl of a fresh source. I
> have also modified both plugin.xml files so that the system can find the
> right Java classes.
> 
> Looking at a resultset everything looks fine: every hit in the set
> possesses the metadata scope=aScope, which is exactly what I am looking
> for. Things stop working though when I try to search for the metadata
> using the Lucene syntax. The query "aWord scope:aScope" returns
> nothing...
> 
> When I check my log files I can see that the query-scope plugin is
> available:
> [...]
> 2008-03-25 16:02:55,015 [http-8080-Processor23] INFO
> org.apache.nutch.plugin.PluginRepository  -   Scope Query Filter
> (query-scope)
> [...]
> And that the proper extension point is registered:
> [...]
> 2008-03-25 16:02:55,015 [http-8080-Processor23] INFO
> org.apache.nutch.plugin.PluginRepository  -   Nutch Query Filter
> (org.apache.nutch.searcher.QueryFilter)
> [...]
> 
> 
> Here is the plugin.xml file associated with the plugin:
> 
> <plugin
>    id="query-scope"
>    name="a description"
>    version="1.0.0"
>    provider-name="myName.xyz">
> 
>    <runtime>
>       <library name="query-scope.jar">
>          <export name="*"/>
>       </library>
>    </runtime>
> 
>    <requires>
>       <import plugin="nutch-extensionpoints"/>
>    </requires>
> 
>    <extension id="org.apache.nutch.searcher.site.modified.SiteQueryFilterModified"
>               name="Scope Query Filter"
>               point="org.apache.nutch.searcher.QueryFilter">
>       <implementation id="SiteQueryFilterModified"
>                       class="org.apache.nutch.searcher.site.modified.SiteQueryFilterModified">
>         <parameter name="raw-fields" value="scope"/>
>       </implementation>    
> 
>    </extension>
> </plugin>
> 
> 
> 
> If somebody has any idea... please let me know! Thank you in advance!
> 
> David
> 
> 
-- 
  Brian Ulicny
  bulicny at alum dot mit dot edu
  home: 781-721-5746
  fax: 360-361-5746

