Date range queries are part of the query-more functionality, right?  Do
they work?

Brian

On Wed, 26 Mar 2008 15:57:44 +0100, "POIRIER David"
<[EMAIL PROTECTED]> said:
> Brian,
> 
> Thank you for your answer.
> 
> Q: What happens when you do the keyword query only?
> A: It works. Example: 
>       query: cancer
>       results: yes
> 
> Q: Where are you executing the query from?  Using the NutchBean?
> A: From the NutchBean. Here are a few tests I ran:
>       query: cancer scope:aScope
>       results: no
> 
>       query: cancer scope:"aScope"
>       results: no
> 
>       query: cancer "scope:aScope"
>       results: no
> 
>       query: "cancer scope:aScope"
>       results: no
> 
> Q: Why don't you try searching directly for the URLs that you think
> should be returned, using the url: syntax, to make sure they got indexed
> and that you are pointing at the right index?
> A: Thanks for the tip. I am 100% certain that an index metadata field
> named scope with the value aScope exists for ALL the references in my
> index.
> 
> Example:
> Query: cancer
> Results:
>     * segment = 20080326104113
>     * digest = 678e47f1a52ce036b89e2dc4c6f3571c
>     * url = http://www.aWebsite.com/article/511833.aspx
>     * title = Arimidex with Tamoxifen efficacy and safety trial for
>     advanced breast cancer (1033IL/0027)
>     * tstamp = 20080326094143023
>     * contentLength = 45167
>     * primaryType = text
>     * subType = html
>     * scope = aScope
>     * boost = 0.028375218
> 
> I am going around in circles... I went back to my first plugin, which is
> a modification of the query-site plugin, without success.
> 
> If you, or anybody else, can think of something, please let me know.
> 
> David
> 
> -----------------------------------------
> David Poirier
> E-business Consultant - Software Engineer
>  
> Direct: +41 (0)22 596 10 35
>  
> Cross Systems - Groupe Micropole Univers
> Route des Acacias 45 B
> 1227 Carouge / Genève
> Tél: +41 (0)22 308 48 60
> Fax: +41 (0)22 308 48 68
>  
> 
> -----Original Message-----
> From: Brian Ulicny [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, 26 March 2008 15:30
> To: [email protected]; [email protected]
> Subject: RE: nutch: creating new plugins: query plugin
> 
> What happens when you do the keyword query only?
> 
> Where are you executing the query from?  Using the NutchBean?  If so,
> then the double-quotes would be necessary.
> 
> Why don't you try searching directly for the URLs that you think should
> be returned, using the url: syntax, to make sure they got indexed and
> that you are pointing at the right index?
> 
> Brian Ulicny
> 
> 
> 
> 
> On Wed, 26 Mar 2008 11:50:43 +0100, "POIRIER David"
> <[EMAIL PROTECTED]> said:
> > Hello,
> > 
> > I really need your help here please. I tried a few more things; I
> > deleted my two plugins and instead of creating new ones I modified the
> > existing index-more and query-more plugins.
> > 
> > The index-more modification is working. Here's what I added:
> > private Document addScope(Document doc, ParseData data, String url) {
> >     // Store the fixed test value and index it untokenized, i.e. as a
> >     // single exact term.
> >     doc.add(new Field("scope", "aScope",
> >                       Field.Store.YES, Field.Index.UN_TOKENIZED));
> >     return doc;
> > }
> > 
> > I made sure that the method is called by adding this to the filter
> > method:
> > addScope(doc, parse.getData(), url_s);
> > 
> > Using the Nutch API, when I check for the details of a hit, I look for:
> > String scope = detail.getValue("scope");
> > 
> > As expected, it always returns "aScope".
> > 
> > The problem is when I try to filter a query using my modified query-more
> > plugin. When executing the query "aKeyword scope:aScope" (the double
> > quotes are there only for readability in this email), the index always
> > returns 0 results.
> > 
> > Here's the additional class to the org.apache.nutch.indexer.more
> > package:
> > import org.apache.nutch.searcher.RawFieldQueryFilter;
> > import org.apache.hadoop.conf.Configuration;
> > 
> > /**
> >  * Handles "scope:" query clauses, causing them to search the field
> >  * indexed by MoreIndexingFilter.
> >  *
> >  * @author John Xing / David Poirier
> >  */
> > 
> > public class ScopeQueryFilter extends RawFieldQueryFilter {
> >   private Configuration conf;
> > 
> >   public ScopeQueryFilter() {
> >     super("scope");
> >   }
> > 
> >   public void setConf(Configuration conf) {
> >     this.conf = conf;
> >     setBoost(conf.getFloat("query.scope.boost", 0.0f));
> >   }
> > 
> >   public Configuration getConf() {
> >     return this.conf;
> >   }
> > }
> > 
> > And the plugin.xml file associated with it:
> > <plugin
> >    id="query-more"
> >    name="More Query Filter"
> >    version="1.0.0"
> >    provider-name="nutch.org">
> > 
> >    <runtime>
> >       <library name="query-more.jar">
> >          <export name="*"/>
> >       </library>
> >    </runtime>
> > 
> >    <requires>
> >       <import plugin="nutch-extensionpoints"/>
> >    </requires>
> > 
> >    <extension id="org.apache.nutch.searcher.more"
> >               name="Nutch More Query Filter"
> >               point="org.apache.nutch.searcher.QueryFilter">
> >       <implementation id="TypeQueryFilter"
> >                       class="org.apache.nutch.searcher.more.TypeQueryFilter">
> >         <parameter name="raw-fields" value="type"/>
> >       </implementation>
> >       
> >    </extension>
> > 
> >    <extension id="org.apache.nutch.searcher.more"
> >               name="Nutch More Query Filter"
> >               point="org.apache.nutch.searcher.QueryFilter">
> >       <implementation id="DateQueryFilter"
> >                       class="org.apache.nutch.searcher.more.DateQueryFilter">
> >         <parameter name="raw-fields" value="date"/>
> >       </implementation>
> >       
> >    </extension>
> >    
> >    <extension id="org.apache.nutch.searcher.more"
> >               name="Nutch More Query Filter"
> >               point="org.apache.nutch.searcher.QueryFilter">
> >       <implementation id="ScopeQueryFilter"
> >                       class="org.apache.nutch.searcher.more.ScopeQueryFilter">
> >         <parameter name="raw-fields" value="scope"/>
> >       </implementation>
> >       
> >    </extension>
> > 
> > </plugin>
> > 
> > If this rings a bell for anybody, please let me know.
> > 
> > Thank you in advance,
> > 
> > David
> > 
> > 
> > -----------------------------------------
> > David Poirier
> > E-business Consultant - Software Engineer
> >  
> > 
> > 
> > -----Original Message-----
> > From: POIRIER David [mailto:[EMAIL PROTECTED] 
> > Sent: Tuesday, 25 March 2008 18:09
> > To: [email protected]
> > Subject: nutch: creating new plugins: query plugin
> > 
> > Hello,
> > 
> > Following the info available on the wiki
> > (http://wiki.apache.org/nutch/CreateNewFilter), I have created two new
> > plugins:
> > - index-scope (based on index-more)
> > - query-scope (based on query-site)
> > 
> > As you can guess, the first plugin simply adds the "scope" metadata to
> > every parsed document, giving it, as a test, a fixed value, while the
> > second plugin adds the ability to search on "scope" using the
> > Lucene syntax.
> > 
> > I have deployed the two new plugins, as JARs, in my plugins repository
> > and modified my nutch-site.xml file to look for them. To be sure, I
> > performed a complete crawl of a fresh source. I have also modified both
> > plugin.xml files so that the system can find the right Java classes.
> > 
> > Looking at a resultset everything looks fine: every hit in the set
> > possesses the metadata scope=aScope, which is exactly what I am looking
> > for. Things stop working though when I try to search for the metadata
> > using the Lucene syntax. The query "aWord scope:aScope" returns
> > nothing...
> > 
> > When I check my log files I can see that the query-scope plugin is
> > available:
> > [...]
> > 2008-03-25 16:02:55,015 [http-8080-Processor23] INFO
> > org.apache.nutch.plugin.PluginRepository  -         Scope Query Filter
> > (query-scope)
> > [...]
> > And that the proper extension point is registered:
> > [...]
> > 2008-03-25 16:02:55,015 [http-8080-Processor23] INFO
> > org.apache.nutch.plugin.PluginRepository  -         Nutch Query Filter
> > (org.apache.nutch.searcher.QueryFilter)
> > [...]
> > 
> > 
> > Here is the plugin.xml file associated with the plugin:
> > 
> > <plugin
> >    id="query-scope"
> >    name="a description"
> >    version="1.0.0"
> >    provider-name="myName.xyz">
> > 
> >    <runtime>
> >       <library name="query-scope.jar">
> >          <export name="*"/>
> >       </library>
> >    </runtime>
> > 
> >    <requires>
> >       <import plugin="nutch-extensionpoints"/>
> >    </requires>
> > 
> >    <extension id="org.apache.nutch.searcher.site.modified.SiteQueryFilterModified"
> >               name="Scope Query Filter"
> >               point="org.apache.nutch.searcher.QueryFilter">
> >       <implementation id="SiteQueryFilterModified"
> >                       class="org.apache.nutch.searcher.site.modified.SiteQueryFilterModified">
> >         <parameter name="raw-fields" value="scope"/>
> >       </implementation>    
> > 
> >    </extension>
> > </plugin>
> > 
> > 
> > 
> > If somebody has any idea... please let me know! Thank you in advance!
> > 
> > David
> > 
> > 
-- 
  Brian Ulicny
  bulicny at alum dot mit dot edu
  home: 781-721-5746
  fax: 360-361-5746

