Fred,

I must say I am happy to see that I am not the only one!

You are right: using Luke with
org.apache.lucene.analysis.KeywordAnalyzer, I can search on the field I
added (scope). An example: +content:"cancer" +scope:"aScope". As I
understand it, with this analyzer you can filter your query on any of
the stored fields.
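
A rough illustration of why the analyzer choice matters here. KeywordAnalyzer emits the whole field value as a single, unmodified token, so a term such as scope:"aScope" only matches when the indexed token is exactly "aScope". The sketch below is plain Java, not Lucene code: keywordTokens mimics that single-token behaviour, while simpleTokens is a hypothetical stand-in for an analyzer that lowercases and splits, used only for contrast. All names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class KeywordVsTokenized {

    // Keyword-style analysis: the whole value becomes one token, unchanged.
    static List<String> keywordTokens(String value) {
        List<String> tokens = new ArrayList<>();
        tokens.add(value);
        return tokens;
    }

    // Hypothetical tokenizing analysis (an assumption, not a real Nutch or
    // Lucene class): lowercase the value, then split on non-alphanumerics.
    static List<String> simpleTokens(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.toLowerCase().split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Compare what each style would produce for the same field values.
        for (String value : new String[] {"aScope", "Austin, Texas"}) {
            System.out.println("keyword:   " + keywordTokens(value));
            System.out.println("tokenized: " + simpleTokens(value));
        }
    }
}
```

If the index side and the query side analyze a field differently, the tokens may not line up, which could be one reason a query works in Luke but not through Nutch. That is a guess, not a diagnosis.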

When executing a query through Nutch, the analyzer used is
org.apache.nutch.analysis.NutchAnalyzer. I guess it might execute
similar tasks... 

The class called by my query plugin is
org.apache.nutch.searcher.RawFieldQueryFilter. I'll check into that
also.

I'll plunge into the details and let you know if I find something. 


David

-----------------------------------------
David Poirier
E-business Consultant - Software Engineer






-----Original Message-----
From: Fred Gilmore [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, 26 March 2008 18:34
To: [email protected]
Subject: Re: nutch: creating new plugins: query plugin

I'm watching this thread with interest as I'm stuck in the same place.
From reading three years of list archives, people seem to get over the
hump of indexing custom fields and then get mired in the query side.  My
index is fine.  Luke shows me the fields and the values.  I can change my
index plugin code to not split on commas and it obeys.  I can search it
with Luke and it pulls data.  I realize that only means so much, since
Luke is parsing those queries with a Lucene class.

But I can't get past the query plugin, no matter how closely I follow
the example on the wiki.  Whether I model it on query-url or query-more
doesn't seem to matter.  In fact, right now, if I load the query plugin
listed below (in addition to query-basic), it breaks all searching:
keyword, fielded, whatever.


<plugin
   id="query-placename"
   name="Placename Query Filter"
   version="1.0.0"
   provider-name="utexas.edu">

   <runtime>
      <library name="query-placename.jar">
         <export name="*"/>
      </library>
   </runtime>

   <requires>
      <import plugin="nutch-extensionpoints"/>
   </requires>

   <extension id="org.apache.nutch.searcher.placename.PlacenameQueryFilter"
              name="Placename Query Filter"
              point="org.apache.nutch.searcher.QueryFilter">
      <implementation id="PlacenameQueryFilter"
                      class="org.apache.nutch.searcher.placename.PlacenameQueryFilter">
        <parameter name="fields" value="placename"/>
      </implementation>
   </extension>
</plugin>

===============
[search1]:nutch> pg PlacenameQueryFilter.java
package org.apache.nutch.searcher.placename;

import org.apache.nutch.searcher.FieldQueryFilter;
import org.apache.hadoop.conf.Configuration;


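// Query filter for the "placename" field, built on Nutch's FieldQueryFilter.
public class PlacenameQueryFilter extends FieldQueryFilter {

  // Handle clauses on the "placename" field with a boost of 5.
  public PlacenameQueryFilter() {
    super("placename", 5f);
  }

  // Pass the configuration through to FieldQueryFilter unchanged.
  public void setConf(Configuration conf) {
    super.setConf(conf);
  }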

}

The wiki plugin example omits the setConf shown above; the query-url
code sets it, as does query-more.  Some use RawFieldQueryFilter, some use
QueryFilter; it doesn't seem that should matter.

The plugin gets shuttled over to the Tomcat side, nutch-site.xml gets
updated with a new plugin.includes stanza, and the webapp is
redeployed.  I've tried loading nutch-extensionpoints first here as
well; it doesn't seem to matter.  Maybe the boost is messing things up.
It's set high, but that's because previous threads indicated it was the
only way to get field-only searches like placename:london working.

  <property>
    <name>searcher.dir</name>
    <value>/usr/local/db/nutch/search1/crawls/missions-test</value>
  </property>

<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html|meta)|index-(basic|more|meta)|query-(basic|more|placename|creator|url)|summary-lucene|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
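
As far as I know, Nutch treats the plugin.includes value as a regular expression and only loads a plugin whose whole id matches it. A minimal sketch, assuming that behaviour, for checking the value above with plain java.util.regex. The class and method names are hypothetical, and this does not call any Nutch code.

```java
import java.util.regex.Pattern;

public class PluginIncludesCheck {

    // Value of plugin.includes from the property above, on a single line.
    static final String INCLUDES =
        "protocol-http|urlfilter-regex|parse-(text|html|meta)|index-(basic|more|meta)"
        + "|query-(basic|more|placename|creator|url)|summary-lucene|scoring-opic"
        + "|urlnormalizer-(pass|regex|basic)";

    // True when the entire plugin id matches the include expression.
    static boolean isIncluded(String includes, String pluginId) {
        return Pattern.compile(includes).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // Ids taken from the expression, plus query-site, which is not listed.
        String[] ids = {"query-basic", "query-placename", "query-creator", "query-site"};
        for (String id : ids) {
            System.out.println(id + " -> " + isIncluded(INCLUDES, id));
        }

        // Hypothetical variant: a stray line break inside a group becomes
        // part of that alternative, so the plain id no longer matches.
        String wrapped = "index-(basic\n|more|meta)";
        System.out.println("index-basic (wrapped value) -> " + isIncluded(wrapped, "index-basic"));
    }
}
```

A check like this only confirms that the ids are selected by the expression. It says nothing about whether the plugin then loads or registers its extension correctly.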

No other nutch-default.xml or nutch-site.xml settings are altered.  But
there must be something obvious I'm leaving unset, or something
conflicting on the Tomcat side, that is breaking this.

When I remove the query-placename and query-creator plugins, keyword
searching resumes.  url: works, so the syntax is accepted.

After several weeks of trying different things, I'm all out.  But there
must be something I'm missing.  Any ideas at all?

thanks,

Fred Gilmore
University of Texas Austin Libraries