What happens when you do the keyword query only?
Where are you executing the query from? If you are running NutchBean
from the command line, the double quotes are necessary so that the
whole query is passed as a single argument.
To make sure the pages got indexed and that you are pointing at the
right index, try searching directly for the URLs you think should be
returned, using the url: syntax.
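
Those checks can be run in a fixed order to isolate the failing clause.
A minimal sketch, assuming nothing about Nutch itself: DiagnosticQueries
is a hypothetical helper (not part of Nutch) that only builds the query
strings to try, and the example.com value is a placeholder for a page
you expect to be in the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Nutch: it only assembles the query
// strings to try by hand, from least to most restrictive.
class DiagnosticQueries {

    static List<String> build(String keyword, String field, String value,
                              List<String> expectedUrls) {
        List<String> queries = new ArrayList<String>();
        // 1. Keyword only: confirms the index answers queries at all.
        queries.add(keyword);
        // 2. One url: query per page that should have been indexed.
        for (String url : expectedUrls) {
            queries.add("url:" + url);
        }
        // 3. The combined query that currently returns nothing.
        queries.add(keyword + " " + field + ":" + value);
        return queries;
    }

    public static void main(String[] args) {
        // Placeholder URL; substitute a page from your own crawl.
        List<String> urls = Arrays.asList("example.com");
        for (String q : build("aKeyword", "scope", "aScope", urls)) {
            System.out.println(q);
        }
    }
}
```

If step 1 or step 2 already returns nothing, the problem is with the
index being searched rather than with the scope filter.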
Brian Ulicny
On Wed, 26 Mar 2008 11:50:43 +0100, "POIRIER David"
<[EMAIL PROTECTED]> said:
> Hello,
>
> I really need your help here please. I tried a few more things; I
> deleted my two plugins and instead of creating new ones I modified the
> existing index-more and query-more plugins.
>
> The index-more modification is working. Here's what I added:
> private Document addScope(Document doc, ParseData data, String url) {
>   doc.add(new Field("scope", "aScope", Field.Store.YES,
>       Field.Index.UN_TOKENIZED));
>   return doc;
> }
>
> I made sure that the method is called by adding this to the filter
> method:
> addScope(doc, parse.getData(), url_s);
>
> Using the Nutch API, when I check for the details of a hit, I look for:
> String scope = detail.getValue("scope");
>
> And, as expected, it always returns "aScope".
>
> The problem appears when I try to filter a query using my modified
> query-more plugin. When I execute the query "aKeyword scope:aScope"
> (the double quotes are there only for readability in this email), the
> index always returns 0 results.
>
> Here's the additional class to the org.apache.nutch.indexer.more
> package:
> import org.apache.nutch.searcher.RawFieldQueryFilter;
> import org.apache.hadoop.conf.Configuration;
>
> /**
>  * Handles "scope:" query clauses, causing them to search the field
>  * indexed by MoreIndexingFilter.
>  *
>  * @author John Xing / David Poirier
>  */
> public class ScopeQueryFilter extends RawFieldQueryFilter {
>   private Configuration conf;
>
>   public ScopeQueryFilter() {
>     super("scope");
>   }
>
>   public void setConf(Configuration conf) {
>     this.conf = conf;
>     setBoost(conf.getFloat("query.scope.boost", 0.0f));
>   }
>
>   public Configuration getConf() {
>     return this.conf;
>   }
> }
>
> And the plugin.xml file associated with it:
> <plugin
>    id="query-more"
>    name="More Query Filter"
>    version="1.0.0"
>    provider-name="nutch.org">
>
>    <runtime>
>       <library name="query-more.jar">
>          <export name="*"/>
>       </library>
>    </runtime>
>
>    <requires>
>       <import plugin="nutch-extensionpoints"/>
>    </requires>
>
>    <extension id="org.apache.nutch.searcher.more"
>               name="Nutch More Query Filter"
>               point="org.apache.nutch.searcher.QueryFilter">
>       <implementation id="TypeQueryFilter"
>          class="org.apache.nutch.searcher.more.TypeQueryFilter">
>          <parameter name="raw-fields" value="type"/>
>       </implementation>
>    </extension>
>
>    <extension id="org.apache.nutch.searcher.more"
>               name="Nutch More Query Filter"
>               point="org.apache.nutch.searcher.QueryFilter">
>       <implementation id="DateQueryFilter"
>          class="org.apache.nutch.searcher.more.DateQueryFilter">
>          <parameter name="raw-fields" value="date"/>
>       </implementation>
>    </extension>
>
>    <extension id="org.apache.nutch.searcher.more"
>               name="Nutch More Query Filter"
>               point="org.apache.nutch.searcher.QueryFilter">
>       <implementation id="ScopeQueryFilter"
>          class="org.apache.nutch.searcher.more.ScopeQueryFilter">
>          <parameter name="raw-fields" value="scope"/>
>       </implementation>
>    </extension>
>
> </plugin>
>
> If this means anything to anybody, please let me know.
>
> Thank you in advance,
>
> David
>
>
> -----------------------------------------
> David Poirier
> E-business Consultant - Software Engineer
>
>
>
> -----Original Message-----
> From: POIRIER David [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, 25 March 2008 18:09
> To: [email protected]
> Subject: nutch: creating new plugins: query plugin
>
> Hello,
>
> Following the info available on the wiki
> (http://wiki.apache.org/nutch/CreateNewFilter), I have created two new
> plugins:
> - index-scope (based on index-more)
> - query-scope (based on query-site)
>
> As you can guess, the first plugin simply adds the "scope" metadata to
> every parsed document, giving it a fixed value as a test, while the
> second plugin adds the ability to search for a "scope" using the
> Lucene syntax.
>
> I have deployed the two new plugins, as JARs, in my plugins repository
> and modified my nutch-site.xml file to look for them. To be sure of
> everything, I performed a complete crawl of a "virgin" source. I have
> also modified both plugin.xml files so that the system can find the
> right Java classes.
>
> Looking at a resultset everything looks fine: every hit in the set
> possesses the metadata scope=aScope, which is exactly what I am looking
> for. Things stop working though when I try to search for the metadata
> using the Lucene syntax. The query "aWord scope:aScope" returns
> nothing...
>
> When I check my log files, I can see that the query-scope plugin is
> available:
> [...]
> 2008-03-25 16:02:55,015 [http-8080-Processor23] INFO
> org.apache.nutch.plugin.PluginRepository - Scope Query Filter
> (query-scope)
> [...]
> And that the proper extension point is registered:
> [...]
> 2008-03-25 16:02:55,015 [http-8080-Processor23] INFO
> org.apache.nutch.plugin.PluginRepository - Nutch Query Filter
> (org.apache.nutch.searcher.QueryFilter)
> [...]
>
>
> Here is the plugin.xml file associated with the plugin:
>
> <plugin
>    id="query-scope"
>    name="a description"
>    version="1.0.0"
>    provider-name="myName.xyz">
>
>    <runtime>
>       <library name="query-scope.jar">
>          <export name="*"/>
>       </library>
>    </runtime>
>
>    <requires>
>       <import plugin="nutch-extensionpoints"/>
>    </requires>
>
>    <extension
>       id="org.apache.nutch.searcher.site.modified.SiteQueryFilterModified"
>       name="Scope Query Filter"
>       point="org.apache.nutch.searcher.QueryFilter">
>       <implementation id="SiteQueryFilterModified"
>          class="org.apache.nutch.searcher.site.modified.SiteQueryFilterModified">
>          <parameter name="raw-fields" value="scope"/>
>       </implementation>
>    </extension>
> </plugin>
>
>
>
> If somebody has any idea... please let me know! Thank you in advance!
>
> David
>
>