I found out that the problem occurred because 'url' is stored in the index.

I decided instead to create a plugin whose purpose is to parse the url
and index the parts I need (the fields location and language). I
created the plugin by following this tutorial
(http://wiki.apache.org/nutch/WritingPluginExample-0%2e9). I have one
IndexingFilter that indexes the two fields mentioned above. This works
fine, and I can view the fields in Luke (nice tool, by the way). However,
my two QueryFilters in the same plugin do not seem to load correctly.
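
For readers following along, the url parsing step could be sketched roughly
as below. This is only an illustration: the class name UrlFieldSketch, the
example.com address, and the assumed url layout /<language>/<location>/...
are all hypothetical, since the real url structure and the IndexingFilter
code are not shown in this message.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class UrlFieldSketch {

    /**
     * Maps the first two path segments of a url to field names.
     * ASSUMPTION: urls look like http://host/<language>/<location>/...
     */
    public static Map<String, String> parse(String url) {
        Map<String, String> fields = new LinkedHashMap<>();
        String path = URI.create(url).getPath();
        if (path == null) {
            return fields; // opaque URI without a path
        }
        // The path starts with "/", so segments[0] is the empty string.
        String[] segments = path.split("/");
        if (segments.length > 1 && !segments[1].isEmpty()) {
            fields.put("language", segments[1]);
        }
        if (segments.length > 2 && !segments[2].isEmpty()) {
            fields.put("location", segments[2]);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("http://example.com/no/oslo/page.html"));
    }
}
```

In a real plugin the two values would be added to the Lucene document
inside the IndexingFilter rather than returned in a map.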

If I run a query I receive a NullPointerException in QueryFilters (line
109, nutch-0.9). After some debugging I can see that both of my filters
end up as null, while URLQueryFilter, SiteQueryFilter, and
BasicQueryFilter show up as expected. The length of the queryFilters
array is 5, so something works and something does not, it seems :-)
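
An array of the right length with null entries is what you would see if a
loader sizes the array up front and then skips some entries. The sketch
below only illustrates that general pattern; it is not the Nutch source.
The class NullSlotSketch, the Filter interface, and the idea that an entry
is skipped when its field declaration cannot be read are all assumptions
that would need to be checked against the QueryFilters constructor.

```java
import java.util.ArrayList;
import java.util.List;

public class NullSlotSketch {

    /** Hypothetical stand-in for a query filter; not the Nutch interface. */
    public interface Filter {
        String apply(String query);
    }

    /** Sizes the array first, then skips entries without fields: skipped slots stay null. */
    public static Filter[] loadIntoArray(String[] declaredFields) {
        Filter[] filters = new Filter[declaredFields.length];
        for (int i = 0; i < declaredFields.length; i++) {
            final String field = declaredFields[i];
            if (field == null) {
                continue; // slot i is never assigned
            }
            filters[i] = q -> q + " [" + field + "]";
        }
        return filters;
    }

    /** Same loop, but collecting into a list, so no null entries can appear. */
    public static List<Filter> loadIntoList(String[] declaredFields) {
        List<Filter> filters = new ArrayList<>();
        for (final String field : declaredFields) {
            if (field != null) {
                filters.add(q -> q + " [" + field + "]");
            }
        }
        return filters;
    }

    /** Counts the slots that were left unassigned. */
    public static int countNulls(Filter[] filters) {
        int nulls = 0;
        for (Filter f : filters) {
            if (f == null) {
                nulls++;
            }
        }
        return nulls;
    }

    public static void main(String[] args) {
        String[] declared = {"url", "site", "DEFAULT", null, null};
        Filter[] filters = loadIntoArray(declared);
        System.out.println("length=" + filters.length + ", nulls=" + countNulls(filters));
        // Calling filters[3].apply("x") here would throw a NullPointerException.
    }
}
```

If the real loader behaves like loadIntoArray, the thing to look for is why
the two new filters are skipped while the three stock filters are not.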


My filters are configured like this (in plugin.xml):

   <extension id="no.avinor.aviluft.nutch.AvinorInternettLocationQueryFilter"
              name="Avinor Internett Location Search Query Filter"
              point="org.apache.nutch.searcher.QueryFilter">
      <implementation id="AvinorInternettLocationQueryFilter"
                      class="no.avinor.aviluft.nutch.AvinorInternettLocationQueryFilter"
                      fields="DEFAULT"/>
      <!-- URLQueryFilter uses url, not DEFAULT. Why is that? -->
   </extension>

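The source looks like this:

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.nutch.searcher.FieldQueryFilter;

public class AvinorInternettLocationQueryFilter extends FieldQueryFilter {

    private static final Log LOG =
        LogFactory.getLog(AvinorInternettLocationQueryFilter.class.getName());

    public AvinorInternettLocationQueryFilter() {
        // Query the "location" field with a boost of 5.
        super("location", 5f);
        LOG.info("Added a location query");
    }
}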

I also updated nutch-default.xml to include my plugin (a step that is
left out of the 0.9 tutorial but included in the 0.7 one).

I am considering splitting the plugin into two or four separate
plugins, but I doubt that will help. I could also copy the query-url
plugin and adapt it to see whether that works.

Does anyone see anything wrong, or have any tips or solutions for this
problem?

Regards,
Ronny

-----Original message-----
From: Naess, Ronny [mailto:[EMAIL PROTECTED]
Sent: 23 May 2007 20:28
To: [EMAIL PROTECTED]
Subject: Filtering hits

 
Is it possible to filter out url: hits?

With this I mean the following.

Query: "sometext url:myurl" (the quotes are not part of the search; they
only delimit the query). This query gives me hits where "sometext" is
found for the given url. The problem is that every page matching myurl is
also returned as a hit. So if I have only one page containing "sometext"
but 40 pages in total matching myurl, I receive 40 hits, when I want only
the one containing sometext. Put another way: if I run the query
"url:myurl" on its own, I do not want any hits at all.

Is it possible to make it behave the way I want?

I guess that 'url' is not only indexed but also stored in the document,
and that this might be why pages matching only the url are returned as
hits?

Regards,
Ronny

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general