[Nutch-dev] Re: Which extension point should I extend?

Stefan Groschupf Thu, 16 Feb 2006 15:22:59 -0800

My approach is to index such pages anyway but 'tag' them. I use an index filter for that: it tags the unwanted pages with one tag and the positive pages with another.

For example, category:trash or category:nugget.
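To make the tagging idea concrete, here is a minimal, self-contained sketch of the decision an index filter would make. It deliberately does not use the real Nutch IndexingFilter interface, whose exact method signature differs between Nutch versions. UrlCategoryTagger, categoryFor and the example URL pattern are all hypothetical. In an actual plugin, the returned value would be added to the document as a "category" field inside the filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: picks the category tag for a page based on its URL.
 * In a real Nutch plugin this logic would sit inside an IndexingFilter,
 * which would store the result in a "category" field of the indexed document.
 */
public class UrlCategoryTagger {
    private final List<Pattern> nuggetPatterns = new ArrayList<>();

    public UrlCategoryTagger(List<String> nuggetRegexes) {
        // Compile the "positive" URL patterns once, up front.
        for (String regex : nuggetRegexes) {
            nuggetPatterns.add(Pattern.compile(regex));
        }
    }

    /** Returns "nugget" if the URL matches any configured pattern, otherwise "trash". */
    public String categoryFor(String url) {
        for (Pattern p : nuggetPatterns) {
            if (p.matcher(url).find()) {
                return "nugget";
            }
        }
        return "trash";
    }

    public static void main(String[] args) {
        // Assumed example filter: only article pages count as nuggets.
        UrlCategoryTagger tagger =
            new UrlCategoryTagger(List.of("^https?://www\\.example\\.com/articles/"));
        System.out.println(tagger.categoryFor("http://www.example.com/articles/42.html")); // nugget
        System.out.println(tagger.categoryFor("http://www.example.com/login"));            // trash
    }
}
```

Every fetched page still gets indexed; the tag only records whether its URL passed the filter, so the decision can be changed later at query time.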
Then I also use a query filter plugin, and in the UI I extend my query:
queryString + " category:nugget"
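The UI side of this amounts to appending a field clause to whatever the user typed. The sketch below is a hypothetical helper (NuggetQuery and restrictToNuggets are illustrative names, not Nutch API). It assumes that a query filter plugin already makes the category: field searchable, which is not shown here.

```java
/**
 * Hypothetical UI-side helper mirroring queryString + " category:nugget".
 * It assumes a query filter plugin maps "category:" to the indexed field;
 * without such a plugin the clause would not restrict anything.
 */
public class NuggetQuery {
    static final String CATEGORY_CLAUSE = "category:nugget";

    public static String restrictToNuggets(String queryString) {
        String trimmed = queryString == null ? "" : queryString.trim();
        if (trimmed.isEmpty()) {
            // No user terms: return only the category clause.
            return CATEGORY_CLAUSE;
        }
        // Separate the user's terms from the clause with a single space.
        return trimmed + " " + CATEGORY_CLAUSE;
    }

    public static void main(String[] args) {
        System.out.println(restrictToNuggets("nutch plugins")); // nutch plugins category:nugget
    }
}
```

Trimming first avoids doubled spaces when the user's input ends with whitespace.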

That way you will have only non-trash pages in your results. I guess you can also use the prune tool to remove such trash pages from the index if you like.

HTH
Stefan


On 14.02.2006 at 08:11, Elwin wrote:

When using Nutch to crawl some sites, I want to index fetched content selectively, only when the URLs of that content match my filter. Other URLs I just want Nutch to crawl and parse without indexing.
How can I achieve this? Which extension point should I extend?




