The way I handle this is to index such pages anyway but 'tag' them. I
use an indexing filter for that, which tags the positive pages with one
tag and all other pages with another, for example category:nugget vs.
category:trash.
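The tagging decision itself can be sketched as below. This is only the classification logic such an indexing filter might contain; it does not implement Nutch's IndexingFilter interface, and the class name CategoryTagger and the example URL pattern are hypothetical assumptions. In the real plugin, the returned value would be added as a "category" field on the indexed document.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper: decides which category tag a fetched page gets.
 * Assumption: a page is a "nugget" if its URL matches a user-supplied regex;
 * everything else is tagged "trash". Wiring this into a Nutch indexing
 * filter plugin is not shown here.
 */
public class CategoryTagger {

    // Assumed URL filter for "positive" pages.
    private final Pattern positive;

    public CategoryTagger(String positiveUrlRegex) {
        this.positive = Pattern.compile(positiveUrlRegex);
    }

    /** Returns "nugget" for URLs matching the filter, "trash" otherwise. */
    public String categoryFor(String url) {
        return positive.matcher(url).find() ? "nugget" : "trash";
    }

    public static void main(String[] args) {
        CategoryTagger tagger = new CategoryTagger("^https?://www\\.example\\.com/articles/");
        System.out.println(tagger.categoryFor("http://www.example.com/articles/42.html")); // nugget
        System.out.println(tagger.categoryFor("http://www.example.com/login"));            // trash
    }
}
```

Every page is still crawled, parsed and indexed; only the tag differs, which is what lets the query side filter later.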
Then I also use a query filter plugin, and in the UI I extend the user's query:
queryString + " category:nugget"
That way only non-trash pages show up in your results. I guess you
could also use the prune tool to remove the trash pages from the index,
if you like.
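The query side can be sketched like this, assuming (as in the message) that a query filter plugin maps the category: clause onto the indexed field. The class NuggetQuery and its methods are hypothetical; filterNuggets is only a toy in-memory stand-in for the index to show the effect of the extra clause, not how Nutch evaluates queries.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch: (1) extend the user's query in the UI, and
 * (2) simulate, over a toy map of URL -> category, which pages the
 *     extended query would leave. Real filtering happens in the index.
 */
public class NuggetQuery {

    static final String NUGGET_CLAUSE = "category:nugget";

    /** Appends the category clause to whatever the user typed. */
    public static String extendQuery(String queryString) {
        return queryString + " " + NUGGET_CLAUSE;
    }

    /** Toy stand-in for the index: keep only URLs tagged "nugget". */
    public static List<String> filterNuggets(Map<String, String> urlToCategory) {
        return urlToCategory.entrySet().stream()
                .filter(e -> "nugget".equals(e.getValue()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(extendQuery("nutch plugins")); // nutch plugins category:nugget

        Map<String, String> index = new LinkedHashMap<>();
        index.put("http://www.example.com/articles/42.html", "nugget");
        index.put("http://www.example.com/login", "trash");
        System.out.println(filterNuggets(index)); // [http://www.example.com/articles/42.html]
    }
}
```

Extending the query in the UI keeps the trash pages available in the index (for example for link analysis) while hiding them from users.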
HTH
Stefan
On 14.02.2006 at 08:11, Elwin wrote:
When using Nutch to crawl some sites, I want to index fetched content
selectively: only when its URL matches my filter. Other URLs I just
want Nutch to crawl and parse without indexing.
How can I achieve this? Which extension point should I extend?
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers