prashant_nutch wrote:
> Is subcollection useful for searching specific URLs?
> How do we activate subcollection at indexing and searching time?
>
> In conf/subcollection,
> if we include our URLs in the whitelist, can we then search only those URLs?
> Is this the command for searching a subcollection?
>
> Subcollection:<name of subcollection> <word for specific URL>
>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <subcollections>
> <subcollection>
> <name>nutch</name>
> <id>nutch</id>
> <whitelist>
> http://lucene.apache.org/nutch/
> http://wiki.apache.org/nutch/
> </whitelist>
> <blacklist />
> </subcollection>
> </subcollections>
>
> Can anybody explain how the overall thing should work?
> Is it useful for searching specific URLs? (We are using Nutch 0.8.1.)
>
>
Subcollection is a useful way to group a set of URLs under a label, and
you can use that label to limit a search to those URLs.
First, enable the subcollection plugin in nutch-site.xml by adding it to
the plugin.includes property.
Then define your collections in conf/subcollections.xml.
After indexing, each document whose URL matches a collection's whitelist
(and not its blacklist) should have a subcollection field in the index.
The plugin also includes a query filter, so you can run searches like
    java subcollection:nutch
which searches for "java" but limits the results to the nutch collection.
For details, see the README file in the plugin's directory.
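
The "enable subcollection in nutch-site.xml" step could look roughly like
the sketch below. This is an assumption-laden example, not a drop-in file:
the other plugin names in the value are only what a typical 0.8.x default
list looks like, so keep whatever your install already lists in
plugin.includes and append "subcollection" to it.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumption: everything before "|subcollection" stands in for the
       plugin list your install already uses; only the last entry is
       the actual change. -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|subcollection</value>
  </property>
</configuration>
```

Pages need to be re-indexed after this change, since the subcollection
field is only written at indexing time.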
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general