I'm using Nutch 1.0.
My subcollections.xml config file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<subcollections>
  <subcollection>
    <name>sub1</name>
    <id>sub1</id>
    <whitelist>
      http://www.apache.org/
    </whitelist>
    <blacklist />
  </subcollection>
  <subcollection>
    <name>sub2</name>
    <id>sub2</id>
    <whitelist>
      http://www.mysql.com/
    </whitelist>
    <blacklist />
  </subcollection>
  <subcollection>
    <name>sub3</name>
    <id>sub3</id>
    <whitelist>
      http://www.redhat.com/
    </whitelist>
    <blacklist />
  </subcollection>
</subcollections>
After indexing, and after making sure that the subcollection plugin was
activated in nutch-site.xml, I checked the index with Luke.
The subcollection field was populated as expected, with the values
sub1, sub2 and sub3.
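For reference, activating the plugin in nutch-site.xml means adding
subcollection to the plugin.includes property. A minimal sketch of such an
entry follows; every other plugin name in the value is an assumption based on
the stock Nutch 1.0 defaults and is not copied from my actual file, so adjust
it to your own setup:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- Assumed stock Nutch 1.0 plugin list, with "subcollection" appended
         at the end. Only the trailing "|subcollection" matters here. -->
    <value>protocol-http|urlfilter-regex|parse-(text|html|js)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|subcollection</value>
  </property>
</configuration>
```

The same plugin.includes value has to be visible to whatever process runs the
search, not only to the indexing job.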
The problem is that when I search for anything restricted to a
subcollection, I get zero results in Luke.
The command line gives the same result:
./bin/nutch org.apache.nutch.searcher.NutchBean "subcollection:sub1 apache"
Total hits: 0
After performing a normal search and following the "explain" link in the
search results, the subcollection value shown is correct too, but any
search of the form "subcollection:sub1 text" returns no results.
Is this a bug?