You need to add urlfilter-domain to the plugin.includes property
in either nutch-site.xml or nutch-default.xml.
Something like this, depending on your config:
<property>
<name>plugin.includes</name>
<value>...urlfilter-(prefix|suffix|domain)...</value>
</property>
Then a domain-urlfilter.txt file in the conf directory containing the
single line se should match only .se domains.
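
To make the matching idea concrete, here is a rough Python sketch of how a one-line rule such as se can accept or reject a URL. It is not the plugin's actual code: load_rules and accepts are hypothetical names, and comparing every dot-separated suffix of the host against the rules is a simplifying assumption about how the filter behaves.

```python
from urllib.parse import urlparse


def load_rules(text):
    """Read rule entries, one per line; blank lines and '#' comments are skipped."""
    rules = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            rules.add(line.lower())
    return rules


def accepts(url, rules):
    """Return True if the host, or any dot-separated suffix of it, is in rules."""
    host = (urlparse(url).hostname or "").lower()
    if not host:
        return False
    labels = host.split(".")
    # For "www.example.se" the candidates are
    # "www.example.se", "example.se" and "se".
    candidates = {".".join(labels[i:]) for i in range(len(labels))}
    return bool(candidates & rules)


# Hypothetical file content: a comment plus the single rule "se".
rules = load_rules("# only Swedish sites\nse\n")
print(accepts("http://www.example.se/page", rules))   # True
print(accepts("http://www.example.com/page", rules))  # False
```

Under this assumption the rule is a plain string compared against parts of the host name, not a regular expression, which would explain why regex patterns in the file have no effect.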
Denis
Larsson85 wrote:
Hi
I've been trying to get the domain-urlfilter to work for quite some time
now. What I want to do is simply to crawl only pages within the .se domain.
From what I can understand it should be enough to only write
se
in the domain-urlfilter.txt file, and that should fix it. But that has no
effect. I've been trying with different regular expressions and so on as
well, but the filter doesn't have any effect whatsoever.
In the domain-urlfilter.txt it says something about it being a plugin. Do I
have to load the plugin in some special way to make it active? If so, how?
Thanks for any help
I'm running Nutch 1.0