[ 
https://issues.apache.org/jira/browse/NUTCH-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208881#comment-13208881
 ] 

Lewis John McGibbney commented on NUTCH-1210:
---------------------------------------------

Hi Markus. 
1) I would ask for one tiny change in ivy.xml,
from
{code}
  <configurations>
    <include file="${nutch.root}/ivy/ivy-configurations.xml"/>
  </configurations>
{code}
to
{code}
  <configurations>
    <include file="../../../ivy/ivy-configurations.xml"/>
  </configurations>
{code}
This is purely for consistency. I also think it makes the plugin easier to 
configure in Eclipse, where the ${nutch.root} variable isn't defined.

2) Also, the patch does not include domainblacklist-urlfilter.txt under /conf. 
Would it be possible to add a file there with some commented-out documentation, 
so users at least have something to go on?
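A starting point for that file might look something like the following. It assumes the blacklist filter reads the same one-entry-per-line format as domain-urlfilter.txt and ignores lines starting with #. The example entries are hypothetical and are left commented out, so the template blocks nothing by default.
{code}
# domainblacklist-urlfilter.txt
# Config file for the domain blacklist URL filter plugin.
#
# List one entry per line: a domain suffix (e.g. a TLD), a domain,
# or a full hostname. URLs matching an entry are rejected; all other
# URLs are passed on. Used together with urlfilter-domain, this lets
# you allow a domain there and blacklist specific subdomains here.
#
# Lines starting with '#' are comments. Example (hypothetical) entries:
# spam.example.com
# example.org
{code}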

3) Your documentation in the main class also says the property can be 
overridden in nutch-*.xml. However, nutch-default.xml defines no such property, 
so people will probably get confused when they try to set it from 
nutch-site.xml.
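A nutch-default.xml entry might look roughly like this, by analogy with the existing urlfilter.domain.file property. The property name urlfilter.domainblacklist.file and its default value are assumptions. They would have to match whatever key the filter actually reads from the configuration.
{code}
<property>
  <!-- Hypothetical name: must match the key the blacklist filter reads. -->
  <name>urlfilter.domainblacklist.file</name>
  <value>domainblacklist-urlfilter.txt</value>
  <description>Name of a file on the CLASSPATH listing domain suffixes,
  domains or hostnames whose URLs are rejected by the domain blacklist
  URL filter plugin.</description>
</property>
{code}
Users could then override the value in nutch-site.xml in the usual way.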

My tests seem to be failing with trunk, so something must be wrong with my 
trunk checkout. I'll get that sorted and then test a bit more. Thanks


                
> DomainBlacklistFilter
> ---------------------
>
>                 Key: NUTCH-1210
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1210
>             Project: Nutch
>          Issue Type: New Feature
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.5
>
>         Attachments: NUTCH-1210-1.5-1.patch
>
>
> The current DomainFilter acts as a whitelist. We also need a filter that 
> acts as a blacklist, so we can allow TLDs and/or domains with DomainFilter 
> but blacklist specific subdomains. If we patched the current DomainFilter 
> for this behaviour, it would break its current semantics, such as its precedence. 
> Therefore I would propose a new filter instead.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira