[jira] Updated: (NUTCH-668) Domain URL Filter

2008-12-04 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-668:
---

Attachment: NUTCH-668-2-20081204.patch

Updated to include URLUtil methods that were missing.  Sorry.

 Domain URL Filter
 -

 Key: NUTCH-668
 URL: https://issues.apache.org/jira/browse/NUTCH-668
 Project: Nutch
  Issue Type: Improvement
Affects Versions: 1.0.0
 Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
 Fix For: 1.0.0

 Attachments: NUTCH-668-1-20081202.patch, NUTCH-668-2-20081204.patch


 A URLFilter that adds the ability to filter URLs by top-level domain or 
 by hostname.  A configuration file listing the accepted domains, one per 
 line, determines which URLs are accepted.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (NUTCH-668) Domain URL Filter

2008-12-02 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-668:
---

Attachment: NUTCH-668-1-20081202.patch

Includes the DomainURLFilter and test files.  URLs can be filtered either by 
top-level domain, ignoring subdomains, or by hostname, as set through 
configuration.  A configuration file lists the valid domains, one per line.  
Those domains are used to build a set of valid domains against which URLs are 
validated at runtime.  Only URLs that match a domain in the set are 
considered valid.
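
For readers without the patch at hand, the matching described above can be 
sketched roughly as follows.  This is not the code from the attachment: the 
class name DomainFilterSketch, the "#" comment convention, and the 
suffix-walking match are assumptions made for illustration.  The sketch also 
stands in for proper domain-name extraction, which would normally need a 
list of known domain suffixes.  It follows the URLFilter convention of 
returning the URL when accepted and null when rejected.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of a domain-set URL filter; not the NUTCH-668 patch. */
public class DomainFilterSketch {

    /** Accepted entries: hostnames, domains, or top-level domains. */
    private final Set<String> accepted = new HashSet<>();

    /** Reads one entry per line; blank lines and "#" lines are skipped (assumed convention). */
    public DomainFilterSketch(Reader config) throws IOException {
        BufferedReader in = new BufferedReader(config);
        String line;
        while ((line = in.readLine()) != null) {
            String entry = line.trim().toLowerCase(Locale.ROOT);
            if (!entry.isEmpty() && !entry.startsWith("#")) {
                accepted.add(entry);
            }
        }
    }

    /** Returns the URL if its host or any parent suffix is in the set, else null. */
    public String filter(String url) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            return null; // unparsable URLs are rejected
        }
        if (host == null) {
            return null;
        }
        String candidate = host.toLowerCase(Locale.ROOT);
        while (true) {
            if (accepted.contains(candidate)) {
                return url;
            }
            int dot = candidate.indexOf('.');
            if (dot < 0) {
                return null; // no suffix matched
            }
            // Drop the leftmost label: www.example.com -> example.com -> com
            candidate = candidate.substring(dot + 1);
        }
    }

    public static void main(String[] args) throws IOException {
        String config = "# accepted domains\napache.org\nwww.example.com\nedu\n";
        DomainFilterSketch filter = new DomainFilterSketch(new StringReader(config));
        // Accepted: the host is a subdomain of apache.org
        System.out.println(filter.filter("http://lucene.apache.org/nutch/"));
        // Rejected (prints null): only the exact host www.example.com is listed
        System.out.println(filter.filter("http://example.com/"));
        // Accepted: the top-level domain edu is listed
        System.out.println(filter.filter("http://www.example.edu/"));
    }
}
```

Walking the host from left to right lets a single set lookup per label cover 
all three cases (exact hostname, domain without subdomains, top-level 
domain) without separate configuration sections.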
