Semyon Semyonov created NUTCH-2522:
--------------------------------------

Summary: Bidirectional URL exemption filter
Key: NUTCH-2522
URL: https://issues.apache.org/jira/browse/NUTCH-2522
Project: Nutch
Issue Type: Improvement
Components: plugin
Reporter: Semyon Semyonov
The current Nutch URL exemption plugin decides whether to exempt a link based on toUrl alone. The new plugin uses both fromUrl and toUrl: it applies a regex transformation to each URL and exempts the link when regex(fromUrl) == regex(toUrl). This allows more complex URL exemption checks, such as allowing the link http://www.website.com/home -> http://website.com/about (the same site with or without the www prefix).
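A minimal Java sketch of the regex(fromUrl) == regex(toUrl) check, assuming the transformation is a single regex replacement that reduces a URL to its host with any leading "www." removed. The class name `BidirectionalExemptionSketch`, the pattern, and the `transform`/`filter` helpers are hypothetical illustrations, not the plugin's actual code or configuration. The sketch also does not implement Nutch's plugin interface, so it runs without Nutch on the classpath.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of a bidirectional exemption check: both URLs are
 * rewritten with the same regex, and the link is exempted only when the
 * rewritten forms are equal. Not the actual Nutch plugin code.
 */
public class BidirectionalExemptionSketch {

    // Assumed transformation: keep only the host, dropping the scheme,
    // an optional leading "www." and everything after the host.
    private static final Pattern HOST_PATTERN =
            Pattern.compile("^https?://(?:www\\.)?([^/:?#]+).*$");
    private static final String REPLACEMENT = "$1";

    /** Applies the regex transformation; non-matching URLs are returned unchanged. */
    static String transform(String url) {
        return HOST_PATTERN.matcher(url).replaceAll(REPLACEMENT);
    }

    /** Returns true (exempt) when transform(fromUrl) equals transform(toUrl). */
    static boolean filter(String fromUrl, String toUrl) {
        if (fromUrl == null || toUrl == null) {
            return false;
        }
        return transform(fromUrl).equals(transform(toUrl));
    }

    public static void main(String[] args) {
        // Same site with and without "www.": both sides reduce to "website.com".
        System.out.println(filter("http://www.website.com/home", "http://website.com/about"));
        // Different host: "website.com" vs "other.com", so not exempted.
        System.out.println(filter("http://www.website.com/home", "http://other.com/about"));
    }
}
```

With this assumed pattern, both URLs in the example reduce to `website.com`, so the link is exempted, while a link to a different host is not. Because the same regex is applied to both sides, the check is symmetric: swapping fromUrl and toUrl gives the same result.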