[ http://issues.apache.org/jira/browse/NUTCH-49?page=comments#action_12355864 ]
byron miller commented on NUTCH-49:
-----------------------------------

Can something like this be adapted to use the regex filter as well? It would be nice to ask for new pages only and also match URLs of a given type, a given link score, or some other expression (not just the topN).

> Flag for generate to fetch only new pages to complement the -refetchonly flag
> -----------------------------------------------------------------------------
>
>          Key: NUTCH-49
>          URL: http://issues.apache.org/jira/browse/NUTCH-49
>      Project: Nutch
>         Type: New Feature
>   Components: fetcher
>     Reporter: Luke Baker
>     Priority: Minor
>  Attachments: fetchnewonly.patch
>
> It would be useful, especially for research/testing purposes, to have a flag
> for the FetchListTool that makes sure to only include URLs in the fetchlist
> that have not already been fetched (according to the information from the
> webdb that you're generating the fetchlist from).
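
For illustration only, the selection rule discussed in this issue (skip pages that were already fetched, then optionally narrow by a URL regex and a minimum link score) might look roughly like the sketch below. This is not the code in fetchnewonly.patch and it does not use Nutch's actual classes: PageEntry, selectNew, urlPattern and minScore are all hypothetical names invented for this example, and the in-memory list merely stands in for the webdb.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class NewOnlyFetchListSketch {

    /** Hypothetical stand-in for a webdb page record; not a Nutch class. */
    static final class PageEntry {
        final String url;
        final boolean alreadyFetched;
        final float linkScore;

        PageEntry(String url, boolean alreadyFetched, float linkScore) {
            this.url = url;
            this.alreadyFetched = alreadyFetched;
            this.linkScore = linkScore;
        }
    }

    /**
     * Returns the URLs of pages that have not been fetched yet, whose URL
     * contains a match for urlPattern (ignored when null), and whose link
     * score is at least minScore. Input order is preserved.
     */
    static List<String> selectNew(List<PageEntry> pages,
                                  Pattern urlPattern,
                                  float minScore) {
        List<String> selected = new ArrayList<>();
        for (PageEntry page : pages) {
            if (page.alreadyFetched) {
                continue; // the "new only" rule: drop anything fetched before
            }
            if (urlPattern != null && !urlPattern.matcher(page.url).find()) {
                continue; // optional regex restriction on the URL
            }
            if (page.linkScore < minScore) {
                continue; // optional link-score threshold
            }
            selected.add(page.url);
        }
        return selected;
    }
}
```

Keeping the "not yet fetched" check separate from the regex and score checks means each condition could be switched on or off by its own flag, which seems to be what the comment is asking for.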