[ http://issues.apache.org/jira/browse/NUTCH-49?page=comments#action_12355864 ]
byron miller commented on NUTCH-49:
-----------------------------------

Could something like this be adapted to use the regex filter as well? It would be nice to say "new only" and also match URLs of type x, link score x, or some other expression (not just the very topN).

> Flag for generate to fetch only new pages to complement the -refetchonly flag
> -----------------------------------------------------------------------------
>
> Key: NUTCH-49
> URL: http://issues.apache.org/jira/browse/NUTCH-49
> Project: Nutch
> Type: New Feature
> Components: fetcher
> Reporter: Luke Baker
> Priority: Minor
> Attachments: fetchnewonly.patch
>
> It would be useful, especially for research/testing purposes, to have a flag
> for the FetchListTool that makes sure to only include URLs in the fetchlist
> that have not already been fetched (according to the information from the
> webdb that you're generating the fetchlist from).
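
To make the request concrete, here is a rough sketch of the selection being asked for: keep only pages that have not been fetched yet, optionally narrow them by a URL regex and a minimum link score, and only then apply the topN cut. This is not Nutch code and does not reflect the attached patch. `PageRecord`, `select_new_pages`, and all of their parameters are hypothetical names made up for illustration, and Python is used only to keep the example short.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class PageRecord:
    """Hypothetical stand-in for a webdb page entry (not a Nutch class)."""
    url: str
    fetched: bool   # True if the webdb says the page was already fetched
    score: float    # link score; higher means more important


def select_new_pages(
    pages: Iterable[PageRecord],
    url_pattern: Optional[str] = None,
    min_score: float = 0.0,
    top_n: Optional[int] = None,
) -> List[PageRecord]:
    """Return unfetched pages matching the optional filters, best score first."""
    regex = re.compile(url_pattern) if url_pattern else None
    candidates = [
        p for p in pages
        if not p.fetched                       # "new only"
        and p.score >= min_score               # link-score threshold
        and (regex is None or regex.search(p.url))  # URL expression
    ]
    # Highest score first, then apply the topN cut if one was requested.
    candidates.sort(key=lambda p: p.score, reverse=True)
    return candidates if top_n is None else candidates[:top_n]
```

For example, `select_new_pages(pages, url_pattern=r"\.html$", min_score=1.0, top_n=100)` would select at most 100 unfetched HTML pages with a score of at least 1.0. Applying the filters before the topN cut means the limit counts only pages that actually match.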
