Flag for generate to fetch only new pages to complement the -refetchonly flag
-----------------------------------------------------------------------------
Key: NUTCH-49
URL: http://issues.apache.org/jira/browse/NUTCH-49
Project: Nutch
Type: New Feature
Components: fetcher
Reporter: Luke Baker
Priority: Minor
Attachments: fetchnewonly.patch
It would be useful, especially for research/testing purposes, to have a flag
for the FetchListTool that ensures the fetchlist includes only URLs that have
not already been fetched (according to the information in the webdb that
you're generating the fetchlist from).
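
To illustrate the requested behaviour, here is a minimal, self-contained
sketch. It is not the attached patch and does not use the real Nutch classes:
PageEntry, its alreadyFetched field, and newOnly() are hypothetical names
invented for this example, and how the webdb actually records whether a page
has been fetched is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class FetchNewOnlySketch {

    /** Hypothetical stand-in for a webdb entry; NOT the Nutch Page class. */
    static final class PageEntry {
        final String url;
        final boolean alreadyFetched; // assumed to be derivable from the webdb

        PageEntry(String url, boolean alreadyFetched) {
            this.url = url;
            this.alreadyFetched = alreadyFetched;
        }
    }

    /** Returns the URLs of entries that have never been fetched, in input order. */
    static List<String> newOnly(List<PageEntry> webDbEntries) {
        List<String> fetchlist = new ArrayList<>();
        for (PageEntry entry : webDbEntries) {
            if (!entry.alreadyFetched) {
                fetchlist.add(entry.url);
            }
        }
        return fetchlist;
    }

    public static void main(String[] args) {
        List<PageEntry> entries = new ArrayList<>();
        entries.add(new PageEntry("http://a.example/", true));
        entries.add(new PageEntry("http://b.example/", false));
        entries.add(new PageEntry("http://c.example/", false));

        // Only the entries not yet fetched end up in the fetchlist.
        System.out.println(newOnly(entries));
    }
}
```

In this sketch the filter is the exact complement of a refetch-only
selection: an entry is kept precisely when it has no prior fetch recorded.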
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers