Jorge Luis Betancourt Gonzalez created NUTCH-2036:
-----------------------------------------------------
Summary: Adding some continuous crawl goodies to the crawl script
Key: NUTCH-2036
URL: https://issues.apache.org/jira/browse/NUTCH-2036
Project: Nutch
Issue Type: Improvement
Components: bin, tool, util
Affects Versions: 1.10, 1.11
Reporter: Jorge Luis Betancourt Gonzalez
Priority: Minor

Nutch does not support continuous crawling out of the box. This can be achieved with cron, and is sometimes irrelevant due to the size of the crawl, but it is still a nice feature to have.

This patch adds a new option to the {{bin/crawl}} script (-w|--wait) that sets how long to wait when the generator returns 0 (when no URLs are scheduled for fetching). The option takes a value in the {{NUMBER\[SUFFIX\]}} format; if no suffix is provided, the amount of time is assumed to be in seconds. Valid suffixes are:

s - seconds
m - minutes
h - hours
d - days

If a {{-1}} value is passed to the option, or the option is not used at all, the default behaviour of exiting the script is used.
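To illustrate the {{NUMBER\[SUFFIX\]}} format described above, here is a minimal bash sketch of how such a value could be converted to seconds. It is not taken from the patch: the function name to_seconds is hypothetical, and treating {{-1}} as a pass-through "do not wait" marker is an assumption based on the description.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not from the NUTCH-2036 patch): convert a
# NUMBER[SUFFIX] wait value into seconds and print the result.
# Suffixes: s (seconds, default), m (minutes), h (hours), d (days).
to_seconds() {
  local value="$1"

  # Assumption: -1 means "do not wait", so it is passed through unchanged.
  if [[ "$value" == "-1" ]]; then
    echo "-1"
    return 0
  fi

  # Accept one or more digits followed by an optional single suffix.
  if [[ ! "$value" =~ ^([0-9]+)([smhd]?)$ ]]; then
    echo "invalid wait value: $value" >&2
    return 1
  fi

  local number="${BASH_REMATCH[1]}"
  local suffix="${BASH_REMATCH[2]}"

  # 10# forces base 10 so values with leading zeros (e.g. 08) are not read as octal.
  case "$suffix" in
    ''|s) echo "$(( 10#$number ))" ;;
    m)    echo "$(( 10#$number * 60 ))" ;;
    h)    echo "$(( 10#$number * 3600 ))" ;;
    d)    echo "$(( 10#$number * 86400 ))" ;;
  esac
}

# Example calls:
#   to_seconds 90   prints 90
#   to_seconds 5m   prints 300
#   to_seconds 2h   prints 7200
#   to_seconds 1d   prints 86400
```

A script could then pass the printed number of seconds to sleep before retrying the generate step, and exit instead when the result is -1.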