[
https://issues.apache.org/jira/browse/NUTCH-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572820#comment-14572820
]
Julien Nioche commented on NUTCH-2036:
--------------------------------------
+1
Note that this patch also handles the case where the number of rounds is set
to -1, in which case the crawl never stops. This would often be used in
combination with the brand new 'wait' parameter.
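
To illustrate the -1 case, here is a minimal sketch of a round-limit check. It
is not taken from the patch: the function name should_run_round and its two
arguments are hypothetical, and the real script may structure its loop
differently.

```shell
# Hypothetical helper, not part of bin/crawl.
# Succeeds (exit status 0) while another round should run.
# $1 = current round number (starting at 1)
# $2 = round limit, where -1 is assumed to mean "no limit"
should_run_round() {
  round="$1"
  limit="$2"
  # With a limit of -1 the first test always succeeds, so a loop guarded by
  # this function never ends on its own.
  [ "$limit" -eq -1 ] || [ "$round" -le "$limit" ]
}

# Possible use in a crawl loop (assumed shape):
#   round=1
#   while should_run_round "$round" "$limit"; do
#     ... generate, fetch, parse, updatedb ...
#     round=$((round + 1))
#   done
```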
> Adding some continuous crawl goodies to the crawl script
> --------------------------------------------------------
>
> Key: NUTCH-2036
> URL: https://issues.apache.org/jira/browse/NUTCH-2036
> Project: Nutch
> Issue Type: Improvement
> Components: bin, tool, util
> Affects Versions: 1.10, 1.11
> Reporter: Jorge Luis Betancourt Gonzalez
> Priority: Minor
> Labels: crawl, script
> Attachments: NUTCH-2036.patch
>
>
> Nutch does not support continuous crawling out of the box. Yes, this is
> doable using cron, and it is sometimes irrelevant due to the size of the
> crawl, but it is a nice feature to have.
> This patch just adds a new option to the {{bin/crawl}} script (-w|--wait),
> which sets a time to wait if the generator returns 0 (when no URLs are
> scheduled for fetching).
> This new parameter has the {{NUMBER\[SUFFIX\]}} format. If no suffix is
> provided, the amount of time is assumed to be in seconds. Valid suffixes
> are:
> s - seconds
> m - minutes
> h - hours
> d - days
> If a {{-1}} value is passed to the parameter, or it is not used at all, the
> default behaviour of exiting the script is used.
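
The NUMBER[SUFFIX] format described above could be converted to seconds along
these lines. This is a sketch under stated assumptions, not the code from the
attached patch: the function name wait_to_seconds is hypothetical, and so is
the choice to print -1 when waiting is disabled.

```shell
# Hypothetical helper, not part of bin/crawl.
# Prints the wait time in seconds for a NUMBER[SUFFIX] value.
# Prints -1 when the value is -1 or missing (assumed to mean "do not wait").
wait_to_seconds() {
  value="${1:--1}"
  if [ "$value" = "-1" ]; then
    echo -1
    return 0
  fi

  # Split off a trailing s, m, h, or d; with no suffix, assume seconds.
  case "$value" in
    *[smhd]) suffix="${value#"${value%?}"}"; number="${value%?}" ;;
    *)       suffix="s"; number="$value" ;;
  esac

  # The numeric part must be non-empty and contain digits only.
  case "$number" in
    ""|*[!0-9]*) echo "invalid wait value: $value" >&2; return 1 ;;
  esac

  # Strip leading zeros so the shell does not read the number as octal.
  while [ "${#number}" -gt 1 ] && [ "${number#0}" != "$number" ]; do
    number="${number#0}"
  done

  case "$suffix" in
    s) factor=1 ;;
    m) factor=60 ;;
    h) factor=3600 ;;
    d) factor=86400 ;;
  esac
  echo $(( $number * $factor ))
}
```

With this sketch, a value of 30 gives 30 seconds, 5m gives 300, and a value
such as 5x is rejected with a non-zero exit status.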