[ https://issues.apache.org/jira/browse/NUTCH-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jorge Luis Betancourt Gonzalez updated NUTCH-2036:
--------------------------------------------------
    Attachment: NUTCH-2036.patch

> Adding some continuous crawl goodies to the crawl script
> --------------------------------------------------------
>
>                 Key: NUTCH-2036
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2036
>             Project: Nutch
>          Issue Type: Improvement
>          Components: bin, tool, util
>    Affects Versions: 1.10, 1.11
>            Reporter: Jorge Luis Betancourt Gonzalez
>            Priority: Minor
>              Labels: crawl, script
>         Attachments: NUTCH-2036.patch
>
>
> Although Nutch does not support continuous crawling out of the box, and
> this is doable using cron or is sometimes irrelevant due to the size of
> the crawl, it's a nice feature to have.
> This patch adds a new option to the {{bin/crawl}} script (-w|--wait) that
> sets how long to wait when the generator returns 0 (when no URLs are
> scheduled for fetching).
> The option takes a value in the {{NUMBER\[SUFFIX\]}} format. If no suffix
> is provided, the amount of time is assumed to be in seconds. Valid
> suffixes are:
> s - seconds
> m - minutes
> h - hours
> d - days
> If a {{-1}} value is passed to the option, or the option is not used at
> all, the default behaviour of exiting the script is used.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
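
The {{NUMBER\[SUFFIX\]}} format described above could be turned into a number of seconds roughly as follows. This is only a sketch and is not taken from NUTCH-2036.patch: the function name {{wait_to_seconds}}, the error message, and the handling of leading zeros are all assumptions made for illustration.

```sh
#!/bin/sh

# Hypothetical helper (not from the patch): convert NUMBER[SUFFIX] to seconds.
# Note: plain sh has no "local", so the variables below are global unless the
# function is called in a subshell, e.g. via $(...).
wait_to_seconds() {
  value="$1"

  # -1 keeps the default behaviour (no waiting), so pass it through as-is.
  if [ "$value" = "-1" ]; then
    echo "-1"
    return 0
  fi

  # Split off a trailing s, m, h or d; a missing suffix means seconds.
  case "$value" in
    *[smhd]) number="${value%?}"; suffix="${value#"$number"}" ;;
    *)       number="$value";     suffix="s" ;;
  esac

  # Reject anything that is not a plain non-negative integer.
  case "$number" in
    ""|*[!0-9]*) echo "invalid wait value: $value" >&2; return 1 ;;
  esac

  # Drop leading zeros so the shell does not read the number as octal.
  while [ "${#number}" -gt 1 ] && [ "${number#0}" != "$number" ]; do
    number="${number#0}"
  done

  case "$suffix" in
    s) echo "$number" ;;
    m) echo $((number * 60)) ;;
    h) echo $((number * 3600)) ;;
    d) echo $((number * 86400)) ;;
  esac
}
```

A caller might then do something like {{secs=$(wait_to_seconds 5m) && sleep "$secs"}} when the generator reports that no URLs are scheduled, and exit instead when the result is {{-1}}. How the actual patch wires this into {{bin/crawl}} may differ.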