[ https://issues.apache.org/jira/browse/NUTCH-478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882581#action_12882581 ]
Alex McLintock commented on NUTCH-478:
--------------------------------------

I'm quite keen on this idea. The lack of this feature made things really quite difficult when testing.

I have one small comment...

> The user creates a file named "FetchStop" in nutch home.

Presumably it should be in the top directory of the crawl - not in nutch home. If it were in nutch home then you would switch off all crawls currently going on - and there may be more than one.

> Add function for stopping FetherThread gracefully
> -------------------------------------------------
>
>                 Key: NUTCH-478
>                 URL: https://issues.apache.org/jira/browse/NUTCH-478
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.9.0
>            Reporter: chee.wu
>
> Currently the fetch process stops only when a timeout occurs during the fetch:
> "(System.currentTimeMillis() - lastRequestStart.get()) > timeout"
> There is no way to ask the fetch process to stop. Sometimes there is a strict time window for fetching, for example from 11pm to 7am. I want to shut down the fetch process at 7am every day even if unfetched pages still remain in the generated segments.
> A possible solution might be:
> 1. The user creates a file named "FetchStop" in nutch home.
> 2. The main thread checks for the existence of the file every minute and, if it exists, sets a boolean variable like "stopFetch" to true.
> 3. FetcherThread checks the status of "stopFetch" before fetching the next URL. If it has changed to true, FetcherThread stops right away, and the value of activeThreads is reduced.
> 4. Finally, the main thread ends when activeThreads = 0.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
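
As a rough illustration of the four steps proposed in the issue, here is a minimal Java sketch. It is not taken from the Nutch code base: the class StopFileFetchSketch, its fields and the run() parameters are all hypothetical, the short sleep stands in for a real fetch, and the stop file path is passed in by the caller so that it can live in the crawl directory rather than in nutch home.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the stop-file idea; these names are assumptions,
// not Nutch classes or fields.
class StopFileFetchSketch {
    final AtomicBoolean stopFetch = new AtomicBoolean(false);
    final AtomicInteger activeThreads = new AtomicInteger(0);
    final AtomicInteger fetched = new AtomicInteger(0);
    private final Queue<String> urls;   // must be a thread-safe queue
    private final Path stopFile;        // e.g. <crawl dir>/FetchStop (assumed location)

    StopFileFetchSketch(Queue<String> urls, Path stopFile) {
        this.urls = urls;
        this.stopFile = stopFile;
    }

    // Step 3: each worker checks the flag before taking the next URL.
    private void workerLoop() {
        try {
            while (!stopFetch.get()) {
                String url = urls.poll();
                if (url == null) {
                    break;              // nothing left to fetch
                }
                Thread.sleep(5);        // stand-in for the real fetch of url
                fetched.incrementAndGet();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            activeThreads.decrementAndGet();
        }
    }

    // Steps 2 and 4: the main thread polls for the stop file and returns
    // once no worker is active any more.
    void run(int threadCount, long pollMillis) throws InterruptedException {
        for (int i = 0; i < threadCount; i++) {
            activeThreads.incrementAndGet();
            new Thread(this::workerLoop, "fetcher-" + i).start();
        }
        while (activeThreads.get() > 0) {
            if (Files.exists(stopFile)) {
                stopFetch.set(true);
            }
            Thread.sleep(pollMillis);
        }
    }
}
```

With a poll interval of 60000 ms this would check once a minute, as the issue suggests. A worker only sees the flag between URLs, so a fetch that is already in progress finishes before the thread stops.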