[ 
https://issues.apache.org/jira/browse/NUTCH-478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882581#action_12882581
 ] 

Alex McLintock commented on NUTCH-478:
--------------------------------------

I'm quite keen on this idea. The lack of this feature made testing quite 
difficult.

I have one small comment...

> The user creates a file named "FetchStop" in nutch home.

Presumably it should be in the top directory of the crawl - not in nutch home. 
If it were in nutch home then you would switch off all crawls currently going 
on - and there may be more than one. 



> Add function for stopping FetcherThread gracefully
> --------------------------------------------------
>
>                 Key: NUTCH-478
>                 URL: https://issues.apache.org/jira/browse/NUTCH-478
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.9.0
>            Reporter: chee.wu
>
> At present the fetch process stops only when a timeout occurs during the 
> fetch:
> "(System.currentTimeMillis() - lastRequestStart.get()) > timeout"
> There is no way to ask the fetch process to stop. Sometimes there is a 
> strict time window for fetching, for example from 11pm to 7am. I want to 
> shut down the fetch process at 7am every day even if pages in the generated 
> segments remain unfetched.
> A possible way to implement this might be:
> 1. The user creates a file named "FetchStop" in nutch home.
> 2. The main thread checks for the file every minute and, if it exists, sets 
> a boolean variable such as "stopFetch" to true.
> 3. Each FetcherThread checks "stopFetch" before fetching the next URL. If it 
> is true, the FetcherThread stops immediately and activeThreads is 
> decremented.
> 4. Finally, the main thread ends when activeThreads = 0.
>  
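
The four steps proposed above can be sketched roughly as follows. This is a 
minimal, self-contained Java illustration, not Nutch's actual Fetcher code: 
the class StopFileFetchSketch, its fields, the simulated fetch() and the poll 
interval are all assumptions made for the example. Following the comment 
above, it looks for "FetchStop" in a per-crawl directory rather than in nutch 
home.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the proposal; none of these names come from Nutch.
public class StopFileFetchSketch {
    // Marker file name taken from the proposal.
    static final String STOP_FILE = "FetchStop";

    final AtomicBoolean stopFetch = new AtomicBoolean(false);
    final AtomicInteger activeThreads = new AtomicInteger(0);
    final AtomicInteger fetched = new AtomicInteger(0);
    final Queue<String> urls;
    final Path crawlDir; // assumption: one stop file per crawl directory

    StopFileFetchSketch(Path crawlDir, List<String> urls) {
        this.crawlDir = crawlDir;
        this.urls = new ConcurrentLinkedQueue<>(urls);
    }

    // Step 3: each worker checks the flag before taking the next URL.
    void workerLoop() {
        try {
            String url;
            while (!stopFetch.get() && (url = urls.poll()) != null) {
                fetch(url);
            }
        } finally {
            // The thread is finished either way, so the count goes down.
            activeThreads.decrementAndGet();
        }
    }

    // Stand-in for a real page fetch: it only sleeps and counts.
    void fetch(String url) {
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        fetched.incrementAndGet();
    }

    // Steps 2 and 4: the main thread polls for the stop file and returns
    // once no worker is active any more.
    void run(int threadCount, long pollMillis) throws InterruptedException {
        for (int i = 0; i < threadCount; i++) {
            activeThreads.incrementAndGet();
            new Thread(this::workerLoop, "fetcher-" + i).start();
        }
        while (activeThreads.get() > 0) {
            if (Files.exists(crawlDir.resolve(STOP_FILE))) {
                stopFetch.set(true);
            }
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        Path crawlDir = Files.createTempDirectory("crawl");
        // Step 1: the user requests a stop by creating the marker file.
        Files.createFile(crawlDir.resolve(STOP_FILE));
        StopFileFetchSketch sketch = new StopFileFetchSketch(
                crawlDir, List.of("http://a.example/", "http://b.example/"));
        sketch.run(2, 50);
        System.out.println("fetched=" + sketch.fetched.get()
                + " remaining=" + sketch.urls.size());
    }
}
```

A URL that is already being fetched when the flag flips is allowed to finish, 
so workers stop between URLs rather than mid-request. The proposal suggests 
polling once a minute; the short interval here only keeps the example quick.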

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
