[ http://issues.apache.org/jira/browse/NUTCH-151?page=all ]

Paul Baclace updated NUTCH-151:
-------------------------------

    Attachment: CommandRunner.java

Minimal required changes to fix bug NUTCH-151:
1. The pipe io threads should be daemons.
2. The main thread should always interrupt() the pipe io threads when finishing 
up, not just when a timeout occurs.
3. Sleep before testing whether the process has finished with 
Process.exitValue().
4. Increased the sleep time to 1000 msec.
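A minimal sketch of what fixes 1-4 amount to, assuming a polling loop built on Process.exitValue() and two stream-copying threads named STDOUT and STDERR as in the thread dump. The class name PollingRunnerSketch, the helper startPump(), and the exec() parameter list are hypothetical and are not taken from the attached CommandRunner.java. It needs JDK 8 or later because it uses UncheckedIOException.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch of fixes 1-4; names are illustrative, not CommandRunner's API.
class PollingRunnerSketch {

  /** Copies a child process stream to out on a daemon thread (fix 1). */
  static Thread startPump(String name, final InputStream in, final OutputStream out) {
    Thread t = new Thread(name) {
      @Override
      public void run() {
        byte[] buf = new byte[4096];
        try {
          int n;
          while (!isInterrupted() && (n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
          }
        } catch (IOException e) {
          // InterruptedIOException or a closed stream: stop pumping
        }
      }
    };
    t.setDaemon(true); // a read() that never returns can no longer keep the JVM alive
    t.start();
    return t;
  }

  /**
   * Returns the exit code, or -1 if it timed out (or the waiting thread was
   * interrupted before the process finished). timeoutMillis <= 0 waits forever.
   */
  static int exec(long timeoutMillis, OutputStream stdout, OutputStream stderr, String... command) {
    final Process p;
    try {
      p = new ProcessBuilder(command).start();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    Thread outPump = startPump("STDOUT", p.getInputStream(), stdout);
    Thread errPump = startPump("STDERR", p.getErrorStream(), stderr);
    long deadline = System.currentTimeMillis() + timeoutMillis;
    int exitCode = -1;
    try {
      while (true) {
        Thread.sleep(1000); // fix 3: sleep *before* polling; fix 4: 1000 ms per poll
        try {
          exitCode = p.exitValue();
          break; // finished
        } catch (IllegalThreadStateException stillRunning) {
          // not finished yet; fall through to the timeout check
        }
        if (timeoutMillis > 0 && System.currentTimeMillis() >= deadline) {
          p.destroy();
          break; // timed out: exitCode stays -1
        }
      }
      outPump.join(1000); // give the pumps a moment to drain output already produced
      errPump.join(1000);
    } catch (InterruptedException e) {
      p.destroy();
      Thread.currentThread().interrupt();
    } finally {
      outPump.interrupt(); // fix 2: always interrupt the pipe io threads, not only on timeout
      errPump.interrupt();
    }
    return exitCode;
  }
}
```

The bounded join() before interrupt() is a design choice of this sketch: interrupting immediately after exitValue() succeeds could cut off output the child wrote just before exiting.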

Obvious cleanup hitchhiking along:
5. Removed unused _kaput.
6. Added comments indicating the changes needed to use JDK 1.5 instead of the
 EDU.oswego.cs.dl.util.concurrent package.
7. Changed void evaluate() to be a convenience method that uses int exec(),
which returns the exit code (or -1 if timed out).
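A sketch of the API shape item 7 describes, in which int exec() reports the exit code directly and void evaluate() simply calls it and stores the result. Only the names exec() and evaluate() come from the report. The fields, the getters getExitValue() and isTimedOut(), and the use of JDK 8's Process.waitFor(long, TimeUnit) are assumptions that stand in for CommandRunner's own waiting logic.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of item 7. Only exec() and evaluate() come from the report;
// the fields, getters, and Process.waitFor(long, TimeUnit) (JDK 8) are assumptions.
class ExecEvaluateSketch {
  private final String[] command;
  private final long timeoutSeconds; // <= 0 means no timeout
  private int exitValue = -1;
  private boolean timedOut;

  ExecEvaluateSketch(long timeoutSeconds, String... command) {
    this.timeoutSeconds = timeoutSeconds;
    this.command = command;
  }

  /** Runs the command; returns its exit code, or -1 if it timed out or we were interrupted. */
  int exec() {
    timedOut = false;
    try {
      // inheritIO() keeps the sketch short: the child writes straight to our stdout/stderr
      Process p = new ProcessBuilder(command).inheritIO().start();
      if (timeoutSeconds <= 0) {
        return p.waitFor();
      }
      if (p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
        return p.exitValue();
      }
      p.destroy();
      timedOut = true;
      return -1;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return -1;
    }
  }

  /** Convenience method: runs exec() and keeps the result for the getters. */
  void evaluate() {
    exitValue = exec();
  }

  int getExitValue() { return exitValue; }

  boolean isTimedOut() { return timedOut; }
}
```

Callers that only need the exit code can use exec() directly; evaluate() suits code that prefers to query the outcome afterwards.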

An alternative to the busy loop is to use Process.waitFor() with a separate
alarm thread that interrupts the main thread to effect a timeout.  The main
thread can then interrupt() the pipe io threads, and they will receive an
InterruptedIOException.  If necessary, the main thread can also close the
streams the pipe io threads are reading from in order to force them out of
read().  (Oddly, the JavaDoc for Thread.interrupt() does not mention
InterruptedIOException.)
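A rough sketch of that waitFor()-plus-alarm alternative, assuming JDK 8 or later. AlarmRunnerSketch, startPump(), and closeQuietly() are hypothetical names. The lock and the "finished" flag keep a late alarm from interrupting the caller after waitFor() has already returned. Whether closing a process stream actually forces a blocked read() to return depends on the platform, so the daemon flag on the pipe io threads remains the real safety net here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch of the waitFor()-plus-alarm alternative; names are illustrative.
class AlarmRunnerSketch {

  static Thread startPump(String name, final InputStream in, final OutputStream out) {
    Thread t = new Thread(name) {
      @Override
      public void run() {
        byte[] buf = new byte[4096];
        try {
          for (int n; (n = in.read(buf)) != -1; ) {
            out.write(buf, 0, n);
          }
        } catch (IOException e) {
          // InterruptedIOException, or the main thread closed the stream
        }
      }
    };
    t.setDaemon(true);
    t.start();
    return t;
  }

  static void closeQuietly(InputStream in) {
    try { in.close(); } catch (IOException ignored) { }
  }

  /** Returns the exit code, or -1 if timeoutMillis (must be > 0) elapsed first. */
  static int exec(final long timeoutMillis, OutputStream stdout, OutputStream stderr, String... command) {
    if (timeoutMillis <= 0) {
      throw new IllegalArgumentException("timeoutMillis must be > 0");
    }
    final Process p;
    try {
      p = new ProcessBuilder(command).start();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    Thread outPump = startPump("STDOUT", p.getInputStream(), stdout);
    Thread errPump = startPump("STDERR", p.getErrorStream(), stderr);

    final Thread caller = Thread.currentThread();
    final Object lock = new Object();
    final boolean[] finished = {false}; // guarded by lock
    Thread alarm = new Thread("ALARM") {
      @Override
      public void run() {
        try {
          Thread.sleep(timeoutMillis);
          synchronized (lock) {
            if (!finished[0]) caller.interrupt(); // effect the timeout
          }
        } catch (InterruptedException cancelled) {
          // the process finished first
        }
      }
    };
    alarm.setDaemon(true);
    alarm.start();

    int exitCode = -1;
    try {
      exitCode = p.waitFor(); // blocks without a busy loop
    } catch (InterruptedException timedOut) {
      p.destroy(); // the alarm fired (any other interrupt is also treated as a timeout here)
    } finally {
      synchronized (lock) {
        finished[0] = true; // from here on the alarm can no longer interrupt us
      }
      alarm.interrupt();
      Thread.interrupted(); // clear an alarm interrupt that raced with a normal exit
    }

    try {
      outPump.join(1000); // let the pumps drain output already produced
      errPump.join(1000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    outPump.interrupt();
    errPump.interrupt();
    closeQuietly(p.getInputStream()); // may push a stuck read() out; platform-dependent
    closeQuietly(p.getErrorStream());
    return exitCode;
  }
}
```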



> CommandRunner can hang after the main thread exec is finished and has 
> inefficient busy loop
> -------------------------------------------------------------------------------------------
>
>          Key: NUTCH-151
>          URL: http://issues.apache.org/jira/browse/NUTCH-151
>      Project: Nutch
>         Type: Bug
>   Components: indexer
>     Versions: 0.8-dev
>  Environment: all
>     Reporter: Paul Baclace
>  Attachments: CommandRunner.java
>
> I encountered a case where the JVM of a Tasktracker child did not exit after 
> the main thread returned; a thread dump showed only the threads named STDOUT 
> and STDERR from CommandRunner as non-daemon threads, and both were doing a 
> read().
>
> CommandRunner usually works correctly when the subprocess is expected to be 
> finished before the timeout or when no timeout is used. By _usually_, I mean 
> in the absence of external thread interrupts.  The busy loop that waits for 
> the process to finish has a sleep that is skipped over by an exception; this 
> causes the waiting main thread to compete with the subprocess in a tight loop 
> and effectively reduces the available CPU by 50%.
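To show why a skipped sleep is so expensive, the hypothetical BusyLoopDemo below fakes Process.exitValue() with a probe that throws IllegalThreadStateException until a deadline passes, then counts how often each loop shape polls. The "skipped sleep" loop is a reconstruction of the pattern the report describes, not the code from CommandRunner. The "sleep first" loop corresponds to fix 3. It needs JDK 8 or later because it uses a lambda.

```java
// Hypothetical illustration of the reported busy loop. ExitProbe stands in for
// Process.exitValue(); neither loop is copied from CommandRunner.
class BusyLoopDemo {

  /** Like Process.exitValue(): throws IllegalThreadStateException while still running. */
  interface ExitProbe {
    int exitValue();
  }

  /** A fake process that "finishes" after the given number of milliseconds. */
  static ExitProbe finishesAfter(long millis) {
    final long doneAt = System.currentTimeMillis() + millis;
    return () -> {
      if (System.currentTimeMillis() < doneAt) {
        throw new IllegalThreadStateException("process has not exited");
      }
      return 0;
    };
  }

  /** Bug pattern: the sleep follows the call that throws, so it is skipped on every poll. */
  static long pollsWithSkippedSleep(ExitProbe p, long sleepMillis) {
    long polls = 0;
    try {
      while (true) {
        polls++;
        try {
          p.exitValue();             // throws while running...
          Thread.sleep(sleepMillis); // ...so this only runs once the process has finished
          return polls;
        } catch (IllegalThreadStateException stillRunning) {
          // swallowed; loop again at once, competing with the child for CPU
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return polls;
    }
  }

  /** Fix 3: sleep first, then ask for the exit value. */
  static long pollsWithSleepFirst(ExitProbe p, long sleepMillis) {
    long polls = 0;
    try {
      while (true) {
        Thread.sleep(sleepMillis);
        polls++;
        try {
          p.exitValue();
          return polls;
        } catch (IllegalThreadStateException stillRunning) {
          // not finished; the next iteration sleeps before polling again
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return polls;
    }
  }
}
```

With a 300 ms "process" and a 100 ms sleep, the sleep-first loop polls only a handful of times, while the skipped-sleep loop spins continuously for the whole 300 ms.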

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira



_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
