[ 
https://issues.apache.org/jira/browse/HADOOP-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525258
 ] 

Jim Kellerman commented on HADOOP-1831:
---------------------------------------

On Wed, 2007-09-05 at 14:21 -0700, Nigel Daley (JIRA) wrote:
> But I think the problem is with JUnit.  JUnit is *supposed* to time out a test
> if it is taking longer than 15 minutes.  This doesn't seem to work reliably if
> a test gets really 'wedged'.

Understood. But how difficult would it be to start a subprocess from the build 
just prior to starting a test, and have it monitor the test and kill it if it 
takes too long?

(See the section "Killing a hung test" at 
http://wiki.apache.org/lucene-hadoop/HudsonBuildServer )

Once the test has been killed, or if the test exits normally, the subprocess 
would just exit. A task to do this would be a pretty simple piece of shell 
scripting.
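
To make the idea concrete, here is a minimal sketch of such a watchdog. It is 
written in Python rather than shell only so that it is easy to test, and it 
is an illustration, not something that exists in the Hadoop build or in 
Hudson. The name run_with_watchdog and the parameters timeout_s and grace_s 
are made up for this example. It sends SIGQUIT first (to try to get a thread 
dump from a JVM) and then SIGKILL, as the issue description suggests.

```python
import signal
import subprocess


def run_with_watchdog(cmd, timeout_s, grace_s=5.0):
    """Run cmd and kill it if it runs longer than timeout_s seconds.

    Hypothetical helper for illustration only (POSIX signals assumed).
    Returns (exit_code, timed_out).
    """
    proc = subprocess.Popen(cmd)
    try:
        # Normal case: the test finishes in time and the watchdog just exits.
        proc.wait(timeout=timeout_s)
        return proc.returncode, False
    except subprocess.TimeoutExpired:
        pass

    # Equivalent of "kill -QUIT": a JVM prints a thread dump and keeps
    # running; most other processes terminate on this signal.
    proc.send_signal(signal.SIGQUIT)
    try:
        proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        pass

    # Equivalent of "kill -9". This is a no-op if the process already exited.
    proc.kill()
    proc.wait()
    return proc.returncode, True
```

For example, run_with_watchdog(["ant", "test"], timeout_s=15 * 60) would give 
a run the same 15 minutes JUnit is supposed to allow (the ant command line 
here is only a placeholder). Note that this sketch only signals the process 
it started; a test JVM forked by that process would have to be located and 
signalled separately.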

When I have manually killed just the process running the test, the build has 
resumed.

If we did this, I don't think we'd need a timeout on the whole build, because 
builds take a long time only when a test hangs.

> Note too that having Hudson time out a patch build won't have the effect you
> desire.  It will simply hang the patch queue since the 'current' link on the
> filesystem to the patch being tested won't get removed.

I wasn't really suggesting killing the whole build. In my experience just doing 
a kill -9 on the stuck test kills the test, and the build just resumes.


> Hudson should kill long running tests
> -------------------------------------
>
>                 Key: HADOOP-1831
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1831
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: build
>    Affects Versions: 0.15.0
>            Reporter: Jim Kellerman
>             Fix For: 0.15.0
>
>
> Hudson should kill long running tests. (I believe it is supposed to but 
> doesn't quite seem to do the job if the test is really hung up).
> It would be nice if, when the timer goes off, Hudson did a 
> {code}kill -QUIT{code} (to try to get a thread dump) and then followed that 
> with a {code}kill -9{code}.
> (See the section "Killing a hung test" at 
> http://wiki.apache.org/lucene-hadoop/HudsonBuildServer )

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
