[ https://issues.apache.org/jira/browse/HADOOP-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525258 ]
Jim Kellerman commented on HADOOP-1831:
---------------------------------------
On Wed, 2007-09-05 at 14:21 -0700, Nigel Daley (JIRA) wrote:
> But I think the problem is with Junit. JUnit is *supposed* to timeout a test
> if it is taking longer than 15 minutes. This doesn't seem to work reliably
> if a test gets really 'wedged'.
Understood. But how difficult would it be to start a subprocess from the build
just prior to starting a test, and have it monitor the test and kill it if it
takes too long?
(See the section "Killing a hung test" at
http://wiki.apache.org/lucene-hadoop/HudsonBuildServer )
Once the test has been killed, or if it exits normally, the subprocess
would just exit. A task that does this would be a pretty simple piece of
shell scripting.
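
A minimal sketch of such a watchdog, assuming a POSIX shell. The function
name run_with_watchdog and its limit and grace arguments are made up for
illustration and are not part of the Hudson or Hadoop build; the kill -QUIT
followed by kill -9 sequence is the one proposed in the issue description.

```shell
#!/bin/sh
# Hypothetical watchdog sketch (names are illustrative, not from the build).
# Runs a command in the background and starts a monitor next to it. If the
# command is still running after `limit` seconds, the monitor sends SIGQUIT
# (a JVM prints a thread dump on SIGQUIT), waits `grace` seconds, then sends
# SIGKILL. If the command exits first, the monitor is stopped.

run_with_watchdog() {
    limit=$1    # seconds the command may run before the monitor steps in
    grace=$2    # seconds to wait between kill -QUIT and kill -9
    shift 2

    # Start the command under test in the background and remember its pid.
    "$@" &
    test_pid=$!

    # Monitor subprocess. Its output goes to /dev/null so that a leftover
    # `sleep` does not hold the caller's stdout or stderr open.
    (
        sleep "$limit"
        kill -QUIT "$test_pid" 2>/dev/null   # request a thread dump
        sleep "$grace"
        kill -9 "$test_pid" 2>/dev/null      # force the process to die
    ) >/dev/null 2>&1 &
    watchdog_pid=$!

    # Block until the command exits, normally or because it was killed.
    wait "$test_pid"
    status=$?

    # The monitor is no longer needed. Its current `sleep` may linger
    # until it expires, but no further signals will be sent.
    kill "$watchdog_pid" 2>/dev/null
    wait "$watchdog_pid" 2>/dev/null

    return "$status"
}
```

For example, run_with_watchdog 900 10 followed by whatever command the build
uses to launch a single test would allow the test 15 minutes, ask for a
thread dump, and kill it 10 seconds later. The function returns the test's
own exit status, so the build can still tell a killed test from a passing one.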
When I have manually killed just the process running the test, the build
resumes.
If we did this, I don't think we'd need a timeout on the whole build, because
a hung test is the reason builds take a long time.
> Note too that having Hudson timeout a patch build won't have the effect you
> desire. It will simply hang the patch queue since the 'current' link on the
> filesystem to the patch being tested won't get removed.
I wasn't really suggesting killing the whole build. In my experience, doing
a kill -9 on the stuck test process kills the test, and the build simply resumes.
> Hudson should kill long running tests
> -------------------------------------
>
> Key: HADOOP-1831
> URL: https://issues.apache.org/jira/browse/HADOOP-1831
> Project: Hadoop
> Issue Type: New Feature
> Components: build
> Affects Versions: 0.15.0
> Reporter: Jim Kellerman
> Fix For: 0.15.0
>
>
> Hudson should kill long running tests. (I believe it is supposed to but
> doesn't quite seem to do the job if the test is really hung up).
> It would be nice if, when the timer goes off, Hudson did a
> {code}kill -QUIT{code} (to try to get a thread dump) and then followed that
> with a {code}kill -9{code}
> (See the section "Killing a hung test" at
> http://wiki.apache.org/lucene-hadoop/HudsonBuildServer )