[jira] [Commented] (LUCENE-3429) improve build system when tests hang

Dawid Weiss (JIRA) Wed, 14 Sep 2011 02:37:39 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104384#comment-13104384
 ]


Dawid Weiss commented on LUCENE-3429:
-------------------------------------

bq. the correct statement is that stop would not stop a thread that is waiting 
if interrupt would also not stop it

Ehm, too many negations for me, but I think you meant the other way around? 
Anyway, there's really little to it: stop() and interrupt() act similarly: they
both attempt to break the thread's execution by making an exception appear in
the thread's current call stack. The difference is that interrupt() only sets a
flag on the thread. That flag is checked by blocking methods such as wait(),
sleep() and join() (and by interruptible I/O) and is then thrown as a checked
exception (InterruptedException). stop(), on the other hand, tries to throw an
unchecked error (ThreadDeath) as early as possible, which in theory can happen
at any statement.
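A minimal sketch of the interrupt() half of that description, assuming Java 9+ (for Thread.onSpinWait()); the class and method names are hypothetical. It deliberately does not call stop(): that method has been deprecated since Java 1.2, and on recent JDKs (20 and later) it throws UnsupportedOperationException. The point it shows is that interrupt() only turns into an exception inside a blocking call; a busy loop has to poll the flag itself.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class InterruptDemo {

    /** A thread blocked in sleep() receives the interrupt as a checked InterruptedException. */
    public static boolean sleepingThreadSeesInterrupt() {
        AtomicBoolean gotException = new AtomicBoolean(false);
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(TimeUnit.MINUTES.toMillis(10));
            } catch (InterruptedException e) {
                // sleep() noticed the interrupt flag and converted it into this exception.
                gotException.set(true);
            }
        });
        sleeper.start();
        sleeper.interrupt(); // only sets the flag; sleep() throws when it sees it
        join(sleeper, 5_000);
        return gotException.get() && !sleeper.isAlive();
    }

    /** A busy loop never blocks, so nothing throws: it only exits if it polls the flag. */
    public static boolean pollingLoopExits() {
        Thread spinner = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                Thread.onSpinWait(); // stand-in for CPU-bound work
            }
        });
        spinner.start();
        spinner.interrupt();
        join(spinner, 5_000);
        return !spinner.isAlive();
    }

    private static void join(Thread t, long millis) {
        try {
            t.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
        }
    }
}
```

A loop that never polls isInterrupted() and never blocks would ignore interrupt() entirely, which is exactly the case where stop() was being considered as a fallback.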

In a piece of software that cleans up resources in finally blocks and doesn't
catch and ignore Throwable/Error, this shouldn't matter that much and should be
safe.
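A small sketch, with hypothetical names, of why those two conditions matter. The first worker releases its (pretend) resource in a finally block, so the cleanup runs when it is interrupted. The second catches and ignores Throwable, so the interrupt is lost and it keeps running; the same catch would also swallow the ThreadDeath error that stop() throws. The second worker is made a daemon only so this demo cannot keep the JVM alive.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class CleanupDemo {

    /** Returns true if the finally block ran after the worker was interrupted. */
    public static boolean finallyRunsOnInterrupt() {
        AtomicBoolean released = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(60_000); // pretend to work while holding some resource
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag for any outer code
            } finally {
                released.set(true); // hypothetical cleanup, e.g. deleting a temp dir
            }
        });
        worker.start();
        worker.interrupt();
        join(worker, 5_000);
        return released.get() && !worker.isAlive();
    }

    /** Returns true if a worker that swallows Throwable is still alive after an interrupt. */
    public static boolean swallowingWorkerSurvivesInterrupt() {
        Thread worker = new Thread(() -> {
            while (true) {
                try {
                    Thread.sleep(60_000);
                } catch (Throwable ignored) {
                    // Anti-pattern: the InterruptedException is discarded (and throwing it
                    // already cleared the flag), so the loop simply goes back to sleep.
                }
            }
        });
        worker.setDaemon(true); // demo only: must not keep the JVM alive
        worker.start();
        worker.interrupt();
        join(worker, 500);
        return worker.isAlive();
    }

    private static void join(Thread t, long millis) {
        try {
            t.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```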

Simon was worried that calling stop() could leave junk on disk or do weird
stuff. True, this can happen, but in the end the same thing happens anyway if a
thread is stuck in an infinite busy loop or deadlocked: either we try to kill it
or the JVM kills it when it exits.

I will modify the code to use a more graceful cascade: interrupt(), wait a bit,
then try to kill the thread. I still think this has advantages over just leaving
the problematic thread running in the background, which has these
disadvantages:

- the VM will never exit from tests if the threads are non-daemon threads,
- background threads may interfere with other threads and produce noise that
will not be reproducible.
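To make the first point concrete, here is a hypothetical helper (not part of the attached patch) that lists the live non-daemon threads other than the caller; any thread it reports will keep the JVM from exiting normally. It relies only on Thread.getAllStackTraces() and Thread.isDaemon().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class LeakedThreads {

    /** Live non-daemon threads other than the caller: these keep the JVM from exiting. */
    public static List<Thread> liveNonDaemonThreads() {
        List<Thread> result = new ArrayList<>();
        Thread self = Thread.currentThread();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t != self && t.isAlive() && !t.isDaemon()) {
                result.add(t);
            }
        }
        return result;
    }

    /** Starts a non-daemon thread parked on a latch, checks it is reported, then releases it. */
    public static boolean detectsParkedWorker() {
        CountDownLatch release = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                release.await(); // blocks until the latch is released
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "hypothetical-leaked-worker");
        worker.setDaemon(false); // explicit: a daemon thread would not block VM exit
        worker.start();
        boolean seen = liveNonDaemonThreads().contains(worker);
        release.countDown();
        try {
            worker.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen && !worker.isAlive();
    }
}
```

A test framework could call something like liveNonDaemonThreads() after each suite and fail (or start the interrupt/stop cascade) if the list is not empty.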

These are my motivating factors for using stop() as a last resort for threads
that went into an endless loop (or exceeded a fairly large timeout). Simon, I
know you have a gut feeling that calling stop() is wrong, but you need to
convince me with arguments other than just your gut feeling :)
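The cascade described above (interrupt(), wait a bit, then a last resort) can be sketched as follows. This is an illustration under assumptions, not the code from LUCENE-3429.patch: ThreadTerminator, Outcome and the lastResort callback are hypothetical names. The last resort is passed in as a callback because on JDK 20 and later Thread.stop() throws UnsupportedOperationException; on the JDKs of the time the callback would simply have been Thread::stop.

```java
import java.util.function.Consumer;

public class ThreadTerminator {

    public enum Outcome { NOT_ALIVE, INTERRUPTED, LAST_RESORT, STILL_ALIVE }

    /**
     * Hypothetical helper: interrupt() the thread, wait up to graceMillis,
     * then invoke lastResort (historically Thread::stop) and wait once more.
     */
    public static Outcome terminate(Thread t, long graceMillis, Consumer<Thread> lastResort) {
        if (!t.isAlive()) {
            return Outcome.NOT_ALIVE;
        }
        t.interrupt();               // graceful step: works for threads blocked in wait/sleep/join
        join(t, graceMillis);
        if (!t.isAlive()) {
            return Outcome.INTERRUPTED;
        }
        lastResort.accept(t);        // e.g. a busy loop that never checks the interrupt flag
        join(t, graceMillis);
        return t.isAlive() ? Outcome.STILL_ALIVE : Outcome.LAST_RESORT;
    }

    private static void join(Thread t, long millis) {
        try {
            t.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Returning an Outcome instead of throwing lets the caller decide what to log; STILL_ALIVE is the case where only the JVM exiting will get rid of the thread.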




> improve build system when tests hang
> ------------------------------------
>
>                 Key: LUCENE-3429
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3429
>             Project: Lucene - Java
>          Issue Type: Test
>            Reporter: Robert Muir
>             Fix For: 3.5, 4.0
>
>         Attachments: LUCENE-3429.patch, LUCENE-3429.patch
>
>
> Currently, if tests hang in hudson, the build can stay hung for days until we
> manually kill it.
> The problem is that when a hang happens it's probably serious. What we want
> to do (I think) is:
> # time out the build.
> # ensure we have enough debugging information to hopefully fix any hang.
> So I think the ideal solution would be:
> # add a sysprop "-D" that LuceneTestCase respects, it could default to no 
> timeout at all (some value like zero).
> # when a timeout is set, LuceneTestCase spawns an additional timer thread for 
> the test class? method?
> # if the timeout is exceeded, LuceneTestCase dumps all thread/stack 
> information, random seed information to hopefully reproduce the hang, and 
> fails the test.
> # nightly builds would pass some reasonable -D for each test.
> Separately, I think we should have an "ant-level" timeout for the whole
> build, in case it goes completely crazy (e.g. jvm completely hangs or 
> something else), just as an additional safety.
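Steps 1 to 3 of the proposal above could look roughly like this. It is a sketch under assumptions, not LuceneTestCase's actual API: the class TestWatchdog, the property name tests.timeoutMillis and the onTimeout callback are all hypothetical. A timeout of zero disables the watchdog (the proposed default), and on expiry the callback receives the random seed plus a dump of every thread's stack, which the test framework could then use to fail the test.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class TestWatchdog implements AutoCloseable {

    private final ScheduledExecutorService timer;
    private final ScheduledFuture<?> pending; // null when the watchdog is disabled

    /** timeoutMillis <= 0 disables the watchdog, mirroring the proposed "no timeout" default. */
    public TestWatchdog(long timeoutMillis, long seed, Consumer<String> onTimeout) {
        timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "test-watchdog");
            t.setDaemon(true); // the watchdog itself must never keep the JVM alive
            return t;
        });
        pending = timeoutMillis <= 0
                ? null
                : timer.schedule(() -> onTimeout.accept(report(seed)), timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Reads the timeout from the hypothetical -Dtests.timeoutMillis=... property (default 0 = off). */
    public static TestWatchdog fromSystemProperty(long seed, Consumer<String> onTimeout) {
        return new TestWatchdog(Long.getLong("tests.timeoutMillis", 0L), seed, onTimeout);
    }

    /** Builds a thread dump prefixed with the random seed needed to reproduce the run. */
    public static String report(long seed) {
        StringBuilder sb = new StringBuilder("Test timed out; random seed=")
                .append(Long.toHexString(seed)).append('\n');
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /** Call when the test finishes: cancels a pending timeout and stops the timer thread. */
    @Override
    public void close() {
        if (pending != null) {
            pending.cancel(false);
        }
        timer.shutdownNow();
    }
}
```

The "ant-level" timeout for the whole build would sit outside this, in the build file, as the extra safety net the issue describes.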

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

