[ 
https://issues.apache.org/jira/browse/DERBY-5643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Knut Anders Hatlen updated DERBY-5643:
--------------------------------------

    Attachment: fail-on-timeout.diff

I've found some problems with the previous changes. I'm not sure if that's 
what's causing the problems in the nightly testing, though.

The replication tests and the compatibility tests now call pingForServerUp() 
and wait until the server is up or a timeout happens, but they don't fail if a 
timeout happens. The attached patch (fail-on-timeout.diff) makes those tests 
check the return value of pingForServerUp() and fail if the server did not come 
up in time.
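
To illustrate the idea (this is not the code in fail-on-timeout.diff), here is 
a minimal sketch of checking the result of a polling helper and failing on 
timeout. The class FailOnTimeoutSketch, the methods waitForServerUp() and 
assertServerUp(), and the BooleanSupplier probe are hypothetical stand-ins; 
the real pingForServerUp() signature may well differ.

```java
import java.util.function.BooleanSupplier;

final class FailOnTimeoutSketch {

    /**
     * Hypothetical stand-in for pingForServerUp(): polls the probe until it
     * succeeds or the timeout expires. Returns true only if the probe
     * succeeded in time.
     */
    static boolean waitForServerUp(BooleanSupplier probe,
                                   long timeoutMillis,
                                   long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (probe.getAsBoolean()) {
                return true;            // server answered the ping
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;           // timed out, let the caller decide
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;           // treat interruption as "not up"
            }
        }
    }

    /** Fails the test instead of silently continuing after a timeout. */
    static void assertServerUp(BooleanSupplier probe,
                               long timeoutMillis,
                               long intervalMillis) {
        if (!waitForServerUp(probe, timeoutMillis, intervalMillis)) {
            throw new AssertionError(
                "Server did not come up within " + timeoutMillis + " ms");
        }
    }
}
```

The point of the sketch is only that the boolean result must not be ignored: 
a caller that discards it proceeds as if the server were up.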

The patch also changes how AutoloadTest.testAutoNetworkServerBoot() verifies 
that the server did not come up. In the existing code, it pings until the 
timeout happens, and then returns successfully. This took 40 seconds before the 
timeout was changed, and now it takes 4 minutes. The timeout is not supposed to 
affect tests under normal operation, so it should not have pinged that long in 
the first place.

The patch makes the test wait for a shorter time (5 seconds) before concluding 
that the server didn't come up. This may cause bugs to go unnoticed on very 
slow machines (if the server comes up when it shouldn't, but it takes more than 
5 seconds), but it will speed up the test considerably and still detect the 
problem on reasonably fast machines.
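
The negative check can be sketched roughly like this, assuming a bounded wait 
that stops as soon as the server responds. ShortNegativeWaitSketch, 
staysDown() and the probe argument are hypothetical names for illustration, 
not what AutoloadTest actually uses.

```java
import java.util.function.BooleanSupplier;

final class ShortNegativeWaitSketch {

    /**
     * Returns true if the probe never succeeds during waitMillis, i.e. the
     * server stayed down for the whole (short) observation window.
     */
    static boolean staysDown(BooleanSupplier probe,
                             long waitMillis,
                             long intervalMillis) {
        long deadline = System.nanoTime() + waitMillis * 1_000_000L;
        while (System.nanoTime() - deadline < 0) {
            if (probe.getAsBoolean()) {
                return false;           // server came up when it should not
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return !probe.getAsBoolean();   // one last probe at the deadline
    }
}
```

A test would call something like staysDown(probe, 5000, 100) and fail if it 
returns false. The window bounds the cost of the passing case, which is why it 
should be independent of the much longer startup timeout.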
                
> Occasional hangs in replication tests on Linux
> ----------------------------------------------
>
>                 Key: DERBY-5643
>                 URL: https://issues.apache.org/jira/browse/DERBY-5643
>             Project: Derby
>          Issue Type: Bug
>          Components: Replication, Test
>    Affects Versions: 10.9.0.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>             Fix For: 10.9.0.0
>
>         Attachments: fail-on-timeout.diff, higher-timeout.diff, 
> thread-dump.txt, waitFor-2.diff, waitFor.diff
>
>
> We occasionally see hangs in the replication tests on Linux. For example 
> here: 
> http://dbtg.foundry.sun.com/derby/test/Daily/jvm1.6/testing/testlog/sles/1298470-suitesAll_diff.txt
> This test run was stuck in tearDown() after 
> ReplicationRun_Local_Derby4910.testSlaveWaitsForMaster(). (Waiting for 
> Thread.join() to return.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
