Have we made all of the changes that we think will help prevent
BindException failures?

Last night's nightly build failed with one again:

:geode-core:flakyTest

com.gemstone.gemfire.security.ClientAuthenticationDUnitTest >
testCredentialsForNotifications FAILED
    com.gemstone.gemfire.test.dunit.RMIException: While invoking
com.gemstone.gemfire.security.ClientAuthenticationTestCase$$Lambda$26/1964608307.call
in VM 0 running on Host asf902.gq1.ygridcore.net with 4 VMs
        at com.gemstone.gemfire.test.dunit.VM.invoke(VM.java:389)
        at com.gemstone.gemfire.test.dunit.VM.invoke(VM.java:355)
        at com.gemstone.gemfire.test.dunit.VM.invoke(VM.java:320)
        at
com.gemstone.gemfire.security.ClientAuthenticationTestCase.doTestCredentialsForNotifications(ClientAuthenticationTestCase.java:456)
        at
com.gemstone.gemfire.security.ClientAuthenticationDUnitTest.testCredentialsForNotifications(ClientAuthenticationDUnitTest.java:82)

        Caused by:
        java.lang.AssertionError: Got unexpected exception when starting
server

            Caused by:
            java.net.BindException: Failed to create server socket on
 null[60,026]

                Caused by:
                java.net.BindException: Address already in use

193 tests completed, 1 failed, 6 skipped

On Thu, Aug 4, 2016 at 10:38 AM, Bruce Schuchardt <[email protected]>
wrote:

> I've pushed the port-range changes that I described in my last email on
> this subject.
>
>
> On 8/1/2016 at 5:33 PM, Kirk Lund wrote:
>
>> I think that the changes mentioned by Jens and Bruce obviate the need to
>> do what I was proposing.
>>
>> -Kirk
>>
>>
>> On Fri, Jul 29, 2016 at 3:41 PM, Bruce Schuchardt <[email protected]> wrote:
>>
>>> I'm making another change that will help.
>>>
>>> One of the problems with these tests is that they choose a random port
>>> for a Cache Server or some other component and only use the port after
>>> opening a cache. Opening the cache first allows the
>>> communications/membership component to grab two ports, which may include
>>> the one the test chose. AvailablePort restricts the ports it hands out to
>>> the range [20000, 30000], so restricting the communications/membership
>>> component to ports outside of that range will help avoid collisions.
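
To illustrate the range separation described above, here is a minimal sketch of a check that a candidate port range stays clear of the [20000, 30000] range AvailablePort hands out. The class and method names are hypothetical and are not part of Geode; only the bounds come from the message above.

```java
// Hypothetical helper, not Geode code: checks a candidate port range
// against the [20000, 30000] range mentioned in the thread.
final class PortRangeCheck {
    static final int TEST_PORT_LOW = 20000;
    static final int TEST_PORT_HIGH = 30000;

    /** Returns true if the inclusive range [low, high] shares any port with the test range. */
    static boolean overlapsTestRange(int low, int high) {
        return low <= TEST_PORT_HIGH && high >= TEST_PORT_LOW;
    }

    public static void main(String[] args) {
        // Example ranges chosen only for illustration.
        System.out.println(overlapsTestRange(41000, 61000));
        System.out.println(overlapsTestRange(1024, 65535));
    }
}
```

A range such as 41000 to 61000 (an illustrative value) does not overlap, while an unrestricted range such as 1024 to 65535 does, which is the collision case the change is meant to avoid.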
>>>
>>>
>>> On 7/29/2016 at 3:23 PM, Nabarun Nag wrote:
>>>
>>>> +1 for the retry.
>>>>
>>>> In my opinion, maintaining lists of available ports may be hard as we
>>>> move towards running test modules in parallel. Also, some non-Geode
>>>> entity may come up and pick up a port, so we would need to constantly
>>>> refresh/update the list before/after each test run. (10000 ports need
>>>> to be checked, as per Geode's getRandomWildcardBindPortNumber.)
>>>>
>>>>
>>>> Also, for the GEODE-1600 fix, DUnitLauncher now passes 0 as the port
>>>> number when creating a locator. The system assigns it an available port
>>>> number while starting the server, rather than the test getting a random
>>>> available port number first and then asking for things to be started on
>>>> that port (where a race condition ensues).
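
The port-0 approach mentioned above can be sketched with plain `java.net.ServerSocket`: binding to port 0 asks the operating system for a free port, and the socket already holds that port when its number is read back, so there is no gap between choosing and binding. This is a generic illustration, not the DUnitLauncher code; the class name is hypothetical.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical example class, not part of Geode.
final class EphemeralPortExample {
    /** Binds to port 0 so the OS picks a free port; the returned socket is already bound. */
    static ServerSocket bindToAnyFreePort() throws IOException {
        return new ServerSocket(0);
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = bindToAnyFreePort()) {
            // The port is known only after binding, so it cannot be taken in between.
            System.out.println("Listening on port " + server.getLocalPort());
        }
    }
}
```

The trade-off is that anything that needs the port number has to be told about it after startup, rather than being configured with it in advance.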
>>>>
>>>> On Fri, Jul 29, 2016 at 2:36 PM William Markito <[email protected]>
>>>> wrote:
>>>>
>>>>> Why not create a JUnit rule that returns available ports and keeps
>>>>> track of ports being used?
>>>>>
>>>>> I cloned this gist from somewhere (I don't remember where now), and
>>>>> I've been planning to send it for discussion...
>>>>>
>>>>> https://gist.github.com/markito/b5be3fc570c7c7c84e6d09e064a6e898
>>>>>
>>>>> Still talking about rules, I've played a bit with the TemporaryFolder
>>>>> rule and that's very useful as well, especially to clean up after test
>>>>> runs and to avoid conflicts.
>>>>>
>>>>> http://junit.org/junit4/javadoc/4.12/org/junit/rules/TemporaryFolder.html
>>>>>
>>>>> Just my 2c
>>>>>
>>>>> On Fri, Jul 29, 2016 at 1:54 PM, Hitesh Khamesra <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Is there any possibility of running multiple tests at the same time on
>>>>>> that machine?
>>>>>>
>>>>>> -Hitesh
>>>>>>
>>>>>>
>>>>>>         From: Kirk Lund <[email protected]>
>>>>>>    To: geode <[email protected]>
>>>>>>    Sent: Friday, July 29, 2016 1:21 PM
>>>>>>    Subject: Flaky tests failing with BindException
>>>>>>
>>>>>> Many of our flaky tests are flaky because they use AvailablePort or
>>>>>> AvailablePortHelper to find randomly available ports. They then later
>>>>>> fail with a BindException because the port is already in use by the
>>>>>> time the test uses it.
>>>>>>
>>>>>> Here's a proposal for a temporary fix:
>>>>>>
>>>>>> The module geode-junit contains a JUnit 4 rule called RetryRule. We
>>>>>> could modify RetryRule to only retry if a BindException (or
>>>>>> configurable exception/s) is detected. This rule would then be dropped
>>>>>> into every test that uses AvailablePort or AvailablePortHelper. Then if
>>>>>> the test fails with a BindException, it would automatically retry (once
>>>>>> or twice or whatever we decide to configure RetryRule with). If the
>>>>>> test fails without any detected BindException, then it would just fail
>>>>>> without retrying.
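
The retry-only-on-BindException idea in the proposal above could look roughly like the following. This is a sketch in plain Java, not the actual RetryRule from geode-junit, and the class and method names are hypothetical. It walks the cause chain because, as in the nightly failure quoted at the top of this thread, the BindException may be wrapped in other exceptions or in an AssertionError.

```java
import java.net.BindException;
import java.util.concurrent.Callable;

// Hypothetical helper, not the geode-junit RetryRule.
final class BindRetry {
    /** Returns true if t or any of its causes is a BindException. */
    static boolean causedByBindException(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof BindException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Runs the action, retrying up to maxRetries times, but only when the
     * failure was caused by a BindException. Any other failure is rethrown
     * immediately.
     */
    static <T> T callWithRetry(Callable<T> action, int maxRetries) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return action.call();
            } catch (Exception | AssertionError e) {
                if (attempt >= maxRetries || !causedByBindException(e)) {
                    throw e;
                }
                // Otherwise fall through and try again.
            }
        }
    }
}
```

In a JUnit 4 rule, the same check would presumably sit inside the Statement that wraps the test body; a retry is only useful if each attempt picks a fresh port.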
>>>>>>
>>>>>> Opinions on this?
>>>>>>
>>>>>> Thanks,
>>>>>> Kirk
>>>>>>
>>>>>
>>>>> --
>>>>> ~/William
>>>>>
>
