There is a 3rd option for #1:

c) apply our custom JUnit 4 RetryRule to any test that uses random ports so
that the test is retried if it fails due to a BindException
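
A minimal sketch of option (c). It uses a plain-Java helper rather than our actual RetryRule, whose API isn't shown here. The hypothetical `BindExceptionRetry.call` reruns a task only when the failure's cause chain contains a `java.net.BindException`, and it rethrows any other failure on the first attempt. It assumes the BindException is still reachable through `getCause()` in the test JVM, which may not hold when it is raised inside another DUnit VM. A JUnit 4 rule would apply the same check around `Statement.evaluate()`.

```java
import java.net.BindException;
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of Geode) sketching a BindException-only retry. */
public final class BindExceptionRetry {

  private BindExceptionRetry() {}

  /** Returns true if t or any of its causes is a java.net.BindException. */
  public static boolean causedByBindException(Throwable t) {
    for (Throwable cur = t; cur != null; cur = cur.getCause()) {
      if (cur instanceof BindException) {
        return true;
      }
    }
    return false;
  }

  /**
   * Calls task up to maxAttempts times. Failures caused by a BindException are
   * retried; any other failure is rethrown immediately so real bugs stay visible.
   */
  public static <T> T call(int maxAttempts, Callable<T> task) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return task.call();
      } catch (Exception e) {
        if (!causedByBindException(e)) {
          throw e; // not a port race: fail right away
        }
        last = e; // port race: try again
      }
    }
    throw last; // every attempt hit a BindException
  }

  public static void main(String[] args) throws Exception {
    int[] attempts = {0};
    String result = call(3, () -> {
      attempts[0]++;
      if (attempts[0] < 3) {
        // Simulate a wrapped bind failure like the one in the Jenkins log.
        throw new RuntimeException(new BindException("Address already in use"));
      }
      return "started";
    });
    System.out.println(result + " after " + attempts[0] + " attempts"); // prints "started after 3 attempts"
  }
}
```

Rethrowing non-bind failures on the first attempt is the important design choice here: a retry that also swallowed assertion failures would just hide flakiness instead of isolating the port race.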

We could modify RetryRule to retry a failed test only if the output from
any of the DUnit VMs contains a specified string such as "BindException";
this would avoid retrying genuine failures.
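
A rough sketch of that filter. It assumes the output of each DUnit VM has already been captured as a string, and how to capture it is left open. The hypothetical `MarkerRetryPolicy.shouldRetry` allows another attempt only while the retry budget lasts and at least one VM's output contains the marker.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch (not Geode's RetryRule): retry only when a marker appears in VM output. */
public final class MarkerRetryPolicy {

  private final String marker;
  private final int maxAttempts;

  public MarkerRetryPolicy(String marker, int maxAttempts) {
    this.marker = marker;
    this.maxAttempts = maxAttempts;
  }

  /**
   * @param attempt   1-based number of the attempt that just failed
   * @param vmOutputs captured output text, one entry per DUnit VM (assumed available)
   * @return true if another attempt should be made
   */
  public boolean shouldRetry(int attempt, List<String> vmOutputs) {
    if (attempt >= maxAttempts) {
      return false; // retry budget used up: report the failure
    }
    for (String output : vmOutputs) {
      if (output != null && output.contains(marker)) {
        return true; // looks like a port race rather than a product bug
      }
    }
    return false; // no marker in any VM: treat it as a real failure
  }

  public static void main(String[] args) {
    MarkerRetryPolicy policy = new MarkerRetryPolicy("BindException", 3);
    List<String> portRace = Arrays.asList(
        "[info] vm0 started",
        "[error] Caused by: java.net.BindException: Address already in use");
    List<String> realBug = Arrays.asList("java.lang.AssertionError: expected 2 but was 1");
    System.out.println(policy.shouldRetry(1, portRace)); // true
    System.out.println(policy.shouldRetry(1, realBug));  // false
    System.out.println(policy.shouldRetry(3, portRace)); // false: budget used up
  }
}
```

A plain substring match is deliberately crude. A stricter marker such as "java.net.BindException: Address already in use" would reduce the chance of retrying a test that merely logs the word.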

-Kirk


On Tue, Jul 19, 2016 at 6:13 PM, Kirk Lund <kl...@pivotal.io> wrote:

> 1) GemfireDataCommandsDUnitTest failed because a TCP port that was
> determined to be available was no longer available when the code finally
> attempted to use it, resulting in a "java.net.BindException: Address
> already in use"
>
> Every test that uses getRandomAvailableTCPPorts is prone to hitting this
> occasionally. We would have to either:
> a) categorize all such tests as Flaky, or
> b) change the Geode APIs (gemfire.properties) so that "0" means the
> product lets the system pick an ephemeral port during bind, instead of
> interpreting it as disabling the service that "0" is specified for
> (this would break backwards compatibility for any user who sets "0" to
> disable anything)
>
>       final int[] ports = AvailablePortHelper.getRandomAvailableTCPPorts(2);
>
>       jmxPort = ports[0];
>       httpPort = ports[1];
>
>       localProps.setProperty(JMX_MANAGER, "true");
>       localProps.setProperty(JMX_MANAGER_START, "true");
>       localProps.setProperty(JMX_MANAGER_BIND_ADDRESS, String.valueOf(jmxHost));
>       localProps.setProperty(JMX_MANAGER_PORT, String.valueOf(jmxPort));
>       localProps.setProperty(HTTP_SERVICE_PORT, String.valueOf(httpPort));
>
>       getSystem(localProps);
>       verifyManagementServiceStarted(getCache());
>
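A small stdlib-only sketch of why this happens and of what option (b) would look like. `probeFreePort` is a simplified, hypothetical stand-in for a probe-then-bind helper, not Geode's actual AvailablePortHelper. It finds a free port and releases it, so another socket can take the port before our real bind. Binding to port 0 instead lets the OS assign an ephemeral port atomically at bind time, and `getLocalPort()` reports which one was chosen.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public final class EphemeralPortDemo {

  private EphemeralPortDemo() {}

  /** Hypothetical probe-then-release helper: the port is only known free at probe time. */
  public static int probeFreePort() throws IOException {
    try (ServerSocket probe = new ServerSocket(0)) {
      return probe.getLocalPort(); // the port is released again when probe closes
    }
  }

  public static void main(String[] args) throws IOException {
    InetAddress loopback = InetAddress.getLoopbackAddress();

    // 1) The race: the port was free when probed...
    int port = probeFreePort();
    // ...but another socket (standing in for another test VM) binds it first.
    try (ServerSocket intruder = new ServerSocket(port, 50, loopback);
         ServerSocket ours = new ServerSocket()) {
      ours.bind(new InetSocketAddress(loopback, port));
      System.out.println("bind succeeded (race not reproduced)");
    } catch (BindException e) {
      System.out.println("race lost on port " + port + ": " + e.getMessage());
    }

    // 2) Option (b): request port 0 so the OS assigns a free ephemeral port
    //    during bind, then read back the port that was actually chosen.
    try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
      System.out.println("bound to ephemeral port " + server.getLocalPort());
    }
  }
}
```

The trade-off with option (b) is that the port is only known after bind, so a test would presumably need to read the actual JMX or HTTP port back from the running service rather than choosing it up front.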
> 2) ConsoleDistributionManagerDUnitTest failed because the
> InternalDistributedSystem connection shut down for an unknown reason during
> the test (just as Bruce already mentioned)
>
> ConsoleDistributionManagerDUnitTest tests the old, deprecated Admin
> API. We could create a new category such as "DeprecatedTest" to temporarily
> exclude such tests from the standard test targets. Ultimately this test and
> the Admin API should be removed, which requires some investment of time.
> Unfortunately, the team currently working on GEODE-17 Integrated Security
> doesn't have any time (yet) to dedicate to removing the Admin API and its
> tests.
>
> I don't think release/1.0.0-incubating.M3 is any less stable than develop.
> I believe these tests could potentially hit the same failures on develop,
> though I really don't know what's happening in
> ConsoleDistributionManagerDUnitTest.
>
> Is the system that's running the build for release/1.0.0-incubating.M3
> different in any way from the infrastructure running the nightly build for
> Geode?
>
> -Kirk
>
>
> On Tue, Jul 19, 2016 at 10:49 AM, Bruce Schuchardt <bschucha...@pivotal.io> wrote:
>
>> The ConsoleDistributionManagerDUnitTest failure isn't a PDX problem. The
>> admin distributed system is being shut down for some reason before the test
>> attempts to send an admin request message. The build artifacts don't contain
>> any useful information about why that is happening.
>>
>> Unfortunately, the test passes repeatedly when run on its own.
>>
>>
>>
>> On 7/17/2016 at 8:56 PM, Apache Jenkins Server wrote:
>>
>>> See <https://builds.apache.org/job/Geode-release/14/>
>>>
>>> ------------------------------------------
>>> [...truncated 697 lines...]
>>>      java.lang.AssertionError: Suspicious strings were written to the log during this run.
>>>      Fix the strings or use IgnoredException.addIgnoredException to ignore.
>>>
>>>  -----------------------------------------------------------------------
>>>      Found suspect string in log4j at line 257
>>>
>>>      [error 2016/07/17 21:46:41.931 UTC <RMI TCP Connection(1)-67.195.81.144> tid=0x1b] Jmx manager could not be started because java.rmi.server.ExportException: Port already in use: 24744; nested exception is:
>>>         java.net.BindException: Failed to create server socket on asf900.gq1.ygridcore.net/67.195.81.144[24,744]
>>>      com.gemstone.gemfire.management.ManagementException: java.rmi.server.ExportException: Port already in use: 24744; nested exception is:
>>>         java.net.BindException: Failed to create server socket on asf900.gq1.ygridcore.net/67.195.81.144[24,744]
>>>         at com.gemstone.gemfire.management.internal.ManagementAgent.startAgent(ManagementAgent.java:147)
>>>         at com.gemstone.gemfire.management.internal.SystemManagementService.startManager(SystemManagementService.java:479)
>>>         at com.gemstone.gemfire.management.internal.beans.ManagementAdapter.handleCacheCreation(ManagementAdapter.java:197)
>>>         at com.gemstone.gemfire.management.internal.beans.ManagementListener.handleEvent(ManagementListener.java:119)
>>>         at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.notifyResourceEventListeners(InternalDistributedSystem.java:2085)
>>>         at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.handleResourceEvent(InternalDistributedSystem.java:478)
>>>         at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1054)
>>>         at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.basicCreate(GemFireCacheImpl.java:707)
>>>         at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.create(GemFireCacheImpl.java:695)
>>>         at com.gemstone.gemfire.cache.CacheFactory.create(CacheFactory.java:181)
>>>         at com.gemstone.gemfire.cache.CacheFactory.create(CacheFactory.java:172)
>>>         at com.gemstone.gemfire.test.dunit.cache.internal.JUnit4CacheTestCase.createCache(JUnit4CacheTestCase.java:120)
>>>         at com.gemstone.gemfire.test.dunit.cache.internal.JUnit4CacheTestCase.getCache(JUnit4CacheTestCase.java:263)
>>>         at com.gemstone.gemfire.test.dunit.cache.internal.JUnit4CacheTestCase.getCache(JUnit4CacheTestCase.java:242)
>>>         at com.gemstone.gemfire.test.dunit.cache.internal.JUnit4CacheTestCase.getCache(JUnit4CacheTestCase.java:234)
>>>         at com.gemstone.gemfire.management.internal.cli.commands.CliCommandTestBase.lambda$setUpJMXManagerOnVM$d0384528$1(CliCommandTestBase.java:154)
>>>         at com.gemstone.gemfire.test.dunit.NamedCallable.call(NamedCallable.java:33)
>>>         at sun.reflect.GeneratedMethodAccessor338.invoke(Unknown Source)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>>         at hydra.MethExecutor.executeObject(MethExecutor.java:268)
>>>         at com.gemstone.gemfire.test.dunit.standalone.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:82)
>>>         at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>>         at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)
>>>         at sun.rmi.transport.Transport$1.run(Transport.java:200)
>>>         at sun.rmi.transport.Transport$1.run(Transport.java:197)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
>>>         at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
>>>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
>>>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>         at java.lang.Thread.run(Thread.java:745)
>>>      Caused by: java.rmi.server.ExportException: Port already in use: 24744; nested exception is:
>>>         java.net.BindException: Failed to create server socket on asf900.gq1.ygridcore.net/67.195.81.144[24,744]
>>>         at sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:341)
>>>         at sun.rmi.transport.tcp.TCPTransport.exportObject(TCPTransport.java:249)
>>>         at sun.rmi.transport.tcp.TCPEndpoint.exportObject(TCPEndpoint.java:411)
>>>         at sun.rmi.transport.LiveRef.exportObject(LiveRef.java:147)
>>>         at sun.rmi.server.UnicastServerRef.exportObject(UnicastServerRef.java:208)
>>>         at sun.rmi.registry.RegistryImpl.setup(RegistryImpl.java:152)
>>>         at sun.rmi.registry.RegistryImpl.<init>(RegistryImpl.java:112)
>>>         at java.rmi.registry.LocateRegistry.createRegistry(LocateRegistry.java:239)
>>>         at com.gemstone.gemfire.management.internal.ManagementAgent.configureAndStart(ManagementAgent.java:389)
>>>         at com.gemstone.gemfire.management.internal.ManagementAgent.startAgent(ManagementAgent.java:145)
>>>         ... 37 more
>>>      Caused by: java.net.BindException: Failed to create server socket on asf900.gq1.ygridcore.net/67.195.81.144[24,744]
>>>         at com.gemstone.gemfire.internal.SocketCreator.createServerSocket(SocketCreator.java:814)
>>>         at com.gemstone.gemfire.internal.SocketCreator.createServerSocket(SocketCreator.java:768)
>>>         at com.gemstone.gemfire.management.internal.ManagementAgent$GemFireRMIServerSocketFactory.createServerSocket(ManagementAgent.java:545)
>>>         at sun.rmi.transport.tcp.TCPEndpoint.newServerSocket(TCPEndpoint.java:666)
>>>         at sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:330)
>>>         ... 46 more
>>>      Caused by: java.net.BindException: Address already in use
>>>         at java.net.PlainSocketImpl.socketBind(Native Method)
>>>         at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
>>>         at java.net.ServerSocket.bind(ServerSocket.java:375)
>>>         at com.gemstone.gemfire.internal.SocketCreator.createServerSocket(SocketCreator.java:811)
>>>         ... 50 more
>>>
>>>
>>>  -----------------------------------------------------------------------
>>>      Found suspect string in log4j at line 326
>>>
>>>      [error 2016/07/17 21:46:41.938 UTC <RMI TCP Connection(1)-67.195.81.144> tid=0x1b]
>>> com.gemstone.gemfire.management.ManagementException: java.rmi.server.ExportException: Port already in use: 24744; nested exception is:
>>>         java.net.BindException: Failed to create server socket on asf900.gq1.ygridcore.net/67.195.81.144[24,744]
>>>
>>> 7378 tests completed, 2 failed, 555 skipped
>>> :geode-core:distributedTest FAILED
>>> :geode-core:flakyTest
>>> :geode-core:integrationTest
>>> :geode-cq:assemble
>>> :geode-cq:compileTestJavaNote: Some input files use or override a deprecated API.
>>> Note: Recompile with -Xlint:deprecation for details.
>>> Note: Some input files use unchecked or unsafe operations.
>>> Note: Recompile with -Xlint:unchecked for details.
>>>
>>> :geode-cq:processTestResources
>>> :geode-cq:testClasses
>>> :geode-cq:checkMissedTests
>>> :geode-cq:test
>>> :geode-cq:check
>>> :geode-cq:build
>>> :geode-cq:distributedTest
>>> :geode-cq:flakyTest
>>> :geode-cq:integrationTest
>>> :geode-json:assemble
>>> :geode-json:compileTestJava UP-TO-DATE
>>> :geode-json:processTestResources UP-TO-DATE
>>> :geode-json:testClasses UP-TO-DATE
>>> :geode-json:checkMissedTests UP-TO-DATE
>>> :geode-json:test UP-TO-DATE
>>> :geode-json:check
>>> :geode-json:build
>>> :geode-json:distributedTest UP-TO-DATE
>>> :geode-json:flakyTest UP-TO-DATE
>>> :geode-json:integrationTest UP-TO-DATE
>>> :geode-junit:javadoc
>>> :geode-junit:javadocJar
>>> :geode-junit:sourcesJar
>>> :geode-junit:signArchives SKIPPED
>>> :geode-junit:assemble
>>> :geode-junit:compileTestJava
>>> :geode-junit:processTestResources UP-TO-DATE
>>> :geode-junit:testClasses
>>> :geode-junit:checkMissedTests
>>> :geode-junit:test
>>> :geode-junit:check
>>> :geode-junit:build
>>> :geode-junit:distributedTest
>>> :geode-junit:flakyTest
>>> :geode-junit:integrationTest
>>> :geode-lucene:assemble
>>> :geode-lucene:compileTestJavaNote: Some input files use or override a deprecated API.
>>> Note: Recompile with -Xlint:deprecation for details.
>>> Note: Some input files use unchecked or unsafe operations.
>>> Note: Recompile with -Xlint:unchecked for details.
>>>
>>> :geode-lucene:processTestResources
>>> :geode-lucene:testClasses
>>> :geode-lucene:checkMissedTests
>>> :geode-lucene:test
>>> :geode-lucene:check
>>> :geode-lucene:build
>>> :geode-lucene:distributedTest
>>> :geode-lucene:flakyTest
>>> :geode-lucene:integrationTest
>>> :geode-pulse:assemble
>>> :geode-pulse:compileTestJavaNote: <https://builds.apache.org/job/Geode-release/ws/geode-pulse/src/test/java/com/vmware/gemfire/tools/pulse/tests/PulseAbstractTest.java> uses or overrides a deprecated API.
>>> Note: Recompile with -Xlint:deprecation for details.
>>> Note: Some input files use unchecked or unsafe operations.
>>> Note: Recompile with -Xlint:unchecked for details.
>>>
>>> :geode-pulse:processTestResources
>>> :geode-pulse:testClasses
>>> :geode-pulse:checkMissedTests
>>> :geode-pulse:test
>>> :geode-pulse:check
>>> :geode-pulse:build
>>> :geode-pulse:distributedTest
>>> :geode-pulse:flakyTest
>>> :geode-pulse:integrationTest
>>> :geode-rebalancer:jar
>>> :geode-rebalancer:javadoc
>>> :geode-rebalancer:javadocJar
>>> :geode-rebalancer:sourcesJar
>>> :geode-rebalancer:signArchives SKIPPED
>>> :geode-rebalancer:assemble
>>> :geode-rebalancer:compileTestJava
>>> :geode-rebalancer:processTestResources UP-TO-DATE
>>> :geode-rebalancer:testClasses
>>> :geode-rebalancer:checkMissedTests
>>> :geode-rebalancer:test
>>> :geode-rebalancer:check
>>> :geode-rebalancer:build
>>> :geode-rebalancer:distributedTest
>>> :geode-rebalancer:flakyTest
>>> :geode-rebalancer:integrationTest
>>> :geode-wan:assemble
>>> :geode-wan:compileTestJavaNote: Some input files use or override a deprecated API.
>>> Note: Recompile with -Xlint:deprecation for details.
>>> Note: Some input files use unchecked or unsafe operations.
>>> Note: Recompile with -Xlint:unchecked for details.
>>>
>>> :geode-wan:processTestResources
>>> :geode-wan:testClasses
>>> :geode-wan:checkMissedTests
>>> :geode-wan:test
>>> :geode-wan:check
>>> :geode-wan:build
>>> :geode-wan:distributedTest
>>> :geode-wan:flakyTest
>>> :geode-wan:integrationTest
>>> :geode-web:assemble
>>> :geode-web:compileTestJavaNote: <https://builds.apache.org/job/Geode-release/ws/geode-web/src/test/java/com/gemstone/gemfire/management/internal/cli/commands/DataCommandsOverHttpDistributedTest.java> uses or overrides a deprecated API.
>>> Note: Recompile with -Xlint:deprecation for details.
>>> Note: Some input files use unchecked or unsafe operations.
>>> Note: Recompile with -Xlint:unchecked for details.
>>>
>>> :geode-web:processTestResources UP-TO-DATE
>>> :geode-web:testClasses
>>> :geode-web:checkMissedTests
>>> :geode-web:test
>>> :geode-web:check
>>> :geode-web:build
>>> :geode-web:distributedTest
>>> :geode-web:flakyTest
>>> :geode-web:integrationTest
>>> :geode-web-api:assemble
>>> :geode-web-api:compileTestJava UP-TO-DATE
>>> :geode-web-api:processTestResources UP-TO-DATE
>>> :geode-web-api:testClasses UP-TO-DATE
>>> :geode-web-api:checkMissedTests UP-TO-DATE
>>> :geode-web-api:test UP-TO-DATE
>>> :geode-web-api:check
>>> :geode-web-api:build
>>> :geode-web-api:distributedTest UP-TO-DATE
>>> :geode-web-api:flakyTest UP-TO-DATE
>>> :geode-web-api:integrationTest UP-TO-DATE
>>> :combineReports
>>> All test reports at <https://builds.apache.org/job/Geode-release/ws/build/reports/combined>
>>> :extensions/geode-modules:precheckin
>>> :extensions/geode-modules-assembly:precheckin
>>> :extensions/geode-modules-hibernate:precheckin
>>> :extensions/geode-modules-session:precheckin
>>> :extensions/geode-modules-session-internal:precheckin
>>> :extensions/geode-modules-tomcat7:precheckin
>>> :geode-assembly:precheckin
>>> :geode-common:precheckin
>>> :geode-cq:precheckin
>>> :geode-json:precheckin
>>> :geode-junit:precheckin
>>> :geode-lucene:precheckin
>>> :geode-pulse:precheckin
>>> :geode-rebalancer:precheckin
>>> :geode-wan:precheckin
>>> :geode-web:precheckin
>>> :geode-web-api:precheckin
>>>
>>> FAILURE: Build failed with an exception.
>>>
>>> * What went wrong:
>>> Execution failed for task ':geode-core:distributedTest'.
>>>
>>>> There were failing tests. See the report at: file://<https://builds.apache.org/job/Geode-release/ws/geode-core/build/reports/distributedTest/index.html>
>>>>
>>> * Try:
>>> Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.
>>>
>>> BUILD FAILED
>>>
>>> Total time: 11 hrs 26 mins 8.356 secs
>>> Build step 'Invoke Gradle script' changed build result to FAILURE
>>> Build step 'Invoke Gradle script' marked build as failure
>>> Archiving artifacts
>>> Compressed 250.96 MB of artifacts by 13.1% relative to #7
>>> Recording test results
>>>
>>
>>
>
