There is a third option for #1: c) apply our custom JUnit 4 RetryRule to any test that uses random ports, so that the test is reattempted if it fails due to a BindException.
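
A rough sketch of the retry idea in plain Java, assuming nothing about how our RetryRule is actually implemented. The names BindExceptionRetry, causedByBindException and callWithRetry are hypothetical and exist only for illustration. The body is re-run only when the failure's cause chain contains a java.net.BindException; anything else is rethrown immediately. It only inspects the thrown exception, not the log output of the DUnit VMs.

```java
import java.net.BindException;
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration; this is not Geode's RetryRule. */
final class BindExceptionRetry {

  private BindExceptionRetry() {}

  /** Returns true if t, or any throwable in its cause chain, is a BindException. */
  static boolean causedByBindException(Throwable t) {
    for (Throwable cur = t; cur != null; cur = cur.getCause()) {
      if (cur instanceof BindException) {
        return true;
      }
    }
    return false;
  }

  /**
   * Runs body up to maxAttempts times. A failure is retried only if it was
   * caused by a BindException; every other failure is rethrown at once so
   * that real failures are not hidden by the retry.
   */
  static <T> T callWithRetry(int maxAttempts, Callable<T> body) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    for (int attempt = 1; ; attempt++) {
      try {
        return body.call();
      } catch (Exception e) {
        // Give up when attempts are exhausted or the failure is unrelated to binding.
        if (attempt >= maxAttempts || !causedByBindException(e)) {
          throw e;
        }
      }
    }
  }
}
```

In a JUnit 4 rule the same check would presumably wrap Statement.evaluate(); that wiring is left out here to keep the sketch free of test-framework dependencies.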
We could modify RetryRule to only reattempt a failed test if the output from any of the DUnit VMs contained some specified string such as "BindException" -- this would avoid reattempts for real failures.

-Kirk

On Tue, Jul 19, 2016 at 6:13 PM, Kirk Lund <kl...@pivotal.io> wrote:

> 1) GemfireDataCommandsDUnitTest failed because a TCP port that was
> determined to be available was no longer available when the code finally
> attempted to use it, resulting in a "java.net.BindException: Address
> already in use"
>
> Every test that uses getRandomAvailableTCPPorts is prone to hit this
> occasionally. We would have to either:
> a) categorize all such tests as Flaky or
> b) change the Geode APIs (gemfire.properties) to use "0" to mean that the
> product will let the system pick up an ephemeral port during bind instead
> of interpreting it as disabling the service that "0" is being specified for
> (this would break backwards compatibility for any User using "0" to disable
> anything)
>
>     final int[] ports = AvailablePortHelper.getRandomAvailableTCPPorts(2);
>
>     jmxPort = ports[0];
>     httpPort = ports[1];
>
>     localProps.setProperty(JMX_MANAGER, "true");
>     localProps.setProperty(JMX_MANAGER_START, "true");
>     localProps.setProperty(JMX_MANAGER_BIND_ADDRESS, String.valueOf(jmxHost));
>     localProps.setProperty(JMX_MANAGER_PORT, String.valueOf(jmxPort));
>     localProps.setProperty(HTTP_SERVICE_PORT, String.valueOf(httpPort));
>
>     getSystem(localProps);
>     verifyManagementServiceStarted(getCache());
>
> 2) ConsoleDistributionManagerDUnitTest failed because the
> InternalDistributedSystem connection shut down for an unknown reason during
> the test (just as Bruce already mentioned)
>
> ConsoleDistributionManagerDUnitTest is testing the old deprecated Admin
> API. We could create a new category such as "DeprecatedTest" to temporarily
> exclude such tests from the standard test targets. Ultimately this test and
> the Admin API should be removed which requires some investment of time.
> Unfortunately, the team currently working on GEODE-17 Integrated Security
> doesn't have any time (yet) to dedicate to removing the Admin API and its
> tests.
>
> I don't think release/1.0.0-incubating.M3 is any less stable than develop.
> I believe these tests could potentially hit the same failures on develop,
> though I really don't know what's happening in
> ConsoleDistributionManagerDUnitTest.
>
> Is the system that's running the build for release/1.0.0-incubating.M3
> different in any way from the infrastructure running the nightly build for
> Geode?
>
> -Kirk
>
>
> On Tue, Jul 19, 2016 at 10:49 AM, Bruce Schuchardt <bschucha...@pivotal.io> wrote:
>
>> The ConsoleDistributionManagerDUnitTest failure isn't a PDX problem. The
>> admin distributed system is being shut down for some reason before the test
>> attempts to send an admin request message. Build-artifacts don't have any
>> useful information on why that is happening.
>>
>> Unfortunately the test passes repeatedly when run on its own.
>>
>>
>>
>> On 7/17/2016 at 8:56 PM, Apache Jenkins Server wrote:
>>
>>> See <https://builds.apache.org/job/Geode-release/14/>
>>>
>>> ------------------------------------------
>>> [...truncated 697 lines...]
>>> java.lang.AssertionError: Suspicious strings were written to the log during this run.
>>> Fix the strings or use IgnoredException.addIgnoredException to ignore.
>>>
>>> -----------------------------------------------------------------------
>>> Found suspect string in log4j at line 257
>>>
>>> [error 2016/07/17 21:46:41.931 UTC <RMI TCP Connection(1)-67.195.81.144> tid=0x1b] Jmx manager could not be started because java.rmi.server.ExportException: Port already in use: 24744; nested exception is:
>>>     java.net.BindException: Failed to create server socket on asf900.gq1.ygridcore.net/67.195.81.144[24,744]
>>> com.gemstone.gemfire.management.ManagementException: java.rmi.server.ExportException: Port already in use: 24744; nested exception is:
>>>     java.net.BindException: Failed to create server socket on asf900.gq1.ygridcore.net/67.195.81.144[24,744]
>>>     at com.gemstone.gemfire.management.internal.ManagementAgent.startAgent(ManagementAgent.java:147)
>>>     at com.gemstone.gemfire.management.internal.SystemManagementService.startManager(SystemManagementService.java:479)
>>>     at com.gemstone.gemfire.management.internal.beans.ManagementAdapter.handleCacheCreation(ManagementAdapter.java:197)
>>>     at com.gemstone.gemfire.management.internal.beans.ManagementListener.handleEvent(ManagementListener.java:119)
>>>     at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.notifyResourceEventListeners(InternalDistributedSystem.java:2085)
>>>     at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.handleResourceEvent(InternalDistributedSystem.java:478)
>>>     at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1054)
>>>     at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.basicCreate(GemFireCacheImpl.java:707)
>>>     at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.create(GemFireCacheImpl.java:695)
>>>     at com.gemstone.gemfire.cache.CacheFactory.create(CacheFactory.java:181)
>>>     at com.gemstone.gemfire.cache.CacheFactory.create(CacheFactory.java:172)
>>>     at com.gemstone.gemfire.test.dunit.cache.internal.JUnit4CacheTestCase.createCache(JUnit4CacheTestCase.java:120)
>>>     at com.gemstone.gemfire.test.dunit.cache.internal.JUnit4CacheTestCase.getCache(JUnit4CacheTestCase.java:263)
>>>     at com.gemstone.gemfire.test.dunit.cache.internal.JUnit4CacheTestCase.getCache(JUnit4CacheTestCase.java:242)
>>>     at com.gemstone.gemfire.test.dunit.cache.internal.JUnit4CacheTestCase.getCache(JUnit4CacheTestCase.java:234)
>>>     at com.gemstone.gemfire.management.internal.cli.commands.CliCommandTestBase.lambda$setUpJMXManagerOnVM$d0384528$1(CliCommandTestBase.java:154)
>>>     at com.gemstone.gemfire.test.dunit.NamedCallable.call(NamedCallable.java:33)
>>>     at sun.reflect.GeneratedMethodAccessor338.invoke(Unknown Source)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>>     at hydra.MethExecutor.executeObject(MethExecutor.java:268)
>>>     at com.gemstone.gemfire.test.dunit.standalone.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:82)
>>>     at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>>     at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)
>>>     at sun.rmi.transport.Transport$1.run(Transport.java:200)
>>>     at sun.rmi.transport.Transport$1.run(Transport.java:197)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
>>>     at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
>>>     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
>>>     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.rmi.server.ExportException: Port already in use: 24744; nested exception is:
>>>     java.net.BindException: Failed to create server socket on asf900.gq1.ygridcore.net/67.195.81.144[24,744]
>>>     at sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:341)
>>>     at sun.rmi.transport.tcp.TCPTransport.exportObject(TCPTransport.java:249)
>>>     at sun.rmi.transport.tcp.TCPEndpoint.exportObject(TCPEndpoint.java:411)
>>>     at sun.rmi.transport.LiveRef.exportObject(LiveRef.java:147)
>>>     at sun.rmi.server.UnicastServerRef.exportObject(UnicastServerRef.java:208)
>>>     at sun.rmi.registry.RegistryImpl.setup(RegistryImpl.java:152)
>>>     at sun.rmi.registry.RegistryImpl.<init>(RegistryImpl.java:112)
>>>     at java.rmi.registry.LocateRegistry.createRegistry(LocateRegistry.java:239)
>>>     at com.gemstone.gemfire.management.internal.ManagementAgent.configureAndStart(ManagementAgent.java:389)
>>>     at com.gemstone.gemfire.management.internal.ManagementAgent.startAgent(ManagementAgent.java:145)
>>>     ... 37 more
>>> Caused by: java.net.BindException: Failed to create server socket on asf900.gq1.ygridcore.net/67.195.81.144[24,744]
>>>     at com.gemstone.gemfire.internal.SocketCreator.createServerSocket(SocketCreator.java:814)
>>>     at com.gemstone.gemfire.internal.SocketCreator.createServerSocket(SocketCreator.java:768)
>>>     at com.gemstone.gemfire.management.internal.ManagementAgent$GemFireRMIServerSocketFactory.createServerSocket(ManagementAgent.java:545)
>>>     at sun.rmi.transport.tcp.TCPEndpoint.newServerSocket(TCPEndpoint.java:666)
>>>     at sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:330)
>>>     ... 46 more
>>> Caused by: java.net.BindException: Address already in use
>>>     at java.net.PlainSocketImpl.socketBind(Native Method)
>>>     at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
>>>     at java.net.ServerSocket.bind(ServerSocket.java:375)
>>>     at com.gemstone.gemfire.internal.SocketCreator.createServerSocket(SocketCreator.java:811)
>>>     ... 50 more
>>>
>>>
>>> -----------------------------------------------------------------------
>>> Found suspect string in log4j at line 326
>>>
>>> [error 2016/07/17 21:46:41.938 UTC <RMI TCP Connection(1)-67.195.81.144> tid=0x1b] com.gemstone.gemfire.management.ManagementException: java.rmi.server.ExportException: Port already in use: 24744; nested exception is:
>>>     java.net.BindException: Failed to create server socket on asf900.gq1.ygridcore.net/67.195.81.144[24,744]
>>>
>>> 7378 tests completed, 2 failed, 555 skipped
>>> :geode-core:distributedTest FAILED
>>> :geode-core:flakyTest
>>> :geode-core:integrationTest
>>> :geode-cq:assemble
>>> :geode-cq:compileTestJavaNote: Some input files use or override a deprecated API.
>>> Note: Recompile with -Xlint:deprecation for details.
>>> Note: Some input files use unchecked or unsafe operations.
>>> Note: Recompile with -Xlint:unchecked for details.
>>>
>>> :geode-cq:processTestResources
>>> :geode-cq:testClasses
>>> :geode-cq:checkMissedTests
>>> :geode-cq:test
>>> :geode-cq:check
>>> :geode-cq:build
>>> :geode-cq:distributedTest
>>> :geode-cq:flakyTest
>>> :geode-cq:integrationTest
>>> :geode-json:assemble
>>> :geode-json:compileTestJava UP-TO-DATE
>>> :geode-json:processTestResources UP-TO-DATE
>>> :geode-json:testClasses UP-TO-DATE
>>> :geode-json:checkMissedTests UP-TO-DATE
>>> :geode-json:test UP-TO-DATE
>>> :geode-json:check
>>> :geode-json:build
>>> :geode-json:distributedTest UP-TO-DATE
>>> :geode-json:flakyTest UP-TO-DATE
>>> :geode-json:integrationTest UP-TO-DATE
>>> :geode-junit:javadoc
>>> :geode-junit:javadocJar
>>> :geode-junit:sourcesJar
>>> :geode-junit:signArchives SKIPPED
>>> :geode-junit:assemble
>>> :geode-junit:compileTestJava
>>> :geode-junit:processTestResources UP-TO-DATE
>>> :geode-junit:testClasses
>>> :geode-junit:checkMissedTests
>>> :geode-junit:test
>>> :geode-junit:check
>>> :geode-junit:build
>>> :geode-junit:distributedTest
>>> :geode-junit:flakyTest
>>> :geode-junit:integrationTest
>>> :geode-lucene:assemble
>>> :geode-lucene:compileTestJavaNote: Some input files use or override a deprecated API.
>>> Note: Recompile with -Xlint:deprecation for details.
>>> Note: Some input files use unchecked or unsafe operations.
>>> Note: Recompile with -Xlint:unchecked for details.
>>>
>>> :geode-lucene:processTestResources
>>> :geode-lucene:testClasses
>>> :geode-lucene:checkMissedTests
>>> :geode-lucene:test
>>> :geode-lucene:check
>>> :geode-lucene:build
>>> :geode-lucene:distributedTest
>>> :geode-lucene:flakyTest
>>> :geode-lucene:integrationTest
>>> :geode-pulse:assemble
>>> :geode-pulse:compileTestJavaNote: <https://builds.apache.org/job/Geode-release/ws/geode-pulse/src/test/java/com/vmware/gemfire/tools/pulse/tests/PulseAbstractTest.java> uses or overrides a deprecated API.
>>> Note: Recompile with -Xlint:deprecation for details.
>>> Note: Some input files use unchecked or unsafe operations.
>>> Note: Recompile with -Xlint:unchecked for details.
>>>
>>> :geode-pulse:processTestResources
>>> :geode-pulse:testClasses
>>> :geode-pulse:checkMissedTests
>>> :geode-pulse:test
>>> :geode-pulse:check
>>> :geode-pulse:build
>>> :geode-pulse:distributedTest
>>> :geode-pulse:flakyTest
>>> :geode-pulse:integrationTest
>>> :geode-rebalancer:jar
>>> :geode-rebalancer:javadoc
>>> :geode-rebalancer:javadocJar
>>> :geode-rebalancer:sourcesJar
>>> :geode-rebalancer:signArchives SKIPPED
>>> :geode-rebalancer:assemble
>>> :geode-rebalancer:compileTestJava
>>> :geode-rebalancer:processTestResources UP-TO-DATE
>>> :geode-rebalancer:testClasses
>>> :geode-rebalancer:checkMissedTests
>>> :geode-rebalancer:test
>>> :geode-rebalancer:check
>>> :geode-rebalancer:build
>>> :geode-rebalancer:distributedTest
>>> :geode-rebalancer:flakyTest
>>> :geode-rebalancer:integrationTest
>>> :geode-wan:assemble
>>> :geode-wan:compileTestJavaNote: Some input files use or override a deprecated API.
>>> Note: Recompile with -Xlint:deprecation for details.
>>> Note: Some input files use unchecked or unsafe operations.
>>> Note: Recompile with -Xlint:unchecked for details.
>>>
>>> :geode-wan:processTestResources
>>> :geode-wan:testClasses
>>> :geode-wan:checkMissedTests
>>> :geode-wan:test
>>> :geode-wan:check
>>> :geode-wan:build
>>> :geode-wan:distributedTest
>>> :geode-wan:flakyTest
>>> :geode-wan:integrationTest
>>> :geode-web:assemble
>>> :geode-web:compileTestJavaNote: <https://builds.apache.org/job/Geode-release/ws/geode-web/src/test/java/com/gemstone/gemfire/management/internal/cli/commands/DataCommandsOverHttpDistributedTest.java> uses or overrides a deprecated API.
>>> Note: Recompile with -Xlint:deprecation for details.
>>> Note: Some input files use unchecked or unsafe operations.
>>> Note: Recompile with -Xlint:unchecked for details.
>>>
>>> :geode-web:processTestResources UP-TO-DATE
>>> :geode-web:testClasses
>>> :geode-web:checkMissedTests
>>> :geode-web:test
>>> :geode-web:check
>>> :geode-web:build
>>> :geode-web:distributedTest
>>> :geode-web:flakyTest
>>> :geode-web:integrationTest
>>> :geode-web-api:assemble
>>> :geode-web-api:compileTestJava UP-TO-DATE
>>> :geode-web-api:processTestResources UP-TO-DATE
>>> :geode-web-api:testClasses UP-TO-DATE
>>> :geode-web-api:checkMissedTests UP-TO-DATE
>>> :geode-web-api:test UP-TO-DATE
>>> :geode-web-api:check
>>> :geode-web-api:build
>>> :geode-web-api:distributedTest UP-TO-DATE
>>> :geode-web-api:flakyTest UP-TO-DATE
>>> :geode-web-api:integrationTest UP-TO-DATE
>>> :combineReports
>>> All test reports at <https://builds.apache.org/job/Geode-release/ws/build/reports/combined>
>>> :extensions/geode-modules:precheckin
>>> :extensions/geode-modules-assembly:precheckin
>>> :extensions/geode-modules-hibernate:precheckin
>>> :extensions/geode-modules-session:precheckin
>>> :extensions/geode-modules-session-internal:precheckin
>>> :extensions/geode-modules-tomcat7:precheckin
>>> :geode-assembly:precheckin
>>> :geode-common:precheckin
>>> :geode-cq:precheckin
>>> :geode-json:precheckin
>>> :geode-junit:precheckin
>>> :geode-lucene:precheckin
>>> :geode-pulse:precheckin
>>> :geode-rebalancer:precheckin
>>> :geode-wan:precheckin
>>> :geode-web:precheckin
>>> :geode-web-api:precheckin
>>>
>>> FAILURE: Build failed with an exception.
>>>
>>> * What went wrong:
>>> Execution failed for task ':geode-core:distributedTest'.
>>> > There were failing tests. See the report at: file://<https://builds.apache.org/job/Geode-release/ws/geode-core/build/reports/distributedTest/index.html>
>>>
>>> * Try:
>>> Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.
>>>
>>> BUILD FAILED
>>>
>>> Total time: 11 hrs 26 mins 8.356 secs
>>> Build step 'Invoke Gradle script' changed build result to FAILURE
>>> Build step 'Invoke Gradle script' marked build as failure
>>> Archiving artifacts
>>> Compressed 250.96 MB of artifacts by 13.1% relative to #7
>>> Recording test results
>>>
>>
>>
>