[jira] [Commented] (GEODE-3276) CI failure: org.apache.geode.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > testParallelPropagationWithRemoteRegionDestroy FAILED
[ https://issues.apache.org/jira/browse/GEODE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213551#comment-16213551 ] ASF subversion and git services commented on GEODE-3276: Commit 65f6c6c5515d2356e00fb736f205bdd024e7ea59 in geode's branch refs/heads/feature/GEODE-3245 from [~nnag] [ https://gitbox.apache.org/repos/asf?p=geode.git;h=65f6c6c ] GEODE-1772: Removal of flaky tag. * The exception occured when in the tear down. * The test is very simple and there are a lot of other similar tests which are not flaky. * This test was a part of improvements made to the WAN test base. * Test tear down involving senders was also fixed as a part of GEODE-3276 > CI failure: > org.apache.geode.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > > testParallelPropagationWithRemoteRegionDestroy FAILED > -- > > Key: GEODE-3276 > URL: https://issues.apache.org/jira/browse/GEODE-3276 > Project: Geode > Issue Type: Bug > Components: statistics, wan >Affects Versions: 1.2.0 >Reporter: Shelley Lynn Hughes-Godfrey >Assignee: nabarun > Fix For: 1.3.0 > > > {noformat} > org.apache.geode.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > > testParallelPropagationWithRemoteRegionDestroy FAILED > java.lang.AssertionError: An exception occurred during asynchronous > invocation. > Caused by: > java.lang.OutOfMemoryError: Java heap space > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (GEODE-3276) CI failure: org.apache.geode.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > testParallelPropagationWithRemoteRegionDestroy FAILED
[ https://issues.apache.org/jira/browse/GEODE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213005#comment-16213005 ] ASF subversion and git services commented on GEODE-3276: Commit 65f6c6c5515d2356e00fb736f205bdd024e7ea59 in geode's branch refs/heads/develop from [~nnag] [ https://gitbox.apache.org/repos/asf?p=geode.git;h=65f6c6c ] GEODE-1772: Removal of flaky tag. * The exception occured when in the tear down. * The test is very simple and there are a lot of other similar tests which are not flaky. * This test was a part of improvements made to the WAN test base. * Test tear down involving senders was also fixed as a part of GEODE-3276 > CI failure: > org.apache.geode.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > > testParallelPropagationWithRemoteRegionDestroy FAILED > -- > > Key: GEODE-3276 > URL: https://issues.apache.org/jira/browse/GEODE-3276 > Project: Geode > Issue Type: Bug > Components: statistics, wan >Affects Versions: 1.2.0 >Reporter: Shelley Lynn Hughes-Godfrey >Assignee: nabarun > Fix For: 1.3.0 > > > {noformat} > org.apache.geode.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > > testParallelPropagationWithRemoteRegionDestroy FAILED > java.lang.AssertionError: An exception occurred during asynchronous > invocation. > Caused by: > java.lang.OutOfMemoryError: Java heap space > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (GEODE-3276) CI failure: org.apache.geode.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > testParallelPropagationWithRemoteRegionDestroy FAILED
[ https://issues.apache.org/jira/browse/GEODE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139439#comment-16139439 ] ASF GitHub Bot commented on GEODE-3276: --- Github user asfgit closed the pull request at: https://github.com/apache/geode/pull/732 > CI failure: > org.apache.geode.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > > testParallelPropagationWithRemoteRegionDestroy FAILED > -- > > Key: GEODE-3276 > URL: https://issues.apache.org/jira/browse/GEODE-3276 > Project: Geode > Issue Type: Bug > Components: statistics, wan >Affects Versions: 1.2.0 >Reporter: Shelley Lynn Hughes-Godfrey >Assignee: nabarun > > {noformat} > org.apache.geode.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > > testParallelPropagationWithRemoteRegionDestroy FAILED > java.lang.AssertionError: An exception occurred during asynchronous > invocation. > Caused by: > java.lang.OutOfMemoryError: Java heap space > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (GEODE-3276) CI failure: org.apache.geode.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > testParallelPropagationWithRemoteRegionDestroy FAILED
[ https://issues.apache.org/jira/browse/GEODE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139438#comment-16139438 ] ASF subversion and git services commented on GEODE-3276: Commit 0daf9549a69a135ad1ab267be8e0749e00986dfa in geode's branch refs/heads/develop from [~nnag] [ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=0daf954 ] GEODE-3276: Managing race conditions while the senders are stopped * When a connection is initialized, a readAckThread may be alive from a previous incarnation. * This AckThread will be stuck on a read socket with no timeout as nothing was dispatched. * Also while it was stuck on the read, it will hold a connection lifecycle read lock * The initialize connection needs a connection life cycle write lock to start the connection but the read lock is held by the ack thread. * This results in a deadlock and eventually a hang. * Another situation is that we set the flag isStopped for the event processor before actually shutting down the diapatcher and ack thread. * So after the flag is set and before actually shutting down the dispatcher and ackThread, a gateway proxy stomper thread gets in between these two steps of execution. * The stomper thread checks the isStopped flag, which was set to true, and proceeds to destroy the connection pool. However the dispatcher and ackThread were still running. * This results in a out of heap memory exception while the ack thread is reading from the socket while connection pool was destroyed. * To solve this issue, the stomper thread checks if the event processor and dispatcher exists, if true then we close the input streams before destroying the connection pool. This closes #732 > CI failure: > org.apache.geode.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > > testParallelPropagationWithRemoteRegionDestroy FAILED > -- > > Key: GEODE-3276 > URL: https://issues.apache.org/jira/browse/GEODE-3276 > Project: Geode > Issue Type: Bug > Components: statistics, wan >Affects Versions: 1.2.0 >Reporter: Shelley Lynn Hughes-Godfrey >Assignee: nabarun > > {noformat} > org.apache.geode.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > > testParallelPropagationWithRemoteRegionDestroy FAILED > java.lang.AssertionError: An exception occurred during asynchronous > invocation. > Caused by: > java.lang.OutOfMemoryError: Java heap space > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (GEODE-3276) CI failure: org.apache.geode.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > testParallelPropagationWithRemoteRegionDestroy FAILED
[ https://issues.apache.org/jira/browse/GEODE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137573#comment-16137573 ] ASF GitHub Bot commented on GEODE-3276: --- Github user jhuynh1 commented on a diff in the pull request: https://github.com/apache/geode/pull/732#discussion_r134622619 --- Diff: geode-wan/src/main/java/org/apache/geode/internal/cache/wan/parallel/ParallelGatewaySenderImpl.java --- @@ -107,6 +107,9 @@ public void stop() { if (ev != null && !ev.isStopped()) { ev.stopProcessing(); } + if (ev != null && ev.getDispatcher() != null) { --- End diff -- Is it possible to pull this check and shutdown into a method? Looks like it's used a few times throughout the code > CI failure: > org.apache.geode.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > > testParallelPropagationWithRemoteRegionDestroy FAILED > -- > > Key: GEODE-3276 > URL: https://issues.apache.org/jira/browse/GEODE-3276 > Project: Geode > Issue Type: Bug > Components: statistics, wan >Affects Versions: 1.2.0 >Reporter: Shelley Lynn Hughes-Godfrey >Assignee: nabarun > > {noformat} > org.apache.geode.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > > testParallelPropagationWithRemoteRegionDestroy FAILED > java.lang.AssertionError: An exception occurred during asynchronous > invocation. > Caused by: > java.lang.OutOfMemoryError: Java heap space > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (GEODE-3276) CI failure: org.apache.geode.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > testParallelPropagationWithRemoteRegionDestroy FAILED
[ https://issues.apache.org/jira/browse/GEODE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137473#comment-16137473 ] ASF GitHub Bot commented on GEODE-3276: --- GitHub user nabarunnag opened a pull request: https://github.com/apache/geode/pull/732 GEODE-3276: Managing race conditions while the senders are stopped * When a connection is initialized, a readAckThread may be alive from a previous incarnation. * This AckThread will be stuck on a read socket with no timeout as nothing was dispatched. * Also while it was stuck on the read, it will hold a connection lifecycle read lock * The initialize connection needs a connection life cycle write lock to start the connection but the read lock is held by the ack thread. * This results in a deadlock and eventually a hang. * Another situation is that we set the flag isStopped for the event processor before actually shutting down the diapatcher and ack thread. * So after the flag is set and before actually shutting down the dispatcher and ackThread, a gateway proxy stomper thread gets in between these two steps of execution. * The stomper thread checks the isStopped flag, which was set to true, and proceeds to destroy the connection pool. However the dispatcher and ackThread were still running. * This results in a out of heap memory exception while the ack thread is reading from the socket while connection pool was destroyed. * To solve this issue, the stomper thread checks if the event processor and dispatcher exists, if true then we close the input streams before destroying the connection pool. Thank you for submitting a contribution to Apache Geode. In order to streamline the review of the contribution we ask you to ensure the following steps have been taken: ### For all changes: - [ ] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message? - [ ] Has your PR been rebased against the latest commit within the target branch (typically `develop`)? - [ ] Is your initial contribution a single, squashed commit? - [ ] Does `gradlew build` run cleanly? - [ ] Have you written or updated unit tests to verify your changes? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? ### Note: Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible. If you need help, please send an email to d...@geode.apache.org. You can merge this pull request into a Git repository by running: $ git pull https://github.com/nabarunnag/incubator-geode feature/GEODE-3276 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/geode/pull/732.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #732 commit 9618f69d2620f5a3d3a6a8576631906a61b512f9 Author: nabarun Date: 2017-08-17T17:51:15Z GEODE-3276: Managing race conditions while the senders are stopped * When a connection is initialized, a readAckThread may be alive from a previous incarnation. * This AckThread will be stuck on a read socket with no timeout as nothing was dispatched. * Also while it was stuck on the read, it will hold a connection lifecycle read lock * The initialize connection needs a connection life cycle write lock to start the connection but the read lock is held by the ack thread. * This results in a deadlock and eventually a hang. * Another situation is that we set the flag isStopped for the event processor before actually shutting down the diapatcher and ack thread. * So after the flag is set and before actually shutting down the dispatcher and ackThread, a gateway proxy stomper thread gets in between these two steps of execution. * The stomper thread checks the isStopped flag, which was set to true, and proceeds to destroy the connection pool. However the dispatcher and ackThread were still running. * This results in a out of heap memory exception while the ack thread is reading from the socket while connection pool was destroyed. * To solve this issue, the stomper thread checks if the event processor and dispatcher exists, if true then we close the input streams before destroying the connection pool. > CI failure: > org.apache.geode.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > > testParallelPropagationWithRemoteRegionDestroy FAILED > --
[jira] [Commented] (GEODE-3276) CI failure: org.apache.geode.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > testParallelPropagationWithRemoteRegionDestroy FAILED
[ https://issues.apache.org/jira/browse/GEODE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16109471#comment-16109471 ] Shelley Lynn Hughes-Godfrey commented on GEODE-3276: More logging from threads involved: {noformat} [vm4] [warn 2017/07/20 07:15:08.889 UTC tid=0x31d] Error closing connection to server 33373ac9e461:23276 [vm4] java.io.IOException: Invalid message type 23,362 while reading header [vm4] at org.apache.geode.internal.cache.tier.sockets.Message.readHeaderAndPayload(Message.java:714) [vm4] at org.apache.geode.internal.cache.tier.sockets.Message.read(Message.java:652) [vm4] at org.apache.geode.internal.cache.tier.sockets.Message.recv(Message.java:1110) [vm4] at org.apache.geode.cache.client.internal.AbstractOp.attemptReadResponse(AbstractOp.java:166) [vm4] at org.apache.geode.cache.client.internal.AbstractOp.attempt(AbstractOp.java:382) [vm4] at org.apache.geode.cache.client.internal.ConnectionImpl.execute(ConnectionImpl.java:275) [vm4] at org.apache.geode.cache.client.internal.CloseConnectionOp.execute(CloseConnectionOp.java:37) [vm4] at org.apache.geode.cache.client.internal.ConnectionImpl.close(ConnectionImpl.java:152) [vm4] at org.apache.geode.cache.client.internal.pooling.PooledConnection.internalClose(PooledConnection.java:98) [vm4] at org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl$ConnectionMap.close(ConnectionManagerImpl.java:1125) [vm4] at org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.close(ConnectionManagerImpl.java:658) [vm4] at org.apache.geode.cache.client.internal.PoolImpl.basicDestroy(PoolImpl.java:608) [vm4] at org.apache.geode.cache.client.internal.PoolImpl.destroy(PoolImpl.java:554) [vm4] at org.apache.geode.cache.client.internal.PoolImpl.destroy(PoolImpl.java:484) [vm4] at org.apache.geode.internal.cache.wan.AbstractGatewaySender$1.run(AbstractGatewaySender.java:613) [vm4] at java.lang.Thread.run(Thread.java:748) [vm4] [warn 2017/07/20 07:15:08.890 UTC tid=0x31e] these members failed to respond to the view change: [172.17.0.4(534):32775, 172.17.0.4(474):32774, 172.17.0.4(594):32776] [vm4] [fatal 2017/07/20 07:15:08.907 UTC tid=0x319] Uncaught exception in thread Thread[AckReaderThread for : Event Processor for GatewaySender_ln_0,5,Event Processor for GatewaySender_ln] [vm4] java.lang.OutOfMemoryError: Java heap space [vm4] at org.apache.geode.internal.cache.tier.sockets.Message.readPayloadFields(Message.java:876) [vm4] at org.apache.geode.internal.cache.tier.sockets.Message.readHeaderAndPayload(Message.java:804) [vm4] at org.apache.geode.internal.cache.tier.sockets.Message.read(Message.java:652) [vm4] at org.apache.geode.internal.cache.tier.sockets.Message.recv(Message.java:1110) [vm4] at org.apache.geode.cache.client.internal.GatewaySenderBatchOp$GatewaySenderGFEBatchOpImpl.attemptReadResponse(GatewaySenderBatchOp.java:205) [vm4] at org.apache.geode.cache.client.internal.GatewaySenderBatchOp$GatewaySenderGFEBatchOpImpl.attemptRead(GatewaySenderBatchOp.java:167) [vm4] at org.apache.geode.cache.client.internal.GatewaySenderBatchOp$GatewaySenderGFEBatchOpImpl.attempt(GatewaySenderBatchOp.java:146) [vm4] at org.apache.geode.cache.client.internal.ConnectionImpl.execute(ConnectionImpl.java:260) [vm4] at org.apache.geode.cache.client.internal.pooling.PooledConnection.execute(PooledConnection.java:332) [vm4] at org.apache.geode.cache.client.internal.OpExecutorImpl.executeWithPossibleReAuthentication(OpExecutorImpl.java:900) [vm4] at org.apache.geode.cache.client.internal.OpExecutorImpl.executeOn(OpExecutorImpl.java:599) [vm4] at org.apache.geode.cache.client.internal.PoolImpl.executeOn(PoolImpl.java:819) [vm4] at org.apache.geode.cache.client.internal.GatewaySenderBatchOp.executeOn(GatewaySenderBatchOp.java:74) [vm4] at org.apache.geode.cache.client.internal.SenderProxy.receiveAckFromReceiver(SenderProxy.java:39) [vm4] at org.apache.geode.internal.cache.wan.GatewaySenderEventRemoteDispatcher.readAcknowledgement(GatewaySenderEventRemoteDispatcher.java:95) [vm4] at org.apache.geode.internal.cache.wan.GatewaySenderEventRemoteDispatcher$AckReaderThread.run(GatewaySenderEventRemoteDispatcher.java:606) {noformat} > CI failure: > org.apache.geode.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > > testParallelPropagationWithRemoteRegionDestroy FAILED > -- > > Key: GEODE-3276 > URL: https://issues.apache.org/jira/browse/GEODE-3276 > Project: Geode > Issue Type: Bug > Components: statistics, wan >Affects Versions: 1.2.0 >Reporter: Shelley Lynn Hughes-Godfrey > > {noformat} > org.apache.geode.internal.cache.wan.par
[jira] [Commented] (GEODE-3276) CI failure: org.apache.geode.internal.cache.wan.parallel.ParallelWANStatsDUnitTest > testParallelPropagationWithRemoteRegionDestroy FAILED
[ https://issues.apache.org/jira/browse/GEODE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16109463#comment-16109463 ] Shelley Lynn Hughes-Godfrey commented on GEODE-3276: Full stack trace {noformat} java.lang.AssertionError: An exception occurred during asynchronous invocation. at org.apache.geode.test.dunit.AsyncInvocation.checkException(AsyncInvocation.java:148) at org.apache.geode.internal.cache.wan.WANTestBase.preTearDown(WANTestBase.java:3731) at org.apache.geode.test.dunit.internal.JUnit4DistributedTestCase.tearDown(JUnit4DistributedTestCase.java:508) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:114) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:57) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:66) at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32) at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93) at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:109) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:377) at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:54) at org.gradle.internal.concurrent.StoppableExecutorImpl$1.run(StoppableExecutorImpl.java:40) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.OutOfMemoryError: Java heap space at org.apache.geode.internal.cache.tier.sockets.Message.readPayloadFields(Message.java:876) at org.apache.geode.in