[ 
https://issues.apache.org/jira/browse/GEODE-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Blum updated GEODE-7763:
-----------------------------
    Description: 
This problem was first observed in Apache Geode 1.11.0.  The problem was not 
present in Apache Geode 1.9.2.  This problem is an issue for Apache Geode 1.10 
as well!

After upgrading _Spring Session for Apache Geode_ (SSDG) 2.3 to _Spring Data 
for Apache Geode_ (SDG) Neumann/2.3, which is based on Apache Geode 1.11, this 
problem with SSDG's test suite started occurring.

 _Spring Session for Apache Geode_ (SSDG) 2.2, which is based on _Spring Data 
for Apache Geode_ (SDG) Moore/2.2, pulls in Apache Geode 1.9.2.  This problem 
did not occur in SSDG 2.2.

Out of curiosity, I wondered whether this problem affects (i.e. was actually 
introduced in) Apache Geode 1.10.0.  So, I configured SSDG 2.3 to pull in SDG 
Moore/2.2 but run with Apache Geode 1.10. The problem occurred with Apache 
Geode 1.10 as well!

The SSDG test class in question, affected by Geode's deficiencies, is the 
[MultiThreadedHighlyConcurrentClientServerSessionOperationsIntegrationTests|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java].

The test class was modeled after a customer UC, who were using Spring Session 
and Apache Geode/Pivotal GemFire as the HTTP Session state management provider, 
therefore it simulates their highly concurrent environment.

The test class has 2 primary parameters: [Thread 
Count|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L90]
 and the [Workload 
Size|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L91].

The "_Workload Size_" should not be confused with the "_Payload Size_" of the 
individual objects passed to the Geode data access operations (i.e. {{gets}}, 
{{puts}}, {{removes}}).  The "_Workload Size_" merely determines the number of 
{{get}}, {{put}} or {{remove}} operations performed on the (Session) Region 
over the duration of the test run.  Certain operations are "favored" over 
others, therefore the number of {{gets}}, {{puts}} and {{removes}} is weighted.

The "_Payload_" in this case is a (HTTP) {{Session}} object and the "size" is 
directly proportional to the number of Session attributes stored in the Session.

As you can see from the [test class 
configuration|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L90-L91]
 in *SSDG* {{2.2}}, the *Thread Count* was set to *180* and the *Workload Size* 
(or number of Region operations) was set to *10,000*.

This had to be significantly adjusted in SSDG 2.3 using Apache Geode 1.11 (and, 
as it turns out, Apache Geode 1.10 as well), as can be seen in the {{2.3.0.M1}} 
release bits source, 
[here|https://github.com/spring-projects/spring-session-data-geode/blob/2.3.0.M1/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L94-L95].

It turns out different combinations of the Thread Count (number of workers, or 
"concurrent Sessions") and Workload Size ultimately determine whether this test 
class passes or not.

In other words, if I increase the Thread Count, then the Workload Size must 
decrease, otherwise the test fails!  If I increase the Workload Size, then the 
Thread Count must decrease, otherwise again the test fails!

I tried with different combinations of Thread Count and Workload Size until the 
test passed.  More often than not 180 Threads with 3000 Regions operations 
worked, but was right on the cusp of failing, therefore, I settled on 180 
Threads (which nearly matches the customers environment of 200 clients) and 
2000 concurrent Region operations.

The point of the test class is to assert the state of the Session is consistent 
at the end of the test run.

However, before this test can even finish, the client, as in the 
{{ClientCache}} instance starts failing with Exceptions, specifically:

{code:java}
java.lang.RuntimeException: Session Access Task Failed

        at 
org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.safeFutureGet(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:298)
        at 
java.util.stream.ReferencePipeline$4$1.accept(ReferencePipeline.java:210)
        at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
        at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.IntPipeline.reduce(IntPipeline.java:456)
        at java.util.stream.IntPipeline.sum(IntPipeline.java:414)
        at 
org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.runSessionWorkload(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:313)
        at 
org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.concurrentSessionAccessIsCorrect(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:324)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
        at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
        at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at 
org.springframework.test.context.junit4.statements.RunBeforeTestExecutionCallbacks.evaluate(RunBeforeTestExecutionCallbacks.java:74)
        at 
org.springframework.test.context.junit4.statements.RunAfterTestExecutionCallbacks.evaluate(RunAfterTestExecutionCallbacks.java:84)
        at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at 
org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:75)
        at 
org.springframework.test.context.junit4.statements.RunAfterTestMethodCallbacks.evaluate(RunAfterTestMethodCallbacks.java:86)
        at 
org.springframework.test.context.junit4.statements.SpringRepeat.evaluate(SpringRepeat.java:84)
        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
        at 
org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:251)
        at 
org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:97)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
        at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at 
org.springframework.test.context.junit4.statements.RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.java:61)
        at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at 
org.springframework.test.context.junit4.statements.RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:70)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
        at 
org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:190)
        at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
        at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
        at 
com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
        at 
com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
        at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
Caused by: java.util.concurrent.ExecutionException: 
org.springframework.dao.DataAccessResourceFailureException: Pool unexpected 
socket timed out on client connection=Pooled Connection to localhost:60964: 
Connection[DESTROYED]). Server unreachable: could not connect after 1 attempts; 
nested exception is org.apache.geode.cache.client.ServerConnectivityException: 
Pool unexpected socket timed out on client connection=Pooled Connection to 
localhost:60964: Connection[DESTROYED]). Server unreachable: could not connect 
after 1 attempts
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at 
org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.safeFutureGet(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:295)
        ... 43 more
Caused by: org.springframework.dao.DataAccessResourceFailureException: Pool 
unexpected socket timed out on client connection=Pooled Connection to 
localhost:60964: Connection[DESTROYED]). Server unreachable: could not connect 
after 1 attempts; nested exception is 
org.apache.geode.cache.client.ServerConnectivityException: Pool unexpected 
socket timed out on client connection=Pooled Connection to localhost:60964: 
Connection[DESTROYED]). Server unreachable: could not connect after 1 attempts
        at 
org.springframework.data.gemfire.GemfireCacheUtils.convertGemfireAccessException(GemfireCacheUtils.java:235)
        at 
org.springframework.data.gemfire.GemfireAccessor.convertGemFireAccessException(GemfireAccessor.java:93)
        at 
org.springframework.data.gemfire.GemfireTemplate.put(GemfireTemplate.java:200)
        at 
org.springframework.session.data.gemfire.GemFireOperationsSessionRepository.doSave(GemFireOperationsSessionRepository.java:226)
        at 
org.springframework.session.data.gemfire.GemFireOperationsSessionRepository.save(GemFireOperationsSessionRepository.java:186)
        at 
org.springframework.session.data.gemfire.AbstractGemFireIntegrationTests.save(AbstractGemFireIntegrationTests.java:409)
        at 
org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.lambda$newAddSessionAttributeTask$2(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:216)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.geode.cache.client.ServerConnectivityException: Pool 
unexpected socket timed out on client connection=Pooled Connection to 
localhost:60964: Connection[DESTROYED]). Server unreachable: could not connect 
after 1 attempts
        at 
org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:659)
        at 
org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:501)
        at 
org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:153)
        at 
org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:108)
        at 
org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:772)
        at org.apache.geode.cache.client.internal.PutOp.execute(PutOp.java:89)
        at 
org.apache.geode.cache.client.internal.ServerRegionProxy.put(ServerRegionProxy.java:159)
        at 
org.apache.geode.internal.cache.LocalRegion.serverPut(LocalRegion.java:3035)
        at 
org.apache.geode.internal.cache.LocalRegion.cacheWriteBeforePut(LocalRegion.java:3152)
        at 
org.apache.geode.internal.cache.ProxyRegionMap.basicPut(ProxyRegionMap.java:238)
        at 
org.apache.geode.internal.cache.LocalRegion.virtualPut(LocalRegion.java:5580)
        at 
org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:162)
        at 
org.apache.geode.internal.cache.LocalRegion.basicPut(LocalRegion.java:5036)
        at 
org.apache.geode.internal.cache.LocalRegion.validatedPut(LocalRegion.java:1635)
        at 
org.apache.geode.internal.cache.LocalRegion.put(LocalRegion.java:1622)
        at 
org.apache.geode.internal.cache.AbstractRegion.put(AbstractRegion.java:442)
        at 
org.springframework.data.gemfire.GemfireTemplate.put(GemfireTemplate.java:197)
        ... 8 more
{code}


Attached to this issue are log output files from each of my runs using Apache 
Geode 1.10 and 1.11.

The log files serve 2 purposes: 1) to show the version of Apache Geode used and 
2) the errors occurs on the client and server during the run.

Any lines in the log output prefixed with "{{[FORK]}}" originates from the 
cache server.  The other lines come from the client.  There is only a single 
client and server in this test case.

It takes a bit of initial time during the run for the failures to start 
occurring, which is why this seems like a resource utilization problem.

After first, I suspected issues with the client Pool configuration, or 
{{CacheServer}} configuration, adjusting timeouts and so on.  I even suspected 
memory being an issue for the client and server processes, upping each to 2 
GB+.  However, it turns out none of the changes made a bit of difference.  And 
the truth of the matter is, this (existing) configuration worked seamlessly 
until I upgraded to Apache Geode 1.10+ (specifically, 1.11).

For good measure I also include the test run with Apache Geode 1.9.2 for 
comparison.



  was:
This problem was first observed in Apache Geode 1.11.0.  The problem was not 
present in Apache Geode 1.9.2.  This problem is an issue for Apache Geode 1.10 
as well!

After upgrading _Spring Session for Apache Geode_ (SSDG) 2.3 to _Spring Data 
for Apache Geode_ (SDG) Neumann/2.3, which is based on Apache Geode 1.11, this 
problem with SSDG's test suite started occurring.

 _Spring Session for Apache Geode_ (SSDG) 2.2, which is based on _Spring Data 
for Apache Geode_ (SDG) Moore/2.2, pulls in Apache Geode 1.9.2.  This problem 
did not occur in SSDG 2.2.

Out of curiosity, I wondered whether this problem affects (i.e. was actually 
introduced in) Apache Geode 1.10.0.  So, I configured SSDG 2.3 to pull in SDG 
Moore/2.2 but run with Apache Geode 1.10. The problem occurred with Apache 
Geode 1.10 as well!

The SSDG test class in question, affected by Geode's deficiencies, is the 
[MultiThreadedHighlyConcurrentClientServerSessionOperationsIntegrationTests|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java].

The test class was modeled after a customer UC, who were using Spring Session 
and Apache Geode/Pivotal GemFire as the HTTP Session state management provider, 
therefore it simulates their highly concurrent environment.

The test class has 2 primary parameters: [Thread 
Count|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L90]
 and the [Workload 
Size|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L91].

The "_Workload Size_" should not be confused with the "_Payload Size_" of the 
individual objects passed to the Geode data access operations (i.e. {{gets}}, 
{{puts}}, {{removes}}).  The "_Workload Size_" merely determines the number of 
{{get}}, {{put}} or {{remove}} operations performed on the (Session) Region 
over the duration of the test run.  Certain operations are "favored" over 
others, therefore the number of {{gets}}, {{puts}} and {{removes}} is weighted.

The "_Payload_" in this case is a (HTTP) {{Session}} object and the "size" is 
directly proportional to the number of Session attributes stored in the Session.

As you can see from the [test class 
configuration|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L90-L91]
 in *SSDG* {{2.2}}, the *Thread Count* was set to *180* and the *Workload Size* 
(or number of Region operations) was set to *10,000*.

This had to be significantly adjusted in SSDG 2.3 using Apache Geode 1.11 (and, 
as it turns out, Apache Geode 1.10 as well), as can be seen in the {{2.3.0.M1}} 
release bits source, 
[here|https://github.com/spring-projects/spring-session-data-geode/blob/2.3.0.M1/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L94-L95].

It turns out different combinations of the Thread Count (number of workers, or 
"concurrent Sessions") and Workload Size ultimately determine whether this test 
class passes or not.

In other words, if I increase the Thread Count, then the Workload Size must 
decrease, otherwise the test fails!  If I increase the Workload Size, then the 
Thread Count must decrease, otherwise again the test fails!

I tried with different combinations of Thread Count and Workload Size until the 
test passed.  More often than not 180 Threads with 3000 Regions operations 
worked, but was right on the cusp of failing, therefore, I settled on 180 
Threads (which nearly matches the customers environment of 200 clients) and 
2000 concurrent Region operations.

The point of the test class is to assert the state of the Session is consistent 
at the end of the test run.

However, before this test can even finish, the client, as in the 
{{ClientCache}} instance starts failing with Exceptions, specifically:

{code:java}
java.lang.RuntimeException: Session Access Task Failed

        at 
org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.safeFutureGet(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:298)
        at 
java.util.stream.ReferencePipeline$4$1.accept(ReferencePipeline.java:210)
        at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
        at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.IntPipeline.reduce(IntPipeline.java:456)
        at java.util.stream.IntPipeline.sum(IntPipeline.java:414)
        at 
org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.runSessionWorkload(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:313)
        at 
org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.concurrentSessionAccessIsCorrect(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:324)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
        at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
        at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at 
org.springframework.test.context.junit4.statements.RunBeforeTestExecutionCallbacks.evaluate(RunBeforeTestExecutionCallbacks.java:74)
        at 
org.springframework.test.context.junit4.statements.RunAfterTestExecutionCallbacks.evaluate(RunAfterTestExecutionCallbacks.java:84)
        at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at 
org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:75)
        at 
org.springframework.test.context.junit4.statements.RunAfterTestMethodCallbacks.evaluate(RunAfterTestMethodCallbacks.java:86)
        at 
org.springframework.test.context.junit4.statements.SpringRepeat.evaluate(SpringRepeat.java:84)
        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
        at 
org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:251)
        at 
org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:97)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
        at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at 
org.springframework.test.context.junit4.statements.RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.java:61)
        at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at 
org.springframework.test.context.junit4.statements.RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:70)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
        at 
org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:190)
        at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
        at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
        at 
com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
        at 
com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
        at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
Caused by: java.util.concurrent.ExecutionException: 
org.springframework.dao.DataAccessResourceFailureException: Pool unexpected 
socket timed out on client connection=Pooled Connection to localhost:60964: 
Connection[DESTROYED]). Server unreachable: could not connect after 1 attempts; 
nested exception is org.apache.geode.cache.client.ServerConnectivityException: 
Pool unexpected socket timed out on client connection=Pooled Connection to 
localhost:60964: Connection[DESTROYED]). Server unreachable: could not connect 
after 1 attempts
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at 
org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.safeFutureGet(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:295)
        ... 43 more
Caused by: org.springframework.dao.DataAccessResourceFailureException: Pool 
unexpected socket timed out on client connection=Pooled Connection to 
localhost:60964: Connection[DESTROYED]). Server unreachable: could not connect 
after 1 attempts; nested exception is 
org.apache.geode.cache.client.ServerConnectivityException: Pool unexpected 
socket timed out on client connection=Pooled Connection to localhost:60964: 
Connection[DESTROYED]). Server unreachable: could not connect after 1 attempts
        at 
org.springframework.data.gemfire.GemfireCacheUtils.convertGemfireAccessException(GemfireCacheUtils.java:235)
        at 
org.springframework.data.gemfire.GemfireAccessor.convertGemFireAccessException(GemfireAccessor.java:93)
        at 
org.springframework.data.gemfire.GemfireTemplate.put(GemfireTemplate.java:200)
        at 
org.springframework.session.data.gemfire.GemFireOperationsSessionRepository.doSave(GemFireOperationsSessionRepository.java:226)
        at 
org.springframework.session.data.gemfire.GemFireOperationsSessionRepository.save(GemFireOperationsSessionRepository.java:186)
        at 
org.springframework.session.data.gemfire.AbstractGemFireIntegrationTests.save(AbstractGemFireIntegrationTests.java:409)
        at 
org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.lambda$newAddSessionAttributeTask$2(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:216)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.geode.cache.client.ServerConnectivityException: Pool 
unexpected socket timed out on client connection=Pooled Connection to 
localhost:60964: Connection[DESTROYED]). Server unreachable: could not connect 
after 1 attempts
        at 
org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:659)
        at 
org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:501)
        at 
org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:153)
        at 
org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:108)
        at 
org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:772)
        at org.apache.geode.cache.client.internal.PutOp.execute(PutOp.java:89)
        at 
org.apache.geode.cache.client.internal.ServerRegionProxy.put(ServerRegionProxy.java:159)
        at 
org.apache.geode.internal.cache.LocalRegion.serverPut(LocalRegion.java:3035)
        at 
org.apache.geode.internal.cache.LocalRegion.cacheWriteBeforePut(LocalRegion.java:3152)
        at 
org.apache.geode.internal.cache.ProxyRegionMap.basicPut(ProxyRegionMap.java:238)
        at 
org.apache.geode.internal.cache.LocalRegion.virtualPut(LocalRegion.java:5580)
        at 
org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:162)
        at 
org.apache.geode.internal.cache.LocalRegion.basicPut(LocalRegion.java:5036)
        at 
org.apache.geode.internal.cache.LocalRegion.validatedPut(LocalRegion.java:1635)
        at 
org.apache.geode.internal.cache.LocalRegion.put(LocalRegion.java:1622)
        at 
org.apache.geode.internal.cache.AbstractRegion.put(AbstractRegion.java:442)
        at 
org.springframework.data.gemfire.GemfireTemplate.put(GemfireTemplate.java:197)
        ... 8 more
{code}


Attached to this issue are log output files from each of my runs using Apache 
Geode 1.10 and 1.11.

The log files serve 2 purposes: 1) to show the version of Apache Geode used and 
2) the errors occurs on the client and server during the run.

Any lines in the log output prefixed with "{{[FORK]}}" originates from the 
cache server.  The other lines come from the client.  There is only a single 
client and server in this test case.

It takes a bit of initial time during the run for the failures to start 
occurring, which is why this seems like a resource utilization problem.

After first, I suspected issues with the client Pool configuration, or 
{{CacheServer}} configuration, adjusting timeouts and so on.  I even suspected 
memory being an issue for the client and server processes, upping each to 2 
GB+.  However, it turns out none of the changes made a bit of difference.  And 
the truth of the matter is, this (existing) configuration worked seamlessly 
until I upgraded to Apache Geode 1.10+ (specifically, 1.11).

 [^apache-geode-1.11-client-server-interaction-output.txt]  
[^apache-geode-1.11-client-server-startup-output.txt] 





> Apache Geode 1.11 severely and negatively impacts performance and resource 
> utilization
> --------------------------------------------------------------------------------------
>
>                 Key: GEODE-7763
>                 URL: https://issues.apache.org/jira/browse/GEODE-7763
>             Project: Geode
>          Issue Type: Bug
>          Components: client/server
>            Reporter: John Blum
>            Priority: Major
>         Attachments: apache-geode-1.10-client-server-interaction-output.txt, 
> apache-geode-1.10-client-server-startup-output.txt, 
> apache-geode-1.11-client-server-interaction-output.txt, 
> apache-geode-1.11-client-server-startup-output.txt
>
>
> This problem was first observed in Apache Geode 1.11.0.  The problem was not 
> present in Apache Geode 1.9.2.  This problem is an issue for Apache Geode 
> 1.10 as well!
> After upgrading _Spring Session for Apache Geode_ (SSDG) 2.3 to _Spring Data 
> for Apache Geode_ (SDG) Neumann/2.3, which is based on Apache Geode 1.11, 
> this problem with SSDG's test suite started occurring.
>  _Spring Session for Apache Geode_ (SSDG) 2.2, which is based on _Spring Data 
> for Apache Geode_ (SDG) Moore/2.2, pulls in Apache Geode 1.9.2.  This problem 
> did not occur in SSDG 2.2.
> Out of curiosity, I wondered whether this problem affects (i.e. was actually 
> introduced in) Apache Geode 1.10.0.  So, I configured SSDG 2.3 to pull in SDG 
> Moore/2.2 but run with Apache Geode 1.10. The problem occurred with Apache 
> Geode 1.10 as well!
> The SSDG test class in question, affected by Geode's deficiencies, is the 
> [MultiThreadedHighlyConcurrentClientServerSessionOperationsIntegrationTests|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java].
> The test class was modeled after a customer UC, who were using Spring Session 
> and Apache Geode/Pivotal GemFire as the HTTP Session state management 
> provider, therefore it simulates their highly concurrent environment.
> The test class has 2 primary parameters: [Thread 
> Count|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L90]
>  and the [Workload 
> Size|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L91].
> The "_Workload Size_" should not be confused with the "_Payload Size_" of the 
> individual objects passed to the Geode data access operations (i.e. {{gets}}, 
> {{puts}}, {{removes}}).  The "_Workload Size_" merely determines the number 
> of {{get}}, {{put}} or {{remove}} operations performed on the (Session) 
> Region over the duration of the test run.  Certain operations are "favored" 
> over others, therefore the number of {{gets}}, {{puts}} and {{removes}} is 
> weighted.
> The "_Payload_" in this case is a (HTTP) {{Session}} object and the "size" is 
> directly proportional to the number of Session attributes stored in the 
> Session.
> As you can see from the [test class 
> configuration|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L90-L91]
>  in *SSDG* {{2.2}}, the *Thread Count* was set to *180* and the *Workload 
> Size* (or number of Region operations) was set to *10,000*.
> This had to be significantly adjusted in SSDG 2.3 using Apache Geode 1.11 
> (and, as it turns out, Apache Geode 1.10 as well), as can be seen in the 
> {{2.3.0.M1}} release bits source, 
> [here|https://github.com/spring-projects/spring-session-data-geode/blob/2.3.0.M1/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L94-L95].
> It turns out different combinations of the Thread Count (number of workers, 
> or "concurrent Sessions") and Workload Size ultimately determine whether this 
> test class passes or not.
> In other words, if I increase the Thread Count, then the Workload Size must 
> decrease, otherwise the test fails!  If I increase the Workload Size, then 
> the Thread Count must decrease, otherwise again the test fails!
> I tried with different combinations of Thread Count and Workload Size until 
> the test passed.  More often than not 180 Threads with 3000 Regions 
> operations worked, but was right on the cusp of failing, therefore, I settled 
> on 180 Threads (which nearly matches the customers environment of 200 
> clients) and 2000 concurrent Region operations.
> The point of the test class is to assert the state of the Session is 
> consistent at the end of the test run.
> However, before this test can even finish, the client, as in the 
> {{ClientCache}} instance starts failing with Exceptions, specifically:
> {code:java}
> java.lang.RuntimeException: Session Access Task Failed
>       at 
> org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.safeFutureGet(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:298)
>       at 
> java.util.stream.ReferencePipeline$4$1.accept(ReferencePipeline.java:210)
>       at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>       at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>       at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>       at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>       at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>       at java.util.stream.IntPipeline.reduce(IntPipeline.java:456)
>       at java.util.stream.IntPipeline.sum(IntPipeline.java:414)
>       at 
> org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.runSessionWorkload(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:313)
>       at 
> org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.concurrentSessionAccessIsCorrect(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:324)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>       at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>       at 
> org.springframework.test.context.junit4.statements.RunBeforeTestExecutionCallbacks.evaluate(RunBeforeTestExecutionCallbacks.java:74)
>       at 
> org.springframework.test.context.junit4.statements.RunAfterTestExecutionCallbacks.evaluate(RunAfterTestExecutionCallbacks.java:84)
>       at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>       at 
> org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:75)
>       at 
> org.springframework.test.context.junit4.statements.RunAfterTestMethodCallbacks.evaluate(RunAfterTestMethodCallbacks.java:86)
>       at 
> org.springframework.test.context.junit4.statements.SpringRepeat.evaluate(SpringRepeat.java:84)
>       at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>       at 
> org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:251)
>       at 
> org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:97)
>       at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>       at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>       at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>       at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>       at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>       at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>       at 
> org.springframework.test.context.junit4.statements.RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.java:61)
>       at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>       at 
> org.springframework.test.context.junit4.statements.RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:70)
>       at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>       at 
> org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:190)
>       at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>       at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>       at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
>       at 
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
>       at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
> Caused by: java.util.concurrent.ExecutionException: 
> org.springframework.dao.DataAccessResourceFailureException: Pool unexpected 
> socket timed out on client connection=Pooled Connection to localhost:60964: 
> Connection[DESTROYED]). Server unreachable: could not connect after 1 
> attempts; nested exception is 
> org.apache.geode.cache.client.ServerConnectivityException: Pool unexpected 
> socket timed out on client connection=Pooled Connection to localhost:60964: 
> Connection[DESTROYED]). Server unreachable: could not connect after 1 attempts
>       at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>       at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>       at 
> org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.safeFutureGet(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:295)
>       ... 43 more
> Caused by: org.springframework.dao.DataAccessResourceFailureException: Pool 
> unexpected socket timed out on client connection=Pooled Connection to 
> localhost:60964: Connection[DESTROYED]). Server unreachable: could not 
> connect after 1 attempts; nested exception is 
> org.apache.geode.cache.client.ServerConnectivityException: Pool unexpected 
> socket timed out on client connection=Pooled Connection to localhost:60964: 
> Connection[DESTROYED]). Server unreachable: could not connect after 1 attempts
>       at 
> org.springframework.data.gemfire.GemfireCacheUtils.convertGemfireAccessException(GemfireCacheUtils.java:235)
>       at 
> org.springframework.data.gemfire.GemfireAccessor.convertGemFireAccessException(GemfireAccessor.java:93)
>       at 
> org.springframework.data.gemfire.GemfireTemplate.put(GemfireTemplate.java:200)
>       at 
> org.springframework.session.data.gemfire.GemFireOperationsSessionRepository.doSave(GemFireOperationsSessionRepository.java:226)
>       at 
> org.springframework.session.data.gemfire.GemFireOperationsSessionRepository.save(GemFireOperationsSessionRepository.java:186)
>       at 
> org.springframework.session.data.gemfire.AbstractGemFireIntegrationTests.save(AbstractGemFireIntegrationTests.java:409)
>       at 
> org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.lambda$newAddSessionAttributeTask$2(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:216)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.geode.cache.client.ServerConnectivityException: Pool 
> unexpected socket timed out on client connection=Pooled Connection to 
> localhost:60964: Connection[DESTROYED]). Server unreachable: could not 
> connect after 1 attempts
>       at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:659)
>       at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:501)
>       at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:153)
>       at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:108)
>       at 
> org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:772)
>       at org.apache.geode.cache.client.internal.PutOp.execute(PutOp.java:89)
>       at 
> org.apache.geode.cache.client.internal.ServerRegionProxy.put(ServerRegionProxy.java:159)
>       at 
> org.apache.geode.internal.cache.LocalRegion.serverPut(LocalRegion.java:3035)
>       at 
> org.apache.geode.internal.cache.LocalRegion.cacheWriteBeforePut(LocalRegion.java:3152)
>       at 
> org.apache.geode.internal.cache.ProxyRegionMap.basicPut(ProxyRegionMap.java:238)
>       at 
> org.apache.geode.internal.cache.LocalRegion.virtualPut(LocalRegion.java:5580)
>       at 
> org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:162)
>       at 
> org.apache.geode.internal.cache.LocalRegion.basicPut(LocalRegion.java:5036)
>       at 
> org.apache.geode.internal.cache.LocalRegion.validatedPut(LocalRegion.java:1635)
>       at 
> org.apache.geode.internal.cache.LocalRegion.put(LocalRegion.java:1622)
>       at 
> org.apache.geode.internal.cache.AbstractRegion.put(AbstractRegion.java:442)
>       at 
> org.springframework.data.gemfire.GemfireTemplate.put(GemfireTemplate.java:197)
>       ... 8 more
> {code}
> Attached to this issue are log output files from each of my runs using Apache 
> Geode 1.10 and 1.11.
> The log files serve 2 purposes: 1) to show the version of Apache Geode used 
> and 2) the errors occurs on the client and server during the run.
> Any lines in the log output prefixed with "{{[FORK]}}" originates from the 
> cache server.  The other lines come from the client.  There is only a single 
> client and server in this test case.
> It takes a bit of initial time during the run for the failures to start 
> occurring, which is why this seems like a resource utilization problem.
> After first, I suspected issues with the client Pool configuration, or 
> {{CacheServer}} configuration, adjusting timeouts and so on.  I even 
> suspected memory being an issue for the client and server processes, upping 
> each to 2 GB+.  However, it turns out none of the changes made a bit of 
> difference.  And the truth of the matter is, this (existing) configuration 
> worked seamlessly until I upgraded to Apache Geode 1.10+ (specifically, 1.11).
> For good measure I also include the test run with Apache Geode 1.9.2 for 
> comparison.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to