[ 
https://issues.apache.org/jira/browse/GEODE-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044844#comment-17044844
 ] 

John Blum commented on GEODE-7763:
----------------------------------

Regarding this 
[comment|https://issues.apache.org/jira/browse/GEODE-7763?focusedCommentId=17039245&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17039245]...

The important part of [GEODE-6152] is that a client {{PROXY}} Region should 
never return a reference to the same object in memory, i.e. it should honor its 
[contract|https://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/client/ClientRegionShortcut.html#PROXY].

However that is accomplished matters little: whether a client {{PROXY}} Region 
{{get(..)}} results in a call to the server-side Region each time, or a copy is 
made on the client, separate Threads making identical requests for the same 
"logical" object should each get a unique reference.
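
For illustration only, here is a minimal sketch of the behavior that contract 
implies (the locator address, Region name, key and value below are made up for 
the example); either approach described above should satisfy the final assertion:

{code:java}
import org.apache.geode.cache.Region;
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.client.ClientRegionShortcut;

class ProxyRegionContractExample {

  public static void main(String[] args) {

    ClientCache clientCache = new ClientCacheFactory()
        .addPoolLocator("localhost", 10334)
        .create();

    Region<String, Object> example = clientCache
        .<String, Object>createClientRegionFactory(ClientRegionShortcut.PROXY)
        .create("Example");

    example.put("key", new java.util.Date());

    // Whether the client calls the server on every get(..) or copies the value
    // on the client, two gets of the same key should never return the same
    // object reference.
    Object first = example.get("key");
    Object second = example.get("key");

    assert first != second;

    clientCache.close();
  }
}
{code}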

> Apache Geode 1.11 severely and negatively impacts performance and resource 
> utilization
> --------------------------------------------------------------------------------------
>
>                 Key: GEODE-7763
>                 URL: https://issues.apache.org/jira/browse/GEODE-7763
>             Project: Geode
>          Issue Type: Bug
>          Components: client/server
>    Affects Versions: 1.10.0, 1.11.0
>            Reporter: John Blum
>            Priority: Critical
>              Labels: performance
>         Attachments: 1.11-client-stats.gfs, 1.11-server-stats.gfs, 
> 1.11_thread_dumps.rtf, 1.9-client-stats.gfs, 1.9-server-stats.gfs, 1.9.log, 
> apache-geode-1.10-client-server-interaction-output.txt, 
> apache-geode-1.10-client-server-startup-output.txt, 
> apache-geode-1.11-client-server-interaction-output.txt, 
> apache-geode-1.11-client-server-startup-output.txt, 
> geode-7763-geode-changes.diff, geode-7763-ssdg-changes.diff
>
>
> This problem was first observed in Apache Geode 1.11.0.  The problem was not 
> present in Apache Geode 1.9.2.  This problem is an issue for Apache Geode 
> 1.10 as well!
> After upgrading _Spring Session for Apache Geode_ (SSDG) 2.3 to _Spring Data 
> for Apache Geode_ (SDG) Neumann/2.3, which is based on Apache Geode 1.11, 
> this problem with SSDG's test suite started occurring.
>  _Spring Session for Apache Geode_ (SSDG) 2.2, which is based on _Spring Data 
> for Apache Geode_ (SDG) Moore/2.2, pulls in Apache Geode 1.9.2.  This problem 
> did not occur in SSDG 2.2 with Apache Geode 1.9.2.
> Out of curiosity, I wondered whether this problem affects (i.e. was actually 
> introduced in) Apache Geode 1.10.0.  So, I configured SSDG 2.3 to pull in SDG 
> Moore/2.2 but run with Apache Geode 1.10. The problem occurred with Apache 
> Geode 1.10 as well!
> The SSDG test class in question, affected by Geode's deficiencies, is the 
> [MultiThreadedHighlyConcurrentClientServerSessionOperationsIntegrationTests|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java].
> The test class was modeled after a customer use case; the customer was using 
> Spring Session and Apache Geode/Pivotal GemFire as the HTTP Session state 
> management provider, so the test simulates their highly concurrent environment.
> The test class has 2 primary parameters: [Thread 
> Count|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L90]
>  and the [Workload 
> Size|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L91].
> The "_Workload Size_" should not be confused with the "_Payload Size_" of the 
> individual objects passed to the Geode data access operations (i.e. {{gets}}, 
> {{puts}}, {{removes}}).  The "_Workload Size_" merely determines the number 
> of {{get}}, {{put}} or {{remove}} operations performed on the (Session) 
> Region over the duration of the test run.  Certain operations are "favored" 
> over others, therefore the number of {{gets}}, {{puts}} and {{removes}} is 
> weighted.
> The "_Payload_" in this case is a (HTTP) {{Session}} object and the "size" is 
> directly proportional to the number of Session attributes stored in the 
> Session.
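> For readers unfamiliar with the test, the following is a rough, hypothetical 
> sketch (not the actual SSDG test code) of how a weighted get/put/remove 
> workload of that shape might be driven; the weights, key and class/method 
> names are made up:
> {code:java}
> // Rough, hypothetical sketch of a weighted get/put/remove workload against a
> // Region; the thread count, workload size and operation weights shown here
> // are illustrative only, not the actual SSDG test values.
> import java.util.ArrayList;
> import java.util.List;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import java.util.concurrent.Future;
> import java.util.concurrent.ThreadLocalRandom;
> 
> import org.apache.geode.cache.Region;
> 
> class WeightedSessionWorkload {
> 
>   static void run(Region<String, Object> sessions, int threadCount, int workloadSize)
>       throws Exception {
> 
>     ExecutorService executor = Executors.newFixedThreadPool(threadCount);
>     List<Future<?>> futures = new ArrayList<>();
> 
>     for (int operation = 0; operation < workloadSize; operation++) {
>       futures.add(executor.submit(() -> {
>         int roll = ThreadLocalRandom.current().nextInt(10);
>         if (roll < 6) {
>           // puts are "favored" over gets and removes
>           sessions.put("session-one", System.currentTimeMillis());
>         } else if (roll < 9) {
>           sessions.get("session-one");
>         } else {
>           sessions.remove("session-one");
>         }
>       }));
>     }
> 
>     for (Future<?> future : futures) {
>       future.get(); // surfaces any ServerConnectivityException as an ExecutionException
>     }
> 
>     executor.shutdown();
>   }
> }
> {code}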
> As you can see from the [test class 
> configuration|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L90-L91]
>  in *SSDG* {{2.2}}, the *Thread Count* was set to *180* and the *Workload 
> Size* (or number of Region operations) was set to *10,000*.
> This had to be significantly adjusted in SSDG 2.3 using Apache Geode 1.11 
> (and, as it turns out, Apache Geode 1.10 as well), as can be seen in the 
> {{2.3.0.M1}} release bits source, 
> [here|https://github.com/spring-projects/spring-session-data-geode/blob/2.3.0.M1/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L94-L95].
> It turns out different combinations of the Thread Count (number of workers, 
> or "concurrent Sessions") and Workload Size ultimately determine whether this 
> test class passes or not.
> In other words, if I increase the Thread Count, then the Workload Size must 
> decrease, otherwise the test fails!  If I increase the Workload Size, then 
> the Thread Count must decrease, otherwise again the test fails!
> I tried different combinations of Thread Count and Workload Size until 
> the test passed.  More often than not, 180 Threads with 3,000 Region 
> operations worked, but that was right on the cusp of failing; therefore, I 
> settled on 180 Threads (which nearly matches the customer's environment of 200 
> concurrent client Sessions) and 2,000 concurrent Region operations.
> The point of the test class is to assert the state of the Session is 
> consistent at the end of the test run.
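> Conceptually, that final check amounts to something like the following 
> hypothetical sketch (the repository, Session id and recorded attribute names 
> are placeholders for the real test state):
> {code:java}
> import java.util.Set;
> 
> import org.springframework.session.Session;
> import org.springframework.session.SessionRepository;
> 
> import static org.assertj.core.api.Assertions.assertThat;
> 
> class SessionConsistencyCheck {
> 
>   // Hypothetical sketch: after all workers complete, re-read the Session from
>   // the Region and compare its attributes against what the workers recorded.
>   static <S extends Session> void assertSessionStateIsConsistent(
>       SessionRepository<S> sessionRepository, String sessionId,
>       Set<String> attributeNamesRecordedByWorkers) {
> 
>     S sessionOnServer = sessionRepository.findById(sessionId);
> 
>     assertThat(sessionOnServer).isNotNull();
>     assertThat(sessionOnServer.getAttributeNames())
>         .containsExactlyInAnyOrderElementsOf(attributeNamesRecordedByWorkers);
>   }
> }
> {code}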
> However, before this test can even finish, the client (i.e. the 
> {{ClientCache}} instance) starts failing with Exceptions, specifically:
> {code:java}
> java.lang.RuntimeException: Session Access Task Failed
>       at 
> org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.safeFutureGet(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:298)
>       at 
> java.util.stream.ReferencePipeline$4$1.accept(ReferencePipeline.java:210)
>       at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>       at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>       at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>       at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>       at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>       at java.util.stream.IntPipeline.reduce(IntPipeline.java:456)
>       at java.util.stream.IntPipeline.sum(IntPipeline.java:414)
>       at 
> org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.runSessionWorkload(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:313)
>       at 
> org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.concurrentSessionAccessIsCorrect(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:324)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>       at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>       at 
> org.springframework.test.context.junit4.statements.RunBeforeTestExecutionCallbacks.evaluate(RunBeforeTestExecutionCallbacks.java:74)
>       at 
> org.springframework.test.context.junit4.statements.RunAfterTestExecutionCallbacks.evaluate(RunAfterTestExecutionCallbacks.java:84)
>       at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>       at 
> org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:75)
>       at 
> org.springframework.test.context.junit4.statements.RunAfterTestMethodCallbacks.evaluate(RunAfterTestMethodCallbacks.java:86)
>       at 
> org.springframework.test.context.junit4.statements.SpringRepeat.evaluate(SpringRepeat.java:84)
>       at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>       at 
> org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:251)
>       at 
> org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:97)
>       at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>       at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>       at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>       at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>       at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>       at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>       at 
> org.springframework.test.context.junit4.statements.RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.java:61)
>       at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>       at 
> org.springframework.test.context.junit4.statements.RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:70)
>       at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>       at 
> org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:190)
>       at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>       at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>       at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
>       at 
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
>       at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
> Caused by: java.util.concurrent.ExecutionException: 
> org.springframework.dao.DataAccessResourceFailureException: Pool unexpected 
> socket timed out on client connection=Pooled Connection to localhost:60964: 
> Connection[DESTROYED]). Server unreachable: could not connect after 1 
> attempts; nested exception is 
> org.apache.geode.cache.client.ServerConnectivityException: Pool unexpected 
> socket timed out on client connection=Pooled Connection to localhost:60964: 
> Connection[DESTROYED]). Server unreachable: could not connect after 1 attempts
>       at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>       at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>       at 
> org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.safeFutureGet(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:295)
>       ... 43 more
> Caused by: org.springframework.dao.DataAccessResourceFailureException: Pool 
> unexpected socket timed out on client connection=Pooled Connection to 
> localhost:60964: Connection[DESTROYED]). Server unreachable: could not 
> connect after 1 attempts; nested exception is 
> org.apache.geode.cache.client.ServerConnectivityException: Pool unexpected 
> socket timed out on client connection=Pooled Connection to localhost:60964: 
> Connection[DESTROYED]). Server unreachable: could not connect after 1 attempts
>       at 
> org.springframework.data.gemfire.GemfireCacheUtils.convertGemfireAccessException(GemfireCacheUtils.java:235)
>       at 
> org.springframework.data.gemfire.GemfireAccessor.convertGemFireAccessException(GemfireAccessor.java:93)
>       at 
> org.springframework.data.gemfire.GemfireTemplate.put(GemfireTemplate.java:200)
>       at 
> org.springframework.session.data.gemfire.GemFireOperationsSessionRepository.doSave(GemFireOperationsSessionRepository.java:226)
>       at 
> org.springframework.session.data.gemfire.GemFireOperationsSessionRepository.save(GemFireOperationsSessionRepository.java:186)
>       at 
> org.springframework.session.data.gemfire.AbstractGemFireIntegrationTests.save(AbstractGemFireIntegrationTests.java:409)
>       at 
> org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.lambda$newAddSessionAttributeTask$2(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:216)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.geode.cache.client.ServerConnectivityException: Pool 
> unexpected socket timed out on client connection=Pooled Connection to 
> localhost:60964: Connection[DESTROYED]). Server unreachable: could not 
> connect after 1 attempts
>       at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:659)
>       at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:501)
>       at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:153)
>       at 
> org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:108)
>       at 
> org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:772)
>       at org.apache.geode.cache.client.internal.PutOp.execute(PutOp.java:89)
>       at 
> org.apache.geode.cache.client.internal.ServerRegionProxy.put(ServerRegionProxy.java:159)
>       at 
> org.apache.geode.internal.cache.LocalRegion.serverPut(LocalRegion.java:3035)
>       at 
> org.apache.geode.internal.cache.LocalRegion.cacheWriteBeforePut(LocalRegion.java:3152)
>       at 
> org.apache.geode.internal.cache.ProxyRegionMap.basicPut(ProxyRegionMap.java:238)
>       at 
> org.apache.geode.internal.cache.LocalRegion.virtualPut(LocalRegion.java:5580)
>       at 
> org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:162)
>       at 
> org.apache.geode.internal.cache.LocalRegion.basicPut(LocalRegion.java:5036)
>       at 
> org.apache.geode.internal.cache.LocalRegion.validatedPut(LocalRegion.java:1635)
>       at 
> org.apache.geode.internal.cache.LocalRegion.put(LocalRegion.java:1622)
>       at 
> org.apache.geode.internal.cache.AbstractRegion.put(AbstractRegion.java:442)
>       at 
> org.springframework.data.gemfire.GemfireTemplate.put(GemfireTemplate.java:197)
>       ... 8 more
> {code}
> Attached to this issue are log output files from each of my runs using Apache 
> Geode 1.10 and 1.11.
> The log files serve 2 purposes: 1) to show the version of Apache Geode used 
> and 2) to show the errors that occur on the client and server during the run.
> Any lines in the log output prefixed with "{{[FORK]}}" originate from the 
> cache server.  The other lines come from the client.  There is only a single 
> client and a single server in this test case.
> It takes a bit of initial time during the run for the failures to start 
> occurring, which is why this seems like a resource utilization problem.
> At first, I suspected issues with the client Pool configuration, or the 
> {{CacheServer}} configuration, adjusting timeouts and so on.  I even 
> suspected memory being an issue for the client and server processes, upping 
> each to 2 GB+.  However, it turns out none of the changes made a bit of 
> difference.  And the truth of the matter is, this (existing) configuration 
> worked seamlessly until I upgraded to Apache Geode 1.10+ (specifically, 1.11).
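> For reference, the kind of client Pool tuning I tried looked roughly like the 
> following (hypothetical values; not the exact settings used by the SSDG test 
> framework):
> {code:java}
> import org.apache.geode.cache.client.ClientCache;
> import org.apache.geode.cache.client.ClientCacheFactory;
> 
> class TunedClientCacheExample {
> 
>   // Hypothetical example of the kind of Pool tuning attempted; the server
>   // address and values are illustrative, not the exact settings used.
>   static ClientCache create() {
>     return new ClientCacheFactory()
>         .addPoolServer("localhost", 40404)
>         .setPoolReadTimeout(30000)   // raise the socket read timeout (milliseconds)
>         .setPoolMaxConnections(-1)   // allow unbounded connections
>         .setPoolRetryAttempts(2)     // retry before declaring the server unreachable
>         .create();
>   }
> }
> {code}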



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
