[jira] [Created] (GEODE-9847) Benchmark instability in PartitionedPutLongBenchmark with security manager on support/1.14
Kamilla Aslami created GEODE-9847:
-------------------------------------

             Summary: Benchmark instability in PartitionedPutLongBenchmark with security manager on support/1.14
                 Key: GEODE-9847
                 URL: https://issues.apache.org/jira/browse/GEODE-9847
             Project: Geode
          Issue Type: Bug
          Components: benchmarks
    Affects Versions: 1.14.1
            Reporter: Kamilla Aslami


PartitionedPutLongBenchmark failed in apache-support-1-14-main/benchmark-with-security-manager. This issue could have the same root cause as GEODE-9340, but GEODE-9340 fails on 1.15 and in another CI job (apache-develop-main/benchmark-base).

{noformat}
org.apache.geode.benchmark.tests.PartitionedPutLongBenchmark
05:20:08  average ops/second             Baseline: 381785.31     Test: 351135.20     Difference: -8.0%
05:20:08  ops/second standard error      Baseline: 2163.90       Test: 3115.81       Difference: +44.0%
05:20:08  ops/second standard deviation  Baseline: 37417.40      Test: 53877.38      Difference: +44.0%
05:20:08  YS 99th percentile latency     Baseline: 1606.00       Test: 1606.00       Difference: +0.0%
05:20:08  median latency                 Baseline: 1068031.00    Test: 1065983.00    Difference: -0.2%
05:20:08  90th percentile latency        Baseline: 1364991.00    Test: 1356799.00    Difference: -0.6%
05:20:08  99th percentile latency        Baseline: 7688191.00    Test: 8138751.00    Difference: +5.9%
05:20:08  99.9th percentile latency      Baseline: 209584127.00  Test: 260964351.00  Difference: +24.5%
05:20:08  average latency                Baseline: 1884576.09    Test: 2050262.70    Difference: +8.8%
05:20:08  latency standard deviation     Baseline: 11587055.57   Test: 14728140.58   Difference: +27.1%
05:20:08  latency standard error         Baseline: 1083.17       Test: 1435.92       Difference: +32.6%
05:20:08  average ops/second             Baseline: 381621.08     Test: 350789.84     Difference: -8.1%
05:20:08 BENCHMARK FAILED: org.apache.geode.benchmark.tests.PartitionedPutLongBenchmark average latency is 5% worse than baseline.
{noformat}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
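The gate that produced "average latency is 5% worse than baseline" is a relative-difference check against a fixed threshold, matching the "Difference" column above. A minimal sketch of that rule; the class and method names here are illustrative, not the actual geode-benchmarks analyzer API:

```java
// Sketch of the benchmark pass/fail gate: compute (test - baseline) / baseline
// as a percentage and fail when average latency regresses by more than 5%.
// Names and the threshold's placement are assumptions based on the report above.
public class LatencyGate {
    static final double MAX_LATENCY_REGRESSION_PERCENT = 5.0;

    // Percent difference relative to baseline, as shown in the "Difference" column.
    static double percentDiff(double baseline, double test) {
        return (test - baseline) / baseline * 100.0;
    }

    // Higher latency is worse, so only a positive diff beyond the threshold fails.
    static boolean passes(double baselineLatency, double testLatency) {
        return percentDiff(baselineLatency, testLatency) <= MAX_LATENCY_REGRESSION_PERCENT;
    }

    public static void main(String[] args) {
        double baseline = 1884576.09; // average latency, baseline run (from the report)
        double test = 2050262.70;     // average latency, test run
        System.out.printf("Difference: %+.1f%%%n", percentDiff(baseline, test));
        System.out.println(passes(baseline, test) ? "PASSED" : "BENCHMARK FAILED");
    }
}
```

With the averages from this run the diff is about +8.8%, which is why the job failed even though the median barely moved.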
[jira] [Commented] (GEODE-9847) Benchmark instability in PartitionedPutLongBenchmark with security manager on support/1.14
[ https://issues.apache.org/jira/browse/GEODE-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447731#comment-17447731 ]

Geode Integration commented on GEODE-9847:
------------------------------------------

Seen on support/1.14 in [benchmark-with-security-manager #3|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-support-1-14-main/jobs/benchmark-with-security-manager/builds/3].
[jira] [Commented] (GEODE-9846) CI failure: Many tests in ClientDataAuthorizationUsingLegacySecurityWithFailoverDUnitTest failed with ConnectException
[ https://issues.apache.org/jira/browse/GEODE-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447724#comment-17447724 ]

Geode Integration commented on GEODE-9846:
------------------------------------------

Seen in [upgrade-test-openjdk8 #20|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/upgrade-test-openjdk8/builds/20] ... see [test results|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0685/test-results/upgradeTest/1637612592/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0685/test-artifacts/1637612592/upgradetestfiles-openjdk8-1.15.0-build.0685.tgz].
[jira] [Created] (GEODE-9846) CI failure: Many tests in ClientDataAuthorizationUsingLegacySecurityWithFailoverDUnitTest failed with ConnectException
Kamilla Aslami created GEODE-9846:
-------------------------------------

             Summary: CI failure: Many tests in ClientDataAuthorizationUsingLegacySecurityWithFailoverDUnitTest failed with ConnectException
                 Key: GEODE-9846
                 URL: https://issues.apache.org/jira/browse/GEODE-9846
             Project: Geode
          Issue Type: Bug
          Components: client/server, security
    Affects Versions: 1.15.0
            Reporter: Kamilla Aslami


There were 96 failures in ClientDataAuthorizationUsingLegacySecurityWithFailoverDUnitTest, all of which failed with `java.net.ConnectException: Connection refused`. This could be a transient network issue, but it seems suspicious that all the failures occurred in the same DUnit test file. I'm only adding one stack trace to the ticket description; the others can be found [here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0685/test-results/upgradeTest/1637612592/].

{noformat}
ClientDataAuthorizationUsingLegacySecurityWithFailoverDUnitTest > dataReaderCanRegisterAndUnregisterAcrossFailover[clientVersion=1.2.0] FAILED
    org.apache.geode.test.dunit.RMIException: While invoking org.apache.geode.test.dunit.rules.DistributedRestoreSystemProperties$$Lambda$302/1055293868.run in VM 3 running on Host heavy-lifter-617ee9be-7fe1-5cc4-bf4f-307cc3e33a7b.c.apachegeode-ci.internal with 4 VMs with version 1.2.0
        at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:635)
        at org.apache.geode.test.dunit.VM.invoke(VM.java:448)
        at org.apache.geode.test.dunit.Invoke.invokeInEveryVM(Invoke.java:59)
        at org.apache.geode.test.dunit.Invoke.invokeInEveryVM(Invoke.java:48)
        at org.apache.geode.test.dunit.rules.RemoteInvoker.invokeInEveryVMAndController(RemoteInvoker.java:49)
        at org.apache.geode.test.dunit.rules.DistributedRestoreSystemProperties.after(DistributedRestoreSystemProperties.java:44)
        at org.apache.geode.test.dunit.rules.AbstractDistributedRule.afterDistributedTest(AbstractDistributedRule.java:81)
        at org.apache.geode.test.dunit.rules.ClusterStartupRule.after(ClusterStartupRule.java:176)
        at org.apache.geode.test.dunit.rules.ClusterStartupRule.access$100(ClusterStartupRule.java:69)
        at org.apache.geode.test.dunit.rules.ClusterStartupRule$1.evaluate(ClusterStartupRule.java:140)
        at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
        at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
        at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
        at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
        at org.junit.runners.Suite.runChild(Suite.java:128)
        at org.junit.runners.Suite.runChild(Suite.java:27)
        at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
        at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
        at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
        at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
        at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
        at org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.Iterator.forEachRemaining(Iterator.java:116)
        at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
{noformat}
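A `Connection refused` at this layer means nothing was listening on the target port when the client dialed, which in this CI run suggests the server-side JVMs were down (or not yet up). The failure mode is easy to reproduce in isolation; the helper below is a hedged stand-in, with port 1 chosen only because it is almost never bound:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Minimal reproduction of the failure mode above: dialing a port with no
// listener yields "java.net.ConnectException: Connection refused".
// The host/port here are illustrative, not taken from the test.
public class ConnectRefusedDemo {
    static IOException tryConnect(String host, int port) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), 2000);
            return null; // connected: something was listening after all
        } catch (IOException e) {
            return e;    // typically java.net.ConnectException: Connection refused
        }
    }

    public static void main(String[] args) {
        IOException e = tryConnect("127.0.0.1", 1);
        System.out.println(e == null ? "connected" : e.toString());
    }
}
```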
[jira] [Updated] (GEODE-9845) CI failure: Multiple tests in OutOfMemoryDUnitTest failed with ConnectException
[ https://issues.apache.org/jira/browse/GEODE-9845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kamilla Aslami updated GEODE-9845:
----------------------------------
    Summary: CI failure: Multiple tests in OutOfMemoryDUnitTest failed with ConnectException  (was: Multiple tests in OutOfMemoryDUnitTest failed with ConnectException)
[jira] [Commented] (GEODE-9845) Multiple tests in OutOfMemoryDUnitTest failed with ConnectException
[ https://issues.apache.org/jira/browse/GEODE-9845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447722#comment-17447722 ]

Geode Integration commented on GEODE-9845:
------------------------------------------

Seen in [distributed-test-openjdk8 #174|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/174] ... see [test results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-results/distributedTest/1637433540/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-artifacts/1637433540/distributedtestfiles-openjdk8-1.15.0-build.0682.tgz].
[jira] [Created] (GEODE-9845) Multiple tests in OutOfMemoryDUnitTest failed with ConnectException
Kamilla Aslami created GEODE-9845:
-------------------------------------

             Summary: Multiple tests in OutOfMemoryDUnitTest failed with ConnectException
                 Key: GEODE-9845
                 URL: https://issues.apache.org/jira/browse/GEODE-9845
             Project: Geode
          Issue Type: Bug
          Components: redis
    Affects Versions: 1.15.0
            Reporter: Kamilla Aslami


4 tests in OutOfMemoryDUnitTest failed with `java.net.ConnectException: Connection refused`.

{noformat}
OutOfMemoryDUnitTest > shouldAllowDeleteOperations_afterThresholdReached FAILED
    java.lang.AssertionError:
    Expecting throwable message:
      "No more cluster attempts left."
    to contain:
      "OOM command not allowed"
    but did not.
    Throwable that failed the check:
    redis.clients.jedis.exceptions.JedisClusterMaxAttemptsException: No more cluster attempts left.
        at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:156)
        at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:45)
        at redis.clients.jedis.JedisCluster.set(JedisCluster.java:293)
        at org.apache.geode.redis.OutOfMemoryDUnitTest.setRedisKeyAndValue(OutOfMemoryDUnitTest.java:228)
        at org.apache.geode.redis.OutOfMemoryDUnitTest.lambda$addMultipleKeys$5(OutOfMemoryDUnitTest.java:212)
        at org.assertj.core.api.ThrowableAssert.catchThrowable(ThrowableAssert.java:62)
        at org.assertj.core.api.AssertionsForClassTypes.catchThrowable(AssertionsForClassTypes.java:877)
        at org.apache.geode.redis.OutOfMemoryDUnitTest.addMultipleKeys(OutOfMemoryDUnitTest.java:210)
        at org.apache.geode.redis.OutOfMemoryDUnitTest.fillMemory(OutOfMemoryDUnitTest.java:201)
        at org.apache.geode.redis.OutOfMemoryDUnitTest.shouldAllowDeleteOperations_afterThresholdReached(OutOfMemoryDUnitTest.java:166)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at org.apache.geode.test.junit.rules.serializable.SerializableExternalResource$1.evaluate(SerializableExternalResource.java:38)
        at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
        at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
        at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
        at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at org.apache.geode.test.dunit.rules.ClusterStartupRule$1.evaluate(ClusterStartupRule.java:138)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
        at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
        at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
        at org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.Iterator.forEachRemaining(Iterator.java:116)
        at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
{noformat}
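The assertion that failed above catches the throwable from a Redis SET and checks that its message mentions the expected OOM rejection; instead, the client exhausted its cluster retries without ever reaching a server. The shape of that check can be sketched without AssertJ (the substring helper below stands in for `hasMessageContaining` and is not the test's actual code):

```java
// Sketch of the failing assertion pattern: capture the throwable from an
// operation and test its message for an expected substring. The exceptions
// thrown below simulate the two outcomes seen in the CI run.
public class OomAssertionSketch {
    // Stand-in for AssertJ's catchThrowable: run, return what was thrown (or null).
    static Throwable catchThrowable(Runnable r) {
        try { r.run(); return null; } catch (Throwable t) { return t; }
    }

    static boolean messageContains(Throwable t, String expected) {
        return t != null && t.getMessage() != null && t.getMessage().contains(expected);
    }

    public static void main(String[] args) {
        // What the test wanted: the server rejects the write once memory is critical.
        Throwable expected = catchThrowable(() -> {
            throw new RuntimeException("OOM command not allowed when used memory > 'maxmemory'");
        });
        System.out.println(messageContains(expected, "OOM command not allowed"));

        // What actually happened: the client never reached a server at all.
        Throwable actual = catchThrowable(() -> {
            throw new RuntimeException("No more cluster attempts left.");
        });
        System.out.println(messageContains(actual, "OOM command not allowed"));
    }
}
```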
[jira] [Commented] (GEODE-9844) CI failure: RebalanceCommandDUnitTest.testWithTimeOut failed with AssertionError
[ https://issues.apache.org/jira/browse/GEODE-9844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447704#comment-17447704 ]

Geode Integration commented on GEODE-9844:
------------------------------------------

Seen in [distributed-test-openjdk8 #114|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/114] ... see [test results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-results/distributedTest/1637385637/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-artifacts/1637385637/distributedtestfiles-openjdk8-1.15.0-build.0682.tgz].
[jira] [Created] (GEODE-9844) CI failure: RebalanceCommandDUnitTest.testWithTimeOut failed with AssertionError
Kamilla Aslami created GEODE-9844:
-------------------------------------

             Summary: CI failure: RebalanceCommandDUnitTest.testWithTimeOut failed with AssertionError
                 Key: GEODE-9844
                 URL: https://issues.apache.org/jira/browse/GEODE-9844
             Project: Geode
          Issue Type: Bug
          Components: gfsh
    Affects Versions: 1.15.0
            Reporter: Kamilla Aslami


{noformat}
RebalanceCommandDUnitTest > testWithTimeOut FAILED
    java.lang.AssertionError:
    Expecting actual:
      7
    to be less than or equal to:
      1
        at org.apache.geode.management.internal.cli.commands.RebalanceCommandDUnitTest.assertRegionBalanced(RebalanceCommandDUnitTest.java:288)
        at org.apache.geode.management.internal.cli.commands.RebalanceCommandDUnitTest.testWithTimeOut(RebalanceCommandDUnitTest.java:133)
{noformat}
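The assertion compares a measured spread of 7 against a permitted spread of 1: after a rebalance, each member of a partitioned region should hold nearly the same number of buckets. A hedged sketch of that kind of balance check; the metric (max minus min bucket count per member) and the method name are assumptions, not the actual code at RebalanceCommandDUnitTest.java:288:

```java
import java.util.Arrays;

// Sketch of a bucket-balance assertion like the one that failed above:
// after rebalancing, the gap between the most- and least-loaded members'
// bucket counts should be at most 1. The exact metric is an assumption.
public class BalanceCheck {
    static int spread(int[] bucketCountsPerMember) {
        int max = Arrays.stream(bucketCountsPerMember).max().getAsInt();
        int min = Arrays.stream(bucketCountsPerMember).min().getAsInt();
        return max - min;
    }

    static void assertRegionBalanced(int[] bucketCountsPerMember) {
        int actual = spread(bucketCountsPerMember);
        if (actual > 1) {
            throw new AssertionError(
                "Expecting actual: " + actual + " to be less than or equal to: 1");
        }
    }

    public static void main(String[] args) {
        assertRegionBalanced(new int[] {38, 37, 38}); // balanced: spread of 1 passes
        try {
            assertRegionBalanced(new int[] {45, 38, 30}); // spread of 7, like the CI run
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```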
[jira] [Assigned] (GEODE-9782) improve package organization of geode-for-redis
[ https://issues.apache.org/jira/browse/GEODE-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Donal Evans reassigned GEODE-9782:
----------------------------------

    Assignee: Donal Evans

> improve package organization of geode-for-redis
> -----------------------------------------------
>
>                 Key: GEODE-9782
>                 URL: https://issues.apache.org/jira/browse/GEODE-9782
>             Project: Geode
>          Issue Type: Improvement
>          Components: redis
>    Affects Versions: 1.15.0
>            Reporter: Darrel Schneider
>            Assignee: Donal Evans
>            Priority: Major
>
> It would be nice to improve how the internals of geode-for-redis are packaged before it is released in 1.15. Try to do this when others are not actively working on these classes, since it could cause a bunch of conflicts. Be aware that a few of the internals may have dependencies outside of geode, and those will also need to be updated. Make sure to move corresponding tests into the same package. Here are some ideas:
> # move the collections package into the data package
> # move the delta package into the data package
> # move all the Stripe classes in the services package into a new services.locking package
> # move RegionProvider into services
> # move PassiveExpirationManager into services
> # move RedisSanctionedSerializablesService into services
> # move SlotAdvisor into the cluster package
> # move the cluster package into the services package (or leave it as is; also consider moving pubsub and statics into services. The "services" package is so generic that lots of things could be put into it, or we could get rid of it)
> # create a new package named "commands"
> # move Command, RedisCommandSupportLevel, and RedisCommandType into commands
> # move parameters into commands
> # move executor into commands
[jira] [Commented] (GEODE-9843) CI failure: DistributedSystemMXBeanWithAlertsDistributedTest.managerMissesAnyAlertsBeforeItStarts failed with TooFewActualInvocations
[ https://issues.apache.org/jira/browse/GEODE-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447699#comment-17447699 ]

Geode Integration commented on GEODE-9843:
------------------------------------------

Seen in [distributed-test-openjdk8 #139|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/139] ... see [test results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-results/distributedTest/1637402577/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-artifacts/1637402577/distributedtestfiles-openjdk8-1.15.0-build.0682.tgz].
[jira] [Created] (GEODE-9843) CI failure: DistributedSystemMXBeanWithAlertsDistributedTest.managerMissesAnyAlertsBeforeItStarts failed with TooFewActualInvocations
Kamilla Aslami created GEODE-9843: - Summary: CI failure: DistributedSystemMXBeanWithAlertsDistributedTest.managerMissesAnyAlertsBeforeItStarts failed with TooFewActualInvocations Key: GEODE-9843 URL: https://issues.apache.org/jira/browse/GEODE-9843 Project: Geode Issue Type: Bug Components: core, management Affects Versions: 1.15.0 Reporter: Kamilla Aslami {noformat} DistributedSystemMXBeanWithAlertsDistributedTest > managerMissesAnyAlertsBeforeItStarts FAILED org.apache.geode.test.dunit.RMIException: While invoking org.apache.geode.management.DistributedSystemMXBeanWithAlertsDistributedTest$$Lambda$301/390135366.run in VM 0 running on Host heavy-lifter-993df0f4-3655-560f-82a7-0c09d04efdd9.c.apachegeode-ci.internal with 4 VMs at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:631) at org.apache.geode.test.dunit.VM.invoke(VM.java:448) at org.apache.geode.management.DistributedSystemMXBeanWithAlertsDistributedTest.managerMissesAnyAlertsBeforeItStarts(DistributedSystemMXBeanWithAlertsDistributedTest.java:379) Caused by: org.mockito.exceptions.verification.TooFewActualInvocations: notificationListener.handleNotification( , isNull() ); Wanted 3 times: -> at org.apache.geode.management.DistributedSystemMXBeanWithAlertsDistributedTest.captureAllNotifications(DistributedSystemMXBeanWithAlertsDistributedTest.java:439) But was 2 times: -> at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor$ListenerWrapper.handleNotification(DefaultMBeanServerInterceptor.java:1754) -> at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor$ListenerWrapper.handleNotification(DefaultMBeanServerInterceptor.java:1754) at org.apache.geode.management.DistributedSystemMXBeanWithAlertsDistributedTest.captureAllNotifications(DistributedSystemMXBeanWithAlertsDistributedTest.java:439) at org.apache.geode.management.DistributedSystemMXBeanWithAlertsDistributedTest.captureAllAlerts(DistributedSystemMXBeanWithAlertsDistributedTest.java:451) at 
org.apache.geode.management.DistributedSystemMXBeanWithAlertsDistributedTest.lambda$managerMissesAnyAlertsBeforeItStarts$bb17a952$6(DistributedSystemMXBeanWithAlertsDistributedTest.java:380) {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (GEODE-9842) CI failure: PartitionedRegionSingleHopDUnitTest.testMetadataContents failed with AssertionFailedError
[ https://issues.apache.org/jira/browse/GEODE-9842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447681#comment-17447681 ] Geode Integration commented on GEODE-9842: -- Seen in [distributed-test-openjdk8 #170|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/170] ... see [test results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-results/distributedTest/1637426775/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-artifacts/1637426775/distributedtestfiles-openjdk8-1.15.0-build.0682.tgz]. > CI failure: PartitionedRegionSingleHopDUnitTest.testMetadataContents failed > with AssertionFailedError > - > > Key: GEODE-9842 > URL: https://issues.apache.org/jira/browse/GEODE-9842 > Project: Geode > Issue Type: Bug > Components: client/server >Affects Versions: 1.15.0 >Reporter: Kamilla Aslami >Priority: Major > > {noformat} > PartitionedRegionSingleHopDUnitTest > testMetadataContents FAILED > org.opentest4j.AssertionFailedError: > Expecting value to be false but was true > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at > org.apache.geode.internal.cache.PartitionedRegionSingleHopDUnitTest.testMetadataContents(PartitionedRegionSingleHopDUnitTest.java:272) > {noformat} > This issue might be related to GEODE-9617. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (GEODE-9842) CI failure: PartitionedRegionSingleHopDUnitTest.testMetadataContents failed with AssertionFailedError
Kamilla Aslami created GEODE-9842: - Summary: CI failure: PartitionedRegionSingleHopDUnitTest.testMetadataContents failed with AssertionFailedError Key: GEODE-9842 URL: https://issues.apache.org/jira/browse/GEODE-9842 Project: Geode Issue Type: Bug Components: client/server Affects Versions: 1.15.0 Reporter: Kamilla Aslami {noformat} PartitionedRegionSingleHopDUnitTest > testMetadataContents FAILED org.opentest4j.AssertionFailedError: Expecting value to be false but was true at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at org.apache.geode.internal.cache.PartitionedRegionSingleHopDUnitTest.testMetadataContents(PartitionedRegionSingleHopDUnitTest.java:272) {noformat} This issue might be related to GEODE-9617. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (GEODE-9737) CI failure in TomcatSessionBackwardsCompatibilityTomcat7079WithOldModulesMixedWithCurrentCanDoPutFromCurrentModuleTest
[ https://issues.apache.org/jira/browse/GEODE-9737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wayne updated GEODE-9737: - Labels: release-blocker (was: ) > CI failure in > TomcatSessionBackwardsCompatibilityTomcat7079WithOldModulesMixedWithCurrentCanDoPutFromCurrentModuleTest > -- > > Key: GEODE-9737 > URL: https://issues.apache.org/jira/browse/GEODE-9737 > Project: Geode > Issue Type: Bug > Components: http session >Affects Versions: 1.15.0 >Reporter: Kamilla Aslami >Assignee: Benjamin P Ross >Priority: Major > Labels: release-blocker > Attachments: gemfire.log > > > {noformat} > TomcatSessionBackwardsCompatibilityTomcat7079WithOldModulesMixedWithCurrentCanDoPutFromCurrentModuleTest > > test[0] FAILED > java.lang.RuntimeException: Something very bad happened when trying to > start container > TOMCAT7_client-server_test0_1_dd13a1a6-effb-4430-8ccd-ee6c9142938c_ > at > org.apache.geode.session.tests.ContainerManager.startContainer(ContainerManager.java:82) > at > org.apache.geode.session.tests.ContainerManager.startContainers(ContainerManager.java:93) > at > org.apache.geode.session.tests.ContainerManager.startAllInactiveContainers(ContainerManager.java:101) > at > org.apache.geode.session.tests.TomcatSessionBackwardsCompatibilityTestBase.doPutAndGetSessionOnAllClients(TomcatSessionBackwardsCompatibilityTestBase.java:187) > at > org.apache.geode.session.tests.TomcatSessionBackwardsCompatibilityTomcat7079WithOldModulesMixedWithCurrentCanDoPutFromCurrentModuleTest.test(TomcatSessionBackwardsCompatibilityTomcat7079WithOldModulesMixedWithCurrentCanDoPutFromCurrentModuleTest.java:36) > Caused by: > java.lang.RuntimeException: Something very bad happened to this > container when starting. Check the cargo_logs folder for container logs. > at > org.apache.geode.session.tests.ServerContainer.start(ServerContainer.java:220) > at > org.apache.geode.session.tests.ContainerManager.startContainer(ContainerManager.java:79) > ... 
4 more > Caused by: > org.codehaus.cargo.container.ContainerException: Deployable > [http://localhost:26322/cargocpc/index.html] failed to finish deploying > within the timeout period [12]. The Deployable state is thus unknown. > at > org.codehaus.cargo.container.spi.deployer.DeployerWatchdog.watch(DeployerWatchdog.java:111) > at > org.codehaus.cargo.container.spi.AbstractLocalContainer.waitForCompletion(AbstractLocalContainer.java:387) > at > org.codehaus.cargo.container.spi.AbstractLocalContainer.start(AbstractLocalContainer.java:234) > at > org.apache.geode.session.tests.ServerContainer.start(ServerContainer.java:218) > ... 5 more > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (GEODE-9838) Log key info for deserilation issue while index update
[ https://issues.apache.org/jira/browse/GEODE-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447672#comment-17447672 ] ASF subversion and git services commented on GEODE-9838: Commit 313fb24631ab16157452fc540b70d18a7aa1b10b in geode's branch refs/heads/feature/GEODE-9838 from zhouxh [ https://gitbox.apache.org/repos/asf?p=geode.git;h=313fb24 ] GEODE-9838: Log key info for deserialization issue while index update
> Log key info for deserilation issue while index update
> ---
> Key: GEODE-9838
> URL: https://issues.apache.org/jira/browse/GEODE-9838
> Project: Geode
> Issue Type: Improvement
> Components: querying
> Affects Versions: 1.15.0
> Reporter: Anilkumar Gingade
> Assignee: Xiaojian Zhou
> Priority: Major
> Labels: GeodeOperationAPI, pull-request-available
>
> When there is an issue in an index update (maintenance), the index is marked as invalid and a warning is logged:
> [warn 2021/11/11 07:39:28.215 CST pazrslsrv004 Processor 963> tid=0x124ecf] Updating the Index patientMemberIdentifier failed. The index is corrupted and marked as invalid. org.apache.geode.cache.query.internal.index.IMQException
> Adding "key" information in the log helps in diagnosing the failure and in adding or removing the entry in question.
> Code path IndexManager.java:
> void addIndexMapping(RegionEntry entry, IndexProtocol index) {
>   try {
>     index.addIndexMapping(entry);
>   } catch (Exception exception) {
>     index.markValid(false);
>     setPRIndexAsInvalid((AbstractIndex) index);
>     logger.warn(String.format(
>         "Updating the Index %s failed. The index is corrupted and marked as invalid.",
>         ((AbstractIndex) index).indexName), exception);
>   }
> }
> void removeIndexMapping(RegionEntry entry, IndexProtocol index, int opCode) {
>   try {
>     index.removeIndexMapping(entry, opCode);
>   } catch (Exception exception) {
>     index.markValid(false);
>     setPRIndexAsInvalid((AbstractIndex) index);
>     logger.warn(String.format(
>         "Updating the Index %s failed. The index is corrupted and marked as invalid.",
>         ((AbstractIndex) index).indexName), exception);
>   }
> }
-- This message was sent by Atlassian Jira (v8.20.1#820001)
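What the ticket proposes (adding the failing entry's key to the warning) can be sketched in plain, self-contained Java. `RegionEntry` below is a hypothetical stand-in, not Geode's actual interface, and `buildWarning` is an illustrative helper, not the method the eventual patch adds:

```java
// Minimal sketch, under assumed names: include the entry's key in the
// "index corrupted" warning so the bad entry can be located and fixed.
public class IndexWarningSketch {

    // Hypothetical stand-in for Geode's RegionEntry.
    public interface RegionEntry {
        Object getKey();
    }

    // Mirrors the String.format call in IndexManager.addIndexMapping /
    // removeIndexMapping, with the key appended to the message.
    public static String buildWarning(String indexName, RegionEntry entry) {
        return String.format(
            "Updating the Index %s failed on key %s. The index is corrupted and marked as invalid.",
            indexName, entry.getKey());
    }

    public static void main(String[] args) {
        RegionEntry entry = () -> "patient-42"; // hypothetical key value
        System.out.println(buildWarning("patientMemberIdentifier", entry));
    }
}
```

With the key in the log line, an operator can remove or re-put the offending entry instead of only knowing that some entry broke the index.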
[jira] [Updated] (GEODE-9838) Log key info for deserilation issue while index update
[ https://issues.apache.org/jira/browse/GEODE-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated GEODE-9838: -- Labels: GeodeOperationAPI pull-request-available (was: GeodeOperationAPI)
> Log key info for deserilation issue while index update
> ---
> Key: GEODE-9838
> URL: https://issues.apache.org/jira/browse/GEODE-9838
> Project: Geode
> Issue Type: Improvement
> Components: querying
> Affects Versions: 1.15.0
> Reporter: Anilkumar Gingade
> Assignee: Xiaojian Zhou
> Priority: Major
> Labels: GeodeOperationAPI, pull-request-available
>
> When there is an issue in an index update (maintenance), the index is marked as invalid and a warning is logged:
> [warn 2021/11/11 07:39:28.215 CST pazrslsrv004 Processor 963> tid=0x124ecf] Updating the Index patientMemberIdentifier failed. The index is corrupted and marked as invalid. org.apache.geode.cache.query.internal.index.IMQException
> Adding "key" information in the log helps in diagnosing the failure and in adding or removing the entry in question.
> Code path IndexManager.java:
> void addIndexMapping(RegionEntry entry, IndexProtocol index) {
>   try {
>     index.addIndexMapping(entry);
>   } catch (Exception exception) {
>     index.markValid(false);
>     setPRIndexAsInvalid((AbstractIndex) index);
>     logger.warn(String.format(
>         "Updating the Index %s failed. The index is corrupted and marked as invalid.",
>         ((AbstractIndex) index).indexName), exception);
>   }
> }
> void removeIndexMapping(RegionEntry entry, IndexProtocol index, int opCode) {
>   try {
>     index.removeIndexMapping(entry, opCode);
>   } catch (Exception exception) {
>     index.markValid(false);
>     setPRIndexAsInvalid((AbstractIndex) index);
>     logger.warn(String.format(
>         "Updating the Index %s failed. The index is corrupted and marked as invalid.",
>         ((AbstractIndex) index).indexName), exception);
>   }
> }
-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (GEODE-9841) Move internal packages to conform to new internal package structure
[ https://issues.apache.org/jira/browse/GEODE-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated GEODE-9841: -- Labels: pull-request-available (was: ) > Move internal packages to conform to new internal package structure > --- > > Key: GEODE-9841 > URL: https://issues.apache.org/jira/browse/GEODE-9841 > Project: Geode > Issue Type: Improvement >Reporter: Udo Kohlmeyer >Assignee: Udo Kohlmeyer >Priority: Major > Labels: pull-request-available > > Both ClassLoader and Deployment are in the `internal.classloader` and > `internal.deployment` package structure, which would make more sense (and > conform to newer thinking) to be `classloader.internal` and > `deployment.internal`. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (GEODE-9841) Move internal packages to conform to new internal package structure
[ https://issues.apache.org/jira/browse/GEODE-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udo Kohlmeyer reassigned GEODE-9841: Assignee: Udo Kohlmeyer > Move internal packages to conform to new internal package structure > --- > > Key: GEODE-9841 > URL: https://issues.apache.org/jira/browse/GEODE-9841 > Project: Geode > Issue Type: Improvement >Reporter: Udo Kohlmeyer >Assignee: Udo Kohlmeyer >Priority: Major > > Both ClassLoader and Deployment are in the `internal.classloader` and > `internal.deployment` package structure, which would make more sense (and > conform to newer thinking) to be `classloader.internal` and > `deployment.internal`. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (GEODE-9841) Move internal packages to conform to new internal package structure
Udo Kohlmeyer created GEODE-9841: Summary: Move internal packages to conform to new internal package structure Key: GEODE-9841 URL: https://issues.apache.org/jira/browse/GEODE-9841 Project: Geode Issue Type: Improvement Reporter: Udo Kohlmeyer Both ClassLoader and Deployment are in the `internal.classloader` and `internal.deployment` package structure, which would make more sense (and conform to newer thinking) to be `classloader.internal` and `deployment.internal`. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (GEODE-8644) SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() intermittently fails when queues drain too slowly
[ https://issues.apache.org/jira/browse/GEODE-8644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447642#comment-17447642 ] Geode Integration commented on GEODE-8644: -- Seen in [distributed-test-openjdk8 #21|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/distributed-test-openjdk8/builds/21] ... see [test results|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0685/test-results/distributedTest/1637612413/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0685/test-artifacts/1637612413/distributedtestfiles-openjdk8-1.15.0-build.0685.tgz]. > SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() > intermittently fails when queues drain too slowly > --- > > Key: GEODE-8644 > URL: https://issues.apache.org/jira/browse/GEODE-8644 > Project: Geode > Issue Type: Bug >Affects Versions: 1.15.0 >Reporter: Benjamin P Ross >Assignee: Mark Hanson >Priority: Major > Labels: GeodeOperationAPI, needsTriage, pull-request-available > > Currently the test > SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() > relies on a 2 second delay to allow for queues to finish draining after > finishing the put operation. If queues take longer than 2 seconds to drain > the test will fail. We should change the test to wait for the queues to be > empty with a long timeout in case the queues never fully drain. -- This message was sent by Atlassian Jira (v8.20.1#820001)
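The fix the ticket describes, replacing a fixed 2-second sleep with "wait until the queue is empty, up to a long timeout", can be sketched in plain Java. Geode's tests typically use Awaitility for this; the poll loop below is a dependency-free illustration with a `java.util` queue standing in for the gateway sender queue:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch: poll until the queue drains (or a generous deadline passes)
// instead of sleeping a fixed 2 seconds and hoping the drain finished.
public class AwaitDrainSketch {

    public static boolean awaitEmpty(Queue<?> queue, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (queue.isEmpty()) {
                return true; // drained: no need to wait out the full timeout
            }
            Thread.sleep(50); // short poll interval, not one big delay
        }
        return queue.isEmpty(); // false only if the queue never fully drained
    }

    public static void main(String[] args) throws InterruptedException {
        Queue<Integer> queue = new ConcurrentLinkedQueue<>();
        queue.add(1);
        // Simulate a slow drain finishing after ~100 ms.
        new Thread(() -> {
            try { Thread.sleep(100); } catch (InterruptedException ignored) { }
            queue.poll();
        }).start();
        System.out.println(awaitEmpty(queue, 5000)); // prints true once the queue drains
    }
}
```

The test then fails only when the queue genuinely never drains within the long timeout, rather than whenever draining takes a little more than 2 seconds.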
[jira] [Commented] (GEODE-7739) JMX managers may fail to federate mbeans for other members
[ https://issues.apache.org/jira/browse/GEODE-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447641#comment-17447641 ] Geode Integration commented on GEODE-7739: -- Seen in [distributed-test-openjdk8 #22|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/distributed-test-openjdk8/builds/22] ... see [test results|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0686/test-results/distributedTest/1637613716/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0686/test-artifacts/1637613716/distributedtestfiles-openjdk8-1.15.0-build.0686.tgz]. > JMX managers may fail to federate mbeans for other members > -- > > Key: GEODE-7739 > URL: https://issues.apache.org/jira/browse/GEODE-7739 > Project: Geode > Issue Type: Bug > Components: jmx >Reporter: Kirk Lund >Assignee: Kirk Lund >Priority: Major > Labels: GeodeOperationAPI, pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > JMX Manager may fail to federate one or more MXBeans for other members > because of a race condition during startup. When ManagementCacheListener is > first constructed, it is in a state that will ignore all callbacks because > the field readyForEvents is false. > > Debugging with JMXMBeanReconnectDUnitTest revealed this bug. > The test starts two locators with jmx manager configured and started. > Locator1 always has all of locator2's mbeans, but locator2 is intermittently > missing the personal mbeans of locator1. > I think this is caused by some sort of race condition in the code that > creates the monitoring regions for other members in locator2. > It's possible that the jmx manager that hits this bug might fail to have > mbeans for servers as well as other locators but I haven't seen a test case > for this scenario. 
> The exposure of this bug means that a user running more than one locator > might have a locator that is missing one or more mbeans for the cluster. > > Studying the JMX code also reveals the existence of *GEODE-8012*. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (GEODE-9764) Request-Response Messaging Should Time Out
[ https://issues.apache.org/jira/browse/GEODE-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Burcham updated GEODE-9764: Description: There is a weakness in the P2P/DirectChannel messaging architecture, in that it never gives up on a request (in a request-response scenario). As a result a bug (software fault) anywhere from the point where the requesting thread hands off the {{DistributionMessage}} e.g. to {{{}ClusterDistributionManager.putOutgoing(DistributionMessage){}}}, to the point where that request is ultimately fulfilled on a (one) receiver, can result in a hang (of some task on the send side, which is waiting for a response). Well it's a little worse than that because any code in the return (response) path can also cause disruption of the (response) flow, thereby leaving the requesting task hanging. If the code in the request path (primarily in P2P messaging) and the code in the response path (P2P messaging and TBD higher-level code) were perfect this might not be a problem. But there is a fair amount of code there and we have some evidence that it is currently not perfect, nor do we expect it to become perfect and stay that way. This is a sketch of the situation. The left-most column is the request path or the originating member. The middle column is the server-side of the request-response path. And the right-most column is the response path back on the originating member. !image-2021-11-22-12-14-59-117.png! You can see that Geode product code, JDK code, and hardware components all lie in the end-to-end request-response messaging path. That being the case it seems prudent to institute response timeouts so that bugs of this sort (which disrupt request-response message flow) don't result in hangs. It's TBD if we want to go a step further and institute retries. The latter would entail introducing duplicate-suppression (conflation) in P2P messaging. 
We might also add exponential backoff (open-loop) or back-pressure (closed-loop) to prevent a flood of retries when the system is at or near the point of thrashing. But even without retries, a configurable timeout might have good ROI as a first step. This would entail: * adding a configuration parameter to specify the timeout value * changing ReplyProcessor21 and others TBD to "give up" after the timeout has elapsed * changing higher-level code dependent on request-reply messaging so it properly handles the situations where we might have to "give up" This issue affects all versions of Geode. h2. Counterpoint Not everybody thinks timeouts are a good idea. This section has the highlights. h3. Timeouts Will Result in Data-Inconsistency If we leave most of the surrounding code as-is and introduce timeouts, then we risk data inconsistency. TODO: describe in detail why data inconsistency is _inherent_ in using timeouts. h3. Narrow The Vulnerability Cross-Section Without Timeouts The proposal (above) seeks to solve the problem using end-to-end timeouts since any component in the path can, in general, have faults. An alternative approach would be to assume that _some_ of the components can be made "good enough" (without adding timeouts) and that those "good enough" components can protect themselves (and user applications) from faults in the remaining components. With this approach, the Cluster Distribution Manager and P2P / TCP Conduit / Direct Channel framework would be enhanced so that it was less susceptible to bugs in: * the 341 Distribution Message classes * the 68 Reply Message classes * the 95 Reply Processor classes The question is: what form would that enhancement take, and also, would it be sufficient to overcome faults in remaining components (JDK, and the host+network layers). h2. Alternatives Discussed These alternatives have been discussed, to varying degrees. 
Baseline: no timeouts; members waiting for replies do "the right thing" if recipient departs view Give-up-after-timeout Retry-after-timeout-and-eventually-give-up Retry-after-forcing-receiver-out-of-view was: There is a weakness in the P2P/DirectChannel messaging architecture, in that it never gives up on a request (in a request-response scenario). As a result a bug (software fault) anywhere from the point where the requesting thread hands off the {{DistributionMessage}} e.g. to {{{}ClusterDistributionManager.putOutgoing(DistributionMessage){}}}, to the point where that request is ultimately fulfilled on a (one) receiver, can result in a hang (of some task on the send side, which is waiting for a response). Well it's a little worse than that because any code in the return (response) path can also cause disruption of the (response) flow, thereby leaving the requesting task hanging. If the code in the request path (primarily in P2P messaging) and the code in the response path (P2P messaging and TBD higher-level code) were perfect this might
[jira] [Updated] (GEODE-9764) Request-Response Messaging Should Time Out
[ https://issues.apache.org/jira/browse/GEODE-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Burcham updated GEODE-9764: Attachment: image-2021-11-22-12-14-59-117.png > Request-Response Messaging Should Time Out > -- > > Key: GEODE-9764 > URL: https://issues.apache.org/jira/browse/GEODE-9764 > Project: Geode > Issue Type: Improvement > Components: messaging >Reporter: Bill Burcham >Assignee: Bill Burcham >Priority: Major > Attachments: image-2021-11-22-11-52-23-586.png, > image-2021-11-22-12-14-59-117.png > > > There is a weakness in the P2P/DirectChannel messaging architecture, in that > it never gives up on a request (in a request-response scenario). As a result > a bug (software fault) anywhere from the point where the requesting thread > hands off the {{DistributionMessage}} e.g. to > {{{}ClusterDistributionManager.putOutgoing(DistributionMessage){}}}, to the > point where that request is ultimately fulfilled on a (one) receiver, can > result in a hang (of some task on the send side, which is waiting for a > response). > Well it's a little worse than that because any code in the return (response) > path can also cause disruption of the (response) flow, thereby leaving the > requesting task hanging. > If the code in the request path (primarily in P2P messaging) and the code in > the response path (P2P messaging and TBD higher-level code) were perfect this > might not be a problem. But there is a fair amount of code there and we have > some evidence that it is currently not perfect, nor do we expect it to become > perfect and stay that way. That being the case it seems prudent to institute > response timeouts so that bugs of this sort (which disrupt request-response > message flow) don't result in hangs. > It's TBD if we want to go a step further and institute retries. The latter > would entail introducing duplicate-suppression (conflation) in P2P messaging. 
> We might also add exponential backoff (open-loop) or back-pressure > (closed-loop) to prevent a flood of retries when the system is at or near the > point of thrashing. > But even without retries, a configurable timeout might have good ROI as a > first step. This would entail: > * adding a configuration parameter to specify the timeout value > * changing ReplyProcessor21 and others TBD to "give up" after the timeout > has elapsed > * changing higher-level code dependent on request-reply messaging so it > properly handles the situations where we might have to "give up" > This issue affects all versions of Geode. > h2. Counterpoint > Not everybody thinks timeouts are a good idea. Here are some alternative ideas: > > Make the request-response primitive better: make it so only bugs in our core > messaging framework could cause a lack of response - rather than our current > approach where a bug in a class like “RemotePutMessage” could cause a lack of > a response. -- This message was sent by Atlassian Jira (v8.20.1#820001)
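The "give up after a configurable timeout" behavior proposed for ReplyProcessor21 can be sketched, independent of Geode's actual classes, with a `CountDownLatch`. This is an illustrative shape only; the real reply processor also has to handle member departure, interruptions, and cleanup:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch (not Geode's ReplyProcessor21): a request-response wait that
// returns after a configurable timeout instead of blocking forever if
// the response never arrives.
public class ReplyWaitSketch {

    private final CountDownLatch replyLatch = new CountDownLatch(1);

    // Called from the messaging layer when the response comes in.
    public void replyReceived() {
        replyLatch.countDown();
    }

    // Returns true if the reply arrived in time, false if we gave up.
    public boolean waitForReply(long timeout, TimeUnit unit)
            throws InterruptedException {
        return replyLatch.await(timeout, unit);
    }

    public static void main(String[] args) throws InterruptedException {
        ReplyWaitSketch processor = new ReplyWaitSketch();
        // No reply ever arrives: the requesting task gives up rather than hanging.
        System.out.println(processor.waitForReply(200, TimeUnit.MILLISECONDS)); // prints false
    }
}
```

The hard part the ticket debates is not this wait, but what the caller does when `waitForReply` returns false: retrying risks duplicates without conflation, and abandoning the operation risks the data inconsistency described in the Counterpoint section.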
[jira] [Updated] (GEODE-9764) Request-Response Messaging Should Time Out
[ https://issues.apache.org/jira/browse/GEODE-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Burcham updated GEODE-9764: Description: There is a weakness in the P2P/DirectChannel messaging architecture, in that it never gives up on a request (in a request-response scenario). As a result a bug (software fault) anywhere from the point where the requesting thread hands off the {{DistributionMessage}} e.g. to {{{}ClusterDistributionManager.putOutgoing(DistributionMessage){}}}, to the point where that request is ultimately fulfilled on a (one) receiver, can result in a hang (of some task on the send side, which is waiting for a response). Well it's a little worse than that because any code in the return (response) path can also cause disruption of the (response) flow, thereby leaving the requesting task hanging. If the code in the request path (primarily in P2P messaging) and the code in the response path (P2P messaging and TBD higher-level code) were perfect this might not be a problem. But there is a fair amount of code there and we have some evidence that it is currently not perfect, nor do we expect it to become perfect and stay that way. This is a sketch of the situation. The left-most column is the request path or the originating member. The middle column is the server-side of the request-response path. And the right-most column is the response path back on the originating member. !image-2021-11-22-12-14-59-117.png! You can see that Geode product code, JDK code, and hardware components all lie in the end-to-end request-response messaging path. That being the case it seems prudent to institute response timeouts so that bugs of this sort (which disrupt request-response message flow) don't result in hangs. It's TBD if we want to go a step further and institute retries. The latter would entail introducing duplicate-suppression (conflation) in P2P messaging. 
We might also add exponential backoff (open-loop) or back-pressure (closed-loop) to prevent a flood of retries when the system is at or near the point of thrashing. But even without retries, a configurable timeout might have good ROI as a first step. This would entail: * adding a configuration parameter to specify the timeout value * changing ReplyProcessor21 and others TBD to "give up" after the timeout has elapsed * changing higher-level code dependent on request-reply messaging so it properly handles the situations where we might have to "give up" This issue affects all versions of Geode. h2. Counterpoint Not everybody thinks timeouts are a good idea. Here are some alternative ideas: The proposal (above) seeks to solve the problem using end-to-end timeouts since any component in the path can, in general, have faults. An alternative approach would be to assume that _some_ of the components can be made "good enough" (without adding timeouts) and that those "good enough" components can protect themselves (and user applications) from faults in the remaining components. With this approach, the Cluster Distribution Manager and P2P / TCP Conduit / Direct Channel framework would be enhanced so that it was less susceptible to bugs in: * the 341 Distribution Message classes * the 68 Reply Message classes * the 95 Reply Processor classes The question is: what form would that enhancement take, and also, would it be sufficient to overcome faults in remaining components (JDK, and the host+network layers). was: There is a weakness in the P2P/DirectChannel messaging architecture, in that it never gives up on a request (in a request-response scenario). As a result a bug (software fault) anywhere from the point where the requesting thread hands off the {{DistributionMessage}} e.g. 
to {{{}ClusterDistributionManager.putOutgoing(DistributionMessage){}}}, to the point where that request is ultimately fulfilled on a (one) receiver, can result in a hang (of some task on the send side, which is waiting for a response). Well it's a little worse than that because any code in the return (response) path can also cause disruption of the (response) flow, thereby leaving the requesting task hanging. If the code in the request path (primarily in P2P messaging) and the code in the response path (P2P messaging and TBD higher-level code) were perfect this might not be a problem. But there is a fair amount of code there and we have some evidence that it is currently not perfect, nor do we expect it to become perfect and stay that way. That being the case it seems prudent to institute response timeouts so that bugs of this sort (which disrupt request-response message flow) don't result in hangs. It's TBD if we want to go a step further and institute retries. The latter would entail introducing duplicate-suppression (conflation) in P2P messaging. We might also add exponential backoff (open-loop) or back-pressure (closed-loop) to
[jira] [Commented] (GEODE-7822) MemoryThresholdsOffHeapDUnitTest has failures
[ https://issues.apache.org/jira/browse/GEODE-7822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447599#comment-17447599 ] Geode Integration commented on GEODE-7822: -- Seen in [distributed-test-openjdk8 #118|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/118] ... see [test results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-results/distributedTest/1637385064/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-artifacts/1637385064/distributedtestfiles-openjdk8-1.15.0-build.0682.tgz]. > MemoryThresholdsOffHeapDUnitTest has failures > - > > Key: GEODE-7822 > URL: https://issues.apache.org/jira/browse/GEODE-7822 > Project: Geode > Issue Type: Bug > Components: tests >Reporter: Mark Hanson >Priority: Major > Labels: flaky > > These two failures were seen in mass test runs... > {noformat} > testPRLoadRejection > https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/674 > {noformat} > {noformat} > org.apache.geode.cache.management.MemoryThresholdsOffHeapDUnitTest > > testPRLoadRejection FAILED > org.apache.geode.test.dunit.RMIException: While invoking > org.apache.geode.cache.management.MemoryThresholdsOffHeapDUnitTest$31.call in > VM 2 running on Host a57bd8581b8d with 4 VMs > at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:610) > at org.apache.geode.test.dunit.VM.invoke(VM.java:462) > at > org.apache.geode.cache.management.MemoryThresholdsOffHeapDUnitTest.testPRLoadRejection(MemoryThresholdsOffHeapDUnitTest.java:1045) > Caused by: > java.lang.AssertionError > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at org.junit.Assert.assertFalse(Assert.java:74) > at > 
org.apache.geode.cache.management.MemoryThresholdsOffHeapDUnitTest$31.call(MemoryThresholdsOffHeapDUnitTest.java:1077){noformat} > =-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI > =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= > [http://files.apachegeode-ci.info/builds/apache-mass-test-run-main/1.12.0-SNAPSHOT.0005/test-results/distributedTest/1582515952/] > =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= > Test report artifacts from this job are available at: > [http://files.apachegeode-ci.info/builds/apache-mass-test-run-main/1.12.0-SNAPSHOT.0005/test-artifacts/1582515952/distributedtestfiles-OpenJDK8-1.12.0-SNAPSHOT.0005.tgz] > {noformat} > testDRLoadRejection > https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/742 > {noformat} > {noformat} > org.apache.geode.cache.management.MemoryThresholdsOffHeapDUnitTest > > testDRLoadRejection FAILED > org.apache.geode.test.dunit.RMIException: While invoking > org.apache.geode.cache.management.MemoryThresholdsOffHeapDUnitTest$18.call in > VM 2 running on Host b2c673017cde with 4 VMs > at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:610) > at org.apache.geode.test.dunit.VM.invoke(VM.java:462) > at > org.apache.geode.cache.management.MemoryThresholdsOffHeapDUnitTest.testDRLoadRejection(MemoryThresholdsOffHeapDUnitTest.java:667) > Caused by: >java.lang.AssertionError > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at org.junit.Assert.assertFalse(Assert.java:74) > at > org.apache.geode.cache.management.MemoryThresholdsOffHeapDUnitTest$18.call(MemoryThresholdsOffHeapDUnitTest.java:673) > {noformat} > =-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI > =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= > http://files.apachegeode-ci.info/builds/apache-mass-test-run-main/1.12.0-SNAPSHOT.0005/test-results/distributedTest/1582626992/ > 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= > Test report artifacts from this job are available at: > http://files.apachegeode-ci.info/builds/apache-mass-test-run-main/1.12.0-SNAPSHOT.0005/test-artifacts/1582626992/distributedtestfiles-OpenJDK8-1.12.0-SNAPSHOT.0005.tgz -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (GEODE-8644) SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() intermittently fails when queues drain too slowly
[ https://issues.apache.org/jira/browse/GEODE-8644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447595#comment-17447595 ] Geode Integration commented on GEODE-8644: -- Seen in [distributed-test-openjdk8 #144|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/144] ... see [test results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-results/distributedTest/1637410069/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-artifacts/1637410069/distributedtestfiles-openjdk8-1.15.0-build.0682.tgz]. > SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() > intermittently fails when queues drain too slowly > --- > > Key: GEODE-8644 > URL: https://issues.apache.org/jira/browse/GEODE-8644 > Project: Geode > Issue Type: Bug >Affects Versions: 1.15.0 >Reporter: Benjamin P Ross >Assignee: Mark Hanson >Priority: Major > Labels: GeodeOperationAPI, needsTriage, pull-request-available > > Currently the test > SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() > relies on a 2 second delay to allow for queues to finish draining after > finishing the put operation. If queues take longer than 2 seconds to drain > the test will fail. We should change the test to wait for the queues to be > empty with a long timeout in case the queues never fully drain. -- This message was sent by Atlassian Jira (v8.20.1#820001)
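The fix the ticket above proposes, waiting for the queues to empty under a long timeout instead of sleeping a fixed 2 seconds, can be sketched as a small poll loop. Geode tests typically use Awaitility for this; the dependency-free helper below and its name are illustrative only:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Sketch: replace a fixed Thread.sleep(2000) with a poll-until-true wait.
// Geode's tests typically use Awaitility; this helper is a stand-in with
// an illustrative name.
public class DrainAwait {
    public static boolean awaitTrue(BooleanSupplier condition,
                                    long timeout, TimeUnit unit)
            throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (System.nanoTime() < deadline) {
            if (condition.getAsBoolean()) {
                return true;   // condition met before the timeout
            }
            Thread.sleep(100); // poll interval
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
        queue.add("event");
        // A background "drainer" empties the queue after a delay that a
        // fixed 2-second sleep would tolerate only by luck.
        new Thread(() -> {
            try { Thread.sleep(300); } catch (InterruptedException ignored) {}
            queue.clear();
        }).start();
        // Wait for emptiness with a generous timeout instead of a fixed sleep.
        boolean drained = awaitTrue(queue::isEmpty, 30, TimeUnit.SECONDS);
        System.out.println(drained ? "queue drained" : "queue never drained");
    }
}
```

The long timeout only matters in the failure case; a healthy run returns as soon as the queue empties.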
[jira] [Commented] (GEODE-8644) SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() intermittently fails when queues drain too slowly
[ https://issues.apache.org/jira/browse/GEODE-8644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447594#comment-17447594 ] Geode Integration commented on GEODE-8644: -- Seen in [distributed-test-openjdk8 #154|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/154] ... see [test results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-results/distributedTest/1637417234/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-artifacts/1637417234/distributedtestfiles-openjdk8-1.15.0-build.0682.tgz]. > SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() > intermittently fails when queues drain too slowly > --- > > Key: GEODE-8644 > URL: https://issues.apache.org/jira/browse/GEODE-8644 > Project: Geode > Issue Type: Bug >Affects Versions: 1.15.0 >Reporter: Benjamin P Ross >Assignee: Mark Hanson >Priority: Major > Labels: GeodeOperationAPI, needsTriage, pull-request-available > > Currently the test > SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() > relies on a 2 second delay to allow for queues to finish draining after > finishing the put operation. If queues take longer than 2 seconds to drain > the test will fail. We should change the test to wait for the queues to be > empty with a long timeout in case the queues never fully drain. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (GEODE-9764) Request-Response Messaging Should Time Out
[ https://issues.apache.org/jira/browse/GEODE-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Burcham updated GEODE-9764: Attachment: image-2021-11-22-11-52-23-586.png > Request-Response Messaging Should Time Out > -- > > Key: GEODE-9764 > URL: https://issues.apache.org/jira/browse/GEODE-9764 > Project: Geode > Issue Type: Improvement > Components: messaging >Reporter: Bill Burcham >Assignee: Bill Burcham >Priority: Major > Attachments: image-2021-11-22-11-52-23-586.png > > > There is a weakness in the P2P/DirectChannel messaging architecture, in that > it never gives up on a request (in a request-response scenario). As a result > a bug (software fault) anywhere from the point where the requesting thread > hands off the {{DistributionMessage}} e.g. to > {{{}ClusterDistributionManager.putOutgoing(DistributionMessage){}}}, to the > point where that request is ultimately fulfilled on a (one) receiver, can > result in a hang (of some task on the send side, which is waiting for a > response). > Well it's a little worse than that because any code in the return (response) > path can also cause disruption of the (response) flow, thereby leaving the > requesting task hanging. > If the code in the request path (primarily in P2P messaging) and the code in > the response path (P2P messaging and TBD higher-level code) were perfect this > might not be a problem. But there is a fair amount of code there and we have > some evidence that it is currently not perfect, nor do we expect it to become > perfect and stay that way. That being the case it seems prudent to institute > response timeouts so that bugs of this sort (which disrupt request-response > message flow) don't result in hangs. > It's TBD if we want to go a step further and institute retries. The latter > would entail introducing duplicate-suppression (conflation) in P2P messaging. 
> We might also add exponential backoff (open-loop) or back-pressure > (closed-loop) to prevent a flood of retries when the system is at or near the > point of thrashing. > But even without retries, a configurable timeout might have good ROI as a > first step. This would entail: > * adding a configuration parameter to specify the timeout value > * changing ReplyProcessor21 and others TBD to "give up" after the timeout > has elapsed > * changing higher-level code dependent on request-reply messaging so it > properly handles the situations where we might have to "give up" > This issue affects all versions of Geode. > h2. Counterpoint > Not everybody thinks timeouts are a good idea. Here are some alternative ideas: > > Make the request-response primitive better: make it so only bugs in our core > messaging framework could cause a lack of response - rather than our current > approach where a bug in a class like “RemotePutMessage” could cause a lack of > a response. -- This message was sent by Atlassian Jira (v8.20.1#820001)
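The GEODE-9764 proposal boils down to bounding the reply wait: instead of blocking forever on a response that may never come, the waiter gives up after a configurable timeout. A minimal standalone sketch of that idea using a latch follows; this is not Geode's actual ReplyProcessor21, and the class name and API here are hypothetical:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Minimal sketch of a reply processor that gives up after a configurable
// timeout instead of waiting forever. The real ReplyProcessor21 is far more
// involved; ReplyWaiter is a hypothetical name for illustration.
public class ReplyWaiter {
    private final CountDownLatch replyLatch = new CountDownLatch(1);

    // Called from the P2P reader thread when the response arrives.
    public void processReply() {
        replyLatch.countDown();
    }

    // Returns true if a reply arrived before the timeout elapsed;
    // false means the caller must "give up" and clean up its state.
    public boolean waitForReply(long timeoutMillis) throws InterruptedException {
        return replyLatch.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ReplyWaiter waiter = new ReplyWaiter();
        // No reply is ever delivered: the wait times out rather than hanging.
        boolean gotReply = waiter.waitForReply(50);
        System.out.println(gotReply ? "reply received" : "timed out, giving up");
    }
}
```

The hard part, as the ticket notes, is not the bounded wait itself but teaching every caller of request-reply messaging to handle the "gave up" outcome correctly.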
[jira] [Commented] (GEODE-7739) JMX managers may fail to federate mbeans for other members
[ https://issues.apache.org/jira/browse/GEODE-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447587#comment-17447587 ] Geode Integration commented on GEODE-7739: -- Seen in [distributed-test-openjdk8 #178|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/178] ... see [test results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-results/distributedTest/1637434380/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-artifacts/1637434380/distributedtestfiles-openjdk8-1.15.0-build.0682.tgz]. > JMX managers may fail to federate mbeans for other members > -- > > Key: GEODE-7739 > URL: https://issues.apache.org/jira/browse/GEODE-7739 > Project: Geode > Issue Type: Bug > Components: jmx >Reporter: Kirk Lund >Assignee: Kirk Lund >Priority: Major > Labels: GeodeOperationAPI, pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > JMX Manager may fail to federate one or more MXBeans for other members > because of a race condition during startup. When ManagementCacheListener is > first constructed, it is in a state that will ignore all callbacks because > the field readyForEvents is false. > > Debugging with JMXMBeanReconnectDUnitTest revealed this bug. > The test starts two locators with jmx manager configured and started. > Locator1 always has all of locator2's mbeans, but locator2 is intermittently > missing the personal mbeans of locator1. > I think this is caused by some sort of race condition in the code that > creates the monitoring regions for other members in locator2. > It's possible that the jmx manager that hits this bug might fail to have > mbeans for servers as well as other locators but I haven't seen a test case > for this scenario. 
> The exposure of this bug means that a user running more than one locator > might have a locator that is missing one or more mbeans for the cluster. > > Studying the JMX code also reveals the existence of *GEODE-8012*. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (GEODE-9764) Request-Response Messaging Should Time Out
[ https://issues.apache.org/jira/browse/GEODE-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Burcham updated GEODE-9764: Description: There is a weakness in the P2P/DirectChannel messaging architecture, in that it never gives up on a request (in a request-response scenario). As a result a bug (software fault) anywhere from the point where the requesting thread hands off the {{DistributionMessage}} e.g. to {{{}ClusterDistributionManager.putOutgoing(DistributionMessage){}}}, to the point where that request is ultimately fulfilled on a (one) receiver, can result in a hang (of some task on the send side, which is waiting for a response). Well it's a little worse than that because any code in the return (response) path can also cause disruption of the (response) flow, thereby leaving the requesting task hanging. If the code in the request path (primarily in P2P messaging) and the code in the response path (P2P messaging and TBD higher-level code) were perfect this might not be a problem. But there is a fair amount of code there and we have some evidence that it is currently not perfect, nor do we expect it to become perfect and stay that way. That being the case it seems prudent to institute response timeouts so that bugs of this sort (which disrupt request-response message flow) don't result in hangs. It's TBD if we want to go a step further and institute retries. The latter would entail introducing duplicate-suppression (conflation) in P2P messaging. We might also add exponential backoff (open-loop) or back-pressure (closed-loop) to prevent a flood of retries when the system is at or near the point of thrashing. But even without retries, a configurable timeout might have good ROI as a first step. 
This would entail: * adding a configuration parameter to specify the timeout value * changing ReplyProcessor21 and others TBD to "give up" after the timeout has elapsed * changing higher-level code dependent on request-reply messaging so it properly handles the situations where we might have to "give up" This issue affects all versions of Geode. h2. Counterpoint Not everybody thinks timeouts are a good idea. Here are some alternative ideas: Make the request-response primitive better: make it so only bugs in our core messaging framework could cause a lack of response - rather than our current approach where a bug in a class like “RemotePutMessage” could cause a lack of a response. was: There is a weakness in the P2P/DirectChannel messaging architecture, in that it never gives up on a request (in a request-response scenario). As a result a bug (software fault) anywhere from the point where the requesting thread hands off the {{DistributionMessage}} e.g. to {{ClusterDistributionManager.putOutgoing(DistributionMessage)}}, to the point where that request is ultimately fulfilled on a (one) receiver, can result in a hang (of some task on the send side, which is waiting for a response). Well it's a little worse than that because any code in the return (response) path can also cause disruption of the (response) flow, thereby leaving the requesting task hanging. If the code in the request path (primarily in P2P messaging) and the code in the response path (P2P messaging and TBD higher-level code) were perfect this might not be a problem. But there is a fair amount of code there and we have some evidence that it is currently not perfect, nor do we expect it to become perfect and stay that way. That being the case it seems prudent to institute response timeouts so that bugs of this sort (which disrupt request-response message flow) don't result in hangs. It's TBD if we want to go a step further and institute retries. 
The latter would entail introducing duplicate-suppression (conflation) in P2P messaging. We might also add exponential backoff (open-loop) or back-pressure (closed-loop) to prevent a flood of retries when the system is at or near the point of thrashing. But even without retries, a configurable timeout might have good ROI as a first step. This would entail: * adding a configuration parameter to specify the timeout value * changing ReplyProcessor21 and others TBD to "give up" after the timeout has elapsed * changing higher-level code dependent on request-reply messaging so it properly handles the situations where we might have to "give up" This issue affects all versions of Geode. > Request-Response Messaging Should Time Out > -- > > Key: GEODE-9764 > URL: https://issues.apache.org/jira/browse/GEODE-9764 > Project: Geode > Issue Type: Improvement > Components: messaging >Reporter: Bill Burcham >Assignee: Bill Burcham >Priority: Major > > There is a weakness in the P2P/DirectChannel messaging architecture, in that > it never gives up on a
[jira] [Commented] (GEODE-8644) SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() intermittently fails when queues drain too slowly
[ https://issues.apache.org/jira/browse/GEODE-8644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447573#comment-17447573 ] Xiaojian Zhou commented on GEODE-8644: -- The reproduced failures are all caused by the locator being disconnected: [locator] [info 2021/11/13 08:43:12.331 UTC tid=0x46] Failed to connect to localhost/127.0.0.1:0 [locator] [warn 2021/11/13 08:43:12.331 UTC tid=0x46] Locator discovery task for locator heavy-lifter-ca6688de-b95d-5db6-9ac5-57db242f6302.c.apachegeode-ci.internal[34223] could not exchange locator information with localhost[0] after 45 retry attempts. Retrying in 1 ms. > SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() > intermittently fails when queues drain too slowly > --- > > Key: GEODE-8644 > URL: https://issues.apache.org/jira/browse/GEODE-8644 > Project: Geode > Issue Type: Bug >Affects Versions: 1.15.0 >Reporter: Benjamin P Ross >Assignee: Mark Hanson >Priority: Major > Labels: GeodeOperationAPI, needsTriage, pull-request-available > > Currently the test > SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely() > relies on a 2 second delay to allow for queues to finish draining after > finishing the put operation. If queues take longer than 2 seconds to drain > the test will fail. We should change the test to wait for the queues to be > empty with a long timeout in case the queues never fully drain. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (GEODE-9770) CI Failure: ConflictingPersistentDataException in PersistentRecoveryOrderDUnitTest > testRecoverAfterConflict
[ https://issues.apache.org/jira/browse/GEODE-9770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447572#comment-17447572 ] Geode Integration commented on GEODE-9770: -- Seen in [distributed-test-openjdk8 #196|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/196] ... see [test results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-results/distributedTest/1637450065/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0682/test-artifacts/1637450065/distributedtestfiles-openjdk8-1.15.0-build.0682.tgz]. > CI Failure: ConflictingPersistentDataException in > PersistentRecoveryOrderDUnitTest > testRecoverAfterConflict > - > > Key: GEODE-9770 > URL: https://issues.apache.org/jira/browse/GEODE-9770 > Project: Geode > Issue Type: Bug > Components: persistence >Affects Versions: 1.15.0 >Reporter: Nabarun Nag >Assignee: Kirk Lund >Priority: Major > Labels: GeodeOperationAPI, needsTriage > > This ConflictingPersistentDataException has popped up a number of times: 
> GEODE-6975 > GEODE-7898 > > {noformat} > PersistentRecoveryOrderDUnitTest > testRecoverAfterConflict FAILED > org.apache.geode.test.dunit.RMIException: While invoking > org.apache.geode.internal.cache.persistence.PersistentRecoveryOrderDUnitTest$$Lambda$477/1255368072.run > in VM 0 running on Host > heavy-lifter-7860ae84-3be2-5775-9a40-47a7abc4e64d.c.apachegeode-ci.internal > with 4 VMs > at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:631) > at org.apache.geode.test.dunit.VM.invoke(VM.java:448) > at > org.apache.geode.internal.cache.persistence.PersistentRecoveryOrderDUnitTest.testRecoverAfterConflict(PersistentRecoveryOrderDUnitTest.java:1328) > Caused by: > org.apache.geode.cache.CacheClosedException: Region > /PersistentRecoveryOrderDUnitTest_testRecoverAfterConflictRegion remote > member heavy-lifter-7860ae84-3be2-5775-9a40-47a7abc4e64d(585689):51002 > with persistent data > /10.0.0.50:/tmp/junit4736556655757609006/rootDir-testRecoverAfterConflict/vm-1 > created at timestamp 1635009815552 version 0 diskStoreId > bf4774f44f2e4dcd-aa6c79424132a2e4 name was not part of the same distributed > system as the local data from > /10.0.0.50:/tmp/junit4736556655757609006/rootDir-testRecoverAfterConflict/vm-0 > created at timestamp 1635009814986 version 0 diskStoreId > cc4c64d81e9d4119-9e7320b29f540199 name , caused by > org.apache.geode.cache.persistence.ConflictingPersistentDataException: Region > /PersistentRecoveryOrderDUnitTest_testRecoverAfterConflictRegion remote > member heavy-lifter-7860ae84-3be2-5775-9a40-47a7abc4e64d(585689):51002 > with persistent data > /10.0.0.50:/tmp/junit4736556655757609006/rootDir-testRecoverAfterConflict/vm-1 > created at timestamp 1635009815552 version 0 diskStoreId > bf4774f44f2e4dcd-aa6c79424132a2e4 name was not part of the same distributed > system as the local data from > /10.0.0.50:/tmp/junit4736556655757609006/rootDir-testRecoverAfterConflict/vm-0 > created at timestamp 1635009814986 version 0 diskStoreId > 
cc4c64d81e9d4119-9e7320b29f540199 name > at > org.apache.geode.internal.cache.GemFireCacheImpl$Stopper.generateCancelledException(GemFireCacheImpl.java:5223) > at > org.apache.geode.CancelCriterion.checkCancelInProgress(CancelCriterion.java:83) > at > org.apache.geode.internal.cache.GemFireCacheImpl.getInternalResourceManager(GemFireCacheImpl.java:4259) > at > org.apache.geode.internal.cache.GemFireCacheImpl.getInternalResourceManager(GemFireCacheImpl.java:4253) > at > org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1175) > at > org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1095) > at > org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3108) > at > org.apache.geode.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:3002) > at > org.apache.geode.internal.cache.GemFireCacheImpl.createRegion(GemFireCacheImpl.java:2986) > at > org.apache.geode.cache.RegionFactory.create(RegionFactory.java:773) > at > org.apache.geode.internal.cache.InternalRegionFactory.create(InternalRegionFactory.java:75) > at > org.apache.geode.internal.cache.persistence.PersistentRecoveryOrderDUnitTest.createReplicateRegion(PersistentRecoveryOrderDUnitTest.java:1358) > at
[jira] [Updated] (GEODE-9819) Client socket leak in CacheClientNotifier.registerClientInternal when error conditions occur for the durable client
[ https://issues.apache.org/jira/browse/GEODE-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Darrel Schneider updated GEODE-9819: Affects Version/s: 1.13.4 1.12.5 > Client socket leak in CacheClientNotifier.registerClientInternal when error > conditions occur for the durable client > --- > > Key: GEODE-9819 > URL: https://issues.apache.org/jira/browse/GEODE-9819 > Project: Geode > Issue Type: Bug > Components: client/server, core >Affects Versions: 1.12.5, 1.13.4, 1.14.0, 1.15.0 >Reporter: Leon Finker >Priority: Critical > Labels: needsTriage > > In CacheClientNotifier.registerClientInternal client socket can be left half > open and not properly closed when error conditions occur, such as in this case: > {code:java} > } else { > // The existing proxy is already running (which means that another > // client is already using this durable id. > unsuccessfulMsg = > String.format( > "The requested durable client has the same identifier ( %s ) as an > existing durable client ( %s ). Duplicate durable clients are not allowed.", > clientProxyMembershipID.getDurableId(), cacheClientProxy); > logger.warn(unsuccessfulMsg); > // Set the unsuccessful response byte. > responseByte = Handshake.REPLY_EXCEPTION_DUPLICATE_DURABLE_CLIENT; > } {code} > It considers the current client connect attempt to have failed. It writes > this response back to the client: REPLY_EXCEPTION_DUPLICATE_DURABLE_CLIENT. This > will cause the client to throw ServerRefusedConnectionException. What seems > wrong about this method is that even though it sets "unsuccessfulMsg" and > correctly sends back a handshake saying the client is rejected, it does not > throw an exception and it does not close "socket". I think right before it > calls performPostAuthorization it should do the following: > {code:java} > if (unsuccessfulMsg != null) { > try { > socket.close(); > } catch (IOException ignore) { > } > } else { > performPostAuthorization(...) 
> }{code} > Full discussion details can be found at > https://markmail.org/thread/2gqmbq2m57pz7pxu -- This message was sent by Atlassian Jira (v8.20.1#820001)
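The fix suggested in GEODE-9819 amounts to "close the socket whenever registration was refused". A minimal standalone sketch of that pattern follows; the class and method names are illustrative stand-ins, not Geode's actual registration path:

```java
import java.io.Closeable;
import java.io.IOException;

// Sketch of the proposed fix: when registration fails (unsuccessfulMsg is
// set), close the client socket instead of leaking it half-open. Names are
// illustrative; CacheClientNotifier's real registration path is more complex.
public class RegistrationSketch {
    static void finishRegistration(String unsuccessfulMsg, Closeable socket) {
        if (unsuccessfulMsg != null) {
            // Refused (e.g. duplicate durable client): release the socket.
            try {
                socket.close();
            } catch (IOException ignore) {
                // Nothing useful to do if closing a refused client fails.
            }
        } else {
            // Accepted: proceed with post-authorization work (elided here).
        }
    }
}
```

Taking `Closeable` rather than a concrete `Socket` keeps the sketch testable; the essential point is that every refused-registration path releases the resource.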
[jira] [Commented] (GEODE-1537) DurableRegistrationDUnitTest.testDurableClientWithRegistrationHA
[ https://issues.apache.org/jira/browse/GEODE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447554#comment-17447554 ] ASF subversion and git services commented on GEODE-1537: Commit 5ec6a663b44f95d3a23dd7f040012f8b29d54701 in geode's branch refs/heads/develop from Jens Deppe [ https://gitbox.apache.org/repos/asf?p=geode.git;h=5ec6a66 ] GEODE-1537: Re-order ephemeral port acquisition to fix flaky DurableRegistrationDUnitTest (#7111) > DurableRegistrationDUnitTest.testDurableClientWithRegistrationHA > > > Key: GEODE-1537 > URL: https://issues.apache.org/jira/browse/GEODE-1537 > Project: Geode > Issue Type: Bug > Components: client queues >Reporter: Jinmei Liao >Assignee: Jens Deppe >Priority: Major > Labels: CI, pull-request-available > Fix For: 1.15.0 > > > Geode_develop_DistributedTests/2883 > Error Message > com.gemstone.gemfire.test.dunit.RMIException: While invoking > com.gemstone.gemfire.internal.cache.tier.sockets.DurableRegistrationDUnitTest$$Lambda$373/449639279.run > in VM 1 running on Host timor.gemstone.com with 4 VMs > Stacktrace > com.gemstone.gemfire.test.dunit.RMIException: While invoking > com.gemstone.gemfire.internal.cache.tier.sockets.DurableRegistrationDUnitTest$$Lambda$373/449639279.run > in VM 1 running on Host timor.gemstone.com with 4 VMs > at com.gemstone.gemfire.test.dunit.VM.invoke(VM.java:389) > at com.gemstone.gemfire.test.dunit.VM.invoke(VM.java:355) > at com.gemstone.gemfire.test.dunit.VM.invoke(VM.java:293) > at > com.gemstone.gemfire.internal.cache.tier.sockets.DurableRegistrationDUnitTest.testDurableClientWithRegistrationHA(DurableRegistrationDUnitTest.java:421) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:112) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:56) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:66) > at > org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) > at sun.reflect.GeneratedMethodAccessor426.invoke(Unknown Source) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) > at > org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) > at > org.gradle.messaging.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32) > at > org.gradle.messaging.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93) > at
[jira] [Commented] (GEODE-9764) Request-Response Messaging Should Time Out
[ https://issues.apache.org/jira/browse/GEODE-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447553#comment-17447553 ] Anthony Baker commented on GEODE-9764: -- I would argue that for certain messages like replication of values, timeouts alone are insufficient. To maintain consistency, we have to replicate the change or revert it. I think that implies the need for timeouts as well as failure detection improvements. > Request-Response Messaging Should Time Out > -- > > Key: GEODE-9764 > URL: https://issues.apache.org/jira/browse/GEODE-9764 > Project: Geode > Issue Type: Improvement > Components: messaging >Reporter: Bill Burcham >Assignee: Bill Burcham >Priority: Major > > There is a weakness in the P2P/DirectChannel messaging architecture, in that > it never gives up on a request (in a request-response scenario). As a result > a bug (software fault) anywhere from the point where the requesting thread > hands off the {{DistributionMessage}} e.g. to > {{ClusterDistributionManager.putOutgoing(DistributionMessage)}}, to the point > where that request is ultimately fulfilled on a (one) receiver, can result in > a hang (of some task on the send side, which is waiting for a response). > Well it's a little worse than that because any code in the return (response) > path can also cause disruption of the (response) flow, thereby leaving the > requesting task hanging. > If the code in the request path (primarily in P2P messaging) and the code in > the response path (P2P messaging and TBD higher-level code) were perfect this > might not be a problem. But there is a fair amount of code there and we have > some evidence that it is currently not perfect, nor do we expect it to become > perfect and stay that way. That being the case it seems prudent to institute > response timeouts so that bugs of this sort (which disrupt request-response > message flow) don't result in hangs. > It's TBD if we want to go a step further and institute retries. 
The latter > would entail introducing duplicate-suppression (conflation) in P2P messaging. > We might also add exponential backoff (open-loop) or back-pressure > (closed-loop) to prevent a flood of retries when the system is at or near the > point of thrashing. > But even without retries, a configurable timeout might have good ROI as a > first step. This would entail: > * adding a configuration parameter to specify the timeout value > * changing ReplyProcessor21 and others TBD to "give up" after the timeout has > elapsed > * changing higher-level code dependent on request-reply messaging so it > properly handles the situations where we might have to "give up" > This issue affects all versions of Geode. -- This message was sent by Atlassian Jira (v8.20.1#820001)
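The "give up after a configurable timeout" step proposed above can be sketched with a plain `CompletableFuture` standing in for the pending reply. This is a minimal illustration, not Geode's actual API: `ReplyTimeoutSketch`, `awaitReply`, and the `"TIMED_OUT"` sentinel are all illustrative names; in Geode the waiting role is played by `ReplyProcessor21`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ReplyTimeoutSketch {
  // Stand-in for a pending reply; in Geode this role is played by ReplyProcessor21.
  static final CompletableFuture<String> pendingReply = new CompletableFuture<>();

  // Wait for the reply, but "give up" after timeoutMillis instead of hanging forever.
  static String awaitReply(long timeoutMillis) {
    try {
      return pendingReply.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // Higher-level code must be changed to handle this "gave up" outcome.
      return "TIMED_OUT";
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    // No responder ever completes pendingReply, so the wait gives up cleanly
    // after 100 ms instead of hanging the requesting task.
    System.out.println(awaitReply(100));
  }
}
```

The point of the sketch is the last bullet in the ticket: once a timeout exists, every caller that previously assumed a reply would eventually arrive now has a new failure mode to handle.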
[jira] [Resolved] (GEODE-1537) DurableRegistrationDUnitTest.testDurableClientWithRegistrationHA
[ https://issues.apache.org/jira/browse/GEODE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jens Deppe resolved GEODE-1537. --- Fix Version/s: 1.15.0 Resolution: Fixed > DurableRegistrationDUnitTest.testDurableClientWithRegistrationHA > > > Key: GEODE-1537 > URL: https://issues.apache.org/jira/browse/GEODE-1537 > Project: Geode > Issue Type: Bug > Components: client queues >Reporter: Jinmei Liao >Assignee: Jens Deppe >Priority: Major > Labels: CI, pull-request-available > Fix For: 1.15.0 > > > Geode_develop_DistributedTests/2883 > Error Message > com.gemstone.gemfire.test.dunit.RMIException: While invoking > com.gemstone.gemfire.internal.cache.tier.sockets.DurableRegistrationDUnitTest$$Lambda$373/449639279.run > in VM 1 running on Host timor.gemstone.com with 4 VMs > Stacktrace > com.gemstone.gemfire.test.dunit.RMIException: While invoking > com.gemstone.gemfire.internal.cache.tier.sockets.DurableRegistrationDUnitTest$$Lambda$373/449639279.run > in VM 1 running on Host timor.gemstone.com with 4 VMs > at com.gemstone.gemfire.test.dunit.VM.invoke(VM.java:389) > at com.gemstone.gemfire.test.dunit.VM.invoke(VM.java:355) > at com.gemstone.gemfire.test.dunit.VM.invoke(VM.java:293) > at > com.gemstone.gemfire.internal.cache.tier.sockets.DurableRegistrationDUnitTest.testDurableClientWithRegistrationHA(DurableRegistrationDUnitTest.java:421) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:112) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:56) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:66) > at > org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) > at sun.reflect.GeneratedMethodAccessor426.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) > at > org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) > at > 
org.gradle.messaging.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32) > at > org.gradle.messaging.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93) > at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) > at > org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:109) > at sun.reflect.GeneratedMethodAccessor425.invoke(Unknown Source) > at >
[jira] [Updated] (GEODE-9819) Client socket leak in CacheClientNotifier.registerClientInternal when error conditions occur for the durable client
[ https://issues.apache.org/jira/browse/GEODE-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Darrel Schneider updated GEODE-9819: Labels: needsTriage (was: ) > Client socket leak in CacheClientNotifier.registerClientInternal when error > conditions occur for the durable client > --- > > Key: GEODE-9819 > URL: https://issues.apache.org/jira/browse/GEODE-9819 > Project: Geode > Issue Type: Bug > Components: client/server, core >Affects Versions: 1.14.0, 1.15.0 >Reporter: Leon Finker >Priority: Critical > Labels: needsTriage > > In CacheClientNotifier.registerClientInternal the client socket can be left half > open and not properly closed when error conditions occur, such as in this case: > {code:java} > } else { > // The existing proxy is already running (which means that another > // client is already using this durable id. > unsuccessfulMsg = > String.format( > "The requested durable client has the same identifier ( %s ) as an > existing durable client ( %s ). Duplicate durable clients are not allowed.", > clientProxyMembershipID.getDurableId(), cacheClientProxy); > logger.warn(unsuccessfulMsg); > // Set the unsuccessful response byte. > responseByte = Handshake.REPLY_EXCEPTION_DUPLICATE_DURABLE_CLIENT; > } {code} > It considers the current client connect attempt to have failed. It writes > this response back to client: REPLY_EXCEPTION_DUPLICATE_DURABLE_CLIENT. This > will cause the client to throw ServerRefusedConnectionException. What seems > wrong about this method is that even though it sets "unsuccessfulMsg" and > correctly sends back a handshake saying the client is rejected, it does not > throw an exception and it does not close "socket". I think right before it > calls performPostAuthorization it should do the following: > {code:java} > if (unsuccessfulMsg != null) { > try { > socket.close(); > } catch (IOException ignore) { > } > } else { > performPostAuthorization(...) 
> }{code} > Full discussion details can be found at > https://markmail.org/thread/2gqmbq2m57pz7pxu -- This message was sent by Atlassian Jira (v8.20.1#820001)
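The guard proposed in the ticket is small enough to demonstrate in isolation. The sketch below is illustrative only: `RegistrationCleanupSketch` and `finishRegistration` are hypothetical names, and the `performPostAuthorization` branch is a placeholder for the real CacheClientNotifier call.

```java
import java.io.IOException;
import java.net.Socket;

public class RegistrationCleanupSketch {
  // Mirrors the proposed fix: close the client socket whenever registration
  // failed (unsuccessfulMsg was set), otherwise continue post-authorization.
  static void finishRegistration(String unsuccessfulMsg, Socket socket) {
    if (unsuccessfulMsg != null) {
      try {
        socket.close(); // prevents the half-open socket leak
      } catch (IOException ignore) {
      }
    } else {
      // performPostAuthorization(...) would run here in the real code
    }
  }

  public static void main(String[] args) {
    // An unconnected socket is enough to show the cleanup path.
    Socket socket = new Socket();
    finishRegistration("duplicate durable client", socket);
    System.out.println(socket.isClosed()); // the rejected client's socket is now closed
  }
}
```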
[jira] [Updated] (GEODE-9819) Client socket leak in CacheClientNotifier.registerClientInternal when error conditions occur for the durable client
[ https://issues.apache.org/jira/browse/GEODE-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Darrel Schneider updated GEODE-9819: Affects Version/s: 1.15.0 > Client socket leak in CacheClientNotifier.registerClientInternal when error > conditions occur for the durable client > --- > > Key: GEODE-9819 > URL: https://issues.apache.org/jira/browse/GEODE-9819 > Project: Geode > Issue Type: Bug > Components: client/server, core >Affects Versions: 1.14.0, 1.15.0 >Reporter: Leon Finker >Priority: Critical > > In CacheClientNotifier.registerClientInternal the client socket can be left half > open and not properly closed when error conditions occur, such as in this case: > {code:java} > } else { > // The existing proxy is already running (which means that another > // client is already using this durable id. > unsuccessfulMsg = > String.format( > "The requested durable client has the same identifier ( %s ) as an > existing durable client ( %s ). Duplicate durable clients are not allowed.", > clientProxyMembershipID.getDurableId(), cacheClientProxy); > logger.warn(unsuccessfulMsg); > // Set the unsuccessful response byte. > responseByte = Handshake.REPLY_EXCEPTION_DUPLICATE_DURABLE_CLIENT; > } {code} > It considers the current client connect attempt to have failed. It writes > this response back to client: REPLY_EXCEPTION_DUPLICATE_DURABLE_CLIENT. This > will cause the client to throw ServerRefusedConnectionException. What seems > wrong about this method is that even though it sets "unsuccessfulMsg" and > correctly sends back a handshake saying the client is rejected, it does not > throw an exception and it does not close "socket". I think right before it > calls performPostAuthorization it should do the following: > {code:java} > if (unsuccessfulMsg != null) { > try { > socket.close(); > } catch (IOException ignore) { > } > } else { > performPostAuthorization(...) 
> }{code} > Full discussion details can be found at > https://markmail.org/thread/2gqmbq2m57pz7pxu -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (GEODE-9820) stopCQ does not trigger re-authentication
[ https://issues.apache.org/jira/browse/GEODE-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinmei Liao resolved GEODE-9820. Fix Version/s: 1.15.0 Resolution: Fixed > stopCQ does not trigger re-authentication > - > > Key: GEODE-9820 > URL: https://issues.apache.org/jira/browse/GEODE-9820 > Project: Geode > Issue Type: Sub-task > Components: cq >Affects Versions: 1.14.0 >Reporter: Jinmei Liao >Assignee: Jinmei Liao >Priority: Major > Labels: GeodeOperationAPI, pull-request-available > Fix For: 1.15.0 > > > After the credential expires, when the user executes a `stopCQ` operation, > re-authentication does not get triggered. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (GEODE-9820) stopCQ does not trigger re-authentication
[ https://issues.apache.org/jira/browse/GEODE-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447549#comment-17447549 ] ASF subversion and git services commented on GEODE-9820: Commit f61e32f566761e61cc12cd9b32e2ceaa05ccdc72 in geode's branch refs/heads/develop from Jinmei Liao [ https://gitbox.apache.org/repos/asf?p=geode.git;h=f61e32f ] GEODE-9820: stopCQ should handle general exception same way as ExecuteCQ61 (#7122) > stopCQ does not trigger re-authentication > - > > Key: GEODE-9820 > URL: https://issues.apache.org/jira/browse/GEODE-9820 > Project: Geode > Issue Type: Sub-task > Components: cq >Affects Versions: 1.14.0 >Reporter: Jinmei Liao >Assignee: Jinmei Liao >Priority: Major > Labels: GeodeOperationAPI, pull-request-available > > After the credential expires, when the user executes a `stopCQ` operation, > re-authentication does not get triggered. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (GEODE-9838) Log key info for deserilation issue while index update
[ https://issues.apache.org/jira/browse/GEODE-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaojian Zhou reassigned GEODE-9838: Assignee: Xiaojian Zhou > Log key info for deserilation issue while index update > --- > > Key: GEODE-9838 > URL: https://issues.apache.org/jira/browse/GEODE-9838 > Project: Geode > Issue Type: Improvement > Components: querying >Affects Versions: 1.15.0 >Reporter: Anilkumar Gingade >Assignee: Xiaojian Zhou >Priority: Major > Labels: GeodeOperationAPI > > When there is an issue in an index update (maintenance), the index is marked as > invalid and a warning is logged: > [warn 2021/11/11 07:39:28.215 CST pazrslsrv004 Processor 963> tid=0x124ecf] Updating the Index patientMemberIdentifier > failed. The index is corrupted and marked as invalid. > org.apache.geode.cache.query.internal.index.IMQException > Adding "key" information to the log helps diagnose the failure and add > or remove the entry in question. > Code path IndexManager.java: > void addIndexMapping(RegionEntry entry, IndexProtocol index) { > try { > index.addIndexMapping(entry); > } catch (Exception exception) { > index.markValid(false); > setPRIndexAsInvalid((AbstractIndex) index); > logger.warn(String.format( > "Updating the Index %s failed. The index is corrupted and marked as > invalid.", > ((AbstractIndex) index).indexName), exception); > } > } > void removeIndexMapping(RegionEntry entry, IndexProtocol index, int opCode) { > try { > index.removeIndexMapping(entry, opCode); > } catch (Exception exception) { > index.markValid(false); > setPRIndexAsInvalid((AbstractIndex) index); > logger.warn(String.format( > "Updating the Index %s failed. The index is corrupted and marked as > invalid.", > ((AbstractIndex) index).indexName), exception); > } > } -- This message was sent by Atlassian Jira (v8.20.1#820001)
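The improvement GEODE-9838 asks for is simply to include the failing entry's key in the warning text. A minimal sketch of the reworded message follows; `IndexWarnSketch`, `indexFailureMessage`, and the way the key is obtained are all illustrative assumptions, since the real code would pull the key off the `RegionEntry` inside `IndexManager`.

```java
public class IndexWarnSketch {
  // Builds the proposed warning text with the failing entry's key included,
  // so the corrupt entry can be identified, added, or removed.
  static String indexFailureMessage(String indexName, Object key) {
    return String.format(
        "Updating the Index %s failed for key %s. The index is corrupted and marked as invalid.",
        indexName, key);
  }

  public static void main(String[] args) {
    // With the key present, the operator can locate the offending entry directly.
    System.out.println(indexFailureMessage("patientMemberIdentifier", "patient-42"));
  }
}
```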
[jira] [Created] (GEODE-9840) Improve Redundancy Level Log Message
Wayne created GEODE-9840: Summary: Improve Redundancy Level Log Message Key: GEODE-9840 URL: https://issues.apache.org/jira/browse/GEODE-9840 Project: Geode Issue Type: New Feature Components: redis Reporter: Wayne The current log message Configured redundancy of 2 copies has been restored to /GEODE_FOR_REDIS is confusing and not intuitive. This message should be changed to the following: "Configured redundancy of 2 copies has been restored to /region (1 primary and 1 secondary copies)"? -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (GEODE-9839) Ability to Set Log Level for Specific Package
[ https://issues.apache.org/jira/browse/GEODE-9839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wayne updated GEODE-9839: - Summary: Ability to Set Log Level for Specific Package (was: Ability to Set Log Live for Specific Package) > Ability to Set Log Level for Specific Package > - > > Key: GEODE-9839 > URL: https://issues.apache.org/jira/browse/GEODE-9839 > Project: Geode > Issue Type: New Feature > Components: redis >Affects Versions: 1.15.0 >Reporter: Wayne >Priority: Major > > As a user of Geode, I would like the ability to use the alter runtime > --log-level command to set the logging level for a specific package. This > would allow me to turn on debug level logging for just the redis packages, > org.apache.geode.redis. > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (GEODE-9839) Ability to Set Log Live for Specific Package
Wayne created GEODE-9839: Summary: Ability to Set Log Live for Specific Package Key: GEODE-9839 URL: https://issues.apache.org/jira/browse/GEODE-9839 Project: Geode Issue Type: New Feature Components: redis Affects Versions: 1.15.0 Reporter: Wayne As a user of Geode, I would like the ability to use the alter runtime --log-level command to set the logging level for a specific package. This would allow me to turn on debug level logging for just the redis packages, org.apache.geode.redis. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (GEODE-9825) Disparate socket-buffer-size Results in "IOException: Unknown header byte" and Hangs
[ https://issues.apache.org/jira/browse/GEODE-9825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Burcham reassigned GEODE-9825: --- Assignee: Bill Burcham > Disparate socket-buffer-size Results in "IOException: Unknown header byte" > and Hangs > > > Key: GEODE-9825 > URL: https://issues.apache.org/jira/browse/GEODE-9825 > Project: Geode > Issue Type: Bug > Components: messaging >Affects Versions: 1.12.4, 1.15.0 >Reporter: Bill Burcham >Assignee: Bill Burcham >Priority: Major > Labels: pull-request-available > > GEODE-9141 introduced a bug that causes {{IOException: "Unknown header > byte..."}} and hangs if members are configured with different > {{socket-buffer-size}} settings. > h2. Reproduction > To reproduce this bug, turn off TLS, set socket-buffer-size on the sender to > 64KB, and set socket-buffer-size on the receiver to 32KB. See the associated PR for > an example. > h2. Analysis > In {{Connection.processInputBuffer()}}, when that method has read all the > messages it can from the current input buffer, it then considers whether the > buffer needs expansion. If it does, then: > {code:java} > inputBuffer = inputSharing.expandReadBufferIfNeeded(allocSize); {code} > is executed and the method returns. The caller then expects to be able to > _write_ bytes into {{inputBuffer}}. > The problem, it seems, is that > {{ByteBufferSharingInternalImpl.expandReadBufferIfNeeded()}} does not leave > the {{ByteBuffer}} in the proper state. It leaves the buffer ready to be > _read_, not written. 
> Before the changes for GEODE-9141 were introduced, the line of code > referenced above used to be this snippet in > {{Connection.compactOrResizeBuffer(int messageLength)}} (that method has > since been removed): > {code:java} > // need a bigger buffer > logger.info("Allocating larger network read buffer, new size is {} old > size was {}.", > allocSize, oldBufferSize); > ByteBuffer oldBuffer = inputBuffer; > inputBuffer = getBufferPool().acquireDirectReceiveBuffer(allocSize); > if (oldBuffer != null) { > int oldByteCount = oldBuffer.remaining(); > inputBuffer.put(oldBuffer); > inputBuffer.position(oldByteCount); > getBufferPool().releaseReceiveBuffer(oldBuffer); > } {code} > Notice how this method leaves {{inputBuffer}} ready to be _written_ to. > But the code inside > {{ByteBufferSharingInternalImpl.expandReadBufferIfNeeded()}} is doing > something like: > {code:java} > newBuffer.clear(); > newBuffer.put(existing); > newBuffer.flip(); > releaseBuffer(type, existing); > return newBuffer; {code} > A solution (shown in the associated PR) is to add logic after the call to > {{expandReadBufferIfNeeded(allocSize)}} to leave the buffer in a _writeable_ > state: > {code:java} > inputBuffer = inputSharing.expandReadBufferIfNeeded(allocSize); > // we're returning to the caller (done == true) so make buffer writeable > inputBuffer.position(inputBuffer.limit()); > inputBuffer.limit(inputBuffer.capacity()); {code} > h2. Resolution > When this ticket is complete the bug will be fixed and > {{P2PMessagingConcurrencyDUnitTest}} will be enhanced to test at least these > combinations: > [security, sender/locator socket-buffer-size, receiver socket-buffer-size] > [TLS, (default), (default)] this is what the test currently does > [no TLS, 64 * 1024, 32 * 1024] *new: this illustrates this bug* > [no TLS, (default), (default)] *new* > We might want to mix in conserve-sockets true/false in there too while we're > at it (the test currently holds it at true). 
-- This message was sent by Atlassian Jira (v8.20.1#820001)
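The buffer-state mismatch analyzed in GEODE-9825 is reproducible with plain `java.nio`. In this sketch (class and method names are illustrative, not Geode's), `expandFlipped` mimics what `expandReadBufferIfNeeded` does today, leaving the buffer flipped into read mode, and `makeWriteable` is the proposed fix that restores write mode without losing the copied bytes:

```java
import java.nio.ByteBuffer;

public class BufferExpandSketch {
  // Mimics expandReadBufferIfNeeded: copies leftover bytes into a bigger
  // buffer, then flips it -- leaving it readable, NOT writeable.
  static ByteBuffer expandFlipped(ByteBuffer existing, int newCapacity) {
    ByteBuffer bigger = ByteBuffer.allocate(newCapacity);
    bigger.put(existing);
    bigger.flip(); // position=0, limit=bytesCopied: read mode
    return bigger;
  }

  // The proposed fix: move position to limit and limit to capacity,
  // so the caller can write the next batch of bytes after the old ones.
  static void makeWriteable(ByteBuffer buffer) {
    buffer.position(buffer.limit());
    buffer.limit(buffer.capacity());
  }

  public static void main(String[] args) {
    ByteBuffer old = ByteBuffer.wrap(new byte[] {1, 2, 3});
    ByteBuffer expanded = expandFlipped(old, 8);
    System.out.println(expanded.remaining()); // 3: readable bytes, no room to write
    makeWriteable(expanded);
    System.out.println(expanded.remaining()); // 5: free space for the next socket read
  }
}
```

If a caller writes into the flipped buffer instead, the new bytes overwrite the unread message prefix, which is consistent with the "Unknown header byte" symptom the ticket describes.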
[jira] [Updated] (GEODE-9838) Log key info for deserilation issue while index update
[ https://issues.apache.org/jira/browse/GEODE-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anilkumar Gingade updated GEODE-9838: - Description: When there is issue in Index update (maintenance); the index is marked as invalid. And warning is logged: [warn 2021/11/11 07:39:28.215 CST pazrslsrv004 tid=0x124ecf] Updating the Index patientMemberIdentifier failed. The index is corrupted and marked as invalid. org.apache.geode.cache.query.internal.index.IMQException Adding "key" information in the log helps diagnosing the failure and adding or removing the entry in question. Code path IndexManager.java: void addIndexMapping(RegionEntry entry, IndexProtocol index) { try { index.addIndexMapping(entry); } catch (Exception exception) { index.markValid(false); setPRIndexAsInvalid((AbstractIndex) index); logger.warn(String.format( "Updating the Index %s failed. The index is corrupted and marked as invalid.", ((AbstractIndex) index).indexName), exception); } } void removeIndexMapping(RegionEntry entry, IndexProtocol index, int opCode) { try { index.removeIndexMapping(entry, opCode); } catch (Exception exception) { index.markValid(false); setPRIndexAsInvalid((AbstractIndex) index); logger.warn(String.format( "Updating the Index %s failed. The index is corrupted and marked as invalid.", ((AbstractIndex) index).indexName), exception); } } was: When there is issue in Index update (maintenance); the index is marked as invalid. And warning is logged: [warn 2021/11/11 07:39:28.215 CST pazrslsrv004 tid=0x124ecf] Updating the Index patientMemberIdentifier failed. The index is corrupted and marked as invalid. org.apache.geode.cache.query.internal.index.IMQException Adding "key" information in the log helps diagnosing the failure and adding or removing the entry in question. 
> Log key info for deserilation issue while index update > --- > > Key: GEODE-9838 > URL: https://issues.apache.org/jira/browse/GEODE-9838 > Project: Geode > Issue Type: Improvement > Components: querying >Affects Versions: 1.15.0 >Reporter: Anilkumar Gingade >Priority: Major > Labels: GeodeOperationAPI > > When there is issue in Index update (maintenance); the index is marked as > invalid. And warning is logged: > [warn 2021/11/11 07:39:28.215 CST pazrslsrv004 Processor 963> tid=0x124ecf] Updating the Index patientMemberIdentifier > failed. The index is corrupted and marked as invalid. > org.apache.geode.cache.query.internal.index.IMQException > Adding "key" information in the log helps diagnosing the failure and adding > or removing the entry in question. > Code path IndexManager.java: > void addIndexMapping(RegionEntry entry, IndexProtocol index) { > try { > index.addIndexMapping(entry); > } catch (Exception exception) { > index.markValid(false); > setPRIndexAsInvalid((AbstractIndex) index); > logger.warn(String.format( > "Updating the Index %s failed. The index is corrupted and marked as > invalid.", > ((AbstractIndex) index).indexName), exception); > } > } > void removeIndexMapping(RegionEntry entry, IndexProtocol index, int opCode) { > try { > index.removeIndexMapping(entry, opCode); > } catch (Exception exception) { > index.markValid(false); > setPRIndexAsInvalid((AbstractIndex) index); > logger.warn(String.format( > "Updating the Index %s failed. The index is corrupted and marked as > invalid.", > ((AbstractIndex) index).indexName), exception); > } > } -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (GEODE-9838) Log key info for deserilation issue while index update
[ https://issues.apache.org/jira/browse/GEODE-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anilkumar Gingade updated GEODE-9838: - Labels: GeodeOperationAPI (was: ) > Log key info for deserilation issue while index update > --- > > Key: GEODE-9838 > URL: https://issues.apache.org/jira/browse/GEODE-9838 > Project: Geode > Issue Type: Improvement > Components: querying >Affects Versions: 1.15.0 >Reporter: Anilkumar Gingade >Priority: Major > Labels: GeodeOperationAPI > > When there is issue in Index update (maintenance); the index is marked as > invalid. And warning is logged: > [warn 2021/11/11 07:39:28.215 CST pazrslsrv004 Processor 963> tid=0x124ecf] Updating the Index patientMemberIdentifier > failed. The index is corrupted and marked as invalid. > org.apache.geode.cache.query.internal.index.IMQException > Adding "key" information in the log helps diagnosing the failure and adding > or removing the entry in question. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (GEODE-9838) Log key info for deserilation issue while index update
Anilkumar Gingade created GEODE-9838: Summary: Log key info for deserilation issue while index update Key: GEODE-9838 URL: https://issues.apache.org/jira/browse/GEODE-9838 Project: Geode Issue Type: Improvement Components: querying Reporter: Anilkumar Gingade When there is issue in Index update (maintenance); the index is marked as invalid. And warning is logged: [warn 2021/11/11 07:39:28.215 CST pazrslsrv004 tid=0x124ecf] Updating the Index patientMemberIdentifier failed. The index is corrupted and marked as invalid. org.apache.geode.cache.query.internal.index.IMQException Adding "key" information in the log helps diagnosing the failure and adding or removing the entry in question. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (GEODE-9838) Log key info for deserilation issue while index update
[ https://issues.apache.org/jira/browse/GEODE-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anilkumar Gingade updated GEODE-9838: - Affects Version/s: 1.15.0 > Log key info for deserilation issue while index update > --- > > Key: GEODE-9838 > URL: https://issues.apache.org/jira/browse/GEODE-9838 > Project: Geode > Issue Type: Improvement > Components: querying >Affects Versions: 1.15.0 >Reporter: Anilkumar Gingade >Priority: Major > > When there is issue in Index update (maintenance); the index is marked as > invalid. And warning is logged: > [warn 2021/11/11 07:39:28.215 CST pazrslsrv004 Processor 963> tid=0x124ecf] Updating the Index patientMemberIdentifier > failed. The index is corrupted and marked as invalid. > org.apache.geode.cache.query.internal.index.IMQException > Adding "key" information in the log helps diagnosing the failure and adding > or removing the entry in question. -- This message was sent by Atlassian Jira (v8.20.1#820001)