Eric Shu created GEODE-9310: ------------------------------- Summary: Geode redis should send error message when encounter DistributedSystemDisconnectedException, ForcedDisconnectException or FunctionInvocationTargetException Key: GEODE-9310 URL: https://issues.apache.org/jira/browse/GEODE-9310 Project: Geode Issue Type: Bug Components: redis Reporter: Eric Shu
Currently when redis function execution encounter these exception, it will just close the client connection. {code:java} else if (cause instanceof FunctionInvocationTargetException || cause instanceof DistributedSystemDisconnectedException || cause instanceof ForcedDisconnectException) { // This indicates a member departed or got disconnected logger.warn( "Closing client connection because one of the servers doing this operation departed."); channelInactive(ctx); response = null; } {code} This is incorrect, as this could lead to client performing retries of the same redis command. The correct way is to send an error response that command may or may not have been executed on servers, user need to verify and retry if necessary. Here is a test result showing that current behavior of client retrying: The node executing the ZREM command: (vm1 gets the connection from JedisCluster client) [vm1] [warn 2021/05/25 15:31:26.732 PDT server-1 <GeodeRedisServer-Command-1> tid=0xfb] Executing Redis command: ZREM key member1-212 member1-211 member1-214 member1-213 member1-210 member1-209 member1-208 member1-205 member1-204 (primary is on vm3 and it is execute the function) [vm3] [warn 2021/05/25 15:31:26.692 PDT server-3 <Function Execution Processor3> tid=0xf3] bucket region is BucketRegion[path='/__PR/_B__REDIS__DATA_123;serial=399;primary=true] key key [vm3] java.lang.Exception [vm3] at org.apache.geode.internal.cache.BucketRegion.virtualPut(BucketRegion.java:533) [vm3] at org.apache.geode.internal.cache.LocalRegion.virtualPut(LocalRegion.java:5586) [vm3] at org.apache.geode.internal.cache.PartitionedRegionDataStore.putLocally(PartitionedRegionDataStore.java:1213) [vm3] at org.apache.geode.internal.cache.PartitionedRegion.putInBucket(PartitionedRegion.java:3024) [vm3] at org.apache.geode.internal.cache.PartitionedRegion.virtualPut(PartitionedRegion.java:2236) [vm3] at org.apache.geode.internal.cache.LocalRegion.virtualPut(LocalRegion.java:5586) [vm3] at org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:157) [vm3] at org.apache.geode.internal.cache.LocalRegion.basicPut(LocalRegion.java:5044) [vm3] at org.apache.geode.internal.cache.LocalRegion.validatedPut(LocalRegion.java:1645) [vm3] at org.apache.geode.internal.cache.LocalRegion.put(LocalRegion.java:1632) [vm3] at org.apache.geode.internal.cache.AbstractRegion.put(AbstractRegion.java:445) [vm3] at org.apache.geode.redis.internal.data.AbstractRedisData.storeChanges(AbstractRedisData.java:218) [vm3] at org.apache.geode.redis.internal.data.RedisSortedSet.zrem(RedisSortedSet.java:239) [vm3] at org.apache.geode.redis.internal.data.RedisSortedSetCommandsFunctionExecutor.lambda$zrem$2(RedisSortedSetCommandsFunctionExecutor.java:50) [vm3] at org.apache.geode.redis.internal.executor.SynchronizedStripedExecutor.execute(SynchronizedStripedExecutor.java:47) [vm3] at org.apache.geode.redis.internal.data.RedisDataCommandsFunctionExecutor.stripedExecute(RedisDataCommandsFunctionExecutor.java:43) [vm3] at org.apache.geode.redis.internal.data.RedisSortedSetCommandsFunctionExecutor.zrem(RedisSortedSetCommandsFunctionExecutor.java:49) [vm3] at org.apache.geode.redis.internal.executor.CommandFunction.compute(CommandFunction.java:289) [vm3] at org.apache.geode.redis.internal.executor.SingleResultRedisFunction.lambda$execute$0(SingleResultRedisFunction.java:47) [vm3] at org.apache.geode.internal.cache.PartitionedRegion.computeWithPrimaryLocked(PartitionedRegion.java:629) [vm3] at org.apache.geode.redis.internal.executor.SingleResultRedisFunction.execute(SingleResultRedisFunction.java:51) [vm3] at org.apache.geode.internal.cache.PartitionedRegionDataStore.executeOnDataStore(PartitionedRegionDataStore.java:2988) [vm3] at org.apache.geode.internal.cache.partitioned.PartitionedRegionFunctionStreamingMessage.operateOnPartitionedRegion(PartitionedRegionFunctionStreamingMessage.java:101) [vm3] at org.apache.geode.internal.cache.partitioned.PartitionMessage.process(PartitionMessage.java:333) [vm3] at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:376) [vm3] at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:441) The operation has been distributed to the bucket with redundant copy. [vm1] [warn 2021/05/25 15:31:26.694 PDT server-1 <P2P message reader for 192.168.0.14(server-3:79774)<v3>:41003 shared ordered sender uid=19 local port=54410 remote port=51127> tid=0xf8] bucket region is BucketRegion[path='/__PR/_B__REDIS__DATA_123;serial=399;primary=false] key key [vm1] java.lang.Exception [vm1] at org.apache.geode.internal.cache.BucketRegion.virtualPut(BucketRegion.java:533) [vm1] at org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:171) [vm1] at org.apache.geode.internal.cache.LocalRegion.basicUpdate(LocalRegion.java:5579) [vm1] at org.apache.geode.internal.cache.AbstractUpdateOperation.doPutOrCreate(AbstractUpdateOperation.java:156) [vm1] at org.apache.geode.internal.cache.AbstractUpdateOperation$AbstractUpdateMessage.basicOperateOnRegion(AbstractUpdateOperation.java:307) [vm1] at org.apache.geode.internal.cache.AbstractUpdateOperation$AbstractUpdateMessage.operateOnRegion(AbstractUpdateOperation.java:278) [vm1] at org.apache.geode.internal.cache.DistributedCacheOperation$CacheOperationMessage.basicProcess(DistributedCacheOperation.java:1213) [vm1] at org.apache.geode.internal.cache.DistributedCacheOperation$CacheOperationMessage.process(DistributedCacheOperation.java:1115) [vm1] at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:376) [vm1] at org.apache.geode.distributed.internal.DistributionMessage.schedule(DistributionMessage.java:432) [vm1] at org.apache.geode.distributed.internal.ClusterDistributionManager.scheduleIncomingMessage(ClusterDistributionManager.java:2066) [vm1] at org.apache.geode.distributed.internal.ClusterDistributionManager.handleIncomingDMsg(ClusterDistributionManager.java:1831) [vm1] at org.apache.geode.distributed.internal.membership.gms.GMSMembership.dispatchMessage(GMSMembership.java:847) [vm1] at org.apache.geode.distributed.internal.membership.gms.GMSMembership.handleOrDeferMessage(GMSMembership.java:804) [vm1] at org.apache.geode.distributed.internal.membership.gms.GMSMembership.processMessage(GMSMembership.java:1098) [vm1] at org.apache.geode.distributed.internal.DistributionImpl$MyDCReceiver.messageReceived(DistributionImpl.java:828) [vm1] at org.apache.geode.distributed.internal.direct.DirectChannel.receive(DirectChannel.java:622) [vm1] at org.apache.geode.internal.tcp.TCPConduit.messageReceived(TCPConduit.java:682) [vm1] at org.apache.geode.internal.tcp.Connection.dispatchMessage(Connection.java:3271) [vm1] at org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:3022) [vm1] at org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2809) [vm1] at org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1647) [vm1] at org.apache.geode.internal.tcp.Connection.run(Connection.java:1487) [vm1] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [vm1] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [vm1] at java.lang.Thread.run(Thread.java:748) [vm1] [warn 2021/05/25 15:31:26.714 PDT server-1 <P2P message reader for 192.168.0.14(server-3:79774)<v3>:41003 shared ordered sender uid=19 local port=54410 remote port=51127> tid=0xf8] membersRemoveAll invoked [vm1] java.lang.Exception [vm1] at org.apache.geode.redis.internal.data.RedisSortedSet.membersRemoveAll(RedisSortedSet.java:175) [vm1] at org.apache.geode.redis.internal.data.RedisSortedSet.applyDelta(RedisSortedSet.java:89) [vm1] at org.apache.geode.redis.internal.data.AbstractRedisData.fromDelta(AbstractRedisData.java:193) [vm1] at org.apache.geode.internal.cache.EntryEventImpl.processDeltaBytes(EntryEventImpl.java:1841) [vm1] at org.apache.geode.internal.cache.EntryEventImpl.setNewValueInRegion(EntryEventImpl.java:1696) [vm1] at org.apache.geode.internal.cache.EntryEventImpl.putExistingEntry(EntryEventImpl.java:1643) [vm1] at org.apache.geode.internal.cache.map.RegionMapPut.updateEntry(RegionMapPut.java:485) (vm3 is shutdown) [vm3] [info 2021/05/25 15:31:26.704 PDT server-3 <RMI TCP Connection(1)-192.168.0.14> tid=0x14] GemFireCache[id = 2015401929; isClosing = true; isShutDownAll = false; created = Tue May 25 15:31:24 PDT 2021; server = false; copyOnRead = false; lockLease = 120; lockTimeout = 60]: Now closing. [vm3] [info 2021/05/25 15:31:26.706 PDT server-3 <RMI TCP Connection(1)-192.168.0.14> tid=0x14] GeodeRedisServer shutting down vm1 reports command failed (even though it succeeded and was distributed to replicas) [vm1] [warn 2021/05/25 15:31:26.728 PDT server-1 <GeodeRedisServer-Command-1> tid=0xfb] Execution of Redis command ZREM key member1-212 member1-211 member1-214 member1-213 member1-210 member1-209 member1-208 member1-205 member1-204 failed: org.apache.geode.cache.execute.FunctionException: org.apache.geode.cache.execute.FunctionInvocationTargetException: The cache is closed. [vm1] [warn 2021/05/25 15:31:26.729 PDT server-1 <GeodeRedisServer-Command-1> tid=0xfb] Closing client connection because one of the servers doing this operation departed. This leads to JedisCluster client to retry with the same command: [vm1] [warn 2021/05/25 15:31:26.732 PDT server-1 <GeodeRedisServer-Command-1> tid=0xfb] Executing Redis command: ZREM key member1-212 member1-211 member1-214 member1-213 member1-210 member1-209 member1-208 member1-205 member1-204 [vm1] [warn 2021/05/25 15:31:26.729 PDT server-1 <GeodeRedisServer-Command-1> tid=0xfb] Closing client connection because one of the servers doing this operation departed. [vm1] [warn 2021/05/25 15:31:26.732 PDT server-1 <GeodeRedisServer-Command-1> tid=0xfb] Executing Redis command: ZREM key member1-212 member1-211 member1-214 member1-213 member1-210 member1-209 member1-208 member1-205 member1-204 ... [vm1] [warn 2021/05/25 15:31:26.737 PDT server-1 <GeodeRedisServer-Command-1> tid=0xfb] Executing Redis command: ZREM key member1-212 member1-211 member1-214 member1-213 member1-210 member1-209 member1-208 member1-205 member1-204 ... [vm1] [warn 2021/05/25 15:31:26.739 PDT server-1 <GeodeRedisServer-Command-1> tid=0xfb] Execution of Redis command ZREM key member1-212 member1-211 member1-214 member1-213 member1-210 member1-209 member1-208 member1-205 member1-204 failed: org.apache.geode.cache.execute.FunctionException: org.apache.geode.cache.execute.FunctionInvocationTargetException: Remote cache is closed: 192.168.0.14(server-3:79774)<v3>:41003 ... Currently the retries aborted due to the following issue by JedisCluster client (which is tracked by redis issue of not back off and retry on different node: https://github.com/redis/jedis/issues/2347) java.util.concurrent.ExecutionException: redis.clients.jedis.exceptions.JedisClusterMaxAttemptsException: No more cluster attempts left. at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.geode.redis.internal.executor.sortedset.ZRemDUnitTest.zRemCanRemoveMembersFromSortedSetDuringPrimaryIsCrashed(ZRemDUnitTest.java:177) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.apache.geode.test.junit.rules.serializable.SerializableExternalResource$1.evaluate(SerializableExternalResource.java:38) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.apache.geode.test.dunit.rules.ClusterStartupRule$1.evaluate(ClusterStartupRule.java:138) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.ParentRunner.run(ParentRunner.java:413) at org.junit.runner.JUnitCore.run(JUnitCore.java:137) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:54) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) Caused by: redis.clients.jedis.exceptions.JedisClusterMaxAttemptsException: No more cluster attempts left. at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:84) at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:122) at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:122) at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:122) at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:122) at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:122) at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:25) at redis.clients.jedis.JedisCluster.zrem(JedisCluster.java:984) at org.apache.geode.redis.internal.executor.sortedset.ZRemDUnitTest.doZRem(ZRemDUnitTest.java:157) at org.apache.geode.redis.internal.executor.sortedset.ZRemDUnitTest.lambda$zRemCanRemoveMembersFromSortedSetDuringPrimaryIsCrashed$0(ZRemDUnitTest.java:174) at org.apache.geode.test.junit.rules.ExecutorServiceRule.lambda$submit$2(ExecutorServiceRule.java:232) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Once this is solved by JedisCluter client, the retry may cause wrong number being reported by the retried command. It is possible that this behavior may impact on APPEND and INCRBY command -- which would produce wrong results of the retried command (subject to additional testing). -- This message was sent by Atlassian Jira (v8.3.4#803005)