Eric Shu created GEODE-9310:
-------------------------------

             Summary: Geode redis should send error message when encounter 
DistributedSystemDisconnectedException, ForcedDisconnectException or 
FunctionInvocationTargetException
                 Key: GEODE-9310
                 URL: https://issues.apache.org/jira/browse/GEODE-9310
             Project: Geode
          Issue Type: Bug
          Components: redis
            Reporter: Eric Shu


Currently when redis function execution encounter these exception, it will just 
close the client connection. 

{code:java}
else if (cause instanceof FunctionInvocationTargetException
        || cause instanceof DistributedSystemDisconnectedException
        || cause instanceof ForcedDisconnectException) {
      // This indicates a member departed or got disconnected
      logger.warn(
          "Closing client connection because one of the servers doing this 
operation departed.");
      channelInactive(ctx);
      response = null;
    }
{code}

This is incorrect, as this could lead to client performing retries of the same 
redis command.
The correct way is to send an error response that command may or may not have 
been executed on servers, user need to verify and retry if necessary.

Here is a test result showing that current behavior of client retrying:
The node executing the ZREM command:
(vm1 gets the connection from JedisCluster client)
[vm1] [warn 2021/05/25 15:31:26.732 PDT server-1 <GeodeRedisServer-Command-1> 
tid=0xfb] Executing Redis command: ZREM key member1-212 member1-211 member1-214 
member1-213 member1-210 member1-209 member1-208 member1-205 member1-204 

(primary is on vm3 and it is execute the function)
[vm3] [warn 2021/05/25 15:31:26.692 PDT server-3 <Function Execution 
Processor3> tid=0xf3] bucket region is 
BucketRegion[path='/__PR/_B__REDIS__DATA_123;serial=399;primary=true] key key
[vm3] java.lang.Exception
[vm3]   at 
org.apache.geode.internal.cache.BucketRegion.virtualPut(BucketRegion.java:533)
[vm3]   at 
org.apache.geode.internal.cache.LocalRegion.virtualPut(LocalRegion.java:5586)
[vm3]   at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.putLocally(PartitionedRegionDataStore.java:1213)
[vm3]   at 
org.apache.geode.internal.cache.PartitionedRegion.putInBucket(PartitionedRegion.java:3024)
[vm3]   at 
org.apache.geode.internal.cache.PartitionedRegion.virtualPut(PartitionedRegion.java:2236)
[vm3]   at 
org.apache.geode.internal.cache.LocalRegion.virtualPut(LocalRegion.java:5586)
[vm3]   at 
org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:157)
[vm3]   at 
org.apache.geode.internal.cache.LocalRegion.basicPut(LocalRegion.java:5044)
[vm3]   at 
org.apache.geode.internal.cache.LocalRegion.validatedPut(LocalRegion.java:1645)
[vm3]   at 
org.apache.geode.internal.cache.LocalRegion.put(LocalRegion.java:1632)
[vm3]   at 
org.apache.geode.internal.cache.AbstractRegion.put(AbstractRegion.java:445)
[vm3]   at 
org.apache.geode.redis.internal.data.AbstractRedisData.storeChanges(AbstractRedisData.java:218)
[vm3]   at 
org.apache.geode.redis.internal.data.RedisSortedSet.zrem(RedisSortedSet.java:239)
[vm3]   at 
org.apache.geode.redis.internal.data.RedisSortedSetCommandsFunctionExecutor.lambda$zrem$2(RedisSortedSetCommandsFunctionExecutor.java:50)
[vm3]   at 
org.apache.geode.redis.internal.executor.SynchronizedStripedExecutor.execute(SynchronizedStripedExecutor.java:47)
[vm3]   at 
org.apache.geode.redis.internal.data.RedisDataCommandsFunctionExecutor.stripedExecute(RedisDataCommandsFunctionExecutor.java:43)
[vm3]   at 
org.apache.geode.redis.internal.data.RedisSortedSetCommandsFunctionExecutor.zrem(RedisSortedSetCommandsFunctionExecutor.java:49)
[vm3]   at 
org.apache.geode.redis.internal.executor.CommandFunction.compute(CommandFunction.java:289)
[vm3]   at 
org.apache.geode.redis.internal.executor.SingleResultRedisFunction.lambda$execute$0(SingleResultRedisFunction.java:47)
[vm3]   at 
org.apache.geode.internal.cache.PartitionedRegion.computeWithPrimaryLocked(PartitionedRegion.java:629)
[vm3]   at 
org.apache.geode.redis.internal.executor.SingleResultRedisFunction.execute(SingleResultRedisFunction.java:51)
[vm3]   at 
org.apache.geode.internal.cache.PartitionedRegionDataStore.executeOnDataStore(PartitionedRegionDataStore.java:2988)
[vm3]   at 
org.apache.geode.internal.cache.partitioned.PartitionedRegionFunctionStreamingMessage.operateOnPartitionedRegion(PartitionedRegionFunctionStreamingMessage.java:101)
[vm3]   at 
org.apache.geode.internal.cache.partitioned.PartitionMessage.process(PartitionMessage.java:333)
[vm3]   at 
org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:376)
[vm3]   at 
org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:441)

The operation has been distributed to the bucket with redundant copy.
[vm1] [warn 2021/05/25 15:31:26.694 PDT server-1 <P2P message reader for 
192.168.0.14(server-3:79774)<v3>:41003 shared ordered sender uid=19 local 
port=54410 remote port=51127> tid=0xf8] bucket region is 
BucketRegion[path='/__PR/_B__REDIS__DATA_123;serial=399;primary=false] key key
[vm1] java.lang.Exception
[vm1]   at 
org.apache.geode.internal.cache.BucketRegion.virtualPut(BucketRegion.java:533)
[vm1]   at 
org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:171)
[vm1]   at 
org.apache.geode.internal.cache.LocalRegion.basicUpdate(LocalRegion.java:5579)
[vm1]   at 
org.apache.geode.internal.cache.AbstractUpdateOperation.doPutOrCreate(AbstractUpdateOperation.java:156)
[vm1]   at 
org.apache.geode.internal.cache.AbstractUpdateOperation$AbstractUpdateMessage.basicOperateOnRegion(AbstractUpdateOperation.java:307)
[vm1]   at 
org.apache.geode.internal.cache.AbstractUpdateOperation$AbstractUpdateMessage.operateOnRegion(AbstractUpdateOperation.java:278)
[vm1]   at 
org.apache.geode.internal.cache.DistributedCacheOperation$CacheOperationMessage.basicProcess(DistributedCacheOperation.java:1213)
[vm1]   at 
org.apache.geode.internal.cache.DistributedCacheOperation$CacheOperationMessage.process(DistributedCacheOperation.java:1115)
[vm1]   at 
org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:376)
[vm1]   at 
org.apache.geode.distributed.internal.DistributionMessage.schedule(DistributionMessage.java:432)
[vm1]   at 
org.apache.geode.distributed.internal.ClusterDistributionManager.scheduleIncomingMessage(ClusterDistributionManager.java:2066)
[vm1]   at 
org.apache.geode.distributed.internal.ClusterDistributionManager.handleIncomingDMsg(ClusterDistributionManager.java:1831)
[vm1]   at 
org.apache.geode.distributed.internal.membership.gms.GMSMembership.dispatchMessage(GMSMembership.java:847)
[vm1]   at 
org.apache.geode.distributed.internal.membership.gms.GMSMembership.handleOrDeferMessage(GMSMembership.java:804)
[vm1]   at 
org.apache.geode.distributed.internal.membership.gms.GMSMembership.processMessage(GMSMembership.java:1098)
[vm1]   at 
org.apache.geode.distributed.internal.DistributionImpl$MyDCReceiver.messageReceived(DistributionImpl.java:828)
[vm1]   at 
org.apache.geode.distributed.internal.direct.DirectChannel.receive(DirectChannel.java:622)
[vm1]   at 
org.apache.geode.internal.tcp.TCPConduit.messageReceived(TCPConduit.java:682)
[vm1]   at 
org.apache.geode.internal.tcp.Connection.dispatchMessage(Connection.java:3271)
[vm1]   at 
org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:3022)
[vm1]   at 
org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2809)
[vm1]   at 
org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1647)
[vm1]   at org.apache.geode.internal.tcp.Connection.run(Connection.java:1487)
[vm1]   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[vm1]   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[vm1]   at java.lang.Thread.run(Thread.java:748)

[vm1] [warn 2021/05/25 15:31:26.714 PDT server-1 <P2P message reader for 
192.168.0.14(server-3:79774)<v3>:41003 shared ordered sender uid=19 local 
port=54410 remote port=51127> tid=0xf8] membersRemoveAll invoked
[vm1] java.lang.Exception
[vm1]   at 
org.apache.geode.redis.internal.data.RedisSortedSet.membersRemoveAll(RedisSortedSet.java:175)
[vm1]   at 
org.apache.geode.redis.internal.data.RedisSortedSet.applyDelta(RedisSortedSet.java:89)
[vm1]   at 
org.apache.geode.redis.internal.data.AbstractRedisData.fromDelta(AbstractRedisData.java:193)
[vm1]   at 
org.apache.geode.internal.cache.EntryEventImpl.processDeltaBytes(EntryEventImpl.java:1841)
[vm1]   at 
org.apache.geode.internal.cache.EntryEventImpl.setNewValueInRegion(EntryEventImpl.java:1696)
[vm1]   at 
org.apache.geode.internal.cache.EntryEventImpl.putExistingEntry(EntryEventImpl.java:1643)
[vm1]   at 
org.apache.geode.internal.cache.map.RegionMapPut.updateEntry(RegionMapPut.java:485)

(vm3 is shutdown)
[vm3] [info 2021/05/25 15:31:26.704 PDT server-3 <RMI TCP 
Connection(1)-192.168.0.14> tid=0x14] GemFireCache[id = 2015401929; isClosing = 
true; isShutDownAll = false; created = Tue May 25 15:31:24 PDT 2021; server = 
false; copyOnRead = false; lockLease = 120; lockTimeout = 60]: Now closing.

[vm3] [info 2021/05/25 15:31:26.706 PDT server-3 <RMI TCP 
Connection(1)-192.168.0.14> tid=0x14] GeodeRedisServer shutting down

vm1 reports command failed (even though it succeeded and was distributed to 
replicas)
[vm1] [warn 2021/05/25 15:31:26.728 PDT server-1 <GeodeRedisServer-Command-1> 
tid=0xfb] Execution of Redis command ZREM key member1-212 member1-211 
member1-214 member1-213 member1-210 member1-209 member1-208 member1-205 
member1-204 
failed: org.apache.geode.cache.execute.FunctionException: 
org.apache.geode.cache.execute.FunctionInvocationTargetException: The cache is 
closed.

[vm1] [warn 2021/05/25 15:31:26.729 PDT server-1 <GeodeRedisServer-Command-1> 
tid=0xfb] Closing client connection because one of the servers doing this 
operation departed.

This leads to JedisCluster client to retry with the same command:
[vm1] [warn 2021/05/25 15:31:26.732 PDT server-1 <GeodeRedisServer-Command-1> 
tid=0xfb] Executing Redis command: ZREM key member1-212 member1-211 member1-214 
member1-213 member1-210 member1-209 member1-208 member1-205 member1-204 

[vm1] [warn 2021/05/25 15:31:26.729 PDT server-1 <GeodeRedisServer-Command-1> 
tid=0xfb] Closing client connection because one of the servers doing this 
operation departed.

[vm1] [warn 2021/05/25 15:31:26.732 PDT server-1 <GeodeRedisServer-Command-1> 
tid=0xfb] Executing Redis command: ZREM key member1-212 member1-211 member1-214 
member1-213 member1-210 member1-209 member1-208 member1-205 member1-204 
...

[vm1] [warn 2021/05/25 15:31:26.737 PDT server-1 <GeodeRedisServer-Command-1> 
tid=0xfb] Executing Redis command: ZREM key member1-212 member1-211 member1-214 
member1-213 member1-210 member1-209 member1-208 member1-205 member1-204 
...

[vm1] [warn 2021/05/25 15:31:26.739 PDT server-1 <GeodeRedisServer-Command-1> 
tid=0xfb] Execution of Redis command ZREM key member1-212 member1-211 
member1-214 member1-213 member1-210 member1-209 member1-208 member1-205 
member1-204 failed: org.apache.geode.cache.execute.FunctionException: 
org.apache.geode.cache.execute.FunctionInvocationTargetException: Remote cache 
is closed: 192.168.0.14(server-3:79774)<v3>:41003
...

Currently the retries aborted due to the following issue by JedisCluster client 
(which is tracked by redis issue of not back off and retry on different node: 
https://github.com/redis/jedis/issues/2347)

java.util.concurrent.ExecutionException: 
redis.clients.jedis.exceptions.JedisClusterMaxAttemptsException: No more 
cluster attempts left.

        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at 
org.apache.geode.redis.internal.executor.sortedset.ZRemDUnitTest.zRemCanRemoveMembersFromSortedSetDuringPrimaryIsCrashed(ZRemDUnitTest.java:177)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
        at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
        at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at 
org.apache.geode.test.junit.rules.serializable.SerializableExternalResource$1.evaluate(SerializableExternalResource.java:38)
        at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
        at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
        at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
        at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
        at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
        at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
        at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at 
org.apache.geode.test.dunit.rules.ClusterStartupRule$1.evaluate(ClusterStartupRule.java:138)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
        at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
        at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
        at 
com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:54)
        at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
        at 
com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: redis.clients.jedis.exceptions.JedisClusterMaxAttemptsException: No 
more cluster attempts left.
        at 
redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:84)
        at 
redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:122)
        at 
redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:122)
        at 
redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:122)
        at 
redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:122)
        at 
redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:122)
        at 
redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:25)
        at redis.clients.jedis.JedisCluster.zrem(JedisCluster.java:984)
        at 
org.apache.geode.redis.internal.executor.sortedset.ZRemDUnitTest.doZRem(ZRemDUnitTest.java:157)
        at 
org.apache.geode.redis.internal.executor.sortedset.ZRemDUnitTest.lambda$zRemCanRemoveMembersFromSortedSetDuringPrimaryIsCrashed$0(ZRemDUnitTest.java:174)
        at 
org.apache.geode.test.junit.rules.ExecutorServiceRule.lambda$submit$2(ExecutorServiceRule.java:232)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Once this is solved by JedisCluter client, the retry may cause wrong number 
being reported by the retried command.
It is possible that this behavior may impact on APPEND and INCRBY command -- 
which would produce wrong results of the retried command (subject to additional 
testing).







--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to