[ 
https://issues.apache.org/jira/browse/RATIS-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Geng updated RATIS-1072:
-----------------------------
    Description: 
[link title|http://example.com] 

When InstallSnapshotResponseHandler::onError() or 
AppendLogResponseHandler::onError() is called, it will:
 # call rpcService.getProxies().resetProxy(getFollowerId()).
 # set appendLogRequestObserver to be null.

The recovery steps for this BiDi streaming call is actually: 
 # call ManagedChannel.shutdown()
 # re-create the channel from NettyChannelBuilder
 # re-create the async stub from the channel.
 # re-create a StreamObserver<OurRequest> from the async stub.

 

*Above steps is un-necessary and dangerous.*

 

*It is un-necessary:*

according to the [reply from 
grpc|[https://github.com/grpc/grpc-java/issues/7442]]]
{quote}Yes, you don't need to recreate the channel (and stub) for starting a 
new RPC call. The overhead for creating a channel is big.

Receiving {{StreamObserver.onError()}} only indicates that specific RPC has 
been terminated. The connection should be still there. So you can definitely 
start new RPCs on the existing connection.
{quote}
 

*It is dangerous:*

In multi-raft configuraiton, if two leaders sit in DN1, their followers sit in 
DN2, their steaming call will share the managed channel in 
GrpcServerProtocolClient that targeting from DN1 to DN2.

If leader1 calls StreamObserver::onError(), it will shut down and re-create the 
underlying channel. In the same time, leader2's streaming call will be 
affected, and call StreamObserver::onError() to shut down and re-create the 
channel as well. The consequences is, the channel will be continually shutdown 
and re-create.

 

*trace from DN consisting of the leaders:*

We can see that, the channel is continually shutdown and re-create in every 3s.
 2020-09-18 19:47:04,025 [grpc-default-executor-93] WARN 
org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down connection: 
ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134495, 
target=172.16.90.69:9858}}. 
 2020-09-18 19:47:07,025 [grpc-default-executor-115] WARN 
org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down connection: 
ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134499, 
target=172.16.90.69:9858}}. 
 2020-09-18 19:47:10,026 [grpc-default-executor-106] WARN 
org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down connection: 
ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134503, 
target=172.16.90.69:9858}}. 
 2020-09-18 19:47:13,026 [grpc-default-executor-115] WARN 
org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down connection: 
ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134507, 
target=172.16.90.69:9858}}. 
 2020-09-18 19:47:16,027 [grpc-default-executor-106] WARN 
org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down connection: 
ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134511, 
target=172.16.90.69:9858}}. 
 2020-09-18 19:47:19,027 [grpc-default-executor-114] WARN 
org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down connection: 
ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134532, 
target=172.16.90.69:9858}}. 
 2020-09-18 19:47:22,027 [grpc-default-executor-105] WARN 
org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down connection: 
ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134540, 
target=172.16.90.69:9858}}. 
 2020-09-18 19:47:25,028 [grpc-default-executor-104] WARN 
org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down connection: 
ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134544, 
target=172.16.90.69:9858}}. 
 2020-09-18 19:47:28,028 [grpc-default-executor-117] WARN 
org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down connection: 
ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134548, 
target=172.16.90.69:9858}}. 
 2020-09-18 19:47:31,028 [grpc-default-executor-114] WARN 
org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down connection: 
ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134552, 
target=172.16.90.69:9858}}. 
  

*trace from DN consisting of the followers:*

Streams of different raft-group is continually affected. This can be proved 
from the (term, index)

(t:9, i:8713), 

(t:7, i:12312)

(t:3, i:213868)

(t:10, i:12504)
 2020-09-18 19:51:49,498 [grpc-default-executor-113] WARN 
org.apache.ratis.grpc.server.GrpcServerProtocolService: 
591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#320-t9,
 previous=(t:9, i:8713), leaderCommit=8714, initializing? false, entries: 
size=1, first=(t:9, i:8714), METADATAENTRY(c:8713): 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
cancelled before receiving half close
 2020-09-18 19:51:52,499 [grpc-default-executor-104] WARN 
org.apache.ratis.grpc.server.GrpcServerProtocolService: 
591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#608-t7,
 previous=(t:7, i:12312), leaderCommit=12313, initializing? false, entries: 
size=1, first=(t:7, i:12313), METADATAENTRY(c:12312): 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
cancelled before receiving half close
 2020-09-18 19:51:52,499 [grpc-default-executor-109] WARN 
org.apache.ratis.grpc.server.GrpcServerProtocolService: 
591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#1250591-t3,
 previous=(t:3, i:213868), leaderCommit=213869, initializing? false, entries: 
size=1, first=(t:3, i:213869), METADATAENTRY(c:213865): 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
cancelled before receiving half close
 2020-09-18 19:51:52,499 [grpc-default-executor-112] WARN 
org.apache.ratis.grpc.server.GrpcServerProtocolService: 
591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#601-t10,
 previous=(t:10, i:12504), leaderCommit=12505, initializing? false, entries: 
size=1, first=(t:10, i:12505), METADATAENTRY(c:12504): 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
cancelled before receiving half close
 2020-09-18 19:51:55,499 [grpc-default-executor-109] WARN 
org.apache.ratis.grpc.server.GrpcServerProtocolService: 
591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#785-t10,
 previous=(t:10, i:6660), leaderCommit=6661, initializing? false, entries: 
size=1, first=(t:10, i:6661), METADATAENTRY(c:6660): 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
cancelled before receiving half close
 2020-09-18 19:51:55,499 [grpc-default-executor-104] WARN 
org.apache.ratis.grpc.server.GrpcServerProtocolService: 
591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#701-t6,
 previous=(t:6, i:9305), leaderCommit=9306, initializing? false, entries: 
size=1, first=(t:6, i:9306), METADATAENTRY(c:9305): 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
cancelled before receiving half close
 2020-09-18 19:51:58,499 [grpc-default-executor-114] WARN 
org.apache.ratis.grpc.server.GrpcServerProtocolService: 
591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#611-t7,
 previous=(t:7, i:12312), leaderCommit=12313, initializing? false, entries: 
size=1, first=(t:7, i:12313), METADATAENTRY(c:12312): 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
cancelled before receiving half close
 2020-09-18 19:51:58,499 [grpc-default-executor-112] WARN 
org.apache.ratis.grpc.server.GrpcServerProtocolService: 
591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#604-t10,
 previous=(t:10, i:12504), leaderCommit=12505, initializing? false, entries: 
size=1, first=(t:10, i:12505), METADATAENTRY(c:12504): 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
cancelled before receiving half close
 2020-09-18 19:51:58,499 [grpc-default-executor-113] WARN 
org.apache.ratis.grpc.server.GrpcServerProtocolService: 
591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#324-t9,
 previous=(t:9, i:8713), leaderCommit=8714, initializing? false, entries: 
size=1, first=(t:9, i:8714), METADATAENTRY(c:8713): 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
cancelled before receiving half close
  

*The fix is simple* 

We can reuse the the channel and stub, just need {{step 4}} to re-create the 
{{StreamObserver.}}

 

  was:
 

When InstallSnapshotResponseHandler::onError() or 
AppendLogResponseHandler::onError() is called, it will:
 # call rpcService.getProxies().resetProxy(getFollowerId()).
 # set appendLogRequestObserver to be null.

The recovery steps for this BiDi streaming call is actually: 
 # call ManagedChannel.shutdown()
 # re-create the channel from NettyChannelBuilder
 # re-create the async stub from the channel.
 # re-create a StreamObserver<OurRequest> from the async stub.

 

*Above steps is un-necessary and dangerous.*

 

*It is un-necessary:*

according to the [reply from 
grpc|[https://github.com/grpc/grpc-java/issues/7442]]:
{quote}Yes, you don't need to recreate the channel (and stub) for starting a 
new RPC call. The overhead for creating a channel is big.

Receiving {{StreamObserver.onError()}} only indicates that specific RPC has 
been terminated. The connection should be still there. So you can definitely 
start new RPCs on the existing connection.
{quote}
 

*It is dangerous:*

In multi-raft configuraiton, if two leaders sit in DN1, their followers sit in 
DN2, their steaming call will share the managed channel in 
GrpcServerProtocolClient that targeting from DN1 to DN2.

If leader1 calls StreamObserver::onError(), it will shut down and re-create the 
underlying channel. In the same time, leader2's streaming call will be 
affected, and call StreamObserver::onError() to shut down and re-create the 
channel as well. The consequences is, the channel will be continually shutdown 
and re-create.

 

*trace from DN consisting of the leaders:*

We can see that, the channel is continually shutdown and re-create in every 3s.
 2020-09-18 19:47:04,025 [grpc-default-executor-93] WARN 
org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down connection: 
ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134495, 
target=172.16.90.69:9858}}. 
 2020-09-18 19:47:07,025 [grpc-default-executor-115] WARN 
org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down connection: 
ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134499, 
target=172.16.90.69:9858}}. 
 2020-09-18 19:47:10,026 [grpc-default-executor-106] WARN 
org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down connection: 
ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134503, 
target=172.16.90.69:9858}}. 
 2020-09-18 19:47:13,026 [grpc-default-executor-115] WARN 
org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down connection: 
ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134507, 
target=172.16.90.69:9858}}. 
 2020-09-18 19:47:16,027 [grpc-default-executor-106] WARN 
org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down connection: 
ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134511, 
target=172.16.90.69:9858}}. 
 2020-09-18 19:47:19,027 [grpc-default-executor-114] WARN 
org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down connection: 
ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134532, 
target=172.16.90.69:9858}}. 
 2020-09-18 19:47:22,027 [grpc-default-executor-105] WARN 
org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down connection: 
ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134540, 
target=172.16.90.69:9858}}. 
 2020-09-18 19:47:25,028 [grpc-default-executor-104] WARN 
org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down connection: 
ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134544, 
target=172.16.90.69:9858}}. 
 2020-09-18 19:47:28,028 [grpc-default-executor-117] WARN 
org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down connection: 
ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134548, 
target=172.16.90.69:9858}}. 
 2020-09-18 19:47:31,028 [grpc-default-executor-114] WARN 
org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down connection: 
ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134552, 
target=172.16.90.69:9858}}. 
  

*trace from DN consisting of the followers:*

Streams of different raft-group is continually affected. This can be proved 
from the (term, index)

(t:9, i:8713), 

(t:7, i:12312)

(t:3, i:213868)

(t:10, i:12504)
 2020-09-18 19:51:49,498 [grpc-default-executor-113] WARN 
org.apache.ratis.grpc.server.GrpcServerProtocolService: 
591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#320-t9,
 previous=(t:9, i:8713), leaderCommit=8714, initializing? false, entries: 
size=1, first=(t:9, i:8714), METADATAENTRY(c:8713): 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
cancelled before receiving half close
 2020-09-18 19:51:52,499 [grpc-default-executor-104] WARN 
org.apache.ratis.grpc.server.GrpcServerProtocolService: 
591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#608-t7,
 previous=(t:7, i:12312), leaderCommit=12313, initializing? false, entries: 
size=1, first=(t:7, i:12313), METADATAENTRY(c:12312): 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
cancelled before receiving half close
 2020-09-18 19:51:52,499 [grpc-default-executor-109] WARN 
org.apache.ratis.grpc.server.GrpcServerProtocolService: 
591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#1250591-t3,
 previous=(t:3, i:213868), leaderCommit=213869, initializing? false, entries: 
size=1, first=(t:3, i:213869), METADATAENTRY(c:213865): 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
cancelled before receiving half close
 2020-09-18 19:51:52,499 [grpc-default-executor-112] WARN 
org.apache.ratis.grpc.server.GrpcServerProtocolService: 
591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#601-t10,
 previous=(t:10, i:12504), leaderCommit=12505, initializing? false, entries: 
size=1, first=(t:10, i:12505), METADATAENTRY(c:12504): 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
cancelled before receiving half close
 2020-09-18 19:51:55,499 [grpc-default-executor-109] WARN 
org.apache.ratis.grpc.server.GrpcServerProtocolService: 
591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#785-t10,
 previous=(t:10, i:6660), leaderCommit=6661, initializing? false, entries: 
size=1, first=(t:10, i:6661), METADATAENTRY(c:6660): 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
cancelled before receiving half close
 2020-09-18 19:51:55,499 [grpc-default-executor-104] WARN 
org.apache.ratis.grpc.server.GrpcServerProtocolService: 
591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#701-t6,
 previous=(t:6, i:9305), leaderCommit=9306, initializing? false, entries: 
size=1, first=(t:6, i:9306), METADATAENTRY(c:9305): 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
cancelled before receiving half close
 2020-09-18 19:51:58,499 [grpc-default-executor-114] WARN 
org.apache.ratis.grpc.server.GrpcServerProtocolService: 
591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#611-t7,
 previous=(t:7, i:12312), leaderCommit=12313, initializing? false, entries: 
size=1, first=(t:7, i:12313), METADATAENTRY(c:12312): 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
cancelled before receiving half close
 2020-09-18 19:51:58,499 [grpc-default-executor-112] WARN 
org.apache.ratis.grpc.server.GrpcServerProtocolService: 
591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#604-t10,
 previous=(t:10, i:12504), leaderCommit=12505, initializing? false, entries: 
size=1, first=(t:10, i:12505), METADATAENTRY(c:12504): 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
cancelled before receiving half close
 2020-09-18 19:51:58,499 [grpc-default-executor-113] WARN 
org.apache.ratis.grpc.server.GrpcServerProtocolService: 
591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#324-t9,
 previous=(t:9, i:8713), leaderCommit=8714, initializing? false, entries: 
size=1, first=(t:9, i:8714), METADATAENTRY(c:8713): 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
cancelled before receiving half close
  

*The fix is simple* 

We can reuse the the channel and stub, just need {{step 4}} to re-create the 
{{StreamObserver.}}

 


> Should not shutdown and re-create channel/stub in GrpcServerProtocolClient 
> when StreamObserver::onError() is called.
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: RATIS-1072
>                 URL: https://issues.apache.org/jira/browse/RATIS-1072
>             Project: Ratis
>          Issue Type: Sub-task
>            Reporter: Glen Geng
>            Assignee: Glen Geng
>            Priority: Major
>             Fix For: 1.1.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> [link title|http://example.com] 
> When InstallSnapshotResponseHandler::onError() or 
> AppendLogResponseHandler::onError() is called, it will:
>  # call rpcService.getProxies().resetProxy(getFollowerId()).
>  # set appendLogRequestObserver to be null.
> The recovery steps for this BiDi streaming call is actually: 
>  # call ManagedChannel.shutdown()
>  # re-create the channel from NettyChannelBuilder
>  # re-create the async stub from the channel.
>  # re-create a StreamObserver<OurRequest> from the async stub.
>  
> *Above steps is un-necessary and dangerous.*
>  
> *It is un-necessary:*
> according to the [reply from 
> grpc|[https://github.com/grpc/grpc-java/issues/7442]]]
> {quote}Yes, you don't need to recreate the channel (and stub) for starting a 
> new RPC call. The overhead for creating a channel is big.
> Receiving {{StreamObserver.onError()}} only indicates that specific RPC has 
> been terminated. The connection should be still there. So you can definitely 
> start new RPCs on the existing connection.
> {quote}
>  
> *It is dangerous:*
> In multi-raft configuraiton, if two leaders sit in DN1, their followers sit 
> in DN2, their steaming call will share the managed channel in 
> GrpcServerProtocolClient that targeting from DN1 to DN2.
> If leader1 calls StreamObserver::onError(), it will shut down and re-create 
> the underlying channel. In the same time, leader2's streaming call will be 
> affected, and call StreamObserver::onError() to shut down and re-create the 
> channel as well. The consequences is, the channel will be continually 
> shutdown and re-create.
>  
> *trace from DN consisting of the leaders:*
> We can see that, the channel is continually shutdown and re-create in every 
> 3s.
>  2020-09-18 19:47:04,025 [grpc-default-executor-93] WARN 
> org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134495, 
> target=172.16.90.69:9858}}. 
>  2020-09-18 19:47:07,025 [grpc-default-executor-115] WARN 
> org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134499, 
> target=172.16.90.69:9858}}. 
>  2020-09-18 19:47:10,026 [grpc-default-executor-106] WARN 
> org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134503, 
> target=172.16.90.69:9858}}. 
>  2020-09-18 19:47:13,026 [grpc-default-executor-115] WARN 
> org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134507, 
> target=172.16.90.69:9858}}. 
>  2020-09-18 19:47:16,027 [grpc-default-executor-106] WARN 
> org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134511, 
> target=172.16.90.69:9858}}. 
>  2020-09-18 19:47:19,027 [grpc-default-executor-114] WARN 
> org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134532, 
> target=172.16.90.69:9858}}. 
>  2020-09-18 19:47:22,027 [grpc-default-executor-105] WARN 
> org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134540, 
> target=172.16.90.69:9858}}. 
>  2020-09-18 19:47:25,028 [grpc-default-executor-104] WARN 
> org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134544, 
> target=172.16.90.69:9858}}. 
>  2020-09-18 19:47:28,028 [grpc-default-executor-117] WARN 
> org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134548, 
> target=172.16.90.69:9858}}. 
>  2020-09-18 19:47:31,028 [grpc-default-executor-114] WARN 
> org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper\{delegate=ManagedChannelImpl{logId=134552, 
> target=172.16.90.69:9858}}. 
>   
> *trace from DN consisting of the followers:*
> Streams of different raft-group is continually affected. This can be proved 
> from the (term, index)
> (t:9, i:8713), 
> (t:7, i:12312)
> (t:3, i:213868)
> (t:10, i:12504)
>  2020-09-18 19:51:49,498 [grpc-default-executor-113] WARN 
> org.apache.ratis.grpc.server.GrpcServerProtocolService: 
> 591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
> df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#320-t9,
>  previous=(t:9, i:8713), leaderCommit=8714, initializing? false, entries: 
> size=1, first=(t:9, i:8714), METADATAENTRY(c:8713): 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
> cancelled before receiving half close
>  2020-09-18 19:51:52,499 [grpc-default-executor-104] WARN 
> org.apache.ratis.grpc.server.GrpcServerProtocolService: 
> 591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
> df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#608-t7,
>  previous=(t:7, i:12312), leaderCommit=12313, initializing? false, entries: 
> size=1, first=(t:7, i:12313), METADATAENTRY(c:12312): 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
> cancelled before receiving half close
>  2020-09-18 19:51:52,499 [grpc-default-executor-109] WARN 
> org.apache.ratis.grpc.server.GrpcServerProtocolService: 
> 591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
> df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#1250591-t3,
>  previous=(t:3, i:213868), leaderCommit=213869, initializing? false, entries: 
> size=1, first=(t:3, i:213869), METADATAENTRY(c:213865): 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
> cancelled before receiving half close
>  2020-09-18 19:51:52,499 [grpc-default-executor-112] WARN 
> org.apache.ratis.grpc.server.GrpcServerProtocolService: 
> 591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
> df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#601-t10,
>  previous=(t:10, i:12504), leaderCommit=12505, initializing? false, entries: 
> size=1, first=(t:10, i:12505), METADATAENTRY(c:12504): 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
> cancelled before receiving half close
>  2020-09-18 19:51:55,499 [grpc-default-executor-109] WARN 
> org.apache.ratis.grpc.server.GrpcServerProtocolService: 
> 591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
> df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#785-t10,
>  previous=(t:10, i:6660), leaderCommit=6661, initializing? false, entries: 
> size=1, first=(t:10, i:6661), METADATAENTRY(c:6660): 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
> cancelled before receiving half close
>  2020-09-18 19:51:55,499 [grpc-default-executor-104] WARN 
> org.apache.ratis.grpc.server.GrpcServerProtocolService: 
> 591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
> df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#701-t6,
>  previous=(t:6, i:9305), leaderCommit=9306, initializing? false, entries: 
> size=1, first=(t:6, i:9306), METADATAENTRY(c:9305): 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
> cancelled before receiving half close
>  2020-09-18 19:51:58,499 [grpc-default-executor-114] WARN 
> org.apache.ratis.grpc.server.GrpcServerProtocolService: 
> 591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
> df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#611-t7,
>  previous=(t:7, i:12312), leaderCommit=12313, initializing? false, entries: 
> size=1, first=(t:7, i:12313), METADATAENTRY(c:12312): 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
> cancelled before receiving half close
>  2020-09-18 19:51:58,499 [grpc-default-executor-112] WARN 
> org.apache.ratis.grpc.server.GrpcServerProtocolService: 
> 591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
> df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#604-t10,
>  previous=(t:10, i:12504), leaderCommit=12505, initializing? false, entries: 
> size=1, first=(t:10, i:12505), METADATAENTRY(c:12504): 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
> cancelled before receiving half close
>  2020-09-18 19:51:58,499 [grpc-default-executor-113] WARN 
> org.apache.ratis.grpc.server.GrpcServerProtocolService: 
> 591f38ce-0b11-4003-8528-7a5802f89ea8: installSnapshot onError, lastRequest: 
> df173506-6978-47e5-8aef-034cafd221a6->591f38ce-0b11-4003-8528-7a5802f89ea8#324-t9,
>  previous=(t:9, i:8713), leaderCommit=8714, initializing? false, entries: 
> size=1, first=(t:9, i:8714), METADATAENTRY(c:8713): 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: 
> cancelled before receiving half close
>   
> *The fix is simple* 
> We can reuse the the channel and stub, just need {{step 4}} to re-create the 
> {{StreamObserver.}}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to