[ 
https://issues.apache.org/jira/browse/RATIS-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968910#comment-16968910
 ] 

Hanisha Koneru commented on RATIS-649:
--------------------------------------

After this patch, RaftServer restart is failing. 

In HDDS-2392, {{RaftServer#start()}} fails with the following exception:
{code:java}
java.io.IOException: java.lang.IllegalStateException: Not started
        at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
        at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
        at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:70)
        at org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:284)
        at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:296)
        at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:421)
        at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:215)
        at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:110)
        at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Not started
        at org.apache.ratis.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:504)
        at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.getPort(ServerImpl.java:176)
        at org.apache.ratis.grpc.server.GrpcService.lambda$new$2(GrpcService.java:143)
        at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
        at org.apache.ratis.grpc.server.GrpcService.getInetSocketAddress(GrpcService.java:182)
        at org.apache.ratis.server.impl.RaftServerImpl.lambda$new$0(RaftServerImpl.java:84)
        at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
        at org.apache.ratis.server.impl.RaftServerImpl.getPeer(RaftServerImpl.java:136)
        at org.apache.ratis.server.impl.RaftServerMetrics.<init>(RaftServerMetrics.java:70)
        at org.apache.ratis.server.impl.RaftServerMetrics.getRaftServerMetrics(RaftServerMetrics.java:62)
        at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:119)
        at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
{code}
 

I traced back the error and the root cause is the new {{RaftServerMetrics}}
initialization in {{RaftServerImpl}} (line 119). During RaftServerMetrics
initialization, we are passing {{server.getPeer()}} to
{{addPeerCommitIndexGauge()}}. But the server is not started yet, and this
causes the _IllegalStateException_ in {{GrpcService#addressSupplier}}.

Without the {{addPeerCommitIndexGauge()}} call in RaftServerMetrics,
{{RaftServer#start()}} succeeds.
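
To illustrate the ordering problem described above, here is a minimal, self-contained sketch. It does not use the Ratis or gRPC classes: {{FakeRpcService}}, {{EagerMetrics}} and {{LazyMetrics}} are hypothetical stand-ins, and the port value is made up. It only shows why resolving an address inside a constructor fails before the service is started, and how deferring the lookup avoids that. It is not meant as the actual fix for this issue.

```java
import java.util.function.Supplier;

public class LazyGaugeSketch {
  /** Hypothetical stand-in for an RPC service whose port is only known after start(). */
  static class FakeRpcService {
    private boolean started;

    void start() {
      started = true;
    }

    int getPort() {
      if (!started) {
        // Same kind of failure as the "Not started" state check in the trace above.
        throw new IllegalStateException("Not started");
      }
      return 9858; // arbitrary illustrative value
    }
  }

  /** Resolves the port in the constructor, i.e. possibly before start(). */
  static class EagerMetrics {
    final int port;

    EagerMetrics(FakeRpcService rpc) {
      this.port = rpc.getPort();
    }
  }

  /** Only stores a supplier; the port is resolved when the value is first read. */
  static class LazyMetrics {
    private final Supplier<Integer> port;

    LazyMetrics(FakeRpcService rpc) {
      this.port = rpc::getPort;
    }

    int port() {
      return port.get();
    }
  }

  /** Returns true if constructing EagerMetrics on an unstarted service throws. */
  static boolean eagerFailsBeforeStart() {
    try {
      new EagerMetrics(new FakeRpcService());
      return false;
    } catch (IllegalStateException e) {
      return true;
    }
  }

  /** Constructs LazyMetrics first, starts the service, then reads the port. */
  static int lazyPortAfterStart() {
    FakeRpcService rpc = new FakeRpcService();
    LazyMetrics metrics = new LazyMetrics(rpc);
    rpc.start();
    return metrics.port();
  }

  public static void main(String[] args) {
    System.out.println("eager fails before start: " + eagerFailsBeforeStart());
    System.out.println("lazy port after start: " + lazyPortAfterStart());
  }
}
```

In this sketch the lazy variant can be constructed at any time because nothing touches the service until the value is read. Whether a deferred lookup like this, or simply registering the gauge after start, is the better option for RaftServerMetrics is a separate question.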



cc. [~avijayan], [~shashikant]

 

> Add metrics related to ClientRequests 
> --------------------------------------
>
>                 Key: RATIS-649
>                 URL: https://issues.apache.org/jira/browse/RATIS-649
>             Project: Ratis
>          Issue Type: Sub-task
>          Components: server
>    Affects Versions: 0.4.0
>            Reporter: Shashikant Banerjee
>            Assignee: Aravindan Vijayan
>            Priority: Major
>             Fix For: 0.5.0
>
>         Attachments: RATIS-649-000.patch, RATIS-649-001.patch, 
> RATIS-649-002.patch
>
>
> The following metrics would be good to have to measure the load and the 
> processing time of client requests:
>  
> |numReadRequestCount|Number of read type requests received on the leader.|
> |numWriteRequestCount|Number of write type requests received on the leader.|
> |numWatchForMajorityRequestCount|Number of Watch for Majority type requests received on the leader.|
> |numWatchForAllRequestCount|Number of Watch for All type requests received on the leader.|
> |raftClientReadRequestLatency|Time required to process read type requests.|
> |raftClientWriteRequestLatency|Time required to process write type requests.|
> |raftClientWatchForMajority|Time required to process WatchForMajority requests.|
> |raftClientWatchForAllRequests|Time required to process WatchForAll requests.|
> |requestQueueLimitHitCount|Number of times the number of pending requests in the leader hit the configured limit.|
> |numRequestRetryCacheHitCount|Number of Request Retry Cache hits. This gives an idea of retries via Raft clients because of request timeouts or exceptions.|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
