[ https://issues.apache.org/jira/browse/RATIS-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968910#comment-16968910 ]
Hanisha Koneru commented on RATIS-649:
--------------------------------------

After this patch, RaftServer restart is failing. In HDDS-2392, {{RaftServer#start()}} fails with the following exception:
{code:java}
java.io.IOException: java.lang.IllegalStateException: Not started
    at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
    at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
    at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:70)
    at org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:284)
    at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:296)
    at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:421)
    at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:215)
    at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:110)
    at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Not started
    at org.apache.ratis.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:504)
    at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.getPort(ServerImpl.java:176)
    at org.apache.ratis.grpc.server.GrpcService.lambda$new$2(GrpcService.java:143)
    at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
    at org.apache.ratis.grpc.server.GrpcService.getInetSocketAddress(GrpcService.java:182)
    at org.apache.ratis.server.impl.RaftServerImpl.lambda$new$0(RaftServerImpl.java:84)
    at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
    at org.apache.ratis.server.impl.RaftServerImpl.getPeer(RaftServerImpl.java:136)
    at org.apache.ratis.server.impl.RaftServerMetrics.<init>(RaftServerMetrics.java:70)
    at org.apache.ratis.server.impl.RaftServerMetrics.getRaftServerMetrics(RaftServerMetrics.java:62)
    at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:119)
    at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
{code}
I traced the error back, and the root cause is the new {{RaftServerMetrics}} initialization in {{RaftServerImpl}} (line 119). During {{RaftServerMetrics}} initialization, we pass {{server.getPeer()}} to {{addPeerCommitIndexGauge()}}. But the server is not started yet at that point, so resolving the peer address throws the _IllegalStateException_ in {{GrpcService#addressSupplier}}. Without the {{addPeerCommitIndexGauge()}} call in {{RaftServerMetrics}}, {{RaftServer#start()}} succeeds.

cc. [~avijayan], [~shashikant]

> Add metrics related to ClientRequests
> --------------------------------------
>
>                 Key: RATIS-649
>                 URL: https://issues.apache.org/jira/browse/RATIS-649
>             Project: Ratis
>          Issue Type: Sub-task
>          Components: server
>   Affects Versions: 0.4.0
>            Reporter: Shashikant Banerjee
>            Assignee: Aravindan Vijayan
>            Priority: Major
>             Fix For: 0.5.0
>
>         Attachments: RATIS-649-000.patch, RATIS-649-001.patch, RATIS-649-002.patch
>
>
> Following metrics would be good to have to measure the load and the processing time of client requests:
>
> |numReadRequestCount|Number of read type requests received on the leader.|
> |numWriteRequestCount|Number of write type requests received on the leader.|
> |numWatchForMajorityRequestCount|Number of Watch for Majority type requests received on the leader.|
> |numWatchForAllRequestCount|Number of Watch for All type requests received on the leader.|
> |raftClientReadRequestLatency|Time required to process read type requests.|
> |raftClientWriteRequestLatency|Time required to process write type requests.|
> |raftClientWatchForMajority|Time required to process WatchForMajority requests.|
> |raftClientWatchForAllRequests|Time required to process WatchForAll requests.|
> |requestQueueLimitHitCount|Number of times the number of pending requests in the leader hit the configured limit.|
> |numRequestRetryCacheHitCount|Number of Request Retry Cache hits. This gives an idea of retries via Raft clients because of request timeouts or exceptions.|
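
For reference, the failure mode described in the comment (a metrics object resolving the peer address in its constructor, before the transport has been started) can be reproduced with a minimal, self-contained sketch. This is not the Ratis code and not the eventual fix: {{FakeService}}, {{EagerMetrics}}, {{LazyMetrics}} and the port number are hypothetical stand-ins, used only to contrast resolving the address eagerly with deferring it behind a {{Supplier}} until the value is first read.

```java
import java.util.function.Supplier;

// Hypothetical sketch; none of these classes exist in Ratis.
class LazyGaugeSketch {

  /** Stand-in for a transport whose port is only known after start(). */
  static class FakeService {
    private boolean started = false;

    void start() {
      started = true;
    }

    int getPort() {
      if (!started) {
        throw new IllegalStateException("Not started");
      }
      return 9999; // arbitrary placeholder port
    }
  }

  /** Eager variant: resolves the address inside the constructor. */
  static class EagerMetrics {
    private final String peerAddress;

    EagerMetrics(FakeService service) {
      // Throws IllegalStateException if the service is not started yet.
      this.peerAddress = "localhost:" + service.getPort();
    }

    String peerAddress() {
      return peerAddress;
    }
  }

  /** Lazy variant: stores a supplier and resolves it on first read. */
  static class LazyMetrics {
    private final Supplier<String> peerAddress;

    LazyMetrics(FakeService service) {
      // Nothing is resolved here, so construction before start() is safe.
      this.peerAddress = () -> "localhost:" + service.getPort();
    }

    String peerAddress() {
      return peerAddress.get();
    }
  }

  public static void main(String[] args) {
    FakeService service = new FakeService();

    try {
      new EagerMetrics(service);
    } catch (IllegalStateException e) {
      System.out.println("eager construction failed: " + e.getMessage());
    }

    LazyMetrics lazy = new LazyMetrics(service);
    service.start();
    System.out.println("lazy address after start: " + lazy.peerAddress());
  }
}
```

In this sketch the eager variant fails with "Not started", while the lazy variant can be constructed before {{start()}} and read afterwards. Whether deferring the lookup, reordering initialization, or dropping the gauge is the right change for {{RaftServerMetrics}} is left to the patch authors.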