[jira] [Commented] (RATIS-260) Ratis Leader election should try for other peers even when ask for votes fails

2018-08-10 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576845#comment-16576845
 ] 

Tsz Wo Nicholas Sze commented on RATIS-260:
---

{quote}
In this test, one of the nodes was shut down permanently. This can result in 
a situation where a candidate node is never able to move out of the Leader 
Election phase.
{quote}
I have just checked the current code again and cannot see how this could 
happen.  I suspect that, in this failure case, the candidate node cannot talk 
to the other nodes either, so it won't be able to move out of Leader Election.
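
To make the suspicion above concrete, here is a minimal sketch of the majority arithmetic behind it. It assumes, purely for illustration, a three-node group (the issue does not state the group size) and that every reachable peer grants its vote. The class and method names are hypothetical and are not part of Ratis.

```java
/** Hypothetical helper, not part of Ratis: majority arithmetic for an election. */
final class MajorityCheck {
  /** Smallest number of votes that forms a strict majority of the group. */
  static int votesNeeded(int groupSize) {
    return groupSize / 2 + 1;
  }

  /**
   * The candidate votes for itself, so it starts with one vote.
   * Assumption: every peer it can reach grants its vote.
   */
  static boolean canWin(int groupSize, int reachablePeers) {
    return 1 + reachablePeers >= votesNeeded(groupSize);
  }

  public static void main(String[] args) {
    // Three nodes, one shut down, the other still reachable: 2 of 3 votes.
    System.out.println(canWin(3, 1)); // prints "true"
    // Three nodes, one shut down and the other unreachable: 1 of 3 votes.
    System.out.println(canWin(3, 0)); // prints "false"
  }
}
```

Under these assumptions, a single dead node alone should not keep the candidate in Leader Election; it stays there only if the remaining peer cannot be reached either.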



> Ratis Leader election should try for other peers even when ask for votes fails
> --
>
> Key: RATIS-260
> URL: https://issues.apache.org/jira/browse/RATIS-260
> Project: Ratis
> Issue Type: Bug
> Components: server
> Reporter: Mukul Kumar Singh
> Assignee: Shashikant Banerjee
> Priority: Major
> Labels: ozone
> Attachments: RATIS-260.00.patch
>
>
> This bug was simulated with Ozone, which uses Ratis for its data pipeline.
> In this test, one of the nodes was shut down permanently. This can result 
> in a situation where a candidate node is never able to move out of the 
> Leader Election phase.
> {code}
> 2018-06-15 07:44:58,246 INFO org.apache.ratis.server.impl.LeaderElection: 0f7b9cd2-4dad-46d7-acbc-57d424492d00_9858 got exception when requesting votes: {}
> java.util.concurrent.ExecutionException: org.apache.ratis.shaded.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>     at org.apache.ratis.server.impl.LeaderElection.waitForResults(LeaderElection.java:214)
>     at org.apache.ratis.server.impl.LeaderElection.askForVotes(LeaderElection.java:146)
>     at org.apache.ratis.server.impl.LeaderElection.run(LeaderElection.java:102)
> Caused by: org.apache.ratis.shaded.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
>     at org.apache.ratis.shaded.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:221)
>     at org.apache.ratis.shaded.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:202)
>     at org.apache.ratis.shaded.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:131)
>     at org.apache.ratis.shaded.proto.grpc.RaftServerProtocolServiceGrpc$RaftServerProtocolServiceBlockingStub.requestVote(RaftServerProtocolServiceGrpc.java:281)
>     at org.apache.ratis.grpc.server.RaftServerProtocolClient.requestVote(RaftServerProtocolClient.java:61)
>     at org.apache.ratis.grpc.RaftGRpcService.requestVote(RaftGRpcService.java:147)
>     at org.apache.ratis.server.impl.LeaderElection.lambda$submitRequests$0(LeaderElection.java:188)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.ratis.shaded.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: y128.l42scl.hortonworks.com/172.26.32.228:9858
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>     at org.apache.ratis.shaded.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
>     at org.apache.ratis.shaded.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
>     at org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
>     at org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>     at org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>     at org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>     at org.apache.ratis.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>     at org.apache.ratis.shaded.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
>     ... 1 more
> Caused by: java.net.ConnectException: Connection refused
>     ... 11 more
> {code}
> 
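
The trace in the issue description is a chain of four exceptions: ExecutionException, StatusRuntimeException, AnnotatedConnectException and finally java.net.ConnectException. As a rough illustration of how such a chain can be walked down to its root cause, here is a small sketch. `RootCause` is a hypothetical helper, not Ratis code, and IllegalStateException stands in for the gRPC and Netty exception types so that the example needs no extra dependencies.

```java
import java.net.ConnectException;
import java.util.concurrent.ExecutionException;

/** Hypothetical helper, not part of Ratis: finds the innermost cause. */
final class RootCause {
  /** Follows getCause() until there is no further cause. */
  static Throwable rootCause(Throwable t) {
    Throwable current = t;
    while (current.getCause() != null && current.getCause() != current) {
      current = current.getCause();
    }
    return current;
  }

  public static void main(String[] args) {
    // Stand-in chain: the middle exception replaces the gRPC/Netty types.
    Throwable chain = new ExecutionException(
        new IllegalStateException("UNAVAILABLE: io exception",
            new ConnectException("Connection refused")));
    Throwable root = rootCause(chain);
    // prints "java.net.ConnectException: Connection refused"
    System.out.println(root.getClass().getName() + ": " + root.getMessage());
  }
}
```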

[jira] [Commented] (RATIS-260) Ratis Leader election should try for other peers even when ask for votes fails

2018-08-10 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576842#comment-16576842
 ] 

Tsz Wo Nicholas Sze commented on RATIS-260:
---

{quote}
No, it is a bug in LeaderElection.waitForResults(LeaderElection.java:214) 
according to the given stack trace.
{quote}

Sorry [~shashikant], my comment above was wrong.  The stack trace does show 
that the StatusRuntimeException is wrapped in an ExecutionException, so 
catching StatusRuntimeException would not help.
{code}
java.util.concurrent.ExecutionException: org.apache.ratis.shaded.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
{code}
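
The wrapping described above is standard `Future.get()` behaviour and can be reproduced without Ratis. The following is a minimal sketch in which IllegalStateException stands in for the unchecked StatusRuntimeException; the class name `WrappedFailureDemo` is hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical demo, not Ratis code: shows which catch clause sees the failure. */
final class WrappedFailureDemo {
  static String whichCatch() {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      // Stand-in for a requestVote call that fails with an unchecked exception.
      Callable<String> task = () -> {
        throw new IllegalStateException("UNAVAILABLE: io exception");
      };
      Future<String> vote = executor.submit(task);
      try {
        vote.get();
        return "no exception";
      } catch (IllegalStateException e) {
        // Not reached: get() never rethrows the task's exception directly.
        return "caught directly";
      } catch (ExecutionException e) {
        // Reached: the original exception is available through getCause().
        return "wrapped: " + e.getCause().getClass().getSimpleName();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return "interrupted";
      }
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(whichCatch()); // prints "wrapped: IllegalStateException"
  }
}
```

A catch clause for the inner exception type placed around `get()` is therefore dead code; only the ExecutionException handler runs.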


[jira] [Commented] (RATIS-260) Ratis Leader election should try for other peers even when ask for votes fails

2018-08-10 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576770#comment-16576770
 ] 

Shashikant Banerjee commented on RATIS-260:
---

Thanks for the review, [~szetszwo]. The issue cannot be reproduced consistently 
with Ozone.

As discussed with [~msingh], it was hit once after 50 runs of Freon in a cluster. 
I ran basic Freon in Ozone and it worked well.


[jira] [Commented] (RATIS-260) Ratis Leader election should try for other peers even when ask for votes fails

2018-08-10 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576727#comment-16576727
 ] 

Tsz Wo Nicholas Sze commented on RATIS-260:
---

+1 patch looks good.

[~shashikant], have you tested it with Ozone to see whether it fixes the problem?


[jira] [Commented] (RATIS-260) Ratis Leader election should try for other peers even when ask for votes fails

2018-08-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575979#comment-16575979
 ] 

Hadoop QA commented on RATIS-260:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 5s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 15s{color} | {color:orange} root: The patch generated 1 new + 50 unchanged - 1 fixed = 51 total (was 51) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 23m 17s{color} | {color:green} root in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 8s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 30m 3s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/ratis:date2018-08-10 |
| JIRA Issue | RATIS-260 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12934968/RATIS-260.00.patch |
| Optional Tests | asflicense javac javadoc unit findbugs checkstyle compile |
| uname | Linux 02f4772880c3 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-RATIS-Build/yetus-personality.sh |
| git revision | master / 6a2c3d5 |
| Default Java | 1.8.0_171 |
| checkstyle | https://builds.apache.org/job/PreCommit-RATIS-Build/288/artifact/out/diff-checkstyle-root.txt |
| Test Results | https://builds.apache.org/job/PreCommit-RATIS-Build/288/testReport/ |
| modules | C: ratis-server U: ratis-server |
| Console output | https://builds.apache.org/job/PreCommit-RATIS-Build/288/console |
| Powered by | Apache Yetus 0.5.0 http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (RATIS-260) Ratis Leader election should try for other peers even when ask for votes fails

2018-07-09 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536660#comment-16536660
 ] 

Tsz Wo Nicholas Sze commented on RATIS-260:
---

Instead of catching only ExecutionException, it should also catch 
StatusRuntimeException and possibly other exceptions.

> Ratis Leader election should try for other peers even when ask for votes fails
> --
>
> Key: RATIS-260
> URL: https://issues.apache.org/jira/browse/RATIS-260
> Project: Ratis
> Issue Type: Bug
> Reporter: Mukul Kumar Singh
> Assignee: Mukul Kumar Singh
> Priority: Major
>
> This bug was simulated with Ozone, which uses Ratis for its data pipeline.
> In this test, one of the nodes was shut down permanently. This can result 
> in a situation where a candidate node is never able to move out of the 
> Leader Election phase.
> {code}
> 2018-06-15 07:44:58,246 INFO org.apache.ratis.server.impl.LeaderElection: 0f7b9cd2-4dad-46d7-acbc-57d424492d00_9858 got exception when requesting votes: {}
> java.util.concurrent.ExecutionException: org.apache.ratis.shaded.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>     at org.apache.ratis.server.impl.LeaderElection.waitForResults(LeaderElection.java:214)
>     at org.apache.ratis.server.impl.LeaderElection.askForVotes(LeaderElection.java:146)
>     at org.apache.ratis.server.impl.LeaderElection.run(LeaderElection.java:102)
> Caused by: org.apache.ratis.shaded.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
>     at org.apache.ratis.shaded.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:221)
>     at org.apache.ratis.shaded.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:202)
>     at org.apache.ratis.shaded.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:131)
>     at org.apache.ratis.shaded.proto.grpc.RaftServerProtocolServiceGrpc$RaftServerProtocolServiceBlockingStub.requestVote(RaftServerProtocolServiceGrpc.java:281)
>     at org.apache.ratis.grpc.server.RaftServerProtocolClient.requestVote(RaftServerProtocolClient.java:61)
>     at org.apache.ratis.grpc.RaftGRpcService.requestVote(RaftGRpcService.java:147)
>     at org.apache.ratis.server.impl.LeaderElection.lambda$submitRequests$0(LeaderElection.java:188)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.ratis.shaded.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: y128.l42scl.hortonworks.com/172.26.32.228:9858
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>     at org.apache.ratis.shaded.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
>     at org.apache.ratis.shaded.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
>     at org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
>     at org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>     at org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>     at org.apache.ratis.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>     at org.apache.ratis.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>     at org.apache.ratis.shaded.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
>     ... 1 more
> Caused by: java.net.ConnectException: Connection refused
>     ... 11 more
> {code}
> This happens because of the following lines of the code during requestVote.
> {code}
> for (final RaftPeer peer : others) {
>   final RequestVoteRequestProto r = server.createRequestVoteRequest(
>       peer.getId(), electionTerm, lastEntry);
>   service.submit(
>       () -> server.getServerRpc().requestVote(r));
>   submitted++;
> }
> {code}



--
This message was sent by Atlassian JIRA

[jira] [Commented] (RATIS-260) Ratis Leader election should try for other peers even when ask for votes fails

2018-07-09 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536658#comment-16536658
 ] 

Tsz Wo Nicholas Sze commented on RATIS-260:
---

> This happens because of the following lines of the code during requestVote. 
> ...

No, it is a bug in LeaderElection.waitForResults(LeaderElection.java:214) 
according to the given stack trace.
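
For illustration only, here is a sketch of a result-waiting loop that keeps collecting votes from the other peers after one request has failed. It is neither the Ratis implementation nor the attached patch: `VoteCollector` and `countGranted` are hypothetical names, plain `Callable<Boolean>` tasks stand in for the requestVote RPCs, and the candidate's own vote is not counted.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch, not Ratis code: tolerate per-peer failures while waiting. */
final class VoteCollector {
  /** Returns how many requests completed with a granted vote before the timeout. */
  static int countGranted(List<Callable<Boolean>> requests, long timeoutMs) {
    ExecutorService executor =
        Executors.newFixedThreadPool(Math.max(1, requests.size()));
    CompletionService<Boolean> service = new ExecutorCompletionService<>(executor);
    try {
      for (Callable<Boolean> request : requests) {
        service.submit(request); // a failing task does not make submit() throw
      }
      long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
      int granted = 0;
      for (int waited = 0; waited < requests.size(); waited++) {
        long remaining = deadline - System.nanoTime();
        if (remaining <= 0) {
          break; // election timeout reached
        }
        try {
          Future<Boolean> done = service.poll(remaining, TimeUnit.NANOSECONDS);
          if (done == null) {
            break; // nothing completed before the timeout
          }
          if (Boolean.TRUE.equals(done.get())) {
            granted++;
          }
        } catch (ExecutionException e) {
          // One peer failed: log it and keep waiting for the remaining peers.
          System.err.println("vote request failed: " + e.getCause());
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          break;
        }
      }
      return granted;
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    Callable<Boolean> unreachablePeer = () -> {
      throw new IllegalStateException("UNAVAILABLE: io exception");
    };
    Callable<Boolean> healthyPeer = () -> true;
    // prints 1: the failure is logged and the healthy peer's vote is counted
    System.out.println(countGranted(Arrays.asList(unreachablePeer, healthyPeer), 1000));
  }
}
```

The design point is that the ExecutionException handler sits inside the loop, so a single unreachable peer costs one iteration rather than ending the wait.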
