[ https://issues.apache.org/jira/browse/GIRAPH-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997634#comment-16997634 ]

Hudson commented on GIRAPH-1230:
--------------------------------

FAILURE: Integrated in Jenkins build Giraph-trunk-Commit #1791 (See 
[https://builds.apache.org/job/Giraph-trunk-Commit/1791/])
GIRAPH-1230 (dionysios: [http://gitbox.apache.org/repos/asf?p=giraph.git&a=commit&h=f8d017e61d66ec56b17ecf796743d6851c2f0988])
* (edit) giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java
* (edit) checkstyle.xml
* (edit) giraph-core/src/main/java/org/apache/giraph/comm/netty/ChannelRotater.java
* (edit) giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java
* (edit) giraph-core/src/main/java/org/apache/giraph/graph/GraphTaskManager.java


> Fix Netty reconnection issues
> -----------------------------
>
>                 Key: GIRAPH-1230
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-1230
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Dionysios Logothetis
>            Assignee: Dionysios Logothetis
>            Priority: Major
>
> - The LogOnErrorChannelFutureListener is called when a channel operation
> completes. It checked whether the operation had failed and, if so, tried to
> resend any requests, which required waiting until the channel had been
> re-established. However, waiting from the same thread that invokes the
> handler causes Netty to throw a BlockingOperationException, so this resend
> logic never took effect.
> - When a channel closes, we have logic that tries to re-open it, up to a
> maximum number of retries. However, ChannelRotater also had logic that threw
> an exception if it did not find any channel, which left no opportunity to
> reconnect.
> - Whenever the client closes the connection, the server catches the
> resulting error (Connection reset by peer) and also throws an exception, so
> the job fails immediately. This does not give the client the opportunity to
> reconnect.
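The first two points describe a general pattern that can be shown without Netty itself. The Java sketch below is a minimal illustration under stated assumptions, not the code from the commit above: FakeChannel, ChannelPool, acquire and onWriteComplete are hypothetical names, ChannelPool is not Giraph's ChannelRotater, and a single-thread executor stands in for Netty's I/O thread (Netty raises BlockingOperationException when a future is awaited from its own event loop thread). The pool returns Optional.empty() instead of throwing when every channel is closed, the caller retries the connection up to a fixed limit, and the completion callback hands a failed write to a separate executor rather than waiting on the I/O thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class ReconnectSketch {
  /** Hypothetical stand-in for a Netty channel; it only tracks open/closed. */
  static final class FakeChannel {
    private volatile boolean open = true;
    boolean isOpen() { return open; }
    void close() { open = false; }
  }

  /**
   * Hypothetical round-robin pool (not Giraph's ChannelRotater). When every
   * channel is closed it returns Optional.empty() instead of throwing, which
   * leaves the caller a chance to reconnect.
   */
  static final class ChannelPool {
    private final List<FakeChannel> channels = new ArrayList<>();
    private int index = -1;

    synchronized void add(FakeChannel channel) { channels.add(channel); }

    synchronized Optional<FakeChannel> next() {
      channels.removeIf(c -> !c.isOpen()); // forget channels that have closed
      if (channels.isEmpty()) {
        return Optional.empty();
      }
      index = (index + 1) % channels.size();
      return Optional.of(channels.get(index));
    }
  }

  /** Returns an open channel, trying to reconnect up to maxRetries times. */
  static Optional<FakeChannel> acquire(
      ChannelPool pool, Supplier<Optional<FakeChannel>> connect, int maxRetries) {
    Optional<FakeChannel> channel = pool.next();
    for (int attempt = 0; !channel.isPresent() && attempt < maxRetries; attempt++) {
      connect.get().ifPresent(pool::add); // a failed attempt yields empty
      channel = pool.next();
    }
    return channel; // empty only if every reconnection attempt failed
  }

  /**
   * Completion callback that runs on the I/O thread. It never blocks there:
   * on failure it hands the resend to a separate executor, where waiting for
   * a reconnection is safe.
   */
  static void onWriteComplete(boolean success, ExecutorService resendPool, Runnable resend) {
    if (!success) {
      resendPool.submit(resend);
    }
  }

  public static void main(String[] args) throws Exception {
    ChannelPool pool = new ChannelPool();
    FakeChannel first = new FakeChannel();
    pool.add(first);
    first.close(); // simulate the connection dropping

    int[] attempts = {0};
    // Hypothetical connector: the first attempt fails, the second succeeds.
    Supplier<Optional<FakeChannel>> connect = () ->
        ++attempts[0] < 2 ? Optional.<FakeChannel>empty() : Optional.of(new FakeChannel());
    boolean reconnected = acquire(pool, connect, 3).isPresent();
    System.out.println("reconnected=" + reconnected + " after " + attempts[0] + " attempts");

    ExecutorService ioThread = Executors.newSingleThreadExecutor();
    ExecutorService resendPool = Executors.newSingleThreadExecutor();
    try {
      CompletableFuture<Boolean> resentElsewhere = new CompletableFuture<>();
      ioThread.submit(() -> {
        Thread io = Thread.currentThread();
        onWriteComplete(false, resendPool,
            () -> resentElsewhere.complete(Thread.currentThread() != io));
      });
      System.out.println("resend off I/O thread: " + resentElsewhere.get(5, TimeUnit.SECONDS));
    } finally {
      ioThread.shutdown();
      resendPool.shutdown();
    }
  }
}
```

Running main prints "reconnected=true after 2 attempts" followed by "resend off I/O thread: true". Returning an empty result rather than throwing keeps the decision to fail the job with the caller, which can do so only after its retry budget is spent.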



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
