Dionysios Logothetis created GIRAPH-1230:
--------------------------------------------

             Summary: Fix Netty reconnection issues
                 Key: GIRAPH-1230
                 URL: https://issues.apache.org/jira/browse/GIRAPH-1230
             Project: Giraph
          Issue Type: Bug
            Reporter: Dionysios Logothetis


- The LogOnErrorChannelFutureListener is called when a channel operation 
completes. It was checking whether the channel failed, in which case it tried 
to resend any requests. Resending required waiting until the channel had been 
re-established. However, waiting on the same thread that calls the handler 
causes a BlockingOperationException from Netty, so this recovery path is not 
effective.
- When a channel closes, we have logic that tries to re-open the channel, up 
to a maximum number of retries. But we also had logic in the ChannelRotater 
that would throw an exception if it did not find any channel. This does not 
give the client the opportunity to reconnect.
- Whenever the client closes the connection, the server catches this 
(Connection reset by peer) and throws an exception as well, so the job fails 
immediately. This does not give the client the opportunity to reconnect.
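
To illustrate the first point, here is a minimal sketch of retrying a 
connection without blocking the thread that runs the completion callback. It 
is not the actual Giraph patch and does not use Netty: ReconnectSketch, 
reconnectAsync and the tryConnect supplier are hypothetical names, and a plain 
ScheduledExecutorService stands in for whatever thread would own the retries. 
The callback only schedules the next attempt and returns immediately.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch; none of these names come from Giraph or Netty. */
final class ReconnectSketch {
  private final ScheduledExecutorService scheduler;
  private final int maxRetries;
  private final long delayMillis;

  ReconnectSketch(ScheduledExecutorService scheduler, int maxRetries,
      long delayMillis) {
    this.scheduler = scheduler;
    this.maxRetries = maxRetries;
    this.delayMillis = delayMillis;
  }

  /**
   * Safe to call from a completion callback: it never waits. The returned
   * future completes with the number of attempts used, or exceptionally once
   * maxRetries attempts have failed.
   */
  CompletableFuture<Integer> reconnectAsync(BooleanSupplier tryConnect) {
    CompletableFuture<Integer> result = new CompletableFuture<>();
    scheduleAttempt(tryConnect, 1, result);
    return result;
  }

  private void scheduleAttempt(BooleanSupplier tryConnect, int attempt,
      CompletableFuture<Integer> result) {
    scheduler.schedule(() -> {
      boolean connected;
      try {
        connected = tryConnect.getAsBoolean();
      } catch (RuntimeException e) {
        // Treat a failed attempt like "not connected yet".
        connected = false;
      }
      if (connected) {
        result.complete(attempt);
      } else if (attempt >= maxRetries) {
        result.completeExceptionally(new IllegalStateException(
            "Gave up after " + attempt + " attempts"));
      } else {
        // Schedule the next attempt instead of sleeping or waiting here.
        scheduleAttempt(tryConnect, attempt + 1, result);
      }
    }, delayMillis, TimeUnit.MILLISECONDS);
  }
}
```

A caller would resend its pending requests in a callback attached to the 
returned future (for example with thenAccept) rather than waiting on it from 
the callback thread.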



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
