Dionysios Logothetis created GIRAPH-1230:
--------------------------------------------
Summary: Fix Netty reconnection issues
Key: GIRAPH-1230
URL: https://issues.apache.org/jira/browse/GIRAPH-1230
Project: Giraph
Issue Type: Bug
Reporter: Dionysios Logothetis
- The LogOnErrorChannelFutureListener is called when a channel operation
completes. It checked whether the operation had failed and, if so, tried to
resend any requests, which required waiting until the channel had been
re-established. However, waiting from the same thread that invokes the
listener causes Netty to throw a BlockingOperationException, so this retry
path does not work.
- When a channel closes, we have logic that tries to re-open it, up to a
maximum number of retries. However, the ChannelRotater also threw an exception
whenever it found no channel, so the client never got the chance to
reconnect.
- Whenever the client closes the connection, the server catches the resulting
error (Connection reset by peer) and throws an exception as well, so the job
fails immediately and the client gets no chance to reconnect.
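The problem in the first item can be illustrated without Netty. Netty's promises detect a wait issued from their own event loop thread and throw BlockingOperationException rather than hang. The following is a minimal sketch using only java.util.concurrent. It assumes that reconnecting, like Netty channel setup, needs the event loop thread, and it shows the general workaround of moving the blocking wait onto a separate executor. The names OffloadResendSketch, run and reconnect are hypothetical, and this is not the actual GIRAPH-1230 patch.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OffloadResendSketch {
    // Hypothetical reconnect that, like Netty channel setup, must run on the event loop.
    static CompletableFuture<String> reconnect(ExecutorService eventLoop) {
        return CompletableFuture.supplyAsync(() -> "channel-2", eventLoop);
    }

    // Simulates one failed write whose failure callback fires on the event loop thread.
    static String run(String request) {
        // Stand-in for a Netty event loop: a single thread that runs I/O callbacks.
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        // Separate pool for work that may block, such as waiting for a reconnect.
        ExecutorService resendPool = Executors.newSingleThreadExecutor();
        try {
            return CompletableFuture.completedFuture(request)
                // The failure callback runs on the event loop thread and returns
                // immediately after handing the resend to resendPool.
                .thenComposeAsync(req -> CompletableFuture.supplyAsync(() -> {
                    // Blocking here is safe: this is a resendPool thread, so the
                    // event loop stays free to run reconnect().
                    String channel = reconnect(eventLoop).join();
                    return "resent " + req + " on " + channel;
                }, resendPool), eventLoop)
                .join();
        } finally {
            eventLoop.shutdown();
            resendPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run("req-1"));
    }
}
```

Calling run("req-1") returns "resent req-1 on channel-2". If the join() on reconnect() were moved into the callback running on eventLoop, the reconnect task would be queued behind it on the same single thread and never run. That deadlock is the situation Netty guards against with BlockingOperationException.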
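For the second item, the intended ordering can be sketched in plain Java: try to reconnect, and fail only once the retries are exhausted. Rotater, nextChannel and the reconnect Supplier are hypothetical stand-ins, not Giraph's actual rotater class. Channels are modeled as strings, and a null from the supplier models a failed reconnect attempt.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

public class RetryBeforeFailSketch {
    // Hypothetical round-robin rotater. It returns null when no channel is
    // available instead of throwing, so the caller can try to reconnect first.
    static class Rotater {
        private final Deque<String> channels = new ArrayDeque<>();

        void add(String channel) { channels.addLast(channel); }

        // Called when a channel closes.
        void remove(String channel) { channels.remove(channel); }

        String next() {
            String channel = channels.pollFirst();
            if (channel != null) {
                channels.addLast(channel); // rotate it to the back
            }
            return channel;
        }
    }

    // Returns a channel, attempting up to maxRetries reconnects before giving up.
    static String nextChannel(Rotater rotater, Supplier<String> reconnect, int maxRetries) {
        for (int attempt = 0; ; attempt++) {
            String channel = rotater.next();
            if (channel != null) {
                return channel;
            }
            if (attempt == maxRetries) {
                throw new IllegalStateException(
                    "no channel after " + maxRetries + " reconnect attempts");
            }
            String fresh = reconnect.get(); // null models a failed reconnect
            if (fresh != null) {
                rotater.add(fresh);
            }
        }
    }

    public static void main(String[] args) {
        Rotater rotater = new Rotater();
        rotater.add("ch-1");
        rotater.remove("ch-1"); // the only channel closes
        System.out.println(nextChannel(rotater, () -> "ch-2", 3));
    }
}
```

Running main prints ch-2. The rotater's empty state is reported to the caller, and the exception is raised only after the reconnect budget is spent.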
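The third item amounts to an error-classification policy on the server side. The following is a minimal sketch. It assumes the server can inspect the exception it caught; in Netty this would typically happen in a handler's exceptionCaught callback. A peer reset closes only the affected channel, while any other error still fails the job. ServerErrorPolicySketch and its Action enum are hypothetical, and matching on the message text is an assumption made for illustration.

```java
import java.io.IOException;

public class ServerErrorPolicySketch {
    // Hypothetical outcomes for an exception caught on the server.
    enum Action { CLOSE_CHANNEL_ONLY, FAIL_JOB }

    // A peer reset means the client dropped the connection and may reconnect,
    // so only that channel is closed; every other error still fails the job.
    static Action onException(Throwable cause) {
        if (cause instanceof IOException) {
            String msg = cause.getMessage();
            if (msg != null && msg.contains("Connection reset by peer")) {
                return Action.CLOSE_CHANNEL_ONLY;
            }
        }
        return Action.FAIL_JOB;
    }

    public static void main(String[] args) {
        System.out.println(onException(new IOException("Connection reset by peer")));
        System.out.println(onException(new IllegalStateException("bad request")));
    }
}
```

Matching on the message text is brittle, since the wording comes from the operating system and JDK. It is used here only to keep the sketch self-contained.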
--
This message was sent by Atlassian Jira
(v8.3.4#803005)