dlogothetis opened a new pull request #118: Fix issues with channel 
re-connection
URL: https://github.com/apache/giraph/pull/118
 
 
   - The LogOnErrorChannelFutureListener is called when a channel operation 
completes. It checked whether the channel had failed and, if so, tried to 
resend any requests. Resending required waiting until the channel had been 
re-established, but waiting on the same thread that invokes the listener 
causes Netty to throw a BlockingOperationException, so the resend was never 
effective.
   - I removed the call to the method that waits for the connection to be 
re-established and then resends the requests. We already have a thread that 
periodically checks for and resends any unsent requests, and also 
re-establishes any closed channels.
   - When a channel closes, we have logic that tries to re-open it, up to a 
maximum number of retries. But we also had logic in the ChannelRotater that 
would throw an exception if it didn't find any channel, which left no 
opportunity to re-connect. So I removed this.
   - Whenever the client closes the connection, the server catches this 
(Connection reset by peer) and throws an exception as well, so the job fails 
immediately. This does not give the client the opportunity to re-connect. I 
changed this so that whenever a server sees a "Connection reset by peer" 
exception, it does not fail. It still fails in all other cases.
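
The last two changes can be illustrated with a minimal, Netty-free sketch. It is not the code in this pull request: `RotatingPool` and `ServerErrors` are hypothetical names, and matching on the exception message is an assumption about how a "Connection reset by peer" error could be recognized. In a real Netty server, a check like this would typically be called from a handler's `exceptionCaught(ChannelHandlerContext, Throwable)` method.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a per-worker channel pool; not Giraph's class. */
class RotatingPool<C> {
  private final List<C> channels = new ArrayList<>();
  private int index = 0;

  synchronized void add(C channel) {
    channels.add(channel);
  }

  synchronized boolean remove(C channel) {
    return channels.remove(channel);
  }

  /**
   * Returns the next channel in round-robin order, or null when the pool is
   * empty. Returning null instead of throwing lets the caller keep the
   * request queued until a periodic checker has re-opened a channel.
   */
  synchronized C next() {
    if (channels.isEmpty()) {
      return null;
    }
    if (index >= channels.size()) {
      index = 0;
    }
    return channels.get(index++);
  }
}

/** Hypothetical helper for classifying server-side exceptions. */
class ServerErrors {
  /**
   * Assumption: a client-side close surfaces as an IOException whose message
   * contains "Connection reset by peer". Everything else stays fatal.
   */
  static boolean isConnectionReset(Throwable cause) {
    return cause instanceof IOException
        && cause.getMessage() != null
        && cause.getMessage().contains("Connection reset by peer");
  }
}
```

With this shape, a server handler would log and continue when `ServerErrors.isConnectionReset(cause)` is true and rethrow otherwise, and a sender that gets null from `next()` would leave the request for the periodic resend thread.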
   
   Tests
   - Unit tests
   - Snapshot tests
   - Ran with a job that consistently failed due to connection errors and 
now succeeds.
