[ https://issues.apache.org/jira/browse/GIRAPH-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16717946#comment-16717946 ]

ASF GitHub Bot commented on GIRAPH-1213:
----------------------------------------

Github user majakabiljo commented on the issue:

    https://github.com/apache/giraph/pull/96

    I used a pipeline which runs 100 jobs and was always getting at least a few jobs stuck with open network requests. Running it with more logging helped identify these two issues, and after the change it was 100% successful.

> Fix issues with network requests retries and add more logging
> -------------------------------------------------------------
>
>                 Key: GIRAPH-1213
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-1213
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Maja Kabiljo
>            Assignee: Maja Kabiljo
>            Priority: Major
>
> Fixing two bugs:
> * When channel fails, we are currently retrying all requests towards the
> destination machine from the channel, instead of just ones which are
> happening on the concrete channel.
> * In practice, we've noticed BlockingOperationException can get thrown when
> we wait to connect on channel in which case we silently don't send the
> request we are trying to send, so catching this exception and retrying
> instead.
> Also added logging of channel ids to be able to debug issues related to
> network requests not delivering easier.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
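
To illustrate the first bug described in the issue, here is a minimal sketch of the difference between retrying by destination and retrying by channel. It is not Giraph's actual code: the class `ChannelRetrySketch`, the `RequestInfo` record-keeping type, and the string channel ids are all hypothetical names made up for this example, and the real client tracks considerably more state.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; these are not Giraph's real classes or method names. */
public class ChannelRetrySketch {
  /** Bookkeeping for one open (unacknowledged) request. */
  static final class RequestInfo {
    final int destinationTaskId;
    final String channelId;

    RequestInfo(int destinationTaskId, String channelId) {
      this.destinationTaskId = destinationTaskId;
      this.channelId = channelId;
    }
  }

  /** Open requests keyed by request id, kept in insertion order. */
  private final Map<Long, RequestInfo> openRequests = new LinkedHashMap<>();

  void addOpenRequest(long requestId, int destinationTaskId, String channelId) {
    openRequests.put(requestId, new RequestInfo(destinationTaskId, channelId));
  }

  /** Behavior described as the bug: retry everything sent to the destination. */
  List<Long> requestsToRetryByDestination(int destinationTaskId) {
    List<Long> result = new ArrayList<>();
    for (Map.Entry<Long, RequestInfo> entry : openRequests.entrySet()) {
      if (entry.getValue().destinationTaskId == destinationTaskId) {
        result.add(entry.getKey());
      }
    }
    return result;
  }

  /** Behavior described as the fix: retry only requests on the failed channel. */
  List<Long> requestsToRetryByChannel(String failedChannelId) {
    List<Long> result = new ArrayList<>();
    for (Map.Entry<Long, RequestInfo> entry : openRequests.entrySet()) {
      if (entry.getValue().channelId.equals(failedChannelId)) {
        result.add(entry.getKey());
      }
    }
    return result;
  }
}
```

With several channels open to the same destination, selecting by destination also resends requests that are still in flight on healthy channels, while selecting by channel id resends only those affected by the failure. Keeping the channel id next to each open request is also what makes it possible to log it when debugging undelivered requests.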