[
https://issues.apache.org/jira/browse/GIRAPH-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16717946#comment-16717946
]
ASF GitHub Bot commented on GIRAPH-1213:
----------------------------------------
Github user majakabiljo commented on the issue:
https://github.com/apache/giraph/pull/96
I tested with a pipeline that runs 100 jobs; before the change, at least a
few jobs always got stuck with open network requests. Running it with more
logging helped identify these two issues, and after the change all 100 jobs
succeeded.
> Fix issues with network requests retries and add more logging
> -------------------------------------------------------------
>
> Key: GIRAPH-1213
> URL: https://issues.apache.org/jira/browse/GIRAPH-1213
> Project: Giraph
> Issue Type: Bug
> Reporter: Maja Kabiljo
> Assignee: Maja Kabiljo
> Priority: Major
>
> Fixing two bugs:
> * When a channel fails, we currently retry all requests to the destination
> machine that the channel connects to, instead of only the requests that
> were sent on that specific channel.
> * In practice, we've noticed that BlockingOperationException can be thrown
> while we wait to connect on a channel, in which case we silently fail to
> send the request. We now catch this exception and retry instead.
> Also added logging of channel ids to make it easier to debug issues with
> network requests not being delivered.
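
For readers following the two fixes described in the issue, here is a minimal
Java sketch of the ideas, not the actual patch. It assumes open requests are
tracked together with the id of the channel they were written to. Every name
here (ChannelRetrySketch, OpenRequest, BlockingWaitException, requestsToRetry,
connectWithRetry) is hypothetical and does not come from Giraph or Netty;
BlockingWaitException is only a local stand-in for Netty's
BlockingOperationException so that the sketch compiles without dependencies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Illustrative sketch only; none of these names come from Giraph or Netty. */
final class ChannelRetrySketch {

  /** Hypothetical local stand-in for Netty's BlockingOperationException. */
  static final class BlockingWaitException extends IllegalStateException { }

  /** An open request that remembers which channel it was written to. */
  static final class OpenRequest {
    final long requestId;
    final int destinationTaskId;
    final int channelId;

    OpenRequest(long requestId, int destinationTaskId, int channelId) {
      this.requestId = requestId;
      this.destinationTaskId = destinationTaskId;
      this.channelId = channelId;
    }
  }

  /** Open requests keyed by request id, kept in insertion order. */
  private final Map<Long, OpenRequest> openRequests = new LinkedHashMap<>();

  void track(long requestId, int destinationTaskId, int channelId) {
    openRequests.put(requestId,
        new OpenRequest(requestId, destinationTaskId, channelId));
  }

  /**
   * First fix: select only the requests written to the failed channel,
   * rather than every request addressed to the same destination machine.
   */
  List<Long> requestsToRetry(int failedChannelId) {
    List<Long> toRetry = new ArrayList<>();
    for (OpenRequest request : openRequests.values()) {
      if (request.channelId == failedChannelId) {
        toRetry.add(request.requestId);
      }
    }
    return toRetry;
  }

  /**
   * Second fix: if waiting for the channel throws, log and try again
   * instead of silently dropping the request.
   */
  static <T> T connectWithRetry(Supplier<T> waitForChannel, int maxAttempts) {
    BlockingWaitException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return waitForChannel.get();
      } catch (BlockingWaitException e) {
        last = e;
        System.err.println("Waiting for channel failed on attempt "
            + attempt + " of " + maxAttempts + ", retrying");
      }
    }
    throw new IllegalStateException(
        "Could not obtain channel after " + maxAttempts + " attempts", last);
  }
}
```

With requests 1 and 3 tracked on channel 100 and request 2 on channel 101,
all to the same destination, requestsToRetry(100) selects only requests 1 and
3, so traffic on the healthy channel 101 is not resent.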