[ https://issues.apache.org/jira/browse/GIRAPH-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343028#comment-15343028 ]
Hudson commented on GIRAPH-1077:
--------------------------------
FAILURE: Integrated in Giraph-trunk-Commit #1626 (See
[https://builds.apache.org/job/Giraph-trunk-Commit/1626/])
GIRAPH-1077: Jobs getting stuck after channel failure (majakabiljo:
[http://git-wip-us.apache.org/repos/asf?p=giraph.git&a=commit&h=51f09376456ed8dadc2e801afaa495863fd7ee3b])
* giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java
> Jobs getting stuck after channel failure
> ----------------------------------------
>
> Key: GIRAPH-1077
> URL: https://issues.apache.org/jira/browse/GIRAPH-1077
> Project: Giraph
> Issue Type: Bug
> Reporter: Maja Kabiljo
> Assignee: Maja Kabiljo
>
> When a channel fails, we currently just log the failure. Because we don't
> wait on open requests from every place, the check for open requests isn't
> always called, and we've seen jobs stay stuck, for example during the input
> stage when a worker's request to the master for a split to read fails. When
> we know that a channel has failed, we should try to resend the requests from
> that channel.
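
The idea in the description above can be sketched roughly as follows. This is a minimal illustration only, not the actual change to NettyClient.java: the class OpenRequestTracker, its methods, the integer channel ids, and the string payloads are all hypothetical, and the sender callback is assumed to deliver over a healthy (for example, freshly reconnected) channel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

/** Hypothetical tracker of open requests; not Giraph's NettyClient. */
class OpenRequestTracker {
  /** An open request: the channel it was sent on, plus its payload. */
  static final class OpenRequest {
    final int channelId;
    final String payload;

    OpenRequest(int channelId, String payload) {
      this.channelId = channelId;
      this.payload = payload;
    }
  }

  /** Requests that have been sent but not yet acknowledged, by request id. */
  private final Map<Long, OpenRequest> openRequests = new ConcurrentHashMap<>();
  /** Assumed transport hook: (requestId, payload) -> write to a live channel. */
  private final BiConsumer<Long, String> sender;

  OpenRequestTracker(BiConsumer<Long, String> sender) {
    this.sender = sender;
  }

  /** Record the request as open, then send it. */
  void send(long requestId, int channelId, String payload) {
    openRequests.put(requestId, new OpenRequest(channelId, payload));
    sender.accept(requestId, payload);
  }

  /** A response arrived, so the request is no longer open. */
  void onResponse(long requestId) {
    openRequests.remove(requestId);
  }

  /**
   * Called as soon as a channel is known to have failed. Instead of only
   * logging, resend every request that is still open on that channel.
   * Returns the ids of the requests that were resent.
   */
  List<Long> onChannelFailure(int failedChannelId) {
    List<Long> resent = new ArrayList<>();
    for (Map.Entry<Long, OpenRequest> entry : openRequests.entrySet()) {
      if (entry.getValue().channelId == failedChannelId) {
        sender.accept(entry.getKey(), entry.getValue().payload);
        resent.add(entry.getKey());
      }
    }
    return resent;
  }

  /** Number of requests still waiting for a response. */
  int openCount() {
    return openRequests.size();
  }
}
```

The point of the sketch is that the resend is driven by the failure event itself, so recovery does not depend on some other code path happening to check the open requests. Resent requests stay in the map until a response arrives, so a second failure would resend them again.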
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)