[ 
https://issues.apache.org/jira/browse/GIRAPH-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342452#comment-15342452
 ] 

Maja Kabiljo commented on GIRAPH-1077:
--------------------------------------

https://reviews.facebook.net/D59895

> Jobs getting stuck after channel failure
> ----------------------------------------
>
>                 Key: GIRAPH-1077
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-1077
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Maja Kabiljo
>            Assignee: Maja Kabiljo
>
> Currently, when a channel fails, we only log the failure. Because we don't 
> wait on open requests in every code path, the check for open requests is not 
> always called, and we've seen jobs stay stuck, for example during the input 
> stage, when a worker's request to the master for a split to read fails. When 
> we know that a channel has failed, we should try to resend the requests from 
> that channel.
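
The idea in the description can be sketched roughly as follows. This is not the Giraph patch itself, only a minimal illustration under assumptions: the class, its methods, and the integer channel ids are all hypothetical, and an in-memory list stands in for the network. Open requests are tracked until acknowledged, and a channel-failure callback immediately resends every request that was still open on the failed channel instead of waiting for a periodic check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: resend open requests as soon as their channel fails. */
final class ChannelFailureResendSketch {
  /** A request that has been sent but not yet acknowledged (hypothetical type). */
  static final class OpenRequest {
    final long requestId;
    final int channelId;
    final String payload;

    OpenRequest(long requestId, int channelId, String payload) {
      this.requestId = requestId;
      this.channelId = channelId;
      this.payload = payload;
    }
  }

  /** Requests still waiting for a response, keyed by request id. */
  private final Map<Long, OpenRequest> openRequests = new ConcurrentHashMap<>();

  /** Stand-in for the network: records every send as "channelId:requestId". */
  final List<String> sentLog = new ArrayList<>();

  /** Sends a request on a channel and remembers it until it is acknowledged. */
  void send(long requestId, int channelId, String payload) {
    openRequests.put(requestId, new OpenRequest(requestId, channelId, payload));
    sentLog.add(channelId + ":" + requestId);
  }

  /** Called when the response for a request arrives. */
  void onAck(long requestId) {
    openRequests.remove(requestId);
  }

  /**
   * Called when a channel is known to have failed. Rather than only logging,
   * resend every request still open on that channel over a replacement
   * channel. Returns the number of requests that were resent.
   */
  int onChannelFailure(int failedChannelId, int replacementChannelId) {
    int resent = 0;
    // Copy the values so the map can be updated while iterating.
    for (OpenRequest request : new ArrayList<>(openRequests.values())) {
      if (request.channelId == failedChannelId) {
        send(request.requestId, replacementChannelId, request.payload);
        resent++;
      }
    }
    return resent;
  }

  /** Number of requests that are still unacknowledged. */
  int openRequestCount() {
    return openRequests.size();
  }
}
```

In this sketch an acknowledged request is never resent, and a resent request stays open under the replacement channel, so a second failure notification for the same channel finds nothing left to resend.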



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
