[ 
https://issues.apache.org/jira/browse/GIRAPH-274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427056#comment-13427056
 ] 

Jaeho Shin commented on GIRAPH-274:
-----------------------------------

Thanks [~initialcontext] for the hints on the timeout option.  
{{mapred.task.timeout}} appears to be it.

Regarding the separate thread reporting progress, it's still unclear to me why 
we can't approach this timeout problem from the other end.  Maybe it's because 
I haven't been bitten by zombies yet, but at least with this separate thread, 
we can be sure that timeouts will not happen because of blocking calls into an 
underlying system that we have no control over and from within which it is 
impossible to report progress.  Besides, we can always make the thread smarter: 
it can observe the worker's actual progress (in terms of Giraph) and stop 
reporting progress to Hadoop when it sees the current superstep, or the number 
of vertices read, written, or computed, stand still for too long.  Currently, 
we hit the timeout while blocked on all sorts of unknown sources.  I believe 
that by flipping this problem into finding true positives instead of ruling 
out false positives, Giraph will have full control over the timeout mechanism, 
and it will be much easier for us to make it more reliable.
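
To make the "smarter thread" idea concrete, here is a rough sketch, not the 
attached patch.  All names in it (StallAwareReporter, progressCounter, 
reportProgress, stallLimitMs) are hypothetical.  It assumes the worker exposes 
some monotonically changing counter (for example, vertices loaded so far) and 
that reportProgress wraps whatever call tells Hadoop the task is alive.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical watchdog: keeps reporting progress while an observed
 * counter keeps changing, and goes silent once it has stood still for
 * stallLimitMs, so that the Hadoop task timeout can then fire.
 */
final class StallAwareReporter implements Runnable {
    private final LongSupplier progressCounter; // assumed: e.g. vertices read so far
    private final Runnable reportProgress;      // assumed: wraps the Hadoop progress call
    private final long stallLimitMs;            // how long the counter may stand still
    private final long intervalMs;              // polling interval of the thread
    private final LongSupplier clock;           // injectable clock, for testing
    private long lastSeen = Long.MIN_VALUE;
    private long lastChangeMs;

    StallAwareReporter(LongSupplier progressCounter, Runnable reportProgress,
                       long stallLimitMs, long intervalMs, LongSupplier clock) {
        this.progressCounter = progressCounter;
        this.reportProgress = reportProgress;
        this.stallLimitMs = stallLimitMs;
        this.intervalMs = intervalMs;
        this.clock = clock;
        this.lastChangeMs = clock.getAsLong();
    }

    /** One polling step; returns true if progress was reported. */
    boolean tick() {
        long now = clock.getAsLong();
        long current = progressCounter.getAsLong();
        if (current != lastSeen) {
            // The worker moved forward: remember when that happened.
            lastSeen = current;
            lastChangeMs = now;
        }
        if (now - lastChangeMs >= stallLimitMs) {
            // Stalled for too long: stay silent and let the timeout kill the task.
            return false;
        }
        reportProgress.run();
        return true;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            tick();
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and exit
                return;
            }
        }
    }
}
```

With something like this, a blocking call elsewhere in the worker no longer 
causes a timeout by itself; the task is only left to time out when the 
observed counter has not moved for stallLimitMs.  Which counters to observe 
and how long to wait would still need to be decided.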
                
> Jobs still failing due to tasks timeout during INPUT_SUPERSTEP
> --------------------------------------------------------------
>
>                 Key: GIRAPH-274
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-274
>             Project: Giraph
>          Issue Type: Bug
>    Affects Versions: 0.2.0
>            Reporter: Jaeho Shin
>            Assignee: Jaeho Shin
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-274.patch
>
>
> Even after GIRAPH-267, jobs were failing during INPUT_SUPERSTEP when some 
> workers didn't get to reserve an input split, while others were loading 
> vertices for a long time.  (related to GIRAPH-246 and GIRAPH-267)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
