[jira] [Updated] (GIRAPH-274) Jobs still failing due to tasks timeout during INPUT_SUPERSTEP

Jaeho Shin (JIRA) Mon, 30 Jul 2012 21:50:40 -0700

     [ https://issues.apache.org/jira/browse/GIRAPH-274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jaeho Shin updated GIRAPH-274:
------------------------------

    Attachment: GIRAPH-274.patch

Here is our patch, which adds several progress() calls after a careful code 
review with Greg Malewicz.  It seems a missing progress() call in 
BspServiceWorker#reserveInputSplit() was causing the timeout for idle workers 
during the INPUT_SUPERSTEP.  There were many more spots where Giraph makes a 
blocking call, but there we only left comments, because we had access to 
neither the Context nor the source code.  Chasing these one by one seems to be 
an endless effort, and it will only pollute Giraph's codebase as we try to fix 
more timeout cases.
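
To illustrate the kind of fix described above, here is a minimal sketch of a 
blocking wait broken into short slices with a progress report between them. 
This is not the code from the patch.  The names ProgressWait and 
awaitWithProgress are hypothetical, the CountDownLatch stands in for whatever 
the worker is actually waiting on, and the Runnable stands in for a call to 
the Hadoop Context's progress().

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of Giraph. */
public class ProgressWait {
    /**
     * Waits until the latch opens, calling the progress callback each time
     * a wait slice expires without the latch opening.
     *
     * @return how many times the progress callback was invoked
     */
    public static int awaitWithProgress(CountDownLatch done, Runnable progress,
                                        long sliceMillis) throws InterruptedException {
        int calls = 0;
        // await() returns false when the slice times out before the latch opens.
        while (!done.await(sliceMillis, TimeUnit.MILLISECONDS)) {
            progress.run(); // assumption: stands in for context.progress()
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        // Simulate another worker finishing its long load after a while.
        new Thread(() -> {
            try {
                Thread.sleep(300);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            done.countDown();
        }).start();
        int calls = awaitWithProgress(done, () -> { }, 50);
        System.out.println("progress reported " + calls + " times while idle");
    }
}
```

Every blocking spot needs its own wrapper like this, which is why the 
approach scales poorly.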

We definitely need a better, more systematic way to keep our Giraph jobs from 
timing out.  One possibility is to run a separate thread from GraphMapper#run() 
that reports progress as long as the task doesn't crash, so that we can stop 
worrying about calling progress().  Do you think this is a good idea?  Will 
this cause any trouble in the underlying Hadoop/MapReduce stack?  If we're 
using MapReduce only for scheduling resources, then I believe there should be 
no reason for us to conform to the MapReduce convention of not using threads.
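
A rough sketch of the separate-thread idea, under stated assumptions: the 
class name ProgressHeartbeat is hypothetical, the Runnable stands in for a 
call to the Hadoop Context's progress(), and the period is arbitrary.  It 
does not touch Hadoop or Giraph classes and is only meant to show the shape 
of the approach.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: reports progress periodically until closed. */
public class ProgressHeartbeat implements AutoCloseable {
    private final ScheduledExecutorService scheduler;

    public ProgressHeartbeat(Runnable progress, long periodMillis) {
        // Daemon thread, so a forgotten heartbeat cannot keep the JVM alive.
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "progress-heartbeat");
            t.setDaemon(true);
            return t;
        });
        // assumption: progress stands in for context.progress()
        scheduler.scheduleAtFixedRate(progress, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops the heartbeat; call this when the task finishes or fails. */
    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger calls = new AtomicInteger();
        try (ProgressHeartbeat hb = new ProgressHeartbeat(calls::incrementAndGet, 50)) {
            Thread.sleep(300); // stands in for a long blocking step in the task
        }
        System.out.println("progress reported " + calls.get() + " times");
    }
}
```

One trade-off to keep in mind: as long as such a thread is alive, a task 
that hangs without crashing also keeps reporting progress, so the timeout 
would no longer catch it.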
                
> Jobs still failing due to tasks timeout during INPUT_SUPERSTEP
> --------------------------------------------------------------
>
>                 Key: GIRAPH-274
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-274
>             Project: Giraph
>          Issue Type: Bug
>    Affects Versions: 0.2.0
>            Reporter: Jaeho Shin
>            Assignee: Jaeho Shin
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-274.patch
>
>
> Even after GIRAPH-267, jobs were still failing during INPUT_SUPERSTEP when 
> some workers did not get to reserve an input split while others were loading 
> vertices for a long time.  (Related to GIRAPH-246 and GIRAPH-267.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
