[ 
https://issues.apache.org/jira/browse/GIRAPH-274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427086#comment-13427086
 ] 

Eli Reisman commented on GIRAPH-274:
------------------------------------

I think on our cluster there are a couple of options you have to set to ensure 
it takes effect. I mentioned it as an emergency workaround; I can't use it 
here, but your situation might be different.

The zombie thing is more an annoyance than a problem for us, but our ops people 
don't like it when it happens, and users often forget to kill their jobs during 
testing and debugging. Then Giraph devs get blamed...

Making the thread smarter doesn't sound like it buys us anything, as it still 
needs to read progress data at regular intervals, and do work to measure and 
evaluate it...and then call progress just as often anyway. That seems to mean 
trading plain progress calls for more overhead, and perhaps synchronization, 
just to make the code prettier. Giraph is already up against the wall 
resource-wise, especially during INPUT_SUPERSTEP.
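
To make the comparison concrete, the "dumb" thread can be sketched roughly like 
this. It is only an illustration, not the code from the GIRAPH-246 or 
GIRAPH-274 patches: ProgressHeartbeat, the local Progressable interface (a 
stand-in for whatever progress callback the task exposes), and the interval 
are all assumed names and values.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the task's progress callback. */
interface Progressable {
    void progress();
}

/** Calls progress() at a fixed interval until stopped; reads no job state. */
class ProgressHeartbeat {
    private final Progressable target;
    private final long intervalMillis;
    private final ScheduledExecutorService scheduler;

    ProgressHeartbeat(Progressable target, long intervalMillis) {
        this.target = target;
        this.intervalMillis = intervalMillis;
        // Daemon thread, so the heartbeat alone never keeps the JVM alive.
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "progress-heartbeat");
            t.setDaemon(true);
            return t;
        });
    }

    /** Starts reporting immediately, then once per interval. */
    void start() {
        scheduler.scheduleAtFixedRate(
            target::progress, 0, intervalMillis, TimeUnit.MILLISECONDS);
    }

    /** Cancels further reports and waits briefly for the thread to exit. */
    void stop() throws InterruptedException {
        scheduler.shutdownNow();
        scheduler.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```

A thread like this shares no data with the loading code, so it needs no 
synchronization. The flip side is that it keeps reporting progress even when 
the job itself is stuck, so the task timeout never fires.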

I think the beautiful solution you're looking for here involves a move from 
Hadoop to YARN. For now, I just want my jobs to last through load-in again. 
Anyway, there are calls in the old 246 patch that will make this happen. Sorry 
if it's ugly; I don't blame you for a few dry heaves. The excitement of seeing 
your output data come out the other side will help the nausea pass. :)

                
> Jobs still failing due to tasks timeout during INPUT_SUPERSTEP
> --------------------------------------------------------------
>
>                 Key: GIRAPH-274
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-274
>             Project: Giraph
>          Issue Type: Bug
>    Affects Versions: 0.2.0
>            Reporter: Jaeho Shin
>            Assignee: Jaeho Shin
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-274.patch
>
>
> Even after GIRAPH-267, jobs were failing during INPUT_SUPERSTEP when some 
> workers didn't get to reserve an input split while others were loading 
> vertices for a long time.  (Related to GIRAPH-246 and GIRAPH-267.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
