[ https://issues.apache.org/jira/browse/GIRAPH-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422881#comment-13422881 ]
Eli Reisman commented on GIRAPH-267:
------------------------------------

This is nice work for sure! I messed around with putting the progress call in the lock, but the fact is you need progress calls at various times all through BspServiceWorker to keep a healthy job from timing out. It felt like a better idea to let someone see where and how often it's happening, and to have it locally configurable at the command line. I think when you want to use waitForever, it should wait forever like it says, and when that's not called for, it should not be waitForever.

> Jobs can get killed for not reporting status during INPUT SUPERSTEP
> -------------------------------------------------------------------
>
>                 Key: GIRAPH-267
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-267
>             Project: Giraph
>          Issue Type: Bug
>          Components: graph
>    Affects Versions: 0.2.0
>         Environment: Facebook Hadoop
>            Reporter: Jaeho Shin
>            Assignee: Jaeho Shin
>             Fix For: 0.2.0
>
>         Attachments: 0001-Made-PredicateLock-report-progress-and-removed-Conte.patch, GIRAPH-267.patch, GIRAPH-267.patch
>
>
> A job with a skewed and long (>600 secs in my case) INPUT_SUPERSTEP fails because
> some tasks do not report their status. From BspServiceWorker#setup(), I could tell
> that while some workers were still loading inputSplits, others finished theirs
> early, hung on PredicateLock#waitForever(), and got killed after the timeout.
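
To make the trade-off in this thread concrete, here is a minimal sketch of the "progress call inside the lock" variant: a wait that wakes up at a fixed interval and reports progress until it is signaled. This is not Giraph's PredicateLock and not the code in either attached patch. The class name ProgressReportingLock, the Runnable progress callback (a stand-in for a Hadoop progress call), and the intervalMillis parameter are all hypothetical names chosen for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch only; not Giraph's PredicateLock. */
class ProgressReportingLock {
  private final Lock lock = new ReentrantLock();
  private final Condition cond = lock.newCondition();
  /** Assumed stand-in for whatever reports task progress to Hadoop. */
  private final Runnable progress;
  /** Assumed knob: how long to block between progress reports. */
  private final long intervalMillis;
  private boolean eventOccurred = false;

  ProgressReportingLock(Runnable progress, long intervalMillis) {
    this.progress = progress;
    this.intervalMillis = intervalMillis;
  }

  /** Mark the event as having occurred and wake all waiters. */
  void signal() {
    lock.lock();
    try {
      eventOccurred = true;
      cond.signalAll();
    } finally {
      lock.unlock();
    }
  }

  /**
   * Block until signal() has been called. After each timed wait
   * (timeout, signal, or spurious wakeup) the progress callback runs,
   * so a worker that finished early keeps reporting while it waits.
   */
  void waitForever() throws InterruptedException {
    lock.lock();
    try {
      while (!eventOccurred) {
        cond.await(intervalMillis, TimeUnit.MILLISECONDS);
        progress.run();
      }
    } finally {
      lock.unlock();
    }
  }
}
```

The alternative argued for in the comment above keeps waitForever as a plain unbounded wait and places explicit, configurable progress calls at the call sites in BspServiceWorker instead, which makes it visible where and how often progress is reported.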