[ https://issues.apache.org/jira/browse/GIRAPH-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433345#comment-13433345 ]
Eli Reisman commented on GIRAPH-246:
------------------------------------

Sorry about this last comment; the autocomplete on JIRA search got me here before I realized I was supposed to be commenting/uploading to 214. My bad. Ignore.

> Periodic worker calls to context.progress() will prevent timeout on some Hadoop clusters during barrier waits
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-246
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-246
>             Project: Giraph
>          Issue Type: Improvement
>          Components: bsp
>    Affects Versions: 0.2.0
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>            Priority: Minor
>              Labels: hadoop, patch
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-246-10.patch, GIRAPH-246-11.patch,
> GIRAPH-246-1.patch, GIRAPH-246-2.patch, GIRAPH-246-3.patch,
> GIRAPH-246-4.patch, GIRAPH-246-5.patch, GIRAPH-246-6.patch,
> GIRAPH-246-7.patch, GIRAPH-246-8.patch, GIRAPH-246-9.patch
>
>
> This simple change creates a command-line configurable option in GiraphJob
> to control the time between calls to context.progress(). This allows workers
> to avoid timeouts during long data load-ins in which some workers complete
> their input split reads much faster than others, or finish a superstep
> faster. I found this allowed jobs that were large-scale but with low memory
> overhead to complete even when they would previously time out during runs on
> a Hadoop cluster. Timeout is still possible when the worker crashes, runs
> out of memory, or has other legitimate GC or RPC trouble, but the change
> prevents unintentional crashes when the worker is actually still healthy.
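
For readers of the archive, the idea in the issue description can be sketched roughly as follows. This is not the code from the attached patches. It is a minimal, self-contained illustration that assumes a worker blocked on a barrier can wake up at a configurable interval and report progress. The class name, the ProgressReporter interface (a stand-in for Hadoop's context.progress()), and the interval parameter are all hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ProgressWaitSketch {
  /** Hypothetical stand-in for the Hadoop context.progress() callback. */
  public interface ProgressReporter {
    void progress();
  }

  /**
   * Blocks until the barrier is released. Each time the wait times out
   * after intervalMsecs, progress is reported once and the wait resumes.
   *
   * @return the number of progress calls made while waiting
   */
  public static int waitWithProgress(CountDownLatch barrier,
      ProgressReporter reporter, long intervalMsecs)
      throws InterruptedException {
    int calls = 0;
    // await() returns false when the interval elapses before the
    // barrier is released, so the loop body runs once per interval.
    while (!barrier.await(intervalMsecs, TimeUnit.MILLISECONDS)) {
      reporter.progress();
      calls++;
    }
    return calls;
  }

  public static void main(String[] args) throws InterruptedException {
    CountDownLatch barrier = new CountDownLatch(1);
    // Simulates a slower peer that releases the barrier after a delay.
    Thread slowPeer = new Thread(() -> {
      try {
        Thread.sleep(300);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      barrier.countDown();
    });
    slowPeer.start();
    int calls = waitWithProgress(barrier, () -> { }, 50);
    slowPeer.join();
    System.out.println("progress calls while waiting: " + calls);
  }
}
```

In this sketch a healthy but idle worker keeps signalling liveness while it waits, whereas a worker that has crashed or is stuck in GC never reaches the loop body, so it can still be timed out as the description notes. The interval would need to be shorter than the cluster's task timeout to have any effect.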