[ https://issues.apache.org/jira/browse/GIRAPH-246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eli Reisman updated GIRAPH-246:
-------------------------------
    Attachment: GIRAPH-214-6-option1.patch

This is about as far as I can get without attempting to subclass or reimplement more chunks of the Hadoop-side code in order to accomplish "option1". I just wanted to put it up while I consider options, so that anyone who might have an idea can take a peek.

> Periodic worker calls to context.progress() will prevent timeout on some Hadoop clusters during barrier waits
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-246
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-246
>             Project: Giraph
>          Issue Type: Improvement
>          Components: bsp
>   Affects Versions: 0.2.0
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>            Priority: Minor
>              Labels: hadoop, patch
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-246-10.patch, GIRAPH-246-11.patch, GIRAPH-246-1.patch, GIRAPH-246-2.patch, GIRAPH-246-3.patch, GIRAPH-246-4.patch, GIRAPH-246-5.patch, GIRAPH-246-6.patch, GIRAPH-246-7.patch, GIRAPH-246-8.patch, GIRAPH-246-9.patch
>
>
> This simple change creates a command-line configurable option in GiraphJob to control the time between calls to context().progress(). It allows workers to avoid timeouts during long data load-ins in which some workers complete their input split reads much faster than others, or finish a superstep faster. I found this allowed jobs that were large-scale but with low memory overhead to complete even when they would previously time out during runs on a Hadoop cluster. A timeout is still possible when a worker crashes, runs out of memory, or has other legitimate GC or RPC trouble, but this change prevents unintentional crashes when the worker is actually still healthy.
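For readers who want to see the idea in isolation, the periodic progress() call during a barrier wait can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the attached patches: `ProgressReporter` is a hypothetical stand-in for the Hadoop task context, a `CountDownLatch` stands in for the BSP barrier, and the interval values are arbitrary.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BarrierProgressSketch {
    /** Hypothetical stand-in for the Hadoop task context; only progress() matters here. */
    interface ProgressReporter {
        void progress();
    }

    /**
     * Waits for the barrier to be released, waking up every intervalMs
     * milliseconds to report progress so the task is not considered hung.
     * Returns the number of progress() calls made while waiting.
     */
    static int waitWithProgress(CountDownLatch barrier,
                                ProgressReporter context,
                                long intervalMs) throws InterruptedException {
        int calls = 0;
        // await() returns false when the interval elapses before the barrier is released.
        while (!barrier.await(intervalMs, TimeUnit.MILLISECONDS)) {
            context.progress();
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch barrier = new CountDownLatch(1);
        AtomicInteger reported = new AtomicInteger();

        // Simulate a slower worker that reaches the barrier after about 250 ms.
        Thread slowWorker = new Thread(() -> {
            try {
                Thread.sleep(250);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            barrier.countDown();
        });
        slowWorker.start();

        // The 50 ms interval plays the role of the configurable option (illustrative value).
        int calls = waitWithProgress(barrier, reported::incrementAndGet, 50);
        slowWorker.join();
        System.out.println("progress() calls while waiting: " + calls);
    }
}
```

Because the loop only runs while the waiting worker itself is alive and responsive, a worker that has crashed or is stuck in GC stops reporting and can still be timed out, which matches the behaviour described in the issue.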