Eli Reisman created GIRAPH-246:
----------------------------------
Summary: Periodic worker calls to context.progress() will prevent
timeout on some Hadoop clusters during barrier waits
Key: GIRAPH-246
URL: https://issues.apache.org/jira/browse/GIRAPH-246
Project: Giraph
Issue Type: Improvement
Components: bsp
Affects Versions: 0.2.0
Reporter: Eli Reisman
Assignee: Eli Reisman
Priority: Minor
Fix For: 0.2.0
This simple change adds a command-line configurable option to GiraphJob that
controls the interval between worker calls to context.progress(). These calls
keep workers from timing out while they wait at a barrier, for example during
long data load-ins in which some workers finish reading their input splits much
sooner than others, or when some workers finish a superstep sooner than the
rest. I found that this allowed large-scale jobs with low memory overhead to
complete where they had previously timed out during runs on a Hadoop cluster.
A worker can still time out if it crashes, runs out of memory, or has genuine
GC or RPC trouble, but a worker that is actually still healthy is no longer
killed by mistake.
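
The idea can be sketched roughly as follows. This is a minimal illustration
and not the actual patch: the class BarrierProgressSketch, the ProgressReporter
interface (a stand-in for the Hadoop task context) and the intervalMsecs
parameter are all hypothetical names, and the barrier is modeled with a plain
CountDownLatch rather than Giraph's own barrier mechanism.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class BarrierProgressSketch {
    /** Hypothetical stand-in for the Hadoop task context; only progress() is modeled. */
    interface ProgressReporter {
        void progress();
    }

    /**
     * Blocks until the barrier is released. Each time intervalMsecs elapses
     * without a release, reports progress so the framework sees a live task.
     * Returns the number of progress calls made while waiting.
     */
    static int waitAtBarrier(CountDownLatch barrier, ProgressReporter context,
                             long intervalMsecs) throws InterruptedException {
        int calls = 0;
        // await() returns false when the timeout elapses before the release.
        while (!barrier.await(intervalMsecs, TimeUnit.MILLISECONDS)) {
            context.progress();
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch barrier = new CountDownLatch(1);

        // Simulates a slower worker that reaches the barrier late.
        Thread slowWorker = new Thread(() -> {
            try {
                Thread.sleep(300);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            barrier.countDown();
        });
        slowWorker.start();

        // The fast worker waits, reporting progress every 50 ms (assumed value).
        int calls = waitAtBarrier(barrier,
            () -> System.out.println("progress reported"), 50);
        slowWorker.join();
        System.out.println("barrier released after " + calls + " progress calls");
    }
}
```

Because the wait uses a timed await rather than an unbounded one, a worker
that has crashed or hung stops reporting progress and can still be timed out
by the framework, which matches the behavior described above.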