Eli Reisman created GIRAPH-246:
----------------------------------

             Summary: Periodic worker calls to context.progress() will prevent 
timeout on some Hadoop clusters during barrier waits
                 Key: GIRAPH-246
                 URL: https://issues.apache.org/jira/browse/GIRAPH-246
             Project: Giraph
          Issue Type: Improvement
          Components: bsp
    Affects Versions: 0.2.0
            Reporter: Eli Reisman
            Assignee: Eli Reisman
            Priority: Minor
             Fix For: 0.2.0


This simple change adds a command-line configurable option in GiraphJob that 
controls the time between calls to context.progress(). The periodic calls let 
workers avoid timeouts during long data load-ins in which some workers 
complete their input split reads much faster than others, or when some workers 
finish a superstep faster than others. I found this allowed jobs that were 
large-scale but had low memory overhead to complete when they would previously 
time out during runs on a Hadoop cluster. A timeout is still possible when a 
worker crashes, runs out of memory, or has other legitimate GC or RPC trouble, 
but the change prevents unintentional crashes when the worker is actually 
still healthy.
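
To illustrate the idea, here is a minimal sketch of a barrier wait that reports 
progress at a fixed interval. It is not the actual patch: the class name, the 
ProgressReporter interface (a local stand-in for the Hadoop context's 
progress() call), and the intervalMs parameter are all hypothetical, and a 
CountDownLatch stands in for whatever barrier mechanism the workers really use.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Giraph code: waits on a barrier in short slices and
// reports progress between slices so the framework sees the task as alive.
final class BarrierProgressSketch {

    /** Hypothetical stand-in for the Hadoop context's progress() callback. */
    interface ProgressReporter {
        void progress();
    }

    /**
     * Blocks until the barrier is released. Each time intervalMs elapses
     * without the barrier opening, reporter.progress() is called once.
     *
     * @return the number of progress calls made while waiting
     */
    static int awaitWithProgress(CountDownLatch barrier, long intervalMs,
                                 ProgressReporter reporter)
            throws InterruptedException {
        int calls = 0;
        // await() returns false when the interval elapses before the latch opens.
        while (!barrier.await(intervalMs, TimeUnit.MILLISECONDS)) {
            reporter.progress();
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch barrier = new CountDownLatch(1);

        // Simulate a slower peer that reaches the barrier after a delay.
        Thread slowPeer = new Thread(() -> {
            try {
                Thread.sleep(300);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            barrier.countDown();
        });
        slowPeer.start();

        int calls = awaitWithProgress(barrier, 50,
                () -> System.out.println("progress reported"));
        slowPeer.join();
        System.out.println("progress calls while waiting: " + calls);
    }
}
```

In this sketch a worker that is genuinely hung or dead never reaches the loop, 
so the framework's task timeout still applies to it; only a healthy worker 
that is merely waiting on slower peers keeps reporting.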


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
