Eli Reisman created GIRAPH-256:
----------------------------------

             Summary: Partitioning outgoing graph data during INPUT_SUPERSTEP 
by # of vertices results in wide variance in RPC message sizes
                 Key: GIRAPH-256
                 URL: https://issues.apache.org/jira/browse/GIRAPH-256
             Project: Giraph
          Issue Type: Improvement
          Components: bsp, graph
    Affects Versions: 0.2.0
            Reporter: Eli Reisman
            Assignee: Eli Reisman
             Fix For: 0.2.0


This relates to GIRAPH-247. The unfortunately named 
"MAX_VERTICES_PER_PARTITION" fooled me into thinking this value regulated 
the size of the initial Partition objects composed during INPUT_SUPERSTEP 
from the InputSplits each worker reads.

In fact, this configuration option only regulates the size of the outgoing RPC 
messages. The vertices read from a split are stored locally in Partition 
objects, but are decomposed into Collections of BasicVertex for transfer to 
their eventual homes on another (or this) worker. There they are combined into 
the actual Partitions they will exist in for the job run.

When these outgoing messages are partitioned by # of vertices, metrics load 
tests have shown that message size is not well regulated: it varies widely and 
can create overloads on either side of these transfers. This is important 
because:

1. Throughput and memory are at a premium during INPUT_SUPERSTEP.
2. A single crashed worker in a Giraph job causes cascading job failure, even 
in an otherwise healthy workflow.

This JIRA renames the offending variables/config options and further regulates 
outgoing graph data in INPUT_SUPERSTEP by the # of edges and THEN the # of 
vertices in a candidate for transfer. This regulates message size much more 
effectively for typical social graph data, and has been shown in testing to 
greatly improve the amount of load-in data Giraph can handle without failure 
given fixed memory and worker limits.
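
To illustrate the idea only, here is a minimal sketch of "edges first, then 
vertices" batching. It is not the patch itself: the class OutgoingBatcher, the 
method batch, and the parameters maxEdges and maxVertices are hypothetical 
names, and each vertex is represented by nothing more than its out-edge count.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; these are not Giraph's actual classes or option names. */
class OutgoingBatcher {
    /**
     * Groups vertices (represented only by their out-edge counts) into
     * outgoing batches. A batch is closed as soon as it holds at least
     * maxEdges edges or at least maxVertices vertices.
     */
    static List<List<Integer>> batch(List<Integer> edgeCounts,
                                     int maxEdges, int maxVertices) {
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long edgesInCurrent = 0;
        for (int edges : edgeCounts) {
            current.add(edges);
            edgesInCurrent += edges;
            // Check the edge limit first, then the vertex limit.
            if (edgesInCurrent >= maxEdges || current.size() >= maxVertices) {
                batches.add(current);
                current = new ArrayList<>();
                edgesInCurrent = 0;
            }
        }
        // Send whatever is left over as a final, smaller batch.
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

With edge counts of 1000, 1000, 1, 1, 1 and a limit of 3 vertices alone, the 
first batch would carry 2001 edges and the second only 2. Adding an edge limit 
of 500 sends each high-degree vertex on its own and groups the three small 
vertices together, so no batch carries more than one heavy vertex.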


