[ https://issues.apache.org/jira/browse/GIRAPH-256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eli Reisman updated GIRAPH-256: ------------------------------- Attachment: GIRAPH-256-3.patch Quick fix to patch file (using git, forgot '--no-prefix') and quick tune-up to make monitoring of candidates for transfer a bit more efficient and accurate. As stated in GIRAPH-247, avoiding frequent calls to partition.getEdgeCount() is a big efficiency win. Passes mvn verify, cluster use, etc. > Partitioning outgoing graph data during INPUT_SUPERSTEP by # of vertices > results in wide variance in RPC message sizes > ---------------------------------------------------------------------------------------------------------------------- > > Key: GIRAPH-256 > URL: https://issues.apache.org/jira/browse/GIRAPH-256 > Project: Giraph > Issue Type: Improvement > Components: bsp, graph > Affects Versions: 0.2.0 > Reporter: Eli Reisman > Assignee: Eli Reisman > Labels: patch > Fix For: 0.2.0 > > Attachments: GIRAPH-256-1.patch, GIRAPH-256-2.patch, > GIRAPH-256-3.patch > > > This relates to GIRAPH-247. The unfortunately named > "MAX_VERTICES_PER_PARTITION" fooled me into thinking this value was > regulating the size of initial Partition objects as they were composed during > INPUT_SUPERSTEP from InputSplits each worker reads. > In fact this configuration option only regulates the size of the outgoing RPC > messages, stored locally in Partition objects but decomposed into Collections > of BasicVertex for transfer to their eventual homes on another (or this) > worker. There they are combined into the actual Partitions they will exist in > for the job run. > By partitioning these outgoing messages by # of vertices, metrics load tests > have shown the size of the average message is not well regulated and can > create overloads on either side of these transfers. This is important because: > 1. Throughput and memory are at a premium during INPUT_SUPERSTEP. > 2. Only one crashed worker in a Giraph job causes cascading job failure, even > in an otherwise healthy workflow. > This JIRA renames the offending variables/config options and further > regulates outgoing graph data in INPUT_SUPERSTEP by the # of edges and THEN > the # of vertices in a candidate for transfer. This much more effectively > regulates message size for typical social graph data and has been show in > testing to greatly improve the amount of load-in data Giraph can handle > without failure given fixed memory and worker limits. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira