[ https://issues.apache.org/jira/browse/GIRAPH-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898542#comment-13898542 ]
Claudio Martella commented on GIRAPH-800:
-----------------------------------------

The main issue for me with this approach is what we think a static graph is. I can easily see that a static graph is (currently) a graph for which algorithms don't use the mutable API (or change edge values). However, with this we would also cancel out the "create vertex on message reception" behavior, which often happens simply because of input formats rather than because the graph is actually "mutable" during the computation. For example, a vertex may appear only in the "second column" of a simple <id> <id> edge input format.

> Resolving mutations on a large graph causes timeouts
> ----------------------------------------------------
>
>                 Key: GIRAPH-800
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-800
>             Project: Giraph
>          Issue Type: Bug
>          Components: graph
>    Affects Versions: 1.1.0
>         Environment: hadoop1
>            Reporter: Craig Muchinsky
>             Fix For: 1.1.0
>
>         Attachments: GIRAPH-800.patch
>
>
> When processing a graph with a large number of mutations and/or a large
> number of messages per superstep, the pre-superstep logic can appear to be
> hung up and eventually the graph times out either because of mapreduce task
> inactivity or hitting the max superstep wait.
> While it's possible to tune around this by adding a strategic call to
> context.progress() in NettyServerWorker.resolveMutations() and bumping up the
> giraph.maxMasterSuperstepWaitMsecs setting, it would seem this part of the
> code might need some optimization.
> As an example, in a graph with 2B vertices and 2.5B edges the transition
> between supersteps with 1B messages in flight can take 15-30 minutes on a
> cluster with 228 workers (2 threads, 8GB RAM per worker).
> While the vertex resolve processing can be time consuming, I believe it's the
> check for missing vertices (second loop within
> NettyServerWorker.resolveMutations()) that is the real performance
> bottleneck. I haven't identified a fix to this logic as of yet, but I did
> identify a possible workaround. I believe when dealing with a static and
> complete graph the resolveMutations() call can be skipped altogether. A
> quick test of this theory yielded a 3x performance improvement in my sandbox.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
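
To make the "second column" case from the comment concrete, here is a minimal sketch in plain Java. It is not Giraph code: the class `TargetOnlyVertices` and its method are hypothetical names, and the whitespace-separated `<id> <id>` line format with numeric ids is an assumption. It lists the ids that occur only as edge targets, which are the vertices that have no record of their own in such an input and so would presumably depend on vertex creation at message reception.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of Giraph.
class TargetOnlyVertices {

    /**
     * Returns the ids that occur as an edge target but never as an edge
     * source. Each input line is assumed to be "<sourceId> <targetId>".
     */
    static Set<Long> targetOnly(List<String> edgeLines) {
        Set<Long> sources = new TreeSet<>();
        Set<Long> targets = new TreeSet<>();
        for (String line : edgeLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] cols = trimmed.split("\\s+");
            sources.add(Long.parseLong(cols[0]));
            targets.add(Long.parseLong(cols[1]));
        }
        // Whatever is left never appeared in the first column.
        targets.removeAll(sources);
        return targets;
    }

    public static void main(String[] args) {
        List<String> edges = Arrays.asList("1 2", "2 3", "1 4");
        System.out.println(targetOnly(edges)); // prints [3, 4]
    }
}
```

In this sample, vertices 3 and 4 never appear as sources. A graph loaded from such a file could look "static" in the sense that no algorithm calls the mutation API, yet still rely on the resolve step to bring those vertices into existence.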