[ https://issues.apache.org/jira/browse/GIRAPH-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898465#comment-13898465 ]
Roman Shaposhnik commented on GIRAPH-800:
-----------------------------------------

Ping! Can somebody please review this? This looks OKish to me to be included in 1.1.0.

> Resolving mutations on a large graph causes timeouts
> ----------------------------------------------------
>
>                 Key: GIRAPH-800
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-800
>             Project: Giraph
>          Issue Type: Bug
>          Components: graph
>    Affects Versions: 1.1.0
>         Environment: hadoop1
>            Reporter: Craig Muchinsky
>             Fix For: 1.1.0
>
>         Attachments: GIRAPH-800.patch
>
>
> When processing a graph with a large number of mutations and/or a large
> number of messages per superstep, the pre-superstep logic can appear to be
> hung, and eventually the job times out, either because of MapReduce task
> inactivity or because it hits the maximum superstep wait.
> While it's possible to tune around this by adding a strategic call to
> context.progress() in NettyServerWorker.resolveMutations() and bumping up the
> giraph.maxMasterSuperstepWaitMsecs setting, this part of the code likely
> needs some optimization.
> As an example, in a graph with 2B vertices and 2.5B edges, the transition
> between supersteps with 1B messages in flight can take 15-30 minutes on a
> cluster with 228 workers (2 threads, 8GB RAM per worker).
> While the vertex resolve processing can be time consuming, I believe it's the
> check for missing vertices (the second loop within
> NettyServerWorker.resolveMutations()) that is the real performance
> bottleneck. I haven't identified a fix to this logic yet, but I did
> identify a possible workaround: I believe that when dealing with a static and
> complete graph, the resolveMutations() call can be skipped altogether. A
> quick test of this theory yielded a 3x performance improvement in my sandbox.
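For reference, the tuning workaround described in the report would look roughly like the following. This is a minimal sketch, not the actual Giraph code: resolveOne() and PROGRESS_EVERY are hypothetical names, and only the periodic Progressable.progress() call reflects the suggestion quoted above.

{code:java}
import org.apache.hadoop.util.Progressable;

/**
 * Sketch of the inactivity workaround: call progress() periodically while
 * resolving mutations so the MapReduce task timeout clock is reset even
 * when a resolution pass runs for a long time. resolveOne() and
 * PROGRESS_EVERY are illustrative names, not Giraph internals.
 */
public class MutationResolveSketch {
  private static final int PROGRESS_EVERY = 100000;

  public void resolveMutations(Iterable<Long> vertexIdsWithMutations,
      Progressable context) {
    long processed = 0;
    for (Long vertexId : vertexIdsWithMutations) {
      resolveOne(vertexId);
      if (++processed % PROGRESS_EVERY == 0) {
        // Keep the task alive; without this, a long resolution pass can
        // exceed the MapReduce inactivity timeout and kill the worker.
        context.progress();
      }
    }
  }

  private void resolveOne(Long vertexId) {
    // Placeholder for the actual per-vertex resolution logic.
  }
}
{code}

The accompanying setting can be raised on the command line, e.g. -Dgiraph.maxMasterSuperstepWaitMsecs=3600000 (the property name comes from the report; the value here is only an example).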
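The static-graph shortcut proposed at the end of the report could be exposed as a configuration flag along these lines. Again a sketch under assumptions: giraph.assumeStaticGraph is a hypothetical option name, not an existing Giraph setting, and GiraphConfiguration is used only for its inherited Configuration.getBoolean().

{code:java}
import org.apache.giraph.conf.GiraphConfiguration;

public class StaticGraphGuardSketch {
  /**
   * Returns true when mutation resolution can be skipped entirely.
   * "giraph.assumeStaticGraph" is a hypothetical flag: skipping is only
   * safe when the graph is static and complete, i.e. no vertex or edge
   * mutations and no messages sent to nonexistent vertices.
   */
  public static boolean canSkipResolveMutations(GiraphConfiguration conf) {
    return conf.getBoolean("giraph.assumeStaticGraph", false);
  }
}
{code}

A call site would then simply guard the expensive pass, e.g. if (!StaticGraphGuardSketch.canSkipResolveMutations(conf)) { resolveMutations(...); }, which matches the report's observation that skipping the pass on a static, complete graph gave a ~3x improvement.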