[ https://issues.apache.org/jira/browse/GIRAPH-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898465#comment-13898465 ]
Roman Shaposhnik commented on GIRAPH-800:
-----------------------------------------
Ping! Can somebody please review this? It looks OK to me for inclusion in
1.1.0.
> Resolving mutations on a large graph causes timeouts
> ----------------------------------------------------
>
> Key: GIRAPH-800
> URL: https://issues.apache.org/jira/browse/GIRAPH-800
> Project: Giraph
> Issue Type: Bug
> Components: graph
> Affects Versions: 1.1.0
> Environment: hadoop1
> Reporter: Craig Muchinsky
> Fix For: 1.1.0
>
> Attachments: GIRAPH-800.patch
>
>
> When processing a graph with a large number of mutations and/or a large
> number of messages per superstep, the pre-superstep logic can appear to be
> hung, and eventually the job times out, either from MapReduce task
> inactivity or from hitting the maximum superstep wait.
> While it's possible to tune around this by adding a strategic call to
> context.progress() in NettyWorkerServer.resolveMutations() (sketched below)
> and bumping up the giraph.maxMasterSuperstepWaitMsecs setting, this part of
> the code seems to need some optimization.
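>
> A minimal sketch of the tuning workaround, assuming the Hadoop
> Mapper.Context is reachable from the resolution loop (the loop and the
> resolveMutationsFor() helper are illustrative, not the actual Giraph code):
>
>     // Ping the Hadoop framework periodically inside the long
>     // mutation-resolution loop so the task is not killed for inactivity.
>     long processed = 0;
>     for (Object vertexId : vertexIdsWithMutations) {
>       resolveMutationsFor(vertexId);   // hypothetical per-vertex resolve
>       if (++processed % 100000 == 0) {
>         context.progress();            // resets the MapReduce inactivity timer
>       }
>     }
>
>     // The superstep wait can also be raised when submitting the job, e.g.:
>     //   -Dgiraph.maxMasterSuperstepWaitMsecs=1800000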
> As an example, in a graph with 2B vertices and 2.5B edges the transition
> between supersteps with 1B messages in flight can take 15-30 minutes on a
> cluster with 228 workers (2 threads, 8GB RAM per worker).
> While the vertex resolve processing can be time consuming, I believe it's
> the check for missing vertices (the second loop within
> NettyWorkerServer.resolveMutations()) that is the real performance
> bottleneck. I haven't identified a fix for this logic yet, but I did
> identify a possible workaround: when dealing with a static, complete graph,
> the resolveMutations() call can be skipped altogether (see the sketch
> below). A quick test of this theory yielded a 3x performance improvement in
> my sandbox.
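>
> A minimal sketch of that skip, assuming a boolean job option guards the
> call (the option name giraph.isStaticGraph and the surrounding wiring are
> illustrative):
>
>     // A static, complete graph produces no vertex/edge mutations and no
>     // messages to missing vertices, so the resolution pass can be skipped.
>     if (!conf.getBoolean("giraph.isStaticGraph", false)) {
>       resolveMutations();
>     }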