[ 
https://issues.apache.org/jira/browse/GIRAPH-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898465#comment-13898465
 ] 

Roman Shaposhnik commented on GIRAPH-800:
-----------------------------------------

Ping! Can somebody please review this? This looks OK-ish to me for inclusion
in 1.1.0.

> Resolving mutations on a large graph causes timeouts
> ----------------------------------------------------
>
>                 Key: GIRAPH-800
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-800
>             Project: Giraph
>          Issue Type: Bug
>          Components: graph
>    Affects Versions: 1.1.0
>         Environment: hadoop1
>            Reporter: Craig Muchinsky
>             Fix For: 1.1.0
>
>         Attachments: GIRAPH-800.patch
>
>
> When processing a graph with a large number of mutations and/or a large
> number of messages per superstep, the pre-superstep logic can appear to
> hang, and the job eventually times out, either because of mapreduce task
> inactivity or because it hits the max superstep wait.
> While it's possible to tune around this by adding a strategic call to
> context.progress() in NettyServerWorker.resolveMutations() and bumping up the
> giraph.maxMasterSuperstepWaitMsecs setting, it would seem this part of the
> code might need some optimization.
> As an example, in a graph with 2B vertices and 2.5B edges the transition 
> between supersteps with 1B messages in flight can take 15-30 minutes on a 
> cluster with 228 workers (2 threads, 8GB RAM per worker).
> While the vertex resolve processing can be time consuming, I believe it's the
> check for missing vertices (second loop within
> NettyServerWorker.resolveMutations()) that is the real performance
> bottleneck. I haven't identified a fix to this logic as of yet, but I did
> identify a possible workaround. I believe when dealing with a static and
> complete graph the resolveMutations() call can be skipped altogether. A
> quick test of this theory yielded a 3x performance improvement in my sandbox.
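
The "strategic call to context.progress()" tuning mentioned in the description can be pictured roughly as below. This is only a sketch and is not taken from GIRAPH-800.patch: ProgressReporter is a local stand-in for the Hadoop context object, and MutationLoopSketch, resolveAll and the interval parameter are hypothetical names invented for illustration. The idea is simply to report progress every so many items inside a long-running loop, so the task is not considered inactive while it works.

```java
import java.util.List;

/** Hypothetical stand-in for the Hadoop context; only progress() is modeled. */
interface ProgressReporter {
    void progress();
}

/** Hypothetical sketch of a long resolve loop that reports progress periodically. */
final class MutationLoopSketch {
    private MutationLoopSketch() { }

    /**
     * Visits every vertex id and calls reporter.progress() once per
     * `interval` items. Returns the number of items visited.
     */
    static int resolveAll(List<Long> vertexIds, int interval, ProgressReporter reporter) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        int processed = 0;
        for (Long vertexId : vertexIds) {
            // Placeholder: the real per-vertex resolve work would happen here.
            processed++;
            if (processed % interval == 0) {
                // Signal liveness so the framework does not time the task out.
                reporter.progress();
            }
        }
        return processed;
    }
}
```

Reporting on an interval rather than on every item keeps the overhead of the call small on graphs with billions of vertices. This only avoids the task-inactivity timeout; it does not make the loop itself any faster.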



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
