[ https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171073#comment-13171073 ]

Avery Ching commented on GIRAPH-45:
-----------------------------------

I think that reading messages from disk one vertex at a time will reduce memory
pressure more than the partition-based storage.  I'm assuming that
key=vertex_id and value=message_list in your explanation.  How do you keep all
the records for a given key together in the file?  For instance, suppose that you
receive the following <vertex_id, message_list> tuples:

<0, 2.0, 3.0>
<3, 1.0>
<7, 34.0>
<4, 23.0>
<3, 20.0>

In a bad scenario, you have to spill to disk after each tuple.  The records in
the file are then totally out of order, and your index <vertex, byte offset>
looks something like:
<0, 0>
<3, 24>
<7, 40>
<4, 56>

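But if I understand this scheme correctly, wouldn't each vertex need to scan
the entire file if the vertices keep arriving in a totally random order?  The
index only records where vertex 3's messages start, so its later <3, 20.0>
record can only be found by scanning past that offset.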
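To make the concern concrete, here is a minimal Java sketch, assuming a single
spill file (held in a byte array so the example is self-contained), long vertex
ids, double messages, and a hypothetical SpillFileIndex class whose index keeps
only the first byte offset per vertex.  The offsets above are consistent with an
8-byte id followed by 8-byte doubles and no header; here each record also
carries a 4-byte message count so it can be parsed back, which shifts the
offsets to 0, 28, 48, 68.

```java
import java.io.*;
import java.util.*;

// Hypothetical spill file with a <vertex, first byte offset> index.
class SpillFileIndex {
    final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    final DataOutputStream out = new DataOutputStream(bytes);
    // Index: vertex id -> byte offset of its FIRST record only.
    final Map<Long, Integer> firstOffset = new LinkedHashMap<>();
    int recordsScanned; // records read by the last messagesFor() call

    // Record layout (assumed): long vertexId, int count, count doubles.
    void append(long vertexId, double... messages) {
        try {
            firstOffset.putIfAbsent(vertexId, out.size());
            out.writeLong(vertexId);
            out.writeInt(messages.length);
            for (double m : messages) out.writeDouble(m);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The index only says where to START; with random arrival order we must
    // scan to the end of the file to be sure we found every record.
    List<Double> messagesFor(long vertexId) {
        recordsScanned = 0;
        List<Double> result = new ArrayList<>();
        Integer start = firstOffset.get(vertexId);
        if (start == null) return result;
        byte[] buf = bytes.toByteArray();
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buf, start, buf.length - start))) {
            while (in.available() > 0) {
                long id = in.readLong();
                int count = in.readInt();
                recordsScanned++;
                for (int i = 0; i < count; i++) {
                    double m = in.readDouble();
                    if (id == vertexId) result.add(m);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        SpillFileIndex f = new SpillFileIndex();
        f.append(0, 2.0, 3.0);
        f.append(3, 1.0);
        f.append(7, 34.0);
        f.append(4, 23.0);
        f.append(3, 20.0);
        System.out.println(f.firstOffset);
        List<Double> msgs = f.messagesFor(3);
        System.out.println(msgs + " after scanning " + f.recordsScanned + " records");
    }
}
```

With the five tuples appended in order, messagesFor(3) returns [1.0, 20.0] but
has to read all four records from offset 28 onward; with truly random arrival,
that scan approaches the whole file for every vertex.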

I suppose that another way to do this is to use the partition-based method with
a small change.  If a partition is deemed too large to load into memory and
sort, it could be read and re-dumped into n files, where n is chosen so that
each resulting file is likely small enough to fit in memory on its own.  This
can be done recursively.
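A minimal Java sketch of the recursive re-dump idea, assuming a memory budget
measured in message count (MEMORY_BUDGET, hypothetical), in-memory lists
standing in for the spill files, and hash-partitioning by vertex id so that all
of a vertex's messages land in the same sub-file.  n is taken as
ceil(size / budget); the hash is salted with the recursion depth so a bucket
that is still too large splits differently on the next pass, and a
single-vertex check plus a hypothetical MAX_DEPTH stop the recursion when
splitting cannot help.

```java
import java.util.*;

class RecursiveSpillSplit {
    // One spilled message: destination vertex id plus a message value.
    record Msg(long vertexId, double value) {}

    static final int MEMORY_BUDGET = 4; // hypothetical: max messages sorted in memory
    static final int MAX_DEPTH = 8;     // hypothetical: guard against endless splitting

    // Depth-salted hash so each recursion level partitions differently.
    static int bucketOf(long vertexId, int depth, int n) {
        long h = vertexId * 0x9E3779B97F4A7C15L + depth;
        h ^= (h >>> 31);
        return (int) Math.floorMod(h, (long) n);
    }

    static void process(List<Msg> partition, int depth, SortedMap<Long, List<Double>> out) {
        long distinctVertices = partition.stream().mapToLong(Msg::vertexId).distinct().count();
        if (partition.size() <= MEMORY_BUDGET || distinctVertices <= 1 || depth >= MAX_DEPTH) {
            // Fits (or cannot be split further): sort in memory, group by vertex.
            List<Msg> sorted = new ArrayList<>(partition);
            sorted.sort(Comparator.comparingLong(Msg::vertexId)); // stable sort
            for (Msg m : sorted) {
                out.computeIfAbsent(m.vertexId(), k -> new ArrayList<>()).add(m.value());
            }
            return;
        }
        // Too large: re-dump into n smaller "files" and recurse on each one.
        int n = (partition.size() + MEMORY_BUDGET - 1) / MEMORY_BUDGET;
        List<List<Msg>> buckets = new ArrayList<>();
        for (int i = 0; i < n; i++) buckets.add(new ArrayList<>());
        for (Msg m : partition) buckets.get(bucketOf(m.vertexId(), depth, n)).add(m);
        for (List<Msg> b : buckets) {
            if (!b.isEmpty()) process(b, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        List<Msg> partition = List.of(
            new Msg(0, 2.0), new Msg(0, 3.0), new Msg(3, 1.0),
            new Msg(7, 34.0), new Msg(4, 23.0), new Msg(3, 20.0));
        SortedMap<Long, List<Double>> out = new TreeMap<>();
        process(partition, 0, out);
        System.out.println(out); // {0=[2.0, 3.0], 3=[1.0, 20.0], 4=[23.0], 7=[34.0]}
    }
}
```

Because every message for a vertex always hashes to the same sub-file, each
leaf can be sorted and grouped independently, and no leaf ever needs to look
at another file to complete a vertex's message list.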
                
> Improve the way to keep outgoing messages
> -----------------------------------------
>
>                 Key: GIRAPH-45
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-45
>             Project: Giraph
>          Issue Type: Improvement
>          Components: bsp
>            Reporter: Hyunsik Choi
>            Assignee: Hyunsik Choi
>
> As discussed in GIRAPH-12 (http://goo.gl/CE32U), I think there is a
> potential out-of-memory problem when the rate of message generation is
> higher than the rate of message flushing (or the network bandwidth).
> To overcome this problem, we need a more eager message-flushing strategy or
> some approach to spill messages to disk.
> The link below is Dmitriy's suggestion.
> https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253


        
