[ https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171073#comment-13171073 ]
Avery Ching commented on GIRAPH-45:
-----------------------------------

I think that reading messages from disk one vertex at a time would reduce memory pressure more than partition-based storage does. I'm assuming that key=vertex_id and value=message_list in your explanation. How do you keep the keys together in the file? For instance, suppose that you get the following <vertex_id, message_list> tuples:

<0, 2.0, 3.0> <3, 1.0> <7, 34.0> <4, 23.0> <3, 20.0>

In a bad scenario, you have to spill to disk after each tuple. The files are then totally out of order, and your <vertex, byte offset> index looks something like:

<0, 0> <3, 24> <7, 40> <4, 56>

But if I'm understanding this scheme correctly, wouldn't each vertex need to scan the entire file if the vertices keep arriving in a totally random order?

I suppose that another way to do this is to use the partition-based method with a small change. If a partition is deemed too large to load into memory and sort, it could be read and re-dumped into n files, where n is chosen so that there is a good chance that every resulting file is small enough to fit in memory on its own. This can be done recursively.

> Improve the way to keep outgoing messages
> -----------------------------------------
>
> Key: GIRAPH-45
> URL: https://issues.apache.org/jira/browse/GIRAPH-45
> Project: Giraph
> Issue Type: Improvement
> Components: bsp
> Reporter: Hyunsik Choi
> Assignee: Hyunsik Choi
>
> As discussed in GIRAPH-12 (http://goo.gl/CE32U), I think that there is a
> potential problem that can cause out-of-memory errors when the rate of message
> generation is higher than the rate of message flushing (or the network bandwidth).
> To overcome this problem, we need a more eager strategy for message flushing or
> some approach to spill messages to disk.
> The link below is Dmitriy's suggestion.
> https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253
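The recursive re-split suggested in the comment (re-dump an oversized partition into n smaller files, repeating until each file fits in memory) might look roughly like the following. This is a minimal, hypothetical Java sketch, not Giraph code. In-memory lists stand in for spill files, and the names `RecursiveSpill`, `Msg`, `split`, `maxInMemory`, `fanOut` (playing the role of n), and `MAX_DEPTH` are illustrative assumptions. The sketch also assumes that messages are re-dumped by hashing the vertex id, salted by recursion depth, so that every message for a given vertex lands in the same file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of recursively re-splitting an oversized message partition.
 * In-memory lists stand in for spill files; none of these names are Giraph APIs.
 */
final class RecursiveSpill {

    /** Recursion cap so a pathological hash (or one huge vertex) cannot split forever. */
    static final int MAX_DEPTH = 16;

    /** One spilled tuple: a destination vertex id and a single message value. */
    public static final class Msg {
        final long vertexId;
        final double value;

        public Msg(long vertexId, double value) {
            this.vertexId = vertexId;
            this.value = value;
        }
    }

    /**
     * Returns the leaf buckets, each already grouped by vertex id. Any bucket larger
     * than maxInMemory is re-dumped into fanOut sub-buckets by vertex id, recursively.
     */
    public static List<Map<Long, List<Double>>> split(List<Msg> partition, int maxInMemory, int fanOut) {
        List<Map<Long, List<Double>>> leaves = new ArrayList<>();
        process(partition, maxInMemory, fanOut, 0, leaves);
        return leaves;
    }

    private static void process(List<Msg> bucket, int maxInMemory, int fanOut, int depth,
                                List<Map<Long, List<Double>>> leaves) {
        if (bucket.isEmpty()) {
            return;
        }
        if (bucket.size() <= maxInMemory || allSameVertex(bucket) || depth >= MAX_DEPTH) {
            // Fits (or cannot be split further): load it and group by vertex in memory.
            Map<Long, List<Double>> grouped = new TreeMap<>();
            for (Msg m : bucket) {
                grouped.computeIfAbsent(m.vertexId, k -> new ArrayList<>()).add(m.value);
            }
            leaves.add(grouped);
            return;
        }
        // Too large: re-dump into fanOut sub-buckets. All messages for one vertex
        // land in the same sub-bucket, so no vertex is ever split across leaves.
        List<List<Msg>> subs = new ArrayList<>();
        for (int i = 0; i < fanOut; i++) {
            subs.add(new ArrayList<>());
        }
        for (Msg m : bucket) {
            subs.get(bucketOf(m.vertexId, depth, fanOut)).add(m);
        }
        for (List<Msg> sub : subs) {
            process(sub, maxInMemory, fanOut, depth + 1, leaves);
        }
    }

    /** SplitMix64-style mix, salted by depth so each level spreads vertices differently. */
    private static int bucketOf(long vertexId, int depth, int fanOut) {
        long z = vertexId + (depth + 1) * 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        z = z ^ (z >>> 31);
        return (int) Math.floorMod(z, (long) fanOut);
    }

    /** Caller guarantees the bucket is non-empty. */
    private static boolean allSameVertex(List<Msg> bucket) {
        long first = bucket.get(0).vertexId;
        for (Msg m : bucket) {
            if (m.vertexId != first) {
                return false;
            }
        }
        return true;
    }
}
```

Feeding in the tuples from the comment, flattened to one message per entry (for example `split(msgs, 2, 2)`), yields leaves in which both messages for vertex 3 (1.0 and 20.0) sit together. Partitioning on the vertex id rather than on arrival order is what avoids the scattered <vertex, byte offset> index. The depth cap matters because a single vertex with more messages than the budget can never be split further. A real implementation would write each sub-bucket to disk and process the leaves one at a time instead of holding them all in a list.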