[ https://issues.apache.org/jira/browse/GIRAPH-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13599949#comment-13599949 ]

Claudio Martella commented on GIRAPH-528:
-----------------------------------------

That's exactly what I meant. In memory, nothing would change, as we would keep 
the current architecture. The big difference is that we could stop assuming 
that the user might keep additional private data in his/her Vertex. What this 
buys us is that, when going out-of-core (OOC), we can write back only the 
vertices whose state has changed, and the same goes for edges. To get this 
done, we'd have to hook the methods that change this state (setVertexValue, 
addEdge, etc.).
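A minimal sketch of the dirty-tracking idea, assuming that all vertex state goes through hooked mutators (which is why private user fields would have to go: a change to an unhooked field would never set the flag). `TrackedVertex`, `DirtyOnlySpiller`, and their methods are hypothetical names for illustration, not Giraph APIs, and actual serialization is replaced by collecting the ids that would be written.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical vertex whose state mutators mark it dirty (not Giraph's Vertex class). */
final class TrackedVertex {
  private final long id;
  private double value;
  private final List<Long> edges = new ArrayList<>();  // target ids only, for brevity
  private boolean dirty;  // false: assumed freshly loaded, matching its on-disk copy

  TrackedVertex(long id, double value) {
    this.id = id;
    this.value = value;
  }

  long getId() { return id; }
  double getValue() { return value; }
  List<Long> getEdges() { return Collections.unmodifiableList(edges); }

  // Hooked mutators: every state change goes through here and sets the flag.
  void setVertexValue(double newValue) {
    value = newValue;
    dirty = true;
  }

  void addEdge(long target) {
    edges.add(target);
    dirty = true;
  }

  boolean isDirty() { return dirty; }
  void markClean() { dirty = false; }
}

/** Hypothetical OOC write-back: only dirty vertices are written, then marked clean. */
final class DirtyOnlySpiller {
  /** Returns the ids that would be serialized to disk, in partition order. */
  static List<Long> spill(Collection<TrackedVertex> partition) {
    List<Long> written = new ArrayList<>();
    for (TrackedVertex v : partition) {
      if (v.isDirty()) {
        written.add(v.getId());  // stand-in for writing v's value and edges
        v.markClean();
      }
    }
    return written;
  }
}
```

Clean vertices are simply skipped, so the copy already on disk stays valid for them.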

Basically, we would end up managing this data like the OOC messages, with 
multiple files (potentially compacting them when they exceed a certain 
number?), because we would spill to disk sequentially instead of overwriting 
old values with seeks. At loading time, we keep the values from the most 
recent file.
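The append-only scheme can be sketched as follows. This is a hypothetical illustration, not Giraph's OOC message code: each in-memory `byte[]` stands in for one local spill file, vertex values are assumed to be `double` keyed by `long` id, and the compaction threshold `maxFiles` is an assumed parameter.

```java
import java.io.*;
import java.util.*;

/** Hypothetical append-only spill store: every spill is a new sequential "file", never a seek. */
final class AppendOnlySpillStore {
  private final List<byte[]> files = new ArrayList<>();  // oldest first; stand-ins for disk files
  private final int maxFiles;

  AppendOnlySpillStore(int maxFiles) {
    if (maxFiles < 1) throw new IllegalArgumentException("maxFiles must be >= 1");
    this.maxFiles = maxFiles;
  }

  /** Writes the changed (id -> value) records sequentially into a new file. */
  void spill(Map<Long, Double> changed) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(changed.size());
      for (Map.Entry<Long, Double> e : changed.entrySet()) {
        out.writeLong(e.getKey());
        out.writeDouble(e.getValue());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    files.add(bytes.toByteArray());
    if (files.size() > maxFiles) compact();
  }

  /** Replays files oldest to newest, so the most recent value for each id wins. */
  Map<Long, Double> load() {
    Map<Long, Double> latest = new HashMap<>();
    for (byte[] file : files) {
      try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
        int n = in.readInt();
        for (int i = 0; i < n; i++) latest.put(in.readLong(), in.readDouble());
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
    return latest;
  }

  /** Merges all files into a single one holding only the newest value per id. */
  private void compact() {
    Map<Long, Double> merged = load();
    files.clear();
    spill(merged);  // leaves exactly one file, which is <= maxFiles, so no further compaction
  }

  int fileCount() { return files.size(); }
}
```

Writes are always appends, and the cost of stale records is paid once at compaction time rather than on every update.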

Does this make sense? Any comments?


> Decouple vertex implementation from edge storage
> ------------------------------------------------
>
>                 Key: GIRAPH-528
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-528
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Alessandro Presta
>            Assignee: Alessandro Presta
>         Attachments: GIRAPH-528.patch, GIRAPH-528.patch, GIRAPH-528.patch, 
> GIRAPH-528.patch, GIRAPH-528.patch, GIRAPH-528.patch
>
>
> This is meant to address the following issues:
> 1) The Vertex hierarchy is too complex and sometimes hard to work with 
> (Vertex, SimpleVertex, MutableVertex, SimpleMutableVertex...).
> 2) Changing the underlying edge storage implementation for an existing 
> algorithm requires editing your vertex to extend a different one.
> 3) In the general case (e.g. when not using ByteArrayVertex with the current 
> EdgeStore), moving edges from the EdgeStore to the vertices is an additional 
> step that can be avoided.
> My proposal is the following:
> - Make EdgeStore an interface. An implementation should deal with 
> (concurrent) insertion of edges during the input superstep; iteration over a 
> vertex's edges during computation; insertion/deletion of edges during 
> mutations (optional?); and checkpointing.
> - The default EdgeStore will be the current byte-array implementation, which 
> is generic (works with any choice of <I, V, E, M>) and reasonably optimized.
> - Only one Vertex class, which the user extends for the sole purpose of 
> defining compute(). I don't necessarily agree that it should be an interface, 
> because we still want to provide methods like getEdges() and sendMessage(), 
> which should delegate to the EdgeStore/MessageStore of choice.
> - Switching the edge storage implementation is done by passing the EdgeStore 
> class as an option. One can also define one's own ad-hoc EdgeStore (e.g., 
> backed by primitive arrays).
> I think we should also extend this idea to MessageStore, making it possible 
> to override that functionality too.
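A rough Java sketch of the proposed split, under simplifying assumptions: ids are `long` and edge values `float` instead of the generic `<I, V, E, M>`, and `EdgeStore`, `SimpleEdge`, `PrimitiveArrayEdgeStore`, `SumEdgesVertex`, `EdgeStores.create`, and the `Vertex` shown here are hypothetical names, not the classes that ended up in Giraph. It shows the four duties of the interface, an ad-hoc primitive-array store, and a single `Vertex` whose `getEdges()` delegates to whichever store class was passed in.

```java
import java.io.*;
import java.util.*;

/** Hypothetical edge record (target id, edge value); Giraph's real Edge API differs. */
final class SimpleEdge {
  final long target;
  final float value;
  SimpleEdge(long target, float value) { this.target = target; this.value = value; }
}

/** Hypothetical pluggable edge storage covering the duties listed in the proposal. */
interface EdgeStore {
  void addEdge(long source, long target, float value);  // input superstep (may be concurrent)
  Iterable<SimpleEdge> getEdges(long source);           // iteration during compute()
  int removeEdges(long source, long target);            // mutations
  void checkpoint(DataOutput out) throws IOException;   // checkpointing
}

/** Hypothetical ad-hoc store backed by primitive arrays; synchronized for concurrent input. */
final class PrimitiveArrayEdgeStore implements EdgeStore {
  private static final class Adjacency {
    long[] targets = new long[4];
    float[] values = new float[4];
    int size;
  }
  private final Map<Long, Adjacency> bySource = new HashMap<>();

  @Override public synchronized void addEdge(long source, long target, float value) {
    Adjacency a = bySource.computeIfAbsent(source, k -> new Adjacency());
    if (a.size == a.targets.length) {  // grow both parallel arrays together
      a.targets = Arrays.copyOf(a.targets, a.size * 2);
      a.values = Arrays.copyOf(a.values, a.size * 2);
    }
    a.targets[a.size] = target;
    a.values[a.size] = value;
    a.size++;
  }

  @Override public synchronized Iterable<SimpleEdge> getEdges(long source) {
    List<SimpleEdge> edges = new ArrayList<>();  // snapshot copy, safe to iterate
    Adjacency a = bySource.get(source);
    for (int i = 0; a != null && i < a.size; i++) edges.add(new SimpleEdge(a.targets[i], a.values[i]));
    return edges;
  }

  @Override public synchronized int removeEdges(long source, long target) {
    Adjacency a = bySource.get(source);
    if (a == null) return 0;
    int kept = 0;
    for (int i = 0; i < a.size; i++) {  // compact in place, dropping matching edges
      if (a.targets[i] != target) {
        a.targets[kept] = a.targets[i];
        a.values[kept] = a.values[i];
        kept++;
      }
    }
    int removed = a.size - kept;
    a.size = kept;
    return removed;
  }

  @Override public synchronized void checkpoint(DataOutput out) throws IOException {
    out.writeInt(bySource.size());
    for (Map.Entry<Long, Adjacency> e : bySource.entrySet()) {
      Adjacency a = e.getValue();
      out.writeLong(e.getKey());
      out.writeInt(a.size);
      for (int i = 0; i < a.size; i++) { out.writeLong(a.targets[i]); out.writeFloat(a.values[i]); }
    }
  }
}

/** The single Vertex class: users only define compute(); edge access delegates to the store. */
abstract class Vertex {
  private final long id;
  private final EdgeStore edgeStore;
  Vertex(long id, EdgeStore edgeStore) { this.id = id; this.edgeStore = edgeStore; }
  long getId() { return id; }
  Iterable<SimpleEdge> getEdges() { return edgeStore.getEdges(id); }
  abstract void compute();
}

/** Example user vertex: sums the values of its outgoing edges. */
final class SumEdgesVertex extends Vertex {
  float sum;
  SumEdgesVertex(long id, EdgeStore store) { super(id, store); }
  @Override void compute() { sum = 0; for (SimpleEdge e : getEdges()) sum += e.value; }
}

/** Switching storage means passing a different class, as a configuration option would. */
final class EdgeStores {
  static EdgeStore create(Class<? extends EdgeStore> cls) {
    try {
      return cls.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot instantiate " + cls.getName(), e);
    }
  }
}
```

Swapping in a different store only changes the class passed to `EdgeStores.create`; the user's `compute()` stays untouched, which addresses issue 2) in the description.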

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
