[ 
https://issues.apache.org/jira/browse/GIRAPH-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332260#comment-15332260
 ] 

Hassan Eslami commented on GIRAPH-1073:
---------------------------------------

https://reviews.facebook.net/D59691

> Decouple out-of-core persistence infrastructure from out-of-core computation
> ----------------------------------------------------------------------------
>
>                 Key: GIRAPH-1073
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-1073
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Hassan Eslami
>            Assignee: Hassan Eslami
>
> In the current out-of-core infrastructure, the persistence layer is heavily 
> intertwined with the scheduler and the out-of-core engine. This makes it 
> difficult to experiment with new features in the persistence layer. The 
> following changes are needed:
>  * The persistence layer should be decoupled from the rest of the out-of-core 
> infrastructure. This way, one can implement and plug in different data 
> accessors for various persistence resources, e.g. a local file system data 
> accessor, an HDFS data accessor, a serialized in-memory data accessor, etc.
>  * We should be able to address out-of-core data in a more efficient and 
> flexible way. Currently, data are accessed/addressed through string literals 
> in various locations of the code. This should be changed so data can be 
> accessed through a unified, more flexible data indexing mechanism.
>  * With different data accessor implementations, there may be more emphasis 
> on having more IO threads, so it is important that these IO threads are 
> load-balanced. Currently, partitions are assigned to IO threads using a hash 
> function. Hash functions tend not to balance load well when there are only a 
> small number of data points (partitions, in this case).
>  * Currently, out-of-core uses `BufferedInputStream` and 
> `BufferedOutputStream` along with the default (de)serialization mechanism, 
> and the IO bandwidth achieved by this implementation is low. Two simple 
> improvements are: 1) use an unsafe (de)serialization mechanism to make 
> better use of memory bandwidth during (de)serialization, and 2) use 
> RandomAccessFile's read and write interface for lower-level access to the 
> local file system, avoiding overhead when reading from and writing to local 
> files.
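
A minimal sketch of the first two points, assuming a hypothetical `DataAccessor` interface and a hypothetical `DataKey` index class (neither is Giraph's actual API). The engine addresses data through a structured key rather than string literals, and any backend that implements the interface can be plugged in. Only a serialized in-memory backend is shown; a local-disk or HDFS backend would implement the same three methods.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical structured index replacing ad hoc string literals. */
final class DataKey {
  enum Kind { VERTICES, EDGES, MESSAGES }

  final Kind kind;
  final int partitionId;

  DataKey(Kind kind, int partitionId) {
    this.kind = kind;
    this.partitionId = partitionId;
  }

  @Override public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof DataKey)) return false;
    DataKey other = (DataKey) o;
    return kind == other.kind && partitionId == other.partitionId;
  }

  @Override public int hashCode() { return Objects.hash(kind, partitionId); }

  @Override public String toString() { return kind + "/" + partitionId; }
}

/** Hypothetical pluggable persistence interface: the engine only sees this. */
interface DataAccessor {
  DataOutput prepareWrite(DataKey key) throws IOException;
  void finishWrite(DataKey key) throws IOException;
  DataInput prepareRead(DataKey key) throws IOException;
}

/** Serialized in-memory backend: keeps each key's bytes in a map. */
final class InMemoryDataAccessor implements DataAccessor {
  private final Map<DataKey, ByteArrayOutputStream> pending = new HashMap<>();
  private final Map<DataKey, byte[]> stored = new HashMap<>();

  @Override public DataOutput prepareWrite(DataKey key) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    pending.put(key, buf);
    // DataOutputStream writes straight through to the byte buffer.
    return new DataOutputStream(buf);
  }

  @Override public void finishWrite(DataKey key) {
    ByteArrayOutputStream buf = pending.remove(key);
    if (buf == null) {
      throw new IllegalStateException("no pending write for " + key);
    }
    stored.put(key, buf.toByteArray());
  }

  @Override public DataInput prepareRead(DataKey key) throws IOException {
    byte[] bytes = stored.get(key);
    if (bytes == null) {
      throw new IOException("no data for " + key);
    }
    return new DataInputStream(new ByteArrayInputStream(bytes));
  }
}
```

Because `DataKey` implements `equals` and `hashCode`, a freshly constructed key for the same partition and data kind finds previously written data, which is what a string-literal scheme only achieves by convention.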
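
The load-balancing concern in the third point can be illustrated with a small sketch (the class and method names are hypothetical, not Giraph's). `hashAssign` maps a partition to `hash(id) mod numThreads`; `balancedAssign` instead gives each partition to the IO thread that currently owns the fewest partitions. With a strided set of partition IDs, the hash-based scheme can put every partition on one thread, while the balanced scheme never lets per-thread counts differ by more than one.

```java
/** Hypothetical helpers comparing hash-based and balanced IO-thread assignment. */
final class IoThreadAssignment {
  /** Hash-style assignment: thread = hash(partitionId) mod numThreads. */
  static int[] hashAssign(int[] partitionIds, int numThreads) {
    int[] owner = new int[partitionIds.length];
    for (int i = 0; i < partitionIds.length; i++) {
      owner[i] = Math.floorMod(Integer.hashCode(partitionIds[i]), numThreads);
    }
    return owner;
  }

  /** Balanced assignment: each partition goes to the least-loaded thread so far. */
  static int[] balancedAssign(int[] partitionIds, int numThreads) {
    int[] load = new int[numThreads];
    int[] owner = new int[partitionIds.length];
    for (int i = 0; i < partitionIds.length; i++) {
      int best = 0;
      for (int t = 1; t < numThreads; t++) {
        if (load[t] < load[best]) {
          best = t;
        }
      }
      owner[i] = best;
      load[best]++;
    }
    return owner;
  }

  /** Number of partitions owned by each thread under a given assignment. */
  static int[] countsPerThread(int[] owner, int numThreads) {
    int[] counts = new int[numThreads];
    for (int t : owner) {
      counts[t]++;
    }
    return counts;
  }
}
```

For partition IDs `{0, 4, 8, ..., 28}` and 4 threads, `hashAssign` places all 8 partitions on thread 0, while `balancedAssign` gives each thread 2. When all partitions are assigned at once, the least-loaded rule reduces to round-robin. It is written incrementally so that partitions added later still go to the least-loaded thread.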
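
The second IO improvement in the last point can be sketched as follows: serialize into one contiguous heap buffer, then hand the whole buffer to a single `RandomAccessFile.write` call (and read it back with `readFully`), instead of going through a `BufferedOutputStream`. This is an illustration under stated assumptions: `RafSpill` is a hypothetical name, and `java.nio.ByteBuffer` stands in for an unsafe (de)serialization mechanism, which is not shown here.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;

/** Hypothetical spill helper: one bulk write and one bulk read per file. */
final class RafSpill {
  /** Serializes a length header plus the values, then writes them in one call. */
  static void writeLongs(File file, long[] values) throws IOException {
    ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + values.length * Long.BYTES);
    buf.putInt(values.length);
    for (long v : values) {
      buf.putLong(v);
    }
    try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
      raf.setLength(0); // truncate any previous contents
      raf.write(buf.array(), 0, buf.position());
    }
  }

  /** Reads the whole file in one call, then deserializes from the byte array. */
  static long[] readLongs(File file) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
      byte[] bytes = new byte[(int) raf.length()];
      raf.readFully(bytes);
      ByteBuffer buf = ByteBuffer.wrap(bytes);
      long[] values = new long[buf.getInt()];
      for (int i = 0; i < values.length; i++) {
        values[i] = buf.getLong();
      }
      return values;
    }
  }
}
```

The design choice is to separate (de)serialization, which runs at memory speed on a byte array, from file IO, which then moves large contiguous chunks. Whether this beats buffered streams in practice depends on partition sizes and the file system.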



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
