[ https://issues.apache.org/jira/browse/GIRAPH-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332260#comment-15332260 ]
Hassan Eslami commented on GIRAPH-1073:
---------------------------------------
https://reviews.facebook.net/D59691
> Decouple out-of-core persistence infrastructure from out-of-core computation
> ----------------------------------------------------------------------------
>
> Key: GIRAPH-1073
> URL: https://issues.apache.org/jira/browse/GIRAPH-1073
> Project: Giraph
> Issue Type: Improvement
> Reporter: Hassan Eslami
> Assignee: Hassan Eslami
>
> In the current out-of-core infrastructure, the persistence layer is heavily
> intertwined with the scheduling and out-of-core engine. This makes it
> complicated to try new features for the persistence layer. The following
> changes are needed:
> * The persistence layer should be decoupled from the out-of-core
> infrastructure. This way one can simply implement and plug in different data
> accessors for various persistence resources, e.g. local file system data
> accessor, HDFS data accessor, serialized in-memory data accessor, etc.
> * We should be able to address out-of-core data in a more efficient and
> flexible way. Currently, data are accessed/addressed through string literals
> in various locations of the code. This should be changed so data can be
> accessed through a unified, more flexible data indexing mechanism.
> * With different data accessor implementations, there may be more emphasis
> on using more IO threads, so it is important that these IO threads are
> load-balanced. Currently, partitions are assigned to IO threads using a hash
> function. Hash functions tend not to balance load well when the number of
> data points (partitions in this case) is small.
> * Currently, out-of-core uses `BufferedInputStream` and
> `BufferedOutputStream` along with the default (de)serialization mechanism.
> The IO bandwidth achieved in the current implementation is low. Two changes
> could help: 1) an Unsafe (de)serialization mechanism to optimize for memory
> bandwidth during the (de)serialization process, and 2) `RandomAccessFile`'s
> read and write interface for lower-level access to the local file system,
> avoiding overheads in reading/writing from/to local files.
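
To make the first two points of the description concrete, here is a minimal sketch of what a pluggable accessor addressed by a structured index might look like. All names here (`DataAccessorSketch`, `Index`, `Accessor`, `InMemoryAccessor`, `LocalFileAccessor`) are hypothetical and chosen only for illustration; they are not taken from the patch under review. The file-backed variant uses `RandomAccessFile` as mentioned in the last point, but makes no claim about the bandwidth it achieves.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DataAccessorSketch {
    /** Hypothetical structured index, used instead of ad hoc string literals. */
    static final class Index {
        private final List<Object> parts;
        Index(Object... parts) { this.parts = Arrays.asList(parts.clone()); }
        @Override public boolean equals(Object o) {
            return o instanceof Index && parts.equals(((Index) o).parts);
        }
        @Override public int hashCode() { return parts.hashCode(); }
        /** Only the file-backed accessor needs a string form of the index. */
        String toFileName() {
            StringBuilder sb = new StringBuilder("data");
            for (Object p : parts) { sb.append('_').append(p); }
            return sb.toString();
        }
    }

    /** Hypothetical accessor contract: callers never see where bytes live. */
    interface Accessor {
        void write(Index index, byte[] data) throws IOException;
        byte[] read(Index index) throws IOException;
    }

    /** Keeps serialized bytes in a map keyed by the index. */
    static final class InMemoryAccessor implements Accessor {
        private final Map<Index, byte[]> store = new HashMap<>();
        @Override public void write(Index index, byte[] data) {
            store.put(index, data.clone());
        }
        @Override public byte[] read(Index index) throws IOException {
            byte[] data = store.get(index);
            if (data == null) { throw new IOException("no data for index"); }
            return data.clone();
        }
    }

    /** Stores each index as one local file, accessed via RandomAccessFile. */
    static final class LocalFileAccessor implements Accessor {
        private final File dir;
        LocalFileAccessor(File dir) { this.dir = dir; }
        @Override public void write(Index index, byte[] data) throws IOException {
            File file = new File(dir, index.toFileName());
            try (RandomAccessFile f = new RandomAccessFile(file, "rw")) {
                f.setLength(0); // drop any previous contents
                f.write(data);
            }
        }
        @Override public byte[] read(Index index) throws IOException {
            File file = new File(dir, index.toFileName());
            try (RandomAccessFile f = new RandomAccessFile(file, "r")) {
                byte[] data = new byte[(int) f.length()];
                f.readFully(data);
                return data;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // The temporary directory is left behind; a real accessor would clean up.
        File dir = Files.createTempDirectory("ooc-sketch").toFile();
        Index index = new Index("partition", 7, "superstep", 3);
        byte[] payload = {1, 2, 3};
        Accessor[] accessors = {new InMemoryAccessor(), new LocalFileAccessor(dir)};
        for (Accessor accessor : accessors) {
            accessor.write(index, payload);
            System.out.println(Arrays.equals(payload, accessor.read(index)));
        }
    }
}
```

The calling code in `main` is identical for both accessors, which is the property the decoupling is meant to provide. An HDFS-backed accessor would be another implementation of the same interface.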
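
The load-balancing concern can be illustrated with a small contrived example. The description does not say which hash function is currently used or what should replace it, so the sketch below assumes a plain modulo over the partition id and compares it with round-robin assignment as one possible alternative. The class and method names are hypothetical, and the partition ids are deliberately picked to collide.

```java
import java.util.Arrays;

public class IoThreadAssignmentSketch {
    /** Assumed hash-style assignment: thread = hash(partitionId) mod numThreads. */
    static int[] hashLoads(int[] partitionIds, int numThreads) {
        int[] loads = new int[numThreads];
        for (int id : partitionIds) {
            loads[Math.floorMod(Integer.hashCode(id), numThreads)]++;
        }
        return loads;
    }

    /** Round-robin assignment: the i-th partition goes to thread i mod numThreads. */
    static int[] roundRobinLoads(int[] partitionIds, int numThreads) {
        int[] loads = new int[numThreads];
        for (int i = 0; i < partitionIds.length; i++) {
            loads[i % numThreads]++;
        }
        return loads;
    }

    public static void main(String[] args) {
        // Six partitions whose ids are all multiples of the thread count.
        int[] partitionIds = {0, 4, 8, 12, 16, 20};
        System.out.println(Arrays.toString(hashLoads(partitionIds, 4)));       // [6, 0, 0, 0]
        System.out.println(Arrays.toString(roundRobinLoads(partitionIds, 4))); // [2, 2, 1, 1]
    }
}
```

With only a handful of partitions, the hash-style assignment can leave most IO threads idle, while round-robin keeps the per-thread counts within one of each other regardless of the ids.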