Hassan Eslami created GIRAPH-1073:
-------------------------------------

             Summary: Decouple out-of-core persistence infrastructure from 
out-of-core computation
                 Key: GIRAPH-1073
                 URL: https://issues.apache.org/jira/browse/GIRAPH-1073
             Project: Giraph
          Issue Type: Improvement
            Reporter: Hassan Eslami
            Assignee: Hassan Eslami


In the current out-of-core infrastructure, the persistence layer is heavily 
intertwined with scheduling and the out-of-core engine. This makes it 
difficult to try new features in the persistence layer. The following 
changes are needed:
 * The persistence layer should be decoupled from the rest of the out-of-core 
infrastructure. This way one can simply implement and plug in different data 
accessors for various persistence resources, e.g. a local file system data 
accessor, an HDFS data accessor, a serialized in-memory data accessor, etc.
 * We should be able to address out-of-core data in a more efficient and 
flexible way. Currently, data are accessed/addressed through string literals in 
various locations of the code. This should be changed so data can be accessed 
through a unified, more flexible data indexing mechanism.
 * With different data accessor implementations, there may be more emphasis on 
using a larger number of IO threads. It is important that these IO threads are 
load-balanced. Currently, partitions are assigned to IO threads using a hash 
function. Hash functions tend not to balance load well when the number of data 
points (partitions in this case) is small.
 * Currently, out-of-core uses `BufferedInputStream` and `BufferedOutputStream` 
along with the default (de)serialization mechanism. The IO bandwidth achieved 
by the current implementation is low. Two changes should improve it: 1) use the 
Unsafe (de)serialization mechanism to optimize memory bandwidth during 
(de)serialization, and 2) use RandomAccessFile's read and write interface to 
get lower-level access to the local file system and avoid overheads when 
reading from and writing to local files.
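
As a rough illustration of the first three points, the sketch below shows one 
possible shape for a pluggable accessor, a structured data index, and a 
round-robin assignment of partitions to IO threads. All names here 
(`DataIndex`, `DataAccessor`, `InMemoryDataAccessor`, `IoThreadAssignment`) are 
hypothetical and are not taken from the Giraph code base; the hash-style 
assignment is assumed to be partition id modulo thread count.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical structured address that replaces string-literal paths. */
final class DataIndex {
  final int partitionId;
  final String dataType; // e.g. "vertices", "edges", "messages"

  DataIndex(int partitionId, String dataType) {
    this.partitionId = partitionId;
    this.dataType = dataType;
  }

  @Override public boolean equals(Object o) {
    if (!(o instanceof DataIndex)) {
      return false;
    }
    DataIndex other = (DataIndex) o;
    return partitionId == other.partitionId && dataType.equals(other.dataType);
  }

  @Override public int hashCode() {
    return Objects.hash(partitionId, dataType);
  }
}

/** Hypothetical pluggable accessor, one implementation per persistence resource. */
interface DataAccessor {
  void write(DataIndex index, byte[] data);
  byte[] read(DataIndex index); // returns null if nothing was written
}

/** In-memory stand-in; a local-disk or HDFS accessor would implement the same interface. */
final class InMemoryDataAccessor implements DataAccessor {
  private final Map<DataIndex, byte[]> store = new HashMap<>();

  @Override public void write(DataIndex index, byte[] data) {
    store.put(index, data.clone());
  }

  @Override public byte[] read(DataIndex index) {
    byte[] data = store.get(index);
    return data == null ? null : data.clone();
  }
}

final class IoThreadAssignment {
  /** Hash-style assignment, assumed here to be partition id modulo thread count. */
  static int[] loadsByHash(List<Integer> partitionIds, int numThreads) {
    int[] loads = new int[numThreads];
    for (int id : partitionIds) {
      loads[Math.floorMod(Integer.hashCode(id), numThreads)]++;
    }
    return loads;
  }

  /** Round-robin assignment by arrival order; loads differ by at most one. */
  static int[] loadsByRoundRobin(List<Integer> partitionIds, int numThreads) {
    int[] loads = new int[numThreads];
    for (int i = 0; i < partitionIds.size(); i++) {
      loads[i % numThreads]++;
    }
    return loads;
  }

  public static void main(String[] args) {
    // Data is addressed by a structured index rather than a string literal.
    DataAccessor accessor = new InMemoryDataAccessor();
    accessor.write(new DataIndex(8, "messages"), new byte[] {1, 2, 3});
    System.out.println(Arrays.toString(accessor.read(new DataIndex(8, "messages"))));

    // Five partitions whose ids happen to collide under modulo 4.
    List<Integer> ids = Arrays.asList(0, 4, 8, 12, 16);
    System.out.println(Arrays.toString(loadsByHash(ids, 4)));
    System.out.println(Arrays.toString(loadsByRoundRobin(ids, 4)));
  }
}
```

With the five partition ids above and four IO threads, the modulo-based 
assignment places all five partitions on one thread (`[5, 0, 0, 0]`), while 
round-robin yields `[2, 1, 1, 1]`. Real partition ids may not collide this 
badly, but with only a handful of partitions a hash gives no guarantee of even 
spread.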



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
