Hassan Eslami created GIRAPH-1048:
-------------------------------------
Summary: Redesign of out-of-core mechanism (first patch --
out-of-core mechanism keeping fixed number of partitions in memory)
Key: GIRAPH-1048
URL: https://issues.apache.org/jira/browse/GIRAPH-1048
Project: Giraph
Issue Type: New Feature
Reporter: Hassan Eslami
Assignee: Hassan Eslami
The current out-of-core mechanism implemented in Giraph suffers from a few
issues:
- It does not integrate well with a flow-control mechanism in which the rate of
incoming/outgoing messages is controlled according to available memory,
- It does not control the rate at which compute/input threads generate and
process data, which is crucial in the input superstep and, in some
applications, in compute supersteps as well,
- It does not utilize the disk bandwidth properly due to concurrent disk
accesses (IO interference),
- It suffers from high overhead due to successive manual GC calls, even when
the high memory pressure cannot be addressed by offloading data to disk,
- It has a complicated design, making it difficult to debug and improve upon,
- It is very difficult to try different out-of-core policies, making it
impossible to tune the mechanism.
An out-of-core infrastructure that is simple to tune and program, flexible,
and yet efficient is needed in Giraph. In this JIRA we propose a redesign of
the out-of-core mechanism in which a) the logic of IO operations, b) the logic
of out-of-core decisions, c) the data structures supporting out-of-core
operations, and d) the actual logic of the computation are four decoupled
entities. A set of IOCommands and an IOScheduler handle the logic behind IO
operations, an OutOfCoreEngine and a MetaPartitionManager handle the logic for
out-of-core decisions, several disk-backed data structures are responsible for
holding the necessary data, and finally, the existing in-memory computation
mechanism interacts with the out-of-core infrastructure seamlessly.
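As a rough illustration of the separation described above (not the code in
the patch), the split between IO logic and decision logic could look like the
sketch below. All type and method names here are hypothetical and carry a
"Sketch" suffix so they are not mistaken for the real classes. Running queued
commands one at a time is an assumption about how IO interference might be
avoided.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

/** (a) IO logic: one unit of disk work. Hypothetical interface. */
interface IoCommandSketch {
  /** Performs the IO; here it only records what it would do. */
  void execute(List<String> log);
}

/** (b) Decision logic: picks the next IO action, never touches the disk. */
interface OocPolicySketch {
  /** Returns the next command, or null when nothing needs to move. */
  IoCommandSketch nextCommand(int partitionsInMemory);
}

/** Runs queued commands strictly one after another (an assumption). */
final class IoSchedulerSketch {
  private final Queue<IoCommandSketch> pending = new ArrayDeque<>();

  void submit(IoCommandSketch command) {
    pending.add(command);
  }

  /** Executes all queued commands in order; returns how many ran. */
  int drain(List<String> log) {
    int executed = 0;
    IoCommandSketch command;
    while ((command = pending.poll()) != null) {
      command.execute(log);
      executed++;
    }
    return executed;
  }
}

/** Wires the policy to the scheduler; computation code only calls step(). */
final class OocEngineSketch {
  private final OocPolicySketch policy;
  private final IoSchedulerSketch scheduler = new IoSchedulerSketch();

  OocEngineSketch(OocPolicySketch policy) {
    this.policy = policy;
  }

  /** Asks the policy for a decision and runs any resulting IO. */
  int step(int partitionsInMemory, List<String> log) {
    IoCommandSketch command = policy.nextCommand(partitionsInMemory);
    if (command != null) {
      scheduler.submit(command);
    }
    return scheduler.drain(log);
  }
}
```

With this shape, trying a different out-of-core policy means supplying a
different OocPolicySketch; the scheduler and the computation code stay
unchanged.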
This JIRA lays the groundwork for the out-of-core infrastructure and, as an
initial proof of concept, implements a simple out-of-core policy on top of it.
The out-of-core policy in this JIRA, also called the fixed out-of-core policy,
tries to keep a certain (user-defined) number of partitions in memory.
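To make the idea of a fixed policy concrete, here is a minimal sketch of
bookkeeping that caps the number of in-memory partitions. It is not the
implementation in the patch: the class name, the method names, and the choice
to offload the least recently used partition are all assumptions made for
illustration. The description above only states that a user-defined number of
partitions is kept in memory.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a fixed out-of-core policy; not Giraph's class. */
final class FixedPartitionPolicySketch {
  private final int maxPartitionsInMemory;
  // Iteration order doubles as recency order: least recently used first.
  private final LinkedHashSet<Integer> inMemory = new LinkedHashSet<>();
  private final Set<Integer> onDisk = new HashSet<>();

  FixedPartitionPolicySketch(int maxPartitionsInMemory) {
    if (maxPartitionsInMemory < 1) {
      throw new IllegalArgumentException("limit must be at least 1");
    }
    this.maxPartitionsInMemory = maxPartitionsInMemory;
  }

  /**
   * Marks a partition as needed in memory and returns the IDs that should
   * be offloaded to disk to stay within the limit (LRU is an assumption).
   */
  List<Integer> touch(int partitionId) {
    inMemory.remove(partitionId);  // re-adding moves it to the newest slot
    onDisk.remove(partitionId);
    inMemory.add(partitionId);

    List<Integer> toOffload = new ArrayList<>();
    while (inMemory.size() > maxPartitionsInMemory) {
      Integer oldest = inMemory.iterator().next();
      inMemory.remove(oldest);
      onDisk.add(oldest);
      toOffload.add(oldest);
    }
    return toOffload;
  }

  boolean isInMemory(int partitionId) {
    return inMemory.contains(partitionId);
  }

  boolean isOnDisk(int partitionId) {
    return onDisk.contains(partitionId);
  }
}
```

For example, with a limit of 2, touching partitions 0, 1, and then 2 would
ask for partition 0 to be offloaded, since it is the least recently used one
at that point.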